Python for Transit: Segment frequencies in a map from GTFS | by Santiago Toso | Dec, 2020


Dive deeper into the gtfs_functions Python package

In this article, we will see how to get bus segment frequencies from a GTFS using the Python package gtfs_functions. You can find the repository and official documentation on GitHub.

If you are looking for an extensive explanation of the package, I recommend you first read this introduction. Here, we are going to dive directly into the specific use case of getting segment frequencies in a map.

To install the package and parse the GTFS, run the code below. For this article, I downloaded the GTFS from SFMTA (San Francisco, CA).

Sometimes, looking at the variables at the stop or line level is not the best solution, and we need to go down to the segment level. We want to know what is going on between stop A and stop B and how it differs from what is going on between stop C and stop D.

To be able to aggregate information at the segment level, we first need to cut the long shape of each route into segments that go from stop to stop.
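To make the idea of cutting a shape from stop to stop concrete, here is a minimal standard-library sketch. It is not the implementation used by gtfs_functions: the helper names (`project_onto_line`, `point_at`, `cut_between`, `cut_shape_at_stops`) are hypothetical, coordinates are assumed to be planar, and each stop is simply snapped to the closest point of the shape, which would not be enough for routes that loop back over themselves.

```python
from math import hypot

def project_onto_line(line, point):
    """Distance along `line` of the point on it that is closest to `point`."""
    best_along, best_offset, travelled = 0.0, float("inf"), 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        piece_len = hypot(dx, dy)
        if piece_len == 0:
            continue
        # Fraction along this piece of the closest point, clamped to [0, 1].
        t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / piece_len ** 2
        t = max(0.0, min(1.0, t))
        offset = hypot(point[0] - (x1 + t * dx), point[1] - (y1 + t * dy))
        if offset < best_offset:
            best_offset, best_along = offset, travelled + t * piece_len
        travelled += piece_len
    return best_along

def point_at(line, dist):
    """Point located `dist` along `line`."""
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        piece_len = hypot(x2 - x1, y2 - y1)
        if piece_len > 0 and travelled + piece_len >= dist:
            t = (dist - travelled) / piece_len
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += piece_len
    return line[-1]

def cut_between(line, start, end):
    """Part of `line` between the distances `start` and `end`."""
    piece = [point_at(line, start)]
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        travelled += hypot(x2 - x1, y2 - y1)
        # Keep the shape vertices that fall strictly inside the cut.
        if start < travelled < end:
            piece.append((x2, y2))
    piece.append(point_at(line, end))
    return piece

def cut_shape_at_stops(shape, stops):
    """One segment per pair of consecutive stops; `stops` is in stop_sequence order."""
    along = [project_onto_line(shape, xy) for _, xy in stops]
    return [
        {"start_stop_id": a, "end_stop_id": b, "geometry": cut_between(shape, s, e)}
        for (a, _), (b, _), s, e in zip(stops, stops[1:], along, along[1:])
    ]

# Made-up L-shaped route with three stops placed slightly off the line.
shape = [(0, 0), (10, 0), (10, 10)]
stops = [("A", (0, 1)), ("B", (6, -1)), ("C", (11, 5))]
for segment in cut_shape_at_stops(shape, stops):
    print(segment)
```

Note that the second segment keeps the corner of the shape, so its geometry follows the street rather than a straight line between the two stops.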

That is exactly what the function cut_gtfs does. It takes 3 arguments from the parsed GTFS:

The output shows:

GeoDataFrame output for the function cut_gtfs().

The columns are:

  • route_id of the segment
  • direction_id of the segment as it comes in the GTFS
  • stop_sequence of the starting stop of the segment as it comes from the GTFS
  • start_stop_name as it comes from the GTFS
  • end_stop_name as it comes from the GTFS
  • start_stop_id as it comes from the GTFS
  • end_stop_id as it comes from the GTFS
  • segment_id as a concatenation of the start_stop_id and end_stop_id
  • shape_id for that segment as it comes from the GTFS
  • geometry as a LineString
  • distance_m that represents the length of the segment in meters. This will be useful to calculate the speeds later.
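As a rough illustration of the two derived columns, the sketch below builds a segment_id from two stop ids and measures the length of a segment in meters. The function names, the "-" separator, and the haversine formula are assumptions made for this example; the package may build the id and compute distance_m differently.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters

def make_segment_id(start_stop_id, end_stop_id, sep="-"):
    """Concatenate the two stop ids (the separator is an assumption)."""
    return f"{start_stop_id}{sep}{end_stop_id}"

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def line_length_m(coords):
    """Length in meters of a line given as a list of (lon, lat) pairs."""
    return sum(haversine_m(*a, *b) for a, b in zip(coords, coords[1:]))

# Made-up coordinates, only meant to show the calls.
segment = [(-122.4194, 37.7749), (-122.4180, 37.7755), (-122.4169, 37.7762)]
print(make_segment_id("3114", "3144"), round(line_length_m(segment), 1))
```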

The segments are not the final output, just an intermediate step we have to take to aggregate variables at the segment level. Let’s see how to do that in the next sections.

We can now get the frequency by bus segment. The function takes the same three arguments that we had for speeds:

The output for one specific segment, direction, and time of day shows:

Note that in the example above, the chosen segment 3114–3144 appears four times: once for each of the three routes that serve that segment and a fourth time for the route “All lines”. This route is created by the function, and it aggregates the frequency in that segment taking into account all the routes that stop at both its starting and ending stops.

Also, notice that the aggregated value for “All lines” takes into account all three route-level segments, ignoring the direction the lines had in the GTFS. This makes sense since the segment always starts and ends at the same stops, even if the assigned direction is different in the GTFS.
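The aggregation described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the package's code: `segment_frequencies` is a hypothetical helper, the route names and trip counts are made up, and the frequency is expressed here as minutes between buses (window length divided by number of trips), which may differ from how the package defines it.

```python
from collections import defaultdict

def segment_frequencies(passages, window_minutes):
    """Count trips per (route, direction, segment) plus an "All lines" row.

    `passages` holds one (route_name, direction_id, segment_id) tuple for
    every trip that runs through the segment during the time window.
    """
    trips = defaultdict(int)
    for route, direction, segment in passages:
        trips[(route, direction, segment)] += 1
        # Aggregated row: same segment, every route, direction ignored.
        trips[("All lines", None, segment)] += 1
    return {
        key: {"ntrips": n, "headway_min": window_minutes / n}
        for key, n in trips.items()
    }

# Three made-up routes sharing one segment during a 3-hour window.
passages = (
    [("Route X", 0, "3114-3144")] * 6
    + [("Route Y", 1, "3114-3144")] * 3
    + [("Route Z", 0, "3114-3144")] * 3
)
for key, values in segment_frequencies(passages, window_minutes=180).items():
    print(key, values)
```

With these inputs the segment shows up four times, as in the article's example: once per route and once for “All lines”, whose trip count is the sum over the three routes even though one of them carries a different direction_id.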


If you are looking to visualize data at the segment level for all lines I recommend you go with something more powerful than the map_gdf() that we saw in previous articles like (AKA my favorite data viz library). For example, to check the scheduled speeds per segment:

You will need to manually style the colors and filters, but you will have complete control over the visual. Or you can always learn to do it programmatically (which I haven’t yet).

Far from taking credit for others’ work, I want to acknowledge that some functions of this package were built on top of great and more generic packages and were just slightly modified to better serve this specific workflow.

For example, the function import_gtfs() heavily relies on partridge, a powerful Python library created by Remix founders that makes parsing a GTFS very easy. Similarly, map_gdf() and save_gdf() are built on top of folium and geopandas respectively.
