Visualizing MBTA Data

An interactive exploration of Boston's subway system

Mike Barry and Brian Card - June 10, 2014

Boston’s Massachusetts Bay Transportation Authority (MBTA) operates the fourth-busiest subway system in the U.S. after New York, Washington, and Chicago. If you live in or around the city you have probably ridden on it. The MBTA recently began publishing substantial amounts of subway data through its public APIs. They provide the full schedule in General Transit Feed Specification (GTFS) format, which powers Google’s transit directions. They also publish real-time train locations for the Red, Orange, Blue, and Green lines. The following visualizations use data captured from these feeds for the entire month of February 2014. Green Line data became available in October 2014, so it is not shown here. Also, working with the MBTA, we were able to acquire per-minute entry and exit counts at each station, measured at the turnstiles used for payment.

We attempt to present this information to help people in Boston better understand the trains, how people use the trains, and how the people and trains interact with each other.

The Trains

On a typical weekday, trains make approximately 1150 trips on the red, orange, and blue lines, starting at 5AM and continuing through 1AM the next morning. On Saturdays trains make 870 trips and on Sundays they make 760.

To better understand how the trains operate on a typical day, below are all trips that trains took on the red, orange, and blue lines on Monday, February 3, 2014. Each vertical line represents a station, and time extends from top to bottom. Steeper lines indicate slower trains. This style of visualization was popularized by Étienne-Jules Marey for train schedules and is typically called a “Marey Diagram.”
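As a rough sketch of how such a diagram is laid out, the Python below turns one trip into (x, y) points, with station position on the horizontal axis and time on the vertical axis. The station names, positions, and stop times are made-up illustrative values rather than MBTA data, and `marey_points` and `segment_steepness` are hypothetical helper names.

```python
# Hypothetical station positions along one line (arbitrary units, not real distances).
STATION_X = {"A": 0.0, "B": 2.0, "C": 5.0}

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def marey_points(stops):
    """Map a trip's (station, 'HH:MM') stops to (x, y) points.

    x is the station position; y is the time, which increases down the page.
    """
    return [(STATION_X[station], to_minutes(t)) for station, t in stops]

def segment_steepness(points):
    """Minutes elapsed per unit of distance for each leg; larger means slower."""
    return [
        (y1 - y0) / abs(x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]

# Made-up trip: 4 minutes for the first leg, 3 minutes for the second.
trip = [("A", "08:00"), ("B", "08:04"), ("C", "08:07")]
points = marey_points(trip)
steepness = segment_steepness(points)
```

Drawing a line through `points` for every trip of the day would give the overall shape of the diagram; the first leg here is steeper (more minutes per unit of distance) than the second, so it reads as the slower one.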

Average Number of Trips per Day
         Weekdays  Saturdays  Sundays
Red           450        350      300
Orange        320        260      220
Blue          380        260      240
Total        1150        870      760

Subway Trips on Monday February 3, 2014

Locations of each train on the red, blue, and orange lines at the selected time. Hover over the diagram to the right to display trains at a different time.

Trains are on the right side of the track relative to the direction they are moving.

See the morning rush-hour, midday lull, afternoon rush-hour, and the evening lull.

To better compare the individual trips on this day, the visualization below shows all of the trips from the above diagram with their starting points lined up, so you can see the range from fastest to slowest trips as well as the variation in trip times by time of day. The trains slow down a little during the morning rush hour, primarily on the outbound blue line. The afternoon rush hour is by far the worst time of day for the red line. The midday lull and evening lull are both fairly consistent. Hover over the time scale on the left to highlight trips during different parts of the day. Click on a line to see at what time the train was at each stop.
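Lining up the starting points amounts to subtracting each trip's first timestamp from all of its stop times. The sketch below illustrates that idea in Python; the trips are invented values in minutes after midnight, and `align_to_start` is a hypothetical helper, not code from the project.

```python
def align_to_start(trip):
    """Shift a trip's times so that its first stop is at minute 0."""
    start = trip[0][1]
    return [(station, t - start) for station, t in trip]

# Made-up trips: lists of (station, minutes after midnight).
trips = {
    "morning": [("A", 480), ("B", 485), ("C", 491)],
    "midday": [("A", 720), ("B", 724), ("C", 729)],
}

# After alignment, every trip starts at 0 and can be compared directly.
aligned = {name: align_to_start(trip) for name, trip in trips.items()}

# Total duration is the elapsed time at the last stop.
durations = {name: trip[-1][1] for name, trip in aligned.items()}
slowest = max(durations, key=durations.get)
```

With the trips aligned this way, the spread between the fastest and slowest end-to-end times can be read off directly.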

The People

On a typical weekday, 425,000 people enter a station along the red, orange, or blue lines. On weekends and holidays, that number drops to 200,000. The busiest day was Friday, February 7, when 470,187 people entered the system.

This heatmap shows the average number of people who enter and exit stations along the red, orange, and blue lines for every hour over the entire month, based on records from turnstiles at each station. Each row represents one week. You can see weekends and weekdays with daily peaks at rush hour, as well as a holiday and two snowstorms. Our exit data is less reliable since not all stations require that people exit through a turnstile.
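One way to get from per-minute turnstile counts to hourly heatmap cells is to sum the counts into (date, hour) buckets. The following is a minimal sketch of that aggregation; the record format and the counts are assumptions made up for illustration, and `hourly_totals` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime

def hourly_totals(records):
    """Sum per-minute counts into (ISO date, hour of day) buckets."""
    totals = defaultdict(int)
    for timestamp, count in records:
        moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        totals[(moment.date().isoformat(), moment.hour)] += count
    return dict(totals)

# Made-up per-minute entry counts: (timestamp, people entering).
records = [
    ("2014-02-03 08:00", 12),
    ("2014-02-03 08:01", 15),
    ("2014-02-03 09:00", 7),
    ("2014-02-04 08:30", 20),
]
totals = hourly_totals(records)
```

Each bucket would then map to one cell of the heatmap, with the cell's color scaled by its total.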

Entrances and Exits from All Stations during February 2014

The table and map below break down February's turnstile entries and exits by station. Hover over a row in the table to highlight the corresponding circle on the map, or vice versa. Click on a row in the table to show a detailed heatmap for the entrances to and exits from that station over the month. Click and drag on several table rows to highlight a range of stations.

You can see the busiest stations are all along the Red Line. Harvard topped the list, followed closely by South Station, and then Downtown Crossing. Next to each station are heatmaps showing entrances and exits at that station per hour for weekdays and weekends/holidays. You can see that some stations are work stations, since their exits peak in the morning and entrances peak in the afternoon, and that some stations are home stations, since their entrances peak in the morning and exits peak in the afternoon. Some stations are just busy all the time.

Entrances and Exits per Station during February 2014

Each circle above and each row in the table represents a station; hover over one to highlight the other. Next to each station are heatmaps showing entrances and exits at that station per hour for weekdays and weekends/holidays.

Notice work stations with exit peaks in the morning and entrance peaks in the afternoon, home stations with entrance peaks in the morning and exit peaks in the afternoon, and the stations that are just busy all the time.
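The work/home distinction above is read visually from the heatmaps, but it can be approximated with a simple rule. The sketch below is one possible heuristic, not the authors' method: it compares the hour with the most entrances against the hour with the most exits. The noon cutoff, the function name, and all counts are assumptions for illustration.

```python
def classify_station(entries_by_hour, exits_by_hour, noon=12):
    """Label a station 'work', 'home', or 'mixed' from its busiest hours.

    Assumed heuristic: a work station has its exit peak before noon and its
    entrance peak at or after noon; a home station is the reverse.
    """
    peak_entry = max(entries_by_hour, key=entries_by_hour.get)
    peak_exit = max(exits_by_hour, key=exits_by_hour.get)
    if peak_exit < noon <= peak_entry:
        return "work"
    if peak_entry < noon <= peak_exit:
        return "home"
    return "mixed"

# Made-up weekday counts keyed by hour of day.
downtown = classify_station({8: 100, 17: 900}, {8: 800, 17: 150})
suburb = classify_station({8: 700, 17: 120}, {8: 90, 17: 650})
hub = classify_station({8: 500, 17: 480}, {8: 510, 17: 470})
```

A rule this coarse will mislabel stations with several comparable peaks, which is roughly the "busy all the time" case the heatmaps make visible.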

How People and Trains Affect Each Other

When you look back at the Marey diagram, the slope of each line tells you how fast a train is going and the time it takes to get between stations. When all of the start and stop times are lined up you can see a drastic variation in the time it takes to get between stops throughout the day. If you have ever ridden the subway during rush hour then you have experienced what the steep lines in the Marey diagram feel like first-hand.

What causes these delays? It’s hard to know for sure, but it appears that the number of people riding the subway is a factor.

One Week of Congestion and Delay

This visualization shows congestion and delay on the red, blue, and orange lines for the first full week in February. The gray bands show the total number of entries into all stations per minute over time for each day of the week. The colored bands below indicate whether the trains are running faster or slower than normal.

The map shows the congestion and delay across the system at the time you hover over in the chart on the right. The thickness of each line at a stop indicates the number of entries per minute at that stop, and the color on the right-hand side of a track indicates delay in that direction using the same scale as the colored bands.
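The text does not spell out how "normal" is defined, so the sketch below assumes one plausible reading: take the median travel time for a segment as the baseline and express each observation as a fraction above or below it. The baseline choice, the function name, and the travel times are all assumptions for illustration.

```python
from statistics import median

def relative_delay(observed_minutes, baseline_minutes):
    """Fraction by which a trip is slower (positive) or faster (negative) than baseline."""
    return (observed_minutes - baseline_minutes) / baseline_minutes

# Made-up travel times in minutes for one segment between two stations.
history = [4.0, 4.0, 5.0, 4.5, 6.0]

# Assumed definition of "normal": the median of the observed times.
baseline = median(history)

# A 6-minute trip against that baseline.
delay = relative_delay(6.0, baseline)
```

A value like `delay` could then be mapped onto a diverging color scale, with zero as the neutral midpoint.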

You can see basketball games letting out on Monday, Tuesday, Friday, and Sunday. You can also tell that it snowed on Wednesday and people stayed home, especially when you compare Wednesday evening's light rush hour with Thursday evening's.

Your Commute

How do all of these factors affect your commute? Click and drag on the map from a starting station to an ending station to see a detailed breakdown of how long that trip takes at different points during a typical workday. The points on top show the trip durations from the start to the destination for each starting time, and the points on the bottom show the times between trains leaving the start station for the destination station. The time between trains is the longest you would need to wait if you arrived just as the previous train was leaving. The blue band excludes the shortest and longest 10% of all transit times, leaving the most likely 80% range, and the orange band does the same for wait times between trains. The dark lines show the median: half of the time the wait or transit time is higher, and half of the time it is lower.
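The bands and dark lines described above correspond to the 10th percentile, the median, and the 90th percentile. A minimal Python sketch of those statistics follows, using the standard library's `statistics.quantiles`; the departure and transit times are invented, the interpolation method is an arbitrary choice, and `headways` and `band` are hypothetical helper names.

```python
from statistics import quantiles

def headways(departures):
    """Minutes between consecutive departures from the start station."""
    ordered = sorted(departures)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def band(values):
    """Return (10th percentile, median, 90th percentile) of the values."""
    deciles = quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[4], deciles[8]

# Made-up departure times in minutes after midnight.
gaps = headways([480, 493, 485, 497])

# Made-up transit times in minutes for one station pair.
transit_times = list(range(10, 21))
low, mid, high = band(transit_times)
```

Computing `band` separately for transit times and for headways within each time window would give the two shaded ranges and their center lines.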

Drag from a starting station to an ending station, or click on a starting station and then an ending station, to see how long the trip takes over time in the chart.

In general, delays go up during rush hour but trains come more frequently. For example, if you look at South Station to Kendall/MIT, you will notice that transit times go up as wait times go down. If you drag across the chart, the paragraph below will tell you that these effects roughly balance each other out, and the most likely trip duration (half the normal time between trains plus total transit time) stays roughly constant at around 10-12 minutes. It is also interesting to note that transit times on the blue line, for example State St. to Wonderland, are much less variable than on the red line. Orange line trips like Downtown Crossing to Forest Hills are also less variable in transit time, but trains come much less frequently and less reliably.
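The trip-duration estimate in parentheses above is simple arithmetic: on average you wait half the time between trains, then ride for the transit time. The sketch below works through it with made-up numbers chosen only to show the balancing effect; they are not measured MBTA values, and `typical_trip_minutes` is a hypothetical name.

```python
def typical_trip_minutes(headway_minutes, transit_minutes):
    """Half the time between trains (average wait) plus the transit time."""
    return headway_minutes / 2 + transit_minutes

# Made-up off-peak case: trains every 8 minutes, 7 minutes in transit.
off_peak = typical_trip_minutes(8, 7)

# Made-up rush-hour case: trains every 4 minutes, but 9 minutes in transit.
rush_hour = typical_trip_minutes(4, 9)
```

In this example the shorter wait at rush hour exactly offsets the longer ride, so both cases come out to the same total.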

Summary

Through publicly available data, we have the tools to understand the subway system better than we ever have before. We have seen how the system operates on a daily basis, how people use the system, how that affects the trains and also how this ties back to your daily commute. To see a live version of this data, check out MBTA Trains for real-time subway delays and real-time commuter rail delays.

Credits

This project was created by Michael Barry and Brian Card for a graduate course in Data Visualization at WPI taught by Matthew Ward. Several open-source projects were used under the MIT License including D3, Bootstrap, Glyphicons, Underscore, Moment.js, es6-shim, and D3-tip. Data courtesy of the MBTA and their Developer Relations Program.

Much of the inspiration for this report comes from Bret Victor's Ladder of Abstraction and the works of Edward Tufte and Étienne-Jules Marey.

Source Code

The source code and raw data are available on GitHub and described in this blog post.

For any questions, please reach out to Michael Barry or Brian Card on Twitter.