Boston’s Massachusetts Bay Transportation Authority (MBTA) operates the 4th busiest subway system in the U.S. after New York, Washington, and Chicago. If you live in or around the city you have probably ridden on it. The MBTA recently began publishing a substantial amount of subway data through its public APIs. They provide the full schedule in General Transit Feed Specification (GTFS) format, which powers Google’s transit directions. They also publish realtime train locations for the Red, Orange, Blue, and Green lines. The following visualizations use data captured from these feeds for the entire month of February 2014. Green Line data became available in October 2014, so it is not shown here. Also, working with the MBTA, we were able to acquire per-minute entry and exit counts at each station, measured at the turnstiles used for payment.
We attempt to present this information to help people in Boston better understand the trains, how people use the trains, and how the people and trains interact with each other.
On a typical weekday, trains make approximately 1150 trips on the red, orange, and blue lines, starting at 5AM and continuing through 1AM the next morning. On Saturdays trains make 870 trips and on Sundays they make 760.
To better understand how the trains operate on a typical day, below are all trips that trains took on the red, orange, and blue lines on Monday, February 3, 2014. Each vertical line represents a station, and time extends from top to bottom. Steeper lines indicate slower trains. This visualization was first used by Étienne-Jules Marey to visualize train schedules and is typically called a “Marey Diagram.”
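As a rough illustration of how a diagram like this can be built, the sketch below turns one trip into line segments, one per pair of consecutive stops. The `Stop` tuple layout, the `marey_segments` name, and the sample trip are assumptions made for this example and are not taken from the MBTA feeds or from the project's own code.

```python
from typing import List, Tuple

# Hypothetical representation of a stop on one trip:
# (station position along the line, minutes after midnight).
Stop = Tuple[float, float]


def marey_segments(stops: List[Stop]) -> List[Tuple[Stop, Stop, float]]:
    """Return one segment per pair of consecutive stops, with its travel time in minutes."""
    segments = []
    for start, end in zip(stops, stops[1:]):
        segments.append((start, end, end[1] - start[1]))
    return segments


# Made-up trip: leaves station 0 at 5:00AM, reaches station 1 at 5:02AM
# and station 2 at 5:07AM.
trip = [(0.0, 300.0), (1.0, 302.0), (2.0, 307.0)]

for start, end, minutes in marey_segments(trip):
    # With stations on the horizontal axis and time running downward,
    # a longer travel time between adjacent stations draws a steeper line.
    print(start, end, minutes)
```

Drawing each segment from `start` to `end` for every trip of the day would give the overall shape of the diagram; the second segment here takes longer than the first, so it would appear steeper.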
To better compare the individual trips on this day, the visualization below shows all of the trips from the above diagram juxtaposed with the starting points lined up so you can see the range of fastest to slowest trips, as well as variation in trip times based on the time of day. The trains slow down a little bit during the morning rush hour, primarily on the outbound blue line. The afternoon rush hour is by far the worst time of day for the red line. The midday lull and evening lull are both fairly consistent. Hover over the time scale on the left to highlight trips during different parts of the day. Click on a line to see at what time the train was at each stop.
On a typical weekday, 425,000 people enter a station along the red, orange, or blue lines. On weekends and holidays, that number drops to 200,000. The busiest day was Friday, February 7, when 470,187 people entered the system.
This heatmap shows the average number of people that enter and exit stations along the red, orange, and blue lines for every hour over the entire month, based on records from turnstiles at each station. Each row represents one week. You can see weekends and weekdays with daily peaks at rush hour, as well as a holiday and two snow storms. Our exit data is less reliable since not all stations require that people exit through a turnstile.
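One plausible way to bin per-minute turnstile counts into the hourly cells of such a heatmap is sketched below. The record layout (a timestamp paired with a count), the `hourly_totals` name, and the sample numbers are all assumptions for illustration; the actual format of the MBTA turnstile data is not described here.

```python
from collections import defaultdict
from datetime import datetime


def hourly_totals(records):
    """Sum per-minute counts into (date, hour) buckets."""
    totals = defaultdict(int)
    for timestamp, count in records:
        totals[(timestamp.date(), timestamp.hour)] += count
    return dict(totals)


# Made-up per-minute entry counts for a single station.
sample = [
    (datetime(2014, 2, 3, 8, 0), 12),
    (datetime(2014, 2, 3, 8, 1), 15),
    (datetime(2014, 2, 3, 9, 0), 7),
]

for (day, hour), total in sorted(hourly_totals(sample).items()):
    print(day, hour, total)
```

Each (date, hour) bucket would then map to one cell, with the date choosing the row and column position and the total choosing the color.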
The table and map below break down February's turnstile entries and exits by station. Hover over a row in the table to highlight the corresponding circle on the map, or vice versa. Click on a row in the table to show a detailed heatmap for the entrances to and exits from that station over the month. Click and drag on several table rows to highlight a range of stations.
You can see the busiest stations are all along the Red Line. Harvard topped the list, followed closely by South Station, and then Downtown Crossing. Next to each station are heatmaps showing entrances and exits to each station per hour for weekdays and weekends/holidays. You can see that some stations are work stations, since their exits peak in the morning and entrances peak in the afternoon, and that some stations are home stations, since their entrances peak in the morning and exits peak in the afternoon. Some stations are just busy all the time.
When you look back at the Marey diagram, the slope of each line tells you how fast a train is going and the time it takes to get between stations. When all of the start and stop times are lined up you can see a drastic variation in the time it takes to get between stops throughout the day. If you have ever ridden the subway during rush hour then you have experienced what the steep lines in the Marey diagram feel like first-hand.
What causes these delays? It’s hard to know for sure, but it appears that the number of people riding the subway is a factor.
This visualization shows congestion and delay on the red, blue, and orange lines for the first full week in February. The gray bands show the total number of entries into all stations per minute over time for each day of the week. The colored bands below indicate whether the trains are running faster or slower than normal.
The map shows the congestion and delay across the system at the time you hover over in the chart on the right. The thickness of each line at a stop indicates the number of entries per minute at that stop, and the color on the right-hand side of a track indicates the delay in that direction, using the same scale as the colored bands.
You can see basketball games letting out on Monday, Tuesday, Friday, and Sunday. You can also tell that it snowed on Wednesday and people stayed home, especially when you see how light Wednesday evening's rush hour was compared to Thursday evening's.
How do all of these factors affect your commute? Click and drag on the map from a starting station to an ending station to see a detailed breakdown of how long that trip takes at different points during a typical workday. The points on top show all of the trip durations from the start to the destination for a given starting time, and the points on the bottom show all of the times between trains leaving the start station for the destination station. The time between trains is the longest you would possibly need to wait if you arrived just as the previous train was leaving. The blue band excludes the shortest and longest 10% of all transit times, leaving behind the most-likely 80% range, and the orange band does the same for wait times between trains. The dark lines show the median: half of the time wait/transit times are higher, and half of the time they are lower.
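The bands described above correspond to the 10th, 50th, and 90th percentiles of the observed times. The sketch below shows one way to compute them with linear interpolation between sorted values; the interpolation rule, the `percentile` and `band` names, and the sample transit times are assumptions for this example, since the exact method behind the chart is not stated.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 100]) of an already sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = rank - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction


def band(values):
    """Return (10th percentile, median, 90th percentile) of the given times."""
    ordered = sorted(values)
    return percentile(ordered, 10), percentile(ordered, 50), percentile(ordered, 90)


# Made-up transit times in minutes for trips starting in the same time window.
transit_minutes = [8.0, 9.5, 7.0, 12.0, 8.5, 9.0, 10.0, 7.5, 11.0, 8.0, 15.0]
low, median, high = band(transit_minutes)
print(low, median, high)
```

The interval from `low` to `high` plays the role of the shaded 80% band and `median` plays the role of the dark line.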
In general, delays go up during rush hour but trains come more frequently. For example, if you look at South Station to Kendall/MIT you will notice that the transit times go up as the wait times go down. If you drag across the chart, the paragraph below will tell you that these effects roughly balance each other out and the most-likely trip duration (half the normal time between trains plus total transit time) stays roughly constant at around 10-12 minutes. It is also interesting to note that transit times on the blue line, for example State St. to Wonderland, are much less variable than on the red line. Orange line trips like Downtown Crossing to Forest Hills are less variable in transit time, but trains come much less frequently and less reliably.
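The trip-duration estimate described above (half the normal time between trains plus total transit time) can be written out directly. The function name and the rush-hour and midday numbers below are hypothetical, chosen only to show how a longer transit time and a shorter wait can cancel out; they are not measured values.

```python
def typical_trip_minutes(time_between_trains, transit_time):
    """Half the normal time between trains plus the total transit time, in minutes."""
    return time_between_trains / 2 + transit_time


# Made-up midday values: trains every 8 minutes, 7 minutes in transit.
midday = typical_trip_minutes(8, 7)

# Made-up rush-hour values: trains every 4 minutes, 9 minutes in transit.
rush_hour = typical_trip_minutes(4, 9)

print(midday, rush_hour)
```

With these illustrative inputs both cases come out to 11 minutes, which is the kind of balance the chart shows for South Station to Kendall/MIT.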
Through publicly available data, we have the tools to understand the subway system better than we ever have before. We have seen how the system operates on a daily basis, how people use the system, how that affects the trains and also how this ties back to your daily commute. To see a live version of this data, check out MBTA Trains for real-time subway delays and real-time commuter rail delays.
This project was created by Michael Barry and Brian Card for a graduate course in Data Visualization at WPI taught by Matthew Ward. Several open-source projects were used under the MIT License including D3, Bootstrap, Glyphicons, Underscore, Moment.js, es6-shim, and D3-tip. Data courtesy of the MBTA and their Developer Relations Program.