Visualizing MBTA Data

An interactive exploration of Boston's subway system

Mike Barry and Brian Card - June 10, 2014


Boston’s Massachusetts Bay Transportation Authority (MBTA) operates the 4th busiest subway system in the U.S. after New York, Washington, and Chicago. If you live in or around the city you have probably ridden on it. The MBTA recently began publishing a substantial amount of subway data through its public APIs. They provide the full schedule in General Transit Feed Specification (GTFS) format, which powers Google’s transit directions. They also publish realtime train locations for the Red, Orange, Blue, and Green lines. The following visualizations use data captured from these feeds for the entire month of February 2014. Green Line data became available in October 2014, so it is not shown here. Also, working with the MBTA, we were able to acquire per-minute entry and exit counts at each station, measured at the turnstiles used for payment.
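The GTFS schedule mentioned above is a set of CSV text files. As a minimal sketch (not the code behind these visualizations), the snippet below reads rows shaped like a GTFS `stop_times.txt` file and converts the times to seconds. The column names come from the GTFS specification; the sample rows and the helper names `gtfs_time_to_seconds` and `load_stop_times` are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows in the shape of a GTFS stop_times.txt file.
SAMPLE_STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
trip-1,24:55:00,24:55:30,stop-a,1
trip-1,25:02:00,25:02:30,stop-b,2
"""


def gtfs_time_to_seconds(value):
    """Convert a GTFS HH:MM:SS string to seconds since the service day began.

    GTFS allows hours of 24 or more for trips that run past midnight,
    so "25:02:00" means 1:02 AM on the following calendar day.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_stop_times(text):
    """Parse stop_times CSV text into a list of dicts with numeric times."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "trip_id": row["trip_id"],
            "stop_id": row["stop_id"],
            "stop_sequence": int(row["stop_sequence"]),
            "arrival": gtfs_time_to_seconds(row["arrival_time"]),
            "departure": gtfs_time_to_seconds(row["departure_time"]),
        })
    return rows


stop_times = load_stop_times(SAMPLE_STOP_TIMES)
```

Keeping hours above 24 as plain seconds keeps a late-night trip in one piece, which matters for a system whose service day runs past midnight.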

We attempt to present this information to help people in Boston better understand the trains, how people use the trains, and how the people and trains interact with each other.

The Trains

On a typical weekday, trains make approximately 1150 trips on the red, orange, and blue lines, starting at 5AM and continuing through 1AM the next morning. On Saturdays trains make 870 trips and on Sundays they make 760.

To better understand how the trains operate on a typical day, below are all trips that trains took on the red, orange, and blue lines on Monday, February 3, 2014. Each vertical line represents a station, and time extends from top to bottom. Steeper lines indicate slower trains. This visualization was first used by Étienne-Jules Marey to visualize train schedules and is typically called a “Marey Diagram.”
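To make the geometry concrete, here is a small sketch of how one trip could be turned into the points of a line in a Marey diagram. It assumes each station has a horizontal position and each stop has a time in minutes; the station names, positions, and function names are hypothetical and are not taken from the project's source.

```python
# Hypothetical horizontal positions (km along the line) for three stations.
STATION_POSITION_KM = {"A": 0.0, "B": 1.5, "C": 4.0}


def marey_points(stops):
    """Map (station, minute) stops to (x, y) points.

    x is the station's position across the page and y is time, which
    increases downward in the diagram.
    """
    return [(STATION_POSITION_KM[station], minute) for station, minute in stops]


def steepness(points):
    """Minutes of travel per km for each segment between consecutive stops.

    A larger value is a steeper line on the diagram, which means a slower train.
    """
    result = []
    for (x0, t0), (x1, t1) in zip(points, points[1:]):
        result.append((t1 - t0) / abs(x1 - x0))
    return result


trip = [("A", 0.0), ("B", 3.0), ("C", 13.0)]
points = marey_points(trip)
segment_steepness = steepness(points)
```

In this made-up trip the second segment takes twice as many minutes per km as the first, so it would be drawn as the steeper, slower part of the line.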

Average Number of Trips per Day
          Weekdays  Saturdays  Sundays
Red            450        350      300
Orange         320        260      220
Blue           380        260      240
Total         1150        870      760

Subway Trips on Monday February 3, 2014

Locations of each train on the red, blue, and orange lines at 5:13 am. Hover over the diagram to the right to display trains at a different time.

Trains are on the right side of the track relative to the direction they are moving.

See the morning rush-hour, midday lull, afternoon rush-hour, and the evening lull.

Marey diagram with station columns Ashmont, Alewife, Braintree, Forest Hills, Oak Grove, Wonderland, and Bowdoin. Time runs downward in 15-minute steps from 5:15 AM on Monday to 6:00 AM on Tuesday. The diagram carries the following annotations:

Service starts at 5AM on Monday morning.

Each line represents the path of one train. Time continues downward, so steeper lines indicate slower trains.

Since the red line splits, we show the Ashmont branch first then the Braintree branch. Trains on the Braintree branch "jump over" the Ashmont branch.

Train frequency increases around 6:30AM as morning rush hour begins.

After the morning rush-hour subsides, everything runs smoothly throughout the middle of the day.

The afternoon rush hour begins around 3:30PM.

A disabled train causes delays on trains after (below) it for over an hour. Notice how this causes delays in the other direction as well, as trains immediately arrive at Alewife then turn around to go south.

Service to Bowdoin stops at 6:20PM.

Normal service resumes for the evening starting around 7PM.

A disabled train at Wellington Station causes northbound delays on the Orange Line from 8:50PM to 9:15PM. Notice how southbound trains are temporarily delayed, but get back on schedule quickly.

The last trains of the night move much slower, sweeping up the remaining passengers to finish around 1:30AM.

At night, trains are moved between stations.

At 5AM on Tuesday, the cycle begins again.

To better compare the individual trips on this day, the visualization below shows all of the trips from the above diagram juxtaposed with the starting points lined up so you can see the range of fastest to slowest trips, as well as variation in trip times based on the time of day. The trains slow down a little bit during the morning rush-hour, primarily on the outbound blue line. The afternoon rush-hour is by far the worst time of day for the red line. The midday lull and evening lull are both fairly consistent. Hover over the time scale on the left to highlight trips during different parts of the day. Click on a line to see at what time the train was at each stop.
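Lining up the starting points amounts to subtracting each trip's first timestamp from all of its stops. A minimal sketch, assuming each trip is a list of (station, minute of day) pairs; the trips and function names below are invented for illustration.

```python
def minutes_since_start(trip):
    """Shift a trip so that its first stop is at minute 0."""
    start = trip[0][1]
    return [(station, minute - start) for station, minute in trip]


def duration_range(trips):
    """Return (fastest, slowest) total trip duration in minutes."""
    durations = [trip[-1][1] - trip[0][1] for trip in trips]
    return min(durations), max(durations)


# Two hypothetical trips over the same stations at different times of day.
midday_trip = [("A", 720), ("B", 725), ("C", 733)]
rush_trip = [("A", 1020), ("B", 1027), ("C", 1040)]

aligned = minutes_since_start(rush_trip)
fastest, slowest = duration_range([midday_trip, rush_trip])
```

Once every trip starts at minute 0, trips from different times of day can be drawn on top of each other and compared directly.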

Chart: minutes since start of trip (0m to 90m) for each trip, by time of day (12 AM to 12 AM). Starting station and ending station: Alewife to Ashmont/Braintree, Wonderland to Gov't Center, Oak Grove to Forest Hills.

The People

On a typical weekday, 425,000 people enter a station along the red, orange, or blue lines. On weekends and holidays, that number drops to 200,000. The busiest day was Friday, February 7, when 470,187 people entered the system.

This heatmap shows the average number of people that enter and exit stations along the red, orange, and blue lines for every hour over the entire month, based on records from turnstiles at each station. Each row represents one week. You can see weekends and weekdays with daily peaks at rush hour, as well as a holiday and two snow storms. Our exit data is less reliable since not all stations require that people exit through a turnstile.
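One way to get from per-minute turnstile counts to one heatmap cell per hour is to average the counts within each (date, hour) bucket. The sketch below assumes the input is a list of (timestamp, count) pairs; that layout, the sample values, and the function name are assumptions, not the format of the MBTA data.

```python
from collections import defaultdict
from datetime import datetime


def hourly_average(per_minute_counts):
    """Average people per minute for each (date, hour) bucket.

    Only minutes present in the input are averaged; a missing minute is
    not treated as zero.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of counts, minutes seen]
    for timestamp, count in per_minute_counts:
        key = (timestamp.date(), timestamp.hour)
        totals[key][0] += count
        totals[key][1] += 1
    return {key: total / seen for key, (total, seen) in totals.items()}


# Hypothetical per-minute entry counts.
sample = [
    (datetime(2014, 2, 3, 8, 0), 10),
    (datetime(2014, 2, 3, 8, 1), 20),
    (datetime(2014, 2, 3, 9, 0), 6),
]
cells = hourly_average(sample)
```

Each resulting value corresponds to one colored cell, in the same people-per-minute unit as the legend.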

Entrances and Exits from All Stations during February 2014

Heatmap: entrances and exits for weeks 1 to 4, with columns for Sun through Sat and hours marked at 6am, 12pm, and 6pm. Color shows average entrances/exits, from 0 to 870 people per minute.

The table and map below break down February's turnstile entries and exits by station. Hover over a row in the table to highlight the corresponding circle on the map, or vice-versa. Click on a row in the table to show a detailed heatmap for the entrances to and exits from that station over the month. Click and drag on several table rows to highlight a range of stations.

You can see the busiest stations are all along the Red Line. Harvard topped the list, followed closely by South Station, and then Downtown Crossing. Next to each station are heatmaps showing entrances and exits to each station per-hour for weekdays and weekends/holidays. You can see that some stations are work stations, since their exits peak in the morning and entrances peak in the afternoon, and that some stations are home stations, since their entrances peak in the morning and exits peak in the afternoon. Some stations are just busy all the time.
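The work/home distinction above can be expressed as a simple rule on hourly counts. This is only an illustrative heuristic: the hour windows, the 1.5 ratio, and the function name are assumptions chosen for the example, and the article itself makes the distinction by eye from the heatmaps.

```python
MORNING_HOURS = range(6, 10)     # assumed morning window, 6am to 10am
AFTERNOON_HOURS = range(15, 19)  # assumed afternoon window, 3pm to 7pm


def classify_station(entries_by_hour, exits_by_hour, ratio=1.5):
    """Label a station "work", "home", or "mixed" from hourly counts.

    work: exits peak in the morning and entrances peak in the afternoon.
    home: entrances peak in the morning and exits peak in the afternoon.
    """
    am_in = sum(entries_by_hour.get(h, 0) for h in MORNING_HOURS)
    pm_in = sum(entries_by_hour.get(h, 0) for h in AFTERNOON_HOURS)
    am_out = sum(exits_by_hour.get(h, 0) for h in MORNING_HOURS)
    pm_out = sum(exits_by_hour.get(h, 0) for h in AFTERNOON_HOURS)

    if am_out > ratio * pm_out and pm_in > ratio * am_in:
        return "work"
    if am_in > ratio * pm_in and pm_out > ratio * am_out:
        return "home"
    return "mixed"


# Hypothetical hourly counts, keyed by hour of day.
work_like = classify_station({8: 100, 17: 900}, {8: 1000, 17: 150})
home_like = classify_station({8: 1000, 17: 150}, {8: 100, 17: 900})
busy_all_day = classify_station({8: 500, 17: 500}, {8: 500, 17: 500})
```

Stations that are "just busy all the time" fall into the "mixed" bucket because neither direction dominates in either window.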

Entrances and Exits per Station during February 2014

Map legend: size shows turnstile entries on an average day, from 500 through 10,000 to 19,400 people per day.

Each circle above and each row in the table represents a station; hover over one to highlight the other. Next to each station are heatmaps showing entrances and exits to each station per-hour for weekdays and weekends/holidays.

Notice work stations with exit peaks in the morning and entrance peaks in the afternoon, home stations with entrance peaks in the morning and exit peaks in the afternoon, and the stations that are just busy all the time.

Station                 Avg. Turnstile Entries per day
Harvard                 19,400
South Station           19,100
Downtown Crossing       16,900
Park Street             13,900
North Station           13,600
Central Square          13,600
Back Bay                13,600
Kendall Square          12,800
Forest Hills            11,200
Davis Square            10,900
State Street             9,800
Malden Center            9,100
Haymarket                8,900
Charles MGH              8,900
Ruggles                  8,800
Maverick                 8,400
Sullivan Square          8,300
Alewife                  8,200
Government Center        7,700
Porter Square            7,400
JFK/U Mass               7,000
Ashmont                  6,900
Quincy Center            6,500
Airport                  5,600
Wellington               5,500
Chinatown                5,400
Mass Ave                 5,300
Tufts Medical Center     5,100
North Quincy             5,000
Andrew Square            4,900
Wonderland               4,800
Jackson Square           4,700
Oak Grove                4,600
Broadway                 4,400
Community College        4,200
Fields Corner            4,100
Roxbury Crossing         3,700
Braintree                3,600
Aquarium                 3,600
Wollaston                3,400
Quincy Adams             3,400
Stony Brook              2,900
Orient Heights           2,800
Green Street             2,800
Revere Beach             2,400
Beachmont                2,300
Savin Hill               1,900
Shawmut                  1,700
Wood Island              1,500
Bowdoin                  1,300
Suffolk Downs              500

Each row also has Avg. Weekday and Avg. Weekend heatmaps with hours marked at 6am, 12pm, and 6pm. Color shows average entrances/exits, from 0 to 82 people per minute.

How People and Trains Affect Each Other

When you look back at the Marey diagram, the slope of each line tells you how fast a train is going and the time it takes to get between stations. When all of the start and stop times are lined up you can see a drastic variation in the time it takes to get between stops throughout the day. If you have ever ridden the subway during rush hour then you have experienced what the steep lines in the Marey diagram feel like first-hand.

What causes these delays? It’s hard to know for sure, but it appears that the number of people riding the subway is a factor.

One Week of Congestion and Delay

Interactive chart and map, shown at 5:30 pm on Mon Feb 3: 880 entries/min, 38% slow.
One row per day from Mon Feb 3 to Sun Feb 9, with time of day from 12AM to 12AM.
Line width shows turnstile entries at a station, from 0 to 100 people per minute.
Color shows delay, from 20% faster through on time to 40% slower than normal.
Gray bars show entries to all stations, from 0 to 1060 people per minute.

This visualization shows congestion and delay on the red, blue, and orange lines for the first full week in February. The gray bands show the total number of entries into all stations per minute over time for each day of the week. The colored bands below indicate whether the trains are running faster or slower than normal.
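A delay expressed as "faster or slower than normal" can be computed as a percentage difference from a baseline travel time. The text does not say how "normal" is defined, so the sketch below assumes it is the median of previously observed travel times for the same segment; the numbers and function name are illustrative only.

```python
from statistics import median


def percent_slower(observed_seconds, typical_seconds):
    """Percent by which a travel time exceeds the typical time.

    Positive values mean slower than normal, negative values mean faster.
    """
    return 100 * (observed_seconds - typical_seconds) / typical_seconds


# Hypothetical past travel times (seconds) for one segment.
history = [95, 100, 110]
typical = median(history)  # assumed definition of "normal"

slow = percent_slower(138, typical)
fast = percent_slower(80, typical)
```

With these made-up numbers the first reading is 38% slower than normal and the second is 20% faster, the two ends of a scale like the one in the color legend.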

The map shows the congestion and delay across the system at the time you hover over in the chart on the right. The thickness of each line at a stop indicates the number of entries per minute at that stop, and the color on the right-hand side of a track indicates delay in that direction using the same scale as the colored bands.

You can see basketball games letting out on Monday, Tuesday, Friday, and Sunday. You can also tell that it snowed on Wednesday and people stayed home, especially when you compare how light Wednesday evening's rush hour was with Thursday evening's rush hour.

Your Commute

How do all of these factors affect your commute? Click and drag on the map from a starting station to an ending station to see a detailed breakdown of how long that trip takes at different points during a typical workday. The points on top show all of the trip durations from the start to the destination for a given starting time, and the points on the bottom show all of the times between when trains leave the start station going to the destination station. The time between trains is the longest you would possibly need to wait if you arrived just as the previous train was leaving. The blue band excludes the shortest and longest 10% of all transit times, leaving behind the most-likely 80% range, and the orange band does the same for wait times between trains. The dark lines show the median: 50% of the time wait/transit times are higher and 50% of the time they are lower.
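The bands described above correspond to the 10th and 90th percentiles of the observed times, with the median in between. A minimal sketch using Python's `statistics.quantiles`; the sample durations are invented, and the interpolation method is an assumption, since the text does not say how the percentiles were computed.

```python
from statistics import quantiles


def middle_80_band(values):
    """Return (10th percentile, median, 90th percentile) of the values.

    quantiles(..., n=10) returns the nine cut points between deciles, so
    index 0 is the 10th percentile, index 4 the median, index 8 the 90th.
    """
    cuts = quantiles(values, n=10, method="inclusive")
    return cuts[0], cuts[4], cuts[8]


# Hypothetical transit times in minutes for one starting time.
transit_minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
low, mid, high = middle_80_band(transit_minutes)
```

Dropping the fastest and slowest 10% keeps a single unusual trip, such as one stuck behind a disabled train, from stretching the band.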

Kendall/MIT to South Station
Drag from a starting station to an ending station, or click on a starting station and then an ending station, to see how long the trip takes over time in the chart.
Charts from 6am to 12am: trip duration (min), showing the middle 80% of transit times and the median, and minutes between trains, showing the middle 80% of wait times and the median.

In general, delays go up during rush hour but trains come more frequently. For example, if you look at South Station to Kendall/MIT you will notice that the transit times go up as the wait times go down. If you drag across the chart, the paragraph below will tell you that these effects roughly balance each other out and the most-likely trip duration (half the normal time between trains plus total transit time) stays constant around 10-12 minutes. It is also interesting to note that transit times on the blue line, for example State St. to Wonderland, are much less variable than on the red line. Orange line trips like Downtown Crossing to Forest Hills are also less variable in transit time, but trains come much less frequently and reliably.
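The most-likely trip duration described above is simple arithmetic: half the time between trains plus the transit time. The sketch below applies it to two made-up situations to show how a longer ride with shorter waits can come out the same as a shorter ride with longer waits; the numbers are hypothetical, not measurements from the data.

```python
def expected_trip_minutes(minutes_between_trains, transit_minutes):
    """Half the time between trains (the average wait) plus the transit time."""
    return minutes_between_trains / 2 + transit_minutes


# Hypothetical values: frequent but slow trains at rush hour,
# less frequent but faster trains at midday.
rush_hour = expected_trip_minutes(minutes_between_trains=4, transit_minutes=9)
midday = expected_trip_minutes(minutes_between_trains=8, transit_minutes=7)
```

Both cases come to 11 minutes, inside the 10-12 minute range mentioned above.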

Summary

Through publicly available data, we have the tools to understand the subway system better than we ever have before. We have seen how the system operates on a daily basis, how people use the system, how that affects the trains and also how this ties back to your daily commute. To see a live version of this data, check out MBTA Trains for real-time subway delays and real-time commuter rail delays.

Credits

This project was created by Michael Barry and Brian Card for a graduate course in Data Visualization at WPI taught by Matthew Ward. Several open-source projects were used under the MIT License including D3, Bootstrap, Glyphicons, Underscore, Moment.js, es6-shim, and D3-tip. Data courtesy of the MBTA and their Developer Relations Program.

Much of the inspiration for this report comes from Bret Victor's Ladder of Abstraction and the works of Edward Tufte and Étienne-Jules Marey.

Source Code

The source code and raw data are available on GitHub and described in this blog post.

For any questions, please reach out to Michael Barry or Brian Card on Twitter.