Bus Arrival Predictions

Optimizing TCAT's bus arrival prediction algorithm with a classification model to increase accuracy and efficiency

Tompkins Consolidated Area Transit

TCAT is a 501(c)(3) nonprofit devoted to contributing to the overall social, environmental, and economic health of its service area by delivering safe, reliable, and affordable transportation while being a responsive, responsible employer. TCAT operates 22½ hours a day, seven days a week, 360 days a year, and was the primary source of transportation for more than 4 million commuters in 2019. It contributes greatly to the community it serves by reducing traffic congestion, greenhouse gas emissions, and the cost of building parking facilities.

It aims to become a model community transportation system committed to quality service, employee-management collaboration, and innovation.

Our Product

Problem Statement

An analysis of TCAT's bus arrival time prediction model revealed not only that predictions were wrong up to 25% of the time on a normal working day, but also that the model was highly complex and server-intensive. TCAT wants to find out how often, and why, the model performs poorly.

Main Features

Data Transformation

We developed a Python script that transforms a JSON object representing a prediction made by TCAT's model into a row of a pandas DataFrame.

  • We ran this script against the API that generates these JSON objects for a week and collected roughly 28,000,000 predictions.
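The transformation step described above could look roughly like the sketch below. The real TCAT prediction schema is not shown here, so the sample payload and every field name in it (`routeId`, `stop`, `predictedArrival`, and so on) are hypothetical, as are the helper names `flatten`, `prediction_to_row`, and `rows_to_frame`. Nested objects are flattened into dotted column names so that each prediction becomes one flat row.

```python
import json
from typing import Any, Dict, List

# Hypothetical payload: field names and values are illustrative only,
# not the actual TCAT prediction format.
SAMPLE = """
{
  "routeId": 30,
  "vehicleId": "1602",
  "stop": {"id": 1515, "name": "Example Stop"},
  "predictedArrival": "2020-02-03T08:15:00",
  "generatedAt": "2020-02-03T08:05:30"
}
"""


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with dots."""
    row: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def prediction_to_row(raw: str) -> Dict[str, Any]:
    """Parse one JSON prediction and return it as a flat row."""
    return flatten(json.loads(raw))


def rows_to_frame(rows: List[Dict[str, Any]]):
    """Build a DataFrame with one row per prediction."""
    # Imported lazily so the flattening helpers run without pandas installed.
    import pandas as pd

    return pd.DataFrame(rows)


row = prediction_to_row(SAMPLE)
```

Collecting the rows in a list and building the DataFrame once at the end is usually cheaper than appending to a DataFrame one row at a time, which matters at tens of millions of predictions.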

Bus Predictions Model

We developed our own machine learning model to help figure out which routes, times, and buses are most prone to error. To build it, we:

  • Classified each prediction into one of five classes (very early, early, on time, late, very late), using cutoff times derived from the mean, median, and standard deviation of historical prediction results.

  • Added features such as temperature, precipitation, snow level, and the elevation change between two given stops.

  • Sampled our 28 million predictions and trained a Random Forest classifier with the sklearn library in Python, reaching an accuracy of roughly 92%.
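As a rough illustration of the labeling and training steps above, the sketch below assigns one of the five classes from a prediction's error relative to the historical mean and standard deviation. The exact cutoffs the team used are not stated, so the band widths (0.5 and 1.5 standard deviations), the sign convention (error = actual minus predicted arrival, in seconds), and the helper names `make_labeler` and `fit_forest` are all assumptions. The median could replace the mean as the center in the same way.

```python
from statistics import mean, stdev
from typing import Callable, Sequence


def make_labeler(
    historical_errors: Sequence[float],
    on_time_band: float = 0.5,   # assumed width, in standard deviations
    extreme_band: float = 1.5,   # assumed width, in standard deviations
) -> Callable[[float], str]:
    """Return a function that maps a prediction error to one of five classes.

    Errors are actual minus predicted arrival time in seconds, so a positive
    error means the bus arrived later than predicted.
    """
    mu = mean(historical_errors)
    sigma = stdev(historical_errors)

    def label(error: float) -> str:
        z = (error - mu) / sigma
        if z < -extreme_band:
            return "very early"
        if z < -on_time_band:
            return "early"
        if z <= on_time_band:
            return "on time"
        if z <= extreme_band:
            return "late"
        return "very late"

    return label


def fit_forest(features, labels):
    """Train a Random Forest on feature rows and their class labels."""
    # Imported lazily so the labeling step runs without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return model.fit(features, labels)


# Made-up historical errors, in seconds, used only to exercise the labeler.
history = [-120, -60, 0, 60, 120]
label = make_labeler(history)
```

The labels produced this way would serve as the target column, with the weather and elevation features as inputs to `fit_forest`. Fixing `random_state` only makes the sketch reproducible; it is not a setting taken from the project.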

Project Leads

Saksham Mohan

Tech Lead

Bryant Lee

Product Manager

Contact Us

Follow us on our Instagram and Facebook!


If you would like to learn more about our organization or services, please contact Saksham Mohan (sm985@cornell.edu) or Alexa Batino (afb75@cornell.edu).

Icons made by Freepik from www.flaticon.com / undraw.com

Copyright © 2020 Hack4Impact Cornell