Machine Learning

Transit Delay Prediction

Using machine learning to predict bus delays before they happen, enabling proactive passenger notifications and route optimization.

  • Prediction accuracy: 87%
  • Advance warning: 15 min
  • Training samples: 100K+
  • Inference time: < 50 ms

The Problem

Passengers waiting at bus stops have no way to know if their bus will be delayed until it's already late. This leads to:

  • Frustrated passengers standing in the rain
  • Missed connections and appointments
  • Lack of trust in public transit

Question: Can we predict delays before they happen using historical patterns and real-time data?

ML Approach

Feature Engineering

  • Time of day
  • Day of week
  • Route ID
  • Direction
  • Current position
  • Recent delays
  • Stop sequence
  • Distance to stop
  • Historical average
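As a rough illustration of how the nine features listed here could be assembled from a single vehicle report, here is a minimal sketch using only the standard library. The `VehicleObservation` schema, the field names, and the choice of a mean over recent delays are assumptions made for the example, not the project's actual code.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class VehicleObservation:
    """One real-time vehicle report (hypothetical schema for illustration)."""
    timestamp: datetime
    route_id: str
    direction: int             # 0 or 1
    lat: float
    lon: float
    stop_sequence: int         # index of the next stop along the trip
    stop_lat: float
    stop_lon: float
    recent_delays_min: list    # delays at the last few stops, in minutes
    historical_avg_min: float  # historical mean delay for this route


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def build_features(obs):
    """Map one observation to the nine features named in the list above."""
    recent = obs.recent_delays_min
    return {
        "time_of_day": obs.timestamp.hour + obs.timestamp.minute / 60,
        "day_of_week": obs.timestamp.weekday(),  # Monday = 0
        "route_id": obs.route_id,
        "direction": obs.direction,
        # Kept as one entry here; it would be split into two columns for training.
        "current_position": (obs.lat, obs.lon),
        # Assumption: summarise recent delays by their mean (0.0 if none yet).
        "recent_delay_mean": sum(recent) / len(recent) if recent else 0.0,
        "stop_sequence": obs.stop_sequence,
        "distance_to_stop_km": haversine_km(obs.lat, obs.lon, obs.stop_lat, obs.stop_lon),
        "historical_avg": obs.historical_avg_min,
    }
```

Categorical entries such as `route_id` would still need encoding before being passed to a gradient boosting library.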

Model Architecture

Input Features (9)
      │
      ▼
┌─────────────────────┐
│ Gradient Boosting   │  XGBoost / LightGBM
│ Regressor           │  
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Predicted Delay     │  (minutes)
│ + Confidence Score  │
└─────────────────────┘
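The pipeline in the diagram can be sketched roughly as follows. This is not the project's training code: it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost / LightGBM, trains on synthetic data with an invented delay relationship, and uses only three numeric features. The page does not say how the confidence score is computed, so the quantile models below are just one possible way to attach an uncertainty estimate.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in data (NOT the project's dataset).
recent_delay = rng.gamma(shape=2.0, scale=1.5, size=n)  # minutes
hour = rng.uniform(5, 23, size=n)
distance_km = rng.uniform(0.1, 5.0, size=n)
X = np.column_stack([recent_delay, hour, distance_km])

# Invented relationship: delays persist and grow around the evening rush.
rush = np.exp(-((hour - 17.5) ** 2) / 4.0)
y = 0.8 * recent_delay + 2.0 * rush + 0.2 * distance_km + rng.normal(0, 0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Point estimate of the delay in minutes.
point_model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0
)
point_model.fit(X_train, y_train)

# Assumption: a 10%-90% quantile band as a rough uncertainty estimate.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.1, random_state=0).fit(X_train, y_train)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.9, random_state=0).fit(X_train, y_train)

pred = point_model.predict(X_test)
interval_width = upper.predict(X_test) - lower.predict(X_test)

# Compare against always predicting the training-set mean delay.
model_mae = float(np.mean(np.abs(pred - y_test)))
baseline_mae = float(np.mean(np.abs(y_train.mean() - y_test)))
print(f"model MAE: {model_mae:.2f} min, baseline MAE: {baseline_mae:.2f} min")
```

A narrower band from the two quantile models could be read as higher confidence in a given prediction, though the two are fitted independently and are not guaranteed to stay ordered for every input.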

Why Gradient Boosting?

  • Handles mixed feature types (categorical + numerical)
  • Captures non-linear relationships
  • Fast inference for real-time predictions
  • Interpretable feature importance

Results

Model Performance

MAE (Mean Absolute Error): 1.8 min
RMSE: 2.4 min
R² Score: 0.74
Within ±3 min accuracy: 87%
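For reference, the four metrics reported here can be computed from paired actual and predicted delays as in the sketch below. The function name, the sample values, and the 3-minute default tolerance are illustrative; the numbers are made up and are not the project's evaluation data.

```python
from math import sqrt


def evaluate(actual, predicted, tolerance_min=3.0):
    """Return MAE, RMSE, R² and the share of predictions within the tolerance."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "within_tolerance": sum(abs(e) <= tolerance_min for e in errors) / n,
    }


# Made-up delays in minutes, purely to exercise the function.
actual = [0.0, 2.0, 5.0, 10.0]
predicted = [1.0, 2.0, 4.0, 14.0]
metrics = evaluate(actual, predicted)
print(metrics)
```

RMSE penalises large misses more heavily than MAE, which is why it is the larger of the two whenever the errors are not all the same size.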

Feature Importance

Recent delays (last 3 stops): 34%
Time of day: 22%
Route historical avg: 18%
Day of week: 12%
Distance to stop: 8%
Other: 6%
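A breakdown like this one can be produced by normalising a model's raw importance scores to shares and folding the small ones into an "Other" bucket. The sketch below assumes the raw scores come from something like a fitted model's `feature_importances_` attribute; the `min_share` threshold and the example scores are hypothetical.

```python
def importance_shares(raw_scores, min_share=0.08):
    """Normalise raw importances to shares and group small ones as 'Other'."""
    total = sum(raw_scores.values())
    shares = {name: score / total for name, score in raw_scores.items()}
    kept = {name: s for name, s in shares.items() if s >= min_share}
    other = sum(s for s in shares.values() if s < min_share)
    # Largest share first, with the grouped remainder at the end.
    ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    if other > 0:
        ranked.append(("Other", other))
    return ranked


# Hypothetical raw scores, not the project's values.
raw = {"recent_delays": 6.0, "time_of_day": 3.0, "direction": 0.5, "stop_sequence": 0.5}
for name, share in importance_shares(raw):
    print(f"{name}: {share:.0%}")
```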

87% of the model's predictions fall within ±3 minutes of the actual delay, which is reliable enough for passenger notifications while leaving room for improvement with more data.

Technologies Used

Python, Scikit-learn, XGBoost, Pandas, NumPy, GTFS-RT API, SQLite, Plotly, Jupyter

Explore the Code

Full implementation including feature engineering, model training, and evaluation notebooks.
