Machine Learning
Transit Delay Prediction
Using machine learning to predict bus delays before they happen, enabling proactive passenger notifications and route optimization.
- Prediction Accuracy: 87%
- Advance Warning: 15 min
- Training Samples: 100K+
- Inference Time: < 50 ms
The Problem
Passengers waiting at bus stops have no way to know if their bus will be delayed until it's already late. This leads to:
- Frustrated passengers standing in the rain
- Missed connections and appointments
- Lack of trust in public transit
Question: Can we predict delays before they happen using historical patterns and real-time data?
ML Approach
Feature Engineering
- Time of day
- Day of week
- Route ID
- Direction
- Current position
- Recent delays
- Stop sequence
- Distance to stop
- Historical avg
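As a rough illustration, the nine features above could be assembled from a single real-time vehicle snapshot as follows. This is a minimal sketch, not the project's code: the `VehicleObservation` schema, the `build_features` and `haversine_km` helpers, and the choice to average the last three stop delays are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from typing import Sequence, Tuple


@dataclass
class VehicleObservation:
    """One real-time snapshot of a bus approaching a stop (hypothetical schema)."""
    timestamp: datetime
    route_id: str
    direction: int                       # e.g. 0 = outbound, 1 = inbound
    position: Tuple[float, float]        # (lat, lon) of the vehicle
    stop_position: Tuple[float, float]   # (lat, lon) of the target stop
    stop_sequence: int
    recent_delays_min: Sequence[float]   # delays at previous stops, oldest first
    historical_avg_min: float            # average delay for this route and hour


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def build_features(obs: VehicleObservation) -> dict:
    """Map one observation to the nine model inputs."""
    last_three = list(obs.recent_delays_min)[-3:]
    return {
        "time_of_day": obs.timestamp.hour + obs.timestamp.minute / 60,
        "day_of_week": obs.timestamp.weekday(),          # Monday = 0
        "route_id": obs.route_id,                        # categorical
        "direction": obs.direction,                      # categorical
        "current_position": obs.position,                # split into lat/lon columns before training
        "recent_delay": sum(last_three) / len(last_three) if last_three else 0.0,
        "stop_sequence": obs.stop_sequence,
        "distance_to_stop_km": haversine_km(obs.position, obs.stop_position),
        "historical_avg": obs.historical_avg_min,
    }


obs = VehicleObservation(
    timestamp=datetime(2024, 1, 1, 8, 30),
    route_id="42",
    direction=1,
    position=(47.6000, -122.3300),
    stop_position=(47.6100, -122.3300),
    stop_sequence=7,
    recent_delays_min=[0.5, 1.0, 2.0, 3.0],
    historical_avg_min=1.4,
)
features = build_features(obs)
```

Categorical values such as `route_id` would still need encoding before they reach a regressor; how that is done depends on the library.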
Model Architecture
  Input Features (9)
           │
           ▼
┌─────────────────────┐
│  Gradient Boosting  │  XGBoost / LightGBM
│      Regressor      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Predicted Delay   │  (minutes)
│ + Confidence Score  │
└─────────────────────┘
Why Gradient Boosting?
- Handles mixed feature types (categorical + numerical)
- Captures non-linear relationships
- Fast inference for real-time predictions
- Interpretable feature importance
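To make these points concrete, here is a small sketch of fitting a gradient boosting regressor and reading its feature importances. It uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost or LightGBM, and it trains on synthetic data with only three of the nine features. The data-generating formula and the hyperparameters are assumptions chosen for illustration, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for three features (not real transit data).
hour = rng.integers(5, 24, size=n)
recent_delay = rng.gamma(shape=2.0, scale=1.5, size=n)
distance_km = rng.uniform(0.1, 5.0, size=n)
X = np.column_stack([hour, recent_delay, distance_km])
feature_names = ["time_of_day", "recent_delay", "distance_to_stop_km"]

# Assumed relationship: delays carry over from earlier stops and grow at rush hour.
rush = ((hour >= 7) & (hour <= 9)) | ((hour >= 16) & (hour <= 18))
y = 0.8 * recent_delay + 1.5 * rush + 0.2 * distance_km + rng.normal(0, 0.5, size=n)

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.05, random_state=0
)
model.fit(X_train, y_train)

predicted_delay = model.predict(X_test)            # minutes
model_mae = float(np.mean(np.abs(predicted_delay - y_test)))
baseline_mae = float(np.mean(np.abs(y_train.mean() - y_test)))

# Impurity-based importances, normalised by scikit-learn to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
```

The rush-hour effect is a step function of `hour`, which tree ensembles can pick up without a hand-built indicator feature; that is the kind of non-linear relationship the list above refers to.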
Results
Model Performance
- MAE (Mean Absolute Error): 1.8 min
- RMSE: 2.4 min
- R² Score: 0.74
- Within ±3 min accuracy: 87%
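For reference, the four metrics above can be computed from paired actual and predicted delays as sketched below. The `evaluate` function and the toy numbers are illustrative assumptions; they are not the project's evaluation code or data.

```python
from math import sqrt
from typing import Dict, Sequence


def evaluate(actual: Sequence[float], predicted: Sequence[float],
             tolerance_min: float = 3.0) -> Dict[str, float]:
    """Return MAE, RMSE, R² and the share of predictions within the tolerance."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
        "within_tolerance": sum(abs(e) <= tolerance_min for e in errors) / n,
    }


# Toy example: delays in minutes.
metrics = evaluate(actual=[0, 2, 4, 6], predicted=[1, 2, 3, 10])
```

RMSE penalises large misses more heavily than MAE, so a gap between the two (here 2.4 vs 1.8 min) suggests a tail of larger errors.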
Feature Importance
- Recent delays (last 3 stops): 34%
- Time of day: 22%
- Route historical avg: 18%
- Day of week: 12%
- Distance to stop: 8%
- Other: 6%
87% of the model's predictions fall within ±3 minutes of the actual delay, which makes it reliable enough for passenger notifications while leaving room for improvement with more data.
Technologies Used
Python, Scikit-learn, XGBoost, Pandas, NumPy, GTFS-RT API, SQLite, Plotly, Jupyter
Explore the Code
Full implementation including feature engineering, model training, and evaluation notebooks.