My Projects

A collection of projects I've built to solve real-world problems and explore new technologies.

8+ years of experience
10B+ records processed
15+ technologies
2 Fortune 500 companies

Featured Projects

Here are some of my most impactful projects that showcase my skills and passion for innovation.

🚌

Dublin Bus Real-Time Pipeline

Python + GTFS-RT + SQLite + Streamlit

Complete data pipeline tracking 700+ buses in real time across Dublin, with an interactive dashboard showing delays, routes, and performance analytics.

Python, GTFS-RT API, SQLite, Pandas, Streamlit, Plotly
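
A minimal sketch of what the storage step of a pipeline like this could look like. The table name `vehicle_updates`, its columns, and the sample rows are assumptions for illustration, not the project's actual schema, and decoding the GTFS-RT feed itself is not shown.

```python
import sqlite3

# Hypothetical schema: one row per vehicle per feed snapshot.
SCHEMA = """
CREATE TABLE IF NOT EXISTS vehicle_updates (
    vehicle_id    TEXT    NOT NULL,
    route_id      TEXT    NOT NULL,
    observed_at   INTEGER NOT NULL,  -- Unix timestamp of the feed snapshot
    delay_seconds INTEGER NOT NULL,  -- positive = late, negative = early
    PRIMARY KEY (vehicle_id, observed_at)
)
"""


def store_updates(conn, updates):
    """Insert already-decoded feed records, ignoring snapshots stored before."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO vehicle_updates VALUES (?, ?, ?, ?)",
            updates,
        )


def average_delay_by_route(conn):
    """Return {route_id: mean delay in seconds} over all stored rows."""
    rows = conn.execute(
        "SELECT route_id, AVG(delay_seconds) FROM vehicle_updates GROUP BY route_id"
    )
    return dict(rows.fetchall())


# Made-up sample records: (vehicle_id, route_id, observed_at, delay_seconds)
sample = [
    ("bus-1", "route-a", 1700000000, 120),
    ("bus-2", "route-a", 1700000000, 60),
    ("bus-3", "route-b", 1700000000, -30),
    ("bus-1", "route-a", 1700000000, 120),  # duplicate snapshot, ignored
]

conn = sqlite3.connect(":memory:")
store_updates(conn, sample)
print(average_delay_by_route(conn))
```

The composite primary key plus `INSERT OR IGNORE` makes the write idempotent, so polling the same feed snapshot twice does not create duplicate rows.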
🧠

Transit Delay Prediction

XGBoost + Python + Scikit-learn

ML model predicting bus delays 15 minutes in advance with 87% accuracy, using feature engineering on real-time transit data.

Python, XGBoost, Scikit-learn, Pandas, Feature Engineering
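
A minimal sketch of the kind of feature engineering a 15-minute-ahead delay model might use. The feature names, the 5-minute lag, and the sample data are assumptions for illustration. The project's real features and its XGBoost training code are not shown here.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(minutes=15)  # prediction horizon from the project description
LAG = timedelta(minutes=5)       # assumed spacing between observations


def build_examples(observations):
    """Turn a list of (timestamp, delay_seconds) pairs for one route into
    (features, target) rows, where target is the delay observed HORIZON later."""
    by_time = dict(observations)
    rows = []
    for ts, delay in observations:
        future = by_time.get(ts + HORIZON)
        previous = by_time.get(ts - LAG)
        if future is None or previous is None:
            continue  # not enough history or future data for this timestamp
        features = {
            "hour": ts.hour,
            "is_weekday": int(ts.weekday() < 5),
            "current_delay": delay,
            "delay_change_5m": delay - previous,
        }
        rows.append((features, future))
    return rows


# Made-up observations every 5 minutes from 08:00 to 08:30 on a Monday.
start = datetime(2024, 1, 8, 8, 0)
observations = [(start + i * LAG, 60 * i) for i in range(7)]
rows = build_examples(observations)
print(len(rows), rows[0])
```

Rows built this way could be passed to a gradient-boosted model such as XGBoost. Keeping the target strictly in the future of every feature avoids leaking the answer into the training data.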
Pipeline demo: PostgreSQL (raw user data) → Cleaning (dedup & validation) → Spark job (aggregation) → Feature engineering (vectorization) → Snowflake (analytics ready)

Live Code Editor

Try out real data engineering code examples. Click "Run Code" to see the results!

Apache Spark ETL Pipeline - PYTHON
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

# Initialize Spark session
spark = SparkSession.builder \
    .appName("CustomerDataETL") \
    .getOrCreate()

# Read data from source
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/customers") \
    .option("dbtable", "customer_data") \
    .load()

# Data cleaning and transformation
cleaned_df = df \
    .filter(col("age").isNotNull()) \
    .filter(col("age") > 0) \
    .withColumn("age_group", 
        when(col("age") < 25, "Young")
        .when(col("age") < 50, "Middle")
        .otherwise("Senior")) \
    .withColumn("is_premium", col("purchase_amount") > 1000)

# Write the cleaned data to the data lake as Parquet
cleaned_df.write \
    .format("parquet") \
    .mode("overwrite") \
    .save("s3://data-lake/customers/cleaned/")

print(f"Processed {cleaned_df.count()} records")

Apache Spark ETL Pipeline

Extract, transform, and load data using PySpark
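
The cleaning rules in the PySpark job above can be checked without a cluster. A minimal sketch, assuming a hypothetical helper `transform_row` that mirrors the `age` filter, the `age_group` buckets, and the `is_premium` flag on plain dictionaries:

```python
def transform_row(row):
    """Apply the same rules as the PySpark job to a single record.

    Returns None when the row would be filtered out.
    """
    age = row.get("age")
    if age is None or age <= 0:
        return None  # matches the isNotNull() and age > 0 filters

    if age < 25:
        group = "Young"
    elif age < 50:
        group = "Middle"
    else:
        group = "Senior"

    amount = row.get("purchase_amount")
    # Spark yields NULL when comparing NULL to a number; None plays that role here.
    is_premium = None if amount is None else amount > 1000

    return {**row, "age_group": group, "is_premium": is_premium}


# Made-up sample records
records = [
    {"age": 24, "purchase_amount": 1500},
    {"age": 25, "purchase_amount": 1000},
    {"age": 50, "purchase_amount": None},
    {"age": 0, "purchase_amount": 10},
    {"age": None, "purchase_amount": 10},
]
cleaned = [out for out in map(transform_row, records) if out is not None]
print(cleaned)
```

Testing the boundary values (24/25, 49/50, exactly 1000) in plain Python is a cheap way to pin down the bucket edges before running the Spark job.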

🐍
Python & PySpark
Real data processing code
⚑
Live Execution
Run code and see results
πŸ“Š
Real Examples
Production-ready patterns

Data Visualization

Interactive data visualizations and analytics dashboards I've created.

Skills & Technologies

Technologies and tools I use to build amazing projects and solve complex problems.

🐍
Python
βš›οΈ
React
β–²
Next.js
πŸ“˜
TypeScript
🐘
PostgreSQL
🐳
Docker
☁️
AWS
⚑
Kafka
πŸ”₯
Spark
☸️
Kubernetes
πŸ“š
Git
🐧
Linux

Want the Full Story?

Dive deep into my projects with detailed case studies including architecture decisions, challenges overcome, and measurable business impact.

My Projects

Explore my portfolio of data engineering, web development, and ML projects

Dublin Bus Real-Time Pipeline
Featured, completed
intermediate

Complete data pipeline tracking 700+ buses in real time across Dublin, with an interactive Streamlit dashboard showing delays and performance.

Python, GTFS-RT, SQLite, +3 more
Transit Delay Prediction ML
Featured, completed
advanced

Machine learning model predicting bus delays 15 minutes in advance with 87% accuracy using XGBoost and feature engineering.

Python, XGBoost, Scikit-learn, +2 more
Route Performance Analysis
Featured, completed
intermediate

Comprehensive analysis of 198 Dublin bus routes identifying bottlenecks, best/worst performers, and optimization recommendations.

Python, Pandas, Data Analysis, +2 more
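
A minimal sketch of how best and worst routes could be ranked by mean delay. The function name `rank_routes`, the record layout, and the sample values are assumptions for illustration. The project itself lists Pandas, which would express the same idea as a group-by.

```python
from collections import defaultdict


def rank_routes(records, top_n=3):
    """records: iterable of (route_id, delay_seconds) pairs.

    Returns (best, worst): lists of (route_id, mean_delay), best sorted from
    lowest mean delay upward and worst from highest mean delay downward.
    """
    delays = defaultdict(list)
    for route_id, delay in records:
        delays[route_id].append(delay)

    ranked = sorted(
        ((route, sum(values) / len(values)) for route, values in delays.items()),
        key=lambda item: item[1],
    )
    return ranked[:top_n], ranked[-top_n:][::-1]


# Made-up sample data
records = [
    ("route-a", 30), ("route-a", 90),
    ("route-b", 300), ("route-b", 500),
    ("route-c", -10), ("route-c", 10),
]
best, worst = rank_routes(records, top_n=1)
print(best, worst)
```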
Full-Stack Portfolio Platform
Featured, completed
intermediate

Modern portfolio website with a custom CMS, blog system, dark mode, and a Lighthouse score of 100, built with Next.js and Supabase.

Next.js, TypeScript, Tailwind CSS, +2 more
Peak Hours Time Analysis
completed
intermediate

Time-series analysis revealing when Dublin buses are most delayed, helping commuters optimize their travel times.

Python, Pandas, Time Series, +2 more
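
A minimal sketch of an hour-of-day delay profile and how a commuter might use it to pick a departure hour. The function names, the time window, and the sample observations are assumptions for illustration, not the project's actual analysis.

```python
from collections import defaultdict
from datetime import datetime


def mean_delay_by_hour(observations):
    """observations: iterable of (datetime, delay_seconds) pairs.

    Returns {hour_of_day: mean delay in seconds}, ordered by hour.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for ts, delay in observations:
        sums[ts.hour] += delay
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


def best_departure_hour(observations, earliest, latest):
    """Hour in [earliest, latest] with the lowest mean delay, or None if no data."""
    by_hour = mean_delay_by_hour(observations)
    candidates = {h: d for h, d in by_hour.items() if earliest <= h <= latest}
    return min(candidates, key=candidates.get) if candidates else None


# Made-up sample observations
sample = [
    (datetime(2024, 1, 8, 7, 30), 90),
    (datetime(2024, 1, 8, 8, 10), 300),
    (datetime(2024, 1, 8, 8, 40), 500),
    (datetime(2024, 1, 8, 9, 15), 180),
    (datetime(2024, 1, 8, 17, 30), 420),
]
profile = mean_delay_by_hour(sample)
morning_choice = best_departure_hour(sample, 7, 9)
print(profile, morning_choice)
```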