My Projects
A collection of projects I've built to solve real-world problems and explore new technologies.
Featured Projects
Here are some of my most impactful projects that showcase my skills and passion for innovation.
Real-time Data Pipeline
Apache Kafka + Spark + PostgreSQL
Built a scalable real-time data processing pipeline that handles 1M+ events per day, featuring automatic scaling, monitoring, and data quality checks.
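To give a flavor of the architecture, here is a minimal sketch of the ingest path: Spark Structured Streaming reads events from Kafka and lands each micro-batch in PostgreSQL. The topic name, schema, and connection details are illustrative placeholders, not the production configuration.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("EventPipeline").getOrCreate()

# Hypothetical event schema for this sketch
schema = (StructType()
    .add("event_id", StringType())
    .add("event_type", StringType())
    .add("ts", TimestampType()))

# Read the raw event stream from Kafka and parse the JSON payload
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# Append each micro-batch to PostgreSQL over JDBC
def write_batch(batch_df, batch_id):
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/events")
        .option("dbtable", "raw_events")
        .option("driver", "org.postgresql.Driver")
        .mode("append")
        .save())

events.writeStream.foreachBatch(write_batch).start().awaitTermination()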
ML Model Deployment Platform
FastAPI + TensorFlow + AWS
Developed an end-to-end ML platform for model training, deployment, and monitoring with automated A/B testing and model versioning capabilities.
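As a minimal sketch of the serving layer, here is a FastAPI endpoint wrapping a TensorFlow model; the model path and request schema are hypothetical stand-ins for the platform's real deployment artifacts.

from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
import tensorflow as tf

app = FastAPI()
model = tf.keras.models.load_model("./model")  # hypothetical SavedModel path

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # Wrap the feature vector as a batch of one and run inference
    scores = model.predict(np.array([req.features]))
    return {"prediction": scores[0].tolist()}

In the full platform, A/B routing and model-version pinning sit in front of an endpoint like this.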
Interactive Data Pipeline
Click "Run Pipeline" to see how data flows through my ETL process
Extract data from PostgreSQL database
Live Code Editor
Try out real data engineering code examples. Click "Run Code" to see the results!
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

# Initialize Spark session
spark = SparkSession.builder \
    .appName("CustomerDataETL") \
    .getOrCreate()

# Read data from source
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/customers") \
    .option("dbtable", "customer_data") \
    .load()

# Data cleaning and transformation
cleaned_df = df \
    .filter(col("age").isNotNull()) \
    .filter(col("age") > 0) \
    .withColumn("age_group",
                when(col("age") < 25, "Young")
                .when(col("age") < 50, "Middle")
                .otherwise("Senior")) \
    .withColumn("is_premium", col("purchase_amount") > 1000)

# Write to data warehouse
cleaned_df.write \
    .format("parquet") \
    .mode("overwrite") \
    .save("s3://data-lake/customers/cleaned/")

print(f"Processed {cleaned_df.count()} records")
Apache Spark ETL Pipeline
Extract, transform, and load data using PySpark
Data Visualization
Interactive data visualizations and analytics dashboards I've created.
Skills & Technologies
Technologies and tools I use to build amazing projects and solve complex problems.
All Projects
Explore my full portfolio of data engineering, web development, and ML projects.
ML Recommendation Engine
Developed a collaborative filtering recommendation system using PySpark MLlib and deployed it with MLflow.
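For a flavor of the approach, here is a condensed sketch of ALS training with MLflow metric logging; the ratings path and column names are assumptions, not the real schema.

import mlflow
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("Recommender").getOrCreate()
ratings = spark.read.parquet("s3://data-lake/ratings/")  # hypothetical path

train, test = ratings.randomSplit([0.8, 0.2], seed=42)
als = ALS(userCol="user_id", itemCol="item_id", ratingCol="rating",
          coldStartStrategy="drop")

with mlflow.start_run():
    model = als.fit(train)
    # Score the held-out set and log RMSE to the tracking server
    evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                    predictionCol="prediction")
    rmse = evaluator.evaluate(model.transform(test))
    mlflow.log_metric("rmse", rmse)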
Cloud Data Warehouse
Designed and implemented a cloud-based data warehouse using Snowflake and dbt for modern analytics.
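A small sketch of how downstream Python code queries the dbt-built models through the Snowflake connector; the account, credentials, and fct_orders model name are hypothetical.

import snowflake.connector

# Credentials shown inline only for the sketch; use a secrets manager in practice
conn = snowflake.connector.connect(
    account="my_account",      # hypothetical
    user="analytics_user",     # hypothetical
    password="...",
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
)
cur = conn.cursor()
cur.execute("SELECT order_date, SUM(amount) FROM fct_orders GROUP BY 1")
for order_date, total in cur.fetchall():
    print(order_date, total)
cur.close()
conn.close()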
Data Pipeline Monitoring
Built a comprehensive monitoring system using Grafana, Prometheus, and custom alerting for data pipelines.
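Here is the custom-metrics side in miniature: a pipeline task instrumented with prometheus_client so Prometheus can scrape row counts and batch latency. The metric names and the fetch_rows() helper are illustrative; Grafana dashboards and alert rules live outside this snippet.

import time
from prometheus_client import Counter, Histogram, start_http_server

ROWS_PROCESSED = Counter("pipeline_rows_processed_total",
                         "Rows processed by the pipeline")
BATCH_SECONDS = Histogram("pipeline_batch_duration_seconds",
                          "Wall-clock time per batch")

def run_batch(rows):
    # Time the batch and count rows so both show up on /metrics
    with BATCH_SECONDS.time():
        for row in rows:
            ROWS_PROCESSED.inc()  # transform + load would happen here

if __name__ == "__main__":
    start_http_server(8000)       # expose /metrics for Prometheus to scrape
    while True:
        run_batch(fetch_rows())   # fetch_rows() is a hypothetical source
        time.sleep(60)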
A/B Testing Platform
Developed a statistical analysis platform for A/B testing with automated experiment evaluation.
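The heart of the automated evaluation is a standard two-proportion z-test; here is a sketch using statsmodels with illustrative conversion counts.

from statsmodels.stats.proportion import proportions_ztest

conversions = [412, 467]      # control, treatment (illustrative numbers)
visitors = [10000, 10000]

stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Variants differ at the 5% significance level")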
Real-time Stream Processing
Implemented real-time data processing using Apache Flink and Apache Pulsar for an event-driven architecture.
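The Flink topology itself runs on the JVM, but here is a sketch of the event side using the official Pulsar Python client, with a hypothetical topic name and a local broker.

import pulsar

client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe("orders", subscription_name="stream-feed")

while True:
    msg = consumer.receive()
    try:
        print(f"Received: {msg.data().decode()}")
        consumer.acknowledge(msg)            # ack so the broker drops the message
    except Exception:
        consumer.negative_acknowledge(msg)   # request redelivery on failure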
ML Feature Store
Built a centralized feature store using Feast for managing ML features across multiple models.
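Finally, a minimal sketch of the consumption side: fetching online features from Feast at inference time. The feature view and feature names are hypothetical.

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # local feature repo for the sketch

features = store.get_online_features(
    features=[
        "user_features:lifetime_value",     # hypothetical feature_view:feature
        "user_features:days_since_order",
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()
print(features)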