My Projects

A collection of projects I've built to solve real-world problems and explore new technologies.

15+ Projects Built

5+ Technologies

3+ Years Experience

100% Passion

Featured Projects

Here are some of my most impactful projects that showcase my skills and passion for innovation.

⚡

Real-time Data Pipeline

Apache Kafka + Spark + PostgreSQL

Built a scalable real-time data processing pipeline that handles 1M+ events per day, featuring automatic scaling, monitoring, and data quality checks.

Apache Kafka, Apache Spark, PostgreSQL, Docker, Kubernetes, Python

2024
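The data quality checks mentioned for the Real-time Data Pipeline could look something like the sketch below. It is a plain-Python illustration, not the pipeline's actual code: the field names in `REQUIRED_FIELDS`, the `QualityReport` class, and the `check_events` function are all hypothetical, since the real event schema is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Counts of accepted and rejected events, with rejection reasons."""
    accepted: int = 0
    rejected: int = 0
    reasons: dict = field(default_factory=dict)


# Hypothetical required fields; the real event schema is an assumption.
REQUIRED_FIELDS = ("event_id", "timestamp", "payload")


def check_events(events):
    """Return the valid events plus a report describing what was rejected."""
    report = QualityReport()
    valid, seen_ids = [], set()
    for event in events:
        missing = [name for name in REQUIRED_FIELDS if event.get(name) is None]
        if missing:
            reason = "missing:" + ",".join(missing)
        elif event["event_id"] in seen_ids:
            reason = "duplicate"
        else:
            # Event passed both checks: keep it and move on.
            seen_ids.add(event["event_id"])
            valid.append(event)
            report.accepted += 1
            continue
        report.rejected += 1
        report.reasons[reason] = report.reasons.get(reason, 0) + 1
    return valid, report
```

In a streaming setup, a check like this would typically run per micro-batch, with the report counters exported to the monitoring system.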

🤖

ML Model Deployment Platform

FastAPI + TensorFlow + AWS

Developed an end-to-end ML platform for model training, deployment, and monitoring with automated A/B testing and model versioning capabilities.

Python, FastAPI, TensorFlow, AWS SageMaker, Docker, React

2024
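One common way to automate the traffic split behind A/B testing of model versions is deterministic hashing, sketched below. This is an illustration under assumptions, not the platform's actual routing code: the `assign_variant` function, its parameters, and the "A"/"B" labels are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically route a user to model 'A' (control) or 'B' (treatment)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "B" if bucket < treatment_share else "A"
```

Because the assignment depends only on the experiment name and user ID, the same user always sees the same model version, and no assignment table has to be stored.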

Interactive Data Pipeline

Click "Run Pipeline" to see how data flows through my ETL process

🗄️

PostgreSQL

Customer data extraction

🧹

Data Cleaning

Remove duplicates & validate

⚡

Apache Spark

ETL processing

🔧

Feature Engineering

Create ML features

📊

Data Warehouse

Analytics ready data

Extract data from PostgreSQL database
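The five stages above can be sketched as a chain of plain-Python functions. This is only an illustration of the flow: the sample rows, the function names, and the in-memory `warehouse` list are assumptions standing in for PostgreSQL, Spark, and the real data warehouse.

```python
def extract():
    """Stand-in for the PostgreSQL extraction stage; rows are made up."""
    return [
        {"customer_id": 1, "age": 34, "purchase_amount": 1200.0},
        {"customer_id": 1, "age": 34, "purchase_amount": 1200.0},  # duplicate
        {"customer_id": 2, "age": None, "purchase_amount": 80.0},  # fails validation
        {"customer_id": 3, "age": 61, "purchase_amount": 450.0},
    ]


def clean(rows):
    """Data cleaning: drop duplicates and rows that fail validation."""
    seen, out = set(), []
    for row in rows:
        if row["age"] is None or row["customer_id"] in seen:
            continue
        seen.add(row["customer_id"])
        out.append(row)
    return out


def transform(rows):
    """Stand-in for the Spark ETL step: normalise the amount to two decimals."""
    return [{**row, "purchase_amount": round(float(row["purchase_amount"]), 2)} for row in rows]


def engineer_features(rows):
    """Feature engineering: derive a simple ML feature per customer."""
    return [{**row, "is_premium": row["purchase_amount"] > 1000} for row in rows]


def load(rows, warehouse):
    """Stand-in for the warehouse load; returns the number of rows written."""
    warehouse.extend(rows)
    return len(rows)


def run_pipeline():
    warehouse = []
    data = extract()
    for stage in (clean, transform, engineer_features):
        data = stage(data)
    loaded = load(data, warehouse)
    return warehouse, loaded
```

Keeping each stage a function of rows in and rows out makes the stages easy to test in isolation before they are ported to Spark jobs.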

5 Pipeline Stages

99.9% Uptime

10TB+ Data Processed

Live Code Editor

Try out real data engineering code examples. Click "Run Code" to see the results!

Apache Spark ETL Pipeline - PYTHON

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

# Initialize Spark session
spark = SparkSession.builder \
    .appName("CustomerDataETL") \
    .getOrCreate()

# Read data from source
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/customers") \
    .option("dbtable", "customer_data") \
    .load()

# Data cleaning and transformation
cleaned_df = df \
    .filter(col("age").isNotNull()) \
    .filter(col("age") > 0) \
    .withColumn("age_group", 
        when(col("age") < 25, "Young")
        .when(col("age") < 50, "Middle")
        .otherwise("Senior")) \
    .withColumn("is_premium", col("purchase_amount") > 1000)

# Write to data warehouse
cleaned_df.write \
    .format("parquet") \
    .mode("overwrite") \
    .save("s3://data-lake/customers/cleaned/")

print(f"Processed {cleaned_df.count()} records")
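The age bucketing in the transformation step can be mirrored in plain Python, which is handy for unit-testing the boundaries without a Spark session. The `age_group` helper below is a hypothetical stand-in written for illustration; it follows the same filter and `when` chain as the PySpark code above.

```python
def age_group(age):
    """Mirror the PySpark rules: drop null or non-positive ages, then bucket."""
    if age is None or age <= 0:
        return None  # such rows are filtered out before bucketing
    if age < 25:
        return "Young"
    if age < 50:
        return "Middle"
    return "Senior"
```

Note that the boundaries are exclusive on the upper side, so an age of 25 falls into "Middle" and an age of 50 into "Senior".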

Apache Spark ETL Pipeline

Extract, transform, and load data using PySpark

🐍

Python & PySpark

Real data processing code

⚡

Live Execution

Run code and see results

📊

Real Examples

Production-ready patterns

Data Visualization

Interactive data visualizations and analytics dashboards I've created.

Skills & Technologies

Technologies and tools I use to build amazing projects and solve complex problems.

🐍

Python

⚛️

React

▲

Next.js

📘

TypeScript

🐘

PostgreSQL

🐳

Docker

☁️

AWS

⚡

Kafka

🔥

Spark

☸️

Kubernetes

📚

Git

🐧

Linux

My Projects

Explore my portfolio of data engineering, web development, and ML projects


Featured, completed

advanced

Real-time ETL Pipeline

Built a scalable ETL pipeline using Apache Spark, Kafka, and PostgreSQL for processing 10TB+ of customer data daily.

Apache Spark, Kafka, PostgreSQL, +2 more

GitHub Live Demo

Featured, completed

advanced

ML Recommendation Engine

Developed a collaborative filtering recommendation system using PySpark MLlib and deployed with MLflow.

PySpark, MLlib, MLflow, +2 more

GitHub

Featured, completed

intermediate

Interactive Data Dashboard

Created a real-time analytics dashboard using React, D3.js, and FastAPI for visualizing business metrics.

React, D3.js, FastAPI, +2 more

GitHub Live Demo

completed

advanced

Cloud Data Warehouse

Designed and implemented a cloud-based data warehouse using Snowflake and dbt for modern analytics.

Snowflake, dbt, Airflow, +2 more

GitHub

in progress

intermediate

Data Pipeline Monitoring

Built a comprehensive monitoring system using Grafana, Prometheus, and custom alerting for data pipelines.

Grafana, Prometheus, Python, +2 more

GitHub
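Custom alerting for a pipeline metric often boils down to a rule such as "fire when the value stays above a threshold for several samples in a row". The sketch below illustrates that idea in plain Python; the `ConsecutiveBreachAlert` class and its parameters are hypothetical and are not taken from the monitoring project itself.

```python
from collections import deque


class ConsecutiveBreachAlert:
    """Fire only when the last `window` samples all exceed `threshold`."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        # Keep only the most recent `window` breach flags.
        self.recent = deque(maxlen=window)

    def observe(self, value):
        """Record a sample; return True when the alert should fire."""
        self.recent.append(value > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

Requiring consecutive breaches trades a little detection latency for fewer false alarms from single noisy samples.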

completed

intermediate

A/B Testing Platform

Developed a statistical analysis platform for A/B testing with automated experiment evaluation.

Python, Pandas, SciPy, +2 more

GitHub
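Automated evaluation of an A/B experiment on conversion rates is often done with a two-proportion z-test. The sketch below uses only the standard library and is an assumption about the kind of statistic involved, not the platform's actual method; the `two_proportion_z_test` function and its argument names are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All or no users converted in both groups: no detectable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 100 conversions out of 1,000 users in the control group against 150 out of 1,000 in the treatment group gives a z-score above 3, which is significant at the usual 5% level.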

planned

advanced

Real-time Stream Processing

Real-time data processing using Apache Flink and Apache Pulsar for an event-driven architecture.

Apache Flink, Apache Pulsar, Java, +2 more

in progress

advanced

ML Feature Store

Built a centralized feature store using Feast for managing ML features across multiple models.

Feast, Redis, PostgreSQL, +2 more

GitHub
