harshalsp0011/README.md


💫 About Me

Building intelligent data systems that scale, from raw bytes to actionable insight.

I'm a Data Engineer & AI Automation Specialist with an MS in Data Science from the University at Buffalo. I specialize in:

  • πŸ” End-to-end ETL/ELT pipelines β€” from ingestion through transformation to serving
  • πŸ—οΈ Scalable data architectures β€” Lakehouse patterns with Delta Lake & Apache Iceberg
  • πŸ€– LLM-powered applications β€” multi-agent systems with LangChain, OpenAI, and FastAPI
  • ☁️ Cloud-native engineering β€” AWS, GCP, Snowflake, BigQuery at production scale

LinkedIn Medium Email


πŸ› οΈ Tech Stack

🐍 Languages & Analytics

Python · SQL · Bash · R · Pandas · NumPy

βš™οΈ Data Engineering & Orchestration

Apache Airflow Apache Spark Apache Kafka Apache Hadoop dbt Delta Lake Apache Iceberg n8n

☁️ Cloud & Databases

AWS · Amazon S3 · AWS Glue · AWS Lambda · Amazon Redshift · Amazon EMR · Azure · GCP · Snowflake · BigQuery · PostgreSQL · MySQL

πŸ—οΈ Infrastructure & DevOps

Terraform Docker Git GitHub Actions Linux Prometheus Grafana Great Expectations

🤖 AI & LLMs

OpenAI · Claude · Gemini · LangChain · LangSmith · Hugging Face · Vercel AI

🧠 ML & Data Science

Scikit-learn · FastAPI · Jupyter

📊 BI & Visualization

Power BI · Matplotlib · Seaborn · Plotly


πŸ—ΊοΈ Experience Timeline

2024 – 2025     │  🎓  MS in Data Science, University at Buffalo
                │       Focus: ML, Big Data Systems, NLP, Cloud Architecture
                │
2023 – 2024     │  🔧  Data Engineering & AI Automation Projects
                │       ETL pipelines · Lakehouse architectures · LLM apps
                │
2022 – 2023     │  📊  Data Analytics & Pipeline Development
                │       Spark · Airflow · Kafka · dbt · AWS
                │
2020 – 2022     │  💻  Software & Data Engineering Foundations
                │       Python · SQL · Cloud fundamentals · ML basics

📌 Open to full-time Data Engineering / ML Engineering roles, available immediately.


📊 GitHub Stats


⚡ Fun Facts

harshal = {
    "pronouns"       : "he/him",
    "currently"      : "Building LLM-powered data pipelines & multi-agent systems",
    "learning"       : ["Apache Iceberg", "LLM fine-tuning", "Rust for data tools"],
    "hobbies"        : ["Exploring new data tools 🔍", "Technical blogging on Medium ✍️",
                        "Coffee-fuelled late-night debugging ☕", "F1 🏎️"],
    "fun_fact"       : "I automate tasks so I have more time to automate more tasks 🤖",
    "reach_me_at"    : "harshal.sanjivpatil2000@gmail.com",
}

✍️ Dev Quote of the Day


Last updated: 2026 · Built with ❤️ and way too much data

Pinned

  1. Lead-Intelligence-Platform (Public)

    AI-powered Multi-Agent Lead Intelligence Platform for utility cost-reduction consulting. Uses specialized agents (Scout, Analyst, Writer) to discover high-spend companies, enrich and score them as…

    Python

  2. realtime-user-analytics (Public)

    A full-stack real-time analytics pipeline designed to simulate, process, and visualize live user interaction data (such as clicks and cart additions) for e-commerce. The system utilizes Apache Kafk…

    Python

  3. job-market-analysis-using-linkedIn-data (Public)

    A Data Intensive Computing (CSE587) project designed to analyze job market trends using LinkedIn data. The repository documents the setup of a Dockerized Big Data infrastructure, featuring a Hadoop…

    Jupyter Notebook

  4. E-Commerce-product-recommendation-system (Public)

    A scalable recommendation engine that uses ALS collaborative filtering on RetailRocket data to deliver personalized product suggestions. It features a high-performance FastAPI backend, a Streamlit …

    Python 1

  5. job-portal-database-system (Public)

    Job Portal Database System: ER model, PostgreSQL schema (DDL), fake data generation, bulk loading, and query samples with fixes. Includes a minimal Python app for quick interaction. Reproducible s…

    HTML 1

  6. ev-charging-data-warehouse (Public)

    EV Charging Network Data Warehouse with Snowflake - A cloud-based analytics platform for electric vehicle charging station optimization

    CSS 1