Harshit Soni

Machine learning researcher working on representation learning, causal inference, and empirical analysis of complex systems. MS Data Science @ NYU.

My work applies ML and statistical methods across domains including biomedical knowledge systems, geospatial data, financial markets, urban policy, and language.


Focus areas

Representation learning across structured and unstructured data · Causal inference in observational and interference settings · Large-scale empirical analysis · Knowledge systems and retrieval pipelines


Methods

Representation learning. Transformer fine-tuning, contrastive learning, embedding models, cross-ontology semantic alignment, multilingual representation learning, temporal and regime-aware embeddings.

Causal inference and experimental design. Difference-in-differences, regression discontinuity, inverse probability weighting, A/B testing, exposure mapping, interference-aware estimation, simulation-based causal evaluation, hypothesis testing.

Statistical modeling and econometrics. Panel data analysis, time-series modeling, regime-switching models (HMM), regression frameworks, dimensionality reduction, feature significance testing.

Machine learning systems. Supervised learning pipelines, recommendation systems, hyperparameter optimization, model evaluation and ablation studies.

Retrieval and knowledge systems. Embedding-based retrieval, vector similarity search, candidate generation pipelines, ontology alignment workflows.

NLP and language modeling. Multilingual NLP pipelines, cross-lingual retrieval, sentiment modeling across languages, text-to-SQL systems, robustness to noisy text inputs.

Data engineering and empirical analysis. Large-scale ETL pipelines, administrative and observational data analysis, spatio-temporal analysis, feature engineering for high-dimensional structured datasets.


Tools

Programming and data workflows. Python, SQL, R, Shell, Pandas, NumPy, SciPy, PySpark.

ML and modeling. PyTorch, Hugging Face Transformers, scikit-learn, XGBoost, Optuna, PEFT/QLoRA workflows.

Causal inference and statistical analysis. statsmodels, linearmodels, hmmlearn, A/B testing frameworks, IPW and propensity score pipelines, simulation tooling.

NLP and language systems. mT5, mContriever, Sentence Transformers, multilingual modeling pipelines, text-to-SQL workflows, noisy-text evaluation setups.

Retrieval and representation systems. Qdrant, FAISS, Sentence Transformers, embedding pipelines, vector search, RAG-style workflows.

Visualization and reporting. Matplotlib, Seaborn, Tableau, Plotly, Folium, GeoPandas.

Infrastructure and tooling. Git, Docker, Linux, HPC environments, Jupyter, Google Cloud, AWS.


Pinned

  1. Weakly-Supervised-Representation-Learning-for-Cross-Ontology-Mapping

    Python

  2. Edgar_Multi_Agent

    Python

  3. Financial-Drivers-of-Corporate-ESG-Behavior

    Jupyter Notebook

  4. Cross-Cultural-Learning-in-Multilingual-Sentiment-Analysis

    NLP Capstone

  5. Regime-Aware-Remote-Sensing-Embeddings

    Python

  6. adversarial_causal

    Python