Machine learning researcher working on representation learning, causal inference, and empirical analysis of complex systems. MS Data Science @ NYU.
My work applies ML and statistical methods across domains including biomedical knowledge systems, geospatial data, financial markets, urban policy, and language.
Representation learning across structured and unstructured data · Causal inference in observational and interference settings · Large-scale empirical analysis · Knowledge systems and retrieval pipelines
Representation learning. Transformer fine-tuning, contrastive learning, embedding models, cross-ontology semantic alignment, multilingual representation learning, temporal and regime-aware embeddings.
Causal inference and experimental design. Difference-in-differences, regression discontinuity, inverse probability weighting (IPW), A/B testing, exposure mapping, interference-aware estimation, simulation-based causal evaluation, hypothesis testing.
Statistical modeling and econometrics. Panel data analysis, time-series modeling, regime-switching models (hidden Markov models), regression frameworks, dimensionality reduction, feature significance testing.
Machine learning systems. Supervised learning pipelines, recommendation systems, hyperparameter optimization, model evaluation and ablation studies.
Retrieval and knowledge systems. Embedding-based retrieval, vector similarity search, candidate generation pipelines, ontology alignment workflows.
NLP and language modeling. Multilingual NLP pipelines, cross-lingual retrieval, sentiment modeling across languages, text-to-SQL systems, robustness to noisy text inputs.
Data engineering and empirical analysis. Large-scale ETL pipelines, administrative and observational data analysis, spatio-temporal analysis, feature engineering for high-dimensional structured datasets.
Programming and data workflows. Python, SQL, R, Shell, Pandas, NumPy, SciPy, PySpark.
ML and modeling. PyTorch, Hugging Face Transformers, Scikit-learn, XGBoost, Optuna, PEFT/QLoRA workflows.
Causal inference and statistical analysis. statsmodels, linearmodels, hmmlearn, A/B testing frameworks, IPW and propensity score pipelines, simulation tooling.
NLP and language systems. mT5, mContriever, Sentence Transformers, multilingual modeling pipelines, text-to-SQL workflows, noisy-text evaluation setups.
Retrieval and representation systems. Qdrant, FAISS, Sentence Transformers, embedding pipelines, vector search, RAG-style workflows.
Visualization and reporting. Matplotlib, Seaborn, Tableau, Plotly, Folium, GeoPandas.
Infrastructure and tooling. Git, Docker, Linux, HPC environments, Jupyter, Google Cloud, AWS.

