massSight implements probabilistic drift-aware optimal transport for cross-study LC–MS feature matching.
Install using uv:
uv add mass-sight
Or using pip:
pip install mass-sight
Core dependencies include numpy, pandas, scikit-learn, and scipy.
massSight expects two feature tables with at least:
- MZ (m/z)
- RT (retention time in minutes)
- Intensity (a per-feature intensity summary; optional)
massSight is configured via MassSightConfig. By default it expects canonical column names (MZ, RT, Intensity).
If your tables use different column names, either rename your columns or pass a schema override:
from mass_sight import MassSightConfig
cfg = MassSightConfig(mz_col="mz", rt_col="rt_min", intensity_col="area")
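The other route mentioned above, renaming columns, can be done in pandas before matching. The sketch below uses a small made-up table; the source column names (mz, rt_min, area) mirror the override example and are placeholders for whatever your export uses.

```python
import pandas as pd

# Hypothetical feature table whose columns do not use the canonical names.
study_a = pd.DataFrame(
    {
        "mz": [181.0707, 256.2402],
        "rt_min": [1.42, 7.85],
        "area": [1.2e6, 3.4e5],
    }
)

# Map the source names onto the canonical MZ / RT / Intensity names.
study_a = study_a.rename(columns={"mz": "MZ", "rt_min": "RT", "area": "Intensity"})

print(list(study_a.columns))
```

With canonical names in place, a default MassSightConfig() needs no schema override.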
If study_a and study_b use different schemas, use study-specific overrides:
cfg = MassSightConfig(
    mz_col_study_a="mz",
    rt_col_study_a="rt_min",
    mz_col_study_b="m_over_z",
    rt_col_study_b="rt",
)
For the CLI, use --mz-col, --rt-col, and --intensity-col.
import pandas as pd
from mass_sight import MassSightConfig, match_features
study_a = pd.read_csv("study_a.csv")
study_b = pd.read_csv("study_b.csv")
cfg = MassSightConfig()
res = match_features(study_a, study_b, cfg)
top1 = res.top1 # id1, id2, decision, margin, prob_raw, ...
candidates = res.candidates # residuals, log-likelihoods, OT weights, etc.
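A common next step is to keep only the confident top-1 matches. The sketch below assumes res.top1 behaves like a pandas DataFrame with the columns named in the comment above, and that prob_raw lies in [0, 1]; neither is confirmed here. It uses a stand-in table rather than real output, and the cut-offs are illustrative, not recommended defaults.

```python
import pandas as pd

# Stand-in for res.top1 (ASSUMPTION: a DataFrame with these columns).
top1 = pd.DataFrame(
    {
        "id1": ["a1", "a2", "a3"],
        "id2": ["b7", "b2", "b9"],
        "margin": [0.80, 0.05, 0.40],
        "prob_raw": [0.95, 0.55, 0.70],
    }
)

# Illustrative thresholds only; tune them against your own data.
MIN_PROB = 0.6
MIN_MARGIN = 0.3

# Keep rows that clear both thresholds.
confident = top1[(top1["prob_raw"] >= MIN_PROB) & (top1["margin"] >= MIN_MARGIN)]

print(confident[["id1", "id2"]].to_string(index=False))
```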
mass_sight match study_a.csv study_b.csv \
    --out-candidates candidates.csv \
    --out-top1 top1.csv
For public-data reuse from Metabolomics Workbench:
mass_sight find --out selection.json
mass_sight reuse \
    --analysis-id AN000001 \
    --analysis-id AN000002 \
    --out-dir reuse_out
find launches an interactive terminal UI to browse Workbench by disease and export a reproducible selection manifest
of analysis_ids. You can then pass those IDs to reuse (or script against selection.json).
This workflow is designed for end users who want a quick, reproducible cross-study run from public MW IDs.
It fetches mwtab + untarg_data, summarizes study metadata, automatically groups compatible assays
(same ion mode + chromatography), and runs clustering within each group.
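To script against selection.json instead of typing IDs by hand, something like the following could assemble the reuse command. The manifest layout is an assumption: the sketch expects a top-level "analysis_ids" list, which may not match the file that find actually writes, so inspect your own selection.json first. build_reuse_command is a hypothetical helper, not part of massSight.

```python
import json
import shlex
import tempfile
from pathlib import Path


def build_reuse_command(manifest_path, out_dir="reuse_out"):
    """Build a `mass_sight reuse` argument list from a selection manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    # ASSUMPTION: the IDs live under a top-level "analysis_ids" key.
    analysis_ids = manifest["analysis_ids"]
    cmd = ["mass_sight", "reuse"]
    for analysis_id in analysis_ids:
        cmd += ["--analysis-id", str(analysis_id)]
    cmd += ["--out-dir", out_dir]
    return cmd


# Demo with a stand-in manifest written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "selection.json"
    manifest_path.write_text(json.dumps({"analysis_ids": ["AN000001", "AN000002"]}))
    cmd = build_reuse_command(manifest_path)

# Print the command in copy-pasteable shell form.
print(shlex.join(cmd))
```

Once the layout is confirmed, the list can be handed to subprocess.run(cmd, check=True).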
Common options:
- --use-intensity off|auto|on: control whether intensity is used across studies (default off).
- --min-group-size N: require at least N studies per assay-compatible group (default 2).
- --allow-unknown-strata: include analyses with unknown ion mode/chromatography instead of dropping them.
- --fetch-targeted-data: also fetch study-level named-metabolite matrices (/study_id/.../data) for targeted/meta-analysis (default disabled).
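The grouping behaviour described above (same ion mode + chromatography, a minimum group size, optional inclusion of unknown strata) can be pictured with a small pandas sketch. This is an illustration of the idea, not massSight's implementation: the metadata table, the group_compatible helper, and the choice to treat unknown values as their own stratum are all assumptions.

```python
import pandas as pd

# Hypothetical per-analysis metadata; None marks an unknown value.
meta = pd.DataFrame(
    {
        "analysis_id": ["A", "B", "C", "D", "E"],
        "ion_mode": ["positive", "positive", "negative", "positive", None],
        "chromatography": ["HILIC", "HILIC", "HILIC", "reversed phase", "HILIC"],
    }
)


def group_compatible(meta, min_group_size=2, allow_unknown=False):
    """Group analyses by (ion_mode, chromatography) and drop small groups."""
    strata = ["ion_mode", "chromatography"]
    if allow_unknown:
        # ASSUMPTION: unknown values are kept as a separate "unknown" stratum.
        meta = meta.fillna({col: "unknown" for col in strata})
    else:
        # Drop analyses whose ion mode or chromatography is unknown.
        meta = meta.dropna(subset=strata)
    groups = {}
    for key, sub in meta.groupby(strata):
        if len(sub) >= min_group_size:
            groups[key] = sub["analysis_id"].tolist()
    return groups


print(group_compatible(meta))
print(group_compatible(meta, min_group_size=1, allow_unknown=True))
```

With the defaults, only the positive/HILIC pair survives; relaxing both settings keeps every analysis, including the one with an unknown ion mode.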
By default, CLI runs also write a machine-readable run manifest capturing software version, parameters, runtime, and outputs:
- match: <out-candidates-stem>.run_manifest.json
- cluster: <out-dir>/run_manifest.json
- reuse: <out-dir>/run_manifest.json
- See CITATION.cff.
MIT.