massSight

massSight implements probabilistic drift-aware optimal transport for cross-study LC–MS feature matching.

Install

Install using uv:

uv add mass-sight

Or using pip:

pip install mass-sight

Core dependencies include numpy, pandas, scikit-learn, and scipy.

Quickstart

Input feature tables

massSight expects two feature tables, each with the following columns:

MZ (m/z)
RT (retention time in minutes)
Intensity (a per-feature intensity summary; optional)
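
As a minimal sketch of what such a table could look like, the snippet below builds a tiny feature table with the canonical column names using pandas. The values are invented for illustration, and the final check is only a local sanity check, not part of massSight's API.

```python
import pandas as pd

# Illustrative values only; real tables come from your own peak-picking output.
study_a = pd.DataFrame(
    {
        "MZ": [180.0634, 255.2330, 303.2330],  # m/z
        "RT": [1.25, 7.80, 8.42],              # retention time in minutes
        "Intensity": [1.2e6, 3.4e5, 8.9e4],    # optional per-feature summary
    }
)

# Local sanity check: MZ and RT must be present, Intensity is optional.
required = {"MZ", "RT"}
missing = required - set(study_a.columns)
assert not missing, f"missing required columns: {sorted(missing)}"
```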

`MassSightConfig` and column names

massSight is configured via MassSightConfig. By default it expects canonical column names (MZ, RT, Intensity).

If your tables use different column names, either rename your columns or pass a schema override:

from mass_sight import MassSightConfig

cfg = MassSightConfig(mz_col="mz", rt_col="rt_min", intensity_col="area")

If study_a and study_b use different schemas, use study-specific overrides:

cfg = MassSightConfig(
    mz_col_study_a="mz",
    rt_col_study_a="rt_min",
    mz_col_study_b="m_over_z",
    rt_col_study_b="rt",
)
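
The other route mentioned above, renaming columns instead of overriding the schema, can be sketched with plain pandas. The input table and its column names (mz, rt_min, area) are hypothetical and simply mirror the override example.

```python
import pandas as pd

# Hypothetical table that uses non-canonical column names.
raw = pd.DataFrame(
    {
        "mz": [180.0634, 255.2330],
        "rt_min": [1.25, 7.80],
        "area": [1.2e6, 3.4e5],
    }
)

# Map the study's own names onto the canonical ones (MZ, RT, Intensity).
# rename returns a new DataFrame and leaves `raw` unchanged.
canonical = raw.rename(columns={"mz": "MZ", "rt_min": "RT", "area": "Intensity"})
```

After renaming, the table can be used with a default MassSightConfig().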

For the CLI, use --mz-col, --rt-col, and --intensity-col.

Python usage

import pandas as pd
from mass_sight import MassSightConfig, match_features

study_a = pd.read_csv("study_a.csv")
study_b = pd.read_csv("study_b.csv")

cfg = MassSightConfig()
res = match_features(study_a, study_b, cfg)

top1 = res.top1  # id1, id2, decision, margin, prob_raw, ...
candidates = res.candidates  # residuals, log-likelihoods, OT weights, etc.
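
One way to work with the top-1 table downstream is to keep only high-scoring matches. The sketch below uses a stand-in DataFrame rather than a real result, so it runs without massSight. It assumes that res.top1 is a pandas DataFrame and that a higher prob_raw means a more confident match; neither is confirmed here, and the 0.9 cutoff is arbitrary.

```python
import pandas as pd

# Stand-in for res.top1 with a subset of the columns listed above; values are invented.
top1 = pd.DataFrame(
    {
        "id1": ["a1", "a2", "a3"],
        "id2": ["b7", "b2", "b9"],
        "margin": [0.80, 0.05, 0.40],
        "prob_raw": [0.97, 0.55, 0.91],
    }
)

# Assumption: higher prob_raw means a more confident match; 0.9 is an arbitrary cutoff.
confident = top1[top1["prob_raw"] >= 0.9].sort_values("prob_raw", ascending=False)
print(confident[["id1", "id2", "prob_raw"]])
```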

Command-line usage

mass_sight match study_a.csv study_b.csv \
  --out-candidates candidates.csv \
  --out-top1 top1.csv

For public-data reuse from Metabolomics Workbench:

mass_sight find --out selection.json

mass_sight reuse \
  --analysis-id AN000001 \
  --analysis-id AN000002 \
  --out-dir reuse_out

find launches an interactive terminal UI to browse Workbench by disease and export a reproducible selection manifest of analysis_ids. You can then pass those IDs to reuse (or script against selection.json).
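
Scripting against selection.json could look roughly like the sketch below, which turns a list of analysis IDs into a reuse command line. The layout of selection.json is not documented here; the top-level analysis_ids key is an assumption, so check the real file before relying on it.

```python
import json
import shlex

# Hypothetical manifest content; the real selection.json layout may differ.
manifest_text = '{"analysis_ids": ["AN000001", "AN000002"]}'
manifest = json.loads(manifest_text)

# Build the argument list, repeating --analysis-id once per ID as in the example above.
argv = ["mass_sight", "reuse"]
for analysis_id in manifest["analysis_ids"]:
    argv += ["--analysis-id", analysis_id]
argv += ["--out-dir", "reuse_out"]

# Print a shell-safe version of the command for inspection.
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```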

This workflow is designed for end users who want a quick, reproducible cross-study run from public Metabolomics Workbench (MW) analysis IDs. It fetches mwtab + untarg_data, summarizes study metadata, automatically groups compatible assays (same ion mode + chromatography), and runs clustering within each group.

Common options:

--use-intensity off|auto|on: control whether intensity is used across studies (default off).
--min-group-size N: require at least N studies per assay-compatible group (default 2).
--allow-unknown-strata: include analyses with unknown ion mode/chromatography instead of dropping them.
--fetch-targeted-data: also fetch study-level named-metabolite matrices (/study_id/.../data) for targeted/meta-analysis (default disabled).
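
To illustrate how grouping by ion mode and chromatography interacts with --min-group-size and --allow-unknown-strata, here is a small standard-library sketch. It is not massSight's implementation: the metadata records, the field names, and the group_compatible helper are all hypothetical, and it treats each analysis as one study.

```python
from collections import defaultdict

# Hypothetical per-analysis metadata; None marks an unknown value.
analyses = [
    {"analysis_id": "AN000001", "ion_mode": "positive", "chromatography": "HILIC"},
    {"analysis_id": "AN000002", "ion_mode": "positive", "chromatography": "HILIC"},
    {"analysis_id": "AN000003", "ion_mode": "negative", "chromatography": "C18"},
    {"analysis_id": "AN000004", "ion_mode": None, "chromatography": "C18"},
]


def group_compatible(analyses, min_group_size=2, allow_unknown_strata=False):
    """Group analyses by (ion mode, chromatography) and drop groups that are too small."""
    groups = defaultdict(list)
    for a in analyses:
        key = (a["ion_mode"], a["chromatography"])
        if None in key and not allow_unknown_strata:
            continue  # drop analyses with unknown strata
        groups[key].append(a["analysis_id"])
    # Keep only groups that meet the minimum size.
    return {k: v for k, v in groups.items() if len(v) >= min_group_size}


groups = group_compatible(analyses)
print(groups)
```

With the defaults, only the positive/HILIC pair survives: the negative/C18 analysis is alone in its group, and the analysis with an unknown ion mode is dropped.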

By default, CLI runs also write a machine-readable run manifest capturing software version, parameters, runtime, and outputs.
