Skip to content

kr-colab/mkado

Repository files navigation

MKado 御門

Documentation Status

A modern Python implementation of the McDonald-Kreitman test toolkit for detecting selection in molecular evolution.

Documentation | PyPI

Features

  • Standard MK test: Classic 2x2 contingency table with Fisher's exact test
  • Polarized MK test: Uses a third outgroup to assign mutations to lineages
  • Asymptotic MK test: Frequency-bin α estimates with exponential extrapolation (Messer & Petrov 2013)
  • Tarone-Greenland α_TG: Weighted multi-gene estimator that corrects for sample size heterogeneity (Stoletzki & Eyre-Walker 2011)
  • Batch processing: Process multiple genes with parallel execution and Benjamini-Hochberg correction for multiple testing
  • Volcano plots: Visualize batch results with publication-ready volcano plots
  • Multiple output formats: Pretty-print, TSV, and JSON

Installation

# Install with uv (recommended)
uv pip install mkado

# Or install with pip
pip install mkado

Development Installation

# Clone the repository
git clone https://github.com/andrewkern/mkado.git
cd mkado

# Install with uv
uv sync

Quick Start

# Standard MK test (combined alignment file)
mkado test alignment.fa -i "dmel" -o "dsim"

# Asymptotic MK test
mkado test alignment.fa -i "dmel" -o "dsim" -a

# Polarized MK test
mkado test alignment.fa -i "dmel" -o "dsim" --polarize-match "dyak"

# Batch process a directory
mkado batch alignments/ -i "dmel" -o "dsim"

# Batch with asymptotic test and 8 parallel workers
mkado batch alignments/ -i "dmel" -o "dsim" -a -w 8

# Get file info
mkado info sequences.fa

Usage Modes

mkado supports two modes for specifying ingroup/outgroup sequences:

Combined File Mode (Recommended)

Use -i and -o to filter sequences by name pattern from a single alignment file:

mkado test alignment.fa -i "speciesA" -o "speciesB"
mkado batch alignments/ -i "speciesA" -o "speciesB"

Separate Files Mode

Provide separate FASTA files for ingroup and outgroup:

mkado test ingroup.fa outgroup.fa
mkado batch genes/ --ingroup-pattern "*_in.fa" --outgroup-pattern "*_out.fa"

Commands

mkado test

Run MK test on a single alignment.

mkado test FASTA [OUTGROUP_FILE] [OPTIONS]

Key Options:

Option Short Description
--ingroup-match -i Ingroup sequence name pattern (combined mode)
--outgroup-match -o Outgroup sequence name pattern (combined mode)
--asymptotic -a Use asymptotic MK test
--polarize -p Second outgroup file (separate files mode)
--polarize-match Second outgroup pattern (combined mode)
--bins -b Frequency bins for asymptotic test (default: 10)
--plot-asymptotic Generate alpha(x) plot for asymptotic test (PNG, PDF, or SVG)
--format -f Output format: pretty, tsv, json
--reading-frame -r Reading frame 1-3 (default: 1)

Examples:

# Combined file mode
mkado test alignment.fa -i "dmel" -o "dsim"
mkado test alignment.fa -i "dmel" -o "dsim" -a -b 20
mkado test alignment.fa -i "dmel" -o "dsim" -a --plot-asymptotic alpha_fit.png
mkado test alignment.fa -i "dmel" -o "dsim" --polarize-match "dyak"

# Separate files mode
mkado test ingroup.fa outgroup.fa
mkado test ingroup.fa outgroup.fa -a
mkado test ingroup.fa outgroup.fa -p outgroup2.fa

mkado batch

Run MK test on multiple alignment files.

mkado batch DIRECTORY [OPTIONS]

Key Options:

Option Short Description
--ingroup-match -i Ingroup pattern (enables combined file mode)
--outgroup-match -o Outgroup pattern (required with -i)
--asymptotic -a Use asymptotic MK test
--alpha-tg Compute weighted α_TG (Stoletzki & Eyre-Walker 2011)
--aggregate/--per-gene Aggregate results or per-gene (asymptotic)
--pattern File glob pattern (default: auto-detect *.fa, *.fasta, *.fna)
--workers -w Parallel workers (0=auto, 1=sequential)
--bins -b Frequency bins for asymptotic test
--format -f Output format: pretty, tsv, json
--volcano Generate volcano plot (PNG, PDF, or SVG)
--plot-asymptotic Generate alpha(x) plot for aggregated asymptotic test

Examples:

# Combined file mode (recommended)
mkado batch alignments/ -i "dmel" -o "dsim"
mkado batch alignments/ -i "dmel" -o "dsim" -a
mkado batch alignments/ -i "dmel" -o "dsim" -a --per-gene
mkado batch alignments/ -i "dmel" -o "dsim" --alpha-tg
mkado batch alignments/ -i "dmel" -o "dsim" -w 8

# Generate a volcano plot
mkado batch alignments/ -i "dmel" -o "dsim" --volcano results.png

# Generate asymptotic alpha(x) plot
mkado batch alignments/ -i "dmel" -o "dsim" -a --plot-asymptotic alpha_fit.png

# Separate files mode
mkado batch genes/ --ingroup-pattern "*_in.fa" --outgroup-pattern "*_out.fa"

mkado info

Display information about a FASTA file.

mkado info FASTA [-r READING_FRAME]

Example Output

$ mkado test alignment.fa -i "kreitman" -o "mauritiana"

Found 11 ingroup, 1 outgroup sequences
MK Test Results:
  Divergence:    Dn=6, Ds=8
  Polymorphism:  Pn=1, Ps=8
  Fisher's exact p-value: 0.176
  Neutrality Index (NI):  0.1667
  Alpha (α):              0.8333
  DoS:                    0.3175

Python API

from mkado import mk_test, asymptotic_mk_test, SequenceSet

# Run MK test
result = mk_test("ingroup.fa", "outgroup.fa")
print(f"Alpha: {result.alpha}")
print(f"P-value: {result.p_value}")

# Run asymptotic MK test
result = asymptotic_mk_test("ingroup.fa", "outgroup.fa")
print(f"Asymptotic Alpha: {result.alpha_asymptotic}")
print(f"95% CI: {result.ci_low} - {result.ci_high}")

# Combined file mode - filter by sequence name
all_seqs = SequenceSet.from_fasta("combined.fa")
ingroup = all_seqs.filter_by_name("dmel")
outgroup = all_seqs.filter_by_name("dsim")
result = mk_test(ingroup, outgroup)

Interpretation

Neutrality Index (NI)

  • NI = 1: Neutral evolution
  • NI > 1: Excess polymorphism (segregating weakly deleterious variants)
  • NI < 1: Excess divergence (positive selection)

Alpha (α)

  • α = 0: No adaptive substitutions
  • α > 0: Proportion of substitutions driven by positive selection
  • α < 0: Excess polymorphism relative to divergence

Direction of Selection (DoS)

From Stoletzki & Eyre-Walker (2011), DoS = Dn/(Dn+Ds) - Pn/(Pn+Ps):

  • DoS = 0: Neutral evolution
  • DoS > 0: Positive selection (excess adaptive substitutions)
  • DoS < 0: Slightly deleterious polymorphisms

DoS is bounded [-1, +1] and symmetric around 0, making it easier to interpret than NI.

Development

# Install dev dependencies
uv sync

# Run tests
uv run pytest

# Run linter
uv run ruff check src/

# Run formatter
uv run ruff format src/

Examples

Example data and tutorials are available in the examples/ directory:

# Run batch MK test on example data
mkado batch examples/anopheles_batch/ -i gamb -o afun

# Run asymptotic MK test
mkado batch examples/anopheles_batch/ -i gamb -o afun -a

See the documentation for detailed tutorials and API reference.

References

License

MIT License

About

A McDonald-Kreitman test implementation

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages