Coverage inspector for targeted sequencing QC (hg38)
covsnap computes per-target (and optionally per-exon) depth-of-coverage metrics from BAM/CRAM files aligned to the human reference genome hg38. It produces a machine-readable TSV alongside a human-readable Markdown report with automated PASS/FAIL classification heuristics — designed for clinical and research sequencing QC workflows.
- Gene-aware analysis — Look up genes by symbol (e.g.
BRCA1,TP53). Ships with a built-in dictionary of ~60 clinically relevant genes and an optional full GENCODE v44 tabix index covering 62,700+ genes. - Exon-level resolution — Per-exon depth metrics via the
--exonsflag using MANE Select transcripts from GENCODE v44. - Region and BED modes — Accepts genomic coordinates (
chr17:43044295-43125482) or a BED file of arbitrary target intervals. - Streaming architecture — O(1) memory per target using Welford's online algorithm for mean/variance and histogram-based exact median. No per-base depth arrays are ever held in memory.
- Dual engine support — Prefers mosdepth when available; falls back to
samtools depth. - Contig auto-detection — Transparently handles both
chr-prefixed (UCSC) and non-prefixed (Ensembl/1000G) BAM contig naming. - Gene alias resolution — Common aliases like
HER2 -> ERBB2andP53 -> TP53are resolved automatically, with fuzzy suggestions for typos. - BED guardrails — Configurable limits on target count, total bases, and file size to prevent accidental whole-exome/whole-genome runs.
- Classification heuristics — Automated PASS / DROP_OUT / UNEVEN / LOW_EXON / LOW_COVERAGE calls with tunable thresholds.
- Multiple output formats — Raw TSV (25 columns), JSON, exon TSV, low-coverage BED, and interpreted Markdown report.
conda install -c bioconda covsnapgit clone https://github.com/enes-ak/covsnap.git
cd covsnap
pip install .pip install ".[dev]"| Dependency | Version | Required? |
|---|---|---|
| Python | >= 3.9 | Yes |
| pysam | >= 0.22 | Yes |
| numpy | >= 1.24 | Yes |
| samtools | any recent | Yes (engine) |
| mosdepth | >= 0.3 | Optional (preferred engine) |
Note: At least one of
samtoolsormosdepthmust be on your$PATH. When--engine auto(the default), covsnap prefers mosdepth and falls back to samtools.
Analyze coverage for a gene by name:
covsnap sample.bam BRCA1This produces two files in the current directory:
covsnap.raw.tsv— machine-readable metricscovsnap.report.md— human-readable interpreted report
Specify an explicit genomic region (1-based inclusive coordinates):
covsnap sample.bam chr17:43044295-43125482Use a BED file of target intervals:
covsnap sample.bam --bed targets.bedcovsnap sample.cram BRCA1 --reference hg38.facovsnap sample.bam BRCA1 --exonsThis additionally writes covsnap.exons.tsv with per-exon metrics for each MANE Select exon.
Tab-separated file with a metadata header line and 25 columns:
| Column | Description |
|---|---|
target_id |
Gene symbol or BED interval label |
contig |
Chromosome (always hg38 style) |
start |
0-based start coordinate |
end |
Half-open end coordinate |
length_bp |
Target length in base pairs |
mean_depth |
Mean read depth |
median_depth |
Exact median depth (histogram-based) |
min_depth |
Minimum depth at any position |
max_depth |
Maximum depth at any position |
stdev_depth |
Standard deviation (Welford's algorithm) |
pct_zero |
Percentage of bases with zero coverage |
pct_ge_1 |
Percentage of bases with depth >= 1 |
pct_ge_5 |
Percentage of bases with depth >= 5 |
pct_ge_10 |
Percentage of bases with depth >= 10 |
pct_ge_20 |
Percentage of bases with depth >= 20 |
pct_ge_30 |
Percentage of bases with depth >= 30 |
pct_ge_50 |
Percentage of bases with depth >= 50 |
pct_ge_100 |
Percentage of bases with depth >= 100 |
n_lowcov_blocks |
Number of contiguous low-coverage blocks |
lowcov_total_bp |
Total bases in low-coverage blocks |
engine_used |
samtools or mosdepth |
bam_path |
Path to the input alignment file |
sample_name |
Sample name from BAM @RG SM tag |
build |
Always hg38 |
annotation_version |
Always gencode_v44 |
The threshold columns (pct_ge_X) are customizable with --pct-thresholds.
Written when --exons is used. Contains 12 columns per exon:
| Column | Description |
|---|---|
target_id |
Parent gene symbol |
exon_id |
GENCODE exon stable ID (e.g. ENSE00003896691.1) |
exon_number |
Exon number within the MANE Select transcript |
contig |
Chromosome |
start |
0-based start |
end |
Half-open end |
length_bp |
Exon length |
mean_depth |
Mean depth across the exon |
median_depth |
Exact median depth |
pct_zero |
Percentage of zero-coverage bases |
pct_ge_20 |
Percentage >= 20x |
pct_ge_30 |
Percentage >= 30x |
Optional structured JSON with all metrics, suitable for programmatic consumption and database loading.
BED file of contiguous low-coverage blocks. Controlled by:
--lowcov-threshold(default: 10) — depth below which a position is "low-coverage"--lowcov-min-len(default: 50) — minimum block length to report
Human-readable report including:
- Run metadata (sample, build, engine, date)
- Per-target summary table with classification
- Detailed per-target metrics
- Exon-level breakdown (when
--exonsis used) - Low-coverage block listing
- Classification heuristics reference
Each target is classified using ordered heuristics (first match wins):
| Status | Condition |
|---|---|
| DROP_OUT | pct_zero > 5% OR any zero-coverage block >= 500 bp |
| UNEVEN | mean_depth > 20 AND coefficient of variation > 1.0 |
| LOW_EXON | Any exon with pct_ge_20 < 90% or pct_zero > 5% (exon mode only) |
| LOW_COVERAGE | pct_ge_20 < 95% |
| PASS | pct_ge_20 >= 95% AND pct_zero <= 1% |
All thresholds are tunable via CLI flags:
covsnap sample.bam BRCA1 \
--pass-pct-ge-20 98.0 \
--pass-max-pct-zero 0.5 \
--dropout-pct-zero 3.0 \
--uneven-cv 0.8When using --bed, covsnap enforces limits to prevent accidental whole-exome/whole-genome processing:
| Parameter | Default | Flag |
|---|---|---|
| Max target intervals | 2,000 | --max-targets |
| Max total base pairs | 50 Mb | --max-total-bp |
| Max BED file size | 50 MB | --max-bed-bytes |
When limits are exceeded, the behavior is controlled by --on-large-bed:
| Mode | Behavior |
|---|---|
error |
Exit with code 4 |
warn_and_clip (default) |
Keep the first N targets that fit within limits |
warn_and_sample |
Reservoir sample N targets (deterministic with --large-bed-seed) |
The package ships with a built-in dictionary of ~60 clinically relevant genes. For access to the full GENCODE v44 catalog (62,700+ genes, 201,000+ MANE Select exons), build the tabix index:
# Download GENCODE v44 GTF (requires ~1.5 GB)
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.annotation.gtf.gz
# Build the index
python scripts/build_gene_index.py gencode.v44.annotation.gtf.gz
# Files are written to src/covsnap/data/This creates:
hg38_genes.tsv.gz+.tbi— Gene-level tabix indexhg38_exons.bed.gz+.tbi— Exon-level tabix index (MANE Select only)hg38_gene_aliases.json.gz— Gene alias mapping
After building, reinstall the package to include the index files:
pip install .covsnap [-h] [--version] [--bed BED] [--exons] [--reference FASTA]
[--no-index] [--engine {auto,mosdepth,samtools}]
[--threads N] [--raw-out FILE] [--report-out FILE]
[--json-out FILE] [--exon-out FILE] [--emit-lowcov]
[--lowcov-bed FILE] [--lowcov-threshold N] [--lowcov-min-len N]
[--max-targets N] [--max-total-bp N] [--max-bed-bytes BYTES]
[--on-large-bed {error,warn_and_clip,warn_and_sample}]
[--large-bed-seed N] [--pct-thresholds LIST]
[--pass-pct-ge-20 F] [--pass-max-pct-zero F]
[--dropout-pct-zero F] [--uneven-cv F]
[--exon-pct-ge-20 F] [--exon-max-pct-zero F]
[-v] [--quiet]
alignment [target]
| Argument | Description |
|---|---|
alignment |
Path to BAM or CRAM file |
target |
Gene symbol or genomic region. Mutually exclusive with --bed |
| Flag | Description | Default |
|---|---|---|
--bed BED |
BED file of target intervals | — |
--exons |
Enable exon-level statistics (gene mode only) | off |
--reference FASTA |
Reference FASTA for CRAM decoding | — |
--engine |
Depth engine: auto, mosdepth, samtools |
auto |
--threads N |
Threads for mosdepth | 4 |
--raw-out FILE |
Raw TSV output path | covsnap.raw.tsv |
--report-out FILE |
Report markdown path | covsnap.report.md |
--json-out FILE |
JSON output path | — |
-v / --verbose |
Increase verbosity (repeatable) | — |
--quiet |
Suppress non-error output | off |
All output coordinates use 0-based half-open intervals, consistent with BED format:
# A 100 bp region starting at position 1000
contig start end length_bp
chr17 999 1099 100
User-facing region input accepts 1-based inclusive coordinates (e.g. chr17:1000-1099), which are internally converted.
covsnap sample.bam BRCA1 \
--raw-out results/brca1.tsv \
--report-out results/brca1.report.mdcovsnap sample.bam --bed panel_targets.bed \
--raw-out panel_results.tsv \
--report-out panel_report.md \
--json-out panel_results.jsoncovsnap sample.bam BRCA1 \
--exons \
--exon-out brca1_exons.tsv \
--emit-lowcov \
--lowcov-bed brca1_lowcov.bed \
--lowcov-threshold 20covsnap sample.bam --bed wes_targets.bed \
--on-large-bed error \
--max-targets 500 \
--max-total-bp 10000000covsnap sample.bam TP53 --engine samtools| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Invalid arguments or input validation failure |
| 2 | Engine error (samtools/mosdepth failure) |
| 3 | Unknown gene name (with fuzzy suggestions printed to stderr) |
| 4 | BED guardrail limits exceeded (when --on-large-bed error) |
pip install ".[test]"
pytestThe test suite uses synthetic BAM files generated on the fly (no real sequencing data needed). Tests requiring the full GENCODE index or mosdepth are automatically skipped if unavailable.
covsnap/
├── src/covsnap/
│ ├── __init__.py # Version, build, annotation constants
│ ├── cli.py # CLI entry point and orchestration
│ ├── annotation.py # Gene lookup, contig detection, region parsing
│ ├── bed.py # Streaming BED parser with guardrails
│ ├── metrics.py # TargetAccumulator (Welford + histogram)
│ ├── engines.py # samtools / mosdepth depth computation
│ ├── output.py # TSV, JSON, BED writers
│ ├── report.py # Classification heuristics + Markdown report
│ └── data/ # Gene/exon tabix indexes (GENCODE v44)
├── tests/ # Comprehensive test suite (~90 tests)
├── scripts/
│ └── build_gene_index.py # GENCODE GTF → tabix index builder
├── examples/ # Sample output files
├── recipes/conda/ # Bioconda-compatible recipe
└── pyproject.toml
MIT License. See LICENSE for details.