GitHub - dedonlab/PTseq_data_analysis: Pipeline for PT-seq data analysis

Copy right and Contact

Author and contact: yfyuan@mit.edu, msdemott@mit.edu

Edit log

10/04/2025 revised: add test dataset, version number

Overview

This foler contains bash and python scripts and bash commands used in the manuscripts entitled "Phosphorothioate DNA modification by BREX type 4 systems in the human gut microbiome" and "PT-seq: A method for metagenomic analysis of phosphorothioate epigenetics in complex microbial communities"

It aims to align reads of an example PT-seq dataset, identify read pileups, and extract sequences including 6 flanking nt at the pileup sites. The package utilizes a simply pipeline from trimming to mapping, sorting and sequencing analysis.

Hardware requirements

The PTseq pileup analysis pipelien requires only a standard computer with enough RAM to support the in-memory operations.

OS Requirements

This package is supported for macOS and Linux. The package has been tested on the following systems:

Linux: Centos 7 macOS: Mojave (10.14.1)

Installation and dependences

The software/tools below should be installed and added to your system’s `PATH` so that it can be invoked from the command line.

bbmap v35.85 https://archive.jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/installation-guide/

fastqc v0.11.8 https://github.com/s-andrews/FastQC

bowtie2 v2.4.5 https://github.com/BenLangmead/bowtie2/releases

samtools v1.19.2 https://github.com/samtools/samtools/releases/

bedtools v2.30.0 https://github.com/arq5x/bedtools2/releases

SMARTcleaner-master v1.0 https://github.com/dzhaobio/SMARTcleaner

seqkit v2.1.0 https://bioinf.shenwei.me/seqkit/

The installation time for tools above should be no more than 1 hour.

MEME-suit v5.3.3 #Please follow the instructions of the MEME suite via its website at http://meme-suite.org

Dependences for MEME-suit v5.3.3

perl v5.18.2 https://www.cpan.org/src/

ghostscript v9.52 https://ghostscript.com/releases/

automake v1.15 https://ftp.gnu.org/gnu/automake/

autoconf v2.69 https://www.gnu.org/software/autoconf/

python v3.5.2 https://www.python.org/downloads/release/python-352/

zlib v1.2.11 https://github.com/jrmwng/zlib-1.2.11

jdk v1.8.0-101 https://www.oracle.com/java/technologies/javase/javase8-archive-downloads.html

zlib v1.2.11 https://github.com/jrmwng/zlib-1.2.11

xz v5.2.3 https://github.com/tukaani-project/xz/releases

lzma v4.32.7 https://sourceforge.net/projects/lzma/

Depending on the environment of the system, the installation time for MEME-suite can be from less than 1 hour to about 3 hours.

Usage

install the dependences
Download the scripts and the demo dataset. Place them in the work directory
Keep the demo dataset in work/demo/
To deplete human sequence contamination (optional), please download the hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz file at https://zenodo.org/records/1208052 and place it in the work/demo folder with the demo reads.
Modify trim.sh with ${path_to_your_bbmap}
Run the scripts in the following order:

1) trim reads

# RAM >= 50G is required. For real PT-seq dataset, we recommond thread >= 10.
```
sh trim.sh demo/demo_1.fastq demo/demo_2.fastq job_demo
```
The output files for the next step are trimmed reads: job_demo_R1_final.fastq and job_demo_R2_final.fastq. The output files also include intermediate .fq files and QC report files.

2) map reads to genome, identify pileups and extract sequences at pileup site with 6 flanking nt. input: reference genome, trimmed reads, job name
```
sh main.sh demo/UYXE01.1.fsa demo/demo_1.fastq demo/demo_2.fastq demo
```
The output files for the next step are fasta file of pileups before filter: dome_pileup_dep0.fasta and tab delimited .txt files: dome_pileup_dep0_F.pos.txt and demo_pileup_dep0_R.pos.txt. The columnns are:

scaffold id, chromosome position of pileup, coverage, pileup depth, depth to coverage ratio, sequence (6 flank nt)

3) identify conserved motifs using MEME-suit. Input file dome_pileup_dep0.fasta. Output: meme/dome_pileup_dep0.txt`
```
sh meme.sh
```
4) merge pileup sites.
```
sh mergepileup.sh
```
The output file is a tab delimited txt file with columns: scaffold, genome position, coverage, depth of read 2, depth-to-coverage ratio, sequence with 6-nt flanking, strand, motif.

5) calculate the number of pileups.
```
sh motif_stat.sh
```
The output file is a txt file with motifs such as, CAG, CCA, GATC/GATC, GAAC/GTTC, and corresponding number of modified motif sites.

6) Summarize the number of pileups per gene class.
```
sh pileup_to_gffClass.sh
```
```
sh summary_geneClass.sh
```
The output file is a table of gene class (CDS, rRNA, tRNA) and corresponding number of total motif sites and modified motif sites.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Copy right and Contact

Edit log

Overview

Hardware requirements

OS Requirements

Installation and dependences

The software/tools below should be installed and added to your system’s `PATH` so that it can be invoked from the command line.

The installation time for tools above should be no more than 1 hour.

Depending on the environment of the system, the installation time for MEME-suite can be from less than 1 hour to about 3 hours.

Usage

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
demo		demo
LICENSE		LICENSE
README.md		README.md
gene_class_summary.sh		gene_class_summary.sh
main.sh		main.sh
meme.sh		meme.sh
mergepileup.sh		mergepileup.sh
motif_stat.sh		motif_stat.sh
pileup_to_gffClass.sh		pileup_to_gffClass.sh
pos2seq_R.sh		pos2seq_R.sh
summary_geneClass.sh		summary_geneClass.sh
trim.sh		trim.sh

Folders and files

Latest commit

History

Repository files navigation

Copy right and Contact

Edit log

Overview

Hardware requirements

OS Requirements

Installation and dependences

The software/tools below should be installed and added to your system’s PATH so that it can be invoked from the command line.

The installation time for tools above should be no more than 1 hour.

Depending on the environment of the system, the installation time for MEME-suite can be from less than 1 hour to about 3 hours.

Usage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

The software/tools below should be installed and added to your system’s `PATH` so that it can be invoked from the command line.

Packages