Capstone PTM alignment This pipeline has been successfully tested on Linux, Windows and some Mac systems. It is known to have issues on ARM-based M4 systems.
This workflow can be run from docker. Docker can be installed from https://docs.docker.com. To set up the workflow, first clone the git repository:
git clone https://github.com/evergreen700/PTMOverlay
cd PTMOverlay
If you are running the docker container on a Windows machine, git will automatically change a script in a way that doesn't work with the docker image. Fix it with this:
git config core.autocrlf false
git checkout .\scripts\*
Then, build the docker image:
docker build -t ptm-overlay .
Create a conda environment. All code should be run inside the environment you create. Instructions for creating a new conda environment can be found here: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
Install dependencies using pip:
pip install matplotlib pandas pyteomics biopython lxml snakemake ncbi-taxonomist svgutils six
brew install pdf2svg (Mac)
sudo apt-get install pdf2svg (Linux)
Clone the git repository:
git clone https://github.com/evergreen700/PTMOverlay
We have not extensively tested all the packages this tool requires. These are the versions we have used:
- pyteomics: 4.7.5
- biopython: 1.85
- lxml: 5.3.1
- snakemake: 9.1.1
- ncbi-taxonomist: 1.2.1
In the executables directory of the repository, check to see if a MUSCLE executable that works with your operating system is present. If not, go to this website and download the correct executable and place it in in executables directory.
https://drive5.com/muscle/downloads_v3.htm
Then edit the runMUSCLE.py file accordingly.
If on Windows:
- Edit line 13 to point to the correct executable
If on Linux:
- Edit lines 16, 17 or 19, 20 depending on your machine
If on Mac:
- Edit line 23 or 25 depending on your machine
To run the workflow, place proteomes and kegg annotation files in the folder designated as proteome_dir in the config file. Place .pepXML files in folders sorted by ptm type within the folder designated as pepXML_dir in the config file. Below is an example of the file structure:
PTMOverlay
+ proteome
| + GCA_002847685.2.faa
| + GCA_002847685.2.kegg.txt
| + ...
+ mass_spec
| + Phospho
| | + BioD_urine_UMB0005_01_12Apr24_Arwen_WBEH-23-02-03.pepXML
| | + ... (other phospho pepXMLs)
| + ... (other PTM types)
+ index_umb_taxa_gca.tsv
+ README.md
+ snakefile
+ ...
index_umb_taxa_gca.tsv is a tab-separated file that is used to match between mass spec strain IDs (UMB####), species name, and proteome assembly (GCA). If you are using your own mass spec and proteome files, make sure that the names are in this tsv.
To run the workflow from the docker image on Windows, run
docker run -v .\:/PTMOverlay ptm-overlay /bin/bash -c "cd /PTMOverlay && snakemake --cores all"
To run the workflow from the docker image on MacOS or Linux, run
docker run -v ./:/PTMOverlay ptm-overlay /bin/bash -c "cd /PTMOverlay && snakemake --cores all"
To reduce errors when downloading example data from ftp, you may want to limit the number of concurrent downloads:
docker run -v .\:/PTMOverlay ptm-overlay /bin/bash -c "cd /PTMOverlay && snakemake --cores all --resources='downloads=2'"
To run the workflow on your operating system:
cd PTMOverlay
snakemake
The proteome and kegg annotation files are included as an example. Until we graduate and the files are no longer hosted on BYU Box, the proteome files can be downloaded and installed from Box automatically when running snakemake. We recommend keeping the config file the way it is for the first run. If there are other orthologs or pathways you want to look at on the 31 species, rerunning with modified parameters will run faster if intermediates are already generated.