Tools for the Das Lab at Stanford — cluster file transfer, structure analysis, sequence utilities, and RNA-specific PDB/silent-file processing.
The rna_tools directory was developed initially in Rosetta and is now maintained here. All scripts require Python 3 and accept --help.
Clone the repo to ~/src/daslab_tools, then add the following to your ~/.bashrc:
# daslab_tools — add root and all subdirectories to PATH and PYTHONPATH
export PATH="$PATH$(printf ':%s' $HOME/src/daslab_tools $HOME/src/daslab_tools/*/)"
export PYTHONPATH="$PYTHONPATH$(printf ':%s' $HOME/src/daslab_tools $HOME/src/daslab_tools/*/)"
# rna_tools subdirectories
export PATH="$PATH$(printf ':%s' $HOME/src/daslab_tools/rna_tools $HOME/src/daslab_tools/rna_tools/*/)"
export PYTHONPATH="$PYTHONPATH$(printf ':%s' $HOME/src/daslab_tools/rna_tools $HOME/src/daslab_tools/rna_tools/*/)"
Then run source ~/.bashrc. The glob */ matches every immediate subdirectory, so no changes are needed when new subdirectories are added at those levels.
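To preview which directories the export lines above would add, the same expansion can be mimicked in Python. This is a minimal sketch and not part of the repo: the function names path_entries and path_suffix are hypothetical, and it assumes default shell globbing (hidden directories skipped, matches sorted, one directory level only).

```python
from pathlib import Path


def path_entries(root):
    """Return root plus its immediate, non-hidden subdirectories.

    Approximates what the shell words `root root/*/` expand to.
    """
    root = Path(root)
    subdirs = sorted(
        p for p in root.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )
    # The glob */ leaves a trailing slash on every match.
    return [str(root)] + [f"{p}/" for p in subdirs]


def path_suffix(root):
    """Mimic $(printf ':%s' root root/*/): each entry gets a leading colon."""
    return "".join(f":{entry}" for entry in path_entries(root))
```

Calling path_suffix on the cloned repo directory returns the string that would be appended to PATH and PYTHONPATH. Nested directories are not included, which is why rna_tools gets its own pair of export lines.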
- dssr.py: run x3dna-dssr on PDB files; prints dot-bracket secondary structure and sequence
- lddt.py: compute lDDT/ilDDT scores between a reference and one or more model structures (via OST)
- ost.py: run OST compare-structures on a reference vs. multiple PDBs
- split_models.py: split a multi-model PDB into individual numbered files
- thread_rna.py: thread a target sequence onto backbone atoms of a template PDB
- fix_o2prime.py: strip and rebuild O2′ atoms (fixes bad geometry from some structure predictors)
- USalign.py: wrapper for USalign structure superposition and TM-score
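As an illustration of what splitting a multi-model PDB involves, here is a minimal sketch based on the MODEL and ENDMDL records of the PDB format. It is not the code in split_models.py; the function name and the choice to return strings rather than write numbered files are assumptions made for the example.

```python
def split_models(pdb_text):
    """Split multi-model PDB text into a list of per-model strings.

    Lines between a MODEL record and the following ENDMDL record form
    one model. Text without any MODEL records is returned as one model.
    """
    models = []
    current = []
    in_model = False
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
            in_model = True
        elif record == "ENDMDL":
            models.append("\n".join(current) + "\n")
            in_model = False
        elif in_model:
            current.append(line)
    if not models and pdb_text.strip():
        # Single-model file: nothing to split.
        models.append(pdb_text)
    return models
```

Each returned string can then be written to its own file, with whatever numbering scheme suits the workflow.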
- rsync_to_cluster.py (r2c): rsync files from the current directory to a named cluster destination
- rsync_from_cluster.py (rfc): rsync files from a named cluster destination to the current directory
- cd_to_cluster.py (c2c): print the equivalent of the current directory on another filesystem
- cluster_info.py: list known destinations; cluster_info.py -a shows resolved host and path for each
- prepare_slurm.py: generate SLURM sbatch scripts from a plain-text list of commands
- archive_to_oak.py: tar a directory and copy it to Oak long-term storage
- rosetta_submit.py: submit Rosetta jobs across cluster destinations
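The idea of turning a plain-text command list into sbatch scripts can be sketched as follows. This is an assumption-laden illustration, not the behavior of prepare_slurm.py: the function names, the one-script-per-command layout, the file naming, the default resource values, and the skipping of blank and # lines are all hypothetical. Only the #SBATCH directives themselves are standard SLURM options.

```python
def make_sbatch_script(command, job_name, time_limit="01:00:00", cpus=1):
    """Return the text of an sbatch script that runs a single command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --output={job_name}.%j.out",  # %j expands to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


def scripts_from_commands(text, prefix="job"):
    """Map hypothetical script names to script text, one per command line.

    Blank lines and lines starting with '#' are skipped (an assumption).
    """
    commands = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return {
        f"{prefix}{i}.sbatch": make_sbatch_script(cmd, f"{prefix}{i}")
        for i, cmd in enumerate(commands)
    }
```

Writing each dictionary value to its key as a file name gives scripts that can be submitted with sbatch.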
- rc.py: reverse complement a DNA or RNA sequence
- rna2dna.py: convert an RNA sequence to DNA (U→T)
- gel_from_fastq.py: simulate a gel image from FASTQ read-length distribution
- sam_counts.py: count reads per reference sequence from a SAM file
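For reference, the two sequence conversions above amount to the following. This is a minimal sketch, not the code in rc.py or rna2dna.py; the function names are hypothetical, and the rule of answering in RNA whenever the input contains U is an assumption.

```python
def reverse_complement(seq):
    """Reverse complement a DNA or RNA sequence, preserving case.

    The output uses U if the input contains U (assumed rule), else T.
    Characters other than A, C, G, T, U, N raise KeyError.
    """
    is_rna = "U" in seq.upper()
    pairs = {
        "A": "U" if is_rna else "T",
        "C": "G",
        "G": "C",
        "T": "A",
        "U": "A",
        "N": "N",
    }
    # Add lowercase equivalents so soft-masked sequences keep their case.
    pairs.update({k.lower(): v.lower() for k, v in pairs.items()})
    return "".join(pairs[base] for base in reversed(seq))


def rna_to_dna(seq):
    """Convert RNA to DNA by replacing U with T, preserving case."""
    return seq.replace("U", "T").replace("u", "t")
```

For example, reverse_complement("AUGC") gives "GCAU" and rna_to_dna("AUGC") gives "ATGC".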
- csv_to_parquet.py: convert a CSV file to Parquet
- parquet_to_csv.py: convert a Parquet file to CSV
- clean_parquet.py: filter or clean columns in a Parquet file
- renumber_pdb_in_place.py (rpip): renumber residues in place; supports negative numbers and chain specs
- pdbsubset.py: extract specified residues from a PDB (by residue number, chain, or range)
- pdbslice.py: extract a contiguous range or custom subset/excision of residues
- extract_chain.py: extract one or more chains from a PDB (supports gzip)
- replace_chain.py / replace_chain_inplace.py: rename chain IDs in a PDB
- reorder_pdb.py / reorder_to_standard_pdb.py: reorder ATOM records to Rosetta or standard PDB order
- pdb2fasta.py: extract a FASTA-format sequence from a PDB
- get_sequence.py: print the residue sequence from a PDB
- get_res_num.py: list residue numbers present in a PDB
- check_cutpoints.py: identify chain cutpoints in a PDB
- make_rna_rosetta_ready.py: prepare an RNA PDB for Rosetta (renumber, strip ions, fix chains)
- fetch_pdb.py: download a PDB entry by accession code
- get_surrounding_res.py: find residues within a distance cutoff of a target residue
- rna_helix.py: build an ideal A-form RNA helix for a given sequence
- rna_mutate.py: mutate residues in an RNA PDB
- rna_thread.py: thread a new sequence onto an RNA backbone
- plot_contour.py: plot contour maps from nucleobase_sample_around score tables
- prepare_rna_puzzle_submissions.py: assemble a ranked multi-model PDB for RNA Puzzles submission
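Several of the tools above read residue names, chain IDs, and residue numbers from ATOM records. The sketch below shows how those fields can be pulled from the fixed columns of the PDB format. It is an illustration only, not the repo's implementation; the function names and the use of X for non-standard residue names are assumptions.

```python
RNA_NAMES = {"A", "C", "G", "U"}


def residues_from_pdb(pdb_text):
    """Return (chain, resnum, resname) for each residue, in file order.

    Uses the fixed PDB columns: residue name 18-20, chain ID 22,
    residue number 23-26, insertion code 27 (1-based).
    """
    residues = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        resname = line[17:20].strip()
        chain = line[21]
        resnum = int(line[22:26])  # int() accepts negative numbers
        icode = line[26]
        key = (chain, resnum, icode)
        if key not in seen:
            seen.add(key)
            residues.append((chain, resnum, resname))
    return residues


def sequence_from_pdb(pdb_text):
    """Return a one-letter RNA sequence; other residue names become X."""
    return "".join(
        name if name in RNA_NAMES else "X"
        for _, _, name in residues_from_pdb(pdb_text)
    )
```

The list from residues_from_pdb covers the residue-number use case, and sequence_from_pdb gives the string that a FASTA record would carry.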
- extract_lowscore_decoys.py (ex): extract the lowest-scoring structures from a Rosetta silent file
- extract_lowscore_decoys_outfile.py (exo): same, writing to a named output file
- extract_lowscore_decoys_multiple_outfiles.py: extract from multiple silent files in one call
- cat_outfiles.py: concatenate Rosetta silent/out files, deduplicating SEQUENCE and description headers
- fields.py (fp): show which column numbers correspond to which score terms
- plot_score_vs_rms.py: scatter plot of score vs. RMS from one or more Rosetta .out files
- silent_file_sort_and_select.py: sort and filter a Rosetta silent file by score
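Selecting low-scoring decoys comes down to reading the SCORE: lines of a silent file. The sketch below assumes the usual layout, in which the first SCORE: line is a header naming the columns (including score and description) and each later SCORE: line holds one decoy's values. It is not the code used by the scripts above, and the function names are hypothetical.

```python
def read_scores(silent_text):
    """Return (header, rows) parsed from the SCORE: lines of a silent file.

    header is the list of column names; rows is a list of dicts that map
    column name to the raw string value for one decoy.
    """
    header = None
    rows = []
    for line in silent_text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields
        elif fields == header:
            continue  # repeated header, e.g. from concatenated files
        else:
            rows.append(dict(zip(header, fields)))
    return header, rows


def lowest_scoring_tags(silent_text, n, column="score"):
    """Return the description tags of the n decoys with the lowest score."""
    _, rows = read_scores(silent_text)
    rows.sort(key=lambda row: float(row[column]))
    return [row["description"] for row in rows[:n]]
```

The returned tags identify which decoys to pull out of the file. Passing a different column name, such as an RMS column if one is present, sorts on that term instead.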
- rna_denovo_setup.py: generate Rosetta rna_denovo job scripts from sequence and secondary structure
- helix_preassemble_setup.py: set up helix pre-assembly jobs for Rosetta
- parallel_min_setup.py: set up parallel minimization jobs
- parse_tag.py / make_tag.py: parse and create Rosetta-style residue/chain tag strings
- rosetta_exe.py: locate Rosetta executables on the current system
- extract_chain_from_cif.py: extract a single chain from an mmCIF file
- Provide some examples.
- Make all the pdb_tools compatible with modern CIF format and gzip.