SNPAAMapper is a downstream variant annotation program that classifies variants by region (e.g., exon or intron), predicts the type of amino acid change (e.g., synonymous or non-synonymous), and prioritizes mutation effects (e.g., CDS versus 5'UTR).
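Region classification comes down to comparing a variant position against a transcript's boundaries, its exon intervals, and its coding-region limits. The Python sketch below illustrates that idea. It is not SNPAAMapper's code. The `Transcript` class, the 2,000 bp `FLANK_BP` window used for upstream/downstream calls, and the `PRIORITY` ranking are all hypothetical choices made for illustration, and only + strand transcripts are handled (on the minus strand, 5'/3' and upstream/downstream swap). Coordinates are 0-based and half-open, as in UCSC tables such as knownGene, so a 1-based VCF `POS` has to be reduced by 1 first.

```python
from dataclasses import dataclass
from typing import List, Tuple

FLANK_BP = 2000  # hypothetical upstream/downstream window, not SNPAAMapper's value


@dataclass
class Transcript:
    """Hypothetical + strand transcript with 0-based, half-open coordinates."""
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]


def classify(pos: int, tx: Transcript) -> str:
    """Return the region a 0-based position falls in, relative to one transcript."""
    if pos < tx.tx_start:
        return "upstream" if tx.tx_start - pos <= FLANK_BP else "intergenic"
    if pos >= tx.tx_end:
        # Distance from the last transcript base (tx_end - 1) is pos - tx_end + 1.
        return "downstream" if pos - tx.tx_end < FLANK_BP else "intergenic"
    if not any(start <= pos < end for start, end in tx.exons):
        return "intron"
    if pos < tx.cds_start:
        return "5'UTR"
    if pos >= tx.cds_end:
        return "3'UTR"
    return "CDS"


# Hypothetical ranking, most important first, used to keep one effect per variant.
PRIORITY = ["CDS", "5'UTR", "3'UTR", "intron", "upstream", "downstream", "intergenic"]


def prioritize(regions: List[str]) -> str:
    """Pick the highest-priority region among those found for one variant."""
    return min(regions, key=PRIORITY.index)


tx = Transcript(1000, 5000, 1200, 4500, [(1000, 1500), (2000, 2500), (4000, 5000)])
print([classify(p, tx) for p in (999, 1100, 1300, 1700, 4600)])
```

When a variant overlaps several transcripts, calling `classify` once per transcript and passing the results to `prioritize` yields a single reported effect, which mirrors the prioritization idea described above under an assumed ranking.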
Major Features
- The pipeline accepts a tab-delimited VCF input file containing all variant cases (G5, lowFreq, and novel)
- The variant mapping step lets users choose whether to report the base-pair distance between each intronic variant and its neighboring exon
- The pipeline can handle VCF files called by different SAMtools versions (0.1.18 and older) or generated by SAMtools with two or three samples
- The spreadsheet result file contains full protein sequences for both reference and alternative alleles, which makes it easier for downstream protein structure/function analysis tools to use
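The optional intron-distance report amounts to measuring how far an intronic variant lies from the closest exon edge. A minimal Python sketch of that measurement follows, assuming 0-based half-open exon intervals for a single transcript; the function name and its `(distance, exon_index)` return value are hypothetical and do not reflect the pipeline's actual output columns.

```python
from typing import List, Optional, Tuple


def nearest_exon_distance(pos: int, exons: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return (distance_bp, exon_index) for an intronic 0-based position.

    Exons are half-open (start, end) intervals. Distance counts bases from pos
    to the closest exon base, so 1 means immediately adjacent to an exon.
    Returns None when the exon list is empty.
    """
    best = None
    for i, (start, end) in enumerate(exons):
        if start <= pos < end:
            raise ValueError(f"position {pos} lies inside exon {i}")
        # The last base of a half-open exon is end - 1.
        d = start - pos if pos < start else pos - (end - 1)
        if best is None or d < best[0]:
            best = (d, i)
    return best


print(nearest_exon_distance(1750, [(1000, 1500), (2000, 2500)]))
```

Reporting the distance together with the exon index keeps it clear which neighboring exon the number refers to; on a tie the earlier exon is kept.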
Instructions
Download all files and place them in the same directory on a Unix or Mac machine. You can then run the pipeline with a single command:
./run_SNPAAMapper-Python.sh config.txt
or run the following steps in order (note: the first step has already been run for the human hg19 genome, so its output files are already available):
- Process the exon annotation file and generate the feature-start and gene-mapping files:
  python3 Algorithm_preprocessing_exon_annotation_RR.py ChrAll_knownGene.txt.exon
- Classify variants by region (CDS, upstream, downstream, intron, UTRs, ...):
  python3 Algorithm_mapping_variants_reporting_class_intronLocation_updown.py ChrAll_knownGene.txt.exon VCF_input_file_in_tab_delimited_format.vcf
- Predict the amino acid change type:
  python3 Algorithm_predicting_full_AA_change_samtools_updown.py VCF_input_file_in_tab_delimited_format.vcf.append kgXref.txt hg19_CDSIntronWithSign.txt.out ChrAll_knownGene.txt > VCF_input_file_in_tab_delimited_format.vcf.out.txt
- Prioritize mutation effects:
  python3 Algorithm_prioritizing_mutation_headerTop_updown.py VCF_input_file_in_tab_delimited_format.vcf.append.out.txt
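The amino acid change prediction step ultimately compares the reference codon with the alternative codon. The Python sketch below shows that comparison for a single-base substitution using the standard genetic code. It assumes `cds` is the coding sequence already in transcript orientation (reverse-complemented for minus strand genes), and the category labels are illustrative, not necessarily the labels SNPAAMapper writes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(cds: str, offset: int, alt: str):
    """Classify a substitution at 0-based `offset` within a coding sequence.

    Returns (category, ref_aa, alt_aa); the category names are illustrative.
    """
    cds, alt = cds.upper(), alt.upper()
    frame = offset % 3  # position of the variant inside its codon
    start = offset - frame
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        category = "synonymous"
    elif alt_aa == "*":
        category = "nonsense"
    elif ref_aa == "*":
        category = "stop-loss"
    else:
        category = "missense"
    return category, ref_aa, alt_aa


print(classify_snv("ATGGCTTAA", 5, "C"))
```

Because the pipeline also reports full reference and alternative protein sequences, translating every codon of the mutated `cds` with the same table would be the natural extension of this sketch.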
The variant classification step also accepts an optional third argument, IntronExon_boundary_in_bp, giving an intron/exon boundary distance in base pairs:
python3 Algorithm_mapping_variants_reporting_class_intronLocation_updown.py ChrAll_knownGene.txt.exon VCF_input_file_in_tab_delimited_format.vcf IntronExon_boundary_in_bp
For example, for an input file named 007_crop.vcf, the classification, prediction, and prioritization steps are:
python3 Algorithm_mapping_variants_reporting_class_intronLocation_updown.py ChrAll_knownGene.txt.exon 007_crop.vcf
(or, with a 6 bp boundary: python3 Algorithm_mapping_variants_reporting_class_intronLocation_updown.py ChrAll_knownGene.txt.exon 007_crop.vcf 6)
python3 Algorithm_predicting_full_AA_change_samtools_updown.py 007_crop.vcf.append kgXref.txt hg19_CDSIntronWithSign.txt.out ChrAll_knownGene.txt > 007_crop.vcf.out.txt
python3 Algorithm_prioritizing_mutation_headerTop_updown.py 007_crop.vcf.append.out.txt
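If you prefer to drive these commands from Python instead of typing them one by one, a small wrapper can build and run them in order. This is a hypothetical helper, not the contents of run_SNPAAMapper-Python.sh; it assumes the first step's outputs already exist in the working directory, and every file and script name is copied from the commands above.

```python
import subprocess
from typing import List, Optional, Tuple


def pipeline_commands(vcf: str, boundary_bp: Optional[int] = None) -> List[Tuple[List[str], Optional[str]]]:
    """Return (argv, stdout_file) pairs for the classification, prediction, and prioritization steps."""
    classify = ["python3", "Algorithm_mapping_variants_reporting_class_intronLocation_updown.py",
                "ChrAll_knownGene.txt.exon", vcf]
    if boundary_bp is not None:
        classify.append(str(boundary_bp))  # optional IntronExon_boundary_in_bp argument
    predict = ["python3", "Algorithm_predicting_full_AA_change_samtools_updown.py",
               f"{vcf}.append", "kgXref.txt", "hg19_CDSIntronWithSign.txt.out", "ChrAll_knownGene.txt"]
    prioritize = ["python3", "Algorithm_prioritizing_mutation_headerTop_updown.py",
                  f"{vcf}.append.out.txt"]
    # Only the prediction step redirects stdout, matching the documented command.
    return [(classify, None), (predict, f"{vcf}.out.txt"), (prioritize, None)]


def run_pipeline(vcf: str, boundary_bp: Optional[int] = None) -> None:
    """Run each step in sequence; check=True stops at the first failing script."""
    for argv, stdout_path in pipeline_commands(vcf, boundary_bp):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Usage: `run_pipeline("007_crop.vcf", 6)` would issue the example commands above with the 6 bp boundary. Passing argument lists rather than a shell string avoids quoting problems with unusual file names, and `check=True` raises `subprocess.CalledProcessError` so a later step never runs on a missing input.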
Source Code Download
GitHub
References
- [1] “The Human Genome Project.” Genome.gov, www.genome.gov/human-genome-project.
- [2] Nature News, Nature Publishing Group, www.nature.com/articles/d42473-021-00030-9.
- [3] Lewis, Tanya. “Human Genome Project Marks 10th Anniversary.” LiveScience, Purch, 14 Apr. 2013, www.livescience.com/28708-human-genome-project-anniversary.html.
- [4] Barba, Marina, Czosnek, Henryk, Hadidi, Ahmed. “Historical Perspective, Development and Applications of next-Generation Sequencing in Plant Virology.” Viruses, MDPI, 6 Jan. 2014, www.ncbi.nlm.nih.gov/pmc/articles/PMC3917434/.
- [5] Bai, Yongsheng, and James Cavalcoli. “SNPAAMapper: An Efficient Genome-Wide SNP Variant Analysis Pipeline for next-Generation Sequencing Data.” Bioinformation, Biomedical Informatics, 16 Oct. 2013, www.ncbi.nlm.nih.gov/pmc/articles/PMC3819573/.
- [6] “UCSC Genome Browser Project History.” Genome Browser History, https://genome.ucsc.edu/goldenPath/history.html.
- [7] “The Perl Programming Language.” TIOBE, https://www.tiobe.c