An Automated Pipeline for Tracing Presence/Absence Variations in Eukaryotic & Prokaryotic Genomes
PAV (Presence/Absence Variation) is a form of structural variation in which specific genes or genomic regions are present in some individuals or species but absent in others. Unlike small-scale variants (for example SNPs), PAVs involve larger DNA segments and can strongly affect phenotype by changing gene content. PAVs are common in both eukaryotic and prokaryotic genomes, where they contribute to genetic diversity, adaptation, and evolution, and to traits such as disease resistance or pathogenicity. They often arise from gene duplication, deletion, or horizontal gene transfer, and their analysis is important in evolutionary biology, agriculture, microbial genomics, and personalized medicine.
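
As a deliberately simplified illustration of the concept (not how XtractPAV detects PAVs), the sketch below labels genes as present in only one genome or in both by comparing two plain-text lists of gene IDs with sort and comm. The function name pav_by_id and the input format (one ID per line) are hypothetical, and matching genes by name is an assumption that rarely holds across genomes; real PAV detection relies on sequence comparison instead, such as the alignment tools listed among the dependencies.

```bash
# Hypothetical helper (illustration only, not part of XtractPAV):
# label gene IDs as present in the reference only, the query only, or both.
# Assumes one gene ID per line and that IDs are comparable across genomes.
pav_by_id() (
  # The subshell body keeps LC_ALL and the temporary files local to this call;
  # sort and comm must agree on collation order, so both run under the C locale.
  export LC_ALL=C
  tmp="$(mktemp -d)" || exit 1
  sort -u "$1" > "$tmp/ref"
  sort -u "$2" > "$tmp/qry"
  # comm -23: only in reference; -13: only in query; -12: in both
  comm -23 "$tmp/ref" "$tmp/qry" | awk '{ print $0 "\treference_only" }'
  comm -13 "$tmp/ref" "$tmp/qry" | awk '{ print $0 "\tquery_only" }'
  comm -12 "$tmp/ref" "$tmp/qry" | awk '{ print $0 "\tshared" }'
  rm -rf "$tmp"
)
```

For two ID lists ref.txt and qry.txt, running pav_by_id ref.txt qry.txt prints one tab-separated line per gene: the ID followed by reference_only, query_only or shared.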
XtractPAV: Pipeline Workflow
Get the latest release of XtractPAV from the GitHub repository; the pipeline and the tested data are offered as separate downloads.

Clone the repository and set up your environment:

    git clone https://github.com/yourusername/XtractPAV.git
    cd XtractPAV
    # Add the XtractPAV directory to PATH
    export PATH=$PATH:/path/to/XtractPAV

With conda, all dependencies can be installed from the XtractPAV-Dependencies.yml file:

    conda env create -f XtractPAV-Dependencies.yml
    conda activate XtractPAV

Alternatively, install the dependencies individually.

MUMmer4: install it with sudo apt install mummer4, or build it from source:

    wget https://github.com/mummer4/mummer/releases/download/v4.0.0beta2/mummer-4.0.0beta2.tar.gz
    tar xzf mummer-4.0.0beta2.tar.gz
    cd mummer-4.0.0beta2
    ./configure --prefix=/usr/local
    make
    sudo make install
    export PATH=/usr/local/bin:$PATH

BLAST+: install it with sudo apt install ncbi-blast+, or download and build the source release:

    # Quote the URL so that wget, not the shell, expands the wildcard
    wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-*-src.tar.gz'
    tar xzf ncbi-blast-*-src.tar.gz
    cd ncbi-blast-*/c++
    # Then configure and build following the NCBI documentation

bedtools, the Swiss army knife for genome arithmetic, can be downloaded from the bedtools project. XtractPAV does not require a specific bedtools version, but version 2.27.1 or the latest release is recommended.
Install it with sudo apt install bedtools, or build it from the source tarball:

    tar -zxvf bedtools-2.27.1.tar.gz
    cd bedtools2
    make
    # Add the bedtools binaries to your PATH
    export PATH=/path/to/bedtools/bin:$PATH

Python packages:

    pip install biopython==1.78
    pip install plotly==6.0.1

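
Before running the pipeline it can help to confirm that every dependency is reachable. The following is a hypothetical pre-flight check, not a script shipped with XtractPAV: it looks up executables from MUMmer4 (nucmer, show-coords), BLAST+ (blastn, makeblastdb) and bedtools with command -v, and tries to import the Biopython (Bio) and plotly modules with python3. Which of these executables the pipeline actually calls is an assumption; adjust the lists to match your installation.

```bash
# Hypothetical pre-flight check (not shipped with XtractPAV).
# Reports each external tool or Python module as "found" or "MISSING"
# and returns a non-zero status if anything is missing.
check_xtractpav_deps() (
  # Subshell body keeps the counter and loop variables local to this call
  missing=0
  # Executables provided by MUMmer4, BLAST+ and bedtools (assumed subset)
  for tool in nucmer show-coords blastn makeblastdb bedtools; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found    %s\n' "$tool"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  # Python modules installed by the biopython and plotly packages
  for mod in Bio plotly; do
    if python3 -c "import $mod" >/dev/null 2>&1; then
      printf 'found    python:%s\n' "$mod"
    else
      printf 'MISSING  python:%s\n' "$mod"
      missing=$((missing + 1))
    fi
  done
  # Exit status of the subshell: 0 only when nothing was missing
  [ "$missing" -eq 0 ]
)
```

Source the snippet, activate the conda environment if you use it, and run check_xtractpav_deps; a non-zero exit status means at least one item was not found.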

Flag | Description | Default |
---|---|---|
--rf | Reference genome FASTA file | (required) |
--ra | Reference GFF3 annotation | (required) |
--qf | Query genome FASTA file(s); comma-separated for multiple genomes | (required) |
--qa | Query GFF3 annotation file(s); comma-separated for multiple genomes | (required) |
--cov | Minimum coverage threshold (0–1) | 0.8 (float) |
--sim | Minimum similarity (%) | 90.0 (float) |
--len | Minimum PAV length (bp) | 100 (int) |
--thr | Number of threads | 1 (int) |
--help | Show help message | (optional) |
--version | Show pipeline version | (optional) |
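
Read literally, the table defines three minimums (--cov, --sim, --len); one plausible way to apply them is to keep a candidate only when it meets all three. The sketch below shows that threshold arithmetic with awk on a hypothetical tab-separated table (region ID, length in bp, coverage as a 0–1 fraction, identity in %). The column layout and the function name filter_pavs are assumptions, and XtractPAV's internal handling of these values may differ.

```bash
# Hypothetical sketch (not XtractPAV's code): apply minimum length,
# coverage and identity thresholds to a 4-column TSV:
#   region_id  length_bp  coverage(0-1)  identity(%)
# $1 = table; $2..$4 = optional cov, sim, len (defaults mirror the table)
filter_pavs() {
  awk -F'\t' -v cov="${2:-0.8}" -v sim="${3:-90.0}" -v len="${4:-100}" \
    '$2 >= len && $3 >= cov && $4 >= sim' "$1"
}
```

For example, filter_pavs candidates.tsv uses the defaults (0.8, 90.0, 100), while filter_pavs candidates.tsv 0.9 95.0 100 matches the thresholds of the test run.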

Output:

- raw_PAVs/ – extracted FASTA sequences
- filtered_PAVs/ – PAVs passing the coverage and similarity filters
- genic_PAVs/ – annotated genic PAVs (GFF3)
- report.html – interactive final summary

Test data is provided in Sample_data/. To verify that the pipeline works, run:

    XtractPAV.sh --rf S_Entrica_LT2.fna --ra S_Entrica_LT2.gff \
        --qf S_Agona_SL483.fna --qa S_Agona_SL483.gff \
        --cov 0.9 --sim 95.0 --len 100 --thr 8

After the run completes, the results are written to the Results/ directory.
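
To get a quick overview of a run, the sketch below counts the records in each FASTA file of a results folder. It is a hypothetical helper, not part of XtractPAV, and it assumes that raw_PAVs/ and filtered_PAVs/ sit inside Results/ and hold FASTA files with one '>' header per sequence; GFF3 files such as those in genic_PAVs/ are not handled.

```bash
# Hypothetical helper (not part of XtractPAV): print "<file>\t<records>"
# for every regular file directly inside a directory, counting FASTA
# records as lines that start with '>'.
count_fasta_records() (
  dir="$1"
  for f in "$dir"/*; do
    # Skip subdirectories, and the unexpanded pattern if the directory is empty
    [ -f "$f" ] || continue
    printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
  done
)
```

For example, count_fasta_records Results/filtered_PAVs prints one file name and record count per line, which makes it easy to compare the raw and filtered sets.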
For more details, refer to the Parameters section.
In process
For questions or support, please reach out: