XtractPAV

An Automated Pipeline for Tracing Presence/Absence Variations in Eukaryotic & Prokaryotic Genomes

Introduction

Presence/absence variation (PAV) is a form of structural variation in which specific genes or genomic regions are present in some individuals or species but absent in others. Unlike small-scale variants such as SNPs, PAVs involve larger segments of DNA and can significantly affect phenotype by altering gene content. PAVs are common in both eukaryotic and prokaryotic genomes and contribute to genetic diversity, adaptation, evolution, and traits such as disease resistance or pathogenicity. They often result from gene duplication, deletion, or horizontal gene transfer, and their analysis is important in fields such as evolutionary biology, agriculture, microbial genomics, and personalized medicine.

XtractPAV: Pipeline Workflow

Figure: XtractPAV pipeline diagram

Download

Get the latest release of XtractPAV:

GitHub Repository | Download XtractPAV pipeline | Download Tested Data

Features

Installation

Clone the repository and set up your environment:

git clone https://github.com/yourusername/XtractPAV.git
cd XtractPAV
# Add the XtractPAV directory to PATH
export PATH="$PATH:/path/to/XtractPAV"

Using Conda

With Conda, all dependencies can be installed from the XtractPAV-Dependencies.yml file. Run the following commands:

conda env create -f XtractPAV-Dependencies.yml
conda activate XtractPAV

Using Ubuntu/Debian

MUMmer4 (v4.0.0beta2; GCC ≥ 4.7):
sudo apt install mummer4

Or build from source:

wget https://github.com/mummer4/mummer/releases/download/v4.0.0beta2/mummer-4.0.0beta2.tar.gz
tar xzf mummer-4.0.0beta2.tar.gz
cd mummer-4.0.0beta2
./configure --prefix=/usr/local
make
sudo make install
export PATH=/usr/local/bin:$PATH

BLAST+:
sudo apt install ncbi-blast+

Or download and extract the source:

wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-*-src.tar.gz
tar xzf ncbi-blast-*-src.tar.gz
cd ncbi-blast-*/c++
# then configure & make as per NCBI docs

Bedtools

Bedtools is the Swiss army knife for genome arithmetic and can be downloaded from the Bedtools website. XtractPAV does not require a specific bedtools version, but version 2.27.1 or the latest release is preferred.

sudo apt install bedtools

Or build from a downloaded source archive:

tar -zxvf bedtools-2.27.1.tar.gz
cd bedtools2
make
# Add the bedtools binaries to your PATH
export PATH=/path/to/bedtools2/bin:$PATH

Python Dependencies (via pip):
pip install biopython==1.78
pip install plotly==6.0.1
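
Whichever installation route is used, it can help to confirm that the external tools are reachable before the first run. The following is a minimal sketch and not part of XtractPAV: the function name missing_tools is hypothetical, and the executable names (nucmer from MUMmer4, blastn from BLAST+, bedtools, python3) are assumptions about what the pipeline calls.

```shell
# Print every given executable that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names below are assumptions, not taken from XtractPAV itself.
missing=$(missing_tools nucmer blastn bedtools python3)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required tools were found."
fi
```

If a tool is reported as missing, revisit the matching installation step above or check that its directory was added to PATH.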

Parameters

| Flag | Description | Default |
|------|-------------|---------|
| --rf | Reference genome FASTA file | required |
| --ra | Reference GFF3 annotation | required |
| --qf | Query genome FASTA files (comma-separated for multiple genomes) | required |
| --qa | Query GFF3 annotation files (comma-separated for multiple genomes) | required |
| --cov | Minimum coverage threshold (0–1) | 0.8 (float) |
| --sim | Minimum similarity percentage | 90.0 (float) |
| --len | Minimum PAV length (bp) | 100 (int) |
| --thr | Number of threads | 1 |
| --help | Show help message | optional |
| --version | Show pipeline version | optional |
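
For several query genomes, --qf and --qa each take one comma-separated list. The sketch below builds those lists and prints the resulting command for inspection. The helper join_comma and all file names are hypothetical. It also assumes, since the table does not say, that the n-th annotation file belongs to the n-th FASTA file and that no spaces follow the commas.

```shell
# Join all arguments with commas, e.g. "a b c" -> "a,b,c".
join_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Hypothetical file names; both lists are kept in the same genome order.
qf=$(join_comma query1.fna query2.fna query3.fna)
qa=$(join_comma query1.gff query2.gff query3.gff)

# Print the command first; drop the leading "echo" to actually run it.
echo XtractPAV.sh --rf reference.fna --ra reference.gff \
    --qf "$qf" --qa "$qa" --cov 0.8 --sim 90.0 --len 100 --thr 4
```

Printing the command before running it makes it easy to spot a FASTA list and an annotation list that have drifted out of order.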

Input & Output Files

Input

Output

Run XtractPAV with Test Data

Test data is provided in Sample_data/. To verify functionality:

XtractPAV.sh --rf S_Entrica_LT2.fna --ra S_Entrica_LT2.gff --qf S_Agona_SL483.fna --qa S_Agona_SL483.gff --cov 0.9 --sim 95.0 --len 100 --thr 8

After the run finishes, check the Results/ directory for the output.
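
If the test run stops early, a quick pre-flight check of the inputs can help narrow down the cause. This is a minimal sketch that assumes the four sample files from the command above are in the current directory; the function name check_inputs is hypothetical and not part of XtractPAV.

```shell
# Return non-zero if any listed file is missing or empty, naming each one.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

if check_inputs S_Entrica_LT2.fna S_Entrica_LT2.gff S_Agona_SL483.fna S_Agona_SL483.gff; then
    echo "Inputs look fine."
else
    echo "Fix the files listed above before running XtractPAV.sh."
fi
```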

For more details, refer to the Parameters section.

Reference

In process

Contact Us

For questions or support, please reach out: