Bioinformatics Tool

Pangolin

Phylogenetic Assignment of Named Global Outbreak Lineages

The primary software tool for assigning SARS-CoV-2 genome sequences to Pango lineages. Trusted by researchers and public health agencies worldwide for accurate, rapid lineage classification.

15M+ sequences analyzed
1000+ citations
Latest version: v4.3
Overview

What is Pangolin?

Pangolin is an open-source software tool that assigns the most likely Pango lineage to SARS-CoV-2 query sequences. In its default mode it places each sequence on a reference phylogeny with UShER; earlier releases relied on a machine-learning model (pangoLEARN). The result is rapid, accurate lineage classification for genomic surveillance.

Fast Analysis

Process thousands of sequences quickly using optimized algorithms and models.

Accurate Results

High-accuracy lineage assignments, checked against the official Pango lineage designations.

Updated Regularly

New lineage definitions are released regularly in the pangolin-data package.

Open Source

Free to use under the GPL-3.0 license. Contribute on GitHub.

Getting Started

Installation

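Pangolin can be installed via Bioconda (recommended) or from source. A plain `pip install` is not enough on its own, because Pangolin depends on non-Python tools such as minimap2 and UShER; the source install therefore uses conda for those dependencies before installing Pangolin with pip. It requires Python 3.8+ and works on Linux, macOS, and Windows (via WSL).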

1

Install via Conda (Recommended)

The easiest method with all dependencies managed automatically.

2

Update pangolin-data

Keep your lineage definitions current with regular updates.

3

Run Analysis

Process your FASTA files with a single command.

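Terminal
# Create an environment and install from Bioconda (recommended)
$ conda create -n pangolin -c bioconda -c conda-forge pangolin
$ conda activate pangolin
# Or install from source (dependencies come from the repo's environment.yml)
$ git clone https://github.com/cov-lineages/pangolin.git
$ cd pangolin
$ conda env create -f environment.yml
$ conda activate pangolin
$ pip install .
# Update pangolin-data for the latest lineages
$ pangolin --update-data
# Run analysis on a FASTA file
$ pangolin sequences.fasta
# Output: lineage_report.csv (illustrative, columns truncated)
taxon,lineage,conflict,ambiguity_score,scorpio_call,...
seq1,JN.1.1,...
seq2,BA.2.86,...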
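For scripted pipelines, the same command can be wrapped in Python. This is a minimal sketch, assuming pangolin v4 is on your PATH and accepts the `--outfile` and `--threads` options (check `pangolin --help` for your installed version). `build_pangolin_cmd` and `run_pangolin` are hypothetical helper names, not part of Pangolin.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_pangolin_cmd(fasta: Path, outfile: str = "lineage_report.csv",
                       threads: int = 1) -> list[str]:
    """Assemble a pangolin command line (flags assumed from pangolin v4)."""
    return [
        "pangolin", str(fasta),
        "--outfile", outfile,        # name of the CSV report
        "--threads", str(threads),   # number of worker threads
    ]


def run_pangolin(fasta: Path, **kwargs) -> None:
    """Run pangolin and raise CalledProcessError if it exits non-zero."""
    cmd = build_pangolin_cmd(fasta, **kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_pangolin() to actually execute it.
    print(shlex.join(build_pangolin_cmd(Path("sequences.fasta"), threads=4)))
```

Building the argument list separately from running it keeps the command easy to log and to test without Pangolin installed.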
Pipeline

How Pangolin Works

Pangolin uses a multi-step pipeline to assign lineages with high accuracy.

1

Input Processing

FASTA sequences are mapped to the Wuhan-Hu-1 reference genome with minimap2, and sequences with too many ambiguous bases (Ns) fail quality control.
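The ambiguity part of this check can be approximated in a few lines. This is a simplified sketch, not Pangolin's code. It assumes the relevant measure is the fraction of `N` bases, and it uses 0.3 as the threshold, which we understand to be the default of Pangolin's `--max-ambig` option. Pangolin measures this on the aligned genome, whereas the sketch counts over the raw sequence. All function names are hypothetical.

```python
from __future__ import annotations


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence} (minimal parser)."""
    records: dict[str, str] = {}
    name: str | None = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Record name = first word of the header line.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def ambiguity_fraction(seq: str) -> float:
    """Fraction of non-gap positions that are N (empty counts as fully ambiguous)."""
    bases = [b for b in seq.upper() if b != "-"]
    if not bases:
        return 1.0
    return sum(b == "N" for b in bases) / len(bases)


def qc_status(seq: str, max_ambig: float = 0.3) -> str:
    """Return 'pass' or 'fail' against a maximum-ambiguity threshold (assumed 0.3)."""
    return "pass" if ambiguity_fraction(seq) <= max_ambig else "fail"


if __name__ == "__main__":
    demo = ">seq1\nACGTACGTNN\n>seq2\nNNNNNNACGT\n"
    for name, seq in read_fasta(demo).items():
        print(name, round(ambiguity_fraction(seq), 2), qc_status(seq))
```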

2

Scorpio Check

Scorpio checks each sequence for constellations, the sets of mutations that define variants of concern (VOCs) such as Omicron.
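Scorpio's support and conflict values can be pictured as simple proportions over a constellation's defining mutations. The sketch below assumes support is the fraction of defining SNPs where the sequence carries the variant base and conflict is the fraction where it carries the reference base. The `SNP` type, the toy constellation, and the 0.8/0.1 call thresholds are all hypothetical; real scorpio rules are defined per constellation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SNP:
    pos: int  # 1-based position in the reference genome
    ref: str  # reference base
    alt: str  # variant base that defines the constellation


def constellation_scores(seq: str, snps: list[SNP]) -> tuple[float, float]:
    """Return (support, conflict) for a sequence aligned to the reference.

    support  = fraction of defining SNPs where the sequence has the alt base
    conflict = fraction of defining SNPs where it has the ref base
    N or any other base counts toward neither.
    """
    if not snps:
        raise ValueError("constellation has no defining SNPs")
    alt = ref = 0
    for snp in snps:
        base = seq[snp.pos - 1].upper()
        if base == snp.alt:
            alt += 1
        elif base == snp.ref:
            ref += 1
    return alt / len(snps), ref / len(snps)


def call_constellation(seq: str, snps: list[SNP], name: str,
                       min_support: float = 0.8,
                       max_conflict: float = 0.1) -> str:
    """Return '<name>-like' if the (hypothetical) thresholds are met, else ''."""
    support, conflict = constellation_scores(seq, snps)
    if support >= min_support and conflict <= max_conflict:
        return f"{name}-like"
    return ""


if __name__ == "__main__":
    toy = [SNP(2, "A", "G"), SNP(5, "A", "T"), SNP(8, "A", "C")]
    for seq in ("AGAATAACAA", "AGAANAAAAA"):
        print(seq, constellation_scores(seq, toy), call_constellation(seq, toy, "Toy"))
```

Counting Ns toward neither score is why a low-coverage sequence can have both low support and low conflict.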

3

UShER Placement

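In the default analysis mode, each sequence is placed with UShER on a pre-built phylogenetic tree shipped in pangolin-data, and the lineage of its placement is reported.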
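The core idea behind parsimony placement can be shown with a toy example. This is not UShER's algorithm. It assumes each lineage node is described by its full set of mutations relative to the reference, scores attaching the sample under each node by the size of the symmetric difference, and ignores branch-midpoint placements and missing data. The lineage names and mutations are hypothetical.

```python
from __future__ import annotations


def placement_cost(sample: set[str], node: set[str]) -> int:
    """Mutations needed on a new branch that attaches `sample` under `node`.

    Mutations the node carries but the sample lacks must be reverted, and
    mutations the sample carries but the node lacks must be added, so the
    cost is the size of the symmetric difference of the two sets.
    """
    return len(sample ^ node)


def place(sample: set[str], tree: dict[str, set[str]]) -> list[str]:
    """Return all node names tied for the lowest placement cost (sorted)."""
    if not tree:
        raise ValueError("tree has no nodes")
    costs = {name: placement_cost(sample, muts) for name, muts in tree.items()}
    best = min(costs.values())
    return sorted(name for name, cost in costs.items() if cost == best)


if __name__ == "__main__":
    # Hypothetical lineages, each with its full mutation set vs. the reference.
    toy_tree = {
        "L1": {"C100T"},
        "L1.1": {"C100T", "G200A"},
        "L1.2": {"C100T", "A300G"},
    }
    print(place({"C100T", "G200A", "T400C"}, toy_tree))
```

When several nodes tie, the sample has multiple equally parsimonious placements. Pangolin summarizes UShER's placements in the `note` column of its report.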

4

Output Report

A CSV report with lineage assignments, conflict and ambiguity scores, scorpio calls, and QC metrics.

Results

Output Format

Pangolin outputs a detailed CSV report with multiple fields for each sequence.

Field | Description
taxon | Sequence name from the input FASTA header
lineage | Assigned Pango lineage (e.g., JN.1.1, BA.2.86), or Unassigned
conflict | Conflict among candidate lineage assignments; 0 means a single consistent assignment (lower is better)
ambiguity_score | Reflects missing data at lineage-informative sites; 1 means no sites had to be imputed to the reference
scorpio_call | Constellation call from scorpio (e.g., Omicron (BA.2-like))
scorpio_support | Fraction of the constellation's defining mutations found in the sequence
scorpio_conflict | Fraction of defining mutations where the sequence carries the reference allele
scorpio_notes | Additional notes from scorpio on the call
version | Version of the lineage data and analysis mode used
pangolin_version | Pangolin software version
scorpio_version | Scorpio version
constellation_version | Constellation definitions version
is_designated | Whether the sequence itself belongs to the official designation set for its lineage
qc_status | Quality control status (pass or fail)
qc_notes | Quality control notes
note | Additional processing notes (e.g., UShER placement details)
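Downstream analyses usually start by loading `lineage_report.csv`. This is a minimal sketch using only the Python standard library, assuming the column names listed above; `summarize_report` is a hypothetical helper and the demo rows are made up.

```python
from __future__ import annotations

import csv
import io
from collections import Counter
from typing import TextIO


def summarize_report(handle: TextIO) -> tuple[Counter[str], list[str]]:
    """Count lineages among QC-passing rows and collect taxa that failed QC."""
    counts: Counter[str] = Counter()
    failed: list[str] = []
    for row in csv.DictReader(handle):
        if row["qc_status"] == "pass":
            counts[row["lineage"]] += 1
        else:
            failed.append(row["taxon"])
    return counts, failed


if __name__ == "__main__":
    # Hypothetical, truncated report containing only the columns used here.
    demo = io.StringIO(
        "taxon,lineage,qc_status\n"
        "seq1,JN.1.1,pass\n"
        "seq2,BA.2.86,pass\n"
        "seq3,JN.1.1,pass\n"
        "seq4,Unassigned,fail\n"
    )
    counts, failed = summarize_report(demo)
    print(counts.most_common())
    print("QC failures:", failed)
```

On real output, pass `open("lineage_report.csv", newline="")` in place of the in-memory demo.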

Ready to analyze your sequences?

Use our online interface for instant analysis or download Pangolin for local use.