Rules for the Designation and Naming of Pango Lineages

In the vast majority of instances it is expected that Pango lineage names and designations will conform to the following rules. These rules also act as guidelines for the decisions made by the Lineage Designation Committee.


SECTION I. Criteria for designation of a new Pango lineage

  1. A set of SARS-CoV-2 genome sequences may be considered for the designation of a new lineage name if it exhibits the following essential characteristics:
    • 1a. At the time of designation, the set of sequences is expected to share a single common ancestor and represent a monophyletic or paraphyletic clade in the SARS-CoV-2 phylogeny (see I.1e).
    • 1b. The clade should be distinguished by at least one unambiguous evolutionary event (single nonsynonymous change, insertion/deletion, or recombination event).
    • 1c. The clade should contain a minimum of 5 sequences with high genome coverage. High coverage means that <5% of nucleotide sites in the whole genome (excluding UTRs) are represented by IUPAC ambiguity codes.
    • 1d. The clade must include at least one internal node and therefore cannot be solely composed of a single polytomy. Thus, a lineage is expected to be consistent with a significant amount of onward transmission.
    • 1e. Due to the large size of the SARS-CoV-2 global phylogeny, a quantitative measure of clade support is no longer required (except for that imposed by I.1b).
  2. A new non-recombinant lineage designation is expected to also represent one or more events of epidemiological significance.
    • 2a. A non-exhaustive list of possible events is as follows:
      • (i) The clade may represent inferred movement of the virus into a new country or region (founder event).
      • (ii) The clade may distinguish successive epidemic waves in the same location.
      • (iii) The clade may be observed to be growing rapidly and/or strongly increasing in frequency compared to other co-circulating lineages.
      • (iv) The clade may be associated with observed or predicted changes in phenotypes including, but not limited to, transmissibility, immunogenicity, or pathogenicity.
      • (v) The clade may indicate a cross-species transmission event.
      • (vi) The clade may carry a set of multiple mutations of particular biological interest or concern. Note that, in most cases, the presence of a single specific mutation, by itself, will not be considered sufficient to warrant a new lineage designation.
    • 2b. If a clade becomes the subject of exceptional interest, and a new lineage designation would greatly clarify reference to and discussion of that clade, then a new designation may be considered even if none of the event conditions in I.2a is met. The conditions in section I.1 should still be met.
    • 2c. In the event of unforeseen or exceptional circumstances, the Lineage Designation Committee may designate lineages according to criteria other than those in I.2a and I.2b. Details of these decisions should be recorded and communicated to the Pango Committee.
  3. The essential characteristics in section I.1 are necessary but not sufficient for the designation of a new lineage.
    • 3a. Lineage designation is at the discretion of the Lineage Designation Committee and its interpretation of clade characteristics.
    • 3b. Recombinant clades can qualify for a lineage designation if they meet the essential characteristics in section I.1, regardless of whether any of the conditions in section I.2 are met.
    • 3c. In most instances it is expected that the number of high-quality genomes required to designate a lineage will be greater than the minimum number defined in I.1c.
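
The coverage criterion in I.1c can be illustrated with a short Python sketch. It is not part of any official tooling: the function names are hypothetical, each input string is assumed to be a genome with the UTRs already trimmed, and gap characters are assumed not to count as ambiguity codes.

```python
# IUPAC nucleotide ambiguity codes: every IUPAC base symbol other than A, C, G, T.
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def ambiguous_fraction(seq: str) -> float:
    """Return the fraction of sites represented by an IUPAC ambiguity code."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in AMBIGUITY_CODES for base in seq) / len(seq)


def is_high_coverage(coding_region: str, threshold: float = 0.05) -> bool:
    """I.1c: strictly fewer than 5% of sites are ambiguity codes.

    Assumption: `coding_region` has the UTRs removed by the caller.
    """
    return ambiguous_fraction(coding_region) < threshold


def meets_minimum_count(coding_regions, minimum: int = 5) -> bool:
    """I.1c: the clade contains at least `minimum` high-coverage sequences."""
    return sum(is_high_coverage(s) for s in coding_regions) >= minimum
```

Passing this check would satisfy I.1c only; as I.3 states, the essential characteristics are necessary but not sufficient for designation.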

SECTION II. Definition of a hierarchical alpha-numeric system for lineage names

Each Pango lineage designated according to the guidelines in Section I is given its own unique lineage name. Pango lineage names are based on phylogenetic structure, but are intended to convey only partial, local information about ancestor-descendant relationships.

  1. Syntax
    • 1a. Lineage names are constructed from an alphabetical prefix and a numerical suffix.
    • 1b. Any letter in the Latin alphabet may be used, except for I, O and X.
    • 1c. Each full stop (period) within the numerical suffix represents “descendant of”.
    • 1d. The numerical suffix has a maximum of three hierarchical levels. Descendants of lineages with tertiary suffixes are assigned to the next available alphabetical prefix, which acts as an alias.
    • 1e. Details of each alias should be provided in the current lineage description list.
    • 1j. Recombinant lineages are given alphabetical prefixes beginning with the letter X.
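
The syntax rules listed above can be sketched as a small Python validator. This is an illustrative reading of those rules, not the official parser: the function names are hypothetical, a bare prefix with no numerical suffix (such as a root lineage) is assumed to be acceptable, X is assumed to be allowed only as the first letter of a recombinant prefix, and names are assumed to be upper case.

```python
import re

# Latin letters with I, O and X removed (rule 1b).
ALLOWED = "ABCDEFGHJKLMNPQRSTUVWYZ"

# Optional leading X (rule 1j), then allowed letters, then at most
# three dot-separated numeric levels (rules 1c and 1d).
NAME_RE = re.compile(r"^(X?[" + ALLOWED + r"]+)((?:\.[0-9]+){0,3})$")


def parse_lineage(name: str):
    """Split a lineage name into (prefix, numeric levels, is_recombinant)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid lineage name under these rules: {name!r}")
    prefix, suffix = match.groups()
    levels = tuple(int(part) for part in suffix.split(".")[1:])
    return prefix, levels, prefix.startswith("X")


def parent_within_prefix(name: str):
    """Drop the last numeric level, since each full stop means "descendant of".

    Returns None for a bare prefix: finding its parent would require the
    alias details from the lineage description list, which this sketch
    does not model.
    """
    prefix, levels, _ = parse_lineage(name)
    if not levels:
        return None
    return ".".join([prefix, *(str(n) for n in levels[:-1])])
```

For example, `parse_lineage("B.1.1.7")` yields the prefix `B` with three numeric levels, whereas a name with a fourth numeric level is rejected because its descendants would be assigned to a new alias prefix instead.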

SECTION III. Implementation and revision

    • 1d. The Pango nomenclature is dynamic. Each lineage has a status: ACTIVE, UNOBSERVED, INACTIVE, WITHDRAWN, or UNDEFINED.
    • 1e. A lineage name may be withdrawn at the discretion of the Pango Committee. Once withdrawn, a name cannot be reused.
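
A minimal Python sketch of how the status values and the no-reuse rule might be modelled is given below. The `LineageRegistry` class and its methods are hypothetical, and the choice to give a newly designated name the ACTIVE status is an assumption made for illustration only.

```python
from enum import Enum


class Status(Enum):
    """The five lineage statuses named in the rules."""
    ACTIVE = "ACTIVE"
    UNOBSERVED = "UNOBSERVED"
    INACTIVE = "INACTIVE"
    WITHDRAWN = "WITHDRAWN"
    UNDEFINED = "UNDEFINED"


class LineageRegistry:
    """Hypothetical in-memory record of lineage names and their statuses."""

    def __init__(self):
        self._status = {}

    def designate(self, name: str) -> None:
        # Withdrawn names stay in the record, so they can never be reused.
        if name in self._status:
            raise ValueError(f"{name} has already been used")
        self._status[name] = Status.ACTIVE  # assumption: new names start ACTIVE

    def withdraw(self, name: str) -> None:
        if name not in self._status:
            raise KeyError(name)
        self._status[name] = Status.WITHDRAWN

    def status(self, name: str) -> Status:
        return self._status[name]
```

Keeping withdrawn names in the record, rather than deleting them, is what makes the reuse check a simple membership test.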

SECTION IV. Designated sequences

    • 1a. The Pango Committee will maintain and distribute a sequence designation list.
    • 1b. Each designated genome sequence belongs to one, and only one, Pango lineage.
    • 1c. Sequences must be of high coverage to be included in the sequence designation list.
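
The requirement that each designated sequence belongs to one, and only one, lineage can be checked mechanically. The sketch below assumes the designation list is available as (sequence identifier, lineage name) pairs; that representation and the function name are hypothetical and do not describe the actual format of the distributed list.

```python
from collections import defaultdict


def conflicting_designations(rows):
    """Return {sequence_id: [lineages]} for sequences given more than one lineage.

    `rows` is an iterable of (sequence_id, lineage) pairs. An exact duplicate
    of a row is tolerated here, because it does not assign a second lineage.
    """
    lineages_by_sequence = defaultdict(set)
    for sequence_id, lineage in rows:
        lineages_by_sequence[sequence_id].add(lineage)
    return {
        sequence_id: sorted(lineages)
        for sequence_id, lineages in lineages_by_sequence.items()
        if len(lineages) > 1
    }
```

An empty result indicates that the list is consistent with rule 1b; any entry in the result identifies a sequence whose designation needs to be resolved.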