Genome designation versus assignment
New Pango lineages are identified and named by the Lineage Designation Committee (LDC), according to the rules set and moderated by the Pango Committee (PC). Further information about the roles and remits of these committees can be found here.
As well as creating new lineages, these committees manage the important process of sequence designation, by which SARS-CoV-2 genome sequences are allocated to a specific Pango lineage. The Pango team maintains the sequence designation list, which is a central record of all the SARS-CoV-2 genomes that have been designated to a lineage.
Not all sequences will have a lineage designation, for several reasons:
- Some sequences do not meet the quality threshold, for example because they contain many missing nucleotides, and therefore cannot be allocated unambiguously to one specific lineage.
- Newly generated sequences take time to be assessed by the Pango team, and therefore do not immediately have a lineage designation.
- Some Pango lineages are so large, and contain so many identical genomes, that it is not practical to designate every possible member of that lineage.
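The first reason above, a quality threshold based on missing nucleotides, can be illustrated with a minimal sketch. The function names and the 5% cutoff are hypothetical and chosen only for illustration; they are not the threshold the Pango team actually applies.

```python
# Illustrative sketch only: the default cutoff is a made-up value,
# not the quality threshold used by the Pango team.
UNAMBIGUOUS_BASES = set("ACGT")


def missing_fraction(sequence: str) -> float:
    """Return the fraction of positions that are not an unambiguous A, C, G or T."""
    if not sequence:
        return 1.0  # treat an empty sequence as entirely missing
    seq = sequence.upper()
    missing = sum(1 for base in seq if base not in UNAMBIGUOUS_BASES)
    return missing / len(seq)


def meets_quality_threshold(sequence: str, max_missing: float = 0.05) -> bool:
    """Return True if the share of missing bases is at or below the (hypothetical) cutoff."""
    return missing_fraction(sequence) <= max_missing


if __name__ == "__main__":
    print(meets_quality_threshold("ACGT" * 10))  # no missing bases
    print(meets_quality_threshold("ACGTNNNN"))   # half of the bases are missing
```

A sequence that fails a check of this kind cannot be placed unambiguously, because the missing positions may include the sites that distinguish one lineage from another.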
However, in many instances it is important to ascertain the likely lineage status of a new or incomplete SARS-CoV-2 genome. This can be achieved by sequence assignment using a software tool such as pangolin. Assignment is the process of inferring to which lineage a sequence most likely belongs. For new genomes, assignment can be thought of as a ‘provisional’ lineage status that can be used prior to the later inclusion of the genome in the sequence designation list.
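The relationship between the two kinds of lineage status can be sketched as a simple lookup: use the designation if the genome appears in the designation list, and otherwise fall back to a provisional assignment. The sketch below assumes both sources are available as CSV tables with `taxon` and `lineage` columns; the column names, the helper functions, and the sample names are assumptions made for illustration.

```python
import csv
import io


def read_lineage_table(handle):
    """Read a CSV with 'taxon' and 'lineage' columns (assumed names) into a dict."""
    return {row["taxon"]: row["lineage"] for row in csv.DictReader(handle)}


def lineage_status(name, designations, assignments):
    """Prefer a designation; fall back to an assignment; otherwise report no lineage."""
    if name in designations:
        return designations[name], "designated"
    if name in assignments:
        return assignments[name], "assigned (provisional)"
    return None, "unclassified"


# Made-up example tables standing in for the designation list
# and for the output of an assignment tool.
designation_csv = io.StringIO("taxon,lineage\nsample_1,B.1.1.7\n")
assignment_csv = io.StringIO(
    "taxon,lineage\nsample_1,B.1.1.7\nsample_2,B.1.617.2\n"
)

designations = read_lineage_table(designation_csv)
assignments = read_lineage_table(assignment_csv)

if __name__ == "__main__":
    for name in ("sample_1", "sample_2", "sample_3"):
        print(name, lineage_status(name, designations, assignments))
```

Keeping the status label alongside the lineage name makes it explicit downstream whether a classification is a curated designation or an inference that may later change.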
Designation: Lineage designation is a definitive statement of classification. The designation process involves manual data analysis and curation by the LDC. Sequence designations are made according to the Pango rules. All genome designations are recorded in the sequence designation list.
Assignment: Lineage assignment is an inference that in most cases will be accurate but does carry some uncertainty. Sequences can be assigned automatically to lineages using software tools that extrapolate from the sequence designation list to previously unseen sequences. The most popular tool for assignment is pangolin, which infers the lineage status of query sequences. Full details of pangolin are provided here. Pangolin is updated frequently and the sequence assignments it makes sometimes differ between software versions (for example, if a new lineage has been created).
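Because assignments can change between software versions, it can be useful to compare the results of two runs over the same sequences. The following is a minimal sketch, assuming each run has already been loaded into a dictionary mapping sequence name to assigned lineage; the function name and the example data are hypothetical.

```python
def changed_assignments(old_run, new_run):
    """Return {name: (old_lineage, new_lineage)} for sequences whose assignment differs.

    Only sequences present in both runs are compared.
    """
    shared = sorted(old_run.keys() & new_run.keys())
    return {
        name: (old_run[name], new_run[name])
        for name in shared
        if old_run[name] != new_run[name]
    }


# Made-up results from two runs of an assignment tool at different versions.
old_run = {"sample_1": "B.1.1.7", "sample_2": "B.1.617.2"}
new_run = {"sample_1": "B.1.1.7", "sample_2": "AY.4"}

if __name__ == "__main__":
    for name, (before, after) in changed_assignments(old_run, new_run).items():
        print(f"{name}: {before} -> {after}")
```

Recording the software version next to each assignment makes differences of this kind easier to trace.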