PANTHER
| Content | |
|---|---|
| Description | The PANTHER database classifies gene products into families |
| Data types captured | Gene families |
| Contact | |
| Research center | University of Southern California |
| Authors | Paul D Thomas |
| Primary citation | PMID 12520017 |
| Access | |
| Website | |
| Miscellaneous | |
| Bookmarkable entities | yes |
In bioinformatics, the PANTHER (protein analysis through evolutionary relationships) classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products.[1] PANTHER is part of the Gene Ontology Reference Genome Project[2] designed to classify proteins and their genes for high-throughput analysis.
The project combines manual curation with bioinformatics algorithms.[3] Proteins are classified according to family (and subfamily), molecular function, biological process and pathway. It is one of the databases feeding into the European Bioinformatics Institute's InterPro database.[4]
Application of PANTHER
The most important application of PANTHER is to accurately infer the function of uncharacterized genes from any organism based on their evolutionary relationships to genes with known functions.[3] By combining gene function, ontology, pathways and statistical analysis tools, PANTHER enables biologists to analyze large-scale, genome-wide data obtained from current technologies such as sequencing, proteomics and gene expression experiments.[5] In short, using the data and tools on PANTHER, users can:[6]
- Obtain information about a particular gene of interest.
- Discover protein families and subfamilies, pathways, biological processes, molecular functions and cellular components.
- Create lists of genes related to a particular protein family/subfamily, molecular function, biological process or pathway.
- Analyze lists of genes, proteins or transcripts.
History
- 1998: Project launched at Molecular Application Group.
- 1999: Acquired by Celera Genomics.
- 2000: PANTHER 1 released in Celera Discovery Systems (CDS).
- 2001: PANTHER 2 released and used in the annotation of the first published Celera human genome.
- 2002: PANTHER 3 released. PANTHER annotations integrated in FlyBase. Moved to ABI.
- 2003: PANTHER 4 released with the public release of the PANTHER Classification System.
- 2005: PANTHER 5 released with PANTHER Pathway and analysis tool. Collaboration with InterPro established.
- 2006: PANTHER 6 released. Moved to SRI.
- 2010: PANTHER 7 released.
- 2011: Moved to USC.
- 2012: PANTHER 8 released.
- 2014: PANTHER 9 released.[6]
- 2015: PANTHER 10 released.
- 2016: PANTHER 11 released.[7]
Phylogenetic tree
In PANTHER, there is a phylogenetic tree for each protein family. Each tree is annotated according to the following criteria:
- Each node is annotated with gene attributes, including “subfamily membership”, “protein class” and “gene function”. These attributes are heritable.[3] Swiss-Prot protein names are usually used to name subfamilies.[8] Since PANTHER is part of the GO reference genome project,[2][9] Gene Ontology[10] (GO) terms are used for gene function. PANTHER/X ontology terms are used for protein class.
- Each internal node is annotated by evolutionary events such as “speciation”, “gene duplication” and “horizontal gene transfer”.[3]
To generate phylogenetic trees, PANTHER uses the GIGA algorithm. GIGA uses the species tree to guide tree construction. On every iteration, it attempts to reconcile the tree in terms of speciation and gene duplication events.
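To illustrate what labelling internal nodes with evolutionary events means, the sketch below uses a simple species-overlap heuristic: a node is called a gene duplication if two of its child subtrees contain the same species, and a speciation otherwise. This is not the GIGA algorithm, and the `Node` class and `label_events` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A gene-tree node: leaves carry a species name, internal nodes have children."""
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    event: Optional[str] = None  # set on internal nodes by label_events


def label_events(node: Node) -> Set[str]:
    """Label internal nodes in place and return the species found under `node`.

    Species-overlap heuristic (an assumption for illustration, not GIGA):
    if two child subtrees share a species, the node is a gene duplication;
    otherwise it is a speciation.
    """
    if not node.children:
        return {node.species}
    seen: Set[str] = set()
    overlap = False
    for child in node.children:
        child_species = label_events(child)
        if seen & child_species:
            overlap = True
        seen |= child_species
    node.event = "gene duplication" if overlap else "speciation"
    return seen


# Toy tree: two human paralogs grouped together, with a mouse gene as sister.
human_pair = Node(children=[Node("human"), Node("human")])
root = Node(children=[human_pair, Node("mouse")])
label_events(root)
print(human_pair.event, "/", root.event)
```

In this toy tree the node joining the two human genes is labelled as a gene duplication, while the root, which separates the human and mouse lineages, is labelled as a speciation.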
PANTHER library data generation process
The process for data generation is divided into three steps:
- Family clustering
- Phylogenetic tree building
- Annotation of tree nodes
Family clustering
Sequence set
PANTHER trees depict gene family evolution across a broad selection of fully sequenced genomes. PANTHER includes one sequence per gene so that the tree can represent the events that occurred over the course of evolution, i.e. duplication and speciation. The PANTHER genome set is selected based on the following criteria:
- The set should include major experimental model organisms; this helps in inferring functional information for less-studied organisms.
- The set should include a broad taxonomic range of other genomes, preferably fully sequenced and annotated; this helps in relating the experimental model organisms.
Family clusters
A family cluster in PANTHER must meet the following requirements:
- The family must contain at least five members, at least one of which must be a gene from a GO reference genome.
- To support phylogenetic inference, the family must have a high-quality sequence alignment.
- Alignment quality is assessed by the length of the multiple sequence alignment: at least 30 sites must be aligned across 75% or more of the family members.
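The requirements above can be sketched as a small check. This is a minimal illustration, assuming the alignment is available as a mapping from gene identifier to gapped sequence with `-` as the gap character; the function names, the `gene_genome` mapping and the toy data are all hypothetical and not part of PANTHER's own code.

```python
from typing import Dict, Set

GAP = "-"  # assumed gap character


def count_well_aligned_sites(alignment: Dict[str, str], min_fraction: float = 0.75) -> int:
    """Count columns in which at least `min_fraction` of members have a residue."""
    sequences = list(alignment.values())
    count = 0
    for column in zip(*sequences):
        aligned = sum(1 for residue in column if residue != GAP)
        if aligned >= min_fraction * len(sequences):
            count += 1
    return count


def is_valid_family(
    alignment: Dict[str, str],
    gene_genome: Dict[str, str],
    reference_genomes: Set[str],
    min_members: int = 5,
    min_sites: int = 30,
) -> bool:
    """Apply the three family requirements described in the text."""
    if len(alignment) < min_members:
        return False
    if not any(gene_genome[gene] in reference_genomes for gene in alignment):
        return False
    return count_well_aligned_sites(alignment) >= min_sites


# Toy family: five genes, 40 columns, one member gapped in the last 10 columns.
alignment = {f"gene{i}": "A" * 40 for i in range(1, 5)}
alignment["gene5"] = "A" * 30 + GAP * 10
gene_genome = {gene: "mouse" for gene in alignment}
gene_genome["gene1"] = "human"

print(is_valid_family(alignment, gene_genome, reference_genomes={"human"}))
```

Each of the last 10 columns is still aligned in four of five members (80%), so all 40 columns count and the toy family passes. Dropping one member, or removing the human gene from the reference set, makes the check fail.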
Phylogenetic tree building
For each family, multiple sequences are aligned using the default settings of MAFFT; any column that is aligned in fewer than 75% of the sequences is removed. The trimmed alignment is then used as input to the GIGA program. The output trees from GIGA are labelled: each internal node is marked according to whether the divergence event was a speciation or a gene duplication.
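The column-trimming step can be illustrated with a short sketch. It assumes the alignment has already been computed (for example by MAFFT) and is held as a mapping from sequence name to gapped string with `-` as the gap character; `trim_alignment` is a hypothetical helper written for this example, not a PANTHER or MAFFT function.

```python
from typing import Dict


def trim_alignment(alignment: Dict[str, str], min_fraction: float = 0.75) -> Dict[str, str]:
    """Drop columns where fewer than `min_fraction` of sequences have a residue."""
    names = list(alignment)
    columns = list(zip(*(alignment[name] for name in names)))
    kept = [
        column
        for column in columns
        if sum(residue != "-" for residue in column) >= min_fraction * len(names)
    ]
    # Rebuild each sequence from the columns that survived.
    return {name: "".join(column[i] for column in kept) for i, name in enumerate(names)}


# Toy alignment: only the third column is aligned in fewer than 75% of sequences.
toy = {
    "g1": "MK-LV",
    "g2": "MKALV",
    "g3": "M--LV",
    "g4": "MK-L-",
}
for name, sequence in trim_alignment(toy).items():
    print(name, sequence)
```

Here the third column has a residue in only one of four sequences (25%) and is removed, while columns aligned in three of four sequences (exactly 75%) are kept.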
Annotation of tree nodes
Each node in a PANTHER tree is annotated with heritable attributes. A heritable attribute can be of three types: subfamily membership, gene function and protein class membership. These node annotations apply to the primary sequences that were used to construct the tree. In applying these annotations to primary sequences, a simple evolutionary principle is used: each node annotation is propagated to its descendant nodes.[3]
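The inheritance principle can be sketched as a tree traversal in which every leaf sequence collects the annotations of all its ancestors. This is a simplified illustration: the `TreeNode` class, the `propagate` function and the annotation strings are hypothetical, and the sketch does not model any mechanism for blocking or overriding an inherited annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class TreeNode:
    name: str
    annotations: Set[str] = field(default_factory=set)
    children: List["TreeNode"] = field(default_factory=list)


def propagate(
    node: TreeNode,
    inherited: FrozenSet[str] = frozenset(),
    result: Optional[Dict[str, Set[str]]] = None,
) -> Dict[str, Set[str]]:
    """Return, for every leaf, its own annotations plus those of its ancestors."""
    if result is None:
        result = {}
    current = set(inherited) | node.annotations
    if not node.children:
        result[node.name] = current
    for child in node.children:
        propagate(child, frozenset(current), result)
    return result


# Toy family tree with made-up annotations.
subfamily = TreeNode(
    "node_sf1",
    annotations={"subfamily: SF1"},
    children=[TreeNode("geneA"), TreeNode("geneB")],
)
root = TreeNode(
    "root",
    annotations={"protein class: kinase"},
    children=[subfamily, TreeNode("geneC")],
)

for gene, inherited_annotations in sorted(propagate(root).items()):
    print(gene, sorted(inherited_annotations))
```

In this toy tree, `geneA` and `geneB` receive both the root annotation and the subfamily annotation, while `geneC` receives only the annotation attached to the root.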
PANTHER components
PANTHER/LIB (PANTHER library): The library consists of a collection of books, each of which represents a protein family. For each protein family in the library there is a hidden Markov model (HMM), a multiple sequence alignment (MSA) and a family tree.[1]
PANTHER/X (PANTHER index): The index contains an abbreviated ontology that assists in summarizing and navigating molecular functions and biological processes. Although the PANTHER/X ontology has a hierarchical organization, it is a directed acyclic graph, so when it is biologically justified, child categories appear under more than one parent. PANTHER/X has been mapped to GO and arranged in a different way to facilitate large-scale analysis of proteins.[1]
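The difference between a strict hierarchy and a directed acyclic graph can be shown with a small sketch in which one category has two parents. The category names below are illustrative only and are not taken from the actual PANTHER/X ontology; `PARENTS` and `ancestors` are hypothetical names for this example.

```python
from typing import Dict, List, Set

# Each category maps to its parent categories; a DAG allows more than one parent.
PARENTS: Dict[str, List[str]] = {
    "molecular function": [],
    "receptor": ["molecular function"],
    "kinase": ["molecular function"],
    "receptor kinase": ["receptor", "kinase"],  # appears under two parents
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every category reachable by following parent links from `term`."""
    found: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


print(sorted(ancestors("receptor kinase", PARENTS)))
```

Because a category can be reached through several parents, the traversal tracks the categories it has already visited so that shared ancestors such as the root are counted once.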
PANTHER pathways
PANTHER includes 176 pathways, built using the CellDesigner tool. PANTHER pathways can be downloaded in the following file formats:
- Systems Biology Markup Language (SBML)
- Systems Biology Graphical Notation (SBGN-ML)
- BioPAX[3]