Wikipedia:WikiProject Molecular Biology/Style guide (gene and protein articles)

This is a guideline for the structure of gene and protein articles on Wikipedia. It contains the article naming conventions and the recommended general outline of an article, as well as useful information for bringing an article to good article or featured article status.

General considerations

The scope of a gene/protein article is the human gene/protein (including all splice variants derived from that gene) as well as orthologs (as listed in HomoloGene) that exist in other species. If there are paralogs in humans (and by extension other species), then a gene family article (see, for example, dopamine receptor) would be appropriate in addition to the gene-specific articles.

In general, do not hype a study by listing the names, credentials, institutions, or other "qualifications" of its authors. Wikipedia is not a press release. Article prose should focus on what a cited study says about the structure, function, clinical significance, etc. of the gene or protein, not on what the gene or protein says about a particular study or the research group that conducted it. Particularly notable contributions, however, should be mentioned in the discovery/history section, along with who made them.

Article name

If relatively short, the recommended UniProt protein name should be used as the article name. If the protein name is verbose, either a widely used protein acronym or the official HUGO gene symbol should be used instead, followed by "(gene)" if necessary to disambiguate. UniProt names generally follow the IUBMB recommendations:

When naming proteins which can be grouped into a family based on homology or according to a notion of shared function (like the interleukins), the different members should be enumerated with a dash "-" followed by an Arabic number, e.g. "desmoglein-1", "desmoglein-2", etc.
"Protein Naming Guidelines". Recommendations on Biochemical & Organic Nomenclature, Symbols & Terminology etc. International Union of Biochemistry and Molecular Biology.
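
The enumeration rule quoted above can be illustrated with a minimal Python sketch. The function name `family_member_name` is hypothetical and is not part of any UniProt or IUBMB tool; the check that member numbers are positive is likewise an assumption made for this example.

```python
def family_member_name(family: str, number: int) -> str:
    """Join a family name and an Arabic member number as "<family>-<number>"."""
    # Assumption for this sketch: members are numbered from 1 upward.
    if number < 1:
        raise ValueError("member number is assumed to be a positive integer")
    return f"{family}-{number}"


# The two examples given in the quoted recommendation.
names = [family_member_name("desmoglein", n) for n in (1, 2)]
print(names)
```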

If the article is about a viral protein, it is recommended to include the taxon in the title, as "nonstructural protein 2" and "viral protease" can mean many things. A parenthesized term added to disambiguate common symbols does not constitute unnecessary disambiguation even when it is the only article with such a name.

Gene nomenclature

Gene abbreviations follow the HUGO Gene Nomenclature Committee conventions and are written in italics (full gene names are also italicised). Abbreviations are recommended over full names. Human gene symbols are written in capitals, for example ALDOA, INS, etc. For orthologs of human genes in other species, only the initial letter is capitalised, for example mouse Aldoa, bovine Ins, etc.

The following usages of gene symbols are recommended:

  • "the ALDOA gene is regulated...",
  • "the rat gene for Aldoa is regulated..." or
  • "ALDOA is regulated...",

while the following is not recommended:

  • "the gene ALDOA is regulated" since it is redundant.
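
The capitalisation and italics rules in this section can be sketched in Python as follows. The function name `format_gene_symbol` and its parameters are hypothetical, chosen only for illustration; the sketch assumes wikitext italics markup (two apostrophes on each side) and treats every non-human species the same way, as the examples above do.

```python
def format_gene_symbol(symbol: str, human: bool = True, italic: bool = True) -> str:
    """Format a gene symbol following the conventions described above.

    Human symbols are written in capitals; for orthologs in other species
    only the initial letter is capitalised.
    """
    # str.capitalize() upper-cases the first character and lower-cases the rest.
    text = symbol.upper() if human else symbol.capitalize()
    # Wikitext italics are written with two apostrophes on each side.
    return f"''{text}''" if italic else text


print(format_gene_symbol("aldoa"))                             # human gene
print(format_gene_symbol("ALDOA", human=False))                # e.g. mouse ortholog
print(format_gene_symbol("INS", human=False, italic=False))    # e.g. bovine, plain text
```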

Images and diagrams

Where possible, diagrams should keep to a standard format. If the diagram guide does not give sufficient guidance on the style for the images in an article, consider suggesting expansions to the standardised formatting.

Infoboxes

One or more of the following infoboxes as appropriate should be included at the top of each article:

  • {{Infobox GNF protein}}
    Description / suggested use: for genes/proteins for which an ortholog is present within the human genome (articles containing this template were created as part of the Gene Wiki project).
    Example article containing this template: Reelin.
    Template filling tool: GeneWikiGenerator (input: HUGO gene symbol).
  • {{Infobox protein}}
    Description / suggested use: smaller box appropriate for protein family articles where more than one protein is discussed in the same article (e.g., paralogs).
    Example article containing this template: Estrogen receptor.
    Template filling tool: Wikipedia template filling (input: HGNC ID).
  • {{Infobox nonhuman protein}}
    Description / suggested use: for proteins without a human ortholog.
    Example article containing this template: Uterine serpin.
  • {{Infobox protein family}}
    Description / suggested use: for protein families (evolutionarily related proteins that share a common 3D structure) that are listed in Pfam.
    Example article containing this template: T-box.
  • {{Infobox rfam}}
    Description / suggested use: for RNA families (evolutionarily related non-coding RNAs that share a common 3D structure) that are listed in Rfam.
    Example article containing this template: U1 spliceosomal RNA.
  • {{Infobox enzyme}}
    Description / suggested use: for enzymes based on EC number (more properly refers to the reaction catalyzed by the enzyme rather than the enzyme itself).[a]
    Example article containing this template: Alcohol dehydrogenase.

If there is only one human paralog assigned to a given EC number (the ExPASy database maintains EC number to protein mappings), then in addition to a protein infobox, it may be appropriate to also add the corresponding enzyme infobox. Likewise, if there is only one human paralog that has been assigned to a Pfam family, then including a protein family infobox may also be appropriate.
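
As a rough illustration of how these recommendations combine for an article about a single gene/protein, the selection can be sketched in Python. The function `suggest_infoboxes` and its boolean parameters are hypothetical and exist only for this example; the sketch deliberately leaves out multi-protein family articles and RNA families, and the guideline's "may be appropriate" remains an editorial judgment rather than a fixed rule.

```python
def suggest_infoboxes(
    has_human_ortholog: bool,
    sole_human_paralog_for_ec: bool = False,
    sole_human_paralog_for_pfam: bool = False,
) -> list[str]:
    """Suggest infobox templates for a single gene/protein article."""
    boxes = []
    # Primary box: depends on whether a human ortholog exists.
    if has_human_ortholog:
        boxes.append("{{Infobox GNF protein}}")
    else:
        boxes.append("{{Infobox nonhuman protein}}")
    # Optional extras when the protein is the only human paralog
    # mapped to an EC number or to a Pfam family.
    if sole_human_paralog_for_ec:
        boxes.append("{{Infobox enzyme}}")
    if sole_human_paralog_for_pfam:
        boxes.append("{{Infobox protein family}}")
    return boxes


print(suggest_infoboxes(has_human_ortholog=True, sole_human_paralog_for_ec=True))
```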

In some cases, a large number of infoboxes may apply to an article. You may put the less useful ones in a section at the end, laid out side by side in a table. Whether that table should be collapsed or horizontally scrolled is unsettled, as MOS:COLLAPSE may or may not apply depending on how "extraneous" the boxes are.

Sections

  1. Lead
    The lead section is defined as "the section before the first headline. The table of contents, if displayed, appears between the lead section and the first headline."
    The first sentence of the lead should define the scope of the article. For genes/proteins for which a human ortholog exists, "<recommended UniProt name> is a protein that in humans is encoded by the <approved HUGO gene symbol> gene." would be appropriate.
  2. Gene
    Specific information about the gene (on which human chromosome it is located, regulation, etc.). Much of this basic information may already be contained in the infobox and should not be unnecessarily repeated in this section unless especially notable.
  3. Protein
    Specific information about the protein (splice variants, post-translational modifications, etc.). Again, much of this basic information may already be contained in the infobox and should not be unnecessarily repeated unless especially notable.
  4. Species, tissue, and subcellular distribution
    Optional section that concisely describes in which species the gene is expressed (e.g., wide species distribution, bacteria, fungi, vertebrates, mammals, etc.), in which tissues the protein is expressed, and in which subcellular compartments or organelles the protein is found (excreted, cytoplasm, nucleus, mitochondria, cell membrane).
  5. Function
    Describe the function of the encoded protein.
  6. Interactions
    Optional section that lists the proteins known to interact with the protein that is the subject of the article.
  7. Clinical significance
    List diseases or conditions that are a result of a mutation in the gene or a deficiency or excess of the expressed protein.
  8. History/Discovery
    In general, it is not appropriate to mention the research group or institution that conducted a study directly in the text of the article. However, it is appropriate to list in this section the names of those who made key discoveries concerning the gene or protein (e.g., the scientist or group that originally cloned the gene, determined its function, linked it to a disease, won a major award for the discovery, etc.).

Examples of articles organized this way include Protein C, Gonadotropin-releasing hormone, and Rubisco.
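
The first-sentence pattern recommended for the lead (item 1 above) can be sketched as a small Python helper. The function name `lead_sentence` is hypothetical; the sketch produces plain text only and leaves out any wiki markup such as italics or links. The example values are the steroid 21-hydroxylase / CYP21A2 pair used elsewhere in this guideline.

```python
def lead_sentence(uniprot_name: str, hugo_symbol: str) -> str:
    """Fill in the recommended first-sentence pattern for a gene/protein article."""
    return (
        f"{uniprot_name} is a protein that in humans "
        f"is encoded by the {hugo_symbol} gene."
    )


print(lead_sentence("Steroid 21-hydroxylase", "CYP21A2"))
```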

Wikidata item

The Wikipedia article should be linked to a Wikidata item of the entity first mentioned in the first sentence of the lead section, which should be written as defined in WP:MCBMOSSECTIONS. Suppose that the first sentence is "Steroid 21-hydroxylase is a protein that in humans is encoded by the CYP21A2 gene." In this case, the Wikipedia article should be linked to a Wikidata item of the steroid 21-hydroxylase protein rather than the gene.

Citing sources

Categories

Notes
