Talk:Human genome
Human genome was one of the Natural sciences good articles, but it has been removed from the list. There are suggestions below for improving the article to meet the good article criteria. Once these issues have been addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.
Current status: Delisted good article
This article has been mentioned by a media organization.
quality
According to the URL below, a paper by top experts in a top journal (i.e., highly authoritative) says that there are many gaps (unsequenced regions) in the human genome. In my opinion, the lack of attention paid to these gaps is somewhat misleading for the general public; e.g., when scientists use the word "complete", it means, per the dictionary, that the genome has no gaps and no missing sequence, yet this is empirically false. http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13907.html
- Yeah, quality is important. For example, signing and dating a comment, or including an actual citation rather than a useless paywalled link to a search site. ~2025-39446-09 (talk) 00:21, 31 December 2025 (UTC)
- To expand a bit on the concept of a "**full**" genome:
- 1. Usually "the genome" is shorthand for "the nuclear genome", but it sometimes includes mitochondrial DNA, of which there are (usually) many copies in any one cell. (And it's usually assumed that the mtDNA is identical in all mitochondria.)
- 2. Most nucleus-containing human cells are diploid. The most numerous cell type, (mature) red blood cells, lacks a nucleus (and hence has no nuclear DNA). Some cells are more than diploid, having more than one pair of each chromosome.
- 3. The two chromosomes of each pair originate from the father and the mother, respectively, and are different. That is, while the genes generally have the same function on either chromosome of the pair, their exact base sequences differ.
- 4. Because DNA replication (DNA copying during cell division) is imperfect, and because each individual human starts from a single cell with one (nuclear) genome set that is copied over and over, mutations occur often enough that adult humans have millions of different mutations among the cells of their bodies. That is, there is no one unique "human genome", even at the individual level. (If we could somehow record the DNA of the original (germline) fertilized ovum, that would arguably be as close as we could get to "a" human genome.)
- 5. Genes are silenced in ways that are partially determined by the cell's developmental context. Identical twins may have different gene expression because of this. In other words, it isn't "all" about DNA sequence.
- 6. We do indeed have what could be considered a "full" human genome, including both the X and Y chromosomes, assembled from a number of individuals. The current "reference genome" (a composite of sequences from a number of individuals) is GRCh38.p14 (2023), but other reference standards exist.
- 7. I do not know whether the standard reference includes both copies of each of the 22 autosomes and both members of the sex-chromosome (allosome) pair, but I don't see how a composite standard could do that. ~2025-39446-09 (talk) 00:21, 31 December 2025 (UTC)
Definition of "genome"
The very first sentence of the article defines the human genome as, "The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei ..." Except for the ambiguous use of 'pairs,' this is a correct definition that corresponds to the definition in genome, "... the 'genome' refers to only one copy of each chromosome."
A recent edit introduced the term "haploid genome." This is equivalent to saying "haploid set of one copy of each chromosome" and that makes no sense. The phrase "diploid genome" is even worse since it implies that there are different kinds of genomes; haploid, diploid, and polyploid. There can be haploid, diploid, and polyploid sets of chromosomes but not genomes.
I made the appropriate edits. Genome42 (talk) 16:06, 19 September 2024 (UTC)
- The above post (Genome42's) is problematic, and imho so problematic as to be just plain wrong. First, it ignores mitochondrial DNA, though at least it acknowledges this, sort of. (I say "sort of" because it is false that mtDNA is "A small DNA molecule". It is the set of small DNA molecules found in a person's cells. The number of mtDNA molecules in a cell varies by cell type as well as by inheritance and environmental factors. mtDNA genetics is non-Mendelian, so it makes some sense to consider nuclear DNA and mtDNA separately. The mtDNA rings found in one person's cells are not necessarily identical, so speaking of it as "a" molecule is clearly misleading. I'd say wrong, except that I don't know the frequency of sequence variation; if it's low, then speaking of "a" sequence may be an adequate first approximation. IDK.)
IMHO, the first paragraph of the lead is wrong, mostly by omission. "The" human genome is a REFERENCE SEQUENCE and, last I heard, NOT one found in a single human (i.e., it is a composite). This is a problem because the term is used both for the reference sequence and for the set of sets of (nuclear) DNA found in the cells of the entire population of Homo sapiens. Also, while it's understood that my/your/his/her/their genome *is* each a human genome, it's also generally understood that there will be sequence differences (though I'd guess that the extent of the differences is both under- and overestimated).
I strongly disagree that "the genome refers only to one copy of each chromosome". It's well known that understanding gene coding, especially protein coding, requires knowledge of both gene copies, maternal and paternal: the effect of a gene on a person's phenotype depends on both copies. Depending on how this article is written, it may be necessary to distinguish between the reference genome (there's more than one, of course) and an individual's genome. IDK.
It seems like it'd be confusing to mix the several meanings of the term, especially when a statement is true for one meaning and false or indeterminate for another. The engineer in me says that any discussion of a physical object should include some measure of its range, as well as the range of uncertainty. It is wrong to say that the Y chromosome contains 62,460,029 base pairs. I'm not sure what the uncertainty is for that *particular* measurement, but I doubt that there's been enough error analysis to believe the error is below 1 bp. Generally, few lab techniques are without significant random error, and it should be assumed that even when calibration is frequent, each workflow (machines and technician set) will have non-random (biasing) errors. A little less unwarranted certainty is, imho, appropriate here. 98.19.177.99 (talk) 18:07, 5 February 2025 (UTC)
"Protein coding gene" listed at Redirects for discussion
The redirect Protein coding gene has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2025 September 19 § Protein-coding gene until a consensus is reached. 1234qwer1234qwer4 13:40, 19 September 2025 (UTC)
The 23 chromosomes are 46
The first sentence of this article boldly states that "the human genome is a complete set of nucleic acid sequences for humans, encoded as the DNA within each of the 23 distinct chromosomes in the cell nucleus". I am pretty sure that there are 46 chromosomes in the nucleus of each human cell, and that most usually they are all at least a little bit different from one another. Imerologul Valah (talk) 23:17, 28 September 2025 (UTC)
- There are 23 chromosome pairs. The two chromosomes of a pair are distinct from each other but (other than the X/Y pair) fit a common map. Any given gene we have mapped to a particular chromosome will be present on both chromosomes of the pair, albeit probably as different alleles of that gene. The intro sentence could probably be rephrased to state "23 distinct chromosome pairs", since each pair has a completely different set of genes from the other pairs. Tarl N. (discuss) 01:02, 29 September 2025 (UTC)
- The actual human genome consists of sequences for 24 (not 23) distinct chromosomes (22 autosomes + X + Y). Genome42 (talk) 15:24, 30 September 2025 (UTC)
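For readers following the 23 vs. 46 vs. 24 exchange above, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration only, assuming a typical euploid somatic nucleus and ignoring mitochondrial DNA; the function names are hypothetical and not taken from any genomics library.

```python
# Hypothetical sketch: the three chromosome counts discussed in this thread.
# Assumes a typical euploid somatic nucleus; mitochondrial DNA is ignored.
AUTOSOMES = [str(n) for n in range(1, 23)]  # chromosomes 1..22
SEX_CHROMOSOMES = ["X", "Y"]


def nuclear_chromosome_count(sex_pair=("X", "Y")):
    """Chromosomes in a diploid nucleus: two copies of each autosome plus one sex pair."""
    return 2 * len(AUTOSOMES) + len(sex_pair)


def haploid_set_size():
    """One copy of each autosome plus one sex chromosome (the 23 of a gamete)."""
    return len(AUTOSOMES) + 1


def distinct_reference_sequences():
    """Distinct sequences a reference covering both sexes needs: 22 autosomes + X + Y."""
    return len(AUTOSOMES) + len(SEX_CHROMOSOMES)


print(nuclear_chromosome_count())            # 46 (XY nucleus)
print(nuclear_chromosome_count(("X", "X")))  # 46 (XX nucleus)
print(haploid_set_size())                    # 23
print(distinct_reference_sequences())        # 24
```

All three numbers are correct at once; they simply count different things (chromosomes per nucleus, chromosomes per haploid set, and distinct chromosome types across both sexes).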
Haploid and diploid genome sequences
The standard reference genome (GRCh38) is a composite sequence that does not represent any one individual. It notes variation in the annotations. The standard reference genome is missing lots of repetitive DNA, especially in the centromeres and at the sites of tandem arrays of ribosomal RNA genes.
Most genome assemblies came from sequencing diploid cells and tissues and they did not create separate assemblies for each pair of chromosomes. Instead, the differences were just noted as variants. However, there were some published sequences that did create separate assemblies for each chromosome beginning with Craig Venter's sequence in 2007. You could say that these represent two haploid sequences rather than a composite diploid sequence but I don't find that terminology useful.
The development of long sequence reads enabled researchers to bridge the gaps in the standard reference genome by assembling long stretches of repetitive DNA. Since these regions are highly variable, it's likely that the two chromosomes in each of your pairs differ in the amount of repetitive DNA at these sites. That's why the researchers on the telomere-to-telomere (T2T) project decided to use a cell line (CHM13) in which the two chromosomes of each pair are identical. It made the interpretation of the repetitive regions much easier. (Nurk et al., 2021)
These cells (CHM13) were diploid but that was due to the duplication of sperm DNA in cells from a molar pregnancy. You could refer to this sequence as a haploid sequence but I think that's somewhat confusing and beyond the scope of this article.
Peteruetz recently edited this article to include a reference to this T2T sequence, which they called "haploid." They also posted a link to an unpublished article on a preprint server (Hansen et al., 2025). This paper assembled separate sequences for both chromosomes of each pair in a diploid cell line. The difference between this result and similar assemblies in the past is that Hansen et al. used the long-read sequencing strategy to produce T2T sequences.
This resulted in some direct documentation of variation in the repeated segments that aren't part of the standard reference genome.
I think this is much too complicated for the average Wikipedia reader and it would be better if we didn't include it in the article.
There's an additional complication that we are better off ignoring: it's going to be almost impossible to integrate the T2T sequences with the standard reference genome. If you're interested in a discussion about this, you can check out my blog post: What do we do with two different human genome reference sequences? Genome42 (talk) 18:05, 3 October 2025 (UTC)
Number of genes - vandalism
The following two statements, both taken directly from this WP article, are contradictory: 1. "There are 144,000 protein coding genes." (from the lead); 2. "The human reference genome contains somewhere between 19,000 and 20,000 protein-coding genes." I suspect the first is vandalism; it seems clearly wrong, but I'm not going to edit this. ~2025-39446-09 (talk) 00:29, 31 December 2025 (UTC)


