Wikipedia:WikiProject Chemicals/Chembox validation


WikiProject Chemicals and WikiProject Pharmacology are validating the content in the infoboxes {{chembox}} and {{drugbox}}. Values in the infobox are compared with values reported in literature, and when the values match, the revision is stored in the index for chembox and the index for drugbox, respectively. This is typically done for values that are 'immutable' (e.g., the boiling point of a chemical compound: the boiling point of water under standard conditions is 99.98 °C, and there is no plausible reason to suspect it will change).

We are verifying the CAS Registry Number (|CASNo= in {{chembox}}, |CAS_number= in {{drugbox}}), ChemSpider ID (|ChemSpiderID=), Unique Ingredient Identifier (UNII), InChI, KEGG and ChEMBL. We compare them with the data on the CAS website, ChemSpider and the FDA's UNII Search Service, and with lists that were either supplied to the project (CAS number, ChemSpiderID, InChI, UNII, ChEMBL and ChEBI) or downloaded from the providers' websites (KEGG, DrugBank). In the meantime, we are trying to add, update and/or check a number of other identifiers (InChI, InChIKey) by comparing the data with the ChemSpider website.

Boxes whose verified values are the same as those in the verified revision are tagged with a green tick at the bottom, and boxes in which some of these values have changed are tagged with a red cross. Each individual identifier is also tagged with a tick or a cross. Boxes that contain changes to these verified fields are also placed in Category:Chemboxes which contain changes to verified fields, and boxes that contain changes to other important fields are placed in Category:Chemboxes which contain changes to watched fields. For an example, see this vandalism, quickly flagged by CheMoBot.
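The tagging and categorization described above can be sketched as a comparison between two sets of parameter values. This is a minimal sketch, assuming each box has already been reduced to a dictionary of parameter values. The verified identifiers follow the list given earlier on this page, while the watched-field list and the function name `check_box` are hypothetical. CheMoBot's actual implementation may differ.

```python
from __future__ import annotations

# Verified identifiers as listed on this page; the watched list is hypothetical.
VERIFIED_FIELDS = ("CASNo", "ChemSpiderID", "UNII", "InChI", "KEGG", "ChEMBL")
WATCHED_FIELDS = ("Formula", "MolarMass", "BoilingPt", "MeltingPt")

CAT_VERIFIED = "Category:Chemboxes which contain changes to verified fields"
CAT_WATCHED = "Category:Chemboxes which contain changes to watched fields"


def check_box(verified: dict[str, str], current: dict[str, str]) -> dict:
    """Compare a box's current values with those in its verified revision.

    Returns a mark for the whole box, a mark for each verified identifier
    present in the verified revision, and the tracking categories to add.
    """
    field_marks = {
        field: "tick" if current.get(field) == verified[field] else "cross"
        for field in VERIFIED_FIELDS
        if field in verified
    }
    changed_verified = "cross" in field_marks.values()
    # A watched field counts as changed if it was added, removed or edited.
    changed_watched = any(current.get(f) != verified.get(f) for f in WATCHED_FIELDS)

    categories = []
    if changed_verified:
        categories.append(CAT_VERIFIED)
    if changed_watched:
        categories.append(CAT_WATCHED)
    return {
        "box_mark": "cross" if changed_verified else "tick",
        "field_marks": field_marks,
        "categories": categories,
    }


# Usage: the CAS number of water has been vandalised in the current revision.
verified = {"CASNo": "7732-18-5", "Formula": "H2O"}
current = {"CASNo": "7732-18-6", "Formula": "H2O"}
print(check_box(verified, current))
```

In this sketch a change to a watched field only adds the tracking category. It does not turn the box mark into a cross, which mirrors the split between verified and watched fields described above.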

If you encounter a page with a {{chembox}} or {{drugbox}} that shows a red cross, please check whether the value is wrong or whether the verified revision itself contains a mistake. If the value is wrong, it can simply be changed back to the value in the verified revision. If the verified revision is wrong, the index may need updating; see WP:WikiProject Chemicals/Index or WP:WikiProject Pharmacology/Index.

Verification – tagging references

When a field is found to be correct, we add a template to the corresponding _Ref parameter (e.g. for CASNo, CASNo_Ref is filled with {{cascite|correct|XXX}}). The first parameter of the template is either 'correct' or 'changed', and the box shows a tick or a cross on CASNo accordingly. The second parameter records where the value was verified. Because we are currently verifying all fields against the CAS commonchemistry.org site, XXX is replaced with 'CAS' (i.e., {{cascite|correct|CAS}}). If you verify the CASNo against another source, please adapt this parameter accordingly. We will try to retain this field throughout.
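The following is a minimal sketch of building and reading such a _Ref value. It assumes the template is written in the {{cascite|status|source}} form shown above. The helper names and the regular expression are hypothetical, and the bot may well parse wikitext differently, for example with a full wikitext parser.

```python
import re

# Matches e.g. {{cascite|correct|CAS}}; extra whitespace and a capital C are tolerated.
CASCITE_RE = re.compile(
    r"\{\{\s*[Cc]ascite\s*\|\s*(correct|changed)\s*\|\s*([^|}]*?)\s*\}\}"
)


def make_cascite(status: str, source: str = "CAS") -> str:
    """Build the value for a _Ref parameter such as CASNo_Ref."""
    if status not in ("correct", "changed"):
        raise ValueError("status must be 'correct' or 'changed'")
    return "{{cascite|" + status + "|" + source + "}}"


def parse_cascite(ref_value: str):
    """Return (status, source) from a _Ref value, or None if no template is found."""
    match = CASCITE_RE.search(ref_value)
    if match is None:
        return None
    return match.group(1), match.group(2)


def mark_for(ref_value: str) -> str:
    """Tick for 'correct', cross for 'changed', empty string if not verified."""
    parsed = parse_cascite(ref_value)
    if parsed is None:
        return ""
    return "tick" if parsed[0] == "correct" else "cross"


# Usage: a CASNo verified against commonchemistry.org.
ref = make_cascite("correct")
print(ref)                 # {{cascite|correct|CAS}}
print(parse_cascite(ref))  # ('correct', 'CAS')
print(mark_for(ref))       # tick
```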

Method of work

Our approach is to start by checking that the CAS Registry Number and the structure match the name. This will serve as a foundation on which we can build a broader validation effort. Once the structure is verified, we have the formula, and hence the molar mass. We can also generate other machine-readable representations such as SMILES, InChI and InChIKey.
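The step from formula to molar mass is simple arithmetic. This is a minimal sketch, assuming a flat formula string with no brackets, hydrate dots or charges. It uses a hand-picked subset of abridged standard atomic weights. The function names are hypothetical, and this is not the project's tooling.

```python
from __future__ import annotations

import re

# Abridged standard atomic weights in g/mol; only a small subset for illustration.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "Na": 22.990, "S": 32.06, "Cl": 35.45,
}

# One element symbol followed by an optional count, e.g. "C3", "N", "Cl2".
TOKEN_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat formula such as 'C3H7NO2' (no brackets, dots or charges)."""
    counts: dict[str, int] = {}
    pos = 0
    for match in TOKEN_RE.finditer(formula):
        if match.start() != pos:  # something unparseable was skipped
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, digits = match.groups()
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return counts


def molar_mass(formula: str) -> float:
    """Sum atomic weights over the parsed formula (KeyError for unlisted elements)."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


# Usage: alanine, C3H7NO2
print(round(molar_mass("C3H7NO2"), 3))  # 89.094
```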

First 1000

After our IRC meeting on January 13, 2009, we used an Excel file to validate the first 1000 entries from the CAS XML file. It is available to project members on the password-protected site. Meanwhile, User:Physchim62 validated the inorganics separately; these can be found in the CAVer file.

The work

We are now beginning to work through the list of "problem articles" found by User:Beetstra, and listed at User:Beetstra/CASFoundCorrect. A description of the process will be added soon.

Notes

  • Different CAS Registry Numbers[1] are used for each form of a substance. For example, even something as simple as alanine will have one CAS# for the D form, another for the L form, another for "unspecified" and a fourth for the racemate. There would be another four CAS#s for the hydrochloride, four for the (1:1) sulfate, four for the (2:1) sulfate, etc. It is very important that we match the CAS# of the correct form to our Chemboxes!
  • Be aware that CAS uses an unusual system for representing some formulae, which may look "wrong" to us. Salts such as sodium nitrate are written as HNO3·Na, and organic salts follow a similar system. Do not use such formulae on WP. Note, however, that they are not actually "wrong": they are merely a representation, not a formal structure. For salts, this convention also results in an incorrect MolarMass in the FW section of the SDF file.
  • For complex chiral structures, such as bleomycin, which may be drawn very differently in WP than in Common Chemistry, I found it best to assign R/S for each center and compare that way. (And yes, Farseer drew bleomycin perfectly!)
  • The CAS No. (that is, the "CAS Registry Number"[1]) in a Chembox receives a green tick (check mark) once {{cascite}} is added. This does not yet happen in the Drugbox, but we hope to enable a similar system there too, if WP:PHARM agrees.
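The following worked example shows how the CAS-style salt formulae can give a wrong MolarMass. It compares sodium nitrate written conventionally with the same salt written as HNO3·Na. The example assumes that the formula weight is obtained by naively summing every atom written in the formula. It uses abridged standard atomic weights (H 1.008, N 14.007, O 15.999, Na 22.990 g/mol):

```latex
\begin{aligned}
M(\mathrm{NaNO_3}) &= 22.990 + 14.007 + 3(15.999) = 84.994~\mathrm{g\,mol^{-1}} \\
M(\mathrm{HNO_3{\cdot}Na}) &= 1.008 + 14.007 + 3(15.999) + 22.990 = 86.002~\mathrm{g\,mol^{-1}}
\end{aligned}
```

Under that assumption, the formula weight for the salt comes out too high by the mass of the one hydrogen atom (1.008 g/mol) that sodium replaces.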

Fields to check/upload

Chemboxes

Check structure, CAS no., Formula, MolarMass.

Notes:

  • 1. The fields fall into two sets, watched and unwatched. All changes are reported, but the watched fields are the ones we really want to take care of. These are the fields that contain hard, verifiable data that are very unlikely to change, such as the boiling point of water, the CAS number of benzene or the number of carbons in glucose. N.B. the list of 'watched' fields may need to be updated.
  • 2. An empty field is marked as 'unknown'.
  • 3. Text between <!-- and --> is a 'comment': it is saved and appears in the edit box, but produces no visible output on the rendered page.
  • 4. When a 'better' version of a page comes up, update the revision number stored in the index.
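Notes 2 and 3 matter when field values are compared automatically. A comment added inside a field should not register as a change, and an empty field is reported as 'unknown'. The following is a minimal sketch, assuming values are compared as plain strings after this clean-up. `normalise_value` is a hypothetical helper, not the bot's code, and treating a missing parameter like an empty one is also an assumption.

```python
import re
from typing import Optional

# <!-- ... --> comments may span several lines, hence DOTALL.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def normalise_value(raw: Optional[str]) -> str:
    """Remove wiki comments and surrounding whitespace; empty means 'unknown'.

    A missing parameter (None) is treated like an empty one (an assumption).
    """
    if raw is None:
        return "unknown"
    text = COMMENT_RE.sub("", raw).strip()
    return text if text else "unknown"


# Usage: a commented CAS number compares equal to the bare value.
print(normalise_value("7732-18-5 <!-- from commonchemistry.org -->"))  # 7732-18-5
print(normalise_value("<!-- please fill in -->"))                      # unknown
```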

The workers

Please sign up to work on some of the articles listed at User:Beetstra/CASFoundCorrect. More information later.

The software

{{cascite}}-set

  • {{Cascite}} set of bot-handled templates, used for verified parameters

Problems found when validating the Excel file
