Talk:Database normalization
From Wikipedia, the free encyclopedia
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to several WikiProjects.
Why is Boyce-Codd Normal Form Not Discussed More?
Throughout this article, there is discussion of 3NF and then EKNF (which is between 3NF and BCNF) and then 4NF, with no mention of BCNF. Why was this left out? Contributors themselves have noted that EKNF is not discussed much in the literature, so I am not sure why it is mentioned but not BCNF.
--StatsJunkie (talk) 04:22, 23 October 2022 (UTC)
- It is listed in various places and a link to its own article is provided. I'm sure a new subsection Satisfying BCNF would be most welcome. Rp (talk) 18:55, 24 October 2022 (UTC)
- I would appreciate this section as well. Perhaps a link to https://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form#Achievability_of_BCNF would suffice. Jtbwikiman (talk) 20:04, 8 July 2024 (UTC)
Karnaugh map
How would I know if I had reached the most normalised/optimised stage? Is there any tool like a Karnaugh map that lists all permutations/combinations? Anwar (talk) 13:01, 22 May 2008 (UTC)
- I don't know, but for small schemas (say, 20 relations or less) this is easy to see. A related issue is that there may be implicit dependencies that have not been marked explicitly in the database schema - determining those is not so easy, see e.g. Rp (talk) 08:49, 7 July 2009 (UTC)
- We can use the concept of normalization only for removing redundant data in databases. When each record holds only unique data, the normalised stage may be reached. —Preceding unsigned comment added by 210.212.230.196 (talk) 09:52, 18 December 2010 (UTC)
- Normal forms are defined as conditions; you can simply check whether the required conditions hold to know whether a database schema is in a particular normal form. Rp (talk) 23:38, 26 December 2010 (UTC)
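Rp's point that normal forms are checkable conditions can be made concrete. The sketch below (not from this discussion; the relation and dependencies are invented for illustration) computes the closure of an attribute set under a set of functional dependencies and uses it to test the BCNF condition: every nontrivial FD's determinant must be a superkey.

```python
# Sketch: mechanically checking a normal-form condition.
# FDs are pairs (lhs, rhs) of frozensets of attribute names.

def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is covered, its dependents join the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_bcnf(all_attrs, fds):
    """BCNF holds iff every nontrivial FD X -> Y has X as a superkey."""
    return all(rhs <= lhs or closure(lhs, fds) == all_attrs
               for lhs, rhs in fds)

# Invented example: R(student, course, instructor) with
#   {student, course} -> instructor   and   instructor -> course
attrs = frozenset({"student", "course", "instructor"})
fds = [(frozenset({"student", "course"}), frozenset({"instructor"})),
       (frozenset({"instructor"}), frozenset({"course"}))]
print(is_bcnf(attrs, fds))  # False: 'instructor' is not a superkey
```

This is only a schema-level check; as Rp notes above, discovering dependencies that were never declared is the hard part.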
Trade-off
The article does not explain the trade-offs suffered with normalisation. For instance, a highly normalised database needs more tables each stripped to the bare minimum. So serving a singular query would require pulling data from several tables. This costs time and money. In a way, the business process is not optimised (though the database is!). Anwar (talk) 13:18, 22 May 2008 (UTC)
- Do you have an article or a computing paper where this is explained, so we can add it better to the article? --Enric Naval (talk) 11:43, 24 May 2008 (UTC)
- Normalization, and thus pulling data from several tables, doesn't cost significantly more time. In fact, a database that is not normalized will take much more CPU time and disk I/O to process, requiring more resources as query load increases. Then your business will either need to purchase more hardware, or re-architect its database and code base, and trust me, that costs a lot more. Common sense always dictates that you "Do it right the first time" 71.231.132.75 (talk) 14:38, 24 December 2009 (UTC)
- A general guideline is that transactional databases (intended for high-concurrency, high-volume interactive updates, inserts and deletes to the data - such as databases used for E-commerce transactions) SHOULD be maximally normalized, because if you don't, changes to certain data elements have to be made in two (or more) places. Denormalization is normal practice for data marts/warehouses, which are read-only when operated interactively (their contents are generated by scripts from a transactional database) to improve query performance. Sometimes, de-normalization can be extreme, as when precomputing aggregates at various levels of hierarchy - e.g., grand sales total, total by division, region, salesperson, product, etc. Prakash Nadkarni (talk) 01:20, 11 May 2020 (UTC)
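The precomputed-aggregate denormalization Prakash describes can be sketched with SQLite. All table and column names below are invented for illustration; a real warehouse load job would build such summary tables by script from the transactional database, exactly as the comment above says.

```python
# Sketch of warehouse-style denormalization: precomputing aggregates
# (grand total and per-region totals) into read-only summary tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sale (region TEXT, salesperson TEXT, amount REAL)")
con.executemany("INSERT INTO sale VALUES (?, ?, ?)", [
    ("East", "Alice", 100.0), ("East", "Bob", 50.0), ("West", "Carol", 75.0),
])

# Script-generated summary tables, as a data-mart load would create them:
con.execute("""CREATE TABLE sales_by_region AS
               SELECT region, SUM(amount) AS total FROM sale GROUP BY region""")
con.execute("""CREATE TABLE sales_grand_total AS
               SELECT SUM(amount) AS total FROM sale""")

# Reports now read the summaries instead of re-aggregating every sale.
print(con.execute("SELECT region, total FROM sales_by_region "
                  "ORDER BY region").fetchall())   # [('East', 150.0), ('West', 75.0)]
print(con.execute("SELECT total FROM sales_grand_total").fetchone()[0])  # 225.0
```

The trade-off is the one discussed in this thread: the summaries go stale the moment the transactional data changes, which is why this pattern suits read-only marts rather than high-concurrency transactional databases.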
Denormalization
The statement, "It has never been proven that this denormalization itself provides any increase in performance, or if the concurrent removal of data constraints is what increases the performance." needs more backup on that argument or needs to be revised or removed. Volomike (talk) 12:53, 17 January 2009 (UTC)
- It's wrong. Denormalization doesn't remove data constraints. It just means that instead of maintaining two tables, you maintain their join. This can be faster if you often need that join. Rp (talk) 21:13, 1 July 2009 (UTC)
- PS the article should bother to explain this. Rp (talk) 21:19, 1 July 2009 (UTC)
- Agreed. 99.60.1.164 (talk) 17:13, 22 August 2009 (UTC)
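Rp's point above can be illustrated directly: denormalizing two tables means storing their join, which changes maintenance cost rather than removing constraints. The schema and data below are invented for the sketch.

```python
# Sketch: a normalized customer/orders pair versus the stored join.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
                     amount REAL);
INSERT INTO customer VALUES (1, 'Jones'), (2, 'Wilkins');
INSERT INTO orders VALUES (10, 1, 9.99), (11, 1, 5.00), (12, 2, 3.50);
""")

# Normalized: the join is recomputed on every query.
joined = con.execute("""SELECT o.order_id, c.name, o.amount
                        FROM orders o JOIN customer c USING (cust_id)
                        ORDER BY o.order_id""").fetchall()

# Denormalized: the same join stored once. The customer name is now
# repeated per order, so renaming a customer must touch every matching row.
con.execute("""CREATE TABLE orders_denorm AS
               SELECT o.order_id, c.name, o.amount
               FROM orders o JOIN customer c USING (cust_id)""")
stored = con.execute("SELECT order_id, name, amount FROM orders_denorm "
                     "ORDER BY order_id").fetchall()

print(joined == stored)  # True: same rows, different maintenance cost
```

Whether the stored join is faster depends on how often the join is queried versus how often the underlying rows change, which is the trade-off the thread above asks the article to explain.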
Request for normalization
Please help with http://strategy.wikimedia.org/wiki/Proposal:Assessment_content and http://strategy.wikimedia.org/wiki/Proposal_talk:Assessment_content#Normalization_of_assessment_items_.28questions.29_in_database 99.60.1.164 (talk) 01:37, 23 August 2009 (UTC)
- Please note that contains a list per question with each element containing two timestamps, so this is definitely a 6NF-level problem. An easier, and perhaps more important, sub-requirement is the review system in , in particular, this example outlines a sub-schema related to
- a selection of text or a url (to a permanent article version or diff, etc.) could be an item for which multiple randomly-selected reviewers, chosen for their stated familiarity with a topic area, would be selected. Those reviewers could be shown that text or url (perhaps as part of a list of such items) in a securely authenticated and captcha-ed channel. They would be asked to vote on the accuracy of the item, and have the opportunity to fully explain their votes in comments. If a statistically significant number of votes are in agreement, then the item could be approved as to veracity or rejected. When the votes are not in agreement, then additional voter(s) would perform a tie-breaking function. Each voter's track record in terms of agreement with other voters could be recorded secretly and used to (1) weight their vote to nullify defective voters, and/or (2) decide whether the voter is allowed to become a tie-breaker.
- The fields required to support that need to be added to . 99.35.130.5 (talk) 18:00, 10 September 2009 (UTC)
ugly tables
this article has the ugliest tables i've seen —Preceding unsigned comment added by 67.187.187.128 (talk) 00:42, 27 November 2009 (UTC)
- You noticed that too? 71.231.132.75 (talk) 14:40, 24 December 2009 (UTC)
Elementary Key Normal Form
This article should include EKNF (given this name in 1982 by Zaniolo, in his paper "A new normal form for the design of database schemata"), a normal form which is stronger (more faithful to the principle of separation) than 3NF and which has the "complete representation" property proposed in 1976 by Philip A. Bernstein ("Synthesizing third normal form relations from functional dependencies") as a criterion for schema synthesis algorithms. MichealT (talk) 11:57, 7 March 2010 (UTC)
Normalization
In researching the area of normalization, one thing that I often see mentioned is the "insert" or "delete" anomaly. I don't understand how this is an anomaly; let me explain.
Assume a system intended to store information about students and registered courses. If a logical schema presented the following requirements:
-each student shall enroll in one or more courses and -each course shall have one or more students
Then one can legitimately arrive at a single table
(S#, SNAME, C#, CNAME)
which would face the update anomaly and redundancy.
But, the "insertion" anomaly and "delete" anomaly are a result of the statement of the logical requirements that state that no course may have zero students.
By changing the requirements to be:
-each student shall enroll in zero or more courses -each course shall have zero or more students
The supposed "insert" and "delete" anomalies may be eliminated.
In the original paper by Codd about normalization, I see reference to "redundancy" and attendant update anomalies but where did insert and delete anomalies come from?
130.215.36.61 (talk) 11:01, 16 September 2010 (UTC) amrith
- You are mistaken. The anomaly has nothing to do with requirements (whether courses without students may occur), but with expressiveness: in this table, courses without students cannot be represented, unless we introduce NULL student IDs and names (and NULL is notoriously difficult to deal with consistently). So one may argue that here, the purpose of normalization is to avoid NULLs. Rp (talk) 11:14, 16 September 2010 (UTC)
--Dqmiller (talk) 17:36, 14 December 2012 (UTC)
- Delete anomalies can occur for your simplistic table in a couple of ways. For example, if you delete the only student enrolled in a course, then you also delete the course. Another kind of delete anomaly occurs when a course has multiple students and a "delete course" operation does not catch them all. Neither of those can happen if the structure is better normalized, such that each course and each student are only represented once.
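The first delete anomaly described above can be demonstrated with the single (S#, SNAME, C#, CNAME) table from the original question. The student and course values below are invented for the sketch.

```python
# Sketch: the delete anomaly in a single student/course enrollment table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE enrollment (s_no INT, sname TEXT, c_no INT, cname TEXT)")
con.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?)", [
    (1, "Amy", 101, "Databases"),
    (2, "Ben", 101, "Databases"),
    (1, "Amy", 102, "Compilers"),   # Amy is the only student in Compilers
])

# Deleting the only enrolled student also deletes all record of the course:
con.execute("DELETE FROM enrollment WHERE s_no = 1 AND c_no = 102")
remaining = con.execute("SELECT DISTINCT cname FROM enrollment").fetchall()
print(remaining)  # [('Databases',)] -- 'Compilers' vanished along with Amy

# In a normalized design the course is represented once, in its own table,
# and survives the deletion of its last enrollment:
con.executescript("""
CREATE TABLE course (c_no INT PRIMARY KEY, cname TEXT);
INSERT INTO course VALUES (101, 'Databases'), (102, 'Compilers');
""")
print(con.execute("SELECT cname FROM course ORDER BY c_no").fetchall())
```

This also illustrates Rp's reply above: in the single-table design, a course with zero students simply cannot be represented without NULLs, regardless of what the requirements say.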
Non-repeating groups
If I understand "repeating groups" correctly, having no repeating groups sort of just means "don't have separate tables for stuff that could be grouped in a table with the same number of fields as the separate tables". Is that "more or less" correct? I kind of find the example confusing. It's currently presented as:
| Customer | Transactions |
|---|---|
| Jones | (nested group of Jones's transactions) |
| Wilkins | (nested group of Wilkins's transactions) |
| Stevens | (nested group of Stevens's transactions) |
I'm guessing that this is supposed to represent three different tables in the database that "each do the same thing". Would it be better if the tables were presented like this:
| Customer Jones Table | Customer Wilkins Table | Customer Stevens Table |
|---|---|---|
| (Jones's transactions) | (Wilkins's transactions) | (Stevens's transactions) |
Do you think this more clearly illustrates "repeating groups" than the present example? (And with less chance of confusion?) Jason Quinn (talk) 10:44, 17 June 2011 (UTC)
- I'm not sure I really understand what you intended to say, but the example with repeating groups is meant to be an example of "what not to do", i.e. an example of a relation that is not in first normal form. Think of a repeating group as an array within one row; in the example, there is a column that contains a variable number of transactions (intended to illustrate a clear-cut case, I presume). I am not sure if you are using the word 'table' in the sense of an array rather than representing a relation.--Boson (talk) 12:42, 17 June 2011 (UTC)
- I don't know if I understood the example correctly now. Are the tables given in the example a single table displayed in three separate sections or three separate tables? I was under the impression they were three separate tables. This now illustrates exactly why I think the current nested presentation is ambiguous and confusing. Jason Quinn (talk) 02:38, 18 June 2011 (UTC)
- As I understand the example, it shows (and must show) a single table with two columns (Customer and Transactions). Each Transactions "field" contains an "array" of individual transactions (possibly with a varying number of logical occurrences). I am using terms rather loosely. It may help to remember that we are really talking logical design, not physical implementation. --Boson (talk) 10:07, 18 June 2011 (UTC)
- If this is the case, the example is really terrible and ought to be completely replaced. It seems like most books and websites use the example of a field that contains multiple phone numbers as a table that is not in 1NF. Conceptually that is so much better that there's no comparison. Jason Quinn (talk) 09:50, 19 June 2011 (UTC)
- I think the current example shows the issue pretty well. 1NF is about avoiding nested records, i.e. not allowing a single field to contain a set of values. In his paper Codd himself describes normalizing to 1NF as "eliminate domains which have relations as elements" - i.e. eliminating nested tables. So an example with a nested table is a good example of what 1NF is supposed to eliminate.--80.62.117.218 (talk) 17:15, 8 June 2021 (UTC)
- I don't really see why phone numbers should be better than transactions. I don't know who created the example, but they may have wanted to indicate that a repeating group containing more than one attribute is more typical for a non-1NF database. For that reason an example with only one attribute is, perhaps, not very good and might make the uninitiated think of an (atypical) "repeating attribute". Can you think of a better way of illustrating a repeating group containing more than one (visible) attribute per occurrence?--Boson (talk) 14:41, 19 June 2011 (UTC)
- A similar example of normalization is contained in Date's An Introduction to Database Systems, but it is perhaps clearer because the sub-column headings are shown only once, in the heading row (immediately under Transactions), though, I suppose, it could be argued that that is not as good, because a relation includes the headings and we are effectively talking about relation-valued domains. I don't know if this is the case in later editions of the book; my copy is from 1990. By 2003, of course, Date was saying explicitly that attributes could be relation-valued. I am not expert enough to judge if there really is a contradiction between his earlier and later books, but we need to be careful about how we define 1NF and what examples we use. By the way, the current example seems to go back to a rewrite of the article in January 2009 by Nabav (talk · contribs).
- --Boson (talk) 16:17, 19 June 2011 (UTC)
- Hi, just to clarify: my intention in the Customers & Transactions example was to do what Boson described, i.e. depict a data structure in which each Customer is associated with a "package" of transactions. Any number of transactions can be contained within a given package.
- I would like to avoid the implication that the data structure as a whole is a table. Also I'd like to avoid the implication that the "package" of transactions is a table. Before Codd came on the scene, structures like these were implemented in, for example, hierarchical databases (rather than in relational tables, which didn't exist). One of the most fundamental characteristics of relational tables and 1NF is that they allow us to get by without complex structures like the one in the example.
- To avoid people thinking of the example in terms of tables, perhaps we could make the example look less tabular. This could be done by arranging the transactions for each customer HORIZONTALLY, rather than one on top of the other. --Nabav (talk) 13:34, 12 July 2011 (UTC)
--50.132.39.209 (talk) 05:02, 15 December 2012 (UTC)
- Strictly speaking, the comment "[this is] an example of a relation that is not in first normal form" is inconsistent because, by definition, a relation is always in 1NF. Not so a table, however, and I believe the intent was to point out that the table was not in 1NF and, therefore, not a relation.
- Moving on, I dispute the assertion that "repeating groups" is a 1NF violation in this example. In fact, the usual attention to "repeating groups" is not very helpful. I say that despite the fact that "eliminating repeating groups" is often cited as the first step of normalization. That may be relevant to legacy data structures that accommodate records with a variable number of fields, but it is not really applicable to a table structure that has the same number of columns in every row to begin with. A table, as we know it, is intrinsically free of repeating groups, making that consideration unequivocally moot.
- So, while I do agree the table is not in 1NF, "repeating groups" is hardly the reason. The table is not 1NF (and it, therefore, cannot represent a relation) only because it permits duplicate rows. Express a key on the Customer column and then the table satisfies 1NF. The fact that the Transaction column appears to have a table-valued datatype (or possibly even a relation-valued datatype) is interesting--perhaps even a provocative design--but is not a disqualification for 1NF.
- Much of the issue here seems to stem from the fact that the article does not actually define what a repeating group is, or if such a definition exists somewhere, it is not associated with the term "repeating group". In the Satisfying 1NF section, repeating groups are treated as enumerated attributes for storing multiple values of the same information. In the example discussed here, repeating groups are depicted as non-atomic attributes, attributes that contain several different pieces of information as one value; and they are also depicted as multi-valued attributes, attributes that for a given primary key return multiple values. The Transaction attribute contains three pieces of information, Transaction ID, Date, and Amount, but for some Customers, it also contains multiple values of those three pieces of information.
- The solution to enumerated attributes, as discussed in Satisfying 1NF, is to move that information into a new table where each value can be its own row that references this table. The solution to multi-valued attributes is the same, as they are essentially the same problem. The solution to non-atomic attributes is to split the attribute into its atomic constituent values, each with its own attribute. Ultimately, the article refers to non-atomic and multi-valued attributes together ambiguously, as well as to enumerated attributes separately.
- Other sources have a similar lack of clarity, with some describing this as a partial dependency; surely that, at least, is incorrect. 68.54.2.135 (talk) 00:11, 26 March 2020 (UTC)
- It is not correct that a relation by definition is in 1NF. In Codd's paper he shows how to perform 1NF normalization on a relation, i.e. the data was a valid relation already before it was normalized to 1NF. "Repeating group" is an old term from hierarchical database systems which just means nested records. Codd explains that repeating groups are equivalent to what he more formally calls "non-simple domains", i.e. attributes which do not contain single values but sets or relations. In other words, nested tables.--80.62.117.218 (talk) 17:15, 8 June 2021 (UTC)
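The normalization to 1NF debated in this section can be sketched without a database at all: the non-1NF structure is a nested "package" of transactions per customer, and normalizing it means flattening to one row per transaction. Customer names follow the example above; the transaction IDs, dates, and amounts are invented.

```python
# Sketch: flattening a repeating group (nested transactions) into 1NF rows.

nested = {
    "Jones":   [("Tr-1", "2011-06-01", 12.00), ("Tr-2", "2011-06-03", 5.50)],
    "Wilkins": [("Tr-3", "2011-06-02", 8.25)],
    "Stevens": [("Tr-4", "2011-06-04", 3.00), ("Tr-5", "2011-06-05", 7.75),
                ("Tr-6", "2011-06-07", 2.40)],
}

# 1NF: one flat row per transaction, every field a single atomic value.
flat = [(customer, tr_id, date, amount)
        for customer, transactions in nested.items()
        for tr_id, date, amount in transactions]

for row in flat:
    print(row)
# The Customer value repeats across rows, but no field holds a set of
# values any more, which is what "eliminating repeating groups" means here.
```

This mirrors Codd's "eliminate domains which have relations as elements" as quoted above: the nested Transactions value is replaced by ordinary atomic columns.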
The Design Exercise
- From where I stand, there is an important aspect of Data Normalization that is completely missing in this article. That aspect is the use of these procedures in the design process of any database - or even for the evaluation of how well classes have been partitioned in object-oriented coding.
- I have been programming professionally since 1971 and have found these NF definitions useful since I have started using them - decades now.
- The point of DN is not that it is an implementation goal, but that it is a powerful design tool. Any database, regardless of how it is to be implemented, should be fully normalized (at a minimum, 3NF) during the design process, because that discipline reduces the chances that the purpose behind the data fields is missed. DN forces the designer to ask the right kind of questions and collect the required data so that the system, once implemented, will perform productively.
- In the case where an existing system is simply being automated, the users of that system, when interviewed, will describe the data fields in the common tongue - very loose terms that barely touch on the purpose of a field. Only on careful investigation can keys and dependencies be determined, and once determined they will commonly result in discoveries that lead to more user interviews. As that information is collected, it needs to be recorded, and the "common tongue" is not good enough. The simplest comprehensive form for describing the database at this stage in the design process is in terms of a normalized database.
- Once the database has been documented in this form, the next stage of the design process is deciding how it will be implemented. At this point the designer needs to consider precisely how and when backups and restores will be done, whether the hardware will support the various query and update transactions, etc. At this stage, the database implementation will be described and, except for rare or trivial cases, substantial denormalizations will be applied.
Scott Bowden (talk) 16:08, 9 February 2012 (UTC)
- I would say that probably belongs mainly under Data model, Data modeling, Entity-relationship model, etc., but there doesn't appear to be much in the way of linking to those articles. Perhaps a short section here with a hatnote pointing to those articles? Adding links wouldn't hurt, anyway. --Boson (talk) 20:42, 9 February 2012 (UTC)