MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Data Normalization

with 3 comments

Database normalization is the process of organizing data. There are a bunch of rules governing how you should do it, when you should undo it, and how you can’t do it. My hope is to lay out what normalization means in Texas English, which means clear and simple. I rely on you, the reader, to let me know when its not clear and how it can be more clear by treating this as a living document.

There are now, as of last year, seven or eight normal forms. They began when E.F. Codd first proposed the relational model in his A Relational Model for Large Shared Data Banks paper. There’s also a concept of Domain-Key Normal Form (DKNF). According to some, this belongs between fifth and sixth normal forms. DKNF comes to use by way of Ronald Fagin, in his A Normal Form for Relational Databases That is Based on Domains and Keys paper.

This blog page will try to cover and blend these two ideas to help gain perspective on how you may model your databases. Ultimately, if you’re very interested in the topic you should consider spending some money to buy Logic and Databases: The Roots of Relational Theory by C.J. Date.

Normalization Definitions

Summary

Database normalization attempts to organize data in such a way as to prevent SQL statements from creating insertion, update, or deletion anomalies. As a practice third normal form (3NF) is often considered normalized because most 3NF tables are free of insertion, update or delete anomalies. The key word is most, not all.

Therefore, normalization design attempts to achieve the Highest Normal Form (HNF) possible. A table is in HNF whether it meets or fails to meet any normal form definition. While there is no zero normal form, there is Unnormalized Normal Form (UNF). UNF means that a table contains one or more repeating groups. It is probably important to note that a table may be in UNF and HNF at the same time.

Normalization is the process of organizing data into tables that act as single subjects when acted upon individually or through external relationships. A single subject is also known as a domain. You act on data by querying it or transacting against it. You query data by writing a SELECT statement. You typically transact against data by writing an INSERT, UPDATE, or DELETE statement.

Terms and Definitions

Terms and definitions are very important in any topic. The normalization process has its share of terms. Here are a few that you should understand before reading this document.

 

Normalization Anomalies

Anomalies can occur at various levels of normalization but the majority occur when you have tables that are less than third normal form. The frequent types of anomalies are illustrated by referencing a second normal form table. You can find formal problem and solution resolution of this second normal form table later in the blog page.

Normalization Concepts

Normalization concepts show the conceptual or generalized idea and practice for normalization. The illustrations show you how to make columns atomic and to avoid repeating rows groups for first normal form. Then, they illustrate how to conceptually eliminate multiple subjects dependencies from tables beyond first normal form.

 

Normalization Examples

As much as possible, the examples follow a common problem set that I borrowed from my Oracle Database 11g PL/SQL Programming book. These examples, however, rely on setup code that you can find at the end of this blog page and not from the downloadable code found on the McGraw-Hill web site.

Setup Script

Written by maclochlainn

February 9th, 2009 at 12:11 am

Posted in Uncategorized

3 Responses to 'Data Normalization'

Subscribe to comments with RSS or TrackBack to 'Data Normalization'.

  1. Good content with good examples. Looking forward to get to know about 4 to 6 Normalized methods with clear illustrartions.

    RajkumarV

    15 Mar 09 at 4:56 am

  2. You have done a good work. I really appreciate your work. I wish you good luck

    Saman Withanage

    4 Jul 11 at 5:38 am

  3. The third normal form is mislabeled as (2NF)

    Daniel L

    27 Mar 12 at 10:18 am

Leave a Reply