Normalization

Normalization is the process of efficiently organizing data in a database. 
2 goals of normalization:
-  Eliminating redundant data (for example, storing the same data in more than one table
-  Ensuring data dependencies make sense (only storing related data in a table). 
Both of these reduce the amount of space a database consumes and ensure that data is logically stored.



First Normal Form (1NF)

A relation is said to be in First Normal Form (1NF) if and only if every attribute is single valued for each tuple. This means that each attribute in each row (or each cell) of the relation, contains only one value and the repeating groups (or multivalued) are not allowed. Similarly, entries in any column (attribute) must be of the same kind.

Ex: A table with column Manager, Subordinate1, Subordinate2, Subordinate3 is not in 1NF, because the Subordinate1-Subordinate4 columns are duplicate, but table with column Manager, Subordinate  is in 1NF



Second Normal Form (2NF)


A relation is in Second Normal Form (2NF) if and only if it is in the First Normal Form (INF) and all the non-key attributes are fully functionally dependent on the key attribute.
If a relation is in 1NF and the key consists of a single attribute, then the relation is automatically in 2NF. Because all the non-key attributes are fully functionally dependent on the key; there can be no partial dependency.

Ex: Table (Item, Colors, Price, tax) not in 2NF, because the price and tax depends on the item, but not color. Dividing Table (Item, Colors, Price, tax) into Table1(Item, Colors) and Table2 (Item, Price, Tax) will put it in 2NF.

Third Normal Form (3NF)


A relation is in Third Normal Form (3NF) if it is in Second Normal Form and no transitive dependency exists.

Third Normal Form can also be defined as: "A non-key attribute must not depend on any other non-key attribute of the relation", i.e. all non-primary key fields must be dependent only on primary key. In other words you can say that if the value of a non-key attribute can be obtained simply by knowing the value of another non-key attribute, the relation is not in 3NF.




Ex: Student (Student_name, Serial, Advisor_name, Advisor_phone) is not in 3NF, because Advisor's phone is dependent on Advisor which is not a primary key but Student(Student_name, Serial, Advisor_ID) and Advisor(Advisor_ID, Advisor_name, Advisor_phone) are in 3NF.


Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form is defined as: a relation is in Boyce-Codd Normal Form if and only if every determinant is a candidate key.
Ex: Student (StdID, Major, Advisor) is in 3NF, but not in BCNF, because StdID, Major -> Advisor and Advisor -> Major; here determinant Advisor is not a candidate key. To put the table into BCNF, it needs to be divided into Advisor(StdID, Advisor) and Major(Advisor, Major).


Fourth Normal Form (4NF)

A relation is in Fourth Normal Form (4NF) if and only if it is in Boyce-Codd Normal Form (BCNF) and has no multi-valued dependencies.
Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table. 


Ex:

SubjectInstructorTextbook
JavaA
B
Programming with Java by Deitel & Deitel.Complete Reference Java.
C + +C
D
Programming with C++ by Deitel & Deitel.Programming with C++ by 'A' & 'S'.
Let Us C++


Here we see, a single value of Subject may have multiple values of Instructor as well as multiple values of Textbook.

The multi-valued dependencies between the attributes of this relation are written as;
Subject →→ Instructor
Subject →→ Textbook

To remove the multi-valued dependency from the relation, we must divide the relation into two relations each one storing data for only one of the multi-valued attributes. For example, 'COURSE' relation can be divided into TEACHER' and 'BOOK' as:


TEACHER (SubjectInstructor)

BOOK (SubjectTextbook)
SubjectInstructorSubjectTextbook
JavaRJavaProgramming with Java by Deitel & Deitel.
JavaNJavaComplete Reference Java.
C + +SC++Programming with C++ by Deitel & Deitel.
C + +AC++Programming with C++ by 'A' & 'S'.
C++Let Us C++



Fifth Normal Form (5NF)

A relation is in Fifth Normal Form (5NF) if it is in Fourth Normal Form and does not have a join dependency. It is also known as Project-Join Normal Form (PJNF).
A relation that has a join dependency cannot be divided into two or more relations such that the resulting relations can be re-combined to form the original relation. For example, if a relation R(A, B, C) is decomposed into R1(A, B) and R2(B, C) relations, then a join dependency exists if we can get back to the original relation R byre-combining R1 and R2 relations.



Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.







No comments:

Post a Comment