Saturday, February 26, 2011

3.Data Normalization

3.Normalization
Introduction:-
Database normalization is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. Normalization is a process of remove the redundancy and inconsistency

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

The purposes of normalization are to:
• Reduce or eliminate storage of duplicate data
• Organize data into an efficient and logical structure
The need of normalization
Normalization is needed in data modeling of any system because it helps to remove the following anomalies:
• Insert
• Delete
• Update
Case of a Construction Company
a.Building project -- Project number, Name, Employees assigned to the project.
b.Employee -- Employee number, Name, Job classification
c.Periodically, a report is generated.

First Normal Forms:- First Normal Form that deals atomicity.
The term first normal form (1NF) describes the tabular format in which:
a.All the key attributes are defined.
b.There are no repeating groups in the table.
c.All attributes are dependent on the primary key.
or
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database: Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Forms:- Second Normal Form that deal function dependence.
It is in 1NF and
It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key.
(It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes.)
or
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data: Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.
Meet all the requirements of the first normal form.

Third Normal Forms:-Third Normal Form that deals Transitive dependence.
A table is in 3NF if:
a.It is in 2NF and
b.It contains no transitive dependencies.
or
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further: Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

BCNF()
A table is in BCNF if every determinant in that table is a candidate key. If a table contains only one candidate key, 3NF and BCNF are equivalent.

How does data integrity differ with data security?
Data integrity refers to the consistency and accuracy of the data stored in a database. Data security has to do with ensuring the data is safe.
D ata integrity = making sure the data is correct and not corrupt
Data security = making sure only the people who should have access to the data are the only ones who can access the data. also, keeping straight who can read the data and who can write data.

Define the types of constraints in RDBMS with suitable example.
An integrity constraint is a rule that restricts the values that may be present in the database. Integrity constraints allow only correct changes to be made to a database. There are two types of integrity constraints - entity integrity and referential integrity.

Entity Integrity:
Entity Integrity ensures that for each row in a table, the value of the primary key is unique and is not null.
Referential Integrity : Referential Integrity ensures that for each row in a table, the value of the foreign key is present in the reference table.
Referential integrity consists of:
• Not inserting a record if the value of the foreign key being inserted does not match an existing record in another table with the primary key having the same value,
• Not deleting a record whose primary key is defined as a foreign key in child records and
• Not modifying the value of primary keys.


User-Defined Integrity
User-defined integrity refers to specific business rules not covered by the other integrity categories. It is typically implemented through triggers and stored procedures.


Types of constraints in RDBMS
a.Domain Constraints:
Domain constraints specify that the value of each attribute must be atomic value and each data must have same data format.

b.Key Constraints:-
No two tuples can have the same combination of values for all their attributes.

c.NOT NULL Constraints:-
This constraint prohibit the insertion of a NULL value for this attributes.

d.UNIQUE Constraints:
The value of attributes for each tuples must be unique. It may permit NULL value.

e.Primary Key Constraints:
It ensure that certain attributes or set of attributes must have unique value and also it cannot permits null value.

f.CHECK Constraints:
If ensure that each value of attributes must satisfy the given condition balance decimal(7,2) CHECK(balance >2000)

g.Foreign key Constraints:
Foreign key always points to primary key of same or other relation. It ensure that foreign key always points to void primary key. i.e. Value of foreign key must be in primary key.

Domain Constraints
Domain constraints are a user-defined data type which enforces the integrity of the standard data types. A domain consists of all values permitted in a column. In constructing your database you would first select a standard data type such as VARCHAR or INT. You could then constrain or limit the data using NOT NULL, UNIQUE, CHECK, PRIMARY KEY or FOREIGN KEY. For example the CHECK constraint could limit INT data within your domain [column] to numbers less than 5000. You would thus define a custom data type using domain constraints.



Data Dictionary
CREAT TABLE
{
:
:
}

Data dictionary store data that describes the data stored in the database. Such data is also known as metadata (Data of data). It typically list all tables, column, data domain, constraint and assu,ption. The data dictionary store following information
a. Name of the relation.
b. Name of attribute of each relation
c. Domain & length of attributes
d. Integrity constant.
e. Name of authorized user.
f. Authorized & accounting information.

1 comment:

  1. Thanks for sharing such an informative Article. It will be beneficial to those who seek information. Continue to share your knowledge through articles like these, and keep posting on

    Data Engineering Solutions 

    AI Solutions

    Data Analytics Services

    Business Intelligence Solutions

    ReplyDelete