The first thing we were taught was star schema is de-normalized.
In this article I will examine if star schema is really not at all
normalized. I will analyze star schema for each normal form:
First Normal Form: By definition a relation is in 1 NF if and only
if each tuple contains exactly one value for each attribute. 

In above figure, category consists of one or more brand which in turn consists of one or more products. In this example products determine brand which in turn determines category and hence there is transitive dependency.
Now, I would like to show that if we assume that star schema is in
3 NF and neglect transitive dependency for time being then it can be proven
that star schema is also in BCNF, 4 NF and 5 NF as shown below:
Boyce-Codd Normal Form (BCNF): By definition a relation is in BCNF
if and only if every determinant in the relation is a candidate key. 
So, using surrogate key we can uniquely
find out any row, but using non-key attribute we cannot determine key as surrogate key's
values are independent of non-key attribute values.
Fourth Normal Form: By definition 'A relation is in fourth
normal form if it is in BCNF and it contains no multi-valued
dependencies.
 A multi-valued dependency come when all the following conditions
are met:

 Sales dollars (unit price) are independent of sales units (quantity) purchased. Now, we can argue that there can be another column total price which is dependent on unit price and quantity. But, total price can be determined from unit price and quantity and does not add any new information. Moreover, it's a design preference. So, I have not taken it into account.
Now, the keys of fact table is a composite key consisting of keys from dimensional table. Primary key of dimensional table are independent of each other as they are just integers starting from 1. Hence, we do not have multi-value dependency and so fact table is in 4
NF.
Fifth Normal Form: This is based on join dependency. 
Dimension Table: Dimension table has surrogate keys and so we do not have join dependencies in them.
Fact Table: Fact table has composite key. But, each part of composite key is independent of each other as proved above and so it is also in 5 NF. 
So, we can say if transitive dependency is removed from a star
schema then it becomes normalized or in other words we can generalize the
statement by saying that snow flake schema is nothing but star schema with no
transitive dependency.
The main purpose of this article is to show that star schema is still normalized to some level. These were completely my views. Kindly leave comment on what you think about it. 
References: 
Normalization slides provided by Prof. Currim
In above figure, category consists of one or more brand which in turn consists of one or more products. In this example products determine brand which in turn determines category and hence there is transitive dependency.
So, using surrogate key we can uniquely find out any row, but using non-key attribute we cannot determine key as surrogate key's values are independent of non-key attribute values.
Fourth Normal Form: By definition 'A relation is in fourth normal form if it is in BCNF and it contains no multi-valued dependencies.
Sales dollars (unit price) are independent of sales units (quantity) purchased. Now, we can argue that there can be another column total price which is dependent on unit price and quantity. But, total price can be determined from unit price and quantity and does not add any new information. Moreover, it's a design preference. So, I have not taken it into account.
Now, the keys of fact table is a composite key consisting of keys from dimensional table. Primary key of dimensional table are independent of each other as they are just integers starting from 1. Hence, we do not have multi-value dependency and so fact table is in 4 NF.
Fifth Normal Form: This is based on join dependency.
Dimension Table: Dimension table has surrogate keys and so we do not have join dependencies in them.
Fact Table: Fact table has composite key. But, each part of composite key is independent of each other as proved above and so it is also in 5 NF.
Normalization slides provided by Prof. Currim
 
nice blog anirudh...
ReplyDelete