Wednesday, February 26, 2014

SQL vs NoSQL

In this post I will differentiate SQL and NoSQL systems on basis of some features:

Scale to 1000s: This tells us about the scalibility of the system. It shows if the system can be scaled to 1000 or more machines. RDBMS cannot are not scalable. On the other hand, almost all NoSQL systems can be scaled to 1000 machines or more.

Joins/Analytic: It means combining one or more records from different data source (I am generalizing the definition of joins as NoSQL database also include flat files, JSON query etc. So, just saying tables won't suffice). if is probably one of the most important things in relational systems. RDBMS can be joined efficiently. But, the same is not the case with NoSQL. They are inefficient if joins are required and can take exponential time to execute query.

Integrity Constraint: By definition "Database Integrity ensures that data entered into the database is accurate, valid and consistent." Referential Integrity means requiring the existence of a related row in another table.
This is maintained in SQL database. But, NoSQL database are not consistent the moment they are updated. But, most of them are consistent after a period of time. For ex: in Facebook when someone updates his status then it is not immediately reflected thought the world, but it takes some time to get reflected.

References: www.wikipedia.org
Scalable SQL and NoSQL Data Sources by Rick Cattell



Thursday, February 13, 2014

ER Model for Star Schema - Part 2

In last blog I introduced a new construct called dimension (Dm). In this post I will go in detail of it.

I would first explain some existing constructs available and why we can't use them.
1.) Aggregation: An aggregate entity instance is a collection of entities from one or more entity class.
For ex: Attendants, residents and nurses can be combined to form aggregate entity called Medical Team.
We can't use this approach to model star or snow-flake schema because fact table is not collection of one or more dimensional class.
2.) Composite Class: A composite entity instance is composed of a set of members from another entity class.
For ex: combination of different types of employees form employee type.
We can't use it because dimension table is not specialization of fact table. 
3.) Typing and Instantiation Classes: This is generally used when we have to model instances of an entity type.

So, as we can see there is no existing construct which can help us model ER Diagram efficiently and so we need a new construct.

Now, let's have a look at an example of star schema modeled using Dimensional (Dm) approach.

If we create ER Model for it, district and region will be joined to market using Dm (Dimension) construct. Fact table will be modeled as weak entity. Following is the ER diagram for it:


If required we can show cardinality in the ER-Digaram.

Degenerate Dimensions: We can model degenerate dimension as a normal entity with only primary key. As, fact table is weak entity so the primary key of degenerate dimension becomes fact table's foreign key.

References: 
"The Relation of Star Schema into Entity-Relation Diagrams" by Dr. Krippendorf and Dr. Song
ER Diagram Slides by Dr. Currim

Tuesday, February 4, 2014

ER Modelling for Star Schema

ER Modelling for Star Schema

I would first compare two approaches of ER modelling for Star along with deficiencies in each of them. Then I would present with another way of modelling star schema.

First Approach: The first approach is taken from paper "From ER Models to Dimensional Models." by Dr. Moody and Dr. Kortink.
They suggested to create 3 new entities:
1.) Transaction entity: it is used to represent Fact Tables
2.) Component Entity: it is used to represent Dimension Table
3.) Classification Entity: it is used to represent tables related to Dimensional table.
They created ER Diagram using these 3 entities and used notation similar to crow foot.
Shortcoming of this approach: A completely new model for ER diagram.

Second Approach: This approach is taken from paper"The Relation of Star Schema into Entity-Relation Diagrams" by Dr. Krippendorf and Dr. Song.
It uses the existing ER framework to model star schema.
Dimensional tables are converted into entities and Fact tables are either represented as relations or as weak entity.
Shortcoming of this approach: 1.) Dr. Teorey in paper "A Logical Design Methodology for Relational Databases - Using the Extended Entity" proves that , only entities can have attributes. So, we Fact Tables cannot be represented as relations because Fact Tables have attributes.
2.) It does not give us the detail of dimension. For ex: a dimension market with attributes Market Description, district and region would be depicted as single entity. This loses the detail that district and region were essentially different entities.

My Approach: Now, I would present my approach which aims at solving problems we saw in above two approaches:
Broadly speaking ER model of star schema can be represented as n-ary relationship with many dimension connected to single fact using this relationship.

ER representation of Fact Table:
1.) Fact Table should always be converted to a weak entity.
2.) If fact table has own primary key, it should be shown as partial identifier in ER Diagram.
3.) Foreign Keys from dimensional tables are not shown in the Fact entity.
For ex:

In figure shown on left, SKU_ID and Store_id are foreign key from dimensional table which will be shown as primary key of dimension entity.
Date, receipt_nbr and Receipt_line_Item are primary key for fact table.





                                                                                                                           



ER Representation:

 As, seen in the ER diagram Receipt Item, Receipt Number and date are shown as partial identifier while keys from dimensional table are not shown.






Before modelling dimensional table, I will discuss super class, subclass and classification tables.

Super Class and Sub Class: In ER diagram, sub class inherits all the attributes from super class. For ex:

In figure on left manager and associates will inherit employee's attributes like: name and address.












Classification Tables: Tables that form dimension tables are called classification tables. For ex: in figure shown below, district and region would be classification tables and market would be dimension table:                                                                                                                                                                                                                                        

ER Modelling for Dimensional Tables:
I would propose new construct called Dm (Dimension) to model dimensional tables. While creating ER diagram for dimension tables, classification tables are linked using the construct Dm to dimension tables. This can be compared to how super class was linked to subclass using P.

Classification tables transfers all the non-key attributes to dimension table in the same way as super class passed it's attributes to subclass.

Below is ER Representation of Market Dimension:


In this example Region name and district name
will get transferred to market table.


The arrow points to the dimension table.









As, seen the only new construct introduced in my approach called Dm. Apart from that my approach to model ER diagram uses traditional ER modelling technique.

In this article I just showed basic ER modelling. I might be covering this in more details in coming blogs with some examples.

References:
"The Relation of Star Schema into Entity-Relation Diagrams" by Dr. Krippendorf and Dr. Song.
"From ER Models to Dimensional Models." by Dr. Moody and Dr. Kortink
"A Logical Design Methodology for Relational Databases - Using the Extended Entity" by Dr. Teorey, Dr. Yang and Dr. Fry
The Data warehouse toolkit by Dr. Kimball and Dr. Ross
ER Diagram Slides by Dr. Currim