Sunday, 10 January 2010

Handle SuperTypes and SubTypes while converting LDM to PDM.Transition from Logical Data Model to Physical data Model?

Super Types and Subtypes
The final issue is really just a matter of choice – and knowing what the choices are, as well as their pro’s and con’s.

Super and Sub Types (i.e. inheritance) are a very powerful logical modeling construct. Most business users are more than capable of assisting with construction of the logical data model – because it’s really nothing more than a static picture of their data at rest. As one accounting manager said to me once during a business user interview, “I know my own business whether it’s painting on cave walls or this format.”

So let’s take a very simple example – bank accounts can be either savings or checking accounts (but not both at the very same time) – yet all accounts share some common characteristics or attributes. Here’s a simple example of this:

You have a Super Type entity as Account and two subtype entities as Checking and Saving Account.

So how does one physically implement this construct in the physical data model – and why would one choose one way versus another? Now this is one place where the DBA should choose a direction and make a critical design implementation choice. Let’s now examine the three options:

Option #1 - One Big Table

This is probably the most common choice.

•Generate Parent = Y and Generate Children = N
•Requires Discriminator attribute (e.g. Account Type)
•Violates third normal form (… nothing but the key …)
•PRO: Easy to code against, just one big table …
•CON: All child columns optional, so need table check constraint

Option #2 - Table per Sub-Type

This is another common and simple alternative.

•Results in N-1 tables
•Gen. Parent = N, Gen. Children = Y, Inherit All Attributes = Y
•PROS: All child columns implemented as expected
•CON: Two tables to code against

Option #3 - Table per Super and Sub-Type

This choice is valid – but has some issues related to complex application coding and resulting performance (due to excessive joins).

•Results in N tables
•Gen. Parent = Y, Gen. Children = Y, Inherit Only Primary Attr. = Y
•NOT RECOMMENDED: Just Plain Overkill

Which should you use? That’s up to you. But you do need to understand the three alternatives – because there is not a one-size-fits-all answer that works every time.

Also note that not all data modeling tools support these choices the same way. Most data modeling tools fail to create the complex constraints and triggers required by the first two alternatives – so your DBA will have to compensate and add the missing database logic.

Thursday, 17 April 2008

Steps to convert Logical Data Model to Physical Data Model

Assuming that the logical data model is complete, though, what must be done to implement a physical database?The first step is to create an initial physical data model by transforming the logical data model into a physical implementation based on an understanding of the DBMS to be used for deployment.
To successfully create a physical database design you will need to have a good working knowledge of the features of the DBMS including:
  • In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects.
  • Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects.
  • Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used.
  • Knowledge of the DBMS configuration parameters that are in place.
  • Data definition language (DDL) skills to translate the physical design into actual database objects.

Armed with the correct information, you can create an effective and efficient database from a logical data model. The first step in transforming a logical data model into a physical model is to perform a simple translation from logical terms to physical objects. Of course, this simple transformation will not result in a complete and correct physical database design – it is simply the first step.

The transformation consists of the following:

  • Transforming entities into tables
  • Transforming attributes into columns
  • Transforming domains into data types and constraints

To support the mapping of attributes to table columns you will need to map each logical domain of the attribute to a physical data type and perhaps additional constraints. In a physical database, each column must be assigned a data type. Certain data types require a maximum length to be specified. For example a character data type could be specified as CHAR(25), indicating that up to 25 characters can be stored for the column. You may need to apply a length to other data types as well, such as graphic, floating point, and decimal (which require a length and scale) types.

But no commercial DBMS product supports relational domains. Therefore the domain assigned in the logical data model must be mapped to a data type supported by the DBMS. You may need to adjust the data type based on the DBMS you use. For example, what data type and length will be used for monetary values if no built-in currency data type exists? Many of the major DBMS products support user-defined data types, so you might want to consider creating a data type to support the logical domain, if no built-in data type is acceptable.

In addition to a data type and length, you also may need to apply a constraint to the column. Consider a domain of integers between 1 and 10 inclusive. Simply assigning the physical column to an integer data type is insufficient to match the domain. A constraint must be added to restrict the values that can be stored for the column to the specified range, 1 through 10. Without a constraint, negative numbers, zero, and values greater than ten could be stored. Using check constraints you can place limits on the data values that can be stored in a column or set of columns.

Specification of a primary key is an integral part of the physical design of entities and attributes. A primary key should be assigned for every entity in the logical data model. As a first course of action you should try to use the primary key as selected in the logical data model. However, multiple candidate keys often are uncovered during the data modeling process. You may decide to choose a primary key other than the one selected during logical design – either one of the candidate keys or another surrogate key for physical implementation. But even if the DBMS does not mandate a primary key for each table it is a good practice to identify a primary key for each physical table you create. Failure to do so will make processing the data in that table more difficult.

There are many other decisions that must be made during the transition from logical to physical. For example, each of the following must be addressed:

  • The nullability of each column in each table
  • Character columns (should fixed length or variable length be used?)
  • The DBMS (Should it be used to assign values to sequences or identity columns?)
  • Implementing logical relationships by assigning referential constraints
  • Building indexes on columns to improve query performance
  • Choosing the type of index to create: b-tree, bit map, reverse key, hash, partitioning, and so on
  • Deciding on the clustering sequence for the data
  • Other physical aspects such as column ordering, buffer pool specification, data files, denormalization, and so on.

Links to my blogs related to Data Modeling.
DataModeling in Telecom
Steps to convert Logical Data Model to Physical Data Model
Application of Shared Information Data Model to create Common Data Model and Data Service
Sandeep Pandit Data Designer's personal website