Today’s topic is normalization. If your database were a piece of dirty laundry, normalization would be a kind of wash-rinse-dry process. But in this case, shrinking wouldn’t be a problem.
Normalization, explained in the first TechTip of this series, is more than just a theoretical concept. It’s time to put it to work in your database.
Normalizing the Database
In this stage of the modernization process, it’s a good idea to take some time to revisit the database design, review the notes you took earlier, and have another look at the Entity Relationship Diagram you drafted. Then, follow the normalization process and update your logical database model’s tables so that they conform to at least the second normal form. The following steps are involved in this stage:
- Eliminate unnecessary columns from SQL tables—During normalization, some unnecessary columns will be eliminated or moved to other tables. Many tables contain columns that were added for some purpose but, over the course of business changes, are no longer used or were never used at all. This is the opportunity to identify and remove such columns. Be careful, and eliminate only those you’re absolutely certain are not required (a short sketch of this step appears after the last step in this list).
- Update the data dictionary—The data dictionary that you started a few steps ago, when you defined standard abbreviations for table names, can now be updated with some of the following:
- Object naming conventions
- Column naming conventions
- Function naming conventions
- Application naming conventions
- Standard abbreviations
- Establish data domains—This is the process of grouping columns with like attributes into classes, or domains. You can implement the data dictionary’s established domains as a field reference file, because the SQL CREATE TABLE statement can reference that file directly. This requires a small and not-very-obvious trick, so let me give you an example. Imagine that you have a field reference table named FRT that contains the definitions commonly used for a few data domains, such as:
- Percentages, defined in column PERC, assuming the percentage varies from 0.01% to 100.00%
- Coefficients, defined in column COEF, assuming the coefficient goes from 0.00001 to 1000.00000
- Names, defined in column NAME, as a 50-character column
- Descriptions, defined in column DESCRIPT, as a 250-character column
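For reference, the field reference table itself could be created along these lines. This is only a sketch: the DECIMAL precisions, defaults, and nullability below are assumptions derived from the ranges listed above, so adjust them to match your own domains:
CREATE OR REPLACE TABLE FRT
(PERC DECIMAL(5, 2) NOT NULL DEFAULT 0, -- percentages: 0.01 to 100.00
 COEF DECIMAL(9, 5) NOT NULL DEFAULT 0, -- coefficients: 0.00001 to 1000.00000
 NAME VARCHAR(50) NOT NULL DEFAULT '', -- names: 50-character column
 DESCRIPT VARCHAR(250) NOT NULL DEFAULT '' -- descriptions: 250-character column
);
With the domains centralized in a single table like this, every table you derive from it picks up the same data type and default for each domain.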
Now, let’s say I want to create a new table containing only columns defined in my field reference table. The respective CREATE TABLE statement would look something like this:
CREATE OR REPLACE TABLE new_table
(user_name, user_description, salary_coef, bonus_perc)
AS
(SELECT NAME, DESCRIPT, COEF, PERC FROM FRT)
WITH NO DATA;
Great, right? However, in real life, not everything comes predefined, so it’s highly probable that sooner or later (definitely sooner) you’ll come across a situation in which you need to create a column that doesn’t match any definition in your field reference table. This is where the trick I mentioned earlier comes in. When using this type of CREATE TABLE statement, you can’t mix “standard” and “select-like” column definitions, so you’re forced to tweak the SELECT statement to include the extra columns. The way to do it is simple if you’re familiar with SQL’s CAST function. Here’s a variation of the previous example, tweaked to include a couple of new columns (an INTEGER named employee_id and a very long VARCHAR named soft_skills):
CREATE OR REPLACE TABLE new_table
(user_name, user_description, salary_coef, bonus_perc, employee_id, soft_skills)
AS
(SELECT NAME, DESCRIPT, COEF, PERC, CAST(0 AS INTEGER), CAST('' AS VARCHAR(500)) FROM FRT)
WITH NO DATA;
It’s a bit more verbose, but it gets the job done. There are, however, some details worth mentioning: When you create a table via a SELECT statement, your new table inherits more than just the data types and sizes of the columns in the SELECT statement. It also gets each column’s nullability (NULL or NOT NULL) and, optionally, other characteristics such as column defaults, identity column attributes, implicitly hidden attributes, and even row change timestamp definitions. To “copy” these optional characteristics, you need to specify the respective INCLUDING clause. Let’s illustrate this by taking the previous statement and changing it to also copy the column defaults (note the last line of the statement):
CREATE OR REPLACE TABLE new_table
(user_name, user_description, salary_coef, bonus_perc, employee_id, soft_skills)
AS
(SELECT NAME, DESCRIPT, COEF, PERC, CAST(0 AS INTEGER), CAST('' AS VARCHAR(500)) FROM FRT)
WITH NO DATA
INCLUDING COLUMN DEFAULTS;
The other characteristics can be copied in the same way, with their respective INCLUDING clauses. In short, you can use a field reference table to enforce a certain uniformity across your database by defining the exact attributes of the most commonly used column types.
- Create or update the logical database model—The normalization process probably required creating some tables and changing others, which in turn created new relationships between tables. All of this must be documented thoroughly, so it’s a good time to create or update your ERD with the latest changes. Be sure to include the attributes and constraints you added when you migrated the DDS objects.
- Implement the model—Preferably using an automated tool such as IBM InfoSphere Data Architect, apply the changes made in the logical model to the physical model (the model closest to your actual database).
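Coming back to the first step in the list above (eliminating unnecessary columns), here’s a minimal sketch of how that cleanup might look. The table and column names (ORDERS, LEGACY_FLAG) are hypothetical, and the catalog query only tells you where a column is defined; you still have to confirm that no program, query, or interface references the column before dropping it:
-- Find every table that defines a column named LEGACY_FLAG (DB2 for i catalog view)
SELECT TABLE_SCHEMA, TABLE_NAME
FROM QSYS2.SYSCOLUMNS
WHERE COLUMN_NAME = 'LEGACY_FLAG';
-- Once you are certain the column is truly unused, remove it
ALTER TABLE ORDERS
DROP COLUMN LEGACY_FLAG;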
Now you’ll need to adjust some programs, just like you did in your two-file-conversion PoC, because of the new tables that were created during the normalization phase. This won’t be the last time you’ll have to change them in this process. The next big step requires you to change them again, but it’s worth it.
In the next TechTip, I’ll start discussing the second big step of the database modernization process: moving business rules to the database.