This month's installment covers the logical design of a sample database. This database will be a very simple one, as it is intended to be a training tool and not a database for a finished system.
What is a database? The term database, as I am using it in this article, means a collection of one or more files that solves a particular problem. Unfortunately, most programmers who learned file design on a flat file system tend to design a file at a time. With the availability of relational database technology, it is important to start designing databases instead of just files. A database is composed of files, but it is much more than that. It's a structured repository that captures information concerning the entities of an organization. This repository is composed of files, the relationships between these files, and integrity rules for the information that can be stored.
The Scope
The first step in designing our sample database is to define the scope of our database; that is, the information our database will contain. The easiest way to do this is to write a simple statement of the purpose of our database:
The editorial department needs to keep accurate and timely information concerning the articles that are currently being considered for publication in an issue of the magazine. The information concerning the author of the article must also be kept.
This general statement gives us some boundaries to begin working in. Although this step may seem to be so elementary that you may be tempted not to do it -- do it anyway. Database design is no different than any other structured design methodology; it's a top-down design approach. Start at a very broad level and then gradually decompose the problem until a detailed solution is reached.
Entity Relationships
The next step is to establish the entities that will make up the database, and define their relationships. This will further refine the scope of our database. The easiest way to implement this step is to use entity- relationship (E-R) diagrams.
An E-R diagram shows the main groupings of data as boxes and a simplified version of the relationships between these groups as diamonds. 1 shows a sample E-R diagram for our sample database. In plain English, the chart says that an AUTHOR WRITES an ARTICLE that is then SCHEDULED to be published in an ISSUE.
An E-R diagram shows the main groupings of data as boxes and a simplified version of the relationships between these groups as diamonds. Figure 1 shows a sample E-R diagram for our sample database. In plain English, the chart says that an AUTHOR WRITES an ARTICLE that is then SCHEDULED to be published in an ISSUE.
The E-R diagram offers many advantages. The diagram can be understood by both technical and non-technical personnel. This allows user involvement to ensure that important details haven't been forgotten. The E-R diagram does not depend upon any particular system for the physical implementation of the design. As a result, this design technique may be used for any database management system, including a flat file system.
After the E-R diagram is complete, the definition of the scope of the database is complete. As the design progresses, other entities may arise that were not included in the original diagram and you should make sure that the diagram is updated since this diagram is an important tool for the maintenance of the database. The diagram provides a quick and easy overview of what is going on in the database. Keep in mind that sometime in the life of the database, you or someone else will be required to modify the structure of the database.
Defining the Objects
The next step is to define the objects that will make up the database. These objects are the entities (boxes) from the E-R diagram. The definition of the object is a complete statement of the purpose of the entity it represents.
This specification includes a description of its' purpose, properties, necessary data, and the rules that define the integrity of the object. The object specifications for the AUTHOR, ARTICLE and ISSUE objects are shown in 2. At this step, these only include the purpose and properties of the objects. Before we can define the data of an object and the integrity rules, we must first define the data dictionary.
This specification includes a description of its' purpose, properties, necessary data, and the rules that define the integrity of the object. The object specifications for the AUTHOR, ARTICLE and ISSUE objects are shown in Figure 2. At this step, these only include the purpose and properties of the objects. Before we can define the data of an object and the integrity rules, we must first define the data dictionary.
At this stage of the design process, the main objective is to further refine the scope of our database. The definition of the object at this stage should give the overall purpose of the information that the object will hold and the properties (types of information) that the object will contain. In 2 the AUTHOR object's purpose is to hold information about an individual who writes an article for publication in the magazine. The properties of the object (the type of information contained in the object) are where to contact the author, technical skill level of the author and the writing experience of the author. This information will be used as the input to the next step, the data dictionary.
At this stage of the design process, the main objective is to further refine the scope of our database. The definition of the object at this stage should give the overall purpose of the information that the object will hold and the properties (types of information) that the object will contain. In Figure 2 the AUTHOR object's purpose is to hold information about an individual who writes an article for publication in the magazine. The properties of the object (the type of information contained in the object) are where to contact the author, technical skill level of the author and the writing experience of the author. This information will be used as the input to the next step, the data dictionary.
Data Dictionary
A data dictionary is a definition of all of the individual pieces of information that a database must contain in order for it to satisfy its purpose. It's important that data elements be defined in their own right and not as part of a particular object. This ensures that each element can stand on its own.
A data dictionary should, at the least, define the following for each element.
* Name of the data element -- this name should not be limited by the restrictions of the method of implementation (i.e., six-character names in RPG).
* A description that fully describes the purpose of the data element. The type of data, such as text, numeric or date.
* What makes the contents of the data element valid. For example, amount can't exceed $200.
3 shows an example of some of the elements of the data dictionary for our sample database. At this point the data elements are not part of any particular object, they are separate entities. This is very important to remember when defining the criteria for a valid field. The validity of a field in the data dictionary can't depend on any particular object, such as employee number must exist in the employee master file. The data elements defined in the data dictionary are generic fields, they may be used in more than one object. This ensures that every time a data element is used in the database, it will be consistent with all other occurrences in the database.
Figure 3 shows an example of some of the elements of the data dictionary for our sample database. At this point the data elements are not part of any particular object, they are separate entities. This is very important to remember when defining the criteria for a valid field. The validity of a field in the data dictionary can't depend on any particular object, such as employee number must exist in the employee master file. The data elements defined in the data dictionary are generic fields, they may be used in more than one object. This ensures that every time a data element is used in the database, it will be consistent with all other occurrences in the database.
There are a few things to remember when defining a data dictionary. Each item must be atomic. This means that each element must not be able to be broken down any further. For example, instead of defining an address data element, you should define street, city, state and zip data elements. Data elements can always be combined as needed.
How atomic is atomic? In some situations, an area code will be included with the phone number, in others it may be a separate data element depending upon the application(s) for which the database is intended. As a general rule, it is better to break an element down as far as possible.
A data element must not be a multiple-occurring element. It can't be an array or table. For example, if you need to keep track of monthly balances, don't define an array of 12 items -- instead define a month data element and a balance element. The main reason for this is that it will not allow a database to be in third normal form.
Integrity Rules
An integrity rule is a definition of the relationship between a data element and its contents; a data element and another data element; a data element and an object; or between two or more objects. 4 shows some of the integrity rules for our sample database.
An integrity rule is a definition of the relationship between a data element and its contents; a data element and another data element; a data element and an object; or between two or more objects. Figure 4 shows some of the integrity rules for our sample database.
Integrity rules should cover all possible situations. Be thorough, make sure you have all the bases covered. Each rule must include a definition that is made up of a totally clear set of conditions that must be met before the rule can be satisfied. Don't make the mistake of saying "Oh, well that's just understood." These integrity rules are the basis for an accurate, flexible database.
The E-R diagram gives us our first set of integrity rules. The relationships (diamonds) are integrity rules. At a broad level, these relationships define how, and in what manner, objects interact with each other.
Samples of other integrity rules are as follows; an hourly wage can't be less than $5.00 or more than $50.00; a salesman must be employed by the company at the time he makes a sale; for an order to be placed, the customer must be a current customer; an employee's termination date can't be prior to hire date.
Putting It All Together
It is now time to take the data elements we described in the data dictionary and assign each of them to an object. This is where database normalization comes in. 5 lists the different levels of database normalization and their definitions. In this sample database, we will only concern ourselves with the third normal form. This is because when using existing database technology, it's usually not possible or practical to go beyond this level.
It is now time to take the data elements we described in the data dictionary and assign each of them to an object. This is where database normalization comes in. Figure 5 lists the different levels of database normalization and their definitions. In this sample database, we will only concern ourselves with the third normal form. This is because when using existing database technology, it's usually not possible or practical to go beyond this level.
The first step is to assign each of the data elements to an object. The purpose and properties sections of the object design should give you a clear indication of where a data element belongs. For example, the author name is assigned to the AUTHOR object, not the ARTICLE object. This is because the main purpose of the AUTHOR object, not the ARTICLE object, is to track specifics concerning an author. The author ID is what will be placed in the ARTICLE object.
You will notice that the author ID is assigned to two objects. This is done to establish a relationship between two or more objects. The data element must be a primary key in one and only one of the objects and a foreign key in all of the others. A primary key is a data element that uniquely identifies one of the members of an object; e.g., an employee number. A foreign key is a data element that is the primary key of an object and is used by other objects as a method of linking one object to the other.
Now that we have assigned the data elements to the objects we must choose one or more data elements to act as the primary. These data elements must uniquely identify a row in this object. This restriction is a requirement for a normalized relational database. There may be more than one set of data elements that will uniquely identify a row, these are known as candidate keys. One of these candidate keys must be chosen as the primary key, usually the most concise one.
The final thing to do is to assign any of the integrity rules that concern the object to that object. There will be some integrity rules that don't belong to any one particular object, they apply to the entire database. Make sure that these integrity rules are kept along with the object definitions.
6 shows the final object definitions. These will be used when it comes to the physical design and implementation of the database.
Figure 6 shows the final object definitions. These will be used when it comes to the physical design and implementation of the database.
Advantages of the Logical Design Phase
The process just described offers the following benefits:
* You can involve the users in the design of the database. Since everything produced in this design phase is easily understood by nontechnical personnel, they can proof your design and then you can make any corrections before the physical design phase begins. It's the user's knowledge of the day-to-day workings of the organization that drives good database design.
* The design is based on the what not the how. This delays the real (or imagined) limitations of a database management system until the physical implementation. While you may find that while certain methods of physical implementation have severe limitations, with some ingenuity there are always alternatives.
* Flaws in the database become apparent before too much time and money are spent on the implementation of a poorly designed database.
* Future maintenance is made much easier.
The Physical Design
Next month, we will cover the physical design and implementation of the database we just designed. To show just how flexible you can be when you design before you code, I will show the implementation of the design using three different methods: DDS, IDDU and SQL.
Relational Database Design -- Part 2
Figure 1 Entity-Relationship Diagram (unable to reproduce)
Relational Database Design -- Part 2
Figure 2 Object specifications
Figure 2: Object Specifications Object Name: AUTHOR Purpose: An author is an individual that writes an article for possible publication in the magazine. Properties: Contact information Technical competence Writing competence Object Name: ARTICLE Purpose: An article is a story, program or any piece of information that is submitted for publication in the magazine. This can come from inside and/or outside sources. Properties: Description of article Status of article Object Name: ISSUE Purpose: An issue is a monthly edition of the magazine. Properties: Editorial information
Relational Database Design -- Part 2
Figure 3 Data dictionary
Figure 3: Data Dictionary Data Element: Author's name Description: Legal name of the author of an article. Data Type: Text, 25 Integrity rule: Can not be blank. Data Element: Home Street Description: Street portion of the home address of an author. Data Type: Text, 25 Integrity rule: Can not be blank. Data Element: Work phone number Description: Work phone number of an author. Data Type: Numeric, 10.0 Integrity rule: Can not be less than or equal to zero. Must include the area code. Data Element: Years of technical experience Description: The total number of years experience that an author has .had on a midrange system. Data Type: Numeric, 2.0 Integrity rule: Can not be less than or equal to zero. Data Element: Issue date Description: The date of an issue of the magazine. Data Type: Date Integrity rule: Can not be blank. Data Element: Article status Description: The current status of an article. Data Type: Text, 2 Integrity rule: Can not be blank. Must be AC (accepted), RJ (rejected) or SC (scheduled).
Relational Database Design -- Part 2
Figure 4 Integrity rules
Figure 4: Integrity Rules Name: ARTICLE-ISSUE Objects: ARTICLE, ISSUE Rules: An issue must exist before an article can be scheduled for it. Name: ARTICLE-AUTHOR Objects: ARTICLE, AUTHOR Rules: An author must exist before an article can be entered for him. Name: Author of an article Objects: ARTICLE Rules: All articles must have an author.
Relational Database Design -- Part 2
Figure 5 Normal forms
Figure 5: Normal Forms First Normal Form -- 1NF * Each column is atomic. This means that it can't be subdivided any further without its primary purpose being changed. * The same columns occur in each row. In other words, no multiple record type files. * Contains no multiple-occurrence columns -- no tables, arrays or pseudo- arrays (Month 1, Month 2, etc.). Second Normal Form -- 2NF * Every column that is not part of the primary key, must be functionally dependent on the entire primary key. Third Normal Form -- 3NF * There can't be any functional dependencies from a non-key column to any other column. All functional dependencies must be between non-key data items and a candidate key. Boyce -Codd Normal Form * No part of a key may depend on any non-key column. Fourth Normal Form -- 4NF * There can be only one multi-valued dependency in a relation unless all columns involved int the dependency can be combined to functionally determine some other column. Fifth Normal Form - 5NF * The relation cannot be recreated by joining two or more of its projections, each having a different primary key.
Relational Database Design -- Part 2
Figure 6 Completed object specifications
Figure 6: Completed Object Specifications Object Name: AUTHOR Purpose: An author is an individual that writes an article for possible publication in the magazine. Properties: Contact information Technical competence Writing competence Data Elements: Author ID Name Home street Home city Home state Home ZIP Home phone Work street Work city Work state Work ZIP Work phone Work extension Fax number Years of technical experience Main system Writing skill level Integrity Rules: Object Name: ARTICLE Purpose: An article is a story, program or any piece of information that is submitted for publication in the magazine. This can come from inside and/or outside sources. Properties: Description of article Status of article Data Elements: Article ID Article name Author ID Article description Article status Date in-house Issue date Integrity Rules: Author ID must already exist in AUTHOR. If article status is scheduled the issue date must exist in ISSUE. Object Name: ISSUE Purpose: An issue is a monthly edition of the magazine. Properties: Editorial information Data Elements: Issue date Theme Buyer's Guide Editorial due date Printer due date Integrity Rules:
LATEST COMMENTS
MC Press Online