Relational Database Design -- Part 1

Typography

Smaller Small Medium Big Bigger
Default Helvetica Segoe Georgia Times
Reading Mode

More and more, a company's operational and strategic work depends on data processing. It is now apparent that data is itself an important company resource. Since a company's information on its products, customers, suppliers, employees, competitors, and so on can provide a significant competitive advantage, we must start to manage data as a company resource, while at the same time improve the way it is handled relative to programming. It becomes clear that some sort of uniform system is needed. A relational database environment, with its supporting software and accompanying management techniques, provides the framework for treating data as a standardized, manageable, shareable resource.

The Good Old Days

We still tend to concentrate on the how (code), as opposed to the what (data and the structure of that data). Programming (coding) is the fun part for most of us; designing an application, especially the database, seems too much like work. The majority of us began our programming lives on a flat file system, the S/34 or S/36. Little, if any, thought was given to the design of our files -- the code was what mattered. With the advent of the S/38, and more recently the AS/400, we were given a built-in relational database. But all those years of thinking about the code and not the data have taken their toll. Even today, when we are all supposed to know better, most of our time and emphasis (when it comes to applications development) is still spent on the code.

MIS departments are an ever-expanding entity. Rules change as data processing needs grow. It becomes apparent that in order to keep up with large backlogs of requests for new applications and enhancements to existing applications, application development has to take on a more standard, structured form. The amount of time spent maintaining existing applications becomes unacceptable as we attempt to keep ahead of these requests. It's clear that the way we handle -- or mishandle -- data is a major factor in the application maintenance mess in which we find ourselves.

There are numerous ways the design of our data files impacts new development and maintenance to existing applications. Most shops have one or more of the following problems somewhere in their system:

o Different files store the same data in different formats.

o Different applications store the same data is different files. This stores data redundantly in several files.

o Some programs are written in such a manner that if the way that the data is stored changes, the programs that access that data have to be modified in order to continue working correctly.

o Data is all too often inaccurate, inconsistent, or not up to date. Changes in everything from access methods to tax tables require programming changes.

Most data processing departments find it difficult to maintain standards in both programming and data storage. Programmers tend to go their separate ways when other programs are not directly dependent on their work. Unfortunately, a lack of standardization makes it difficult for anyone -- often even the original programmer -- to understand a program well enough to make required maintenance changes to it later on.

Setting Up Standards

A careful and standardized look at the entire application development process is required when different applications share data files. We need to use consistent data descriptions and access methods in our programs that affect the database and keep in mind that whatever one programmer does affects others, too. Standardization in programming techniques and storage schemes only serves to make for a less error-prone environment. This won't happen overnight, but rather takes place as a gradual transition.

In the independent flat file data processing environment, choice of file organization and access method usually decides the way that the application program is written. The program logic is dependent on the way in which the data is stored. If for any reason the storage characteristics of the data change, the program must be modified, often extensively. This should be avoided, due to the time and expense involved in such efforts. There are many reasons file structures should change to reflect improved storage techniques: new hardware, new software and performance tuning, to name just a few. One reason for the massive amounts of time spent on program maintenance is the typical level of dependency between programs and data.

What we would like to be able to do is separate storage and access considerations from programming as much as possible. The AS/400's Database Management System (DBMS) is a major step in the right direction in solving certain aspects of this problem.

The use of independent flat files should be looked at as both a starting point and a stopgap measure. It is clear that the independent flat file method of data storage, combined with individual applications implemented in a vacuum, carries with it a host of undesirable side effects. The solution, as we shall see more and more, is the database approach.

A Relational Database

Dr. Edgar F. Codd of IBM published a paper in 1970 entitled, "A Relational Model of Data for Large Shared Data Banks." This paper was the beginning of the relational database.

A company needs to keep track of many different pieces of information. Among these are employees, inventory, financial statistics, etc. All of this data must be organized in a manner that allows for fast easy retrieval, while at the same time avoids redundant storage. At a conceptual level, a database is a collection of this information. It's a collection of the data concerning a real world facet of the company.

The Parts of a Relational Database

A relational database is a DBMS implementation that is made up of relations (files), also known as tables. These tables are implemented as flat files, but through the use of keys are combined together to give very complex and sophisticated views of the data. A graphic representation of the parts of a relational database is shown in 1. A relation is made up of tuples (records, rows). A tuple is the description of a particular set of facts in a relational database. For example, an employee master record, a magazine subscriber record. While a relation is a description of a group of entities, a tuple is the description of a particular entity. Tuples are made up of attributes (fields, columns). An attribute is a description of one particular fact concerning a tuple. Just as a tuple is a description of a particular entity, an attribute is a description of one fact about the entity. For example, an employee's social security number, a magazine subscriber's address. Each attribute has a domain. The domain of an attribute is the allowable values of that attribute, e.g., an employee's salary must be numeric and be in the range of $1 to $100. From this point on, when referring to a relation (file) I will use the term table, when referring to a tuple (record) I will use the term row and when referring to an attribute (field) I will use the term column.

A relational database is a DBMS implementation that is made up of relations (files), also known as tables. These tables are implemented as flat files, but through the use of keys are combined together to give very complex and sophisticated views of the data. A graphic representation of the parts of a relational database is shown in Figure 1. A relation is made up of tuples (records, rows). A tuple is the description of a particular set of facts in a relational database. For example, an employee master record, a magazine subscriber record. While a relation is a description of a group of entities, a tuple is the description of a particular entity. Tuples are made up of attributes (fields, columns). An attribute is a description of one particular fact concerning a tuple. Just as a tuple is a description of a particular entity, an attribute is a description of one fact about the entity. For example, an employee's social security number, a magazine subscriber's address. Each attribute has a domain. The domain of an attribute is the allowable values of that attribute, e.g., an employee's salary must be numeric and be in the range of $1 to $100. From this point on, when referring to a relation (file) I will use the term table, when referring to a tuple (record) I will use the term row and when referring to an attribute (field) I will use the term column.

Uniquely Qualified

One requirement of relational databases is that each row in a table must be unique. Each row must be identified by a unique set of columns, known as a key.

There may be many combinations of columns that can uniquely identify a row, these are known as candidate keys. The primary key is the one candidate key chosen to function as the main identifier for the rows of the table. The key of one table can appear in another table, this is know as a foreign key. A foreign key is a column or columns of one table in a database that is the primary key of another table in the database. This foreign key is what allows one table to relate to another table in a database, hence the term relational database. One particular issue that foreign keys are closely tied to is referential integrity. Referential integrity is the issue of whether a data value that serves as the foreign key of a row in one table has a matching primary key value in another table.

Putting it all Together

A relational database must be capable of certain operations in order to qualify for the name relational. Among these operations are three very important ones: Select, Project and Join.

The Select operation allows us to retrieve a table that is the subset of another table. When we select a table, we take a horizontal piece of the table. In other words, we take selected rows from a table. For example, select all employees that have a salary greater that $1,000 an hour. This would retrieve a table that contain only the employees that meet the selection criteria: salary greater than $1,000. On the AS/400 there are many ways to achieve this: logical files, OPNQRYF, SQL/400, etc.

To Project a table, we take only selected columns from the table. When we project a table, we take a vertical piece of the table. In other words, we take selected columns from a row. For instance, retrieve the salary of all employees. This would produce a table that contains only the salary column. When I say that these operations produce a table, it doesn't mean that this table is a permanent table. It means that to an application that result appears as a table, but this table may only exist for that particular run of the program.

The Join operation is the most important and complex of the operations in a relational database. A join allows for the implementation of the purpose of a relational database, the relating of two or more tables. A join is the merging of related records from two or more files using keys as the link. For example, joining the employee master table with the employee insurance table using the column employee number as the join column. In the employee master table, the column employee number is the primary key and in the employee insurance file, the column employee number is a foreign key.

These operations can be combined in many different ways to produce very complicated and sophisticated views of your data. For instance, join the employee master file with the employee insurance file, selecting only those employees that have been employed more than two years, giving their name, insurance type and years of service.

Relationships

The ways in which different entities relate to each other are known as associations or relationships. These relationships are show in 2. Unary associations are when one entity relates to only one other entity, such as an employee and the employee's social security number. There is a one for one (unary) association in each direction: one employee, one social security number. This is also known as a one-to-one relationship.

The ways in which different entities relate to each other are known as associations or relationships. These relationships are show in Figure 2. Unary associations are when one entity relates to only one other entity, such as an employee and the employee's social security number. There is a one for one (unary) association in each direction: one employee, one social security number. This is also known as a one-to-one relationship.

A relationship can also exist that has unary association in one direction and a multiple association in the other direction. This is known as a one-to-many relationship. An example would be the association between an employee and the company. An employee works for one company but the company has many employees working for it. The final type of relationship has multiple associations in both directions. This is known as a many-to-many relationship. An example would be users and AS/400 libraries. There are many users authorized to a library and many libraries authorized to a user.

Data Redundancy

Data stored in a data processing system mirrors the realities of a business. In a data processing system, data is redundant if a particular fact about the real world is stored more than once. Let's take a look at the associated problems of redundant data and data integrity. First we will consider them in the context of one file, then in the context of many files. A example of nonredundant and redundant tables is shown in 3.

Suppose that we are a health insurance company which must keep track of our accounts and agents. The Agent file is keyed on the unique column, Agent Number, and includes, as one of its other fields, the column, Agent Name. The Account file is keyed on the unique column, Account Number, and has, as one of its other fields, the Agent Number, (the number of the agent who services this account). There is one record per agent in the Agent file and each record contains information about a particular agent, such as agent number, name, location, telephone extension, and so on. There's a similar situation for the data on each account in the Account file.

This kind of data storage is said to be nonredundant, because the detail data on each entity is stored only once. If someone comes along and wants to know the name of the agent who is identified by a particular agent number, they merely look up that agent's record in the Agent file, using Agent Number as the key, and read the name. Similarly, if someone wants to know the number of the agent who is responsible for a particular account, that person may access the Account file and look up the account's record, using the Account Number as the key. So far, so good.

Now, what if someone comes along and wants to know the name of the agent who services a particular account, identified by account number. Can that information be obtained from only one of the files? Clearly it cannot, since the account number information exists only in the Account file, and the agent name information exists only in the Agent file. The only way to answer this question is to first look up the account's record in the Account file, find the number of the agent on that account, use that number to find the person's record in the Agent file, and then finally find his or her name in the record. That kind of custom-made, multi-command, multifile access is error-prone in terms of programming, and expensive in terms of execution performance.

Ah, but if we knew that we were going to make such queries, why did we bother breaking the data up into two files in the first place? If it were all contained in one file, in a meaningful way, then there would be no costly multi-command, multifile accesses. Notice that we show the agent assigned to a particular account, just as we did in the two files, but now the agent's other information is also carried along. We can still answer the two original simple questions that we put to the files, but now, in addition, we can answer the tougher, previously multifile question, "What is the name of the agent on a particular account?" with just the one file and one command. Fine! Or is it?

In this combined file, where all of the fundamentally different kinds of information are intermingled, the particulars of a specific agent must be repeated for every account serviced.

This situation of data redundancy causes a number of problems. For one thing, the data takes up a lot more storage space. For another thing, it causes extra work to update. If an agent's name changes, the change must be recorded many times. The time consumed in doing this can be very costly, particularly if the records involved are scattered all over a disk. Also, there is the nagging doubt that each of the occurrences was located and updated, due to anything from programming bugs to execution time system failures. When that happens, it's called a data integrity problem.

There appears to be a trade-off, but trade-offs are hardly rare in the computer field. In the two files, we have no redundancy, but a query of the type that we've been looking at requires a multi-command, multifile access. In the combined file, we have eliminated the need for a multi-command, multifile access for that type of query, but we have introduced data redundancy. Neither of these situations is tolerable, for the reasons indicated previously.

Can we have our cake and eat it, too? Yes, and this question brings us to one of the key features of the database environment. A true integrated data management system; that is, a true database system, is one in which data can be held nonredundantly. It also allows for a multifile query that requires a mixture of different kinds of data to be specified in a single command from the highest level programming interface. Any system that does not have this property really should not be called a database system -- a "file management" system would be a more accurate term.

Another benefit of the integrated approach is the encouragement that it gives users and programmers to find new uses for the data in the database. The richness of having so many different portions of the organization's data interconnected leads to many more usage possibilities than when it was stored separately.

Departmental Data

There is another kind of redundancy that, while perhaps a bit more obvious than the one previously discussed, is not necessarily easier to combat, but is difficult for different reasons.

Certain data is needed by various departments in a company during the course of normal, everyday work. For example, account name address may be needed by the sales department, the accounts receivable department and the credit department. There will also generally be other pieces of information involving accounts that are needed by only one or two departments. Typically the solution to this multiple need is redundancy. The sales department (or accounts receivable and credit departments) has its own stored file that, among other things, contains the account name and address.

While it's true that this keeps various departments from competing for the data, it opens up a can of worms reminiscent of the problems that we encountered in single file redundancy. First of all, storing a piece of data three times takes three times the amount of storage as storing it once. In addition, every time the data must be updated, it must be changed in each file in which it resides. This wastes processing time, causes control nightmares and can potentially lead to integrity problems if all of the data is not updated correctly and at the same time. Furthermore, the question arises as to whose responsibility it is to update data in all of the different files -- files that are probably owned by widely separated departments.

Wouldn't it make more sense for such data to be stored just once, in a way that makes it accessible to all of the different groups that need it? This way, there would be one group in charge of updating it, little wasted space, few integrity problems, and so on.

The database approach includes the concept of a common shared database with controlled management and update responsibility.

Summing It All Up

A relational database offers many advantages over the traditional methods of data storage. It allows for the storage of complex pieces of data concerning the real world activities of a company in a flexible, nonredundant manner. It also provides a set of operations that allows that data to be accessed in a myriad of ways. This approach enables a company to look at its data in ways that were never before possible. With a relational approach to data storage, a company can begin to manage its data the same way it manages its inventory - - as an asset and not a liability.

Relational Database Design -- Part 1

Figure 1 Database components

 Figure 1: Database Components DATABASE RELATION, TABLE, FILE TUPLE, ROW, RECORD ATTRIBUTE, COLUMN, FIELD DOMAIN 
  Relational Database Design -- Part 1
  Figure 2 Entity relationships
  Figure 2: Entity Relationships ONE <--------> ONE ONE <-------->> MANY MANY<<-------->> MANY 
  Relational Database Design -- Part 1
  Figure 3 Redundant vs Nonredundant tables
  Figure 3: Redundant vs. Nonredundant Tables NONREDUNDANT: Account Table: Account Account Account Agent Number Name Type Number Agent Table: Agent Agent Agent Agent Number Name Extension Location REDUNDANT: Account/Agent Table: Account Account Account Agent Agent Agent Number Name Type Name Extension Location

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Book Reviews

Book Review: Extract, Transform, and Load with SSIS

Do your business apps access different data sources? This book shows you how to make that task easier
Book Review: 21st Century RPG: /Free, ILE, and MVC

David Shirey’s first book is an educational and entertaining read for “modern” and “old” RPG programmers alike
Book Review: Developing Business Applications for the Web--With HTML, CSS, JSP, PHP, ASP.NET, and JavaScript

If you are ready to get into Web application development, take this book along as your guide
Book Review: DB2 10.5 Fundamentals for LUW: Certification Study Guide (Exam 615)

DBAs who use the book will find it very helpful first in their test study and later as a reference book.
Book Review: DB2 11 for z/OS Database Administration—Certification Study Guide

This is a well-written DB2 11 book that could easily stand on its own as a reference manual, not just a certification guide.
Book Review: Free-Format RPG IV, Third Edition

Jim Martin comes through for us again.
Book Review: IBM i Security Administration and Compliance, Second Edition
Book Review: Programming in ILE RPG, Fifth Edition

This book really hits the mark and is a must-read for all RPG developers.
Book Review: DB2 10.1/10.5 for Linux, UNIX, and Windows Database Administration: Certification Guide
Book Review: Subfiles in Free-Format RPG

Whether you're a newbie or a seasoned pro, this book has something for you.
Book Review: Evolve Your RPG Coding: Move from OPM to ILE ... and Beyond

This book provides an amazingly comprehensive introduction to the concepts while at the same time delivering enough technical detail to make you productive very quickly.
Book Review: Database Design and SQL for DB2
Book Review: The Chief Data Officer Handbook for Data Governance

When implemented appropriately, data governance is a powerful framework.
Book Review: DB2 10 for z/OS: The Smarter, Faster Way to Upgrade

Trying to figure out whether to upgrade? Read on.
Book Review: 5 Keys to Business Analytics Program Success
Book Review: DB2 11: The Ultimate Database for Cloud, Analytics, and Mobile
Book Review: Flexible Input, Dazzling Output with IBM i

Today, it's all about input and output. Getting data into the IBM i from non-traditional sources and then displaying it back out again in varied formats. But where can you go to learn all that you need to know about this critical skill?
Book Review: Advanced Guide to PHP on IBM i

Enterprise-level PHP skills and techniques have been adapted for IBM i developers in Kevin Schroeder's new book.
Book Review: Java for RPG Programmers

If you've been putting off learning Java, you have no excuse anymore!
Book Review: DB2 10.1 Fundamentals: Certification Study Guide

Too valuable to be classified as merely excellent certification material, this book should also rightly take its place on DB2 DBA bookshelves as a solid day-to-day DB2 reference.
Book Review: DB2 10 for Z/OS Database Administration: Certification Study Guide

Whether you're trying to get certified or you just need a great reference book, this is the book for you.
Book Review: Developing Web 2.0 Applications with EGL for IBM i

It's everything you need to know, from the bottom up.
Book Review: Advanced Integrated RPG

Isn't it about time somebody told us how to integrate RPG and Java?
Book Review: Managing Without Walls

If you manage remote or satellite teams, this book is a must-read!
Book Review: Managing Without Walls

If you manage remote or satellite teams, this book is a must-read!
Book Review: The Remote System Explorer

This book speaks directly to the thousands of IBM i programmers who develop in RPG, COBOL, CL, and DDS every day.
Book Review: IBM System i APIs at Work, Second Edition

API expert Bruce Vining delivers the only comprehensive guide to APIs.
Book Review: Functions in Free-Format RPG IV

This one short volume manages to essentially be both a general introduction and a detailed reference.
Book Review: DB2 11: The Database for Big Data and Analytics
Book Review: IBM Mainframe Security: Beyond the Basics

Beginners will have a strong foundation after reading this book. Experienced professionals will reference it frequently.
Book Review: IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance

Find out how IBM is addressing the challenges of big data.
Book Review: Fundamentals of Technology Project Management

Projects can be overwhelming, but taken in small, deliberate steps, all projects are achievable.
Book Review: Customer Experience Analytics

Use CEA as a strategic weapon to stay ahead of your competitors.
Book Review: Big Data Analytics: Disruptive Technologies for Changing the Game

The disciplines of data analytics are evolving to meet the new challenges of big data.
Book Review: IBM i Security: Administration and Compliance

If you have any interest in IBM i security, whether as an administrator, a programmer, or an auditor, then this book is the perfect resource.
Book Review: DB2 9.7 for Linux, UNIX, and Windows Database Administration (Exam 541)

This book, written by the creator of the certification exam, reveals exactly what you'll need to know to prep for the test.
Book Review: Selling Information Governance to the Business

Who governs the information that runs your company?
Book Review: You Want to Do WHAT with PHP?

If you're serious about programming in PHP, get a book that treats you that way.
Book Review: The IBM i Programmer's Guide to PHP

Both a primer and a reference, this book is a must-have for anyone who wants to program in PHP.
Book Review: JavaScript for the Business Developer

There's no faster, easier way to become proficient in JavaScript.
Book Review: SOA for the Business Developer

If you want to know how SOA works in the real world, this is your book.
Book Review: DB2 9 Fundamentals

Whether you want to obtain an IBM certified DB2 professional certification or simply become well-rounded in the fundamental concepts of DB2 and general database theory, this is your book.
Book Review: The Modern RPG IV Language, Fourth Edition

This book isn't a training manual; it's a reference book.

Resource Center

How to Modernize Fast and Within Budget (Quick Guide)
Why Migrate When You Can Modernize?

Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.
Resource Center

The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit
IBM i Transformation Risks Every Business Leader Should Know

Join us for this hour-long webcast that will explore:
What to Do When Your AS/400 Talent Retires

IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn:

Analytics & Cognitive Categories

Latest Analytics & Cognitive News

Career Catgories

Latest Career News

Cloud Categories

Latest Cloud News

IT Infrastructure Categories

Latest IT Infrastructure News

News Categories

Latest News

Programming Categories

Latest Programming News

Security Categories

Latest Security News

Typography

Share This

Relational Database Design -- Part 1

Figure 1 Database components

Relational Database Design -- Part 1

Figure 2 Entity relationships

Relational Database Design -- Part 1

Figure 3 Redundant vs Nonredundant tables

LATEST COMMENTS

MC Press Online

Support MC Press Online

Book Reviews

Book Review: Extract, Transform, and Load with SSIS

Book Review: 21st Century RPG: /Free, ILE, and MVC

Book Review: Developing Business Applications for the Web--With HTML, CSS, JSP, PHP, ASP.NET, and JavaScript

Book Review: DB2 10.5 Fundamentals for LUW: Certification Study Guide (Exam 615)

Book Review: DB2 11 for z/OS Database Administration—Certification Study Guide

Book Review: Free-Format RPG IV, Third Edition

Book Review: IBM i Security Administration and Compliance, Second Edition

Book Review: Programming in ILE RPG, Fifth Edition

Book Review: DB2 10.1/10.5 for Linux, UNIX, and Windows Database Administration: Certification Guide

Book Review: Subfiles in Free-Format RPG

Book Review: Evolve Your RPG Coding: Move from OPM to ILE ... and Beyond

Book Review: Database Design and SQL for DB2

Book Review: The Chief Data Officer Handbook for Data Governance

Book Review: DB2 10 for z/OS: The Smarter, Faster Way to Upgrade

Book Review: 5 Keys to Business Analytics Program Success

Book Review: DB2 11: The Ultimate Database for Cloud, Analytics, and Mobile

Book Review: Flexible Input, Dazzling Output with IBM i

Book Review: Advanced Guide to PHP on IBM i

Book Review: Java for RPG Programmers

Book Review: DB2 10.1 Fundamentals: Certification Study Guide

Book Review: DB2 10 for Z/OS Database Administration: Certification Study Guide