Last time around, I started introducing the UMADB database. Let’s proceed with that discussion now.
I discussed the Students table in the previous article, so now let’s go over the rest of the tables of our fictitious database, starting with the Courses table.
The Courses Table
Let me take a moment to explain the structure of the information in this database. The students take classes, which are taught by teachers and are part of courses. At the end of each semester, the teachers grade the students on each of the classes they attended. This may sound obvious and redundant, but it’s important to keep the structure in mind from this point on. In a way, the Students and Courses tables are the center of the database, because all the other tables are somehow linked to one (or both) of these tables.
Now let’s take a look at the Courses table structure, shown in Table 1.
Table Name |
Column Name |
Data Type |
Length |
Dec. Pos. |
Description |
PFCOM |
CONM |
Char |
60 |
|
Course name |
PFCOM |
CODS |
Char |
100 |
|
Course description |
PFCOM |
CODN |
Char |
50 |
|
Department name |
PFCOM |
CODE |
Char |
60 |
|
Course director name |
PFCOM |
COTA |
Char |
60 |
|
Course teaching assistant name |
PFCOM |
COSC |
Char |
1 |
|
Status |
Table 1: The Courses table structure
Again, the table is pretty typical: the same cryptic names and the lack of a unique record identifier that characterized the Students table. By the way, I imagine that you’re curious about the hidden flaw in the Students table, mentioned earlier. Don’t worry, it’s going to become obvious in a moment, when we look at the Teachers table.
Meanwhile, there’s something common to all the tables in this database: status. As the name implies, it indicates the status of the record. The convention used here is the following:
- 0: Created but not active record
- 1: Active record
- 9: Inactive (deleted) record
This is something that has to be taken into account when querying the database, and it has been the source of many misunderstandings. Sometimes the users forget to include a condition in their queries and end up mixing active and inactive records, which leads to inconsistent or just plain wrong information.
The Teachers Table
The teachers are a very important part of any teaching system. They’re also a very important part of the application our database supports, even though the table that keeps their records is not very “polished.” You’ll see what I mean when we analyze the Teachers table, shown in Table 2.
Table Name |
Column Name |
Data Type |
Length |
Dec. Pos. |
Description |
PFTEM |
TENM |
Char |
60 |
|
Teacher name |
PFTEM |
TETR |
Char |
20 |
|
Teacher rank |
PFTEM |
TEDB |
Decimal |
8 |
0 |
Date of birth |
PFTEM |
TEAD |
Char |
60 |
|
Home address |
PFTEM |
TEPN |
Char |
15 |
|
Home phone number |
PFTEM |
TEMN |
Char |
15 |
|
Mobile number |
PFTEM |
TEEM |
Char |
60 |
|
Email address |
PFTEM |
TEDL |
Char |
20 |
|
Driver’s license |
PFTEM |
TESN |
Char |
11 |
|
Social security number |
PFTEM |
TEST |
Char |
200 |
|
Subjects taught |
PFTEM |
TESA |
Decimal |
11 |
2 |
Salary |
PFTEM |
TESC |
Char |
1 |
|
Status |
Table 2: The Teachers table structure
This table follows the same line as the previous ones, but it introduces the first sensitive piece of information of the database: the teacher’s salary. As things stand, anyone with access to the table can see how much each teacher earns, which might not be a very good idea. I’ll get back to this later, when I discuss how to hide a column’s data from prying eyes.
Notice the similarities between this and the Students table: the personal information (addresses and IDs) is the same. Even though this makes sense—both teachers and students are people and share the same type of information—it brings up a question: what if a student becomes a teacher or vice versa? There will be duplicate and possibly inconsistent information in the database. I’ll address this issue in a later article, in the discussion about database normalization and how that translates to SQL.
Having said that, let me take a moment to explain the other columns in the table. Besides the obvious teacher name and the aforementioned personal information, this table also includes “teacher rank” (which can be something like Assistant Professor, Professor, and so on) and “subjects taught.” The latter is supposed to link to the Classes table, presented in the next section, but the connection is kept by humans, not the database. Because the same person can teach multiple classes in the same school year, the application’s manager thought it would be simpler to manually track the link between teachers and classes—yet another shortcoming we’ll need to solve later.
It’s now time to move on to the next section and review the Classes table.
The Classes Table
Here’s where things start to get interesting: finally, a table with links to other tables. The Classes table contains information about the students that form a class of a given subject during a given year and the course to which the class belongs. As I said before, the teacher is not part of the setup, at least not at database level. Table 3 shows the complete Classes table structure.
Table Name |
Column Name |
Data Type |
Length |
Dec. Pos. |
Description |
PFCLM |
CLNM |
Char |
60 |
0 |
Class name |
PFCLM |
CLCY |
Decimal |
4 |
0 |
Class year |
PFCLM |
CLCN |
Char |
60 |
0 |
Course name |
PFCLM |
CLSN |
Char |
60 |
|
Student name |
PFCLM |
CLSA |
Char |
60 |
|
Student home address |
PFCLM |
CLSE |
Char |
60 |
|
Student email address |
PFCLM |
CLSC |
Char |
1 |
|
Status |
Table 3: The Classes table structure
As you can see from this table, the links I mentioned before are based on the names of the student and the course, which might cause some problems. The ideal situation would be having record identifiers in each of the tables and keep those IDs, instead of the respective names, on the Classes table records. The next issue is the duplicate student information. The application manager thinks this duplication makes sense, because the student information might change from school year to school year, and keeping the information here allows the teacher to contact the student using the most current address. We’ll also have to deal with this situation later.
Finally, the last table of the downsized version of the UMADB database is the Grades table. Let’s analyze it in the next section.
The Grades Table
After the end of the semester, the students are graded on their performance in each of the classes they attended. The results are stored in the Grades table, shown in detail in Table 4.
Table Name |
Column Name |
Data Type |
Length |
Dec. Pos. |
Description |
PFGRM |
GRSN |
Char |
60 |
|
Student name |
PFGRM |
GRCN |
Char |
60 |
0 |
Class name |
PFGRM |
GRCY |
Decimal |
4 |
0 |
Class year |
PFGRM |
GRGR |
Char |
2 |
|
Grade |
Table 4: The Grades table structure
Just like the Classes table, this one also depends on another table’s information to form its unique key. In this case, that key is formed by the student name, class name, and class year. Of these three, two are names stored in character strings. This makes them prone to error (character fields usually make awful keys because of the possible mismatches caused by different character cases—“John” is not the same as “john,” for example) and slower to work with (because it takes longer to process a string of characters than a numeric value). The other problem with this table is the Grade column: there’s no validation on the database to prevent inconsistent values, such as invalid grades. It’s assumed that the letters A, B, C, D, and F will be used, optionally followed by a plus or minus sign, but there’s no actual check for a valid grade at the database level. Just like the student’s date of birth validation, this one also exists at the application level, buried in some RPG program.
Just a Few Tables, and So Many Problems
From what you’ve read so far, you probably concluded this (exaggerated) scenario has some similarities with real-life issues on IBM i databases you’ve seen. Probably not all at the same time, but you know what I mean. It’s true that some of the issues are very basic and easy to solve, while others require some database redesign and ingenuity. I’ll address all these issues and a few more, which are related to the non-implemented functionalities that are currently handled outside the application’s scope, over the next subseries of the SQL 101 TechTip series.
You can skip a few articles, but keep in mind that the database will evolve and each subseries will build upon the foundations laid by its predecessor. If you’re comfortable with the topics discussed in a subseries, you can simply have a quick look at the SQL code samples to keep track of the changes to the database.
And that’s all for now. In the next few articles, I’ll start by reviewing some SQL data manipulation language (DML) statements and sharing a few tricks I’ve learned over the years that can, hopefully, help you get more productive when it comes to manipulating data using SQL. Later, we’ll start to fix this broken database!
LATEST COMMENTS
MC Press Online