They say no man is an island. We’re social creatures and (most of us) need to live in a community. In a properly structured database, no table is an island either, so we need to know how to join them in our queries.
Listing the contents of a table, even with comparisons (or search conditions) that constrain the output, as shown in the previous TechTip, is a bit limited in a real-life situation. Typically, our queries are built using information from multiple tables. I’ll go through almost all the ways you can join tables in SQL; I’ll leave a special type of join, called EXCEPTION JOIN, for later. Let’s start with the simplest of them all: the INNER JOIN.
The JOIN You’ve Been Using Without Realizing: The INNER JOIN
The INNER JOIN is arguably the most-used type of join. Actually, you’ve probably been using it without realizing it, because it can be hidden in the WHERE clause, in what is commonly called an implicit join.
For instance, let’s say you want to list all the students who have enrolled at least one class. This is a complete intersection between the Students and Classes tables, a classic INNER JOIN (even though it could be simply a SELECT over the Classes table because it contains the student name, but let’s ignore that for the moment, because it will be useful to illustrate another type of join). We know, from two articles ago, that the link between the PFSTM (Students table) and the PFCLM (Classes table) is the student name. It’s true that it’s not a brilliant solution, but we’ll have to live with it…for now. An SQL statement that lists all the students that have, at any given point, enrolled in a class is often written using an implicit join, like this:
SELECT STNM
, CLNM
, CLCN
FROM UMADB_CHP2.PFSTM ST, UMADB_CHP2.PFCLM CL
WHERE CL.CLSN = ST.STNM
;
However, if there are more than two tables in the SELECT statement, things might get a little hazy. That’s why I think it’s much clearer to write the same statement using an INNER JOIN:
SELECT STNM
, CLNM
, CLCN
FROM UMADB_CHP2.PFSTM ST
INNER JOIN UMADB_CHP2.PFCLM CL ON CL.CLSN = ST.STNM
;
If you’re familiar with the INNER JOIN syntax, there’s really nothing new for you here, except perhaps the use of aliases for the tables (ST and CL), which make the statement slightly more readable. However, if you’re used to the implicit join instead, there are a couple of things worth mentioning. First, notice how the line following the FROM clause starts with the join type: in this case it’s an INNER JOIN, but there are others, as you’ll see in a moment. The type of join identifier is then followed by the table name (and optionally, an alias), which in turn is followed by the ON keyword. This keyword is used to specify how the connection between the tables is supposed to work. In this case, the link between the Students and Classes tables is achieved via the student name, but it could be a more complex condition, resorting to multiple columns.
Let me close this article with a more complete example that uses multiple INNER JOINs to link all the tables of our sample database. The objective is to obtain a bit more information about a student’s grades and the classes he or she attended. Here’s the statement:
SELECT STNM AS STUDENT_NAME
, CONM AS COURSE_NAME
, CODE AS COURSE_DIRECTOR
, TETR AS TEACHER_RANK
, CLNM AS CLASS_NAME
, CLCY AS CLASS_YEAR
, GRGR AS GRADE
FROM UMADB_CHP2.PFSTM ST
INNER JOIN UMADB_CHP2.PFCLM CL ON ST.STNM = CL.CLSN
INNER JOIN UMADB_CHP2.PFGRM GR ON GR.GRCN = CL.CLNM
AND GR.GRCY= CL.CLCY
AND GR.GRSN= ST.STNM
INNER JOIN UMADB_CHP2.PFCOM CO ON CO.CONM = CL.CLCN
INNER JOIN UMADB_CHP2.PFTEM TE ON TE.TENM = CO.CODE
WHERE GRSN = 'Anthony, Mark'
;
Even though the statement is a bit longer than the previous examples, it’s a simple SELECT. The difference is that it uses a lot more tables, which can make it confusing really quickly. I’d like to emphasize the importance of indentation to improve the statement’s readability and the use of user-friendly column names. Also note that I’ve used aliases for all the tables. At this time, the aliases are not critically important, because the column names all identify the name of the table to which they belong. In a future article, I’ll show you how to “get the best of both worlds” by providing long, human-readable names and keeping the short “RPG-standard” cryptic names that currently exist. Then you’ll see that making a habit of using aliases for the tables and using them when referring to columns in your statements is of paramount importance to the readability and maintainability of your code.
That’s all for now. I hope this gave you something to think about, especially how to apply these tricks to your own code!
LATEST COMMENTS
MC Press Online