Example Tables
The following tables are used in the examples in this article to illustrate how subselect works.
Employee Master (EMP)
|
|||||
Nbr
|
Nam
|
Cls
|
Sex
|
Dpt
|
Sal
|
10
|
Ed
|
5
|
M
|
911
|
700
|
20
|
Heikki
|
2
|
M
|
901
|
600
|
50
|
Marcela
|
3
|
F
|
911
|
750
|
40
|
Mike
|
4
|
M
|
977
|
650
|
30
|
John
|
5
|
M
|
977
|
320
|
60
|
Frank
|
2
|
M
|
990
|
650
|
Department Master (DEP) |
|
Dpt
|
Dnm
|
901
|
Accounts
|
911
|
Sales
|
977
|
Manufact
|
990
|
Spares
|
Employee Transfer Master (TRANSFER)
|
||
Nbr
|
Nam
|
Dpt
|
20
|
Heikki
|
901
|
30
|
John
|
977
|
50
|
Marcela
|
911
|
New Salary Table #1 (NEWSAL)
|
|
Nbr
|
Nsal
|
10
|
789
|
20
|
689
|
30
|
409
|
40
|
739
|
50
|
839
|
60
|
739
|
New Salary Table #2 (NEWSAL2)
|
|
Nbr
|
Nsal
|
10
|
752
|
20
|
675
|
30
|
372
|
60
|
702
|
Simple Subselect
The simplest form of subselect can be found in a SELECT statement where a subselect is used to create a selection list for the IN predicate within the WHERE clause.
Suppose you want to see all the employees who work in a department that have a department name beginning with an uppercase S. The challenge is that the department name (the column called dnm) exists only in the Department Master table, not in the Employee Master table.
You need a technique that allows SQL to find those department numbers that have a name beginning with an S and then use that information to determine which employees work in those departments. The use of a subselect on the right side of the IN predicate for the WHERE clause of the SELECT statement provides this capability and is coded as shown below.
WHERE dpt IN
(SELECT dpt FROM dep WHERE dnm LIKE 'S%')
ORDER BY nbr
This SQL statement retrieves the columns employee number (nbr), employee name (nam), department number (dpt), and salary (sal) FROM the Employee Master table (emp) WHERE the department number (dpt) is IN the list of department numbers (dpt) SELECTed FROM the Department Master table (dep) WHERE the department name (dnm) begins with or is LIKE S. The rows that are retrieved are ORDERed BY employee number (emp).
Nbr
|
Nam
|
Dpt
|
Sal
|
10
|
Ed
|
911
|
700
|
50
|
Marcela
|
911
|
750
|
60
|
Frank
|
990
|
650
|
Subselect Terminology
At this point and before proceeding further, we need to discuss some subselect terminology. The first SQL statement — in this case, the first SELECT statement — is called the primary or outer SQL statement. In this example, it is called the primary or outer SELECT:
The second SELECT statement is the subselect (also called the inner SELECT):
A maximum of 256 SELECT statements, all referred to as inner selects, can be embedded or nested inside the outer select or outer SQL statement. Please keep in mind that the more subselects you embed together inside an SQL statement, the longer that statement may take to execute.
Scalar Subselect
A scalar subselect retrieves data from a secondary table based on data in the primary table and can be used anywhere an SQL function or expression is used. Let's modify our simple subselect example to include the department name in the rows returned. To do this, you must add a scalar subselect to the column list of the example immediately after the column named sal, as shown below:
SELECT nbr, nam, dpt, sal,
(SELECT dnm FROM dep b WHERE a.dpt = b.dpt)
AS Dept_Name
FROM emp a
WHERE dpt IN (SELECT dpt FROM dep
WHERE dnm LIKE 'S%')
ORDER BY nbr
This example now contains two subselect statements: The scalar subselect within the column list (SELECT nbr, nam, dpt, sal, (SELECT dnm FROM dep b ...)) and the subselect within the WHERE clause (WHERE dpt IN (SELECT dpt FROM dep ...)) of the outer SELECT statement. Executing this SELECT statement returns the same rows as in the previous example: all the employees who work in a department with a department name that begins with S. However, now the department name is also included.
Nbr
|
Nam
|
Dpt
|
Sal
|
Dept_Name
|
10
|
Ed
|
911
|
700
|
Sales
|
50
|
Marcela
|
911
|
750
|
Sales
|
60
|
Frank
|
990
|
650
|
Spares
|
In this example, what happens if an employee row exists with an invalid department number in it and the scalar subselect cannot find a matching department number in the Department Master table? If this situation occurs, no department number is retrieved and the default value for the column will be used instead.
If you think about it, this SQL process is very similar to an application program in which you read a record from a primary file and use information from this record to do a single-record random retrieval (CHAIN in RPG, GET BY KEY in other high-level languages) to retrieve information from a secondary file. As you will see later, the information from the record retrieved in the secondary file can also be used to update the current record from the primary file. What should become readily apparent is that it takes significantly less lines of code to do this in SQL than in a high-level language like RPG, Cobol, C, or Java.
Correlated Naming
In this example, the scalar subselect must use correlated naming (translation: qualified column names) since the emp and dep tables each have a column called dpt and these columns are compared in the WHERE clause (WHERE a.dpt = b.dpt) of the scalar subselect. A qualified column name is required whenever the same column name exists in one or more tables, and a comparison will be made between tables using that column name.
Correlated naming is used because SQL supports long table names, up to 128 characters in length. Because of this, SQL allows you to assign a short correlated or alternate name to the table in the FROM clause of the SQL statement, and you can use this alternate name (in place of the long SQL name) to qualify column names where necessary.
In the FROM clause for the outer SELECT statement (FROM emp a) and the FROM clause for the inner scalar subselect (FROM dep b), the table called emp is followed by an a, and the table called dep is followed by a b. The a is the correlated or alternate name for emp, and the b is the correlated or alternate name for dep; these are then used to qualify the column name in the WHERE clause of the scalar subselect (WHERE a.dpt = b.dpt). You can use any alternate name of your choosing; I arbitrarily chose a and b.
Subselect and INSERT
Subselect can be used in the INSERT statement to add one or more rows to a table or to initially populate the table with rows. To illustrate this, let's create an Employee Name work table using the CREATE statement and then populate the work table using a simple subselect in the INSERT statement. The CREATE and INSERT statements required to do this are shown below:
(number DEC (3,0) NOT NULL WITH DEFAULT,
name CHAR (10) NOT NULL WITH DEFAULT,
dept DEC (3,0) NOT NULL WITH DEFAULT)
INSERT INTO empname
SELECT nbr, nam, dpt FROM emp
After executing these two SQL statements, the work table contains the following rows:
Number
|
Name
|
Dept
|
10
|
Ed
|
911
|
20
|
Heikki
|
901
|
30
|
John
|
977
|
40
|
Mike
|
977
|
50
|
Marcela
|
911
|
60
|
Frank
|
990
|
You should note that the column names in the INTO clause of the INSERT statement must be listed in the same order and must have attributes compatible with the corresponding columns in the CREATE TABLE statement.
Now, let's add a layer of complexity to this example by creating the work table and including the department name in addition to the three original columns. Can subselect still be used with the INSERT statement to populate the work table? Yes! The CREATE TABLE and INSERT statements to do that are shown below:
(number DEC (3,0) NOT NULL WITH DEFAULT,
name CHAR (10) NOT NULL WITH DEFAULT,
dept DEC (3,0) NOT NULL WITH DEFAULT,
dptnam CHAR (10) NOT NULL WITH DEFAULT)
INSERT INTO empname
SELECT nbr, nam, dpt,
(SELECT dnm FROM dep b
WHERE a.dpt = b.dpt) AS dptnam
FROM emp a
After executing these SQL statements, the work table contains the following rows, which now include the department name.
Number
|
Name
|
Dept
|
Dptnam
|
10
|
Ed
|
911
|
Sales
|
20
|
Heikki
|
901
|
Accounts
|
30
|
John
|
977
|
Manufact
|
40
|
Mike
|
977
|
Manufact
|
50
|
Marcela
|
911
|
Sales
|
60
|
Frank
|
990
|
Spares
|
Subselect and CREATE TABLE
This last example can actually be done in one step (instead of two steps) by using a single subselect within the CREATE TABLE statement: the CREATE TABLE followed by the INSERT. To do this, replace the column list in the previous CREATE TABLE statement with subselect from the INSERT statement, as shown below:
(SELECT nbr, nam, dpt,
(SELECT dnm FROM dep b
WHERE a.dpt = b.dpt) AS dptnam
FROM emp a)
WITH DATA
The work table that results from executing this CREATE TABLE is the same as the one that results from executing the INSERT followed by the CREATE TABLE.
When using CREATE TABLE with the subselect, each derived column in the subselect must be given a name with the AS operator, and the CREATE TABLE statement must be ended with the WITH DATA or WITH NO DATA clause. The WITH DATA clause causes the table to be created and populated with data, and the WITH NO DATA clause causes the table to be created without being populated with data (an empty member).
Subselect and DELETE
You can use subselect in the DELETE statement much the same way you can use it in the INSERT statement.
Suppose that a few of the employees in the Employee Master table have been transferred to another division of the company. After sending the employee information to the receiving division, you need to delete the rows for the transferred employees in the Employee Master table.
The Employee Transfer Master table shown below has the information for three employees that are to be transferred.
Nbr
|
Nam
|
Dpt
|
20
|
Heikki
|
901
|
30
|
John
|
977
|
50
|
Marcela
|
911
|
This table can be referenced in the FROM clause of a subselect within a DELETE statement that will delete those employees that are being transferred, as shown in the following example:
(SELECT nbr FROM transfer)
After the code executes, the three employees will have been deleted from the Employee Master table, and it will contain the following rows:
Nbr
|
Nam
|
Cls
|
Sex
|
Dpt
|
Sal
|
10
|
Ed
|
5
|
M
|
911
|
700
|
40
|
Mike
|
4
|
M
|
977
|
650
|
60
|
Frank
|
2
|
M
|
990
|
650
|
Subselect and UPDATE
You can use a subselect in an UPDATE statement to update one or more columns in a row from one table based on the value or values retrieved from one or more columns in a row from another table. This is similar to an RPG program that retrieves a column value from a row in a primary table and then chains to a row in a secondary table using that column value as the key. Another, different column value is then retrieved from the row in the secondary table and used to update a corresponding column in the current row of the primary table.
In the following example, we want to change each employee's current salary to the amount listed in the new salary file. To do this, we use the UPDATE statement in conjunction with the scalar subselect, as shown:
SET sal =
(SELECT nsal FROM newsal bb
WHERE aa.nbr = bb.nbr)
When using the SQL syntax listed above, each row in the emp table must have a corresponding, matching row in the newsal table, based on employee number (nbr). If you review the rows in the newsal table found in the Example Tables section at the beginning of this article, you will see that this is the case. Therefore, all rows in the emp table are updated and will reflect the new salaries as shown below:
Nbr
|
Nam
|
Cls
|
Sex
|
Dpt
|
Sal
|
10
|
Ed
|
5
|
M
|
911
|
789
|
20
|
Heikki
|
2
|
M
|
901
|
689
|
30
|
John
|
5
|
M
|
977
|
409
|
40
|
Mike
|
4
|
M
|
977
|
739
|
50
|
Marcela
|
3
|
F
|
911
|
839
|
60
|
Frank
|
2
|
M
|
990
|
739
|
The obvious question is what happens when each row in the emp table does not have a corresponding, matching row in the newsal table. The answer is simple: The update fails with the first row that does not have a match in the newsal table. If you review the rows in the newsal2 table listed in the example tables at the beginning of this article, you will see that employee numbers 40 and 50 do not have corresponding rows in the newsal2 table. If the update of the original emp table is attempted using the newsal2 table, the update fails with the following error message:
SET sal =
(SELECT nsal FROM newsal2 bb
WHERE aa.nbr = bb.nbr)
Null values not allowed in column or variable SAL.
This UPDATE statement fails at employee number 40 because there is no corresponding employee number 40 row provided in the newsal2 table to update the salary column in the emp table. Therefore, the default update value for the salary column becomes a null, and since the salary column is not null-capable, the UPDATE statement fails at employee number 40. In this situation, the Employee Master table is not journaled and is not using commitment control. Therefore, employee numbers 10, 20, and 30 are successfully updated, but employee number 60 is not.
This UPDATE statement can be written or coded so it will complete successfully even though each row in the emp table does not have a corresponding, matching row in the newsal2 table. By adding a WHERE clause with a second subselect to the UPDATE statement as shown below, the designated rows for employee numbers 10, 20, 30, and 60 will be updated successfully.
SET sal =
(SELECT nsal FROM newsal2 bb
WHERE aa.nbr = bb.nbr)
WHERE aa.nbr IN (SELECT nbr FROM newsal2)
The second subselect in the WHERE clause of the UPDATE statement is used to build an IN list based on the employee numbers in the newsal2 table. The result is that only those employee rows in the emp table that have a corresponding matching row in the newsal2 table are updated with a new salary. The result is that employee numbers 10, 20, 30, and 60 are updated with a new salary, but employee numbers 40 and 50 are not updated. The results of the update to the salary column in the emp table are shown below:
Nbr
|
Nam
|
Cls
|
Sex
|
Dpt
|
Sal
|
10
|
Ed
|
5
|
M
|
911
|
752
|
20
|
Heikki
|
2
|
M
|
901
|
675
|
30
|
John
|
5
|
M
|
977
|
372
|
40
|
Mike
|
4
|
M
|
977
|
650
|
50
|
Marcela
|
3
|
F
|
911
|
750
|
60
|
Frank
|
2
|
M
|
990
|
702
|
Derived Table and Subselect
A derived table is one that does not exist until the SQL statement is executed; it's created dynamically on the fly by the SQL statement execution. A subselect is used in the FROM clause of a SELECT statement to create the derived table as part of the statement execution process.
Suppose you want to see all the employee data for the first or lowest employee number in each department. (Yes, I know this isn't the best example, but it works with the example tables provided at the beginning of the article). There is no way to do this without using a derived table. The SELECT statement using a derived table is shown below:
WHERE nbr IN
(SELECT number FROM
(SELECT dpt, MIN(nbr) AS number
FROM emp
GROUP BY dpt)
AS first_row_table)
ORDER BY nbr
This solution employs two subselects, with the innermost subselect using a derived table function. It works as follows: The desired result is to retrieve the first or lowest employee number for each department number in the emp table. To accomplish this, an IN list containing the employee numbers for these rows meeting the above criteria needs to be created within the WHERE clause for the outer SELECT statement.
This is done with the subselect immediately after the WHERE clause. This first (or leftmost) subselect employs a second (or inner) subselect that uses the MINIMUM function (MIN(nbr) AS number) and derives a temporary table (... FROM (SELECT number ...)) named first_row_table, which uses the GROUP BY function to group the rows in the derived table by department number and the first or lowest employee number for that department. Note that the AS clause (AS first_row_table) in the first subselect appears not to be referenced, but it is required.
When this SELECT statement with the derived table is executed, the following rows are retrieved:
Nbr
|
Name
|
Class
|
Sex
|
Dept
|
Salary
|
10
|
Ed
|
5
|
M
|
911
|
700
|
20
|
Heikki
|
2
|
M
|
901
|
600
|
30
|
John
|
5
|
M
|
977
|
320
|
60
|
Frank
|
2
|
M
|
990
|
650
|
If you compare the retrieved rows to the rows listed in the Employee Master table, you will find that the first employee row (the row with the lowest employee number) for each department is listed in the result set.
Take Advantage of Subselect
A subselect allows you to embed up to 256 SELECT statements inside another SELECT, INSERT, UPDATE, DELETE, or CREATE TABLE statement to create very powerful, advanced SQL statements. This provides the capability to reference multiple tables in a SELECT, INSERT, UPDATE, DELETE, or CREATE TABLE statement, without having to do a join of the tables involved. Subselect provides the capability to write a single SQL statement that can perform a complex database task that could require many more lines of code and be significantly more complex if you used a high-level language like RPG, Cobol, C, or Java. The result can be a significant increase in productivity when performing database-related types of tasks.
By learning and using SQL subselect, you add a powerful tool to your repertoire as an IT professional. The key is to make the commitment to do so.
LATEST COMMENTS
MC Press Online