You can boost productivity by mastering tools that work with data in sets, rather than by individual records. This article explains how you can solve complex query requests quickly by combining sets of data.
I learned to program at Itawamba Community College, in Tupelo, Mississippi. The year was 1982, the computer was an IBM System/3 Model 12, and the language was RPG
II. I was taught to program in terms of single record I/O. That is, each read against a file
copied the contents of one record into my program, and each write or update to a file sent output to only one record.
Since then, relational databases have become the industry standard. Relational databases are designed to work with sets of data, yet many programmers who produce business applications still think only in record-at-a-time terms.
I want to show some ways set-at-a-time processing has helped me on many occasions to solve real-world problems quickly. If you have never thought in terms of setat-a-time processing, I hope this will be of benefit to you.
What Is a Set?
A set is a group of things that share one or more common attributes. For example, where you work, there are lots of sets of people:
People who wear glasses
People who dont wear glasses
People who have red hair
People who arrive late, take extended lunch breaks, and leave early
Female people who are over two meters tall and have Hispanic surnames The list could go on and on, because there are so many different ways to classify things.
A set can be empty. An empty set means that no one fits all the criteria required for membership in the set. The last example in the list above, female people who are over two
meters tall and have a Hispanic surname, would likely be empty in many places of employment, because not many people fit that description.
An entity can have membership in more than one set. An employee might belong to these sets:
Salaried employees
Male salaried employees
Purchasing agents
Employees who participate in the corporate stock purchase plan
People who know how to open the microwave door when the latch gets stuck
How to Process Sets
To process records one at a time, you need a compiler or interpreter with operations like read, read prior, write, update record, and delete record. When you process sets of records, you need other tools and operators.
To process sets, you need relational tools, like SQL, OPNQRYF, or Query/400. You tell relational tools what you intend to do and what set of data you want them to do it to. For example, you can delete all employees who were terminated before January 1 of the current year, or give a 20 percent pay increase to all RPG programmers who have been contacted by at least four headhunters within the last week.
Not all relational tools are created equal. SQL will read, write, update, and delete sets of records. Query/400 only reads and writes. OPNQRYF cant do any actions, but depends on an accompanying HLL program or CPYFRMQRYF command to do that. Im going to use SQL to illustrate set-at-a-time programming. SQL supports the four data retrieval and manipulation operations just mentioned, requires no source code or interactive screens to be filled, and, when combined with a CL driver program, makes quick work of some complicated programming tasks. Since SQL/400 doesnt work with CL, Ill be using the Execute SQL Statement (EXCSQLSTM) command we published in December 1994 (see The EXCSQLSTM Utility). If you dont have the code, you can get it from the Midrange Computing Resource CD or from our Web site at http://www.midrangecomputing.com.
Im aware that a lot of shops dont use SQL, but I hope more shops will begin to use it as a result of reading this article.
Now lets look at a couple of examples of set-at-a-time programming.
When Is an Outer Join Not an Outer Join?
Inner joins differ from outer joins in how they treat unmatched data. An inner join does not include records from the primary file if no match is found in the secondary file(s). A left outer join, the only kind of outer join DB2/400 supports, includes data from the primary file, even if no matching data is in the secondary file(s).
Suppose some important person asks you for a list of all inventory items with a price code of PT or PV, and any sales of those items. This important person wants to see all items of those classes, whether they have been sold or not. You might use an outer join like that in Figure 1.
This join works great. You can see the results in Figure 2. Notice that item K287 was included, even though it hasnt yet been sold.
Now, suppose the VIP tells you he only wishes to see sales of June 1998. He still wants to see all items with a class code of PT or PV. Your first impulse might be to add a where clause to restrict the table, as in Figure 3, but that wont yield the results you want, because the where clause references a field in a secondary file.
In fact, different relational database management systems could yield completely different results in a case like this. Many of them will see that a where clause references a
secondary file and will change the query to run an inner join instead of a left outer join. Others go crazy. I ran this query in SQL/400 and got erroneous results that made no sense at all.
Some relational DBMSs allow mixed join types in one query. That is, you can join files A and B with an inner join, and B and C with a left outer join, in the same query. DB2/400 permits only one join type in each query, so thats not an option in this case.
Set-at-a-time programming can eliminate this confusion. Figure 4 contains a short CL program that manipulates sets of records. The first EXCSQLSTM command combines June 1998 sales data for items of the desired classes from the invoice header and invoice detail files. I store the invoice information in a temporary file called TEMP01, in library QTEMP.
Then, I join the item master file to the temporary file. I use a left outer join here, because I want all items of classes PT and PV to appear in the result set, even if there were no June sales.
How do you like that? This program produces a report from only two executable statements.
Another Example
Im going to complicate the example for two reasons. First, I want to show you that complicated queries are not really complicated; and second, most of the set-at-a-time programs Ive had to write were not this simple.
Management has decided to raise prices. They have selected some items of types PT and PV and keyed the proposed new prices into a spreadsheet. Your job is to produce a report to aid them in making their decision. They will look at your report, adjust prices in their spreadsheet, give you the new spreadsheet, and ask for a fresh report. This process will continue until theyre satisfied with the new prices.
The report must show the following: item number; item description; how many sales were made of each item; the quantity sold between July 1, 1997, and June 30, 1998; how much those items sold for (actual sales dollars, not unit price); how much the items would have sold for at the new proposed prices; and the difference between the two sales figures. If there is no new price, use the current sales figures. Total the sales figures and difference columns, and print some asterisks to the right of any line where the increase is greater than 10 percent. Sort it by actual sales dollars in descending order.
I once had a guy ask me for something like this back in my S/36 days. I told him he needed a magician, not a programmer. However, since I started working with the S/38 and AS/400, Ive never turned these requests down.
Anyway, lets look at how youd solve this problem. First, you need to determine where the data is. You already know about the ITEMS, INVHDR, and INVDTL files, but you dont have a file for the new prices. You create another file, NEWPRICE, save the spreadsheet in a comma-separated values (CSV) file, and upload the CSV file into NEWPRICE, using a data transfer program like Client Access.
Now that all the data is on the AS/400 in database files, youre ready to start designing the program. There are several ways you could extract and combine the data, but heres one, shown in Figure 5 using EXCSQLSTM and Query/400.
This program uses EXCSQLSTM to build work files in QTEMP. Ive named the work files TEMPnnn, where nnn is a three-digit number. Ive incremented the numbers by 10, because the people who request these reports change their requirements as they work with the reports, and I often have to add a new work file to the series. I like the work files to be named in the order I build them, so I leave room for insertion.
Heres how the program works:
1. Get all items of types PT and PV, along with new prices. If there is no new price for an item, set the new price to zero.
2. Summarize the sales information for the desired items over the specified period of time. For any items that have no sales records, add records with zero sales figures. This lets me use an inner join to join to the item information. My experience is that less tends to go wrong with inner joins than with outer joins, so I prefer to use inner joins when possible.
3. Combine the item and sales data.
4. Build the calculated columns: projected sales, projected increase, and large increase flag.
5. Print the report. Since EXCSQLSTM has no method for accumulating column totals, I use Query/400 for this step.
The resulting report is shown in Figure 6.
Complexity Doesnt Have to Be Difficult
Working with sets has helped me do things I couldnt have done with the usual tools. Studies have shown that a given programmer is capable of producing a certain number of lines of code per day, regardless of the language with which hes working. That is, programmer Smith can produce 100 lines of assembler language, or 100 lines of RPG, or 100 lines of EXCSQLSTM. Obviously, the more each line of code does for you, the more productive you are.
Think how much harder it would be to develop the report in Figure 6 using RPG and printer files, and work files developed with DDS.
Think about the difficulty of changing this report. Users never ask me for what they want; they ask me for what they think they want. Also, we computer weenies dont speak the same language the users do, so communication is difficult anyway. The only way to give the users what they want, oftentimes, is to give them something to look at, let them tell you whats wrong with it, change it, and run it again. This is difficult with RPG programs, printer files, and DDS-defined work files, but easy with tools like EXCSQLSTM.
This may be off the subject, but Ive been around programmers who never came to understand that users dont care about the things we care about or think they care about. Users can get excited about an ugly report, as long as it tells them what they want to know.
So if youre not writing set-at-a-time programs, give it some thought. Get some good tools, especially a good industrial-strength SQL package. EXCSQLSTM is free, but its slow. When youre able to satisfy complex query requests quickly, users will cease to think of you as a programmer and begin to call you a wizard.
select it.itemno, it.desc, it.class, id.invno, ih.invdate,
ih.custno, id.qty, id.price
from items it left join invdtl id on it.itemno = id.itemno
left join invhdr ih on id.invno = ih.invno
where it.class in (PT,PV)
Figure 1: This left outer join retrieves all desired items, even those for which there are no sales figures
ITEMNO DESC CLASS INVNO INVDATE CUSTNO QTY PRICE
J240 4 Widget PT 101 05/30/98 334 2 5.00 J240 4 Widget PT 102 06/01/98 334 1 5.00 J240 4 Widget PT 105 07/01/98 555 1 5.00 K138 Thingamajig PT 103 06/04/98 445 3 2.00 L909 Doomaflotchie PV 103 06/04/98 445 4 3.00 K287 #3 Whoozit PT null null null null null
Figure 2: Resulting data from the join in Figure 1
select it.itemno, it.desc, it.class, id.invno, ih.invdate,
ih.custno, id.qty, id.price
from items it left join invdtl id on it.itemno = id.itemno
left join invhdr ih on id.invno = ih.invno
where it.class in (PT,PV)
and ih.invdate between 06/01/98 and 06/30/98
Figure 3: Outer joins dont work correctly with record selection over a secondary file
PGM
EXCSQLSTM (+
select id.itemno, id.invno, ih.invdate, +
ih.custno, id.qty, id.price +
from invdtl id +
inner join invhdr ih +
on id.invno = ih.invno +
where ih.invdate between 06/01/98 and 06/30/98 +
and id.itemno in +
(select itemno +
from items +
where class in (PT,PV)) +
) OUTPUT(*OUTFILE) OUTFILE(TEMP01)
EXCSQLSTM (+
select it.itemno, it.desc, it.class, t1.invno, t1.invdate, +
t1.custno, t1.qty, t1.price +
from items it left join temp01 t1 on it.itemno = t1.itemno +
where it.class in (PT,PV) +
) OUTPUT(*PRINT)
ENDPGM
Figure 4: A two-step query solves the outer join problem
PGM
MONMSG MSGID(CPF2105) /* file not found */
MONMSG MSGID(CPF0000) EXEC(GOTO ABEND)
/* Get existing price info and new prices */
DLTF FILE(QTEMP/TEMP010)
EXCSQLSTM (+
select it.itemno, desc, +
decimal(coalesce(newprc,0),5,2) as newprc +
from items it left join newprice np +
on it.itemno = np.itemno +
where class in (PT,PV) +
) OUTPUT(*OUTFILE) OUTFILE(TEMP010)
/* Get sales info */
DLTF FILE(QTEMP/TEMP020)
EXCSQLSTM (+
select itemno, +
decimal(count(*),5,0) as nbrofsales, +
decimal(sum(qty),5,0) as qtysold, +
decimal(sum(qty * price),7,2) as actualsls +
from invdtl id +
inner join invhdr ih +
on id.invno = ih.invno +
where ih.invdate between 07/01/97 and 06/30/98 +
and id.itemno in +
(select itemno +
from temp010) +
group by itemno +
) OUTPUT(*OUTFILE) OUTFILE(TEMP020)
/* Add blank sales figures for items that did not get sold */
EXCSQLSTM (+
insert into temp020 +
select itemno, 0, 0, 0 +
from temp010 t10 +
where itemno not in +
(select itemno from temp020) +
)
/* Combine the data */
DLTF FILE(QTEMP/TEMP030)
EXCSQLSTM (+
select t10.itemno, desc, nbrofsales, +
qtysold, actualsls, +
decimal(qtysold * newprc,7,2) as projsls +
from temp010 t10 +
inner join temp020 t20 +
on t10.itemno = t20.itemno +
) OUTPUT(*OUTFILE) OUTFILE(TEMP030)
/* Set projected sales equal to actual sales if no new prices */
EXCSQLSTM (+
update temp030 +
set projsls = actualsls +
where projsls = 0)
/* Calculate sales difference */
DLTF FILE(QTEMP/TEMP040)
EXCSQLSTM (+
select t30.*, +
decimal(projsls - actualsls,7,2) as increase, +
as incflag +
from temp030 t30 +
order by actualsls desc +
) OUTPUT(*OUTFILE) OUTFILE(TEMP040)
/* Flag large increases */
EXCSQLSTM (+
update temp040 +
set incflag = *** +
where actualsls 0 +
and float(increase) / float(actualsls) >= 0.095)
/* Change column headings */
EXCSQLSTM (+
label on qtemp/temp040 (+
itemno is Item number , +
desc is Description , +
nbrofsales is Number of sales , +
qtysold is Qty sold , +
actualsls is Actual sales , +
projsls is Projected sales , +
increase is Projected increase , +
incflag is +
))
RUNQRY QRY(*LIBL/OUTJOIN2Q)
RETURN
ABEND:
FWDPGMMSG
MONMSG MSGID(CPF0000)
ENDPGM
Figure 5: The price increase report program
07/01/98 09:00:00 Price Increase Worksheet PAGE 1
Item number Description Number Qty Actual Projected Projected
of sales sold sales sales increase
J240 4 Widget 2 3 15.00 18.00 3.00 ***
L909 Doomaflotchie 1 4 12.00 12.00 .00
K138 Thingamajig 1 3 6.00 6.15 .15
K287 #3 Whoozit 0 0 .00 .00 .00
FINAL TOTALS
TOTAL 33.00 36.15 3.15
* * * E N D O F R E P O R T * * *
Company confidential
LATEST COMMENTS
MC Press Online