Joining Files

Access information from more than one file at a time with this modern database technique

by Ted Holt

If you moved into the System/38-AS/400 world after working with the System/34-36, as I did, you may feel overwhelmed by the differences between the two environments. Many of the differences are simply extensions of what you already know, such as using data areas. Others just involve learning new techniques, such as treating RPG indicators as data fields.

But the AS/400 is a database machine, built upon new concepts, so most of the differences from your old environment come from a change in philosophy. If we consider the topic of this article, the join file, to be simply another new technique, we fail to understand not only what joining is all about, but also what the AS/400 is all about. Joining files is not just another tool to add to our coder's toolbox; it is part of the philosophy of relational database management systems.

What is Joining?

Joining is combining data from two or more files into one record format, based on common fields. In other words, to the user (a program, or a human using a query tool) each record appears to have come from one file although it really contains data from two or more files.

Joining doesn't retrieve any information that the programmer couldn't get just by reading multiple input files. It is common to read a primary file sequentially, and then use the RPG CHAIN operation or the COBOL random READ to get additional information from other files. Reading multiple input files doesn't increase system overhead by adding another logical file for the system to maintain. So why join?

To help us understand the rationale behind joining, let's review two relational database concepts you've probably heard of before: normalization and view.

Normalization is the process of breaking up some types of records into smaller records, without losing the ability to retrieve the data that was contained in the original record. Normalization eliminates data redundancies and maximizes the flexibility and usability of the data. With normalization, you'll find the number of physical files increasing; one file may become two or three. It may seem like quite a mess, but it's nothing you can't clean up with views.

The term "view" refers to the fact that a user doesn't have to see the data in exactly the way it is stored in the database. A logical file which includes only certain fields is a view. A logical file which includes only certain records (through select and omit criteria) is also a view. Logicals which concatenate fields into other fields are views. Views allow each user to see the data in the way that he wants to see it.

Your users don't care if you store their data in one big file or a dozen small ones. They do care if you don't give them all the information they need. If you give them too much information, you may lose your job. Views let your database appear differently to different people.

You should join files for two reasons: one, to be able to normalize your database; and two, to present data to users (both programs and humans) in the way that they wish to see it.

Types of Joins

Relational database textbooks are full of pages explaining many types of joins. For our purposes, we need to consider only three: the inner join, the partial outer join, and the exception join. These are distinguished by what happens when a record from the primary join file has no matching records in the secondary files. Figures 1 and 2 contain data from two files, customer master and sales history, which we'll use in our illustrations.

If you have joined files with Query/36, you're already familiar with the inner join. Inner joins create resultant records for records with matching values in both files. Let's join the sales history file to the customer master file with an inner join. The records we get are shown in Figure 3. Notice that order 50013 is not in the resultant file because the customer master file has no records with customer number 106.

If we want to see all history records, we'll have to use a partial outer join. Like the inner join, the partial outer join retrieves the records with matching values. Unlike the inner join, it also returns primary records which have no matching value in the secondary file; these are joined to a null record, a nonexistent record of default values.

Figure 4 illustrates the same join as before, but using a partial outer join instead of an inner join. Order 50013 has been joined to a nonexistent blank customer record, which has a blank customer name and zero sales rep number. Default values for character fields are blanks and for numeric fields, zero. To give a field a different default value, use the DFT keyword in the DDS.

Figure 5 shows the results of an exception join. Since order 50013 is the only one with no matching customer number in the customer master file, it is the only order returned by the join.

Another way of looking at it: inner join + exception join = partial outer join. The records that join successfully, plus those that don't, add up to all the records in the primary file.

Join Logical Files

Join logical files are the easiest way to define joins on the S/38 and AS/400. Once you define and compile a join logical, your programs can use it as they do any file, with a few restrictions.
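
Compiling a join logical is no different from compiling any other logical file. As a minimal sketch, assuming the DDS source for CUSHIST (shown later in Figure 6) sits in member CUSHIST of source file QDDSSRC in a library called MYLIB (both names are assumptions):

     /* Compile the CUSHIST join logical from its DDS source member */
     CRTLF FILE(MYLIB/CUSHIST) SRCFILE(MYLIB/QDDSSRC) SRCMBR(CUSHIST) +
           TEXT('Join of sales history and customer master')

Once created, the file can be named on an RPG F spec or a COBOL SELECT just as a physical file would be.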

Before we get into the limitations of join logicals, let's study the DDS required to join the sales history file and the customer master file (shown in Figure 6).

The first keyword in this file, even before the record format definition, is JDFTVAL (Join Default Value). If this keyword is present in the DDS, the join will be a partial outer join. Otherwise, the join is an inner join.

The JFILE keyword in the record format definition tells which files to join. The first file listed, in this case the SLSHIST file, is the primary file. All other files are secondary.

A join specification, coded with a J in column 17, tells how the files are to be joined. One join specification is required for every pair of files. (Joining three files would require two join specs; four files would require three, and so on.) The JOIN keyword, which is optional if only two files are being joined, tells which pair of files this specification is joining. The JFLD (Join Field) keyword tells which fields to match. Here, the customer number has the same name in both files.

The fields to be included in the join logical must be listed next. If an included field is found in more than one file, you must indicate which file's value is to be used. JREF(1) (Join Reference) means that the value of CUSNBR is to be taken from the first file, SLSHIST.

Defining a key field lets us read the joined records in a sorted sequence. All key fields must come from the primary file.

You may also include select/omit specs, but if you do, you must add the DYNSLT (Dynamic Select) keyword at the file level (before the R spec).

Join logicals have a few limitations which you need to keep in mind. One, you can't do exception joins with join logicals. Two, the key fields must all come from the primary file. Three, join logicals can only join physical files. (Ideally, you shouldn't have to worry about whether a file is physical or logical.) Four, you can't update through a join logical file; instead, you must update through a physical file or a non-join logical. You can get around all of these limitations except the last one by using Open Query File (OPNQRYF).

Joining Files with OPNQRYF

If you want to get around limitations one and three, you'll have to let OPNQRYF join the files. OPNQRYF handles inner, partial outer, and exception joins with the JDFTVAL parameter, which looks like its DDS counterpart but differs in that it takes a parameter value. JDFTVAL(*NO), the default, is for inner joins. You can get a partial outer join with JDFTVAL(*YES). If you want an exception join, specify JDFTVAL(*ONLYDFT). If any of the files being joined is itself a join logical, its own JDFTVAL setting must agree with this parameter.
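
For example, here is a minimal sketch of an exception join over the files in Figures 1 and 2, listing the sales history records whose customer number is missing from the customer master. The program name ORPHANRPT is an assumption; since only primary-file fields are needed, the default FORMAT, which uses the record format of the first file listed, should be enough here:

     /* Exception join: return only SLSHIST records with no match in CUSMAST */
     OVRDBF  FILE(SLSHIST) SHARE(*YES)
     OPNQRYF FILE((SLSHIST) (CUSMAST)) +
             JFLD((SLSHIST/CUSNBR CUSMAST/CUSNBR)) +
             JDFTVAL(*ONLYDFT)
     CALL    ORPHANRPT        /* hypothetical HLL program that reads SLSHIST */
     CLOF    OPNID(SLSHIST)
     DLTOVR  FILE(SLSHIST)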

To overcome the second limitation, when you need to order a join logical by a field from the secondary file, specify the key field on an OPNQRYF command. Figure 7 illustrates how we could sort the CUSHIST file on sales rep number, which comes from the secondary file.

You'll also have to work with the FILE, JFLD, FORMAT, and MAPFLD parameters when joining with OPNQRYF. List the files you are joining, primary first, in the FILE parameter. Use the FORMAT parameter to tell OPNQRYF how the high-level language program expects the data to look. Name any file that has the record format you want, and if it is a multi-format file, add the name of the format to use. (If no file has exactly the fields you need, create a physical file with no members in it.) Be sure to code the same file in your HLL program as well. When you override that format file to the primary join file with SHARE(*YES), the program reads its records through the open query file rather than from the format file itself.
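
To make the "physical file with no members" concrete, here is a minimal sketch, assuming the DDS for the WORK format used in Figure 8 is in member WORK of QDDSSRC in MYLIB (library and source names are assumptions):

     /* Create WORK as a format-only file: no member, so it never holds data */
     CRTPF FILE(MYLIB/WORK) SRCFILE(MYLIB/QDDSSRC) SRCMBR(WORK) MBR(*NONE)

The OVRDBF ... TOFILE(REPMAST) SHARE(*YES) command in Figure 8 then redirects the program's open of WORK to the query.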

The JFLD parameter, like the JFLD keyword in DDS, tells which fields to use to join the files. Its syntax is different, in that one JFLD parameter is used in place of several JFLD keywords.
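
For instance, joining three files still takes a single JFLD parameter, but with one pair of join fields for each pair of files. A sketch, using hypothetical order header, order detail, and item master files and a hypothetical format file ORDWORK:

     /* One field pair per pair of files; any field names shared with  */
     /* the format would also need MAPFLD (see the next paragraph).    */
     OPNQRYF FILE((ORDHDR) (ORDDTL) (ITEMMST)) +
             FORMAT(ORDWORK) +
             JFLD((ORDHDR/ORDNBR ORDDTL/ORDNBR) +
                  (ORDDTL/ITMNBR ITEMMST/ITMNBR))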

If the same field name is found in more than one file, use the MAPFLD parameter to indicate from which file to retrieve the field value.

Figure 8 illustrates these keywords. WORK is a file which has all of the fields needed for the HLL program. The actual data will come from the sales rep master file and the join logical called CUSHIST. Since CUSHIST is a partial outer join logical, OPNQRYF must also indicate a partial outer join with JDFTVAL(*YES). The SLSREP field is used to make the join. The MAPFLD tells OPNQRYF to get the value of SLSREP from the sales rep master file. (If we told it to get the value of SLSREP from the CUSHIST file, it would return a zero sales rep number for sales reps with no sales.)

For more examples of joining with OPNQRYF, see the CL Reference manual and my articles in the May and June issues of Midrange Computing.

Be aware that OPNQRYF has its performance problems. It tends to hog resources (files, memory, and CPU cycles), and it creates temporary access paths as needed, which slows both the job and the rest of the system.

Use join logicals whenever possible for jobs that run frequently. For those jobs that run infrequently, use OPNQRYF so the system doesn't have to maintain a rarely used logical file. Some frequently run jobs may require features not available with join logicals, so you will have to use OPNQRYF. If you do, build supporting logicals so that OPNQRYF won't have to create access paths every time these jobs run.

Try It, You'll Like It!

If you can code a join with DDS and OPNQRYF, you'll have no problem with joins in other products, such as SQL/400, AS/400 Query, or ASC's SEQUEL. Actually, you'll find that learning how to join files is easy. The hard part is making yourself do it. We RPG programmers love the CHAIN op code, and the idea of writing programs that read one (and only one) input file is, well, different. But that is the shape of things to come.

You'll never be convinced of the importance of views, including joins, until you see what they can do for you, so here is an experiment you can try. Pick a user who bugs you to death with requests for new reports. Build a join logical which includes every piece of information he needs. Give him a query package, or write a generic RPG report program with a prompt screen and OPNQRYF over it, to let him pick the sort sequence and the selection of records. Each time I've used this approach, the users ended up generating more of their reports, leaving me with more time for more important things.
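
The driver for such a generic report can be a short CL program. This is a sketch only, under assumptions: PROMPTPGM and RPTPGM are hypothetical programs, and the prompt program is assumed to return a key field name and a record selection expression (or '*ALL'):

     PGM
       DCL     VAR(&SORTFLD) TYPE(*CHAR) LEN(10)
       DCL     VAR(&SELECT)  TYPE(*CHAR) LEN(256) VALUE('*ALL')
       CALL    PGM(PROMPTPGM) PARM(&SORTFLD &SELECT) /* hypothetical prompt screen */
       OVRDBF  FILE(CUSHIST) SHARE(*YES)
       OPNQRYF FILE((CUSHIST)) QRYSLT(&SELECT) KEYFLD((&SORTFLD))
       CALL    PGM(RPTPGM)                           /* hypothetical report program */
       CLOF    OPNID(CUSHIST)
       DLTOVR  FILE(CUSHIST)
     ENDPGM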

So on top of learning about data queues, subsystems, and job descriptions, you've got a different way to write applications. But you didn't pay all that money for an AS/400 so you could keep doing everything the same way.

A QUICK REVIEW OF THE FIRST THREE NORMAL FORMS

Normalization is a set of guidelines for file design. These guidelines, known as normal forms, help prevent redundant data in your database.

Suppose you must store some employee data for a personnel department. For each employee you must record employee number, name, the number and name of the department in which he works, the state in which he works, the code for the plant where he works, the numbers and names of training courses he has completed, and the dates he completed them.

You might try to store all the data in one file, like this:

File 1: EMP#, EMP_NAME, DEPT#, DEPT_NAME, STATE, PLANT, COURSE#1, COURSE_NAME1, COURSE_DATE1, COURSE#2, COURSE_NAME2, COURSE_DATE2, COURSE#3, COURSE_NAME3, COURSE_DATE3 (key: EMP#)

The key of each file layout is noted in parentheses. (The key should not be confused with an access path. In relational database terminology, the key is the group of one or more fields which distinguishes one record from another. On the S/38 and AS/400, relational keys are implemented as access paths with the UNIQUE keyword.)

You can already see some problems with this approach. For how many courses should you leave room? If management changes the name of Department 4 from Personnel to Human Resources, will all Department 4 records get updated? Where do you store the name of a new course no one has yet completed?

Let's look at how normalization takes care of such problems.

FIRST NORMAL FORM

A file is in first normal form (1NF) if it doesn't have repeating groups. The obvious violation of 1NF in this example is the course information. We can convert this file to first normal form by placing each completed course into a record of another file, whose key will be a composite of employee number and course number.

In 1NF, the data in our example is stored in two files:

File 1: EMP#, EMP_NAME, DEPT#, DEPT_NAME, STATE, PLANT (key: EMP#)

File 2: EMP#, COURSE#, COURSE_NAME, COURSE_DATE (key: EMP# + COURSE#)

This is better than before for two reasons. One, we can store as many or as few courses per employee as we like. Two, to find out which employees have completed course 434, we search one field, not three.

SECOND NORMAL FORM

Our new database design still isn't perfect. If someone enters the wrong course name in a record, there will be more than one course name for the same course number, and the database will be contaminated. There really is no need to store the course name for every employee who has completed it. Because course name is functionally dependent on (can be determined from, is a fact about) only part of a composite key (a key made up of two or more fields), this file is not in second normal form. We need to put the course name in another file, with course number as the key.

In 2NF, the data in our example is stored in three files:

File 1: EMP#, EMP_NAME, DEPT#, DEPT_NAME, STATE, PLANT (key: EMP#)

File 2: EMP#, COURSE#, COURSE_DATE (key: EMP# + COURSE#)

File 3: COURSE#, COURSE_NAME (key: COURSE#)

THIRD NORMAL FORM

The design of our employee database is improving, but still has one problem. The department name in the first file is functionally dependent on a non-key field, department number. We need to move department name to another file to achieve third normal form.

In 3NF, the data in our example is stored in four files:

File 1: EMP#, EMP_NAME, DEPT#, STATE, PLANT (key: EMP#)

File 2: DEPT#, DEPT_NAME (key: DEPT#)

File 3: EMP#, COURSE#, COURSE_DATE (key: EMP# + COURSE#)

File 4: COURSE#, COURSE_NAME (key: COURSE#)

This database design allows us to store each non-key data value one time. Key values are stored redundantly in order to join files. We can retrieve any information in the database from a single file or from a join file of two or more files.
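
As a sketch of that last point, suppose the four files above were built as physical files EMPMST, DEPTMST, EMPCRS, and CRSMST (assumed names), and a format file EMPDEPT holds the fields a program wants to see: employee number, employee name, department number, and department name. Following the same pattern as Figure 8, OPNQRYF can then present employee and department data as one record:

     /* Hypothetical join of the 3NF employee and department files */
     OVRDBF  FILE(EMPDEPT) TOFILE(EMPMST) SHARE(*YES)
     OPNQRYF FILE((EMPMST) (DEPTMST)) +
             FORMAT(EMPDEPT) +
             JFLD((EMPMST/DEPT# DEPTMST/DEPT#)) +
             MAPFLD((DEPT# 'EMPMST/DEPT#'))
     CALL    LISTEMP          /* hypothetical program that reads EMPDEPT */
     CLOF    OPNID(EMPMST)
     DLTOVR  FILE(EMPDEPT)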

OTHER NORMAL FORMS

There are other normal forms, but these are the most important. If your database is consistent with 3NF, you should be able to store and retrieve anything with no problem. And your database probably will be in fourth or even fifth normal form without any effort on your part.

IT'S COMMON SENSE

Some people wonder why the big fuss over normalization. "It's just common sense," they argue. Well, maybe. I've worked on systems that violated all three of these rules, and you probably have, too. With the technology we had at the time, we often had to violate these rules.

Most of us would have probably designed a course master file and a department master file before writing the first line of code for our fictitious personnel system. Let that encourage you. It means that normalization is not really as new or as difficult as it sounds.

References: DDS Reference (SC21-9620), Data Base Guide (SC21-9659), Control Language Reference (SC21-9778)


Figure 1: Customer master file CUSMAST

     Cust   Cust    Sales
     No.    Name    Rep
     100    BAKER    41
     101    JONES    85
     102    SMITH     7
     103    DOE      41
     104    BLACK     7

Figure 2: Sales history file SLSHIST

     Order   Cust   Date       Item          Unit
     No.     No.    Of Sale    No.     Qty   Price
     50011   104    90/06/04   T310     4    100.00
     50012   100    90/06/04   M099     1    500.00
     50013   106    90/06/04   T314     3    250.00
     50014   104    90/06/05   T310     8     95.00

Figure 3: Inner join with sales history and customer master

     Order   Cust   Date       Item          Unit     Cust    Sales
     No.     No.    Of Sale    No.     Qty   Price    Name    Rep
     50011   104    90/06/04   T310     4    100.00   BLACK     7
     50012   100    90/06/04   M099     1    500.00   BAKER    41
     50014   104    90/06/05   T310     8     95.00   BLACK     7

Figure 4: Partial outer join with sales history and customer master

     Order   Cust   Date       Item          Unit     Cust    Sales
     No.     No.    Of Sale    No.     Qty   Price    Name    Rep
     50011   104    90/06/04   T310     4    100.00   BLACK     7
     50012   100    90/06/04   M099     1    500.00   BAKER    41
     50013   106    90/06/04   T314     3    250.00              0
     50014   104    90/06/05   T310     8     95.00   BLACK     7

Figure 5: Exception join with sales history and customer master

     Order   Cust   Date       Item          Unit     Cust    Sales
     No.     No.    Of Sale    No.     Qty   Price    Name    Rep
     50013   106    90/06/04   T314     3    250.00              0

Figure 6: DDS for join logical file CUSHIST

     A* JOIN SALES HISTORY & CUSTOMER MASTER FILE
     A
     A          JDFTVAL
     A          R CUSHISTR                  JFILE(SLSHIST CUSMAST)
     A          J                           JOIN(SLSHIST CUSMAST)
     A                                      JFLD(CUSNBR CUSNBR)
     A            ORDNBR
     A            CUSNBR                    JREF(1)
     A            SLSDAT
     A            ITMNBR
     A            QTY
     A            UPRICE
     A            CUSNAM
     A            SLSREP
     A          K ORDNBR

Figure 7: Ordering a join logical on a field from a secondary file

     OVRDBF  FILE(CUSHIST) SHARE(*YES)
     OPNQRYF FILE((CUSHIST)) KEYFLD((SLSREP))
     CALL    HLLPGM
     CLOF    OPNID(CUSHIST)
     DLTOVR  FILE(CUSHIST)

Figure 8: Sample join with OPNQRYF

     OVRDBF  FILE(WORK) TOFILE(REPMAST) SHARE(*YES)
     OPNQRYF FILE((REPMAST) (CUSHIST)) +
             FORMAT(WORK) +
             JFLD((REPMAST/SLSREP CUSHIST/SLSREP)) +
             JDFTVAL(*YES) +
             MAPFLD((SLSREP 'REPMAST/SLSREP'))
     CALL    HLLPGM
     CLOF    OPNID(REPMAST)
     DLTOVR  FILE(WORK)

TED HOLT

Ted Holt is IT manager of Manufacturing Systems Development for Day-Brite Capri Omega, a manufacturer of lighting fixtures in Tupelo, Mississippi. He has worked in the information processing industry since 1981 and is the author or co-author of seven books. 

