Recently, a friend and I spent a fair amount of time debating the relative value of using the *PSSR subroutine for error handling. The conclusion that I came to as a result was that the *PSSR subroutine can be a useful tool if it is used properly and combined with well-thought-out error recovery. In case you missed it, the important part of that last sentence was "if it is used properly," and that is a big "if."
The first advice I'll give you is simply to avoid using the *PSSR subroutine except as a last resort. Use all the other options open to you first. One of the examples given in the ILE RPG reference uses the SCAN opcode with a starting scan location of zero, as shown below:
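A minimal sketch of that kind of statement follows. It is not the reference manual's listing: the field names are hypothetical, and it assumes a compiler level that allows free-form declarations (starting in column 8) alongside a fixed-form calculation line.

```rpgle
       // Hypothetical fields, for illustration only.
       dcl-s source char(20) inz('ABCDEF');
       dcl-s start  int(10)  inz(0);      // zero is not a valid start position
       dcl-s pos    int(10);

       // Scan source for 'C' beginning at position start. Because start is
       // zero, the operation fails with status 00100 (value out of range
       // for string operation).
     C     'C'           SCAN      source:start  pos

       *inlr = *on;
       return;
```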
The code above will trigger the *PSSR if no other error handling has been defined. However, that's probably not what you want to happen. In a situation like this one, there are several other possibilities:
- The code could check for a zero value in the starting position before executing the SCAN.
- The "E" operation extender could be used to trap the error.
- A MONITOR group could be used to trap the error.
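The three options can be sketched as follows. This is an illustration under assumptions rather than a recommended implementation: the field names are hypothetical, setting pos to zero is just one possible recovery, and the mix of free-form statements with fixed-form SCAN lines assumes a compiler level that allows it.

```rpgle
       // Hypothetical fields, for illustration only.
       dcl-s source char(20) inz('ABCDEF');
       dcl-s start  int(10)  inz(0);
       dcl-s pos    int(10);

       // Option 1: validate the start position before scanning.
       if start >= 1 and start <= %len(source);
     C     'C'           SCAN      source:start  pos
       else;
         pos = 0;
       endif;

       // Option 2: the E extender traps the error; %ERROR reports it.
     C     'C'           SCAN(E)   source:start  pos
       if %error;
         pos = 0;
       endif;

       // Option 3: a MONITOR group handles status 00100 locally.
       monitor;
     C     'C'           SCAN      source:start  pos
       on-error 100;
         pos = 0;
       endmon;

       *inlr = *on;
       return;
```

In each case the error is dealt with where it happens, so control never reaches the *PSSR.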
These options should be used to capture and handle errors internally before letting program control go to the *PSSR subroutine. With that said, let's take a closer look at when and how you should use the *PSSR subroutine.
The Good
When used properly, *PSSR can make an application totally "error-free"...at least from the user's perspective. In this case, proper usage means that all of the following statements are true:
- The users and/or operators are informed of the failure(s) in a timely manner.
- The users do not get hard halts when a program fails.
- The information (job log and/or program dump) is available for the developers to trace what happened.
In my opinion, informing the users and operators is by far the most important of these requirements. The following is an example of proper usage combined with good error recovery:
1. An RPG program in a batch job fails.
2. The *PSSR subroutine traps the error and prints all relevant information regarding the failure. At a minimum, this could be accomplished using a DUMP opcode.
3. The next program higher in the stack is informed of the failure and takes appropriate action.
4. Step 3 is repeated for all programs in the stack.
5. Users/operators are informed of the failure.
This prevents hard halts and provides users and developers with the information they need to address the issue. Minimum interruption to the user and maximum information for the debugger/developer: this is the ideal usage.
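A *PSSR along those lines might look like the sketch below. It makes a few assumptions: the inPssr flag is a hypothetical guard against re-entering the subroutine, DUMP(A) is used so that the dump is produced regardless of the DEBUG setting, and ENDSR '*CANCL' cancels the program instead of letting it return normally. Notifying the users and operators is left out because it depends on the installation.

```rpgle
       dcl-s inPssr ind inz(*off);   // hypothetical re-entry guard

       // ... mainline processing goes here ...

       *inlr = *on;
       return;

       begsr *pssr;
         // An error inside *PSSR calls *PSSR again, so do the work only once.
         if not inPssr;
           inPssr = *on;
           dump(a);                  // formatted dump for the developers
         endif;
       // '*CANCL' cancels the program rather than ending it normally.
       endsr '*CANCL';
```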
Which leads us to the next topic...
The Bad
The single biggest problem with using *PSSR is that if it is not used carefully and in conjunction with good error-recovery practices, it can (and will) mask serious problems both from users and developers. To demonstrate this, here's an example of how *PSSR can mask a problem:
- A program in a batch job fails.
- The *PSSR subroutine handles the error and returns to the calling program using a RETURN opcode.
- The next program higher in the stack doesn't fail or end because the previous program ended "normally."
- The batch job completes normally, and the users receive the message that "job xxx/xxx/xxx submitted by xxx completed successfully."
- The users/operators have no idea that a problem occurred.
The Ugly
The most common mistake I have seen in *PSSR routines is that the developer codes a RETURN. This will cause the program to end "normally" and will not inform the calling program of any failure (as in the case above). Before I get lynched, let me say that coding the RETURN can be the correct thing to do in certain circumstances. But most often it is not, and the RETURN is simply coded because the developer doesn't understand the effect that it has. The same issue applies to *PSSR routines local to subprocedures. If the RETURN is used, then the CALLP or function call will not fail.
The problem of using the RETURN can be (and often is) compounded when a batch job fails and the *PSSR traps the error. Since the job will complete successfully, the printed job log may not be available for someone to find out what happened. Even if the job log is initially available, the users are unaware that a problem occurred, and it could be days or even weeks before someone figures it out. The job log will most likely be long gone by then.
Another common mistake is not following the error all the way back to the top of the stack. Few things are more irritating (IMHO) than getting a message that a CALL failed and then finding out (after much searching!) that it wasn't the CALLed program that failed at all. It was a program farther down the stack that failed, and the program that you finally got the halt from was not coded to handle the failure correctly. I can't stress enough how important it is to be thorough when developing error handling.
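One way to keep a failure moving up the stack is for each caller to check the call and then fail in turn. The sketch below is only an illustration: NEXTSTEP is a hypothetical called program, inPssr is a hypothetical guard flag, and the cleanup step is left as a comment because it is application-specific.

```rpgle
       // NEXTSTEP is a hypothetical program called by this one.
       dcl-pr nextStep extpgm('NEXTSTEP') end-pr;
       dcl-s inPssr ind inz(*off);   // hypothetical re-entry guard

       callp(e) nextStep();
       if %error;
         // The call failed: NEXTSTEP, or something below it, ended
         // abnormally. Do any local cleanup here, then fail as well.
         exsr *pssr;
       endif;

       *inlr = *on;
       return;

       begsr *pssr;
         if not inPssr;
           inPssr = *on;
           dump(a);                  // keep the debug information
         endif;
       // Cancel rather than return, so this program's caller sees a failure.
       endsr '*CANCL';
```

If every program in the stack follows the same pattern, the failure reported at the top can be traced back through the dumps to the program that actually failed.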
Proper Use Is the Key
To sum up, the *PSSR is a good tool to use, but don't overuse it. Use RETURN in a *PSSR only if it makes sense to "hide" the fact that the program or subprocedure failed. And finally, do a DUMP so the debug information will be available later.
Jeff Olen is an analyst in the IS department at Costco Wholesale in Issaquah, Washington (just outside Seattle). He has nearly 20 years of experience on midrange systems and has developed software for a wide range of applications. He may be reached at
MC Press Online