Has a critical job finished? Is the line up? Did the polling process finish on time? Is the Domino server running? Is MQ Series running behind? Is the server available? Are the job queues backed up? Is the HTTP server available?
Each day, your operators and system administrators check these items by hand, or write programs around processes and procedures, to ensure that your team is meeting its service level agreement (SLA).
In many cases, the operators simply run down a list of items to check at the beginning of each shift. They use commands, such as WRKACTJOB, or tools, like the iSeries Navigator, to view the resources and make sure they are in the right status. Over the course of an eight-hour shift, this process may be done regularly or on an as-needed basis. But this manual process can miss critical status items.
Over the years, many system administrators have written their own utilities to monitor the critical resources on their systems. These processes sit on the systems as never-ending jobs that wake up every 5, 10, or 15 minutes to check the resource status and send an event to a message queue. When a new resource needs to be monitored, they just create another program, which then has to be promoted through a change management system.
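The home-grown pattern described above can be sketched in a few lines. This is only an illustration in Python, not code from any particular shop or product: the names `poll_once` and `run_forever`, the check functions, and the in-memory queue standing in for a message queue are all assumptions.

```python
import queue
import time
from typing import Callable, Dict


def poll_once(checks: Dict[str, Callable[[], bool]],
              events: "queue.Queue[str]") -> int:
    """Run every check once and queue one event per failed resource."""
    failures = 0
    for name, is_ok in checks.items():
        if not is_ok():
            # Hypothetical event text; a real utility would send a message
            # to a system message queue instead.
            events.put(f"{name}: not in expected status")
            failures += 1
    return failures


def run_forever(checks: Dict[str, Callable[[], bool]],
                events: "queue.Queue[str]",
                interval_seconds: int = 300) -> None:
    """Never-ending job: wake up, check everything, go back to sleep."""
    while True:
        poll_once(checks, events)
        time.sleep(interval_seconds)  # 300 seconds = every 5 minutes
```

Note that every new resource means another entry in `checks` and, usually, another check function to write, test, and promote, which is the maintenance burden described above.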
IBM provides system APIs for checking job queues, jobs, subsystems, lines, TCP/IP resources, and more. These APIs are documented in the IBM Information Center under Programming. IBM even provides many examples on how to use these APIs.
If your site uses system APIs, the processes can be fairly efficient and, consequently, not hard on the system. But is this what you want your critical human resources doing every day? And what happens when IBM releases a new OS/400 level? Many of the tricks for monitoring resources are OS/400 level-specific.
Robot/CONSOLE, Help/Systems' message management and resource monitoring software, can handle the process of monitoring your resources automatically, with no programming required. Robot/CONSOLE can monitor almost any type of resource at specified intervals and then react to the actual status. This gives you 24 x 7 coverage, no matter where you are (see Figure 1). You might want to do this even if you have operators available all day long.
Figure 1: When the status of a monitored resource changes, a message displays on the Resource Monitor message center.
The other beauty of Robot/CONSOLE is that all processes are documented in its history. If a monitored resource fails a test, Robot/CONSOLE sends an event to your database. Auditors can view these events, management can see reports about which resources are causing the most problems, and the system administrator is not put on the spot every time something new needs to be monitored (Figure 2).
Figure 2: You can select the type of resource you want to monitor.
Events from Robot/CONSOLE can invoke OPerator Assistance Language (OPAL), our powerful operations language that uses IF-THEN-ELSE logic or calls another program to resolve the issue. For example, if a critical subsystem is not running, simply have Robot/CONSOLE OPAL execute the STRSBS command to restart it. Or if an important job is not processing when it should, have OPAL resubmit the job into batch.
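To make the IF-THEN-ELSE idea concrete, here is a rough sketch of the decision logic in Python. It is not OPAL syntax and does not use any Robot/CONSOLE interface; the function name `choose_action`, the status strings, and the assumption that a job is resubmitted by calling a program of the same name are all hypothetical.

```python
from typing import Optional


def choose_action(resource_type: str, name: str, status: str) -> Optional[str]:
    """Return the CL command to run for a failed resource, or None."""
    # Status values here are illustrative placeholders.
    if resource_type == "subsystem" and status != "ACTIVE":
        # Restart the subsystem that should be running.
        return f"STRSBS SBSD({name})"
    elif resource_type == "job" and status != "ACTIVE":
        # Resubmit the job into batch (assumes program name == job name).
        return f"SBMJOB CMD(CALL PGM({name})) JOB({name})"
    else:
        # Resource is healthy, or no automatic reaction is defined.
        return None
```

For example, `choose_action("subsystem", "QBATCH", "INACTIVE")` yields the restart command, while an active subsystem yields `None`.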
Robot/CONSOLE has full security so that only the administrator can set up message management rules. It also provides full escalation to notify the on-duty operator, when necessary, by sending a text, email, or pager message via Robot/ALERT, our system event notification software. Escalation lists allow you to notify the right person every time or notify a sequence of users until someone responds.
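The "notify a sequence of users until someone responds" behavior of an escalation list can be pictured with a minimal Python sketch. This is not how Robot/ALERT is implemented; `escalate` and the `notify` callback (which is assumed to return `True` when the contact acknowledges) are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, Optional


def escalate(contacts: Iterable[str],
             notify: Callable[[str], bool]) -> Optional[str]:
    """Notify each contact in order; stop at the first one who responds."""
    for contact in contacts:
        if notify(contact):   # True means the contact acknowledged
            return contact
    return None               # nobody on the list responded
```

With the list `["operator", "admin", "manager"]`, the manager is contacted only if both the operator and the administrator fail to acknowledge.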
You can learn more about Robot/CONSOLE by clicking here. And check out Help/Systems' other offerings in the MC Showcase Buyer's Guide.
Tom Huntington is Vice President of Technical Services for Help/Systems, Inc. He can be reached at 952.563.1606 or
MC Press Online