High availability (HA) and disaster recovery (DR) both involve multiple systems and some type of data replication solution. Some people use the terms interchangeably, but they are quite different and often call for different security considerations. Let's look at some scenarios and discuss the security implications.
HA Scenario #1
This scenario involves two or more systems. All are "hot," and database updates on one system are propagated to the rest. This situation often involves load-balancing software or hardware. A good example is a set of machines servicing high-volume Web sites. A request comes in and is routed to one of the systems. If the load balancer receives no response from a system for a period of time, it stops sending "real" requests to that system until the system again responds to the "test" requests the load balancer sends. In this scenario, it is critical that all systems be configured exactly the same way. System values, application profiles, application object *PUBLIC and private authorities, ownership of application objects--it's vital that you keep all of these exactly the same on all systems; otherwise, the application could start to fail.
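The routing behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any particular load-balancing product works: the `Backend` and `LoadBalancer` names, the round-robin policy, and the `responds` callable standing in for a "test" request are all assumptions, and a single failed probe here stands in for "no response for a period of time."

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True


class LoadBalancer:
    """Round-robin balancer that skips systems that failed the last probe."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def probe(self, responds):
        """Send a 'test' request to every system.

        `responds` is a hypothetical callable (system name -> bool) that
        stands in for the real health check.
        """
        for backend in self.backends:
            backend.healthy = responds(backend.name)

    def route(self):
        """Return the name of the next healthy system for a 'real' request."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend.healthy:
                return backend.name
        raise RuntimeError("no healthy system available")


lb = LoadBalancer([Backend("SYSA"), Backend("SYSB"), Backend("SYSC")])
lb.probe(lambda name: name != "SYSB")   # SYSB stops answering test requests
routes = [lb.route() for _ in range(4)]  # SYSB receives no real requests
```

Because any healthy system may receive the next request, every system has to behave identically, which is why configuration differences between them surface as application failures.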
All HA scenarios require careful consideration of application security schemes, but this particular scenario demands it. This scenario often occurs in a high-volume online retail environment in which credit card numbers and expiration dates are part of each transaction. As stated, all transactions are propagated to all of the other systems in the configuration. That means that the number of opportunities for a hacker, a devious employee, or a vendor to abuse a poorly designed application security scheme is the number of database files containing a credit card number multiplied by the number of systems in your configuration. For example, three such files replicated across four systems leave 12 copies to protect. You should seriously consider setting the *PUBLIC authority of these application databases to *EXCLUDE and reworking the application so that only an application profile can access these files. You may also want to consider encrypting the credit card numbers. Finally, ensure that the only profiles on these systems belong to users who have a direct business need to access the systems and the data that resides on them. All other profiles should be removed.
HA Scenario #2
This scenario typically involves two systems--one production and one backup--and it can apply to two configurations. In either configuration, "real" work occurs on the production system, and the transactions are subsequently reflected on the backup system. Real work never occurs on the backup system except in the case of a failover--that is, when the production system is unavailable because of a disaster (such as a fire or flood), a failure (such as an electrical or disk pack failure), or scheduled hardware or software maintenance. One configuration has the iSeries acting as the Web server with a Web site available on the Internet. In this case, when the systems fail over, the external IP address of the Web site remains constant, but requests are routed to the backup system instead of the production system. Another configuration has the iSeries running mission-critical "back office" applications.
Once again, it is vital that system values, application profiles, application object *PUBLIC and private authorities, and ownership of application objects be kept consistent across both systems. You can usually keep these in sync with proper configuration of your replication software. For those of you who have the luxury of truly keeping one of those systems offline, to be used only when you have to fail over to it, here are some points to consider:
Clean up your systems. All products and applications have security considerations that you should make decisions about. Replicating unused products and libraries doubles the exposures and chances for abuse. Unused profiles that can still sign on are one method that disgruntled employees or ex-employees use to breach the system--or, in this case, systems.
Systems running OS/400 typically hold vital corporate data. Replicating garbage (unused objects) and having a poorly implemented security scheme double your chances for a breach. You should carefully examine your security scheme to ensure it meets your corporate requirements for ensuring data integrity, confidentiality, and privacy. "This is one of the biggest concerns we have for our customers," says Ken Zaiken, VP of Research and Development of MIMIX at Lakeview Technology, one of the recognized leaders in HA solutions. "Our products can make sure that a security scheme is implemented on the backup system so that it exactly matches the production system. That's great in the case where they have locked down their confidential data, given appropriate authorities and capabilities to users, and set the security systems values to reasonable (secure) settings. However, if they haven't implemented a solid security scheme, they are now doubling their exposures."
Isolate the backup system. If the backup system is truly used only for backup purposes, what measures have you put in place to ensure that users can't get to the backup system and change the data? There are a couple of ways to accomplish this. You can turn off all of the communication and TCP/IP services except those required by your HA vendor software. Or you can use packet filtering rules to limit where incoming requests originate from. In other words, allow incoming requests only from the production system. You can further limit access by using an exit program to allow requests only when they're made by the vendor's application profile.
The more sophisticated HA solutions, such as Lakeview's MIMIX, have a configuration option to set profiles created on the target system to status *DISABLED and then to re-set them to status *ENABLED when a failover occurs. This ensures that users cannot access the system except in the case of a failover. If your HA software doesn't do this for you, you may want to consider writing a program that performs this function.
Examine the use of exit program software. If you are using an exit program solution to protect your network interfaces, you have a decision to make. Either you can purchase another license and protect your backup system with a totally separate set of rules or you can replicate the rules database to the backup system and only implement the rules in the failover condition. As noted above, you can use exit program software to help isolate the backup system from being accessed, but if this is not a concern, it may be sufficient to only replicate the rules database. In either case, you will want to talk to your exit program vendor to understand which databases need to be replicated and what procedures to follow in the case of a failover to ensure that the software is enforcing the proper set of rules.
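The isolation measures above can be sketched as two small Python functions. Treat this as a model of the decisions involved rather than a working packet filter or exit program: the production address `10.0.0.10`, the replication profile name `HAVENDOR`, and both function names are hypothetical. The second function only builds CHGUSRPRF command strings of the kind you might run if your HA software does not switch profile status for you; it does not execute anything.

```python
import ipaddress

# Hypothetical values for illustration only.
PRODUCTION_ADDR = ipaddress.ip_address("10.0.0.10")
REPLICATION_PROFILE = "HAVENDOR"


def allow_request(source_addr, profile, failed_over=False):
    """Decide whether the backup system should accept a request.

    In backup mode, only the production system signed on as the
    replication profile gets through. After a failover this gate is
    lifted; the normal production rules (not modeled here) take over.
    """
    if failed_over:
        return True
    from_production = ipaddress.ip_address(source_addr) == PRODUCTION_ADDR
    return from_production and profile.upper() == REPLICATION_PROFILE


def profile_status_commands(profiles, failed_over, keep_enabled=()):
    """Build CHGUSRPRF commands that enable or disable user profiles.

    Profiles named in `keep_enabled` (for example, the replication
    profile) are never touched.
    """
    status = "*ENABLED" if failed_over else "*DISABLED"
    skip = {name.upper() for name in keep_enabled}
    return [
        f"CHGUSRPRF USRPRF({name.upper()}) STATUS({status})"
        for name in profiles
        if name.upper() not in skip
    ]


backup_mode = profile_status_commands(
    ["jsmith", "HAVENDOR"], failed_over=False, keep_enabled=["HAVENDOR"]
)
```

Keeping the "keep enabled" list explicit matters: disabling the replication profile along with everyone else would stop replication itself.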
HA Scenario #3
Scenario #3 looks much the same as Scenario #2, but the backup system is used for other purposes. This may include running queries against real-time production data without incurring the overhead on the production system and running additional, but typically less vital, applications. "This is the HA scenario we see most often," says Zaiken. "It is rare that you see a system used solely as a backup machine. Using backup systems to run non-production tasks (such as queries) boosts productivity in addition to providing protection."
Since users are allowed to access the backup system to perform legitimate job functions, how do you keep production data from being updated, especially if these users are allowed to update the data when running in production mode? Let's examine the options.
Application objects are *PUBLIC(*EXCLUDE). Users are allowed to run other applications. If your application objects are excluded from use except through use of the application (that is, the application uses adopted authority to access application objects), all you have to do is prohibit users from accessing the application. While this might sound easy, it will be a bit of a challenge if the profiles being replicated have an initial program that automatically brings the user directly into the application. If you set this program to *PUBLIC(*EXCLUDE), users will not be able to sign on to the system. In this case, you will have to find another program or programs to secure so that users can sign on but not run the application. Then, in the event of failover, you will have to set these programs back to be accessible.
Application data is allowed to be queried. This is the scenario I have seen most often. While there may be additional applications a user can run, users are usually given access to the backup system so that they can query real-time production data. So how do you easily switch between restricting users to querying (that is, having *USE authority to the data) and giving them sufficient authority to update data (that is, having *CHANGE authority to the data) in a failover situation? A solution I architected for one client uses authorization lists. This method can be applied whether the *PUBLIC authority, a private authority, or the owner's authority needs to be changed. How does it work? All of the objects that can be updated during an application's normal processing--database files and data areas, for example--are secured with an authorization list. Then, whatever method users rely on to gain authority to update these application objects is managed via the authorization list. For example, if the application requires a user to have *CHANGE authority to the application database files to run the application, you meet this requirement by granting the user or the user's group profile authority to the authorization list. While the system is in backup mode, you grant the user or group *USE authority. This enables them to query the production files residing on the backup system but not to update them. When a failover occurs, the only change you have to make is to switch the user's or group's authority to the authorization list from *USE to *CHANGE. One quick and simple change enables updates to production data on the backup system.
If the application's security scheme relies on objects' *PUBLIC authority, you change the authorization list's *PUBLIC authority from *USE to *CHANGE. To make this work, when you secure the objects with the authorization list, you will need to set the objects' *PUBLIC authority to the value *AUTL, which indicates to OS/400 that the object's *PUBLIC authority is to come from the authorization list's *PUBLIC authority setting.
If the application uses adopted authority and the profile that is being adopted has private authority to the application's database files (rather than owning them), simply secure the objects with the authorization list. Grant the application profile *USE authority to the authorization list when in backup mode. When in failover mode, change the application profile's authority to *CHANGE.
If the application uses adopted authority and the profile being adopted owns both the programs and the database files, you will have to remove the owner's authority to each object, secure the objects with the authorization list, and grant the application's owning profile either *USE or *CHANGE to the authorization list, depending on the mode in which the system is currently operating. Removing the owner's authority from each object is a bit strange, I'll admit, but unless you do, the authority OS/400 recognizes and uses will be the owner's authority to the object, not the owner's authority to the authorization list.
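To see why the owner's authority has to be removed, and why a single change to the authorization list is enough at failover, here is a deliberately simplified Python model. It is not OS/400's actual authority-checking algorithm: it ignores group profiles, adopted authority, and private authorities held by non-owners, and every class, function, and object name in it is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

RANK = {"*EXCLUDE": 0, "*USE": 1, "*CHANGE": 2, "*ALL": 3}


@dataclass
class AuthorizationList:
    name: str
    public: str = "*EXCLUDE"
    entries: dict = field(default_factory=dict)  # profile -> authority


@dataclass
class SecuredObject:
    name: str
    owner: str
    autl: AuthorizationList
    owner_authority: Optional[str] = "*ALL"  # None = owner authority removed
    public: str = "*AUTL"                    # defer to the list's *PUBLIC


def effective_authority(obj, profile):
    """Authority held directly on the object wins, then the
    authorization list entry, then *PUBLIC."""
    if profile == obj.owner and obj.owner_authority is not None:
        return obj.owner_authority
    if profile in obj.autl.entries:
        return obj.autl.entries[profile]
    return obj.autl.public if obj.public == "*AUTL" else obj.public


def can_update(obj, profile):
    return RANK[effective_authority(obj, profile)] >= RANK["*CHANGE"]


def set_mode(autl, profile, failed_over):
    """The single change made at failover (or when switching back)."""
    autl.entries[profile] = "*CHANGE" if failed_over else "*USE"


autl = AuthorizationList("ORDAUTL", entries={"APPOWNER": "*USE", "SALESGRP": "*USE"})
orders = SecuredObject("ORDERS", owner="APPOWNER", autl=autl)

owner_before = can_update(orders, "APPOWNER")  # True: owner authority bypasses the list
orders.owner_authority = None                  # remove the owner's authority
owner_after = can_update(orders, "APPOWNER")   # False: the list now governs
set_mode(autl, "SALESGRP", failed_over=True)   # failover: *USE -> *CHANGE
```

On a real system, the mode switch corresponds to changing an entry on the authorization list (for example, with the CHGAUTLE command) rather than touching each secured object.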
Disaster Recovery Scenario
In this scenario, you must recover your system after a disaster. Suppose you have systems that have not been kept in sync, and one system is lost through either a natural disaster or a catastrophic hardware failure. What do you do? Hopefully, you've done some--better yet, a lot of--planning for this scenario. If you have planned, then your backups are current and are recoverable from wherever you store them. Your system has been repaired or you are at a hot site, and now you have to bring up a system, configure it, and restore everything so that you can continue to do business.
The biggest time saver is to restore your objects in the order and by the method described in IBM's OS/400 Backup and Recovery manual, SC41-5304. After restoring OS/400, you must restore your user profiles (RSTUSRPRF *ALL), then your objects (applications, user libraries, device descriptions, and so on), and then the private authorities (RSTAUT). If you restore objects before user profiles, most of the profiles that own the objects won't yet exist on the system. Therefore, OS/400 has to assign all objects owned by non-IBM profiles to the IBM-supplied profile QDFTOWN. This can have disastrous effects on application authorization schemes. Worse, the situation is basically unrecoverable. Even if you restore the user profiles and then restore all of the objects again, the objects will remain owned by QDFTOWN. That's because, when an object already resides on the system and is restored again, OS/400 takes the ownership from the existing object, not from the owner recorded on the media. The only way to recover is to start all over--that is, remove all the objects from the system and then perform the restore in the correct order.
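The ownership behavior described above is easy to lose track of, so here is a toy Python model of it. It captures only the two rules stated in the text (a missing owner profile means the object goes to QDFTOWN, and an object that already exists keeps its current owner when restored again); the `System` class and the object and profile names are invented for the example, and IBM-supplied profiles and private authorities are left out.

```python
DEFAULT_OWNER = "QDFTOWN"


class System:
    """Toy model of object ownership during a restore."""

    def __init__(self):
        self.profiles = set()
        self.owners = {}  # object name -> owning profile

    def restore_profiles(self, profiles):
        self.profiles.update(profiles)

    def restore_objects(self, media):
        """`media` maps object name -> owner recorded on the save media."""
        for name, media_owner in media.items():
            if name in self.owners:
                # Object already on the system: existing ownership is kept.
                continue
            if media_owner in self.profiles:
                self.owners[name] = media_owner
            else:
                self.owners[name] = DEFAULT_OWNER

    def clear_objects(self):
        self.owners.clear()


media = {"ORDERS": "APPOWNER", "CUSTMAST": "APPOWNER"}

wrong = System()
wrong.restore_objects(media)            # objects first: owner profile missing
wrong.restore_profiles({"APPOWNER"})
wrong.restore_objects(media)            # restoring again does not fix ownership
stuck = dict(wrong.owners)

wrong.clear_objects()                   # start over: remove the objects...
wrong.restore_objects(media)            # ...then restore with profiles in place
recovered = dict(wrong.owners)
```

In the first pass both objects end up owned by QDFTOWN and stay that way; only after the objects are removed and restored with the profiles already present does ownership return to the application profile.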
Properly Managed, It's a Good Thing!
HA technology has provided the capability to keep businesses running when a system becomes unavailable. But it is not without its security challenges. Just because a sound and robust security configuration is replicated from a source system to a target system does not ensure that the target system will stay securely configured. I have been to sites that have implemented HA and data replication software where the target system has different--typically less secure--settings than the source system. Has the vendor product failed? No. These differences are typically due to inattentive or misinformed system administrators restoring or setting back-level values. In other words, once you have your source system cleaned up and securely configured, do not assume that your target system(s) will automatically stay that way. Just as you need to monitor your source system's security scheme, you also need to watch your target systems.
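Watching the target system can start with something as simple as comparing the same settings collected from both systems. The Python sketch below assumes you already have those settings as name/value pairs; how you gather them is outside its scope, and the `find_drift` function and the sample values are illustrative only.

```python
def find_drift(source, target):
    """Return {setting: (source_value, target_value)} for every mismatch.

    A setting that is missing on one side is reported with None for
    that side.
    """
    drift = {}
    for key in sorted(set(source) | set(target)):
        src, tgt = source.get(key), target.get(key)
        if src != tgt:
            drift[key] = (src, tgt)
    return drift


# Sample values only; collect the real ones from each system.
source = {"QSECURITY": "40", "QMAXSIGN": "3", "QPWDEXPITV": "90"}
target = {"QSECURITY": "30", "QMAXSIGN": "3"}

drift = find_drift(source, target)
```

Run on a schedule, a comparison like this flags a target system that has been set back to a lower security level before a failover exposes the difference.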
Carol Woodbury is co-founder of SkyView Partners, a firm specializing in security consulting and services and offering the recently released software, SkyView Risk Assessor for OS/400. Carol has over 13 years in the security industry, 10 of those working for IBM's Enterprise Server Group as the AS/400 Security Architect and Chief Engineering Manager of Security Technology. Look for Carol's second book, Experts' Guide to OS/400 Security, to be released soon. Carol can be reached at
MC Press Online