Big Blue's Autonomic Blueprint

Analysis of News Events

On April 4, 2003, IBM announced the industry's first blueprint to assist customers as they begin to build autonomic computing systems. IBM's autonomic computing initiative is one of the key elements of the company's larger e-Business on Demand marketing message.

In the past, IBM has defined autonomic computing as self-healing, self-configuring combinations of hardware and software that relieve administrators of low-level setup and maintenance tasks. But could the construction of an autonomic computing architecture affect how high availability is delivered to customers?

The Evolution of High Availability Products

These days, the software and hardware combinations that compose high availability computing solutions--solutions designed to keep systems up and running--are evolutionary responses to real experiences of customers and engineers. They're responses to something that broke in the past and required engineers to devise preventative measures.

But if IBM brings forth a real era of autonomic computing--systems that are self-diagnosing, self-aware, self-configuring, and self-healing--the science of high availability will have new challenges and benefits.

The System/38: A Precursor to High Availability

If you look back at the history of most high availability solutions in the market today, you will find a thread that stretches back to the days of the System/38 architecture, when the failure of a single disk drive could spell days of downtime. (And even back then, a single day of unexpected downtime was not a pleasant thing to explain to your management.)

At the time, the System/38 had some unique challenges that transformed simple backup and recovery into a serious need for availability services. It was the first truly virtual computer, composed of an operating system and applications that rode on top of a hardware platform that seemed to be constantly struggling to keep up with the system's advanced design. Programmers loved it. But it was the System/38's reputation as a DASD masher that sent fear and trembling into the hearts of IBM systems engineers. Where, these engineers wondered, was the data to recover? In fact, it was scattered anywhere and everywhere, shotgun-blasted across the platters for better disk drive armature utilization. This meant that systems engineers had to devise a means of resurrecting failed hard drives on systems where file structures were not clearly identified and where volatile memory often held large portions of the database when the system went down.

The High Availability Market Spins Off

Many of the hardware and software technologies that we see in use today came from engineering experiences that were aimed at solving or avoiding disruptions in the virtual information structures of single-level storage. In fact, in order for the AS/400 to succeed in the marketplace (where the System/38 had faltered), IBM had to prove once and for all that it had solved the problems of system availability with this strange computing system that was designed to use single-level virtual storage.

For many years, IBM concentrated simultaneously on both the hardware and the software fronts to increase the AS/400's availability. It created drives with comprehensive disk diagnosis and alert messaging software built in. It wrote operating system services--like journaling and mirrored storage pool software--to minimize the damage that a failed piece of hardware might create. The result? The AS/400 achieved the highest availability rating of any system in the 1990s: 99.999%. (And to this day, the premium that customers pay for their iSeries DASD is a vestige of that era, when IBM's focus was on unique, advanced AS/400 disk drives. That price is still charged, even though the iSeries today uses the same drives that are used on other IBM platforms.)

Is it any wonder, then, that when high availability became an important issue for larger computing systems and networked servers in the 1990s, the AS/400 had already spawned a cadre of expert IBM engineers who had cut their teeth on its high availability mechanisms? Many of these engineers went off to form separate companies that sold products to meet the needs of a burgeoning high availability marketplace.

This is not to say that the high availability mechanisms designed in the days of the AS/400 are sufficient for the needs of customers today. And that's the point.

New Challenges in Availability Lead to Autonomic Technology

The challenges facing engineers and administrators today are not simple "what-if" scenarios on a single failing machine or device. They are issues born of complexity--massive, cascading complexity--on networks composed of a wide spectrum of services. These services, by their very nature, become tangled as users thread their way through portals, applications, networks, servers, routers, protocols, and configurations. Unfortunately, by the time a network administrator has been notified of a problem today--a failing disk drive, a stuck printer queue, a corrupted user profile, a streaming router, or a bungled data stream--the damage to the information flow has probably already been done. Why? Because most of the high availability services on the market are based upon external monitoring of discrete conditions within the information system itself--backed by scripts and controls that have been explicitly designed to handle unique error conditions.

What is amazing is not that our systems sometimes fail, but that we can get as much work done as we do. In fact, our information systems have become so complex that we work in an environment where things are constantly failing, all the time! It's only when they fail catastrophically that we bother to address the problem.

So how does IBM's autonomic computing initiative address this problem? This is where a revolution in creative thinking--backed by some intense engineering--may positively impact the whole industry of high availability.

IBM Shares Project Eliza Insights

IBM's experience with autonomic computing comes, in part, from Project Eliza, an R&D initiative funded several years ago. Project Eliza provided basic research into how systems could be designed to actually learn. The concept was to build systems--combining sensors, logs, and algorithms into a kind of neural network--that could develop enough self-knowledge to make preemptive, self-actuated decisions and avoid catastrophes. This research, now reaching some maturity, is forming the basis for real systems that are self-configuring and self-monitoring, with built-in decision-making capabilities.

IBM could have taken this basic research and developed its own brand of high availability services for each of its products. However, IBM also understands that its products no longer reside in standalone environments; instead, they are members of larger networks of heterogeneous systems in which the failure of one device can spell disruption for the entire information system. As a result, as part of its long-term e-Business on Demand initiative, IBM has chosen to release a great deal of its research as a blueprint for building autonomic systems: a sort of workbook of what it actually takes to design systems that are truly self-configuring, self-monitoring, and self-healing.

Autonomic Technologies Released

In addition to this blueprint, the company is also providing developers with four specific technologies to help them develop autonomic systems. These technologies give developers and customers actual building blocks for producing self-managed systems that comply with the framework of the new blueprint. The four technologies are:

  • Log & Trace Tool for Problem Determination--This tool alleviates the manual task of tracking down the cause of a system problem by putting the log data from different system components into a common format. With the data normalized, administrators can identify the root cause more quickly. The tool captures and correlates events from end-to-end execution in the distributed stack, allows for a more structured analysis of distributed application problems, and facilitates the development of autonomic self-healing and self-optimizing capabilities. (A minimal sketch of this idea appears after this list.)
  • ABLE (Agent Building and Learning Environment) Rules Engine for Complex Analysis--ABLE is a set of fast, scalable, and reusable learning and reasoning components that capture and share the individual and organizational knowledge that surrounds a system. ABLE is designed to minimize the need for developing complex algorithms that are the basis for intelligent, autonomic behavior by a system.
  • Monitoring Engine Providing Autonomic Monitoring Capability--This technology detects resource outages and potential problems before they affect system performance or the end-user experience. The monitoring engine has embedded self-healing technology that allows systems to recover automatically from critical situations. IBM says the engine uses the same advanced resource modeling technology to capture, analyze, and correlate metrics that it uses in its Tivoli product line. IBM also says that the Tivoli Autonomic Monitoring Engine will be available in beta this summer and is scheduled to ship later in the year. (A simplified self-healing monitoring loop is also sketched after this list.)
  • Business Workload Management for Heterogeneous Environments--IBM says that the initial delivery of this technology will use the Application Response Management (ARM) standard to help identify the causes of bottlenecks in a system by using response time measurement; transactional processing segment reporting; and a neural-network-like, self-learning mechanism through middleware and servers. It's designed to adjust resources automatically to ensure that specified performance objectives are met. This technology will also start to be delivered with the IBM Tivoli Monitoring for Transaction Performance product.

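To make the log-normalization idea concrete, here is a minimal sketch--not IBM's Log & Trace Tool, just an illustration in Python--that parses two invented log formats from two hypothetical components (a web server and an application server) into one common record layout and then correlates the records for a single request by a shared request ID. All field names, log formats, and the request ID convention are assumptions made for the example.

    # A minimal sketch (not IBM's Log & Trace Tool) of normalizing logs from two
    # hypothetical components into one common record format and correlating them
    # by a shared request ID. Formats and field names are invented for illustration.
    import re
    from datetime import datetime

    # Hypothetical format A: "2003-04-04 10:15:02 [req-42] ERROR disk write failed"
    PATTERN_A = re.compile(
        r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<rid>[^\]]+)\] "
        r"(?P<sev>\w+) (?P<msg>.*)"
    )

    def parse_a(line, component="webserver"):
        m = PATTERN_A.match(line)
        if not m:
            return None
        return {
            "timestamp": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
            "component": component,
            "request_id": m.group("rid"),
            "severity": m.group("sev"),
            "message": m.group("msg"),
        }

    # Hypothetical format B: "req-42|WARN|2003-04-04T10:15:01|queue nearly full"
    def parse_b(line, component="appserver"):
        rid, sev, ts, msg = line.split("|", 3)
        return {
            "timestamp": datetime.fromisoformat(ts),
            "component": component,
            "request_id": rid,
            "severity": sev,
            "message": msg,
        }

    def correlate(records, request_id):
        """Return all normalized records for one request, ordered in time,
        so an administrator can walk the distributed execution end to end."""
        trail = [r for r in records if r and r["request_id"] == request_id]
        return sorted(trail, key=lambda r: r["timestamp"])

    if __name__ == "__main__":
        records = [
            parse_a("2003-04-04 10:15:02 [req-42] ERROR disk write failed"),
            parse_b("req-42|WARN|2003-04-04T10:15:01|queue nearly full"),
        ]
        for rec in correlate(records, "req-42"):
            print(rec["timestamp"], rec["component"], rec["severity"], rec["message"])

Once every component's events share one record layout, the correlation step is just a filter and a sort--which is exactly why a common format makes root-cause analysis faster.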
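The monitoring engine item above describes a closed control loop: sense a resource, compare it against a policy, and take a corrective action before users notice. The following toy Python loop--not the Tivoli Autonomic Monitoring Engine, and with an invented policy, threshold, and "healing" action--shows that pattern at its simplest, watching free disk space and triggering a recovery step when it drops below a threshold.

    # A toy closed-loop monitor (not the Tivoli Autonomic Monitoring Engine) that
    # illustrates the self-healing pattern: sense a metric, compare it with a
    # policy threshold, and trigger a recovery action before an outage occurs.
    # The resource, threshold, and recovery action are all invented for the example.
    import shutil
    import time

    POLICY = {"disk_free_ratio_min": 0.10}   # heal when less than 10% is free

    def sense_disk_free(path="/"):
        usage = shutil.disk_usage(path)
        return usage.free / usage.total

    def heal_low_disk(path="/"):
        # A real engine might purge temp files, expand a storage pool,
        # or open a problem ticket; here we only report the decision.
        print(f"autonomic action: reclaiming space on {path}")

    def monitor_once(path="/"):
        free_ratio = sense_disk_free(path)
        if free_ratio < POLICY["disk_free_ratio_min"]:
            heal_low_disk(path)
        return free_ratio

    if __name__ == "__main__":
        for _ in range(3):          # a real engine would run continuously
            ratio = monitor_once()
            print(f"disk free: {ratio:.1%}")
            time.sleep(1)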
Additional details about these technologies were made available at IBM's developerWorks Live! conference for software developers that was held last week in New Orleans.

Bringing It All Back Home to iSeries

How these technologies will find their way into the iSeries environment will certainly be tied to IBM's Tivoli product efforts, but the manner in which IBM is making these technologies available to the developer community will also ensure that the high availability products offered by third-party vendors improve substantially over time. Implementing these technologies will greatly enhance the interplay between IBM eServer high availability and the features and functions of high availability products aimed at routers, non-IBM servers, network switches, and the other components that comprise the modern information system.

High availability has come a long way since the days when systems engineers labored over failed System/38 disk drives, leading to the record-breaking availability performance of the AS/400 and iSeries platforms. But if IBM follows through with its autonomic computing plans, it's clear that we're entering a new era in which 99.999% availability for the whole information infrastructure will become the standard for an industry rife with complexity and interruption.

For more information about IBM's announcement on April 4, 2003, visit IBM's announcement.

For more information about IBM's autonomic computing initiative, we recommend the following white paper: "The Dawning of the Autonomic Era" by A.G. Ganek and T.A. Corbi.

A complete look at IBM's autonomic efforts may be found at IBM's Autonomic Web page.

Thomas M. Stockwell is the Editor in Chief of MC Press, LLC. He has written extensively about program development, project management, IT management, and IT consulting and has been a frequent contributor to many midrange periodicals. He has authored numerous white papers for iSeries solutions providers. He welcomes your comments about this or other articles.
