RPG Has SAX Appeal!

In this part of our RPG XML series, you'll learn how to use RPG's XML-SAX op-code to deal with problematic XML documents and handle situations that XML-INTO cannot deal with.

In the previous two articles in this series, "%Handling XML-INTO Problems" and "i5/OS Offers Native XML Support in V5R4", we focused on the capabilities of RPG's XML-INTO. As we saw, this op-code processes an entire document, either as a single piece or, when needed or desired, in "chunks" by using the capabilities of the %HANDLER BIF. There are, however, situations in which this will not work for you, and they usually stem from limitations in RPG's data structure (DS) capabilities. As you know, a named DS is limited to a maximum size of 64K (at least until V6R1). What if even a single repeating element will not fit within that limit? That may sound unlikely, but it doesn't take a huge number of repeating text fields to exceed it. Another example, and one that seems to occur quite often, arises when your XML document contains a structure that simply cannot be represented in an RPG DS. To illustrate this, take a look at the new version of our XML document, shown below:

<CatDescr>Toasters</CatDescr>

(A) <Description type="short">Two slot chrome</Description>

(B) <Description type="long">This beautiful two slot chrome finished toaster is

a perfect complement to any modern kitchen ...</Description>

</Product>

<Description type="short">Four slot matt black</Description>

</Product>

</Category>

<CatDescr type="short">Coffee Makers</CatDescr>

<Description>10 cup auto start</Description>

It is substantively the same as in our previous examples, but with one very significant exception: The <Description> element can now be repeated. If that were the only difference, then we could accommodate it by adding a DIM( ) keyword to the element's definition in the DS. But notice that not only does the element repeat, but there is also a new attribute, type, which is used to indicate the type of description (short or long) that is being defined. This presents us with a problem. Since an attribute is treated in the same way as a child element of the parent, the correct RPG definition for "type" would be this:

     d description     DS
     d   type                         5a

But this leaves us with nowhere to put the content of the description since the content of a DS is the sum of its subfields and any data placed there would overwrite those subfields. In other words, in our situation, the description would overwrite the type field (or vice versa). Not a lot of help! In theory, a DS that looks like the one below should solve the problem:

     d description     DS                  Qualified Dim(2)
     d   description               1000a   Varying
     d   type                         5a

In this case, the <Description> would be stored in the field description.description and the "type" attribute would be stored in description.type. Makes sense, doesn't it? Maybe to you, but sadly, not to the compiler.

IBM is aware of this deficiency, and it is on their "to-do" list, but don't expect to see it in V6R1. And don't hold me to it working the way I have described it here; IBM may well have other ideas.

So if we cannot create a DS that matches the structure of the XML data, then we cannot use XML-INTO or at least cannot use it for the whole task. So what are our options?

There are effectively three options:

The first is to take advantage of RPG's XML-SAX op-code. This can be used either by itself to process the entire document or as a follow-on to an XML-INTO parse to "fill in the gaps." We will be dealing with the usage of XML-SAX in the balance of this article.

The second is to reformat the document by using an XSL transform so that it is in a format that can be expressed in RPG terms. This is the approach recommended in the IBM Redbook The Ins and Outs of XML and DB2 UDB for i5/OS. If you have the required XSL skills or are prepared to develop them, this is certainly a valid option and can also help to deal with other issues, such as empty elements. Since the Redbook provides a good working example, we won't duplicate that work here.

Another option would be to process the document in two passes using XML-INTO with a different target DS on each pass. You would also need to use the "AllowExtra" and "AllowMissing" processing options in order to persuade the parser to handle the document since neither of the DSs will exactly match the document. This is not as effective as the XML-SAX option, so we will not be discussing it further.

XML-SAX

The operation of XML-SAX is very different from that of XML-INTO. XML-INTO parses the data from many elements at a time and places the parsed content into the appropriate field in the target DS or array. XML-SAX on the other hand parses the document one event at a time. Examples of events include the beginning of an element (i.e., its starting tag), the value of an element, the end of an element (i.e., its ending tag), the name of an attribute, the value of the attribute, etc.

With XML-INTO, the use of a handler procedure is optional, but with XML-SAX, %HANDLER must always be specified. Your handler procedure will be called for every event that the parser encounters. It is up to your logic to decide whether it should simply ignore the event or react to it in some way.

Logic is needed in the handler to recognize and react to the beginning of each element and attribute and to store the values in the appropriate places. You will perhaps get a better idea of the kind of logic that might be required if you study the list below. It represents the sequence of events and the associated data (in parentheses) that would be passed to the handler when processing the section of the XML document that begins at (A) above and ends at (B).

• Start Element (description)

• Attribute Name (type)

• Attribute Characters (short)

• End Attribute (type)

• Element Characters (Two slot chrome)

• End Element (description)

Notice that when we receive the element and attribute data, we have no idea which element/attribute it belongs to. That is up to us to determine. In fact, this is not a difficult task as the data will always belong to the last element/attribute that began but has not yet ended. With so many events being signaled to your handler, you can no doubt see that writing the logic to completely process even a simple document with XML-SAX would be somewhat tedious, requiring a lot of rather repetitive code. Luckily, we rarely require all of the data in a document, and we also have the option to combine XML-SAX with XML-INTO to simplify our task.
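The rule that data belongs to the most recently opened element can be sketched with a small stack of element names. The handler below is only an illustration and is not part of the article's program: the names TrackHandler, openElems, depth, len, and msg are hypothetical, and it assumes the parser delivers data in the job CCSID so that stringLen can be treated as a character count.

```rpgle
     P TrackHandler    B
     D TrackHandler    PI            10i 0
     D   commArea                     1a
     D   event                       10i 0 Value
     D   pstring                       *   Value
     D   stringLen                   20i 0 Value
     D   exceptionId                 10i 0 Value

      // The event data, addressed through the pointer parameter
     D string          S          65535a   Based(pstring)
      // Names of the elements that have started but not yet ended
     D openElems       S             20a   Varying Dim(20) Static
     D depth           S              5i 0 Static Inz(0)
     D len             S             10i 0
     D msg             S             52a

      /Free
       Select;
       When event = *XML_START_ELEMENT;
         // Push the element name (only if there is room to store it)
         depth += 1;
         If depth <= %Elem(openElems) and stringLen > 0;
           openElems(depth) = %Subst(string: 1: stringLen);
         EndIf;

       When event = *XML_END_ELEMENT;
         // Pop: the enclosing element becomes the current one again
         If depth > 0;
           depth -= 1;
         EndIf;

       When event = *XML_CHARS;
         // The data belongs to the element on top of the stack
         If depth >= 1 and depth <= %Elem(openElems)
            and stringLen > 0;
           // Show at most 30 characters of the data
           If stringLen > 30;
             len = 30;
           Else;
             len = stringLen;
           EndIf;
           msg = openElems(depth) + ': ' + %Subst(string: 1: len);
           Dsply msg;
         EndIf;
       EndSl;

       Return 0;   // Zero tells the parser to continue
      /End-Free
     P TrackHandler    E
```

Given a matching prototype, it could be invoked with XML-SAX %Handler(TrackHandler: dummyCommArea) %XML(XML_Source: 'doc=file'). Whitespace between tags typically arrives as *XML_CHARS data too, so expect some blank-looking output from a sketch like this one.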

So to handle the situation in our example, that is what we will do. We will use XML-INTO to capture the bulk of the data and then process again using XML-SAX to fill in the missing piece: the type codes associated with the descriptions.

Let's look at the code that achieves this (shown at the end of this article).

The first thing to notice is the change in the product DS (A). Notice that we have made the description field an array with two elements and also added the type field as a two-element array. Note that the name of the type field in the DS (descrType) does not match the name of the attribute (type) to ensure that XML-INTO will not try to populate it and to make that fact more obvious to those who come after us. In fact, there is no need to actually include the type in the DS at all, but it is convenient to keep all the data together.

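The XML-INTO must have the "allowextra=yes" option specified (B) so that the parser will tolerate the type attributes, which have no matching subfield in the DS. The "allowmissing=yes" option is specified as well, since the DS now contains subfields (descrType, and a second description for many products) for which the document supplies no data. Without these options, the parse would fail because the new version of the DS no longer corresponds to the XML document. Once XML-INTO has completed, we invoke XML-SAX (C) to reprocess the document.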

There is no difference in the definition of %HANDLER, but there is a difference between the information passed to an XML-SAX handler and the information passed to the XML-INTO handler we saw in the last article. Take a look at the prototype at (D) and you will see what I mean. The only parameter that is common to the two versions is the first one, the Communication Area. The remaining parameters are as follows:

• event is a four-byte integer that identifies the type of event being processed. Don't worry about the fact that the event is identified by a number. As you will see later, RPG supplies a number of named constants that can be compared with the event value.

• pstring is a pointer to the beginning of the string containing the event data (e.g., the element/attribute names or data).

• stringLen is the length of the string "pointed to" by the previous parameter. This length must be used to determine if data is present as there are occasions when a valid pointer is passed even though there is no data. Only the number of characters indicated by this parameter should be processed.

• exceptionId is an error code identifying any error passed to the handler by the parser. We will not be discussing this in this article. Check the RPG manuals for more information.

Having seen the parameters passed to the handler, it is time to study the mechanics of the handler procedure MySAXHandler. The first step (E) is to check whether any data was received. If no data is received, then the handler simply returns control to the parser. If data is present, then the procedure RmvWhiteSpace( ) is called to remove any unwanted characters and reduce them to a single space. We will look at what I mean by "unwanted" in a moment. Notice that %SUBST is used to pass only the valid portion of the data to the subprocedure. Remember, we were passed only a pointer and a length, and there is probably other data beyond the point indicated by the length parameter. It is worth noting at this point that the field string, which is based on the pointer, can be very useful during debug. If you display it, you will usually be able to see not only the data you are about to process, but also the next part of the XML document. In other words, you will know what to expect next and can perhaps set appropriate breakpoints. This is not guaranteed as sometimes the pointer references a work area, but it is worth remembering.

What do we mean by "unwanted" and why do we need the RmvWhiteSpace routine? Because carriage returns, new lines, tabs, and excess spaces are often present in XML data (sometimes to make it look "pretty"), and we need to remove them from the data. We will not be studying the detail of this procedure, but you will find it included in the version of the program that is available for download. Hopefully, its operation is self-explanatory. (Many thanks to IBM Toronto's Barbara Morris for supplying this routine.)

At (F), the real work begins. A SELECT group is used to identify the type of event we are handling; this is where the named constants mentioned earlier come into play. For example, *XML_START_ELEMENT represents the event code that announces the arrival of a new element name. In the SELECT group at (G), we then identify the specific element that we are dealing with and process accordingly. All this logic is really doing is setting up the appropriate array indices for the Category, Product, and Description arrays. Since we know that the document we are processing is the same one that we just parsed with XML-INTO, we can afford to short-circuit the process, so no attempt is made to match the product codes with the descriptions or anything.

If the event does not represent the beginning of an element, then we next test to see if it is an attribute name (H). If it is, we check to see if it is the type attribute, and if so, we turn on the waitingForType indicator. This indicator allows us to associate the attribute data when it arrives (I) as belonging to the type attribute. Remember, we said earlier that it is up to us to determine that. We then store the value for the type attribute in the appropriate descrType array element.
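To make the flow described at (E) through (I) easier to follow, here is a minimal sketch of what such a handler might look like. It is not the author's MySAXHandler. It relies on the global definitions in the article's program (category, product, dummyCommArea, and RmvWhitespace), while the names evtData, name, catIx, prodIx, descIx, UPPER, and LOWER are hypothetical. It also assumes that data arrives in the job CCSID and that the element names are Category, Product, and Description, compared here without regard to case.

```rpgle
     P MySAXHandler    B
     D MySAXHandler    PI            10i 0
     D   commArea                          Like(dummyCommArea)
     D   event                       10i 0 Value
     D   pstring                       *   Value
     D   stringLen                   20i 0 Value
     D   exceptionId                 10i 0 Value

     D string          S          65535a   Based(pstring)
     D evtData         S           1000a   Varying
     D name            S           1000a   Varying
      // Static so that the values survive from one call to the next
     D catIx           S              5i 0 Static Inz(0)
     D prodIx          S              5i 0 Static Inz(0)
     D descIx          S              5i 0 Static Inz(0)
     D waitingForType  S               n   Static Inz(*Off)
     D UPPER           C                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
     D LOWER           C                   'abcdefghijklmnopqrstuvwxyz'

      /Free
       // (E) Nothing to do when no data accompanies the event
       If stringLen = 0;
         Return 0;
       EndIf;

       evtData = RmvWhitespace(%Subst(string: 1: stringLen));
       name = %Xlate(UPPER: LOWER: evtData);

       Select;
       // (F) A new element is starting
       When event = *XML_START_ELEMENT;
         // (G) Keep the array indices in step with the document
         Select;
         When name = 'category';
           catIx += 1;
           prodIx = 0;
         When name = 'product';
           prodIx += 1;
           descIx = 0;
         When name = 'description';
           descIx += 1;
         EndSl;

       // (H) Remember whether the next attribute value is a type
       When event = *XML_ATTR_NAME;
         waitingForType = (name = 'type');

       // (I) Store the type value against the current description
       When event = *XML_ATTR_CHARS;
         If waitingForType
            and catIx  >= 1 and catIx  <= %Elem(category)
            and prodIx >= 1 and prodIx <= %Elem(category.product)
            and descIx >= 1 and descIx <= %Elem(product.descrType);
           category(catIx).product(prodIx).descrType(descIx) = evtData;
         EndIf;
         waitingForType = *Off;
       EndSl;

       Return 0;   // Zero tells the parser to continue
      /End-Free
     P MySAXHandler    E
```

The index range checks are a defensive addition: they simply skip any type attribute that turns up outside a description or beyond the bounds of the arrays, rather than letting an array index error end the parse.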

After processing the document, the XML-SAX parse completes and control returns to the program's main line at (J). At this point, the complete content of the XML document has been stored in our category DS, so our program can process or store that data as necessary. In this simple example, we will just display the data. The logic simply loops through all of the categories and products. As in our previous example, the category loop is controlled by the RPG-supplied xmlElements count in the Program Status Data Structure, which was populated by the XML-INTO operation, and the product loop completes when a blank product code is encountered. The format of our XML document is such that there must be a short description, so the first elements of the description and type arrays are displayed. At (K), the logic then tests to see if a second set is present and, if it is, displays the relevant data.

And that's really all there is to it. I won't describe it here, but I have included in the source code accompanying this article a utility program (XMLSAXLIST) that you might find useful when studying XML documents that you need to process. It uses XML-SAX to parse the document and produces a listing of all the events signaled and the length and content of the associated data. If you run the program, you will be able to see the effect of the RmvWhiteSpace procedure as the original length of the data item is included. If you have any questions about the operation of the program, please let me know.

     H Option(*NoDebugIO : *SrcStmt)

       // This count is populated by XML-INTO whenever the INTO
       // variable is an array
     D progStatus     SDS
     D   xmlElements                 20i 0 Overlay(progStatus: 372)

(D)  D MySAXHandler    Pr            10i 0
     D   commArea                          Like(dummyCommArea)
     D   event                       10i 0 Value
     D   pstring                       *   Value
     D   stringLen                   20i 0 Value
     D   exceptionId                 10i 0 Value

     D RmvWhitespace   pr         65535a   Varying
     D   input                    65535a   Varying Const

     D category        DS                  Qualified Dim(20)
     D   code                         2a
     D   catDescr                    20a
     D   product                           LikeDS(product) Dim(50)

     D product         DS                  Qualified
     D   code                         4a
(A)  D   descrType                    5a   Dim(2)
     D   description                600a   Dim(2)
     D   mSRP                         7p 2
     D   sellPrice                    7p 2
     D   qtyOnHand                    5i 0

     D XML_Source      S            256a   Varying
     D                                     Inz('/Partner400/XML/Example5.xml')

       // Short version of Description for display purposes
     D dispDescription...
     D                 S             40a

     D dummyCommArea   S              1a
     D i               S              5i 0
     D p               S              5i 0

      /Free
(B)    XML-INTO category
         %XML(XML_Source: 'case=any doc=file allowextra=yes +
                           allowmissing=yes');

       // XML-INTO has filled the category array
       // Next we use XML-SAX to fill in the missing type details
(C)    XML-SAX %Handler(MySAXHandler: dummyCommArea)
         %XML(XML_Source: 'doc=file');

       Dsply ('xmlElements = ' + %char(xmlElements) );

       // The XML parser's element count is used to control the loop
(J)    For i = 1 to xmlElements;
         Dsply ('Cat: ' + category(i).code + ' ' +
                category(i).catDescr );
         For p = 1 to %Elem(category.product);
           If category(i).product(p).code = *Blanks;
             Leave; // Exit once blank product code entry located
           Else;
             // Process the current product entry
             dispDescription = category(i).product(p).description(1);
             Dsply ('Product: ' + dispDescription);
             Dsply ('Type: ' + category(i).product(p).descrType(1));
             // If second description is present, display details
(K)          If category(i).product(p).description(2) <> *Blanks;
               dispDescription = category(i).product(p).description(2);
               Dsply ('Product: ' + dispDescription);
               Dsply ('Type: ' + category(i).product(p).descrType(2));
             EndIf;
           EndIf;
         EndFor;
       EndFor;

Jon Paris's IBM midrange career started when he fell in love with the System/38 while working as a consultant. This love affair ultimately led him to joining IBM.

In 1987, Jon was hired by the IBM Toronto Laboratory to work on the S/36 and S/38 COBOL compilers. Subsequently, Jon became involved with the AS/400 and in particular COBOL/400.

In early 1989, Jon was transferred to the Languages Architecture and Planning Group, with particular responsibility for the COBOL and RPG languages. There, he played a major role in the definition of the new RPG IV language and in promoting its use with IBM Business Partners and users. He was also heavily involved in producing educational and other support materials and services related to other AS/400 programming languages and development tools, such as CODE/400 and VisualAge for RPG.

Jon left IBM in 1998 to focus on developing and delivering education focused on enhancing AS/400 and iSeries application development skills.

Jon is a frequent speaker at user group meetings and conferences around the world, and he holds a number of speaker excellence awards from COMMON.
