In "Java Performance Tuning" (Midrange Computing, March 2000), I introduced the AS/400 Performance Explorer tool. That article showed how to collect Java performance events. In particular, object creation and garbage collection events were measured using standard AS/400 Performance Explorer (PEX) commands. This article will add specific tips that you can use to improve server Java performance.
There are a few basic system tuning tips and a few basic language tips that everyone should know. Beyond these basics, I'll show you how to use Performance Explorer and a nifty tool called Performance Trace Data Visualizer. To add some excitement, I'll measure a simple Java code fragment that creates a new order number (published by Don Denoncourt in "The Spartan Order System: An Exercise in AS/400 Java Application Design" in the October/November 1999 issue of Midrange Computing's AS/400 NetJava Expert). Then, I'll show how to improve the performance of that sample code using the tips in this article.
Since Java was introduced, performance has improved significantly on both the AS/400 and other platforms. In the early days of Java, there were many performance horror stories. Valiant Java heroes spent long hours hand-tuning applications. Under the covers, Java Virtual Machine (JVM) implementers used optimization techniques such as Just-In-Time (JIT) compilers and reimplementation of Java Development Kit (JDK) classes. The challenge you have today is that some Java constructs seem elegant but are too slow. Unfortunately, you can't always see that. The approach I like to use is to construct a controlled measurement around a set of primitive Java operations. Of course, you have to have some idea of where to look in the first place. The tips in this article will help you determine where to start. Once you think you have found a potential problem, you need to have some idea of an alternate approach. There is no tool or process that can give you the best alternative, but, if you use the techniques described in this article, you will soon develop a sense for how to improve your Java code.
This article focuses on server Java performance. The tips provided apply to any release of Java on the AS/400, including JDK 1.1.4, JDK 1.1.6, and JDK 1.1.7. Some of the tips may have less effect on the V4R4 JDK 1.2 environment because a JIT compiler has been added to JDK 1.2. The measurement techniques shown in this article will work for any release, including JDK 1.2.
Before I start, it's important to configure your system for Java. Here are four of the most important system tuning tips:
Choose an AS/400 model that is designed for Java. Older AS/400 models are designed for RPG and COBOL applications. Java applications need different memory, cache, and processor characteristics than traditional RPG and COBOL applications. Newer models have larger caches and faster clock speeds. The best models for Java are listed in AS/400 Performance Capabilities Reference, Version 4, Release 4, dated August 1999.
Monitor storage pool page faults. Java applications require more memory than traditional applications. One way to tell whether you have memory pool problems is to look at the nondatabase page fault rate in the storage pool that is running your Java application. If you are running the JAVA command from the command line, this will usually be the *BASE pool. Nondatabase page fault rates should be less than 20 per second. If the fault rate gets above 30 per second, add more memory to the storage pool. You can check page fault rates with the Work with System Status (WRKSYSSTS) command.
Use Create Java Program (CRTJVAPGM) whenever possible. The AS/400 CRTJVAPGM command transforms Java class files into direct-execution RISC instructions. In most cases, you should specify optimization level 40 (OPTIMIZE(40)). If you don't use CRTJVAPGM, the AS/400 transforms class files automatically at runtime, and the default optimization for this automatic transformation is level 10. Transformation of Java archives (.zip and .jar files) is particularly important because optimizations can be made between the class files contained in the .zip or .jar file. It is especially important to run CRTJVAPGM over the jt400.jar file, because the Java Toolbox for the AS/400 (JT400) is contained in this file and is not transformed during the install process.
Monitor activity levels. The storage pool activity level determines the number of concurrent tasks that are allowed to run in one pool. Each Java thread takes one activity level. If you are using a storage pool that was configured with a low activity level, Java threads may have to wait. Often, this condition is difficult to detect and can significantly slow a threaded application.
The system tuning tips are the most important: Application-level tuning won't help you much if you haven't set up the system properly. The following tips are the most important programming tips for Java. There are, of course, many more tips, but I have found the following three to be the most significant (refer to AS/400 Performance Capabilities Reference, Version 4, Release 4, dated August 1999 for additional tips):
Avoid implicit object creation. Certain Java classes create several small temporary objects to complete their work. The worst cases I know about are the String and BigDecimal classes. As with any performance tip, String and BigDecimal objects are a problem only if you create lots of them. One way to work around the String problem is to use StringBuffer. Working around BigDecimal is more difficult, although numerical primitives (such as ints or floats) can be used in some cases. Avoiding object creation is such an important tip because object creation incurs a double penalty: It takes time to create temporary objects, and then it takes time to garbage-collect them. In a steady-state server application, all of the objects that are created must eventually be collected.
Avoid unnecessary synchronization. Java synchronization is a locking technique used by the JVM to protect access to objects in a multithreaded environment. A primary design goal should be to lock only objects (both Java and AS/400) that must be shared and then lock them for only a short duration. Synchronization is usually specified with the synchronized modifier on a Java method declaration. Many methods in the standard JDK library use the synchronized modifier. (A scan of JDK 1.1.6 showed 790 cases.) The cases you should be most concerned about are frequent calls to short-running functions. For example, the StringBuffer and Vector classes use synchronized methods. In extreme cases, it may be necessary to write your own version of a Java class that avoids the synchronized call. Sun's Collections package contains several alternatives to the Vector class, all of which perform well because they avoid using synchronized calls. To determine whether developing your own unsynchronized classes is worth the effort, use the tool I describe in this article.
Exploit method inlining. Method inlining is a classic optimization technique. The term refers to a compiler's ability to eliminate call overhead by copying the entire called method (or procedure) into the calling location. The nice thing about inlining is that it happens under the covers. You can keep a simple, abstracted object-oriented design (OOD) that makes good use of get/set methods while relying on the compiler and translator to do the right thing. To make this work, however, you need to be aware of a few structural rules. First, final static methods have a greater potential for being inlined because the system knows that they cannot be overridden. Second, the javac compiler provides an -O option that is worth trying. (Note that this option can be set in all the major Java integrated development environments [IDEs].) In some cases, the compiler can inline methods before translation starts. Finally, packaging applications in a .jar file will help. The translation of a .jar file can inline methods between classes because it has all of the classes available. In short, always deploy Java applications in .jar files, because your Java classes execute faster out of a .jar than from directories.
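For the first programming tip above (avoid implicit object creation), here is a minimal sketch of the String workaround. It builds the same text two ways: each += on a String creates a new String object, while a single StringBuffer is appended to in place and turned into one String at the end. The class name ConcatSketch and the comma-separated output are illustrative assumptions; the real savings on a given JVM should be measured, not assumed.

```java
public class ConcatSketch {
    // Builds "0,1,...,n-1," with String concatenation: every += creates a new String object.
    static String withString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";
        }
        return s;
    }

    // Builds the same text by appending to one StringBuffer; only one String is created at the end.
    static String withStringBuffer(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withString(5));       // prints 0,1,2,3,4,
        System.out.println(withStringBuffer(5)); // prints 0,1,2,3,4,
    }
}
```

Both methods return identical text; only the number of temporary objects differs, which is exactly what the PTDV object creation count exposes.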
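For the second programming tip (avoid unnecessary synchronization), the sketch below shows what "your own version of a Java class that avoids the synchronized call" might look like: a hypothetical IntList that replaces a Vector of Integer objects used by a single thread. None of its methods are synchronized, and storing primitive ints also avoids creating a wrapper object per element. It is deliberately not thread-safe, and it uses Arrays.copyOf, which requires a newer JDK than the 1.1.x releases discussed here. In practice, an off-the-shelf unsynchronized class such as ArrayList from the Collections framework is usually the first thing to try.

```java
import java.util.Arrays;

/**
 * Hypothetical unsynchronized growable list of ints, intended for use by ONE thread only.
 * No method is synchronized, so calls avoid lock overhead; the class is NOT thread-safe.
 */
public class IntList {
    private int[] data = new int[16];
    private int size = 0;

    public void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // double the backing array when full
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 0; i < 100; i++) {
            list.add(i * i);
        }
        System.out.println(list.size() + " " + list.get(10)); // prints 100 100
    }
}
```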
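For the third programming tip (exploit method inlining), here is a small sketch of code structured so that its accessors are good inlining candidates. The class OrderMath and its methods are hypothetical. The static final method and the final getter cannot be overridden, so a compiler or translator can bind each call to exactly one method body. Whether any call is actually inlined is decided by javac -O, CRTJVAPGM, or a JIT, and cannot be observed from the Java source; a PEX/PTDV measurement is the way to check.

```java
/**
 * Hypothetical class whose small methods are candidates for inlining:
 * they are final (or static final), so no subclass can replace them.
 */
public final class OrderMath {
    private final int base;

    public OrderMath(int base) {
        this.base = base;
    }

    // Simple get method; final means a call site has only one possible target.
    public final int getBase() {
        return base;
    }

    // static final method: bound at compile time, no virtual dispatch needed.
    public static final int next(int last) {
        return last + 1;
    }

    public static void main(String[] args) {
        OrderMath m = new OrderMath(41);
        System.out.println(next(m.getBase())); // prints 42
    }
}
```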
Now that you have some of the basic tips, let me show you how to put this knowledge to work.
Fishing for Performance
There is an age-old adage: Give a man a fish, and you feed him for a day; teach him how to fish, and you feed him for life. The tips above can be viewed as big fish. To learn how to optimize Java programs better, you need to understand how to measure the relative performance of primitive operations. There are just too many alternatives to learn all of the tips. For example, is ToolBox record I/O performance better than Java Database Connectivity (JDBC)? What does it cost to invoke a servlet? What's the performance difference between data queues and sockets? The list goes on and on.
The best way to teach others how to fish is by example. Figure 1 shows code that uses a data area to generate a unique order number. It uses ToolBox classes, reads the data area, converts the order number from BigDecimal to integer, adds one, converts the integer back to BigDecimal, and saves the new result. In theory, other applications written in either Java or RPG could access the same data area. The getNextOrderNo() method is shown here. (The complete program can be found at www.midrangecomputing.com/anje/code.cfm.) Let me see if I can improve the performance of this code.
To measure this code, create a Performance Explorer definition suitable for Performance Trace Data Visualizer (PTDV). PTDV is available from IBM's alphaWorks Web page at www.alphaworks.ibm.com. PTDV analyzes Java method call trace records. It collects statistics such as inline instruction counts for all of the methods that have been hooked for performance. PTDV also correlates Java object creation (*OBJCRT) events and Java thread state change (*THDSTTCHG) events with the methods that are being measured. The object creation events are important because they correlate directly with the "Avoid implicit object creation" tip. The thread state change events occur when a thread has to wait for a synchronized lock on an object. These correlate with the "Avoid unnecessary synchronization" tip.
Figure 2 shows the PEX definition to use for PTDV. Note that it's very important to use the exact options shown here. PTDV can handle only certain trace records, and there is no point in collecting data that can't be analyzed. Next, wrap the code to be measured in a method called nextOrder(). Then, use CRTJVAPGM to add *ENTRY and *EXIT performance hooks to the nextOrder() method. Finally, start PEX (STRPEX) and run the main method with a loop iteration of 100.
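The measurement setup described above has the same shape as the main() method in Figure 4: a small driver that calls nextOrder() a fixed number of times, with the loop count taken from args[0]. The sketch below shows that shape with a hypothetical in-memory nextOrder() standing in for the code under test. It also adds a rough wall-clock timing with System.nanoTime (a newer-JDK call) as a quick sanity check. Wall-clock time varies with model and system load, so it complements, rather than replaces, the model-independent instruction counts that PEX and PTDV report.

```java
public class MeasureDriver {
    private static int lastOrder = 0; // hypothetical stand-in for the data area or file

    // The method that would receive *ENTRY/*EXIT hooks; here it only increments a counter.
    public static int nextOrder() {
        return ++lastOrder;
    }

    // Calls nextOrder() 'loop' times and returns the elapsed wall-clock time in nanoseconds.
    static long run(int loop) {
        long start = System.nanoTime();
        for (int i = 0; i < loop; i++) {
            nextOrder();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int loop = args.length > 0 ? Integer.parseInt(args[0]) : 100; // default: 100 iterations
        long elapsed = run(loop);
        System.out.println("calls=" + loop + " last=" + lastOrder
                + " avg ns/call=" + (loop > 0 ? elapsed / loop : 0));
    }
}
```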
Figure 3 shows one output panel from PTDV. PTDV provides many different Java performance data views. One of the most useful views is the Avg Inline Instruction count. Each RISC instruction represents a small unit of work. Primitive path-length analysis amounts to counting instructions between points A and B in a code fragment. Instruction counting is useful because it is invariant between AS/400 models. In this example, the number of average inline instructions is the cost of one call to the nextOrder() method. Inline means all of the instructions I coded plus all of the system functions used by the Java Virtual Machine. Here's one word of caution: The instructions are counted by thread. In the two examples I am comparing, everything is occurring on the same thread. This may not be true in a more complex example.
Primitive analysis involves looking for the relative cost of two different approaches as measured by inline instructions. In this example, PTDV indicates that each call to nextOrder() took about 535,349 RISC instructions (see row 5). The object creation count is the total for 100 iterations, so the object creation count for one call to nextOrder() is about 313 (see row 5). (Note that it is normal for instruction and object counts to vary slightly between calls.)
Tuning Java Code
Here's how to improve the code. Because an order number is a simple integer, try using integers instead of BigDecimal. Of course, you first have to suspect that BigDecimal might be a problem; the tips I've given should give you some idea of where to start looking. Closing the timing window is also important. In the first example, the read and the write of the data area may not be synchronized, so multiple concurrent jobs could read the same value and generate duplicate order numbers. (The V4R4 Toolbox documentation is not clear on whether the read() operation locks the data area.) One way to correct this is to lock a database record for update. The record lock holds the data between the read and the write. It also works with RPG programs, so a new Java order entry application and an old RPG order entry application could run on the same system at the same time.
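The lock-between-read-and-write idea can be sketched in portable Java. This is not AS/400 record-level access: it is an analog that uses java.nio file locks (available from JDK 1.4 on, later than the JDKs this article covers; try-with-resources needs Java 7) to hold an exclusive lock across the read, the increment, and the write of a hypothetical 4-byte counter file. The class name and file are made up for illustration. Java file locks coordinate separate processes (roughly like separate jobs); they do not serialize threads within one JVM, which would still need synchronization.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockedCounter {
    // Hypothetical stand-in for the one-record LASTORDNO file: a single 4-byte big-endian int.
    static int nextOrder(File counterFile) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(counterFile, "rw");
             FileChannel channel = raf.getChannel();
             FileLock lock = channel.lock()) { // exclusive lock on the whole file until the block exits
            int last = raf.length() >= 4 ? raf.readInt() : 0; // read (empty file means no orders yet)
            int next = last + 1;                               // modify
            raf.seek(0);
            raf.writeInt(next);                                // write while still holding the lock
            return next;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("lastordno", ".bin");
        f.deleteOnExit();
        System.out.println(nextOrder(f)); // prints 1
        System.out.println(nextOrder(f)); // prints 2
    }
}
```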
Figure 4 shows the code. It's not pretty; I'll admit that the data area approach is more readable. Java code can be very elegant, but you'll have to make tradeoffs between readability and performance. In this example, I used the ToolBox record I/O functions. A physical database file was created with one record and one integer field. To improve the ToolBox I/O, use the getContents() method to move the entire record buffer into a byte[] array. Then, use the ToolBox converter class (AS400Bin4) to convert to and from an integer. The nextOrder() method is shown here; the complete code can be found at www.midrangecomputing.com/mc.
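To show what the converter step does without the Toolbox, here is a pure-Java sketch. It assumes that the integer field is a 4-byte big-endian two's-complement binary value, which is the layout AS400Bin4 converts to and from; the class Bin4Sketch is hypothetical and is not the Toolbox implementation. The point it illustrates is that the conversion needs no objects beyond one small byte array, in contrast to the BigDecimal round trip in Figure 1.

```java
public class Bin4Sketch {
    // int -> 4 bytes, most significant byte first (big-endian), two's complement
    static byte[] toBytes(int value) {
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    // 4 bytes (big-endian) -> int; masking with 0xFF undoes the sign extension of each byte
    static int toInt(byte[] b) {
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] raw = toBytes(1000);         // {0, 0, 3, -24}
        System.out.println(toInt(raw) + 1); // prints 1001
    }
}
```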
Figure 5 shows the results of 100 calls to nextOrder(). The average inline instruction count is now 60,748, and the object creation count is 33. The result is dramatic: over eight times fewer instructions, over nine times fewer objects created, and better synchronization. Look back at the PTDV output from the data area test (Figure 3), and you will notice that the record I/O test (Figure 5) eliminated the garbage collection thread and the creation of two worker threads. You'll have to decide whether this is worth the admittedly less readable code.
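The "eight times" and "nine times" figures follow directly from the per-call numbers quoted in the text: 535,349 versus 60,748 instructions, and 313 versus 33 objects (reading 33 as a per-call figure, as the nine-times comparison implies). The small class below, a hypothetical helper, simply does that arithmetic.

```java
public class ImprovementRatios {
    // Per-call figures quoted in the text for the data area (Figure 3) and record I/O (Figure 5) tests
    static final double DTAARA_INSTRUCTIONS = 535_349;
    static final double RECORD_IO_INSTRUCTIONS = 60_748;
    static final double DTAARA_OBJECTS = 313;
    static final double RECORD_IO_OBJECTS = 33;

    public static void main(String[] args) {
        double instrRatio = DTAARA_INSTRUCTIONS / RECORD_IO_INSTRUCTIONS;
        double objRatio = DTAARA_OBJECTS / RECORD_IO_OBJECTS;
        System.out.printf("instructions: %.2fx fewer%n", instrRatio); // prints 8.81x
        System.out.printf("objects:      %.2fx fewer%n", objRatio);   // prints 9.48x
    }
}
```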
Java application performance is determined by many factors. One of the challenges you face is that some of the expensive operations are somewhat hidden from view. In spite of this, it is now possible to build competitive, robust, scalable Java applications.
AS/400 Java performance analysis tools need improvement. PTDV does a good job, but it needs to include more non-Java events. IBM should consider making PTDV a supported product. Building Performance Explorer collection definitions involves too much trial and error. I also found some documented Java performance events that aren't supported. Finally, there needs to be more documentation on how to put the measurements into a broader system perspective. You may be tuning your Java only to find that the problem is the underlying database, communications, or network infrastructure.
References and Related Materials
AS/400 Performance Capabilities Reference, Version 4, Release 4, August 1999 (SC41-0607-02, CD-ROM AS4PPCP2)
IBM alphaWorks site: www.alphaworks.ibm.com
"Java Performance Tuning," Paul Remtema, Midrange Computing, March 2000
"The Spartan Order System: An Exercise in AS/400 Java Application Design," Don Denoncourt, AS/400 NetJava Expert, October/November 1999
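// lastOrderNoDtaara is a com.ibm.as400.access.DecimalDataArea field declared in the complete program;
// BigDecimal is java.math.BigDecimal.
private int getNextOrderNo() {
    int lastOrderNo = 0;
    try {
        BigDecimal dtaara = lastOrderNoDtaara.read();   // read the last order number
        lastOrderNo = dtaara.intValue();                // BigDecimal -> int
    } catch (Exception e) {
        System.out.println("read of LASTORDNO data area error: " + e);
        System.exit(1);                                 // nonzero status signals failure
    }
    lastOrderNo++;
    // Another job may read the data area between the read() above and this write().
    try {
        lastOrderNoDtaara.write(new BigDecimal(lastOrderNo)); // int -> new BigDecimal on every call
    } catch (Exception e) {
        System.out.println("write of LASTORDNO data area error: " + e);
        System.exit(1);
    }
    return lastOrderNo;
}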
Figure 1: What looks like a well-crafted Java method will often perform poorly until tuned.
ADDPEXDFN
DFN(PTDV)
TYPE(*TRACE)
JOB(PREMTEMA/QJVACMDSRV)
TASK(*ALL)
MAXSTG(100000)
TRCTYPE(*SLTEVT)
MCHINST(*NONE)
BASEVT(*PMCO)
PGMEVT(*JVAENTRY *JVAEXIT)
JVAEVT(*OBJCRT *THDSTTCHG)
Figure 2: Object creation and garbage collection events were measured using the AS/400 Performance Explorer.
Figure 3: Performance Trace Data Visualizer (PTDV) can be used to tune your Java applications.
package Record_IO;
/**
* This type was created in VisualAge.
*/
import com.ibm.as400.access.*;
public class Record_IO
{ static AS400 system;
static int intorder;
static SequentialFile Lastorder;
static AS400FileRecordDescription ordfmtdes;
static RecordFormat ordfmt[];
static Record ordrec;
static AS400Bin4 converter;
/**
* Record_IO constructor. This constructor assumes that the order number is maintained
* in a physical file. The file format contains one integer field that is the last order number
* used. The constructor assumes the file already exists. The constructor contains logic to
* add one new record the first time it is called.
*/
public Record_IO()
{
try
{
system = new AS400();
converter = new AS400Bin4();
Lastorder = new SequentialFile(system, // Access the file sequentially
"/QSYS.LIB/REMTEMA.LIB/LASTORDNO.FILE");
ordfmtdes = new AS400FileRecordDescription(system,
"/QSYS.LIB/REMTEMA.LIB/LASTORDNO.FILE"); // Extract the file format(s)
ordfmt = ordfmtdes.retrieveRecordFormat(); // Retrieve formats into format array
ordrec = new Record(ordfmt[0]); // Create record to hold data
Lastorder.setRecordFormat(ordfmt[0]); // Set format attributes in file object
Lastorder.open(SequentialFile.READ_WRITE, // Open for update to enable record lock
1, SequentialFile.COMMIT_LOCK_LEVEL_NONE);
if (Lastorder.readFirst() == null) // Test for existence of the first record
{
byte temp[];
temp = converter.toBytes(intorder);
ordrec.setContents(temp);
Lastorder.write(ordrec); // Write first record if it doesn't exist
}
}
catch (Exception e)
{
System.out.println(e);
}
}
/**
* Starts the application.
* args[0] is integer loop count.
*/
public static void main(java.lang.String[] args)
{
new Record_IO(); // Note: constructor opens the file
int loop = Integer.parseInt(args[0]);
for (int i = 0; i < loop; i++)
{
nextOrder(); // get next order number. Performance test only
}
try
{
Lastorder.close(); // Close file when done.
}
catch (Exception e)
{
System.exit(1);
}
System.out.println("All done, highest order number is " + intorder);
}
/**
* This method was created in VisualAge.
* Returns next order number in integer
*/
public static int nextOrder()
{
try
{
byte temp[];
ordrec = Lastorder.readFirst(); // Read first record
temp = ordrec.getContents(); // Get record data into byte[]
intorder = converter.toInt(temp); // Assume entire record is one integer field
intorder++; // Increment the order
temp = converter.toBytes(intorder); // Move updated field back into byte[]
ordrec.setContents(temp); // Set record object with updated integer
Lastorder.update(ordrec); // Update the file
return (intorder);
}
catch (Exception e)
{
return (-1);
}
}
}
Figure 4: Java methods that are tuned may not be as readable as their cleanly coded counterparts, but they often perform eight to 10 times faster.
Figure 5: Using basic code optimization techniques, the example method had an eightfold improvement in performance.
MC Press Online