EVIs Rapid Response Unit

With Version 4 Release 3 of OS/400, IBM announced the availability of Encoded Vector Indexes (EVIs). EVIs are a new kind of index. They may sound like some kind of arcane database feature that only a database administrator could love, but they make network administrators happy, as well, by improving user response time.

All relational database management systems (RDBMSs) have some form of B-tree radix indexes. These indexes are, essentially, tree structures that provide very fast retrieval for a small number of rows. Radix indexes were developed years ago when online transaction processing (OLTP) applications were the only thing running on most production systems, and they are well suited to situations in which a fairly simple request results in one or two rows being retrieved from a table.
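To make the keyed-lookup idea concrete, here is a minimal sketch in Python. It is not how any RDBMS stores a radix index: a sorted list searched with binary search stands in for the tree, and the table, column names, and lookup function are all hypothetical.

```python
import bisect

# Hypothetical table: (order_id, customer) rows, in arrival order.
rows = [(1007, "Acme"), (1001, "Baker"), (1012, "Cole"), (1003, "Dunn")]

# The "index": sorted (key, row_number) pairs. A real radix index is a tree,
# but a sorted list with binary search shows the same keyed-lookup idea.
index = sorted((order_id, row_no) for row_no, (order_id, _) in enumerate(rows))
keys = [key for key, _ in index]

def lookup(order_id):
    """Return the rows whose key equals order_id without scanning the table."""
    lo = bisect.bisect_left(keys, order_id)
    hi = bisect.bisect_right(keys, order_id)
    return [rows[row_no] for _, row_no in index[lo:hi]]

print(lookup(1012))
```

The lookup touches only a handful of index entries and one table row, which is why this style of access suits requests that return one or two rows.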

But with business intelligence, users are running more-complex queries. These queries sometimes retrieve large result sets, and—more importantly—the users are allowed to define ad hoc queries and run them dynamically. The majority of these users are using workstation tools and accessing the data through the network via connectivity protocols such as ODBC. Increasingly, these users are using Web-based data access tools. No matter how they’re issuing the requests, the result is a bottleneck in the database. This is completely transparent to the users; all they see is that response time is slow.

For many types of queries, EVIs can significantly reduce the database bottleneck, thus improving overall response time. Although it is not necessary for network administrators to understand the structure and intricacies of EVIs, it is important that they understand why EVIs improve database performance and recognize the situations in which they will be most useful. This article will provide some background on indexing technologies and provide examples of when and how EVIs help improve performance.

Radix Indexes

Although radix indexes are effective for static queries, they become less effective for ad hoc queries. In a static, or compiled, query, the database administrator already knows all the elements of the query in advance and can create the perfect index to support the query. But in an ad hoc environment, the user is allowed to select the elements of the query at will. In a datamart, for example, the database administrator might know that the user will always select store number and date, but, beyond that, the user might be able to choose from a broad spectrum of elements, such as revenue, profit, inventory, or item. Given the user’s possible choices, it becomes impossible to define the perfect radix index in advance of the query.

Experts throughout the RDBMS industry have recognized this problem, and various vendors have proposed solutions that can be referred to collectively as bitmapped indexes. The basic concept behind a bitmapped index is that the database generates an array of bits in which each bit represents a row in the table. If the bit is on, the row contains the desired value. So, for example, if you have a customer table and you define a bitmapped index over a column called State that contains each state in the United States, an industry-generic bitmapped index will generate 50 bitmaps, one per state. If a user creates a query that includes the predicate where state = 'california', the RDBMS will retrieve the bitmap for California and quickly locate the rows that match the query.
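The idea of one bitmap per distinct value can be sketched in a few lines of Python. This is an illustration only: the six-row State column is made up, and a Python integer stands in for each bitmap (bit i is on when row i holds the value).

```python
# Hypothetical customer table; only the State column matters here.
states = ["california", "texas", "california", "ohio", "texas", "california"]

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = {}
    for row_no, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_no)
    return bitmaps

def rows_from_bitmap(bitmap):
    """List the row numbers whose bits are on."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

state_index = build_bitmap_index(states)
print(rows_from_bitmap(state_index["california"]))
```

With three distinct states the sketch holds three bitmaps. A column with all 50 states would hold 50.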

The good news about this kind of index is that, in addition to fast retrieval, bitmaps can be combined using Boolean algebra to further speed ad hoc access. For example, say the database contained bitmapped indexes for the columns State and Month and a user’s query contained the predicate where state = 'california' and month = 'march'. The RDBMS could AND the two bitmaps together to create a bitmap in which each set bit represents a row that satisfies both predicates. In an ad hoc environment, this becomes an extremely powerful tool for supporting users.
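Combining two bitmaps is a single bitwise operation. The following Python sketch assumes a made-up six-row table and again uses integers as bitmaps. The helper name bitmap_for is hypothetical.

```python
# Hypothetical six-row table with State and Month columns.
states = ["california", "texas", "california", "ohio", "texas", "california"]
months = ["march", "march", "april", "march", "may", "march"]

def bitmap_for(column, wanted):
    """Build an int bitmap: bit i is on when row i equals the wanted value."""
    bitmap = 0
    for row_no, value in enumerate(column):
        if value == wanted:
            bitmap |= 1 << row_no
    return bitmap

california = bitmap_for(states, "california")   # rows 0, 2, 5
march = bitmap_for(months, "march")             # rows 0, 1, 3, 5

both = california & march    # state = 'california' AND month = 'march'
either = california | march  # the same two predicates joined with OR

matching_rows = [i for i in range(len(states)) if (both >> i) & 1]
print(matching_rows)
```

Neither combination reads a table row. Only the rows whose bits survive the AND need to be fetched.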

The bad news about the generic industry solution is that, since the RDBMS generates a bitmap for each unique value, bitmaps become an ungainly solution for large tables. First, consider the scenario in which you build a bitmap over a column called Gender, which will have two unique values (M and F). The database will build two bitmaps; each bitmap will have one bit for every row in the table. Each time a row is added to the customer table, both bitmaps are rebuilt. When the customer table contains 100,000 rows, this bitmapped index will not be too large or too difficult to maintain.

Now, consider the scenario in which you have thousands of unique values and millions of rows; the bitmapped index quickly becomes huge, and maintaining it becomes costly. Imagine building and maintaining a bitmapped index over a column called Cust_Num when there are 1 million customers, each with a unique customer number! The index would contain 1 million bitmaps, each with a million bits. Because of this scenario, most RDBMS vendors recommend bitmapped indexes only for small tables and columns with a low number of unique values.
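The arithmetic behind these two scenarios is easy to check. The sketch below assumes uncompressed bitmaps at exactly one bit per row. Real products may compress bitmaps, so treat the figures as upper bounds for illustration.

```python
def bitmap_index_bytes(distinct_values, row_count):
    """One bitmap per distinct value, one bit per row, eight bits per byte."""
    return distinct_values * row_count // 8

gender = bitmap_index_bytes(2, 100_000)              # two bitmaps, 100,000 rows
cust_num = bitmap_index_bytes(1_000_000, 1_000_000)  # a bitmap per customer

print(f"Gender:   {gender:,} bytes")
print(f"Cust_Num: {cust_num:,} bytes (about {cust_num / 1e9:.0f} GB)")
```

Under that assumption the Gender index needs 25,000 bytes, while the Cust_Num index needs about 125 GB to describe a table of only 1 million rows.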

EVI to the Rescue!

Aware of the inherent problems with bitmapped indexes, IBM Research developed EVIs, which store information about bitmaps rather than the bitmaps themselves. When a query requires a bitmap, DB2 UDB for AS/400 uses the EVI to generate a dynamic bitmap. Like the generic industry bitmaps, these dynamic bitmaps are used to retrieve rows from the database and can be combined through the Boolean functions AND and OR to produce a bitmap that perfectly matches the user’s request.
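One simplified way to picture this is a table of distinct values, each assigned a small code, plus a vector holding one code per row. The Python sketch below follows that model. The article does not spell out the internal layout, so the structure, function names, and data here are assumptions for illustration, not the DB2 UDB for AS/400 implementation.

```python
# Hypothetical State column for a six-row table.
states = ["california", "texas", "california", "ohio", "texas", "california"]

def build_evi(column):
    """Return (symbol_table, vector): one small code per distinct value,
    and one code per row, instead of one bitmap per distinct value."""
    symbol_table = {}
    vector = []
    for value in column:
        code = symbol_table.setdefault(value, len(symbol_table))
        vector.append(code)
    return symbol_table, vector

def dynamic_bitmap(symbol_table, vector, wanted):
    """Build the bitmap for one value on demand by scanning the vector."""
    code = symbol_table.get(wanted)
    bitmap = 0
    for row_no, row_code in enumerate(vector):
        if row_code == code:
            bitmap |= 1 << row_no
    return bitmap

symbols, vector = build_evi(states)
print(vector)
print(bin(dynamic_bitmap(symbols, vector, "california")))
```

In this model, storage grows with the number of rows plus the number of distinct values, rather than with their product, and a bitmap exists only for as long as a query needs it.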

Because of the way EVIs are stored internally, their use is far broader than that of a generic bitmapped index. Because DB2 UDB for AS/400 does not have to maintain a bitmap for each unique value, EVIs are a powerful tool for very large tables. And improving performance for very large tables is exactly why network administrators will love EVIs as much as database administrators will. But, not knowing the intricate details of how a database optimizer works, these network administrators may want to know how EVIs improve performance. The simple answer is this: EVIs reduce the number of rows that the database must access. Of course, the simple answer requires more explanation.

As stated previously, the best way for a database to fetch a few rows for a given query is to build an index over the columns used in the query. But when a few rows become a lot of rows, the performance of a radix index begins to fall. Without EVIs, the only option is what database administrators call a full table scan. This is the process whereby the database must access every row in the table and check to see if it meets the query’s criteria.

A full table scan is not a problem for small tables, but for large tables (say, hundreds of millions of rows) a full table scan becomes a tremendously large, time-consuming task. Even on the AS/400, where parallel I/O and symmetric multiprocessing (SMP) work in concert to deliver some of the fastest table scans in the industry, full table scans can drag down the performance of a query.

Without EVIs, a database optimizer must choose between a radix index and a full table scan. If there is no radix index to match the query, the database has no choice but to use a full table scan. EVIs provide a third solution: The optimizer can combine several EVIs to create a dynamic bitmap that tells the system exactly which rows to retrieve and which rows to skip. This kind of data retrieval is called skip sequential processing. It uses the power of the AS/400’s parallel I/O subsystem while avoiding the need to access every row in the table.
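The difference between the two access methods can be sketched by counting how many rows each one touches. The table, the bitmap, and both functions below are hypothetical, and the sketch ignores parallel I/O entirely.

```python
# Hypothetical table of (state, month, revenue) rows.
table = [
    ("california", "march", 120),
    ("texas", "march", 80),
    ("california", "april", 95),
    ("ohio", "march", 60),
    ("texas", "may", 70),
    ("california", "march", 150),
]

def full_table_scan(rows, predicate):
    """Touch every row and test it against the query's criteria."""
    touched = 0
    result = []
    for row in rows:
        touched += 1
        if predicate(row):
            result.append(row)
    return result, touched

def skip_sequential(rows, bitmap):
    """Walk the table in order but read only rows whose bit is on."""
    touched = 0
    result = []
    for row_no in range(len(rows)):
        if (bitmap >> row_no) & 1:
            touched += 1
            result.append(rows[row_no])
    return result, touched

# Dynamic bitmap for state = 'california' AND month = 'march' (rows 0 and 5).
bitmap = (1 << 0) | (1 << 5)

scan_rows, scan_touched = full_table_scan(
    table, lambda r: r[0] == "california" and r[1] == "march")
skip_rows, skip_touched = skip_sequential(table, bitmap)
print(scan_touched, skip_touched)
```

Both methods return the same two rows, but the scan reads all six rows while skip sequential reads only the two that the bitmap marks.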

Performance Testing EVIs

To verify the viability of EVIs, the AS/400 Teraplex Integration Center, a customer testing facility dedicated to high-end business intelligence, set up a test that exercised the three major data access methods: radix indexes, EVIs, and full table scans. At the outset, the hypothesis was that the database would use EVIs and skip sequential processing when an intermediate percentage of rows were being returned. Figure 1 describes how experimenters thought EVIs would be used, relative to other data access methods. As the chart shows, researchers expected that, even when EVIs were available, the database would continue to use a radix index when a small percentage of rows were being retrieved. They also expected to see full table scans used for a large percentage of returned rows.

The purpose of the test was to quantify, at least for one query, where and when the database would stop using one access method and begin using another. To do this, the Teraplex Center researchers created a 512 GB, 2.1 billion-row table representing a distribution business. Figure 2 shows how they defined the query. With the query defined, the team could then build the “perfect” radix index and a set of EVIs that would match the query. The perfect radix index, which is an index built specifically for the query at hand, was built over the columns Year, ReturnFlag, Shipmode, and CustKey because these were the columns used to select which rows would be retrieved. In addition, the team built single-column EVIs over the same set of columns. That is, it defined indexes in the following way:

CREATE ENCODED VECTOR INDEX STARDB/YEAREVI +
ON STARDB/ITEMFACT (YEAR) +
WITH 333000 DISTINCT VALUES

Similar indexes were created for the other columns. By changing the value of the variable custkey_value in the query, the number of rows returned could be controlled. In the first iteration, 152 rows were returned. The value of custkey_value was systematically increased with each run of the query until all 2.1 billion rows were returned. The table in Figure 3 describes the results.

The results follow the general pattern described in Figure 1. However, the Teraplex team learned a couple of important points from this test. First, when 0.05 to 25 percent of the rows were retrieved, DB2 UDB for AS/400 chose to use a radix index and skip sequential processing. The database has had this capability since V4R2, and the Teraplex team has seen it used effectively many times. In this case, however, the team was not expecting to see this combination of index and access method used more often than a radix index with key row positioning. These results are a testament to the effectiveness of skip sequential processing.

Second, the system consistently used no more than three EVIs for any query, even though seven EVIs were defined over the table. The optimizer analyzes the cost of using all the EVIs, and, in this case, it determined that using three of the seven EVIs provided the most efficient access to the required rows.

Finally, the full table scan was not used until 63 percent of the rows were returned. Without EVIs, the optimizer will often choose a full table scan when 20 percent of the rows are returned. On AS/400s with large memory configurations, like the ones in the Teraplex Center, a full table scan may be used when as few as 10 percent of the rows are returned because the system can preload data into memory in parallel.
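The switch-over points reported in Figure 3 can be written down as a small lookup. This restates the results for this one query on this one configuration. It is not a model of the optimizer. Figure 3 does not say which range the boundary values themselves belong to, so the half-open ranges below are an assumption, as is the function name.

```python
def access_method(percent_rows_returned):
    """Return (index used, access method) for the Teraplex test query only.

    Figure 3 does not say which side the boundary values fall on, so the
    half-open ranges here are an assumption.
    """
    if percent_rows_returned < 0.05:
        return ("Radix", "Key Row Positioning")
    if percent_rows_returned < 25:
        return ("Radix", "Skip Sequential")
    if percent_rows_returned < 63:
        return ("3 EVIs", "Skip Sequential")
    return ("none", "Full Table Scan")

for pct in (0.01, 10, 40, 80):
    print(pct, access_method(pct))
```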

Although full table scans are very efficient for a high percentage of returned rows, it’s important to understand the cost of using them for all queries. The Teraplex Center team took the same query described in Figure 2 and ran it three different ways, returning 152 rows each time. First, the team ran the query with no indexes present, forcing the system to use a full table scan. Then, it ran the query with the seven EVIs defined over the table. Finally, it dropped all the EVIs and ran the query with only a radix index present. Figure 4 illustrates the results of these tests. As expected, with such a small number of rows returned, a radix index was still the most efficient method for processing this query. But more important than that is the dramatic difference between using EVIs and using a full table scan: the full table scan took 49.43 minutes, roughly 12 times as long as the 3.98 minutes needed with EVIs.

The Results Speak for Themselves

The results of the Teraplex tests illustrate the power of EVIs in an ad hoc query environment. Ad hoc users demand performance, and EVIs are another tool for meeting that demand.

Figure 1: This graph shows how experimenters thought EVIs would be used relative to other data access methods. The vertical axis is response time/system resources; the horizontal axis is the number of rows accessed, from few to many. The three curves are keyed access with the "perfect" radix index, skip sequential with EVIs and dynamic bitmaps, and a sequential full table scan.

SELECT CUSTKEY, LINENUMBER
FROM ITEM_FACT
WHERE YEAR = 1998
  AND RETURNFLAG = 'R'
  AND SHIPMODE = 'AIR'
  AND CUSTKEY <= custkey_value

Figure 2: The experimenters defined the query for the test with this SQL statement.

Percent Rows Returned    Index Used for Access    Access Method
rows < 0.05              Radix                    Key Row Positioning
0.05 < rows < 25         Radix                    Skip Sequential
25 < rows < 63           3 EVIs                   Skip Sequential
rows > 63                none                     Full Table Scan

The table was 512 GB and contained 2.1 billion rows. The test was run on a model 740, 12-way system with 40 GB of memory. These results are particular to a given query on a given configuration. Your results will vary.

Figure 3: This table describes the results after all 2.1 billion rows were returned.

Access Method      Time (in minutes)
Full Table Scan    49.43
EVIs               3.98
Perfect Index      0.33

Figure 4: The team also analyzed the time costs of using just one access method for all queries.
