29
Fri, Nov
0 New Articles

EVIs Rapid Response Unit

Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

With Version 4 Release 3 of OS/400, IBM announced the availability of Encoded Vector Indexes (EVIs). EVIs are a new kind of index. They may sound like some kind of arcane database feature that only a database administrator could love, but they make network administrators happy, as well, by improving user response time.

All relational database management systems (RDBMSs) have some form of B-tree radix indexes. These indexes are, essentially, tree structures that provide very fast retrieval for a small number of rows. Radix indexes were developed years ago when online transaction processing (OLTP) applications were the only thing running on most production systems, and they are well suited to situations in which a fairly simple request results in one or two rows being retrieved from a table.

But with business intelligence, users are running more-complex queries. These queries sometimes retrieve large result sets, and—more importantly—the users are allowed to define ad hoc queries and run them dynamically. The majority of these users are using workstation tools and accessing the data through the network via connectivity protocols such as ODBC. Increasingly, these users are using Web-based data access tools. No matter how they’re issuing the requests, the result is a bottleneck in the database. This is completely transparent to the users; all they see is that response time is slow.

For many types of queries, EVIs can significantly reduce the database bottleneck, thus improving overall response time. Although it is not necessary for network administrators to understand the structure and intricacies of EVIs, it is important that they understand why EVIs improve database performance and recognize the situations in which they will be most useful. This article will provide some background on indexing technologies and provide examples of when and how EVIs help improve performance.

Radix Indexes

Although radix indexes are effective for static queries, they become less effective for ad hoc queries. In a static, or compiled, query, the database administrator already knows all the elements of the query in advance and can create the perfect index to support the query. But in an ad hoc environment, the user is allowed to select the elements of the query at will. In a datamart, for example, the database administrator might know that the user will always select store number and date, but, beyond that, the user might be able to choose from a


broad spectrum of elements, such as revenue, profit, inventory, or item. Given the user’s possible choices, it becomes impossible to define the perfect radix index in advance of the query.

Experts throughout the RDBMS industry have recognized this problem, and various vendors have proposed solutions that can be referred to collectively as bitmapped indexes. The basic concept behind a bitmapped index is that the database generates an array of bits in which each bit represents a row in the table. If the bit is on, the row contains the desired value. So, for example, if you have a customer table and you define a bitmapped index over a column called State that contains each state in the United States, an industry- generic bitmapped index will generate 50 bitmaps. If a user creates a query that includes the predicate where state=“california,” the RDBMS will retrieve the bitmap for California and quickly locate the rows that match the query.

The good news about this kind of index is that, in addition to fast retrieval, bitmaps can be combined using Boolean algebra to further increase ad hoc access. For example, say the database contained bitmapped indexes for the columns State and Month and a user’s query contained the statement where state=“california” and month=“march”.

The RDBMS could “and” the two bitmaps together to create a bitmap in which each set bit represented the combination of the two predicates. In an ad hoc environment, this becomes an extremely powerful tool for supporting users.

The bad news about the generic industry solution is that, since the RDBMS generates a bitmap for each unique value, bitmaps become an ungainly solution for large tables. First, consider the scenario in which you build a bitmap over a column called Gender, which will have two unique values (M and F). The database will build two bitmaps; each bitmap will have one bit for every row in the table. Each time a row is added to the customer table, both bitmaps are rebuilt. When the customer table contains 100,000 rows, this bitmapped index will not be too large or too difficult to maintain.

Now, consider the scenario in which you have thousands of unique values and millions of rows; the bitmapped index quickly becomes huge, and maintaining it becomes costly. Imagine building and maintaining a bitmapped index over a column called Cust_Num when there are 1 million customers, each with a unique customer number! The index would contain 1 million bitmaps, each with a million bits. Because of this scenario, most RDBMS vendors recommend bitmapped indexes only for small tables and columns with a low number of unique values.

EVI to the Rescue!

Aware of the inherent problems with bitmapped indexes, IBM Research developed EVIs, which store information about bitmaps rather than the bitmaps themselves. When a query requires a bitmap, DB2 UDB for AS/400 uses the EVI to generate a dynamic bitmap. Like the generic industry bitmaps, these dynamic bitmaps are used to retrieve rows from the database and can be combined through the Boolean functions and and or to produce a bitmap that perfectly matches the user’s request.

Because of the way EVIs are stored internally, their use is far broader than that of a generic bitmapped index. Because DB2 UDB for AS/400 does not have to maintain a bitmap for each unique value, EVIs are a powerful tool for very large tables. And improving performance for very large tables is exactly why network administrators will love EVIs as much as database administrators will. But, not knowing the intricate details of how a database optimizer works, these network administrators may want to know how EVIs improve performance. The simple answer is this: EVIs reduce the number of rows that the database must access. Of course, the simple answer requires more explanation.

As stated previously, the best way for a database to fetch a few rows for a given query is to build an index over the columns used in the query. But when a few rows become a lot of rows, the performance of a radix index begins to fall. Without EVIs, the only option is what database administrators call a full table scan. This is the process


whereby the database must access every row in the table and check to see if it meets the query’s criteria.

A full table scan is not a problem for small tables, but for large tables—say, hundreds of millions of rows—a full table scan becomes a tremendously large, time- consuming task. Even on the AS/400, where parallel I/O and symmetric multiprocessing (SMP) work in concert to deliver some of the fastest table scans in the industry, full table scans can drag down the performance of a query.

Without EVIs, a database optimizer must choose between a radix index and a full table scan. If there is no radix index to match the query, the database has no choice but to use a full table scan. EVIs provide a third solution: The optimizer can combine several EVIs to create a dynamic bitmap that tells the system exactly which rows to retrieve and which rows to skip. This kind of data retrieval is called skip sequential processing. It uses the power of the AS/400’s parallel I/O subsystem while avoiding the need to access every row in the table.

Performance Testing EVIs

To verify the viability of EVIs, the AS/400 Teraplex Integration Center, a customer testing facility dedicated to high-end business intelligence, set up a test that exercised the three major data access methods: radix indexes, EVIs, and full table scans. At the outset, the hypothesis was that the database would use EVIs and skip sequential processing when an intermediate percentage of rows were being returned. Figure 1 describes how experimenters thought EVIs would be used, relative to other data access methods. As the chart shows, researchers expected that, even when EVIs were available, the database would continue to use a radix index when a small percentage of rows were being retrieved. They also expected to see full table scans used for a large percentage of returned rows.

The purpose of the test was to quantify, at least for one query, where and when the database would stop using one access method and begin using another. To do this, the Teraplex Center researchers created a 512 GB, 2.1 billion-row table representing a distribution business. Figure 2 shows how they defined the query. With the query defined, the team could then build the “perfect” radix index and a set of EVIs that would match the query. The perfect radix index, which is an index built specifically for the query at hand, was built over the columns Year, ReturnFlag, Shipmode, and CustKey because these were the columns used to select which rows would be retrieved. In addition, the team built single-column EVIs over the same set of columns. That is, it defined indexes in the following way:

CREATE ENCODED VECTOR INDEX STARDB/YEAREVI +
ON STARDB/ITEMFACT (YEAR) +
WITH 333000 DISTINCT VALUES

Similar indexes were created for the other columns. By changing the value of the variable custkey_value in the query, the number of rows returned could be controlled. In the first iteration, 152 rows were returned. The value of custkey_value was systematically increased with each run of the query until all 2.1 billion rows were returned. The table in Figure 3 describes the results.

The results follow the general pattern described in Figure 1. However, the Teraplex team learned a couple of important points from this test. For one, when 0.05 to 25 percent of the rows were retrieved, DB2 UDB for AS/400 chose to use a radix index and skip sequential processing. The database has had this capability since V4R2, and the Teraplex team has seen it used effectively many times. In this case, it was not expecting to see this combination of index and access method used more often than a radix index with key row positioning. These results are a testament to the effectiveness of skip sequential processing.

Second, the system consistently used no more than three EVIs for any query, even though seven EVIs were defined over the table. The optimizer analyzes the cost of using all


the EVIs, and, in this case, it determined that using three of the seven EVIs provided the most efficient access to the required rows.

Finally, the full table scan was not used until 63 percent of the rows were returned. Without EVIs, the optimizer will often choose a full table scan when 20 percent of the rows are returned. On AS/400s with large memory configurations, like the ones in the Teraplex Center, a full table scan may be used when as few as 10 percent of the rows are returned because the system can preload data into memory in parallel.

Although full table scans are very efficient for a high percentage of returned rows, it’s important to understand the cost of using them for all queries. The Teraplex Center team took the same query described in Figure 1 and ran it three different ways, returning 152 rows each time. First, it ran the query with no indexes present, forcing the system to use a full table scan. Then, it ran the query with the seven EVIs defined over the table. Finally, it dropped all the EVIs and ran the query with only a radix index present. Figure 4 illustrates the results of these tests. As expected, with such a small number of rows returned, a radix index was still the most efficient method for processing this query. But more important than that is the dramatic difference between using an EVI and using a full table scan. The full table scan is much slower than an EVI.

The Results Speak for Themselves

The results of the Teraplex tests illustrate the power of EVIs in an ad hoc query environment. These users demand performance, and EVIs are another tool for meeting the performance demand.

Response Time/ System Resources

Skip sequential with EVIs and dynamic bitmaps

Sequential full table scan Few Many Number of rows accessed

Keyed with "perfect" rapid index

Figure 1: This graph shows how experimenters thought EVIs would be used relative to other data access methods.

SELECT CUSTKEY, LINENUMBER
FROM ITEM_FACT
WHERE YEAR = 1998
AND RETURNFLAG = "R"
AND SHIPMODE = "AIR"
AND CUSTKEY <= custkey_value


Figure 2: The experimenters defined the query for the test with this SQL statement.

Percent Rows Returned Index Used for Access Access Method

rows < 0.05 Radix Key Row Positioning
0.05 < rows < 25 Radix Skip Sequential 25 < rows < 63 3 EVIs Skip Sequential rows > 63 none Full Table Scan The table was 512 GB and contained 2.1 billion rows. The test was run on a model 740, 12-way system with 40 GB of memory. These results are particular to a given query on a given configuration. Your results will vary.

Figure 3: This table describes the results after all 2.1 billion rows were returned.

Time

(in minutes)

60

50

40

30

20

10

0

49.43

Full Table Scan EVIs Perfect Index

3.98 0.33

Figure 4: The team also analyzed the time costs of using just one access method for all queries.


BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$

Book Reviews

Resource Center

  • SB Profound WC 5536 Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application. You can find Part 1 here. In Part 2 of our free Node.js Webinar Series, Brian May teaches you the different tooling options available for writing code, debugging, and using Git for version control. Brian will briefly discuss the different tools available, and demonstrate his preferred setup for Node development on IBM i or any platform. Attend this webinar to learn:

  • SB Profound WP 5539More than ever, there is a demand for IT to deliver innovation. Your IBM i has been an essential part of your business operations for years. However, your organization may struggle to maintain the current system and implement new projects. The thousands of customers we've worked with and surveyed state that expectations regarding the digital footprint and vision of the company are not aligned with the current IT environment.

  • SB HelpSystems ROBOT Generic IBM announced the E1080 servers using the latest Power10 processor in September 2021. The most powerful processor from IBM to date, Power10 is designed to handle the demands of doing business in today’s high-tech atmosphere, including running cloud applications, supporting big data, and managing AI workloads. But what does Power10 mean for your data center? In this recorded webinar, IBMers Dan Sundt and Dylan Boday join IBM Power Champion Tom Huntington for a discussion on why Power10 technology is the right strategic investment if you run IBM i, AIX, or Linux. In this action-packed hour, Tom will share trends from the IBM i and AIX user communities while Dan and Dylan dive into the tech specs for key hardware, including:

  • Magic MarkTRY the one package that solves all your document design and printing challenges on all your platforms. Produce bar code labels, electronic forms, ad hoc reports, and RFID tags – without programming! MarkMagic is the only document design and print solution that combines report writing, WYSIWYG label and forms design, and conditional printing in one integrated product. Make sure your data survives when catastrophe hits. Request your trial now!  Request Now.

  • SB HelpSystems ROBOT GenericForms of ransomware has been around for over 30 years, and with more and more organizations suffering attacks each year, it continues to endure. What has made ransomware such a durable threat and what is the best way to combat it? In order to prevent ransomware, organizations must first understand how it works.

  • SB HelpSystems ROBOT GenericIT security is a top priority for businesses around the world, but most IBM i pros don’t know where to begin—and most cybersecurity experts don’t know IBM i. In this session, Robin Tatam explores the business impact of lax IBM i security, the top vulnerabilities putting IBM i at risk, and the steps you can take to protect your organization. If you’re looking to avoid unexpected downtime or corrupted data, you don’t want to miss this session.

  • SB HelpSystems ROBOT GenericCan you trust all of your users all of the time? A typical end user receives 16 malicious emails each month, but only 17 percent of these phishing campaigns are reported to IT. Once an attack is underway, most organizations won’t discover the breach until six months later. A staggering amount of damage can occur in that time. Despite these risks, 93 percent of organizations are leaving their IBM i systems vulnerable to cybercrime. In this on-demand webinar, IBM i security experts Robin Tatam and Sandi Moore will reveal:

  • FORTRA Disaster protection is vital to every business. Yet, it often consists of patched together procedures that are prone to error. From automatic backups to data encryption to media management, Robot automates the routine (yet often complex) tasks of iSeries backup and recovery, saving you time and money and making the process safer and more reliable. Automate your backups with the Robot Backup and Recovery Solution. Key features include:

  • FORTRAManaging messages on your IBM i can be more than a full-time job if you have to do it manually. Messages need a response and resources must be monitored—often over multiple systems and across platforms. How can you be sure you won’t miss important system events? Automate your message center with the Robot Message Management Solution. Key features include:

  • FORTRAThe thought of printing, distributing, and storing iSeries reports manually may reduce you to tears. Paper and labor costs associated with report generation can spiral out of control. Mountains of paper threaten to swamp your files. Robot automates report bursting, distribution, bundling, and archiving, and offers secure, selective online report viewing. Manage your reports with the Robot Report Management Solution. Key features include:

  • FORTRAFor over 30 years, Robot has been a leader in systems management for IBM i. With batch job creation and scheduling at its core, the Robot Job Scheduling Solution reduces the opportunity for human error and helps you maintain service levels, automating even the biggest, most complex runbooks. Manage your job schedule with the Robot Job Scheduling Solution. Key features include:

  • LANSA Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.

  • LANSAWhen it comes to creating your business applications, there are hundreds of coding platforms and programming languages to choose from. These options range from very complex traditional programming languages to Low-Code platforms where sometimes no traditional coding experience is needed. Download our whitepaper, The Power of Writing Code in a Low-Code Solution, and:

  • LANSASupply Chain is becoming increasingly complex and unpredictable. From raw materials for manufacturing to food supply chains, the journey from source to production to delivery to consumers is marred with inefficiencies, manual processes, shortages, recalls, counterfeits, and scandals. In this webinar, we discuss how:

  • The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit

  • Profound Logic Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application.

  • SB Profound WC 5536Join us for this hour-long webcast that will explore:

  • Fortra IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn: