IBM database experts delineate the analytics improvements in DB2 11 for z/OS.
Editor's Note: This article is an excerpt from the new book DB2 11: The Ultimate Database for Cloud, Analytics, and Mobile (MC Press, 2014).
In Version 11, IBM DB2 for z/OS makes it easier to bring analytic components closer to the core operational data, which reduces latency, complexity, and cost while improving data quality and governance. Faster query, analytics, and reporting facilities can improve competitiveness, reduce risk, and support confident decision-making on real-time data.
The analytics improvements in DB2 11 include temporal data enhancements with support for views and special registers, transparent archive query, new analytics features, and integration with big data.
Temporal Data Enhancements
DB2 11 introduces support for a period specification when a view is referenced in a FROM clause by an SQL SELECT, UPDATE, or DELETE statement. A couple of examples illustrate how that works.
The first example is a SELECT against a view on a table with a system time specification, where the SELECT specifies a system time period:
CREATE VIEW v01 (col1, col2, col3) AS SELECT * FROM stt;
SELECT * FROM v01
FOR SYSTEM_TIME AS OF TIMESTAMP '2010-01-10 10:00:00';
The second example references a similar sort of view, this time on a table with a business time specification; in this case, an UPDATE and a DELETE are issued against the view, specifying a portion of business time.
CREATE VIEW v8 (col1, col2, col3) AS SELECT * FROM att;
UPDATE v8
FOR PORTION OF BUSINESS_TIME FROM '2009-01-01' TO '2009-06-01'
SET col2 = col2 + 1.10;
DELETE FROM v8
FOR PORTION OF BUSINESS_TIME FROM '2009-01-01' TO '2009-06-01'
WHERE COL1 = 12345;
To enable you to retrieve data from temporal tables without modifying existing SQL, DB2 11 adds support for two temporal special registers, CURRENT TEMPORAL SYSTEM_TIME and CURRENT TEMPORAL BUSINESS_TIME. You can set these special registers to a point in system time or business time and then execute existing SQL statements as if that system-time or business-time specification had been coded in the statements themselves.
Two new BIND options, SYSTIMESENSITIVE and BUSTIMESENSITIVE, control whether a package honors the new temporal special registers. For a special register to modify the execution of your queries, the corresponding BIND option must be set to YES; set one or both, depending on which special registers you want to use.
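As a minimal sketch of how the special registers might be used, the statements below set CURRENT TEMPORAL SYSTEM_TIME and then run an unmodified query against stt, the system-period temporal table from the earlier example. The sketch assumes the package was bound with SYSTIMESENSITIVE(YES); the timestamp value is illustrative only.

```sql
-- Assumes the package is bound with SYSTIMESENSITIVE(YES).
-- View the data as it was at a past point in system time.
SET CURRENT TEMPORAL SYSTEM_TIME = TIMESTAMP('2010-01-10-10.00.00');

-- Existing SQL, unchanged: DB2 processes it as if
-- FOR SYSTEM_TIME AS OF CURRENT TEMPORAL SYSTEM_TIME had been coded.
SELECT * FROM stt;

-- Reset the register so that later statements see current data again.
SET CURRENT TEMPORAL SYSTEM_TIME = NULL;
```

CURRENT TEMPORAL BUSINESS_TIME can be set the same way for tables with a business time period, provided the package is bound with BUSTIMESENSITIVE(YES).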
Transparent Archive Query
Transparent archive query is designed for the case where only a portion of your data is active or current, possibly dynamic, and subject to INSERT, UPDATE, and DELETE processing, while the rest is read-only historical data that is probably referenced infrequently. In that case, you might want to store the data in two tables. The first would be the current data table, which you'd want to keep on high-performance storage with high availability. The second would be a read-only history or archive table, which you could move to more economical storage and possibly offload to the IBM DB2 Analytics Accelerator (IDAA).
Transparent archive query enables applications to query both the current and archive tables with no SQL changes. In other words, the fact that there are two tables is hidden from the application, which is presented with a single table image. By default, an SQL query on the data retrieves rows only from the current, or base, table. A new built-in global variable, GET_ARCHIVE (in schema SYSIBMADM), can be set to allow the same query to retrieve data from both the base table and the archive table. When this global variable is set, DB2 automatically transforms the SQL to use UNION ALL to select from both tables, using dynamic plan switching.
The archiving process of moving data from the base table to the archive table can be user-controlled. However, DB2 11 also provides a new global variable, MOVE_TO_ARCHIVE, that causes a row deleted from the base table to be moved to the archive table.
Both the base table and the archive table have to be created by the user, and the structures of the two tables must be identical. They are connected by using a new DDL clause, ENABLE ARCHIVE, on the ALTER TABLE statement.
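The flow described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive setup: the tables orders and orders_arch and the column order_date are hypothetical, the global variables are qualified with the SYSIBMADM schema, and the package is assumed to be bound so that it honors the archive global variables (the ARCHIVESENSITIVE bind option).

```sql
-- Hypothetical tables: orders (base) and orders_arch (archive).
-- The archive table must have the same structure as the base table.
CREATE TABLE orders_arch LIKE orders;

-- Connect the base table to its archive table.
ALTER TABLE orders ENABLE ARCHIVE USE orders_arch;

-- Have DB2 move deleted base-table rows into the archive table.
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM orders WHERE order_date < '2013-01-01';
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'N';

-- Default behavior: only the base table is read.
SET SYSIBMADM.GET_ARCHIVE = 'N';
SELECT COUNT(*) FROM orders;

-- Same query text, now reading base and archive rows together.
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT COUNT(*) FROM orders;
```

The two SELECT statements are textually identical; only the setting of the global variable changes which tables DB2 reads.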
New Analytics Features
DB2 11 provides improved support for SQL grouping sets, including ROLLUP and CUBE. Previously, DB2 had limited grouping set support; building each grouping set required a separate query. Now, ROLLUP and CUBE allow multiple grouping sets inside the same SQL query. ROLLUP is helpful in providing subtotals along a hierarchical dimension, and CUBE is useful for queries that aggregate columns based on multiple dimensions.
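To make the difference concrete, here is a short sketch against a hypothetical table, sales(region, product, amount); the table and column names are assumptions for illustration only.

```sql
-- Hypothetical table: sales(region, product, amount).

-- ROLLUP: subtotals along the hierarchy region -> product,
-- plus a grand total, all in a single query.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, product);

-- CUBE: totals for every combination of the two dimensions
-- (region and product, region only, product only, grand total).
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY CUBE (region, product);
```

In the result, a subtotal row carries a null value in each column that was aggregated away, so a row with a region and a null product holds the total for that region.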
The performance enhancement for IFI filtering of IFCID 306, used by IDAA V3 with Change Data Capture, has already been noted. DB2 11 also provides the following support for IDAA V4:
- DB2 changes can be propagated to the accelerator as they happen.
- The staleness of accelerator data can be detected via Real Time Statistics (RTS).
- Disk storage cost can be reduced by archiving data in the accelerator using the High Performance Storage Saver, maintaining high performance for analytical queries.
- Workload Manager integration is improved, and better monitoring capabilities are provided.
- The query offload scope can be increased by using the new special register CURRENT QUERY ACCELERATION.
- High-performance IBM SPSS in-database scoring via the PACK and UNPACK functions is available in DB2 11. This improvement has been retrofitted to DB2 10 via APAR.
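As a brief sketch of the CURRENT QUERY ACCELERATION special register mentioned in the list above, the statements below enable acceleration for the current session, run a query, and then turn acceleration off again. The table sales and its columns are hypothetical, and whether a given query is actually offloaded depends on the accelerator configuration and on DB2's eligibility checks.

```sql
-- Allow eligible dynamic queries to be routed to the accelerator,
-- falling back to DB2 if the accelerator returns an error.
SET CURRENT QUERY ACCELERATION = ENABLE WITH FAILBACK;

-- Hypothetical analytical query; DB2 decides whether to offload it.
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region;

-- Turn acceleration off again for this session.
SET CURRENT QUERY ACCELERATION = NONE;
```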
Integration with Big Data
Interest in big data is huge industry-wide. DB2 11 supports integration with big data by providing connectors that allow DB2 applications to access data stored in Hadoop (an open-source framework for distributed storage and processing) easily and efficiently. This is done by providing new user-defined functions (UDFs) and a new generic table UDF capability.
The goal of this support is to integrate DB2 for z/OS with the Hadoop-based IBM InfoSphere BigInsights platform, enabling traditional applications on DB2 for z/OS to access big data analytics. Analytics jobs can be specified using Jaql, a query language for JavaScript Object Notation (JSON) data, and submitted to InfoSphere BigInsights. The results are then stored in the Hadoop Distributed File System (HDFS), and a table UDF, HDFS_READ, reads the result from HDFS for subsequent presentation to an SQL query.
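The last step might look roughly like the sketch below. Because HDFS_READ is a generic table UDF, the shape of its result is not fixed in the function definition; the query supplies the column names and types in the correlation clause. The argument list shown (a URL for the result file plus an options string), the host name, the file path, and the result columns are all assumptions for illustration, so check the connector documentation for the actual signature.

```sql
-- Assumption: HDFS_READ takes the URL of a delimited result file in HDFS
-- and an options string. The URL and the column list are hypothetical.
SELECT t.word, t.cnt
FROM TABLE(HDFS_READ(
       'http://bihost.example.com:14000/webhdfs/v1/results/out.csv?op=OPEN',
       ''))
     AS t (word VARCHAR(40), cnt INTEGER)  -- result shape declared by the query
ORDER BY t.cnt DESC;
```

Declaring the columns in the query is what lets one function serve result files of different layouts.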
For more information about these and other features of DB2 11 for z/OS, visit this IBM website.