Putting Business Intelligence at Your Fingertips, Part I

Typography

Smaller Small Medium Big Bigger
Default Helvetica Segoe Georgia Times
Reading Mode

Allow for rapid growth and the rapid addition of new "solutions"
Create an architecture in which the business entities and rules are adhered to across "consumers"
Ensure "one version of the truth"
Add value to the business rapidly

Let's begin with a few simple definitions.

Business intelligence (BI)—A business management term referring to applications and technologies that are used to gather, provide access to, and analyze data and information about company operations.
Data warehouse (DW)—The main repository of an organization's historical data, containing the raw material for decision-support systems. Furthermore, the repository is subject-oriented, time-variant, non-volatile, and integrated to support strategic analysis and reporting.
Data mart—A specialized version of a data warehouse predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart emphasizes easy access to relevant information.
Operational data store (ODS)—A repository of an organization's transactional data, containing the raw material for decision-support systems. Furthermore, the repository is subject-oriented, volatile, and integrated in support of tactical analysis and reporting.
Extract/Transform/Load (ETL)—A process in data warehousing that involves extracting data from outside sources, transforming it to fit business needs, and loading it into the data warehouse.
Online Transaction Processing (OLTP)—A class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. Sometimes referred to as the line-of-business "operational system."
Online Analytical Processing (OLAP)—An approach to quickly providing answers to analytical queries that are multidimensional in nature; for example, sales figures by customers located in Minnesota for the year 2007.

What the BI Architecture Often Looks Like

When first implementing a BI application, many companies focus only on getting data out of the OLTP system and placing it into a data mart as quickly and easily as possible, totally ignoring a broader architecture that supports growth and/or the addition of more data marts and consumers. This quick and dirty solution usually consists of extracting transactional data from the operational system on a periodic basis, performing some transformations and propagating the data to the data mart (see Figure 1).

This is sufficient and acceptable if no other data marts are required. Experience tells us that this is rarely the case.

Figure 1: A simple BI architecture with only one data mart benefits only a few users. (Click images to enlarge.)

Once the word is out that the first data mart is operational and consumers are receiving benefit from the new source of information, additional data marts will be requested. Of course, the seemingly easiest way to provide the new marts will be to replicate the same design and implementation that was used for the first mart. This approach is shown in Figure 2.

Figure 2: A BI architecture with multiple data marts is better but still not as good as it could be.

Herein lies the problem. This "ad hoc" methodology of creating data marts without an overall architecture has disadvantages:

ETL and cleansing of data requires too many processes.
Duplicate processing burdens the operational system.
You're likely to have inconsistent data across data marts.
Additional data marts are increasingly more expensive to create.

Of course, there are some advantages as well, the largest being that you will not have to get the various data mart users to agree on common business entities as each consumer will have their own version of the truth!

What the BI Architecture Should Look Like

A more robust business intelligence architecture will make use of a "data warehouse," a structure that acts as an independent repository for data that is used for strategic analysis, query, and reporting. Furthermore, the architecture might also include an operational data store (ODS), a structure that acts as an independent repository for transactional data used for tactical query and reporting. This is illustrated in Figure 3.

Figure 3: The recommended BI architecture includes a data warehouse and an ODS.

There are inherent advantages in implementing BI solutions using this architecture:

Extraction, cleansing, and transformation done once for all data
Rapid data mart creation
Consistent business entities across all data marts

Additional data marts can source their data from the data warehouse, which helps ensure consistent processing for all data—i.e., one version of the truth. As a new data mart is designed, you will find that much of the data required for it will already be in the warehouse. In other words, by eliminating duplicate processing and pulling the data once from the operational systems, undue work is minimized, making the whole system more efficient.

Of course, one of the disadvantages of using such a robust architecture is that all data mart users will need to agree on consistent definitions of business entities and rules. This will result in more work for the warehouse architects, work that will be both technical and political in nature.

Warning! When laying out a BI architecture, be prepared for complaints such as these:

"It will take forever to model the whole business. We won't ever finish the warehouse."
"The warehouse will contain data that's not used. Just think of all the extra disk space that will be needed."
"I can implement the data mart by itself a lot faster."

To counter these complains, remember that the modeling activity includes only business entities for the data mart that is in progress, not the entire enterprise. By using the focused data model, only the data required to fulfill the current requirement is extracted from the operational system and placed into the data warehouse. And yes, it is very possible that there will be a lot of data stored in the data warehouse and data marts, especially considering that the requirements placed on the business intelligence system can be different from those of the transactional system. For example, the OLTP system might require only two years worth of data, while the BI system requires five years worth of data in support of trending and macro analysis. Why burden the OLTP system with five years of data?

As for the common declaration that "I can implement the data mart by itself a lot faster"...well, we call this guerrilla data warehousing! All of the BI concepts are used— extract data, transform data, load data, store data, query data—but it's done on a personal or local basis. This promotes multiple versions of the truth. And you have to consider the security issues surrounding the storage of vital business data in disparate locations around the enterprise. We are certain that every business has hundreds of Excel spreadsheets scattered around, masquerading as data marts.

What about the idea of "virtual data warehousing"? Can we just query the operational data directly using complex SQL requests? While this is certainly possible given the latest advances in database management system technology as well as the high-performance systems available, it is not recommended. In our experience, asking the operational system to meet the requirements of both a transactional system and business intelligence system is a tall order. At some point, the requirements will diverge and it will be more efficient and economical to have separate systems.

Based on the BI architecture described herein, let's look at one methodology you can use to design and implement the solution. Again, this represents just one methodology.

Overall Steps in the Methodology

This methodology includes the various steps needed to design, build, test, and monitor the BI system (Figure 4).

Figure 4: Follow these steps to design and implement the recommended solution.

When designing and building the system, you can combine some steps or do them in parallel, while other steps will be dependent on previous steps. Many of the steps and sections of the methodology are iterative in nature, meaning that as you go through the process, it might be advantageous or required to go back and do the series of steps again, taking into account newfound information. Just be aware that activities or conclusions in one step can lead you back to others. Designing and building a business intelligence system is a process of discovery. Some of the discoveries will be used to iterate through the design and implementation of the solution. Other discoveries will be categorized and used as input for the next project.

It is also important to realize that good project management and some technical education will be required throughout all phases. And remember, the knowledge and experience gained will be utilized during the next round of data marts.

Define Requirements

Properly defining, identifying, and understanding the requirements is the most important step. Given that both business requirements and technical requirements are involved, participants in this activity include business leaders, technical leaders, key end users, and a decision-maker. The output of this step includes the following:

High-level requirements, both technical and business
A list of potential projects, with the first project selected
A feasibility study
Identification of project participants, both technical and business
Project scope
Skills and education required to move forward

During the requirements gathering, it will be very tempting to keep working until all the requirements are identified and agreed upon. This can be a trap we call "analysis paralysis." Thus we use the 80/20 rule: Gather approximately 80 percent of the requirements and start moving forward. The other requirements will fall into place, or you will discover the real requirements in a future step.

Create/Refine Data Model

After requirements are captured, the process of creating and refining a data model can begin. A data model is an understanding of how the entities in a business are defined, what they look like, and how they relate to one another. This step allows the data warehouse project team to gain an understanding of the business rules, to control redundancy, and to come to an agreement on common definitions and entities. This process will involve both the technical and business leaders.

The output of this exercise does not have to be too detailed, nor take months of effort. Do not attempt to model the entire enterprise; only model areas and entities specific to this project's scope—i.e., this data mart.

Design Data Warehouse

Based on the requirements and high-level modeling, it is now possible to create the logical and physical database design. The design will also be based on any data mart prototyping, provided this step was done previously.

In general, a data warehouse model is based on getting data into the warehouse as quickly and efficiently as possible, storing large quantities of detailed data, and getting data out of the warehouse and into the data marts. These requirements usually lead to a more normalized data model.

The output of this step includes these items:

Logical data model for the data warehouse
Physical data model for the data warehouse (tables, columns, etc.)
Information for the specific data mart addressed in this project

During the data warehouse design process, consider future growth in terms of more rows a well as additional columns to support future requirements.

Select Data Mart Tools

As soon as the requirements are gathered, you can start to evaluate and select what tool the end users will use to access, analyze, and visualize the information. In other words, you can start to evaluate data mart tools. This is an exciting step, but it's also fraught with problems. The excitement results from actually seeing new and sophisticated products. The problems arise because every vendor demos well, every tool looks great, and it's easy to get confused and languish in analysis paralysis.

When evaluating end user query, analysis, and reporting products, it is very important to remember and make use of the requirements you have identified. When considering tools, keep in mind these influencing factors:

What type of problem are you solving (OLAP, data mining, reporting, or simple querying)?
What are the performance expectations?
What is the realistic number of concurrent users?
What are the various types of users (power users vs. report "lookers")?
What is the anticipated ease of use, ease of learning, and ease of configuration and maintenance?

Another important factor is the budget. It makes no sense to evaluate $100,000 tools when the budget is $10,000. On the other hand, if your business problem requires a $100,000 solution, then by all means get it.

Is it acceptable to change query and reporting tools or to have more than one? The answer rests in meeting requirements. If the requirements dictate more than one tool, this is acceptable. If you discover that the current tool no longer meets the requirements, it is acceptable to change. The goal is to provide business value and a good return on investment (ROI).

Design Data Mart

Once the analysis, query, and reporting tool is identified, you can work on the logical and physical design of the data mart. In many cases, the tool selected drives the design. For example, online analytical processing (OLAP) tools may lead to a star or snowflake schema or may have their own data structure not reliant upon a relational database system. In most cases, query and reporting solutions will lead you to a relational database design. Sophisticated data mining tools may lead you to something completely different in terms of data structure. The design will be dictated by the tooling and algorithms (i.e., the type of mining).

For relational database structures such as star or snowflake schemas, the data mart design process converts requirements into dimensions and facts. The process also converts data into consumable information.

Prototype Data Mart

Once the data mart is designed and the data structures are defined, you can implement a functional prototype using the tool selected. While the data model is realistic, the data doesn't have to be. Generating some test data or a subset of current data is enough to demonstrate the tooling. During the prototyping and testing, you have a first chance to get good sizing information for the data mart.

This step is very exciting yet sometimes frustrating. The end users finally get to see what they will be using, so this is an opportunity for them to provide constructive feedback. New requirements might come up, and existing requirements may be confirmed or denied. It's time to finalize requirements and avoid enlarging the scope of the current project. Unmet requirements can be captured and used as input for the next iteration or project.

This is also an opportunity to get excited about the BI project and continue to drive sponsorship and gather support.

Source Data Warehouse Data

With the data warehouse designed, it's time to determine the operational sources of the warehouse data. In other words, map source data to warehouse data. This activity includes these steps:

Determining the best source for each data element
Determining the best extraction method for the data
Determining what transformation and cleansing must be done to the data
Setting up routines/tools to extract, move, transform, and load the data
Defining error-handling procedures to track, report, and fix invalid data
Implementing appropriate measures for data-quality tracking

During this process, it is important to keep in mind performance requirements to meet the operational batch window and data warehouse refresh requirements. If these requirements were not identified in the beginning, they will come out now.

During this step, the extract, transform, load (ETL) process and tooling will be evaluated and tested. In our experience, the ETL area represents about 70 percent of the effort in implementing a BI solution.

Source Data Mart Data

Given the data mart model, it is now possible to define and design the mapping of data elements to their source in the data warehouse. During this phase, the data mart model can be refined to include summarizations and calculated fields.

Some data mart designs may require an intermediate staging or load file to be created. The loading of data mart data can be set up using similar ETL strategies as the data warehouse, allowing for automation as well as data integrity and quality tracking. The time table for loading new data will be based on the requirements gathered previously. If the users need weekly sales information, the data mart can be populated on a daily or weekly basis. On the other hand, if the users need daily sales information for yesterday, then the data mart load process must occur at least on a daily basis.

Tune Data Warehouse and Data Mart

Once the data warehouse and data mart are populated with the appropriate data, is it time to throw the BI project over the wall and let the users start working?

Of course not! One of the most important steps must occur first: namely, tuning the data warehouse and data mart. During this time, refinements can be made to ensure the data is refreshed in a timely and accurate manner. It is also important to evaluate performance of the ETL system, as well as the analysis, query, and reporting mechanisms. This will involve implementing a proper indexing strategy and possibly implementing materialized query tables. It will be very important to watch for usage behavior to determine if the initial requirements are still accurate. For example, is the data mart easy to use, and is it providing the proper information? One approach is to allow a few key users to take the solution out for a test drive. It will be advantageous to get their unbiased feedback before turning the solution over to the entire user community.

Congratulations! It's time to go live.

Monitor Usage and Maintain Data Warehouse and Data Mart

Once the BI solution is live, is the project over? Absolutely not. Monitoring and maintaining the BI solution are key to providing ongoing value and return on investment. Some of the items to monitor include these:

Which reports are used and not used
What new reports are being built
Which columns are used and not used
Which summarizations are used
Which summarizations are calculated at run time
Which queries are run most frequently
How performance levels are affected as more users and/or data are added to the system

As the information is consumed, new requirements will surface. The new requirements might be against the existing data mart or might necessitate a different data mart. Keeping a trained eye on the BI solution will help you to determine what to do next.

Moving on to the Next Project

As you complete the BI project and start delivering value, you will start to look to the next project. This is the natural result of success. So, what's different for the next data mart?

It's very likely the next data mart will focus on a different area of the business or enterprise. To support the new data mart, the warehouse data model may need to incorporate new entities or relationships. The data warehouse may need new tables and/or columns added, with sources identified for new data elements. Over time, more and more data will already be present in the warehouse, minimizing the need to source entirely new elements. During this process, it is important to remember to add only the data that is needed for the new mart.

Sourcing the new data mart data may reveal ETL processing that is the same as for previous marts. This allows further refinement of data warehouse design to store cleansed, transformed, and augmented data.

It is very possible that new and different analytical and visualization tools may be required for the new mart. This might be due to a different set of users or a different business problem to be solved. This is OK, and it reaffirms one of the reasons a robust BI architecture supports multiple data marts.

Some Final Thoughts...

It is said that data warehousing is a journey. This is a suitable observation, given the continual change in business goals, focus, and requirements.

Your business intelligence system is considered successful not when it's operational, but when it's used.

Mike Cain is the team leader of the IBM DB2 for i Center of Excellence in Rochester, Minnesota. He can be reached at This email address is being protected from spambots. You need JavaScript enabled to view it..

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Book Reviews

Book Review: Extract, Transform, and Load with SSIS

Do your business apps access different data sources? This book shows you how to make that task easier
Book Review: 21st Century RPG: /Free, ILE, and MVC

David Shirey’s first book is an educational and entertaining read for “modern” and “old” RPG programmers alike
Book Review: Developing Business Applications for the Web--With HTML, CSS, JSP, PHP, ASP.NET, and JavaScript

If you are ready to get into Web application development, take this book along as your guide
Book Review: DB2 10.5 Fundamentals for LUW: Certification Study Guide (Exam 615)

DBAs who use the book will find it very helpful first in their test study and later as a reference book.
Book Review: DB2 11 for z/OS Database Administration—Certification Study Guide

This is a well-written DB2 11 book that could easily stand on its own as a reference manual, not just a certification guide.
Book Review: Free-Format RPG IV, Third Edition

Jim Martin comes through for us again.
Book Review: IBM i Security Administration and Compliance, Second Edition
Book Review: Programming in ILE RPG, Fifth Edition

This book really hits the mark and is a must-read for all RPG developers.
Book Review: DB2 10.1/10.5 for Linux, UNIX, and Windows Database Administration: Certification Guide
Book Review: Subfiles in Free-Format RPG

Whether you're a newbie or a seasoned pro, this book has something for you.
Book Review: Evolve Your RPG Coding: Move from OPM to ILE ... and Beyond

This book provides an amazingly comprehensive introduction to the concepts while at the same time delivering enough technical detail to make you productive very quickly.
Book Review: Database Design and SQL for DB2
Book Review: The Chief Data Officer Handbook for Data Governance

When implemented appropriately, data governance is a powerful framework.
Book Review: DB2 10 for z/OS: The Smarter, Faster Way to Upgrade

Trying to figure out whether to upgrade? Read on.
Book Review: 5 Keys to Business Analytics Program Success
Book Review: DB2 11: The Ultimate Database for Cloud, Analytics, and Mobile
Book Review: Flexible Input, Dazzling Output with IBM i

Today, it's all about input and output. Getting data into the IBM i from non-traditional sources and then displaying it back out again in varied formats. But where can you go to learn all that you need to know about this critical skill?
Book Review: Advanced Guide to PHP on IBM i

Enterprise-level PHP skills and techniques have been adapted for IBM i developers in Kevin Schroeder's new book.
Book Review: Java for RPG Programmers

If you've been putting off learning Java, you have no excuse anymore!
Book Review: DB2 10.1 Fundamentals: Certification Study Guide

Too valuable to be classified as merely excellent certification material, this book should also rightly take its place on DB2 DBA bookshelves as a solid day-to-day DB2 reference.
Book Review: DB2 10 for Z/OS Database Administration: Certification Study Guide

Whether you're trying to get certified or you just need a great reference book, this is the book for you.
Book Review: Developing Web 2.0 Applications with EGL for IBM i

It's everything you need to know, from the bottom up.
Book Review: Advanced Integrated RPG

Isn't it about time somebody told us how to integrate RPG and Java?
Book Review: Managing Without Walls

If you manage remote or satellite teams, this book is a must-read!
Book Review: Managing Without Walls

If you manage remote or satellite teams, this book is a must-read!
Book Review: The Remote System Explorer

This book speaks directly to the thousands of IBM i programmers who develop in RPG, COBOL, CL, and DDS every day.
Book Review: IBM System i APIs at Work, Second Edition

API expert Bruce Vining delivers the only comprehensive guide to APIs.
Book Review: Functions in Free-Format RPG IV

This one short volume manages to essentially be both a general introduction and a detailed reference.
Book Review: DB2 11: The Database for Big Data and Analytics
Book Review: IBM Mainframe Security: Beyond the Basics

Beginners will have a strong foundation after reading this book. Experienced professionals will reference it frequently.
Book Review: IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance

Find out how IBM is addressing the challenges of big data.
Book Review: Fundamentals of Technology Project Management

Projects can be overwhelming, but taken in small, deliberate steps, all projects are achievable.
Book Review: Customer Experience Analytics

Use CEA as a strategic weapon to stay ahead of your competitors.
Book Review: Big Data Analytics: Disruptive Technologies for Changing the Game

The disciplines of data analytics are evolving to meet the new challenges of big data.
Book Review: IBM i Security: Administration and Compliance

If you have any interest in IBM i security, whether as an administrator, a programmer, or an auditor, then this book is the perfect resource.
Book Review: DB2 9.7 for Linux, UNIX, and Windows Database Administration (Exam 541)

This book, written by the creator of the certification exam, reveals exactly what you'll need to know to prep for the test.
Book Review: Selling Information Governance to the Business

Who governs the information that runs your company?
Book Review: You Want to Do WHAT with PHP?

If you're serious about programming in PHP, get a book that treats you that way.
Book Review: The IBM i Programmer's Guide to PHP

Both a primer and a reference, this book is a must-have for anyone who wants to program in PHP.
Book Review: JavaScript for the Business Developer

There's no faster, easier way to become proficient in JavaScript.
Book Review: SOA for the Business Developer

If you want to know how SOA works in the real world, this is your book.
Book Review: DB2 9 Fundamentals

Whether you want to obtain an IBM certified DB2 professional certification or simply become well-rounded in the fundamental concepts of DB2 and general database theory, this is your book.
Book Review: The Modern RPG IV Language, Fourth Edition

This book isn't a training manual; it's a reference book.

Resource Center

How to Modernize Fast and Within Budget (Quick Guide)
Why Migrate When You Can Modernize?

Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.
Resource Center

The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit
IBM i Transformation Risks Every Business Leader Should Know

Join us for this hour-long webcast that will explore:
What to Do When Your AS/400 Talent Retires

IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn:

Analytics & Cognitive Categories

Latest Analytics & Cognitive News

Career Catgories

Latest Career News

Cloud Categories

Latest Cloud News

IT Infrastructure Categories

Latest IT Infrastructure News

News Categories

Latest News

Programming Categories

Latest Programming News

Security Categories

Latest Security News

Typography

Share This

What the BI Architecture Often Looks Like

What the BI Architecture Should Look Like

Overall Steps in the Methodology

Define Requirements

Create/Refine Data Model

Design Data Warehouse

Select Data Mart Tools

Design Data Mart

Prototype Data Mart

Source Data Warehouse Data

Source Data Mart Data

Tune Data Warehouse and Data Mart

Monitor Usage and Maintain Data Warehouse and Data Mart

Moving on to the Next Project

Some Final Thoughts...

LATEST COMMENTS

MC Press Online

Support MC Press Online

Book Reviews

Book Review: Extract, Transform, and Load with SSIS

Book Review: 21st Century RPG: /Free, ILE, and MVC

Book Review: Developing Business Applications for the Web--With HTML, CSS, JSP, PHP, ASP.NET, and JavaScript

Book Review: DB2 10.5 Fundamentals for LUW: Certification Study Guide (Exam 615)

Book Review: DB2 11 for z/OS Database Administration—Certification Study Guide

Book Review: Free-Format RPG IV, Third Edition

Book Review: IBM i Security Administration and Compliance, Second Edition

Book Review: Programming in ILE RPG, Fifth Edition

Book Review: DB2 10.1/10.5 for Linux, UNIX, and Windows Database Administration: Certification Guide

Book Review: Subfiles in Free-Format RPG

Book Review: Evolve Your RPG Coding: Move from OPM to ILE ... and Beyond

Book Review: Database Design and SQL for DB2