One problem with complex queries is keeping all the pieces in sync; this technique helps you do just that.
I've long extolled the virtues of common table expressions (or CTEs), and I've also shown you how to use UNION statements to great advantage. They're powerful, and I find them particularly well suited to performing ad hoc analysis on production data. Unfortunately, both of those techniques can fall prey to a very specific problem that has caused me severe headaches in my analyses. This article explains the problem and the recently added feature that provides a solution.
The Problem with Componentization
Let me explain the issue. Let's take a simple case, the wonderful UNION. Let's say I want to use UNION to identify all the demand for a specific item, reflecting both customer orders and production material requirements (this may, for example, be a subassembly that is used in production but that we also sell). I'm not going to spend a lot of time laying out the files themselves; I think I can come up with suitably self-documenting file and field names. Here's what such a query might look like:
select MRPROD prod, MRQTY qty, MRDATE needed from PRDMATREQ
union all
select ODPROD, (ODORDERED - ODSHIPPED), ODREQDATE from CUSORDDTL
order by 1, 2
This is a pretty simple example, so let's take it to the next level. I want to aggregate that demand over the next two weeks; that is, I want the total for each item over that period. Not a particularly difficult extension. Let's see how that works.
with demand as
(select MRPROD prod, MRQTY qty, MRDATE needed from PRDMATREQ
union all
select ODPROD prod, (ODORDERED - ODSHIPPED) qty,
ODREQDATE needed from CUSORDDTL)
select prod, sum(qty) from demand
where needed < current_date + 14 days
group by prod
You can see that the statement got a bit more complex. That's because I had to make sure the column names matched in both of the subselects in the UNION. But once I did that, I was able to start aggregating, in this case performing a simple sum by product. I also tossed in a little date selection magic: I used the current date (available from the special register CURRENT_DATE) and added 14 days. I can change my date selection quite easily, even setting up a range. Let's say, for example, that I want the product aggregation for the last 45 days. I replace the WHERE clause:
where needed between current_date - 45 days and current_date
Now I'm getting the total for the last 45 days. But why am I doing this? Because I want to compare my actual demand to my forecasts. Ah! So let's now calculate those:
with demand as
(select MRPROD prod, MRQTY qty, MRDATE needed from PRDMATREQ
union all
select ODPROD prod, (ODORDERED - ODSHIPPED) qty,
ODREQDATE needed from CUSORDDTL),
demandtotals as
(select prod, sum(qty) totdemand from demand
where needed between current_date - 45 days and current_date
group by prod),
forecasttotals as
(select foprod, sum(foqty) totforecast from forecasts
where fodate between current_date – 45 days and current_date
group by foprod)
select foprod, totforecast, totdemand from
forecasttotals join demandtotals on prod = foprod
where totdemand > totforecast * 1.20
That's a serious piece of work right there! First, I took the aggregation of the demand totals and turned it into a second CTE called, not surprisingly, demandtotals. Next, I aggregated the forecast over the same period into a CTE called forecasttotals. Now I have my two pieces, which I can compare using a simple JOIN and a formula that shows only those items where the demand exceeds the forecast by more than 20%. That's just an example; obviously, the comparison criteria could get much more complex.
Up to this point, though, I really haven't expressed a problem. But one exists, nonetheless. Let's say that after looking at these numbers, we decide we want to change the window; for example, we want to expand it to 60 days rather than just 45. That's pretty easy; I can just find the two places where I have 45 days and change them to 60 days. But imagine if I changed only one of them. That could be a disaster: the query would run, but the results would be out of whack, and depending on how egregious the mistake is, you might not even notice until after using the numbers. Not good!
Variables to the Rescue!
And that's where the power of variables comes into play! DB2 allows you to create a global variable that can then be used throughout your session. In this example, I can create a variable called "fence" and then use it in the rest of my SQL to calculate my cutoff date. The creation of the variable is very simple:
create variable mylib/fence numeric (4)
This creates a variable named FENCE in library MYLIB. The variable is numeric with four digits and zero decimal positions. This variable is now available to any session. To use it in my example, I first set it:
set fence = 45
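If I want to double-check what the variable currently holds, a quick query against the one-row table SYSIBM.SYSDUMMY1 does the trick (this is just a convenience check, not part of the technique itself):
select fence from sysibm.sysdummy1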
And once that's done, I can use it in my complex examples above. Here's the modified version of one of the affected lines:
where needed between current_date - fence days and current_date
As you can see, the variable is easily inserted into the statement. If I replace both hardcoded values with the variable, then all I have to do to change how the statement works is to change the variable with another SET statement. I find this technique to have a lot of value, especially in some of my more complex SQL queries. I often have to join huge files together with complex calculations, displaying only a small selection of the data. By performing the filtering first, I can often significantly reduce the time that the query takes. But if I have selection criteria in multiple places in my query, I'm vulnerable to the problem I've described of missing one of the required changes. So using a variable in the selection process removes a potential nightmare.
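To make that concrete, here's how the full comparison query from earlier might look once both hardcoded values have been replaced by the variable (this is just the earlier statement reworked, assuming FENCE has already been created and set as shown above):
with demand as
(select MRPROD prod, MRQTY qty, MRDATE needed from PRDMATREQ
union all
select ODPROD prod, (ODORDERED - ODSHIPPED) qty,
ODREQDATE needed from CUSORDDTL),
demandtotals as
(select prod, sum(qty) totdemand from demand
where needed between current_date - fence days and current_date
group by prod),
forecasttotals as
(select foprod, sum(foqty) totforecast from forecasts
where fodate between current_date - fence days and current_date
group by foprod)
select foprod, totforecast, totdemand from
forecasttotals join demandtotals on prod = foprod
where totdemand > totforecast * 1.20
Now a single SET statement changes the window for both CTEs at once, and the two date ranges can never drift apart.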
Variables aren't perfect. One of the more annoying issues is that, unlike data areas, they don't keep their value between sessions. There are pros and cons to that design; the reason I've heard for it is that different sessions can then run with different values. I guess that makes sense, but it does mean that I have to remember to set my variable each time before I run the query, even if I'm setting it to the same value I used the last time. But knowing that behavior ahead of time, I can use the feature where it is best suited.
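One way to soften that, assuming I usually start from the same window, is to give the variable a default value when I create it; CREATE VARIABLE accepts a DEFAULT clause, so every new session starts with that value until I override it with SET. Something like this, if I were creating the variable from scratch (a sketch reusing the FENCE example from above), would do it:
create variable mylib/fence numeric (4) default 45
The session-by-session behavior doesn't change, but at least the variable never comes up empty if I forget to set it.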
So hopefully you've gotten a little taste of some complex real-world queries as well as a new tool to add to your SQL toolbelt. Stay tuned for more SQL know-how in upcoming articles!