Now that you’ve learned (or simply refreshed your memory about) joining tables, let’s go over some handy column functions. I believe these will save you a lot of work, so keep reading!
SQL’s functions are powerful, but they can also be the cause of major headaches if you don’t understand them and use them correctly. I’ll go over a few of these functions now and explore a few more later. Let’s start with everybody’s favorite: the COUNT function.
The COUNT Function
We’ve all used it, more times than we can count. It’s the simplest way to determine how many records will be included in a query (unless you specify a FETCH FIRST clause). If you aren’t familiar with FETCH FIRST, don’t worry, because I’ll cover it later in this series. Here is a textbook example of the use of COUNT to count the number of rows in the Students table:
SELECT COUNT(*)
FROM UMADB_CHP2.PFSTM;
COUNT(*) counts every row, even if it contains NULL values. COUNT(expression) counts only the rows for which the expression is not NULL.
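To make the difference visible, here’s a minimal sketch that counts the rows of the teachers table (UMADB_CHP2.PFTEM) in three ways. It assumes that TESA, the teachers’ salary column, accepts NULL values, which may not be true in your copy of the table; the alias names are just my choice.

```sql
-- Assumption: TESA (teacher salary) allows NULL values
SELECT COUNT(*)             AS TOTAL_ROWS        -- every row, NULL salary or not
     , COUNT(TESA)          AS ROWS_WITH_SALARY  -- only rows where TESA is not NULL
     , COUNT(DISTINCT TESA) AS DISTINCT_SALARIES -- unique non-NULL salary values
FROM UMADB_CHP2.PFTEM
;
```

If the column has no NULL values, the first two counts will be identical.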
The interesting thing about COUNT that you might not know is that you can use it in conjunction with other functions.
Next, I’ll talk about finding the minimum and maximum values of a column, and then I’ll show you an example that includes all three functions.
Finding the Minimum and Maximum Values of a Column
In a high-level programming language, finding the minimum or maximum value of a given column typically requires cycling through the whole table and storing the minimum/maximum value in a temporary variable, which gets updated whenever the record just read contains a smaller (or larger) value. In SQL, MIN and MAX are column functions that you can use in a SELECT statement’s column list or, less commonly, in a HAVING clause. By the way, I’ll also discuss the HAVING clause later, just in case you’re not familiar with it. Here’s how to determine the minimum and maximum salaries of the university’s teachers:
SELECT MIN(TESA) AS MIN_SALARY
, MAX(TESA) AS MAX_SALARY
FROM UMADB_CHP2.PFTEM
;
This example has two very interesting elements: it shows that you can use different column functions together, and it also addresses a typical problem that derives from the use of any column function: the name of the column. If you run the same statement without the “AS xxxx” bits, you’re still going to get correct results, but the columns will be named 00001 and 00002, respectively. Unless you remember which is which when you’re analyzing the data (OK, in this case it should be obvious, but bear with me), they’re going to be pretty useless. What I’m getting at is that it’s important to use aliases for your columns whenever you use a function or any other expression, such as a string concatenation or an arithmetic expression, so that its contents are obvious to whoever is looking at the output data.
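As promised, here’s a sketch that puts COUNT, MIN, and MAX in the same statement. It uses only the table and column from the previous example; the alias names are my own choice.

```sql
SELECT COUNT(*)  AS TEACHERS    -- number of rows in the teachers table
     , MIN(TESA) AS MIN_SALARY  -- lowest non-NULL salary
     , MAX(TESA) AS MAX_SALARY  -- highest non-NULL salary
FROM UMADB_CHP2.PFTEM
;
```

The statement returns a single row. Keep in mind that MIN and MAX ignore NULL values, while COUNT(*) counts every row.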
The functions I’ve presented so far work with numbers and characters alike, but the next two, SUM and AVG, work only on numbers.
Sums and Averages Made Easy
As in the minimum/maximum scenario, summing up a column of values or finding its average in a high-level programming language requires some work, but in SQL there’s a column function that does that for you. Let’s start with one you’ve probably used before:
SELECT SUM(TESA) AS TOTAL_SALARIES
FROM UMADB_CHP2.PFTEM
;
This statement returns the sum of the teachers’ salaries and can be used in conjunction with other column functions without any problems. However, if you try to sum the teachers’ ranks, a non-numeric column, the database engine will return the SQL0402 error message, which explains that you can “only” use INTEGER, SMALLINT, BIGINT, DECIMAL, ZONED, FLOAT, REAL, DOUBLE (or DOUBLE_PRECISION), and DECFLOAT data type values as an argument to the SUM function. If you have a numeric value stored in a character column, you need to convert it to a number before calculating the sum. Be careful with the DIGITS function here: it works in the opposite direction, returning the character representation of a number, so it won’t help. A conversion function such as DECIMAL is what you need. Something like SUM(DECIMAL(YOUR_CHARACTER_FIELD, 11, 2)) should work (the precision and scale, 11 and 2, are just an illustration), as long as the values of the character column are convertible to numeric format.
Similarly, you can calculate the average of numeric values, and the same rules apply. However, the AVG function can have an unpleasant side effect that is also often misleading for the end user: its precision. Let’s run a quick example to illustrate this problem. You can calculate the average salary of the teachers with the following statement:
SELECT AVG(TESA) AS AVERAGE_SALARY
FROM UMADB_CHP2.PFTEM
;
If you run a full select of the table and calculate the average yourself, you’ll see that this output value is accurate, and perhaps even too precise (138333.3333333333333333333333) for the end user. After all, the user is expecting an amount, which usually means a number with two decimal places, not a huge train of 3s that can be confusing.
I chose this example because it allows me to introduce another function, one that is akin to the DIGITS function on steroids: CAST, which we’ll discuss in the next TechTip! Until then, feel free to comment, correct, or provide additional examples that might help other readers, using the Comments section below.
MC Press Online