Learn how to use MERGE to synchronize data between tables with just one SQL statement
by Rafael Victória-Pereira
Editor’s note: This article is excerpted from chapter 12 of SQL for IBM i: A Database Modernization Guide, by Rafael Victória-Pereira.
Often, we’re faced with the need to synchronize two tables, in such a way that the target table gets all the relevant information from the other one. For instance, imagine that you have an inventory table that receives daily updates from several stock movement tables. There are new entries to register (INSERT statements), but also item quantity fluctuations (UPDATE statements), which require separate processing. Naturally, you can also have some items disappearing from stock, and depending on how those situations are handled, these can be updates or actual deletes in the inventory table. With the MERGE statement, you can mash up those statements into a single one.
Anatomy of the MERGE Statement
Seems a bit farfetched, right? Well, once you get the hang of it, you’ll see that this two-for-the-price-of-one statement is very useful. But before you’re able to use it, you need to understand it, so let’s take look at the anatomy of the MERGE statement:
MERGE INTO <Target table name>
USING <origin table or SELECT statement>
ON <matching columns>
WHEN MATCHED <(optional) AND <comparisons>> THEN
<Modified UPDATE or DELETE statement>
WHEN NOT MATCHED <(optional) AND <comparisons>> THEN
<Modified INSERT statement>
ELSE IGNORE
This seems like a lot, so let’s go over it line by line, top to bottom:
- MERGE INTO <target table name>—indicates the name of the table or updatable view over which the I/O actions will take place.
- USING <origin table or SELECT statement>—indicates the exact opposite: instead of specifying where the data goes to, this line indicates where the data comes from. It can be a table name or a SELECT statement.
- ON <matching columns>—This line is extremely important! It explains how the records in the origin and target tables will be matched, just like you’d do in an INNER JOIN instruction.
- WHEN MATCHED <(optional) AND <comparisons>> THEN—This is where the fun begins. This and the next WHEN ... line are optional, but at least one of the two has to be specified. In its simplest form (WHEN MATCHED THEN), it indicates that the actions below it will be executed if the conditions stated in the ON <matching columns> line evaluate to true. Note that you can specify additional conditions to be evaluated in conjunction with the ones from the ON ... line. That’s where the <(optional) AND <comparisons>> bit comes into play. This may sound a bit confusing, but I’ll present an example that will help you understand what I mean.
- <Modified UPDATE or DELETE statement>—If a match was found, then you can perform an UPDATE or DELETE. However, note that this won’t be a regular statement. It can only affect the table/view mentioned in the first line of the MERGE statement, so it doesn’t make sense to include the FROM clause here. It’s another of those peculiarities of the MERGE you’ll have to get used to.
- WHEN NOT MATCHED <(optional) AND <comparisons>> THEN—the story for this one is similar to the WHEN MATCHED THEN, but in reverse: this specifies what should happen when the line from the origin table doesn’t match the requirements—that is, when the conditions in the “ON ...” line and (if you also specify them) the ones in the <(optional) AND <comparisons>> part of this line both evaluate to false.
- <Modified INSERT statement>—If and only if a match was not found, then you can issue an INSERT statement. Keep in mind that this is also not a regular statement, because the FROM clause will be absent, as explained earlier.
- ELSE IGNORE—This is the catch-all clause, which is used as an escape when the record doesn’t match any of the previous conditions. Note that you can have multiple WHEN MATCHED and/or WHEN NOT MATCHED conditions, as long as the <(optional) AND <comparisons>> part is different.
You’re probably scratching your head, thinking this is confusing. It’s a bit weird, I know. Let’s go over an example and try to clear things up.
The scenario I described earlier involves two tables: the temporary TEMP_TBL_PERSONS and the “real” TBL_PERSONS. What we want to do is update the second table with the data from the first one. With this, I’ve identified the origin and target tables, and I can start writing the MERGE statement:
MERGE INTO UMADB_CHP5.TBL_PERSONS PERSON
USING UMADB_CHP5.TEMP_TBL_PERSONS T
The TBL_PERSONS table is the target table; I gave it the alias PERSON, while the TEMP_TBL_PERSONS, our origin table, has the alias T. I gave this table a short alias because we’re going to use it a whole lot, and a longer alias would get in the way.
Now let’s see how to match the records in both tables. I said earlier that the person ID, when the record is from an existing person in the database, is supplied in the Excel file. This is particularly handy because the Persons table primary key is the Person_Id column. With this piece of information, I can write the “ON ...” line:
ON PERSON.PERSON_ID = T.PERSON_ID
Now let’s go over the rules again: if the person IDs between the temp table and the real one match, we can update the existing information with the one coming from the temp table. Let’s translate that into code:
WHEN MATCHED THEN
UPDATE SET PERSON.NAME = T.NAME,
PERSON.DATE_OF_BIRTH = T.DATE_OF_BIRTH,
PERSON.HOME_ADDRESS = T.HOME_ADDRESS,
PERSON.HOME_PHONE_NBR = T.HOME_PHONE_NBR,
PERSON.MOBILE_NBR = T.MOBILE_NBR,
PERSON.EMAIL_ADDRESS = T.EMAIL_ADDRESS,
PERSON.DRIVERS_LICENSE = T.DRIVERS_LICENSE,
PERSON.SOCIAL_SEC_NBR = T.SOCIAL_SEC_NBR
Because I don’t have additional conditions, I went for the simplest possible form of the MATCHED line: WHEN MATCHED THEN. The next few (OK, not so few) lines specify what happens when a match is found: I’ll update all the Persons table columns with their namesakes from the temporary table. Note that this is not a regular UPDATE statement—it’s missing the table name and WHERE clause.
The rules of the scenario described earlier also state that when a matching ID is not found, I should insert the new record in the target table. Coding this is also simple enough, but it has a catch:
WHEN NOT MATCHED THEN
INSERT (NAME,
DATE_OF_BIRTH,
HOME_ADDRESS,
HOME_PHONE_NBR,
MOBILE_NBR,
EMAIL_ADDRESS,
DRIVERS_LICENSE,
SOCIAL_SEC_NBR)
VALUES (T.NAME,
T.DATE_OF_BIRTH,
T.HOME_ADDRESS,
T.HOME_PHONE_NBR,
T.MOBILE_NBR,
T.EMAIL_ADDRESS,
T.DRIVERS_LICENSE,
T.SOCIAL_SEC_NBR);
It’s not immediately obvious, so I’ll remind you of the rules: if a matching ID is not found, then I should insert the record. This means that the ID from the temporary table is useless. In other words, I don’t need to (and actually can’t) use it in the INSERT statement. This is not a problem, because the PERSON_ID column value is automatically generated by the database engine. Again, what you see here is a modified INSERT statement—it also lacks the FROM and WHERE clauses.
That’s it! I hope this example made clear how useful and easy (after some practice) the MERGE statement can be. Here’s the complete statement explained above:
MERGE INTO UMADB_CHP5.TBL_PERSONS PERSON
USING UMADB_CHP5.TEMP_TBL_PERSONS T
ON PERSON.PERSON_ID = T.PERSON_ID
WHEN MATCHED THEN
UPDATE SET PERSON.NAME = T.NAME,
PERSON.DATE_OF_BIRTH = T.DATE_OF_BIRTH,
PERSON.HOME_ADDRESS = T.HOME_ADDRESS,
PERSON.HOME_PHONE_NBR = T.HOME_PHONE_NBR,
PERSON.MOBILE_NBR = T.MOBILE_NBR,
PERSON.EMAIL_ADDRESS = T.EMAIL_ADDRESS,
PERSON.DRIVERS_LICENSE = T.DRIVERS_LICENSE,
PERSON.SOCIAL_SEC_NBR = T.SOCIAL_SEC_NBR
WHEN NOT MATCHED THEN
INSERT (NAME,
DATE_OF_BIRTH,
HOME_ADDRESS,
HOME_PHONE_NBR,
MOBILE_NBR,
EMAIL_ADDRESS,
DRIVERS_LICENSE,
SOCIAL_SEC_NBR)
VALUES (T.NAME,
T.DATE_OF_BIRTH,
T.HOME_ADDRESS,
T.HOME_PHONE_NBR,
T.MOBILE_NBR,
T.EMAIL_ADDRESS,
T.DRIVERS_LICENSE,
T.SOCIAL_SEC_NBR);
If you want to play around with it a bit, the downloadable source code for this chapter at https://www.mc-store.com/products/sql-for-ibm-i-a-database-modernization-guide contains all the necessary statements for creating and populating the temporary table with data that you can then use to test the MERGE statement I just explained. I should also mention that there’s a bit more to the MERGE statement than what I said here. For instance, when you use multiple WHEN MATCHED instructions, the evaluation is from top to bottom and only a match per origin record is allowed. If you want to use a more complex MERGE statement, consult the DB2 for i SQL Reference manual and read the rules carefully. If you stick to simple statements, like the one above, you should be fine with what I explained here.
LATEST COMMENTS
MC Press Online