The introduction of SQL Server 2005 has brought some exciting new features for developers. For instance, the ability to integrate heterogeneous systems with SQL Server in real time keeps getting easier. Part I of this series covered the power and ease of using linked servers as a means of programmatically providing real-time linkage to a remote database via ODBC or OLE DB. Although there are many other options, this article demonstrates the power of using the .NET Common Language Runtime (CLR) as a means of integrating with remote databases (in particular, DB2 UDB for the iSeries).
What Is CLR?
CLR is the managed environment that executes code written in a .NET-compatible language such as C#, VB.NET, or COBOL.NET. All .NET programs, regardless of language, are compiled into Microsoft Intermediate Language (MSIL). At run time, the CLR compiles this intermediate code into native machine code (just-in-time compilation) and manages its execution. The concept is similar to the byte code run by the Java Virtual Machine (JVM).
How Does SQL Server 2005 Use the CLR?
Simply put, SQL Server 2005 (SQL Server hereafter) has the ability to execute compiled code written in any .NET language. This means SQL routines (usually written in T-SQL) such as stored procedures, scalar user-defined functions, and table-valued user-defined functions can be written in a .NET language.
Writing database server code in a .NET language has the following advantages:
- Allows code reuse—The database logic and application logic now have access to the exact same code. In the past, an application would have code written in a high-level language, and the database server would often have duplicated code written in SQL.
- Removes SQL limitations—SQL dialects work well with structured database data within the confines of their own server. However, database logic often demands data from other sources, such as external databases on other platforms, various text files, LDAP queries, etc. Allowing the use of .NET code gives the database developer the ability to overcome traditional SQL limitations.
Many of you by now realize that Microsoft is playing catch-up to something DB2 on the iSeries has been able to do for a long time: creating additional database routines using high-level language logic written in RPG, COBOL, C, Java, and others. Nevertheless, this is a welcome addition to SQL Server's capabilities.
The Need for CLR
Recall that linked servers blur the dividing line between SQL Server and other databases by allowing SQL Server to issue Data Definition Language (DDL) and Data Manipulation Language (DML) statements against remote database tables as though they were part of SQL Server. All of this can be done within the comfort of easy-to-understand T-SQL statements.
With linked servers offering flexibility and ease of programming, why would we need to write a CLR routine? The answer lies in a few areas where linked servers fall short:
- Error-handling tasks, such as writing detailed info to a log, can be done more gracefully and thoroughly using .NET routines.
- Dynamically changing remote database environments can be a chore with linked servers. For instance, pointing a linked server to a different DB2 database or a different machine altogether requires writing ugly dynamic SQL, using synonyms, or dropping the linked server and re-creating it with different attributes. In .NET, by contrast, a connection string can be changed easily.
- Caching the results of parameterized queries or stored procedures often requires the somewhat clumsy INSERT...EXEC T-SQL construct to save results from the remote database in a table (usually temporary) that T-SQL can use. This step is unnecessary with .NET routines.
- A CLR routine can massage data from a remote data source before handing it to SQL Server far more flexibly than a linked server can.
- Remote data access routines may need the benefit of business logic or other routines only available within the .NET realm.
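As a sketch of the connection-string point in the list above, the hypothetical helper below treats the target system as an ordinary parameter. It uses the generic DbConnectionStringBuilder from System.Data.Common. The class and method names are illustrative. The keyword names (DataSource, UserID, Password, DefaultCollection) are assumed to match the IBM DB2 UDB for iSeries .NET provider, so verify them against your iSeries Access documentation:

```vbnet
Option Strict On

Imports System
Imports System.Data.Common

' Hypothetical helper (not from the article). The keyword names are
' assumed to match the IBM DB2 UDB for iSeries .NET provider.
Public NotInheritable Class Db2ConnectionConfig

    Private Sub New()
        ' Shared members only; no instances.
    End Sub

    ' Builds a provider connection string from individual settings so the
    ' target system can change at run time without redefining SQL Server objects.
    Public Shared Function BuildConnectionString(ByVal host As String, _
                                                 ByVal userId As String, _
                                                 ByVal password As String, _
                                                 ByVal defaultLibrary As String) As String
        ' DbConnectionStringBuilder quotes values that contain special
        ' characters, so a password containing ";" cannot corrupt the string.
        Dim builder As New DbConnectionStringBuilder()
        builder("DataSource") = host
        builder("UserID") = userId
        builder("Password") = password
        builder("DefaultCollection") = defaultLibrary
        Return builder.ConnectionString
    End Function

End Class
```

With this approach, a routine can point at a test partition by passing a different host name, for instance one read from a configuration table. You do not have to drop and re-create a linked server.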
To demonstrate CLR integration to a remote database, I'll create a table-valued user-defined function and a stored procedure written in VB.NET. These routines will call a DB2 query and return the results to SQL Server as though the data came from a local SQL Server table. (Note that both of these concepts can be accomplished similarly in DB2 on the iSeries using Java as demonstrated in "Query Remote Database Tables from the iSeries Using SQL and Java" and "Execute Remote Stored Procedures from the iSeries").
Software Requirements
The examples require iSeries Access V5R3 (or higher with the latest service pack) to be installed along with the DB2 UDB for iSeries .NET managed provider component (the appropriate ODBC or OLE DB providers can be substituted as well). This must be installed on the same machine as SQL Server.
Visual Studio 2005 (hereafter VS) is also required (not necessarily on the same machine) along with the SQL Server client tools (which will provide VS the templates for creating SQL Server routines and deploying them automatically). Understand that these procedures can be created outside of VS using Notepad, but the deployment and compilation instructions would be a chore to describe!
Setting Up SQL Server 2005
Since .NET code can do just about anything, including destructive tasks, for security reasons you must flip a switch in order to enable SQL Server 2005 to run .NET code. This feature can be enabled by starting the SQL Server Surface Area Configuration utility. Click on Surface Area Configuration for Features, expand the database engine node, click on CLR Integration, and then check the Enable CLR Integration box.
Alternatively, you can execute the following T-SQL code:
sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
Next, you'll need to choose an existing database or create a new database to use for this exercise. I simply used the AdventureWorks sample database. Start a SQL Server Management Studio session (this is a unified replacement for the old Enterprise Manager and Query Analyzer tools) and open a new query window connected to the database you've chosen.
Once you've decided on a database, mark it as "trustworthy" using the ALTER DATABASE statement so that it can reference "unsafe" .NET code (more on this in a minute):
ALTER DATABASE AdventureWorks SET TRUSTWORTHY ON
To do these examples, we'll use the DB2 UDB for iSeries .NET managed provider, so we'll need to add a reference to its assembly. In a SQL Server CLR project, however, the only external assemblies you can reference are those already registered within SQL Server. Therefore, you cannot add a reference to an arbitrary assembly from Visual Studio as you would in other project types. To register an assembly that SQL Server can use, issue the CREATE ASSEMBLY SQL statement (in SQL Server Management Studio, within the database you've chosen).
In case you're new to .NET, an "assembly" refers to executable .NET code stored as a DLL file (somewhat similar to the concept of an ILE service program). In the example below, IBM.Data.DB2.iSeries.dll is a reusable .NET assembly distributed by IBM.
CREATE ASSEMBLY IBMDataDB2iSeries
FROM 'C:\Program Files\IBM\Client Access\IBM.Data.DB2.iSeries.dll'
WITH PERMISSION_SET = UNSAFE;
Despite its name, this statement doesn't create or compile any code; it only registers existing code for use by SQL Server. The CREATE/DROP ASSEMBLY syntax was chosen for consistency with other SQL statements. The FROM clause contains the fully qualified path name of the actual assembly (which may vary on your computer, depending on your iSeries Access installation directory).
Finally, PERMISSION_SET accepts three values: SAFE, EXTERNAL_ACCESS, and UNSAFE. SAFE indicates that the code requires no access outside of SQL Server. EXTERNAL_ACCESS allows the code to access resources such as files, networks, environment variables, and the registry. UNSAFE additionally allows calls to unmanaged code (e.g., COM objects) and other code that is outside the control of the .NET Framework. The current DB2 provider assembly requires this setting. As the Microsoft documentation notes, grant UNSAFE only to highly trusted assemblies; otherwise, your system's security may be compromised or your system may become unstable. After you execute this statement, SQL Server will issue a warning about .NET Framework versions (because the IBM DB2 for iSeries assembly was written for V1.0 of the framework). You can ignore this warning.
Fire Up Visual Studio 2005
After starting a new VS session, create a new project. Select VB.NET as the language (expand this node) and choose the Database project type. Click on SQL Server Project in the templates window and assign a project name (such as MCPressDemo) and a solution name. The Add Database Reference window will appear. Select your database from the list or click the Add New Reference button to select your SQL Server and database name (again, I used AdventureWorks). If you're prompted with a message about debugging CLR code, choose Yes if this is a server you can tinker with (as debugging can impact performance).
If the SQL Server Project template isn't available in Visual Studio, the SQL Server client components may not be installed correctly.
Once the project is open, choose Properties from the Project menu and then the Database tab. Set the permission level setting to Unsafe because the project will contain a reference to the IBM DB2 assembly. Next, choose Add Reference from the Project menu. The list of references will be limited to some default .NET framework assemblies and any assemblies registered to the database with the CREATE ASSEMBLY command. If you successfully ran the above CREATE ASSEMBLY statement, IBMDataDB2iSeries should appear in the list. Select this assembly and click OK. Your project code can now reference this library.
Stored Procedure Example
To create a .NET stored procedure, choose Add Stored Procedure from the Project menu and assign a name. VS will create stub code for the stored procedure including default framework references, an attribute indicating that the code will be used as a stored procedure, and a shared (aka static) method with the same name as the stored procedure. Simply fill in .NET logic in the stub code and then build and deploy the project. VS will take care of registering the stored procedure in SQL Server!
Stored procedures can do many things: execute logic, accept and return parameters, and return one or more result sets. Figure 1 shows sample stored procedure spDB2Demo that demonstrates these features. This procedure issues a basic SELECT query to DB2 and passes the DB2 result set back to SQL Server (the caller will not know that .NET code is actually going to DB2 to get the results). It also features an output parameter that returns the number of rows retrieved from DB2.
Figure 1: Stored procedures can execute logic, accept and return parameters, and return one or more result sets.
Adding parameters to a .NET stored procedure is as easy as adding parameters to the method's signature. Passing parameters by value (ByVal keyword) causes SQL Server to recognize these parameters as input-only. Passing parameters by reference (ByRef keyword) causes SQL Server to treat them as input/output. When you change your method's parameter signature, the stored procedure signature registered in the database changes as well the next time the project is deployed to SQL Server. Parameters passed between SQL Server and .NET code should use data types from the System.Data.SqlTypes namespace to allow for things like null compatibility.
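The parameter rules above can be sketched with a small procedure, spNameLength. This procedure is hypothetical and is not part of the article's figures. The sketch assumes the project is deployed from VS as described. On deployment, the ByVal argument becomes an input parameter and the ByRef argument becomes an OUTPUT parameter. Both use SqlTypes, so a SQL NULL passes through instead of raising an error:

```vbnet
Option Strict On

Imports System
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Partial Public Class StoredProcedures

    ' Hypothetical procedure (not from the article).
    '   ByVal customerName -> input parameter
    '   ByRef nameLength   -> OUTPUT parameter
    <SqlProcedure()> _
    Public Shared Sub spNameLength(ByVal customerName As SqlString, _
                                   ByRef nameLength As SqlInt32)
        If customerName.IsNull Then
            ' SqlTypes can carry SQL NULL; plain String/Integer cannot.
            nameLength = SqlInt32.Null
        Else
            ' DB2 CHAR data is blank-padded, so trailing blanks are ignored.
            nameLength = New SqlInt32(customerName.Value.TrimEnd().Length)
        End If
    End Sub

End Class
```

This method touches neither SqlContext nor DB2, so you can also call it directly from ordinary .NET test code before deploying it.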
Our next task is making the .NET code return a result set to SQL Server. Passing back information as a result set involves three main steps: getting a pipe to the calling SQL Server connection, defining the result set's metadata, and writing the data one row at a time. (There are other ways to pass back information as well; for example, SqlContext.Pipe.Send can return a message or a single-row result set.)
Generally, since this CLR code is running in the context of a database connection, the existing connection is used as the pipe for returning results to SQL Server. Access to the existing connection's context is provided by the SqlContext object.
To describe the structure of the result set, create an array of SqlMetaData objects, one element per column, each holding the column's name and attributes. Next, create a Microsoft.SqlServer.Server.SqlDataRecord object, passing the SqlMetaData array to its constructor. This record object is then passed to the SqlContext.Pipe.SendResultsStart method, which tells SQL Server that result set data will follow in the specified format.
At this point in the sample, a DB2 data reader object is opened and iterated. Each row from DB2 is copied column by column into the data record object and then sent to SQL Server using the SqlContext.Pipe.SendResultsRow method. Oddly enough, this is very similar to how external DB2 user-defined table functions work. After all the rows are processed, the SqlContext.Pipe.SendResultsEnd method is called to tell SQL Server that the end of the result set has been reached.
As illustrated in the sample code, additional result sets can be returned by repeating the process with a new or existing data record definition and executing the SqlContext.Pipe.SendResultsStart method.
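The steps above can be sketched as follows. This is not the article's Figure 1 code. The procedure name, the placeholder connection string, and the QCUSTCDT column names and sizes (CUSNUM, LSTNAM, BALDUE) are assumptions. The procedure streams a DB2 query to the caller and returns the row count through an output parameter:

```vbnet
Option Strict On

Imports System
Imports System.Data
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports IBM.Data.DB2.iSeries

Partial Public Class StoredProcedures

    ' Placeholder; supply your own system name and credentials.
    Private Const Db2ConnString As String = _
        "DataSource=MYISERIES;UserID=MYUSER;Password=MYPWD"

    ' Describes the result set: one SqlMetaData per column, in order.
    ' Column sizes assume the QIWS.QCUSTCDT sample file.
    Public Shared Function CustomerColumns() As SqlMetaData()
        Return New SqlMetaData() { _
            New SqlMetaData("CUSNUM", SqlDbType.Decimal, CByte(6), CByte(0)), _
            New SqlMetaData("LSTNAM", SqlDbType.NChar, 8), _
            New SqlMetaData("BALDUE", SqlDbType.Decimal, CByte(6), CByte(2))}
    End Function

    ' Hypothetical procedure name (not the article's spDB2Demo).
    <SqlProcedure()> _
    Public Shared Sub spDB2CustomerSketch(ByRef rowCount As SqlInt32)
        Dim record As New SqlDataRecord(CustomerColumns())
        Dim count As Integer = 0

        Using conn As New iDB2Connection(Db2ConnString)
            conn.Open()
            Using cmd As New iDB2Command( _
                    "SELECT CUSNUM, LSTNAM, BALDUE FROM QIWS.QCUSTCDT", conn)
                Using reader As IDataReader = cmd.ExecuteReader()
                    ' Announce the result set format over the caller's connection.
                    SqlContext.Pipe.SendResultsStart(record)
                    ' Copy each DB2 row column by column, then send it.
                    ' (Columns are assumed NOT NULL; otherwise check
                    ' reader.IsDBNull and call record.SetDBNull.)
                    While reader.Read()
                        record.SetDecimal(0, reader.GetDecimal(0))
                        record.SetString(1, reader.GetString(1).TrimEnd())
                        record.SetDecimal(2, reader.GetDecimal(2))
                        SqlContext.Pipe.SendResultsRow(record)
                        count += 1
                    End While
                    ' Signal the end of this result set.
                    SqlContext.Pipe.SendResultsEnd()
                End Using
            End Using
        End Using

        rowCount = New SqlInt32(count)
    End Sub

End Class
```

To return a second result set, build another SqlMetaData array and SqlDataRecord, then repeat the SendResultsStart, SendResultsRow, and SendResultsEnd sequence.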
When finished, sample T-SQL code to run the stored procedure will look like this:
Declare @NoRows Int
Exec spDB2Demo @NoRows OUTPUT
Print @NoRows
While this stored procedure is a somewhat trivial example of getting data from DB2, the point is that coding a stored procedure in a .NET language allows almost any conceivable programming function to be done through SQL Server, including the most difficult data integration tasks.
Table-Valued User-Defined Function Example
A table-valued user-defined function can be thought of as a "virtual table." Instead of querying data from a database table, table function code supplies the database server data in a tabular format. Similar to stored procedures, table functions can receive input parameters and perform logic, but they return only a single result set and have no output parameter capability.
In the next example, we'll code a CLR table function that gets its data from the DB2 QIWS/QCUSTCDT table on the iSeries. Once it's deployed, a T-SQL SELECT statement can query the function as though it were a local table.
Didn't we already do this with the stored procedure? Yes, but it's important to realize that table functions have advantages over stored procedures. In particular, if a result set needs to participate in a join to another table or be sorted dynamically with an Order By, then a table function is usually a better tool. In other words, the result set of a stored procedure can't be modified or easily used in a subsequent query, but the result of a table function can.
To create your own .NET table function in Visual Studio, choose Add User-Defined Function from the Project menu. Unfortunately, the supplied template code is for a scalar (single-value) user-defined function, so several modifications are needed to convert it to a table function. Notice that stub code is generated and that the method name matches the name the table function will have in SQL Server once it's deployed. Additionally, each ByVal parameter added to the method signature becomes an input parameter of the table function.
Figure 2 contains function DB2QCustCdt. The SqlFunction attribute is specified along with several properties needed to inform SQL Server of how the table function will be implemented. The table definition property, for instance, consists of the column names and SQL Server data types that the table function will return when executed. See the SQL Server help for a list of all the available properties and their roles.
Figure 2: In function DB2QCustCdt, the SqlFunction attribute is specified along with several properties needed to inform SQL Server of how the table function will be implemented.
Notice that method DB2QCustCdt returns an object that implements the IEnumerable interface. This is a requirement for all table functions written in .NET code. Many .NET classes implement this important interface, including arrays, collections, and data readers. When SQL Server gets a request to run the DB2QCustCdt table function, it calls the DB2QCustCdt method and expects to retrieve an "enumerable" object. Each enumerated object represents a row to be returned in the result set. In this case, the sample code returns an iDB2DataReader.
When SQL Server enumerates (i.e., processes item by item) each row returned by the data reader, it will need a little more help to map data from the IEnumerable object to parameters that represent the table function's columns. The FillRowMethodName property of the SqlFunction attribute defines the method SQL Server will call when breaking up each enumerated row object into distinct data columns. The signature of the method will be a row object (as an input parameter) followed by an output parameter for each column in the table function. Please note that the project will not deploy if the signature for this FillRowMethodName method does not match the table columns defined in the TableDefinition property.
In this example, method FillRowQCustCdt will be called for each row returned by the data reader. In the case of the data reader class, the row parameter will be an object of type System.Data.Common.DbDataRecord, which is fortunate, because it offers methods to extract individual data columns from the DataReader. Each output parameter is filled in from the data reader before the method ends. Each call to FillRowQCustCdt represents one row being parsed and returned to SQL Server.
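The same pattern can be sketched with a minimal example. It assumes the QCUSTCDT columns CUSNUM, LSTNAM, CITY, and STATE and uses a placeholder connection string. The function name DB2CustomerSketch and the method name FillRowCustomer are hypothetical, not the article's DB2QCustCdt and FillRowQCustCdt. The function returns the DB2 data reader as its IEnumerable. The fill-row method then unpacks each record into output parameters that match TableDefinition:

```vbnet
Option Strict On

Imports System
Imports System.Collections
Imports System.Data
Imports System.Data.SqlTypes
Imports System.Runtime.InteropServices
Imports Microsoft.SqlServer.Server
Imports IBM.Data.DB2.iSeries

Partial Public Class UserDefinedFunctions

    ' Placeholder; supply your own system name and credentials.
    Private Const Db2ConnString As String = _
        "DataSource=MYISERIES;UserID=MYUSER;Password=MYPWD"

    ' TableDefinition lists the returned columns; FillRowCustomer must have
    ' one ByRef parameter per column, in the same order.
    <SqlFunction(FillRowMethodName:="FillRowCustomer", _
        TableDefinition:="CUSNUM DECIMAL(6,0), LSTNAM NCHAR(8), CITY NCHAR(6), STATE NCHAR(2)")> _
    Public Shared Function DB2CustomerSketch() As IEnumerable
        Dim conn As New iDB2Connection(Db2ConnString)
        conn.Open()
        Try
            Dim cmd As New iDB2Command( _
                "SELECT CUSNUM, LSTNAM, CITY, STATE FROM QIWS.QCUSTCDT", conn)
            ' CloseConnection ties the connection's lifetime to the reader's.
            Return cmd.ExecuteReader(CommandBehavior.CloseConnection)
        Catch
            ' Don't leak the connection if the query fails.
            conn.Close()
            Throw
        End Try
    End Function

    ' Called by SQL Server once per enumerated row. For a data reader the
    ' row object is a System.Data.Common.DbDataRecord, read here through
    ' the IDataRecord interface.
    Public Shared Sub FillRowCustomer(ByVal row As Object, _
                                      <Out()> ByRef cusNum As SqlDecimal, _
                                      <Out()> ByRef lstNam As SqlString, _
                                      <Out()> ByRef city As SqlString, _
                                      <Out()> ByRef state As SqlString)
        Dim rec As IDataRecord = DirectCast(row, IDataRecord)
        cusNum = New SqlDecimal(rec.GetDecimal(0))
        ' DB2 CHAR values are blank-padded; trim trailing blanks.
        lstNam = New SqlString(rec.GetString(1).TrimEnd())
        city = New SqlString(rec.GetString(2).TrimEnd())
        state = New SqlString(rec.GetString(3).TrimEnd())
    End Sub

End Class
```

Returning the reader streams rows without buffering them. The trade-off is that the DB2 connection stays open until the reader is closed. If your provider version does not close the reader when enumeration ends, copy the rows into an in-memory collection and close the connection before returning. This is the safer alternative.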
A Step Further
Figure 3 shows an enhanced example of the table function. It calls a DB2 stored procedure with a criteria parameter instead of a simple Select. This figure contains the DB2 stored procedure, the .NET code, and T-SQL usage examples.
Figure 3: This enhanced example of the table function calls a DB2 stored procedure with a criteria parameter instead of a simple Select.
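The enhanced variant can be sketched along the same lines, with several assumptions to verify:

- The DB2 stored procedure MYLIB.CUSTBYSTATE, its single CHAR(2) state parameter, and its result set columns are hypothetical stand-ins for the Figure 3 procedure. The procedure is assumed to be declared with DYNAMIC RESULT SETS 1.
- The provider is assumed to accept a `?` parameter marker with a positional parameter.

```vbnet
Option Strict On

Imports System
Imports System.Collections
Imports System.Data
Imports System.Data.SqlTypes
Imports System.Runtime.InteropServices
Imports Microsoft.SqlServer.Server
Imports IBM.Data.DB2.iSeries

Partial Public Class UserDefinedFunctions

    ' Hypothetical table function: passes its argument to a DB2 stored
    ' procedure and exposes the procedure's result set as a table.
    <SqlFunction(FillRowMethodName:="FillRowByState", _
        TableDefinition:="CUSNUM DECIMAL(6,0), LSTNAM NCHAR(8), STATE NCHAR(2)")> _
    Public Shared Function DB2CustomersByState(ByVal state As SqlString) As IEnumerable
        ' Placeholder; supply your own system name and credentials.
        Dim conn As New iDB2Connection("DataSource=MYISERIES;UserID=MYUSER;Password=MYPWD")
        conn.Open()
        Try
            ' MYLIB.CUSTBYSTATE is hypothetical and assumed to return one result set.
            Dim cmd As New iDB2Command("CALL MYLIB.CUSTBYSTATE(?)", conn)
            Dim p As IDbDataParameter = cmd.CreateParameter()
            p.ParameterName = "STATE"
            p.Value = ToParameterValue(state)
            cmd.Parameters.Add(p)
            Return cmd.ExecuteReader(CommandBehavior.CloseConnection)
        Catch
            conn.Close()
            Throw
        End Try
    End Function

    ' Maps SQL NULL to DBNull so the DB2 procedure receives a NULL
    ' instead of the call failing on SqlString.Value.
    Public Shared Function ToParameterValue(ByVal value As SqlString) As Object
        If value.IsNull Then
            Return DBNull.Value
        End If
        Return value.Value
    End Function

    ' One ByRef parameter per TableDefinition column, in order.
    Public Shared Sub FillRowByState(ByVal row As Object, _
                                     <Out()> ByRef cusNum As SqlDecimal, _
                                     <Out()> ByRef lstNam As SqlString, _
                                     <Out()> ByRef state As SqlString)
        Dim rec As IDataRecord = DirectCast(row, IDataRecord)
        cusNum = New SqlDecimal(rec.GetDecimal(0))
        lstNam = New SqlString(rec.GetString(1).TrimEnd())
        state = New SqlString(rec.GetString(2).TrimEnd())
    End Sub

End Class
```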
Data Type Cross-Reference
One thing to be aware of is the need for data type compatibility between .NET, DB2, and SQL Server. The table below shows how to map DB2 data types to the equivalent .NET types and the resulting SQL Server types:
Data Type Compatibility Between .NET, DB2, and SQL Server

DB2 Data Type | .NET SQL Data Type | SQL Server Data Type
--- | --- | ---
CHAR | SqlString | NCHAR
VARCHAR | SqlString | NVARCHAR
INTEGER | SqlInt32 | INTEGER
SMALLINT | SqlInt16 | SMALLINT
BIGINT | SqlInt64 | BIGINT
DECIMAL | SqlDecimal | DECIMAL
NUMERIC | SqlDecimal | DECIMAL (or NUMERIC)
DATE | SqlDateTime | DATETIME
TIMESTAMP | SqlDateTime | DATETIME
DOUBLE | SqlDouble | FLOAT(53)
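When a result set has many columns, the cross-reference can be encoded once in a helper that builds the SqlMetaData for a stored procedure's result set. This is a hypothetical sketch that follows the mapping table. DB2 type names are passed as plain strings. For DECIMAL and NUMERIC, the length argument is treated as the precision:

```vbnet
Option Strict On

Imports System
Imports System.Data
Imports Microsoft.SqlServer.Server

' Hypothetical helper (not from the article) encoding the mapping table.
Public NotInheritable Class Db2TypeMap

    Private Sub New()
        ' Shared members only; no instances.
    End Sub

    Public Shared Function ToSqlMetaData(ByVal columnName As String, _
                                         ByVal db2Type As String, _
                                         ByVal length As Integer, _
                                         ByVal scale As Integer) As SqlMetaData
        Select Case db2Type.ToUpperInvariant()
            Case "CHAR"
                ' NCHAR is limited to 4000 characters.
                Return New SqlMetaData(columnName, SqlDbType.NChar, length)
            Case "VARCHAR"
                Return New SqlMetaData(columnName, SqlDbType.NVarChar, length)
            Case "INTEGER"
                Return New SqlMetaData(columnName, SqlDbType.Int)
            Case "SMALLINT"
                Return New SqlMetaData(columnName, SqlDbType.SmallInt)
            Case "BIGINT"
                Return New SqlMetaData(columnName, SqlDbType.BigInt)
            Case "DECIMAL", "NUMERIC"
                ' Here 'length' is the precision.
                Return New SqlMetaData(columnName, SqlDbType.Decimal, CByte(length), CByte(scale))
            Case "DATE", "TIMESTAMP"
                Return New SqlMetaData(columnName, SqlDbType.DateTime)
            Case "DOUBLE"
                ' SqlDbType.Float corresponds to SQL Server FLOAT(53).
                Return New SqlMetaData(columnName, SqlDbType.Float)
            Case Else
                ' FLOAT, TIME, and other types are not covered by the table.
                Throw New NotSupportedException("No mapping for DB2 type " & db2Type)
        End Select
    End Function

End Class
```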
Notice that the DB2 FLOAT and TIME types are missing. I found DB2 FLOAT to be problematic unless I mapped it to SqlDouble and then to FLOAT(53) in SQL Server (although the expected type to use is REAL). I also had problems with the DB2 TIME data type. Mapping TIME to SqlDateTime implicitly converts it to a timestamp whose date portion defaults to 01/01/0001. An error occurs because, unlike a DB2 timestamp, SQL Server's DATETIME cannot store this date (its range begins at January 1, 1753). It would probably be best to bring the TIME type into .NET as a string and then cast it back to a DATETIME value later.
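Here is a sketch of that TIME workaround. It assumes the DB2 query returns the column as character data, for example CHAR(ORDTIME) for a hypothetical ORDTIME column. It also assumes the format is ISO (hh.mm.ss) or JIS (hh:mm:ss); which one you get depends on the job's time format. The helper anchors the time to 1900-01-01, the date SQL Server supplies when it converts a time-only string to DATETIME:

```vbnet
Option Strict On

Imports System
Imports System.Data.SqlTypes
Imports System.Globalization

' Hypothetical helper (not from the article) for DB2 TIME values
' fetched as strings.
Public NotInheritable Class Db2TimeConverter

    Private Sub New()
        ' Shared members only; no instances.
    End Sub

    ' DATETIME cannot hold 0001-01-01 (its range starts in 1753), so the
    ' time is placed on 1900-01-01 instead.
    Public Shared Function ToSqlDateTime(ByVal db2Time As String) As SqlDateTime
        If db2Time Is Nothing Then
            Return SqlDateTime.Null
        End If
        ' Accept ISO/EUR (hh.mm.ss) or JIS (hh:mm:ss) separators.
        Dim parts() As String = db2Time.Trim().Split(New Char() {"."c, ":"c})
        If parts.Length <> 3 Then
            Throw New FormatException("Unexpected DB2 TIME value: " & db2Time)
        End If
        Dim h As Integer = Integer.Parse(parts(0), CultureInfo.InvariantCulture)
        Dim m As Integer = Integer.Parse(parts(1), CultureInfo.InvariantCulture)
        Dim s As Integer = Integer.Parse(parts(2), CultureInfo.InvariantCulture)
        Return New SqlDateTime(1900, 1, 1, h, m, s)
    End Function

End Class
```

The USA time format (for example, "1:30 PM") is deliberately rejected with a FormatException rather than guessed at.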
Deploying the Project
When your .NET routines have been written, press F5 to deploy the project to SQL Server (or choose Build and then Deploy from the VS Build menu). The deployment will register the project's assembly with SQL Server. Also, the appropriate stored procedure and table function definitions based on the method signatures in the code will be registered in the database. If signatures change, VS will alter the SQL routines appropriately! With all the .NET coding required, at least we don't have to manually write CREATE PROCEDURE and CREATE FUNCTION statements!
.NET Routines or Linked Server Access?
Writing .NET routines requires more work than writing linked server routines. When deciding to use .NET code vs. a linked server, consider whether .NET offers a function (custom business logic, etc.) that is not accessible to the linked server. For instance, you may need to join data from a remote Web service with DB2 data before passing it to SQL Server—something that .NET will allow you to do.
However, this article just touches on the capabilities of CLR routines with respect to real-time DB2 integration. Aggregates and triggers can also be written in .NET code. The granularity and flexibility afforded to the programmer when writing .NET routines is unsurpassed. If your company uses any of the myriad of products that will run on SQL Server 2005, you now have additional tools to create real-time, seamless integration routines with your flagship iSeries applications...and all with standard tools!
MC Press Online