SQL Banana: 2014

Tuesday, 30 December 2014

How to Execute SQL Script Asynchronously

SQL code generally runs sequentially. One line of code must complete before the next line is executed. We can redirect the path that is followed through the code using conditional logic or GOTO statements, but ultimately we cannot progress to the next step until the previous one has completed.

However, there are situations where we may want the code to continue to step 2 without waiting for a particularly time-consuming step 1 to complete. If step 1 and step 2 do not impact each other then it might be beneficial to have the SQL execute asynchronously or concurrently, saving overall execution time.

For an example let's create a table to insert data into for step 1, and a stored procedure that will insert data into it:

CREATE TABLE [dbo].[InsertSomething](
       id                   INT IDENTITY(1,1) NOT NULL,
       Col1                 VARCHAR(32) NULL,
       InsertedDate        DATETIME NULL
)
GO

CREATE PROC DoSomething @Data VARCHAR(32) = ''
AS
INSERT INTO InsertSomething (Col1, InsertedDate) VALUES(@Data + ' - Start', GETDATE())
WAITFOR DELAY '00:00:10:00'
INSERT INTO InsertSomething (Col1, InsertedDate) VALUES(@Data + ' - End', GETDATE())
GO

The stored proc DoSomething will insert two rows into the table InsertSomething. After the first row is inserted there will be a delay of ten seconds before the second row is inserted. Therefore the time for the procedure to complete is approximately ten seconds, which in this example represents a long running piece of code.

EXEC DoSomething 'Test 1'

INSERT INTO InsertSomething (Col1, InsertedDate) VALUES('Test 2', GETDATE())
SELECT * FROM InsertSomething

Results

id Col1 InsertedDate

----------- -------------------------------- -----------------------

1 Test 1 - Start 2014-12-30 12:38:40.023

2 Test 1 - End 2014-12-30 12:38:50.030

3 Test 2 2014-12-30 12:38:50.037

We can see from the timestamps that the "Test 2" record was inserted only after the stored procedure had completed inserting the Test 1 records. IE the sql script is executing sequentially, as expected.

In the above example we would prefer the "Test 2" insert statement not to have to wait for the stored procedure to complete. We would like the two sql commands to run concurrently. To do this we need to create a second stored procedure to call the first via a new thread:

CREATE PROC [dbo].[AsyncInsert] @Data    VARCHAR(32)
AS

DECLARE @rc          INT
DECLARE @object      INT
DECLARE @osql_cmd    VARCHAR(1000)
DECLARE @ServerName  VARCHAR(256) = (SELECT @@SERVERNAME)

DECLARE @ExecStmt    VARCHAR(MAX) = '"EXEC DoSomething @Data='''+@Data + '''"'

EXEC @rc = sp_oacreate 'wscript.shell', @object OUT

SET @osql_cmd = 'osql -E -dMyDB -S' +@ServerName+' -Q'+ @ExecStmt

EXEC @rc=sp_oamethod @object, 'run', null, @osql_cmd

EXEC sp_oadestroy @object

GO

Test the new procedure by running the following code:

EXEC [AsyncInsert] @Data = 'Test 1 Async'

WAITFOR DELAY '00:00:01:00'

INSERT INTO InsertSomething (Col1, InsertedDate) VALUES('Test 2 Async', GETDATE())

WAITFOR DELAY '00:00:10:00'

SELECT * FROM InsertSomething

Results
id          Col1                             InsertedDate
----------- -------------------------------- -----------------------
1           Test 1 - Start                   2014-12-30 12:38:40.023
2           Test 1 - End                     2014-12-30 12:38:50.030
3           Test 2                           2014-12-30 12:38:50.037
4           Test 1 Async - Start             2014-12-30 12:39:01.967
5           Test 2 Async                     2014-12-30 12:39:02.930
6           Test 1 Async - End               2014-12-30 12:39:11.987

In the results above we can see that for the Async tests the timestamps show that the Test 2 insert occurred while the stored procedure (Test 1) was still running - ie after the "Test 1 Async - Start" insert command completed but before the "Test 1 Async - End" insert was executed.

We have generated a new thread to execute the stored procedure that does not require us to wait for it to complete before continuing. Therefore the SQL script is now effectively running asynchronously.

Monday, 17 November 2014

SQL UNPIVOT on Multiple Columns

Some time ago I posted about using the UNPIVOT function in a stored procedure here. Recently I came across an interesting problem that required further use of the UNPIVOT command.

Suppose you have a number of columns in a wide table that you would like to unpivot into a single column and multiple rows. An example might be when there are many measure columns and you would like to narrow the table by combining all the measure columns into a single column. This would be a fairly standard use of the UNPIVOT command that was covered in the earlier post.

Now suppose that there are two sets of measures that you would like kept as separate columns, Instead of a single MeasureName and MeasureValue column, you would like 2 of each. Or perhaps a simpler example might not be measure, but dimensional data. Let's look at CRM system data:

CREATE TABLE Company

(Id INT,

CompanyName VARCHAR(500),

Contact1 VARCHAR(500),

Contact2 VARCHAR(500),

Email1 VARCHAR(500),

Email2 VARCHAR(500)

)

GO

INSERT INTO Company SELECT

1, 'FastCarsCo', 'Mr Purple', 'Mr Orange', 'purple@fcc.com', 'orange@fcc.com'

UNION ALL SELECT

2, 'TastyCakeShop', 'Mr Brown', 'Mr White', 'brown@tcs.com', 'orange@tcs.com'

UNION ALL SELECT

3, 'KidsToys', 'Mr Pink', 'Mr Black', 'pink@kt.com', 'black@kt.com'

UNION ALL SELECT

4, 'FruitStall', 'Mr Red', 'Mr Blue', 'red@fs.com', 'blue@fs.com'

GO

SELECT Id, CompanyName, contact1, contact2, email1, email2

FROM Company

GO

In this example we have a company table with two contacts for each company, and an email address for each contact. Instead of two columns for contact names and two for email addresses, we would like to reduce this to one of each: CompanyName, ContactName, EmailAddress. First let's try and reduce the contact1 and contact2 columns to one:

SELECT ID, CompanyName, ContactName,Contact

FROM

(

SELECT ID, CompanyName, Contact1, Contact2

FROM Company

) src

UNPIVOT

(

ContactName FOR Contact IN (Contact1, Contact2)

) pvt

We have now replaced the contact1 and contact2 fields with a single ContactName field. I have added the new Contact column to indicate the source of the data. This can be removed from the select list if it is not required.

Next we need to perform a similar unpivot on the email columns. This is actually easier that you might think. Rather than nesting queries, or complex CTEs, we need only append our first query with another unpivot statement:

SELECT Id,

CompanyName,

ContactName,

EmailAddress,

Contact

FROM

(

SELECT Id, CompanyName, Contact1, Contact2, Email1, Email2

FROM Company

) src

UNPIVOT

(

ContactName FOR Contact IN (Contact1, Contact2 )

) pvt1

UNPIVOT

(

EmailAddress For Email IN (Email1, Email2 )

) pvt2

WHERE RIGHT(Contact,1) = RIGHT(Email,1)

The WHERE clause in the above query is very important. Without it, we haven't told SQL Server how the second set of unpivoted data matches the first. Without the WHERE , it will do a cross join, resulting in Mr Purple occurring twice; once with the correct email address, and once with the address of Mr Orange, his colleague.

In the WHERE clause we state that the new columns we have created, Contact (selected) and Email (not selected) should be joined on the last character. Remember that these two columns are populated with the names of the original fields, ie Contact1, Contact2, and Email1, Email2. The two columns therefore align as Contact1 → Email1 (1=1) and Contact2 → Email2 (2=2). Hence it is important that the names of the original columns in our source Company table contain text such that they can be joined together in the WHERE clause at the end of the query.

More details on the PIVOT / UNPIVOT command can be found here:

http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx

Wednesday, 24 September 2014

SQL Server Error Handling

Good error handling can be critical to writing robust code and ensuring predictable behavior should things go wrong. We always want to be certain that our database will be left in a clean and desirable state, which can be challenging if, say, a stored procedure errors part way through a series of inter-dependent steps.

As a basic starting point for basic handling of problem scenarios it's good practice to contain code within transactions:

BEGIN TRAN

--DO SOMETIHNG

COMMIT

If multiple steps are contained within the transaction, and any one should fail, then all the steps will be rolled back, leaving the database in the same state it was before the transaction began.

With SQL 2005, TRY..CATCH blocks were introduced. Using this feature we can greatly enhance the error handling. We can now “try” to do our data manipulation and should any step fail, we can “catch” the error and perform further "clean up" steps as a result.

A simple example might be to open a transaction in the TRY block, roll it back in the CATCH, else commit it:

BEGIN TRY

BEGIN TRAN

--DO SOMETHING

END TRY

BEGIN CATCH

IF @@TRANCOUNT > 0

ROLLBACK

END CATCH

IF @@TRANCOUNT > 0

COMMIT

For more details on useful global variables such as @@TRANCOUNT see my post on SQL Server Global Variables.

The above code allows us to cleanly handle error scenarios and greatly improve the robustness of our code and therefore our database. However, the above example does not actually let us know any details about the error that has occurred. To do that, we need to capture the error information:

BEGIN TRY

SELECT 10/0

END TRY

BEGIN CATCH

SELECT

ERROR_MESSAGE() AS ErrorMessage,

ERROR_NUMBER() AS ErrorNumber,

ERROR_SEVERITY() AS ErrorSeverity,

ERROR_STATE() AS ErrorState,

ERROR_LINE() AS ErrorLine

END CATCH

In the above script I have forced a “divide by zero” error in the TRY block. When this error is thrown, the code will jump into the CATCH block and execute the select statement:

ErrorMessage ErrorNumber ErrorSeverity ErrorState ErrorLine

------------------------------------- ----------- ------------- ----------- -----------

Divide by zero error encountered. 8134 16 1 3

Note that the actual error has not been thrown, and the code has completed successfully without any problems. We are simply displaying the results of the various error functions.

We can make this even more powerful if we use the RAISERROR function. This will allow us to actually throw an error message back to the calling application. The advantage this gives us is that we can capture the error from the TRY, perform remedial action in the CATCH, and then let the calling application know that there was an error we had to clean up:

DECLARE @ErrorMsg VARCHAR(500)

BEGIN TRY

BEGIN TRAN

SELECT 10/0

END TRY

BEGIN CATCH

SELECT @ErrorMsg = 'Error: ' + ERROR_MESSAGE()

+CHAR(13) +'Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(100))

+CHAR(13) +'State: ' + CAST(ERROR_STATE() AS VARCHAR(100))

+CHAR(13) +'Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(100))

+CHAR(13) +'Line: ' + CAST(ERROR_LINE() AS VARCHAR(100))

IF @@TRANCOUNT > 0

ROLLBACK

RAISERROR(@ErrorMsg, 16,10)

END CATCH

IF @@TRANCOUNT > 0

COMMIT

-----------

(0 row(s) affected)

Msg 50000, Level 16, State 10, Line 21

Error: Divide by zero error encountered.

Severity: 16

State: 1

Number: 8134

Line: 5

The result is that the error is captured and handled - the transaction is rolled back. Using the RAISERROR function, however, we have build our own custom error message to return to the calling application. In the example above, this message lets us know even the exact line that caused the error. This message can be enriched to include as much information as you wish, which can greatly ease debugging of the problem.

(It's worth noting that the severity of the error I have set to 16. Different severity numbers can have different effects in different places, and typically 11-18 are the range for user defined errors. See the MSDN link at the end of this article for more details.)

Finally, RAISERROR can be used to throw msg_id's, and this is the more standard SQL behaviour. The list of standard error messages that can be thrown are contained in the sys.messages

table. You can insert your own (with an id >50000) and then use RAISERROR to throw these when required.

Error handling can be used to powerful effect using the techniques described. If you have any techniques you use or ideas on how to improve on the methods described here, please feel free to share them.

MSDN - RAISERROR:

http://msdn.microsoft.com/en-us/library/ms178592.aspx