Monday, 2 April 2012

SQL 2008: SELECT FOR XML

Recently I was tasked with implementing a solution to transfer data between two systems for a Financial Services company.  The data had to be output, one entity (account, broker etc.) at a time, from a SQL 2008 R2 database to an XML file, before being fed into the 3rd party application via a message queue.

One of the challenges was that each entity - and there were more than 30 of them - required the XML to be wrapped in different tags and have different headers to identify it.

The solution we chose was to use the FOR XML clause, which has been enhanced in recent versions of SQL Server.  The plan was to write the SELECT ... FOR XML statement in a stored procedure, and BCP the output to a file.

Sounds very simple.  But in case you have to do the same thing, I thought I'd highlight how we did it.
SELECT ... FOR XML converts the output of your SELECT statement into XML format, and the really useful bit is that with the PATH and ROOT options you can specify exactly which tag should identify a row, and which tag should mark the start and end of the result set. The following example uses a table from the AdventureWorks database:


SELECT TOP 2
ShiftID, Name 
FROM HumanResources.Shift 
FOR XML PATH('RowName'), ROOT('ResultSetName'), TYPE



Results:
----------------------------------------------------------------------------------------------------------------------------------
<ResultSetName><RowName><ShiftID>1</ShiftID><Name>Day</Name></RowName><RowName><ShiftID>2</ShiftID><Name>Evening</Name></RowName></ResultSetName>


(1 row(s) affected)

And nicely enough, if you click on the result in SSMS, it will display the results all pleasantly formatted:



However, in the actual case being worked on, the result set itself had to be wrapped in further tags to identify the file to the message queue. The solution: nest the SELECT ... FOR XML statement inside another one, which lets you generate an XML hierarchy:


SELECT 
( SELECT TOP 2
ShiftID, Name 
FROM HumanResources.Shift 
FOR XML PATH('NestedRow'), 
        ROOT('NestedRoot'), TYPE 
) AS TestRow
FOR XML PATH ('RowName'), 
ROOT ('ResultSetName'), TYPE


Note that if you only want one of the two tags (ROOT or PATH) then you can pass in an empty string.  Not putting in anything at all results in default behaviour - check out the MSDN reference for more on that.
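
For example, passing an empty string to PATH removes the per-row wrapper entirely, leaving just the ROOT tag (a sketch against the same AdventureWorks table as above):

```sql
SELECT TOP 2 ShiftID, Name
FROM HumanResources.Shift
FOR XML PATH(''), ROOT('ResultSetName'), TYPE
-- Rows are no longer wrapped in their own element:
-- <ResultSetName><ShiftID>1</ShiftID><Name>Day</Name><ShiftID>2</ShiftID>...</ResultSetName>
```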
So this code is essentially the answer to the problem. Put the above code into a stored procedure and call that from bcp, using the queryout option to output the results to a file:
bcp "EXEC GetXml" queryout MyFile.xml -T -c
Alternatively, put the actual SELECT statement into the bcp command itself, but that looks rather messy and splits your logic across two places - the database and the batch file - which means more maintenance.  I would advise keeping it in the stored procedure.
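
For reference, that inline variant would look something like this (the server name here is a placeholder):

```
bcp "SELECT ShiftID, Name FROM AdventureWorks.HumanResources.Shift FOR XML PATH('RowName'), ROOT('ResultSetName'), TYPE" queryout MyFile.xml -T -c -S MyServer
```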

This solves the main part of the problem, but as we have 30+ entities requiring 30+ different wrapper tags, we would now need 30+ stored procedures.  To avoid so much repeated code I opted to use dynamic SQL and a simple lookup table to find the wrapper tags.  The procedure was modified to take a parameter identifying which entity it was being executed for.  The entity name sits in the lookup table alongside the associated tags, which are then used to populate the dynamic SELECT ... FOR XML statement.

CREATE TABLE [dbo].[lookup](
[entity] [varchar](50) NOT NULL,
[parameter_1] [varchar](50) NULL,
[parameter_2] [varchar](50) NULL,
[parameter_3] [varchar](50) NULL,
[parameter_4] [varchar](50) NULL,
[parameter_5] [varchar](50) NULL,
PRIMARY KEY CLUSTERED 
(
[entity] ASC
)
)
GO
INSERT [dbo].[lookup] ([entity], [parameter_1], [parameter_2], [parameter_3], [parameter_4], [parameter_5]) 
VALUES (N'account', N'Account', N'AccountList', N'', N'AccountDownload', NULL)
GO

CREATE PROC [dbo].[get_xml]    @entity         VARCHAR(50)
AS
DECLARE 
@SQL       VARCHAR(MAX),
@MessageID VARCHAR(50),
@PARAM1    VARCHAR(50),
@PARAM2    VARCHAR(50),
@PARAM3    VARCHAR(50),
@PARAM4    VARCHAR(50)

-- A MessageID was required on each file; a GUID is generated here for illustration
SET @MessageID = CONVERT(VARCHAR(50), NEWID())

SELECT  @PARAM1 = parameter_1,
        @PARAM2 = parameter_2,
        @PARAM3 = parameter_3,
        @PARAM4 = parameter_4
FROM dbo.lookup
WHERE entity = @entity

SET @SQL = '
SELECT '''+@MessageID+'''  AS MessageId,
(
                SELECT  *
                FROM    [MyDB].[dbo].[tbl_'+@entity+'] v
                FOR XML PATH('''+@PARAM1+'''), ROOT('''+@PARAM2+'''), TYPE
)
FOR XML PATH('''+@PARAM3+'''), ROOT('''+@PARAM4+'''), TYPE
'

EXEC (@SQL)
GO
(Note that @MessageID is used here to illustrate how you can manipulate the XML output.  It was a requirement of the solution to have a MessageID on each file.)

Now we have one stored procedure that can build all the XML files required, and all our logic is in a single place.  The procedure accepts an input parameter indicating which entity is being loaded, and the PATH and ROOT values are then found in the lookup table.  Notice also the addition of MessageID as a field of the outer SELECT ... FOR XML statement.  Nesting one statement inside another in this manner allows a huge amount of flexibility when transforming data into XML format. For more info on the SELECT ... FOR XML clause, including the other modes, see the MSDN reference here:
http://msdn.microsoft.com/en-us/library/ms178107.aspx
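
One more trick worth knowing: in PATH mode, prefixing a column alias with @ turns that column into an attribute of the row element rather than a child element (a sketch against the same AdventureWorks table):

```sql
SELECT TOP 2
       ShiftID AS '@id',
       Name    AS 'ShiftName'
FROM HumanResources.Shift
FOR XML PATH('RowName'), ROOT('ResultSetName'), TYPE
-- <ResultSetName><RowName id="1"><ShiftName>Day</ShiftName></RowName>...</ResultSetName>
```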

Sunday, 25 March 2012

SQL 2008R2 - Trigger order

I was troubleshooting a piece of code for a client recently and came across an interesting issue.  The client had created two triggers on a table that fired on the same event.  This is pretty much against standard coding practice for several reasons: firstly, how can you be sure in which order they will fire, and secondly, it is very tricky to debug (due to the first point :)).


I decided to do some testing to see if there is a consistent pattern to the trigger firing order.  To do this I created a table, trigger_test, and two AFTER INSERT triggers, add_two and times_two.  As the names suggest, add_two takes end_value and adds 2 to it, updating the end_value field with the new value.  Trigger times_two multiplies end_value by 2 and updates the field with that result.  Here is the code I used:


CREATE TABLE trigger_test
(
      id                INT IDENTITY,
      initial_value     INT,
      end_value         INT,
      trigger_add       BIT,
      trigger_times     BIT
)

GO

CREATE TRIGGER add_two ON trigger_test AFTER INSERT
AS
BEGIN
      UPDATE trigger_test 
      SET end_value = z.end_value + 2, trigger_add = 1
      FROM INSERTED i
      INNER JOIN trigger_test z
      ON i.id = z.id
END
GO

CREATE TRIGGER times_two ON trigger_test AFTER INSERT
AS
BEGIN
      UPDATE trigger_test
      SET end_value = z.end_value * 2, trigger_times = 1
      FROM INSERTED i
      INNER JOIN trigger_test z
      ON i.id = z.id
END
GO

Therefore, if we insert the value 5 into the table and trigger add_two fires first, followed by times_two, we would expect a final value of 14.  If it is the other way round, the result would be 12. In this way we can easily determine in what order the triggers are executing.

So next we run the insert, and select the results:

INSERT INTO trigger_test (initial_value,end_value) VALUES (5,5)
GO
SELECT * FROM trigger_test

Results:
(1 row(s) affected)
id          initial_value end_value trigger_add trigger_times
----------- ------------- ----------- ----------- -------------
1           5             14          1           1

(1 row(s) affected)

Here we clearly see that add_two is executing first, (5+2 = 7, 7*2 = 14).

But what happens if we drop the triggers and create them in the opposite order, times_two before add_two? I'll not repeat the code, as it's the same as above, but the result then becomes:

(1 row(s) affected)
id          initial_value end_value trigger_add trigger_times
----------- ------------- ----------- ----------- -------------
1           5             12          1           1

(1 row(s) affected)

The result is now 12, indicating that times_two occurred first.  

(The test can be repeated multiple times to ensure there is a definite order)

So it would appear that the triggers execute in the order they were created in. Or, put differently, the trigger with the lower OBJECT_ID executes before the one with the higher OBJECT_ID.
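
A quick way to inspect this on your own tables is to list the triggers in object_id order (a sketch; in my tests this matched the observed firing order):

```sql
-- List the triggers on the test table in object_id (i.e. creation) order
SELECT name, object_id, create_date
FROM sys.triggers
WHERE parent_id = OBJECT_ID('dbo.trigger_test')
ORDER BY object_id;
```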

This alone is fine and dandy, but may result in developers inadvertently messing things up if they modify a trigger by dropping and recreating it, unaware that the order is significant.  In doing so they could accidentally change the order, and therefore the functionality, of your code - and as stated earlier, if you ARE going to fire multiple triggers on the same event, it will now be very tricky to find the problem!

This can be partially resolved by using a handy proc called sp_settriggerorder.  As its name suggests, this allows you to set the trigger firing order by setting 'First' and 'Last' values:

EXEC sp_settriggerorder @triggername = 'times_two',
                        @order = 'first',
                        @stmttype = 'insert',
                        @namespace = null

Now we can define the specific order we want the triggers to fire in.  Problem solved? Almost....

We can only set 'First' and 'Last' values here, so this can only define the order of up to three triggers: set one as 'First', one as 'Last', and the remaining trigger has to fire second.  With any more triggers it will not be possible to fully determine the firing order.
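
For example, with three INSERT triggers on a table you could pin the whole order like this (using the trigger names from the test above for illustration):

```sql
-- Pin the first and last triggers; any third trigger must fire in between
EXEC sp_settriggerorder @triggername = 'times_two', @order = 'First', @stmttype = 'INSERT'
EXEC sp_settriggerorder @triggername = 'add_two',   @order = 'Last',  @stmttype = 'INSERT'
```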

Also, the same caveat still applies: if a trigger explicitly set to 'First' is dropped and recreated, it will no longer have this setting, and sp_settriggerorder will need to be rerun.

For more info on sp_settriggerorder see the msdn reference here: http://msdn.microsoft.com/en-us/library/ms186762.aspx


Friday, 10 June 2011

SQL 2008 Data Profiler

I am currently working on the early stages of a SQL 2008 data warehousing project.  I wanted to dive into the data sources for the DW I will be building, to determine if any of them could be eliminated up front.  This could be for a variety of reasons, including not providing the required info for the DW or not containing valid data.  It's essential to conduct this test early on, and previously that would have involved using a 3rd party tool such as Trillium or a range of others to analyse the content and give some statistical feedback on it.

It is great to see that Microsoft (and its competitors) are finally catching on to the significance of this area, and in SQL Server 2008 Microsoft has provided an SSIS task and a separate Data Profile Viewer application to do just that.



The Data Profiling Task is available in the SSIS Control Flow and is very quick to set up.  It can be customised or, if you prefer, you can click the "Quick Profile" button for a speedy set-up and execution.  Under the Quick Profile option it takes just a few seconds to select the data source (table, view etc.) and then tick a variety of checkboxes for the stats you'd like to see:

Column Null Ratio
Column Statistics
Column Value Distribution
Column Length Distribution
Column Pattern
Candidate Key



An additional option is available if customising, but is not available under the Quick Profile option:

Value Inclusion Profile

Then choose an XML file to output to.  That's all there is to it.  It's now ready to execute and output the results.

In order to actually make sense of the XML output file the Data Profile Viewer ( \Microsoft SQL Server 2008\Integration Services\Data Profile Viewer) is needed.  Open the xml file and lo and behold all the results become available!



Use the tree in the left pane to select the profile request (the options checked earlier in the SSIS task) whose results you want, and view the results in the right pane - the distribution of data in each column, min/max values, null distributions, candidates for the primary key and more.

I have found this to be a very useful tool for quickly diving into the source systems to get a feel for the data and quickly highlight any major data quality issues.  Much better to find them up front than to wait and see what you get after you've spent days building your ETL structures!

SQL Banana Launch!

Hello and welcome to the launch of SQL Banana.  On this blog I will be sharing my most intimate thoughts and discoveries around all things Microsoft BI - SQL Server, PerformancePoint, SharePoint, Excel, PowerPivot, MDX, DAX, T-SQL, DW modeling and more.

Check back regularly for my latest musings.