Monday, 2 April 2012

SQL 2008: SELECT FOR XML

Recently I was tasked with implementing a solution to transfer data between two systems for a Financial Services company.  The data had to be outputted, one entity (account, broker etc) at a time, from a SQL 2008R2 database to an xml file, before being inputted to the 3rd party application via a message queue.

One of the challenges was that each entity  - and there were more than 30 of them -  required the xml to be wrapped in different tags and have different headers to identify them. 

The solution we choose was use the FOR XML clause, which has been enhanced in R2.  The plan was to write the SELECT FOR XML statement in a stored procedure, and BCP it out to a file.  

Sounds very simple.  But in case you have to do the same thing I thought I'd highlight how we did it.
The SELECT...FOR XML will convert the output of your select statment into XML format, and the really useful bit is that with the PATH and ROOT options you can specify exactly what tag should be used to identify a row, and what tag should be used to identify the start and end of the result set. The following example uses a table from the AdventureWorks db:


SELECT TOP
ShiftID, Name 
FROM HumanResources.Shift 
FOR XML PATH('RowName'), ROOT('ResultSetName'), TYPE



Results:
----------------------------------------------------------------------------------------------------------------------------------
<ResultSetName><RowName><ShiftID>1</ShiftID><Name>Day</Name></RowName><RowName><ShiftID>2</ShiftID><Name>Evening</Name></RowName></ResultSetName>


(1 row(s) affected)

And nicely enough, if you click on the result in SSMS, it will display the results all pleasantly formatted:



However in the actual case being worked on, the result set itself had to be wrapped in further tags to identify what file it was to the message queue. Solution, nest this SELECT FOR XML statement inside another, enabling you to generate an XML hierarchy :


SELECT 
( SELECT TOP
ShiftID, Name 
FROM HumanResources.Shift 
FOR XML PATH('NestedRow'), 
        ROOT('NestedRoot'), TYPE 
) AS TestRow
FOR XML PATH ('RowName'), 
ROOT ('ResultSetName'), TYPE


Note that if you only want one of the two tags (ROOT or PATH) then you can put in an empty string.  Not putting in anything at all will result in default behaviour - check out the msdn reference for more on that.
So this code essentially is the answer to the problem. Put the above code into a stored procedure and call that from the slqcmd to output the results to file using the queryout option:
bcp "EXEC GetXml" queryout MyFile.xml -T -c
Alternatively, put the actual sql select statement into the BCP, but that looks rather messy and splits up your logic into two places, the database and the batch file, which means more maintenance.  I would advise keeping it in the stored proc.

This solves the main part problem, but as we have 30+ entities requiring 30+ different wrapper tags we now need 30+ stored procs to be coded.  To avoid having so much repeating code I opted to use dynamic sql and a simple lookup tag table to find the wrapper tags. The procedure was modified to require a parameter to be passed to it identifying what entity it was being executed for.  This entity name would be in the lookup table with associated tags that would then be used to populate the dynamic sql and SELECT FOR XML statments.

CREATE TABLE [dbo].[lookup](
[entity] [varchar](50) NOT NULL,
[parameter_1] [varchar](50) NULL,
[parameter_2] [varchar](50) NULL,
[parameter_3] [varchar](50) NULL,
[parameter_4] [varchar](50) NULL,
[parameter_5] [varchar](50) NULL,
PRIMARY KEY CLUSTERED 
(
[entity] ASC
)
GO
INSERT [dbo].[setting] ([entity], [parameter_1], [parameter_2], [parameter_3], [parameter_4], [parameter_5]) 
VALUES (N'account', N'Account', N'AccountList', N'', N'AccountDownload', NULL)
GO

CREATE PROC [dbo].[get_xml]    @entity         VARCHAR(50)
AS
DECLARE 
@SQL VARCHAR(MAX),
@MessageID VARCHAR(50),
@PARAM1 VARCHAR(50),
@PARAM2 VARCHAR(50),
@PARAM3 VARCHAR(50),
@PARAM4  VARCHAR(50

SELECT  @param1 = parameter_1,
                  @param2 = parameter_2,
                  @param3 = parameter_3,
                  @param4 = parameter_4
FROM dbo.lookup
WHERE entity = @entity



SET @SQL  =  '


SELECT '+@messageID+'  AS MessageId,
(
                SELECT                  *
                FROM                   [MyDB].[dbo].[tbl_'+@entity+'] v
                FOR XML PATH('''+@PARAM1 +'''), ROOT('''+@PARAM2+'''), TYPE
)
FOR XML PATH('''+@PARAM3+'''), ROOT('''+@PARAM4+'''), TYPE
'


EXEC (@SQL)
GO
(Note that @MessageID us used here to illustrate how you can manipulate the xml output.  It was a requirement of the solution to have a MessageID on each file) 

Now we have one stored proc that can be used to build all the XML files required, and all our logic is in a single place.  The procedure accepts an input parameter that indicates what entity is being loaded, and the PATH and ROOT values are then found from the lookup table.  Notice also the addition of the MessageID as a field of the outer SELECT FOR XML statement.  Playing with nesting one statement inside another in this manner allows a huge amount of flexibility when transforming data into xml format. For more info on the SELECT FOR XML clause, including use of other modes, see the msdn reference here:
http://msdn.microsoft.com/en-us/library/ms178107.aspx