Saturday 8 December 2012

Reporting Services Report – Changing Column Names, Changing Table Names and sql PIVOT

The other day I was creating some SSRS reports.  For each data feed in an ETL process, rejected rows were being diverted into error tables - a single error table for each feed, e.g. error_feed1, error_feed2.  As each feed was different, so too were the column names and metadata of the error tables.  To allow users to review and correct these records I needed to build Reporting Services reports on each of the tables.


Initially it looked like I would need a different report for each feed - when you bind an SSRS report object to a dataset, the column names of the dataset have to remain constant or the report will fail; hence one report per error table.  As I was dealing with dozens of feeds, the prospect of dozens of very similar reports did not seem appealing.

All these reports would be almost identical; the only differences were the column names and the table names.  I was sure there must be an easier way.  I googled around and found several useful suggestions.  Generally they followed the idea of pivoting the columns into rows in the dataset and then using an SSRS matrix object.  E.g. error_feed1:

Id    Col1    Col2    Col3    Col4
----  ------  ------  ------  ------
1     W       X       Y       Z

Would become:

ID    measure    value
----  ---------  ------
1     Col1       W
1     Col2       X
1     Col3       Y
1     Col4       Z

Using a matrix, you would put the ID column on the rows, the measure column on the cross-tab section, and value in the data section of the matrix object.  Now it wouldn't matter if the column names changed, or if new columns were added or old ones removed from the source table.  The 3 columns output by the query would remain and the matrix report would adapt accordingly. The UNPIVOT command to do the above would look like this:

SELECT ID, measure, value
FROM   (SELECT id, col1, col2, col3, col4 FROM error_feed1) p
UNPIVOT (value FOR measure IN (col1, col2, col3, col4)) AS unpvt


The columns are now dynamic, which solves half the problem. But the FROM clause uses specific column and table names, meaning this metadata needs to be known in advance and hardcoded into the SSRS dataset query.  Which brings us back to our original problem - we cannot hard-code these values because they are constantly changing.
To get around this problem I decided to have the dataset be the result of a stored procedure.  That gives much greater flexibility in manipulating the data, so long as a result set is returned to SSRS at the end, always with the same column names.
The proc accepts one parameter - the feed name, to be supplied by the user running the report using a standard SSRS drop-down.
The proc itself makes use of the sys.objects and sys.columns catalog views to get the full list of columns for any given table:

SELECT      c.name
FROM        sys.columns c
INNER JOIN  sys.objects o
      ON    c.object_id = o.object_id
WHERE       o.type = 'U'
      AND   o.name LIKE @TableName
ORDER BY    c.column_id

name
------------
id
col1
col2
col3
col4



Once the table name has been supplied (as a parameter in the SSRS report), dynamic SQL can be leveraged to query the catalog views and use the results to build a string containing the required UNPIVOT command, with all the relevant column names for any given table.
 
Once the string variable is populated with the SQL script it is then executed, returning a result set of just 3 columns; the same 3 columns - ID, measure and value - regardless of the table being queried.  The actual code of the proc is below:
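(The original code listing was lost with the post's image.  The following is a minimal sketch reconstructed from the description above; the proc name get_error_feed, the @FeedName parameter and the VARCHAR(255) cast are my own illustrative choices, not the original code.)

```sql
-- Sketch of the proc described above. Assumes error tables follow the
-- error_<feedname> naming convention with an 'id' key column.
CREATE PROC dbo.get_error_feed @FeedName VARCHAR(128)
AS
BEGIN
    DECLARE @TableName SYSNAME = 'error_' + @FeedName
    DECLARE @Cols  VARCHAR(MAX),   -- column list for the UNPIVOT IN (...)
            @Casts VARCHAR(MAX),   -- cast list for the derived table
            @SQL   VARCHAR(MAX)

    -- Build both lists from the catalog views, skipping the id key.
    -- UNPIVOT requires every column in the IN list to share one type,
    -- so each column is cast to VARCHAR in the inner SELECT.
    SELECT @Cols  = COALESCE(@Cols + ', ', '') + QUOTENAME(c.name),
           @Casts = COALESCE(@Casts + ', ', '') +
                    'CAST(' + QUOTENAME(c.name) + ' AS VARCHAR(255)) AS ' + QUOTENAME(c.name)
    FROM   sys.columns c
    WHERE  c.object_id = OBJECT_ID(@TableName)
      AND  c.name <> 'id'
    ORDER BY c.column_id

    -- Assemble and run the UNPIVOT; always returns id, measure, value
    SET @SQL = '
    SELECT id, measure, value
    FROM (SELECT id, ' + @Casts + ' FROM ' + QUOTENAME(@TableName) + ') p
    UNPIVOT (value FOR measure IN (' + @Cols + ')) AS unpvt'

    EXEC (@SQL)
END
```

Whatever the exact implementation, the key point is that the column list is discovered at runtime, so the proc never needs changing when a feed's metadata changes.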




This result set is all that the SSRS dataset would ever see, and the column names would always be the same 3 named columns, regardless of the feed selected by the user.  Setting up the SSRS matrix object in the manner suggested above would then display the contents of the table as normal - effectively doing a PIVOT to counter the UNPIVOT done in the stored procedure.

We now need only one SSRS report to display data from any table the user selects from the feed list in the parameter drop-down - much simpler than dozens of near-identical reports, or dozens of datasets with visibility settings to juggle.

Monday 2 April 2012

SQL 2008: SELECT FOR XML

Recently I was tasked with implementing a solution to transfer data between two systems for a Financial Services company.  The data had to be output, one entity (account, broker etc.) at a time, from a SQL 2008 R2 database to an XML file, before being fed into the 3rd party application via a message queue.

One of the challenges was that each entity - and there were more than 30 of them - required the XML to be wrapped in different tags and have different headers to identify it.

The solution we chose was to use the FOR XML clause, which has been enhanced in R2.  The plan was to write the SELECT ... FOR XML statement in a stored procedure and BCP it out to a file.

Sounds very simple.  But in case you have to do the same thing I thought I'd highlight how we did it.
The SELECT ... FOR XML will convert the output of your select statement into XML format, and the really useful bit is that with the PATH and ROOT options you can specify exactly what tag should be used to identify a row, and what tag should be used to identify the start and end of the result set. The following example uses a table from the AdventureWorks db:


SELECT TOP 2
       ShiftID, Name 
FROM   HumanResources.Shift 
FOR XML PATH('RowName'), ROOT('ResultSetName'), TYPE



Results:
----------------------------------------------------------------------------------------------------------------------------------
<ResultSetName><RowName><ShiftID>1</ShiftID><Name>Day</Name></RowName><RowName><ShiftID>2</ShiftID><Name>Evening</Name></RowName></ResultSetName>


(1 row(s) affected)

And nicely enough, if you click on the result in SSMS, it will display the results all pleasantly formatted:



However in the actual case being worked on, the result set itself had to be wrapped in further tags to identify what file it was to the message queue. The solution: nest this SELECT ... FOR XML statement inside another, enabling you to generate an XML hierarchy:


SELECT 
( SELECT TOP 2
         ShiftID, Name 
  FROM   HumanResources.Shift 
  FOR XML PATH('NestedRow'), 
          ROOT('NestedRoot'), TYPE 
) AS TestRow
FOR XML PATH('RowName'), 
        ROOT('ResultSetName'), TYPE


Note that if you only want one of the two tags (ROOT or PATH) then you can put in an empty string.  Not putting in anything at all will result in default behaviour - check out the MSDN reference for more on that.
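For instance - a quick illustrative variation on the query above - passing an empty string to PATH drops the per-row wrapper, leaving the column elements directly inside the ROOT element:

```sql
-- Empty-string PATH removes the per-row element; only the
-- ResultSetName wrapper and the column elements remain
SELECT TOP 2 ShiftID, Name
FROM HumanResources.Shift
FOR XML PATH(''), ROOT('ResultSetName'), TYPE
```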
So this code is essentially the answer to the problem. Put it into a stored procedure and call that from the BCP utility, using the queryout option to output the results to file:
bcp "EXEC GetXml" queryout MyFile.xml -T -c
Alternatively, put the actual SQL select statement into the BCP command, but that looks rather messy and splits your logic into two places, the database and the batch file, which means more maintenance.  I would advise keeping it in the stored proc.

This solves the main part of the problem, but as we have 30+ entities requiring 30+ different wrapper tags we now need 30+ stored procs to be coded.  To avoid so much repeated code I opted to use dynamic SQL and a simple lookup table to hold the wrapper tags.  The procedure was modified to require a parameter identifying what entity it was being executed for.  This entity name would be in the lookup table with associated tags that would then be used to populate the dynamic SQL and SELECT ... FOR XML statements.

CREATE TABLE [dbo].[lookup](
[entity] [varchar](50) NOT NULL,
[parameter_1] [varchar](50) NULL,
[parameter_2] [varchar](50) NULL,
[parameter_3] [varchar](50) NULL,
[parameter_4] [varchar](50) NULL,
[parameter_5] [varchar](50) NULL,
PRIMARY KEY CLUSTERED 
(
[entity] ASC
)
)
GO
INSERT [dbo].[lookup] ([entity], [parameter_1], [parameter_2], [parameter_3], [parameter_4], [parameter_5]) 
VALUES (N'account', N'Account', N'AccountList', N'', N'AccountDownload', NULL)
GO

CREATE PROC [dbo].[get_xml]    @entity         VARCHAR(50)
AS
DECLARE 
@SQL VARCHAR(MAX),
@MessageID VARCHAR(50),
@PARAM1 VARCHAR(50),
@PARAM2 VARCHAR(50),
@PARAM3 VARCHAR(50),
@PARAM4 VARCHAR(50)

SELECT  @PARAM1 = parameter_1,
        @PARAM2 = parameter_2,
        @PARAM3 = parameter_3,
        @PARAM4 = parameter_4,
        @MessageID = ISNULL(parameter_5, '')
FROM dbo.lookup
WHERE entity = @entity

SET @SQL  =  '
SELECT '''+@MessageID+'''  AS MessageId,
(
                SELECT                  *
                FROM                    [MyDB].[dbo].[tbl_'+@entity+'] v
                FOR XML PATH('''+@PARAM1+'''), ROOT('''+@PARAM2+'''), TYPE
)
FOR XML PATH('''+@PARAM3+'''), ROOT('''+@PARAM4+'''), TYPE
'

EXEC (@SQL)
GO
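Calling the proc for the sample lookup row above is then a one-liner:

```sql
-- Builds the account XML using the tags held in dbo.lookup;
-- assumes a table [MyDB].[dbo].[tbl_account] exists to be queried
EXEC dbo.get_xml @entity = 'account'
```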
(Note that @MessageID is used here to illustrate how you can manipulate the XML output.  It was a requirement of the solution to have a MessageID on each file.)

Now we have one stored proc that can be used to build all the XML files required, and all our logic is in a single place.  The procedure accepts an input parameter that indicates which entity is being loaded, and the PATH and ROOT values are then found in the lookup table.  Notice also the addition of the MessageID as a field of the outer SELECT ... FOR XML statement.  Nesting one statement inside another in this manner allows a huge amount of flexibility when transforming data into XML format. For more info on the SELECT ... FOR XML clause, including use of other modes, see the MSDN reference here:
http://msdn.microsoft.com/en-us/library/ms178107.aspx

Sunday 25 March 2012

SQL 2008R2 - Trigger order

I was troubleshooting a piece of code for a client recently and came across an interesting issue.  The client had created 2 triggers on a table that were firing on the same event.  This is pretty much against standard coding practice for several reasons - firstly, how can you be sure in which order they will fire, and secondly, it is very tricky to debug (due to the first point :)).


I decided to do some testing to see if there is a consistent pattern to the trigger firing order.  To do this I created a table, trigger_test, and 2 AFTER INSERT triggers, add_two and times_two.  As the names suggest, add_two takes end_value, adds 2 to it, and updates the end_value field with this new value.  Trigger times_two multiplies end_value by 2 and updates the field with this new value.  Here is the code I used:


CREATE TABLE trigger_test
(
      id                INT IDENTITY,
      initial_value     INT,
      end_value         INT,
      trigger_add       BIT,
      trigger_times     BIT
)

GO

CREATE TRIGGER add_two ON trigger_test AFTER INSERT
AS
BEGIN
      UPDATE trigger_test 
      SET end_value = z.end_value + 2, trigger_add = 1
      FROM INSERTED i
      INNER JOIN trigger_test z
      ON i.id = z.id
END
GO

CREATE TRIGGER times_two ON trigger_test AFTER INSERT
AS
BEGIN
      UPDATE trigger_test
      SET end_value = z.end_value * 2, trigger_times = 1
      FROM INSERTED i
      INNER JOIN trigger_test z
      ON i.id = z.id
END
GO

Therefore if we insert the value 5 into the table and trigger add_two fires first, followed by times_two, we would expect a final value of 14.  If it is the other way round the result would be 12. In this way we can easily determine in what order the triggers are executing.

So next we run the insert, and select the results:

INSERT INTO trigger_test (initial_value,end_value) VALUES (5,5)
GO
SELECT * FROM trigger_test

Results:
(1 row(s) affected)
id          initial_value end_value trigger_add trigger_times
----------- ------------- ----------- ----------- -------------
1           5             14          1           1

(1 row(s) affected)

Here we clearly see that add_two is executing first, (5+2 = 7, 7*2 = 14).

But what happens if we drop the triggers and create them in the opposite order, times_two before add_two? I'll not repeat the code, as it's the same as above, but the result then becomes:

(1 row(s) affected)
id          initial_value end_value trigger_add trigger_times
----------- ------------- ----------- ----------- -------------
1           5             12          1           1

(1 row(s) affected)

The result is now 12, indicating that times_two occurred first.  

(The test can be repeated multiple times to ensure there is a definite order)

So it would appear that the order the triggers are created in is the order they will be executed in. Or put differently, the trigger with the lower OBJECT_ID will be executed before the one with the larger.
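One way to see this creation order for yourself is to query the sys.triggers catalog view (illustrative, using the test table from above):

```sql
-- Lists the triggers on trigger_test in object_id order - the order
-- the post observed them firing in
SELECT name, object_id, create_date
FROM   sys.triggers
WHERE  parent_id = OBJECT_ID('dbo.trigger_test')
ORDER BY object_id
```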

This alone is fine and dandy, but may result in developers inadvertently breaking things if they modify and drop/create a trigger, unaware that the order is significant.  In doing so they could accidentally change the firing order and therefore the functionality of the code, and as stated earlier, if you ARE going to fire multiple triggers on the same event it will now be very tricky to find the problem!

This can be partially resolved by using a handy proc called sp_settriggerorder.  As its name suggests, this allows you to set the trigger firing order by assigning 'first' and 'last' values:

EXEC sp_settriggerorder @triggername = 'times_two',
                        @order = 'first',
                        @stmttype = 'insert',
                        @namespace = null

Now we can define the specific order we want the triggers to fire in.  Problem solved? Almost....

We can only set 'first' and 'last' values here, so this can only define the order of up to three triggers - set one as 'first', one as 'last', and the remaining one has to fire second.  With any more triggers it will not be possible to fully determine the order in which they fire.

Also, the same caveat still applies - if a trigger explicitly set to 'first' is dropped and recreated, it will no longer have this setting and sp_settriggerorder will need to be rerun.
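You can check whether the setting is still in place with OBJECTPROPERTY, which exposes the first/last flags per statement type:

```sql
-- Returns 1 if times_two is still marked as the first AFTER INSERT
-- trigger on its table, 0 if the setting has been lost
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.times_two'),
                      'ExecIsFirstInsertTrigger') AS is_first
```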

For more info on sp_settriggerorder see the MSDN reference here: http://msdn.microsoft.com/en-us/library/ms186762.aspx