SQL Banana: PIVOT

Monday, 17 November 2014

SQL UNPIVOT on Multiple Columns

Some time ago I posted here about using the UNPIVOT operator in a stored procedure. Recently I came across an interesting problem that required further use of UNPIVOT.

Suppose you have a number of columns in a wide table that you would like to unpivot into a single column and multiple rows. An example might be when there are many measure columns and you would like to narrow the table by combining all the measure columns into a single column. This would be a fairly standard use of the UNPIVOT command that was covered in the earlier post.

Now suppose that there are two sets of measures that you would like to keep as separate columns: instead of a single MeasureName and MeasureValue column, you would like two of each. Or, for a simpler example, the data might not be measures at all but dimensional data. Let's look at some CRM system data:

CREATE TABLE Company
(
    Id INT,
    CompanyName VARCHAR(500),
    Contact1 VARCHAR(500),
    Contact2 VARCHAR(500),
    Email1 VARCHAR(500),
    Email2 VARCHAR(500)
)
GO

INSERT INTO Company
SELECT 1, 'FastCarsCo', 'Mr Purple', 'Mr Orange', 'purple@fcc.com', 'orange@fcc.com'
UNION ALL SELECT 2, 'TastyCakeShop', 'Mr Brown', 'Mr White', 'brown@tcs.com', 'white@tcs.com'
UNION ALL SELECT 3, 'KidsToys', 'Mr Pink', 'Mr Black', 'pink@kt.com', 'black@kt.com'
UNION ALL SELECT 4, 'FruitStall', 'Mr Red', 'Mr Blue', 'red@fs.com', 'blue@fs.com'
GO

SELECT Id, CompanyName, Contact1, Contact2, Email1, Email2
FROM Company
GO

In this example we have a company table with two contacts for each company, and an email address for each contact. Instead of two columns for contact names and two for email addresses, we would like a single column for each, giving CompanyName, ContactName and EmailAddress. First, let's reduce the Contact1 and Contact2 columns to one:

SELECT Id, CompanyName, ContactName, Contact
FROM
(
    SELECT Id, CompanyName, Contact1, Contact2
    FROM Company
) src
UNPIVOT
(
    ContactName FOR Contact IN (Contact1, Contact2)
) pvt

We have now replaced the contact1 and contact2 fields with a single ContactName field. I have added the new Contact column to indicate the source of the data. This can be removed from the select list if it is not required.

Next we need to perform a similar unpivot on the email columns. This is actually easier than you might think. Rather than nesting queries or writing complex CTEs, we need only append another UNPIVOT clause to our first query:

SELECT Id,
       CompanyName,
       ContactName,
       EmailAddress,
       Contact
FROM
(
    SELECT Id, CompanyName, Contact1, Contact2, Email1, Email2
    FROM Company
) src
UNPIVOT
(
    ContactName FOR Contact IN (Contact1, Contact2)
) pvt1
UNPIVOT
(
    EmailAddress FOR Email IN (Email1, Email2)
) pvt2
WHERE RIGHT(Contact, 1) = RIGHT(Email, 1)

The WHERE clause in the above query is very important. Without it, we haven't told SQL Server how the second set of unpivoted data matches the first, so every contact row is combined with both email columns, effectively a cross join. Mr Purple would then appear twice: once with his own email address, and once with the address of his colleague, Mr Orange.
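To see the effect of the WHERE clause, the sketch below counts the rows returned with and without it. It uses #Company, a temporary table holding two of the example companies (an assumption made only so the snippet runs on its own). With two companies, two contact columns and two email columns, the unmatched query should return 2 × 2 × 2 = 8 rows, and the matched query 4.

```sql
-- Sketch: compare row counts with and without the suffix-matching WHERE clause.
-- #Company is a temporary stand-in for the Company table, with two of its rows.
IF OBJECT_ID('tempdb..#Company') IS NOT NULL DROP TABLE #Company;
CREATE TABLE #Company (Id INT, CompanyName VARCHAR(500),
                       Contact1 VARCHAR(500), Contact2 VARCHAR(500),
                       Email1 VARCHAR(500), Email2 VARCHAR(500));
INSERT INTO #Company VALUES
    (1, 'FastCarsCo', 'Mr Purple', 'Mr Orange', 'purple@fcc.com', 'orange@fcc.com'),
    (3, 'KidsToys',   'Mr Pink',   'Mr Black',  'pink@kt.com',    'black@kt.com');

-- No WHERE: each unpivoted contact row is paired with both email columns.
SELECT COUNT(*) AS RowsWithoutWhere   -- 8
FROM #Company
UNPIVOT (ContactName FOR Contact IN (Contact1, Contact2)) AS pvt1
UNPIVOT (EmailAddress FOR Email IN (Email1, Email2)) AS pvt2;

-- With WHERE: only Contact1/Email1 and Contact2/Email2 pairs survive.
SELECT COUNT(*) AS RowsWithWhere      -- 4
FROM #Company
UNPIVOT (ContactName FOR Contact IN (Contact1, Contact2)) AS pvt1
UNPIVOT (EmailAddress FOR Email IN (Email1, Email2)) AS pvt2
WHERE RIGHT(Contact, 1) = RIGHT(Email, 1);
```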

In the WHERE clause we state that the two new columns we have created, Contact (selected) and Email (not selected), must match on their last character. Remember that these two columns are populated with the names of the original columns, i.e. Contact1, Contact2, Email1 and Email2. The two columns therefore align as Contact1 → Email1 (1=1) and Contact2 → Email2 (2=2). Hence the names of the original columns in our source Company table must contain text that allows them to be matched in the WHERE clause at the end of the query.
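Matching on RIGHT(Contact, 1) relies on single-character suffixes: a table with both Contact1 and Contact11 could not tell them apart. A different technique, not the one used in this post, avoids the naming dependency altogether by listing each contact/email pair explicitly with CROSS APPLY and a VALUES row constructor. A minimal sketch, again on a temporary #Company table assumed for illustration:

```sql
-- Sketch: pair each contact with its email explicitly instead of matching
-- on column-name suffixes. #Company is a temporary stand-in for Company.
IF OBJECT_ID('tempdb..#Company') IS NOT NULL DROP TABLE #Company;
CREATE TABLE #Company (Id INT, CompanyName VARCHAR(500),
                       Contact1 VARCHAR(500), Contact2 VARCHAR(500),
                       Email1 VARCHAR(500), Email2 VARCHAR(500));
INSERT INTO #Company VALUES
    (1, 'FastCarsCo', 'Mr Purple', 'Mr Orange', 'purple@fcc.com', 'orange@fcc.com'),
    (3, 'KidsToys',   'Mr Pink',   'Mr Black',  'pink@kt.com',    'black@kt.com');

-- Each VALUES row is one (Contact, ContactName, EmailAddress) pair, so the
-- pairing is written into the query rather than inferred from column names.
SELECT c.Id, c.CompanyName, x.ContactName, x.EmailAddress, x.Contact
FROM #Company AS c
CROSS APPLY (VALUES
    ('Contact1', c.Contact1, c.Email1),
    ('Contact2', c.Contact2, c.Email2)
) AS x (Contact, ContactName, EmailAddress);
```

One behavioural difference: UNPIVOT drops rows whose unpivoted value is NULL, whereas the CROSS APPLY version keeps them, so a company with no second contact would still return a Contact2 row containing NULLs.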

More details on the PIVOT / UNPIVOT command can be found here:

http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx

Saturday, 8 December 2012

Reporting Services Report – Changing Column Names, Changing Table Names and sql PIVOT

The other day I was creating some SSRS reports. In an ETL process, the rejected rows from each data feed were being diverted into error tables, a single error table for each feed, e.g. error_feed1, error_feed2. Because each feed was different, so too were the column names and metadata of the error tables. To allow users to review and correct these records, I needed to build Reporting Services reports on each of the tables.

Initially it looked like I would need a different report for each feed: when you bind an SSRS report object to a dataset, the column names of the dataset have to remain constant or the report will fail, hence one report per error table. As I was dealing with dozens of feeds, the prospect of dozens of very similar reports was not appealing.

All these reports would be almost identical; the only differences were the column names and the table names. I was sure there must be an easier way. I googled around and found several useful suggestions. Generally they followed the idea of unpivoting the columns into rows in the dataset and then displaying them with an SSRS matrix. For example:

error_feed1

Id	Col1	Col2	Col3	Col4
1	W	X	Y	Z

Would become:

ID	measure	value
1	Col1	W
1	Col2	X
1	Col3	Y
1	Col4	Z

Using a matrix, you would put the ID column on the row groups, the measure column on the column groups, and value in the data area of the matrix. Now it wouldn’t matter if the column names changed, or if columns were added to or removed from the source table: the 3 columns output by the unpivot query would remain the same, and the matrix would adapt accordingly. The UNPIVOT query to do the above would look like this:

SELECT ID, measure, value
FROM (SELECT id, col1, col2, col3, col4 FROM error_feed1) p
UNPIVOT (value FOR measure IN (col1, col2, col3, col4)) AS unpvt
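One catch when pointing this at real error tables: UNPIVOT requires every column in the IN list to have the same data type and length, and error tables usually mix types. A common workaround, sketched below on a hypothetical #error_feed1 with mixed column types, is to cast everything to one string type inside the derived table first. NVARCHAR(4000) is an arbitrary choice for illustration.

```sql
-- Sketch: hypothetical error table whose columns have different types.
IF OBJECT_ID('tempdb..#error_feed1') IS NOT NULL DROP TABLE #error_feed1;
CREATE TABLE #error_feed1 (id INT, col1 VARCHAR(50), col2 INT, col3 DATE, col4 VARCHAR(10));
INSERT INTO #error_feed1 VALUES (1, 'W', 42, '2012-12-08', 'Z');

-- Cast every unpivoted column to a single type so UNPIVOT accepts the IN list.
SELECT id AS ID, measure, value
FROM (
    SELECT id,
           CAST(col1 AS NVARCHAR(4000)) AS col1,
           CAST(col2 AS NVARCHAR(4000)) AS col2,
           CAST(col3 AS NVARCHAR(4000)) AS col3,
           CAST(col4 AS NVARCHAR(4000)) AS col4
    FROM #error_feed1
) AS p
UNPIVOT (value FOR measure IN (col1, col2, col3, col4)) AS unpvt;
```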

The columns are now dynamic, which solves half the problem. But the FROM clause uses specific column and table names, meaning this metadata needs to be known in advance and hardcoded into the SSRS dataset query. That brings us back to our original problem: we cannot hardcode these values because they are constantly changing.

To get around this problem, I decided to make the dataset the result of a stored procedure. That gives much greater flexibility in manipulating the data, as long as the procedure returns a result set to SSRS at the end, always with the same column names.

The proc accepts one parameter - the feed name, to be supplied by the user running the report using a standard SSRS drop down.
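The drop-down itself can be populated by a query set as the parameter's available values in SSRS. A minimal sketch, assuming every error table follows the error_<feed> naming used above; the underscore is escaped as [_] because on its own it is a single-character LIKE wildcard.

```sql
-- Sketch: list candidate feed tables for the report parameter's drop-down.
-- Assumes error tables are named error_<feed>, e.g. error_feed1.
SELECT t.name AS FeedTable
FROM sys.tables AS t
WHERE t.name LIKE 'error[_]%'
ORDER BY t.name;
```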

The proc itself makes use of the sys.objects and sys.columns catalog views to get the full list of columns for any given table:

SELECT      c.name
FROM        sys.columns c
INNER JOIN sys.objects o
      ON    c.object_id = o.object_id
WHERE       o.type = 'U'
      AND   o.name LIKE @TableName
ORDER BY    c.column_id

name
------------
id
col1
col2
col3
col4

Once the table name has been supplied (as a parameter in the SSRS report), the system views can be queried and the results used to build, as dynamic SQL, a string containing the required UNPIVOT statement with all the relevant column names for the given table.
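A procedure of this shape can be sketched as follows. This is a minimal illustration rather than the original procedure: the name dbo.usp_UnpivotErrorTable is hypothetical, it assumes each error table has a key column called id, it uses FOR XML PATH to concatenate the column names (STRING_AGG would be simpler on SQL Server 2017 and later), and it casts every other column to NVARCHAR(4000) so that UNPIVOT sees a single type. QUOTENAME and the existence check are there because the table name ends up inside dynamic SQL.

```sql
-- Sketch only: hypothetical procedure that unpivots any user table into
-- (ID, measure, value). Assumes every table has a key column named id.
CREATE PROCEDURE dbo.usp_UnpivotErrorTable
    @TableName SYSNAME
AS
BEGIN
    SET NOCOUNT ON;

    -- Only accept the name of an existing user table; this guards the dynamic SQL.
    IF NOT EXISTS (SELECT 1 FROM sys.objects WHERE type = 'U' AND name = @TableName)
    BEGIN
        RAISERROR('Table %s not found.', 16, 1, @TableName);
        RETURN;
    END;

    DECLARE @castList NVARCHAR(MAX), @inList NVARCHAR(MAX), @sql NVARCHAR(MAX);

    -- Derived-table select list: CAST([col1] AS NVARCHAR(4000)) AS [col1], ...
    SELECT @castList = STUFF((
        SELECT N', CAST(' + QUOTENAME(c.name) + N' AS NVARCHAR(4000)) AS ' + QUOTENAME(c.name)
        FROM sys.columns AS c
        INNER JOIN sys.objects AS o ON c.object_id = o.object_id
        WHERE o.type = 'U' AND o.name = @TableName AND c.name <> N'id'
        ORDER BY c.column_id
        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, N'');

    -- UNPIVOT IN list: [col1], [col2], ...
    SELECT @inList = STUFF((
        SELECT N', ' + QUOTENAME(c.name)
        FROM sys.columns AS c
        INNER JOIN sys.objects AS o ON c.object_id = o.object_id
        WHERE o.type = 'U' AND o.name = @TableName AND c.name <> N'id'
        ORDER BY c.column_id
        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, N'');

    -- Same shape as the hardcoded UNPIVOT query, with the names filled in.
    SET @sql = N'SELECT id AS ID, measure, value FROM (SELECT id, ' + @castList
             + N' FROM ' + QUOTENAME(@TableName) + N') AS p'
             + N' UNPIVOT (value FOR measure IN (' + @inList + N')) AS unpvt;';

    EXEC sys.sp_executesql @sql;
END;
GO
```

In the report, the dataset would then be a call such as EXEC dbo.usp_UnpivotErrorTable @TableName = @Feed, where @Feed stands for the report parameter (again a hypothetical name).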

Once the string variable has been populated with the SQL script, it is executed, returning a result set of just 3 columns: the same 3 columns, ID, measure and value, regardless of the table being queried. The actual code of the proc is below:

This result set is all that the SSRS dataset would ever see, and its column names would always be the same 3, regardless of the feed selected by the user. Setting up the SSRS matrix in the manner described above then displays the contents of the table as normal, effectively doing a PIVOT to reverse the UNPIVOT done in the stored procedure.
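The matrix is doing the job a T-SQL PIVOT would otherwise do. For comparison, the sketch below pivots the unpivoted example rows back into one row per ID; the #long temporary table is an assumption standing in for the procedure's output. PIVOT needs the column names written into its IN list, which is exactly the hardcoding the matrix avoids.

```sql
-- Sketch: rebuild the wide row from (ID, measure, value) rows with PIVOT.
IF OBJECT_ID('tempdb..#long') IS NOT NULL DROP TABLE #long;
CREATE TABLE #long (ID INT, measure SYSNAME, value NVARCHAR(4000));
INSERT INTO #long VALUES
    (1, N'Col1', N'W'), (1, N'Col2', N'X'), (1, N'Col3', N'Y'), (1, N'Col4', N'Z');

-- MAX() simply picks the single value in each ID/measure cell.
SELECT ID, [Col1], [Col2], [Col3], [Col4]
FROM #long
PIVOT (MAX(value) FOR measure IN ([Col1], [Col2], [Col3], [Col4])) AS pvt;
-- returns one row: 1, W, X, Y, Z
```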

We now need only a single SSRS report to display data from any database table the user selects from the feed list in the parameter drop-down: much simpler than dozens of different reports, or dozens of datasets and fiddling with visibility settings.