
Tuesday, 27 May 2014

SQL Server Parameter Sniffing - Slow Running Queries

Parameter sniffing is a problem that can occasionally creep into code just when everything seems fine. A stored procedure that ran nice and fast yesterday is taking much longer today; the next day it is fast again, yet nothing has changed on the server or database. It can be difficult to track down why.

The answer could be Parameter Sniffing.

When a stored procedure is compiled, it compiles using (or "sniffing") the parameter values supplied at the time of the invocation, and uses those values to determine the optimal execution plan for the proc. Simply put, if you later call the proc with a different parameter value, it will still use the execution plan determined from the first set of values. It will do this regardless of whether a different execution plan would have been more appropriate for the new value.

For example, suppose I have a table containing the values 0 to 99, and then 100 further rows all with the value 999:

/* Create table */
CREATE TABLE DummyData ( col1 INT )
DECLARE @Counter INT

/* Insert values 0-99 */
SET @Counter = 0
WHILE @Counter < 100
BEGIN
    INSERT INTO DummyData ( col1 ) VALUES ( @Counter )
    SET @Counter = @Counter + 1
END

/* Insert 100 rows with value 999 */
SET @Counter = 0
WHILE @Counter < 100
BEGIN
    INSERT INTO DummyData ( col1 ) VALUES ( 999 )
    SET @Counter = @Counter + 1
END



Next create a procedure to select values from this table:

CREATE PROC SelectDummyData ( @DummyValue INT )
AS
BEGIN
SELECT * FROM DummyData WHERE col1 = @DummyValue
END

Now generate the Execution Plan for this procedure, passing a parameter of 999, and look at the "estimated number of rows":
EXEC SelectDummyData @DummyValue = 999


The estimated row count is 100, which is what we would expect. But re-run the EXEC statement, this time with @DummyValue = 5, and the Estimated Number of Rows is still 100, when we would expect it to be 1.

The reason it is still expecting 100 rows is because the execution plan was determined when the proc was first compiled, when we used a value of 999.  It is still using the same execution plan, and therefore the same number of rows is expected, regardless of the input value supplied.

The impact of this oddity in this example is negligible, but for a complex query it can result in significantly slower performance.

If your query is complex it might be easier to generate the execution plan in XML ( SET SHOWPLAN_XML ON ) and then search the plan XML for "ParameterCompiledValue".
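Alternatively, the compiled plan can be pulled straight from the plan cache via the DMVs. This is a sketch only (it assumes the example proc name from above, and querying the DMVs requires VIEW SERVER STATE permission):

```sql
/* Sketch: retrieve the cached plan XML for the example proc.
   The returned XML can then be searched for "ParameterCompiledValue"
   to see which parameter value the plan was compiled with. */
SELECT  cp.usecounts,
        qp.query_plan
FROM    sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE   OBJECT_NAME(qp.objectid, qp.dbid) = 'SelectDummyData'
```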

To avoid parameter sniffing issues from occurring there are a number of options that can be considered:

1. Recompile the stored proc every time it is executed:
EXEC SelectDummyData @DummyValue = 1
WITH RECOMPILE

This ensures that the proc uses the supplied parameters to build the best execution plan every time, but at the cost of constant recompilation. That overhead may be acceptable if there is sufficient gain in query performance.

Generating the execution plan on the above script, using various parameters, shows the correct number of rows being estimated each time.

If the parameter sniffing relates to particular queries within the proc, then recompilation can be specified for individual queries:


SELECT * FROM DummyData WHERE col1 = @DummyValue
OPTION ( RECOMPILE )
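Applied to the example proc from earlier, the statement-level hint would look like this:

```sql
/* Only this statement is recompiled on each execution,
   rather than the whole proc */
ALTER PROC SelectDummyData ( @DummyValue INT )
AS
BEGIN
SELECT * FROM DummyData WHERE col1 = @DummyValue
OPTION ( RECOMPILE )
END
```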

2. Optimize the query/ies for a specific parameter value:


ALTER PROC SelectDummyData ( @DummyValue INT )
AS
BEGIN
SELECT * FROM DummyData WHERE col1 = @DummyValue
OPTION ( OPTIMIZE FOR ( @DummyValue = 1 ))
END

With this option we can force the query plan to be based on the same known value every time the proc is compiled (e.g. during maintenance tasks). This option may work if we know the optimal value for our system. The downside, however, is that the plan may still perform poorly for other values, but that may well be tolerable.

3. Avoid sniffing altogether, by using a local variable inside the proc:
ALTER PROC SelectDummyData ( @DummyValue INT )
AS
BEGIN
DECLARE @LocalDummyValue INT
SET @LocalDummyValue = @DummyValue
SELECT * FROM DummyData WHERE col1 = @LocalDummyValue
END

Here there is no possibility of parameter sniffing, and the execution plan will be based on the table's statistics. Note, however, that the plan will be built from the density information in the statistics rather than the histogram, which is less accurate. Therefore it is not guaranteed to be the best plan for all possible values.

Conclusion
Parameter sniffing has the potential to cause queries to run with widely varying performance. In order to mitigate this problem consider modifying the code based on the options presented above. Hopefully this will help keep the queries and stored procedures performing well wherever they are used.

More useful info on this topic is available here:

Degremont Michel
http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx

Turgay Sahtiyan
http://blogs.msdn.com/b/turgays/archive/2013/09/10/parameter-sniffing-problem-and-workarounds.aspx


Saturday, 8 December 2012

Reporting Services Report – Changing Column Names, Changing Table Names and sql PIVOT

The other day I was creating some SSRS reports.  For each datafeed in an ETL process the rejected rows were being diverted into error tables – a single error table for each feed, eg error_feed1, error_feed2.  As each of the feeds were different, so too were the column names and metadata of the error tables.  To allow users to review and correct these records I needed to build Reporting Services reports on each of the tables.


Initially it looked like I would need a different report for each feed - when you bind an SSRS report object to a dataset, the column names of the dataset have to remain constant or the report will fail; hence one report per error table. As I was dealing with dozens of feeds, the prospect of dozens of very similar reports did not seem favourable.

All these reports would be almost identical; the only differences were the column names and the table names. I was sure there must be an easier way. I googled around and found several useful suggestions. Generally they followed the idea of pivoting the columns into rows in the dataset and then using an SSRS matrix object. E.g. for error_feed1:

Id | Col1 | Col2 | Col3 | Col4
---+------+------+------+-----
1  | W    | X    | Y    | Z

Would become:

ID | measure | value
---+---------+------
1  | Col1    | W
1  | Col2    | X
1  | Col3    | Y
1  | Col4    | Z

Using a matrix you would put the ID column on the rows, the measure column in the cross-tab section, and value in the data section of the matrix object. Now it wouldn't matter if the column names changed, or if columns were added to or removed from the source table: the 3 columns output by the unpivot query would remain, and the matrix report would adapt accordingly. The UNPIVOT command to do the above would look like this:

SELECT ID, measure, value
FROM ( SELECT id, col1, col2, col3, col4 FROM error_feed1 ) p
UNPIVOT ( value FOR measure IN (col1, col2, col3, col4) ) AS unpvt


The columns are now dynamic, which solves half the problem. But the FROM clause uses specific column and table names, meaning this metadata needs to be known in advance and hardcoded into the SSRS dataset query. Which brings us back to our original problem - we cannot hardcode these values because they are constantly changing.
In order to get around this problem I decided to have the dataset be the result of a stored procedure.  I can then have greater flexibility in manipulating the data, so long as I return a result set to SSRS at the end, and always with the same column names returned.
The proc accepts one parameter - the feed name - to be supplied by the user running the report via a standard SSRS drop-down.
The proc itself makes use of the sys.objects and sys.columns catalog views to get the full list of columns for any given table:

SELECT      c.name
FROM        sys.columns c
INNER JOIN  sys.objects o
      ON    c.object_id = o.object_id
WHERE       o.type = 'U'
      AND   o.name LIKE @TableName
ORDER BY    c.column_id

name
------------
id
col1
col2
col3
col4



Once the table name has been supplied (as a parameter in the SSRS report), dynamic sql can be leveraged to query the system tables and use the results to build a string containing the required sql PIVOT command, with all the relevant column names for any given table.
Once the string variable is populated with the sql script it is then executed, returning a result set of just 3 columns; the same 3 columns - ID, measure and value - regardless of the table being queried. The actual code of the proc is below:




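A sketch of what such a procedure could look like, following the description above (the proc name, parameter name and the assumption of an 'id' key column are mine, not the original author's code; it also assumes the unpivoted columns share a compatible type, otherwise a CAST would be needed in the generated sql):

```sql
/* Sketch only: build and run a dynamic UNPIVOT over whichever
   error table the user picked, always returning id, measure, value */
CREATE PROC GetFeedErrors ( @FeedName SYSNAME )
AS
BEGIN
    DECLARE @Cols NVARCHAR(MAX), @Sql NVARCHAR(MAX)

    /* Comma-separated list of every column except the id key,
       read from the catalog views as described above */
    SELECT @Cols = STUFF(( SELECT ', ' + QUOTENAME(c.name)
                           FROM   sys.columns c
                           INNER JOIN sys.objects o
                                 ON  c.object_id = o.object_id
                           WHERE  o.type = 'U'
                             AND  o.name = @FeedName
                             AND  c.name <> 'id'
                           ORDER BY c.column_id
                           FOR XML PATH('') ), 1, 2, '')

    /* Build the UNPIVOT statement and execute it */
    SET @Sql = N'SELECT id, measure, value
                 FROM ( SELECT * FROM ' + QUOTENAME(@FeedName) + N' ) p
                 UNPIVOT ( value FOR measure IN (' + @Cols + N') ) AS unpvt'
    EXEC sp_executesql @Sql
END
```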
This result set is all that the SSRS dataset would ever see, and the column names would always be the same 3 named columns, regardless of the feed selected by the user. Setting up the SSRS matrix object in the manner suggested above would then display the contents of the table as normal - effectively doing a PIVOT to counter the UNPIVOT done in the stored procedure.

We now need only one single SSRS report to display data from any error table the user selects from the feed list in the parameter drop-down - much simpler than dozens of different reports, or dozens of datasets and playing with visibility settings etc.