Monday 14 December 2015

SSAS - How to Terminate Long Running Queries

Despite how well you may design your SSAS database or how well you educate your users, in my experience eventually someone will issue queries that take way too much time to complete.  It's usually down to a user forgetting to apply a filter somewhere.  They'll likely be using a front end tool like Excel that allows them to easily drag and drop measures and attributes without too much thought, and may "accidentally" query data for all time rather than just the desired dates. If the cube is partitioned by date (eg by day or month) and contains many years of data then this query could be scanning every single partition in the measure group, leaving the user to go get a coffee while waiting for the results.

(Incidentally, if a trace reveals that more partitions than expected are being scanned, then there may be an issue with the Slice property.)

Different front end applications will handle this waiting time in different ways, but many will simply appear to hang or freeze.  Cue a call from an agitated user: "Every time I run my query the application dies".  This is obviously unacceptable, not to mention the less visible drain on resources it may be causing.  But what steps can we take to resolve this?

While it's important to ensure that users are aware of certain front-end best practices of querying the cube (eg apply filters before nesting dimensions), there are a number of approaches we can take to quickly resolve the problem and stop the applications hanging.

1. Identify and Kill the Query
Just like the SQL Server relational engine, SSAS has DMVs that provide insights into server activity, including session and connection information.  Run the following DMV query from an MDX query window:

SELECT SESSION_SPID,
       SESSION_USER_NAME,
       SESSION_CURRENT_DATABASE,
       SESSION_LAST_COMMAND,
       SESSION_CPU_TIME_MS,
       SESSION_ELAPSED_TIME_MS
FROM   $system.discover_sessions
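
On a busy server the resulting list can be long.  The DMV query syntax supports simple WHERE filters, so a variation like the following (just a sketch - the five minute threshold is arbitrary) narrows the results down to the long running sessions only:

SELECT SESSION_SPID,
       SESSION_USER_NAME,
       SESSION_LAST_COMMAND,
       SESSION_ELAPSED_TIME_MS
FROM   $system.discover_sessions
WHERE  SESSION_ELAPSED_TIME_MS > 300000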

From these results we can identify which session is the troublemaker.  Once identified, we can investigate further and, if necessary, manually kill the session.  Killing the session can be done with the following XMLA command:
<Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
       <ConnectionID>[CID]</ConnectionID>
       <SessionID>[SID]</SessionID>
       <SPID>[SPID]</SPID>
</Cancel>

Note: this command can be used to kill the connection, the session or the SPID - delete as required.

Executing the above command will likely cause an error to be thrown in the user's application and in some cases may require them to restart the program.  However, in most circumstances this is preferable to leaving the application hanging indefinitely.

2. Dynamically Kill Long Running Queries With Scheduled Jobs
While the above method is fine if a manual intervention is required, a much slicker approach is to automate the process.  It would be simple enough to create a SQL Agent job or SSIS package to scan the DMVs and detect any queries running for longer than a predefined time.  The SPIDs could then be passed dynamically to the XMLA command and killed.  An advantage of this approach is that it allows a great deal of flexibility - for example, if certain users, certain times of year (eg month-end) or certain databases are known to require unavoidably long running queries, then the cut-off time for killing those sessions could vary based on those parameters.
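
As a rough sketch of what that might look like, the T-SQL below assumes a linked server named [SSAS] has been created against the Analysis Services instance (MSOLAP provider, with RPC Out enabled), so that we can both read the DMV through OPENQUERY and send the XMLA Cancel command back to the server.  The ten minute threshold and the linked server name are placeholders - adjust to suit:

-- Sketch only: cancel any SSAS session running for longer than 10 minutes
-- Assumes linked server [SSAS] pointing at the Analysis Services instance (RPC Out enabled)
DECLARE @spid NVARCHAR(20), @xmla NVARCHAR(MAX);

DECLARE spid_cursor CURSOR FAST_FORWARD FOR
    SELECT CAST(SESSION_SPID AS NVARCHAR(20))
    FROM   OPENQUERY([SSAS],
           'SELECT SESSION_SPID, SESSION_ELAPSED_TIME_MS FROM $system.discover_sessions')
    WHERE  SESSION_ELAPSED_TIME_MS > 600000;   -- 10 minutes, in milliseconds

OPEN spid_cursor;
FETCH NEXT FROM spid_cursor INTO @spid;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build the XMLA Cancel command for this SPID and execute it against the linked server
    SET @xmla = N'<Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">'
              + N'<SPID>' + @spid + N'</SPID></Cancel>';
    EXEC (@xmla) AT [SSAS];

    FETCH NEXT FROM spid_cursor INTO @spid;
END

CLOSE spid_cursor;
DEALLOCATE spid_cursor;

Wrapped in a SQL Agent job that runs every minute or two, this gives the dynamic behaviour described above, and the threshold could easily be driven from a parameter table to vary by user, database or time of year.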

3. Dynamically Kill Long Running Queries using SSAS Server Properties
By far the most common approach for handling this problem is to use the SSAS server properties.  However, I have seen many occasions in which these properties are misunderstood, which is actually what inspired this post.  These properties can be accessed either by right-clicking on the SSAS server in SSMS and selecting Properties (tick the Advanced box at the bottom), or directly in the server configuration file msmdsrv.ini:



ExternalCommandTimeout



I have often seen this setting used to try to handle this particular problem; however, that is a common mistake.  This setting relates to OLAP processing, not query performance.  If a processing query, ie a query sent from SSAS to a source system in order to process dimensions or partitions, has not completed its execution on the source system within the threshold set by this property, then the processing command will be forced to fail.

ServerTimeout


This is the property, towards the end of the Advanced Properties list, that is relevant for us.  Any MDX query against any OLAP database on the server that does not complete within the time set by this property (specified in seconds) will be terminated.

There's no flexibility with ServerTimeout - one setting affects all MDX queries on the server regardless of origin, destination or anything else.  99% of the time this is a sufficient and effective solution for terminating long running queries with minimal fuss.
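
For reference, both of the properties discussed above live in msmdsrv.ini as simple elements whose values are expressed in seconds (3600, ie one hour, is the usual default).  A rough sketch of the relevant fragment - take a backup and check the layout on your own instance before editing the file directly:

<!-- fragment of msmdsrv.ini - values are in seconds -->
<ConfigurationSettings>
  ...
  <ExternalCommandTimeout>3600</ExternalCommandTimeout>
  ...
  <ServerTimeout>3600</ServerTimeout>
  ...
</ConfigurationSettings>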


Wednesday 16 September 2015

SSAS - Slow Running Query? Unnecessary Partition Scans? Solve it with a Slice!

While there are many possible reasons why an MDX query of an OLAP cube may not perform as well as expected - the cube dimension hierarchies may not be optimised for best use, the MDX query itself may require optimisation etc - an often overlooked area is the measure group partition definitions and the partition Slice property.


 In a typical SSAS project I would normally design an "initial" partition inside SSDT (formerly BIDS):


I would deploy this solution and set up a scheduled job to create additional partitions when necessary, depending on the partition strategy (eg every night/week/month etc).  The Create Partition XMLA script might look something like this:
  
<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<ParentObject>
<DatabaseID>MyDb</DatabaseID>
<CubeID>MyDb</CubeID>
<MeasureGroupID>MyMeasureGroup</MeasureGroupID>
</ParentObject>
<ObjectDefinition>
<Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200"
xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300"
xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300"
xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400"
xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400">
<ID>20150102</ID>
<Name>20150102</Name>
<Source xsi:type="QueryBinding">
<DataSourceID>MyDs</DataSourceID>
<QueryDefinition>SELECT * FROM MyTable WHERE MyDate = 20150102</QueryDefinition>
</Source>
<StorageMode>Molap</StorageMode>
<ProcessingMode>Regular</ProcessingMode>
<ProactiveCaching>
<SilenceInterval>-PT1S</SilenceInterval>
<Latency>-PT1S</Latency>
<SilenceOverrideInterval>-PT1S</SilenceOverrideInterval>
<ForceRebuildInterval>-PT1S</ForceRebuildInterval>
<Source xsi:type="ProactiveCachingInheritedBinding" />
</ProactiveCaching>
<EstimatedRows>5000000</EstimatedRows>
<AggregationDesignID>AggregationDesign</AggregationDesignID>
</Partition>
</ObjectDefinition>
</Create>

Once deployed and processed, this solution may work well. Users find the cube responsive and fast.  Over time, however, users start to complain that the same daily query/report that used to run almost instantaneously is now painfully slow.  No design changes have been deployed, no hardware modifications have been made - nothing has been altered.  So how is it that the same query now takes significantly longer?

An often overlooked area is the slice setting for partitions.  As an example of what can happen, let's look at the AdventureWorks cube from the SQL 2012 samples (available here http://msftdbprodsamples.codeplex.com/releases/view/55330).

A good starting point for debugging queries is often to run a SQL Profiler trace on the SSAS database and see what's happening when this query is executed.  Let's take a simple query:

SELECT [Measures].[Internet Sales Amount] ON 0,
       [Geography].[City].Members ON 1
FROM   [Adventure Works]
WHERE  [Date].[Calendar].[Date].&[20080607]

This will return the "Internet Sales Amount" measure by City on 7th June 2008.  If we use the Object Browser in Management Studio to examine the Internet Sales measure group we see this:


  
The measure group contains four partitions, one for each year.  Therefore we would expect our query above to hit only a single partition, Internet_Sales_2008.  Let's confirm by viewing the SQL Profiler trace results:


The trace shows that SSAS is actually scanning three partitions, not one - 2005, 2006 and 2008.  But we know the result set lies only in 2008, so why the extra scanning?  We may want to confirm that the results can only be in the 2008 partition by examining the partition queries - the SQL queries used as the source for the partition data.  In this case they have been checked and all are set up with the correct queries and WHERE clauses.

The reason for the extra partition scans is down to the Slice property of the partition not being set.  The slice property is used by SSAS to determine which partition(s) contains the desired data.  If SSAS knows up front where the data lies it can avoid wasting time and resources scanning partitions that will not return any data. Let's take a look at the properties of one of the partitions in SSMS:


We can see here that the Slice property is blank.  When left blank, SSAS uses its own default method of determining which partitions may contain the data.  While this may sometimes be sufficient, the trace above shows it is hardly foolproof.  Far better to set this property explicitly to the value we know to be valid, enabling the SSAS engine to be directed straight to the partition(s) containing the results.

Valid values for a slice property are an MDX member, set or tuple.  In this example the slices should be set to the relevant year, eg:

[Date].[Calendar Year].&[2008]

If we now set this for each of the partitions and rerun the same query we see the following in the trace file:


The engine is going directly to the correct partition and ignoring all others.  The result is a much faster running query.  I have seen examples where there were 100+ partitions, all without the slice set.  The result was that queries were taking up to 30 seconds to complete.  On setting the slices appropriately, query time reduced to around 1 second.
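
The same principle applies at other partition grains: a daily partition, for example, would take the corresponding date member as its slice, or a range set if a single partition covers several days (the member keys below are purely illustrative):

[Date].[Calendar].[Date].&[20080607]

{ [Date].[Calendar].[Date].&[20080601] : [Date].[Calendar].[Date].&[20080630] }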

Note that we can set the slice either via properties in SSDT at design time, in SSMS properties post deployment, or better yet, in the original XMLA we used at the start of this post to automatically create the partitions on a schedule:

…<StorageMode>Molap</StorageMode>
<ProcessingMode>Regular</ProcessingMode>
<Slice>[Date].[Calendar Year].&amp;[2008]</Slice>
<ProactiveCaching>…

So the lesson is: Always set the Slice property :)


Sunday 14 June 2015

SQL Server Triggers – Which fields have been updated? – Part 2

In the last post (SQL Server Triggers – Which fields have been updated? – Part 1) we explored various methods to determine which columns in a table were being updated.  All those methods relied on a SQL trigger being fired.  Once fired, if a specific column in the table was updated we could carry out an appropriate action, such as inserting an audit record into another table.  However, what if the UPDATE statement was executed against the table, but the actual values of the fields were not changed?  Consider the following example:

CREATE TABLE MyTable (Col1 INT IDENTITY(1,1), Col2 varchar(10), Col3 INT)
GO

CREATE TRIGGER dbo.tr_MyTable ON dbo.MyTable  AFTER UPDATE
AS
BEGIN
       PRINT 'An Update has occurred'
END
GO

INSERT INTO MyTable
SELECT 'a',7 UNION ALL
SELECT 'b',8 UNION ALL
SELECT 'c',9 UNION ALL
SELECT 'd',10
GO

UPDATE MyTable
SET Col3 = 10 WHERE Col2 = 'd'

The UPDATE statement above has overwritten Col3 with the same value as it had before.  The table data has not actually been changed.  Nonetheless, because an UPDATE statement has been executed, the AFTER UPDATE trigger fires and outputs the print statement.  In fact, even if the UPDATE statement had not affected a single row, the trigger would still have been fired, outputting the print statement.
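
To see this in action, run a statement whose WHERE clause matches no rows at all – the trigger still fires and prints its message, because AFTER triggers fire once per statement, not once per affected row:

UPDATE MyTable
SET Col3 = 99 WHERE Col2 = 'z'   -- no rows match, yet the trigger still fires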

This is often not the desired behaviour – I don’t want to be inserting audit records if nothing has actually changed.  Depending on what logic has been implemented in the trigger and how often it is fired, this could have an adverse impact on performance.  Therefore the first improvement we can make is to ensure nothing happens if no records have been updated.  To do this we use the @@ROWCOUNT global variable (for more on this variable see my post on SQL Server Global Variables):

ALTER TRIGGER dbo.tr_MyTable ON dbo.MyTable  AFTER UPDATE
AS
BEGIN
       IF @@ROWCOUNT = 0 RETURN;

       PRINT 'An Update has occurred'
       --Perform UPDATE logic here
END
GO

This trigger will now immediately return, doing nothing, if no records have been updated.

Next we need to ensure we only take action if field values have actually changed.  To do this we can make use of the special inserted and deleted tables.  These are temporary tables created and managed automatically by SQL Server.  While we cannot perform any DDL operations on these tables, they contain columns identical to those of the table the trigger is created on, and we can refer directly to these columns in the code.  When a DML trigger fires, the inserted table is populated with the new versions of the affected rows and the deleted table with the old versions.  So when records are updated, the original pre-update record arrives in the deleted table and the post-update record in the inserted table.  Therefore, if we want to know whether previous values have actually been replaced with new, different values, we can compare the contents of these two tables:

ALTER TRIGGER dbo.tr_MyTable ON dbo.MyTable AFTER UPDATE
AS
BEGIN
       IF @@ROWCOUNT = 0 RETURN;

       IF EXISTS ( SELECT *
                   FROM   inserted i
                   INNER JOIN deleted d
                       ON i.Col1 = d.Col1
                   WHERE  d.Col2 <> i.Col2
                      OR  d.Col3 <> i.Col3 )
       BEGIN
              PRINT 'An Update has occurred'
       END
END

In the above trigger, inside the IF EXISTS check, we join the inserted and deleted tables together on Col1, the identity column.  Because these tables are effectively the “before” and “after” state of the data, we are only interested in rows where the values in the remaining fields do not match, indicating that data has changed in those fields.  This logic is carried out by the WHERE clause.  Only if such rows exist do we then perform the desired UPDATE logic, in this case the PRINT statement.


The above trigger will now only do anything if data has actually changed in the underlying table.  This has the potential to save a lot of time and unnecessary processing on those occasions when an UPDATE has occurred but no overall change has actually taken place.
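
One caveat with the comparison logic above: the <> operators will not detect a change to or from NULL, because a comparison involving NULL never evaluates to true.  If the columns are nullable, a NULL-safe alternative (just a sketch, using the same example table) is to compare the two row sets with EXCEPT, which treats NULLs as equal:

IF EXISTS ( SELECT i.Col1, i.Col2, i.Col3 FROM inserted i
            EXCEPT
            SELECT d.Col1, d.Col2, d.Col3 FROM deleted d )
BEGIN
       PRINT 'An Update has occurred'
END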

Monday 8 June 2015

SQL Server Triggers – Which fields have been updated? – Part 1


When updating a table it can often be very useful to know not just that an update has taken place, but exactly which columns have been updated. For example, in an HR department you may have an employee table that contains employee names and bank account numbers.  If the bank account number for an employee changes, whether through a front end application, a DML query or any other method, it should be audited – by which we mean a corresponding record should be inserted into an audit table.  Whereas if an employee’s middle name is changed, no audit is required.

Below we’ll look at a number of ways to test for specific column updates, and in Part 2 we’ll look at determining whether the values have actually changed or not – they could have been updated with the same value as before.


UPDATE()

One way of determining which column has been updated is by using an AFTER UPDATE trigger.  Within this trigger we can use the UPDATE() function to detect whether a given column has been updated and apply the audit logic accordingly:

CREATE TRIGGER tr_Employee ON Employee AFTER UPDATE
AS
       IF UPDATE(BankAccountNo)
       BEGIN
              INSERT INTO EmployeeAudit (Column1, Column2, ..., ModifiedBy, ModifiedDate)
              SELECT inserted.Column1, inserted.Column2, ..., USER_NAME(), GETDATE()
              FROM inserted
       END
  
But what happens if we have additional columns, some requiring audit (such as SortCode, NationalInsuranceNo etc), and some not (eg marital status)?  Or perhaps the non-sensitive columns don’t require audit but do require some other logic, eg an additional table to be amended in line with this one.  We could append multiple IF UPDATE() statements:

IF UPDATE(BankAccountNo)
BEGIN
       --apply audit logic
END

IF UPDATE (MaritalStatus)
BEGIN
       --apply other logic
END
--next IF…

Or we could combine the groups of logic together into fewer statements:

IF (UPDATE(BankAccountNo) OR UPDATE(SortCode))
BEGIN
       --apply audit logic
END

IF (UPDATE(MaritalStatus) OR UPDATE(MiddleName))
BEGIN
       --apply other logic
END

COLUMNS_UPDATED()

If you like working with binary and bitwise logic, COLUMNS_UPDATED() can be used to achieve similar results with much less code. COLUMNS_UPDATED() returns a varbinary bit pattern indicating which columns in a table have been updated. From BOL:

COLUMNS_UPDATED returns one or more bytes that are ordered from left to right, with the least significant bit in each byte being the rightmost. The rightmost bit of the leftmost byte represents the first column in the table; the next bit to the left represents the second column, and so on. COLUMNS_UPDATED returns multiple bytes if the table on which the trigger is created contains more than eight columns, with the least significant byte being the leftmost.

So for a simple four column table, when COLUMNS_UPDATED() is used in the trigger it may return a binary value of 1010.  This would indicate that columns two and four (as listed in syscolumns) have been updated, but one and three have not.  We can therefore use the value returned by this function in our trigger to determine whether a specific combination of columns in our table has been updated.  However, in order to do this we first need to convert the binary value into decimal.

As each column (right to left) represents an increasing power of 2 (ie the rightmost column is 2^0, the next to the left is 2^1, then 2^2, and so on), a binary value of 1010 can be translated to decimal as follows:

Column Name             Col4   Col3   Col2   Col1
Column Position         4      3      2      1
COLUMNS_UPDATED()       1      0      1      0
Bin to dec conversion   2^3    2^2    2^1    2^0
Decimal value           8      0      2      0


Binary 1010 = (0 x 2^0) + (1 x 2^1) + (0 x 2^2) + (1 x 2^3) = 0 + 2 + 0 + 8 = 10

When working with binary we need to use the SQL Server bitwise operators, in this case ‘&’ (bitwise AND), to compare two values, or we may get unexpected results.  The ‘&’ operator looks at each bit (that is, each 1/0) in the first binary number and compares it to the corresponding bit in the second number.  If a bit in the first number is 1 AND the corresponding bit in the second number is 1, then that bit of the output is 1; if either bit is 0 then the output bit is 0.  It then moves on to the next bit in both numbers and repeats.  The output number can then be converted back to decimal.  In the first table below, the binary number 1010 is ‘AND-ed’ with itself and the result is shown. In the second table it is ‘AND-ed’ with a different number.



                  Col4   Col3   Col2   Col1   Decimal
Binary Value 1    1      0      1      0      10
Binary Value 2    1      0      1      0      10
Output of ‘AND’   1      0      1      0      10

                  Col4   Col3   Col2   Col1   Decimal
Binary Value 1    1      0      1      0      10
Binary Value 2    0      1      1      0      6
Output of ‘AND’   0      0      1      0      2

The above tables show that, using the bitwise AND operator, 10 & 10 = 10 and 10 & 6 = 2.  Therefore we can say that in our table, if COLUMNS_UPDATED() & 10 = 10, we know that columns two and four have been updated.
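
These results are easy to verify directly in T-SQL using the bitwise AND operator:

SELECT 10 & 10 AS [10 AND 10],   -- returns 10
       10 & 6  AS [10 AND 6]     -- returns 2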

The resulting trigger for this example would look like this:

CREATE TRIGGER tr_Employee ON Employee AFTER UPDATE
AS
IF (COLUMNS_UPDATED() & 10 = 10)
BEGIN
       --apply audit logic
END

Note that the IF statement above will evaluate to TRUE whenever columns two and four are updated, even if other columns have been updated as well – not only when columns two and four alone are updated.
The advantage of this approach is that for a very wide table the number of IF statements, or the number of conditions within them, could be huge and time consuming to write, whereas COLUMNS_UPDATED() reduces the check to a single short line.  However, using this function means the code is heavily dependent on the order of the columns in the table.  If the table is dropped and recreated with the columns reordered, the trigger may no longer function as expected.  In addition, while concise, this code is more difficult to understand and debug.  Therefore I would recommend the earlier approaches unless it is absolutely necessary.
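
If you do go down this route, it is worth double checking which bit corresponds to which column before hard-coding any bitmask.  A quick way to list the column ordinals (a sketch – substitute your own table name) is to query sys.columns:

SELECT name, column_id
FROM   sys.columns
WHERE  object_id = OBJECT_ID('dbo.Employee')
ORDER BY column_id;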

In Part 2  we’ll look at how to analyse the records that have been updated and determine if any update has actually been made at all – an Update trigger can be fired even if no values have actually changed.