Monday 14 December 2015

SSAS - How to Terminate Long Running Queries

Despite how well you may design your SSAS database or how well you educate your users, in my experience eventually someone will issue queries that take way too much time to complete.  It's usually down to a user forgetting to apply a filter somewhere.  They'll likely be using a front end tool like Excel that allows them to easily drag and drop measures and attributes without too much thought, and may "accidentally" query data for all time rather than just the desired dates. If the cube is partitioned by date (eg by day or month) and contains many years of data then this query could be scanning every single partition in the measure group, leaving the user to go get a coffee while waiting for the results.

(Incidentally, if a trace reveals that more partitions than expected are being scanned than there may be an issue with the Slice property)

Different front end applications will handle this waiting time in different ways, but many will simply appear to hang or freeze.  Cue a call from an agitated user: "Every time I run my query the application dies".  This is obviously unacceptable, not to mention the less visible drain on resources it may be causing.  But what steps can we take to resolve this?

While it's important to ensure that users are aware of certain front-end best practices of querying the cube (eg apply filters before nesting dimensions), there are a number of approaches we can take to quickly resolve the problem and stop the applications hanging.

1. Identify and Kill the Query
Just like a SQL relational database, SSAS also has DMVs that provide insights into server activity, including connection information.  Run the below MDX query:

SELECT SESSION_SPID,
       SESSION_USER_NAME,
       SESSION_CURRENT_DATABASE,
       SESSION_LAST_COMMAND,
       SESSION_CPU_TIME_MS,
       SESSION_ELAPSED_TIME_MS
FROM   $system.discover_sessions

From these results, we can identify which session_id is the troublemaker.  Once identified, we can investigate further, and, if necessary, manually kill the session.  Killing the session can be done with the following XMLA command:
<Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
       <ConnectionID>[CID]</ConnectionID>
       <SessionID>[SID]</SessionID>
       <SPID>[SPID]</SPID>
</Cancel>

Note: this command can be used to kill the connection, the session or the SPID - delete as required.

Executing the above command will likely cause an error to be thrown in the users application and in some cases may require them to restart the program.  However in most circumstances this is preferable to leaving the application hanging indefinitely.

2. Dynamically Kill Long Running Queries With Scheduled Jobs
While the above method is fine if a manual intervention is required, a much slicker approach would be to automate the process.  It would be simple enough to create a SQL Agent Job or SSIS package to scan the DMVs in order to detect any queries running for longer than a predefined time.  The SPIDs could then be passed dynamically to the XMLA command and killed.  An advantage of this approach is that it allows a great deal of flexibility - for example if certain users or certain times year (eg month-end), or certain databases are known to demand the use of unavoidably long running queries, then the cut-off time for killing these sessions could vary based on those parameters. 

3. Dynamically Kill Long Running Queries using SSAS Server Properties
By far the most common approach for handling this problem is to use the SSAS server properties.  However, I have seen many occasions in which these properties are misunderstood, which is actually what inspired this post.  These properties can be accessed either by right-clicking on the SSAS server in SSMS and selecting Properties (tick the Advance box at the bottom), or they can be accessed directly in the server file msmdsrv.ini:



ExternalCommandTimeout



Often I have seen this setting used to handle this particular problem, however that is a common mistake.  This setting relates to OLAP processing performance, not query performance.  If a processing query, ie a query sent from SSAS to a source system in order to process dimensions or partitions, has not completed its execution on the source system by the threshold set by this property, then the cube process command will be forced to fail.

ServerTimeout


This is the property, towards the end of the Advanced Properties list, that is relevant for us.  Any MDX query on any OLAP database on the server that does not complete within the time set by this property will be terminated.

There's no flexibility with ServerTimeout - one setting affects all MDX queries on the server regardless of origin, destination or anything else.  99% of the time this is a sufficient and effective solution for terminating long running queries with minimal fuss.