SQL Banana: XMLA

Showing posts with label XMLA. Show all posts

Monday, 14 December 2015

SSAS - How to Terminate Long Running Queries

Despite how well you may design your SSAS database or how well you educate your users, in my experience eventually someone will issue queries that take way too much time to complete. It's usually down to a user forgetting to apply a filter somewhere. They'll likely be using a front end tool like Excel that allows them to easily drag and drop measures and attributes without too much thought, and may "accidentally" query data for all time rather than just the desired dates. If the cube is partitioned by date (eg by day or month) and contains many years of data then this query could be scanning every single partition in the measure group, leaving the user to go get a coffee while waiting for the results.

(Incidentally, if a trace reveals that more partitions than expected are being scanned than there may be an issue with the Slice property)

Different front end applications will handle this waiting time in different ways, but many will simply appear to hang or freeze. Cue a call from an agitated user: "Every time I run my query the application dies". This is obviously unacceptable, not to mention the less visible drain on resources it may be causing. But what steps can we take to resolve this?

While it's important to ensure that users are aware of certain front-end best practices of querying the cube (eg apply filters before nesting dimensions), there are a number of approaches we can take to quickly resolve the problem and stop the applications hanging.

1. Identify and Kill the Query
Just like a SQL relational database, SSAS also has DMVs that provide insights into server activity, including connection information. Run the below MDX query:

SELECT SESSION_SPID,

       SESSION_USER_NAME,

       SESSION_CURRENT_DATABASE,

       SESSION_LAST_COMMAND,

       SESSION_CPU_TIME_MS,

       SESSION_ELAPSED_TIME_MS

FROM   $system.discover_sessions

From these results, we can identify which session_id is the troublemaker. Once identified, we can investigate further, and, if necessary, manually kill the session. Killing the session can be done with the following XMLA command:

<Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">

       <ConnectionID>[CID]</ConnectionID>

       <SessionID>[SID]</SessionID>

       <SPID>[SPID]</SPID>

</Cancel>

Note: this command can be used to kill the connection, the session or the SPID - delete as required.

Executing the above command will likely cause an error to be thrown in the users application and in some cases may require them to restart the program. However in most circumstances this is preferable to leaving the application hanging indefinitely.

2. Dynamically Kill Long Running Queries With Scheduled Jobs

While the above method is fine if a manual intervention is required, a much slicker approach would be to automate the process. It would be simple enough to create a SQL Agent Job or SSIS package to scan the DMVs in order to detect any queries running for longer than a predefined time. The SPIDs could then be passed dynamically to the XMLA command and killed. An advantage of this approach is that it allows a great deal of flexibility - for example if certain users or certain times year (eg month-end), or certain databases are known to demand the use of unavoidably long running queries, then the cut-off time for killing these sessions could vary based on those parameters.

3. Dynamically Kill Long Running Queries using SSAS Server Properties

By far the most common approach for handling this problem is to use the SSAS server properties. However, I have seen many occasions in which these properties are misunderstood, which is actually what inspired this post. These properties can be accessed either by right-clicking on the SSAS server in SSMS and selecting Properties (tick the Advance box at the bottom), or they can be accessed directly in the server file msmdsrv.ini:

ExternalCommandTimeout

Often I have seen this setting used to handle this particular problem, however that is a common mistake. This setting relates to OLAP processing performance, not query performance. If a processing query, ie a query sent from SSAS to a source system in order to process dimensions or partitions, has not completed its execution on the source system by the threshold set by this property, then the cube process command will be forced to fail.

ServerTimeout

This is the property, towards the end of the Advanced Properties list, that is relevant for us. Any MDX query on any OLAP database on the server that does not complete within the time set by this property will be terminated.

There's no flexibility with ServerTimeout - one setting affects all MDX queries on the server regardless of origin, destination or anything else. 99% of the time this is a sufficient and effective solution for terminating long running queries with minimal fuss.

Wednesday, 16 September 2015

SSAS - Slow Running Query? Unnecessary Partition Scans? Solve it with a Slice!

While there are many possible reasons why an MDX query of an OLAP cube may not perform as well as expected - the cube dimension hierarchies may not be optimised for best use, the MDX query itself may require optimisation etc - an often overlooked area is the measure group partition definitions and the partition Slice property.

In a typical SSAS project I would normally design an "initial" partition inside SSDT (formerly BIDS):

I would deploy this solution and create a scheduled job to create additional partitions when necessary, depending on the partition strategy (eg every night/week/month etc). The Create Partition xmla script might look something like this:

<MeasureGroupID>MyMeasureGroup</MeasureGroupID>

</ParentObject>

<Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"

xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"

xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"

xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200"

xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"

xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300"

xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300"

xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400"

xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400">

<QueryDefinition>SELECT * FROM MyTable WHERE MyDate = 20150102</QueryDefinition>

</Source>

<StorageMode>Molap</StorageMode>

<ProcessingMode>Regular</ProcessingMode>

</ProactiveCaching>

<AggregationDesignID>AggregationDesign</AggregationDesignID>

</Partition>

</ObjectDefinition>

</Create>

Once deployed and processed, this solution may work well. Users find the cube responsive and fast. Over time, however, users start to complain that the same daily query/report that used to run almost instantaneously is now painfully slow. No design changes have been deployed, no hardware modifications have been made - nothing has been altered. So how is it that the same query now takes significantly longer?

An often overlooked area is the slice setting for partitions. As an example of what can happen, let's look at the AdventureWorks cube from the SQL 2012 samples (available here http://msftdbprodsamples.codeplex.com/releases/view/55330).

A good starting point for debugging queries is often to run a SQL Profiler trace on the SSAS database and see what's happening when this query is executed. Let's take a simple query:

SELECT [Measures].[Internet Sales Amount] ON 0,
[Geography].[City].Members ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar].[Date].&[20080607]

This will return the "Internet Sales Amount" measure by City on 7th June 2008. If we use the Object Browser in Management Studio to examine the Internet Sales measure group we see this:

The measure group contains four partitions, one for each year. Therefore we would expect that our query above should only hit the one single partition, Internet_Sales_2008. Let's confirm by viewing the SQL Profiler trace results:

The trace shows that SSAS is actually scanning three partitions, not one - 2005, 2006 and 2008. But we know the result set lies only in 2008, so why this extra scanning? We may want to confirm that the results are only going to be in the 2008 partition by examining the partition query - the SQL query used as the source for the partition data. In this case it's been checked and they are all set up with the correct queries and WHERE clauses.

The reason for the extra partition scans is down to the Slice property of the partition not being set. The slice property is used by SSAS to determine which partition(s) contains the desired data. If SSAS knows up front where the data lies it can avoid wasting time and resources scanning partitions that will not return any data. Let's take a look at the properties of one of the partitions in SSMS:

We can see here that the Slice property is blank. When left blank, SSAS uses it's own default method of determining which partitions may contain the data. While this may sometimes be sufficient, the trace above shows it is hardly foolproof. Far better to actually set this property to the value we know to be valid, enabling the SSAS engine to quickly be directed to partition containing the results.

Valid values for a slice property are an MDX member, set or tuple. In this example the slices should be set to the relevant year, eg:

[Date].[Calendar Year].&[2008]

If we now set this for each of the partitions and rerun the same query we see the following in the trace file:

The engine is going directly to the correct partition and ignoring all others. The result is a much faster running query. I have seen examples where there were 100+ partitions, all without the slice set. The result was that queries were taking up to 30 seconds to complete. On setting the slices appropriately, query time reduced to around 1 second.

Note that we can set the slice either via properties in SSDT at design time, in SSMS properties post deployment, or better yet, in the original XMLA we used at the start of this post to automatically create the partitions on a schedule:

…<StorageMode>Molap</StorageMode>
<ProcessingMode>Regular</ProcessingMode>
<Slice>[Date].[Calendar Year].&[2008]</Slice>
<ProactiveCaching>….

So the lesson is: Always set the Slice property :)

Friday, 27 September 2013

SSAS 2008: Clear Cache Command

When testing query perfornance on an SSAS database it's important to know if you are starting from a warm or clear cache. The simplest way to clear the cache is through an XMLA command, run from SSMS:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Object>
<DatabaseID>MyDatabase</DatabaseID>
</Object>
</ClearCache>

This clears the cache for the entire database. You can also clear the cache just for a specific cube:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Object>
<DatabaseID>MyDatabase</DatabaseID>
<CubeID>MyCube</CubeID>
</Object>
</ClearCache>

Thursday, 8 August 2013

SSAS - ProcessAdd explained

Further to my previous post on creating multiple partitions and multiple processing commands in a single xmla script (here and here) I thought I share one more idea around SSAS Processing tasks.

When looking the different types of processing options available in SSAS you tend to get the same list over and over:

ProcessFull,
ProcessDefault,
ProcessData,
ProcessIndexes,
ProcessUpdate,
ProcessIncremental,
ProcessClear,
ProcessAdd

I will not explain here the differences between each of these types as this info is readily available on plenty of other sites (such as http://technet.microsoft.com/en-us/library/ms174774(v=sql.105).aspx).

However if you try to use each of these processing types from management studio all will go well until you try and process a dimension using ProcessAdd. It may sound like a useful option (faster than ProcessUpdate, less impact than ProcessFull etc) but when you try and use it you will find it is not as straightforward as the others. If you right-click the dimension in SSMS and select Process you might think to just select it in the drop down. But take a close look at that drop down - ProcessAdd is not there!

But the documentation says it is definately an option. Script a different processing option to XMLA and change the process type to Add and it may work, but if you havent specified which records to add the results could be misleading. Confusing, right?

In order to use this processing option you have to actually stipulate quite a lot of details. This can be done with an XMLA script, but the process of building this script is quite tricky. Therefore i shall do my best to explain how to do it here :) (Greg Galloway gives a good overview of this here
http://www.artisconsulting.com/blogs/greggalloway/Lists/Posts/Post.aspx?ID=4, and the steps below were based on following his method)

1 - Use Management Studio (SSMS) to script out to XMLA a ProcessUpdate command. (I recommend editting the error settings to ignore/report duplicates before scripting)

2 - In BIDS in the Solution Explorer right click on the dsv (the dsv file that the relevant dimension is build from, if you have more than one) and select View Code. Copy the entire code and paste it into a new window in SSMS to allow us to edit it.

3 - We should now reduce this dsv copy down to only the relevant part. We only actually need the
<
xs:element name for the particular query that is relevant to our dimension. Tip - quick way to find this element would be to search for a unique string within the query that might single it out from the rest! Keep all data and tags within this particular tag and delete everything outside this (tags at the same level), but keep all the parent tags. SSMS should make it easy to expand and collapse the tags in order to determine was tags are above or below, and what ones are at the same level.
4 - Copy what remains and paste it into your original process script from step 1. The code should be copied immediately before the closing Process tag.

5 - Now modify the actual query to limit it to only new data. For example add a where clause that limits the results to be after a certain date.

6 - Voila - you now have your ProcessAdd script. It should look something like this the code below and ready to execute.

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"><
ErrorConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"><
KeyErrorLimit>-1</KeyErrorLimit></
ErrorConfiguration>
<
Parallel><
Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"><
Object><
DatabaseID>MyDB</DatabaseID><
DimensionID>MyDim</DimensionID></
Object><
Type>ProcessAdd</Type><
WriteBackTableCreation>UseExisting</WriteBackTableCreation>
<
DataSourceView xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0" dwd:design-time-name="4be9af80-0ae0-4f8a-936d-2103b944155f" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"><
ID>MyDSV</ID><
Name>MyDSV</Name><
DataSourceID>MyDS</DataSourceID><
Schema><
xs:schema id="MyDS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:msprop="urn:schemas-microsoft-com:xml-msprop"><
xs:element name="MyDS" msdata:IsDataSet="true" msdata:UseCurrentLocale="true" msprop:design-time-name="2557b54c-2a08-4e49-ab45-fdd94dfb6dee"><
xs:complexType><
xs:choice minOccurs="0" maxOccurs="unbounded"><
xs:element name="MY_DIM" msprop:QueryDefinition="SELECT 'HELLO' AS PHRASE FROM MY_TABLE
WHERE DATE_KEY > 20130801"

msprop:DbTableName="MY_DIM" msprop:QueryBuilder="GenericQueryBuilder" msprop:IsLogical="True" msprop:FriendlyName="MY_DIM" msprop:design-time-name="c4b7ad24-101e-4d41-bc04-7ac2b605cd7b" msprop:TableType="View"><
xs:complexType><
xs:sequence><
xs:element name="MY_ATTRIBUTE" msprop:design-time-name="1e60f905-098a-45f2-b5be-c34364bdf328" msprop:DbColumnName="PHRASE" minOccurs="0"><
xs:simpleType><
xs:restriction base="xs:string"><
xs:maxLength value="100" /></
xs:restriction></
xs:simpleType></
xs:element></
xs:sequence></
xs:complexType></
xs:element></
xs:choice></
xs:complexType></
xs:element></
xs:schema><
diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1" /></
Schema></
DataSourceView></
Process></
Parallel>
</
Batch>