Saturday 25 April 2015

SSAS 2012 Remove Zeros Using NullProcessing and Improve Query Times

In Analysis Services, optimal query response times are key to a successful project.  The cube may contain all sorts of clever logic, and processing times may be lightning fast, but if users have to wait on long-running queries each time they access the SSAS database they will soon tire of using it.


As fact and dimension tables grow over time, query performance can slow down.  There are many techniques for improving query performance - good partition design for measure groups, implementing natural hierarchies for dimensions etc.  However, there are also other techniques that often get overlooked.  One such technique is to look at the source data and decide if it is really necessary to import it into the cube. A smaller cube will generally outperform a larger one (all other factors being equal).

I recently worked on an SSAS database with a dimension that had grown to over 30 million members at the leaf level.   For any given day only a subset of these members was used - around fifty thousand would actually have data in the cube - and this was massively impacting query performance for users, with a typical query taking around 3 minutes to complete.

On examining the data returned by the SSAS queries, most of the measure values for these members were zero.  Only a few hundred records per day contained non-zero values.  In the source system's relational tables the values were null, but in the cube they were displayed as zero.  As a result, to resolve the cube queries SSAS was having to read and aggregate all these zero values when in actual fact they were of no interest to the users.   If these zeros could be removed we could eliminate a huge chunk of data from the cube and reduce the overhead of processing, reading and returning all that unnecessary data.
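Before changing anything in the cube, it is worth quantifying the problem at source.  A quick check along these lines (the table and column names here are illustrative - substitute your own fact table) shows what proportion of fact rows carry no useful measure value and are therefore candidates for elimination:

```sql
-- Illustrative check: how many fact rows carry no useful measure value?
-- factMeasures / Sales are placeholders for your own fact table and measure.
SELECT
    COUNT(*)                                       AS TotalRows,
    SUM(CASE WHEN Sales IS NULL THEN 1 ELSE 0 END) AS NullRows,
    SUM(CASE WHEN Sales = 0     THEN 1 ELSE 0 END) AS ZeroRows
FROM factMeasures
```

If the null and zero rows dominate, as they did in my case, the effort of tackling them is well worth it.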

It may seem odd that the cube would convert null values in the source tables into zeros in the cube, but the key to this behaviour lies in the NullProcessing property.  As a simple example, let’s first create some dimension and fact data:

CREATE TABLE dimCustomer (CustId INT, CustName VARCHAR(25), CustCountry VARCHAR(25))
CREATE TABLE dimDate (DateId INT, CalendarDate DATE)
CREATE TABLE factMeasures (dimCustId INT, dimDateId INT, Sales INT)

INSERT INTO dimCustomer
SELECT 1, 'Dave', 'England' UNION ALL
SELECT 2, 'Bob', 'Scotland' UNION ALL
SELECT 3, 'Jenny', 'Ireland' UNION ALL
SELECT 4, 'Jill', 'Scotland' UNION ALL
SELECT 5, 'Donny', 'Ireland'


DECLARE @t SMALLDATETIME = GETDATE()
INSERT INTO dimDate
SELECT CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT),CAST(@t-1 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT),CAST(@t-2 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT),CAST(@t-3 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-4, 112) AS INT),CAST(@t-4 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-5, 112) AS INT),CAST(@t-5 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-6, 112) AS INT),CAST(@t-6 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-7, 112) AS INT),CAST(@t-7 AS DATE)

INSERT INTO factMeasures
SELECT 1, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), 10 UNION ALL
SELECT 1, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), 20 UNION ALL
SELECT 1, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), 20 UNION ALL
SELECT 2, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), 30 UNION ALL
SELECT 2, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), 50 UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), NULL UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), NULL UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), NULL UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-4, 112) AS INT), NULL UNION ALL
SELECT 4, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), 10 UNION ALL
SELECT 4, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), NULL UNION ALL
SELECT 4, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-4, 112) AS INT), NULL

SELECT * FROM factMeasures




As the results show, we now have a simple schema with a fact table containing Sales values.  Note that for CustID 5, Donny, all values are null.

Next let’s build a simple SQL Server Analysis Services 2012 cube on top of this schema, using SQL Server Data Tools (SSDT).  The cube design looks like this:



After deploying and processing the cube, we’re now ready to browse the data by customer and date:



Note that Donny is listed with 4 days of data, each with a sales value of zero.  The source system showed nulls, but the cube has converted them to zeros, in effect creating values where none existed.  In our example we would prefer that these cells in the cube are empty.  There may still be a need for the record in the source system: while this measure column is null for these rows, there may be an additional measure column that does hold valid figures for Donny.  However, at present a NON EMPTY query will return these zeros, when we would expect Donny not to be listed at all.
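To illustrate, an MDX query of roughly this shape (the cube and attribute names are assumptions based on the sample schema above) still returns rows for Donny, because a zero cell is not an empty cell:

```mdx
SELECT
    NON EMPTY { [Measures].[Sales] } ON COLUMNS,
    NON EMPTY [Dim Customer].[Cust Name].[Cust Name].MEMBERS
            * [Dim Date].[Calendar Date].[Calendar Date].MEMBERS ON ROWS
FROM [Sales Cube]
```

NON EMPTY removes only tuples where every cell is empty (null); it does nothing about zeros, which is exactly why these phantom values are so troublesome.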

To change this behaviour we need to examine the NullProcessing property of the measure.  In SSDT open the cube and go to the Cube Structure tab.  In the measures list on the left select the relevant measure and open the Properties window (F4).  At the bottom of the list of properties expand the Source property:



By default NullProcessing is set to Automatic, which rather ambiguously means the Analysis Services server decides what to do - for a numeric measure this converts nulls to zero.  Changing the NullProcessing setting to Preserve keeps the nulls from the source data.  Using the above example, setting this to Preserve and reprocessing produces the following results for the same query as before:
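Behind the scenes, SSDT records this setting in the measure's ASSL definition inside the .cube file.  The fragment looks something like the following - the exact element ordering and the binding IDs are a sketch based on the sample schema, and will vary in your own project:

```xml
<Measure>
  <ID>Sales</ID>
  <Name>Sales</Name>
  <Source>
    <DataType>Integer</DataType>
    <NullProcessing>Preserve</NullProcessing>
    <Source xsi:type="ColumnBinding">
      <TableID>dbo_factMeasures</TableID>
      <ColumnID>Sales</ColumnID>
    </Source>
  </Source>
</Measure>
```

This is useful to know if you manage cube definitions in source control, or script changes across many measures rather than clicking through the properties window for each one.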





Donny no longer appears in the list, and the result set has shrunk from 48 cells to 18.  Although not apparent on such a small dataset, in the original example - where 50k records took 3 minutes to return - applying the NullProcessing change reduced query time to less than 5 seconds to return only a few hundred rows.
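Where the all-null rows are not needed in the cube at all, a complementary option (not part of the walkthrough above) is to filter them out before they ever reach Analysis Services, for example with a named query in the Data Source View:

```sql
-- Named query for the DSV: keep only fact rows with a real measure value,
-- so the null rows are never processed into the measure group at all.
SELECT dimCustId, dimDateId, Sales
FROM factMeasures
WHERE Sales IS NOT NULL
```

Preserve stops nulls becoming zeros; filtering at the DSV goes a step further and keeps those rows out of the measure group entirely, shrinking processing time as well as queries.  Which approach fits depends on whether other measure columns on the same rows carry values you still need.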