In Analysis
Services optimal query response times is key to a successful project. The
cube may contain all sorts of clever logic, processing times may be lightning
fast, but if users are having to wait on long running queries each time they
access the SSAS database they will soon tire of using it.
As fact and
dimension tables grow over time, query performance can slow down. There
are many techniques for improving query performance - good partition design for
measure groups, implementing natural hierarchies for dimensions etc.
However, there are also other techniques that often get overlooked.
One such technique is to look at the source data and decide if it is really
necessary to import it into the cube. A smaller cube will generally outperform
a larger one (all other factors being equal).
I recently
worked on an SSAS database with a dimension that had grown to contain over 30
million members at the leaf level. For any given day a subset of these
members would be used - around fifty thousand would actually have data in the
cube, and this was massively impacting query performance time for users –
around 3 minutes were required to complete the query.
On examining
the data returned by the SSAS queries, most of the measure values for these
members was zero. Only a few hundred records per day contained non-zero
values. In the source system relational tables the values were null, but
in the cube they were displayed as zero. As a result, to resolve the cube
queries SSAS was having to read and aggregate all these zero values when in actual
fact they were of no interest to the users. If these zeros could be
removed we could eliminate a huge chunk of data from the cube and reduce the
overhead of processing, reading and returning all that unnecessary data.
It may seem
odd that the cube would convert null values in the source tables into zero's in
the cube, but the key to this behaviour lies in the NullProcessing property.
As a simple example, let’s first create some dimension and fact data:
CREATE TABLE
dimCustomer (custId
int, CustName VARCHAR(25), CustCountry VARCHAR(25))
CREATE TABLE dimDate (DateId INT, CalendarDate DATE )
CREATE TABLE factMeasures (dimCustId INT, dimDateId INT, Sales INT, )
INSERT INTO dimCustomer
SELECT 1, 'Dave', 'England' UNION ALL
SELECT 2, 'Bob', 'Scotland' UNION ALL
SELECT 3, 'Jenny', 'Ireland' UNION ALL
SELECT 4, 'Jill', 'Scotland' UNION ALL
SELECT 5, 'Donny', 'Ireland'
DELCARE
@t SMALLDATETIME=
GETDATE()
INSERT INTO dimDate
SELECT CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT),CAST(@t-1 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT),CAST(@t-2 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT),CAST(@t-3 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-4, 112) AS INT),CAST(@t-4 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-5, 112) AS INT),CAST(@t-5 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-6, 112) AS INT),CAST(@t-6 AS DATE) UNION ALL
SELECT CAST(CONVERT( VARCHAR(12), @t-7, 112) AS INT),CAST(@t-7 AS DATE)
INSERT INTO factMeasures
SELECT 1, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), 10 UNION ALL
SELECT 1, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), 20 UNION ALL
SELECT 1, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), 20 UNION ALL
SELECT 2, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), 30 UNION ALL
SELECT 2, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), 50 UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), NULL UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), NULL UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), NULL UNION ALL
SELECT 3, CAST(CONVERT( VARCHAR(12), @t-4, 112) AS INT), NULL UNION ALL
SELECT 4, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), 10 UNION ALL
SELECT 4, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), NULL UNION ALL
SELECT 4, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-1, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-2, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-3, 112) AS INT), NULL UNION ALL
SELECT 5, CAST(CONVERT( VARCHAR(12), @t-4, 112) AS INT), NULL
SELECT * FROM factMeasures
As the results show, we now have a simple schema with a fact
table containing Sales values. Note that
for CustID 5, Donny, all values are null.
Next let’s build a simple SQL Server Analysis Server 2012
cube on top of this schema, using SQL Server Data Tools (SSDT). The cube design looks like this:
After deploying and processing the cube, we’re now ready to
browse the data by customer and date:
Note that Donny is listed with 4 days of data, with a sales
value of zero. The source system showed
nulls, but the cube has converted that to zeros, in effect creating values where none existed. In our
example we would prefer that these cells in the cube are empty. There may be a need to have the
record in the source system, e.g. while this measure column for this row in the source table is null, there may be an additional measure column for which
there are valid figures for Donny. However, at present a NON EMPTY query will return these zeros, when we would expect Donny not to be listed at all.
To change this behaviour to that desired we need to examine
the NullProcessing property for the
measure. In SSDT open the Cube and go to the Cube Structure tab. In the measures list on the left select the
relevant measure and open the properties window (F4). At the bottom of the list of properties
expand the Source property:
By default the NullProcessing is set to Automatic, which rather ambiguously means the Analysis Services server
decides what to do. Changing the NullProcessing setting to Preserve, will preserve
the nulls from the source data. Using the
above example, setting this to Preserve and reprocessing produces the following
results for the same query as before:
Donny no longer appears in the list, and the results have
reduced from 48 cells to 18. Although
not apparent on such a small dataset, in the original example of 50k records
taking 3 mins to return, applying the NullProcessing changes reduced query time
to less than 5 seconds to return only a few hundred rows.
This is awesome!
ReplyDelete