Monday 2 December 2013

SQL Server Global Variables: @@Rowcount, @@Error and more

There are lots of SQL Server global variables (prefixed with @@), but I thought I'd list a few that I have found particularly useful in the past.  I often find myself using them to gather data when I first go on to a client site:

@@ROWCOUNT

Stores the number of rows affected by the last command.  I find myself using this variable all the time for logging.  For example, if I have a stored procedure inserting/updating/deleting data from a table, I use this variable to store the results of each of those commands in a log table.  This makes debugging much easier, and any spikes or drops in record counts can be captured and easily viewed.  A word of warning though - this variable only stores the rowcount for the last command.  If you find that you are not storing the correct number, it is likely that another line of code is executing between the statement of interest and the point where you read this variable.
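
As a minimal sketch of the pattern (the table and column names here are purely illustrative, not from any real system):

-- Illustrative log table - adjust names and types to suit your own schema
CREATE TABLE dbo.ProcessLog
(
    LogDate      DATETIME     NOT NULL DEFAULT GETDATE(),
    StepName     VARCHAR(100) NOT NULL,
    RowsAffected INT          NULL
)
GO

DECLARE @RowsAffected INT

UPDATE dbo.Customer
SET IsActive = 0
WHERE LastOrderDate < '20120101'

-- Capture @@ROWCOUNT immediately - any intervening statement resets it
SET @RowsAffected = @@ROWCOUNT

INSERT INTO dbo.ProcessLog (StepName, RowsAffected)
VALUES ('Deactivate lapsed customers', @RowsAffected)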


@@ERROR

Stores the error code for the immediately previous command.  I use this most often in conjunction with the @@ROWCOUNT variable, and store them together for logging/debugging purposes.  Note that if you want to log both of them for the same line of code, you need to make sure you capture them both in a single command. If you don't, then whichever variable is read second will contain the result of reading the first variable, and not the actual insert/update/delete that is of concern.
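
A quick sketch of the right and wrong ways to capture the pair (the DELETE and table name are just for illustration):

DECLARE @RowsAffected INT, @ErrorCode INT

DELETE FROM dbo.StagingOrders
WHERE LoadDate < '20130101'

-- Correct: a single statement reads both, so neither read disturbs the other
SELECT @RowsAffected = @@ROWCOUNT, @ErrorCode = @@ERROR

-- Wrong: the first SET is itself a command, so by the time @@ERROR is read
-- it refers to that SET and not to the DELETE
--SET @RowsAffected = @@ROWCOUNT
--SET @ErrorCode = @@ERROR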


@@SPID

Stores the session ID.  Again, my advice would be to log it alongside the previous two values.  If you have a number of sessions performing a range of activities, the log table will be confusing if you cannot identify which sessions are doing what.
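
Assuming the illustrative log table above has been extended with ErrorCode and SessionID columns, recording it is a one-line change:

INSERT INTO dbo.ProcessLog (StepName, RowsAffected, ErrorCode, SessionID)
VALUES ('Delete old staging rows', @RowsAffected, @ErrorCode, @@SPID)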

@@TRANCOUNT

Stores the number of currently open transactions.  I often perform a check using this variable in my CATCH block, to determine whether we have hit an error inside a transaction and left it open: IF (@@TRANCOUNT > 0) ROLLBACK
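
In context, a minimal TRY/CATCH skeleton (the updates are purely illustrative):

BEGIN TRY
    BEGIN TRAN

    UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1
    UPDATE dbo.Account SET Balance = Balance + 100 WHERE AccountID = 2

    COMMIT
END TRY
BEGIN CATCH
    -- An open transaction at this point means the error happened mid-transaction
    IF (@@TRANCOUNT > 0) ROLLBACK

    -- Log or re-raise the error as appropriate
END CATCH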


@@CPU_BUSY, @@IDLE, @@IO_BUSY

These ones are useful for gathering stats on the SQL box.  They store the number of "ticks" that the CPU has been busy doing SQL Server work, the number of ticks that SQL Server has been idle, and the number of ticks SQL Server has spent doing IO.  All are measured since the last time SQL Server was started.


@@TIMETICKS

The number of microseconds per tick - this helps to transform the previous stats into understandable time :)
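
So, for example, the counters can be converted into seconds like this (ticks multiplied by microseconds-per-tick, divided by a million):

SELECT
    @@CPU_BUSY * CAST(@@TIMETICKS AS FLOAT) / 1000000.0 AS CpuBusySeconds,
    @@IDLE     * CAST(@@TIMETICKS AS FLOAT) / 1000000.0 AS IdleSeconds,
    @@IO_BUSY  * CAST(@@TIMETICKS AS FLOAT) / 1000000.0 AS IoBusySeconds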

There are many other global variables available in SQL Server, but this is simply meant to be a list of the ones I find myself using most often.

Thursday 7 November 2013

SQL Server Window Functions: RANK, DENSE_RANK, NTILE, ROW_NUMBER

Recently I have been finding it more and more useful to work with Window functions. These are sometimes referred to as Analytic functions or, more commonly, Ranking functions:

RANK
DENSE_RANK
NTILE
ROW_NUMBER

These functions essentially "score" each row returned by the select statement.  So, for example, if we look at the Sales.SalesTerritory table in AdventureWorks and run the following query:

SELECT
    t.Name,
    t.CountryRegionCode,
    t.[Group],
    t.SalesYTD,
    RANK() OVER (ORDER BY t.SalesYTD) AS Ranked
FROM Sales.SalesTerritory t

This results in the final column containing the "rank" of the record (in ascending order), i.e. if it has the second lowest SalesYTD this column will contain a two.


These functions become much more powerful, however, when combined with the optional PARTITION BY clause.  Using this clause we can "subrank" within partitions:

SELECT
    t.Name,
    t.CountryRegionCode,
    t.[Group],
    t.SalesYTD,
    RANK() OVER (PARTITION BY t.[Group] ORDER BY t.SalesYTD) AS SubRank,
    RANK() OVER (ORDER BY t.SalesYTD) AS OverallRank
FROM Sales.SalesTerritory t


The results of this query display each record, ranked within its Group (Europe, North America, Pacific):

Name           CountryRegionCode Group         SalesYTD      SubRank OverallRank
-------------- ----------------- ------------- ------------- ------- -----------
Northeast      US                North America 2402176.8476  1       1
Southeast      US                North America 2538667.2515  2       2
Central        US                North America 3072175.118   3       3
Germany        DE                Europe        3805202.3478  1       4
France         FR                Europe        4772398.3078  2       5
United Kingdom GB                Europe        5012905.3656  3       6
Australia      AU                Pacific       5977814.9154  1       7
Canada         CA                North America 6771829.1376  4       8
Northwest      US                North America 7887186.7882  5       9
Southwest      US                North America 10510853.8739 6       10

(10 row(s) affected)

This makes it very easy for us to produce reports that return, say, only the top performers in each region.
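
For instance, here is a sketch of a "best in each group" query - note the ORDER BY is reversed so that rank 1 means the highest SalesYTD:

WITH Ranked AS
(
    SELECT
        t.Name,
        t.[Group],
        t.SalesYTD,
        RANK() OVER (PARTITION BY t.[Group] ORDER BY t.SalesYTD DESC) AS SubRank
    FROM Sales.SalesTerritory t
)
SELECT Name, [Group], SalesYTD
FROM Ranked
WHERE SubRank = 1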

These functions remove the need to write correlated subqueries or self-joins to produce these results, and are therefore highly efficient.  More details on these four functions can be found on TechNet here.

In the next post we'll look at additional lesser known window functions within SQL Server.


Friday 25 October 2013

Truncate vs Delete - Myth vs Reality

I have been asked many times (often in interviews) the following question:
Q. What is the difference between Truncate and Delete in SQL Server?
The answer that, in my experience, is most expected and most often given is that Delete allows you to roll back the data you have wiped, while Truncate does not - Truncate is irreversible.  A quick search on Google gives a number of posts that say as much.

Simple.  But wrong!  Just try this simple test.

  1. Insert records into a table. 
  2. Open a transaction. 
  3. Truncate the table. 
  4. View the contents of the table.
  5. Rollback the transaction.
  6. View the contents of the table.

 The code below is a very simple example:
CREATE TABLE MyTable (col1 CHAR(1), col2 INT)
GO

INSERT INTO MyTable (col1, col2)
SELECT 'a', 1 UNION ALL SELECT 'b', 2 UNION ALL SELECT 'c', 3
GO

SELECT * FROM MyTable --(3 row(s) affected)

BEGIN TRAN

TRUNCATE TABLE MyTable
GO
SELECT * FROM MyTable --(0 row(s) affected)

ROLLBACK

SELECT * FROM MyTable --(3 row(s) affected)

Before the transaction is rolled back, there are no rows in the table, but after the rollback all three records have returned, despite having used Truncate to remove them.  Clearly Truncate is reversible!

So what actually is the difference between Truncate and Delete?  There are a number of general differences, but the relevant one here is the following:

When rows are removed with DELETE, the operation is logged for each individual row. TRUNCATE logs only the deallocation of the data pages rather than the rows themselves, which makes it a much faster operation - but still a logged, and therefore reversible, one.
There are many other differences relating to permission levels, identity seeding, triggers, constraints etc. That's for a future post. For now, the point to note is that TRUNCATE can indeed be rolled back, and the myth has been busted!




Tuesday 8 October 2013

SSAS: MDX True Average VS AverageOfChildren

Recently I was tasked with working on some slow-performing calculations in a cube.  In the calculation script a developer had used the DESCENDANTS function with the LEAVES flag to navigate all the way down to the bottom of several dimensions' hierarchies. He had then used AVG across the returned set for a specified measure, effectively aggregating all the way back up again.

All the developer actually wanted was a measure that returned the average value for the current member selection.  By going down to the leaf level and then back up, however, the cube has to scan through a potentially huge volume of data to produce the answer. Inevitably this results in degraded query performance for users.

In fact this can be a lot simpler to do than people realise.  True, SSAS 2008 R2 still does not supply a built-in function to do this, which surprises many new users. One day maybe it will, but in the meantime here is how I do it.

First, it should be pointed out that there is a certain "type" of averaging that SSAS does support out of the box.  If you look at the options in the AggregationFunction property for a measure you will see that, although it is set to Sum by default, there is also an option called AverageOfChildren. This name is very misleading.  It is not a true average as I would understand it; it is actually an average only across the Time dimension.  If you have two dates selected and are looking at the Sales Value measure, you will see it averaged across those two days.  However, if you are looking at a single day but across two sales areas, you will not see it averaged across those areas, but summed.  Confusing, but that is just what it does. Note that to enable this correctly your time dimension needs to have its Type property set to Time.

If, however, you want a true average, you will need to create a calculated member in the Calculations tab of your cube.  The calculation is very simple, for what is an average? It is the sum of the values divided by the number of values.  The sum we already have as the base measure from the fact table - say, Sales Value.  And when we create a new measure group in the cube, it automatically generates a new measure suffixed "Count".  This measure is none other than the count of values (or fact table rows) in the measure group.

Therefore the average calculation is the one measure divided by the other:

CREATE MEMBER CURRENTCUBE.[Measures].[AvgSales]
AS [Measures].[Sales] / [Measures].[SalesFactCount];
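
In practice I would also guard against division by zero where a cell has no fact rows - a sketch along the same lines, using the same example measure names:

CREATE MEMBER CURRENTCUBE.[Measures].[AvgSales]
AS IIF([Measures].[SalesFactCount] = 0,
       NULL,
       [Measures].[Sales] / [Measures].[SalesFactCount]),
VISIBLE = 1;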

Since both measures come straight from the fact table, their values are stored at processing time and benefit from any aggregation design too.  As a result the AvgSales measure ought to perform very fast.




Friday 27 September 2013

SSAS 2008: Clear Cache Command

When testing query performance on an SSAS database it's important to know whether you are starting from a warm or a cold cache.  The simplest way to clear the cache is through an XMLA command, run from SSMS:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
  </Object>
</ClearCache>

This clears the cache for the entire database. You can also clear the cache just for a specific cube:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
    <CubeID>MyCube</CubeID>
  </Object>
</ClearCache>

Monday 9 September 2013

SSAS: Performance of MDX SCOPE() statement

Further to my recent posts on using SCOPE() statements in a cube and potential issues when using them, I came across another issue last week.

In a cube I was developing I had used a SCOPE() statement to ensure that when a particular measure was selected, the dimension member used (for one particular dimension) would be the datamember, so that the measure value displayed would be the "fact table" value and not the aggregated value of its children. (For more on using the DATAMEMBER function, see this post.)

In the calculation tab of the cube editor I wrote the following:

SCOPE([Measures].[MyMeasure]);
    THIS = [Portfolio].[Portfolio Name].CURRENTMEMBER.DATAMEMBER;
END SCOPE;

However, in user testing across large volumes of data this measure performed very slowly, taking over a minute to return results.

I tried experimenting with different approaches, and found that the desired results could be achieved without using the SCOPE statement.  A calculated measure can be used that contains the DATAMEMBER function in its logic:
CREATE MEMBER CURRENTCUBE.[Measures].[MyMeasure2]
AS ([Measures].[MyMeasure], [Portfolio].[Portfolio Name].CURRENTMEMBER.DATAMEMBER),
VISIBLE = 1;

This measure produced the same result set, but surprisingly the response time dropped to around ten seconds.

If anyone has had a similar experience and has an explanation as to why the DATAMEMBER function in a calculated measure should perform faster than a SCOPE statement, please leave a comment and let me know!

Thursday 29 August 2013

SSAS 2008R2 - Scope and Assignments - "A set has been encountered that cannot contain calculated members."

In my previous post I discussed using the SCOPE statement in a cube.  I use it quite extensively in my designs, and last week I came across an unexpected issue with it.

I have a cube containing many measures for which I require different functionality under certain conditions.  Those conditions can actually be described in subcubes, and therefore the SCOPE statement addresses this requirement.

The functionality I want is the same for several measures and therefore I put them together in the same SCOPE block:

SCOPE([Measures].[Area1], [Measures].[Area2]);
    THIS = (0);
END SCOPE;

Where Area1 and Area2 are both measures defined in the cube.  The above scope would set them both to zero, but this is just for illustration.

The issue arises if the two measures above, Area1 and Area2, are actually calculated measures. If that is the case, then when we deploy this MDX we get the following error:

 "A set has been encountered that cannot contain calculated members."

This can be confusing as the error message does not actually identify the SCOPE as the source of the issue.  But you can be fairly sure that that is where the problem is.

For some reason, using more than one calculated measure in a single SCOPE statement is a problem for SSAS.  If you have very few relevant measures then you could split them into individual SCOPE statements, but this would be laborious if you have several. The solution is to use an assignment script instead. This is as simple as:

({[Measures].[Area1],[Measures].[Area2]}) = 0;

Type this in the calculation script and, hey presto, the error disappears and the MDX works as expected.

Chris Webb also covers this here.


Wednesday 28 August 2013

Analysis Services MDX - SCOPE Statement

A question was put to me recently on how to setup a measure that performs differently based on different member selections in the cube.

One option is to use lots of nested IF statements. This may well work, but it is quite untidy and known to be a little slow.

Wherever you are considering using IF-type logic in your calculation script, it would be a very good idea to consider using the much faster SCOPE statement instead.  The SCOPE statement is clearer and significantly more performant.

As an example of how to use the SCOPE statement I'll use a geometry-based illustration: my cube contains a dimension called Shape that contains two members, Square and Circle.  I then have two measures, Length and Radius, and I create a calculated measure called Area.  The calculation for Area depends on which shape I have selected.  If I have selected Square then Area = (Length)^2, while for Circle it would be Area = Pi*(Radius)^2.

Circle and Square can be thought of as subcubes.  I can therefore define the scope statement as follows:

SCOPE([Measures].[Area], [Shapes].[Circle]);
    THIS = (Pi * [Measures].[Radius] * [Measures].[Radius]);
END SCOPE;

SCOPE([Measures].[Area], [Shapes].[Square]);
    THIS = ([Measures].[Length] * [Measures].[Length]);
END SCOPE;

Note - I would still need to have defined the Area measure in some manner elsewhere in my script.  The above statements only refer to the measure; they don't actually create it.
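
A common way to do that (shown here as a minimal sketch) is to create the member as NULL in the calculation script and let the SCOPE assignments above overwrite it:

CREATE MEMBER CURRENTCUBE.[Measures].[Area]
AS NULL,
VISIBLE = 1;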

That is essentially an introduction to how to use the SCOPE statement.  The next post will be on issues to watch out for when using it!

Monday 19 August 2013

SSAS: Setting the Default Measure Using MDX

I recently used a nifty bit of code to set the default measure for a cube using MDX. 

The most obvious way of setting the default measure for a cube is relatively easy.  In the cube editor, under the Cube Structure tab, highlight your cube (at the top of the measure group tree). In the properties pane on the right of the screen (shortcut F4) there is a property called DefaultMeasure.  Click the dropdown and select a measure from the list.

On closer examination, however, none of the measures in the dropdown are calculated measures. So what if you want the default measure to be just that?  The answer is that the default measure can be set using MDX in the Calculations tab of the cube editor.  The code to do it is as follows:


ALTER CUBE CURRENTCUBE UPDATE DIMENSION Measures, DEFAULT_MEMBER = [Measures].MyDefaultMeasure;

That's all there is to it. Put whatever measure you like here, calculated or not. 

Additionally, changing the default measure to a different one would previously have required changing the DefaultMeasure property mentioned earlier and then redeploying the entire cube (using the Deployment Wizard, direct from BIDS etc.), which can have other consequences.  Assigning the default measure in the Calculations tab has the added benefit that making a change only requires deployment of the MDX script, and not the rest of the database.  Using a tool like BIDS Helper (from CodePlex) means this can be done in a single click from the Calculations tab, with no impact on the rest of the cube design.

 

Thursday 8 August 2013

SSAS - ProcessAdd explained

Further to my previous posts on creating multiple partitions and multiple processing commands in a single XMLA script (here and here), I thought I'd share one more idea around SSAS processing tasks.

When looking at the different types of processing options available in SSAS you tend to get the same list over and over:

ProcessFull,
ProcessDefault,
ProcessData,
ProcessIndexes,
ProcessUpdate,
ProcessIncremental,
ProcessClear,
ProcessAdd

I will not explain here the differences between each of these types as this info is readily available on plenty of other sites (such as http://technet.microsoft.com/en-us/library/ms174774(v=sql.105).aspx). 

However, if you try to use each of these processing types from Management Studio, all will go well until you try to process a dimension using ProcessAdd.  It may sound like a useful option (faster than ProcessUpdate, less impact than ProcessFull etc.) but when you try to use it you will find it is not as straightforward as the others. If you right-click the dimension in SSMS and select Process, you might think to just select it in the drop down.  But take a close look at that drop down - ProcessAdd is not there!

But the documentation says it is definitely an option. Script a different processing option to XMLA and change the process type to Add and it may run, but if you haven't specified which records to add, the results could be misleading. Confusing, right?

In order to use this processing option you actually have to stipulate quite a lot of detail.  This can be done with an XMLA script, but the process of building this script is quite tricky.  Therefore I shall do my best to explain how to do it here :)  (Greg Galloway gives a good overview of this here:
http://www.artisconsulting.com/blogs/greggalloway/Lists/Posts/Post.aspx?ID=4, and the steps below are based on his method)

1 - Use Management Studio (SSMS) to script out a ProcessUpdate command to XMLA.  (I recommend editing the error settings to ignore/report duplicates before scripting.)

2 - In BIDS, in the Solution Explorer, right-click the DSV (the DSV file that the relevant dimension is built from, if you have more than one) and select View Code.  Copy the entire code and paste it into a new window in SSMS so that we can edit it.

3 - We should now reduce this DSV copy down to only the relevant part. We only actually need the <xs:element name=...> element for the particular query that is relevant to our dimension. Tip - a quick way to find this element is to search for a unique string within the query that singles it out from the rest! Keep all data and tags within this particular element and delete everything outside it (tags at the same level), but keep all the parent tags.  SSMS makes it easy to expand and collapse the tags in order to determine which tags are above or below, and which ones are at the same level.
4 - Copy what remains and paste it into your original process script from step 1. The code should be copied immediately before the closing Process tag.

5 - Now modify the actual query to limit it to only new data.  For example, add a WHERE clause that limits the results to records after a certain date.

6 - Voila - you now have your ProcessAdd script.  It should look something like the code below, ready to execute.

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Error config -->
  <ErrorConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
    <KeyErrorLimit>-1</KeyErrorLimit>
  </ErrorConfiguration>
  <!-- Processing details -->
  <Parallel>
    <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
      <Object>
        <DatabaseID>MyDB</DatabaseID>
        <DimensionID>MyDim</DimensionID>
      </Object>
      <Type>ProcessAdd</Type>
      <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
      <!-- Copy from DSV goes here -->
      <DataSourceView xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0" dwd:design-time-name="4be9af80-0ae0-4f8a-936d-2103b944155f" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
        <ID>MyDSV</ID>
        <Name>MyDSV</Name>
        <DataSourceID>MyDS</DataSourceID>
        <Schema>
          <xs:schema id="MyDS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:msprop="urn:schemas-microsoft-com:xml-msprop">
            <xs:element name="MyDS" msdata:IsDataSet="true" msdata:UseCurrentLocale="true" msprop:design-time-name="2557b54c-2a08-4e49-ab45-fdd94dfb6dee">
              <xs:complexType>
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                  <!-- Query to be modified is here -->
                  <xs:element name="MY_DIM" msprop:QueryDefinition="SELECT 'HELLO' AS PHRASE FROM MY_TABLE&#xD;&#xA;WHERE DATE_KEY > 20130801" msprop:DbTableName="MY_DIM" msprop:QueryBuilder="GenericQueryBuilder" msprop:IsLogical="True" msprop:FriendlyName="MY_DIM" msprop:design-time-name="c4b7ad24-101e-4d41-bc04-7ac2b605cd7b" msprop:TableType="View">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="MY_ATTRIBUTE" msprop:design-time-name="1e60f905-098a-45f2-b5be-c34364bdf328" msprop:DbColumnName="PHRASE" minOccurs="0">
                          <xs:simpleType>
                            <xs:restriction base="xs:string">
                              <xs:maxLength value="100" />
                            </xs:restriction>
                          </xs:simpleType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:choice>
              </xs:complexType>
            </xs:element>
          </xs:schema>
          <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1" />
        </Schema>
      </DataSourceView>
      <!-- End of copy from DSV -->
    </Process>
  </Parallel>
</Batch>