Database: Crosstab / Pivot queries

I heard that SQL Server 2005 had this really cool new feature: PIVOT

Since I just finished working with one of our lead developers (name omitted to protect the innocent) on a SQL Server 2000 pivot query, I thought I’d take the chance to implement the query using SQL Server 2005’s new PIVOT feature.

First, we need to convert some of our rows into columns (21 in this case). With SQL 2000, we did this using a CASE statement to generate a count for each row that was to be converted to a column.

(SELECT
'AA' = Count( CASE Action_ID WHEN 'AA' THEN ap_id END),
'AS' = Count( CASE Action_ID WHEN 'AS' THEN ap_id END),
...
'RZ' = Count( CASE Action_ID WHEN 'RZ' THEN ap_id END),
'XX' = Count( CASE Action_ID WHEN 'XX' THEN ap_id END)
) AS pivotTable

With SQL 2005, we still need to convert the rows to columns. However, the syntax is somewhat different. Rather than performing a CASE and count on each row, we get to specify the data values we are looking for and it will count them all for us.

PIVOT
(
count(action_id)
FOR Action_ID IN(
[AA], [AS], [CA], [CB], [CE], [CF], [CP], [CW], [CX], [CY], [CZ], [EX], [IC], [KC], [KT], [KV], [LR], [OE], [PR], [RZ], [XX] )
) AS pivotTable

Now that our basic pivot is put together (notice that both queries have an alias of “pivotTable”), we want to count how many occurrences of each unique row we have. We’ll start by selecting ALL of our converted columns and our count. This portion of the query looks the same in both versions of SQL Server.

SELECT [AA], [AS], [CA], [CB], [CE], [CF], [CP], [CW], [CX], [CY], [CZ], [EX], [IC], [KC], [KT], [KV], [LR], [OE], [PR], [RZ], [XX], count(*) AS Frequency

Since we’re doing a count, and we want distinct values, we also have to group by all of our columns. This is another portion of our query that will look the same in SQL 2000 and 2005. We’ll also sort them so we get our most frequently occurring row counts at the top.

GROUP BY [AA], [AS], [CA], [CB], [CE], [CF], [CP], [CW], [CX], [CY], [CZ], [EX], [IC], [KC], [KT], [KV], [LR], [OE], [PR], [RZ], [XX]
ORDER BY Frequency DESC

Now that we have all the pieces of our query put together, let’s tie them up into something useful. In SQL Server 2000, we just need to select our newly created columns from the sub-query containing all the CASE count statements (I’ll add in the base table name and the remainder of our 21 columns, at this point).

SELECT
AA, [AS], CA, CB, CE, CF, CP, CW, CX, CY, CZ, EX, IC, KC, KT, KV, LR, OE, PR, RZ, XX
, count(*) as Frequency
FROM
(
SELECT
'AA' = Count( CASE Action_ID WHEN 'AA' THEN ap_id END),
'AS' = Count( CASE Action_ID WHEN 'AS' THEN ap_id END),
'CA' = Count( CASE Action_ID WHEN 'CA' THEN ap_id END),
'CB' = Count( CASE Action_ID WHEN 'CB' THEN ap_id END),
'CE' = Count( CASE Action_ID WHEN 'CE' THEN ap_id END),
'CF' = Count( CASE Action_ID WHEN 'CF' THEN ap_id END),
'CP' = Count( CASE Action_ID WHEN 'CP' THEN ap_id END),
'CW' = Count( CASE Action_ID WHEN 'CW' THEN ap_id END),
'CX' = Count( CASE Action_ID WHEN 'CX' THEN ap_id END),
'CY' = Count( CASE Action_ID WHEN 'CY' THEN ap_id END),
'CZ' = Count( CASE Action_ID WHEN 'CZ' THEN ap_id END),
'EX' = Count( CASE Action_ID WHEN 'EX' THEN ap_id END),
'IC' = Count( CASE Action_ID WHEN 'IC' THEN ap_id END),
'KC' = Count( CASE Action_ID WHEN 'KC' THEN ap_id END),
'KT' = Count( CASE Action_ID WHEN 'KT' THEN ap_id END),
'KV' = Count( CASE Action_ID WHEN 'KV' THEN ap_id END),
'LR' = Count( CASE Action_ID WHEN 'LR' THEN ap_id END),
'OE' = Count( CASE Action_ID WHEN 'OE' THEN ap_id END),
'PR' = Count( CASE Action_ID WHEN 'PR' THEN ap_id END),
'RZ' = Count( CASE Action_ID WHEN 'RZ' THEN ap_id END),
'XX' = Count( CASE Action_ID WHEN 'XX' THEN ap_id END)
FROM
dbo.los_LoanServicingActions
GROUP BY
ap_id
) AS pivotTable

GROUP BY
AA, [AS], CA, CB, CE, CF, CP, CW, CX, CY, CZ, EX, IC, KC, KT, KV, LR, OE, PR, RZ, XX
ORDER BY Frequency DESC

We’ll also add the base table to our SQL 2005 query. Rather than going inside the pivot statement, however, it will be placed inside a sub-query right before the PIVOT clause.

SELECT [AA], [AS], [CA], [CB], [CE], [CF], [CP], [CW], [CX], [CY], [CZ], [EX], [IC], [KC], [KT], [KV], [LR], [OE], [PR], [RZ], [XX], count(*) AS Frequency
FROM(SELECT Action_ID, ap_id FROM los_loanServicingActions) AS p

PIVOT
(
count(action_id)
FOR Action_ID IN(
[AA], [AS], [CA], [CB], [CE], [CF], [CP], [CW], [CX], [CY], [CZ], [EX], [IC], [KC], [KT], [KV], [LR], [OE], [PR], [RZ], [XX] )
) AS pivotTable
GROUP BY [AA], [AS], [CA], [CB], [CE], [CF], [CP], [CW], [CX], [CY], [CZ], [EX], [IC], [KC], [KT], [KV], [LR], [OE], [PR], [RZ], [XX]
ORDER BY Frequency DESC

As you can see, our final queries have the same opening and closing clauses. Be alleviating the CASE count line for each column, SQL 2005 allows us to express our pivot much more concisely. The one awkward thing about the 2005 pivot is enclosing the actual data values inside the IN clause in square brackets as if they were object names.

Well, form and brevity are all very nice but, how do they perform? It turns out, oddly enough (based on all the SQL Server 2005 release hype about performance), that they perform equally well (I pinned the source table in memory to remove disk IO as a factor).

SQL Server 2005:
(143 row(s) affected)
0.280000000 seconds

SQL Server 2000:
(143 row(s) affected)
0.280000000 seconds

So, which style should I use? Performance-wise, it’s a wash. It seems that the new PIVOT query may win in ease of use and reduced typing. One caveat to this is that the database must have the compatibility level set to 90 in order to use the new syntax. If you are developing a query that may have to be run against a database with a compatibility level lower than 90, or on SQL Server 2000 or older, you must continue to use the CASE count syntax.

Database

17 February 2006

Crosstab / Pivot queries

No comments: