Friday, September 30, 2011

SQL Pivot / Unpivot

In my experience, the PIVOT and UNPIVOT functions in SQL Server are one of those things that you won't ever need until you need them, and then they are invaluable. If you haven't used them, they basically turn columns into rows and vice versa (I'll go into a bit more detail later). Specifically, a pivot turns columns into rows, and rows into columns. Most commonly, these functions are used to present data in a different format for reporting purposes, but there are other applications as well.

There are a few typical examples of usage of pivots, and a good summary can be found in the Books Online for Pivots and Unpivots, but I'll let you read up on that yourself. I'd like to share a real world example of an unpivot usage, and then a tip for navigating the syntax of pivots which I feel make them way easier.

For a procedure I was writing recently, I had a table which looked like this:



The idea is that for the [ctnull] column, you want to have the count of how many columns are not null for the purpose of averaging the fields you actually have data for (i.e. isnull(val1, 0) + isnull(val2, 0) + isnull(val3, 0) + isnull(val4, 0) / ctnull).

This is doable with some ugly case statments and a few other methods, but as soon as it evolves beyond a basic calculation, this gets really hairy really fast, and thus is a perfect application for a pivot (or more specifically in this case, an unpivot to make an individual row for each date and measure:

TheDate Measure value
40000 Val1 87
40000 Val2 21
40000 Val3 33
40000 Val4 null
40001 Val1 82
...

PIVOT SYNTAX
The syntax for these operations can be the most confusing, but I realize that there are 3 basic components to a pivot, each with a specific purpose, and when properly understood make them much more manageable. You have an outer query, a subquery, and the pivot. The below example is from BOL:

--Part 1
SELECT VendorID, Employee, Orders
FROM

--Part 2
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM pvt) p
UNPIVOT

--Part 3
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO

Part 1: This defines the aesthetic structure of the result table. In other words, if you want your outputs first column to be employeeId, put that there. If you want subsequent columns along the top to be ice cream flavors, put those across the top. This is mostly responsible for naming and organizing what you want the output to look like. You can use aggregate functions, case statements, or just about anything else you want.

Part 2: This is the actual data and source for the data you want for your pivot. If you want a value to be output or used, it must be in this select. So this would contain your EmployeeIDs, their sales figures, height, weight, ice cream preference, or whatever else you will be using or pivoting on.

Part 3: This contains the actual pivot syntax. While there is still some cryptic syntax in here, once you have realized that this 3rd part is really the only unfamiliar part to the pivot, its much more manageable. You just need the column you want to pivot on, the aggregates, and the IN statement for the valid columns/rows. Spend a little time familiarizing yourself with this portion as it turns out it's really the only non-standard portion.

Once you realize what the function of the first two parts are and how to construct them, it breaks down the big ugly pivot syntax into three familiar and easy to deal with chunks which make the whole thing much more friendly. Hope this helps!

No comments: