Friday, September 30, 2011

SQL Pivot / Unpivot

In my experience, the PIVOT and UNPIVOT functions in SQL Server are one of those things that you won't ever need until you need them, and then they are invaluable. If you haven't used them, they basically turn columns into rows and vice versa (I'll go into a bit more detail later). Specifically, a pivot turns columns into rows, and rows into columns. Most commonly, these functions are used to present data in a different format for reporting purposes, but there are other applications as well.

There are a few typical examples of usage of pivots, and a good summary can be found in the Books Online for Pivots and Unpivots, but I'll let you read up on that yourself. I'd like to share a real world example of an unpivot usage, and then a tip for navigating the syntax of pivots which I feel make them way easier.

For a procedure I was writing recently, I had a table which looked like this:



The idea is that for the [ctnull] column, you want to have the count of how many columns are not null for the purpose of averaging the fields you actually have data for (i.e. isnull(val1, 0) + isnull(val2, 0) + isnull(val3, 0) + isnull(val4, 0) / ctnull).

This is doable with some ugly case statments and a few other methods, but as soon as it evolves beyond a basic calculation, this gets really hairy really fast, and thus is a perfect application for a pivot (or more specifically in this case, an unpivot to make an individual row for each date and measure:

TheDate Measure value
40000 Val1 87
40000 Val2 21
40000 Val3 33
40000 Val4 null
40001 Val1 82
...

PIVOT SYNTAX
The syntax for these operations can be the most confusing, but I realize that there are 3 basic components to a pivot, each with a specific purpose, and when properly understood make them much more manageable. You have an outer query, a subquery, and the pivot. The below example is from BOL:

--Part 1
SELECT VendorID, Employee, Orders
FROM

--Part 2
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM pvt) p
UNPIVOT

--Part 3
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO

Part 1: This defines the aesthetic structure of the result table. In other words, if you want your outputs first column to be employeeId, put that there. If you want subsequent columns along the top to be ice cream flavors, put those across the top. This is mostly responsible for naming and organizing what you want the output to look like. You can use aggregate functions, case statements, or just about anything else you want.

Part 2: This is the actual data and source for the data you want for your pivot. If you want a value to be output or used, it must be in this select. So this would contain your EmployeeIDs, their sales figures, height, weight, ice cream preference, or whatever else you will be using or pivoting on.

Part 3: This contains the actual pivot syntax. While there is still some cryptic syntax in here, once you have realized that this 3rd part is really the only unfamiliar part to the pivot, its much more manageable. You just need the column you want to pivot on, the aggregates, and the IN statement for the valid columns/rows. Spend a little time familiarizing yourself with this portion as it turns out it's really the only non-standard portion.

Once you realize what the function of the first two parts are and how to construct them, it breaks down the big ugly pivot syntax into three familiar and easy to deal with chunks which make the whole thing much more friendly. Hope this helps!

Wednesday, September 14, 2011

Yellow Fruit

King Soopers usually has a selection of weird fruits and every now and then I pick a random one up, perform an autopsy on it, and generally eat it. This is the story of a big leathery yellow thing.


It's about 7 inches in diameter, bumpy leathery surface and weights about three pounds. The outside skin is rugged and takes a little extra knife pressure to puncture, but once you're in, you can slice through it like butter.


Once open, the first impression is that of a cantelopue, mixed with a watermelon.



After removing the seeds, you can really see how juicy and soft the fruit is.




I sliced it up like I would a canteloupe and ate a few pieces plain. It's a very soft fruit, with a texture almost like a banana. As with the smell, it has a mildly sweet taste like the melon combination I mentioned above.

I opted to make a smoothie with it, and added frozen strawberries to the mix. I don't have a picture of the final smoothie because at the time of writing this, I drank it, and the pics I took were all blurry. The smoothie turned out awesome. Next time I think I'll throw some kiwi in the mix.


Follow your dreams and stuff

Well, I've been programming SQL Server now for the eons that is 4 months, and I gotta say, I've never been happier in my life. For a while after college, I sort of had a fear that I would end up doing something like recruiting for the rest of my life, grinding my teeth through each agonizing second of something I didn't love. Honestly, I got to the point that I was so frightened by the possibility of not loving what I did, that I started really exploring what it was that I wanted to do.

It turns out it had absolutely nothing to do with my degree. I know this is hardly news to the world at large, but apparently school doesn't dictate what you do with your life (or maybe it does, just in that it steers you away from what you went to school for...).

The point is, I took it upon myself to pick up books, subscribe to training websites, write programs and databases, come up with projects and most importantly, actually apply for jobs. It can be easy to feel trapped in a certain flow of life, but it's refreshing to know that hard work and dedication can actually pay off.

Friday, September 9, 2011

SQL Avoiding Subquery Using OVER()

I came across a cool way to avoid using subqueries when you need to get different aggregates in the same result set. Typically, you use a subquery with the aggregate you want, and then select that value for the other aggregates you want to work with. While this doesn't really cause any problems, sometimes it clutters up the code, and as it turns out, can be slightly more costly than other methods.

The code below shows what I mean. I have an example of how you might think to go about doing it, which doesn't work, a method that utilizes a subquery, and a nifty trick you can do using an over() clause with the aggregates.


if OBJECT_ID('tempdb.dbo.#test') > 0 drop table #test
create table #test
(
value int
)

insert into #test (value)
SELECT 1
union all
SELECT 1
union all
SELECT 2
union all
SELECT 5

SELECT * FROM #test



Scenario: we want to see what percent of the time a given value occurs in a set. To do this, we want to divide the count of each value by the count of total values. In the above test data, 1 occurs 50% of the time (2/4) whereas 2 and 5 occur 25% of the time (1/4, 1/4). This is a pretty simple scenario to solve, but there are a few ways to go about doing it. Pay special attention to the [TotalCount] column (aka the denominator of the [Percent] column.


--This query is in valid because the aggregates aren't part of the GROUP BY
SELECT
[Count] = COUNT(1),
[TotalCount] = SUM(COUNT(1)),
[Percent] = COUNT(1) * 100.0 / SUM(COUNT(1))
FROM #test
GROUP BY value

--Subquery option
--This is the way I initially thought of doing it
SELECT
[Count] = COUNT(1),
[TotalCount] = (SELECT COUNT(1) FROM #test),
[Percent] = COUNT(1) * 100.0 / (SELECT COUNT(1) FROM #test)
FROM #test
GROUP BY value

--OVER() option
--This is the cool way that I saw a co-worker do today
--by using an empty OVER() statement, you can avoid using the subquery and actually save some performance costs
SELECT
[Count] = COUNT(1),
[TotalCount] = SUM(count(1)) OVER(),
[Percent] = COUNT(1) * 100.0 / SUM(COUNT(1)) OVER()
FROM #test
GROUP BY value


The over() clause allows you to apply the aggregate over a different window as you see fit, which in this case ends up being over the entire set. The end result is the same but with much less code, and faster.



The relative cost of the subquery method is 59% compared to 41% using the over() clause, which is pretty significant. Toy around with the over() clause and using different aggregate functions. This primarily works because you have to hit #test once for the overall query, and again for the subquery. The method with the over() clause only hits the #test table once, and most of the cost comes from a sort performed by the over(). You can generate some useful figures to work with without having to really delve into any subqueries.

Thursday, September 8, 2011

Paracord Bracelet

How many times has this happened to you? You've got some big heavy thing you need to tie to something else, but you just don't have any high tensile strength woven fiber? Well no more! I bought 100 yards of paracord and some plastic clasps to make paracord bracelets! Here's my first one.







Wednesday, September 7, 2011

SQL Date Ranges

I run across a lot of scenarios in my work where I have to return a result set for a given time period. A simple example is that a website might need closing prices for the last day, week, month, year, etc.


select *
from PricingTable
where thedate is between
case when @DateRange = 'Weekly' then DATEADD(wk,-1,getdate()) --@WeekDate
when @DateRange = 'Monthly' then DATEADD(mm,-1,getdate()) --@MonthDate
when @DateRange = 'Yearly' then DATEADD(yy,-1,getdate()) --@YearDate
end and @CurrentDate --getdate()


Problems arise however because the date logic does not take into account weekends and holidays when the underlying data might not exist. To solve this problem, we have a UDF which essentially does the following, but in lieu of one, here's what you can do.


  1. Set @CurrentDate = current date (getdate())
  2. Set @WeekDate, @MonthDate and @YearDate = to dateadd -1 time period
  3. Here's where it gets cool. You want to set each time period = to the highest date from the underlying table as long as it is less than or equal to the raw date you selected before.

    select @WeekDate = MAX(thedate)
    from PricingTable
    where thedate <= @WeekDate

    That way if the date is a valid day of the week, the date stays the same. Otherwise you get the closest date to it that is valid in the table. Depending on the requirements of your site, you can do something similar to say the minimum date where it is greater than or equal to the raw date, to shift the date the other direction
  4. repeat this for the dates you have, and you then have working date ranges that are guaranteed to work for the underlying data.


In some scenarios, this much level of precision doesn't matter, but I've seen a wide range of problems arise from not using a technique like this that can manifest in strange ways, and be difficult to troubleshoot depending on the complexity of the procedure. If the underlying table is indexed on [thedate], it will speed up these seeks as well, and the extra code can save you some big headaches down the line.

EDIT 9/8/2011:
An alternative to this method is to simply say if the day it lands on is Saturday, subtract 1 day or if Sunday, subtract 2 days if you don't care about holidays. There are other function which can be written to account for US holidays, but this method works in the specific scenario where you a) need to account for holidays and b) don't have a means (yet) of accounting for holidays (or if you deal with international data where holidays could vary greatly).

Monday, September 5, 2011

Guacamole of the Gods

This is the guacamole recipe I've made since I was a little kid, and it still blows my mind. Some of the proportions can vary based on your own personal preferences (for example, I'm a garlic fiend, so I usually use a bit more of that), but I'll try to give you a good balanced version.
    What you'll need:
  • 2 Ripe Avocados

  • Juice of half a lime

  • 1 tsp Cumin

  • 1 tsp Chili Powder

  • 3-4 Tbs Salsa (While I wish I could say some obscure random salsa from a mom and pop farm in some obscure city is the best, Pace Chunky medium salsa really works the best in my opinion)

  • 2 cloves garlic, pressed through a garlic press. If you don't have a garlic press, you can mince it real fine, but a press tastes way better. Try to avoid using garlic powder


    What you do:
  1. Cut the avocados in half, remove the pit and scoop their delicious guts into a bowl, mash with a fork.

  2. Add all the other ingredient and stir with a fork until evenly blended.


Notes: I've made this in a food processor before to make a bigger batch, but it changes the consistency quite a bit, which personally I wasn't a fan of. Play around with it by adding slightly more or less of various ingredients, or by stirring it longer or shorter for a more pureed or chunky guacamole.

Storage:
Guacamole goes brown really easily. To prevent that, cover it with Saran wrap, and press it down so that it is completely flush with the guacamole, eliminating as many air pockets as possible. This prevents most of the oxygen from reaching the guacamole and oxidizing it.