Wednesday, October 3, 2012

Relational Databases (or, How I'm Trying to Save Face)

Several months ago, I started getting active on SQLServerCentral and answered a post to a newbie about how to understand some common terminology about SQL; Rows, Columns, Fields, etc. Well, it turns out I didn't have a very good grasp on it, and the Godfather of SQL himself posted a thesis in response basically tearing me apart for not understanding the relational model correctly. (I wish I could find the article but I think at the time I was so mortified that I removed any trace of it from both my SQLServerCentral account and my email... shame on me).

That said, I think I've come up with a better way to describe a table in a relational database then analogizing it to an excel document - in laymens terms.. (Joe, if you're reading, please let me know!)

Think of a Table as an empty bank vault. At the entrance to the vault is a Guard wearing a shirt that says "Database Engine". Throughout the day, people come by to put things in the vault, remove things from the vault and change things in the vault. To do so, they have to bring in a box with a bunch of paper slips with things written on them; Words, numbers, hexadecimal expressions... and so on.

Now, when the vault was set up, the guard was given very specific instructions on what could be placed in the vault. Any box someone brought in may contain no more than some number of slips of paper. Each slip of paper must have a header on it saying what it represent (email address, phone number, etc). The guard is also responsible for making sure that certain slips of paper only have numbers of a certain length, or do not contain certain characters, and so forth. Each and every box MUST have a slip of paper with an email address. Then end result is, after the guard checks each new box, or instructions to change a box already in the vault, he gives it the OK, that it meets all the requirements set forth by the vaults creators. Over the course of the day, thousands, millions of boxes may come in and go out, but they all match the vaults requirements.

So what do you have? You have a room, with a bunch of containers with pieces of information that meet certain criteria. That's basically what a table is. The database engine constrains the types of data that enter a table so that at any time when you look in a box, you know certain pieces of information about that box. Note, this is a BIG departure from the way you would instinctively think about an Excel table, where it's basically a big (x,y) grid. Although there are parallels in how the data can be retrieve and thought about, the way the system stores the data and to understand how more complex queries work, it's critical to understand the difference.

I won't get into too much more, but you can start to see how things like joins, sorting, indexing and referential integrity flow from this. Along the indexing vein, say you want to know every box in a vault that has an email address that starts with the letter "T". Well, if all the boxes are just in a Heap on the floor (by the way, Heap is a SQL term for a table which has no "Clustered Index", or basically any logical physical ordering on the disk), you get to open each box, look at it's email slip, and then if its email starts with "T", add it to the list. How might one make this easier? Well if you physically structured the vault so that each box was placed adjacent to another box in alphabetical order, you could skim through the boxes till you found the section starting with "T", then see where the section starting with "U" started, and then just say "ok, everything between where "T" started and "U" started, has an email that starts with the letter 'T'". (again, the highly paid security guard is responsible for keeping them in order at all times.) Ok, so there's a bit more involved with indexing than just that, but when you think about a relational table like this, the operations you perform against them start to make a lot more sense.

No comments: