Tuesday, March 12, 2013

Non-deplorable use of triggers

So I had a rare case today to actually use a trigger without invoking the wrath of all the DBAs, developers and gnomes that live under my desk.

For starters, let me give my little myopic view of why I (and most SQL developers I know) avoid triggers like The Plague. Up front, I admit that I probably don't know all there is to know about triggers, or how best to implement them to keep from pulling your hair out. That said, the trigger-heavy systems I've worked on tend to suffer from at least one of the following two problems.

Triggers, when used to enforce data integrity, restrict data in at least one way. There's an older system I occasionally have to work on which has a procedure that updates identifiers in about 20 tables, each table containing triggers that reference other tables (some of them the same tables the procedure touches, some not). When the procedure fails, tracing where the error took place in the procedure is just the beginning. You then end up traversing a network of obscure DML-enforcing statements across dozens of objects. The end result is that most people who work on the system go to extra effort to circumvent the triggers when something goes wrong, rather than dig through them to fix the problem.

The next problem with triggers is that, regardless of their use, there is a "hidden" execution cost on every transaction against the table to which the trigger is bound. Imagine you have a table which, when built, had very low traffic; maybe 50 rows added a day. Each time one of those is logged, a trigger fires to update an audit logging table, and additionally updates a reference table to make sure that any identifiers which came in are historically corrected as well. Now imagine a few years go by and the developer who wrote the system has retired to become a pet psychiatrist. Meanwhile (perhaps due to this developer leaving the company) the company grows by leaps and bounds, and now that table is receiving 500,000 DML transactions a day, or 5 million. While there are certainly ways to remedy this situation, it might take a long time just to realize that there is a trigger on the table at all.
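If you suspect a table has picked up one of these hidden passengers, the catalog views will tell you. The query below is a minimal sketch against sys.triggers; dbo.Orders is a placeholder table name of my own choosing, not one from the systems described here.

```sql
-- List every DML trigger bound to one table, along with its current state.
-- dbo.Orders is a hypothetical table name; substitute the table you suspect.
SELECT
    t.name                          AS trigger_name,
    OBJECT_SCHEMA_NAME(t.parent_id) AS table_schema,
    OBJECT_NAME(t.parent_id)        AS table_name,
    t.is_disabled,
    t.is_instead_of_trigger,
    t.create_date,
    t.modify_date
FROM sys.triggers AS t
WHERE t.parent_class = 1                        -- 1 = trigger on a table or view
  AND t.parent_id = OBJECT_ID(N'dbo.Orders');
```

Dropping the parent_id filter lists every table-level trigger in the current database, which is a quick way to gauge how much hidden work a system is doing.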

So again, maybe this is just the way I've grown accustomed to doing things, but I prefer to maintain data integrity through table constraints or procedural logic.

That said, here's the situation I had today. A client was trying to upload some new data to one of our test environments, and throughout the day the data would (according to the client) appear, and later disappear. I'll spare you the hour-long heated conversation we had between several teams, but in the end, I undertook the task of monitoring the tables which housed the data to see what happened to them throughout the day. Initially I set up an SSIS package, run by an Agent job, which would periodically dump the data into a logging table. But on my bus ride home, it kept nagging at me that periodic snapshots would not necessarily catch the culprit statements. Thinking back on a presentation I'd given on logging tables (Change Tracking, Change Data Capture, etc.), it suddenly occurred to me that a trigger would be a perfect solution to this*.

I set up an AFTER INSERT, UPDATE, DELETE trigger on each of the two tables with the identifiers I was concerned with and had them dump any DML activity into a logging table. The logging table has an auto-incrementing key to prevent any PK violations. The trigger additionally filters the INSERTED and DELETED tables by the 4 identifiers the client said were popping in and out, and I set up another Agent job to keep the table from growing beyond one month of history. I also added a clause to the trigger's creation script so that it is only created in development and test environments. There's certainly no reason it could not run in production as well, but the idea here was maximum visibility into the table's modifications with a minimal footprint on the database. So with a small, non-intrusive trigger, I was able to log the actions on the table and identify when the records popped in and out of existence.
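For the curious, the setup looks roughly like the sketch below. This is not the actual code: dbo.Prices, its Ticker and Price columns, the logging table, the trigger name and the four ticker values are all hypothetical stand-ins, and the environment check is left out for brevity. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical logging table. The IDENTITY column supplies the primary key,
-- so repeated changes to the same ticker can never collide.
CREATE TABLE dbo.PriceChangeLog
(
    LogId      INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LoggedAt   DATETIME      NOT NULL DEFAULT (GETDATE()),
    LoginName  SYSNAME       NOT NULL DEFAULT (ORIGINAL_LOGIN()),
    HostName   NVARCHAR(128)     NULL DEFAULT (HOST_NAME()),
    AppName    NVARCHAR(128)     NULL DEFAULT (APP_NAME()),
    ChangeType CHAR(1)       NOT NULL,  -- 'I' = row from inserted, 'D' = row from deleted
    Ticker     VARCHAR(10)   NOT NULL,
    Price      DECIMAL(18,4)     NULL
);
GO

-- Hypothetical trigger on an assumed table dbo.Prices (Ticker, Price).
CREATE TRIGGER dbo.trg_Prices_LogChanges
ON dbo.Prices
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;  -- keep extra row counts from confusing the caller

    -- New row images: INSERTs, plus the "after" side of UPDATEs.
    INSERT INTO dbo.PriceChangeLog (ChangeType, Ticker, Price)
    SELECT 'I', i.Ticker, i.Price
    FROM inserted AS i
    WHERE i.Ticker IN ('AAA', 'BBB', 'CCC', 'DDD');  -- placeholder tickers

    -- Old row images: DELETEs, plus the "before" side of UPDATEs.
    INSERT INTO dbo.PriceChangeLog (ChangeType, Ticker, Price)
    SELECT 'D', d.Ticker, d.Price
    FROM deleted AS d
    WHERE d.Ticker IN ('AAA', 'BBB', 'CCC', 'DDD');  -- placeholder tickers
END;
GO
```

An UPDATE shows up as a 'D' row and an 'I' row with nearly the same timestamp, while a lone 'D' row is the disappearing act I was hunting for. The login, host and application defaults are my own additions to the sketch; they are cheap to capture and go some way toward answering "who did this?".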

There are still a few drawbacks to this approach. First of all, maintaining the list of tracked tickers is a very manual process. While this is a rare situation that I'll probably only have to monitor for a few weeks, if it happened again, I'd almost have to rebuild the trigger from scratch. Second, ideally I would have the trigger "retire itself" after, say, a month, so that if I forgot about it and moved on to become a pet psychiatrist, the trigger wouldn't get lost in the tangle of 0s and 1s. Also, and this is not really a drawback of a trigger but rather a limitation, I would have liked a way to pass the calling procedure's information into the trigger in order to further trace the source. Something like object_name(@@PROCID) would be ideal, but unfortunately the execution context of @@PROCID changes to that of the trigger upon its call.
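One way I could approximate the "retire itself" idea is to let the cleanup Agent job do the retiring. The step below is only a sketch of that thought, not something I have running: the logging table, trigger name and retirement date are all hypothetical placeholders.

```sql
-- Cleanup step for the Agent job: keep roughly one month of history.
-- dbo.PriceChangeLog and its LoggedAt column are hypothetical names.
DELETE FROM dbo.PriceChangeLog
WHERE LoggedAt < DATEADD(MONTH, -1, GETDATE());

-- Retirement: once an assumed cut-off date has passed, drop the trigger
-- so it cannot outlive the investigation it was written for.
IF GETDATE() >= '20130412'   -- hypothetical retirement date
   AND OBJECT_ID(N'dbo.trg_Prices_LogChanges', N'TR') IS NOT NULL
BEGIN
    DROP TRIGGER dbo.trg_Prices_LogChanges;
END;
```

The trade-off is that the retirement logic lives in the job rather than in the trigger, so if someone disables the job the trigger stays behind. It does at least put the expiry date somewhere a future maintainer might look.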

In the end, however, this seemed like one of those times where a trigger was the right tool for the job. It's refreshing to break out of the tunnel vision which often inadvertently affects one's programming style and find a legitimate use for something you'd historically written off.



* Change Tracking and CDC were not options due to database restrictions we have.
