I got to thinking about brains today, which, as any person with an overactive imagination knows, is sort of a black hole. But regardless, my thoughts shifted to consciousness, understanding of the world around us, and psychedelics (specifically DMT and other chemicals which occur naturally in the brain).
I'm not one to believe that much (if any) real astounding wisdom can be derived from the use of drugs, but it does raise the question of why such chemicals naturally exist in our brains, and why adding more of them (or sympathetic chemicals) has such a profound effect on us, perceptually, spiritually and emotionally.
As I started to think about the problem from a programming perspective, I wondered what their role could be, and I came up with the idea of "kaleidoscopics". Really this is just a made-up word that I thought sounded cool, but the idea is that it's an agent which casts doubt upon normal inputs.
It's difficult to design a system which is both deterministic and "intelligent" in the sense which humans typically ascribe to the word. A simple system would be an if-then statement. More complex systems introduce more and more variables, until you have a system which, like a brain or an ant colony, takes on emergent properties. There's obviously a lot more going on than the simple interaction of one method or another, but to some extent, after a time, a system (we'll use the brain as the example from here on out) reaches a state of relative homeostasis.
So I'm rambling a little bit. The point is, how do you program something which can learn any language, lie, love, play, laugh, calculate, trust and kill? You've got to have some pretty flexible rules around what defines reality.
So my hypothesis (like most of the hypotheses I make, grounded in absolutely no scientific study) is that these chemicals, or if not them, some other set of chemicals in the brain, have the job of turning things on their heads: randomly flipping bits and presenting the mind with alternative interpretations of what's going on. In childhood, when the brain is young and the neurochemical soup of grey matter is just starting to solidify into a brain and a consciousness, we benefit greatly from the ability to try out theories and discard them at a moment's notice.
As we get older, we start to know what the flipping of a bit will do (what if I COULD put my hand through the wall), and some of our experimentation, sadly, ends.
It also brings up the somewhat terrifying concept of The Singularity, in that to program a mind with the same ability to perceive reality as flexibly as we do, we would have to take the safety off and really give the program the ability to come up with whatever it wants. It's pretty cool when you see it on the scale of Conway's Game of Life, but scale it up to (super)human intelligence and allow the possibility for it to veer off in a direction like Skynet, and the human race would be in trouble. Or we could end up with something closer to Peter F. Hamilton's concept of the singularity: a more benign entity whose motivations are beyond our comprehension.
Again, I'm rambling. But if I can try to tie a nice bow on this, it's that reality is highly subjective. If you think about how YOU would try to program something to "perceive" reality, it cascades into all sorts of deliciously complex issues. But one of them, I'm sure, is a mechanism for self-doubt. It plagues us and drives us at the same time, but always has a profound effect.
Friday, February 7, 2014
Kaleidoscopics
Labels:
AI,
brain,
Computer,
Consciousness,
DMT,
hallucinogen,
Hamilton,
intelligence,
intelligent,
Peter,
psychotropic,
sentience,
singularity,
technological,
technology,
Vernor,
Vinge
Normalization Musings
Musings on Database Normalization
Much of the work I do has to do with formatting the presentation layer of data. The typical data flow is (a) vendor sends data, (b) we transform and store it in a format somewhere between what they send us and what is required by the display layer (website), and (c) procedures are written against this data.
Having the luxury of placing the burden of data integrity on the vendor, in combination with an asynchronous replication model, means that most of the time we cannot and do not use surrogate keys or foreign keys. Some of the time this works fine; actually, most of the time it works fine. Of course, if the data consumption architecture deviates from the simple truncate/insert model, correcting data can be vexing.
I had my first taste of self-inflicted bad database design on a recent project. Hindsight being what it is, I would have done many things differently (not the least of which would have been more planning before developing), but as it is, it was a good learning experience.
I was given a data set which I didn't understand well and a limited amount of time to develop against it. The data sets were effectively fact tables from a data warehouse; some with provided dimension files and some without. (By the way, for reasons I won't get into, SSAS was not an option.) The data sets were also quite large (fact tables around 100 million records with all the different dimensions factored in). To add insult to injury, the default format the data came in was name-value pairs rather than columns (i.e. {Name: Price, Value: 3.14} as opposed to {Price: 3.14}).
Without using surrogate keys, a table might easily have 8 columns in its primary key, including a generic “Name” column, catapulting the number of records by a factor of however many distinct names there were in the table.
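To make that concrete, here's a rough sketch of the two shapes in T-SQL. The table and column names are invented for illustration; the real schema was wider than this.

-- NVP shape: the full natural key plus a generic Name column; one row per measure
create table dbo.FactPriceNVP
(
    VendorId     int           not null,
    RegionId     int           not null,
    ProductCode  varchar(20)   not null,
    AsOfDate     date          not null,
    Name         varchar(50)   not null,  -- e.g. 'Price', 'Volume', 'OpenInterest'
    Value        decimal(18,6)     null,
    constraint PK_FactPriceNVP primary key (VendorId, RegionId, ProductCode, AsOfDate, Name)
);

-- Dedicated-column shape: one row per entity/date, one column per measure
create table dbo.FactPrice
(
    VendorId     int           not null,
    RegionId     int           not null,
    ProductCode  varchar(20)   not null,
    AsOfDate     date          not null,
    Price        decimal(18,6)     null,
    Volume       decimal(18,6)     null,
    OpenInterest decimal(18,6)     null,
    constraint PK_FactPrice primary key (VendorId, RegionId, ProductCode, AsOfDate)
);

With a dozen or so measures per entity, the NVP shape carries roughly a dozen times the rows and repeats the entire natural key on every one of them, which is where the bloat comes from.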
There were quite a few revisions I undertook to optimize both retrieval and storage. The first was to do away with the NVP structure and use dedicated columns. Sure, new values required a little bit of dev work to hook up, but even with a schema-less NVP format, some amount of work would still have been needed to hook up new data points. So that cut table sizes dramatically. In combination with data compression, the data set was now looking quite manageable.
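For what it's worth, enabling compression on a table like the sketch above is a one-liner (PAGE compression shown; the table name is the illustrative one from earlier):

-- Rebuild the table with page-level compression
alter table dbo.FactPrice rebuild with (data_compression = page);

Row compression is lighter on CPU, while page compression usually squeezes repetitive fact data harder; sp_estimate_data_compression_savings is handy for checking which one is worth it before rebuilding.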
But that’s not the main point I want to make (although it is a good example of why NVP models suck).
The next step I took (and what I should have done from the beginning) was to really get to know the data and understand the natural hierarchies and unique entities. Since not all of this data was provided as reference data by the vendor, I built a process to build inferred dimensions from the incoming files and maintain surrogate identifiers as necessary.
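The inferred-dimension step boiled down to something like the following. The staging and dimension table names are hypothetical, and the real process also handled updates and auditing, but the core of it is inserting any member we haven't seen before and letting an identity column assign the surrogate key:

-- Add any product seen in the incoming file that the dimension doesn't know about yet.
-- DimProduct.ProductId is an identity column, so the surrogate key is assigned here.
insert into dbo.DimProduct (ProductCode, ProductName)
select distinct s.ProductCode, s.ProductName
from stage.IncomingFact as s
where not exists
(
    select 1
    from dbo.DimProduct as d
    where d.ProductCode = s.ProductCode
);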
What I ended up with was two conceptual layers in the database: a decomposed and normalized version of the fact tables the vendor sent, and a set of custom pre-aggregated tables to serve specific uses and surface data quickly (where indexing alone was insufficient). The main advantage of the normalized structure was the amount of work it removed from building and maintaining these “second-level” aggregate data structures. Being able to reference master records of data ahead of time, and logically knowing where to look for everything, made the process of pre-populating tables significantly easier. Plus, it put much more control over the integrity of the data into my hands.
The next project I worked on was a departure from the model I described in the first paragraph (vendor -> store -> present). Instead, I built a database drawing upon both internal and external data sets. I used many lessons from the project I described earlier. First and foremost, I took (had?) the time to plan out the database ahead of time. Wherever possible, I normalized things out; in a few cases even so far as to have reference tables with one or two id/value rows in them. I created surrogate identifiers where there was a need to standardize disparate data sources, I documented the database, and I made use of foreign keys for reasonably static reference data.
First, the bad. Although, to an extent, surrogate keys are supposed to be business-process agnostic, there was a need to uniquely identify entities which had no natural key. As such, when setting up new items, while the value of the surrogate key was irrelevant, those identifiers made their way into descriptive models and processes. There was distinct logic written around EntityID:1 vs EntityID:2. This means that when moving data from environment to environment, updating values, and correcting data, these identifiers have to be maintained. MyFirstEntity must always be ID:1, and MySecondEntity must always be ID:2, not just randomly assigned numbers. Perhaps there’s a better way to uniquely identify elements with no natural key, but short of fuzzy logic, I’m not aware of one.
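One way to keep those identifiers pinned when populating a fresh environment is to seed the reference table with explicit values rather than letting the identity column hand out whatever comes next. A minimal sketch, assuming a hypothetical dbo.Entity table whose EntityId is an identity column:

-- Pin the surrogate values so EntityID:1 and EntityID:2 mean the same thing everywhere
set identity_insert dbo.Entity on;

insert into dbo.Entity (EntityId, EntityName)
values (1, 'MyFirstEntity'),
       (2, 'MySecondEntity');

set identity_insert dbo.Entity off;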
Additionally, foreign keys, while central to database integrity, tie your hands a bit in terms of how you can modify an immature database. For instance, during early development cycles, I was changing table structures every few days or so. Foreign keys did their job admirably and prevented me from taking shortcuts to restructure the database and data; objectively a good thing, but it takes some getting used to.
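In practice, a structural change usually meant dropping the constraint, doing the work, and putting it back with full validation so nothing slipped through in the meantime. A sketch with illustrative table and constraint names:

-- Temporarily remove the constraint while restructuring
alter table dbo.ChildTable drop constraint FK_ChildTable_ParentTable;

-- ... restructure the tables / migrate the data ...

-- Re-add it and validate every existing row, not just new ones
alter table dbo.ChildTable with check
    add constraint FK_ChildTable_ParentTable
    foreign key (ParentId) references dbo.ParentTable (ParentId);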
Now, the good. As with the first project I mentioned, I had to build several consolidated tables in order for front-end queries to perform optimally, but having everything structured logically made building the new structures embarrassingly simple once the foundation was configured. Highly normalized (maybe 3NF) databases serving a wide variety of complex queries do present problems for systems with low latency tolerance, not all of which can be ameliorated through indexing and query design alone. But again, building de-normalized structures off of the normalized data, over which I had full control, was simple compared to trying to maintain 20 de-normalized tables and having to make sure the name of an entity never got out of sync between them.
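The consolidated tables were essentially joins flattened ahead of time, something along these lines (hypothetical table names again; the real tables were wider and refreshed on a schedule):

-- Rebuild a reporting table from the normalized layer
truncate table dbo.ReportDailyPrice;

insert into dbo.ReportDailyPrice (EntityName, RegionName, AsOfDate, Price, Volume)
select e.EntityName,
       r.RegionName,
       f.AsOfDate,
       f.Price,
       f.Volume
from dbo.FactDailyPrice as f
join dbo.Entity         as e on e.EntityId = f.EntityId
join dbo.Region         as r on r.RegionId = f.RegionId;

Because the entity and region names live in exactly one place in the normalized layer, there's nothing to drift out of sync when the reporting table is rebuilt.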
I've worked with several highly normalized systems in the past, and while the learning curve to understand the data can be steeper than if you were looking at a de-normalized table, report or spreadsheet, once you understand the data, it makes getting what you need a much more sane process.
Foreign keys were the unsung heroes of this database. They’re not as obviously helpful as normalization or other structural advantages, but they prevent things from going wrong. Every day that I don’t have orphaned records, they’re doing their job. Having the assurance that the schema I've spent so long tinkering with and refining will work correctly is well worth the effort. Sure, it takes time to ensure I delete from tables in the correct order and don’t add sub-entities before their parent, but I doubt anyone would say that skipping the cost of learning to do things in the right order is worth having an unstable database.
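That "correct order" discipline looks roughly like this, using the same hypothetical dbo.Entity table from earlier with a child table hanging off of it:

create table dbo.EntityAttribute
(
    EntityAttributeId int identity(1,1) not null primary key,
    EntityId          int               not null,
    AttributeName     varchar(100)      not null,
    constraint FK_EntityAttribute_Entity
        foreign key (EntityId) references dbo.Entity (EntityId)
);

-- Children first, then the parent; trying it the other way round fails fast
-- instead of silently leaving orphaned rows behind.
delete from dbo.EntityAttribute where EntityId = 2;
delete from dbo.Entity          where EntityId = 2;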
There were certain instances where I made weighed decisions to avoid foreign keys (such as on tables with very heavy traffic, and/or when an orphaned record would have no impact on the system), and I stand by those decisions.
I’m guessing that at this point (if not earlier) normalization fans have been thinking “we told you so” since the start of the article. These are things I had no idea about when I started working with SQL. I was educated in a certain way, and the methods used by my mentors just became ingrained. I really had no significant exposure to highly normalized databases or foreign keys when I started, and nobody took the time to explain their implications (positive and negative); I wish someone had. There are still many cases where I forgo these database practices and probably will again in the future, and maybe some of the time they’ll come back to haunt me, but if I can give anyone who doesn’t use these tools some advice, it would be to strongly consider them. These were tools I dismissed because others dismissed them, and I assumed the problems that arose were just the way things were. I found excuses not to break form, but fortunately over time my curiosity won out, and I’m glad it did. As Robert Frost said, “… And I, I took the [road] less traveled by, and that has made all the difference”.
Labels:
Data,
Database,
Foreign,
Form,
Integrity,
Key,
Normal,
Normalization,
optimization,
RDBMS,
Relational,
Server,
SQL
Wednesday, February 5, 2014
For those of you who may not have known, there’s a cool trick you can do in Visual Studio or SSMS: the alt-drag marquee. Many of these functions can be handled with find and replace, with or without regular expressions, but sometimes those are more trouble than they’re worth.
The basics of alt-drag are this. Say I have a block of text, such as a comment block of a procedure which I want to replace all at once. Hold alt and drag to the bottom right, and you get a marquee area which respects x/y coordinates rather than just lines and indentation.
Now the cool part. If you’ve ever used copy-paste with these marquees, you’ll find they work… oddly. However, like any good Jedi, you can turn your enemy’s weakness into your strength.
Say, for example, I forgot to type select statements for a derived table like this:
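The original screenshot isn’t reproduced here; as a stand-in, imagine a derived table built from union all where every values line after the first is missing its select (made-up values):

-- Broken: each line after the first union all still needs a leading "select"
select *
from (
    select 'Alpha' as Name, 1 as SortOrder
    union all
    'Beta', 2
    union all
    'Gamma', 3
) as d;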
You could use find and replace to replace “all” with “all select”, you could use a series of copy/paste commands and down-arrow keys, or you could do this:
Create a zero-width vertical marquee and start typing. Every text line which is crossed by the marquee will begin typing there (it works with ctrl-v as well).