26 March 2009

The Next Revolution, Part Two

Part 2 was to be a study in my building and testing 5NF versus 0NF. To do that, I needed an Intel X25-E (it is the Real Deal), but at $800 it's not something I would buy casually; soon, just not this week, or for a couple of weeks. Then AnandTech came to my rescue. They published some testing last fall of a group of SSDs, including the X25-M, which got my brain grinding on this whole ball of tar. They also published some technical discussions of how SSDs work, and under certain circumstances don't. For those interested in industrial strength SSD: I first got to know about it through Texas Memory Systems; they've been around since 1978. One of their machines I couldn't afford, not this week or next.

I check every now and again, and yesterday I found that AnandTech had just published a new X25-E specific study. WooHoo!

Here is the link

The page that is of interest to me, since it validates my premise, is Page 11: Testing in the Real World. The money quote: "One X25-E is 66% faster than eight (!) 15000RPM SAS drives."

What they didn't do, and I'll have to get around to it soon (it would be easier if I had access to a full fledged, production sized development system with lots of flat file data; is that too broad a hint??), is demonstrate how much data can be tossed out of such 0NF applications.

Tossing out redundant data is only part of the gain from SSD multi-processor machines. (As an aside, there is a new blog, "The Grey Lens Man", telling the ongoing tale of tossing out a COBOL/REXX application on the AS/400 (now iSeries, last time I looked) for Scala on something else, I guess. Some companies will retire old code. There is hope.)

The other part of the improvement lies in housing the data constraints with the data: declared referential integrity, triggers, and stored procedures (not everybody's cup of tea). Doing so reduces the amount of coder created code; I make the distinction that database constraint writing is done by database developers, not coders. The distinction is based on my experience that coders seek to write as much text as possible, since there is a tacit understanding twixt them and Pointy Haired Bosses that SLOC is _the_ measure of productivity; more text equals a larger paycheck. Database developers, on the other hand, just want the damn thing to work with as little typing as possible. This motivation is what accounts for the growing number of catalog based CRUD generators appearing like croci in the spring. I expect I'll have more to say about them in due time.
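To make the "constraints with the data" point concrete, here is a minimal sketch, using Python's sqlite3 as a stand-in for an industrial engine. The table and column names are invented for illustration; the point is that one declared foreign key replaces the validation code a coder would otherwise write in the application:

```python
# A declared foreign key: the engine polices the relationship itself,
# so no application code is needed to check that an order has a customer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this switch
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id))""")

conn.execute("INSERT INTO customer VALUES (1, 'Able Co.')")
conn.execute("INSERT INTO orders VALUES (10, 1)")   # valid parent: accepted

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected by the engine:", e)
```

That one line of declaration does the work that would otherwise be scattered through every code path that touches orders, which is exactly the SLOC asymmetry described above.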

Lest I sound too much like a star struck lover on Madeira, there is a cost, albeit a small one I believe, to 5NF data. In order to return the 0NF data to the application, the engine has to synthesize the rows, and that will take some cycles. So, if we need 1,000 rows from the 0NF table Foo, we could get them as 10 rows from table Able joined to 100 rows from table Baker, where Able and Baker are a full decomposition of Foo. It is the full decomposition of tables that makes SSD databases practical: the decomposed tables are small enough to store economically on flash. It is unlikely in our lifetimes that terabyte xml dumps could be stored economically on SSD. Which is fine with me. The point here is to make better transactional systems. Whatever the Web 2.0 folk propagandize, the money data is in transactional systems.
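The arithmetic above can be sketched in a few lines, again with sqlite3 standing in for a real engine and using the text's own hypothetical tables Able and Baker (here holding independent attributes, so their join is a cross product, as a full 5NF-style decomposition would be):

```python
# Synthesizing 0NF rows from a full decomposition: 10 stored rows in Able
# and 100 stored rows in Baker reconstruct Foo's 1,000 rows at query time.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE able (a INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE baker (b INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO able VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO baker VALUES (?)", [(i,) for i in range(100)])

# The engine does the multiplication: every (a, b) pair of the join
rows = conn.execute("SELECT a, b FROM able CROSS JOIN baker").fetchall()
print(len(rows))  # 1,000 synthesized rows from only 110 stored rows
```

The cycles spent on the join are the cost; storing 110 rows instead of 1,000 is the gain, and it is that shrinkage which makes SSD residence affordable.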

Ride the wave. Cowabunga, dude.
