06 January 2011

Pundit For a Second Day

Regular readers know that I submitted a prediction to the Real Cringely.  He's only up to number 6 so far, so I don't know whether he's deigned to accept it.  When he made the announcement, he said only that any outside predictions would be credited, not that the author(s) would get advance notification of "winning".

While we wait, and given the recent spate of SSD and processor news, a new idea has been worming around my cerebrum.  This post is just to establish originality; it isn't the full-blown patent.  I could well abandon the idea.  It's just one of those thought experiments.

My submitted prediction was that Oracle would leapfrog the other RDBMS vendors by pushing the BCNF approach in order to win over the IBM/COBOL mainframe application crowd to the Oracle/SSD machines.  The magic bullet approach.

But what if a vendor, maybe Oracle, maybe not, took the vision a step further?  What direction would such a step take?  How about writing the *engine itself* for multi-core/processor + SSD machines?  How would such an engine differ from today's versions?  Well, at minimum it would have an optimizer and execution unit which are parallelized.  Some (most?) engines are parallel only to the extent of being able to execute multiple queries at once, one per thread (generally).  What is not common is parallel execution of a single query against the datastore.
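To make the distinction concrete, here is a toy sketch of one query split across several workers, as opposed to one query per thread.  Everything in it is made up for illustration: the `table` of dicts, the `amount` column, and the helper names `scan_partition` and `parallel_sum` are my assumptions, not any vendor's engine.  It is also only a sketch of the shape of the thing; in CPython the threads won't give real CPU parallelism, so read it as partition, scan, merge rather than as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate):
    """Scan one partition and return a partial (count, sum) aggregate."""
    count, total = 0, 0
    for row in rows:
        if predicate(row):
            count += 1
            total += row["amount"]  # "amount" is a hypothetical column
    return count, total


def parallel_sum(table, predicate, workers=4):
    """Run ONE filtered COUNT/SUM query across several workers."""
    # Split the table into roughly equal, contiguous partitions.
    size = max(1, -(-len(table) // workers))  # ceiling division
    partitions = [table[i:i + size] for i in range(0, len(table), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda part: scan_partition(part, predicate), partitions)
        )
    # Merge the partial aggregates into a single result.
    return sum(c for c, _ in partials), sum(t for _, t in partials)


# Hypothetical data: 1000 rows with amounts cycling 0..9.
table = [{"id": i, "amount": i % 10} for i in range(1000)]
count, total = parallel_sum(table, lambda r: r["amount"] >= 5)
```

The point of the sketch is the merge step: each worker only ever sees its own slice, and the "all at once" answer is assembled from the partial results.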

With rising processor/core/thread counts, blasting a query through many threads begins to make sense, if the datastore can respond fast enough.  The existing model of concurrency, whether the locker model or MVCC, seeks to minimize the time that a given row is locked.  Row level locking developed to make this possible, and MVCC developed to make it "irrelevant".  Both approaches are based on the concept of a conflict serializable schedule (Weikum & Vossen, pg. 92 et seq).  In practice, the engine does what the COBOL/VSAM coder used to do:  iterate over a bunch of "records" doing stuff one at a time.  The RDBMS presents to the client an "all at once" facade, as Dr. Codd demanded, but there's just a really fast squirrel spinning the cage.

But matrix operations are what Dr. Codd really meant; one can view the relational algebra as essentially linear algebra for a limited domain.  What if an RDBMS vendor looked at current, and near future, machines rather than past machines?  Current databases are still based on uniprocessor, slow-disk technology.  The emerging machines have been obvious for five years, anyway.  How far along could Oracle, for example, be toward an engine that *demands* X number of cores and Y amount of SSD primary storage?  Could you build a database engine from scratch in five years?  Abso-freaking-lutely.  OS/360 was written in less time, and in assembler.  So, yes, this could be the year.  My guess:  Microsoft.
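For anyone who wants the relational-algebra-as-linear-algebra remark spelled out, here is one small instance, and only one: joining two binary relations on their shared column and projecting that column away is the same as multiplying their 0/1 matrices with AND/OR in place of multiply/add.  The relations, the domains, and the helper names below are all invented for the example; this is a sketch of the correspondence, not a claim about how any engine is built.

```python
def relation_to_matrix(pairs, row_domain, col_domain):
    """Encode a binary relation as a 0/1 matrix over the given domains."""
    row_index = {value: i for i, value in enumerate(row_domain)}
    col_index = {value: j for j, value in enumerate(col_domain)}
    matrix = [[0] * len(col_domain) for _ in row_domain]
    for a, b in pairs:
        matrix[row_index[a]][col_index[b]] = 1
    return matrix


def bool_matmul(x, y):
    """Boolean matrix product: OR over k of (x[i][k] AND y[k][j])."""
    return [
        [int(any(x[i][k] and y[k][j] for k in range(len(y))))
         for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def matrix_to_relation(matrix, row_domain, col_domain):
    """Decode a 0/1 matrix back into a set of pairs."""
    return {
        (row_domain[i], col_domain[j])
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell
    }


# Hypothetical relations: works_in(person, dept), located(dept, city).
people, depts, cities = ["ann", "bob"], ["sales", "ops"], ["nyc", "sfo"]
works_in = {("ann", "sales"), ("bob", "ops")}
located = {("sales", "nyc"), ("ops", "sfo"), ("ops", "nyc")}

# Join on dept and project onto (person, city), done as a matrix product.
product = bool_matmul(
    relation_to_matrix(works_in, people, depts),
    relation_to_matrix(located, depts, cities),
)
via_matrix = matrix_to_relation(product, people, cities)

# The same query as the record-at-a-time nested loop.
via_loop = {(p, c) for p, d in works_in for d2, c in located if d == d2}
```

Both routes give the same set of (person, city) pairs; the matrix form is the one that parallelizes naturally across cores.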
