29 March 2009

The Next Revolution and Alternate Storage Propositions

I've spent the last few days reading Chris Date's latest book, "SQL and Relational Theory". One buys books as much to provide support to the author, kind of like alms, as to acquire the facts, thoughts, and opinions therein. Kind of like buying Monkees albums; one doesn't really expect to hear anything new. I may post a discussion of the text, particularly if I find information not in previous books.

What this post is about is the TransRelational Model [TRM], which this latest Date book resurrects; column stores such as Stonebraker's Vertica; and the impact of the Next Revolution on them. As always, this is a thought experiment, not a report on a Proof of Concept or pilot project for either. Maybe someday.

In Date's eighth edition of "Introduction...", there is the (in)famous Appendix A, wherein he explicates, without referencing an implementation, why Tarin's patented Tarin Transform Method, when applied to relational databases, will be "the most significant development in this field since Codd gave us the relational model, nearly 35 years ago". In particular, he claims that "the time it takes to join 20 relations is only twice the time to join 10 (loosely speaking)." When published in 2004, Appendix A led to a bit of a kerfuffle over whether, given the reality of discs, slicing and dicing rows could logically lead to the claimed improvements. I found a paper which says it is the first implementation of TRM. The paper is for sale from Springer, for those who may be interested. You will need to buy the book to see what they found.

At the end of "SQL and Relational Theory", in the "About the Author" section, is a list of some of Date's books, among them "Go Faster! The TransRelational Approach to DBMS Implementation", which "is due for publication in the near future." The same book is "To appear" in Appendix A of the eighth edition. And I had thought it had gone away. The URL provided for Required Technologies, Inc. is now the home of an ultrasound firm.

The column database has been around for a while; Vertica is Michael Stonebraker's version. There is also a blog, The Database Column, which discusses column stores. It makes for some interesting reading. Two of the listed posters are from Vertica.
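For readers who have not met the idea, here is a minimal sketch of what separates a column layout from the familiar row layout. The three-column table and the function names are my own illustrations, not anything from Vertica or any real engine; an actual column store adds compression, sort orders, and much else.

```python
# Hypothetical three-column table (id, name, amount), for illustration only.
rows = [
    (1, "alice", 120.0),
    (2, "bob", 75.5),
    (3, "carol", 210.25),
]


def to_columns(rows):
    """Pivot a list of row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def to_rows(columns):
    """Pivot per-column lists back into row tuples."""
    return [tuple(r) for r in zip(*columns)]


def fetch_row(columns, i):
    """Reassemble row i by touching every column once."""
    return tuple(col[i] for col in columns)


columns = to_columns(rows)
ids, names, amounts = columns

# An aggregate over one attribute reads a single list in the column layout...
total_from_columns = sum(amounts)
# ...but has to walk every full row in the row layout.
total_from_rows = sum(r[2] for r in rows)

# Fetching one complete row is the reverse trade: one read in the row
# layout, one read per column in the column layout.
second_row = fetch_row(columns, 1)
```

The trade is visible even at this scale: the analytic query (a sum over one attribute) favors columns, while the transactional query (give me this one row) favors rows.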

My interest is this: given the Next Revolution, does either a TRM or a column store database have a purpose? Or any 'new and improved' physical storage proposition? My conclusion is, on the whole, no. The column store, when used to support existing petabyte OLAP systems, may be worth the grief, but for transactional systems, at which the TRM is aiming and from which column stores would extract, not so much. The claim in the eighth edition is that TRM datastores scale linearly with the number of tables referenced in a JOIN, but my thoughts are that the SSD table/row RDBMS cares not about the number of tables referenced in the JOIN, since access time is independent of access path. In such a scenario, a larger number of tables in the JOIN (assuming that the number of tables is determined by the degree of decomposition) should lead to faster access, since there is less data to be retrieved. As I said in part 2, there is a cost in cycles for the engine to synthesize the rows. The actual timing differences will be determined by the real data. In all, however, it seems to me that a plain vanilla table/row 5NF RDBMS on SSD multi-processor machines will have better performance than either TRM or column store on any type of machine. Were I of TRM or a column store vendor, inexpensive SSD multi-processor servers would be making my sphincter uncomfortable.
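The argument above can be put as a back-of-the-envelope cost model. Every number below is an assumption chosen to make the shape of the argument visible, not a measurement: the latencies, the bandwidth (held equal for both devices to isolate latency), and the byte counts for the flat and decomposed forms are all made up, and the CPU cost of synthesizing rows is ignored.

```python
def read_ms(random_accesses, bytes_read, latency_ms, mb_per_s):
    """Estimated I/O time: one latency hit per random access, plus transfer."""
    transfer_ms = bytes_read / (mb_per_s * 1_000_000) * 1000
    return random_accesses * latency_ms + transfer_ms


# Assumed device characteristics (illustrative only).
DISC = {"latency_ms": 8.0, "mb_per_s": 100}
SSD = {"latency_ms": 0.1, "mb_per_s": 100}

# Assumed workloads (illustrative only):
#   flat   - one access into a wide, redundant table, 1 MB transferred
#   joined - ten accesses into decomposed tables, 100 KB transferred in total
FLAT = {"random_accesses": 1, "bytes_read": 1_000_000}
JOINED = {"random_accesses": 10, "bytes_read": 100_000}

disc_flat = read_ms(**FLAT, **DISC)
disc_joined = read_ms(**JOINED, **DISC)
ssd_flat = read_ms(**FLAT, **SSD)
ssd_joined = read_ms(**JOINED, **SSD)

for label, value in [
    ("disc, flat", disc_flat),
    ("disc, joined", disc_joined),
    ("ssd, flat", ssd_flat),
    ("ssd, joined", ssd_joined),
]:
    print(f"{label:13s} {value:8.2f} ms")
```

Under these assumed figures the ten-way join loses badly on disc, where each extra table costs a seek, and wins on SSD, where the per-access penalty nearly vanishes and the smaller volume of data dominates. Change the assumptions and the crossover moves, which is why the real data decides.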

The sine qua non of RDBMS performance implementation is access path on storage. The fastest are in-memory databases, such as solidDB, now from IBM. For production databases for normal organizations, mainstream storage for mainstream databases will be where the action is. Both TRM and column datastores, so far as either has 'fessed up, are an attempt to gain superior performance from standard disc storage machines. Remove that assumption, and there may not be any there, there. Gertrude Stein again. Kind of like making the finest buggy whip in 1920.

Current mainstream databases can be run against heavily cached disc storage, with buffering in both the engine and the storage subsystem. The cost of such systems will approach that of dedicated RAM implemented SSD storage, since the hardware and firmware required to ensure data integrity is the same. As was discovered by the late 1990s, one level of buffering which is controlled by the engine is the most efficient and secure way to design physical storage.

And for what it's worth, back in the 1970s, before the RDBMS came into existence, there was the "fully inverted file" approach to 'databases'. In essence, one indexed the data in a file on each 'field', and turned all random requests into sequential requests. This appears to be the kernel behind the TRM and column store approaches. Not new, but if one buys Jim Gray's assertion that density increases will continue to surpass seek/latency improvements, then it makes some sense for rust-based storage. The overwhelming tsunami of data which results may be a problem. If we view a world where storage is on SSD, rather than rust, as Torvalds says, the nature of file systems changes. These changes have a material impact on RDBMS implementations.
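The fully inverted file is simple enough to sketch. What follows is my own toy rendering of the idea as described above, not code from any 1970s product nor from TRM: every field gets its own sorted list of (value, record number) pairs, so a request on any field becomes a binary search followed by a sequential run. The sample records and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical records, for illustration only.
records = [
    {"id": 1, "city": "Boston", "dept": "sales"},
    {"id": 2, "city": "Austin", "dept": "ops"},
    {"id": 3, "city": "Boston", "dept": "ops"},
]


def build_inverted_file(records):
    """Index every field: {field: sorted list of (value, record_number)}."""
    index = {}
    for pos, rec in enumerate(records):
        for field, value in rec.items():
            index.setdefault(field, []).append((value, pos))
    for entries in index.values():
        entries.sort()
    return index


def lookup(index, field, value):
    """Binary-search to the first match, then read matches sequentially."""
    entries = index[field]
    # A 1-tuple sorts just before any (value, pos) pair with the same value.
    i = bisect_left(entries, (value,))
    hits = []
    while i < len(entries) and entries[i][0] == value:
        hits.append(entries[i][1])
        i += 1
    return hits


index = build_inverted_file(records)
in_boston = lookup(index, "city", "Boston")
in_ops = lookup(index, "dept", "ops")

# A two-field request is an intersection of two sequential runs.
boston_ops = sorted(set(in_boston) & set(in_ops))
```

Note the price: every field of every record is stored a second time in its index, which is the tsunami of data mentioned above.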

13 comments:

Anonymous said...

Robert,

I enjoy reading your posts, but had not read this one until recently when reviewing your earlier posts.

You may have already discovered the findings below since this post in 2009. I looked around for more information on the TransRelational Model and Date’s book “Go Faster! …” that you mentioned in this post. I found that the book is now published (2011) and even found a PDF version that can be downloaded for free at http://www.zums.ac.ir/files/research/site/ebooks/it-programming/go-faster.pdf. This PDF copy has advertising in it, but appears to be a complete copy at 287 pages. The book PDF download link is also referenced in Date’s news page http://www.justsql.co.uk/chris_date/chris_date.htm, so I suspect it is legit.

The book mentions that its publication was held up due to non-disclosure agreements that expired in 2011 (which is likely why you hadn’t seen the book at the time of your 2009 post). There is also further information about the Tarin Transform Method and a patent on that method (US PTO 6,009,432 – 12/28/1999 – Value-Instance-Connectivity Computer-Implemented Database). I am still in the process of reading the book, but have found some interesting info so far.

I will be interested to hear your comments on this book – perhaps in a future post.

Thanks again for your writings and thoughts.

Scott R.

Racim Boudjakdjk said...

Robert,

Codd's response to previous hierarchic or graph databases was not more speed or more power but rather total physical independence.

Physical independence happens only when the link between the run-time representation and operation of data, on one side, and its physical encoding and organization, on the other, is broken so completely that there is no relationship at all between the properties of the two layers. Unless this condition is met, formalized, and mathematically proven, speaking of an RDBMS respecting the RDM's PI is like speaking of a car with a fuel engine running on tanks of water: it won't happen.
