09 February 2010

The Three Cornered Hat

I mulled whether to write this as a comment to the previous post, or to make a new one. It is a new one. I will start with an apology to Mr. Harold. It was not my intent to paint him as some sort of malfeasant human; it just so happened that his problems with an xml datastore coincidentally appeared in front of me at the same time as two other events. The primary one was the previous post discussing Phreaky Phil Factor. I thought the connection was obvious, and would be so to my bevy of regular readers. I wasn't expecting a visit from Mr. Harold; so far as I am concerned he was merely the messenger in that story. The other event is a thread on Artima, dealing with the hiring/firing problem, and the ensuing discussion among Bruce Eckel, Paul English, and the group assembled, humble self included; Phreaky Phil makes an appearance there, too.

So, the point is not that some xml datastores are slow and some are fast, and likewise for sql databases. Rather, it is the comparison of paradigms among Peso's answer, the described procedural alternatives, and eXist/XQuery/xml. This has always been the problem with xml datastores, along with all that redundant data, the need to write a multi-user engine, and so forth. Add to those the lingering resentment, I will acknowledge, toward Don Chamberlin, not only for perverting the relational model with SQL (Dr. Codd was shut out), but also for creating XQuery; he never really got it, having finally returned to his IMS roots. There, I feel better now.

The set-based paradigm of the relational model, as implemented in an industrial-strength engine, will always be faster for a generalized datastore than looping over records in any application language. That's the line in the sand. It can be true, but is not necessarily so, that a hierarchical datastore, whether IMS or xml, has a single optimal data path for some query. In fact, that's what IMS was designed specifically to do. Such a datastore will be bad to horrid for any other query. And it will be bedeviled with all of the update anomalies well documented in the literature. There's also the issue of siloed applications (which coding against files promotes) versus shared, disciplined datastores (which militate against those silos).
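
For anyone who wants the two paradigms side by side, here is a minimal sketch, not taken from Peso's answer or from the thread. The orders table, its columns, and the function names are all made up for illustration, and SQLite in memory is only a convenient stand-in for an industrial-strength engine. It shows the same question asked once as a set-based statement and once as a loop over records in application code; it makes no attempt to measure speed.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
    )
    rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 20)]
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)", rows
    )
    return conn


def totals_set_based(conn):
    """Set-based: state what is wanted and let the engine decide how to get it."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(cur.fetchall())


def totals_looping(conn):
    """Procedural: drag every record into the application and add it up by hand."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return totals


conn = build_db()
set_result = totals_set_based(conn)
loop_result = totals_looping(conn)
print(set_result == loop_result)
```

Both routes return the same totals. The difference is where the work happens: in the first, the engine is free to choose an access path and ships back one row per customer; in the second, every record crosses into the application, and the access path is frozen into the code.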

I didn't expect Mr. Harold, or other committed xml zealots, to be suddenly overcome with the spirit of Dr. Codd, lay down their XML Spy, and join the saved. But it could happen.
