24 February 2010

Humpty Dumpty Falls Again

Well, campers, it's happened again: STEC has crashed. As I've mentioned before, this endeavor is not primarily dedicated to stock tips, but the health of SSD vendors (and storage vendors, too) is material to our journey down the Yellow Brick Road to the Wonderful World of Oz. As I type, the pre-market value is $9.50; this past summer the shares were over $40. What happened?

If you take in the message boards (Yahoo! is the one I follow), many shareholders talk of deception and corruption and such on the part of management. But, fact is, the writing was on the wall with the third quarter report, and the shares tanked then as well, to about $11, when the company announced that EMC wouldn't be taking additional shipments in the fourth quarter. EMC said as much during their earnings call. So, it was clear that STEC wasn't shipping gobs o' SSD.

As I've mentioned here a number of times, the storage vendors, likely under pressure from their clients, are resisting replacing HDD racks with SSD racks on a one-for-one basis. This was predicted. Both the SSD vendors and the storage vendors have to educate the end user clients about the types of systems which will benefit from SSD, and why those applications come out ahead of other sorts. The answer, which ought not to surprise you, dear reader, is the BCNF database. The current approach by storage vendors is to promote SSD as caching support. That's a niche, and a small one at that. There won't be much future in it.
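To put illustrative numbers on it (rough figures of my own, not benchmarks): the joins a fully normalized database generates are dominated by random page reads, which is exactly where SSD shines over rotating rust.

    10,000 random page reads x ~5 ms/seek   (HDD)  ~  50 seconds
    10,000 random page reads x ~0.1 ms/read (SSD)  ~   1 second

Sequential scans, the sort of traffic a cache layer already serves well enough, see a far smaller gain; hence caching is the niche and the normalized database is the market.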

Building from the datastore, rather than the code, has not yet reached the top suite in most companies. I happened on an "email the president" page at HP, and sent off a missive to Hurd on just this subject, referencing their SSD promotion web page. I might even get a reply, but I won't be holding my breath. I sent along something similar to STEC. As it used to be said, "You can't turn the Queen Mary around in a bathtub".

09 February 2010

The Three Cornered Hat

I mulled whether to write this as a comment to the previous post, or to make a new one. It is a new one. I will start with an apology to Mr. Harold. It was not my intent to paint him as some sort of malfeasant human; it just so happened that his problems with an xml datastore coincidentally appeared in front of me at the same time as two other events. The primary one was the previous post discussing Phreaky Phil Factor. I thought the connection was obvious, and would be so to my bevy of regular readers. I wasn't expecting a visit from Mr. Harold; so far as I am concerned, he was merely the messenger in that story. The other event is a thread on Artima, dealing with the hiring/firing problem, and the ensuing discussion among Bruce Eckel, Paul English, and the group assembled, humble self included; Phreaky Phil makes an appearance there, too.

So, the point is not that some xml datastores are slow and some are fast, and likewise for sql databases. Rather, it is the comparison of paradigms among Peso's answer, the described procedural alternatives, and eXist/XQuery/xml. This has always been the problem with xml datastores: all that redundant data, the need to write a multi-user engine, and so forth. Those, and, I will acknowledge, the lingering resentment toward Don Chamberlin for not only perverting the relational model with SQL (Dr. Codd was shut out), but also creating XQuery; he never really got it, having finally returned to his IMS roots. There, I feel better now.
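For those who want the redundancy problem in the small, a minimal sketch (tables and names mine, purely for illustration). An order-per-document xml store repeats the customer's name and address in every document; change the address, and every document for that customer must be rewritten. The normalized version states each fact exactly once:

    -- Each customer fact lives in one row, not in every order document.
    CREATE TABLE customers (
        customer_id INTEGER      PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        address     VARCHAR(200) NOT NULL
    );

    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  DATE    NOT NULL
    );

    -- One statement corrects the address everywhere, atomically:
    UPDATE customers
       SET address = '123 New Street'
     WHERE customer_id = 42;

That single UPDATE, and the anomalies it sidesteps, is the "redundant data" complaint in one line.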

The set based paradigm of the relational model, as implemented in an industrial strength engine, will always be faster for a generalized datastore than looping over records in any application language. That's the line in the sand. It can be true, though not necessarily, that a hierarchical datastore, whether IMS or xml, has a single optimal data path for some query. In fact, that's what IMS was designed specifically to do. Such a datastore will be bad to horrid for any other query. And it will be bedeviled with all of the update anomalies well documented in the literature. There's also the issue of siloed applications (which coding against files promotes) versus shared, disciplined datastores (which work against those silos).
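The line in the sand, as a minimal sketch (schema invented for illustration; assume employees(emp_id, dept_id, salary)): give everyone in department 10 a 5% raise.

    -- Set based: one statement, one optimizer plan, one round trip.
    UPDATE employees
       SET salary = salary * 1.05
     WHERE dept_id = 10;

    -- The record oriented alternative, in whatever application language:
    --   for each row in (SELECT emp_id, salary FROM employees
    --                     WHERE dept_id = 10):
    --       UPDATE employees SET salary = ... WHERE emp_id = ...
    -- One round trip per row, no set optimization, and the engine
    -- demoted to a dumb record manager.

The engine sees the whole predicate at once and can plan accordingly; the loop tells it nothing.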

I didn't expect Mr. Harold, or other committed xml zealots, to be suddenly overcome with the spirit of Dr. Codd, lay down their XML Spy, and join the saved. But it could happen.

08 February 2010

xml News [Updated 10 March]

I have been avoiding really, really digging into the xml sarcophagus, since I'm such a warm hearted, live and let live sort of guy. Well, sometimes. I know I've mentioned that my longest standing connection to blogging/sites is Cafe au Lait. The site is that of Elliotte Rusty Harold, and is actually two sites, the other being Cafe con Leche, which deals with things xml. While Elliotte was/is primarily a java advocate (and now writing code rather than books for a living, so far as I can tell), he has written both articles and books dealing with xml.

He and I have exchanged e-mails occasionally about xml, mostly me suggesting that he spend more time with real databases. I hadn't looked at Leche for a while, since my bookmark goes to Lait, and Leche is almost always off the bottom of the screen. But today I scrolled down, and found these two stories:

XQuery being slow
and
XQuery not doing well with errors, not that I'd advocate a java approach in a data system.

Coders still insist that they can build a robust datastore from a foundation of lawyers' document markup language. I guess they're fans of Sarah Palin, too. And on the subject of xml surprises, Tim Bray tells us that Oracle has decided to keep him. I've always believed that Larry wants Sun in order to take the last significant market it doesn't own: IBM mainframe clients. How having the onlie begetter of xml advances that goal, I can't say. Maybe Tim will have to larn him some relational database. Ah, cruel irony.

Update: Tim Bray announced on his blog that he resigned from Sun, prior to the "integration" in Canada; as a result, he says, he never will have worked for Oracle. No reason given. You can follow him at his site.

04 February 2010

That Phil Phreak

Simple Talk is a site I've been reading for a few years. I started when I had to do some work with SQL Server, moving an application to DB2. I don't have SQL Server here, being a linux-only household, but I still follow the site and will play along from time to time when the subject doesn't require SQL Server syntax. A new piece deals with set based solutions to data problems. While it doesn't use BCNF data, just one table, it is an amusing exposition. More than a few bon mots aimed at record oriented programming; warms my heart.
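In that spirit, a small example of my own (not from the article; assume a table invoices(invoice_no INTEGER PRIMARY KEY)): finding the gaps in a sequence of invoice numbers with one set based statement, no cursor required.

    -- For each invoice number, find the next higher one; where the
    -- difference exceeds 1, the numbers in between are missing.
    SELECT t.invoice_no + 1      AS gap_starts,
           MIN(u.invoice_no) - 1 AS gap_ends
      FROM invoices t
      JOIN invoices u ON u.invoice_no > t.invoice_no
     GROUP BY t.invoice_no
    HAVING MIN(u.invoice_no) - t.invoice_no > 1;

The engine does the looping; the statement just states the predicate. That is the whole game.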

The author is one I'm not familiar with, but the original instigator is Phil Factor, an anonymous DBA/developer who could be me if I lived in London, spoke real English, and spent my time with SQL Server.

03 February 2010

We Don't Need no Stinkin' Innovation

Artima runs a bunch of message threads on subjects generally related to coding; occasionally databases directly, but not so much recently. There is a current thread, started by Bruce Eckel, discussing the proposition that software development has stalled. The thread then mixed in ideas of innovation and concurrent machines. Well, this was too much to ignore. Herewith my contribution; I find that it stands pretty much on its own, and makes another case for the point of this endeavor.


While it is true that development has stalled, the stall began with java. Java did not usher in a wave of innovation, which has somehow petered out. It merely allowed the ignorant to recreate the paradigms of the 1960's.

The web, so far, is semantically implemented exactly the same way as COBOL/3270 code, circa 1970. You have a dumb datastore, managed by bespoke code, talking to a semi-smart block mode terminal. The only difference is some syntax (COBOL on the mainframe vs. java on the server, 3270 edit language on the terminal vs. html/javascript on the browser), but the method is identical. Better ways were discovered between 1970 and 2000, but the folks, young-uns mostly, who stole the process were quite ignorant of these. They revel in "languages" rather than systems.

The principal reason this is so amusing is that there was far and away more innovation in systems through the 1980's. There were multiple mainframe machines, each with a specific instruction set (aka architecture), as well as an emerging group of mini-computers, again each with a specific architecture; among them, explicitly parallel machines. Machines were designed to solve problems heretofore unsolved. We've devolved to a near mono-culture: the X86 instruction set and the z/ instruction set. The ARM processor is gaining some traction, and may pull us out of the weeds, but it is aimed at a different problem.

All this kerfuffle about concurrent languages is so misinformed, again, because those who consider themselves players weren't around when parallel architectures were first developed in the late 1970's. Those architectures basically went nowhere, not because they were poor architectures, but because there aren't many problems (in the computer science sense) that benefit from parallelism. As to concurrency in linear problems, see Amdahl's law (it's been mentioned here before).
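For those who haven't looked it up, Amdahl's law in one line, with an illustrative worked case (the 90% figure is mine, and generous):

    Speedup(N) = 1 / ((1 - p) + p/N)

where p is the parallelizable fraction of the work and N the number of processors. With p = 0.9, sixteen cores give 1 / (0.1 + 0.9/16), about 6.4x; with infinitely many cores the limit is 1/0.1 = 10x. The serial fraction rules, whatever the language.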

The notion that ad hoc language creation will make it easy to use multi-core/processor machines to execute linear problems (virtually every application which is not a server) more efficiently is fantasy. Even without Amdahl's law, simple logic (which, one may argue, is all Amdahl was expressing) makes it clear.

The problem which is staring the industry in the face is the re-siloing of applications as a result of web kiddies' love affair with languages rather than systems, which leads to the "explosion of data". Dr. Codd, bless his heart, gave us the answer: replace code with disciplined data. But because these self-appointed players have a hammer called "languages", they seek to create "new" ones to solve a problem (nail) which is not language solvable.
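What "replace code with disciplined data" looks like in the small (a sketch; the table and names are mine): the rules live in the shared datastore, stated once, enforced for every application in any language that connects.

    CREATE TABLE accounts (
        account_id INTEGER       PRIMARY KEY,
        balance    DECIMAL(12,2) NOT NULL CHECK (balance >= 0),
        status     VARCHAR(10)   NOT NULL
                   CHECK (status IN ('open', 'frozen', 'closed'))
    );

    -- No java (or COBOL, or php) validation routine to write,
    -- duplicate, or let drift out of sync; the constraint is the
    -- specification.

Every siloed application would otherwise re-implement those rules its own way, which is precisely the Tower of Babel complained of below.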

To accept that the solution is to implement shared, disciplined datastores is to accept that each application is not going to be its own little fiefdom. Further amusement here, in that Bruce (and many others as well) makes much of the "community" of coders in these discussions, but absolutely rejects the notion that "community" can be applied to datastores. NEVER do that. If they did, there'd be so much less code to build. There'd be so much less "freedom" to make up one's own rules about this and that. So we get a Tower of Babel; each cabal of coders pissing on its territorial boundaries. My code. My data. You don't understand what I'm trying to do here. And so forth. And so very mainframe 60's. Your grandfather said the same thing.

Note that the multi-core/processor machine is not a new architecture, but rather just the aggregation of existing Von Neumann machines (linear execution with an existing instruction set). These are not machines based on an architecture which accepts code and parallelizes it for execution by some new non-linear instruction set.

This is all wasteful wheel spinning. The linear instruction sets have been (save for register size expansion) fundamentally unaltered for 30 years (nearly 50 for the z/). ARM takes RISC to a logical conclusion, and attacks an utterly different problem from that of the X86 machines.

This proliferation of languages is scribbling around the edges. They all have to (compile to and) execute on the same old X86 instruction set. You can't (well, not really) push a square peg into a round hole. Today's multi-machines aren't really designed for parallel algorithms (and how many of those are there?); they're just miniaturized versions of the IBM 360 MFT (that's 1965, for those keeping track) machines.

The answer to the density/speed/heat problem won't be found in such ad hoc language sand castles. It will require some really smart hardware engineers (are there any left?) figuring out the successor to the Von Neumann architecture. Have fun making new languages, but don't delude yourselves into thinking that you'll actually be solving the real problem. The problem isn't in high level application languages. The answer is either in databases (replacing code with data, leaving display to application languages) or in new hardware architecture.