Dr. Codd Was Right: July 2009

18 July 2009

Common Sense isn't always common

I have been meaning to write about common table expressions (CTEs), since this syntax (sugar, some say) explicitly implements hierarchies, and the reactionaries out there pining for IMS and its kin won't be quiet. I first used them on DB2/LUW years ago. SQL Server 2005 added support, and now Postgres 8.4 has, too. Oracle has had its CONNECT BY syntax much longer. Recursive CTE syntax was added to the standard in SQL:1999.
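To make the hierarchy point concrete, here is a minimal sketch using Python's built-in sqlite3 module, since SQLite supports the standard WITH RECURSIVE form (not Oracle's CONNECT BY). The employee table and its rows are hypothetical. The anchor member picks the starting row; the recursive member joins the table back onto the rows found so far, level by level.

```python
import sqlite3

# Hypothetical org chart stored relationally: each row points at its manager.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(id)
);
INSERT INTO employee VALUES
    (1, 'Ada', NULL),
    (2, 'Bob', 1),
    (3, 'Cy',  1),
    (4, 'Dee', 2),
    (5, 'Eve', 4);
""")

# Anchor member: the starting employee at depth 0.
# Recursive member: everyone whose manager is already in the result.
SUBORDINATES = """
WITH RECURSIVE chain(id, name, depth) AS (
    SELECT id, name, 0 FROM employee WHERE id = ?
    UNION ALL
    SELECT e.id, e.name, c.depth + 1
    FROM employee AS e
    JOIN chain AS c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, name;
"""

def subordinates(root_id):
    """Return (name, depth) for root_id and everyone below it."""
    return conn.execute(SUBORDINATES, (root_id,)).fetchall()

print(subordinates(1))
# [('Ada', 0), ('Bob', 1), ('Cy', 1), ('Dee', 2), ('Eve', 3)]
```

No hierarchical storage is needed: the tree lives in one flat table, and the query walks it.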

I read various database sites and blogs; among my favorites is PostgreSQL.org. (Note to self: add it to Good Stuff.) It has a rotating set of links, and today there is an article discussing CTEs in Postgres. Since I haven't gotten around to installing 8.4 yet, the work is done for me. Read it. The article compares using CTEs with using functions, which is a neat idea in and of itself. They demonstrate that a CTE can run faster than standard syntax on an HDD machine. Very neat.
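The article's benchmark is not reproduced here; this is only a minimal sketch, assuming a hypothetical parts table in SQLite, of the structural difference between the two styles. The function-style walk issues one child lookup per visited node, while the CTE hands the whole traversal to the engine as a single statement.

```python
import sqlite3

# Hypothetical parts table: each part may belong to a parent assembly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO part VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3);
""")

def descendants_procedural(root):
    """Function-style traversal: one query per visited node."""
    found, stack, queries = [], [root], 0
    while stack:
        node = stack.pop()
        found.append(node)
        queries += 1
        rows = conn.execute("SELECT id FROM part WHERE parent_id = ?", (node,))
        stack.extend(row[0] for row in rows)
    return sorted(found), queries

def descendants_cte(root):
    """Set-based traversal: the whole walk is one statement."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION ALL
            SELECT p.id FROM part AS p JOIN sub ON p.parent_id = sub.id
        )
        SELECT id FROM sub ORDER BY id
    """, (root,)).fetchall()
    return [row[0] for row in rows], 1

print(descendants_procedural(1))  # ([1, 2, 3, 4, 5, 6], 6)
print(descendants_cte(1))         # ([1, 2, 3, 4, 5, 6], 1)
```

Same answer either way; the difference is how many round trips the application makes, which is where the per-query overhead piles up.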

CTE is germane to the main point of this endeavor, SSD multi-machines, since a CTE is just massive joining behind the scenes. You get the point: what would be somewhat inefficient on HDD becomes trivial on SSD. One more reason to put the 1960s datastore behind us. Paul McCartney played on the marquee of the Ed Sullivan Theater a few days ago, but he still looked as ancient as Ozzy Osbourne. Hierarchy is from the same era. Dump it. XML is IMS in drag, all sequins and rhinestones.
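A back-of-envelope model shows why join-heavy traversal is so sensitive to the storage medium. The latency figures are assumed, order-of-magnitude values (roughly 10 ms per random read on a 7200 rpm disk, roughly 0.1 ms on an SSD), not measurements, and the model ignores caching, read-ahead, and queue depth.

```python
# Assumed, order-of-magnitude latencies per uncached random page read.
HDD_RANDOM_READ_MS = 10.0
SSD_RANDOM_READ_MS = 0.1

def traversal_ms(random_reads: int, latency_ms: float) -> float:
    """Wall time if every page read is random and nothing is cached."""
    return random_reads * latency_ms

# Hypothetical recursive query that touches 5,000 uncached pages.
reads = 5_000
hdd = traversal_ms(reads, HDD_RANDOM_READ_MS)
ssd = traversal_ms(reads, SSD_RANDOM_READ_MS)
print(f"HDD: {hdd:.0f} ms, SSD: {ssd:.0f} ms, ratio: {hdd / ssd:.0f}x")
# HDD: 50000 ms, SSD: 500 ms, ratio: 100x
```

Under these assumptions the same plan goes from most of a minute to half a second, which is the sense in which "somewhat inefficient" becomes "trivial".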

16 July 2009

Sunrise, Sunset. Game, Set, Match. Part 2.

As (what now turns out to be) Part 1 began:
Well, the other shoe dropped. Oracle has bid for Sun. In my tracking of speculation, Oracle had more weight than IBM. And so it has turned out. This might end up being a problem for IBM.

The buyout/merger/lunch is official today (so far as shareholders go; the DoJ might vomit, but we'll have to wait on that), and since yesterday's post about file systems is still fresh in my brain, I thought I'd take a minute or two to expand on why I believe Oracle wants Sun.

It's not Java.

It never was Java, to my way of thinking. So, here's my way of thinking.

Larry has been buying up applications for a few years. What to do next? Or should he just keep doing that? What is the next Everest?

IBM has been diligent in transforming itself (as has GE, by the way) from a maker of things into a finance and body shop. The reason for doing so is apparent: services require little (real) capital investment. All you do is rent some office space and some computer time (or cloud it? maybe that, too), find some minimally competent bodies, and sell the hell out of the "organization". IBM was, is, and always will be a sales organization. It needed mainframes in the beginning because the US economy had not yet made the transition to de-industrialization (in 2007, 40% of corporate profit came from finance). To that point, the company had a history of making bespoke machines, motivated in no small part by the wish to lock in its customers. The sales staff's major duty was "client management": squeeze the last dollar out of each client, and make damn sure clients didn't go someplace else.

The last place that Larry has not been able to take Oracle (the database) is the IBM mainframe. There are technical reasons (mostly due to how IBM designed those machines decades ago) why Oracle has never run all that well there. Consider that, rather than trying to climb Everest, maybe you could nuke the sucker, stroll over the debris, and say that you've climbed Everest. Kind of true.

How, then, does Larry nuke IBM on the mainframe? That's where Sun comes in. Rather than trying to get Oracle (the database) to run as well as DB2 does on the mainframe in support of COBOL, why not build an alternative that is cheaper, faster, easier, and runs Oracle? Cool. Oracle and Sun have been together for years. Now Larry can get the hardware folks to do exactly what he wants. He has HP doing sort of what he wants, but he doesn't own them. And the Sun processors, some tech folk assert, are better than what IBM now uses.

Which brings us to btrfs. It was developed at Oracle and released as open source. It is still under development, but if you read the project site, you find that SSD support is what it's about. Larry will have a machine with SSDs and a file system tuned for them, running his database. He can make a compelling TCO argument that converting antique COBOL/VSAM applications to his super hot-rod database machine is a slam dunk. You heard it here first.

As I have been saying, SSD multi-core multi-processor high normal form database machines are the most cost effective way to handle data. The more I learn about what Larry is up to, the more convinced I am that this is where he's taking Oracle. I couldn't be happier.

File Systems for SSD are here

I have no idea how many folks really notice the quotes at the top of this endeavor; nevertheless, I have found a recent article about Linux file systems. The interviewee (if that's a real word) has written file systems for Linux, and has the following to say (page 2) about SSDs and file systems:

With regard to local file systems, I think btrfs is flexible enough to handle any projected hardware changes for the next decade, both in performance and capacity - in other words, SSDs and truly enormous quantities of storage. I also think we may see more flash-based devices exporting access to the hardware directly as the SSD market commoditizes, in which case LogFS and NILFS become even more interesting, perhaps as part of an open source wear-leveling layer that can integrate tightly with other Linux file systems.

So, the future is here. For information about btrfs, see Wikipedia for a start (SSD is explicitly supported).

Tick, tock. What's up, Doc?

08 July 2009

Lots of Chrome, and Tail Fins

Phones will kill the Internet, but not databases

That was how it appeared in the Musings in Process list, but times have changed. A couple of weeks on the calendar, but eons in system time. We now have ChromeOS, so my musing was almost right (it might still be, eventually), but for the near to mid term it now looks as though the netbook may well rule the roost.

The netbook means it's back to the future (well, the 1970s anyway). The netbook means that host/terminal computing has won. Took a while, but won it has. Now the world will look like the XTerm world of the '80s. M$'s worst nightmare. Odd thing is, as I type this, ARM's share price is going down, and so is Texas Instruments'. Both are mentioned in articles about ChromeOS and how it will run. That I can't explain.

In any case, high normal form databases serving, effectively, XTerms is as close to Nirvana as one could hope for.

Tick. Tick. Tick.

04 July 2009

Erik Naggum's thoughts on XML

Erik Naggum died recently. He was an SGML expert and an opponent of XML used stupidly. This is from a 2002 Usenet thread. Couldn't say it better myself. The point, of course, is that XML data requires specific code to extract the raw data, which is then processed by the remaining application code. As others have pointed out, one could use comma-separated value (CSV) files, skip the XML processing code, and go straight to the application code. Or use a database with SQL.

People who think object-orientation is so great, have generally failed to
grasp the value of data-driven designs despite the serious attempt at
making such design easier to model, and think solely in terms of code-
driven designs where their class hierarchies are poor adaptations to
their incompetent coding styles. This is extremely depressing, as the
interminable "software crisis" is a result of code-driven design. SGML
and XML were attempts at promoting data-driven design that would produce
data that was _supposedly_ independent of any application. The result is
that people who have so little clue they should have attracted one simply
by the sucking power of vacuum do code-driven designs in XML, which is
_really_ retarded, and then they need to store their moronically designed
data in databases, which is, of course, too hard given their braindamaged
designs, so the relational model does not "work" for them.
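The extraction-code point above can be sketched with Python's standard library. The two order records are hypothetical, stored once as XML and once as CSV: the XML path needs code that knows this particular tag layout, the CSV path is already rows, and either way the data only becomes useful once it lands in a table SQL can query.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# The same two hypothetical records, as XML and as CSV.
XML_DOC = """<orders>
  <order><id>1</id><customer>acme</customer><qty>5</qty></order>
  <order><id>2</id><customer>globex</customer><qty>3</qty></order>
</orders>"""
CSV_DOC = "id,customer,qty\n1,acme,5\n2,globex,3\n"

def rows_from_xml(text):
    # Extraction code tied to this document's particular element names.
    root = ET.fromstring(text)
    return [(int(o.findtext("id")), o.findtext("customer"), int(o.findtext("qty")))
            for o in root.iter("order")]

def rows_from_csv(text):
    # Already rows; only type conversion remains.
    reader = csv.DictReader(io.StringIO(text))
    return [(int(r["id"]), r["customer"], int(r["qty"])) for r in reader]

# Once the data are rows in a table, SQL does the rest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows_from_csv(CSV_DOC))
total = conn.execute("SELECT SUM(qty) FROM orders").fetchone()[0]
print(total)  # 8
```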

03 July 2009

Cloud: Lucy in the Sky with Razorblades ... Updated, twice

"'The time has come,' the Walrus said, 'to talk of many things'"

Today I endeavor to check off one of the musings in process, the one dealing with silos in the sky. I am motivated by a bit of bloviating I ran across while surfing: this nonsense. I will leave it to readers to endure it on their own, and will only pull out the occasional quote to gloat over.

A bit of background is in order. With the arrival of Web 2.0, and to a lesser extent Web .00001, coders began to change the rules back to what they were in the 1960s: all code is smart and all data is dumb. Java became COBOL, and data became copybooks. The main problems are that the coders are dishonest about their desires and intentions, and that what they are attempting will eventually lead back to the problems which led Dr. Codd to devise the relational model and database. (He is not responsible for SQL; Chamberlin is the guilty party there, and he continues to sin with XQuery and the like.)

In the 1960s there developed the industry segment known as service bureaus. IBM was a major player. A service bureau was a connected computer service, generally reached over leased lines, to which a company could off-load its applications. Often, the applications were provided by the service bureau. The service bureau agreed to provide resources as needed.

SOA and cloud are just HTTP versions of service bureaus. Service bureaus fell out of favor for the obvious reasons: they didn't actually manage to provide resources on demand, they weren't any more reliable than in-house shops, they were no more secure (often quite a bit less) than in-house shops, and they didn't save their customers any money. After all, they had to make a profit doing what had been a simple cost for their clients. There turned out (surprise, surprise) to be no economies of scale. There won't be any for SOA or cloud, either.

The notion that provisioning in the cloud is cheaper and more scalable rests on a single false assumption: that demand spikes among the clients are uncorrelated, or that gross demand across all clients is either constant or monotonically increasing. That the assumption is false is easy to see; there are easily identifiable real-world generators of demand spikes, and few are unique to specific companies or industries. Think daily closing hour, Friday peaks, weekend peaks, seasonal peaks, month-end peaks, and so on. The argument is made that the resource demands caused by the release of the latest iPhone (for example) are transitory for Apple, the retail sites, and AT&T. What is ignored is that all such organizations, unless they are failing, will soon absorb those resources into their continuing (growing) business.
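The arithmetic behind the correlated-demand argument fits in a few lines. A shared provider has to hold capacity for the peak of the summed demand, so pooling only saves anything when clients peak at different times. The hourly demand curves below are hypothetical, chosen to contrast staggered spikes with a shared month-end spike.

```python
def peak_comparison(demands):
    """Return (sum of each client's own peak, peak of the combined demand)."""
    sum_of_peaks = sum(max(d) for d in demands)
    combined = [sum(hour) for hour in zip(*demands)]
    return sum_of_peaks, max(combined)

# Hypothetical: three clients whose spikes land at different hours.
staggered = [
    [1, 1, 9, 1, 1, 1],
    [1, 1, 1, 1, 9, 1],
    [9, 1, 1, 1, 1, 1],
]
# Hypothetical: three clients who all spike at month-end close.
month_end = [
    [1, 1, 1, 1, 1, 9],
    [1, 1, 1, 1, 1, 9],
    [1, 1, 1, 1, 1, 9],
]

print(peak_comparison(staggered))  # (27, 11): pooling pays off
print(peak_comparison(month_end))  # (27, 27): pooling saves nothing
```

With uncorrelated spikes, the provider needs capacity for 11 units instead of 27; with correlated spikes, it needs all 27, exactly what the clients would have bought themselves.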

The real failing of SOA/cloud will be the imposition of lowest-common-denominator data structures. Which brings us back to that execrable post. Since these NoSQL folk have it in their heads that "all data be mine", just as COBOL programmers did in the 1960s, they will build silos in the sky, just as their grandpappies built silos in glass rooms (most of them likely don't even know what a glass room is, alas). As the old saying goes, those who ignore history are doomed to repeat it.

So, some quotes.

"'Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system],' said Jon Travis, principal engineer at Java toolmaker SpringSource..."

Mr. Travis fails to understand that object data are separate and apart from object methods. Always were, and always will be. That is exactly why OODBMSs failed: there is nothing to be gained, and a lot of pain to be endured, from storing all that method text more than once. The instance data, stored relationally (the minimum cover, so to speak, of the data requirement), identifies each instance of the class. There is no "twisting" to be done. The relational model keeps all the data nice and neat and constrained to correctness; no code needed. The method text is needed only for transitory changes from one stable, correct state to the next, with each state written back to the relational database. Simple as that. But that means far less code, which coders view as a job threat. Yes, yes it is.
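A minimal sketch of that separation, assuming a hypothetical account table and class: the table stores only instance data and a CHECK constraint enforces correctness, while the method exists only in code and is used to move from one stable state to the next. Nothing about the method is stored in the database.

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
# Hypothetical table: instance data only. The CHECK constraint, not
# application code, refuses any state that is not correct.
conn.execute("""
CREATE TABLE account (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
)""")
conn.execute("INSERT INTO account VALUES (1, 100)")

@dataclass
class Account:
    id: int
    balance: int

    @classmethod
    def load(cls, account_id):
        row = conn.execute("SELECT id, balance FROM account WHERE id = ?",
                           (account_id,)).fetchone()
        return cls(*row)

    def withdraw(self, amount):
        # Transitory logic: compute the next state and write it back.
        # 'with conn' commits on success and rolls back on error.
        with conn:
            conn.execute("UPDATE account SET balance = ? WHERE id = ?",
                         (self.balance - amount, self.id))
        self.balance -= amount  # only reached if the database accepted it

acct = Account.load(1)
acct.withdraw(30)
print(Account.load(1).balance)  # 70
try:
    acct.withdraw(500)  # would go negative; the CHECK constraint refuses
except sqlite3.IntegrityError:
    print("rejected")
print(Account.load(1).balance)  # 70
```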

"'SQL is an awkward fit for procedural code, and almost all code is procedural,' said Curt Monash, an independent database analyst and blogger."

Mr. Monash is infamous in the database world, and proves it again. The OO world, by its own claim anyway, is explicitly non-procedural: it is event driven, through message passing. So it says. In any case, how the object's data is manipulated after construction is of no concern to the data store, be it an RDBMS or a shoe box full of 3x5 cards. All such data manipulation is transitory and irrelevant to state. State is what the data store cares about. Mr. Monash doesn't get that.

Siloing, or rather getting rid of it, was Dr. Codd's concern. These KiddieKoders are such ignorant reactionaries, and don't even know it. The title of his seminal 1970 paper, "A Relational Model of Data for Large Shared Data Banks", puts it all together: shared, not siloed. The cloud, with these primitive one-off data structures, creates the massive data problem rather than solving it.

The serious problem with the cloud, however, is that, if widely adopted, it may kill off the advantage of the subject of this endeavor, the SSD multi-machine database. SSD storage has the advantage of supporting high normal form, and thus parsimonious, data structures. If cloud becomes ascendant, it will adopt lowest-common-denominator storage, i.e., cheap storage. And SSD multi-machines are not that. I believe it can be shown that SSD storage will be faster (with a smaller footprint) for high normal form databases than for the de-normalized spreadsheets so beloved of KiddieKoders. But cloud providers aren't going to be that smart. Back to the '60s.
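The "parsimonious" half of that claim can be illustrated by counting stored values, a crude proxy for footprint that says nothing about speed. The data are hypothetical: 3 customers and 1,000 orders, once in normalized form and once as a de-normalized "spreadsheet" with customer attributes copied onto every order row.

```python
# Hypothetical data: 3 customers and 1,000 orders.
customers = [(cid, f"cust{cid}", f"{cid} Main St", "Springfield") for cid in range(3)]
orders = [(oid, oid % 3, oid % 7 + 1) for oid in range(1000)]  # (order_id, customer_id, qty)

def stored_values(rows):
    """Crude footprint proxy: total number of column values stored."""
    return sum(len(row) for row in rows)

# High normal form: each customer fact stored once; orders carry only the key.
normalized = stored_values(customers) + stored_values(orders)

# De-normalized spreadsheet: customer attributes repeated on every order row.
flat = [(oid, qty) + customers[cid][1:] for oid, cid, qty in orders]
denormalized = stored_values(flat)

# Rows that must be rewritten if customer 0 changes address.
moves_normalized = sum(1 for c in customers if c[0] == 0)
moves_flat = sum(1 for oid, cid, qty in orders if cid == 0)

print(normalized, denormalized)      # 3012 5000
print(moves_normalized, moves_flat)  # 1 334
```

The normalized design stores fewer values and confines each change to one row, at the cost of a join at read time, which is precisely the operation SSD random reads make cheap.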

I'll close with a quote from those I keep around as sigs for e-mails:

This on-demand, SaaS phenomenon is something I've lived through three times in my career now. The first time, it was called service bureaux. The second time, it was application service providers, and now it's called SaaS. People will realise the hype about SaaS companies has been overblown within the next two years. ... People are stupid. History has shown it repeats itself, and people make the same mistakes.
-- Harry Debes/2008

Update:
I check up on Gartner press releases every now and again; I don't spend the coin to subscribe. They still haven't released the database survey (I'm always looking to see how far DB2 has fallen behind), but there is a survey concerning SaaS that, surprise, surprise, reinforces my thesis. Gartner is no better at prognosticating than Madam Zonga, but they do surveys as well as anybody. Have a look.

Update 2:
Another Gartner-ism from a newer report on cloud specifically:
"As cloud computing evolves, combinations of cloud services will be too complex and untrustworthy for end consumers to handle their integration, according to Gartner, Inc. Gartner predicts that as cloud services are adopted, the ability to govern their use, performance and delivery will be provided by cloud service brokerages."

In other words, cloud will need yet another new infrastructure to make this really, really cheap and easy off-loading of responsibility possible. Have these folks already forgotten the clustered duck that was CORBA? God almighty.
