29 January 2013

One Step Closer to Nirvana

For years, many years, there's been the quandary: if I'm updating a row, but not updating its key columns, should others be able to concurrently update the other non-key columns, or even the very column I just updated? So far as I know, Postgres is the first database to tiptoe in that direction. This isn't cell locking, though.

With this level of locking, UPDATE queries that do not involve columns of the tuple key can perform concurrently.
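
For the record, here's a minimal sketch of what's at stake, assuming the FOR NO KEY UPDATE / FOR KEY SHARE tuple-lock levels slated for Postgres 9.3, with two hypothetical tables of my own devising. The most visible win: an UPDATE that leaves the key columns alone no longer blocks a concurrent foreign-key check (which now takes only a KEY SHARE lock) on the same row.

    -- Hypothetical schema, for illustration only.
    CREATE TABLE vendor (
        vendor_id  integer PRIMARY KEY,
        name       text    NOT NULL,
        rating     integer
    );
    CREATE TABLE purchase_order (
        po_id      integer PRIMARY KEY,
        vendor_id  integer NOT NULL REFERENCES vendor (vendor_id),
        amount     numeric
    );

    -- Session 1: updates a non-key column, so it takes the weaker
    -- FOR NO KEY UPDATE lock on the vendor tuple, not full FOR UPDATE.
    BEGIN;
    UPDATE vendor SET rating = 5 WHERE vendor_id = 42;

    -- Session 2: the foreign-key check behind this INSERT takes only
    -- FOR KEY SHARE on vendor 42, so it no longer waits on session 1.
    BEGIN;
    INSERT INTO purchase_order (po_id, vendor_id, amount) VALUES (7, 42, 100.00);
    COMMIT;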

The argument will be "but chaos will ensue!" I don't buy it. With conventional row-level locking, my updates still get overwritten by your updates, just with lowered concurrency. And, by definition, non-key attributes are fully mutable and irrelevant to identity or process; that's why they're non-key. High normal form schemas fit right in with this locking regime: tables are narrower, with fewer non-key attributes per table. The flatfile folks often argue that same-named attributes have to have separate ranges, depending on table, and that foreign keys and normalization are evil. May they fry in hell.

Data Fiddles, While Rome Burns

The Case-Shiller number is just out, up 5.5%. More than expected, on a month-to-month basis. One might wonder how this could be happening, given that median income continues to fall. Just saying. Data tends to lag by as much as a year (Census data on median income, for example). Either the recovery is way better than the reported data says, or the Banksters are fiddling mortgages again.

25 January 2013

Rotten to the Core

(This was originally posted on Dr. Keynes, since it's mostly macro oriented. But the mainstream punditry has seen the light, and is dealing with the substance in similar terms. So, I guess it's appropriate subject matter for Dr. Codd.)

The entirety of the pundit class, both mainstream and fringe, has been weighing in on the meaning of Apple's disappointing quarterly yesterday. This is old news, from the perspective of this endeavor: I've been asserting for some time that Jobs/Apple has been pursuing a suicidal path. Where once there was a somewhat iconoclastic computer company, there is now a garden-variety up-scale appliance maker. The Electrolux or Jenn-Air of the 21st century. Just because there's a CPU running the widget doesn't make it a computer. Your Ford has as much compute power.

Up-scale anything is a low-volume, low-growth proposition; always has been and always will be. Ferrari sells about 6,500 units a year. And no growth. Apple's problem is that the middle class is being hollowed out. Obama's call for progressivism is also the clarion call to save Apple. Not that Jobs did, or Cook does, understand this. Certainly not the proto-Jobs day-trading plungers, who don't have a clue. Apple has transformed itself by invading existing market sectors with novel designs, nothing more. None of the devices that have worked (remember Newton?) were original to Apple.

Rumors continue that there is an iTV in the offing. If so, and carried out as a novel design exercise, could it work? The iPhone really sped Apple away from being a computer maker to being an appliance maker. iTV is pure appliance, but can physical design do for iTV over other set-top boxes what the iPhone did to RIM? Not likely. Physical design provides an avenue for advantage for objects which live either in the eye or in the hand; they're pretty to look at, or they're pretty to manipulate. Set-top boxes have no appreciable use case in either vector (love mixing metaphors, how about you?). How many of you bought your stereo receiver (assuming anyone still has a stereo) because of its looks? Thought so. The long-time exception in that sphere is B&O, which markets its façade, not its performance; a tiny niche company whose products aren't very good. That's Apple's future.

The Apple story of significant growth, through market invasion, requires that there be markets which already exist and which can be wrapped in a novel physical design. They've clearly worn out the iPhone meme. The last few versions haven't been meaningfully differentiated either from previous versions or competitive phones. The novelty is wearing thin, if not gone completely.

Apple may have slipped into Microsoft land, playing out the game as a worthy journeyman, collecting a fat salary but not playing any meaningful minutes. The days of Kobe are in the mirror. But, and here's the macro problem, could we have reached a technological plateau? Could it be that the innovators don't have a clue what more to do with cpus? With a concentrating upper class, a shrinking middle class, and an exploding lower class, what's a high-end digitizer to do? With smaller feature sizes still occurring (although that may soon end, too), thus greater compute power available, into what new device, smartly designed for the well heeled, should all this power be packaged? Consider that the basic inventions at play here all came to be in the first half (and for some quite early) of the 20th century. The smartphone makers and the carriers are cut from the same cloth as Wintel before them: in both cases the relationship is symbiosis, with each side supplying the needed demand for the other. In the case of Wintel, MS let "the hardware fix performance" while Intel needed Windoze bloat to justify the tick or tock (although tick-tock jargon happened later). Now, carriers need bloated data streams to claim higher fees, while smartphone vendors need bigger data pipes in order to support non-telephone functions.

The entire industry has to figure that out. The "emerging markets" are valuable only so long as the First World countries control the exchange rate regime. But, there's a problem with that, too. In the past, even the recent past, the regime was essentially mercantilist: the First World kept third world currencies undervalued, so that raw materials (and more recently, finished goods) could be imported cheaply. The third world, to First World capitalists, was never intended to be a consumer market. Now, having siphoned off a considerable amount of purchasing power from the First World middle class (by shipping employment to those cheaper currencies), looking to a third world middle class presents a problem: either third world wages/currencies (for First World menial tasks) have to rise enough to replace what's been lost from the First World, or the wheel stops spinning. It can't be both ways: cheap wages/currencies in the third world and middle class in the third world.

Yes, there are reports that Apple does do a decent iPhone business in China. But the value of the renminbi exists at the whim of Beijing.

Here is what largely happened in the West post-WWII: unions and corporations conspired to create a blue-collar (low-skill) middle class, which had sufficient incomes to absorb the burgeoning output of industrialization. This was a symbiotic relationship; capital needed consumers, but the rest of the world was either wild (most of South America and Africa) or devastated by the war, so domestic consumption of output required domestic incomes to buy it. Having Bretton Woods control of exchange rates, and thus of raw materials from the third world (petroleum in particular), played a large part in that. Cheap energy makes for a higher use of energy, what is commonly called "standard of living". This was a period of leveling of income distribution, which powered economic growth. It ended when the Arabs got sick and tired of Israel, and took control of petroleum in retaliation. We still don't know how that will end, although there have been reports that horizontal drilling will yield mammoth amounts of, to date, unrecoverable deposits in North America. We'll see.

23 January 2013

Another Ton of Bricks

About a year ago, "A Ton of Bricks" told the story of Amazon's conversion, as yet still not discussed in the channels I follow, from e-tailer to brick-and-mortar retailer. Well, the conversion continues apace. Amazon's P/E ratio, as I type, is 3,182. They make a few pennies on the share. That Amazon is able to convince so many that it has found the New Way is astounding. The cost of jet transport, per pound, is so much higher than rail's that it's clear Amazon is finally admitting what OR folks have known for decades: JIT production is fine when one is dealing with very high value parts which are small in volume, light in weight, and a small part of the BOM. In such an instance, stockpiling can add up to more in carrying cost than transport. For garden variety retail, warehousing wins. Always has and always will, so long as fuel isn't free.

Amazon isn't any such manufacturing business. It's a distributor, and its main cost is transport. The main cost it can control, at least. Unless Jeff decides to go all Walton on Amazon's suppliers. But, since Amazon has set out to be the purveyor of everything, that ain't gonna happen.

17 January 2013

Don't Hate Me Because I'm Thin

The major corollary of this endeavor is that smart-server/dumb-client computing, with a high-NF RDBMS calling the tune, is the Back to the Future moment. One of the touchstones of this re-conversion is the VT-220, a terminal made by DEC for years. It turned out to be so popular that many other terminal vendors emulated the VT-220. I saw lots of Wyse terminals in VT-220 mode.

So, here's what Dell is up to. Dell bought Wyse some time back.

You will be assimilated. You are number 6. (Watch out for the bubbles.)

15 January 2013

Cleave Only to Thy Foxy Lady

It's been a while since there appeared here a missive (well, rant) on the future of computing. To recap: the future, for those that prosper, will be multi-processor/core/thread/SSD servers running *nix and an industrial strength RDBMS. I will be kind to the MS folks, and admit that SQL Server (rather a nice engine) might save Windows; but it's a slim shady chance.

News today: this posting suggests that MS cleave to Mozilla in hopes of surviving. Whether coincidence or not, Apple's share price is diving as I type. Could be transitory. Might not be. Ignoring 80% or more of consumers might catch up with you. It even caught up with Rolls-Royce; the Krauts defeated by the RAF got control. Kind of like MS marrying Mozilla, in reverse.

Anyway, here's a tidbit from the piece:
Which is why Mozilla's approach is so intriguing. The company isn't going after high-end smartphones, but rather after low-end, emerging market phones. To accomplish this, Mozilla can't wait around for hardware to get better. Instead, it needs to make the web stack better - now - such that it can work on even barebones phones, including in areas of limited or no bandwidth. Mozilla has therefore developed its web apps to be offline from the start, and to use equal-or-less bandwidth than native apps.

If that doesn't sound like the resurrection of the VT-220, I don't know what would. Minimal weight on the client, minimal weight on the wire, and maximal weight on the server. Now, consider the paradigm. If the client is merely a pixelized VT-220, doesn't that benefit MS? If the paradigm of app development shifts toward the server, doesn't that benefit MS? Where is MS's strength? Yes, Office brings in, historically, most of the moolah. But going forward, there's decreasing need for "the next" Office. Office work is about writing memos, after all. Windows Server and SQL Server, while not yet Enterprise weighty, could get there with a reasonable amount of effort. Moreover, by embracing the normalized database paradigm, one needs much less machine to get the job done.

May you live in interesting times, sleeping with strange bedfellows.

11 January 2013

Scuba Diving in Arcania

If I were interested in scuba (I'm not), the Caribbean island of Arcania is where I would go to look for unusual and brightly colored specimens. Since actually going there is not on the agenda, a thought visit is required. This journey was inspired by news out today that an obscure federal research group had looked into The Great Recession. The reporting says that new ways of modeling, not heretofore used in economics or policy, will come to our aid. Not surprisingly, the methods described, though otherwise named, sounded rather familiar.

So, off we go. That which is new ain't necessarily so. As stated muchly, Your Good Mother knows better, and quants as often as not are hired to obscure the truth, ignoring or suppressing basic metrics.

To reiterate: TGR was caused by the (willful?) ignorance of quants (financial engineers). As early as 2003, it was clear from available data that the house price / median income ratio had come seriously unstuck. Since housing is not a return (cash) generating allocation of capital (modulo "psychic" income; even if you're a believer in that sort of thing, its cash value is arbitrary and not real), the only support for rising mortgage levels is rising median income. The latter wasn't, and isn't, the reality; thus the inflation in house prices had to be corrupt. Quants don't generally have a corruption variable in their models.

Which brings us to Norris' article. The gist of it is that "agent based modeling" (quotes in the original) offers a better quant, which will identify problems before they morph into a TGR.

But a new assessment from a little-known agency created by the Dodd-Frank law argues that the models used by regulators to assess risk need to be fundamentally changed, and that until they are, they are likely to be useful during normal times, but not when they matter the most.

Risk assessment by quants has been based on time series analysis for a very long time. The problem with time series analysis is the assumption that tomorrow looks mostly like today, and today looks mostly like yesterday. More so than other quant methods, time series analysis *assumes* that all determinants of the metric under study are embedded in that metric's historical data. As a result, when the price/income ratio went parabolic, the quants (and their Suit overseers) said, "goody, goody" when they should have said, "what the fuck is going on?" It was not in either the quants' or the Suits' direct, immediate monetary interest to question the parabola. They all, ignoring Your Good Mother's advice on group behaviour, went off the cliff in a lemming dive.
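
To put that assumption in symbols (my gloss, not the article's): the workhorse autoregressive model of time series quant looks like

    x_t = c + \phi_1 x_{t-1} + ... + \phi_p x_{t-p} + \epsilon_t

Everything on the right-hand side is the series' own past plus noise. Corruption of the process generating x never gets a variable of its own, which is exactly the problem.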

Mr. Bookstaber argues that conventional ways to measure risk -- known as "value at risk" and stress models -- fail to take into account interactions and feedback effects that can magnify a crisis and turn it into something that has effects far beyond the original development.

And that part is correct. But the argument, and the logic which extends from it, doesn't deal with identifying the underlying cause of TGR. It only attempts to trace where the bread crumbs *will go*.

The working paper explains why the Office of Financial Research, which is part of the Treasury Department, has begun research into what is called "agent-based modeling," which tries to analyze what each agent -- in this case each bank or hedge fund -- will do as a situation develops and worsens. That effort is being run by Mr. Bookstaber, a former hedge fund manager and Wall Street risk manager and the author of an influential 2007 book, "A Demon of Our Own Design," that warned of the problems being created on Wall Street.

Agent based modeling? As we're about to see, it's old wine in new bottles. Kind of like NoSql being just VSAM.

"Agent-based modeling" has been used in a variety of nonfinancial areas, including traffic congestion and crowd dynamics (it turns out that putting a post in front of an emergency exit can actually improve the flow of people fleeing an emergency and thus save lives). But the modeling has received little attention from economists.

This is where it gets interesting. If you review ABM (why did they end up with the acronym for Anti-Ballistic Missile?) here in the wiki, you can walk a breadcrumb trail. ABM is fundamentally very old, and came from economics, although more recently associated with operations research.

The patient zero of ABM is Leontief's input-output analysis. Leontief built I/O analysis in 1936, well before computers and data were as available as they are today. My senior seminar somehow got Robert Solow to give us a talk on economic growth (that year's topic). In 1958, Solow co-authored "Linear Programming and Economic Analysis". Large, interaction-based models have been part and parcel of economics for decades.
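
For those who've not seen it, the skeleton of Leontief's model fits on one line. With A the matrix of input coefficients (how much of industry i's output industry j consumes per unit of its own output), x the vector of gross outputs, and d final demand:

    x = Ax + d,   hence   x = (I - A)^{-1} d

Every cell of A is a deterministic, accounting-level interaction between two "agents" (industries). Interaction-based modeling, circa 1936.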

Here is where the article, and Bookstaber, stumble:
Mr. Bookstaber said that he hoped that information from such models, coupled with the additional detailed data the government is now collecting on markets and trading positions, could help regulators spot potential trouble before it happens, as leverage builds up in a particular part of the markets. [my emphasis]

The cause of TGR wasn't leverage; it was the corruption of historic norms. The result of the corruption was an increase in leverage by those who didn't even know they'd taken it on: hamburger flippers living in McMansions. It remains a fact: only increasing median income can propel house prices. With contracted resets, not tied to prime, only those in growth-income employment (or generalized inflation, which amounts to the same thing) can finance the growing vig. ABM, as described here at least, won't detect such corruption of markets. It can't.

Perhaps regulators could then take steps to raise the cost of borrowing in that particular area, rather than use the blunt tool of raising rates throughout the market.

Here we find the anti-Krugman (and anti-humble-self, of course) position. It was the rising interest rates from contractual resets that finally blew up the housing market. Had regulators forced ARMs to reset higher and faster, TGR would have triggered earlier, and might not have been Great. It's the job of economists to know how the economy works. Leontief's I/O model is the basis of contemporary macro-economic modeling.

Here's the thing. In the relational model, the developer specifies a priori which tables relate and which columns in those tables create that relationship. These relations aren't probabilistic, they're deterministic. A similar distinction exists in macro analysis. A traditional I/O model, while derived from real-world data, is deterministic in its input and output relations. On the other hand, traditional macro models are probabilistic; R^2 rules! Unless economists, and pundits, identify fundamental metrics, and build their models around them, they'll not have any luck predicting. Depressions and recessions have deterministic causes. Now, the loony monetarists tend to blame the victims, just as they have this time (AIG suing the American taxpayer?). Keynesians tend to blame the centers of economic influence, just as they have this time. Historically, the Keynesians have been right more often than not. Volcker be damned.
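
In case the distinction seems abstract, here's a trivial sketch (hypothetical tables, mine) of what "specified a priori and deterministic" means in SQL: the relationship is declared in the schema, not inferred from the data, and the engine enforces it on every row.

    CREATE TABLE industry (
        industry_id  integer PRIMARY KEY,
        name         text NOT NULL
    );

    -- The relationship is declared, not estimated: every shipment row
    -- must name two industries that actually exist. No R^2 about it.
    CREATE TABLE shipment (
        from_industry  integer NOT NULL REFERENCES industry (industry_id),
        to_industry    integer NOT NULL REFERENCES industry (industry_id),
        value          numeric NOT NULL
    );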

03 January 2013

Frenemies

There's that old saw: "the enemy of my enemy is my friend". I figured that this was due to Shakespeare, but the wiki says no, the adage originated either in Arabia or China. Makes sense: both cultures were way ahead of England by the time Shakespeare came around.

Each day I get an update from sqlservercentral. They're one part of the Goliath organization which published the Triage piece, so although I'm not currently doing much with SQL Server, it just seemed right. Today's feed included this link: "All Flash". Yhawza!! My little heart goes pit-a-pat. Then I scan the first paragraph, and go off to the link, and these are NoSql kiddies!! Arrgh!

The piece starts off reasonably, making the case that the cost of the short-stroked HDD arrays necessary to match the IOPS of a Samsung 840 is orders of magnitude greater than the cost of the 840. A couple of problems with that, though. The 840 (not the 840 Pro) is a TLC, read-(almost)-only part. AnandTech tore it up, and then again. While not an egregious part, it isn't by any stretch a server part. Consumer for sure; prosumer not so much.

The piece does get contradictory, however:
Flash is 10x more expensive than rotational disk. However, you'll make up the few thousand dollars you're spending simply by saving the cost of the meetings to discuss the schema optimizations you'll need to try to keep your database together. Flash goes so fast that you'll spend less time agonizing about optimizations.

This is the classic mistake: assuming that flat-file access is scalable. Of course, it isn't with consumer flash drives, and that's why the NoSql crowd find themselves in niche applications. The advantage of the RM, and its subsequent synergy with SSD, is that the RM defines the minimal data footprint. Since random I/O is the norm with multi-user servers anyway, normalization carries no additional penalty.
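
A minimal illustration of the footprint point, with hypothetical tables of my own: the flat-file habit repeats the customer's name and address on every order row, while the normalized pair stores each fact once and carries only a narrow key.

    -- Flat-file habit: customer attributes repeated on every order.
    CREATE TABLE order_flat (
        order_id       integer PRIMARY KEY,
        customer_name  text,
        customer_addr  text,
        order_total    numeric
    );

    -- Normalized: each customer fact stored once; orders carry only the key.
    CREATE TABLE customer (
        customer_id  integer PRIMARY KEY,
        name         text NOT NULL,
        addr         text NOT NULL
    );
    CREATE TABLE customer_order (
        order_id     integer PRIMARY KEY,
        customer_id  integer NOT NULL REFERENCES customer (customer_id),
        order_total  numeric
    );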

When a flash drive fails, you can still read the data.

I don't know where the author gets this. Since each SSD controller has its own method of writing data to the NAND, unlike HDDs, which follow standards and largely use Marvell parts (if CMU doesn't bankrupt them), data recovery is iffy. Most failures of SSDs to date have been in the controller's firmware, not the NAND giving up the ghost, and they frequently lead to bricked parts. So, no, you can't remotely depend on simple recovery of data from an SSD. While I've not seen definitive proof, SSD failure should be more predictable, which by itself is an advantage: one simply swaps out a drive at some percentage of the write limit. Modulo those firmware burps, you should be good to go until then.

Importantly, new flash technology is available every year with higher durability, such as this year's Intel S3700 which claims each drive can be rewritten 10 times a day for 5 years before failure.

Well, sort of. The S3700's consistency isn't due to NAND durability, but to controller magic. It is well known that as geometry has shrunk, the inherent durability of NAND has dropped. And that will continue. As I've mused before, we will reach a point where the cost of the controller gymnastics needed to compensate for falling P/E cycles in NAND exceeds the cost savings of smaller geometries. This is particularly true of the Samsung 840, with which the article begins.
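
To put the S3700's endurance claim in numbers (simple drive-writes-per-day arithmetic, taking a 400GB drive as the worked example):

    total bytes written = capacity x DWPD x 365 x years
                        = 400GB x 10 x 365 x 5 ≈ 7.3PB

However the NAND and the controller split that work, it's an endurance budget one can schedule swap-outs against, which is the predictability point made above.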

Over time, flash device firmware will improve, and small block writes will become more efficient and correct...

It's going the other way, alas. As geometries shrink, page size and erase-block size have increased, not decreased. DRAM caching on the drive is the common way to reduce write amplification, i.e. to support smaller-than-page-size writes. Firmware can only work around increasing page/erase-block sizes.
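
For reference, the metric at stake is just a ratio:

    write amplification = bytes actually written to NAND / bytes written by the host

A 4KB logical write that forces a whole page (or, worse, a read-modify-write of an entire erase block) to be rewritten pushes that ratio well above 1; caching and coalescing small writes in DRAM is how controllers pull it back down.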

What the author misses, of course, is that organic NF relational databases implement minimum-byte-footprint storage, and get you a TPM, lots of DRI, and client agnosticism in the process. So, on the whole, it's a half-right article.

01 January 2013

Bootstraps

What makes the marriage of SSD and the RM so appealing is the (relative) ease of application building. If one specifies a schema, and active components, there is little if any logic that *must* be implemented exclusively on the client. Since the client is disconnected, in typical HTTP-driven applications, it can't know what the rest of its global world is doing. It's flying blind. Why would one want to rely on disparate clients (after all, Codd's paper's title includes "large shared data banks") all correctly implementing such logic?

As a New Year's present, a SQL Server (not my current cup of tea) article for generating active stored procs. Note that he references INFORMATION_SCHEMA, which is a standard; this process should work for any compliant engine, modulo the SP syntax emitted. I consider this something of a compromise; one could, with more effort certainly, generate the HTML stream from the schema. There are applications which do that, too.
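
The kernel of any such generator is a catalog query along these lines (a sketch against a hypothetical Customer table; the article's procs are of course more elaborate). Everything needed to emit a proc's parameter list is already sitting in the standard views:

    -- Column metadata, in the order a generator would walk it to build
    -- an INSERT proc's parameter list for the (hypothetical) Customer table.
    SELECT c.COLUMN_NAME,
           c.DATA_TYPE,
           c.CHARACTER_MAXIMUM_LENGTH,
           c.IS_NULLABLE
    FROM   INFORMATION_SCHEMA.COLUMNS AS c
    WHERE  c.TABLE_NAME = 'Customer'
    ORDER  BY c.ORDINAL_POSITION;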

Touching Me, Touching You (Third Chorus)

(An existing piece, with some new information.)

Diligent readers know that, while this endeavor began with my having some free time to make a public stand for the full relational model/database due to the availability of much less expensive flash SSD (compared to DRAM SSD, which have been around for decades) in a "normal" OLTP application, the world changed a bit from then to now. In particular, the iPad. I've mentioned the implications in earlier postings.

Now, as regular readers know, the iPad is not especially new, from a semantic point of view. Tablets have been in use in warehouse software applications (MRP/ERP/Distribution) for a very long time. (This is just a current version.) I programmed with them in the early '90s.

But the iPad does mean that mainstream software now has a new input semantic to deal with: touch me, touch me, not my type. So, it was with some amusement that I saw this story in today's NY Times. Small-ish touch screens mean small bytes of data, a bit at a time. The 100-field input screen that's been perpetuated for what seems like forever (in no small measure as a result of the Fortune X00 penchant for "porting" 1970s COBOL screens to Java or PHP) is headed the way of the dodo. It simply won't work. And the assumption that "well, we'll just 'break up' those flatfiles into 'sections'" will fail miserably. There'll be deadlocks, livelocks, and collisions till the cows come home.

BCNF schemas, doled out in "just the right size" to Goldilocks, are the way forward. Very cool.


[update]

So, now we have Win8, and the move of the PC to touch. Here's the first 2013 story, pushing for push-button computing. As above, while one might argue that the pure semantic difference between touch and mouse isn't large (after all, one is "pushing a button" in either case), the speed and fluency of touch is miles ahead (props to the horn player) of the rodent. Keeping up with this, by providing tidy morsels of data, is key to success.

The Way We Were

Well, it's New Year's Day, and here I sit in my drafty New England garret ready to contemplate the most significant opportunity last year in the intersecting worlds of relational databases and quants. I have to give the laurels to the Obama quants, who ousted the political operatives, and set out to do political quant better than their opposition. One might give a thorny crown to the Romney crew, who did things the crony way (letting the loyal political hacks make the decisions), but that would be cruel. Previously cited, here is the story. Given that the DNC came a cropper in state and local elections, ceding yet more control to the Republicans, one can't conclude that Democrats are naturally more adept at quant.

A few honorable mentions:

- SSD land turned into a mine field. OCZ nearly went belly up, and might yet still. Consumer/prosumer SSD has fallen to commodity status; we can expect the Big Boys to dominate going forward. STEC found trouble, and is still in it. Fusion-io has been largely static. Flash arrays, from the bigger companies, gained mindshare; the quote from Linus is closer to true now than last New Year's.

- NoSql and NOSql fought with each other and against SQL for mindshare. On the whole, NOSql appears to have bested NoSql, but I stand by Date's quote. If you care about your data, you have to have central control, i.e. a TPM, and the Kiddie Koders who think they can gin up a replacement (in client code, no less) for any of the industrial-strength engines that have been developed over the last two decades are kidding themselves (and any Suit dumb enough to buy the story).

- R, and other analytics, got closer integration with databases; SAP/HANA/R notably. But not, so far as I can tell, SQL Server. That part is too bad.

- Violin was reported in October to be planning an IPO at a $2 billion valuation, but it hasn't happened yet. Enterprise SSD may yet live on as something more than just HDD cache.