28 April 2013

Gilding A Lily

Chris Date wrote "What Not How: The Business Rules Approach to Application Development" nearly fifteen years ago, and folks still think generating lots of client-side code, against some hoard of flatfiles, is the epitome of productivity and efficiency. Today's NY Times offers up yet another paean to coders.

I'll start with the punch line(s):
... Rails and JavaScript, which were interesting to Gild.

The company made Mr. Dominguez a job offer right away, and he accepted a position that pays around $115,000 a year.

Yikes!!! Let's see how this company lost its way down the Yellow Brick Road.

The story goes like this. A company with the self-aggrandizing name of Gild (the story doesn't say whether the management have heard of 'gilding the lily' as a negative epithet) asserts that it's easier, and better, to find superior coders (note, coders specifically) by dredging social media, rather than sorting through resumes and such traditional methods. They're building some application(s) to implement this approach.
In all, Gild's algorithm crunches thousands of bits of information in calculating around 300 larger variables about an individual: the sites where a person hangs out; the types of language, positive or negative, that he or she uses to describe technology of various kinds; self-reported skills on LinkedIn; the projects a person has worked on, and for how long; and, yes, where he or she went to school, in what major, and how that school was ranked that year by U.S. News & World Report.

Dominguez was found, and hired, using Gild's methods; they were eating their own dog food. The piece ends, thus:
The algorithm did a good job measuring what it can measure. It nailed Mr. Dominguez's talent for working with computers. What is still unfolding is how he uses his talent over the long term, working with people.

Of course, the only way to really know whether Dr. Ming's algorithm works is to track those hired with Gild versus those hired by traditional methods. And, of course, the data collection and analysis of such won't be done. Too expensive, having to wait for hires to cycle through, and so on.

I doubt whether the author, or the management of Gild, see the irony of that sentence. Let's see. Gild uses "social media" as the basis of its search and hiring practice, emphasis on 'social', and ends up with an anti-social troglodyte. We have a winner!!

The core problem with the 20-something development mindset is that it's fully embedded in the 1960's paradigm of dumb data run by client-side smart code. COBOL/VSAM all over again. And these folks wonder why they're always behind the curve. The fact that the syntax is ruby or java or (cringe) PHP is irrelevant. I'd even wager that these sorts of folks don't even know the meaning of 'semantics'. They spin around 'syntax'.

High normal form RDBMS on multi-processor/core/SSD machines reduces the hairball of code. Coders aren't interested in efficiency, but in doing what's comfortable. And, what's comfortable for kids who've spent time with just a PC and a browser, is javascript and flatfiles. Mo LoC, mo money. Any effort to reduce LoC is an assault on the fortress of gelt.

For that $115,000, Gild could have bought a mammoth machine with a real, industrial strength RDBMS (DB2 on linux is my preference; SQL Server second, and Postgres third). Such a machine can easily handle terabytes of data in organic normal form; not that organic normal form schemas house anything near the byte load of the flatfiles. Gild could also get SAS/SPSS, if it wished; R would be better at any price. But "entrepreneurs" tend to be the tail wagged by the dog. The dog, in this sort of case, is the lemming herd of 20-somethings who've never read up on (and certainly never gone to class for) the likes of set theory, the RM, real RDBMS, or even industrial strength SQL. Lone coders tend, almost exclusively, to gravitate to code-centric development for the simple reason that all the development platform they need is a compiler and a framework that fit on a laptop. Thus Rails/Ruby, or java/struts, or ...
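To be concrete about what "organic normal form" buys, here's a minimal sketch with table and column names I've made up; this is not Gild's schema, just the shape of one:

-- Hypothetical names throughout; one fact lives in one place.
create table candidate (
    candidate_id  integer primary key,
    full_name     varchar(100) not null,
    school        varchar(100),
    school_rank   integer
);

create table skill (
    skill_id    integer primary key,
    skill_name  varchar(50) not null unique
);

create table candidate_skill (
    candidate_id   integer not null references candidate (candidate_id),
    skill_id       integer not null references skill (skill_id),
    self_reported  boolean not null default true,
    primary key (candidate_id, skill_id)
);

create table site_activity (
    candidate_id  integer not null references candidate (candidate_id),
    site          varchar(50) not null,
    sentiment     numeric(4,3),      -- scored elsewhere; -1.000 to 1.000
    observed_on   date not null,
    primary key (candidate_id, site, observed_on)
);

The "300 larger variables" then fall out as views and aggregates over a few narrow tables, instead of columns bolted onto an ever-wider flatfile.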

The rejoinder, which I've had the pain to hear so many times, goes like this: "well, if we just hire a guy, we can just fire him if it doesn't work out, but if we commit to some database, we're stuck forever". In the first place, once a Gild hires a body to implement its platform, there will always be some body soaking up that $115K forever. They've committed to their "platform", so they're committed to keeping bodies at the oars of the trireme. Moving between SQL databases isn't as difficult as the kiddie koders want management to believe; think of it as a front-loaded rearguard action. "We can't use a database!!! We have to code!!!" Baloney, of course. SwisSQL, among others, simplifies the transition. And, again, in any case, if you've built an organic normal form schema (which you can't toss off in an afternoon, unlike code), then you'll be very close to vanilla ANSI-sql, and thus not dependent on vendor "extensions". Much as Joe Celko is a burr under the saddle of the Chris Dates of the world, he does stand with ANSI-sql. Good on him.
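To be concrete about "vanilla ANSI-sql": row limiting is the classic vendor wart, and the standard form travels. The table here is made up; the point is the dialect, not the data.

-- Vendor dialects for "the ten biggest accounts":
--   SQL Server:  SELECT TOP 10 account_id, balance FROM account ORDER BY balance DESC
--   Postgres:    SELECT account_id, balance FROM account ORDER BY balance DESC LIMIT 10
-- The ANSI form, which DB2 and Postgres take as-is (SQL Server 2012 wants an
-- OFFSET 0 ROWS ahead of the FETCH):
SELECT account_id, balance
FROM   account
ORDER  BY balance DESC
FETCH  FIRST 10 ROWS ONLY;

Write to the standard form from day one and the "we're stuck forever" argument evaporates.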

Design, in the mind of "agile" 20-somethings, looks an awful lot like "it growed like Topsy" COBOL applications from, you guessed it, the 1960s.
- carpenters: measure twice, cut once
- doctors: an ounce of prevention is worth a pound of cure
- the 20-somethings: we ain't got no idea (and it's too complicated for us to know yet, if ever) and never really will, so let's code; we can always bend the bytes to look how we want tomorrow and tomorrow and ...

Clearly, thinking first isn't part of the ethos of lemming coders.

The core problem with companies like Gild (others are mentioned in the piece) is that "the data" doesn't really drive the decision:
Dr. Ming's answer to what she calls "so much wasted talent" is to build machines that try to eliminate human bias. It's not that traditional pedigrees should be ignored, just balanced with what she considers more sophisticated measures. In all, Gild's algorithm crunches thousands of bits of information in calculating around 300 larger variables about an individual: the sites where a person hangs out; the types of language, positive or negative, that he or she uses to describe technology of various kinds; self-reported skills on LinkedIn; the projects a person has worked on, and for how long; and, yes, where he or she went to school, in what major, and how that school was ranked that year by U.S. News & World Report.

"Let's put everything in and let the data speak for itself," Dr. Ming said of the algorithms she is now building for Gild.

Should the Dr. Mings of the world be making the decision about whether code (javascript, et al) or the RDBMS should be the basis of the platform? I think not. The "software problem" has gotten worse in lock step, over the last few decades, with the usurpation of technical decision-making from the experienced techies to 20-something "entrepreneurs" and "developers". If you insist that you must have a Black Swan (because your self-image determines that you are also Just So Special), then you'll not be satisfied until you find what you believe is a Black Swan.

At no time in the course of the piece is any mention made of the data analysis. The implication (or, my inference) is that Dr. Ming is recreating SAS/SPSS/R functions in some ruby code. Such a bloody waste of time and money, largely out of (in the best case) ignorance, or (in the worst case) arrogance. Postgres isn't my favorite RDBMS, but this is a best-case scenario for Postgres and PL/R. With the $115,000 of one coder, they could put together an inferential/database engine in short order. But doing so short circuits the bureaucratic need of "mo bodies, mo money". I've worked start-ups, and the need for organizational power ("I gots mo asses in seats than you do") starts with the third hire. Denying the reality doesn't make it go away. Failure starts early.
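By way of illustration only, and assuming the PL/R extension is installed, here's the shape of the thing; the table and column names are mine, not Gild's:

-- R runs inside the database; arguments arrive as R vectors (arg1, arg2).
CREATE OR REPLACE FUNCTION r_correlation (float8[], float8[]) RETURNS float8 AS '
    cor(arg1, arg2)
' LANGUAGE 'plr' STRICT;

-- Hypothetical table/columns; the inference happens next to the data.
SELECT r_correlation(array_agg(years_on_projects)::float8[],
                     array_agg(sentiment_score)::float8[])
FROM   candidate_metrics;

The point: the stats run in R, next to the data, rather than being re-invented, badly, in ruby.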

As usual these days, Gild is answering the wrong question.

25 April 2013

Down the Rabbit Hole

Trifecta. Three's a crowd. Win, place, and show. Three strikes, and you're out. Could be worse; there's a fourth item I'm too lazy to delve into.

So soon, you forget, eh? Today's reporting demonstrates that The Powers That Be still don't get it. All their quants doing all their voodoo (I happened to find my way, yet again, to Salmon's piece on Li; so should you), and they all refuse to accept the obvious.

Buried in the back page of the piece is The Truth:
Under them, a borrower's overall monthly debt payments cannot exceed 43 percent of personal income.

In his study, Professor Quercia of the University of North Carolina found that loans that complied with those rules defaulted at a relatively low rate during the housing bust. About 5.8 percent of them went bad, irrespective of how much the borrower put down.

This is The Truth, since it's the converse of what's shown in Viagra in the home: you can't have increasing home prices in a state of stagnant (or declining) median income without some serious fiddling going on.

The argument that down payment makes a difference is a joke. What matters is debt load. The subprime ARMs, in fact all contractual ARMs, would still have exploded regardless of down payments made. Push up the house vig after two or five or seven years, and the mortgage holder sinks. Unless his/her income has increased concurrently; but, of course, that hasn't been happening for years. Builders, and perhaps lenders, want house prices to always grow, since that's how they continue to make mo money.
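For the arithmetic-averse, a worked example with round numbers of my own choosing (not from the piece). The standard level-payment formula is

\[ M = P\,\frac{r}{1-(1+r)^{-n}} \]

Take \(P = \$300{,}000\), \(n = 360\) months, and a stagnant income of \$5,000 a month:

\[ r = \tfrac{0.04}{12}:\; M \approx \$1{,}432 \;(\text{about } 29\% \text{ of income}); \qquad r = \tfrac{0.07}{12}:\; M \approx \$1{,}996 \;(\text{about } 40\% \text{ of income}). \]

The reset alone eats most of that 43 percent ceiling; add a car note and a credit card, and the borrower is done, whatever the down payment was.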

Arguments to the contrary notwithstanding, and there are plenty of them discussed in the piece, pushing the envelope on affordability never ends well. The creation of the various ARMs was not at the behest of home buyers, but of those on the other end(s) of the process. Builders preferred to make 4,000 square foot McMansions. Banks preferred them as well, if only for the larger fees that higher prices brought. Accomplishing this, however, required a way to move folks into mortgages they couldn't have afforded under The Olde Rules, which necessitated making some new rules. And so it was.

Fiddling with down payment levels won't make much difference, and what difference it does make won't necessarily be positive for the economy. The evil part is the ARM. An ARM is fine and dandy during periods of macro-inflation; wages lead/lag prices, but move closely enough in concert that the "real" cost of the mortgage stays more or less level. In these (semi-)deflationary times, contractual ARMs (where the rate ratchets at aging points regardless) will fail. (If there're any who still think such never actually existed, read this.) The only serious avenue is to outlaw contractual ARMs. Prime adjusted ARMs are nearly as bad, when the economy is as out of whack as it is today.

That Giant Pool of Money is still out there, looking for a "risk free" sinecure paying 10%. It ain't gonna happen, but they'll keep trying.

Contortionists

A while back, I mused that the flip phone would be back, once the industry figured out how to do it. At the time, my thought was a high quality hinge, using existing screens in two pieces. The edge would be small enough to be unnoticeable.

Now, it turns out, flexible-enough screens may be on the way. Toss those Benjamins in the hat, we'll have another song for you shortly.

REM

It's the end of the world as we know it.
It's the end of the world as we know it.
It's the end of the world as we know it and I feel fine.
-- REM

Well, maybe not quite the end. I've been musing on the state of solid-state disks for the past few months, and the world has looked a little bleak. OCZ is spinning toward the drain, STEC is barely holding its own, and Fusion-io continued to fall in Mr. Market's eyes. All three, in fact. These are the remaining publicly held (nearly) pure-play SSD companies.

At the same time, Apple is pooping the bed, due (according to some in the pundit class) to the ineffectual nature of Tim Cook. And IBM just pooped its bed. A few days before IBM's quarterly, Arne Alsin, a small-time Seeking Alpha blogger, announced its demise. IBM, and all the traditional (whatever that means) IT companies, will disappear into The Cloud.

Woe is Me!!!

Well, maybe. And maybe not. Fusion-io got a nice bump with its quarterly last night, but STEC and OCZ remain on thin ice. IBM/Texas Memory recently announced a new flash appliance, which may, or may not, be repackaged TMS SSDs. Hard to tell from the announcements whether it's a "true" flash array a la the Sun/Oracle F5100.

The Cloud argument is that a Cloud client needn't worry his pretty little head about capacity needs; just call up Amazon and get more. As if Amazon doesn't price in the cost of over-provisioning. To the extent, minimal I expect, that demand spikes in one segment of Amazon's Cloud clients are offset by troughs in other segment(s), it's barely possible for Amazon to make a few bucks and for a given client to save a few bucks on the infrastructure. But I doubt it. Growth of the kind typically described is global (in all senses of the word), and thus not really elastic from the point of view of the provider(s). The rubber band keeps stretching, never relaxing.

If it should appear that Clouds consume a significant fraction of HDD/cpu production, then those manufacturers will seek to eliminate the last middleman. Why give away the gear at onerous bulk pricing? One can easily see a contentious duopoly/duopsony war breaking out. As we used to say, a widespread race to the bottom.

Apple fits in, for the following reason. Many, even in the pundit class, view Apple as the premier computer company, which has fallen on hard times with the demise of Steve. Some, humble self included, have viewed Apple as a toy company (that is, a maker of toys) since the iPod. That was 2001, for those who forget, or have never owned one (hand raised). As a toy company, it has to come up with new toys with some regularity. I'm among the (growing?) crowd who believe that Apple has, in fact, invented very little, though they may have patents on rectangles with rounded corners. They assemble their toys with, mostly, Other Peoples' Parts. Their toys are variations on existing versions, from other vendors. Unlike Mattel, for instance, Apple hasn't been effective at designing new toys, only copying somebody else's. OS X is a closed version of open source *nix. How'd they manage that? They took BSD licensed *nix parts. BSD permits that sort of ripoff, whereas GPL doesn't. Steve always said: "steal".

iTV or iWatch or iFoo may finally break the mold, by, well, breaking out a new device. Then again, maybe not. In any case, how Apple goes says just about nothing about the computer business. Apple does consume a lot of NAND, certainly. Unlike the old GM, what's good for Apple isn't necessarily good for the USofA, and bad things happening to Apple don't necessarily affect the rest of IT.

So, the journey down the Yellow Brick Road has hit a roundabout. Consumer SSD is rapidly falling into the race to the bottom, controlled by the NAND vendors; Intel and Samsung particularly. Enterprise SSD is likely a short-lived flash in the pan (so to speak), to be replaced by directly connected NAND appliances. The interesting question: how soon will Linus' prediction become fact? How soon will hierarchical filesystems, engineered for spinning rust, be replaced? linux has to be the leader here, although IBM could easily (relatively) build whatever it wants for the z machines.

22 April 2013

Kids Say The Darndest Things

Turns out, I'm not the only blatherer addicted to quotes, both prophetic and profane.

This first is found on the R-blogger page.

While this is the first's inspiration.

I think it safe to say that both are willing to cast a jaundiced eye at econometrics in the field. Early on in my professional existence, I concluded that econometricians' fundamental problem is that their data, by and large, is acquired through begging and borrowing. While I have issues with psychometrics, at least some part of the field generates its own data. To paraphrase Bill Parcells, "If they say you have to make the meal, at least have the authority to buy the groceries."

Neither has the chutzpah to put quotes as the introduction to the site, but someone has to blaze the new trail.

18 April 2013

More R & R, I Feel So Much Better!

The brouhaha over the Reinhart & Rogoff paper impelled me to look for better stats and economics applied to the question of debt and growth. The graphs in the HAP critique showed clearly that the data as presented by R&R isn't quite right. What struck me from the beginning is that R&R proffered the data as cross-sectional, pure of heart. That was self-evidently false. What we have is an aggregation of time-series. There was a bit of handwaving here and there about serial-correlation (d'oh; it's time series) and multi-collinearity (d'oh; global booms, busts, and catastrophes will impact all economies at once). I'm smart, but I didn't believe that I was the first to notice. In the interests of not re-inventing the wheel, I went looking for papers/postings/musings on the R&R paper and mixed cross-sectional/time-series analysis.
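In symbols, and these are my symbols, not R&R's specification nor the paper's, the data look like

\[ g_{i,t} = \alpha_i + \beta\, d_{i,t} + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} = \rho\, \varepsilon_{i,t-1} + \lambda_i f_t + u_{i,t} \]

where \(g_{i,t}\) is country \(i\)'s growth in year \(t\), \(d_{i,t}\) its debt/GDP ratio, \(\alpha_i\) a country effect, \(\rho\) the within-country serial correlation, and \(f_t\) a common global shock (boom, bust, catastrophe) hitting every country at once. Pooling the \((i,t)\) observations as if they were independent cross-sectional draws throws \(\rho\) and \(f_t\) away, which is precisely what R&R did.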

On my first search attempt, I found this paper, currently in review. It is dated October, 2012 and directly addresses the R&R paper.

From the paper's Introduction:
These results point to a clear policy implication, namely that it is at best misleading and at worst growth-retarding to assume and employ a common debt/GDP threshold across diverse sets of countries at different stages in development.
...
Finally, we recognise the heterogeneity of the countries in our sample in terms of their level of economic development and provide further insights into the debt-development nexus at the sectoral level, employing data on the agricultural and manufacturing sectors, respectively.
...
our empirical analysis accounts for dynamics and time-series issues

And the money quote (within the context of the Introduction):
In empirical spirit this study is closest to that of Kraay & Nehru (2006, p.342) investigating debt sustainability and arguing that "a common single debt sustainability threshold is not appropriate because it fails to recognize the role of institutions and policies that matter for the likelihood of debt distress".

Kraay & Nehru is: "When Is External Debt Sustainable?" World Bank Economic Review 20 (3): 341-365.

Unlike R&R, this paper deals with the data head-on:
In Section 4 we discuss the time series and cross-section correlation properties of the data.

Banzai! It's a thirty page paper, and still not finished, so I'm not going to cut & paste most of it here, which I am itching to do. Go and read it. There's a bit of algebra, but you can ignore that if you wish. Their analysis is in reasonably plain English.

In simple terms: they show that Reinhart and Rogoff (celebrated economists!) are simpletons. Some of the critiques make the point that discrediting the data analysis supporting policy doesn't impinge on the policy decision. In the sciences, this is often expressed as: "data doesn't invalidate a theory; only a new theory can do that." For those who disdain reality based governance, I suppose shooting holes big enough to drive a Mack truck through amounts to a mere flesh wound.

17 April 2013

Bobbing For Apple

The other big news of the day is that Apple is bobbing under $400, and Mr. Market seems irritated. Mr. Market will, eventually, figure out that Apple stopped being a computer company years ago. Today, they're a toy company, just like Mattel, although its P/E is more than twice Apple's as I type. As of their last 10-Ks, both held about 10% of assets in physical capital.

Jobs was deemed a God for bringing Apple out of the desert, but in doing so, he morphed Apple from computer company to computer-driven appliance company. "Mobile computing" is an oxymoron, and the pundits should have known better. As a toy company, it is constrained by discretionary income of the middle class. Oops! They ain't no mo. Without a new, at least for Apple, toy, there's hell to pay. One should note that Apple hasn't invented any of the platforms it exploits. Jobs' strength was convincing the masses that the Apple version of some existing device was not only superior to others, but that it had made a whole new class of device. And those with more money than brains lapped it up.

What Apple needs is P. T. Barnum.

But Liars Figure [update]

If you're a policy wonk, or a regular reader of comprehensive newspapers, you're likely aware of the Reinhart & Rogoff controversy. Fact is, I hadn't been paying attention until today's reporting. The deja vu experience is charming; when I was at UMass, there was only Amherst, and the economics department was busy purging anybody who wasn't a Right Wing micro zealot. Boy howdy!

The report is here.

Of the critiques of the critique, this one appeals. And here's Krugman.

Anyway, was it intentional academic malfeasance? None of the reporting I've read says so. I'm not so nice.

The money quote, from Next New Deal:
They find that three main issues stand out. First, Reinhart and Rogoff selectively exclude years of high debt and average growth. Second, they use a debatable method to weight the countries. Third, there also appears to be a coding error that excludes high-debt and average-growth countries. All three bias in favor of their result, and without them you don't get their controversial result.

If you read through the various reports, the weighting scheme used by R&R comes up as a major source of criticism. In a nutshell, each country is reduced to one number for each of four categories, regardless of how long its data series is.
(HAP note that just adding an additional category, rather than lumping everything above 90 into one bucket, makes a difference:
... we add an additional public debt/GDP category, extending by an additional 30 percentage points of public debt/GDP ratio--that is, we add 90-120 percent and greater-than-120 percent categories.
)

What isn't mentioned (in the reports I've viewed) is weights reflecting economy size. Both R&R and HAP treat all economies as equivalent. They certainly aren't.

HAP (page 8):
But equal weighting by country gives a one-year episode as much weight as nearly two decades in the above 90 percent public debt/GDP range.

It's hard to warrant that categorizing the data, particularly in the way they did, is justified. Well, unless one needs to make a case, much as lawyers will selectively pick facts to satisfy a client. This abject disdain for objectivity is the reason I abandoned economics after grad school.

Excel is for cowards. (Props to Bill Simmons; I think I've got it right this time.)

[update]
I've spent some time with the R&R paper, and here's the justification they give:
The four "buckets" encompassing low, medium-low, medium-high, and high debt levels are based on our interpretation of much of the literature and policy discussion on what is considered low, high etc debt levels. It parallels the World Bank country groupings according to four income groups. Sensitivity analysis involving a different set of debt cutoffs merits exploration as do country-specific debt thresholds along the broad lines discussed in Reinhart, Rogoff, and Savastano (2003).

And to further note: weighting by base GDP (at some agreed-upon time point) by country has the felicitous effect of de-emphasizing small economies with small absolute (in the global realm) GDP growths. A $1 gain for $10 looks a lot better than a $2 gain for $50. Or, "it's easier to grow fast when you start out small!" Just ask Apple, subject of another post today.
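The arithmetic, spelled out (numbers mine): a small economy with GDP of 10 grows by 1 (10%), a big one with GDP of 50 grows by 2 (4%).

\[ \bar g_{\text{equal}} = \tfrac{1}{2}(0.10 + 0.04) = 7\%, \qquad \bar g_{\text{GDP}} = \frac{10(0.10) + 50(0.04)}{10 + 50} = 5\%. \]

Equal weighting flatters the minnows; GDP weighting recovers the aggregate growth actually experienced.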

The thing is, what they've done (with, or without, the Blessing of the World Bank) is bin continuous data. It is incorrect to label this as "weighting", because it isn't. Binning, for these data, does not improve the explanatory power of the data. Furthermore, they all (R&R, and the critiques I've seen) allow the outlier question to pass unasked. At 30% and 120% (see the HAP paper, figure 3) there are significant outliers, which (just happen to) push the model in the direction R&R's politics demand. It is well known that linear regression is sensitive to outliers. Further proof. Bad dog!

14 April 2013

Reverse Thrusters, Mr. Scott!

A comment on one of these missives, posted in an alternative universe, led me to comment serially. This is the part of the comment which motivated this missive:
The only way to increase the demand for stick built single family housing is to drop the price a lot. Not to mention that boomers will be dying off soon, and all those reverse mortgages will stuff yet more inventory into the mix.

Which led me to go a lookin' for data on reverse mortgages by year and value. Haven't found same as yet, but I did find a bit of scary reporting from last week.
While forward mortgages show a surplus in FHA's reestimate of reserves of $4.3 billion, the reverse mortgage portion shows a loss of $5.2 billion, resulting in the $943 million shortfall, according to FHA.

An actuarial review of the MMI fund in late 2012 indicated the reverse mortgage fund had a negative $2.8 billion economic value; part of more than $16 billion in negative economic value of the fund overall.

There has been a stream of stories about surviving spouses being tossed out of homes when the mortgage holder spouse kicks the bucket; the rules about who can be on the title/deed get convoluted. And, according to some reports, such as this, the Banksters aren't above fibbing a bit.

I wonder: is there yet another Viagra at the home we need to worry about?

11 April 2013

Structural Failure

Remember the Tacoma Narrows bridge? It was a victim of unmindful structural deficiencies. Imagine my surprise (not really) to see this story describing the addition of the C struct to java. Just as coders have assassinated Dr. Codd's relational model, with their smart code, stupid data approach, now the coders want to kill off even the appearance of OO paradigm in java.

I suppose it was inevitable. At least in the web app world, java objectification has been executed as "data objects" (just a struct with a bit more typing) and "action objects" (just a function with a bit more typing) for a couple of decades. This is why I've been in the habit of referring to java as just COBOL without the SCREAMING CAPITALS all this time. Turning java into C de jure will just complete the de facto regression.

Charlie Chaplin

Channel surfing got me to "Chaplin", the biographical movie of Charlie, with Robert Downey, Jr. I didn't stay very long, but seeing some bits led me to WikiPedia. What was his life, from an historical perspective?
A Dog's Life, released April 1918, was the first film under the new contract. Chaplin paid yet more concern to story construction, and began treating the Tramp as "a sort of Pierrot [sad clown]."

Today, I am a pierrot. AnandTech has a review up of the Crucial/Micron M500. Nearly 1T for $599! But how they got there, which is detailed in the review, dashes any hope for SSD being a performance replacement for HDD. In the comments, a question was asked, and I had the temerity to answer. Here it is.
-- My point is, what's at stake here is who's the next Seagate? The next Western Digital? Of SSDs.

Getting harder to say. The three well known public companies doing SSD (mostly) as such, STEC, OCZ, Fusion-io, have been missing all targets for at least a couple of quarters. Violin may or may not IPO in the next few months.

The reasonable answer is that there won't be a Seagate or WDC for SSD. It's well understood how to take commodity HDD to Enterprise Drive, using tighter QA and some incrementally better parts at modest cost. With SSD, as this review shows, "progress" in feature shrink isn't improving any of the factors at lower cost. It is quite perverse. The NAND suppliers will come to dominate consumer SSD, with performance asymptotically approaching a bit better than current HDD, with a price premium. Look for TLC, with huge erase blocks, long latencies, slowing controllers (having to do all that much more work to get around the NAND).

Enterprise SSD will likely fade away, to be replaced by NAND arrays, along the line of the Sun/Oracle device, which has been around for a few years.

The SSD is dead, long live the flash array.

08 April 2013

Finding Nemo

It must be vexing to be a quant in the financial services industries. On the one hand, you're in the Next Sexy Career: data science. On the other hand, Greenspan and Bernanke and yourselves have shoved a 20 inch long 4 inch wide dildo up your yahoo. That's got to hurt.

What happened? Well, the base cause is wealth concentration. When the few have the most, there's little left to drive real demand for goods and services. Absent real demand, there's no cause to invest in real capital. With a slackening of real capital, the trickle down to fiduciary capital is inevitable. This effect is more obvious in industrial economies, over the last century or thereabouts, than earlier. But, "let them eat cake" has been the crux of the matter for a lot longer.

Unearned income is both a social and economic porcupine. To the extent that the .1%ers demand 10% or so on their amassed moolah, the quants (who are, after all, the braceros in that sector) have to find another way when Treasuries dry up. Before Greenspan cratered interest rates, it wasn't impossible to pull off. Not so much of the moolah had to find a riskier home (presaging?). Treasuries were pulling in the mid to high single digits, so only a small part of that accumulated moolah had to be allocated to "risky" assets. Preferably fiduciary assets, since physical assets are something neither the .1%ers nor quants understand all that well. Physics and engineering and real demand for goods are, taken together, mind-numbingly difficult to conjure.

With the Fed cratering "risk free" unearned income, the quants looked to the, historically, nearly risk free instrument: the home mortgage. Whether there was (certain Right Wingnuts gainsay) a giant pool of money out there, or not, forcing the interest rate of record to very low values forced all those who depend on unearned income to look elsewhere. So, they did. (The giant pool argument being that the rate would have cratered no matter what Greenspan and Bernanke did, since the giant pool would, in aggregate, have been chasing a fixed field of investments. Supply exceeding demand, drives down the price, which is the interest rate.)

But there is a trickle down unintended (one hopes, at least) consequence: all other forms of reliance on unearned income were just as affected. Your life insurance and health insurance and defined benefit retirement plans (the few that still exist) all began looking for ways to opt out of cratered US fiduciary assets.

What had been the earnings? Here is a compilation.

There are a slew of tables, so I'll just highlight the numbers that speak to me (from table 2.11 $millions):
1970: mortgages -    74,375   stocks -    15,420
2010: mortgages -   366,988   stocks - 1,570,225

As is obvious, a tectonic shift. Put simply: your life insurance policy now depends, more than it ever did, on Prudential's rock solid quants picking the right stocks. And more to the point: your policy depends not on compound interest, but on fiduciary capital gains. You should feel your sphincter tighten rather a lot.

Table 12.2 is instructive. There's a vicious Prudential commercial (actually, a number of versions) on the TeeVee where some "professor" has folks put a dot on a wall, representing the "oldest" person known. These are actors, of course, and it's all made up. This table has life expectancy (from the industry Bible of data; you thought each quant figured this out on his/her own? not really!) at various ages starting in 1900. At birth, it goes from 49 to 78! We're living thirty years longer! The sky is falling! We must kill off a horde of old people! Not really.

Social Security was established, circa 1935 (first paid 1937). So the number that really matters is life expectancy, at 65 or 75, then and now (the closest year in the table is 1939).
1939: at 65 - 12.8 years   at 75 -  7.6 years
2009: at 65 - 18.8 years   at 75 - 11.7 years

So, yes we are living longer, but the sky isn't falling. More to the point, since SS is a *current account* system, having folks living until 65, which is where almost all of those 30 extra years are, means more folks pay in longer. Yes, the Boomers will be something of an extra burden for a decade or so, but after that, it's all gravy.

In sum, the issue comes back to macroeconomics, despite the fact that quants operate in a microeconomic venue (what's the slickest way to make my employer richer?). Both life and health insurance are inter-generational moolah flows; that fact can't be denied. The question of equity is both micro and macro. From a micro point of view: it's a dog eat dog world, and every man for himself. In such a world, insurance is funded solely by individuals' earnings. This is a divide-and-conquer method whose main beneficiary (pardon the pun) is the financial services industry; there are a whole lot more easily divided buyers than sellers. The principal difficulty, aside from outright equity, is that individuals are cornered by the market. If one retired, or wished to do so, around 2008, with only individual assets to support said retirement, well, good luck with that.

It's simply more equitable to remove the luck (good or bad) of the draw from these decisions. Those who are eligible for funding at some point in time ought not to be penalized for macro market conditions that are cratered. Similarly for those who are eligible at the top of the rollercoaster ride.

Any increase in retirement funding has to come from real investment. As The Great Recession proved to anyone paying attention, fiduciary instruments are merely pass-through monies. Without underlying increases in real incomes (and they can only come from increases in real productivity), the instruments fail. And so they did.

Taking a macro point of view, one comes to a somewhat different conclusion about how to fund insurance programs. Firstly, the income streams are unearned from the point of view of the funds; these are fiduciary instruments which have a more or less tenuous connection to real world investment. Since both retirement and health exist at the societal level, one needn't incur the overhead costs of administering a faux investment fund. To be specific: there have been, over the years, assertions that SS is/should be an investment system. It isn't and shouldn't be. It isn't because FDR's wonks weren't stupid. And it shouldn't be because it makes no sense. If SS held private stocks and bonds, how would it regulate/police the stock markets? Recall that SS came into existence in the wake of The Great Depression, which was caused by unregulated markets.

If SS held private stocks and bonds, it would be obliged, as a shareholder, to defend Enron and Countrywide and all the other malefactors of the private markets. The current system of "investing" in Treasuries (broadly defined) is a sham, and that's OK. It seems to mollify some of the idiots who insist that SS holds their "personal investment". It doesn't, and never did. Also, the notion that Average American in 2010 may/will receive more than s/he paid in is nonsense. Average American pays for the SS of his/her parents. Would you stiff Mom and Dad, just so you could have an iPhone 7? (You would?) Yes, the Boomers are the pig in the python. And yes, Boomers didn't breed as eagerly as previous generations (well, those who aren't religious whackos, anyway). But we've seen in the last 40 years that productivity has widely outpaced employment. In other words, we don't need no stinkin' babies.

That last is the political meatball. Japan put itself in the mess back in the 1990s, and China, being so much larger, is seeing the effect faster and more pronounced. Society wide insurance demands a society wide (and inter-generational) funding mechanism. Otherwise, it's let them eat cake.

About That Gypsy...

Finally got some time to look at the Postgres version of Northwind, mentioned earlier. Disappointment, but not fatal. Turns out, of the conversions on offer, the PG one is the least complete. The file contains only the schema, data, and PK defs. Hmm.

Not only that, but PG has a schizoid relation with letter casing. In the final analysis, unless all your identifiers are lower-case, you'll end up having to quote (or escape quote) every bit of SQL you run. Ack. The parser lower-cases anything it sees that doesn't have quotes. So, the DDL can have not only mixed case, but spaces where quotes are used. The file is loaded with all of this. Ack. Now, if you run DML against the table, and get the mixed case correct, but don't quote, you get an object not found error, since the parser insists on lower-casing your text. Irritating. This is, undoubtedly, an artefact of the C/Unix/coders-view-of-the-world heritage of PG. Irritating.
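Here's the behavior in miniature, using Northwind's own names (column list abbreviated, types approximate):

-- The converted DDL arrives quoted, with mixed case and embedded spaces:
CREATE TABLE "Order Details" (
    "OrderID"  integer  NOT NULL,
    "Quantity" smallint NOT NULL
);

-- Unquoted identifiers get folded to lower case by the parser:
SELECT OrderID FROM "Order Details";
-- ERROR:  column "orderid" does not exist

-- So every statement has to quote, exactly, forever:
SELECT "OrderID", "Quantity" FROM "Order Details";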

So, out comes vim (SlickEdit version) to the rescue. That gets us the database, schema, data, and PKs.

Fortunately, the FK and view syntax in the SQL Server, PG, and MySql versions is similar enough that grabbing stanzas and editing isn't too irritating. Which leaves the procs. PG doesn't have triggers and procs as discrete code entities, just functions that get used/labeled as triggers and procs. Worse, ADS doesn't want to parse proc defs that psql has trouble with. Since procs aren't important, yet, to Northwind's use with R, I'm going to shelve that part of the exercise for now.
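For the record, this is what "functions used/labeled as triggers" looks like in PG; the audit table is made up, the "Orders" table and "OrderID" column are Northwind's:

-- The trigger body is just a function returning the pseudo-type trigger;
-- there is no separate CREATE PROCEDURE object (as of PG 9.x).
CREATE OR REPLACE FUNCTION orders_audit() RETURNS trigger AS $$
BEGIN
    INSERT INTO orders_audit_log (order_id, changed_at)   -- hypothetical audit table
    VALUES (NEW."OrderID", now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_audit
AFTER INSERT OR UPDATE ON "Orders"
FOR EACH ROW EXECUTE PROCEDURE orders_audit();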

Finally, Blogger doesn't allow files for download, and I've not set up a github account; so anyone who's interested, drop me an e-mail. I'll send along a zip of the three files: base, FKs, and views. The address is hidden in the profile info to the left. Deeply, deeply hidden. You may never find it.

03 April 2013

They Died With Their Boots On

The title of this missive is also the title of an Errol Flynn western, 1941. That much I recalled, and the reason I wanted it for a title. What I didn't recall, but WikiPedia reminds me, is that the movie is a fictionalized account of George Armstrong Custer. Yes, that one. So, as a Dr. Codd title, it's a bit inapt. Or not, depending on your place in the RM/SQL/NoSQL spectrum.

When I was in school, I recall being told that the main difference between a job and a career was one's attachment to the work itself, rather than to the ability to pay some bills. A career lasts a lifetime, while a job is a necessity, much like sitting on the toilet in the morning. Make a bunch of money, so that one can spend as much time as possible doing something else.

The father of a high school friend was a general surgeon (the father, not the friend), who owing to the disruption of WWII and certain digressions in his youth, had been practicing for only about 20 years, even though when I met him he was a bit more than 60. This was before the attack of the vampire squid HMOs, so he was in private practice. As such, he depended, as did specialists then, on referral from general practitioners to continue "cutting stomachs", as he oft times described his work. He said that a surgeon was retired by his colleagues: as they gave up their practices, the surgeon lost referrals; younger GPs referred to their age-appropriate peers, by and large. A surgeon didn't stop working because he wanted to, but because he was forced out by circumstance.

Some time later, whilst working in DC, I had the opportunity to take a couple of seminars with W. Edwards Deming. He was still well known within the OR/QA/stats world. Not so much now. He was in his 80s then. According to WikiPedia:
In 1993, Deming published his final book, The New Economics for Industry, Government, Education, which included the System of Profound Knowledge and the 14 Points for Management. It also contained educational concepts involving group-based teaching without grades, as well as management without individual merit or performance reviews.
He died in 1993. With his boots on.

Three examples of the principle of advancement: we accrue knowledge as a people, and don't purposely turn back the clock. The Dark Ages happened because the baser cultures successfully attacked (think about that in the Sandy Hook context of gun control). As attributed to Newton: "If I have seen further it is by standing on the shoulders of giants". I wonder where the IT world stands today?

Which brings us to the question: how is it that IT generally, and database applications specifically, have shown such retrograde/reactionary tendencies over the last couple of decades? That is to say, the embrace of data technologies from the 1960s (and even, one might point out, 1950s)? Why is iteration/looping over sequential data the sine qua non of coding? Is it pure ignorance? "Those that ignore the past are doomed to repeat it"? By way of contrast, other professions, such as medicine, accrue learning and move forward (I was tempted to say that physicians don't revert to using leeches, but they do, a bit). So, I'll point out that they no longer consider blood-soaked clothing a mark of expertise.
Although even some Greek surgeons had advised washing the hands before dealing with patients, this aspect was overlooked and the doctors strode around in blood-stained coats. The bloodier the coat, the higher the reputation of the surgeon.

But the younger set are willfully embracing siloed, non-transactional, client-side driven, flatfile data applications in java and such that, save for the syntax and CAPITALIZATION SCREAMING of COBOL, are semantically the same as those applications tapped out on 029 keypunches. Why? Part of the explanation lies with the residue of early web technology. By the mid 1990s, commercial computing was divided between mainframes running COBOL with mostly DB2 and a bit of Oracle (notoriously ill-suited to the 370 architecture) over SNA to 3270 character terminals, and AS/400 or *nix mini-computers running some RDBMS (or 4GL/database) over RS-232 to VT-X00 character terminals. This was the era of client/server in a box. All the data be ours sayeth the box. Such machines had an order of magnitude (or more) slower processors, smaller memories, and disk capacities than today. But they could populate their screens in real-time. While the 3270 did/does have some local coding capability, not unlike a hobbled javascript, the point was to manage the data in-situ and leave just screen painting to the client. Real time data population and editing against the datastore was the norm. One had to be careful with transaction scope, but one has to anyway.

With the young web, where bandwidth to the browser was effectively far lower than a VT-X00 would see over RS-232, javascript increasingly available, and young studs eager to make money (rather than a career)... Well, we have the "software problem". Not to mention that the math ability of US kids declined.
Unfortunately, the percentage of students in the U.S. Class of 2009 who were highly accomplished in math is well below that of most countries with which the U.S. generally compares itself. No less than 30 of the 56 other countries that participated in the Program for International Student Assessment (PISA) math test had a larger percentage of students who scored at the international equivalent of the advanced level on our National Assessment of Educational Progress (NAEP) tests.

The relational model, which doesn't demand anything more than elementary set theory, doesn't fit the bill.
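One example says it all; the table and columns are made up, the point is the shape:

-- The loop-over-records habit, in pseudo-code:
--   for each row in payments:
--       if row.status == 'open' and row.due_date < today:
--           row.status = 'late'
--
-- The set-based equivalent: one declarative statement, one transaction,
-- and the engine decides how to iterate.
UPDATE payments
SET    status = 'late'
WHERE  status = 'open'
  AND  due_date < current_date;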

Making a quick buck on Wall Street, selling bogus securities and such, doesn't require knowing algebra. So, they don't. Making a quick buck with some social app doesn't, either. So, they don't.

I recall the process of class selection as an undergraduate. When it came to electives, even for those with a math-y major, the absolute preference was for classes from sociology, poli sci, and maybe psych. Why? Because there were seldom "right" answers, and if one was an accomplished bullshit artist (and even engineers could manage that), then a high grade was assured. The lower the rigour, the higher the GPA. And so it is with the applications they gravitate towards. Unstructured data. Fuzzy logic. And so forth.

Will it be possible for Chris Date, to cite an example, to be well regarded in his 80s (assuming he can get that far), in the way Deming was? I hope so. On the other hand, Deming was gone before the vampire squid assault of the Bayesians, so if he'd been born in 1930 rather than 1900, he could have been ignored once he reached 60.

The infrastructure to support client/server in a box on the web only gets deeper and stronger. At some point it will win out. And we can all die with our boots on.

02 April 2013

The Whistling Gypsy

In the first years of the 1960's, there was a thriving (by those days' standards) folk music scene. It was centered on adults, 30-somethings and older, in nightclubs. By the mid-1960's, the Beatles and Bob had put an end to that. Dylan and friends developed an acoustic music labeled folk, but based on performer penned songs, rather than traditional ones which had aged with the people. So to speak. The only exemplar of that era that's still remembered by more than a handful of diehards is The Kingston Trio. For myself, while decades younger than the nightclub goers, I was drawn to the best of the best, The Limeliters. They did some sung-through commercials, identifiably sung by them, which are rare today but were common then, so that's how I discovered them; Coke and Lucky Strike, if I remember correctly. Over a couple of years, their recording fame was due to a few live recordings, done quickly and frequently; they were done for by 1963.

"The Dance" by Fleetwood Mac is an example of audience interaction but The Limeliters, were more so. Audience interaction was nearly as much of the act as the singing. Their studio recordings were flat by comparison. The banter came mostly from Lou Gottlieb, a Berkeley Ph.D. (in musicology, not math stats; the latter would make the tale too cloying for words). When the group disbanded, he had his own fifteen minutes of fame when he attempted to will a farm/commune he'd established to God (the piece misspells the group's name, which happened a lot after they'd been consigned to history). The California courts weren't amused.

He opens one of their better known songs with some of his better known banter: "... the title of the song: 'The Whistling Gypsy', or where the hot wind blows." Well, there is a hot wind in the database world, and it's Northwind. Along with Date's supplier/parts database, the most well known and universally available demonstration database in the whole entire world. If your world is Windows and SQL Server.

I've been searching, in a desultory way, for a flat file dump that I can load into my various linux database engines, notably Postgres (due to the PL/R support, which provides my R fix with in-database connectivity), to no avail. Until now. Here's the stuff. DB2/LUW is missing. Maybe I'll do that one too. For now, Postgres will do nicely.