King of Human Error influences Checklist Manifesto

Vanity Fair has an article by Michael Lewis on Daniel Kahneman's book Thinking, Fast and Slow.

LETTER FROM BERKELEY
December 2011

The King of Human Error

Billy Beane’s sports-management revolution, chronicled by the author in Moneyball, was made possible by Israeli psychologists Daniel Kahneman and Amos Tversky. At 77, with his own new book, Thinking, Fast and Slow, the Nobel Prize-winning Kahneman reveals the built-in kinks in human reasoning—and he’s Exhibit A.

Related: “The Quiz Daniel Kahneman Wants You to Fail.”

THINKING MAN Daniel Kahneman outside his Berkeley, California, home. “He [is] more alert and alive than most 20-year-olds,” writes Lewis.

We’re obviously all at the mercy of forces we only dimly perceive and events over which we have no control, but it’s still unsettling to discover that there are people out there—human beings of whose existence you are totally oblivious—who have effectively toyed with your life.

One of the data center executives turned me on to The Checklist Manifesto by Atul Gawande, and guess what?  Atul was influenced by the same King of Human Error.

When you wander into the work of Kahneman and Tversky far enough, you come to find their fingerprints in places you never imagined even existed. It’s alive in the work of the psychologist Philip Tetlock, who famously studied the predictions of putative political experts and found they were less accurate than predictions made by simple algorithms. It’s present in the writing of Atul Gawande (Better, The Checklist Manifesto), who has shown the dangers of doctors who place too much faith in their intuition.

One of the patterns that is interesting to investigate is where the judgement errors are made in the data center.

Why is this important for a green data center?  Because there are judgement errors all over the place.

One way to look at Green Data Center Start-ups: are they founded by engineers and scientists or VCs?

Two of my cloud computing engineering friends and I are having a blast working on a technology solution that can be used in data centers as well as many other areas. I ran across Steve Blank's post on

How Scientists and Engineers Got It Right, and VC’s Got It Wrong

There are many parts of Steve's post that resonate with our team.

Startups are not smaller versions of large companies. Large companies execute known business models. In the real world a startup is about the search for a business model or more accurately, startups are a temporary organization designed to search for a scalable and repeatable business model.

...

Scientists and engineers as founders and startup CEOs is one of the least celebrated contributions of Silicon Valley.

It might be its most important.

We all worked in Silicon Valley, so we have a bunch of methods ingrained in our thinking.

Why It’s “Silicon” Valley
In 1956 entrepreneurship as we know it would change forever.  At the time it didn’t appear earthshaking or momentous. Shockley Semiconductor Laboratory, the first semiconductor company in the valley, set up shop in Mountain View. Fifteen months later eight of Shockley’s employees (three physicists, an electrical engineer, an industrial engineer, a mechanical engineer, a metallurgist and a physical chemist) founded Fairchild Semiconductor.  (Every chip company in Silicon Valley can trace their lineage from Fairchild.)

The history of Fairchild was one of applied experimentation. It wasn’t pure research, but rather a culture of taking sufficient risks to get to market. It was learning, discovery, iteration and execution.  The goal was commercial products, but as scientists and engineers the company’s founders realized that at times the cost of experimentation was failure. And just as they don’t punish failure in a research lab, they didn’t fire scientists whose experiments didn’t work. Instead the company built a culture where when you hit a wall, you backed up and tried a different path. (In 21st century parlance we say that innovation in the early semiconductor business was all about “pivoting” while aiming for salable products.)

The Fairchild approach would shape Silicon Valley’s entrepreneurial ethos: In startups, failure was treated as experience (until you ran out of money.)

Conveniently, our idea does not need VC money or MBAs.

Scientists and Engineers = Innovation and Entrepreneurship
Yet when venture capital got involved they brought all the processes to administer existing companies they learned in business school – how to write a business plan, accounting, organizational behavior, managerial skills, marketing, operations, etc. This set up a conflict with the learning, discovery and experimentation style of the original valley founders.

Yet because of the Golden Rule, the VC’s got to set how startups were built and managed (those who have the gold set the rules.)

I have been reading Steve Blank and some of his ideas as he experiments with business models.

Earlier this year we developed a class in the Stanford Technology Ventures Program, (the entrepreneurship center at Stanford’s School of Engineering), to provide scientists and engineers just those tools – how to think about all the parts of building a business, not just the product. The Stanford class introduced the first management tools for entrepreneurs built around the business model / customer development / agile development solution stack. (You can read about the class here.)

Some of the best data center conversations I have are on new business models, not technology.  Give it a try sometime.  It is much more fun.

An Example of what I am thinking of working on next

At Structure I ran into some of my readers, and they asked what I was planning on doing next, since I had posted on my plan to change what I work on.

Time to make some changes, my present to myself for my 51st B'day, "it is time"

FRIDAY, JUNE 17, 2011 AT 6:13AM

5 years ago, I quit Microsoft after 14 years with no idea what I was going to do next.  What I did know is after 5 years I would be working on things that were much bigger and more fun than what I was doing at Microsoft when I left.  Working on Win3.1, Win95, Windows 2000, and Windows XP were the most fun I had at Microsoft and I have the best memories.  Working at Apple, re-architecting the physical distribution system, being part of the hardware team on the Macintosh II, and working on software components for System 7 was when I had the most fun at Apple.  At HP, fresh out of school, I had all kinds of ideas on what I wanted to try.  Ideas in quality/reliability engineering, process engineering, and distribution logistics were fun.  Yeah, I am an engineering nerd.  Why can’t the same ideas I worked on 30 years ago be applied to data centers?

I was lucky to catch up with one of my old friends who was visiting from Japan yesterday and review some of my plans.  Another of our SW friends, who has been working on cloud solutions for the past three years, said our friend stationed in Japan was in town.  Both of these guys are extremely smart, with SW operating system, application development, and even data center operations experience.  I quickly sent an e-mail to see if he was available for dinner and cleared my schedule.  As background, we all used to work together at Apple 20 years ago.  One engineer worked for the other at Apple and Adobe.  The senior engineer worked for me for a while at Microsoft and we have always had great discussions on technology, but haven't formed a business together.  Now is the time to try.

In the hour-long dinner conversation we reviewed the solution and he agreed on the innovative approach that works well in data centers, but applies to many other areas besides data centers.  He complimented me that I had figured out some great insights being immersed in data centers that no one thinks about, and the ideas scale like a platform.  Platform ideas like we did at Apple and Microsoft with operating systems.

We discussed patent strategies, and he came up with better intellectual property protection mechanisms.  We discussed user interface design and real time information vs. post analysis processing of information.  We agreed on a strategy to create a new language that solves many issues we discussed. 

The two engineers will meet in person on Saturday for lunch and hopefully come up with more ideas. We'll have a conference call on Monday to review things as a group.

This reminds me of the fun we had at Apple where I was project leader and supported great engineers who knew the right thing to do.  We all left Apple in the dark ages of 1992-3 after System 7.  None of us went back to Apple, but we look fondly back to the old days when we were much younger.  One of the great lessons we all learned early on at Apple was the confidence and method to create solutions with no data to support the product.  There is no data showing that the solution we are thinking about will work or what the market size is.  The typical business plan approach would be to conduct a market analysis study.  Nahh.  We are going to build it.

My next data center conference is 7x24 Exchange Phoenix in Nov, 2011 and I'll see if some of the SW engineers will join me there.

Weak Bolts suspend operations of 3 Korean Submarines, lesson in managing the supply chain

Mike Manos has his blog at http://loosebolts.wordpress.com/, and I found this article interesting on how weak bolts have suspended the operations of three Korean Submarines.  Many of the data center industry professionals have had duty on a submarine.  Can you imagine how pissed off the operations crew would be at this problem?

For the first 1,800-ton submarine Sohn Won-il, a total of 20 bolts came loose on six occasions between 2006 and 2009.
For the second submarine Jeong Ji, its bolts were broken or loosened on six occasions between 2009 and 2010 while for the third submarine Ahn Jung-geun, its bolts were broken and came loose on three occasions during the same period.
The 214-class submarines, which were designed by German’s Howaldtswerke Deutsche Werft AG, or HDW, and built by Hyundai Heavy Industries, are the primary naval assets for underwater operations.
The military investigated and found that a local subcontractor produced and provided bolts which were weaker than what the German firm required in its design of the submarines, sources said.

I've been having some interesting discussions on supply chain issues in the data center and the need for a Bill of Materials (BOM) approach.  I've tested the idea with some experienced people who understand the approach.  But, to be successful we need an executive sponsor.
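To make the BOM idea concrete, here is a minimal sketch of auditing installed parts against a bill of materials. Everything in it is an assumption for illustration: the `BomLine` and `InstalledPart` records, the strength field, the supplier names, and the part numbers are hypothetical and are not taken from any real data center or from the submarine report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BomLine:
    """One line of a hypothetical bill of materials."""
    part_number: str
    min_strength_mpa: float        # assumed spec field, for illustration only
    approved_suppliers: frozenset  # suppliers allowed to provide this part


@dataclass(frozen=True)
class InstalledPart:
    """A part as actually installed on site (hypothetical record)."""
    part_number: str
    supplier: str
    tested_strength_mpa: float
    location: str


def audit(bom, installed):
    """Return (location, reason) for every installed part that violates the BOM."""
    spec = {line.part_number: line for line in bom}
    findings = []
    for part in installed:
        line = spec.get(part.part_number)
        if line is None:
            findings.append((part.location, "not on BOM"))
        elif part.supplier not in line.approved_suppliers:
            findings.append((part.location, "unapproved supplier"))
        elif part.tested_strength_mpa < line.min_strength_mpa:
            findings.append((part.location, "below spec"))
    return findings
```

The check only works if someone owns the BOM and the installed-parts records, which is part of why an executive sponsor matters.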

Can you think of other data center problems caused by supply chain issues where substandard parts are installed?  I can.

Mike Manos keynote question: are you a Donkey or a Chaos Monkey?

Mike Manos gave a keynote at Uptime Institute in his new role at AOL as VP of Technology, and was back with his entertaining presentation style.  Mike's talk was on "Preparing for the Cloud: A Data Center Survival Guide", but Mike wisely changed his presentation to challenge the attendees to stop being Donkeys.

Watch this video where Mike makes the point that too many people behave like donkeys, like Eeyore, and are depressed about the coming of the Cloud.

Eeyore is generally characterized as a pessimistic, gloomy, depressed, anhedonic

Mike Manos Don't be Donkeys

Here is background on why Mike is calling out the Donkey analogy and how he was inspired for this talk.  The Donkey/Eeyore idea is mentioned at the 3:45 mark.

Mike Manos Listening to the Uptime Audience

Mike challenges the tag line "disrupted data center" as most of what is being discussed this week was discussed last year and the year before.

Mike Manos disrupted data center

Mike uses Netflix's Chaos Monkey as a response to being a donkey.

The best way to avoid failure is to fail constantly.

We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.

If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.
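The pattern in the Netflix quote can be sketched in a few lines. This is a toy simulation under stated assumptions, not Netflix's implementation: the `Service` class, the `chaos_monkey` and `home_page` functions, and the instance names are all hypothetical. It shows a monkey that kills a random healthy instance and a caller that falls back to popular titles when the recommendations service has no instances left.

```python
import random


class Service:
    """A hypothetical service backed by a pool of interchangeable instances."""

    def __init__(self, name, instance_count):
        self.name = name
        # Map instance id -> True while the instance is up.
        self.instances = {f"{name}-{i}": True for i in range(instance_count)}

    def healthy_instances(self):
        return [iid for iid, up in self.instances.items() if up]

    def is_available(self):
        return len(self.healthy_instances()) > 0


def chaos_monkey(services, rng):
    """Kill one randomly chosen healthy instance; return its id, or None if none are left."""
    candidates = [(svc, iid) for svc in services for iid in svc.healthy_instances()]
    if not candidates:
        return None
    svc, iid = rng.choice(candidates)
    svc.instances[iid] = False
    return iid


def home_page(recommendations, popular_titles, personalized_picks):
    """Degrade gracefully: still respond, with popular titles, if recommendations is down."""
    if recommendations.is_available():
        return personalized_picks
    return popular_titles


def run_demo(seed=0):
    """Unleash the monkey three times on a two-instance service and record each response."""
    rng = random.Random(seed)
    recs = Service("recommendations", 2)
    killed, pages = [], []
    for _ in range(3):
        killed.append(chaos_monkey([recs], rng))
        pages.append(home_page(recs, ["popular"], ["personalized"]))
    return killed, pages
```

With two instances, the first kill leaves the service up and the page stays personalized. After the second kill the page falls back to popular titles, but it still responds.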


Are you a Donkey or Chaos Monkey?

Mike and I had a chance to talk about the reaction of people to his talk.  He had tons of people come up and say how they loved his talk.  Mike figured he had 50 people confess they were donkeys.  The funny thing is the guys I was hanging out with during Mike's talk admitted they are chaos monkeys.

Why are Donkeys so bad?  Because they slow down the movement of a group.  Consider this article on how groups disrupt crowd flow.

Secret of Annoying Crowds Revealed

by Dave Mosher on 7 April 2010, 5:00 PM

Get in line! People self-organize in crowds, often without thinking about it.

Push, shout, or politely excuse yourself all you want, but those slowpokes in your way just won't budge. A new study shows a long-neglected reason why: Up to 70% of people in crowds socially glue themselves into groups of two or more, slowing down traffic. What's worse, as crowds gets denser, groups bend into anti-aerodynamic shapes that exacerbate the problem. The study may be a boon to urban planners.

It is interesting to think of the movement of ideas in the data center space as being like crowds of people moving.

Uptime's Pitt Turner quickly adjusted his follow-on to Mike Manos's call to action by telling people to take action and stop being donkeys.  But telling people to move faster in a crowd can do more harm than good.

The study also determined that those who ask others to move faster actually do more harm than good. “You're contributing to chaos. Crowds are self-organized systems, so when you don't cooperate, the system breaks and you slow everyone down,” Moussaid concludes.

It was great to see Mike back in action and catch up.  I told Mike he should try and focus a presentation on the question of are you a Donkey or a Chaos Monkey?  It is a great topic that gets people thinking.

I think of my readers as more in the Chaos Monkey crowd.  I hope you do too.  I know I have too much fun making trouble.  I want one of the AWS Chaos Monkey T-shirts.