Google publishes ideas discussing a Good Enough approach to achieve low latency

It can be really hard to get the media to publish complex concepts, which is why companies will submit their own articles.  Google's Luiz Barroso and Jeff Dean have an article in Communications of the ACM on Google's data center challenge of providing low-latency performance at scale.


The Tail at Scale

Systems that respond to user actions quickly (within 100ms) feel more fluid and natural to users than those that take longer.3 Improvements in Internet connectivity and the rise of warehouse-scale computing systems2 have enabled Web services that provide fluid responsiveness while consulting multi-terabyte datasets spanning thousands of servers; for example, the Google search system updates query results interactively as the user types, predicting the most likely query based on the prefix typed so far, performing the search and showing the results within a few tens of milliseconds. Emerging augmented-reality devices (such as the Google Glass prototype7) will need associated Web services with even greater responsiveness in order to guarantee seamless interactivity.

The article may be long for most readers, so here are two key points.

In large information-retrieval (IR) systems, speed is more than a performance metric; it is a key quality metric, as returning good results quickly is better than returning the best results slowly. Two techniques apply to such systems, as well as to other systems that inherently deal with imprecise results:

Good enough. In large IR systems, once a sufficient fraction of all the leaf servers has responded, the user may be best served by being given slightly incomplete ("good-enough") results in exchange for better end-to-end latency. The chance that a particular leaf server has the best result for the query is less than one in 1,000 queries, odds further reduced by replicating the most important documents in the corpus into multiple leaf servers. Since waiting for exceedingly slow servers might stretch service latency to unacceptable levels, Google's IR systems are tuned to occasionally respond with good-enough results when an acceptable fraction of the overall corpus has been searched, while being careful to ensure good-enough results remain rare. In general, good-enough schemes are also used to skip nonessential subsystems to improve responsiveness; for example, results from ads or spelling-correction systems are easily skipped for Web searches if they do not respond in time.
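
To make the scheme concrete, here is a minimal sketch in Python of how a root server might return good-enough partial results. The query_leaf stub, the 75% fraction, the 50ms budget, and the simulated delays are all made-up illustration values, not anything from the article or from Google's actual systems.

```python
import concurrent.futures
import random
import time

# Hypothetical stand-in for one leaf server. In a real IR system this would be
# an RPC to a shard that searches its slice of the index; here it just sleeps
# for a variable time so the straggler effect is visible.
def query_leaf(leaf_id, query):
    time.sleep(random.uniform(0.005, 0.06))
    return f"results for '{query}' from leaf {leaf_id}"

def good_enough_search(num_leaves, query, min_fraction=0.75, budget_s=0.05):
    """Fan the query out to every leaf, but return once a sufficient fraction
    has answered or the latency budget is spent, rather than waiting for the
    slowest stragglers."""
    needed = int(num_leaves * min_fraction)
    results = []
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_leaves)
    futures = [pool.submit(query_leaf, i, query) for i in range(num_leaves)]
    try:
        for fut in concurrent.futures.as_completed(futures, timeout=budget_s):
            results.append(fut.result())
            if len(results) >= needed:
                break                      # good enough -- stop waiting
    except concurrent.futures.TimeoutError:
        pass                               # budget spent; ship what we have
    pool.shutdown(wait=False)              # do not block on the stragglers
    return results

answered = good_enough_search(100, "tail latency")
print(f"{len(answered)} of 100 leaves answered within the budget")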

Google also uses a technique that is like sticking your toe in the water to test the environment before jumping in.  They call it a canary request.

Canary requests. Another problem that can occur in systems with very high fan-out is that a particular request exercises an untested code path, causing crashes or extremely long delays on thousands of servers simultaneously. To prevent such correlated crash scenarios, some of Google's IR systems employ a technique called "canary requests"; rather than initially send a request to thousands of leaf servers, a root server sends it first to one or two leaf servers. The remaining servers are only queried if the root gets a successful response from the canary in a reasonable period of time. If the server crashes or hangs while the canary request is outstanding, the system flags the request as potentially dangerous and prevents further execution by not sending it to the remaining leaf servers. Canary requests provide a measure of robustness to back-ends in the face of difficult-to-predict programming errors, as well as malicious denial-of-service attacks.

The canary-request phase adds only a small amount of overall latency because the system must wait for only a single server to respond, producing much less variability than if it had to wait for all servers to respond for large fan-out requests; compare the first and last rows in Table 1. Despite the slight increase in latency caused by canary requests, such requests tend to be used for every request in all of Google's large fan-out search systems due to the additional safety they provide.
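
Here is a similarly hedged Python sketch of the canary-request pattern. The send callable, the CanaryFailed exception, and the two-canary, 500ms-timeout choices are assumptions made for illustration; the article does not describe Google's implementation at this level of detail. The key property is that the remaining leaf servers are only queried after a canary responds successfully, so a request that crashes servers takes down at most one or two of them.

```python
import concurrent.futures

class CanaryFailed(Exception):
    """Raised when the canary request crashes, errors out, or times out."""

def fan_out_with_canary(leaves, request, send, canary_count=2, canary_timeout_s=0.5):
    """Send the request to one or two "canary" leaves first; only if a canary
    answers successfully within a reasonable time is the request fanned out to
    the remaining leaves.  `send(leaf, request)` is a caller-supplied RPC stub."""
    canaries, rest = leaves[:canary_count], leaves[canary_count:]
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(len(leaves), 1))
    try:
        canary_futures = [pool.submit(send, leaf, request) for leaf in canaries]
        done, _ = concurrent.futures.wait(
            canary_futures,
            timeout=canary_timeout_s,
            return_when=concurrent.futures.FIRST_COMPLETED)
        if not done:
            # The canary hung: flag the request as dangerous and never send it
            # to the remaining leaf servers.
            raise CanaryFailed("canary timed out")
        try:
            next(iter(done)).result()      # re-raises if the canary crashed
        except Exception as exc:
            raise CanaryFailed(f"canary failed: {exc}") from exc
        # The canary succeeded, so it is safe to exercise the full fan-out.
        rest_futures = [pool.submit(send, leaf, request) for leaf in rest]
        return [f.result() for f in canary_futures + rest_futures]
    finally:
        pool.shutdown(wait=False)          # do not block on a hung canary

# Example usage with a trivial stub standing in for a real leaf RPC:
results = fan_out_with_canary(list(range(100)), "some query",
                              send=lambda leaf, req: f"ok from {leaf}")
print(len(results), "leaves answered after the canary succeeded")
```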

The Data Center World is getting smaller as it grows

One of my favorite books in high school was "Small is Beautiful."

Small Is Beautiful: A Study of Economics As If People Mattered is a collection of essays by British economist E. F. Schumacher. The phrase "Small Is Beautiful" came from a phrase by his teacher Leopold Kohr.[1] It is often used to champion small, appropriate technologies that are believed to empower people more, in contrast with phrases such as "bigger is better".

After two weeks of being in LV and then SJ, hanging around data center people and having interesting discussions, it struck me how small the data center world is.  Yet it is growing.

With social networking and the big players getting bigger, there is a small set of people driving the industry forward.  Yet there is an increasing set of people who demand data center services, including IT organizations that don't understand how the small data center world works.

I think part of the problem for a newbie to data centers is filtering through the marketing and sales positioning to get to the core of how a data center works.  The marketing folks are not taking a "Small Is Beautiful" approach, one about taking small steps in technology to empower people to design, build, and operate data centers better than in the past.

The Small Is Beautiful approach is an interesting one that needs to be studied more.

Inspiration from Great Architecture, La Sagrada Familia

Five years ago I went to Barcelona to moderate a panel with Mike Manos (at Microsoft at the time), HP, and Dell at Microsoft TechForum.  I was able to get away for an afternoon with another Microsoft friend, and we visited La Sagrada Familia.

60 Minutes just covered the latest update on the project.  Here is the written story.

And for people who want to see more about the construction, here is that story.

And another video.

I will write another post on what ideas I have used for inspiration.

The First Data Center, where all knowledge was the goal and anyone could access the information - Ancient Alexandria

What is a Data Center?  A place to house IT equipment.  What is IT equipment for?  To support the receiving, organizing, processing, analysis, and distribution of information.  Before the Internet, the most common way to get information was to go to the library.  Libraries were also places to meet others and discuss topics, which supports the development of knowledge to be shared.  The library seems so ancient.  But long, long ago there was a library that attempted to make all knowledge accessible to all people, much like a data center service such as Google Search.

One of the more interesting conversations I've enjoyed is an ongoing discussion with a friend, Fred Gainer, a retired teacher, on the role of museums and libraries in society.  This gets into the subject of epistemology.

Epistemology (from Greek ἐπιστήμη - epistēmē, meaning "knowledge, understanding", and λόγος - logos, meaning "study of") is the branch of philosophy concerned with the nature and scope of knowledge.[1][2] It questions what knowledge is, how it is acquired, and the possible extent to which a given subject or entity can be known.

I previously posted on epistemology.

I saw a talk by John Leslie King titled "Knowledge Infrastructure: Mechanism and Transformation in the Information."  One of the slides that got my attention made this point:

The role of the Academy in systematically collecting information for crowd-sourced knowledge.

Another great point put knowledge in the perspective of a reason for existence, and how what's obvious leads to thinking about what's hidden.

One of the books that Fred suggested I read is

The Rise and Fall of Alexandria: Birthplace of the Modern World

Warning: this book is not a fast read, and many of you may not be interested in how knowledge and information were developed to rival Athens and Rome as centers of the ancient world.

I just finished the book this morning, and the thing that hit me, thinking like a data center person, is this: the choices made by Ptolemy to make Alexandria a center of knowledge and a repository for books from a wide range of cultures are exactly what have made Google a source of information.

Many of the concepts used to build Alexandria, its libraries, and its open culture are being repeated now in the huge data centers, whether they belong to Google, Facebook, Twitter, or Microsoft.

Reading the history of Alexandria gave me a bunch of ideas on how to approach a knowledge system.  The politics and people issues were huge in Alexandria's history.  

Alexandria became a center of learning and knowledge development.  Companies like Google are focused on developing better knowledge systems that allow them to learn things faster.

It was in Alexandria, during the six hundred years beginning around 300 B.C., that human beings, in an important sense, began the intellectual adventure that has led us to the shores of the cosmic ocean. The city was founded by Alexander the Great who encouraged respect for alien cultures and the open-minded pursuit of knowledge. He encouraged his generals and soldiers to marry Persian and Indian women. He respected the gods of other nations. He collected exotic lifeforms, including an elephant for Aristotle, his teacher. His city was constructed on a lavish scale, to be the world center of commerce, culture, and learning. It was graced with broad avenues thirty meters wide, elegant architecture and statuary, Alexander's monumental tomb, and an enormous lighthouse, the Pharos, one of the seven wonders of the ancient world.

But the greatest marvel of Alexandria was the Library and the associated Museum (literally, an institution devoted to the specialties of the Nine Muses). It was the citadel of a brilliant scientific tradition. The Library was constructed and supported by the Ptolemys, the Greek kings who inherited the Egyptian portion of the empire of Alexander the Great. From the time of its creation until its destruction seven centuries later, it was the brain and heart of the ancient world.

The Real Data Center, lessons from The Real CSI: How reliable is the science behind forensics?

PBS Frontline has a video on The Real CSI.

Watching this video brings into question the science behind fingerprints, blood tests, and bite marks.

The one method that has trumped a bunch of these techniques is DNA testing.

It is interesting talking to people who have lots of data center experience, and in some ways it feels like these are the people who have figured out the science of data centers and what really works.

In the same way, just because fingerprints and blood testing are popular and accepted by the general public, it doesn't necessarily mean there is science behind the techniques.

Are you practicing data center science or just using the commonly accepted methods?  There is a difference.