Google's Server Environment is not as homogeneous as you think, up to 5 microarchitectures

There is a common belief that Google, Facebook, Twitter, and the newer Web 2.0 companies have it easier than a typical enterprise because they have homogeneous environments.  Well, Google has a paper that discusses how its supposedly homogeneous warehouse-scale computers (WSCs) are actually heterogeneous, and that there is an opportunity for performance improvements of up to 15%.

In this table Google lists the number of microarchitectures in 10 different data centers.  Google now has 13 WSCs, so this could indicate how long ago the analysis was run (maybe 2-3 years ago).  Or the analysis could be more recent and Google dropped 3 data centers from the table.  The 13th just came online over the past year and would probably not have enough data.

[Image: table of the number of microarchitectures in 10 Google data centers]

The issue pointed out in the paper is that the job manager assumes the cores are homogeneous.

[Image: the job manager's homogeneous view of the machines]

When in fact they are not.

[Image: the actual heterogeneous machines]

Here is the results summary.

Results Summary: This paper shows that there is a significant performance opportunity when taking advantage of emergent heterogeneity in modern WSCs. At the scale of modern cloud infrastructures such as those used by companies like Google, Apple, and Microsoft, gaining just 1% of performance improvement for a single application translates to millions of dollars saved. In this work, we show that large-scale web-service applications that are sensitive to emergent heterogeneity improve by more than 80% when employing Whare-Map over heterogeneity-oblivious mapping. When evaluating Whare-Map using our testbed composed of key Google applications running on three types of production machines commonly found co-existing in the same WSC, we improve the overall performance of an entire WSC by 18%. We also find a similar improvement of 15% in our benchmark testbed and in our analysis of production data from WSCs hosting live services.

Here are the three different microarchitectures used in the paper: Table 3 is production, Table 4 is the test bed.

[Image: Table 3 (production) and Table 4 (test bed) microarchitectures]

Here is the range in performance for the three different microarchitectures.

[Image: performance range across the three microarchitectures]
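To make the idea concrete, here is a toy sketch of the difference between a heterogeneity-oblivious placement and a heterogeneity-aware one. This is my own illustration with made-up scores, not the paper's Whare-Map algorithm; the real system also has to respect per-platform capacity and co-location interference.

```c
/* Toy illustration (not Whare-Map itself): given measured performance
 * scores for each (job, platform) pair, a heterogeneity-aware mapper
 * places each job on the platform where it runs best, while a
 * heterogeneity-oblivious mapper treats all platforms as equal.
 * All scores below are made up for illustration only. */
#include <stdio.h>

#define NJOBS      4
#define NPLATFORMS 3   /* e.g., three microarchitectures in one WSC */

/* Hypothetical relative performance of each job on each platform
 * (1.0 = baseline); in practice these would come from profiling. */
static const double score[NJOBS][NPLATFORMS] = {
    {1.00, 1.20, 0.90},   /* job 0 */
    {1.00, 0.95, 1.30},   /* job 1 */
    {1.00, 1.10, 1.05},   /* job 2 */
    {1.00, 0.85, 1.25},   /* job 3 */
};

int main(void) {
    double oblivious = 0.0, aware = 0.0;

    for (int j = 0; j < NJOBS; j++) {
        /* Oblivious: treat every machine as platform 0 (homogeneous view). */
        oblivious += score[j][0];

        /* Aware: pick the platform where this job performs best.
         * (A real mapper must also respect per-platform capacity.) */
        double best = score[j][0];
        for (int p = 1; p < NPLATFORMS; p++)
            if (score[j][p] > best)
                best = score[j][p];
        aware += best;
    }

    printf("heterogeneity-oblivious throughput: %.2f\n", oblivious);
    printf("heterogeneity-aware throughput:     %.2f\n", aware);
    printf("improvement: %.1f%%\n", 100.0 * (aware - oblivious) / oblivious);
    return 0;
}
```

The point is simply that if you know how each job runs on each microarchitecture, you can recover performance that a scheduler assuming homogeneity leaves on the table.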

The new job scheduler is deployed at Google, and here are the results.

[Image: Figure 11, Whare-Map results across Google's WSCs]

Figure 11 shows the calculated performance improvement when using Whare-Map over the currently deployed mapping in 10 of Google’s active WSCs. Even though some major applications are already mapped to their best platforms through manual assignment, we have measured significant potential improvement of up to 15% when intelligently placing the remaining jobs. This performance opportunity calculation based on this paper is now an integral part of Google’s WSC monitoring infrastructure. Each day the number of ‘wasted cycles’ due to inefficiently mapping jobs to the WSC is calculated and reported across each of Google’s WSCs world wide.

There is more in the paper I need to digest, but I need to finish this post as it is long enough already.

Google shares its 10-20% server performance improvement technique, analyzing the microarchitecture of AMD and Intel servers

If you told someone in the data center industry you could get a 10-20% performance gain, they probably wouldn't believe you.  If you said you had a new processor, memory, storage, or network architecture, you would have a better chance of being believed.  But would you believe someone who told you that, at the microarchitecture level of existing servers, designing the software to access local memory instead of non-local (remote NUMA) memory could deliver a 10-20% performance gain?  Well, Google has shared how it does this and is deploying the solution in its data centers.

From the paper:

This indicates that a simple NUMA-aware scheduling can already yield sizable benefits in production for those platforms. Based on our findings, NUMA-aware thread mapping is implemented and in the deployment process in our production WSCs.
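To make "local vs. non-local memory" concrete, here is a minimal sketch of NUMA-aware placement on Linux using libnuma. This is my own toy example, not Google's implementation; the node number and buffer size are arbitrary. The idea is simply to run a thread on the CPUs of one NUMA node and allocate its working set from that same node, so memory accesses stay local instead of crossing the socket interconnect.

```c
/* Minimal sketch of NUMA-aware placement on Linux using libnuma
 * (build with: gcc numa_local.c -lnuma). Not Google's code; the
 * node number and buffer size are arbitrary choices for the demo. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int node = 0;                       /* target NUMA node */
    size_t size = 64 * 1024 * 1024;     /* 64 MB working set */

    /* Run the calling thread only on the CPUs of this node... */
    numa_run_on_node(node);

    /* ...and allocate its memory from the same node, so accesses
     * stay local rather than crossing the interconnect. */
    char *buf = numa_alloc_onnode(size, node);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }
    memset(buf, 0, size);               /* touch pages so they are committed on the node */

    printf("thread and %zu MB of memory bound to node %d\n",
           size >> 20, node);

    numa_free(buf, size);
    return 0;
}
```

You can get a similar effect for a whole process from the command line with numactl, for example: numactl --cpunodebind=0 --membind=0 followed by your binary.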

Here is the Google paper, published in 2013.  Warning: this is not an easy paper to read if you are not familiar with operating systems and hardware.  But I hope it gives an appreciation of another way to green a data center by making some changes in software.

Optimizing Google's Warehouse Scale Computers: The NUMA Experience

Abstract: Due to the complexity and the massive scale of modern warehouse scale computers (WSCs), it is challenging to quantify the performance impact of individual microarchitectural properties and the potential optimization benefits in the production environment. As a result of these challenges, there is currently a lack of understanding of the microarchitecture-workload interaction, leaving potentially significant performance on the table.

This paper argues for a two-phase performance analysis methodology for optimizing WSCs that combines both an in-production investigation and an experimental load-testing approach. To demonstrate the effectiveness of this two-phase methodology, and to illustrate the challenges, methodologies, and opportunities in optimizing modern WSCs, this paper investigates the impact of non-uniform memory access (NUMA) for several of Google's key web-service workloads in large-scale production WSCs. Leveraging a newly-designed metric and continuous large-scale profiling in live datacenters, our production analysis demonstrates that NUMA has a significant impact (10-20%) on two important web services: Gmail backend and search frontend. Our carefully designed load-test further reveals surprising tradeoffs between optimizing for NUMA performance and reducing cache contention.


New Camera - Canon 6D - full-frame sensor, higher noise-free ISO, GPS, and wifi

Besides smartphone cameras, I have a Canon SD100 with GPS and a Canon 7D.  But after my Canon SD100 had a warranty-covered failure, losing a GPS-tagging camera made me think about the Canon 6D, which I decided to go ahead and get.  The main reasons I wanted it were the full-frame sensor vs. an APS-C, higher noise-free ISO settings (note: there is no built-in flash in the 6D), built-in GPS, and wifi.

One of the things a full-frame sensor makes it easier to do is blur the background to focus on a subject, which makes the flowers in the center pop.

[Photo: flowers against a blurred background]

Higher ISO lets me crank things up. I shot this at ISO 1600, 80mm, f/7.1, 1/250.  Focus isn't quite there, as this was only the 2nd day with the camera. :-)

[Photos]

The high ISO also means video in low light works really well.  Here is a screen grab from a video.

[Image: frame grab from a low-light video]

GPS tagging is really handy, and I believed in the feature 13 years ago when we tried to tell people at Microsoft that GPS tagging of images would be big.  "Huh???" was most of the reaction back then.

What's the wifi good for?  Wireless remote control of the camera from an iPhone or Android phone.

[Image: remote camera control from a phone]

This is when I am so glad I am able to use MarsEdit.  Writing this blog post would be totally painful in a browser app.

Energy inefficiency forces an early retirement of '09 record-holder supercomputer

There is lots of press about the IBM Roadrunner supercomputer at Los Alamos National Laboratory being turned off.

First Petaflop Supercomputer, 'Roadrunner,' Decommissioned
PC Magazine, written by David Murphy

If you need to take a moment to think of a joke about a particular speedy bird and its coyote companion, we understand. Otherwise, it's time to raise a toast today to one of the computing world's heavyweights, the first supercomputer that ever managed to hit a ...

The one I found most useful is the Ars Technica article, where energy efficiency is mentioned.

Petaflop machines aren't automatically obsolete—a petaflop is still speedy enough to crack the top 25 fastest supercomputers. Roadrunner is thus still capable of performing scientific work at mind-boggling speeds, but has been surpassed by competitors in terms of energy efficiency. For example, in the November 2012 ratings Roadrunner required 2,345 kilowatts to hit 1.042 petaflops and a world ranking of #22. The supercomputer at #21 required only 1,177 kilowatts, and #23 (clocked at 1.035 petaflops) required just 493 kilowatts.

Given the high power consumption, it seems most likely this is the actual IT power draw, not including the additional power for the cooling system.  A pre-2009 supercomputer would most likely have over 50% overhead for the cooling system, so this could easily be 3.5 MW of total power.
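Doing the back-of-the-envelope math: 1.042 petaflops at 2,345 kW works out to roughly 0.44 gigaflops per watt, while the #23 machine's 1.035 petaflops at 493 kW is roughly 2.1 gigaflops per watt, close to five times more efficient.  And if cooling adds another 50% on top of the IT load, 2,345 kW × 1.5 is about 3.5 MW total.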

Supercomputers are regularly rated on their energy use, and the author highlights the need for better performance per watt.

"Future supercomputers will need to improve on Roadrunner’s energy efficiency to make the power bill affordable," Los Alamos wrote. "Future supercomputers will also need new solutions for handling and storing the vast amounts of data involved in such massive calculations."

Quanta's direct sales transformation: 65% in 2012, expected to be 85% in 2013

Quanta was mostly known as an ODM, the guys the OEMs went to for building their hardware.  But Quanta has made the move into the direct sales business, and guess what, they are moving very fast.

Compaq built some of the first dual-processor, dual-hard-drive, x86-based servers.  In the beginning it was Compaq, HP, and IBM who had the knowledge to build servers.  Over time, to reduce costs, the manufacturing was moved to Asia and eventually the engineering was moved to Asia as well, leaving the OEMs with the customer relationship.  Then came the shift to the bigger data center players.  Remember 5 years ago how small Google, Facebook, Amazon, Microsoft, and Apple's data centers were?  Now they are the dominant players, and there has been a shift to economies of scale where 10,000s of servers is a small yearly order.  The big guys buy 100,000s of servers a year.

This shift benefits a player like Quanta.

GigaOm has an article on Quanta.

How an unknown Taiwanese server maker is eating the big guys’ lunch


MAR. 16, 2013 - 1:30 PM PDT

SUMMARY:

In the server business, Taiwanese hardware company Quanta has shifted from an original-design manufacturer to much more of a direct seller. It wants to extend the trend and sell other products, too.

Here is the part that caught my eye.

Back then, Quanta didn’t sell servers directly to customers, it only built them for traditional server vendors who then put their name on them and sold them to customers. Fast forward a few years, and a majority of Quanta’s server revenue stems from direct deals — 65 percent in 2012, and a forecasted 85 percent this year.

Quanta is expanding into the Seattle cloud development hub.

Next month, the company will open an office in Seattle in order to be closer to customers. Yang said Quanta has several customers in the area, although he declined to name them. Microsoft, which is building huge data center capacity for Windows Azure and its Live offerings, is a short drive from Seattle, in Redmond, Wash., and Seattle is much closer to Quincy, Wash., a hotbed of data centers, than the Fremont office. Quanta will add more U.S. offices for sales and service this year, Yang said.