Microsoft's 4 yr VP of Cloud Infrastructure joins Google

March 6, 2018 Dave Ohara

Suresh Kumar has updated his LinkedIn profile, and he has gone from Microsoft to Google. Before that he was at Amazon. In the Seattle area the revolving door between Amazon, Google, and Microsoft is common.

Four years ago I wrote a blog post comparing Suresh's role at Microsoft to that of Google's Joe Kava. What I didn't anticipate is both of them working at Google four years later.

Jeff Dean publishes Part 1 of Reflection of 2017 Google Brain results - ML, ML, and more ML

January 11, 2018 Dave Ohara

Jeff Dean posted part 1 of his reflection of Google Brain’s 2017 achievements. https://research.googleblog.com/2018/01/the-google-brain-team-looking-back-on.html

If you don’t know what Google Brain is, you can check out the Wikipedia article. https://en.m.wikipedia.org/wiki/Google_Brain

When you read the post you can see the work is lots and lots of ML, using the infrastructure below. Well, they are probably also using some really amazing stuff that Google won’t share for a long time. This is the elite Google Brain team; they can get anything they want.

As an example of some of their work, Jeff references this text-to-speech work. https://google.github.io/tacotron/publications/tacotron2/index.html

Check out this graphic that shows where TensorFlow is used.

Google shares its observations on Best Practices for AR

December 20, 2017 Dave Ohara

AR is a hot topic and Google has a post where they share their observations on best practices.

“From our own explorations, we’ve learned a few things about design patterns that may be useful for creators as they consider mobile AR platforms. For this post, we revisited our learnings from designing for head-mounted displays, mobile virtual reality experiences, and depth-sensing augmented reality applications. First-party apps such as Google Earth VR and Tilt Brush allow users to explore and create with two positionally-tracked controllers. Daydream helped us understand the opportunities and constraints for designing immersive experiences for mobile. Mobile AR introduces a new set of interaction challenges. Our explorations show how we’ve attempted to adapt emerging patterns to address different physical environments and the need to hold the phone throughout an entire application session.”

It’s a good summary of issues that are kind of obvious when you start down the path of building solutions.

Machine Learning (ML) in Google’s Data Center, Jeff Dean shares details

December 10, 2017 Dave Ohara

Jeff Dean is one of Google’s amazing staff who works on data centers. He posted a presentation on ML that is here. Who is Jeff Dean? Here is a Business Insider article on Jeff. If you want a good laugh, check out the jokes about Jeff Dean’s capabilities. I’ve been lucky to have a few conversations with Jeff and to watch him up close, which helps in reading the ML presentation.

Below is a small fraction of what is in Jeff’s presentation. It is going to take me a while to digest it. Luckily, I shared the presentation with one of my friends who has been getting into ML architecture, and we are both looking at ML systems.

Part of Jeff’s presentation is the application of ML in the data center.

This slide doesn’t show up until three quarters of the way through the presentation. To show you how important it is, it shows up again in Jeff’s conclusion slide.

So now that you have seen the end slide, what is Jeff trying to do? It’s kind of simple: he wants computational power beyond the limits of Intel processors. Urs Hoelzle wrote a paper arguing for brawny cores over the trend toward wimpy cores. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36448.pdf

So what does this look like?

Look at the aisle shot.

And here is a shot of the TPU logic board with 4 TPUs.

Google has a mindset from its early days that gives it an advantage over many

September 24, 2017 Dave Ohara

In 1998, Google had a $100k check from Andy Bechtolsheim. In 1998 that could buy between 5 and 15 Compaq servers of the kind used for web content. To make a high-availability system you would keep a hot spare, which could mean only half of your resources are available. Google took a path that few took back then: using consumer components.

Above are the first Google servers. The first iteration of Google production servers was built with inexpensive hardware and was designed to be very fault-tolerant.

In 2013, Google published its The Datacenter as a Computer paper. http://www.morganclaypool.com/doi/pdf/10.2200/S00516ED2V01Y201306CAC024

A key part of this paper is discussion of hardware failure.

“1.6.6 HANDLING FAILURES

The sheer scale of WSCs requires that Internet services software tolerate relatively high component fault rates. Disk drives, for example, can exhibit annualized failure rates higher than 4% [123, 137]. Different deployments have reported between 1.2 and 16 average server-level restarts per year. With such high component failure rates, an application running across thousands of machines may need to react to failure conditions on an hourly basis. We expand on this topic further on Chapter 2, which describes the application domain, and Chapter 7, which deals with fault statistics.”
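To get a feel for the "hourly basis" claim in the quote, here is a rough back-of-the-envelope sketch. The rates (1.2 to 16 restarts per server per year, 4% annualized disk failure rate) come from the quote; the fleet size of 5,000 machines with one disk each is my own hypothetical number, and the function name is just for illustration.

```python
# Back-of-the-envelope failure arithmetic using the rates quoted above.
# FLEET_SIZE is a hypothetical assumption, not a figure from the paper.

HOURS_PER_YEAR = 365 * 24  # 8760

def expected_events_per_hour(units, annual_rate_per_unit):
    """Average number of events per hour across a fleet of identical units."""
    return units * annual_rate_per_unit / HOURS_PER_YEAR

FLEET_SIZE = 5000  # hypothetical: "thousands of machines", one disk each

# Server-level restarts: the quote reports 1.2 to 16 per server per year.
restarts_low = expected_events_per_hour(FLEET_SIZE, 1.2)
restarts_high = expected_events_per_hour(FLEET_SIZE, 16)

# Disk failures: the quote reports annualized failure rates above 4%.
disk_failures_per_year = FLEET_SIZE * 0.04
hours_between_disk_failures = HOURS_PER_YEAR / disk_failures_per_year

print(f"restarts per hour: {restarts_low:.2f} to {restarts_high:.2f}")
print(f"disk failures per year: {disk_failures_per_year:.0f}")
print(f"hours between disk failures: {hours_between_disk_failures:.1f}")
```

Even at the low end of the quoted range, a fleet of this assumed size sees a restart somewhere more often than once every two hours, so the software has to treat failure as a normal condition.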

Google has come a long way from using inexpensive hardware, but what has been carried forward is how to deal with failures.

Some may think 2 nodes in a system are required for high availability, but the smart ones know that you need 3 nodes and really want 5 nodes in the system.
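One way to see why 2 nodes fall short is a minimal sketch that assumes the system needs a strict majority of nodes to agree before it acts (majority quorum). That assumption is mine; the post does not say which scheme is in use, and the function names are hypothetical.

```python
# Majority-quorum arithmetic: how many node failures can a cluster survive?
# Assumes a strict majority must be reachable; this is an illustrative
# assumption, not something stated in the post.

def majority_quorum(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1

def tolerated_failures(nodes):
    """Nodes that can fail while a majority is still available."""
    return nodes - majority_quorum(nodes)

for n in (2, 3, 4, 5):
    print(f"{n} nodes: quorum {majority_quorum(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Under this assumption 2 nodes survive no failures at all, 3 nodes survive one, and 5 nodes survive two, which leaves room for one node to be down for maintenance while another fails unexpectedly. An even count like 4 buys nothing over 3.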