Why Zynga moved from AWS to its own private cloud - zCloud

Zynga has an engineering post introducing its private cloud. The post is a bit old, but it gives good detail on why Zynga chose to move out of AWS and onto its own private cloud.

Zynga still uses AWS, but it weighs the choice from a business and financial perspective. Zynga runs a hybrid cloud infrastructure, both public and private.

While our private cloud infrastructure has been growing quickly, Zynga also uses the AWS public cloud to fuel our rapid growth. Our use of AWS, while very important to our business, comes with an operating expense. Essentially, we have been trading monthly operating expenses against longer-term amortized capital expense. Yet, sometimes the pace of our growth forces us to make that tradeoff.

For example, when CityVille rapidly grew to millions of users in just six weeks, we had to grow our server infrastructure at a pace that kept up with and sometimes even outpaced this demand. In a strict capital expense model, we would have exceeded the supply chain of our equipment suppliers – the process of physically getting the number of servers ordered, shipped, delivered and implemented just takes too long. So, we traded the cost of operating expense in AWS for capital expense.
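To make the operating-versus-capital tradeoff concrete, here is a minimal break-even sketch. All of the prices, the 60-month horizon, and the function name `breakeven_month` are hypothetical assumptions for illustration, not Zynga or AWS figures. It compares the cumulative cost of renting a server by the month against buying one up front and paying a smaller monthly running cost.

```python
from typing import Optional


def breakeven_month(rent_per_month: float, purchase_price: float,
                    owned_cost_per_month: float,
                    horizon_months: int = 60) -> Optional[int]:
    """Return the first month in which owning is cheaper than renting on a
    cumulative basis, or None if that never happens within the horizon.

    Renting is pure operating expense; owning is an up-front capital expense
    plus a (hypothetically smaller) monthly running cost.
    """
    for month in range(1, horizon_months + 1):
        rented_total = rent_per_month * month
        owned_total = purchase_price + owned_cost_per_month * month
        if owned_total < rented_total:
            return month
    return None


if __name__ == "__main__":
    # Hypothetical inputs: $500/month to rent vs. $6,000 up front plus $150/month to run.
    print(breakeven_month(500.0, 6000.0, 150.0))  # prints 18
```

The sketch leaves out the delivery lead time the quote describes. A server that pays for itself after 18 months is no help to a game that needs the capacity in six weeks, which is why renting can still win during a growth spike.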

Zynga's cloud is designed to look and operate like AWS.

The zCloud is our private cloud that looks, feels, and operates in a similar fashion to the way that we use Amazon Web Services (AWS) Elastic Compute Cloud (EC2), our public cloud provider. As infrastructure that is private to Zynga, the ZCloud physically resides in our current datacenters and will expand as we grow our infrastructure over time.

Zynga took what it learned from its AWS operations and built its own cloud on its own hardware.

While similar in functionality to AWS, our private zCloud is designed specifically for social games in terms of availability, network connectivity, server processing power and storage throughput. We have achieved these improvements by providing redundant power to each rack, state-of-the-art servers with high memory capacity, a fully non-blocking network infrastructure, the use of inline hardware-based load balancers and local disk storage.


Gartner says private clouds are a last resort; how about learning from public clouds, then building your own?

One of the people I always enjoy chatting with is Jones Lang LaSalle's Michael Siteman. I just got off the phone with him after discussing public vs. private cloud ideas. Then I read this post on Gartner's recommendation that private clouds are a last resort.

Gartner: Private clouds are a last resort

Thorough analysis required to identify cloud computing benefits

By Neal Weinberg, Network World 
October 19, 2011 10:00 AM ET

ORLANDO, Fla. -- Enterprises should consider public cloud services first and turn to private clouds only if the public cloud fails to meet their needs.

That was the advice delivered by analyst Daryl Plummer during Gartner's IT Symposium Tuesday. Plummer says that there are many potential benefits to deploying cloud services, including agility, reduced cost, reduced complexity, increased focus, increased innovation and being able to leverage the knowledge and skills of people outside the company.

Rob Enderle writes in disagreement with this recommendation.


At the same time, this week, Gartner took a position that was the polar opposite of the CIOs at these two events and argued that the private cloud was the last resort. This is just wrong. I’m guessing the company missed a meeting because this sentiment is shared by neither the CIOs nor vendors presenting at these events. The attendees are arguing that the private cloud may be their most important tool. What is also interesting is that were Gartner right, it would be a going-out-of-business scenario for them because public cloud services are being presented much like outsourcing and they do represent a very real threat to IT and it is IT that generally funds Gartner’s services.

The observation Michael and I discussed is that enterprises are studying public clouds to learn how to build and operate cloud environments. Companies can then make an informed decision on whether to buy cloud services or build their own private cloud.

Just because you see enterprises experimenting in the public cloud doesn't mean they are going to deploy there.

And part of that learning is understanding the top causes of cloud outages. Check out these top 10.


Five months after his last post, Mike Manos shares his Lights Out Data Center achievement

Mike Manos had not posted on his blog since May 20, 2011. Mike was nice enough to give me a heads-up on what he posted today.

So, what has Mike been up to? As usual, his post is quite long, but having worked with Mike many times, I am going to boil it down into what most people think of as a blog post: less than 200 words. :-)

Here is a graphic of what AOL has launched.


So what did Mike's team do? They shut themselves away in their own cocoon for months to come up with a better way to run AOL's infrastructure.

Luckily I have a world class team at AOL and together we built and entered our own cocoon and busily went to work. We have gone down the path of changing out technology platforms, operational processes, outdated ways of thinking about data centers, infrastructure, and overall approach. Every inch fighting forward on this idea of unified infrastructure.

And part of that time was spent identifying the cruft.

As I look at the challenges facing modern IT departments across the world, their ability to “go to the cloud” or make use of new approaches is also securely anchored behind by the “cruft” of their past. Sometimes that cruft is so thick that the organization cannot move forward.

And getting rid of the cruft is required for automation.

One of the key foundations for our ATC facility is our cloud platform and automation layer.

Mike shared some of his achievements at Uptime.

We went from provisioning servers in days, to getting base virtual machines up and running in under 8 seconds. Want Service and Application images (for established products)? Add another 8 seconds or so. Want to roll it into production globally (changing global DNS/Load balancing/Security changes)? Let's call that another minute to roll out. We used Open Source products and added our own development glue into our own systems to make all this happen. I am incredibly proud of my Cloud teams here at AOL, because what they have been able to do in such a relatively short period of time is to roll out a world class cloud and service provisioning system that can be applied to new efforts and platforms or our older products. Better yet, the provisioning systems were built to be universal so that if required we can do the same thing with stand-alone physical boxes or virtual machines. No difference. Same system. This technology platform was recently recognized by the Uptime Institute at its last Symposium in California.
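Mike's figures add up to a rough end-to-end time. Here is a minimal sketch that just totals the stages he describes. The durations come from his post (treating "under 8 seconds" as 8 and "another minute" as 60); the names `STAGES` and `total_provisioning_seconds` are illustrative, not part of AOL's system.

```python
# Approximate stage durations in seconds, taken from the AOL post:
# "under 8 seconds", "another 8 seconds or so", "call that another minute".
STAGES = {
    "base virtual machine": 8,
    "service and application image": 8,
    "global rollout (DNS, load balancing, security)": 60,
}

SECONDS_PER_DAY = 86_400


def total_provisioning_seconds(stages: dict) -> int:
    """Sum the stage durations to get a rough end-to-end estimate."""
    return sum(stages.values())


if __name__ == "__main__":
    total = total_provisioning_seconds(STAGES)
    # Compare against a single day, the low end of the old "days" baseline.
    speedup = SECONDS_PER_DAY // total
    print(f"~{total} s end to end, roughly {speedup}x faster than one day")
```

Even against a single day, the low end of "days", that is better than a thousandfold improvement.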


And Mike gives credit to his team.

The culmination of all of this work is the result of some incredible teams devoted to the desire to affect change, a little dash of renegade engineering, a heaping helping of some new perspective, blood, sweat, tears and vision.   I am extremely proud of the teams here at AOL to deliver this ground-breaking achievement.   But then again, I am more than a bit biased.   I have seen the passion of these teams manifested in some incredible technology.

And what did Mike's team do?  They modularized the work.

This time frame was made possible by a standardized / modular way to build out our compute capacity in logical segments based upon the infrastructure cloud type being deployed (low tier, mid-tier, etc.). This approach has given us a predictability to speed of deployment and cost which in my opinion is unparalleled.

One of the things Mike can do that he couldn't do at Microsoft is use open source tools.


We used Open Source products and added our own development glue into our own systems to make all  this happen.

Mike can now have really interesting discussions on the use of open source tools with the likes of Facebook, Twitter, Zynga, Mozilla, and others.

Zynga moves from AWS to its own Data Centers

Zynga is one of AWS's largest tenants, and AWS has supported Zynga's rapid growth. I discovered most of this information a year ago while interviewing some people, and now that it is in a public document, I can blog about it.

VentureBeat's Dean Takahashi reports on the disclosure in Zynga's SEC filing.

Zynga planning to diversify beyond Amazon, build its own data centers

One of the little-known facts about social game giant Zynga is that it’s one of the biggest operators of cloud computing infrastructure, built to support its current customer base of more than 281 million monthly active users.

For much of its four-year history, Zynga has relied on a third-party hosting company, Amazon Web Services, for the hardware infrastructure for its server-based games such as FarmVille on Facebook. But to cut costs and diversify its risks, Zynga is now investing more money in building its own data centers, according to the company’s initial public offering filing with the Securities and Exchange Commission.

Zynga considers the investment in its own infrastructure to be important enough to warrant an investment of $100 – $150 million in the second half of 2011, according to the filing.

Where is Zynga moving? Data Center Knowledge (DCK) reports on some of the sites.

Zynga currently leases data center space from two wholesale data center providers, DuPont Fabros Technology (DFT) and Digital Realty Trust (DLR). In the wholesale data center model, a tenant leases dedicated, fully-built data center space. This approach offers greater control and security than shared colocation space, and is quicker and cheaper than building an entire data center facility. The tenant pays a significant premium over typical leases for office space, but is spared the capital investment to construct the data center.


Several of Zynga’s leased data centers are adjacent to Facebook data center facilities.

How fast can Zynga react in its new infrastructure?  How about 1,000 servers in a day.

Using Amazon EC2 and Leased Data Centers
Zynga has a strong cloud-based infrastructure that balances Amazon cloud instances with its own internal cloud infrastructure. With the ability to add as many as 1,000 new servers to accommodate a surge in users in a 24 hour period (according to the S-1) a heavy hosting cost is associated with increased user demand. By building more of its own infrastructure in company-owned data centers, Zynga might be able to reduce that cost.
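To put 1,000 servers in a day in perspective, here is a back-of-the-envelope sketch. It assumes, which the S-1 figure does not state, that the servers come online at a steady rate around the clock; `provisioning_rate` is a hypothetical helper.

```python
def provisioning_rate(servers: int, window_hours: float) -> tuple:
    """Return (servers per hour, seconds between servers), assuming a steady
    rollout spread evenly across the window."""
    per_hour = servers / window_hours
    seconds_between = window_hours * 3600 / servers
    return per_hour, seconds_between


if __name__ == "__main__":
    # The S-1 figure: as many as 1,000 new servers in a 24-hour period.
    per_hour, gap = provisioning_rate(1_000, 24)
    print(f"{per_hour:.1f} servers per hour, one every {gap:.1f} seconds")
    # prints: 41.7 servers per hour, one every 86.4 seconds
```

At that pace a new server lands roughly every minute and a half, far faster than ordering, shipping and racking physical hardware, which is the supply-chain problem the zCloud post describes.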

Zynga has architected its solution for AWS.

Cadir Lee, Zynga's CTO, quoted in a VentureBeat post:

It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud.

Netflix is famous for being 100% in AWS, and Zynga is going in the opposite direction.

Zynga is going the opposite direction than Netflix. While Netflix is focusing (by using Amazon for most of their infrastructure), Zynga is diversifying (building their own data centers) .

And what has Zynga learned running in AWS? Note the highlighted sentence in the filing below: "We have experienced, and may in the future experience, website disruptions, outages and other performance problems due to a variety of factors, including infrastructure changes, human or software errors and capacity constraints."

A significant majority of our game traffic is hosted by a single vendor and any failure or significant interruption in our network could impact our operations and harm our business. Our technology infrastructure is critical to the performance of our games and to player satisfaction. Our games run on a complex distributed system, or what is commonly known as cloud computing. We own, operate and maintain elements of this system, but significant elements of this system are operated by third parties that we do not control and which would require significant time to replace. We expect this dependence on third parties to continue. In particular, a significant majority of our game traffic is hosted by Amazon Web Services, or AWS, which service uses multiple locations. We have experienced, and may in the future experience, website disruptions, outages and other performance problems due to a variety of factors, including infrastructure changes, human or software errors and capacity constraints. For example, the operation of a few of our significant games, including FarmVille and CityVille, was interrupted for several hours in April 2011 due to a network outage. If a particular game is unavailable when players attempt to access it or navigation through a game is slower than they expect, players may stop playing the game and may be less likely to return to the game as often, if at all. A failure or significant interruption in our game service would harm our reputation and operations. We expect to continue to make significant investments to our technology infrastructure to maintain and improve all aspects of player experience and game performance. To the extent that our disaster recovery systems are not adequate, or we do not effectively address capacity constraints, upgrade our systems as needed and continually develop our technology and network architecture to accommodate increasing traffic, our business and operating results may suffer. 
We do not maintain insurance policies covering losses relating to our systems and we do not have business interruption insurance.

Netflix cancels Qwikster; let's see what else they change, like the price

I just got this e-mail announcing NO QWIKSTER.

Let's see if Netflix takes the next step and changes its pricing. They say they won't. Meanwhile, I have my Amazon Prime account and an Amazon Kindle Fire on order.

Do people trust the Netflix team?

They say they are committed to making Netflix the best.

Netflix

Dear David,

It is clear that for many of our members two websites would make things more difficult, so we are going to keep Netflix as one place to go for streaming and DVDs.

This means no change: one website, one account, one password…in other words, no Qwikster.

While the July price change was necessary, we are now done with price changes.

We're constantly improving our streaming selection. We've recently added hundreds of movies from Paramount, Sony, Universal, Fox, Warner Bros., Lionsgate, MGM and Miramax. Plus, in the last couple of weeks alone, we've added over 3,500 TV episodes from ABC, NBC, FOX, CBS, USA, E!, Nickelodeon, Disney Channel, ABC Family, Discovery Channel, TLC, SyFy, A&E, History, and PBS.

We value you as a member, and we are committed to making Netflix the best place to get your movies & TV shows.

Respectfully,

The Netflix Team