Who will ship the first Thunderbolt Server? For now use a MacBook Pro as a server to test performance

10 Gb Ethernet is expensive due to low volumes. Fibre Channel is lower cost, but still not high volume and not cheap enough for mass deployments. Now that Apple and Intel have announced Thunderbolt, 10 Gb/s I/O connections will be low cost.

Why not use Thunderbolt for SAN and network connectivity? Look at the difference between these two designs.

Figure 1 illustrates a typical topology for building out a server cluster today; the form factors may change, but the basic configuration follows a similar pattern. Given the widespread availability of open-source software and off-the-shelf hardware, companies have successfully built large topologies for their internal cloud infrastructure using this architecture.

Figure 1: Typical Data Center I/O interconnect

Figure 2 illustrates a server cluster built using a native PCIe fabric. As is evident, the number of adapters and controllers is significantly reduced, which results in a tremendous reduction in the power and cost of the overall platform while delivering better performance in the form of lower latency and higher throughput.

Figure 2: PCI Express-based Server Cluster
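To put rough numbers on the cost and power argument, here is a back-of-the-envelope sketch. Every per-node price and wattage below is a placeholder I made up, not a figure from either design; plug in real adapter, switch, and cable numbers for your own hardware.

```python
# Back-of-the-envelope comparison of per-node I/O cost and power for the two
# topologies in Figures 1 and 2. All numbers are hypothetical placeholders.

def node_io(parts):
    """Sum (cost_usd, power_w) over a list of (name, cost_usd, power_w) parts."""
    cost = sum(c for _, c, _ in parts)
    power = sum(w for _, _, w in parts)
    return cost, power

# Figure 1 style: discrete 10GbE NIC plus Fibre Channel HBA in every server
traditional = [
    ("10GbE NIC", 500.0, 15.0),
    ("FC HBA",    700.0, 12.0),
]

# Figure 2 style: I/O rides the native PCIe lanes, so the only extra part is
# assumed to be a Thunderbolt/PCIe port or re-timer (again, a guess)
pcie_fabric = [
    ("Thunderbolt/PCIe port", 50.0, 2.0),
]

for label, parts in (("Traditional", traditional), ("PCIe fabric", pcie_fabric)):
    cost, power = node_io(parts)
    print(f"{label:12s}: ${cost:7.2f} and {power:4.1f} W of I/O overhead per node")
```

Multiply the per-node difference by thousands of nodes plus the switches you no longer need, and the power and cost claim is easy to believe.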

We'll see which server vendor is first with Thunderbolt support. For now, some innovative users could use a bunch of MacBook Pros.

Intel is a Mobile Chip company

I was reading a Forbes blog entry on Intel and the Atom chip.

Intel Should Be $26 But Not Because Of Atom Chips

Nov. 1, 2010 - 5:17 pm

posted by TREFIS TEAM


Intel's Atom is a market share champ but doesn't do much for the stock price.

Since their launch in 2008, Intel’s Atom microprocessors have dominated the global netbook market. In addition to netbooks, the Atom microprocessor is used in a variety of other places including smartphones, tablets, car infotainment systems, smart TVs, low power consuming servers, and energy management systems.

Despite this, Atom's rising market share will have minimal impact on Intel's stock since these ultra low voltage microprocessors account for only around 2% of Intel's stock price, based on our estimates. Intel's Atom competes with AMD's Athlon Neo, Qualcomm's Snapdragon, and Nvidia's Tegra microprocessors. We currently have a Trefis price estimate of $26.50 for Intel's stock, about 32% above the current market price of $20.

So where is the value of Intel's stock? (Note: earlier this year I sold all my Intel stock when it was at 24 and luckily bought back when the stock was at 4.) Check out this graphic from Trefis.

[Trefis graphic: breakdown of Intel's stock value by business segment]

If you combine Notebook processors with Mobile chipsets you get 53.5% of Intel’s value.

Never thought of Intel as a mobile chipset company.  Energy efficiency is much of what Intel discusses even in its server chips.  Doing more with less energy is the future.


Intel acquires McAfee, defining the relationship between Security and energy-efficient performance

Intel announced the purchase of McAfee.

SANTA CLARA, Calif., Aug. 19, 2010 – Intel Corporation has entered into a definitive agreement to acquire McAfee, Inc., through the purchase of all of the company’s common stock at $48 per share in cash, for approximately $7.68 billion. Both boards of directors have unanimously approved the deal, which is expected to close after McAfee shareholder approval, regulatory clearances and other customary conditions specified in the agreement.


Most will focus on this as the reason for Intel's acquisition.

The acquisition reflects that security is now a fundamental component of online computing. Today’s security approach does not fully address the billions of new Internet-ready devices connecting, including mobile and wireless devices, TVs, cars, medical devices and ATM machines as well as the accompanying surge in cyber threats. Providing protection to a diverse online world requires a fundamentally new approach involving software, hardware and services.

What caught my eye though is this statement.

Inside Intel, the company has elevated the priority of security to be on par with its strategic focus areas in energy-efficient performance and Internet connectivity.

With a quote from Intel CEO Paul Otellini:

“With the rapid expansion of growth across a vast array of Internet-connected devices, more and more of the elements of our lives have moved online,” said Paul Otellini, Intel president and CEO. “In the past, energy-efficient performance and connectivity have defined computing requirements. Looking forward, security will join those as a third pillar of what people demand from all computing experiences.”

What Intel has identified is the relationship between Security and Energy-Efficient Performance. How you approach security can have a big impact on power consumption in a green data center. PUE is the metric usually discussed to explain the power and cooling overhead of IT.

What is the power consumed by security infrastructure? 10%, 20%, 50%?

How many systems cannot be consolidated because of security issues?

Security issues contribute to the fiefdoms in data centers.

What is the energy consumption of your security decisions in the data center?
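As a rough illustration of how those questions interact with PUE, here is a minimal sketch. The IT load, the PUE, and the 20 percent security share are numbers I made up; the point is only that every watt of security gear gets multiplied by the facility overhead.

```python
# Rough sketch of "what is the power consumed by security infrastructure?"
# All inputs are hypothetical; substitute your own metered data.

it_load_kw = 1000.0          # total IT load (kW), hypothetical
facility_pue = 1.8           # facility PUE, hypothetical
security_fraction = 0.20     # guess: share of IT load that is firewalls, IDS/IPS, AV scanning

security_it_kw = it_load_kw * security_fraction
# every IT watt drags (PUE - 1) watts of cooling and power-distribution overhead with it
security_total_kw = security_it_kw * facility_pue

print(f"Security IT load:        {security_it_kw:7.1f} kW")
print(f"With facility overhead:  {security_total_kw:7.1f} kW")
print(f"Share of total facility: {security_total_kw / (it_load_kw * facility_pue):6.1%}")
```

Metering the security appliances and scan-heavy hosts separately is the only way to replace the 20 percent guess with a real number.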

I posted about Security's relationship to being Green back in Apr 2008.

Security is The Opposing Force of Green, demonstration - techniques to remove hard drive data

I was having a brainstorming session with another smart guy; I don't want to name him because the idea is too controversial. We were discussing Green ideas and we stumbled on the issue of Security being un-Green.

Why? Security at its simplest level creates friction in processes to make things more difficult, and this takes more energy, effort, and other resources. The enemies of your Green IT efforts will be your Security group, as they will not want to compromise their security policies.

Now I am not arguing for no security. It is a requirement of any system, but how much security creates an environmental cost that is not sustainable?


ex-Intel engineers at Microsoft share processor secrets, optimize performance per watt

Microsoft's Dileep Bhandarkar and Kushagra Vaid published a paper on Rightsizing Servers for cost and power savings, which are important in a green data center strategy. To put things in context, both Dileep and Kushagra are ex-Intel processor engineers. Let's start with the summary from their paper:

In conclusion, the first point to emphasize is that there is more to performance than just speed. When your definition of performance includes cost effectiveness, you also need to consider power. The next point is that in many cases processor speed has outpaced our ability to consume it. It’s difficult to exploit CPU performance across the board. This platform imbalance presents an opportunity to rightsize your configurations. The results will offer a reduction in both power and costs, with power becoming an increasingly important factor in the focus on total cost of ownership.

It is also important to remember that industry benchmarks may not reflect your environment. We strongly recommend that IT departments do their own workload characterization, understand the behavior of the applications in their own world, and then optimize for that.

Dileep and Kushagra are going out on a limb sharing details most wouldn't. Intel's and the server manufacturers' goal is to maximize revenue per unit (chips or servers). If you buy high-performance chips in the belief that you are buying high-performance-per-watt systems, then they'll make more money. But the truth is that many times you don't need the high-performance processors. There are server manufacturers selling big data center companies high-performance-per-watt systems built with low-cost processors.

Dileep has a blog post that goes along with the paper.

Before I came to Microsoft to manage server definition and purchases I worked on the other side of the fence. For 17 years I focused on processor architecture and performance at Digital Equipment Corporation, and then worked for 12 years at Intel, focusing on performance, architecture, and strategic planning. It’s interesting how now that I’m a hardware customer, the word “performance” encompasses cost effectiveness almost as much as it does throughput and response time. As my colleague Kushagra Vaid and I point out in our paper, when you look up performance in the dictionary it is defined as “how well something performs the functions for which it’s intended”.

Why should you read this paper? Because, as Dileep points out, the vast majority of people are purchasing based on processor benchmarks run on unrealistic configurations.

Figure: Three-year total cost of ownership of a basic 1U server

It also surprises me that so many IT groups base their purchasing decisions on published benchmark data about processors, even though that data is often generated using system configurations that are completely unrealistic when compared to real-world environments. Most folks sit up and take note when I display the facts about these topics, because the subject is important.

Rightsizing can clearly reduce the purchase price and the power consumption of a server. But the benefits go beyond the savings in capital expenditure. The lower power consumption has a big impact on the Total Cost of Ownership as shown in the Figure.
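Here is a minimal sketch of the three-year TCO arithmetic behind that figure. The purchase price, wall power, electricity rate, and PUE below are hypothetical placeholders, not values published in the paper.

```python
# Minimal three-year TCO sketch for a 1U server. All inputs are hypothetical;
# use your own purchase price, measured wall power, electricity rate, and PUE.

purchase_price = 2500.0      # USD, hypothetical
avg_power_w    = 250.0       # average wall power in watts, hypothetical
pue            = 1.6         # facility overhead multiplier, hypothetical
usd_per_kwh    = 0.10        # electricity rate, hypothetical
years          = 3

energy_kwh  = avg_power_w / 1000.0 * 24 * 365 * years * pue
energy_cost = energy_kwh * usd_per_kwh
tco         = purchase_price + energy_cost

print(f"Energy over {years} years: {energy_kwh:,.0f} kWh -> ${energy_cost:,.0f}")
print(f"3-year TCO (capex + power): ${tco:,.0f} "
      f"({energy_cost / tco:.0%} of it is power)")
```

Even with these made-up inputs, power is a large slice of the three-year number, which is why rightsizing pays off twice: once on the purchase price and again every month on the utility bill.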

So, let’s start diving into the secrets in Dileep and Kushagra’s paper.  Here is the background.

Introduction
How do you make sure that the servers you purchase and deploy are most efficient in terms of cost and energy? In the Microsoft Global Foundation Services organization (GFS)—which builds and manages the company’s datacenters that house tens of thousands of servers—we do this by first performing detailed analysis of our internal workloads. Then by implementing a formal analysis process to rightsize the servers we deploy an immediate and long term cost savings can be realized. GFS finds that testing on actual internal workloads leads to much more useful comparison data versus published benchmark data. In rightsizing our servers we balance systems to achieve substantial savings. Our analysis and experience shows that it usually makes more sense to use fewer and less expensive processors because the bottleneck in performance is almost invariably the disk I/O portion of the platform, not the CPU.

What benchmarks?  SPEC CPU2006.  Understand the conditions of the test.

One of the most commonly used benchmarks is SPEC CPU2006. It provides valuable insight into performance characteristics for different microprocessor central processing units (CPUs) running a standardized set of single-threaded integer and floating-point benchmarks. A multi-threaded version of the benchmark is CPU2006_rate, which provides insight into throughput characteristics using multiple running instances of the CPU2006 benchmark.

But important caveats need to be considered when interpreting the data provided by the CPU2006 benchmark suite. Published benchmark results are almost always obtained using very highly tuned compilers that are rarely if ever used in code development for production systems. They often include settings for code optimization switches uncommon in most production systems. Also, while the individual benchmarks that make up the CPU2006 suite represent a very useful and diverse set of applications, these are not necessarily representative of the applications running in customer production environments. Additionally, it is very important to consider the specifics of the system setup used for obtaining the benchmarking data (e.g., CPU frequency and cache size, memory capacity, etc.) while interpreting the benchmark results since the setup has an impact on results and needs to be understood before making comparisons for product selection.

and TPC.

Additionally, the system configuration is often highly tuned to ensure there are no performance bottlenecks. This typically means using an extremely high performing storage subsystem to keep up with the CPU subsystem. In fact, it is not uncommon to observe system configurations with 1,000 or more disk drives in the storage subsystem for breakthrough TPC-C or TPC-E results. To illustrate this point, a recent real-world example involves a TPC-C result for a dual-processor server platform that has an entry level price a little over $3,000 (Source: http://www.tpc.org). The result from the published benchmark is impressive: more than 600,000 transactions per minute. But the total system cost is over $675,000. That's not a very realistic configuration for most companies. Most of the expense comes from employing 144 GB of memory and over a thousand disk drives.
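Two quick ratios fall out of that quote; the inputs are from the quoted text and the arithmetic is mine.

```python
# Quick arithmetic on the TPC-C example above: what matters for rightsizing is
# cost per unit of delivered throughput, not the headline transaction rate.

total_system_cost  = 675_000   # USD, full benchmark configuration (from the quote)
entry_server_price = 3_000     # USD, the dual-processor server by itself (from the quote)
tpmc               = 600_000   # transactions per minute reported (from the quote)

print(f"Cost per tpmC:               ${total_system_cost / tpmc:.2f}")
print(f"Server share of system cost: {entry_server_price / total_system_cost:.1%}")
```

At roughly $1.13 per tpmC, the server itself is well under one percent of the configuration's cost; the 144 GB of memory and the thousand-plus disk drives are where the money goes.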

Both of these tests are in general set up to show the performance of CPUs, but as Dileep and Kushagra say, few systems are used in these configurations. So what do you do? Rightsize the system, which usually means don't buy the highest-performing CPU, as the CPU is not the bottleneck. Keep in mind these are ex-Intel processor engineers.

CPU is typically not your bottleneck: Balance your systems accordingly
So how should you look at performance in the real world? First you need to consider what the typical user configuration is in your organization. Normally this will be dictated either by the capability or by cost constraints. Typically your memory sizes are smaller than what you see in published benchmarks, and you have a limited amount of disk I/O. This is why CPU utilization throughout the industry is very low: server systems are not well balanced. What can you do about it? One option is to use more memory so there are fewer disk accesses. This adds a bit of cost, but can help you improve performance. The other option—the one GFS likes to use—is to deploy balanced servers so that major platform resources (CPU, memory, disk, and network) are sized correctly.
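One way to make the balancing exercise concrete is to compute, for a given per-request resource profile, how many requests per second each resource on a node could sustain and which one saturates first. The workload profile and capacities below are invented for illustration; this is not a method from the paper.

```python
# Sketch of the "balanced server" idea: given what one request demands and what
# one node supplies, see which resource runs out first. Numbers are invented.

per_request = {            # demand per request (hypothetical workload profile)
    "cpu_ms":    2.0,      # CPU time in milliseconds
    "disk_iops": 0.5,      # disk operations
    "net_kb":   20.0,      # network traffic in KB
}

node_capacity = {          # what one server supplies per second (hypothetical)
    "cpu_ms":    16 * 1000.0,   # 16 cores * 1000 ms of CPU time each
    "disk_iops": 8 * 150.0,     # 8 spindles * 150 IOPS each
    "net_kb":    1_000_000.0,   # roughly 1 GB/s of network
}

limits = {k: node_capacity[k] / per_request[k] for k in per_request}
bottleneck = min(limits, key=limits.get)

for k, v in sorted(limits.items(), key=lambda kv: kv[1]):
    print(f"{k:9s} caps the node at {v:10,.0f} requests/s")
print(f"Bottleneck: {bottleneck} -- size the other resources to match it.")
```

With these made-up numbers the disks cap the node at 2,400 requests per second while the CPUs could handle 8,000, which is exactly the kind of imbalance the authors describe.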

So, what happens if you don’t rightsize?

If memory or disk bandwidth is under-provisioned for a given application, the CPU will remain idle for a significant amount of time, wasting system power. The problem gets worse with multicore CPUs on the technology roadmap, offering further increases in CPU pipeline processing capabilities. A common technique to mitigate this mismatch is to increase the amount of system memory to reduce the frequency of disk accesses.

The old rule was to buy the highest-performing processors I could afford. Why not? Because it wastes money and increases your power costs.

Another aspect to consider is shown in Figure 2 below. If you look at performance as measured by frequency for any given processor, typically there is a non-linear effect. At the higher frequency range, the price goes up faster than the frequency. To make matters worse, performance does not typically scale linearly with frequency. If you’re aiming for the highest possible performance, you’re going to end up paying a premium that’s out of proportion with the performance you’re going to get. Do you really need that performance, and is the rest of your system really going to be able to use it? It’s very important from a cost perspective to find the sweet spot you’re after.

[Figure 2: Processor price versus frequency]
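To see the effect numerically, here is an illustrative calculation. The SKU bins, prices, and the sub-linear scaling exponent are all assumptions I made up, not figures from the paper's Figure 2.

```python
# Illustration of the non-linear pricing effect: performance per dollar usually
# peaks below the top frequency bin. All numbers below are invented.

skus = [               # (name, GHz, price in USD) -- hypothetical bins
    ("mid-bin",  2.26,  370),
    ("high-bin", 2.66,  960),
    ("top-bin",  2.93, 1390),
]

baseline_ghz   = skus[0][1]
baseline_price = skus[0][2]
scaling = 0.7          # assume performance grows ~ frequency**0.7 (sub-linear), a guess

for name, ghz, price in skus:
    rel_perf  = (ghz / baseline_ghz) ** scaling
    rel_price = price / baseline_price
    print(f"{name:9s}: {rel_perf:4.2f}x perf at {rel_price:4.2f}x price "
          f"-> {rel_perf / rel_price:.2f} relative perf per dollar")
```

The shape is the point: each step up in frequency costs proportionally more and returns proportionally less, so the sweet spot is rarely the top bin.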

What is the relationship between system performance, CPU utilization, and disks?

Figure 5 shows CPU utilization increasing with disk count as the result of the system being disk limited. As you increase the number of disk drives, the number of transactions per second goes up because you're getting more I/O and consequently more throughput. With only eight drives CPU utilization is just 5 percent. At 24 drives CPU utilization goes up to 20 percent. If you double the drives again, utilization goes up to about 25 percent. What that says is that you're disk I/O limited, so you don't need to buy the most expensive, fastest processor. This kind of data allows us to rightsize the configuration, reducing both power and cost.

[Figure 5: CPU utilization versus number of disk drives]
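The utilization numbers in that passage already tell the story; a tiny script makes the idle fraction explicit. The 48-drive point is my reading of "double the drives" starting from 24.

```python
# CPU utilization at each disk count, taken from the quoted passage
# (48 drives is my interpretation of doubling from 24).
points = [
    (8,  0.05),
    (24, 0.20),
    (48, 0.25),
]

for disks, cpu_util in points:
    print(f"{disks:3d} disks -> CPU {cpu_util:4.0%} busy, "
          f"{1 - cpu_util:4.0%} of the processor you paid for sits idle")
```

Even at 48 drives, three quarters of the CPU is idle, which is the argument for buying the cheaper processor and spending the savings on the disks.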

The paper goes on to discuss web servers, where a faster processor does help if the content is cached.


To share the blame, two RAID controllers are looked at: one with 256 MB and another with 512 MB of cache.

But when we looked at the results from our ETW workload analysis, we found that most of the time our queue depth never goes beyond 8 I/Os. So in our operational area, there is no difference in performance between the two RAID controllers. If we didn’t have the workload analysis and just looked at those curves, we might have been impressed by the 10-15 percent performance improvement at the high end of the scale, and paid a premium for performance we would never have used.

[Figure: performance curves for the two RAID controllers across queue depths]
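Here is a sketch of the kind of workload analysis they describe: bucket the observed queue depths from a storage trace and see where the I/Os actually land. The sample depths below are invented stand-ins; a real analysis would parse them out of ETW storage events.

```python
# Bucket observed I/O queue depths and check how much of the workload falls in
# the region where the bigger controller cache would matter. Sample data is fake.

from collections import Counter

# pretend these queue-depth samples came out of a storage trace
observed_queue_depths = [1, 2, 1, 3, 4, 2, 6, 1, 2, 8, 3, 2, 5, 1, 2, 4, 7, 2, 3, 1]

histogram = Counter(observed_queue_depths)
total = len(observed_queue_depths)
within_8 = sum(n for depth, n in histogram.items() if depth <= 8) / total

for depth in sorted(histogram):
    print(f"queue depth {depth:2d}: {histogram[depth] / total:5.1%} of samples")
print(f"\n{within_8:.0%} of samples at depth <= 8 -> the extra controller cache "
      f"only pays off in a region this workload never reaches")
```

That is the whole lesson of the paper in miniature: measure your own workload first, then decide which premium features are worth paying and powering.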


Intel’s “A” letter Rival – ARM, not AMD

In data centers, the standard is to go bigger with more power. But in the mobile market, energy-efficient performance is the standard, and ARM is the winner. At some point, someone is going to build an IT infrastructure on Linux running on thousands of ARM processors.

People will laugh at the idea, but for the same reasons IBM chose its Blue Gene supercomputer architecture, a start-up could do the same.

The Blue Gene/L supercomputer is unique in the following aspects:

  • Trading the speed of processors for lower power consumption.
  • Dual processors per node with two working modes: co-processor (1 user process/node: computation and communication work is shared by two processors) and virtual node (2 user processes/node)
  • System-on-a-chip design
  • A large number of nodes (scalable in increments of 1024 up to at least 65,536)
  • Three-dimensional torus interconnect with auxiliary networks for global communications, I/O, and management
  • Lightweight OS per node for minimum system overhead (computational noise)[9]

News.com goes into more detail on Intel and ARM in the mobile space.

For Intel, small laptops bring challenge from ARM

by Brooke Crothers

Quick: Name an Intel rival whose name begins with an "A" and is abbreviated by three letters.

AMD? How about ARM. Even with attention focused on the immediate impact of Intel's earnings coming Tuesday afternoon, pesky questions linger about a likely future in which U.K.-based ARM and its satellite of chip and device makers pose a growing competitive threat. Maybe more so than Intel's traditional rival, Advanced Micro Devices.

Two recent statements from analysts argue that the camp of companies that make chips based on designs from ARM will dictate future competition in mobile computing. These companies include Qualcomm, Texas Instruments, Samsung, and, in the future, Apple.

New Tripoli, Penn.-based The Information Network said late last month that ARM processors, not Intel's Atom chip, will gain the largest chunk of the Netbook market in 2012--about a 55 percent market share. Netbooks are small, ultralight laptops typically priced under $400.

The market research firm argues that small ARM-based laptops, dubbed "smartbooks," will thrive under subsidized services from telephone carriers "modeled after Hewlett-Packard (cheap printer, expensive ink) and the mobile service providers (cheap cellphone, expensive monthly wireless charge)."

Note this comment on performance per watt.

And on Monday EE Times cited analyst Didier Scemama, with ABN AMRO Bank NV, who said there is a "shift towards computing based on ARM-Linux and away from Intel-Microsoft over the next technology cycle," which he said would begin in the second half of 2010, because ARM processors would match Intel chips in performance and beat them on power consumption and possibly cost.

The fastest-growing Internet companies have a sizeable Linux investment, and it would seem some are asking whether they can run on an ARM-Linux platform.
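The arithmetic behind "thousands of ARM processors" is simple: under a fixed rack power budget, performance per watt decides aggregate throughput. The chip numbers below are invented purely to show the shape of the calculation, not real Atom or ARM figures.

```python
# Toy comparison of aggregate throughput under a fixed rack power budget.
# Per-chip performance and wattage are hypothetical placeholders.

rack_budget_w = 10_000.0

chips = {                          # (perf units per chip, watts per chip) -- invented
    "big x86 server chip": (100.0, 95.0),
    "small ARM SoC":       (15.0,   5.0),
}

for name, (perf, watts) in chips.items():
    count = int(rack_budget_w // watts)
    print(f"{name:20s}: {count:5d} chips fit in {rack_budget_w:,.0f} W "
          f"-> {count * perf:10,.0f} aggregate perf units "
          f"({perf / watts:.2f} perf/W)")
```

Whether the real numbers favor ARM is exactly the question those Linux shops will have to answer with their own workloads, just as the rightsizing paper above argues.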
