[FoRK] Atom bomb, network included

Eugen Leitl eugen at leitl.org
Tue Jun 15 04:09:50 PDT 2010


Mystery startup uncloaks 512-core server

Atom bomb, network included

By Timothy Prickett Morgan

Posted in Servers, 14th June 2010 13:38 GMT

The mystery behind secretive server startup SeaMicro is dispelled today as
the venture-backed maker of what it has been calling "data center appliances"
unveils its first product: the SM10000, a server cluster comprised of 512 of
Intel's Atom processors with a built-in, virtualized network fabric for the
nodes to talk to each other and to the outside world.
The SM10000 passes the TPM Server Test of having an elegant design: mainly, I
want one, and I am not even sure why. I'll figure out what to do with it
later. Probably something stupid, like turning it into a giant MapReduce box
that uses log tables instead of floating point math units to do calculations,
just to see if that would work. In recent years, the server designs from
Fabric7, Liquid Computing, and 3Leaf Systems have all passed this test, as
did the original "SledgeHammer" Opteron chips from Advanced Micro Devices and
their clones, Intel's "Nehalem" processors from last year.

Ditto the Nvidia Tesla 20 GPU co-processors, the Power7 IH supercomputing
nodes used in the "Blue Waters" super, and some of Sun Microsystems' very
elegant Sun Fire designs from a few years back; so, too, for many Mini-ITX,
Nano-ITX, and Pico-ITX system boards for homemade, low-power servers.
Clearly, passing the TPM Server Test doesn't necessarily lead to riches, so
it is of dubious value.

SeaMicro has obtained $25m in venture funding from Khosla Ventures, Draper
Fisher Jurvetson, Crosslink Capital, and an unnamed private backer. The
company was also, you will recall, one of the vendors that received a slice
of a $47m grant awarded in January by the US Department of Energy to come up
with some greener technologies for the data center.

SeaMicro got the second-biggest slice of the DOE money, which was part of the
$787bn Obama administration stimulus package, landing a $9.3m grant to field
test a machine that puts hundreds of low-powered servers into a single box.
In its proposal, SeaMicro said it could cut power consumption by 75 percent
compared to x64 alternatives. The rumor last fall was that SeaMicro was
working on a server that would cram as many as 80 processors, perhaps Intel
Atoms, perhaps ARM RISC chips, into a single chassis with a direct mesh
fabric. The mesh is correct, but the processor count is way low.

The SM10000 does not have 10000 cores, as the name might seem to suggest, but
does put 512 individual servers based on the single-core Atom Z530 processor
into a 10U chassis, which is a neat trick. And one that the techies who used
to work at AMD, Cisco Systems, Force10 Networks, Juniper Networks, and Sun
were able to pull off.

SeaMicro was founded in July 2007 by Andrew Feldman, who formerly headed up
marketing at Force10, and Gary Lauterbach, an AMD chip designer who was also
responsible for putting together Sun's UltraSparc-III and UltraSparc-IV
processors. Feldman and Lauterbach looked at the modern, hyperscale workloads
that were starting to take over the data centers of the world and came to the
conclusion that the complex x64, RISC, and Itanium processors - well suited
to deal with predictable workloads solving complex problems within a single
company's application mix in a predictable and orchestrated fashion - were
wickedly unsuited for the relatively simple, but massively-scaled big data
jobs that companies want to run efficiently and cheaply.

"The reason why power is not an issue is that workloads have changed in the
data center," explains Feldman. "Now companies have smaller workloads, and
they are bursty in nature. And today's systems are particularly bad because
they have all these features that suck power - out-of-order speculation,
branch prediction, and so forth - that are not particularly useful for these
kinds of new workloads and that consume lots of power. The end result is that we are
taking the Space Shuttle to the grocery store."

So SeaMicro looked at all kinds of low-powered, relatively simple processors
that it might base its data center appliances on, including VIA Technologies'
Nano, low-voltage x64 parts from Intel and AMD, and even ARM processors
commonly used in handhelds and cell phones. While SeaMicro thought the future
"Bobcat" processors from AMD were interesting, they would not get to market
in time, and among the Nano, ARM, and Atom alternatives, Feldman says that
the single-core Atom offers the best bang for the buck and the added benefit
- some might say absolute requirement - of compatibility with the x64
architecture. By SeaMicro's reckoning, on Internet-style workloads - search,
Map/Reduce and Hadoop, social networking apps, and such - the Atom core
offers about 3.2 times the performance per watt of a Xeon or Opteron core.
And the box can run Windows or Linux applications unchanged.
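
The arithmetic behind a perf-per-watt ratio like SeaMicro's 3.2x figure is simple to sketch. The request rates and wattages below are made-up numbers chosen only to illustrate the calculation, not SeaMicro's benchmark data:

```python
# Illustration of the article's claim that an Atom core delivers about
# 3.2x the performance per watt of a Xeon or Opteron core on
# Internet-style workloads. All throughput and wattage figures here
# are assumptions picked to make the arithmetic visible.

def perf_per_watt(requests_per_sec, watts):
    """Work delivered per watt consumed."""
    return requests_per_sec / watts

xeon = perf_per_watt(1000, 20.0)     # assumed: 1000 req/s at 20 W
atom = perf_per_watt(250, 1.5625)    # assumed: 250 req/s at ~1.6 W

print(atom / xeon)   # 3.2
```

The point of the metric is that a core can be much slower in absolute terms and still win, as long as its power draw shrinks faster than its throughput does.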

Getting the right CPU for the job was only one third of the battle, however,
because in a modern server, processors only account for about a third of the
total power consumption. Chipsets, memory, networking (including on-server
network ports and the external switch), and peripheral I/O account for the other
two-thirds of the juice that gets sucked out of the wall. And so SeaMicro
created what is in essence a supercomputer interconnection fabric that also
virtualizes the memory and I/O for tiny Atom-based servers, many of which are
crammed onto a single motherboard, with many of these mobos plugged into the
fabric using plain old PCI-Express links.
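
The one-third figure explains why SeaMicro had to virtualize everything else, not just pick a frugal CPU. A quick sanity check, using the article's rough split (these fractions are the article's ballpark, not measurements):

```python
# If processors draw ~1/3 of total server power, even eliminating CPU
# power entirely caps the savings at ~33%. SeaMicro's claimed 75% cut
# therefore has to come largely from the chipset, memory, and
# networking share as well.

cpu_share = 1 / 3       # processors
rest_share = 2 / 3      # chipsets, memory, networking, peripheral I/O

max_cpu_only_saving = cpu_share   # best case: CPU power drops to zero

target_saving = 0.75
# Fraction of the non-CPU power that must also disappear to hit 75%:
rest_saving_needed = (target_saving - cpu_share) / rest_share  # = 5/8

print(max_cpu_only_saving, rest_saving_needed)
```

In other words, hitting the 75 percent target requires cutting the non-CPU two-thirds of the budget by more than half, which is exactly the part the fabric ASIC attacks.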

That backplane virtualizes the networking and I/O for each Atom server and
also includes an integrated switch, a load balancer, and a terminal server
for all the servers in the box. This really is a single box compute cluster,
and it also has room for integrated disks.

The secret sauce in the SeaMicro design is an ASIC chip that virtualizes disk
access and Ethernet networking for each of the Atom servers. The ASIC also
implements a 3D torus interconnect between all of the server nodes, which is
similar to the interconnect that IBM developed for its BlueGene massively
parallel Linux supercomputer and which delivers 1.28 Tb/sec of aggregate
bandwidth across the 64 server motherboards and 512 cores inside the SM10000.
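
A 3D torus is a mesh whose edges wrap around, so every node has exactly six neighbors. The 8x8x8 arrangement below is an assumption for illustration; SeaMicro has not published its torus dimensions:

```python
# Sketch of neighbor addressing in a 3D torus, the topology the
# article attributes to SeaMicro's ASIC (and to IBM's BlueGene).
# The 8 x 8 x 8 = 512 node layout is assumed for this example.

def torus_neighbors(x, y, z, dims=(8, 8, 8)):
    """Each node links to 6 neighbors; the modulo wraps the edges
    around, which is what makes it a torus rather than a mesh."""
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

# Even a corner node has 6 neighbors thanks to the wraparound:
print(torus_neighbors(0, 0, 0))

# Total links in the fabric: 512 nodes * 6 ports / 2 ends per link
print(512 * 6 // 2)   # 1536
```

The wraparound is what keeps worst-case hop counts low without needing long dedicated wires from one edge of the machine to the other.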

SeaMicro also came up with its own field programmable gate array (FPGA) to do
load balancing across the machines in a very efficient manner. The load
balancing electronics are hooked into the SM10000's system management tools
to allow for pools of servers to be grouped together and managed as a single
object.
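
The pooling idea can be sketched in software: a balancer that treats a group of small servers as one addressable object and hands each request to the next node in turn. This is a stand-in for what the article says the FPGA does in hardware; the class and method names here are invented for the example:

```python
# Software sketch of pool-based load balancing: many nodes, one
# management object. Round-robin is assumed for simplicity; SeaMicro
# has not described its actual balancing policy.

import itertools

class ServerPool:
    """A named pool of server nodes addressed as a single object."""

    def __init__(self, name, nodes):
        self.name = name
        self._cycle = itertools.cycle(nodes)

    def dispatch(self, request):
        """Hand the request to the next node in round-robin order."""
        node = next(self._cycle)
        return f"{self.name}: request {request!r} -> node {node}"

web = ServerPool("web-tier", nodes=[0, 1, 2, 3])
for r in ("GET /a", "GET /b", "GET /c", "GET /d", "GET /e"):
    print(web.dispatch(r))
```

The fifth request wraps back around to node 0, which is the whole trick: callers see one pool, not 512 machines.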

The basic unit of computing in the SM10000 server cluster is an Atom machine
with four components: the Atom Z530 processor, which runs at 1.6 GHz and
which has two threads for execution; the "Poulsbo" US15W chipset; the
SeaMicro ASIC, for virtualizing I/O and implementing the fabric; and a SODIMM
memory slot. This server is about 2.2 inches by 3 inches, with the memory
module on one side and the other components on the other. That's reducing a
server from the size of a pizza box to the size of a credit card. Here's how
they lay out on a single SeaMicro server board:

The SeaMicro SM10000 server board.

As you can see from the picture above, the SeaMicro SM10000 server board has
eight Atom servers (one chip and one chipset) on a single printed circuit
board. The smaller chip is actually the processor and the larger, darker chip
is the chipset. The four ASIC chips that virtualize the I/O and implement the
interconnect are along the bottom, and SeaMicro has designed the mobo so it
links back into the chassis using two absolutely standard PCI-Express 2.0 x16
slots, side by side. (Let this be a lesson to you proprietary blade server
makers with your non-standard backplanes and interconnect electronics.) This
board measures 5 inches by 11 inches.

The SM10000 chassis has 128 PCI-Express 2.0 x16 slots, arranged in eight
vertical columns, four on the left and four on the right of the chassis. You
plug in 32 boards (two columns of 16) on each side to get your 512 Atoms per
chassis. Like thus:

The SM10000, front and side view.

With each Atom server having its own 2 GB SODIMM, the chassis supports up to
1 TB of main memory across the 512 server nodes. The chassis has room for up
to 64 SATA or solid state disk drives in the front (you always pull cold air
over disks, so they need to be in the front). The disks and server boards are
plug and play, so you don't have to reboot to add capacity. The servers need
to talk to the outside world, of course, so the homegrown networking fabric
and switch created by SeaMicro for the SM10000 has uplinks, which you can see
here:

The back-end of the SM10000 server chassis.
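
The chassis arithmetic in the article is internally consistent, and is worth a quick check using only numbers from the text:

```python
# Sanity-checking the SM10000 chassis math. Every figure below comes
# straight from the article; nothing is assumed.

boards = 64               # server boards per chassis
servers_per_board = 8     # Atom servers per board
slots_per_board = 2       # two PCI-Express 2.0 x16 slots per board

servers = boards * servers_per_board
slots = boards * slots_per_board
memory_gb = servers * 2   # one 2 GB SODIMM per server

print(servers)      # 512 servers per chassis
print(slots)        # 128 PCI-Express slots
print(memory_gb)    # 1024 GB, i.e. 1 TB of main memory
```

So the 128 slots, 512 servers, and 1 TB of memory all fall out of the same 64-board layout.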

The chassis has different network modules, which offer 8 to 64 Gigabit
Ethernet uplinks or 2 to 16 10 Gigabit Ethernet uplinks per chassis. The
FPGAs implementing the load balancer and terminal software as well as the
switching software are in the chassis.

The whole box burns under 2 kilowatts of juice running real workloads, which
is a quarter of the power that a rack of two-socket x64 boxes draws.
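
Spread across the node count, that headline figure works out to only a few watts per server, wall-to-wall. The chassis and server numbers are the article's; the division is the only thing added here:

```python
# Per-server power at the article's 2 kW ceiling for a full chassis.

chassis_watts = 2000   # "under 2 kilowatts" running real workloads
servers = 512          # Atom server nodes per chassis

watts_per_server = chassis_watts / servers
print(watts_per_server)   # 3.90625 W per server, including fabric and I/O
```

For comparison, a single two-socket x64 server of the era drew a couple of hundred watts on its own.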

The SM10000 will be available on July 30.
By the way, there is nothing about the SeaMicro architecture that precludes
the company from supporting whatever processor architecture it wants. If
someone wanted a bunch of servers based on ARM processors and was willing to
pay for it, you can bet that SeaMicro could build it. Ditto for protocols and
ports coming off the interconnect fabric. The architecture can support Fibre
Channel or converged enhanced Ethernet, which allows for Fibre Channel to be
run over 10 Gigabit Ethernet.

For now, Feldman says that SeaMicro is looking ahead to a time when Intel
puts an entire Atom as well as its chipset, memory controller, and other
goodies on a single piece of silicon. At that time, SeaMicro should be able
to get a lot more servers and cores onto a single SM10000 system board. And
the company will also eventually be able to link multiple SM10000 chassis
together for integrated management, like stackable network switches do today.

The SM10000 took three years and many millions of dollars to develop and
could be very quick (a lot depends on the software), but is nonetheless a
complete unknown. Not the kind of thing that endears any new technology to
large, conservative customers. But the issues in power and cooling are so bad
at many hyperscale data centers that enthusiasm for the SM10000 product,
which has been rumored since last summer, was quite high ahead of the launch.

"We have big orders," says Feldman, with a laugh.
This might actually be a machine that Google buys instead of making itself.
We'll see.

More information about the FoRK mailing list