[FoRK] FoRK Digest, Vol 131, Issue 5

Stephen D. Williams sdw at lig.net
Sun Aug 10 12:09:45 PDT 2014

Sort of.  We have or will soon have many cores.  Massive number crunching has been moved to within millimeters of where it is 
needed, running at <1 V, drawing a few watts at peak but almost nothing in between, and communicated with at 4GBps per channel bundle (with at 
least 2-3 of these links per system) now and growing.  A current mobile chipset has 4 scalar cores with 4 SIMD cores (with some 
overlap in execution), 4 GPU cores (destined to grow a lot), a reasonably powerful pair of DSPs, and various hardware processing blocks 
that do the heavy lifting for bitmap scaling, transforms, compositing, and codecs.

All of that is going to grow incrementally, perhaps doubling every several years, along with ever more efficient power usage.  And 
it will be augmented in various ways with ever more powerful and lower power hardware.  We've been at a stabilization point.  Now 
we'll get into the interesting processing.  I can't comment on some interesting developments yet...

And all of that is in a typical mobile phone/tablet, in a chipset that is less than a tenth of the overall cost.  Servers will 
inevitably be clusters of those, probably often castoffs from last year's mobile device production.

Devices are already approaching or slightly surpassing 16 cores, but only some of those cores share the same memory bus, or share it in 
symmetric ways.
While these new chipsets could easily be connected in a 3D torus in a high bandwidth/low latency supercomputer 
configuration, few applications need that.  The many-core depth in new systems will mostly be used for 3D, computational imaging, 
advanced sensing, ML, and other AI algorithms.  Also, as is well known, more cores running at slower clock rates can handle the same 
workload using far less power.  This is part of what justified quadcore mobile processors initially.  Although coordination / 
fanout causes some overhead, the power saving is a key effect.  There is also the fast/slow core strategy, although I don't think it is doing as well.
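
To put rough numbers on the point about more cores at slower clocks, here is a minimal sketch.  It assumes the textbook CMOS 
dynamic-power model P = C * V^2 * f, and it assumes supply voltage can be lowered roughly in proportion to clock frequency, which 
is a simplification: leakage is ignored and real chips have a voltage floor.  The function name and the parallel_efficiency and 
min_voltage_ratio parameters are made up for illustration; none of the values come from measurements.

```python
def relative_power(n_cores, parallel_efficiency=1.0, min_voltage_ratio=0.0):
    """Dynamic power of n_cores relative to one core at full clock,
    for the same total workload.

    Assumptions (illustrative only):
      - dynamic power per core is C * V^2 * f
      - voltage scales linearly with frequency, down to min_voltage_ratio
      - parallel_efficiency (0..1] models coordination / fanout overhead
    """
    # Each core must run at this fraction of the original clock
    # to finish the same work in the same time.
    freq_ratio = 1.0 / (n_cores * parallel_efficiency)
    # Voltage follows frequency, but cannot drop below the floor.
    volt_ratio = max(freq_ratio, min_voltage_ratio)
    # n cores, each drawing (V ratio)^2 * (f ratio) of the baseline.
    return n_cores * volt_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        ideal = relative_power(cores)
        lossy = relative_power(cores, parallel_efficiency=0.8)
        floored = relative_power(cores, min_voltage_ratio=0.6)
        print(f"{cores} cores: ideal={ideal:.4f} "
              f"80%-efficient={lossy:.4f} voltage-floor-0.6={floored:.4f}")
```

Under these assumptions the ideal case falls off as 1/N^2, and even with overhead or a voltage floor the multi-core configuration 
stays well below the single fast core, which is the effect described above.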

These are some key differences from what people expected, I think:

- Core specialization plus powerfully symmetric processors would drastically outperform symmetric general purpose processors
- Mobile rather than powerful server / desktop hardware would be the most important computational platform, improving so rapidly it 
is also viable for server & desktop use.
- Power usage / heat dissipation and overall efficiency would turn out to be the most important aspect of consumer-owned devices 
(mobile) and servers (server density / power usage).
- Mobile and cloud servers, where Unix/Linux quickly bubbled to the top, would become overwhelmingly important, getting on track 
to completely negate and obsolete desktops except as cheap web portals and file servers.  For traditional desktop app usage, 
there is a "good enough" hardware capability level that mobile devices have probably already surpassed.  The rest is a simple matter 
of programming and connectivity.  The connectivity is already done: HDMI, USB3, PCIe, 4 lane MIPI, etc.  Thunderbolt would be nice, 
perhaps via PCIe or many bidirectional MIPI lanes.
- The rise of probabilistic algorithms, ML, and computational imaging, in addition to 3D everywhere.
- That JavaScript optimization engines, and other web-related technology such as PNaCl and Emscripten, along with WebGL and soon 
WebCL, would catapult the range of web development well into and often past what desktop apps can offer, yet run on any device, including 
most mobile and embedded devices.  And that these devices would range down to a few dollars.
- The rise of resilient cloud processing systems, which negates the value of complex resilient processors, memory, RAID, or other 
storage.  Cheap cells that are easily sloughed off and grown anew rule.  Organic computing in a different sense.

Nice really.


On 8/10/14, 10:06 AM, Joseph S. Barrera III wrote:
> On Sun, Aug 10, 2014 at 6:32 AM, Eugen Leitl <eugen at leitl.org> wrote:
>> Wrong about what?
> Wrong about a single OS and/or memory system needing to scale to more than
> 16 or so processors. OK, maybe 96.
> Everything else is done with JBOC (just a bunch of cpus) networked together.
