[FoRK] advances in spintronics
J. Andrew Rogers
andrew at jarbox.org
Fri Nov 23 17:21:49 PST 2012
On Fri, Nov 23, 2012 at 8:31 AM, Eugen Leitl <eugen at leitl.org> wrote:
> Now, 32 kBytes/core is tight, but you can work well
> with 128 kBytes/core, if you know how. But most code
> (and OS) will fall flat on its face when it has
> to deal with 16 kCore and 1 MByte/core embedded
> I'm not sure many software developers are yet aware that
> the future is different from the past. If they don't
> adapt, their code stops getting faster, in fact, will
> get slower on that kind of hardware.
I agree that mesh-like topologies are the future and that virtually no
programmers know how to design algorithms that use them effectively.
It is not a skill problem per se; the computer science is not there yet.
Current "parallel" approaches seem to match simple functional paradigms on
a trivial scatter-gather model, which really does not work on mesh
topologies or any non-trivial application for that matter.
That said, in the last year or two I've seen a few new silicon designs for
new high-efficiency parallel processing engines and I am struck by how
similar they are in approach. Apparently, barrel processing is making a
comeback in a big way. Sophisticated barrel processor designs were mostly
locked up in patents owned by Tera (now called Cray) but I think those have
largely expired over the last couple years.
Instead of lots of tiny independent cores, you have single cores with a
massive number of register files for independent hardware threads and huge
amounts of memory bandwidth per core connected by more traditional switch
fabric topologies. This does not solve the basic problem, but it pushes the
scaling problems out by a couple orders of magnitude by effectively hiding
latency. Barrel processors scale by allowing fine-grained cooperative
scheduling of massive numbers of threads with relatively little overhead.
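To make the latency-hiding point concrete, here is a toy simulation (my own sketch, not any vendor's design) of a barrel pipeline: one issue slot per cycle, rotated round-robin over many hardware thread contexts, where a thread that has issued a memory access simply sits out its turns until the access completes. All latency and workload numbers are illustrative assumptions.

```python
def run(num_threads, mem_latency=100, ops_per_thread=200):
    """Simulate a barrel pipeline; return the fraction of cycles
    that issue useful work (1.0 = fully hidden memory latency)."""
    remaining = [ops_per_thread] * num_threads
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    cycle = issued = rr = 0
    while any(remaining):
        # Round-robin: issue from the next thread that is ready this cycle.
        for i in range(num_threads):
            t = (rr + i) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued += 1
                # Model every other instruction as a memory access that
                # stalls only this thread, not the pipeline.
                if remaining[t] % 2 == 1:
                    ready_at[t] = cycle + mem_latency
                rr = t + 1
                break
        cycle += 1  # exactly one issue slot per cycle, filled or bubbled
    return issued / cycle
```

With a single thread the pipeline idles through almost every memory stall; with enough thread contexts to cover the latency, utilization approaches 1.0 and the per-thread stalls effectively vanish.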
More importantly, designing software for barrel processing models is much
simpler than mesh topologies. Barrel processing models are amenable to
functional, event-driven algorithm architectures, which is something a
small subset of programmers already know how to do well. I've designed
massively parallel algorithms for barrel processing architectures in the
past and they are pleasant to work with though somewhat counter-intuitive
if you've been writing software for conventional architectures.
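As a sketch of what "functional, event-driven" means here (my own illustration, not a real barrel-processor API): the program is a set of pure handlers that consume an event and emit new events, never touching shared mutable state, so any hardware thread can pick up any pending event. The handler names and chunking are arbitrary assumptions.

```python
from collections import deque

def split(numbers):
    # Scatter: emit one "work" event per half (chunk size is arbitrary).
    mid = len(numbers) // 2
    return [("work", numbers[:mid]), ("work", numbers[mid:])]

def work(chunk):
    # Pure computation on a private chunk; emits a result event.
    return [("done", sum(chunk))]

HANDLERS = {"split": split, "work": work}

def run(events):
    """Single-threaded stand-in for the hardware scheduler: dispatch
    events to pure handlers until the queue drains."""
    queue = deque(events)
    results = []
    while queue:
        kind, payload = queue.popleft()
        if kind == "done":
            results.append(payload)
        else:
            queue.extend(HANDLERS[kind](payload))
    return sum(results)
```

Because each handler is short and side-effect-free, the scheduling loop could just as well be thousands of hardware threads draining a shared queue; nothing in the algorithm cares which thread runs which event.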
Sufficiently large barrel processing fabrics converge on the same problems
that conventional parallel processing environments do, but they push the
problem much further out. I expect this will buy some time for computer
science to figure out mesh-like architectures.