Tony Finch wrote:
> But the bits that make context switches expensive aren't in the CPU
> core! The main cost comes from reprogramming the MMU and invalidating
> the level-1 cache.
Interesting. But I have an answer to this one: throw out 98% of
the MMU and all of the first-level cache. If your memory is on-die,
you don't need the (rather bulky) cache, as you're fetching a
few-kBit word with essentially first-level-cache latency. It still
makes sense to implement the bottom few words in SRAM; depending on
how much the OS VM takes, you could use this as a register file or
a code cache. I would boot the OS from the links, btw. That way,
you only need to put the boot code into ROM.
Retain only enough of the MMU to protect the OS, and only the OS.
You can do it with a bitmask on the address bus (which is rather
short, as you don't have that many words on-die if each of them is
a few kBit wide). Zero the bottom few address bits if you're in
user mode.
Adjust the CPU/memory grain size so that the yield is virtually
quantitative; this way you can have some 10..100 CPUs in a desktop,
several CPU/memory units on a single die. If a couple are shot,
this is caught by redundancy. At the high end, this means wafer-scale
integration. Since objects residing in different nodes talk
by hardware message passing only, their individual address
spaces are mutually protected. If you have to have several threads
in one address space, you can check whether they're trashing
each other with a periodic checksum sweep over the code. You could
combine memory refresh in software with the checksum computation
while you're at it.
This archive was generated by hypermail 2b29 : Fri Apr 27 2001 - 23:17:58 PDT