Post by Thomas Koenig
Post by Douglas Miller
I'm not certain what you mean by "main memory just as an I/O
I could probably have phrased that better.
What I mean that a CPU has a certain amount of high-speed RAM,
at the speed of today's level-3 or level-4 caches.
When it needs data that is not present, it requests that data from
a bigger RAM with slow access times (like today's main memory);
this is then transferred into the main memory, and the core
gets an interrupt when it is there and can then continue
doing what it needs to do.
Very much like today's systems handle I/O of disks or other
I/O devices. It might also involve an amount of "paging".
Once you have paging you need to take into account the cost
of handling page faults. IIRC Lynn Wheeler wrote that
after extensive optimization in VM/370 he got page fault
handling down to a few hundred instructions. He also wrote
that other systems may need thousands of instructions.
With current DRAM access times thousands of instructions
would add quite high overhead. A few hundred instructions
may be acceptable, but that is still costly. OTOH, putting
the equivalent of a few hundred instructions into hardware is
an acceptable increase in hardware complexity, so it makes
sense to put "page" handling in hardware. Since DRAM
transfer rate and latency are not that far from those of the
cache (here playing the role of "memory"), one is led to
simplified "page" handling. In particular, it makes sense to
decouple page protection and remapping (virtual memory) from
"page" replacement. Old analyses indicated that
for replacement a rather small "page" size gives better
results; proposed page sizes were in the range of 32 bytes
to 2 kilobytes. With main memory of about 32 kilobytes
(on the order of a modern L1 cache) the optimal page sizes
seem to be pretty close to the cache line size in a modern
system.
At a deeper level a bigger size may give slightly better
results, but using the same size at several levels
has advantages, and the gain from varying "page" size
between levels is probably too small to compensate.

Another issue is automatic versus manual control. In PC
class machines prefetch instructions give some amount
of manual control. OTOH there are several stories indicating
that at large scale automatic systems can do better.
Early OS/360 depended on overlays, and swapping overlays
led to significant efficiency loss. Later, bigger memories
reduced the need for overlays. But it seems that paging, and
not bigger memories, was the main factor in almost eliminating
overlays. IIUC in practice paging was not only easier
for programmers but also more efficient than overlays.

So, at the conceptual level, when doing performance analysis
it makes sense to think of main memory as an I/O device.
OTOH, when programming it makes sense to hide the nonuniform
structure of memory and delegate the needed support to hardware.