Post by Johann 'Myrkraverk' Oskarsson
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is mostly about its 16-bit architecture.
I am interested in retro programming every now and then, but mostly do
my code for 32-bit extended DOS to run in DOSBox. Yet, I find myself
interested in resources on efficient segmented code, if any still exist.
Are there any such books, articles, or documentation still available
somewhere? A quick web search does not yield any promising results.
Johann | email: invalid -> com | www.myrkraverk.com/blog/
There are different approaches to optimizing code for segments, based on:
1) Memory model (tiny, small, medium, large, huge)
2) Execution mode (real, protected)
So the tiny, small, and (I think) medium memory models mostly don't count, because you are not really dealing with segments. The difference between large and huge, IIRC, is how you treat the stack segment: for large, the default segment and the stack are the same; for huge, they are different. Most big programs were "large"; "huge" was fairly rare in practice.
The execution mode changed how expensive it was to do a segment load, so for instance doing:
was fairly efficient in real mode but overly expensive in protected mode. If you were not going to use ES:BX to address something, it was MUCH more efficient to do:
In real mode, one trick I used was to treat the ES register as another base register, which worked great as long as the object I was addressing was aligned on a 16-byte boundary. For protected mode, this was a bad idea (oh well).
I worked on a version of the MS Pascal compiler (outside of Microsoft) that we used for cross-compiling. It was created in the days of the 8086, so it had absolutely no optimizations related to protected mode. I added a peephole optimization step to remove unneeded (re)loads of the ES register, partly by tracking register usage and partly by using the method mentioned above (pushing addresses). Suppose you had a bit of code like this (my Pascal is rusty, and this example is a bit contrived):
TYPE myrec = RECORD
    somestuff : ^INTEGER;
    morestuff : ^INTEGER;
END;
PROCEDURE anotherProc(int1: ^INTEGER; int2: ^INTEGER);
BEGIN
    int1^ := int1^ + int2^
END;
PROCEDURE myproc(VAR data_in: myrec);
For myproc, the compiler would originally generate something like:
The first pass would remove the unnecessary loads of ES:BX:
The second pass removes the redundant load:
The resulting code was about 5% smaller and about 15% faster.