Discussion:
Sources on optimizing code for x86 segmented architecture?
Johann 'Myrkraverk' Oskarsson
2020-04-25 14:58:44 UTC
Dear alt.folklore.computers,

I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere.
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is about the 16-bit architecture of it, mostly.

I am interested in retro programming every now and then, but mostly do
my code for 32-bit extended DOS to run in DOSBox. Yet, I find myself
interested in resources on efficient segmented code, if any still exist.

Are there any such books, articles, or documentation still available
somewhere? A quick web search does not yield any promising results.
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
Dennis Boone
2020-04-25 15:42:50 UTC
Post by Johann 'Myrkraverk' Oskarsson
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere.
John is the moderator of comp.compilers. Try asking over there.

De
John Levine
2020-04-25 17:18:41 UTC
Post by Johann 'Myrkraverk' Oskarsson
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere.
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is about the 16bit architecture of it, mostly.
Now of course I can't remember what I was referring to when I wrote
that 20 years ago.

I am not aware of any compiler optimizations specifically for
segmented address code. On the 286 segment loads were very slow,
even if you were reloading the same segment number into the same
segment register. I think the compilers of the time could generate
code with a single segment load for stuff like this if p is a long
pointer:
p->a = p->b;

p[i] = p[j];

but I don't think it could do much more than that.
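The point about segment loads can be made concrete with a toy C model (not real 16-bit code). Here a far ("long") pointer is a segment:offset pair, real-mode memory is a 1 MB array, and every write to the simulated ES register is counted, since reloading even the same value was slow on the 286. The names `far_ptr`, `cpu`, `load_es`, and the field layout (a at offset 0, b at offset 2) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a far pointer: 16-bit segment plus 16-bit offset. */
typedef struct { uint16_t seg, off; } far_ptr;

/* Simulated machine: 1 MB of memory, the ES register, and a load counter. */
typedef struct {
    uint8_t mem[1u << 20];
    uint16_t es;
    unsigned es_loads;   /* counts every load, even of an unchanged value */
} cpu;

static void load_es(cpu *c, uint16_t seg) { c->es = seg; c->es_loads++; }

/* Real-mode translation: segment * 16 + offset, wrapped to 20 bits as on the 8086. */
static uint32_t linear(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

static uint16_t read16(const cpu *c, uint16_t off) {
    assert(off <= 0xFFFE);   /* keep the word inside the 64K segment */
    uint32_t a = linear(c->es, off);
    return (uint16_t)(c->mem[a] | c->mem[(a + 1) & 0xFFFFFu] << 8);
}

static void write16(cpu *c, uint16_t off, uint16_t v) {
    assert(off <= 0xFFFE);
    uint32_t a = linear(c->es, off);
    c->mem[a] = (uint8_t)v;
    c->mem[(a + 1) & 0xFFFFFu] = (uint8_t)(v >> 8);
}

/* p->a = p->b, reloading ES before each access. */
static void copy_naive(cpu *c, far_ptr p) {
    load_es(c, p.seg);
    uint16_t b = read16(c, (uint16_t)(p.off + 2));
    load_es(c, p.seg);
    write16(c, p.off, b);
}

/* Same statement: both accesses go through the same p, so one ES load serves both. */
static void copy_shared(cpu *c, far_ptr p) {
    load_es(c, p.seg);
    write16(c, p.off, read16(c, (uint16_t)(p.off + 2)));
}
```

The same reasoning covers p[i] = p[j]: both elements live in the segment named by p, so one ES load can serve the read and the write.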
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Peter Flass
2020-04-25 17:39:52 UTC
Post by John Levine
Post by Johann 'Myrkraverk' Oskarsson
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere.
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is about the 16bit architecture of it, mostly.
Now of course I can't remember what I was referring to when I wrote
that 20 years ago.
I am not aware of any compiler optimizations specifically for
segmented address code. On the 286 segment loads were very slow,
even if you were reloading the same segment number into the same
segment register. I think the compilers of the time could generate
code with a single segment load for stuff like this if p is a long
pointer:
p->a = p->b;
p[i] = p[j];
but I don't think it could do much more than that.
Obviously organizing your code to avoid cross-segment references would
help.
--
Pete
John Levine
2020-04-25 22:00:05 UTC
Post by Peter Flass
Post by Johann 'Myrkraverk' Oskarsson
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere. ...
Obviously organizing your code to avoid cross-segment references would
help.
Oh, sure. Each source file would turn into a module within which all
of the routines could call each other with fast "near" calls, while
inter-module calls used "far" calls. I think with some effort it was
possible to tell the linker to combine the code from several object
modules into single code and data segments.
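A rough sketch of the bookkeeping behind this kind of code organization, assuming a hypothetical call-frequency profile: a call whose caller and callee sit in the same code segment can be near, and anything else must be far. The `call_edge` type, `far_calls`, and the example numbers are made up for illustration; a real linker map would be the actual source of truth.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a hypothetical profile: caller routine, callee routine, call count. */
typedef struct { int caller, callee; unsigned long count; } call_edge;

/* segment_of[r] is the code segment (module) that routine r was linked into.
   Returns how many dynamic calls cross segments and therefore need a far
   CALL/RET pair instead of a near one. */
static unsigned long far_calls(const call_edge *edges, size_t n,
                               const int *segment_of, size_t routines) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert((size_t)edges[i].caller < routines && (size_t)edges[i].callee < routines);
        if (segment_of[edges[i].caller] != segment_of[edges[i].callee])
            total += edges[i].count;
    }
    return total;
}
```

Comparing two groupings of the same routines shows why it paid to put frequent caller/callee pairs in the same segment.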
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Andreas Kohlbach
2020-04-25 17:20:18 UTC
Post by Johann 'Myrkraverk' Oskarsson
Dear alt.folklore.computers,
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere.
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is about the 16bit architecture of it, mostly.
I am interested in retro programming every now and then, but mostly do
my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
interested in resources on efficient segmented code, if any still exist.
Consider using an emulator, maybe one for the IBM 5150. The MAME emulator
covers this if you have a "BIOS ROM" (mail me if you want to use MAME and
don't have this ROM).
Post by Johann 'Myrkraverk' Oskarsson
Are there any such books, articles, or documentation still available
somewhere? A quick web search does not yield any promising results.
In my library I have an "iAPX_86_88_Users_Manual" in PDF format. Mail me
if you want it.
--
Andreas

PGP fingerprint 952B0A9F12C2FD6C9F7E68DAA9C2EA89D1A370E0
Johann 'Myrkraverk' Oskarsson
2020-04-25 17:35:37 UTC
Post by Andreas Kohlbach
In my library I have an "iAPX_86_88_Users_Manual" in PDF format. Mail me
if you want it.
Mail sent.
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
Kerr-Mudd,John
2020-04-26 19:38:06 UTC
On Sat, 25 Apr 2020 17:35:37 GMT, Johann 'Myrkraverk' Oskarsson
Post by Johann 'Myrkraverk' Oskarsson
Post by Andreas Kohlbach
In my library I have an "iAPX_86_88_Users_Manual" in PDF format. Mail
me if you want it.
Mail sent.
Available online from good ol' BitSavers!


https://archive.org/details/bitsavers_inteldataBrsManual_57011881
--
Bah, and indeed, Humbug.
Scott Lurndal
2020-04-27 16:12:08 UTC
Post by Johann 'Myrkraverk' Oskarsson
Dear alt.folklore.computers,
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere.
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is about the 16bit architecture of it, mostly.
I am interested in retro programming every now and then, but mostly do
my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
interested in resources on efficient segmented code, if any still exist.
Historically, leaving aside the 8086, there was a wide variety of
segmented systems: the Burroughs machines (both large systems
and medium systems), the HP 3000, et alia.

You might start by looking at the older architectures documented at bitsavers.org.
t***@gmail.com
2020-05-04 17:41:58 UTC
Post by Johann 'Myrkraverk' Oskarsson
Dear alt.folklore.computers,
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
Writing efficient segmented code is very tricky and has been well
documented elsewhere.
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is about the 16bit architecture of it, mostly.
I am interested in retro programming every now and then, but mostly do
my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
interested in resources on efficient segmented code, if any still exist.
Are there any such books, articles, or documentation still available
somewhere? A quick web search does not yield any promising results.
There are different approaches to optimizing segmented code, depending on:
1) Memory model (tiny, small, medium, large, huge)
2) Execution mode (real, protected)

So tiny, small (and I think medium) memory models mostly don't count, because you are not really dealing with segments. The difference between large and huge, IIRC, is how you treat the stack segment: for large, the default data segment and the stack are the same; for huge, they are different. Most big programs were "large"; "huge" was fairly rare in practice.

The execution mode changed how expensive it was to do a segment load, so for instance doing:
LES BX,[BP+4]
PUSH ES
PUSH BX

was fairly efficient in real mode but overly expensive in protected mode. If you were not going to use ES:BX to address something, it was MUCH more efficient to do:
PUSH [BP+6]    ; segment word first
PUSH [BP+4]    ; then offset word, same stack layout as PUSH ES / PUSH BX
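Both sequences should leave the same two words on the stack. A toy C model of a downward-growing stack, assuming the standard far-pointer layout that LES expects (offset word at the lower address, segment word 2 bytes above it), shows that the segment word at [BP+6] has to be pushed before the offset word at [BP+4] for the direct pushes to match PUSH ES / PUSH BX. The `word_stack` type and helper names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 16-bit stack that grows downward, like SS:SP on the 8086. */
typedef struct { uint16_t words[64]; unsigned sp; } word_stack;

static void push(word_stack *s, uint16_t w) {
    assert(s->sp > 0);          /* no overflow in this toy */
    s->words[--s->sp] = w;
}

/* frame[i] models the word at [BP+2*i]: frame[2] is [BP+4], frame[3] is [BP+6]. */

/* LES BX,[BP+4] / PUSH ES / PUSH BX */
static void push_via_les(word_stack *s, const uint16_t *frame) {
    uint16_t bx = frame[2];     /* LES puts the offset word in BX ... */
    uint16_t es = frame[3];     /* ... and the segment word in ES     */
    push(s, es);
    push(s, bx);
}

/* PUSH [BP+6] / PUSH [BP+4]: same stack contents, no segment register load. */
static void push_direct(word_stack *s, const uint16_t *frame) {
    push(s, frame[3]);
    push(s, frame[2]);
}
```

Because the word pushed last lands at the lower address, the direct version reproduces the in-memory far pointer (offset below segment) on the stack.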

In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing was aligned on a 16-byte boundary. In protected mode, this was a bad idea (oh well).
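In real mode the physical address is segment × 16 + offset, so an object that starts on a 16-byte (paragraph) boundary can be addressed as SEG:0 with SEG = address / 16. Presumably that is what makes ES usable as a base register here: load it with the object's paragraph number and address the fields with small constant offsets. In protected mode a segment register holds a selector into a descriptor table rather than a paragraph number, so this arithmetic no longer applies and every load goes through a descriptor check. A minimal C sketch of the real-mode arithmetic (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset (20-bit wrap). */
static uint32_t phys(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Segment value that makes a paragraph-aligned object start at offset 0.
   Only valid if the object's physical address is a multiple of 16. */
static uint16_t base_segment(uint32_t addr) {
    assert(addr < 0x100000u && (addr & 0xFu) == 0);
    return (uint16_t)(addr >> 4);
}
```

Many segment:offset pairs name the same byte (for example 1000:0020 and 1002:0000), which is exactly the aliasing this trick relies on.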

I worked on a version of the MS Pascal compiler (outside of Microsoft) that we used for cross-compiling. It was created in the days of the 8086, so it had absolutely no optimizations related to protected mode. I added a peephole optimization step to remove unneeded (re)loads of the ES register, partly by tracking register usage and partly by using the method mentioned above (pushing the pointer words directly from memory). If you had a bit of code like this (my Pascal is rusty, and this example is a bit contrived):

TYPE
  IntPtr = ^INTEGER;            { far (segment:offset) pointer in this model }
  myrec = RECORD
    somestuff : IntPtr;
    morestuff : IntPtr;
  END;

PROCEDURE anotherProc(int1, int2 : IntPtr);
BEGIN
  int1^ := int1^ + int2^
END;

PROCEDURE myproc(VAR data_in : myrec);   { by reference: the generated code loads its far address with LES }
BEGIN
  anotherProc(data_in.somestuff, data_in.morestuff)
END;

For myproc, the compiler would originally generate something like:

LES BX,[BP+data_in]
LES BX,ES:[BX+somestuff]
PUSH ES
PUSH BX
LES BX,[BP+data_in]
LES BX,ES:[BX+morestuff]
PUSH ES
PUSH BX
CALL anotherProc

The first pass would replace each pointer load plus PUSH ES / PUSH BX with two direct pushes from memory, removing the unnecessary loads of ES:BX:

LES BX,[BP+data_in]
PUSH ES:[BX+somestuff+2]
PUSH ES:[BX+somestuff]
LES BX,[BP+data_in]
PUSH ES:[BX+morestuff+2]
PUSH ES:[BX+morestuff]
CALL anotherProc

The second pass removes the now-redundant reload of ES:BX:
LES BX,[BP+data_in]
PUSH ES:[BX+somestuff+2]
PUSH ES:[BX+somestuff]
PUSH ES:[BX+morestuff+2]
PUSH ES:[BX+morestuff]
CALL anotherProc

As a result, the code was about 5% smaller and about 15% faster.
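The two passes can be sketched as a small peephole optimizer over instruction text. This is a minimal sketch, not the actual MS Pascal pass: it matches only the exact spellings used in the listings above, assumes ES and BX are scratch registers that a CALL may clobber, and treats PUSH as the only instruction that leaves ES:BX intact. The `code` type and the `pass1`/`pass2` names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_INS 32   /* arbitrary limits for this toy */
#define INS_LEN 48

/* A straight-line instruction sequence, one text line per instruction. */
typedef struct { char ins[MAX_INS][INS_LEN]; int n; } code;

static void emit(code *out, const char *s) {
    assert(out->n < MAX_INS && strlen(s) < INS_LEN);
    strcpy(out->ins[out->n++], s);
}

static int starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Pass 1: LES BX,ES:[BX+f] / PUSH ES / PUSH BX  ->  PUSH ES:[BX+f+2] / PUSH ES:[BX+f].
   Applied only when the next instruction reloads ES:BX or is a CALL, i.e. when the
   pointer that would have been loaded into ES:BX is not needed afterwards. */
static code pass1(const code *in) {
    static const char pat[] = "LES BX,ES:[BX+";
    code out = { .n = 0 };
    for (int i = 0; i < in->n; i++) {
        const char *a = in->ins[i];
        int dead_after = i + 3 < in->n &&
            (starts_with(in->ins[i + 3], "LES BX,") || starts_with(in->ins[i + 3], "CALL "));
        if (starts_with(a, pat) && dead_after &&
            strcmp(in->ins[i + 1], "PUSH ES") == 0 && strcmp(in->ins[i + 2], "PUSH BX") == 0) {
            const char *field = a + strlen(pat);
            int len = (int)strcspn(field, "]");
            char buf[INS_LEN];
            int w = snprintf(buf, sizeof buf, "PUSH ES:[BX+%.*s+2]", len, field);
            assert(w > 0 && w < INS_LEN);
            emit(&out, buf);                 /* segment (high) word first */
            snprintf(buf, sizeof buf, "PUSH ES:[BX+%.*s]", len, field);
            emit(&out, buf);                 /* then offset (low) word */
            i += 2;                          /* skip the PUSH ES / PUSH BX pair */
        } else {
            emit(&out, a);
        }
    }
    return out;
}

/* Pass 2: drop LES BX,[...] when ES:BX still holds exactly that value.
   Any instruction other than PUSH is assumed to possibly change ES or BX. */
static code pass2(const code *in) {
    code out = { .n = 0 };
    const char *es_bx = NULL;                /* last LES BX,[...] still in effect */
    for (int i = 0; i < in->n; i++) {
        const char *a = in->ins[i];
        if (starts_with(a, "LES BX,[")) {
            if (es_bx != NULL && strcmp(es_bx, a) == 0)
                continue;                    /* redundant reload */
            es_bx = a;
        } else if (!starts_with(a, "PUSH ")) {
            es_bx = NULL;
        }
        emit(&out, a);
    }
    return out;
}
```

Running pass1 and then pass2 on the nine-instruction listing should give the six-instruction result; a production pass would track register liveness properly rather than relying on these fixed patterns.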

- Tim
John Levine
2020-05-04 20:30:54 UTC
Post by t***@gmail.com
So, tiny, small (and I think medium) memory models don't count (mostly) because you are not really dealing with segments. The difference, IIRC,
between large & huge, is how you treat the stack segment. For large, the default segment & the stack are the same, for huge they are different.
Most big programs were "large", "huge" was fairly rare in practice.
For medium model code, we did a fair amount of fiddling with our code
organization so routines that frequently called each other were in the
same segment and could use near calls and returns.

I agree that huge model was rare; the code was really slow, and on PCs
it wasn't common to have large flat data structures that needed it.
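Part of why huge-model code was slow is that pointer arithmetic could not just add to the 16-bit offset: a huge pointer has to carry into the segment whenever an object crosses a 64K boundary. One scheme, used for example by Borland's compilers (others differed in detail), keeps huge pointers normalized so the offset is always below 16. A minimal C sketch of that arithmetic, with hypothetical names, contrasted with plain far-pointer offset arithmetic:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t seg, off; } huge_ptr;

/* Normalize: move all but the low 4 bits into the segment, so the offset
   is always 0..15 and each address has a unique representation. */
static huge_ptr normalize(uint32_t linear) {
    assert(linear < 0x100000u);
    huge_ptr p = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xFu) };
    return p;
}

static uint32_t to_linear(huge_ptr p) {
    return ((uint32_t)p.seg << 4) + p.off;
}

/* p + n for a huge pointer: go through the 20-bit address and renormalize. */
static huge_ptr huge_add(huge_ptr p, uint32_t n) {
    return normalize(to_linear(p) + n);
}

/* p + n for a far pointer: only the offset changes, wrapping inside 64K. */
static uint16_t far_add_offset(uint16_t off, uint16_t n) {
    return (uint16_t)(off + n);
}
```

The far version is a single 16-bit add but silently wraps at the end of the segment; the huge version is correct across 64K boundaries at the cost of shifts, masks, and a segment reload on every use.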
Post by t***@gmail.com
In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing
was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
Oh, gross. Clever, but gross.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Scott Lurndal
2020-05-04 21:22:43 UTC
Post by John Levine
Post by t***@gmail.com
So, tiny, small (and I think medium) memory models don't count (mostly) because you are not really dealing with segments. The difference, IIRC,
between large & huge, is how you treat the stack segment. For large, the default segment & the stack are the same, for huge they are different.
Most big programs were "large", "huge" was fairly rare in practice.
For medium model code, we did a fair amount of fiddling with our code
organization so routines that frequently called each other were in the
same segment and could use short calls and returns.
I agree that huge model was rare, the code was really slow and on PCs
it wasn't common to have large flat data structures that needed it.
Post by t***@gmail.com
In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing
was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
Oh, gross. Clever, but gross.
In modern times, %fs is used as a base register for thread-local data, and %gs
is used by the kernel for kernel 'thread' specific data.
t***@gmail.com
2020-05-04 22:49:50 UTC
Post by John Levine
Post by t***@gmail.com
So, tiny, small (and I think medium) memory models don't count (mostly) because you are not really dealing with segments. The difference, IIRC,
between large & huge, is how you treat the stack segment. For large, the default segment & the stack are the same, for huge they are different.
Most big programs were "large", "huge" was fairly rare in practice.
For medium model code, we did a fair amount of fiddling with our code
organization so routines that frequently called each other were in the
same segment and could use short calls and returns.
I agree that huge model was rare, the code was really slow and on PCs
it wasn't common to have large flat data structures that needed it.
Post by t***@gmail.com
In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing
was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
Oh, gross. Clever, but gross.
It was a library to draw lines in graphics mode for printers, in color.
It could handle almost any printer (including daisy wheels), but it only
drew lines. It required 140K of data memory for the highest-resolution, widest-paper printers that users were likely to have (24-pin wide color printers).
It also optimized the printout to skip over as much whitespace as possible (something the Microsoft Windows dot-matrix drivers didn't bother with).
Since memory was at a premium, it needed to take up as little memory as possible (8K for the code). Sacrifices Had To Be Made! :)

- Tim
t***@gmail.com
2020-05-06 13:51:12 UTC
I should also mention that, since the stack segment and the default data segment are the same in the large memory model, if you need to point into two other segments at once you can use DS and ES, and if you need to address something in the global data segment, just use an SS override.
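Under the stated assumption that SS already holds the default data segment (DGROUP) in the large model, a routine can aim DS and ES at two different far buffers and still reach globals through an SS: override. Below is a toy C model of roughly MOV AX,[SI] / ADD AX,SS:[g] / MOV ES:[DI],AX; the register file, memory array, and helper names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Segment registers, numbered in the 8086's own encoding order. */
enum sreg { SEG_ES, SEG_CS, SEG_SS, SEG_DS };

typedef struct { uint16_t s[4]; } seg_regs;

static uint8_t mem[1u << 20];   /* simulated 1 MB real-mode address space */

/* Real-mode address of an access through segment register r at offset off. */
static uint32_t ea(const seg_regs *regs, enum sreg r, uint16_t off) {
    assert((unsigned)r < 4);
    return (((uint32_t)regs->s[r] << 4) + off) & 0xFFFFFu;
}

static uint16_t rd16(uint32_t a) {
    assert(a < 0xFFFFFu);
    return (uint16_t)(mem[a] | mem[a + 1] << 8);
}

static void wr16(uint32_t a, uint16_t v) {
    assert(a < 0xFFFFFu);
    mem[a] = (uint8_t)v;
    mem[a + 1] = (uint8_t)(v >> 8);
}

/* ES:[di] = DS:[si] + SS:[g]: two far buffers plus one global, no segment reloads. */
static void add_global(const seg_regs *regs, uint16_t si, uint16_t di, uint16_t g) {
    uint16_t ax = rd16(ea(regs, SEG_DS, si));
    ax = (uint16_t)(ax + rd16(ea(regs, SEG_SS, g)));   /* SS override reaches DGROUP */
    wr16(ea(regs, SEG_ES, di), ax);
}
```

The trick only works while SS really equals the data segment; code that switches to a separate stack (as some DLL or interrupt contexts did) would break this assumption.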

- Tim
