Today I realised where 6-character Fortran linker symbols came from ...

Discussion:

John Dallman

2024-08-15 21:36:00 UTC

A moderately experienced developer asked me why we have a formal
31-character limit on the length of identifiers in our published API, and
why there are some C macros with names longer than that which don't cause
problems.

I gave him a short version of the history of linkers and the gradually
increasing limits of symbol name length over the decades. I explained how
C macros get replaced: he's a mathematician by education, and took a
while to appreciate that practical computing involves compromises rather
than truly implementing mathematical abstractions.

In the process I realised that 6-character names would fit into the
36-bit words (using 6-bit bytes) of IBM 700/7000 series machines, where
Fortran was originally developed, and this is probably the origin of the
6-character limit.
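A minimal sketch of the arithmetic: six 6-bit character codes, shifted in one after another, fill exactly 36 bits. The `ALPHABET` table below is a made-up stand-in and not the real IBM 704/709 BCD code. `pack36` and `unpack36` are hypothetical helper names.

```python
# Hypothetical 6-bit code: NOT the real IBM 704/709 BCD table, just a
# stand-in that maps each character to some value in 0..63.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def pack36(name: str) -> int:
    """Pack up to six characters, 6 bits each, into one 36-bit word."""
    if len(name) > 6:
        raise ValueError("at most 6 characters fit in a 36-bit word")
    word = 0
    for ch in name.upper().ljust(6):   # pad short names with blanks
        code = ALPHABET.index(ch)      # ValueError for unknown characters
        word = (word << 6) | code      # shift earlier chars left, append 6 bits
    return word

def unpack36(word: int) -> str:
    """Split a 36-bit word back into six 6-bit codes, most significant first."""
    chars = [ALPHABET[(word >> shift) & 0o77] for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()

w = pack36("ALPHA")
print(f"{w:012o}", unpack36(w))  # prints: 011420100100 ALPHA
```

Each character occupies two octal digits in the output, which is why 36-bit machines were so often described in octal.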

John

Lawrence D'Oliveiro

2024-08-16 03:55:01 UTC

Post by John Dallman
In the process I realised that 6-character names would fit into the
36-bit words (using 6-bit bytes) of IBM 700/7000 series machines, where
Fortran was originally developed, and this is probably the origin of the
6-character limit.

6 whole bits for a character of a simple symbol?? Luxury.

On the PDP-11, we had to pack 6 characters into just 4 bytes, using a
special limited encoding (just for symbols, filenames and the like) called
“Radix-50”.
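A sketch of the PDP-11 scheme, assuming the commonly cited RADIX-50 table (space, A-Z, `$`, `.`, one spare code shown here as `%`, then 0-9). Each group of three characters becomes one base-40 number that fits in a 16-bit word, so six characters need two words, i.e. 4 bytes. The function names are illustrative only.

```python
# Assumed PDP-11 RADIX-50 table; code 29 is shown as '%' here, although
# some DEC documents leave that code undefined.
R50 = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

def r50_word(three: str) -> int:
    """Encode exactly three characters as c1*40**2 + c2*40 + c3."""
    c1, c2, c3 = (R50.index(ch) for ch in three)
    return (c1 * 40 + c2) * 40 + c3    # at most 63999, so it fits in 16 bits

def r50_encode6(symbol: str) -> tuple[int, int]:
    """Pack a symbol of up to six characters into two 16-bit words."""
    if len(symbol) > 6:
        raise ValueError("only six characters fit in two words")
    s = symbol.upper().ljust(6)        # pad with spaces (code 0)
    return r50_word(s[:3]), r50_word(s[3:])

def r50_decode_word(word: int) -> str:
    """Recover the three characters by repeated division by 40."""
    c1, rest = divmod(word, 1600)
    c2, c3 = divmod(rest, 40)
    return R50[c1] + R50[c2] + R50[c3]

hi, lo = r50_encode6("PRINT")
print(hi, lo)  # 26329 23200
print((r50_decode_word(hi) + r50_decode_word(lo)).rstrip())  # PRINT
```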

Lars Poulsen

2024-08-16 04:09:39 UTC

Post by Lawrence D'Oliveiro

6 whole bits for a character of a simple symbol?? Luxury.
On the PDP-11, we had to pack 6 characters into just 4 bytes, using a
special limited encoding (just for symbols, filenames and the like) called
“Radix-50”.

The -50, of course, was octal 050 (decimal 40), which means it was a
VERY limited character set.
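Why the radix stops at 40: three characters must fit in one 16-bit word (PDP-11) and six in 32 bits (PDP-6/10), and 41 symbols would already overflow both.

```latex
\begin{aligned}
40^3 &= 64\,000 \le 2^{16} = 65\,536, & 41^3 &= 68\,921 > 2^{16},\\
40^6 &= 4\,096\,000\,000 \le 2^{32} = 4\,294\,967\,296, & 41^6 &\approx 4.75 \times 10^{9} > 2^{32}.
\end{aligned}
```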

John Levine

2024-08-16 14:20:36 UTC

Post by Lawrence D'Oliveiro

6 whole bits for a character of a simple symbol?? Luxury.
On the PDP-11, we had to pack 6 characters into just 4 bytes, using a
special limited encoding (just for symbols, filenames and the like) called
“Radix-50”.

Yeah, they had a version of SQUOZE in compilers on IBM mainframes,
too. The PDP-11 version was based on the PDP-6/10 version which
encoded the six characters into 32 bits, leaving the other four bits
for flags.
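A sketch of the 36-bit variant described above: six base-40 characters give a value below 2^32, which leaves the top four bits of the word free for flags. The character order in `R50_10` is my assumption about the PDP-10 table (it differs from the PDP-11 one), the flag value is arbitrary, and short symbols are not padded in this sketch.

```python
# Assumed PDP-10 RADIX50 character order: space, digits, letters, '.', '$', '%'.
R50_10 = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.$%"

def r50_pdp10(symbol: str, flags: int = 0) -> int:
    """Encode up to six characters in base 40 (below 2**32) and place a
    4-bit flag field in the high-order bits of a 36-bit word."""
    if len(symbol) > 6:
        raise ValueError("at most six characters")
    if not 0 <= flags < 16:
        raise ValueError("only four bits are left for flags")
    value = 0
    for ch in symbol.upper():          # no padding of short symbols here
        value = value * 40 + R50_10.index(ch)
    assert value < 2**32               # 40**6 - 1 = 4_095_999_999
    return (flags << 32) | value       # 4 flag bits + 32 symbol bits = 36 bits

print(oct(r50_pdp10("ABC", flags=0o4)))  # 0o200000043255
```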

--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Rich Alderson

2024-08-16 22:07:50 UTC

Post by John Dallman
A moderately experienced developer asked me why we have a formal
31-character limit on the length of identifiers in our published API, and
why there are some C macros with names longer than that which don't cause
problems.
I gave him a short version of the history of linkers and the gradually
increasing limits of symbol name length over the decades. I explained how
C macros get replaced: he's a mathematician by education, and took a
while to appreciate that practical computing involves compromises rather
than truly implementing mathematical abstractions.
In the process I realised that 6-character names would fit into the
36-bit words (using 6-bit bytes) of IBM 700/7000 series machines, where
Fortran was originally developed, and this is probably the origin of the
6-character limit.

In point of fact, the use of Radix-50 (base 40 arithmetic, in octal) to encode
symbol names arose in the IBM 36 bit systems, so that linker flags could be
kept in the leftover 4 bits. (See also John Levine's followup regarding PDP-6/10
vs. the PDP-11.)

Friends of mine who were taking a compiler writing class at Ohio State in the
1974 time frame were complaining about having to implement Radix-50 in their
parsers (though they were using a 370/165 at the time).

--
Rich Alderson ***@alderson.users.panix.com
Audendum est, et veritas investiganda; quam etiamsi non assequamur,
omnino tamen proprius, quam nunc sumus, ad eam perveniemus.
--Galen

Kerr-Mudd, John

2024-08-17 08:28:26 UTC

On Thu, 15 Aug 2024 22:36 +0100 (BST)

It was Dec wot started it:

from
https://en.wikipedia.org/wiki/RADIX-50

The use of RADIX 50 was the source of the filename size conventions used
by Digital Equipment Corporation PDP-11 operating systems. Using RADIX 50
encoding, six characters of a filename could be stored in two 16-bit
words, while three more extension (file type) characters could be stored
in a third 16-bit word. Similarly, a three-character device name such as
"DL1" could also be stored in a 16-bit word. The period that separated the
filename and its extension, and the colon separating a device name from a
filename, was implied (i.e., was not stored and always assumed to be
present).

TL;DnR: 3 x 16-bit words == filename limit of 6.3
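The 6.3 layout can be sketched the same way, again assuming the PDP-11 RADIX-50 table: the name becomes two words, the extension a third, and the `.` is never stored but put back on decoding. `encode_filename` and `decode_filename` are hypothetical names, not RSTS/E or RT-11 system calls.

```python
# Assumed PDP-11 RADIX-50 table (code 29 shown as '%').
R50 = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

def _word(three: str) -> int:
    c1, c2, c3 = (R50.index(ch) for ch in three)
    return (c1 * 40 + c2) * 40 + c3

def _chars(word: int) -> str:
    c1, rest = divmod(word, 1600)
    c2, c3 = divmod(rest, 40)
    return R50[c1] + R50[c2] + R50[c3]

def encode_filename(spec: str) -> list[int]:
    """'NAME.EXT' -> three 16-bit words; the '.' itself is not stored."""
    name, _, ext = spec.upper().partition(".")
    if len(name) > 6 or len(ext) > 3:
        raise ValueError("filename must fit the 6.3 format")
    name, ext = name.ljust(6), ext.ljust(3)
    return [_word(name[:3]), _word(name[3:]), _word(ext)]

def decode_filename(words: list[int]) -> str:
    name = (_chars(words[0]) + _chars(words[1])).rstrip()
    ext = _chars(words[2]).rstrip()
    return f"{name}.{ext}" if ext else name  # the separator is implied

print(encode_filename("FOO.BAR"))  # [10215, 0, 3258]
```

A name such as `LONGNAME.TXT` raises `ValueError`, which is the 6.3 limit in action.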

--
Bah, and indeed Humbug.

Lawrence D'Oliveiro

2024-08-17 08:47:04 UTC

Post by Kerr-Mudd, John
Using RADIX
50 encoding, six characters of a filename could be stored in two 16-bit
words, while three more extension (file type) characters could be stored
in a third 16-bit word. Similarly, a three-character device name such as
"DL1" could also be stored in a 16-bit word.

Ah, memories of the “.fss” (filespec string scan) service from RSTS/E ...

But no, on that OS at least, the device name (if it was valid) was not
Radix-50 encoded. The 2 characters of the physical device name were passed
back as is, while the unit number was converted to an integer and passed
back in another byte. There was also a flag byte indicating whether the
unit number had been explicitly specified or not. This allowed, e.g. “SY:”
to be distinguished from “SY0:”.
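Why the separate flag byte matters: if only a unit number came back, "no unit given" and "unit 0" would look identical. A hypothetical sketch; the `DeviceSpec` fields and `parse_device` are illustrative and do not reproduce the actual .FSS return layout.

```python
from dataclasses import dataclass

@dataclass
class DeviceSpec:
    name: str            # two-character physical device name, e.g. "DL"
    unit: int            # unit number, 0 when none was given
    unit_explicit: bool  # hypothetical flag: was a unit number actually typed?

def parse_device(spec: str) -> DeviceSpec:
    """Parse 'XX:' or 'XXn:' into name, unit number and an explicit-unit flag."""
    spec = spec.upper()
    if not spec.endswith(":") or len(spec) < 3:
        raise ValueError("expected something like 'SY:' or 'DL1:'")
    name, digits = spec[:2], spec[2:-1]
    if not name.isalpha() or (digits and not digits.isdigit()):
        raise ValueError(f"not a physical device name: {spec!r}")
    return DeviceSpec(name, int(digits) if digits else 0, bool(digits))
```

With the flag, `SY:` and `SY0:` yield the same name and unit number but still compare unequal.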

This also happened if a logical name was specified that could be
translated to a physical device name. If it could not be translated (but
was syntactically valid), then it would be returned Radix-50-encoded, with
a suitable flag bit to indicate this had happened.

One odd thing was that the docs I read (up to RSTS/E v7) always described the
Radix-50 code that was used for the “?” character as “undefined”.

John Levine

2024-08-17 17:13:15 UTC

from https://en.wikipedia.org/wiki/RADIX-50

Um, that article says it was preceded by SQUOZE on the IBM 709 in 1958.

DEC's first 36 bit machine, the PDP-6, was shipped in 1964.

The early Fortran compilers stored variable names as six BCD characters
in a word, according to this document:

https://bitsavers.org/pdf/ibm/fortran/FORTRAN_704_709_Systems_Manual-1960.pdf

I don't think we need to look any farther back.

--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Kerr-Mudd, John

2024-08-19 08:04:36 UTC

On Sat, 17 Aug 2024 17:13:15 -0000 (UTC)

Post by John Levine

from https://en.wikipedia.org/wiki/RADIX-50

Um, that article says it was preceded by SQUOZE on the IBM 709 in 1958.

Thanks; I sit corrected; that's before my time (literally!).

Post by John Levine
DEC's first 36 bit machine, the PDP-6, was shipped in 1964.
The early Fortran compilers stored variable names as six BCD characters
https://bitsavers.org/pdf/ibm/fortran/FORTRAN_704_709_Systems_Manual-1960.pdf
I don't think we need to look any farther back.

--
Bah, and indeed Humbug.