Discussion:
CR or LF?
Gareth Evans
2020-04-12 10:59:56 UTC
Permalink
Both CR and LF are control characters that reflect the
mechanical make-up of some printing devices, the
ASR / KSR 33 / 35 perhaps being the most common at
the time of the creation of the code.

But where there are those who argue that device
characteristics should be hidden away in the
device drivers and not be embedded in printable
text, should not the end of a line be marked by ETX
and the end of a file by EOT and not ^Z?

But, if device behaviours are to be encoded in
printable text, surely the end-of-line marker
should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?

Quasi-religious worshipping at the altar
of UNIX / LINUX has much to answer for!

-----ooooo-----

In any case, A Happy Easter to all my
readers, WITHOUT EXCEPTION, and
God (Should she exist?) bless you all!
John Levine
2020-04-12 20:35:19 UTC
Permalink
I'm sure I'll regret this but ...
Post by Gareth Evans
text, should not the end of a line be marked by ETX
and the end of a file by EOT and not ^Z?
On Unixish systems, the end of file indicator for
keyboard input has always been EOT aka ^D.

STX/ETX was used to frame text blocks in bit synchronous protocols and
never meant anything on async terminals like Teletypes.
Post by Gareth Evans
printable text, surely the end-of-line marker
should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
The terminal mostly used by the Unix group at Bell Labs was the model
37 Teletype on which the 012 "newline" character both returned the
carriage and advanced the paper. The convention matched the hardware,
not to mention being a lot less error prone than the PDP-10 / CP/M /
MS-DOS CR+LF where nobody knew what a bare CR or LF meant (and still
don't.)

The return lever on my manual typewriter both moved the carriage and
advanced the paper. The TTY 33 and 35 that separated the functions
were outliers, albeit quite popular ones.

If anyone's old enough to remember Flexowriters, they also had a
single newline code which returned the carriage (knocking over the
coffee cup you unwisely left near the device) and advanced the paper.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
J. Clarke
2020-04-12 20:52:14 UTC
Permalink
On Sun, 12 Apr 2020 20:35:19 -0000 (UTC), John Levine
Post by John Levine
I'm sure I'll regret this but ...
Post by Gareth Evans
text, should not the end of a line be marked by ETX
and the end of a file by EOT and not ^Z?
On Unixish systems, the end of file indicator for
keyboard input has always been EOT aka ^D.
STX/ETX was used to frame text blocks in bit synchronous protocols and
never meant anything on async terminals like Teletypes.
Post by Gareth Evans
printable text, surely the end-of-line marker
should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
The terminal mostly used by the Unix group at Bell Labs was the model
37 Teletype on which the 012 "newline" character both returned the
carriage and returned the paper. The convention matched the hardware,
not to mention being a lot less error prone than the PDP-10 / CP/M /
MS-DOS CR+LF where nobody knew what a bare CR or LF meant (and still
don't.)
If you want to venture into the EBCDIC world, CR on a 2741 meant just
that--return the carriage. LF meant just that, advance the paper one
line. This meant that you could overtype a line if you wanted to.
Post by John Levine
The return lever on my manual typewriter both moved the carriage and
advanced the paper. The TTY 33 and 35 that separated the functions
were outliers, albeit quite popular ones.
If anyone's old enough to remember Flexowriters, they also had a
single newline code which returned the carriage (knocking over the
coffee cup you unwisely left near the device) and advanced the paper.
John Levine
2020-04-13 22:16:35 UTC
Permalink
Post by J. Clarke
If you want to venture into the EBCDIC world, CR on a 2741 meant just
that--return the carriage. LF meant just that, advance the paper one
line. This meant that you could overtype a line if you wanted to.
Take another look at the manual, particularly the code charts starting
on page 15. It had a NL character which was "Carrier Return and Line
Feed" and Index which just moved the paper, but no bare carriage return.

You could definitely overtype by backspacing, which is how APL input
and output the complex operator characters.

http://bitsavers.org/pdf/ibm/2741/A24-3415-2_2741_Communication_Terminal.pdf
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Gareth Evans
2020-04-12 21:48:21 UTC
Permalink
Post by John Levine
The terminal mostly used by the Unix group at Bell Labs was the model
37 Teletype on which the 012 "newline" character both returned the
carriage and returned the paper.
The ghost is thereby laid.

Thank you.
Mike Spencer
2020-04-12 22:44:39 UTC
Permalink
Post by John Levine
I'm sure I'll regret this but ...
I'll try not to be too harsh. Sir. :-)
Post by John Levine
...not to mention being a lot less error prone than the PDP-10 /
CP/M / MS-DOS CR+LF where nobody knew what a bare CR or LF meant
(and still don't.)
Huh? Got my first computer at age 45, CP/M, and an Epson dot matrix
printer. AFAICT, anybody who could learn what words CR & LF abbreviate
and had ever seen a typewriter could intuit what they meant in terms of
the printer, even though it was a print head, not a carriage, that
"returned".
Post by John Levine
The return lever on my manual typewriter both moved the carriage and
advanced the paper.
On my 1957 portable Hermes (which I still have and which still works)
the return lever operates a pawl that engages a ratchet wheel on the
platen shaft. The pawl can be set to advance the paper 0, 1, 2 or 3
lines. So CR and LF are integrated in the same hand movement but are
mechanically distinct.
--
Mike Spencer Nova Scotia, Canada
Douglas Miller
2020-04-12 23:38:41 UTC
Permalink
Post by John Levine
...
The terminal mostly used by the Unix group at Bell Labs was the model
37 Teletype on which the 012 "newline" character both returned the
carriage and returned the paper. The convention matched the hardware,
not to mention being a lot less error prone than the PDP-10 / CP/M /
MS-DOS CR+LF where nobody knew what a bare CR or LF meant (and still
don't.)
...
I don't think I buy much of this. The ASR 33 teletypes could be configured to perform CR+LF on receipt of only one of those (I forget if you could choose which one), so I'm pretty certain that the ASR 37 could also do that. None of the ASR 33's that I ever worked on were configured that way, though, so CR and LF were separate functions, and of course separate keys. In addition, the Unix tty drivers, or even the serial port boards, were often configurable to turn a LF into CR+LF. So, just because a program could send a LF and observe CR+LF on the output device does not mean that the output device functioned that way. Pretty much every CRT terminal and printing device I ever saw could be optionally configured to do "CR on LF" and/or "LF on CR", as well as keeping them separate.

Long before PDP-11, CP/M, etc., there was the FORTRAN programming language, and the formatted output functions supported printing without LF (advancing paper). While those were mainframes with (typically) true line printers, it was the same concept: being able to overprint one line.

I suspect it was more likely that Bell Labs simply configured some "downstream" component (driver, port, or terminal) to translate LF to CR+LF. All of the Unix shops I worked at did the same sort of thing.
John Levine
2020-04-13 19:03:19 UTC
Permalink
Post by Douglas Miller
I don't think I buy much of this.
Hardly matters, you weren't there.
Post by Douglas Miller
The ASR 33 teletypes could be configured to perform CR+LF on receipt of only one of those (I
forget if you could choose which one), so I'm pretty certain that the ASR 37 could also do that.
Possibly, but nobody did. You are of course correct that many Unix
systems had model 33 and 35 ttys attached and the tty drivers could be
configured to insert CR before NL if needed. In a fairly slick move,
after a CR without a following NL, they'd pause output for a little
while rather than sending NULs like some other systems did. Even on a
mod 37 sending the CR before NL wouldn't cause problems other than
slowing down the output a little.

The tty drivers for mod 33 and 35 also faked upper and lower case
which were standard on the mod 37.
Post by Douglas Miller
Long before PDP-11, CPM, etc, there was the FORTRAN programming
language, and the formatted output functions supported printing
without LF (advancing paper).
Not really. Fortran was originally on mainframes that had drum or
train printers that printed a line at a time. Paper motion was
handled by control operations outside the character data stream. They
had nothing analogous to CR or LF or NL. Yes, you could overprint,
that was one of the control operations.

We all recall that Fortran had a kludge that used the first character
of each output line for carriage control. A space was single spacing,
"1" was new page, "+" was overprint. That was handled by the Fortran
I/O system, not the hardware.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Douglas Miller
2020-04-13 21:12:41 UTC
Permalink
Post by John Levine
Post by Douglas Miller
I don't think I buy much of this.
Hardly matters, you weren't there.
...
Point being, I would be disappointed in the brilliant minds that gave us Unix if they let the "tail wag the dog". It makes no sense. I would also be disappointed in Teletype Corp. if they changed the behavior of something as fundamental as LF between models 33 and 37. Especially when the 37 was supposed to be expanding capabilities/markets.
John Levine
2020-04-13 22:30:54 UTC
Permalink
Post by Douglas Miller
Point being, I would be disappointed in the brilliant minds that gave us Unix if they let the "tail wag
the dog". It makes no sense. I would also be disappointed in Teletype Corp. if they changed the behavior
of something as fundamental as LF between models 33 and 37. Especially when the 37 was supposed to be
expanding capabilities/markets.
I haven't found speculating about facts to be a good use of anyone's time.

Here's a PDF of the Model 37 catalog. See page 9 where they explain
in the New Line section that it "provides the convenience of operating
Carriage Return-Line Space functions by depressing the Line Space key"
which they later say sends the LF character:

https://ia800807.us.archive.org/2/items/TNM_Model_37_terminal_product_catalog_-_Teletype__20170923_0036/TNM_Model_37_terminal_product_catalog_-_Teletype__20170923_0036.pdf

It did a bunch of other stuff which was exotic for the time such as
two-color ribbons, remotely settable tabs (as opposed to little
mechanical slugs) and extra graphic characters via SO/SI.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Douglas Miller
2020-04-13 23:00:46 UTC
Permalink
...
Here's a PDF of the Model 37 catalog. ...
Interesting, they really did push beyond the idea of a "teletype" with the 37.

I guess if you were going to flip a coin between choosing CR or LF as the line termination, and you had a room full of model 37's, that'd probably push you to LF.
Scott Lurndal
2020-04-13 21:22:20 UTC
Permalink
Post by John Levine
We all recall that Fortran had a kludge that used the first character
of each output line for carriage control. A space was single spacing,
"1" was new page, "+" was overprint. That was handled by the Fortran
I/O system, not the hardware.
At least on some systems of the day, all fortran did was to translate
the CC character into a flag on the interface call to the operating
system; the operating system would use that to update the I/O descriptor
to the line printer controller to include the tape channel # to use
when printing each line.

There was a 4-bit tape channel field in the I/O descriptor, with the following
encoding (this is alt.folklore.computers, after all):

Code  Meaning
----  ---------------
0     No Paper Motion ("+" in fortran code, and my COBOL
      docs are in the attic, but something
      like WRITE LINE WITHOUT ADVANCING.)
1     Advance to heading ("1" in fortran) (channel 1 on the tape)
2-B   Advance to channel 2 through 11
C     Advance to End Of Page (channel 12 on the tape)
E     Single Space (" " in fortran)
F     Double Space ("0" in fortran)

The printer controller hardware would handle the protocol to the printer
(train or band), but it was mostly sending the channel number of the
paper tape used to control page position (channel zero telling the printer
no page motion required, and 1-f selecting the proper channel on the tape).

Customers would use custom punched tapes to use with specific pre-printed
forms to allow positioning vertically on the page without the overhead
of processing writes of blank lines, using the skip to channel feature
(e.g. that implementation supported "2" .. "B" in the Fortran carriage control byte,
and there was COBOL syntax to position to a specified channel).

(Burroughs mainframes 60s - 90s).
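
For illustration, a rough C sketch of the translation described above:
map the Fortran carriage-control character to the 4-bit channel code
from the table. The function name and fallback behaviour are my own
assumptions for the example, not Burroughs code.

/* Illustration only (not actual Burroughs code): map a Fortran
 * carriage-control character to the 4-bit tape-channel code in the
 * table above before handing the line to the printer controller. */
#include <stdio.h>

static int cc_to_channel(char cc)
{
    switch (cc) {
    case '+': return 0x0;               /* no paper motion (overprint)   */
    case '1': return 0x1;               /* advance to heading, channel 1 */
    case ' ': return 0xE;               /* single space                  */
    case '0': return 0xF;               /* double space                  */
    default:
        if (cc >= '2' && cc <= '9')     /* channels 2 through 9          */
            return cc - '0';
        if (cc == 'A' || cc == 'B')     /* channels 10 and 11            */
            return cc - 'A' + 0xA;
        return 0xE;                     /* anything else: single space   */
    }
}

int main(void)
{
    const char *line = "1PAYROLL REGISTER";    /* '1' = advance to heading */
    printf("channel %X, text \"%s\"\n", cc_to_channel(line[0]), line + 1);
    return 0;
}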
Thomas Koenig
2020-04-14 09:40:37 UTC
Permalink
Post by Scott Lurndal
Post by John Levine
We all recall that Fortran had a kludge that used the first character
of each output line for carriage control. A space was single spacing,
"1" was new page, "+" was overprint. That was handled by the Fortran
I/O system, not the hardware.
At least on some systems of the day, all fortran did was to translate
the CC character into a flag on the interface call to the operating
system; the operating system would use that to update the I/O descriptor
to the line printer controller to include the tape channel # to use
when printing each line.
Not only Fortran.

The first time I ever wrote a program on a mainframe (a Siemens 7881
running an MVS clone), I wrote out the numbers from 1 to 100 using
the system's Pascal compiler. That system already had a laser
printer (fancy!).

Result? The first characters were interpreted as carriage control,
and I had a rather larger stack of paper than I anticipated. And
yes, the first digits were eaten by the system.
Quadibloc
2020-04-14 00:56:09 UTC
Permalink
Post by John Levine
Post by Douglas Miller
The ASR 33 teletypes could be configured to perform CR+LF on receipt of only one of those (I
forget if you could choose which one), so I'm pretty certain that the ASR 37 could also do that.
Possibly, but nobody did.
Not quite true. Ham radio operators buying surplus ASR 32 and similar from the
phone company often had to contend with machines set this way, but it was
actually not that bad a thing, because characters tended to get lost in RTTY.

John Savard
Charlie Gibbs
2020-04-13 03:32:18 UTC
Permalink
Post by John Levine
I'm sure I'll regret this but ...
Post by Gareth Evans
text, should not the end of a line be marked by ETX
and the end of a file by EOT and not ^Z?
On Unixish systems, the end of file indicator for
keyboard input has always been EOT aka ^D.
STX/ETX was used to frame text blocks in bit synchronous protocols
and never meant anything on async terminals like Teletypes.
However, I've programmed interfaces to many asynchronous devices
that use STX/ETX (followed by various forms of checksums).
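
For illustration, a minimal sketch of that sort of framing: STX, the
payload, ETX, then a one-byte checksum. The XOR checksum is only one of
the "various forms" and is my own choice here, not any particular
device's protocol.

/* Sketch of STX/ETX framing with a trailing checksum (illustrative,
 * not a specific device's protocol). The checksum covers the payload
 * and the ETX byte. */
#include <stdio.h>
#include <stddef.h>

#define STX 0x02
#define ETX 0x03

static size_t frame(const unsigned char *data, size_t n, unsigned char *out)
{
    unsigned char sum = 0;
    size_t k = 0;

    out[k++] = STX;
    for (size_t i = 0; i < n; i++) {
        out[k++] = data[i];
        sum ^= data[i];                 /* simple XOR checksum */
    }
    out[k++] = ETX;
    sum ^= ETX;
    out[k++] = sum;
    return k;                           /* bytes to transmit */
}

int main(void)
{
    unsigned char buf[64];
    size_t n = frame((const unsigned char *)"PING", 4, buf);
    for (size_t i = 0; i < n; i++)
        printf("%02X ", buf[i]);
    putchar('\n');
    return 0;
}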
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
Carlos E.R.
2020-04-14 03:26:54 UTC
Permalink
Post by John Levine
The return lever on my manual typewriter both moved the carriage and
advanced the paper. The TTY 33 and 35 that separated the functions
were outliers, albeit quite popular ones
Not on mine. I could advance the paper, return the carriage, or both. It
was needed for things like underlining or bold. Text printers simply
imitated this obvious behaviour - not to be confused with computer text
files.
--
Cheers, Carlos.
Quadibloc
2020-04-13 05:11:41 UTC
Permalink
Post by Gareth Evans
But, if device behaviours are to be encoded in
printable text, surely the end-of-line marker
should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
It is certainly true that, on the System/360 time-sharing system that I used
first, on ASCII terminals, one pressed the carriage return key, and not the line
feed key, to end a line, because it was certain to be present, and when both
were present, it was usually larger or more conveniently placed.

And the Macintosh, at least originally, used CR rather than LF to separate lines
in its text files. MS-DOS used <CR><LF>. So I do find CR to be a bit more
natural than LF.

However, in my opinion, using _any_ character to delimit lines of text in a text
file is a bad idea. Instead of a text file being a bunch of C strings (which, of
course, are ended by NUL) it should be a bunch of Pascal strings (n, followed by
n characters).

Two benefits.

Better protection against buffer overflows.

No strange problems when one creates a file where binary data and text are
mixed, since now all 256 bytes are legal, no value means "end of record".
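
For illustration, a minimal sketch of reading such length-prefixed
records, assuming a two-byte little-endian count in front of each
record; the prefix size and byte order are assumptions for the example,
not something specified above.

/* Sketch: read one length-prefixed record (two-byte little-endian
 * count, then that many bytes). Any byte value, including CR, LF and
 * NUL, may appear inside a record. Returns the length, or -1 on end
 * of file or error. */
#include <stdio.h>

static int read_record(FILE *f, unsigned char *buf, size_t max)
{
    int lo = getc(f), hi = getc(f);
    if (lo == EOF || hi == EOF)
        return -1;
    size_t len = (size_t)lo | ((size_t)hi << 8);
    if (len > max || fread(buf, 1, len, f) != len)
        return -1;
    return (int)len;
}

int main(void)
{
    static unsigned char rec[65536];
    int len;
    while ((len = read_record(stdin, rec, sizeof rec)) >= 0)
        printf("record of %d bytes\n", len);
    return 0;
}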

John Savard
Charlie Gibbs
2020-04-13 17:15:07 UTC
Permalink
Post by Quadibloc
However, in my opinion, using _any_ character to delimit lines
of text in a text file is a bad idea. Instead of a text file being
a bunch of C strings (which, of course, are ended by NUL) it should
be a bunch of Pascal strings (n, followd by n characters).
Two benefits.
Better protection against buffer overflows.
No strange problems when one creates a file where binary data and
text are mixed, since now all 256 bytes are legal, no value means
"end of record".
These are indeed benefits. But there are offsetting factors.
First, how big is the record size prefix? If it's a single byte,
you're limited to 255-byte records. If it's two bytes, the maximum
size rises to 65,535 bytes. However, the argument about wasted
bytes now moves to files consisting entirely of records shorter
than 256 bytes - and a secondary argument arises over whether
the length prefix should be big-endian or little-endian.
Yes, we could go to some sort of variable-length encoding
of the length field, similar to what's used by UTF-8 -
but things have then gotten much more complicated.
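
For concreteness, one such variable-length count might put 7 bits of
length in each byte, with the high bit as a continuation flag. This is
just an illustration of the idea, not a format anyone here proposed.

/* Sketch: emit a variable-length count, 7 bits per byte, high bit set
 * on every byte except the last. Short records cost one byte; longer
 * ones grow as needed. */
#include <stdio.h>
#include <stddef.h>

static size_t put_length(unsigned long len, unsigned char *out)
{
    size_t k = 0;
    while (len >= 0x80) {
        out[k++] = (unsigned char)(len & 0x7F) | 0x80;  /* more to come */
        len >>= 7;
    }
    out[k++] = (unsigned char)len;                      /* final byte */
    return k;
}

int main(void)
{
    unsigned char buf[8];
    size_t n = put_length(300, buf);    /* 300 encodes as AC 02 */
    for (size_t i = 0; i < n; i++)
        printf("%02X ", buf[i]);
    putchar('\n');
    return 0;
}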

Another potential problem is line hits. If the length prefix
is garbled, you lose record sync. This also happens if bytes
are dropped, or garbage bytes inserted. Either way, there is
no reliable way to re-synchronize, unless you get into some
sort of ACK/NAK protocol with checksumming - and again, things
have gotten more complicated.

Delimited records are much easier to process, and are
self-synchronizing in the event of data errors. The
downside is that there are certain values (i.e. line
terminator characters) which cannot be embedded in
records - although with text files this is seldom a problem.
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
Peter Flass
2020-04-13 23:40:11 UTC
Permalink
Post by Quadibloc
Post by Gareth Evans
But, if device behaviours are to be encoded in
printable text, surely the end-of-line marker
should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
It is certainly true that, on the System/360 time-sharing system that I used
first, on ASCII terminals, one pressed the carriage return key, and not the line
feed key, to end a line, because it was certain to be present, and when both
were present, it was usually larger or more conveniently placed.
And the Macintosh, at least originally, used CR rather than LF to separate lines
in its text files. MS-DOS used <CR><LF>. So I do find CR to be a bit more
natural than LF.
However, in my opinion, using _any_ character to delimit lines of text in a text
file is a bad idea. Instead of a text file being a bunch of C strings (which, of
course, are ended by NUL) it should be a bunch of Pascal strings (n, followd by
n characters).
Two benefits.
Better protection against buffer overflows.
No strange problems when one creates a file where binary data and text are
mixed, since now all 256 bytes are legal, no value means "end of record".
John Savard
I agree. Using a character as a delimiter causes all kinds of problems.
--
Pete
Peter Flass
2020-04-14 00:29:53 UTC
Permalink
Post by Peter Flass
Post by Quadibloc
Post by Gareth Evans
But, if device behaviours are to be encoded in
printable text, surely the end-of-line marker
should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
It is certainly true that, on the System/360 time-sharing system that I used
first, on ASCII terminals, one pressed the carriage return key, and not the line
feed key, to end a line, because it was certain to be present, and when both
were present, it was usually larger or more conveniently placed.
And the Macintosh, at least originally, used CR rather than LF to separate lines
in its text files. MS-DOS used <CR><LF>. So I do find CR to be a bit more
natural than LF.
However, in my opinion, using _any_ character to delimit lines of text in a text
file is a bad idea. Instead of a text file being a bunch of C strings (which, of
course, are ended by NUL) it should be a bunch of Pascal strings (n, followd by
n characters).
Two benefits.
Better protection against buffer overflows.
No strange problems when one creates a file where binary data and text are
mixed, since now all 256 bytes are legal, no value means "end of record".
John Savard
I agree. Using a character as a delimiter causes all kinds of problems.
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
--
Pete
Douglas Miller
2020-04-14 01:18:12 UTC
Permalink
Post by Peter Flass
...
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
--
Pete
Think about how all this evolved. The first ASCII "text files" were probably punched paper tape generated on teletypes and transmitted to a remote (where possibly a copy of the paper tape "file" was made). These text files needed to be raw images of the codes necessary to print the document. On computers, it was much simpler to be able to directly output a file to a peripheral without having to do any (or much) processing on it. For example, "PIP LST:=FOOBAR.PRN". Certainly, an application is free to invent their own format for storing things in files, but for universal interchange there has to be a standard. Slight platform variations (CR, LF, CR+LF) aside.
Niklas Karlsson
2020-04-16 13:40:29 UTC
Permalink
Post by Douglas Miller
Post by Peter Flass
...
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
Think about how all this evolved. The first ASCII "text files" were probably punched paper tape generated on teletypes and transmitted to a remote (where possibly a copy of the paper tape "file" was made). These text files needed to be raw images of the codes necessary to print the document. On computers, it was much simpler to be able to directly output a file to a peripheral without having to do any (or much) processing on it. For example, "PIP LST:=FOOBAR.PRN". Certainly, an application is free to invent their own format for storing things in files, but for universal interchange there has to be a standard. Slight platform variations (CR, LF, CR+LF) aside.
Ironic that the main text of your posting is all one long line, no
breaks.

Niklas
--
"And the attacks have been totally random, so that rules out border skirmishes
or political disagreements. there's no one obvious to blame."
"Which means they'll blame each other, randomly."
-- Sheridan and Delenn in Babylon 5:"In the Kingdom of the Blind"
maus
2020-04-17 11:00:30 UTC
Permalink
Post by Niklas Karlsson
Post by Douglas Miller
Post by Peter Flass
...
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
Think about how all this evolved. The first ASCII "text files" were probably punched paper tape generated on teletypes and transmitted to a remote (where possibly a copy of the paper tape "file" was made). These text files needed to be raw images of the codes necessary to print the document. On computers, it was much simpler to be able to directly output a file to a peripheral without having to do any (or much) processing on it. For example, "PIP LST:=FOOBAR.PRN". Certainly, an application is free to invent their own format for storing things in files, but for universal interchange there has to be a standard. Slight platform variations (CR, LF, CR+LF) aside.
Ironic that the main text of your posting is all one long line, no
breaks.
Niklas
comes from groups.google.com. Irritating.
--
greymaus
Niklas Karlsson
2020-04-18 09:56:54 UTC
Permalink
Post by maus
Post by Niklas Karlsson
Ironic that the main text of your posting is all one long line, no
breaks.
comes from groups.google.com Irritating.
I see. If I were to post through there, I'd probably type up my post in
a text editor that wrapped at 72-80 cols or so and then paste it into
google.

Niklas
--
I always buy white autos only. That way, the cops call in a speeding
red car once I've gone by, but the ones at the roadblocks see a blue
one coming their way.
-- J.D. Baldwin
J. Clarke
2020-04-14 01:37:58 UTC
Permalink
On Mon, 13 Apr 2020 17:29:53 -0700, Peter Flass
Post by Peter Flass
Post by Peter Flass
Post by Quadibloc
Post by Gareth Evans
But, if device behaviours are to be encoded in
printable text, surely the end-of-line marker
should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
It is certainly true that, on the System/360 time-sharing system that I used
first, on ASCII terminals, one pressed the carriage return key, and not the line
feed key, to end a line, because it was certain to be present, and when both
were present, it was usually larger or more conveniently placed.
And the Macintosh, at least originally, used CR rather than LF to separate lines
in its text files. MS-DOS used <CR><LF>. So I do find CR to be a bit more
natural than LF.
However, in my opinion, using _any_ character to delimit lines of text in a text
file is a bad idea. Instead of a text file being a bunch of C strings (which, of
course, are ended by NUL) it should be a bunch of Pascal strings (n, followd by
n characters).
Two benefits.
Better protection against buffer overflows.
No strange problems when one creates a file where binary data and text are
mixed, since now all 256 bytes are legal, no value means "end of record".
John Savard
I agree. Using a character as a delimiter causes all kinds of problems.
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The difference is that a one-bit error in a CRLF means that you have
an extra-long line with a garbage character in the middle. A one-bit
error in a count field means that the counts for the entire rest of
the message are garbled and you don't know where the end of it is.
Quadibloc
2020-04-14 21:51:05 UTC
Permalink
Post by J. Clarke
The difference is that a one-bit error in a CRLF means that you have
an extra long line with a garbage character in the middle A one-bit
error in a count field means that the counts for the entire rest of
the message are garbled and you don't know where the end of it is.
I definitely agree that no one should try to invent a TTY terminal that uses
count fields instead of CR and LF characters for use in RTTY.

On computer disk files, however, every disk block has a CRC field, and so if
there's an error, you won't be able to see the data that is in error anyways.

John Savard
Bob Eager
2020-04-14 07:45:42 UTC
Permalink
Post by Peter Flass
Post by Peter Flass
Post by Quadibloc
But, if device behaviours are to be encoded in printable text, surely
the end-of-line marker should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
It is certainly true that, on the System/360 time-sharing system that
I used first, on ASCII terminals, one pressed the carriage return key,
and not the line feed key, to end a line, because it was certain to be
present, and when both were present, it was usually larger or more
conveniently placed.
And the Macintosh, at least originally, used CR rather than LF to
separate lines in its text files. MS-DOS used <CR><LF>. So I do find
CR to be a bit more natural than LF.
However, in my opinion, using _any_ character to delimit lines of text
in a text file is a bad idea. Instead of a text file being a bunch of
C strings (which, of course, are ended by NUL) it should be a bunch of
Pascal strings (n, followd by n characters).
Two benefits.
Better protection against buffer overflows.
No strange problems when one creates a file where binary data and text
are mixed, since now all 256 bytes are legal, no value means "end of
record".
John Savard
I agree. Using a character as a delimiter causes all kinds of problems.
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org
Gareth Evans
2020-04-14 10:38:24 UTC
Permalink
Post by Bob Eager
Post by Peter Flass
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
I have yet to encounter a line of any length that exceeds
65,535 characters! :-)
Carlos E.R.
2020-04-14 13:28:56 UTC
Permalink
Post by Gareth Evans
Post by Bob Eager
Post by Peter Flass
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
I have yet to encounter a line of any length that exceeds
65,535 characters!  :-)
I have seen them. Rare, but I have seen them. :-)
--
Cheers, Carlos.
Scott
2020-04-14 15:27:16 UTC
Permalink
On Tue, 14 Apr 2020 11:38:24 +0100, Gareth Evans
Post by Gareth Evans
Post by Bob Eager
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
I have yet to encounter a line of any length that exceeds
65,535 characters! :-)
Physically, no, of course. On disk? Yes.

It wasn't that long ago that I laid out the basic structure of an
image processing suite. I decided that the value of a few bytes of
memory exceeded the value of handling images larger than 64k pixels
square. Ridiculous. I could not foresee any such thing ever being
useful. Fast forward a few decades and, well...that was dumb, wasn't
it?
Gareth Evans
2020-04-14 16:37:21 UTC
Permalink
Post by Scott
On Tue, 14 Apr 2020 11:38:24 +0100, Gareth Evans
Post by Gareth Evans
Post by Bob Eager
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
I have yet to encounter a line of any length that exceeds
65,535 characters! :-)
Physically, no, of course. On disk? Yes.
It wasn't that long ago that I laid out the basic structure of an
image processing suite. I decided that the value of a few bytes of
memory exceeded the value of handling images larger than 64k pixels
square. Ridiculous. I could not foresee any such thing ever being
useful. Fast forward a few decades and, well...that was dumb, wasn't
it?
images are not ASCII text with end of line markers.
Quadibloc
2020-04-14 22:00:12 UTC
Permalink
Post by Gareth Evans
images are not ASCII text with end of line markers.
Indeed. So, on MTS, an operating system where text files either were ISAM files
where each line was from 0 to 255 characters, or sequential files where each
line was from 0 to 32,767 characters (they didn't bother with unsigned)... if a
file was going to contain a binary blob of data, in order to fit, it was turned
into a file where every record was the same length, perhaps 80, 128, or 64
characters.

This would be different from what we're used to in Windows or Unix.

Of course, if MTS can mark a file as LINE or SEQ, then an operating system where
*text* files are usually of either the LINE or SEQ types could also support
_other_ file types.

So you could have PTAP files that delimit records with CR, for example, and
DIRECT files which reserve a chunk of disk space so that any byte can be
directly accessed by position, and return as many bytes as are requested. (If
one calls the "return a record" routine, these could also use the CR as a record
delimiter.)

So while the _default_ text file type would be different from what is used now,
the file type seen on today's computers would still be available (which is
DIRECT with a character delimiter; PTAP is a sequential type for which seek to
character N would be impractical, and that typically hasn't been found to be
needed on today's microcomputer systems).

John Savard
f***@hotmail.com
2020-05-23 00:09:20 UTC
Permalink
Post by Gareth Evans
Post by Scott
On Tue, 14 Apr 2020 11:38:24 +0100, Gareth Evans
Post by Gareth Evans
Post by Bob Eager
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
I have yet to encounter a line of any length that exceeds
65,535 characters! :-)
Physically, no, of course. On disk? Yes.
It wasn't that long ago that I laid out the basic structure of an
image processing suite. I decided that the value of a few bytes of
memory exceeded the value of handling images larger than 64k pixels
square. Ridiculous. I could not foresee any such thing ever being
useful. Fast forward a few decades and, well...that was dumb, wasn't
it?
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. It's only when interpreted by a web browser
that they become... images.

FredW
Quadibloc
2020-05-23 10:37:47 UTC
Permalink
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image formats consist of ASCII text
with end of line markers. An image on the hard disk of a computer is what an
image is. What transmission formats are used by HTML and the WWW has nothing to
do with what images "are".

Of course, converting binaries to a format which transmits only 6 bits per
character is a huge waste of bandwidth, which shows that there is room for
improvement in those protocols. Instead of expanding binaries by over 33%, it
should be possible to limit the expansion to less than 2%.
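
The arithmetic behind those figures, roughly: base64 emits 4 characters
for every 3 input bytes, while a byte-stuffing scheme that escapes only
one byte value adds about 1 byte in 256 on random data. The stuffing
scheme is my own example of how "less than 2%" could be reached; the
post doesn't say which scheme it has in mind.

/* Back-of-envelope overhead comparison (illustrative arithmetic). */
#include <stdio.h>

int main(void)
{
    /* base64: 4 output characters per 3 input bytes.
       Stuffing: roughly 1 extra byte per 256 on uniformly random data. */
    double base64_overhead   = (4.0 / 3.0 - 1.0) * 100.0;  /* ~33.3% */
    double stuffing_overhead = (1.0 / 256.0) * 100.0;      /* ~0.4%  */

    printf("base64:   +%.1f%%\n", base64_overhead);
    printf("stuffing: +%.1f%%\n", stuffing_overhead);
    return 0;
}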

John Savard
J. Clarke
2020-05-23 12:36:15 UTC
Permalink
On Sat, 23 May 2020 03:37:47 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image formats consist of ASCII text
with end of line markers. An image on the hard disk of a computer is what an
image is. What transmission formats are used by the HTML and WWW has nothing to
do with what images "are".
One could argue that a computer program stored in a ZIP file does not
consist of ASCII text with end of line markers, because in the ZIP
file it has been compressed. JPEG and GIF are both compressed
formats.
Post by Quadibloc
Of course, converting binaries to a format which transmits only 6 bits per
character is a huge waste of bandwidth, which shows that there is room for
improvement in those protocols. Instead of expanding binaries by over 33%, it
should be possible to limit the expansion to less than 2%.
John Savard
r***@gmail.com
2020-05-23 12:38:20 UTC
Permalink
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image
nor TIFF nor PCX nor pretty much all of the image formats
Post by Quadibloc
formats consist of ASCII text
with end of line markers. An image on the hard disk of a computer is what an
image is. What transmission formats are used by the HTML and WWW has nothing to
do with what images "are".
Of course, converting binaries to a format which transmits only 6 bits per
character is a huge waste of bandwidth, which shows that there is room for
improvement in those protocols. Instead of expanding binaries by over 33%, it
should be possible to limit the expansion to less than 2%.
John Savard
Dan Espen
2020-05-23 12:58:35 UTC
Permalink
Post by r***@gmail.com
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image
nor TIFF nor PCX nor pretty all of the image formats
Two notable exceptions: XPM and SVG, both are text.

SVG is interesting because it doesn't necessarily contain a bit mapped
image, it can contain vectors, shapes, text.
--
Dan Espen
John Levine
2020-05-23 18:42:52 UTC
Permalink
Post by Quadibloc
The point is that neither the JPEG nor GIF image formats consist of ASCII text
with end of line markers. An image on the hard disk of a computer is what an
image is. What transmission formats are used by the HTML and WWW has nothing to
do with what images "are".
Of course, converting binaries to a format which transmits only 6 bits per
character is a huge waste of bandwidth, which shows that there is room for
improvement in those protocols. Instead of expanding binaries by over 33%, it
should be possible to limit the expansion to less than 2%.
There are plenty of ways to send mail attachments without b64
encoding, e.g. BINARYMIME and CHUNKING, but for the most part nobody
cares. People who ship around giant files use dropbox and the like.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Jorgen Grahn
2020-05-23 19:28:59 UTC
Permalink
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image formats consist of ASCII text
with end of line markers. An image on the hard disk of a computer is what an
image is. What transmission formats are used by the HTML and WWW has nothing to
do with what images "are".
Of course, converting binaries to a format which transmits only 6 bits per
character is a huge waste of bandwidth, which shows that there is room for
improvement in those protocols.
Since the example was web browsers above, the protocol is HTTP, and it
doesn't use base64. It's designed so that you can feed the image file
as-is over the protocol.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Bob Eager
2020-05-23 20:27:02 UTC
Permalink
Post by Jorgen Grahn
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image formats consist of
ASCII text with end of line markers. An image on the hard disk of a
computer is what an image is. What transmission formats are used by the
HTML and WWW has nothing to do with what images "are".
Of course, converting binaries to a format which transmits only 6 bits
per character is a huge waste of bandwidth, which shows that there is
room for improvement in those protocols.
Since the example was web browsers above, the protocol is HTTP, and it
doesn't use base64. It's designed so that you can feed the image file
as-is over the protocol.
See: Transfer-Encoding: chunked
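
For illustration, roughly what chunked framing looks like when a server
emits it: each chunk is its size in hex, CRLF, the raw bytes, CRLF, and
a zero-length chunk ends the body, so payload bytes go through
untranslated. The data here is made up for the example.

/* Sketch: write a body using HTTP/1.1 chunked framing. */
#include <stdio.h>
#include <string.h>

static void send_chunk(const unsigned char *p, size_t n)
{
    printf("%zX\r\n", n);               /* chunk size in hex */
    fwrite(p, 1, n, stdout);            /* raw bytes, untranslated */
    fputs("\r\n", stdout);
}

int main(void)
{
    const char *piece = "GIF89a...";    /* stand-in for image bytes */
    send_chunk((const unsigned char *)piece, strlen(piece));
    fputs("0\r\n\r\n", stdout);         /* last chunk, empty trailer */
    return 0;
}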
--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org
Peter Flass
2020-05-23 21:30:50 UTC
Permalink
Post by Bob Eager
Post by Jorgen Grahn
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image formats consist of
ASCII text with end of line markers. An image on the hard disk of a
computer is what an image is. What transmission formats are used by the
HTML and WWW has nothing to do with what images "are".
Of course, converting binaries to a format which transmits only 6 bits
per character is a huge waste of bandwidth, which shows that there is
room for improvement in those protocols.
Since the example was web browsers above, the protocol is HTTP, and it
doesn't use base64. It's designed so that you can feed the image file
as-is over the protocol.
See: Transfer-Encoding: chunked
I was trying to recall when I last used chunking - it was when I was
writing large files to floppies, some years ago.
--
Pete
Kerr-Mudd,John
2020-05-24 10:21:21 UTC
Permalink
Post by Peter Flass
Post by Bob Eager
Post by Jorgen Grahn
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64
encoded, or otherwise transmitted. Its only when interpreted by a
web browser that they become... images.
The point is that neither the JPEG nor GIF image formats consist of
ASCII text with end of line markers. An image on the hard disk of a
computer is what an image is. What transmission formats are used by
the HTML and WWW has nothing to do with what images "are".
Of course, converting binaries to a format which transmits only 6
bits per character is a huge waste of bandwidth, which shows that
there is room for improvement in those protocols.
Since the example was web browsers above, the protocol is HTTP, and
it doesn't use base64. It's designed so that you can feed the image
file as-is over the protocol.
See: Transfer-Encoding: chunked
I was trying to recall when I last used chunking - it was when I was
writing large files to floppies, some years ago.
Ah DOS slice & splice.
https://www.uselesssoftware.com/download/slice-zip
--
Bah, and indeed, Humbug.
Jorgen Grahn
2020-05-24 06:06:02 UTC
Permalink
Post by Bob Eager
Post by Jorgen Grahn
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image formats consist of
ASCII text with end of line markers. An image on the hard disk of a
computer is what an image is. What transmission formats are used by the
HTML and WWW has nothing to do with what images "are".
Of course, converting binaries to a format which transmits only 6 bits
per character is a huge waste of bandwidth, which shows that there is
room for improvement in those protocols.
Since the example was web browsers above, the protocol is HTTP, and it
doesn't use base64. It's designed so that you can feed the image file
as-is over the protocol.
See: Transfer-Encoding: chunked
That is another way, yes, but you don't have to use it, and if you
have the whole image file already (so you know its size) "chunked"
is not the best choice.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Bob Eager
2020-05-24 10:45:12 UTC
Permalink
Post by Bob Eager
Post by Jorgen Grahn
Post by Quadibloc
Post by f***@hotmail.com
Post by Gareth Evans
images are not ASCII text with end of line markers.
Oddly, Gareth, a lot of images are exactly that. When base64 encoded,
or otherwise transmitted. Its only when interpreted by a web browser
that they become... images.
The point is that neither the JPEG nor GIF image formats consist of
ASCII text with end of line markers. An image on the hard disk of a
computer is what an image is. What transmission formats are used by
the HTML and WWW has nothing to do with what images "are".
Of course, converting binaries to a format which transmits only 6
bits per character is a huge waste of bandwidth, which shows that
there is room for improvement in those protocols.
Since the example was web browsers above, the protocol is HTTP, and it
doesn't use base64. It's designed so that you can feed the image file
as-is over the protocol.
See: Transfer-Encoding: chunked
That is another way, yes, but you don't have to use it, and if you have
the whole image file already (so you know its size) "chunked"
is not the best choice.
Indeed. But it shows that the statement "HTTP doesn't use base64" is a
little careless. It is part of the protocol.
--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org
Quadibloc
2020-05-24 14:03:37 UTC
Permalink
Post by Jorgen Grahn
Since the example was web browsers above, the protocol is HTTP, and it
doesn't use base64. It's designed so that you can feed the image file
as-is over the protocol.
I vaguely remember reading something about the Internet ages ago that said it
was equipped to send binary data by doing this:

Everything in the binary data was sent as is, except for the character DLE. That
was sent as DLE DLE.

So when a special function needed to be sent, it was sent as DLE (something
else). End of record might be DLE CR or DLE RS or some such thing.
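
A minimal sketch of that DLE-stuffing idea: data bytes pass through
unchanged, a literal DLE is doubled, and control functions are sent as
DLE plus another character. The DLE CR end-of-record is one of the
guesses above, not a specific protocol.

/* Sketch of DLE transparency (illustrative, not a specific protocol). */
#include <stdio.h>
#include <stddef.h>

#define DLE 0x10
#define CR  0x0D

static void send_byte(unsigned char b) { putchar(b); }

static void send_data(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] == DLE)
            send_byte(DLE);             /* double a literal DLE */
        send_byte(p[i]);
    }
}

static void send_end_of_record(void)
{
    send_byte(DLE);                     /* DLE CR marks end of record here */
    send_byte(CR);
}

int main(void)
{
    unsigned char msg[] = { 'H', 'i', DLE, '!' };
    send_data(msg, sizeof msg);
    send_end_of_record();
    return 0;
}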

John Savard
J. Clarke
2020-05-24 14:43:44 UTC
Permalink
On Sun, 24 May 2020 07:03:37 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Jorgen Grahn
Since the example was web browsers above, the protocol is HTTP, and it
doesn't use base64. It's designed so that you can feed the image file
as-is over the protocol.
I vaguely remember reading something about the Internet ages ago that said it
Everything in the binary data was sent as is, except for the character DLE. That
was sent as DLE DLE.
So when a special function needed to be sent, it was sent as DLE (something
else). End of record might be DLE CR or DLE RS or some such thing.
I believe that this is part of the ASCII spec--Octets following DLE
are to be passed as binary data. There is no defined way to end the
binary--that would be device dependent.
Post by Quadibloc
John Savard
Mike Spencer
2020-04-14 20:49:51 UTC
Permalink
Post by Gareth Evans
Post by Bob Eager
Besides, thinking about this, what's the distinction between a
two-byte count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
I have yet to encounter a line of any length that exceeds
65,535 characters! :-)
I haven't made an actual byte count but on occasions when I've saved
some random web page in order to figure out why it didn't render
properly or the like, I see that many "pages" appear not to have any
(many?) line breaks. Markup, js, style blocks and content all cheek
by serif. HTML is rendered according to the markup and \n is
superfluous. Appears that many pages built on the fly by bots leave
them out. Voila! (potentially) Len > 2^16.

(For a human trying to read such a page, s/>/>\n\n/g.)
--
Mike Spencer Nova Scotia, Canada
Peter Flass
2020-04-14 13:29:25 UTC
Permalink
Post by Bob Eager
Post by Peter Flass
Post by Peter Flass
Post by Quadibloc
But, if device behaviours are to be encoded in printable text, surely
the end-of-line marker should not be LF, but CR, to reflect, for
example, the behaviour of typewriters?
It is certainly true that, on the System/360 time-sharing system that
I used first, on ASCII terminals, one pressed the carriage return key,
and not the line feed key, to end a line, because it was certain to be
present, and when both were present, it was usually larger or more
conveniently placed.
And the Macintosh, at least originally, used CR rather than LF to
separate lines in its text files. MS-DOS used <CR><LF>. So I do find
CR to be a bit more natural than LF.
However, in my opinion, using _any_ character to delimit lines of text
in a text file is a bad idea. Instead of a text file being a bunch of
C strings (which, of course, are ended by NUL) it should be a bunch of
Pascal strings (n, followd by n characters).
Two benefits.
Better protection against buffer overflows.
No strange problems when one creates a file where binary data and text
are mixed, since now all 256 bytes are legal, no value means "end of
record".
John Savard
I agree. Using a character as a delimiter causes all kinds of problems.
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
Do you want a 16,000 character line?
--
Pete
Ahem A Rivet's Shot
2020-04-18 07:42:36 UTC
Permalink
On Tue, 14 Apr 2020 06:29:25 -0700
Post by Peter Flass
Post by Bob Eager
Post by Peter Flass
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
Do you want a 16,000 character line?
They happen in systems that use line delimiters, therefore some
people want them. The most common cases are serialised structured data and
marked up text where lines are not significant in the application. Arguably
neither are text but rather data in a text encoding which works as a way of
storing text too.
--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
Peter Flass
2020-04-18 18:33:31 UTC
Permalink
Post by Ahem A Rivet's Shot
On Tue, 14 Apr 2020 06:29:25 -0700
Post by Peter Flass
Post by Bob Eager
Post by Peter Flass
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
Do you want a 16,000 character line?
They happen in systems that use line delimiters therefore some
people want them. The most common cases are serialised structured data and
marked up text where lines are not significant in the application.
If line breaks are not significant, why not include them so the file will
be human-readable? Sometimes you want to look at this stuff to debug
something.
Post by Ahem A Rivet's Shot
Arguably
neither are text but rather data in a text encoding which works as a way of
storing text too.
--
Pete
Charlie Gibbs
2020-04-18 19:11:04 UTC
Permalink
Post by Peter Flass
Post by Ahem A Rivet's Shot
On Tue, 14 Apr 2020 06:29:25 -0700
Post by Peter Flass
Post by Bob Eager
Post by Peter Flass
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
Do you want a 16,000 character line?
They happen in systems that use line delimiters therefore some
people want them. The most common cases are serialised structured data and
marked up text where lines are not significant in the application.
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
I've seen HTML and XML files with no line delimiters at all.
I suspect that whoever created them wants them to be write-only.
(I wrote a routine to insert line breaks so I could easily read them
to figure out what they're up to or to glean useful information.)
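
A crude stand-in for that kind of routine (not Charlie's actual code):
copy the input through and add a newline after every '>', so
run-together HTML or XML comes out one tag per line.

/* Sketch: break run-on markup after each '>' for human reading. */
#include <stdio.h>

int main(void)
{
    int c;
    while ((c = getchar()) != EOF) {
        putchar(c);
        if (c == '>')
            putchar('\n');
    }
    return 0;
}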
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
Dan Espen
2020-04-18 22:32:15 UTC
Permalink
Post by Charlie Gibbs
Post by Peter Flass
Post by Ahem A Rivet's Shot
On Tue, 14 Apr 2020 06:29:25 -0700
Post by Peter Flass
Post by Bob Eager
Post by Peter Flass
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
Do you want a 16,000 character line?
They happen in systems that use line delimiters therefore some
people want them. The most common cases are serialised structured data and
marked up text where lines are not significant in the application.
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
I've seen HTML and XML files with no line delimiters at all.
I suspect that whoever created them wants them to be write-only.
(I wrote a routine to insert line breaks so I could easily read them
to figure out what they're up to or to glean useful information.)
Yep, all those WhizzyWig html editors do that.
It's a shame, hand written HTML is much more fun.
--
Dan Espen
Scott Lurndal
2020-04-19 01:55:14 UTC
Permalink
Post by Charlie Gibbs
Post by Peter Flass
Post by Ahem A Rivet's Shot
On Tue, 14 Apr 2020 06:29:25 -0700
Post by Peter Flass
Post by Bob Eager
Besides, thinking about this, what’s the distinction between a two-byte
count field and a two-byte (CRLF) delimiter?
The latter can handle a line of any length.
Do you want a 16,000 character line?
They happen in systems that use line delimiters therefore some
people want them. The most common cases are serialised structured data and
marked up text where lines are not significant in the application.
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
I've seen HTML and XML files with no line delimiters at all.
I suspect that whoever created them wants them to be write-only.
(I wrote a routine to insert line breaks so I could easily read them
to figure out what they're up to or to glean useful information.)
There are many XML pretty-printers out there that will make it
more readable (indenting nested tags, e.g.)
Scott Lurndal
2020-04-18 20:03:31 UTC
Permalink
Post by Peter Flass
Post by Ahem A Rivet's Shot
They happen in systems that use line delimiters therefore some
people want them. The most common cases are serialised structured data and
marked up text where lines are not significant in the application.
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
That's what editors and other tools are for. Why waste the disk space
when a 200MB disk is incredibly expensive? Burroughs had DMPALL,
Unix had od (and more recently, vim comes with xxd) or 'cat -v', etc.
Ahem A Rivet's Shot
2020-04-18 21:44:35 UTC
Permalink
On Sat, 18 Apr 2020 11:33:31 -0700
Post by Peter Flass
Post by Ahem A Rivet's Shot
They happen in systems that use line delimiters therefore some
people want them. The most common cases are serialised structured data
and marked up text where lines are not significant in the application.
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Why waste the space storing or transmitting newlines? You can
always use a pretty-printer to view the data if you want to, especially
since you'll almost certainly want indentation as well as newlines to make
it comfortably readable.
--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
Jorgen Grahn
2020-04-19 08:46:34 UTC
Permalink
Post by Ahem A Rivet's Shot
On Sat, 18 Apr 2020 11:33:31 -0700
Post by Peter Flass
Post by Ahem A Rivet's Shot
They happen in systems that use line delimiters therefore some
people want them. The most common cases are serialised structured data
and marked up text where lines are not significant in the application.
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Why waste the space storing or transmitting newlines, you can
always use a pretty-printer to view the data if you want to, especially
since you'll almost certainly want indentation as well as newlines to make
it comfortably readable.
If it doesn't need to be readable without tools, you might as well
format it properly and then gzip it. That would be a far bigger
saving (for typical XML, anyway).

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
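
A rough sketch of that trade-off (Python assumed, sample text made up):

    import gzip

    # Nicely formatted, repetitive XML-ish text ...
    pretty = ("<record>\n  <name>example</name>\n  <value>42</value>\n"
              "</record>\n" * 1000).encode()
    packed = gzip.compress(pretty)
    # ... usually shrinks by a large factor once gzipped.
    print(len(pretty), len(packed))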
Ahem A Rivet's Shot
2020-04-19 09:58:08 UTC
Permalink
On 19 Apr 2020 08:46:34 GMT
Post by Jorgen Grahn
Post by Ahem A Rivet's Shot
On Sat, 18 Apr 2020 11:33:31 -0700
Post by Peter Flass
Post by Ahem A Rivet's Shot
They happen in systems that use line delimiters therefore
some people want them. The most common cases are serialised
structured data and marked up text where lines are not significant
in the application.
If line breaks are not significant, why not include them so the file
will be human-readable. Sometime you want to look at this stuff to
debug something.
Why waste the space storing or transmitting newlines, you can
always use a pretty-printer to view the data if you want to, especially
since you'll almost certainly want indentation as well as newlines to
make it comfortably readable.
If it doesn't need to be readable without tools, you might as well
format it properly and then gzip it. That would be a far bigger
saving (for typical XML, anyway).
Sure, if you want the processing overhead of compression and
decompression, which for serialised data or RPC you usually don't!
--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
John Levine
2020-04-18 22:22:36 UTC
Permalink
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
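
As an aside, wrapping to whatever window you happen to have really is a
one-liner in most languages; a Python sketch:

    import shutil, textwrap

    text = "Computers are really good at wrapping text on the fly. " * 10
    width = shutil.get_terminal_size().columns   # whatever window we have now
    print(textwrap.fill(text, width=width))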
J. Clarke
2020-04-18 22:55:05 UTC
Permalink
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by John Levine
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have somehow locked
themselves into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two
2-megapixel 21 inch displays. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.

And people who design web sites for a living generally don't type
HTML--they use applications such as DreamWeaver that let them do most
of the work graphically. For that matter many web sites aren't
actually written in HTML, they generate HTML on the fly.
Peter Flass
2020-04-19 01:03:18 UTC
Permalink
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine <***@taugh.com>
wrote:
In article
<174465144.608927440.165750.peter_flass-***@news.eternal-september.org>,
Peter Flass <***@yahoo.com> wrote:
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which we'll be
looking at them. Computers are really good at wrapping text on the fly.

I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm using a
49 inch 8 megapixel display. At work I typically use two two megapixel 21
inch displayes. And it is possible to buy from a mass market vendor a 75
inch 16 megapixel display. The same web site has to work on all of those,
and on a cell phone, and on a laptop, and on a tablet. This means that
formatting has to be quite flexible.

I don’t have anything that big, but I like to have a bunch of smaller
windows open rather than a really big one. I think it would be very hard to
read across a line of text several hundred characters long.

And people who design web sites for a living generally don't type
HTML--they use applications such as DreamWeaver that let them do most of
the work graphically. For that matter many web sites aren't actually
written in HTML, they generate HTML on the fly.

Which explains why so many websites are crap.

(sorry, I lost the indentations)
--
Pete
Jorgen Grahn
2020-04-19 08:55:31 UTC
Permalink
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by John Levine
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
J. Clarke
2020-04-19 12:16:44 UTC
Permalink
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by John Levine
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Post by Jorgen Grahn
/Jorgen
Dan Espen
2020-04-19 13:19:43 UTC
Permalink
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by John Levine
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Seems to me I do that "research" every time I try to read text on a wide
screen. Maybe you can do it, but for me after about 80-90 characters
it becomes hard to follow the line.
--
Dan Espen
J. Clarke
2020-04-19 14:24:28 UTC
Permalink
Post by Dan Espen
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by John Levine
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Seems to me I do that "research" every time I try to read text on a wide
screen. Maybe you can do it, but for me after about 80-90 characters
it becomes hard to follow the line.
How about if the line is 80 characters with hardcoded line breaks, but
shortened to 60 characters?

For some purposes CR/LF is a paragraph break, not a line break. I
suspect that if they had had a crystal ball the early architects would
have included a "paragraph break" character, but we do the best we can
with what we have.
Peter Flass
2020-04-19 17:47:48 UTC
Permalink
Post by J. Clarke
Post by Dan Espen
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by Peter Flass
In article
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Seems to me I do that "research" every time I try to read text on a wide
screen. Maybe you can do it, but for me after about 80-90 characters
it becomes hard to follow the line.
How about if the line is 80 characters with hardcoded line breaks, but
shortened to 60 characters?
For some purposes CR/LF is a paragraph break, not a line break. I
suspect that if they had had a crystal ball the early architects would
have included a "paragraph break" character, but we do the best we can
with what we have.
We have GS, US, and RS characters that no one’s using.
--
Pete
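
A small sketch (Python assumed, data made up) of what using those separators
might look like; the codes are FS=0x1C, GS=0x1D, RS=0x1E, US=0x1F:

    RS, US = "\x1e", "\x1f"   # record separator, unit separator
    records = [["Smith", "Engineering"], ["Jones", "Accounting"]]
    blob = RS.join(US.join(fields) for fields in records)
    parsed = [rec.split(US) for rec in blob.split(RS)]
    assert parsed == records   # out-of-band delimiters, nothing in the data itself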
Scott Lurndal
2020-04-20 16:02:34 UTC
Permalink
Post by J. Clarke
Post by Dan Espen
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by Peter Flass
In article
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Seems to me I do that "research" every time I try to read text on a wide
screen. Maybe you can do it, but for me after about 80-90 characters
it becomes hard to follow the line.
How about if the line is 80 characters with hardcoded line breaks, but
shortened to 60 characters?
For some purposes CR/LF is a paragraph break, not a line break. I
suspect that if they had had a crystal ball the early architects would
have included a "paragraph break" character, but we do the best we can
with what we have.
We have GS, US, and RS characters that no one’s using.
Burroughs systems still use them (to introduce fields in forms mode on
full-screen terminals (albeit emulated in modern days)).
Jan van den Broek
2020-05-05 14:32:24 UTC
Permalink
Sun, 19 Apr 2020 10:47:48 -0700
Peter Flass <***@yahoo.com> schrieb:

[Schnipp]
We have GS, US, and RS characters that no one's using.
I've used them (about ten years ago) and there's a reasonable chance
that this is still running.
--
Jan v/d Broek ***@xs4all.nl

"And all those exclamation marks, you notice? Five? A sure
sign of someone who wears his underpants on his head."
Terry Pratchett -- "Maskerade"
Scott Lurndal
2020-04-19 14:54:12 UTC
Permalink
Post by Dan Espen
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by John Levine
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Seems to me I do that "research" every time I try to read text on a wide
screen. Maybe you can do it, but for me after about 80-90 characters
it becomes hard to follow the line.
Greenbar screens? :-)
Dan Espen
2020-04-19 16:19:28 UTC
Permalink
Post by Scott Lurndal
Post by Dan Espen
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by John Levine
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Seems to me I do that "research" every time I try to read text on a wide
screen. Maybe you can do it, but for me after about 80-90 characters
it becomes hard to follow the line.
Greenbar screens? :-)
Greenbar certainly helped scan those 132 or 144 character print lines.
I have a greenbar image I used as a tiled background when I wanted to show a
computer report sample.
--
Dan Espen
Peter Flass
2020-04-19 17:47:47 UTC
Permalink
Post by Dan Espen
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by Peter Flass
In article
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Seems to me I do that "research" every time I try to read text on a wide
screen. Maybe you can do it, but for me after about 80-90 characters
it becomes hard to follow the line.
+1

The IBM 3290 could do up to 160 in one mode, but it had nice features like
a horizontal underline that followed the cursor to mark the current line.
That and a nice block cursor would help you keep track of where you were.
--
Pete
Quadibloc
2020-04-19 16:26:20 UTC
Permalink
Post by J. Clarke
Post by Jorgen Grahn
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
I don't know about research, but typesetters have been, since time immemorial,
treating the ideal column width as one and a half alphabets, with widths of up
to two alphabets being acceptable.

And this is, of course, for easier-to-read proportionally-spaced text, rather
than monospaced typewritten text.

There is research on text legibility as well, one could do worse than to start
with _Typographical Printing Surfaces_ by Legros and Grant.

John Savard
Johann 'Myrkraverk' Oskarsson
2020-04-20 09:40:11 UTC
Permalink
Post by Quadibloc
Post by J. Clarke
Post by Jorgen Grahn
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
I don't know about research, but typesetters have been, since time immemorial,
treating the ideal column width as one and a half alphabets, with widths of up
to two alphabets being acceptable.
And this is, of course, for easier-to-read proportionally-spaced text, rather
than monospaced typewritten text.
There is research on text legibility as well, one could do worse than to start
with _Typographical Printing Surfaces_ by Legros and Grant.
I seem to recall research, or quoted research, that said 66 characters
per line was the ideal reading length. Right now I cannot recall the
context, so I don't know if fixed width or variable width was relevant
to that supposed research. I might have gotten it from Knuth's Digital
Typography, but I don't have that book here with me to check.
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
Fred Smith
2020-04-20 23:05:42 UTC
Permalink
Post by Johann 'Myrkraverk' Oskarsson
I seem to recall research, or quoted research, that said 66 characters
per line was the ideal reading length. Right now I cannot recall the
context, so I don't know if fixed width or variable width was relevant
to that supposed research. I might have gotten it from Knuth's Digital
Typography, but I don't have that book here with me to check.
Knuth's influence would explain why TeX/LaTeX has fairly narrow columns
in the default article class. And in \twocolumn mode. Looks great, easy
to read, but produces a lot of white space on the page.
Jorgen Grahn
2020-04-21 06:07:23 UTC
Permalink
Post by Fred Smith
Post by Johann 'Myrkraverk' Oskarsson
I seem to recall research, or quoted research, that said 66 characters
per line was the ideal reading length. Right now I cannot recall the
context, so I don't know if fixed width or variable width was relevant
to that supposed research. I might have gotten it from Knuth's Digital
Typography, but I don't have that book here with me to check.
Knuth's influence would explain why TeX/LaTeX has fairly narrow columns
in the default article class. And in \twocolumn mode.
Probably, but I think he was just applying existing best practices.
Post by Fred Smith
Looks great, easy to read, but produces a lot of white space on the
page.
That effect is bad enough on A4-sized paper; must be even worse on
Letter. Perhaps \twocolumn should be used more often.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Rich Alderson
2020-04-22 19:08:19 UTC
Permalink
Post by Jorgen Grahn
Post by Fred Smith
Post by Johann 'Myrkraverk' Oskarsson
I seem to recall research, or quoted research, that said 66 characters
per line was the ideal reading length. Right now I cannot recall the
context, so I don't know if fixed width or variable width was relevant
to that supposed research. I might have gotten it from Knuth's Digital
Typography, but I don't have that book here with me to check.
Knuth's influence would explain why TeX/LaTeX has fairly narrow columns
in the default article class. And in \twocolumn mode.
Probably, but I think he was just applying existing best practices.
Post by Fred Smith
Looks great, easy to read, but produces a lot of white space on the
page.
That effect is bad enough on A4-sized paper; must be even worse on
Letter. Perhaps \twocolumn should be used more often.
Knuth was writing in the context of scientific/mathematical literature, where
wide margins are the norm so that notes can be added by the reader.
--
Rich Alderson ***@alderson.users.panix.com
Audendum est, et veritas investiganda; quam etiamsi non assequamur,
omnino tamen proprius, quam nunc sumus, ad eam perveniemus.
--Galen
Jorgen Grahn
2020-04-22 20:22:49 UTC
Permalink
Post by Rich Alderson
Post by Jorgen Grahn
Post by Fred Smith
Post by Johann 'Myrkraverk' Oskarsson
I seem to recall research, or quoted research, that said 66 characters
per line was the ideal reading length. Right now I cannot recall the
context, so I don't know if fixed width or variable width was relevant
to that supposed research. I might have gotten it from Knuth's Digital
Typography, but I don't have that book here with me to check.
Knuth's influence would explain why TeX/LaTeX has fairly narrow columns
in the default article class. And in \twocolumn mode.
Probably, but I think he was just applying existing best practices.
Post by Fred Smith
Looks great, easy to read, but produces a lot of white space on the
page.
That effect is bad enough on A4-sized paper; must be even worse on
Letter. Perhaps \twocolumn should be used more often.
Knuth was writing in the context of scientific/mathematical literature, where
wide margins are the norm so that notes can be added by the reader.
I picked up the closest piece of fiction I had handy (a 1952 SF
paperback) and counted a randomly selected line: 59 characters.

Also:
https://en.wikipedia.org/wiki/Line_length

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Dan Espen
2020-04-22 23:40:50 UTC
Permalink
Post by Rich Alderson
Post by Jorgen Grahn
Post by Fred Smith
Post by Johann 'Myrkraverk' Oskarsson
I seem to recall research, or quoted research, that said 66 characters
per line was the ideal reading length. Right now I cannot recall the
context, so I don't know if fixed width or variable width was relevant
to that supposed research. I might have gotten it from Knuth's Digital
Typography, but I don't have that book here with me to check.
Knuth's influence would explain why TeX/LaTeX has fairly narrow columns
in the default article class. And in \twocolumn mode.
Probably, but I think he was just applying existing best practices.
Post by Fred Smith
Looks great, easy to read, but produces a lot of white space on the
page.
That effect is bad enough on A4-sized paper; must be even worse on
Letter. Perhaps \twocolumn should be used more often.
Knuth was writing in the context of scientific/mathematical literature, where
wide margins are the norm so that notes can be added by the reader.
I was working on an installation guide. I took over writing the
documentation from the tech writers. I put everything I could in the
help screens and massively simplified the process so the installation
guide went from book sized to pamphlet sized. The tech writers wanted
their job back and they claimed their first improvement was going to be
adding white space to the document.

I managed to pull a few levers and remove the tech writers from the
project.

Most us learn not to write in the margins early in our education.

Can you tell I'm not a fan of white space?
--
Dan Espen
Quadibloc
2020-04-23 06:23:36 UTC
Permalink
Post by Rich Alderson
Knuth was writing in the context of scientific/mathematical literature, where
wide margins are the norm so that notes can be added by the reader.
The scientific community learned its lesson after Fermat's last theorem?

John Savard
r***@gmail.com
2020-04-23 10:01:31 UTC
Permalink
Post by Quadibloc
Post by Rich Alderson
Knuth was writing in the context of scientific/mathematical literature, where
wide margins are the norm so that notes can be added by the reader.
The scientific community learned its lession after Fermat's last theorem?
There was a need for wider margins!

Had there not been wide margins, we would not have
seen this printed statement
"The nuns have been violating their cows."
and its penned response:
"Better send them a Papal Bull."
Peter Flass
2020-04-19 17:47:46 UTC
Permalink
Post by J. Clarke
Post by Jorgen Grahn
Post by J. Clarke
On Sat, 18 Apr 2020 22:22:36 -0000 (UTC), John Levine
Post by Peter Flass
In article
Post by Peter Flass
If line breaks are not significant, why not include them so the file will
be human-readable. Sometime you want to look at this stuff to debug
something.
Because we don't know how big the screens or windows are on which
we'll be looking at them. Computers are really good at wrapping text
on the fly.
I think that this is something that people who have managed somehow to
stick their mindset into the 80x25 mindset don't get. Right now I'm
using a 49 inch 8 megapixel display. At work I typically use two two
megapixel 21 inch displayes. And it is possible to buy from a mass
market vendor a 75 inch 16 megapixel display. The same web site has
to work on all of those, and on a cell phone, and on a laptop, and on
a tablet. This means that formatting has to be quite flexible.
It's still a fact that it's hard to read text wider than 60--70
characters; computers can't fix that.
Do you have research to support that statement?
Post by Jorgen Grahn
/Jorgen
I’m sure there is some. I’d like to see it. If I get time I’ll search.
--
Pete
Ahem A Rivet's Shot
2020-04-13 18:54:24 UTC
Permalink
On Sun, 12 Apr 2020 22:11:41 -0700 (PDT)
Post by Quadibloc
However, in my opinion, using _any_ character to delimit lines of text in
a text file is a bad idea. Instead of a text file being a bunch of C
strings (which, of course, are ended by NUL) it should be a bunch of
Pascal strings (n, followed by n characters).
Typically a text file is a bunch of characters usually including
line terminators, not a bunch of strings of any type.
--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
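
A minimal sketch of the length-prefixed ("Pascal string") layout being
discussed, assuming Python and a two-byte count field like the one mentioned
upthread; the file layout details here are ours, not any particular system's:

    import struct

    def write_lines(path, lines):
        # Each record: a 2-byte little-endian length, then that many bytes.
        with open(path, "wb") as f:
            for line in lines:
                data = line.encode()
                f.write(struct.pack("<H", len(data)) + data)

    def read_lines(path):
        out = []
        with open(path, "rb") as f:
            while (hdr := f.read(2)):
                (n,) = struct.unpack("<H", hdr)
                out.append(f.read(n).decode())
        return out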
Carlos E.R.
2020-04-14 03:24:53 UTC
Permalink
Post by Gareth Evans
Both CR and LF are command characters that reflect the
mechanical make-up of some printing devices, the
ASR / KSR  33 / 35 perhaps being the most common at
the time of the creation of the code.
But where there are those who argue that device
characteristics  should be hidden away in the
device drivers and not be embedded in printable
text, should not the end of a line be marked by ETX
and the end of a file by EOT and not ^Z?
You are confusing the behaviour of complex printer drivers with plain
simple text printers (with no drivers).
--
Cheers, Carlos.
Ahem A Rivet's Shot
2020-04-18 07:44:48 UTC
Permalink
On Tue, 14 Apr 2020 05:24:53 +0200
Post by Carlos E.R.
You are confusing the behaviour of complex printer drivers with plain
simple text printers (with no drivers).
Plain simple text printers usually do have drivers, they're just
very simple and support an extremely wide range of printers. They can
usually be configured to handle CR/LF/CRLF translations as well as
inserting NULs and/or delays to accommodate hardware restrictions. The unix
tty driver is a good example.
--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
Carlos E.R.
2020-04-18 10:08:23 UTC
Permalink
Post by Ahem A Rivet's Shot
On Tue, 14 Apr 2020 05:24:53 +0200
Post by Carlos E.R.
You are confusing the behaviour of complex printer drivers with plain
simple text printers (with no drivers).
Plain simple text printers usually do have drivers, they're just
very simple and support an extremely wide range of printers. They can
usually be configured to handle CR/LF/CRLF translations as well as
inserting NULs and/or delays to accommodate hardware restrictions. The unix
tty driver is a good example.
I used my 9 pin printers in MsDOS directly, file copy to printer port.
No driver at all. Worked perfectly. :-)

I have done the same from unix, but via rs232 port. Had to flip a switch
in the printer to handle cr/lf.
--
Cheers, Carlos.
Ahem A Rivet's Shot
2020-04-18 11:02:48 UTC
Permalink
On Sat, 18 Apr 2020 12:08:23 +0200
Post by Carlos E.R.
Post by Ahem A Rivet's Shot
On Tue, 14 Apr 2020 05:24:53 +0200
Post by Carlos E.R.
You are confusing the behaviour of complex printer drivers with plain
simple text printers (with no drivers).
Plain simple text printers usually do have drivers, they're just
very simple and support an extremely wide range of printers. They can
usually be configured to handle CR/LF/CRLF translations as well as
inserting NULs and/or delays to accommodate hardware restrictions. The
unix tty driver is a good example.
I used my 9 pin printers in MsDOS directly, file copy to printer port.
No driver at all. Worked perfectly. :-)
There's a driver; it's built into MSDOS, and it drives the printer port.
I don't think it can do any conversions, BICBW, it's been a long time since
MSDOS.
Post by Carlos E.R.
I have done the same from unix, but via rs232 port. Had to flip a switch
in the printer to handle cr/lf.
The unix tty driver is quite capable of handling the cr/lf
conversions (and a great deal more) - see man stty for details. You could
have configured the driver instead of flipping the switch.

Just because it isn't model specific doesn't mean it isn't a
driver, all devices have drivers.
--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
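
A sketch of the sort of thing stty does, via Python's termios module;
/dev/ttyS0 is a hypothetical serial printer port, and this is only the
output-side NL-to-CRNL mapping (roughly `stty onlcr`):

    import termios

    fd = open("/dev/ttyS0", "rb+", buffering=0).fileno()
    attrs = termios.tcgetattr(fd)   # [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[1] |= termios.OPOST | termios.ONLCR   # on output, map NL to CR-NL
    termios.tcsetattr(fd, termios.TCSANOW, attrs)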
Carlos E.R.
2020-04-18 19:50:13 UTC
Permalink
Post by Ahem A Rivet's Shot
On Sat, 18 Apr 2020 12:08:23 +0200
Post by Carlos E.R.
Post by Ahem A Rivet's Shot
On Tue, 14 Apr 2020 05:24:53 +0200
Post by Carlos E.R.
You are confusing the behaviour of complex printer drivers with plain
simple text printers (with no drivers).
Plain simple text printers usually do have drivers, they're just
very simple and support an extremely wide range of printers. They can
usually be configured to handle CR/LF/CRLF translations as well as
inserting NULs and/or delays to accommodate hardware restrictions. The
unix tty driver is a good example.
I used my 9 pin printers in MsDOS directly, file copy to printer port.
No driver at all. Worked perfectly. :-)
There's a driver, it's built into MSDOS, it drives the printer port,
I don't think it can do any conversions BICBW it's been a long time since
MSDOS.
I don't call that "a driver".

I also printed from my own code, handling the port chip directly. In any
case, I assure you there was no modification between the stream the code
sent and what the printer got, byte by byte.

I could use a library that simply allowed me to send a string to the
printer without having to care about the gory details of the assembler
code that handled the chip. This could be called a "driver" at the time,
but it did no translation at all. It just took a byte, wrote it to the
output port when ready, detected when the port was ready for the next
char, and sent it, until the buffer was empty.

Any conversion had to be done by your program code, like a font change.
Writing a graphic was very entertaining.
Post by Ahem A Rivet's Shot
Post by Carlos E.R.
I have done the same from unix, but via rs232 port. Had to flip a switch
in the printer to handle cr/lf.
The unix tty driver is quite capable of handling the cr/lf
conversions (and a great deal more) - see man stty for details. You could
have configured the driver instead of flipping the switch.
Ha! It was far easier to flip a switch (including reading the manual)
than to find out whether it was possible to do the equivalent on that unix
machine. Even support said to configure the printer.
Post by Ahem A Rivet's Shot
Just because it isn't model specific doesn't mean it isn't a
driver, all devices have drivers.
--
Cheers, Carlos.
Questor
2020-04-18 03:40:29 UTC
Permalink
"We are the programmers who now terminate our text file lines with
ekky-ekky-ekky-ekky-p'tang-zoom-boing!-mrowrrr."
-- the programmers who formerly terminated their text file lines with CRLF

I've seen "religious" wars over operating systems, languages, and editors, but
over text file line terminators? Who would have guessed?

Some people in this discussion are conflating printers and terminals, and there
is a salient distinction to be made there. Given the context of the times and
the number of printing/paper terminals in use then, it is entirely reasonable
that text file lines be terminated with a CRLF pair, so that a file that was
TYPEd would display correctly on a terminal without any implicit processing by
the operating system. As others have noted, it also allows for overstriking.
The other appropriate control characters -- tab, vertical tab, form feed (AKA
page throw), and bell -- had their expected effect as well. In the days when I
was slinging code on PDP10s, it was common practice for programs to accept CR,
LF, CRLF, VT, FF, and ESC as signifying the end of a user's line of input.

This behavior was emulated by video terminals (AKA glass ttys). A bare CR
would return the cursor to the beginning of the line without moving to the next
line; a bare LF would move the cursor to the next line but not horizontally. I
relied on this when I wrote a little joke program that would display the lines
of a text file from right to left instead of left to right, and it worked on
both printing and video terminals.

I once had a customer problem that involved line terminators.

This user had a COBOL program on a PDP10 (I can't remember if it was TOPS10 or
TOPS20, but no matter) that generated a report -- a text file. The file was
transferred over DECnet to a VAX, where it would be printed. The problem was
that when the file was printed, all the blank line spacing was doubled. If
there was supposed to be one blank line there were two, if there were supposed to
be two blank lines there were four, etc.

I don't know COBOL, but the problem arose due to how the "advancing" clause used
in the program's print statements was implemented. The COBOL runtime system
would terminate the first line with a CRLF, and then output the appropriate
number of bare LFs to create the desired number of blank lines. VMS has what I
would call a very aggressive record management system (RMS). When the file was
transferred to the VAX, the RMS somehow recognized the LFs as being separate
lines, but put them each in their own record. So when the file was printed, it
would output the bare LF from the record, and then, because it was a text file
and according to the RMS each record in a text file is terminated by a CRLF, it
would send that as well.

Changing the PDP10 COBOL runtime system was out of the question, as its behavior
was long-established and perhaps relied on by other programs. Changing the VMS
RMS was also a non-starter. I offered a couple of work-arounds, the details of
which are long forgotten. One of them was to use the appropriate VMS command to
change the RMS properties of the transferred file. I think there may have also
been a switch to NFT (the DECnet Network File Transfer utility) that caused the
file to be received by the VAX as a type that would print as desired.
Quadibloc
2020-05-23 10:40:25 UTC
Permalink
Post by Questor
In the days when I
was slinging code on PDP10s, it was common practice for programs to accept CR,
LF, CRLF, VT, FF, and ESC as signifying the end of a user's line of input.
That is so strange. Whatever the operating system may accept as signifying the end
of a user's line of input, by the time that line of input gets to an application
program, the operating system should signal the end of that line in one and only
one way - and so programs would need to handle exactly that way, and no other, to
work properly.

John Savard
J. Clarke
2020-05-23 12:38:30 UTC
Permalink
On Sat, 23 May 2020 03:40:25 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Questor
In the days when I
was slinging code on PDP10s, it was common practice for programs to accept CR,
LF, CRLF, VT, FF, and ESC as signifying the end of a user's line of input.
That is so strange. Whatever the operating system may accept as signifying the end
of a user's line of input, by the time that line of input gets to an applications
program, the operating system should signal the end of that line in one and only
one unique way - and so programs would handle exactly that way and no other to
work properly.
Why is that the responsibility of the operating system? All it is
obligated to do is furnish bytes until the end of data has been
reached. If it starts changing characters it can make a huge mess.
Quadibloc
2020-05-23 15:02:49 UTC
Permalink
Post by J. Clarke
On Sat, 23 May 2020 03:40:25 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Questor
In the days when I
was slinging code on PDP10s, it was common practice for programs to accept CR,
LF, CRLF, VT, FF, and ESC as signifying the end of a user's line of input.
That is so strange. Whatever the operating system may accept as signifying the end
of a user's line of input, by the time that line of input gets to an applications
program, the operating system should signal the end of that line in one and only
one unique way - and so programs would handle exactly that way and no other to
work properly.
Why is that the responsibility of the operating system? All it is
obligated to do is furnish bytes until the end of data has been
reached. If it starts changing characters it can make a huge mess.
The PDP-10 operating system may indeed be more similar to UNIX than it is to
OS/360, so perhaps my comment was sufficiently unclear as to admit confusion.

It is true that under some circumstances, a user program may request characters
from a device of an appropriate kind, and in such a case, one wouldn't want the
operating system to change what characters are received.

However, I was thinking that in the more common case, a user program will
request *records* from a device. While those records might be delimited by ASCII
(or EBCDIC) characters if the device was a character mode terminal or a paper-
tape reader, they would _not_ be delimited by any such thing on a magnetic tape
drive (either inter-record gaps, or the block structure), a punched-card reader,
*or* a disk drive (because the disk files wouldn't look like what they do in
UNIX - records on the disk would look like Pascal strings do, and they also
might be in an ISAM file structure).

And then there are block mode terminals. Some ASCII ones do fake behaving like
ASCII character-mode terminals, because that's what the things they're connected
to are used to. The 3277 display station, for example, doesn't do stuff like
that.

John Savard
J. Clarke
2020-05-23 15:07:35 UTC
Permalink
On Sat, 23 May 2020 08:02:49 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by J. Clarke
On Sat, 23 May 2020 03:40:25 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Questor
In the days when I
was slinging code on PDP10s, it was common practice for programs to accept CR,
LF, CRLF, VT, FF, and ESC as signifying the end of a user's line of input.
That is so strange. Whatever the operating system may accept as signifying the end
of a user's line of input, by the time that line of input gets to an applications
program, the operating system should signal the end of that line in one and only
one unique way - and so programs would handle exactly that way and no other to
work properly.
Why is that the responsibility of the operating system? All it is
obligated to do is furnish bytes until the end of data has been
reached. If it starts changing characters it can make a huge mess.
The PDP-10 operating system may indeed be more similar to UNIX than it is to
OS/360, so perhaps my comment was sufficiently unclear as to admit confusion.
It is true that under some circumstances, a user program may request characters
from a device of an appropriate kind, and in such a case, one wouldn't want the
operating system to change what characters are recieved.
However, I was thinking that in the more common case, a user program will
request *records* from a device. While those records might be delimited by ASCII
(or EBCDIC) characters if the device was a character mode terminal or a paper-
tape reader, they would _not_ be delimited by any such thing on a magnetic tape
drive (either inter-record gaps, or the block structure), a punched-card reader,
*or* a disk drive (because the disk files wouldn't look like what they do in
UNIX - records on the disk would look like Pascal strings do, and they also
might be in an ISAM file structure).
And then there are block mode terminals. Some ASCII ones do fake behaving like
ASCII character-mode terminals, because that's what the things they're connected
to are used to. The 3277 display station, for example, doesn't do stuff like
that.
If the convention is that there are 20 different characters any of
which can be used as end-of-record, again it is not the operating
system's responsibility to change them. If I write a program that
generates records with some specific character as end of record and
the operating system changes that then when my program tries to read
its own records that it generated it breaks.

See the problem?
Quadibloc
2020-05-23 18:27:29 UTC
Permalink
Post by J. Clarke
See the problem?
Certainly, during character mode I/O, the operating system should not modify characters.
Post by J. Clarke
Why is that the responsibility of the operating system? All it is
obligated to do is furnish bytes until the end of data has been
reached.
You're thinking of a different kind of operating system than I was.

I was thinking of operating systems which don't deal much in character mode I/O,
because it's inefficient to go through the layers from application program to
operating system to device driver that often.

Instead, the fundamental unit is the record. A call is made to read a line of
text from a device - and different devices each have their own way of marking
the end of a line of text, many of which don't even *involve* characters.

John Savard
Peter Flass
2020-05-23 19:27:14 UTC
Permalink
Post by J. Clarke
On Sat, 23 May 2020 08:02:49 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by J. Clarke
On Sat, 23 May 2020 03:40:25 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Questor
In the days when I
was slinging code on PDP10s, it was common practice for programs to accept CR,
LF, CRLF, VT, FF, and ESC as signifying the end of a user's line of input.
That is so strange. Whatever the operating system may accept as signifying the end
of a user's line of input, by the time that line of input gets to an applications
program, the operating system should signal the end of that line in one and only
one unique way - and so programs would handle exactly that way and no other to
work properly.
Why is that the responsibility of the operating system? All it is
obligated to do is furnish bytes until the end of data has been
reached. If it starts changing characters it can make a huge mess.
The PDP-10 operating system may indeed be more similar to UNIX than it is to
OS/360, so perhaps my comment was sufficiently unclear as to admit confusion.
It is true that under some circumstances, a user program may request characters
from a device of an appropriate kind, and in such a case, one wouldn't want the
operating system to change what characters are recieved.
However, I was thinking that in the more common case, a user program will
request *records* from a device. While those records might be delimited by ASCII
(or EBCDIC) characters if the device was a character mode terminal or a paper-
tape reader, they would _not_ be delimited by any such thing on a magnetic tape
drive (either inter-record gaps, or the block structure), a punched-card reader,
*or* a disk drive (because the disk files wouldn't look like what they do in
UNIX - records on the disk would look like Pascal strings do, and they also
might be in an ISAM file structure).
And then there are block mode terminals. Some ASCII ones do fake behaving like
ASCII character-mode terminals, because that's what the things they're connected
to are used to. The 3277 display station, for example, doesn't do stuff like
that.
If the convention is that there are 20 different characters any of
which can be used as end-of-record, again it is not the operating
system's responsibility to change them. If I write a program that
generates records with some specific character as end of record and
the operating system changes that then when my program tries to read
its own records that it generated it breaks.
See the problem?
Systems, including AFAIK Unix and Multics have the ability to return a
record “as is” or what Multics calls “canonical” format. For example, if
the user types “abd<bs>c” do you want to see five characters or three?
--
Pete
J. Clarke
2020-05-23 22:21:36 UTC
Permalink
On Sat, 23 May 2020 12:27:14 -0700, Peter Flass
Post by Peter Flass
Post by J. Clarke
On Sat, 23 May 2020 08:02:49 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by J. Clarke
On Sat, 23 May 2020 03:40:25 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Questor
In the days when I
was slinging code on PDP10s, it was common practice for programs to accept CR,
LF, CRLF, VT, FF, and ESC as signifying the end of a user's line of input.
That is so strange. Whatever the operating system may accept as signifying the end
of a user's line of input, by the time that line of input gets to an applications
program, the operating system should signal the end of that line in one and only
one unique way - and so programs would handle exactly that way and no other to
work properly.
Why is that the responsibility of the operating system? All it is
obligated to do is furnish bytes until the end of data has been
reached. If it starts changing characters it can make a huge mess.
The PDP-10 operating system may indeed be more similar to UNIX than it is to
OS/360, so perhaps my comment was sufficiently unclear as to admit confusion.
It is true that under some circumstances, a user program may request characters
from a device of an appropriate kind, and in such a case, one wouldn't want the
operating system to change what characters are recieved.
However, I was thinking that in the more common case, a user program will
request *records* from a device. While those records might be delimited by ASCII
(or EBCDIC) characters if the device was a character mode terminal or a paper-
tape reader, they would _not_ be delimited by any such thing on a magnetic tape
drive (either inter-record gaps, or the block structure), a punched-card reader,
*or* a disk drive (because the disk files wouldn't look like what they do in
UNIX - records on the disk would look like Pascal strings do, and they also
might be in an ISAM file structure).
And then there are block mode terminals. Some ASCII ones do fake behaving like
ASCII character-mode terminals, because that's what the things they're connected
to are used to. The 3277 display station, for example, doesn't do stuff like
that.
If the convention is that there are 20 different characters any of
which can be used as end-of-record, again it is not the operating
system's responsibility to change them. If I write a program that
generates records with some specific character as end of record and
the operating system changes that then when my program tries to read
its own records that it generated it breaks.
See the problem?
Systems, including AFAIK Unix and Multics have the ability to return a
record “as is” or what Multics calls “canonical” format. For example, if
the user types “abd<bs>c” do you want to see five characters or three?
What am "I"? If I am APL I want two characters separated by a
backspace to be mapped to an overstruck APL character if there is one
that matches that combination, for example.
John Levine
2020-05-23 20:10:38 UTC
Permalink
Post by Quadibloc
However, I was thinking that in the more common case, a user program will
request *records* from a device. ...
TOPS-10 handled block devices like DECtape and disks perfectly well
and did reads and writes directly from buffers in the user address
space sort of like IBM mainframe QSAM. But the most common way to use
them was as a stream of bytes ignoring the physical block boundaries.

The usual line terminator was CR+LF, and null bytes in files were
generally ignored as padding. For text files, the standard kludge was
the each line started on a word boundary with five ASCII digits
followed by a tab, with the otherwise unused low bit of the word with
the digits set to indicate that it was a line number of interest to
text editors but generally ignored by other programs.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
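
A rough sketch of stripping the visible part of that convention (five digits
followed by a tab) from a text stream; Python assumed, and the low-bit flag in
the 36-bit word has no counterpart in a plain byte stream, so it is ignored here:

    import re

    def strip_line_numbers(text: str) -> str:
        # Drop a leading "NNNNN<TAB>" sequence number, when present, from each line.
        return "\n".join(re.sub(r"^\d{5}\t", "", line) for line in text.splitlines())

    print(strip_line_numbers("00100\tMOVE A,B\n00200\tADD A,C"))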
Quadibloc
2020-05-24 15:32:38 UTC
Permalink
Post by John Levine
Post by Quadibloc
However, I was thinking that in the more common case, a user program will
request *records* from a device. ...
TOPS-10 handled block devices like DECtape and disks perfectly well
and did reads and writes directly from buffers in the user address
space sort of like IBM mainframe QSAM. But the most common way to use
them was as a stream of bytes ignoring the physical block boundaries.
The usual line terminator was CR+LF, and null bytes in files were
generally ignored as padding. For text files, the standard kludge was
the each line started on a word boundary with five ASCII digits
followed by a tab, with the otherwise unused low bit of the word with
the digits set to indicate that it was a line number of interest to
text editors but generally ignored by other programs.
As the tab is in the next word, it would have to be stripped out...

And what I'm used to is the Michigan Terminal System.

Text files are normally line files.

The format of a line file is:

An ISAM file with two fields.

One field is a 32-bit integer. That field is the primary key, and it is the line
number. It is in units of 0.001, so a file with lines numbered 1, 2, 3, and 4
would have the integer values 1000, 2000, 3000 and 4000 in this field. (This
facilitates line insertion in text editors.)

The other field is a variable-length text field. This contains the line of text
itself. It may be from 0 to 255 characters in length. (This was changed to 0 to
32,767 in a later version of the operating system.)

The text field could contain *any* data, including binary data. The characters
in it were only data and didn't delimit records - that was done out of band. One
text field = one record. So mixing binary data and text was no problem.

This worked so well and was so easy to use and understand that I'm shocked that
today's microcomputers instead expose the user to messiness that can lead to
buffer overflow problems and the like.

John Savard
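
A toy model of the line file described above, assuming Python; the key is the
line number scaled by 1000, the value is an arbitrary byte string, and no
delimiter ever lives inside the data itself:

    lines = {1000: b"first line", 2000: b"second line", 3000: b"third line"}
    lines[1500] = b"inserted between lines 1 and 2"   # insertion without renumbering
    for key in sorted(lines):
        print(key / 1000, lines[key].decode(errors="replace"))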
Charlie Gibbs
2020-05-24 16:25:33 UTC
Permalink
Post by Quadibloc
And what I'm used to is the Michigan Terminal System.
Text files are normally line files.
An ISAM file with two fields.
One field is a 32-bit integer. That field is the primary key, and it is the
line number. It is in units of 0.001, so a file with lines numbered 1, 2, 3,
and 4 would have the integer values 1000, 2000, 3000 and 4000 in this field.
(This facilitates line insertion in text editors.)
The other field is a variable-length text field. This contains the line of
text itself. It may be from 0 to 255 characters in length. (This was changed
to 0 to 32,767 in a later version of the operating system.)
The text field could contain *any* data, including binary data. The
characters in it were only data and didn't delimit records - that was
done out of band. One text field = one record. So mixing binary data and
text was no problem.
This worked so well and was so easy to use and understand that I'm shocked
that today's microcomputers instead expose the user to messiness that can
lead to buffer overflow problems and the like.
It's a matter of parallel evolution. File systems like your MTS example
("Bright college days, oh, carefree days that fly..." -- Tom Lehrer)
were born on mainframe systems that had lots of buffering capability and
processing power. ASCII text files were born on Teletype networks which
had very little processing capability and even less buffering. Given the
constraints of the time, I'd say that each system served its purposes well.
ASCII text files are so much simpler that their limitations are often not
a serious factor - e.g., when transmitting text.

That DLE character mentioned upthread stands for "Data Link Escape",
and it shows that the designers of those Teletype networks were
thinking about the problem.
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
John Levine
2020-05-24 18:29:03 UTC
Permalink
Post by Charlie Gibbs
Post by Quadibloc
And what I'm used to is the Michigan Terminal System.
Text files are normally line files.
An ISAM file with two fields.
Yeah, TSS VISAM files were sort of like that, too.
Post by Charlie Gibbs
It's a matter of parallel evolution. File systems like your MTS example
("Bright college days, oh, carefree days that fly..." -- Tom Lehrer)
were born on mainframe systems that had lots of buffering capability and
processing power. ASCII text files were born on Teletype networks which
had very little processing capability and even less buffering. Given the
constraints of the time, I'd say that each system served its purposes well.
ASCII text files are so much simpler that their limitations are often not
a serious factor - e.g., when transmitting text.
Quite right. IBM mainframes have I/O channels that can only do block
I/O. The only way to attach terminals is via front end systems that
handle the individual characters and turn them into blocks to send to
and from the channel.

The PDP-10 was an overgrown minicomputer that shared the same priority
interrupt and word I/O features with its 18-bit relatives like the
PDP-4, -7, and -9. That made it able to do its own character-at-a-time
I/O and also made it good at real-time applications without extra
kludgery like the 360/44 needed.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Charlie Gibbs
2020-05-25 02:00:52 UTC
Permalink
Post by John Levine
Post by Charlie Gibbs
It's a matter of parallel evolution. File systems like your MTS example
("Bright college days, oh, carefree days that fly..." -- Tom Lehrer)
were born on mainframe systems that had lots of buffering capability and
processing power. ASCII text files were born on Teletype networks which
had very little processing capability and even less buffering. Given the
constraints of the time, I'd say that each system served its purposes well.
ASCII text files are so much simpler that their limitations are often not
a serious factor - e.g., when transmitting text.
Quite right. IBM mainframes have I/O channels that can only do block
I/O. The only way to attach terminals is via front end systems that
handle the individual characters and turn them into blocks to send to
and from the channel.
This is certainly true of character-mode terminals, be they Teletypes
or the CRT terminals you'd find on a minicomputer. However, mainframes
had their own style of terminals which preserved the block-mode paradigm
of other mainframe channels and peripherals. Programming them was Not Fun.
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
r***@gmail.com
2020-05-25 04:32:52 UTC
Permalink
Post by John Levine
Post by Charlie Gibbs
Post by Quadibloc
And what I'm used to is the Michigan Terminal System.
Text files are normally line files.
An ISAM file with two fields.
Yeah, TSS VISAM files were sort of like that, too.
Post by Charlie Gibbs
It's a matter of parallel evolution. File systems like your MTS example
("Bright college days, oh, carefree days that fly..." -- Tom Lehrer)
were born on mainframe systems that had lots of buffering capability and
processing power. ASCII text files were born on Teletype networks which
had very little processing capability and even less buffering. Given the
constraints of the time, I'd say that each system served its purposes well.
ASCII text files are so much simpler that their limitations are often not
a serious factor - e.g., when transmitting text.
Quite right. IBM mainframes have I/O channels that can only do block
I/O.
Didn't they also have a multiplexor unit that handled byte
devices such as teletypes?
Post by John Levine
The only way to attach terminals is via front end systems that
handle the individual characters and turn them into blocks to send to
and from the channel.
The PDP-10 was an overgrown minicomputer that shared the same priority
interrupt and word I/O features with its 18-bit relatives like the
PDP-4, -7, and -9. That made it able to do its own character-at-a-time
I/O and also made it good at real-time applications without extra
kludgery like the 360/44 needed.