Discussion:
The end of text files?
Jorgen Grahn
2020-04-11 05:58:20 UTC
I'm used to text files where there's an end of line (LF on Unix) on
every line in a file, even the last one. So that non-empty text files
end with a LF. Editors like Emacs and vi tend to add one, and I got
the impression that not having one is ... not exactly /illegal/, but
bad taste.

Now in the past few years I increasingly see files without that last LF.
They come from coworkers who use various IDEs on Linux.

I notice this when diff reports "no newline at end of file", or
when I edit a file with Emacs and Git reports that I changed the
end of the file.

What's the history of this convention? And is the convention missing
on Windows -- is that why fancy IDEs don't do it?

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Charlie Gibbs
2020-04-11 06:05:14 UTC
Post by Jorgen Grahn
I'm used to text files where there's an end of line (LF on Unix) on
every line in a file, even the last one. So that non-empty text files
end with a LF. Editors like Emacs and vi tend to add one, and I got
the impression that not having one is ... not exactly /illegal/, but
bad taste.
Just a couple of weeks ago I had to tweak some software that was
concatenating a number of files that were missing the final EOL.
The last line of file n had the first line of file n+1 appended
to it, resulting in data loss. Not nice.
Post by Jorgen Grahn
Now in the past few years I increasingly see files without that last LF.
They come from coworkers who use various IDEs on Linux.
I notice this when diff reports "no newline at end of file", or
when I edit a file with Emacs and Git reports that I changed the
end of the file.
What's the history of this convention? And is the convention missing
on Windows -- is that why fancy IDEs don't do it?
I wonder whether it's a convention, or just lousy programming.
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
Jorgen Grahn
2020-04-11 10:14:19 UTC
Post by Charlie Gibbs
Post by Jorgen Grahn
I'm used to text files where there's an end of line (LF on Unix) on
every line in a file, even the last one. So that non-empty text files
end with a LF. Editors like Emacs and vi tend to add one, and I got
the impression that not having one is ... not exactly /illegal/, but
bad taste.
Just a couple of weeks ago I had to tweak some software that was
concatenating a number of files that were missing the final EOL.
The last line of file n had the first line of file n+1 appended
to it, resulting in data loss. Not nice.
Post by Jorgen Grahn
Now in the past few years I increasingly see files without that last LF.
They come from coworkers who use various IDEs on Linux.
I notice this when diff reports "no newline at end of file", or
when I edit a file with Emacs and Git reports that I changed the
end of the file.
What's the history of this convention? And is the convention missing
on Windows -- is that why fancy IDEs don't do it?
I wonder whether it's a convention, or just lousy programming.
Which would be the lousy choice? Note that general tools like cat(1)
cannot do the right thing if text files don't end with a newline.

For tools which are more syntax-aware, I agree that it's a bug if they
misbehave when an input line doesn't end with a newline.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Thomas Koenig
2020-04-11 15:30:29 UTC
Post by Jorgen Grahn
Which would be the lousy choice? Note that general tools like cat(1)
cannot do the right thing if text files don't end with a newline.
I agree that text files without a trailing newline are bad, but...

How would cat(1) fail?
Douglas Miller
2020-04-11 16:35:54 UTC
Post by Thomas Koenig
Post by Jorgen Grahn
Which would be the lousy choice? Note that general tools like cat(1)
cannot do the right thing if text files don't end with a newline.
I agree that text files without a trailing newline are bad, but...
How would cat(1) fail?
The beauty of 'cat' is that it just passes input to output. It can be used on binary files as well, which makes it more powerful. I don't think 'cat' failed; it just did what it was told. The resulting file was not what was expected because some of the input files did not end in newline, so lines got concatenated and spoiled the resulting output for its intended purpose.
Jorgen Grahn
2020-04-11 18:10:59 UTC
Post by Douglas Miller
Post by Thomas Koenig
Post by Jorgen Grahn
Which would be the lousy choice? Note that general tools like cat(1)
cannot do the right thing if text files don't end with a newline.
I agree that text files without a trailing newline are bad, but...
How would cat(1) fail?
The beauty of 'cat' is that it just passes input to output. It can
be used on binary files as well, which makes it more powerful. I
don't think 'cat' failed, it just did what it was told. The
resulting file was not what was expected because some of the input
files did not end in newline, so lines got concatenated and spoiled
the resulting output for its intended purpose.
Precisely. 'cat A B' would have one line less than the sum of A and B;
that would be the first sign.
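
The line-count symptom can be reproduced in a few lines. This is only a sketch: the byte strings stand in for two hypothetical files A and B, and plain byte concatenation stands in for what 'cat A B' does.

```python
def count_logical_lines(data: bytes) -> int:
    """Count lines as a reader sees them, including an unterminated last line."""
    return len(data.splitlines())


a = b"alpha\nbeta"       # hypothetical file A: last line lacks its LF
b = b"gamma\ndelta\n"    # hypothetical file B: properly terminated

joined = a + b           # byte-for-byte concatenation, as cat would do

print(count_logical_lines(a) + count_logical_lines(b))  # lines in A plus lines in B
print(count_logical_lines(joined))                      # lines in the result
print(joined.splitlines())                              # "beta" and "gamma" have merged
```

The sum for A and B is 4, but the joined data has only 3 lines, because the unterminated last line of A absorbs the first line of B.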

Note that I'm not arguing that text files without the last newline are
/illegal/ -- just that habitually producing them is a bad idea. Which
in turn makes me wonder why popular tools do so.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Dan Espen
2020-04-11 12:18:38 UTC
Post by Jorgen Grahn
I'm used to text files where there's an end of line (LF on Unix) on
every line in a file, even the last one. So that non-empty text files
end with a LF. Editors like Emacs and vi tend to add one, and I got
the impression that not having one is ... not exactly /illegal/, but
bad taste.
Now in the past few years I increasingly see files without that last LF.
They come from coworkers who use various IDEs on Linux.
I notice this when diff reports "no newline at end of file", or
when I edit a file with Emacs and Git reports that I changed the
end of the file.
What's the history of this convention? And is the convention missing
on Windows -- is that why fancy IDEs don't do it?
Certainly don't know about Windows, but here's what Emacs has to say
about it:

require-final-newline is a variable defined in ‘files.el’.
Its value is t
Original value was nil
Local in buffer text.txt; global value is nil

This variable is safe as a file local variable if its value
satisfies the predicate ‘symbolp’.

Documentation:
Whether to add a newline automatically at the end of the file.

A value of t means do this only when the file is about to be saved.
A value of ‘visit’ means do this right after the file is visited.
A value of ‘visit-save’ means do it at both of those times.
Any other non-nil value means ask user whether to add a newline, when saving.
A value of nil means don’t add newlines.

Certain major modes set this locally to the value obtained
from ‘mode-require-final-newline’.

You can customize this variable.

A quick check shows this value is 't' for .txt and .cpp.
I'm pretty sure Emacs would do the same thing on Windows.
So, I don't think Emacs is generating these files.
--
Dan Espen
Jorgen Grahn
2020-04-11 12:43:28 UTC
Post by Dan Espen
Post by Jorgen Grahn
I'm used to text files where there's an end of line (LF on Unix) on
every line in a file, even the last one. So that non-empty text files
end with a LF. Editors like Emacs and vi tend to add one, and I got
the impression that not having one is ... not exactly /illegal/, but
bad taste.
Now in the past few years I increasingly see files without that last LF.
They come from coworkers who use various IDEs on Linux.
I notice this when diff reports "no newline at end of file", or
when I edit a file with Emacs and Git reports that I changed the
end of the file.
What's the history of this convention? And is the convention missing
on Windows -- is that why fancy IDEs don't do it?
Certainly don't know about Windows, but here's what Emacs has to say
require-final-newline is a variable defined in ‘files.el’.
Its value is t
...
Post by Dan Espen
I'm pretty sure Emacs would do the same thing on Windows.
So, I don't think Emacs is generating these files.
Me neither -- I'm the only Emacs user in the projects, on my side of
the Atlantic anyway. (There's one or two Vim users.)

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Douglas Miller
2020-04-11 12:36:32 UTC
For what it's worth, my view: It is called a "line ending" or "line termination", and *NOT* a "line separator". By that definition, I think the "correct" style is that every (non-empty) text file must end in a "line termination". I would view any program that fails to do that to be flawed. One caveat would be CP/M text files which have a Ctrl-Z after the last line, with the rest of the 128-byte record filled with either garbage or more Ctrl-Zs. I think it fits my definition, as the Ctrl-Z means EOF and it appears after the last line termination. But it becomes more visible when we start moving files back and forth between CP/M and other OSes.
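
A rough sketch of how a program on another OS might cope with such a file, assuming the first Ctrl-Z (0x1A) marks the end of the text and everything after it in the 128-byte record is padding. The function name and the sample data are made up for illustration.

```python
CTRL_Z = b"\x1a"    # CP/M end-of-text marker
RECORD_SIZE = 128   # CP/M stores files in whole 128-byte records


def strip_cpm_padding(data: bytes) -> bytes:
    """Return the text up to, but not including, the first Ctrl-Z.

    If no Ctrl-Z is present (the text exactly filled its last record),
    the data is returned unchanged.
    """
    end = data.find(CTRL_Z)
    return data if end == -1 else data[:end]


text = b"HELLO\r\nWORLD\r\n"
padded = text + CTRL_Z * (RECORD_SIZE - len(text))   # one full record on disk

recovered = strip_cpm_padding(padded)
print(len(padded), len(recovered))
print(recovered.endswith(b"\r\n"))   # the last line is still terminated
```

Under that assumption the Ctrl-Z sits after the last line terminator, so stripping it leaves a text file that still ends with CRLF.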
J. Clarke
2020-04-11 13:18:55 UTC
On Sat, 11 Apr 2020 05:36:32 -0700 (PDT), Douglas Miller
Post by Douglas Miller
For what it's worth, my view: It is called a "line ending" or "line termination", and *NOT* a "line separator". By that definition, I think the "correct" style is that every (non-empty) text file must end in a "line termination". I would view any program that fails to do that to be flawed. One caveat would be CP/M text files which have a Ctrl-Z after the last line, with the rest of the 128-byte record filled with either garbage or more Ctrl-Zs. I think it fits my definition, as the Ctrl-Z means EOF and it appears after the last line termination. But it becomes more visible when we start moving files back and forth between CP/M and other OSes.
Ran into this a while back moving stuff from the PC to the mainframe
or vice versa. Had to put in a check somewhere along the line and
make sure the last EOL was in place or something or other broke.
Bob Eager
2020-04-11 14:18:36 UTC
Post by Douglas Miller
For what it's worth, my view: It is called a "line ending" or "line
termination", and *NOT* a "line separator". By that definition, I think
the "correct" style is that every (non-empty) text file must end in a
"line termination". I would view any program that fails to do that to be
flawed. One caveat would be CP/M text files which have a Ctrl-Z after
the last line, with the rest of the 128-byte record filled with either
garbage or more Ctrl-Zs. I think it fits my definition, as the Ctrl-Z
means EOF and it appears after the last line termination. But it becomes
more visible when we start moving files back and forth between CP/M and
other OSes.
Probably an Algol 60 programmer started this.

In Algol 60, the semicolon was a separator, not a terminator. So, in a
BEGIN...END block, that last statement didn't need a semicolon. But a
null statement was OK, so mostly people always included one.
--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org
John Levine
2020-04-11 20:40:13 UTC
Post by Bob Eager
Probably an Algol 60 programmer started this.
Right, and we're all using PL/I.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Bob Eager
2020-04-11 22:47:09 UTC
Post by John Levine
Post by Bob Eager
Probably an Algol 60 programmer started this.
Right, and we're all using PL/I.
Whoosh.
--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org
Charlie Gibbs
2020-04-12 04:55:20 UTC
Post by Douglas Miller
For what it's worth, my view: It is called a "line ending"
or "line termination", and *NOT* a "line separator". By that
definition, I think the "correct" style is that every (non-empty)
text file must end in a "line termination". I would view any program
that fails to do that to be flawed. One caveat would be CP/M text
files which have a Ctrl-Z after the last line, with the rest of the
128-byte record filled with either garbage or more Ctrl-Zs. I think
it fits my definition, as the Ctrl-Z means EOF and it appears after
the last line termination. But it becomes more visible when we start
moving files back and forth between CP/M and other OSes.
Ah yes, I remember having to ensure that my CP/M programs would work
properly if a text file whose length was a multiple of 128 bytes didn't
have a control-Z at the end.

This is, however, orthogonal to whether the last line in that file
should end with CRLF.
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
Douglas Miller
2020-04-12 12:21:30 UTC
Post by Charlie Gibbs
...
This is, however, orthogonal to whether the last line in that file
should end with CRLF.
...
But not entirely random w.r.t. the original post. The original question references using files from IDEs, which raises the possibility that they are "foreign" to *nix (e.g. could originate in CP/M).
Jorgen Grahn
2020-04-12 12:54:28 UTC
Post by Douglas Miller
Post by Charlie Gibbs
...
This is, however, orthogonal to whether the last line in that file
should end with CRLF.
...
But not entirely random w.r.t. the original post. The original
question references using files from IDEs, which raises the
possibility that they are "foreign" to *nix (e.g. could originate in
CP/M).
The IDEs are Qt Creator and Visual Studio; I don't think the creators
of those Creators have even heard of CP/M.

My suspicion is that this "last line has no newline" convention
originates in Windows culture. Or some other subculture I don't
belong to, one that's "foreign" to *nix, as you say.

(I used to be in an Amiga subculture, but it's been 25 years and I
can't remember how we ended text files. They had Unix line endings
and used iso8859-1, though.)

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Douglas Miller
2020-04-12 12:58:33 UTC
We may all be over-thinking this, too. Based on the behavior of Emacs and Vim, it's a situation that has been around for a while. It may not be anything intentional but just an oversight on the part of the application developer - a common mistake made by developers.
J. Clarke
2020-04-12 15:31:30 UTC
Post by Jorgen Grahn
Post by Douglas Miller
Post by Charlie Gibbs
...
This is, however, orthogonal to whether the last line in that file
should end with CRLF.
...
But not entirely random w.r.t. the original post. The original
question references using files from IDEs, which raises the
possibility that they are "foreign" to *nix (e.g. could originate in
CP/M).
The IDEs are Qt Creator and Visual Studio; I don't think the creators
of those Creators have even heard of CP/M.
My suspicion is that this "last line has no newline" convention
originates in Windows culture. Or some other subculture I don't
belong to, one that's "foreign" to *nix, as you say.
(I used to be in an Amiga subculture, but it's been 25 years and I
can't remember how we ended text files. They had Unix line endings
and used iso8859-1, though.)
I can confirm that Visual Studio does not put anything at the end of a
source file--in two different hex editors it just ends with the last
visible character.

Qt Creator, whatever version you get when you do the install fresh
today, puts CR/LF at the end of the file.

Both on Windows.

I tried a few more. Eclipse does not put anything at the end of the
last line. Idle puts CR/LF. Notepad++ doesn't put anything.

All of these will put a CR/LF if I explicitly key return after the
last character.
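
The same check can be done without a hex editor. The sketch below looks only at the last two bytes of a file; the function name, the sample file names and their contents are all invented for the demonstration and are not output from any of the editors mentioned.

```python
import os
import tempfile


def final_terminator(path: str) -> str:
    """Classify how a file ends: 'CRLF', 'LF', 'none' or 'empty'."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)   # seek() returns the new position
        if size == 0:
            return "empty"
        f.seek(max(size - 2, 0))        # only the last two bytes matter
        tail = f.read()
    if tail.endswith(b"\r\n"):
        return "CRLF"
    if tail.endswith(b"\n"):
        return "LF"
    return "none"


def demo() -> dict:
    """Write three small sample files and classify each one."""
    samples = {
        "crlf.txt": b"int x;\r\n",
        "lf.txt": b"int x;\n",
        "bare.txt": b"int x;",          # ends with the last visible character
    }
    results = {}
    with tempfile.TemporaryDirectory() as d:
        for name, content in samples.items():
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(content)
            results[name] = final_terminator(path)
    return results


print(demo())
```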
Jorgen Grahn
2020-04-12 17:00:43 UTC
Post by J. Clarke
Post by Jorgen Grahn
Post by Douglas Miller
Post by Charlie Gibbs
...
This is, however, orthogonal to whether the last line in that file
should end with CRLF.
...
But not entirely random w.r.t. the original post. The original
question references using files from IDEs, which raises the
possibility that they are "foreign" to *nix (e.g. could originate in
CP/M).
The IDEs are Qt Creator and Visual Studio; I don't think the creators
of those Creators have even heard of CP/M.
My suspicion is that this "last line has no newline" convention
originates in Windows culture. Or some other subculture I don't
belong to, one that's "foreign" to *nix, as you say.
(I used to be in an Amiga subculture, but it's been 25 years and I
can't remember how we ended text files. They had Unix line endings
and used iso8859-1, though.)
I can confirm that Visual Studio does not put anything at the end of a
source file--in two different hex editors it just ends with the last
visible character.
Qt Creator, whatever version you get when you do the install fresh
today, puts CR/LF at the end of the file.
Both on Windows.
I tried a few more. Eclipse does not put anything at the end of the
last line. Idle puts CR/LF. Notepad++ doesn't put anything.
All of these will put a CR/LF if I explicitly key return after the
last character.
Interesting -- thanks! Especially Notepad++, which (like Emacs and
Vim) isn't only intended for editing code.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Douglas Miller
2020-04-12 17:00:45 UTC
On 12 Apr 2020 12:54:28 GMT, Jorgen Grahn
...
All of these will put a CR/LF if I explicitly key return after the
last character.
CP/M's Magic Wand EDIT.COM does not put a CR/LF on the last line unless you explicitly enter one. But, for editors, that is reasonable (alternative) behavior. CP/M compilers/assemblers handle that, but I suspect PIP.COM does not when concatenating files.
Charlie Gibbs
2020-04-12 17:44:29 UTC
Post by Douglas Miller
On 12 Apr 2020 12:54:28 GMT, Jorgen Grahn
...
All of these will put a CR/LF if I explicitly key return after the
last character.
CP/M's Magic Wand EDIT.COM does not put a CR/LF on the last line
unless you explicitly enter one. But, for editors, that is reasonable
(alternative) behavior. CP/M compilers/assemblers handle that, but I
suspect PIP.COM does not when concatenating files.
FSVO "does not handle". If you define concatenating files as
"putting one string of bytes behind another", and the first file
is missing a line terminator on the last line, the first line
of the second file will be appended to the last line of the
first file. If, on the other hand, you reserve a special
definition of "concatenating" for text files that includes
ensuring proper line terminators everywhere, your program
will append a line terminator to each input file that is
missing one at the end.

The latter behaviour is what I recently had to add to one of
my programs.
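
The two definitions can be put side by side. This is a minimal sketch, not the program mentioned above; it assumes every input uses the same line terminator, and concat_text is a made-up name.

```python
from typing import Iterable


def concat_text(chunks: Iterable[bytes], eol: bytes = b"\n") -> bytes:
    """Concatenate text inputs, adding eol to any non-empty input lacking it."""
    out = bytearray()
    for chunk in chunks:
        out += chunk
        if chunk and not chunk.endswith(eol):
            out += eol                  # repair the missing final terminator
    return bytes(out)


first = b"a\nb"        # last line is missing its terminator
second = b"c\nd\n"

print(b"".join([first, second]))      # plain byte concatenation: "b" and "c" merge
print(concat_text([first, second]))   # text-aware concatenation keeps them apart
```

Passing eol=b"\r\n" gives the CRLF variant. An empty input is left alone, so it does not contribute a blank line.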
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
Carlos E.R.
2020-04-12 18:36:38 UTC
Post by Jorgen Grahn
My suspicion is that this "last line has no newline" convention
originates in Windows culture. Or some other subculture I don't
belong to, one that's "foreign" to *nix, as you say.
IMO, it is not a "last line has no newline" convention, but rather that
it doesn't matter if there is or not.

I remember this "issue" on MsDOS in the 80's. Simply the editor would
not add a newline on its own, but the user could choose to add it or
not. Up to him. Maybe the user does not want a line end at the end of
file intentionally.

And yes, some tools failed if that final line ending was missing, but
IMO that is a failure of the tool, not the editor. Ie, read till eol or eof.
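
"Read till eol or eof" might look like the sketch below, which treats a line as finished at either a terminator or the end of the input. The function name and the sample settings are hypothetical.

```python
import io


def read_records(stream) -> list:
    """Return every line of a text stream, with or without a final terminator."""
    records = []
    for line in stream:                 # iteration also yields an unterminated last line
        records.append(line.rstrip("\r\n"))
    return records


with_eol = io.StringIO("colour=blue\nsize=10\n")
without_eol = io.StringIO("colour=blue\nsize=10")   # no terminator before EOF

print(read_records(with_eol))
print(read_records(without_eol))
```

Both inputs give the same two records, so a tool written this way does not care whether the final terminator is present.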
--
Cheers, Carlos.
Jorgen Grahn
2020-04-12 21:09:58 UTC
Post by Carlos E.R.
Post by Jorgen Grahn
My suspicion is that this "last line has no newline" convention
originates in Windows culture. Or some other subculture I don't
belong to, one that's "foreign" to *nix, as you say.
IMO, it is not a "last line has no newline" convention, but rather that
it doesn't matter if there is or not.
Yes, that's what I should have written. And the other convention
would be "there doesn't /have/ to be one, but it's better if there
is".

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Carlos E.R.
2020-04-13 03:04:48 UTC
Post by Jorgen Grahn
Post by Carlos E.R.
Post by Jorgen Grahn
My suspicion is that this "last line has no newline" convention
originates in Windows culture. Or some other subculture I don't
belong to, one that's "foreign" to *nix, as you say.
IMO, it is not a "last line has no newline" convention, but rather that
it doesn't matter if there is or not.
Yes, that's what I should have written. And the other convention
would be "there doesn't /have/ to be one, but it's better if there
is".
Yes, of course. After I lost a few hours trying to find out why a
certain now forgotten program would fail or misbehave, till I found out
that adding an EOL just before the EOF would make it work, I made a
mental note of always checking there was an EOL at the end of all my text
config files and such.
--
Cheers, Carlos.
JimP
2020-04-12 23:39:25 UTC
Post by Jorgen Grahn
Post by Douglas Miller
Post by Charlie Gibbs
...
This is, however, orthogonal to whether the last line in that file
should end with CRLF.
...
But not entirely random w.r.t. the original post. The original
question references using files from IDEs, which raises the
possibility that they are "foreign" to *nix (e.g. could originate in
CP/M).
The IDEs are Qt Creator and Visual Studio; I don't think the creators
of those Creators have even heard of CP/M.
My suspicion is that this "last line has no newline" convention
originates in Windows culture. Or some other subculture I don't
belong to, one that's "foreign" to *nix, as you say.
I remember when the university I attended switched from Okidata dot
matrix printers to HP laser jets, paper wouldn't come out.

My boss found out the older software wasn't providing end of page.
This was about 1988/89. I was a part time worker.
Post by Jorgen Grahn
(I used to be in an Amiga subculture, but it's been 25 years and I
can't remember how we ended text files. They had Unix line endings
and used iso8859-1, though.)
/Jorgen
I remember it was different from MS-DOS 3.x/4.x, but not exactly
how.
--
Jim