The thing about C needing newlines at the end of files keeps coming up.

David Chisnall (*Now with 50% more sarcasm!*)

The thing about C needing newlines at the end of files keeps coming up. I am officially old, so I remember using parser generators that had the same limitation as the original AT&T C compiler, which couldn't handle token streams that ended in a token, but I can't remember if it was a lex or yacc limitation (I think it was lex, and lex would not parse the last token correctly). But I can't find a single reference to support this. So either I'm really old and my memory is failing, or the web is now 100% slop.

Can anyone remember the details of the lex / yacc limitation?

Stefan Eissing

@david_chisnall Perhaps @CoolSWEng can shed some light on this?

David Chisnall (*Now with 50% more sarcasm!*)

@rootnode

I have vague recollections of any whitespace being fine after the last token, but a newline was the easiest to make sure was there.

The header file thing would be the same: if you end a header with a token and the text following the include starts with a token, they will become a single token after including. But the only place where the separator would specifically need to be a newline is if the header ended with a preprocessing directive, and I think the execution order of the preprocessor meant that this wasn't a problem.
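To make the header case concrete, here is a toy model in Python. It is a sketch of a purely textual include that splices the header in with no separator, not a claim about how any real preprocessor behaved; `splice_includes`, `limit.h` and `LIMIT` are made-up names for illustration.

```python
def splice_includes(source: str, headers: dict) -> str:
    """Replace each `#include "name"` line with the header text, verbatim.

    Assumption for illustration: the directive's own line terminator is
    discarded and nothing is inserted after the header text.
    """
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith('#include "') and stripped.endswith('"'):
            name = stripped[len('#include "'):-1]
            out.append(headers[name])  # no separator added after the header
        else:
            out.append(line)
    return "".join(out)


main_c = '#include "limit.h"\nint x = LIMIT;\n'

# Header whose last line is a directive with no trailing newline.
fused = splice_includes(main_c, {"limit.h": "#define LIMIT 10"})
# Same header, this time ending in a newline.
clean = splice_includes(main_c, {"limit.h": "#define LIMIT 10\n"})

print(repr(fused))
print(repr(clean))
```

In this model the unterminated header swallows the next line into the `#define`, and the newline-terminated one does not. Whether a given historical preprocessor actually spliced text this way is exactly the open question.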

Wouter Lindenhof

@david_chisnall @icing

One of them had an issue, from what I recall, and this was related to the fact that it read lines and not a character stream. The reason I recall that is that handling multiline comments using /*…*/ was way harder than it had any right to be. That being said, this was many years ago and my understanding of parsers was bad.

Jean-Baptiste "JBQ" Quéru

@david_chisnall Feels more likely to be a lex issue than a yacc issue, where lex would drop the incomplete token on EOF because it didn't treat EOF as a pseudo-character that matches nothing and so ends the token. I admit that feels odd to me, though.

Though, in that case, having a space or a tab as the last character should also work; there is no reason for newline to be the only legal last character.
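A toy sketch of that failure mode, in Python. This is a guess at the shape of the bug, not the behaviour of any real lex: `toy_lex` is a hypothetical scanner that only emits a token when it sees the whitespace that follows it, and does nothing special at EOF.

```python
def toy_lex(text: str) -> list:
    """Split text into whitespace-separated tokens, with a deliberate flaw:
    a token is only emitted when a following whitespace character is seen."""
    tokens = []
    pending = ""
    for ch in text:
        if ch.isspace():
            if pending:
                tokens.append(pending)
                pending = ""
        else:
            pending += ch
    # Deliberate flaw: EOF is not treated as a delimiter, so whatever is
    # still in `pending` here is silently discarded.
    return tokens


print(toy_lex("int x"))    # no trailing whitespace
print(toy_lex("int x\n"))  # trailing newline
print(toy_lex("int x\t"))  # trailing tab
```

In this model any trailing whitespace rescues the last token, so a lexer bug of this shape on its own would not explain a newline-specific rule.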

Preprocessor requirement, maybe, where newlines are significant?

Laurence Tratt

@david_chisnall One possibility: it may have had a hand-written lexer that had this limitation / design decision? Maybe even because lex hadn't been invented at that point (IIRC it postdates yacc by a year or two).

David Glover-Aoki

@david_chisnall bash “while read” loops will also ignore the last line if it doesn’t end in a newline.
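A small Python model of why that happens: the shell's `read` still hands back the partial line at EOF, but reports failure, so the loop body never runs for it. `shell_read` and `while_read` are made-up names, and the model ignores field splitting and backslash handling.

```python
import io


def shell_read(stream):
    """Model of the shell's `read`: returns (ok, line).

    ok is False when EOF is hit before a newline, even if some
    characters were read into the line.
    """
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "":
            return False, "".join(chars)  # EOF first: failure, data still set
        if ch == "\n":
            return True, "".join(chars)
        chars.append(ch)


def while_read(text, keep_partial=False):
    """Model of `while read line; do ...; done`.

    keep_partial=True models the common workaround
    `while read line || [ -n "$line" ]`.
    """
    stream = io.StringIO(text)
    processed = []
    while True:
        ok, line = shell_read(stream)
        if ok or (keep_partial and line != ""):
            processed.append(line)
        if not ok:
            break
    return processed


print(while_read("a\nb"))                     # last line unterminated
print(while_read("a\nb\n"))                   # last line terminated
print(while_read("a\nb", keep_partial=True))  # workaround
```

The first call processes only `a`; the other two process both lines.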

Andrew Eades

@rootnode @david_chisnall Even today Xcode adds a newline to the end of a file. I think most tokenisers of yore would have used fgets to read a file line by line.

HP van Braam

@david_chisnall I can't really vouch for this information, but I can tell you that if this was due to either lex or yacc, it's probably lex.

If lex can't properly terminate the token stream, then in the parser you'd have to write LF or EOF in the grammar a bazillion times, I think.

Easier to just tell people to add a newline.

This is conjecture of course but I'm reasonably confident in this based on using flex and bison.