This is an archive of the discontinued LLVM Phabricator instance.

[Preprocessor] Correct internal token parsing of newline characters in CRLF
ClosedPublic

Authored by erichkeane on Aug 23 2017, 1:59 PM.

Details

Summary

Discovered due to a goofy git setup, the test system-headerline-directive.c (and a few others)
failed because the token-consumption will consume only the '\r' in CRLF, making the preprocessor's
printed value give the wrong line number when returning from an include. For example:

(line 1):#include <noline.h>\r\n

The "file exit" code causes the printer to try to print the 'returned to the main file' line. It looks up
what the current line number is. However, since the current 'token' is the '\n' (since only the \r was
consumed), it will give the line number as '1", not '2'. This results in a few failed tests, but more importantly,
results in error messages being incorrect when compiling a previously preprocessed file.

Diff Detail

Repository
rL LLVM

Event Timeline

erichkeane created this revision.Aug 23 2017, 1:59 PM
erichkeane added inline comments.Aug 23 2017, 2:00 PM
test/Frontend/system-header-line-directive-ms-lineendings.c
1 ↗(On Diff #112436)

Note: this whole file is dos line endings (preserved by the above git attributes, and subversion eol-style when i go to commit it). It doesn't show in phab however for some reaosn.

rnk added inline comments.Aug 24 2017, 10:34 AM
lib/Lex/Lexer.cpp
3076–3077 ↗(On Diff #112436)

Should we only do this in the \r case? If I understand correctly, we're basically saying, if this is a CR, and the next byte is an LF, advance one more and do the pre-processor stuff.

erichkeane added inline comments.Aug 24 2017, 10:35 AM
lib/Lex/Lexer.cpp
3076–3077 ↗(On Diff #112436)

That is exactly what we're doing.

I debated that personally, and am a bit on the fence. It seems a number of places like to treat a '\r\n' and a '\n\r' as the same thing, though it seems a little foolish to me. If you fall toward that opinion, I'll definitely change it, just say the word :)

Switched to simply \r\n instead of both cases. This fixes the issue, is likely faster (important, since this is a performance critical part of code), and a smaller-hammer.

rnk added inline comments.Aug 24 2017, 10:47 AM
lib/Lex/Lexer.cpp
3076–3077 ↗(On Diff #112436)

The bug probably doesn't happen in the \n\r case, because don't we count '\n's to compute our line numbers?

Anyway, yeah, I think we should make this specific to '\r'. In that case, we peek one ahead, and if we see a simple '\n' byte, we advance one more so that our line numbers stay correct.

erichkeane marked 2 inline comments as done.Aug 24 2017, 10:47 AM

Decided there were a few additional advantages to just handling \r\n, so Added them.

rnk accepted this revision.Aug 24 2017, 10:47 AM

lgtm

This revision is now accepted and ready to land.Aug 24 2017, 10:47 AM
This revision was automatically updated to reflect the committed changes.