This is an archive of the discontinued LLVM Phabricator instance.

Preprocessor: preserve whitespace in -traditional-cpp mode.
Needs ReviewPublic

Authored by jordan_rose on Feb 12 2013, 2:13 PM.

Details

Reviewers
rsmith
Summary

This allows Lexer's "keep whitespace" mode to work for -E -traditional-cpp, which is used by people (ab)using the preprocessor to preprocess things other than C.

The internal uses we know about don't rely on the GNU behavior that whitespace is also preserved within macros, so this patch doesn't attempt to do that.

Diff Detail

Event Timeline

rsmith added inline comments.Feb 12 2013, 3:22 PM
lib/Lex/Lexer.cpp
1879–1882

Could we handle all the changes to this function by just setting IsAtStartOfLine to Result.hasFlag(Token::StartOfLine) and clearing the Token::StartOfLine flag here?

I'm also not sure whether the current approach does the right thing if (for instance) a file starts with <space>#define <...>. Will we set the StartOfLine flag for the '#' token?

2661–2666

If the sequence of whitespace ends in a newline, we would previously not set the LeadingSpace flag, now we always set it. Is that what we want? (Likewise for all the similar changes below.)

2813–2814

This is no longer correct.

test/Preprocessor/traditional-cpp.c
16–17

What are the {{ }} for here? Is this some oddity of -strict-whitespace?

gribozavr added inline comments.Feb 12 2013, 3:30 PM
test/Preprocessor/traditional-cpp.c
16–17

It might have been cleaner to use the {{^}}...{{$}} pattern for this test.

jordan_rose added inline comments.Feb 13 2013, 10:11 AM
lib/Lex/Lexer.cpp
1879–1882

That almost works, but then we'd lose the validity of StartOfLine for whitespace that does start a line, but also starts the next line. We might not care, though.

As for the start-of-file case, we do get this correct because we don't set IsAtStartOfLine to false in this method.

2661–2666

Hm. I was probably overzealous in moving around the LeadingSpace flag—it's often paired with StartOfLine, but it doesn't have the same behavior when there are whitespace tokens involved. I'll go check back on all of these.

test/Preprocessor/traditional-cpp.c
16–17

Ah, yes. Richard: these are needed because otherwise the CHECK line matches itself. Dmitri's change would allow them to match the whole line instead.

Looking at this again, it's going to be hard to get LeadingSpace to be correct in all cases in keep-whitespace mode. I'm hesitant to add another variable like IsAtStartOfLine that will just slow down every token lexed in non-traditional mode.

(To do -traditional-cpp properly, we'd actually need LeadingSpace to be correct, because # only starts a directive if it's actually in the first column. But this patch doesn't do that anyway.)

jordan_rose updated this revision to Unknown Object (????).Feb 14 2013, 3:44 PM

Make sure LeadingSpace stays correct in regular mode, and let it be incorrect in keep-whitespace mode.

Ping.

On IRC Richard suggested that perhaps it's not worth trying to implement -traditional-cpp in Clang, since the real -traditional-cpp is character-based rather than token-based. However, I consider this more of a patch fix to satisfy our internal developers, so that they don't need a separate binary (currently a separate GCC binary) to process their whitespace-sensitive configuration files. The whitespace preservation in this patch is good enough for that, and hopefully we won't need to continue piling on hacks.