This is an archive of the discontinued LLVM Phabricator instance.

[clang-scan-deps] Fix edge cases in the minimizer
AbandonedPublic

Authored by aganea on Aug 7 2019, 2:24 PM.

Details

Summary

This patch fixes:

  1. Invisible characters that come just before #include, such as #ifndef. ( were hidden, depending on the display locale). I choose to simply skip over non-ASCII characters.
  2. Double slashes in #include directive with angle brackets not handled correctly: #include <dir//file.h>
  3. #error directive with quoted, multi-line content, along with CR+LF line endings wasn't handled correctly.

Diff Detail

Repository
rC Clang

Event Timeline

aganea created this revision.Aug 7 2019, 2:24 PM

Thanks for figuring out these issues! It looks like you're fixing several edge cases, would you mind splitting the patch up for easier review?

Regarding the invisible characters before #ifdef, are you sure that's the right behavior? If you run the preprocessor over your test case, it doesn't accept it as a valid input, so why should the minimizer be different:

$ clang -cc1 -DTEST  %t.c -Eonly -o -
%t.c:3:2: error: #endif without #if
#endif
 ^
1 error generated.

Regarding the invisible characters before #ifdef, are you sure that's the right behavior?

Should we then copy these invisible characters to the minimized output? Believe it or not, these characters are there in our codebase since 2013, and never clang has complained about it :)

Regarding the invisible characters before #ifdef, are you sure that's the right behavior?

Should we then copy these invisible characters to the minimized output? Believe it or not, these characters are there in our codebase since 2013, and never clang has complained about it :)

Do you have an example command-line where clang doesn't error with your test file? Alex gave you an example where it does error.

@aganea These are not just any invisible characters that you have, this is the UTF8 BOM. Clang's Lexer skips over them if they're in the beginning of the file (Lexer::InitLexer). The minimizer should do the same thing, so ideally you would factor out the BOM detection and teach the minimizer to skip past it.

aganea added a comment.Aug 9 2019, 7:02 AM

@aganea These are not just any invisible characters that you have, this is the UTF8 BOM. Clang's Lexer skips over them if they're in the beginning of the file (Lexer::InitLexer). The minimizer should do the same thing, so ideally you would factor out the BOM detection and teach the minimizer to skip past it.

Thanks for the heads up. I'll do that.

aganea abandoned this revision.Aug 21 2019, 1:58 PM

Please see the other reviews I've sent.