The single quote character can act as a C++ digit separator. However, the minimizer shouldn't treat it as one when it follows a valid character literal prefix, such as L, U, u, or u8.
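A minimal sketch of the check described above, working backwards from the quote. This is not the code under review: the names looksLikeDigitSeparator and isIdentChar are hypothetical, and the helper only approximates the lexer's notion of a preprocessing-number character.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: letters, digits, and underscore.
static bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Returns true if the quote at Cur looks like a digit separator rather than
// the start of a character literal. [Start, End) is the text being scanned.
bool looksLikeDigitSeparator(const char *Start, const char *Cur,
                             const char *End) {
  assert(Cur < End && *Cur == '\'' && "expected a single quote at Cur");
  if (Cur == Start)
    return false; // Nothing precedes the quote: it opens a literal.

  const char Prev = *(Cur - 1);
  // One-character literal prefixes: L'a', U'a', u'a'.
  if (Prev == 'L' || Prev == 'U' || Prev == 'u')
    return false;
  // Two-character prefix u8'a'. Reading Cur - 2 is only done after
  // confirming that Cur - 1 is not already the start of the buffer.
  if (Prev == '8' && Cur - 1 != Start && *(Cur - 2) == 'u')
    return false;

  // Assumption: a separator sits between two number-body characters.
  if (!isIdentChar(Prev) && Prev != '.')
    return false;
  return Cur + 1 < End && isIdentChar(*(Cur + 1));
}
```

With this sketch, the quote in 1'000 is reported as a separator, while the quotes that open L'a' and u8'a' are not.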
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
LGTM, but I have a suggestion inline for another approach.
clang/lib/Lex/DependencyDirectivesSourceMinimizer.cpp | ||
---|---|---|
265–272 ↗ | (On Diff #209035) | I wonder if this would be easier to identify walking forward from Start rather than working backwards from Cur. |
clang/lib/Lex/DependencyDirectivesSourceMinimizer.cpp | ||
---|---|---|
270 ↗ | (On Diff #209035) | Are we sure at this point that it's always safe to jump back 2? |
clang/lib/Lex/DependencyDirectivesSourceMinimizer.cpp | ||
---|---|---|
265–272 ↗ | (On Diff #209035) | Yes, that's a possible option. We can scan from Start until we reach the non-whitespace right before Cur, and then identify the token. In such ambiguous cases it might be a good idea to raw lex the line using the Lexer from Start to End. Then we'll match the behavior of Lexer when there's an actual error as well. I'll see if I can set up a fallback like this in a follow-up patch. |
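The forward-scanning alternative discussed for lines 265–272 could look roughly like the sketch below. It is an illustration only, not the proposed follow-up patch: the function name is hypothetical, it does not use the real Lexer, and it ignores numbers that begin with a dot as well as string or character literals earlier in the line. The idea is to walk from Start, track whether the token that ends right before Cur began with a digit, and treat the quote as a separator only in that case. Prefixes such as L, U, u, and u8 begin with a letter, so they fall out as identifiers without special-casing.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: letters, digits, and underscore.
static bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Walks forward from Start to Cur and reports whether the quote at Cur is a
// digit separator, based on the kind of token that ends right before it.
bool quoteIsDigitSeparatorForward(const char *Start, const char *Cur,
                                  const char *End) {
  assert(Cur < End && *Cur == '\'' && "expected a single quote at Cur");
  bool InToken = false;  // Currently inside an identifier or number token.
  bool InNumber = false; // The current token started with a digit.
  for (const char *P = Start; P != Cur; ++P) {
    const char C = *P;
    // A number token may also contain '.' and earlier digit separators.
    if (InToken && (isIdentChar(C) || (InNumber && (C == '.' || C == '\''))))
      continue;
    // Otherwise a new token starts here, or we are between tokens.
    InToken = isIdentChar(C);
    InNumber = InToken && std::isdigit(static_cast<unsigned char>(C)) != 0;
  }
  // Only a quote inside a number, followed by another body character,
  // is treated as a separator.
  return InNumber && Cur + 1 < End && isIdentChar(*(Cur + 1));
}
```

The trade-off is that this rescans the line for every quote, whereas the backward check only inspects one or two preceding characters.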
clang/lib/Lex/DependencyDirectivesSourceMinimizer.cpp | ||
---|---|---|
270 ↗ | (On Diff #209035) | It should be, because otherwise Start would've been equal to Cur - 1, which we check for right before the dereference. |
clang/lib/Lex/DependencyDirectivesSourceMinimizer.cpp | ||
---|---|---|
270 ↗ | (On Diff #209035) | Oh, obviously. I missed that check. |