[Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.
ClosedPublic

Authored by vsapsai on Dec 19 2017, 5:13 PM.

Details

Summary

Fix makes the loop in LexAngledStringLiteral more like the loops in
LexStringLiteral, LexCharConstant. When we skip a character after
backslash, we need to check if we reached the end of the file instead of
reading the next character unconditionally.

Discovered by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3832

rdar://problem/35572754

Diff Detail

Repository
rC Clang
vsapsai created this revision.Dec 19 2017, 5:13 PM

Ping. OSS-Fuzz classifies the bug as medium severity security issue, would be great to include the fix in Clang 6.0.

dexonsmith requested changes to this revision.Wed, Jan 10, 10:38 AM
dexonsmith added a subscriber: dexonsmith.
dexonsmith added inline comments.
clang/lib/Lex/Lexer.cpp
2014–2015 ↗(On Diff #127630)

If CurPtr is already equal to BufferEnd, why is it safe to call getAndAdvanceChar? Is BufferEnd dereferenceable?

2026 ↗(On Diff #127630)

Should this check still be skipped (in an else if of the C == '\\' check)?

clang/unittests/Lex/LexerTest.cpp
477 ↗(On Diff #127630)

To minimize the diff, please separate this change into an NFC commit ahead of time.

478 ↗(On Diff #127630)

Usually we don't put rdar/bug numbers in the source file. It would make sense in the commit message though.

This revision now requires changes to proceed.Wed, Jan 10, 10:38 AM
vsapsai updated this revision to Diff 129346.Wed, Jan 10, 2:41 PM
  • Remove rdar link from the comment per review.

Also rebased on top of master so diff between diffs can be noisy.

vsapsai marked an inline comment as done.Wed, Jan 10, 2:57 PM
vsapsai added inline comments.
clang/lib/Lex/Lexer.cpp
2014–2015 ↗(On Diff #127630)

BufferEnd is still dereferancable but we do rely on it storing null character. CurPtr < BufferEnd check was added in https://reviews.llvm.org/D9489 to handle #include <\ (no new line in the end). Before that change the buffer overflow was happening because after \ we read null character at line 2015 (CurPtr becomes BufferEnd+1) and then one more character at line 2026. CurPtr < BufferEnd makes sure we can read 2 bytes before starting this 2-character sequence. It works fine when each character is 1 byte but fails when it is more. In this case backslash and new line are read as 1 character and CurPtr < BufferEnd check is insufficient.

In my fix I read the character after backslash and then decide if can read the next one, so it doesn't matter how many bytes are in this character.

2026 ↗(On Diff #127630)

I agree it is behaviour change. NulCharacter is used to warn if you have null character in the string and I think it is worth warning even if it is preceded by the backslash. Therefore I think this check shouldn't be skipped after C == '\\' check. In practice I don't know how you can create a file with null character in its name and use it in inclusion directive.

clang/unittests/Lex/LexerTest.cpp
477 ↗(On Diff #127630)

I plan to nominate this fix for inclusion in 6.0.0 release and having one commit will be easier. It is not necessary to include NFC change in 6.0.0 release but it creates discrepancy that increases a chance of merge conflicts.

But I don't have strong opinion, just pointing out potential downsides with merging the change to other branches.

dexonsmith added inline comments.Wed, Jan 10, 3:28 PM
clang/lib/Lex/Lexer.cpp
2026 ↗(On Diff #127630)

Can you add a test for the behaviour change then?

clang/unittests/Lex/LexerTest.cpp
477 ↗(On Diff #127630)

I think it's worth separating the NFC changes from behaviour changes, even if it means having to cherry-pick extra patches.

vsapsai updated this revision to Diff 129379.Wed, Jan 10, 5:45 PM
  • Add a test for null character in string/character literals.
vsapsai marked 2 inline comments as done.Wed, Jan 10, 5:57 PM
vsapsai added inline comments.
clang/lib/Lex/Lexer.cpp
2026 ↗(On Diff #127630)

Added clang/test/Lexer/null-character-in-literal.c but struggling with Phabricator to display it as text file, not as binary. Guess those null characters can be confusing for some tools.

clang/unittests/Lex/LexerTest.cpp
477 ↗(On Diff #127630)

OK, done.

vsapsai marked an inline comment as done.Wed, Jan 10, 6:06 PM
vsapsai added inline comments.
clang/lib/Lex/Lexer.cpp
2026 ↗(On Diff #127630)
rsmith added a subscriber: rsmith.Wed, Jan 10, 6:50 PM

OSS-Fuzz classifies the bug as medium severity security issue

Well, you should probably fix it to not do that. Any sane threat model involving a C++ compiler should assume that if you can feed the compiler arbitrary input, you can get it to execute arbitrary code, and that that's a feature, not a bug. (For example, our constant expression evaluator can already run arbitrary code as required by the language specification; there happen to be no IO operations that it can perform yet, but it's only a matter of time until enough are required that the program can fully escape the bounds of the compiler as part of the compilation process.) As such, fuzzer bugs that are not representative of patterns found in real programs are likely to be prioritized below bugs that users might more commonly run into, rather than being given special "security bug" treatment.

rsmith added inline comments.Wed, Jan 10, 7:00 PM
clang/lib/Lex/Lexer.cpp
2012–2015 ↗(On Diff #129379)

You can just delete these four lines entirely.

vsapsai added inline comments.Thu, Jan 11, 11:45 AM
clang/lib/Lex/Lexer.cpp
2012–2015 ↗(On Diff #129379)

It will make Clang reject previously accepted #include <test\>escape.h> That's what we want, right? I agree with having no support for such file names, just want to confirm. For the reference, proposed change would match gcc 5.4.0 behaviour. gcc 6.1 and higher rejects such include too but in a different way.

rsmith accepted this revision.Thu, Jan 11, 1:28 PM
rsmith added inline comments.
clang/lib/Lex/Lexer.cpp
2012–2015 ↗(On Diff #129379)

Hmm. I thought so, but on further inspection this area is more subtle than I'd anticipated.

It's *permissible* to reject that. In C++, [lex.header]p2 says: "The appearance of [...] the character[...] \ [...] in a q-char-sequence or an h-char-sequence is conditionally-supported with implementation-defined semantics". C11 6.4.7/3 has similar wording but makes the behavior undefined in that case. (Both GCC's behavior and Clang's behavior are permitted by these rules.)

Clang's handling of the #include "q-char-sequence" form behaves the same as its handling of the #include <h-char-sequence> form today: the header name is *not* terminated by a " or > (respectively) that is preceded by an odd number of \s. Retaining this consistency of behavior seems like a good idea.

However, the actual behavior when such a character is "escaped" does not seem reasonable:

#include <\>> // includes file named \>, not file named >
#include "\"" // includes file named \", not file named "

... and there is no general way to perform a double-quoted include of a file whose name contains a " (not preceded by \), or an angled include of a file whose name is > (not preceded by >). So it seems that permitting escaped ending characters without actually handling them as escape sequences is not particularly worthwhile.

I think the best approach here is to follow GCC's lead: terminate a header-name (of either form) when the first " or > character is reached (depending on the form of header name), regardless of preceding \s.

But... that's a more invasive change (you'll need to change the handling of double-quoted string literals too, for consistency). Let's keep that change and this one separate.

vsapsai added inline comments.Thu, Jan 11, 2:43 PM
clang/lib/Lex/Lexer.cpp
2012–2015 ↗(On Diff #129379)

Thanks for in-depth explanation, that is useful.

dexonsmith accepted this revision.Thu, Jan 11, 4:05 PM
This revision is now accepted and ready to land.Thu, Jan 11, 4:05 PM
This revision was automatically updated to reflect the committed changes.