This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Lex/
-
Lex/
-
Lexer.cpp
-
test/Lexer/
-
Lexer/
-
null-character-in-literal.c
-
unittests/Lex/
-
Lex/
-
LexerTest.cpp

Differential D41423

[Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.
ClosedPublic

Authored by vsapsai on Dec 19 2017, 5:13 PM.

Download Raw Diff

Details

Reviewers

arphaman
kcc
dexonsmith
rsmith

Commits

rGd8f5d67c8ef6: Merging r322390: --------------------------------------------------------------…
rGabb8dfc11468: [Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.
rL322649: Merging r322390:
rL322390: [Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.
rC322390: [Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.

Summary

Fix makes the loop in LexAngledStringLiteral more like the loops in
LexStringLiteral, LexCharConstant. When we skip a character after
backslash, we need to check if we reached the end of the file instead of
reading the next character unconditionally.

Discovered by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3832

rdar://problem/35572754

Diff Detail

Repository: rC Clang

Event Timeline

vsapsai created this revision.Dec 19 2017, 5:13 PM

Ping. OSS-Fuzz classifies the bug as medium severity security issue, would be great to include the fix in Clang 6.0.

dexonsmith requested changes to this revision.Jan 10 2018, 10:38 AM

dexonsmith added a subscriber: dexonsmith.

dexonsmith added inline comments.

clang/lib/Lex/Lexer.cpp
2014–2015 ↗	(On Diff #127630)	If `CurPtr` is already equal to `BufferEnd`, why is it safe to call `getAndAdvanceChar`? Is `BufferEnd` dereferenceable?
2026 ↗	(On Diff #127630)	Should this check still be skipped (in an `else if` of the `C == '\\'` check)?
clang/unittests/Lex/LexerTest.cpp
477 ↗	(On Diff #127630)	To minimize the diff, please separate this change into an NFC commit ahead of time.
478 ↗	(On Diff #127630)	Usually we don't put rdar/bug numbers in the source file. It would make sense in the commit message though.

This revision now requires changes to proceed.Jan 10 2018, 10:38 AM

Remove rdar link from the comment per review.

Also rebased on top of master so diff between diffs can be noisy.

vsapsai marked an inline comment as done.Jan 10 2018, 2:57 PM

vsapsai added inline comments.

clang/lib/Lex/Lexer.cpp
2014–2015 ↗	(On Diff #127630)	`BufferEnd` is still dereferancable but we do rely on it storing null character. `CurPtr < BufferEnd` check was added in https://reviews.llvm.org/D9489 to handle `#include <\` (no new line in the end). Before that change the buffer overflow was happening because after `\` we read null character at line 2015 (`CurPtr` becomes `BufferEnd+1`) and then one more character at line 2026. `CurPtr < BufferEnd` makes sure we can read 2 bytes before starting this 2-character sequence. It works fine when each character is 1 byte but fails when it is more. In this case backslash and new line are read as 1 character and `CurPtr < BufferEnd` check is insufficient. In my fix I read the character after backslash and then decide if can read the next one, so it doesn't matter how many bytes are in this character.
2026 ↗	(On Diff #127630)	I agree it is behaviour change. `NulCharacter` is used to warn if you have null character in the string and I think it is worth warning even if it is preceded by the backslash. Therefore I think this check shouldn't be skipped after `C == '\\'` check. In practice I don't know how you can create a file with null character in its name and use it in inclusion directive.
clang/unittests/Lex/LexerTest.cpp
477 ↗	(On Diff #127630)	I plan to nominate this fix for inclusion in 6.0.0 release and having one commit will be easier. It is not necessary to include NFC change in 6.0.0 release but it creates discrepancy that increases a chance of merge conflicts. But I don't have strong opinion, just pointing out potential downsides with merging the change to other branches.

dexonsmith added inline comments.Jan 10 2018, 3:28 PM

clang/lib/Lex/Lexer.cpp
2026 ↗	(On Diff #127630)	Can you add a test for the behaviour change then?
clang/unittests/Lex/LexerTest.cpp
477 ↗	(On Diff #127630)	I think it's worth separating the NFC changes from behaviour changes, even if it means having to cherry-pick extra patches.

Add a test for null character in string/character literals.

vsapsai marked 2 inline comments as done.Jan 10 2018, 5:57 PM

vsapsai added inline comments.

clang/lib/Lex/Lexer.cpp
2026 ↗	(On Diff #127630)	Added clang/test/Lexer/null-character-in-literal.c but struggling with Phabricator to display it as text file, not as binary. Guess those null characters can be confusing for some tools.
clang/unittests/Lex/LexerTest.cpp
477 ↗	(On Diff #127630)	OK, done.

vsapsai marked an inline comment as done.Jan 10 2018, 6:06 PM

vsapsai added inline comments.

clang/lib/Lex/Lexer.cpp
2026 ↗	(On Diff #127630)	Added the test file at https://gist.github.com/vsapsai/d61a8a7d06356a18db26658acc0c4b25

In D41423#968587, @vsapsai wrote:

OSS-Fuzz classifies the bug as medium severity security issue

Well, you should probably fix it to not do that. Any sane threat model involving a C++ compiler should assume that if you can feed the compiler arbitrary input, you can get it to execute arbitrary code, and that that's a feature, not a bug. (For example, our constant expression evaluator can already run arbitrary code as required by the language specification; there happen to be no IO operations that it can perform yet, but it's only a matter of time until enough are required that the program can fully escape the bounds of the compiler as part of the compilation process.) As such, fuzzer bugs that are not representative of patterns found in real programs are likely to be prioritized below bugs that users might more commonly run into, rather than being given special "security bug" treatment.

rsmith added inline comments.Jan 10 2018, 7:00 PM

clang/lib/Lex/Lexer.cpp
2012–2015 ↗	(On Diff #129379)	You can just delete these four lines entirely.

vsapsai added inline comments.Jan 11 2018, 11:45 AM

clang/lib/Lex/Lexer.cpp
2012–2015 ↗	(On Diff #129379)	It will make Clang reject previously accepted `#include <test\>escape.h>` That's what we want, right? I agree with having no support for such file names, just want to confirm. For the reference, proposed change would match gcc 5.4.0 behaviour. gcc 6.1 and higher rejects such include too but in a different way.

rsmith accepted this revision.Jan 11 2018, 1:28 PM

rsmith added inline comments.

clang/lib/Lex/Lexer.cpp
2012–2015 ↗	(On Diff #129379)	Hmm. I thought so, but on further inspection this area is more subtle than I'd anticipated. It's permissible to reject that. In C++, [lex.header]p2 says: "The appearance of [...] the character[...] `\` [...] in a q-char-sequence or an h-char-sequence is conditionally-supported with implementation-defined semantics". C11 6.4.7/3 has similar wording but makes the behavior undefined in that case. (Both GCC's behavior and Clang's behavior are permitted by these rules.) Clang's handling of the `#include "q-char-sequence"` form behaves the same as its handling of the `#include <h-char-sequence>` form today: the header name is not terminated by a `"` or `>` (respectively) that is preceded by an odd number of `\`s. Retaining this consistency of behavior seems like a good idea. However, the actual behavior when such a character is "escaped" does not seem reasonable: #include <\>> // includes file named \>, not file named > #include "\"" // includes file named \", not file named " ... and there is no general way to perform a double-quoted include of a file whose name contains a `"` (not preceded by `\`), or an angled include of a file whose name is `>` (not preceded by `>`). So it seems that permitting escaped ending characters without actually handling them as escape sequences is not particularly worthwhile. I think the best approach here is to follow GCC's lead: terminate a header-name (of either form) when the first `"` or `>` character is reached (depending on the form of header name), regardless of preceding `\`s. But... that's a more invasive change (you'll need to change the handling of double-quoted string literals too, for consistency). Let's keep that change and this one separate.

vsapsai added inline comments.Jan 11 2018, 2:43 PM

clang/lib/Lex/Lexer.cpp
2012–2015 ↗	(On Diff #129379)	Thanks for in-depth explanation, that is useful.

dexonsmith accepted this revision.Jan 11 2018, 4:05 PM

This revision is now accepted and ready to land.Jan 11 2018, 4:05 PM

Closed by commit rC322390: [Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral. (authored by vsapsai). · Explain WhyJan 12 2018, 10:55 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Lex/

Lexer.cpp

19 lines

test/

Lexer/

null-character-in-literal.c

unittests/

Lex/

LexerTest.cpp

2 lines

Diff 129666

lib/Lex/Lexer.cpp

	Show First 20 Lines • Show All 2,003 Lines • ▼ Show 20 Lines
	/// LexAngledStringLiteral - Lex the remainder of an angled string literal,			/// LexAngledStringLiteral - Lex the remainder of an angled string literal,
	/// after having lexed the '<' character. This is used for #include filenames.			/// after having lexed the '<' character. This is used for #include filenames.
	bool Lexer::LexAngledStringLiteral(Token &Result, const char *CurPtr) {			bool Lexer::LexAngledStringLiteral(Token &Result, const char *CurPtr) {
	// Does this string contain the \0 character?			// Does this string contain the \0 character?
	const char *NulCharacter = nullptr;			const char *NulCharacter = nullptr;
	const char *AfterLessPos = CurPtr;			const char *AfterLessPos = CurPtr;
	char C = getAndAdvanceChar(CurPtr, Result);			char C = getAndAdvanceChar(CurPtr, Result);
	while (C != '>') {			while (C != '>') {
	// Skip escaped characters.			// Skip escaped characters. Escaped newlines will already be processed by
	if (C == '\\' && CurPtr < BufferEnd) {			// getAndAdvanceChar.
	// Skip the escaped character.			if (C == '\\')
	getAndAdvanceChar(CurPtr, Result);			C = getAndAdvanceChar(CurPtr, Result);
	} else if (C == '\n' \|\| C == '\r' \|\| // Newline.
				if (C == '\n' \|\| C == '\r' \|\| // Newline.
	(C == 0 && (CurPtr-1 == BufferEnd \|\| // End of file.			(C == 0 && (CurPtr-1 == BufferEnd \|\| // End of file.
	isCodeCompletionPoint(CurPtr-1)))) {			isCodeCompletionPoint(CurPtr-1)))) {
	// If the filename is unterminated, then it must just be a lone <			// If the filename is unterminated, then it must just be a lone <
	// character. Return this as such.			// character. Return this as such.
	FormTokenWithChars(Result, AfterLessPos, tok::less);			FormTokenWithChars(Result, AfterLessPos, tok::less);
	return true;			return true;
	} else if (C == 0) {			}

				if (C == 0) {
	NulCharacter = CurPtr-1;			NulCharacter = CurPtr-1;
	}			}
	C = getAndAdvanceChar(CurPtr, Result);			C = getAndAdvanceChar(CurPtr, Result);
	}			}

	// If a nul character existed in the string, warn about it.			// If a nul character existed in the string, warn about it.
	if (NulCharacter && !isLexingRawMode())			if (NulCharacter && !isLexingRawMode())
	Diag(NulCharacter, diag::null_in_char_or_string) << 1;			Diag(NulCharacter, diag::null_in_char_or_string) << 1;
	▲ Show 20 Lines • Show All 1,844 Lines • Show Last 20 Lines

test/Lexer/null-character-in-literal.c

This is a binary file.

Property	Old Value	New Value
svn:mime-type	null	application/octet-stream \ No newline at end of property

unittests/Lex/LexerTest.cpp

Show First 20 Lines • Show All 469 Lines • ▼ Show 20 Lines	for (unsigned Offset = 0; Offset < IdentifierLength; ++Offset) {
// is the same as original token location reported by Lexer.		// is the same as original token location reported by Lexer.
EXPECT_EQ(FoundLocation.second, OriginalLocation.second);		EXPECT_EQ(FoundLocation.second, OriginalLocation.second);
}		}
}		}
}		}

TEST_F(LexerTest, AvoidPastEndOfStringDereference) {		TEST_F(LexerTest, AvoidPastEndOfStringDereference) {
EXPECT_TRUE(Lex(" // \\\n").empty());		EXPECT_TRUE(Lex(" // \\\n").empty());
		EXPECT_TRUE(Lex("#include <\\\\").empty());
		EXPECT_TRUE(Lex("#include <\\\\\n").empty());
}		}

TEST_F(LexerTest, StringizingRasString) {		TEST_F(LexerTest, StringizingRasString) {
// For "std::string Lexer::Stringify(StringRef Str, bool Charify)".		// For "std::string Lexer::Stringify(StringRef Str, bool Charify)".
std::string String1 = R"(foo		std::string String1 = R"(foo
{"bar":[]}		{"bar":[]}
baz)";		baz)";
// For "void Lexer::Stringify(SmallVectorImpl<char> &Str)".		// For "void Lexer::Stringify(SmallVectorImpl<char> &Str)".
Show All 33 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 129666

lib/Lex/Lexer.cpp

test/Lexer/null-character-in-literal.c

unittests/Lex/LexerTest.cpp

[Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.
ClosedPublic