This is an archive of the discontinued LLVM Phabricator instance.

Differential D114003

LiteralSupport: Don't assert() on invalid input
Closed · Public

Authored by DaanDeMeyer on Nov 16 2021, 8:02 AM.

Details

Reviewers

eduucaldas
beccadax
sammccall
kadircet

Summary

When using clangd, it's possible to trigger assertions in
NumericLiteralParser and CharLiteralParser when switching git branches.
This commit removes the initial asserts on invalid input and reports
the failure through each class's existing error handling mechanism
instead. This allows clangd to recover gracefully without crashing.

See https://github.com/clangd/clangd/issues/888 for more information
on the clangd crashes.
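
To illustrate the pattern the summary describes, here is a minimal, self-contained sketch. `DiagSink` and `ToyCharLiteralParser` are hypothetical stand-ins, not Clang classes; only the shape (validate the input, report a diagnostic, set an error flag, return early) is meant to mirror the patch.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a diagnostics engine: it only collects messages.
struct DiagSink {
  std::vector<std::string> Errors;
  void report(std::string Msg) { Errors.push_back(std::move(Msg)); }
};

// Hypothetical, much-simplified character-literal parser. The parsing itself
// is a toy; the point is how invalid input is handled.
class ToyCharLiteralParser {
  bool HadError = false;
  char Value = 0;

public:
  ToyCharLiteralParser(std::string_view Spelling, DiagSink &Diags) {
    // Before: assert(Spelling.front() == '\'' && "Invalid token lexed");
    // After: malformed input becomes a diagnostic plus an error flag.
    if (Spelling.size() < 3 || Spelling.front() != '\'' ||
        Spelling.back() != '\'') {
      Diags.report("failure when lexing a character literal");
      HadError = true;
      return;
    }
    Value = Spelling[1]; // toy: takes the first character after the quote
  }

  bool hadError() const { return HadError; }

  char getValue() const {
    // An assert remains appropriate for a caller bug (using a failed parse),
    // as opposed to bad input that can come from outside the program.
    assert(!HadError && "value requested from a failed parse");
    return Value;
  }
};
```

A caller would check `hadError()` before using the result, which is what lets a long-running tool such as clangd keep going after it sees inconsistent input.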

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
380 ms	x64 debian > LLVM.CodeGen/PowerPC::mi-peepholes-trap-opt.mir

Event Timeline

DaanDeMeyer created this revision. Nov 16 2021, 8:02 AM

Herald added a subscriber: usaxena95. Nov 16 2021, 8:02 AM

DaanDeMeyer requested review of this revision. Nov 16 2021, 8:02 AM

Herald added subscribers: cfe-commits, ilya-biryukov. Nov 16 2021, 8:02 AM

Added some people that were recent reviewers of changes to this file and some clangd folks.

Unfortunately, I don't have time to test this properly (aside from verifying that it fixes all the clangd crashes I'm having), but I'm putting the patch up anyway in case anyone's interested.

Harbormaster completed remote builds in B134529: Diff 387642. Nov 16 2021, 8:30 AM

clang-format

Harbormaster completed remote builds in B134544: Diff 387667. Nov 16 2021, 9:20 AM

err_lexing_string’s message is “failure when lexing a string”, which isn’t accurate here since you’re lexing a character literal or numeric literal instead. Could you emit a more appropriate message for this? That might mean adding additional diagnostics or modifying the existing one so you can insert information about the kind of literal.

(err_lexing_string is only used for “can’t happen” errors, so maybe you could change the message to something like failure when lexing a %0 literal; a file may have been modified during compilation.)

I’m otherwise pretty happy with this change—we’ve seen similar un-reproducible crashes in string literal parsing, and this kind of solution has worked well there.
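
As a rough illustration of the `%0` suggestion in the comment above, the sketch below substitutes a literal kind into a single shared message. `formatDiag` is a hypothetical helper written for this example; it is not Clang's diagnostic formatter, which supports far more than a single positional argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: replaces every "%0" in Format with Arg0 and copies
// all other characters through unchanged.
std::string formatDiag(std::string_view Format, std::string_view Arg0) {
  std::string Out;
  for (std::size_t I = 0; I < Format.size(); ++I) {
    if (Format[I] == '%' && I + 1 < Format.size() && Format[I + 1] == '0') {
      Out.append(Arg0.data(), Arg0.size());
      ++I; // skip the '0' that follows '%'
    } else {
      Out += Format[I];
    }
  }
  return Out;
}
```

With this shape, one message such as "failure when lexing a %0 literal" could serve string, character, and numeric literals by passing the kind as the argument.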

This revision now requires changes to proceed. Nov 16 2021, 9:49 AM

Addressed comments by adding two new errors, one for character literals and one for numeric literals.

Harbormaster completed remote builds in B134586: Diff 387723. Nov 16 2021, 12:32 PM

Added a bit of analysis on https://github.com/clangd/clangd/issues/888.

The short version is we built a PCH. When consuming it, the parser decides to re-lex and this means sources must be loaded from disk.
These sources are inconsistent with the PCH, so the precondition that the input range describes a reasonable literal doesn't hold.

This patch fixes it by removing the precondition and defending against it instead.
This seems like an OK fallback unless there are hundreds of other places where we might re-lex and make similar assumptions.
I'm going to dig and try to find out, but if anyone knows about this, it'd be nice if they chimed in :-)
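
The failure mode described in this analysis can be sketched in isolation: a token's position is recorded against one version of a file, and the spelling is later re-read from a buffer whose contents have changed. Everything below (`TokenRange`, `respell`, `looksLikeCharLiteral`) is hypothetical and only models the idea; it does not reflect how Clang's PCH or SourceManager machinery is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical record of where a token sat when the file was first lexed.
struct TokenRange {
  std::size_t Offset;
  std::size_t Length;
};

// Re-read the token's spelling from whatever the buffer holds now.
// Returns an empty view if the recorded range no longer fits in the buffer.
std::string_view respell(std::string_view Buffer, TokenRange R) {
  if (R.Offset > Buffer.size() || R.Length > Buffer.size() - R.Offset)
    return {};
  return Buffer.substr(R.Offset, R.Length);
}

// Defensive check in the spirit of the patch: verify the spelling still has
// the outer shape of a character literal instead of assuming it does.
bool looksLikeCharLiteral(std::string_view S) {
  return S.size() >= 3 && S.front() == '\'' && S.back() == '\'';
}
```

If the file read `char c = 'x';` when the range `{9, 3}` was recorded, `respell` yields `'x'`. After a branch switch that rewrites the file, the same range points at unrelated bytes, so a parser that asserts on the opening quote would abort while one that checks it can report an error and continue.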

The diagnostics are now much better—thanks!

This revision is now accepted and ready to land. Nov 17 2021, 7:23 AM

This has been committed in https://github.com/llvm/llvm-project/commit/5a6dac66db67225e2443f4e61dfe9d2f96780611.

Herald added a project: Restricted Project. Apr 10 2022, 1:53 PM

Revision Contents

Path	Size
clang/include/clang/Basic/DiagnosticLexKinds.td	4 lines
clang/lib/Lex/LiteralSupport.cpp	23 lines

Diff 387723

clang/include/clang/Basic/DiagnosticLexKinds.td

[...]
 def warn_bad_string_encoding : ExtWarn<
   "illegal character encoding in string literal">,
   InGroup<InvalidSourceEncoding>;
 def err_bad_character_encoding : Error<
   "illegal character encoding in character literal">;
 def warn_bad_character_encoding : ExtWarn<
   "illegal character encoding in character literal">,
   InGroup<InvalidSourceEncoding>;
-def err_lexing_string : Error<"failure when lexing a string">;
+def err_lexing_string : Error<"failure when lexing a string literal">;
+def err_lexing_char : Error<"failure when lexing a character literal">;
+def err_lexing_numeric : Error<"failure when lexing a numeric literal">;
 def err_placeholder_in_source : Error<"editor placeholder in source file">;

 //===----------------------------------------------------------------------===//
 // Preprocessor Diagnostics
 //===----------------------------------------------------------------------===//

 let CategoryName = "User-Defined Issue" in {
 def pp_hash_warning : Warning<"%0">,
[...]

clang/lib/Lex/LiteralSupport.cpp

[...]
 NumericLiteralParser::NumericLiteralParser(StringRef TokSpelling,
                                            SourceLocation TokLoc,
                                            const SourceManager &SM,
                                            const LangOptions &LangOpts,
                                            const TargetInfo &Target,
                                            DiagnosticsEngine &Diags)
     : SM(SM), LangOpts(LangOpts), Diags(Diags),
       ThisTokBegin(TokSpelling.begin()), ThisTokEnd(TokSpelling.end()) {

-  // This routine assumes that the range begin/end matches the regex for integer
-  // and FP constants (specifically, the 'pp-number' regex), and assumes that
-  // the byte at "*end" is both valid and not part of the regex. Because of
-  // this, it doesn't have to check for 'overscan' in various places.
-  assert(!isPreprocessingNumberBody(*ThisTokEnd) && "didn't maximally munch?");
-
   s = DigitsBegin = ThisTokBegin;
   saw_exponent = false;
   saw_period = false;
   saw_ud_suffix = false;
   saw_fixed_point_suffix = false;
   isLong = false;
   isUnsigned = false;
   isLongLong = false;
   isSizeT = false;
   isHalf = false;
   isFloat = false;
   isImaginary = false;
   isFloat16 = false;
   isFloat128 = false;
   MicrosoftInteger = 0;
   isFract = false;
   isAccum = false;
   hadError = false;

+  // This routine assumes that the range begin/end matches the regex for integer
+  // and FP constants (specifically, the 'pp-number' regex), and assumes that
+  // the byte at "*end" is both valid and not part of the regex. Because of
+  // this, it doesn't have to check for 'overscan' in various places.
+  if (isPreprocessingNumberBody(*ThisTokEnd)) {
+    Diags.Report(TokLoc, diag::err_lexing_numeric);
+    hadError = true;
+    return;
+  }
+
   if (*s == '0') { // parse radix
     ParseNumberStartingWithZero(TokLoc);
     if (hadError)
       return;
   } else { // the first digit is non-zero
     radix = 10;
     s = SkipDigits(s);
     if (s == ThisTokEnd) {
[...] CharLiteralParser::CharLiteralParser(const char *begin, const char *end,

   // Skip over wide character determinant.
   if (Kind != tok::char_constant)
     ++begin;
   if (Kind == tok::utf8_char_constant)
     ++begin;

   // Skip over the entry quote.
-  assert(begin[0] == '\'' && "Invalid token lexed");
+  if (begin[0] != '\'') {
+    PP.Diag(Loc, diag::err_lexing_char);
+    HadError = true;
+    return;
+  }
+
   ++begin;

   // Remove an optional ud-suffix.
   if (end[-1] != '\'') {
     const char *UDSuffixEnd = end;
     do {
       --end;
     } while (end[-1] != '\'');
[...]