This is an archive of the discontinued LLVM Phabricator instance.

Differential D51867

[Diagnostics] Add error handling to FormatDiagnostic()
Needs Review · Public

Authored by jkorous on Sep 10 2018, 10:02 AM.

Details

Reviewers

arphaman
vsapsai

Summary

There seem to be a couple of implicit assumptions that might be better represented explicitly by asserts.

Diff Detail

Repository: rC Clang

Event Timeline

jkorous created this revision. Sep 10 2018, 10:02 AM

Herald added subscribers: cfe-commits, dexonsmith. Sep 10 2018, 10:02 AM

Regarding the asserts to catch potential problems, it seems most of them are for buffer overflows. Aren't sanitizers catching those cases, specifically AddressSanitizer? I haven't checked, but it seems it would be good to check for buffer overflows automatically instead of using explicit asserts.

Also, there are a few changes I wouldn't call NFC. Those change loop iteration from "iterator != end" to "iterator < end". As it is a functionality change, I'd like to have tests to cover that. Also, I've fixed a few bugs with going past the end of a buffer, and the bugs were actually inside the loop, not in the buffer range check. It is tempting to play it safe, but that risks hiding real bugs.
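
The difference between the two loop conditions can be illustrated with a small model. This is only a sketch, not the code under review: `scanNotEqual`, `scanLessThan`, `ScanResult`, and the `MaxSteps` cap are hypothetical names, and indices stand in for the `DiagStr`/`DiagEnd` pointers so that an overshoot can be shown without undefined behavior.

```cpp
#include <cassert>
#include <cstddef>

// Where the cursor stopped, and whether the safety cap had to stop it.
struct ScanResult {
  std::size_t Pos;
  bool HitCap;
};

// "iterator != end": if a step jumps over End, the loop never terminates on
// its own. MaxSteps exists only so that this demo returns.
ScanResult scanNotEqual(std::size_t End, std::size_t Stride,
                        std::size_t MaxSteps) {
  assert(Stride > 0 && "Stride must make progress");
  std::size_t Pos = 0, Steps = 0;
  while (Pos != End) {
    if (Steps++ == MaxSteps)
      return {Pos, true};
    Pos += Stride;
  }
  return {Pos, false};
}

// "iterator < end": the same overshoot ends the loop quietly with Pos > End.
ScanResult scanLessThan(std::size_t End, std::size_t Stride) {
  assert(Stride > 0 && "Stride must make progress");
  std::size_t Pos = 0;
  while (Pos < End)
    Pos += Stride;
  return {Pos, false};
}
```

With `End = 3` and `Stride = 2`, `scanNotEqual` runs until the cap stops it, while `scanLessThan` returns normally with `Pos == 4`. Both variants behave identically only when no step overshoots, which is why the change is not strictly NFC and why it can mask a bug inside the loop body.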

lib/Basic/Diagnostic.cpp
804	For example, I wouldn't call this one NFC.

Hi Volodymyr,
Thanks for the feedback - interesting points!

I see your point regarding NFC - I am going to drop it as you are right.

Two things about sanitizers come to mind:

1. You'd need to guarantee that tests cover all possible code paths (or at least those you are able to cover with asserts) to be able to replace asserts with sanitizers. (I think that even if that were feasible, asserts would prove to be way simpler.)
2. I prefer an explicit assert right at the place where an assumption is made over a test somewhere else, as it makes understanding the code much easier.
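
As a rough illustration of the second point, an assert can state an assumption at the exact place where the read happens. `peekNext` is a hypothetical helper written for this sketch; it is not part of Clang or of this patch.

```cpp
#include <cassert>

// Hypothetical helper: returns the character right after Cur.
// The asserts document, at the point of use, what the read relies on.
inline char peekNext(const char *Cur, const char *End) {
  assert(Cur != nullptr && End != nullptr && "null range");
  assert(End - Cur >= 2 && "no character after Cur");
  return Cur[1];
}
```

A sanitizer would report a bad read here only if some test actually drives execution down this path with a too-short range; the assert makes the assumption visible to a reader even when no such test exists.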

> Those change loop iteration from "iterator != end" to "iterator < end". As it is a functionality change, I'd like to have tests to cover that.

Technically you are right, but I assume (ok, busted) that without any bug in the iteration this is NFC. I will try to find some simple input that would break the current implementation.

> Also, I've fixed a few bugs with going past the end of a buffer, and the bugs were actually inside the loop, not in the buffer range check. It is tempting to play it safe, but that risks hiding real bugs.

But that almost sounds like we should write fragile code so that bugs from other parts of the codebase show up... Anyway, I will think about this a bit more; it's an interesting point.

lib/Basic/Diagnostic.cpp
804	You are right, I am gonna drop the NFC.

jkorous retitled this revision from [Diagnostics][NFC] Add error handling to FormatDiagnostic() to [Diagnostics] Add error handling to FormatDiagnostic(). Sep 10 2018, 11:36 AM

I tried to come up with some input that breaks the current implementation so I could add a test. The problem is that an invalid memory read doesn't guarantee a deterministic crash.
E.g., with this input the current implementation is definitely reading way past the buffer:

SmallVector<char, 1> IgnoreMe;
const char* Foo = "foo%";
const char* FooEnd = Foo + 4;
Diag.FormatDiagnostic(Foo, FooEnd, IgnoreMe);

...and it actually found some string there, yet it didn't crash until it hit some unrelated assert:

(lldb) p DiagStr
(const char *) $0 = 0x0000000100adc53b " SplatSizeInBits == 0 && \"SplatSizeInBits must divide width!\""
(lldb) p *DiagStr
(const char) $1 = ' '
(lldb) p DiagEnd
(const char *) $2 = 0x0000000100ad4155 "0"

The only reliable failure is passing nullptr, which currently leads to SIGABRT (nullptr dereferenced):

SmallVector<char, 1> IgnoreMe;
const char* Foo = "foo";
Diag.FormatDiagnostic(Foo, nullptr, IgnoreMe);

I am reconsidering the necessity of such tests here. WDYT?
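
For comparison, the two inputs above become deterministic once the scanner checks its range before reading. The following is a simplified stand-in, not Clang's `FormatDiagnostic`: `expandEscapes` is a hypothetical function that only copies text and treats `%x` as an escape for `x`, and it reports malformed input through its return value rather than through an assert.

```cpp
#include <string>

// Simplified stand-in for the "%%" handling (an assumption of this sketch).
// Returns false for a null or inverted range and for input that ends with a
// lone '%', instead of reading the character after it.
bool expandEscapes(const char *Cur, const char *End, std::string &Out) {
  if (!Cur || !End || Cur > End)
    return false;
  while (Cur < End) {
    if (*Cur != '%') {
      Out.push_back(*Cur++);
      continue;
    }
    if (End - Cur < 2)
      return false; // "foo%": nothing follows the '%'
    Out.push_back(Cur[1]); // "%%" -> "%"
    Cur += 2;
  }
  return true;
}
```

Called with `"foo%"` and an end pointer four characters in, this returns `false` after copying `foo`; called with a null end pointer, it returns `false` immediately. A unit test can check both outcomes reliably, which is not possible when the failure mode is an out-of-bounds read.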

jkorous requested review of this revision. Sep 10 2018, 2:55 PM

Hi Volodymyr, do you think you might take another look?

It seems like there are too many asserts and they are too specific; they seem to be aimed at specific potential bugs. What about asserts that make sure we maintain some invariants? For example, check DiagStr < DiagEnd once in a loop instead of at every place we increment DiagStr. Do you think it would catch the same problems, but maybe a little bit later?

My suggestion is based on the assumption that FormatDiagnostic works only with predefined format strings and not with strings provided by compiler users (arguments can be provided by users, but not format strings).
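
One possible reading of this suggestion, sketched on a toy scanner rather than on the real function: keep the `!=` loop condition and assert the invariant once at the top of each iteration. `countPlaceholders` is a hypothetical function, and it relies on the same assumption as above, namely that the format string is trusted and well formed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts "%<digit>" placeholders; "%%" is an escaped percent sign.
// Precondition (assumption of this sketch): Fmt is a trusted, well-formed
// format string, so every '%' is followed by another character.
std::size_t countPlaceholders(const std::string &Fmt) {
  const std::size_t End = Fmt.size();
  std::size_t Pos = 0, Count = 0;
  while (Pos != End) {
    // Single invariant check per iteration: if any step below overshoots,
    // the next iteration trips this assert.
    assert(Pos < End && "cursor overshot the end of the format string");
    if (Fmt[Pos] != '%') {
      ++Pos;
      continue;
    }
    if (Fmt[Pos + 1] >= '0' && Fmt[Pos + 1] <= '9')
      ++Count;
    Pos += 2; // consume '%' and the character after it
  }
  return Count;
}
```

For a malformed string such as `"foo%"`, the step over the trailing `%` moves `Pos` past `End`, and the assert fires at the start of the following iteration in a build with asserts enabled. The problem is caught one iteration later than with a check at each increment, but with a single assert.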

I won't be able to check the review again for some time, so it's OK not to wait for my approval.

Revision Contents

Path: lib/Basic/Diagnostic.cpp
Size: 22 lines

Diff 164692

lib/Basic/Diagnostic.cpp

[... 762 lines not shown; context: StringRef Diag = ...]
     getDiags()->getDiagnosticIDs()->getDescription(getID());

   FormatDiagnostic(Diag.begin(), Diag.end(), OutStr);
 }

 void Diagnostic::
 FormatDiagnostic(const char *DiagStr, const char *DiagEnd,
                  SmallVectorImpl<char> &OutStr) const {
+  assert(DiagStr <= DiagEnd && "Invalid DiagStr-DiagEnd range");
   // When the diagnostic string is only "%0", the entire string is being given
   // by an outside source. Remove unprintable characters from this string
   // and skip all the other string processing.
   if (DiagEnd - DiagStr == 2 &&
       StringRef(DiagStr, DiagEnd - DiagStr).equals("%0") &&
       getArgKind(0) == DiagnosticsEngine::ak_std_string) {
     const std::string &S = getArgStdStr(0);
     for (char c : S) {
[... 14 lines not shown ...]
   /// compared to see if more information is needed to be printed.
   SmallVector<intptr_t, 2> QualTypeVals;
   SmallVector<char, 64> Tree;

   for (unsigned i = 0, e = getNumArgs(); i < e; ++i)
     if (getArgKind(i) == DiagnosticsEngine::ak_qualtype)
       QualTypeVals.push_back(getRawArg(i));

-  while (DiagStr != DiagEnd) {
+  assert(DiagStr != nullptr && "DiagStr is nullptr");
+
+  while (DiagStr < DiagEnd) {
     [inline comment] vsapsai: For example, I wouldn't call this one NFC.
     [inline comment] jkorous: You are right, I am gonna drop the NFC.
     if (DiagStr[0] != '%') {
       // Append non-%0 substrings to Str if we have one.
       const char *StrEnd = std::find(DiagStr, DiagEnd, '%');
       OutStr.append(DiagStr, StrEnd);
       DiagStr = StrEnd;
       continue;
-    } else if (isPunctuation(DiagStr[1])) {
+    } else if ((DiagStr + 1) < DiagEnd && isPunctuation(DiagStr[1])) {
       OutStr.push_back(DiagStr[1]); // %% -> %.
       DiagStr += 2;
       continue;
+    } else if ((DiagStr + 1) >= DiagEnd) {
+      llvm_unreachable("DiagStr ends with '%'");
+      return;
     }

     // Skip the %.
     ++DiagStr;

     // This must be a placeholder for a diagnostic argument. The format for a
     // placeholder is one of "%0", "%modifier0", or "%modifier{arguments}0".
     // The digit is a number from 0-9 indicating which argument this comes from.
     // The modifier is a string of digits from the set [-a-z]+, arguments is a
     // brace enclosed string.
     const char *Modifier = nullptr, *Argument = nullptr;
     unsigned ModifierLen = 0, ArgumentLen = 0;

     // Check to see if we have a modifier. If so eat it.
     if (!isDigit(DiagStr[0])) {
       Modifier = DiagStr;
-      while (DiagStr[0] == '-' ||
-             (DiagStr[0] >= 'a' && DiagStr[0] <= 'z'))
+      while (DiagStr < DiagEnd &&
+             (DiagStr[0] == '-' || (DiagStr[0] >= 'a' && DiagStr[0] <= 'z')))
         ++DiagStr;
       ModifierLen = DiagStr-Modifier;

       // If we have an argument, get it next.
       if (DiagStr[0] == '{') {
         ++DiagStr; // Skip {.
+        assert(DiagStr < DiagEnd && "Invalid DiagStr");
         Argument = DiagStr;

         DiagStr = ScanFormat(DiagStr, DiagEnd, '}');
         assert(DiagStr != DiagEnd && "Mismatched {}'s in diagnostic string!");
         ArgumentLen = DiagStr-Argument;
         ++DiagStr; // Skip }.
+        assert(DiagStr < DiagEnd && "Invalid DiagStr");
       }
     }

     assert(isDigit(*DiagStr) && "Invalid format for argument in diagnostic");
-    unsigned ArgNo = *DiagStr++ - '0';
+    unsigned ArgNo = *DiagStr - '0';

     // Only used for type diffing.
     unsigned ArgNo2 = ArgNo;
+    ++DiagStr;

     DiagnosticsEngine::ArgumentKind Kind = getArgKind(ArgNo);
     if (ModifierIs(Modifier, ModifierLen, "diff")) {
+      assert(DiagStr + 1 < DiagEnd && "Invalid diff modifier in DiagStr");
       assert(*DiagStr == ',' && isDigit(*(DiagStr + 1)) &&
              "Invalid format for diff modifier");
       ++DiagStr; // Comma.
+      assert(DiagStr < DiagEnd && "Invalid DiagStr");
       ArgNo2 = *DiagStr++ - '0';
       DiagnosticsEngine::ArgumentKind Kind2 = getArgKind(ArgNo2);
       if (Kind == DiagnosticsEngine::ak_qualtype &&
           Kind2 == DiagnosticsEngine::ak_qualtype)
         Kind = DiagnosticsEngine::ak_qualtype_pair;
       else {
         // %diff only supports QualTypes. For other kinds of arguments,
         // use the default printing. For example, if the modifier is:
[... 279 lines not shown ...]