This is an archive of the discontinued LLVM Phabricator instance.

Differential D136603

[analyzer] getBinding should auto-detect type only if it was not given
ClosedPublic

Authored by steakhal on Oct 24 2022, 7:39 AM.

Details

Reviewers

ASDenysPetrov
martong
xazax.hun
NoQ
Szelethus
tomasz-kaminski-sonarsource

Commits

rG93b98eb399a1: [analyzer] getBinding should auto-detect type only if it was not given

Summary

Casting a pointer to a suitably large integral type by reinterpret-cast
should result in the same value as by using the __builtin_bit_cast().
The compiler exploits this: https://godbolt.org/z/zMP3sG683
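
As a rough illustration of the claim above, the following sketch converts a pointer to an integer both ways so the results can be compared. The helper names (viaReinterpretCast, viaBitCast) and the SKETCH_HAS_BIT_CAST macro are hypothetical and not part of the patch. The sketch assumes a flat-address platform where both conversions yield the same bits, and it falls back to std::memcpy when the compiler does not provide __builtin_bit_cast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Detect __builtin_bit_cast where the compiler lets us ask for it.
#if defined(__has_builtin)
#if __has_builtin(__builtin_bit_cast)
#define SKETCH_HAS_BIT_CAST 1
#endif
#endif

// Hypothetical helper: pointer-to-integer conversion via reinterpret_cast.
std::uintptr_t viaReinterpretCast(void *p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

// Hypothetical helper: the same conversion expressed as a bit cast.
std::uintptr_t viaBitCast(void *p) {
  static_assert(sizeof(std::uintptr_t) == sizeof(void *),
                "a bit cast needs equally sized types");
#ifdef SKETCH_HAS_BIT_CAST
  return __builtin_bit_cast(std::uintptr_t, p);
#else
  // Fallback (assumption): copy the object representation instead.
  std::uintptr_t bits;
  std::memcpy(&bits, &p, sizeof bits);
  return bits;
#endif
}
```

On such a platform, calling both helpers with the address of the same object is expected to return equal integers, which is the equivalence the analyzer should also model.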

However, the analyzer does not bind the same symbolic value to these
expressions, which leads to odd behavior such as failing equality
checks and even crashes: https://godbolt.org/z/oeMP7cj8q

Previously, RegionStoreManager::getBinding() replaced T with TVR->getValueType() whenever MR was a TypedValueRegion, even if T was non-null.
It doesn't make much sense to auto-detect the type if the type is already given.
Without the auto-detection, we simply do the right thing and perform the load with the given type.
This means that the loaded value is cast to that type.

So, in this patch, I'm proposing to do auto-detection only if the type was null.
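
The difference between the two policies can be sketched with a toy model. This is not the analyzer's code: ToyRegion, loadTypeBefore and loadTypeAfter are hypothetical names, types are represented as plain strings, and an empty string stands in for a null QualType.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a memory region. An empty valueType means the
// region does not know the type of the value it holds.
struct ToyRegion {
  std::string valueType;
};

// Old policy (simplified): a typed region's own value type always wins,
// so an explicitly requested type is silently discarded.
std::string loadTypeBefore(const ToyRegion &region,
                           const std::string &requested) {
  if (!region.valueType.empty())
    return region.valueType;
  return requested;
}

// New policy (simplified): auto-detect from the region only when the
// caller did not request a type.
std::string loadTypeAfter(const ToyRegion &region,
                          const std::string &requested) {
  if (!requested.empty())
    return requested;
  return region.valueType;
}
```

With a region typed as "void *" and a requested type of "unsigned long", the old policy loads as "void *" while the new one loads as "unsigned long", which is what lets the loaded value be cast to the requested type.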

Here is a snippet of code, annotated by the previous and new dump values.
LocAsInteger should wrap the SymRegion, since we want to load the address as if it was an integer.
In none of the following cases should type auto-detection be triggered, hence we should eventually reach an evalCast() to lazily cast the loaded value into that type.

void LValueToRValueBitCast_dumps(void *p, char (*array)[8]) {
  clang_analyzer_dump(p);     // remained: {{&SymRegion{reg_$0<void * p>}}}
  clang_analyzer_dump(array); // remained: {{&SymRegion{reg_$1<char (*)[8] array>}}}
  clang_analyzer_dump((unsigned long)p);
  // remained: {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
  clang_analyzer_dump(__builtin_bit_cast(unsigned long, p));     // <--------- change #1
  // previously: {{&SymRegion{reg_$0<void * p>}}}
  // now:        {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
  clang_analyzer_dump((unsigned long)array); // remained: {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
  clang_analyzer_dump(__builtin_bit_cast(unsigned long, array)); // <--------- change #2
  // previously: {{&SymRegion{reg_$1<char (*)[8] array>}}}
  // now:        {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
}

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

steakhal created this revision. Oct 24 2022, 7:39 AM

Herald added a project: Restricted Project. Oct 24 2022, 7:39 AM

Herald added subscribers: manas, dkrupp, donat.nagy and 5 others.

steakhal requested review of this revision. Oct 24 2022, 7:39 AM

Herald added a subscriber: cfe-commits.

steakhal added a reviewer: tomasz-kaminski-sonarsource. Oct 24 2022, 7:41 AM

Harbormaster completed remote builds in B193936: Diff 470147. Oct 24 2022, 8:48 AM

I am not sure that ExprEngine::VisitCast is the proper place for these modifications and for the call to SValBuilder's evalCast. I think they would be better placed in RegionStoreManager::getBinding, since we already evaluate a cast there for certain kinds of memory regions before returning the stored value. (Actually, this is again a legacy hack, needed because we do not emit all SymbolCasts and thus store SVals with an improper type.)

if (const FieldRegion* FR = dyn_cast<FieldRegion>(R))
  return svalBuilder.evalCast(getBindingForField(B, FR), T, QualType{});

if (const ElementRegion* ER = dyn_cast<ElementRegion>(R)) {
  // FIXME: Here we actually perform an implicit conversion from the loaded
  // value to the element type.  Eventually we want to compose these values
  // more intelligently.  For example, an 'element' can encompass multiple
  // bound regions (e.g., several bound bytes), or could be a subset of
  // a larger value.
  return svalBuilder.evalCast(getBindingForElement(B, ER), T, QualType{});
}

if (const ObjCIvarRegion *IVR = dyn_cast<ObjCIvarRegion>(R)) {
  // FIXME: Here we actually perform an implicit conversion from the loaded
  // value to the ivar type.  What we should model is stores to ivars
  // that blow past the extent of the ivar.  If the address of the ivar is
  // reinterpretted, it is possible we stored a different value that could
  // fit within the ivar.  Either we need to cast these when storing them
  // or reinterpret them lazily (as we do here).
  return svalBuilder.evalCast(getBindingForObjCIvar(B, IVR), T, QualType{});
}

if (const VarRegion *VR = dyn_cast<VarRegion>(R)) {
  // FIXME: Here we actually perform an implicit conversion from the loaded
  // value to the variable type.  What we should model is stores to variables
  // that blow past the extent of the variable.  If the address of the
  // variable is reinterpretted, it is possible we stored a different value
  // that could fit within the variable.  Either we need to cast these when
  // storing them or reinterpret them lazily (as we do here).
  return svalBuilder.evalCast(getBindingForVar(B, VR), T, QualType{});
}

I think a new if block for SymRegion should be put here to continue the hack.

clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
301–315 ↗	(On Diff #470147)	I think it might be possible to refactor `evalLoad` so that it returns a `tuple` or a dedicated `struct`, with a vector of `SVal`s as one member. Then you could reuse the function in both cases before calling `generateNode` on the result. But this might make the code too complicated; I'll let you decide whether it is worth the hassle to avoid the code repetition.

PS: I'm not sure how/when we should get rid of the LocAsInteger and
represent this by a SymbolCast.
Maybe @ASDenysPetrov or @martong could help me review this.

That can happen once https://reviews.llvm.org/D117229 is accepted and we emit SymbolCasts for all integers (-analyzer-config support-symbolic-integer-casts=true should be the default by then).

Previously, in RegionStoreManager::getBinding(), even if T was non-null, we replaced it with TVR->getValueType() in case the MR was a TypedValueRegion.
IMO we shouldn't overwrite the type unless we actually need to auto-detect the binding type.

This was particularly wrong when we reinterpret-cast some typed memory region (such as a stack-local variable's address) to something else while operating on the store.
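
To make the scenario concrete, here is a minimal sketch of loading the storage of a typed stack-local variable as a different type. The function name loadFloatStorageAsU32 is hypothetical. The sketch assumes a 32-bit IEEE-754 float and uses std::memcpy in place of a reinterpret-cast through a pointer, so that the example stays within defined behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the local variable is declared as float, but its
// storage is read back as a 32-bit unsigned integer.
std::uint32_t loadFloatStorageAsU32(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  float local = value; // typed stack-local variable
  std::uint32_t bits;
  std::memcpy(&bits, &local, sizeof bits); // load with a different type
  return bits;
}
```

A load like this has to be performed with the requested type (here an integer), not with the declared type of the variable, which is the case the old getBinding() behavior got wrong.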

So, in this new version, I'm proposing to do auto-detection only if the type was null.

I haven't done any measurements yet, but I'm still curious about what you think about this.

TODO: Update the summary and title accordingly, if we agree with this direction.

Harbormaster completed remote builds in B194618: Diff 471112. Oct 27 2022, 5:42 AM

In D136603#3888065, @steakhal wrote:

Previously, in RegionStoreManager::getBinding(), even if T was non-null, we replaced it with TVR->getValueType() in case the MR was a TypedValueRegion.

Yeah, that means, we actually evaded a cast, am I right?

IMO we shouldn't overwrite the type unless we actually need to auto-detect the binding type.

I agree.

This was particularly wrong when we reinterpret-cast some typed memory region (such as a stack-local variable's address) to something else while operating on the store.

You mean like when we loaded a value of reinterpret_cast<T1>(t2) with evalLoad?

I haven't done any measurements yet, but I'm still curious about what you think about this.

I think this is a good approach and the change is in the right place. But any change related to casts is very fragile, unfortunately. So, I agree, it would be great to see measurements and to confirm that we don't get new assertion failures.

clang/test/Analysis/svalbuilder-float-cast.c
21–22	I think it would make sense to have another RUN line with `support-symbolic-integer-casts`. In that case I guess we should see `(int)(float)x` (?).

steakhal added inline comments. Oct 28 2022, 12:07 PM

clang/test/Analysis/svalbuilder-float-cast.c
21–22	The result remains the same with and without `support-symbolic-integer-casts`. I'll add the RUN lines.

Ping

This 'new' approach with the type auto-detection behaves the same way as the originally proposed patch.
Same results, still no crashes :)
I'll update the summary and the title according to the new approach.

steakhal retitled this revision from "[analyzer] Model cast after LValueToRValueBitCasts" to "[analyzer] getBinding should auto-detect type only if it was not given". Nov 9 2022, 7:19 AM

steakhal edited the summary of this revision.

Sounds reasonable to me.

This revision is now accepted and ready to land. Nov 21 2022, 9:49 AM

Closed by commit rG93b98eb399a1: [analyzer] getBinding should auto-detect type only if it was not given (authored by steakhal). Nov 23 2022, 6:52 AM

This revision was automatically updated to reflect the committed changes.

steakhal added a commit: rG93b98eb399a1: [analyzer] getBinding should auto-detect type only if it was not given.

Revision Contents

Path	Size
clang/lib/StaticAnalyzer/Core/RegionStore.cpp	25 lines
clang/test/Analysis/ptr-arith.cpp	26 lines
clang/test/Analysis/svalbuilder-float-cast.c	17 lines

Diff 477493

clang/lib/StaticAnalyzer/Core/RegionStore.cpp

@@ ... @@ SVal RegionStoreManager::getBinding(RegionBindingsConstRef B, Loc L, QualType T) {
   }
 
   const MemRegion *MR = L.castAs<loc::MemRegionVal>().getRegion();
 
   if (isa<BlockDataRegion>(MR)) {
     return UnknownVal();
   }
 
-  if (!isa<TypedValueRegion>(MR)) {
-    if (T.isNull()) {
-      if (const TypedRegion *TR = dyn_cast<TypedRegion>(MR))
-        T = TR->getLocationType()->getPointeeType();
-      else if (const SymbolicRegion *SR = dyn_cast<SymbolicRegion>(MR))
-        T = SR->getPointeeStaticType();
-    }
-    assert(!T.isNull() && "Unable to auto-detect binding type!");
-    assert(!T->isVoidType() && "Attempting to dereference a void pointer!");
-    MR = GetElementZeroRegion(cast<SubRegion>(MR), T);
-  } else {
-    T = cast<TypedValueRegion>(MR)->getValueType();
-  }
+  // Auto-detect the binding type.
+  if (T.isNull()) {
+    if (const auto *TVR = dyn_cast<TypedValueRegion>(MR))
+      T = TVR->getValueType();
+    else if (const auto *TR = dyn_cast<TypedRegion>(MR))
+      T = TR->getLocationType()->getPointeeType();
+    else if (const auto *SR = dyn_cast<SymbolicRegion>(MR))
+      T = SR->getPointeeStaticType();
+  }
+  assert(!T.isNull() && "Unable to auto-detect binding type!");
+  assert(!T->isVoidType() && "Attempting to dereference a void pointer!");
+
+  if (!isa<TypedValueRegion>(MR))
+    MR = GetElementZeroRegion(cast<SubRegion>(MR), T);
 
   // FIXME: Perhaps this method should just take a 'const MemRegion*' argument
   // instead of 'Loc', and have the other Loc cases handled at a higher level.
   const TypedValueRegion *R = cast<TypedValueRegion>(MR);
   QualType RTy = R->getValueType();
 
   // FIXME: we do not yet model the parts of a complex type, so treat the
   // whole thing as "unknown".

clang/test/Analysis/ptr-arith.cpp

-// RUN: %clang_analyze_cc1 -Wno-unused-value -std=c++14 -analyzer-checker=core,debug.ExprInspection,alpha.core.PointerArithm -verify %s
+// RUN: %clang_analyze_cc1 -Wno-unused-value -std=c++14 -verify %s -triple x86_64-pc-linux-gnu \
+// RUN:    -analyzer-checker=core,debug.ExprInspection,alpha.core.PointerArithm
+
+// RUN: %clang_analyze_cc1 -Wno-unused-value -std=c++14 -verify %s -triple x86_64-pc-linux-gnu \
+// RUN:    -analyzer-config support-symbolic-integer-casts=true \
+// RUN:    -analyzer-checker=core,debug.ExprInspection,alpha.core.PointerArithm
 
 template <typename T> void clang_analyzer_dump(T);
 
 struct X {
   int *p;
   int zero;
   void foo () {
     reset(p - 1);
@@ ... @@ int parse(parse_t *p) {
   clang_analyzer_dump(copy);
   // expected-warning@-1 {{reg_$1<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>}}
   header *bits = (header *)&copy;
   clang_analyzer_dump(bits->b);
   // expected-warning@-1 {{derived_$2{reg_$1<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>,Element{copy,0 S64b,struct Bug_55934::header}.b}}}
   return bits->b; // no-warning
 }
 } // namespace Bug_55934
+
+void LValueToRValueBitCast_dumps(void *p, char (*array)[8]) {
+  clang_analyzer_dump(p);
+  clang_analyzer_dump(array);
+  // expected-warning@-2 {{&SymRegion{reg_$0<void * p>}}}
+  // expected-warning@-2 {{&SymRegion{reg_$1<char (*)[8] array>}}}
+  clang_analyzer_dump((unsigned long)p);
+  clang_analyzer_dump(__builtin_bit_cast(unsigned long, p));
+  // expected-warning@-2 {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
+  // expected-warning@-2 {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
+  clang_analyzer_dump((unsigned long)array);
+  clang_analyzer_dump(__builtin_bit_cast(unsigned long, array));
+  // expected-warning@-2 {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
+  // expected-warning@-2 {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
+}
+
+unsigned long ptr_arithmetic(void *p) {
+  return __builtin_bit_cast(unsigned long, p) + 1; // no-crash
+}

clang/test/Analysis/svalbuilder-float-cast.c

 // RUN: %clang_analyze_cc1 -analyzer-checker debug.ExprInspection -Wno-deprecated-non-prototype -verify %s
+// RUN: %clang_analyze_cc1 -analyzer-checker debug.ExprInspection -Wno-deprecated-non-prototype -verify %s \
+// RUN:    -analyzer-config support-symbolic-integer-casts=true
 
 void clang_analyzer_denote(int, const char *);
 void clang_analyzer_express(int);
+void clang_analyzer_dump(int);
+void clang_analyzer_dump_ptr(int *);
 
 void SymbolCast_of_float_type_aux(int *p) {
+  clang_analyzer_dump_ptr(p); // expected-warning {{&x}}
+  clang_analyzer_dump(*p); // expected-warning {{Unknown}}
+  // Storing to the memory region of 'float x' as 'int' will
+  // materialize a fresh conjured symbol to regain accuracy.
   *p += 0;
-  // FIXME: Ideally, all unknown values should be symbolicated.
-  clang_analyzer_denote(*p, "$x"); // expected-warning{{Not a symbol}}
+  clang_analyzer_dump_ptr(p); // expected-warning {{&x}}
+  clang_analyzer_dump(*p); // expected-warning {{conj_$0{int}}
+  clang_analyzer_denote(*p, "$x");
 
   *p += 1;
   // This should NOT be (float)$x + 1. Symbol $x was never casted to float.
-  // FIXME: Ideally, this should be $x + 1.
-  clang_analyzer_express(*p); // expected-warning{{Not a symbol}}
+  clang_analyzer_express(*p); // expected-warning{{$x + 1}}
 }
 
 void SymbolCast_of_float_type(void) {
   extern float x;
   void (*f)() = SymbolCast_of_float_type_aux;
   f(&x);
 }