This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] getBinding should auto-detect type only if it was not given
ClosedPublic

Authored by steakhal on Oct 24 2022, 7:39 AM.

Details

Summary

Casting a pointer to a suitably large integral type by reinterpret-cast
should result in the same value as by using the __builtin_bit_cast().
The compiler exploits this: https://godbolt.org/z/zMP3sG683

However, the analyzer does not bind the same symbolic value to these
expressions, resulting in weird situations, such as failing equality
checks and even results in crashes: https://godbolt.org/z/oeMP7cj8q

Previously, in the RegionStoreManager::getBinding() even if T was non-null, we replaced it with TVR->getValueType() in case the MR was TypedValueRegion.
It doesn't make much sense to auto-detect the type if the type is already given.
By not doing the auto-detection, we would just do the right thing and perform the load by that type.
This means that we will cast the value to that type.

So, in this patch, I'm proposing to do auto-detection only if the type was null.

Here is a snippet of code, annotated by the previous and new dump values.
LocAsInteger should wrap the SymRegion, since we want to load the address as if it was an integer.
In none of the following cases should type auto-detection be triggered, hence we should eventually reach an evalCast() to lazily cast the loaded value into that type.

void LValueToRValueBitCast_dumps(void *p, char (*array)[8]) {
  clang_analyzer_dump(p);     // remained: &SymRegion{reg_$0<void * p>}
  clang_analyzer_dump(array); // remained: {{&SymRegion{reg_$1<char (*)[8] array>}
  clang_analyzer_dump((unsigned long)p);
  // remained: {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
  clang_analyzer_dump(__builtin_bit_cast(unsigned long, p));     <--------- change #1
  // previously: {{&SymRegion{reg_$0<void * p>}}}
  // now:        {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
  clang_analyzer_dump((unsigned long)array); // remained: {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
  clang_analyzer_dump(__builtin_bit_cast(unsigned long, array)); <--------- change #2
  // previously: {{&SymRegion{reg_$1<char (*)[8] array>}}}
  // now:        {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
}

Diff Detail

Event Timeline

steakhal created this revision.Oct 24 2022, 7:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2022, 7:39 AM
steakhal requested review of this revision.Oct 24 2022, 7:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2022, 7:39 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I am not sure, if the ExprEngine::VisitCast is the proper place to add these new modifications and call SValBuilder's evalCast. I think it might be better positioned in RegionStoreManager::getBinding. Considering, we already do a cast evaluation for certain kind of memregions there before returning with the stored value. (Actually, this is again a legacy hack, which is needed because we do not emit all SymbolCasts and thus we store the SVals with an improper type).

if (const FieldRegion* FR = dyn_cast<FieldRegion>(R))
  return svalBuilder.evalCast(getBindingForField(B, FR), T, QualType{});

if (const ElementRegion* ER = dyn_cast<ElementRegion>(R)) {
  // FIXME: Here we actually perform an implicit conversion from the loaded
  // value to the element type.  Eventually we want to compose these values
  // more intelligently.  For example, an 'element' can encompass multiple
  // bound regions (e.g., several bound bytes), or could be a subset of
  // a larger value.
  return svalBuilder.evalCast(getBindingForElement(B, ER), T, QualType{});
}

if (const ObjCIvarRegion *IVR = dyn_cast<ObjCIvarRegion>(R)) {
  // FIXME: Here we actually perform an implicit conversion from the loaded
  // value to the ivar type.  What we should model is stores to ivars
  // that blow past the extent of the ivar.  If the address of the ivar is
  // reinterpretted, it is possible we stored a different value that could
  // fit within the ivar.  Either we need to cast these when storing them
  // or reinterpret them lazily (as we do here).
  return svalBuilder.evalCast(getBindingForObjCIvar(B, IVR), T, QualType{});
}

if (const VarRegion *VR = dyn_cast<VarRegion>(R)) {
  // FIXME: Here we actually perform an implicit conversion from the loaded
  // value to the variable type.  What we should model is stores to variables
  // that blow past the extent of the variable.  If the address of the
  // variable is reinterpretted, it is possible we stored a different value
  // that could fit within the variable.  Either we need to cast these when
  // storing them or reinterpret them lazily (as we do here).
  return svalBuilder.evalCast(getBindingForVar(B, VR), T, QualType{});
}

I think a new if block for SymRegion should be put here to continue the hack.

clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
301–315 ↗(On Diff #470147)

I think it might be possible to refactor evalLoad by returning with a tuple or with a special struct. I guess you should return a vector of SVals as one tuple member. Then you could reuse the function in both cases before calling generateNode on the result. But, this might be a code that is too complicated, I'll let you decide if this is worth the hassle to avoid the code repetition.

PS: I'm not sure how/when we should get rid of the LocAsInteger and
represent this by a SymbolCast.
Maybe @ASDenysPetrov or @martong could help me review this.

Whenever this https://reviews.llvm.org/D117229 gets accepted and when we emit all symbolCasts for all integers. (-analyzer-config support-symbolic-integer-casts=true should be default by then)

steakhal updated this revision to Diff 471112.Oct 27 2022, 3:57 AM

Previously, even in the RegionStoreManager::getBinding() even if T was non-null, we replaced it with TVR->getValueType() in case the MR was TypedValueRegion.
IMO we shouldn't overwrite the type unless we actually need to auto-detect the binding type.

This was particularly wrong when we reinterpret-cast some typed memory region (such as a stack-local variable's address) to something else while operating on the store.


So, in this new version, I'm proposing to do auto-detection only if the type was null.


I haven't done any measurements yet, but I'm still curious about what you think about this.

TODO: Update the summary&title accordingly, if we agree with this direction.

Previously, even in the RegionStoreManager::getBinding() even if T was non-null, we replaced it with TVR->getValueType() in case the MR was TypedValueRegion.

Yeah, that means, we actually evaded a cast, am I right?

IMO we shouldn't overwrite the type unless we actually need to auto-detect the binding type.

I agree.

This was particularly wrong when we reinterpret-cast some typed memory region (such as a stack-local variable's address) to something else while operating on the store.

You mean like when we loaded a value of reinterprec-cast<T1>(t2) with evalLoad?

I haven't done any measurements yet, but I'm still curious about what you think about this.

I think this is a good approach and the change is at the right place. But, any change that relates to casts are very fragile unfortunately. So, I agree, it would be great to see measurements and that we don't have new assertion failures.

clang/test/Analysis/svalbuilder-float-cast.c
21–22

I think it would make sense to have another RUN line with support-symbolic-integer-casts. In that case I guess we should see (int)(float)x (?).

steakhal added inline comments.Oct 28 2022, 12:07 PM
clang/test/Analysis/svalbuilder-float-cast.c
21–22

The result remains the same with and without support-symbolic-integer-casts. I'll add the RUN lines.

This 'new' approach with the type auto-detection behaves the same way as the originally proposed patch.
Same results, still no crashes :)
I'll update the summary and the title according to the new approach.

steakhal retitled this revision from [analyzer] Model cast after LValueToRValueBitCasts to [analyzer] getBinding should auto-detect type only if it was not given.Nov 9 2022, 7:19 AM
steakhal edited the summary of this revision. (Show Details)
xazax.hun accepted this revision.Nov 21 2022, 9:49 AM

Sounds reasonable to me.

This revision is now accepted and ready to land.Nov 21 2022, 9:49 AM