This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Improve loads from reinterpret-cast fields
ClosedPublic

Authored by steakhal on Jun 24 2022, 7:52 AM.

Details

Summary

Consider this example:

struct header {
  unsigned a : 1;
  unsigned b : 1;
};
struct parse_t {
  unsigned bits0 : 1;
  unsigned bits2 : 2; // <-- header
  unsigned bits4 : 4;
};
int parse(parse_t *p) {
  unsigned copy = p->bits2;
  clang_analyzer_dump(copy);
  // expected-warning@-1 {{reg_$1<unsigned int SymRegion{reg_$0<struct Bug_55934::parse_t * p>}.bits2>}}

  header *bits = (header *)&copy;
  clang_analyzer_dump(bits->b); // <--- Was UndefinedVal previously.
  // expected-warning@-1 {{derived_$2{reg_$1<unsigned int SymRegion{reg_$0<struct Bug_55934::parse_t * p>}.bits2>,Element{copy,0 S64b,struct Bug_55934::header}.b}}}
  return bits->b; // no-warning: it's not UndefinedVal
}

bits->b should have the same content as the second bit of p->bits2
(assuming that the bitfields are in spelling order).


The Store has the correct bindings. The problem is with the load of bits->b.
It will eventually reach RegionStoreManager::getBindingForField() with
Element{copy,0 S64b,struct header}.b, which is a FieldRegion.
It did not find any direct bindings, so the getBindingForFieldOrElementCommon()
gets called. That won't find any bindings, but it sees that the variable
is on the stack, thus it must be an uninitialized local variable;
thus it returns UndefinedVal.

Instead of doing this, it should have created a derived symbol
representing the slice of the region corresponding to the member.
So, if the value of copy is reg1, then the value of bits->b should
be derived{reg1, elem{copy,0, header}.b}.

Actually, the getBindingForElement() already does exactly this for reinterpret-casts, so I decided to hoist that and reuse the logic.

Fixes #55934

Diff Detail

Event Timeline

steakhal created this revision.Jun 24 2022, 7:52 AM
Herald added a project: Restricted Project. · View Herald Transcript
steakhal requested review of this revision.Jun 24 2022, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2022, 7:52 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal edited the summary of this revision. (Show Details)Jun 24 2022, 7:53 AM

No runtime diff. No crashes.
8 disappeared:

  • 4 core.CallAndMessage (1st arg is uninitialized)
  • 2 alpha.deadcode.UnreachableCode (sinked before reaching it)
  • 2 core.UndefinedBinaryOperatorResult

2 new reports:

  • 1 core.UndefinedBinaryOperatorResult
  • 1 alpha.security.ArrayBound

15k+ reports preserved.

I like it, but I'd like to know more about the nature of the applied "hack" and if there is a way to solve it properly in the longer term.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
2014–2017

Could you please elaborate why this is a hack? What should be done and where to solve it properly? (Disclaimer, I did not look into getBindingForFieldOrElementCommon.)

steakhal added inline comments.Jul 7 2022, 4:09 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1897

BTW I'm not sure why this check is here. I would expect this to work eve without this.
OOh, I think I know why. getTypeSizeInChars can only be called on scalars?

2014–2017

I'm not sure what Jordan was actually referring to. IMO hardcoding to look for a highly specific scenario (a typed memregion, and a symbol to be exact) could be thought about implementing a 'hack'. IMO this fits in the current ecosystem, so for me it doesn't look like a hack; hence IMO this is the way of implementing this.

To make it clear, I simply hoisted the nested dyn_casts and checks into a function, while preserved the 'hack' comment at the callsite; so it's not added by me.

One additional note. We relay on the type information embedded into the memregion - which we know might not always be present and even accurate. That could be another reason why this should be considered as a 'hack'. IDK

martong accepted this revision.Jul 11 2022, 4:45 AM

LGTM (given that you remove the vague and disturbing FIXME comments).

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
2014–2017

Okay, thanks for the detailed explanation. So, seems like these FIXME comments are vague and they do not add any meaningful information, they are just disturbing noise. Could you please remove them?

Other than that, your change makes perfect sense to me.

This revision is now accepted and ready to land.Jul 11 2022, 4:45 AM
This revision was landed with ongoing or failed builds.Jul 26 2022, 3:31 AM
This revision was automatically updated to reflect the committed changes.
steakhal marked 2 inline comments as done.