This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Create and handle SymbolCast for pointer to integral conversion
Needs ReviewPublic

Authored by martong on Dec 17 2021, 4:47 AM.

Details

Summary

Currently, pointer to integral type casts are not modeled properly in
the symbol tree.
Below

void foo(int *p) {
  12 - (long)p;
}

the resulting symbol for the BinOP is an IntSymExpr. LHS is 12 and the RHS is
&SymRegion{reg_$0<int * p>}.

Even though, the SVal that is created for (long)p is correctly an
LocAsInteger, we loose this information when we build up the symbol tree for
12 - (long)p. To preserve the cast, we have to create a new node in
the symbol tree that represents it: the SymbolCast. This
is what this patch does.

This reverts D115149, except the test cases.

Diff Detail

Event Timeline

martong created this revision.Dec 17 2021, 4:47 AM
martong requested review of this revision.Dec 17 2021, 4:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 17 2021, 4:47 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
martong added inline comments.
clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
417

We should handle LHS as well.

Many thanks for digging into this @martong. I really enjoyed it!
I also believe that this is the fix for the underlying issue.

I also think the getAsSymbol() should be somewhere where we can create new symbols.

NoQ added a comment.EditedDec 17 2021, 1:13 PM

Oh nice, it's great to see these things moving.

I have a couple observations for your specific approach:

(1) Sounds like you're introducing a symbol (long p) that represents the exact same value as (and is used interchangeably with) &SymRegion{$p} (as long). Ambiguity is generally bad because it means a lot of duplication and missed cornercases.

(2) I think you can run into the same problem with a non-symbolic region which you can't unwrap the same way because there's no symbol to cast:

void foo() {
  int p;
  12 - (long)&p;
}

Maybe we should stick to your approach as strictly superior and eliminate LocAsInteger entirely? In this case the example (2) will work through introducing a new custom symbol SymbolRegionAddress which would look like $addr<int p>. Then add a hard canonicalization rule SymRegion{$addr<R>} = R (symregions around address symbols are ill-formed). Ideally we'd also have to unwrap offsets, i.e. $addr<Element{x, char, $i}> = $addr<x> + $i.

ASDenysPetrov added a comment.EditedJan 10 2022, 4:32 AM

Starting producing SymbolCast you should be careful. It'll be helpfull for you to pay attention on my revisions D105340 and D103096 before doing this. I don't want you walk through the things I fought with. At least we've discussed to introduce a new flag support-symbolic-integer-casts to turn on/off producing the casts.

ASDenysPetrov added inline comments.Jan 10 2022, 4:46 AM
clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
415–423

IMO this is not the right place for producing SymbolCast because we can meet (type)x-like semantics in a different contexts and expressions.

419–420
471–478

This can be carried out to the separate revert revision as it is not related to this one directly

ASDenysPetrov added inline comments.Jan 10 2022, 4:57 AM
clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
420

Please take into account that SVal::getType may return NULL QualType. See function's description.

NoQ added inline comments.Jan 10 2022, 1:59 PM
clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
420

Yup, SVal::getType() is relatively speculative. It arguably shouldn't be but as of now it is.

In this case though it'll probably always work correctly. Despite the relevant code making relatively little sense:

QualType VisitNonLocLocAsInteger(nonloc::LocAsInteger LI) {
  // Who cares about the pointer type? It's turned into an integer anyway.
  QualType NestedType = Visit(LI.getLoc());

  // Ok, now we suddenly have a failure condition when there's no pointer type
  // even though the pointer type is completely immaterial.
  // It probably never fails in practice though, because there's usually at least some
  // pointer type (void pointer in the worst case, but it's still something).
  if (NestedType.isNull())
    return NestedType;
  
  // See? We can get bit width without pointer type.
  return Context.getIntTypeForBitwidth(LI.getNumBits(),
                                       // Oh I know this one! It's a pointer type. It isn't an integer type at all.
                                       // Whelp I guess "always unsigned" works for us as well as anything else.
                                       // We don't have any information about signedness one way or the other.
                                       NestedType->isSignedIntegerType());
}

Thanks for the review guys.

Artem, I agree, that we should remove LocAsInteger. LocaAsInteger is a primitive handling of a specific cast operation (when we cast a pointer to an integer). The integration of LocaAsInteger into the SymExpr hierarchy is problematic as both this patch and its parent shows.

For precision, we should produce a SymbolCast when we evaluate a cast operation of a pointer to an integral type. I agree with Denys, that makeSymExprValNN is probably not the best place to do that. I took Denys' patch D105340 and created a prototype on top of that to create the SymbolCast in SValBuilder::evalCastSubKind(loc::MemRegionVal V,.

void test_param(int *p) {
  long p_as_integer = (long)p;
  clang_analyzer_dump(p);            // expected-warning{{&SymRegion{reg_$0<int * p>}}}
  clang_analyzer_dump(p_as_integer); // expected-warning{{(long) (reg_$0<int * p>)}}

I'd like to upload that soon.

Once we produce the SymbolCasts, then we can start handling them semantically. In my opinion, we should follow these steps:

  1. Produce SymbolCasts. This is done in D105340, plus I am going to create another patch for pointer-to-int casts.
  2. Handle SymbolCasts for the simplest cases. E.g. when we cast a (64bit width) pointer to a (64bit width unsigned) integer. This requires the foundation for an additional mapping in the ProgramState that can map a base symbol to cast symbols. This would be used in the constraint manager and this mapping must be super efficient.
  3. Gradually handle more complex SymbolCast semantics. E.g. a widening cast.
  4. Remove LocAsInteger. Produce the SymbolCasts by default and remove the option of ShouldSupportSymbolicIntegerCasts.

Point 2) and 3) are already started in D103096. But again, I think first we have to get the solid foundations with the State and with the ConstraintManager before getting into the more complex implementations of widening.

Ah, I forgot to mention one last point:

  1. Revert D115149. We should never reach that problematic assertion once we produce the SymbolCasts.

I took Denys' patch D105340 and created a prototype on top of that to create the SymbolCast in SValBuilder::evalCastSubKind(loc::MemRegionVal V,

Here is the new (alternative) patch: https://reviews.llvm.org/D117229