This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] catch out of bounds for VLA
AbandonedPublic

Authored by danielmarjamaki on Mar 1 2017, 1:37 AM.

Details

Summary

This is a work in progress patch. I try to detect such error:

void foo(int X) {
    int Buf[X];
    Buf[X] = 0;
}

The patch successfully detects this bug. However it writes FP when you try to take the address.

I would like to know if you think my approach to add a checkPreStmt is good. I would need to add logic to handle *(Buf+X) also.

This checker doesn't really logically belong in the ArrayBounds checker but from a user perspective it seems to belong there. I can move it to a new checker if that sounds best.

Diff Detail

Repository
rL LLVM

Event Timeline

danielmarjamaki created this revision.Mar 1 2017, 1:37 AM
xazax.hun edited edge metadata.Mar 1 2017, 6:01 AM

There is an alternative approach idea:

This is not found by ArrayBoundCheckerV2? If no, an alternative approach would be to properly set the constraints on the extent of the VLA's memory region. After that, maybe ArrayBoundCheckerV2 would work automatically on this case. If the extent size is not set for VLAs properly yet, it should be similar to this patch: https://reviews.llvm.org/D24307

@zaks.anna, @dcoughlin WDYT?

NoQ added a subscriber: NoQ.Mar 1 2017, 8:39 AM
zaks.anna edited edge metadata.Mar 1 2017, 1:12 PM

Gábor's suggestion sounds good to me. I think ArrayBoundCheckerV2 checker has a higher chance to be productized / moved out of alpha in the future.

Should we just remove ArrayBoundChecker.cpp or is there a value in keeping it around?

To me it seems that the extent is calculated properly in ArrayBoundsV2.

Existing code:

DefinedOrUnknownSVal extentVal =
  rawOffset.getRegion()->getExtent(svalBuilder);

This ugly little debug code will extract the needed VLA information from the extentVal...

if (extentVal.getSubKind() == nonloc::SymbolValKind) {
  SymbolRef SR = extentVal.castAs<nonloc::SymbolVal>().getSymbol();
  const SymbolExtent *SE = dyn_cast<SymbolExtent>(SR);
  const MemRegion *SEMR = SE->getRegion();
  if (SEMR->getKind() == MemRegion::VarRegionKind) {
    const VarRegion *VR = cast<VarRegion>(SEMR);
    QualType T = VR->getDecl()->getType();
    ASTContext &Ctx = checkerContext.getASTContext();
    const VariableArrayType *VLA = Ctx.getAsVariableArrayType(T);

A VLA->dump() after that will output:

VariableArrayType 0x87f6a80 'char [sz]' variably_modified 
|-BuiltinType 0x87f6480 'char'
`-ImplicitCastExpr 0x87f6a70 'int' <LValueToRValue>
  `-DeclRefExpr 0x87f6a58 'int' lvalue ParmVar 0x87f6948 'sz' 'int'

which is exactly what the corresponding VLA->dump() in the checkPreStmt() outputs.

As far as I see the problem is that the ProgramState does not keep the symbolic value for sz.

In checkPreStmt the state is:

Expressions:
 (0xe4acb0,0xe04790) sz : reg_$0<int sz>
 (0xe4acb0,0xe04828) array : &array
 (0xe4acb0,0xe04840) sz : &sz
 (0xe4acb0,0xe04858) array : &element{array,0 S64b,char}
 (0xe4acb0,0xe04868) sz : reg_$0<int sz>
 (0xe4acb0,0xe048b0) 1 : 1 S8b

in checkLocation() the state is:

Expressions:
 (0xe4acb0,0xe04878) array[sz] : &element{array,reg_$0<int sz>,char}
 (0xe4acb0,0xe048b0) 1 : 1 S8b
 (0xe4acb0,0xe048c0) array[sz] = 1 : 1 S8b
Ranges of symbol values:
 reg_$0<int sz> : { [0, 2147483647] }

This little code works in checkPreStmt() but not in checkLocation():

SVal sizeV = State->getSVal(VLA->getSizeExpr(), C.getLocationContext());

In checkPreStmt that returns "reg_$0<int sz>" and in checkLocation() that returns "Unknown".

Do you agree that this is the problem? Would it be a good idea to try to keep the sz in the ProgramState?

NoQ added a comment.Mar 3 2017, 2:59 PM

Do you agree that this is the problem? Would it be a good idea to try to keep the sz in the ProgramState?

Environment stores values only temporarily. It's kind of a scratch pad for temporary symbolic calculations: we compute sub-expressions, put them in the Environment, compute the expression itself, then throw the sub-expressions away immediately. Store, on the other hand, is a permanent storage.

Also, in your state dumps no information is actually lost. The fact that the value of variable sz is reg_$0<sz> is trivial: you could ask the Store what's the value of the variable sz and it'd say reg_$0<sz> if there are no bindings over it.

Or, alternatively, you see the same value in the dump of the ElementRegion as its index.

P.S. I'd agree that it's better to merge the two versions of the checker than trying to fix one of them to be at least as good as the other, through different approaches.

zaks.anna requested changes to this revision.Mar 6 2017, 10:19 AM

Following Gabor's suggestion, we should investigate if ArrayBoundCheckerV2 supports this. If not it's possible that we are hitting the Constraint Solver limitations.

This revision now requires changes to proceed.Mar 6 2017, 10:19 AM

Also, in your state dumps no information is actually lost. The fact that the value of variable sz is reg_$0<sz> is trivial: you could ask the Store what's the value of the variable sz and it'd say reg_$0<sz> if there are no bindings over it.

Thanks. I have tried to dig around in the framework. I don't see how I can get it.

I also don't want to get the "current" symbolvalue for the variable. For example:

void f(int X) {
  int Buf[X];
  X--;
  Buf[X] = 0;
}

So I envision that I need to tell the Store which "X" I want. I am trying to get the "X" symbol that defines the VLA size.

If no, an alternative approach would be to properly set the constraints on the extent of the VLA's memory region.

For information I tried to set the extent for the VLA.. by copy/pasting some code from the other diff. Ideally I don't think it should be set in checkPreStmt but this was an experiment...

void ArrayBoundCheckerV2::checkPreStmt(const ArraySubscriptExpr *A, CheckerContext &C) const
{
  ASTContext &Ctx = C.getASTContext();
  QualType T = A->getBase()->IgnoreCasts()->getType();
  const VariableArrayType *VLA = Ctx.getAsVariableArrayType(T);
  if (!VLA)
    return;

  ProgramStateRef State = C.getState();

  SVal ElementCount = State->getSVal(VLA->getSizeExpr(), C.getLocationContext());
  QualType ElementType = VLA->getElementType();
  ASTContext &AstContext = C.getASTContext();
  CharUnits TypeSize = AstContext.getTypeSizeInChars(ElementType);

  if (Optional<DefinedOrUnknownSVal> DefinedSize =
    ElementCount.getAs<DefinedOrUnknownSVal>()) {
    SVal AV = State->getSVal(A->getBase(), C.getLocationContext());
    const SubRegion *Region = AV.getAsRegion()->getAs<SubRegion>();
    SValBuilder &svalBuilder = C.getSValBuilder();
    DefinedOrUnknownSVal Extent = Region->getExtent(svalBuilder);
    // size in Bytes = ElementCount*TypeSize
    SVal SizeInBytes = svalBuilder.evalBinOpNN(
      State, BO_Mul, ElementCount.castAs<NonLoc>(),
      svalBuilder.makeArrayIndex(TypeSize.getQuantity()),
      svalBuilder.getArrayIndexType());
    DefinedOrUnknownSVal extentMatchesSize = svalBuilder.evalEQ(
      State, Extent, SizeInBytes.castAs<DefinedOrUnknownSVal>());
    State = State->assume(extentMatchesSize, true);
    C.addTransition(State);
  }
}

After this the ArrayBoundCheckerV2 still doesn't catch the bugs. As far as I could see that made very little difference. To me the extentValue looks the same.

If not it's possible that we are hitting the Constraint Solver limitations.

Yes. I was hoping that I would be able to update SimpleSValBuilder::evalBinOpNN() somehow. And first I wanted to see if I could do it in the checker where all the CheckerContext and stuff is available.

dkrupp added a subscriber: dkrupp.Mar 20 2017, 1:41 AM

Should we just remove ArrayBoundChecker.cpp or is there a value in keeping it around?

I think once ArrayBoundCheckerV2 is in production we should only keep ArrayBoundChecker there as educational purpuses for now. However, in the future, as the constraint manager and the core analyzer gets smarter, the V2 checker might get more and more similar to the original one. Once this patch is accepted, I might run the analyzer with V2 check enabled on some open source projects. It would be great to have it moved out of alpha soon.

If no, an alternative approach would be to properly set the constraints on the extent of the VLA's memory region.

For information I tried to set the extent for the VLA.. by copy/pasting some code from the other diff. Ideally I don't think it should be set in checkPreStmt but this was an experiment...

void ArrayBoundCheckerV2::checkPreStmt(const ArraySubscriptExpr *A, CheckerContext &C) const
{
  ASTContext &Ctx = C.getASTContext();
  QualType T = A->getBase()->IgnoreCasts()->getType();
  const VariableArrayType *VLA = Ctx.getAsVariableArrayType(T);
  if (!VLA)
    return;

  ProgramStateRef State = C.getState();

  SVal ElementCount = State->getSVal(VLA->getSizeExpr(), C.getLocationContext());
  QualType ElementType = VLA->getElementType();
  ASTContext &AstContext = C.getASTContext();
  CharUnits TypeSize = AstContext.getTypeSizeInChars(ElementType);

  if (Optional<DefinedOrUnknownSVal> DefinedSize =
    ElementCount.getAs<DefinedOrUnknownSVal>()) {
    SVal AV = State->getSVal(A->getBase(), C.getLocationContext());
    const SubRegion *Region = AV.getAsRegion()->getAs<SubRegion>();
    SValBuilder &svalBuilder = C.getSValBuilder();
    DefinedOrUnknownSVal Extent = Region->getExtent(svalBuilder);
    // size in Bytes = ElementCount*TypeSize
    SVal SizeInBytes = svalBuilder.evalBinOpNN(
      State, BO_Mul, ElementCount.castAs<NonLoc>(),
      svalBuilder.makeArrayIndex(TypeSize.getQuantity()),
      svalBuilder.getArrayIndexType());
    DefinedOrUnknownSVal extentMatchesSize = svalBuilder.evalEQ(
      State, Extent, SizeInBytes.castAs<DefinedOrUnknownSVal>());
    State = State->assume(extentMatchesSize, true);
    C.addTransition(State);
  }
}

You probably get an Unknown value for State->getSVal(VLA->getSizeExpr(), C.getLocationContext());, because the sizeExpr is not in the environment, it is already evaluated, its value has been used up, it is no longer needed. So in case the correct constraints are missing from the VLAs, you should add these assumptions at a point where this value is guaranteed to be available, probably right after the evaluation of the declaration.

danielmarjamaki edited edge metadata.

This is just work in progress!!

With these changes Clang static analyzer will detect overflow in this sample code:

void foo(int X) {
  char *Data = new char[X];
  Data[X] = 0; // <- error
  delete[] Data;
}

I updated SimpleSValBuilder so evalEQ can calculate a SVal when both lhs and rhs are symbols. Source code that used to have problems:

DefinedOrUnknownSVal extentMatchesSize = svalBuilder.evalEQ(
    State, Extent, SizeInBytes.castAs<DefinedOrUnknownSVal>());

Inputs: Extent is "extent_$3{SymRegion{conj_$1{char *}}}" and SizeInBytes is "reg_$0<int X>".

Before my quick fix the return SVal is "Unknown".
With my quick fix it will return a SVal "(extent_$3{SymRegion{conj_$1{char *}}}) == (reg_$0<int X>)".

I also made a simple fix for the ConstraintManager. If the state says X==Y then the ConstraintManager should be able to evaluate Y>=X. My evalUgly loops through the constraints and matches them manually.

Do you have some feedback? Do you think my SimpleSValBuilder approach is fine to commit if I polish it? It was just a quick hack so I guess it might make some tests fail etc.

About the ConstraintManager fix. Is it a good idea to handle simple SymSymExpr constraints? Or should this be handled by Z3 instead?

I would propose that I rename and cleanup RangeConstraintManager::uglyEval() and add it. When I tested it, the Z3 does not seem to handle this.

Ping. Any comments?

This revision now requires changes to proceed.Apr 25 2017, 6:33 AM
xazax.hun added a comment.EditedApr 25 2017, 7:09 AM

I would propose that I rename and cleanup RangeConstraintManager::uglyEval() and add it. When I tested it, the Z3 does not seem to handle this.

What do you mean by Z3 not handling this? I mean, if the evalEq returns an unknown SVal, Z3 can not do anything with that. Does Z3 fail after the evalEq returns the correct SVal?

I wonder that is the reason that we do not produce the correct SVal right now. Maybe the reason was that, we do not want to produce SVals that the constraint manager can not solve anyways, to improve performance. In general the current constraint manager does not reason about SymSymExprs. In order to preserve the performance we might not want to start produce all kinds of SymSymExprs, only the cases we already handle in the constraint managers.

E.g.: it might make sense to produce something like $a - $b =0, so https://reviews.llvm.org/rL300178 can kick in, and in some cases we might be able to find the bug (e.g. the function in the example was called with a concrete int).

NoQ added a comment.Jun 1 2017, 4:21 AM

An idea. I believe the safest way to find the bugs you mentioned would be to replace extent-as-a-symbol with extent-as-a-trait.

I.e., currently we construct extent_$3{SymRegion{conj_$1{char *}}}, assume that it is equal to reg_$0<int X> (which produces a SymSymExpr) and then catch more SymSymExprs along the path and compare them to the first one.

Instead, i believe that from the start we should have done something like

REGISTER_MAP_WITH_PROGRAMSTATE(RegionExtent, const SubRegion *, NonLoc);

Then when the VLA is constructed (or memory is malloc'ed or something like that), we just set the RegionExtent trait directly to reg_$0<int X> and later easily compare it to anything. The region's getExtent() method would be modified to consult this trait instead of (or, at least, before) constructing a new symbol.

Ideologically it is the same thing. Technically it produces simpler symbolic expressions, and i believe that both RangeConstraintManager and Z3 would benefit from simpler symbolic expressions.

In D30489#769916, @NoQ wrote:

An idea. I believe the safest way to find the bugs you mentioned would be to replace extent-as-a-symbol with extent-as-a-trait.

I.e., currently we construct extent_$3{SymRegion{conj_$1{char *}}}, assume that it is equal to reg_$0<int X> (which produces a SymSymExpr) and then catch more SymSymExprs along the path and compare them to the first one.

Instead, i believe that from the start we should have done something like

REGISTER_MAP_WITH_PROGRAMSTATE(RegionExtent, const SubRegion *, NonLoc);

Then when the VLA is constructed (or memory is malloc'ed or something like that), we just set the RegionExtent trait directly to reg_$0<int X> and later easily compare it to anything. The region's getExtent() method would be modified to consult this trait instead of (or, at least, before) constructing a new symbol.

Ideologically it is the same thing. Technically it produces simpler symbolic expressions, and i believe that both RangeConstraintManager and Z3 would benefit from simpler symbolic expressions.

+1, I like this approach!

MTC added a subscriber: MTC.Sep 6 2017, 8:22 PM

As suggested, use a ProgramState trait to detect VLA overflows.

I did not yet manage to get a SubRegion from the DeclStmt that matches the location SubRegion. Therefore I am using VariableArrayType in the trait for now.

danielmarjamaki abandoned this revision.Jan 15 2018, 12:33 AM

I will not continue working on this. Feel free to take over the patch or write a new patch.