This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix wrong comparison generation of the ranges generated by the refutation manager
ClosedPublic

Authored by mikhail.ramalho on Jun 19 2018, 8:25 AM.

Details

Summary

The refutation manager is removing a true bug from the following program:

typedef struct o p;
struct o {
  struct {
  } s;
};

void q(*r, p2) {
  r < p2;
}

void k(l, node) {
  struct {
    p *node;
  } * n, *nodep, path[sizeof(void)];
  path->node = l;
  for (n = path; node != l;) {
    q(node, n->node);
    nodep = n;
  }
  if (nodep)
    n[1].node->s; // warning: Dereference of undefined pointer value
}

The problem is that the following constraint:

(conj_$1{struct o *}) - (reg_$3<int * r>): [-9223372036854775808, 0]

is encoded as:

(and (bvuge (bvsub $1 $3) #x8000000000000000)
     (bvule (bvsub $1 $3) #x0000000000000000))

The issue is that unsigned comparisons (bvuge and bvule) are being generated instead of signed comparisons (bvsge and bvsle).

When generating the expressions:

(conj_$1{p *}) - (reg_$3<int * r>) >= -9223372036854775808

and

(conj_$1{p *}) - (reg_$3<int * r>) <= 0

both -9223372036854775808 and 0 are casted to pointer type and LTy->isSignedIntegerOrEnumerationType() in Z3ConstraintManager::getZ3BinExpr only checks if the type is signed, not if it's a pointer.

~

I created this PR to discuss the solution, as this change is currently breaking some regressions (Analysis/ptr-arith.c).

I'll add the test to this PR as soon as I can remove all the unrelated warning that are being produced in the example (incompatible types, discarded results, etc).

Diff Detail

Repository
rL LLVM

Event Timeline

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1427 ↗(On Diff #151920)

I'm confused here, aren't pointers unsigned?

I'm confused here, aren't pointers unsigned?

Yes, but the this is actually a ptrdiff_t, which is signed.

ddcc added a comment.Jun 19 2018, 11:45 AM

The code already tries to handle ptrdiff_t specially, see the comment right above, but it is only executed if RetTy is non-null. Instead, I think it should be refactored so that e.g. a local isBool is always computed, and this is passed to the final call to Z3Expr::fromBinOp.

The code already tries to handle ptrdiff_t specially, see the comment right above, but it is only executed if RetTy is non-null. Instead, I think it should be refactored so that e.g. a local isBool is always computed, and this is passed to the final call to Z3Expr::fromBinOp.

That was my initial approach but I think the problem is somewhere else, in Z3ConstraintManager::getZ3RangeExpr, the symbol is:

(conj_$1{p *}) - (reg_$3<int * r>)
BuiltinType 0xa89430 'long'

but the call in Z3ConstraintManager::getZ3RangeExpr:

// Convert symbol
QualType SymTy;
Z3Expr Exp = getZ3Expr(Sym, &SymTy);

returns:

(bvsub $1 $3)
PointerType 0xac7790 'p *'
`-TypedefType 0xac7740 'p' sugar
  |-Typedef 0xac6ea8 'p'
  `-ElaboratedType 0xac6e50 'struct o' sugar
    `-RecordType 0xac6e30 'struct o'
      `-Record 0xac6f10 'o'

getZ3Expr eventually calls getZ3BinExpr but it's RetTy is not updated.

I just found out that RetTy is not being updated because in getZ3BinExpr because of:

if (LTy->isAnyPointerType() && LTy == RTy && Op == BO_Sub) {
  ASTContext &Ctx = getBasicVals().getContext();
  *RetTy = Ctx.getIntTypeForBitwidth(Ctx.getTypeSize(LTy), true);
}

LTy and RTy are not equal:

(gdb) call RTy.dump()
PointerType 0x6c1120 'int *'
`-BuiltinType 0x683410 'int'

(gdb) call LTy.dump()
PointerType 0x6c1790 'p *'
`-TypedefType 0x6c1740 'p' sugar
  |-Typedef 0x6c0ea8 'p'
  `-ElaboratedType 0x6c0e50 'struct o' sugar
    `-RecordType 0x6c0e30 'struct o'
      `-Record 0x6c0f10 'o'

Changing it to:

if (LTy->isAnyPointerType() && RTy->isAnyPointerType() && Op == BO_Sub) {
  ASTContext &Ctx = getBasicVals().getContext();
  *RetTy = Ctx.getIntTypeForBitwidth(Ctx.getTypeSize(LTy), true);
}

Seems to fix the issue but I have to run the tests first to confirm it doesn't break anything.

Changing it to:

Shouldn't we then use maximum of two type sizes? [I'm not sure what are the exact conversion rules there]

BTW from which program line the negation constraint is arising?

Changing it to:

Shouldn't we then use maximum of two type sizes? [I'm not sure what are the exact conversion rules there]

I think the correct type should always be ptrdiff_t, regardless of the type of the pointers involved. ASTContext has a method has a getPointerDiffType() so maybe we should you that instead of Ctx.getIntTypeForBitwidth(Ctx.getTypeSize(LTy), true)?

BTW from which program line the negation constraint is arising?

from the function q.

ddcc added a comment.Jun 19 2018, 12:47 PM

I think the correct type should always be ptrdiff_t, regardless of the type of the pointers involved. ASTContext has a method has a getPointerDiffType() so maybe we should you that instead of Ctx.getIntTypeForBitwidth(Ctx.getTypeSize(LTy), true)?

Yeah, that's seems right. To clarify, there are multiple bugs here:

  1. ptrdiff_t type isn't retrieved correctly -> fixed by your proposal above
  2. Z3Expr internal type doesn't match computed return type -> store in a local and pass to Z3Expr::fromBinOp
  3. return type isn't computed if RetTy is null -> always compute return type, but only set pointer if RetTy is non-null
mikhail.ramalho updated this revision to Diff 152085.EditedJun 20 2018, 7:41 AM

Update solution and add test case.

@ddcc Thanks a lot for your comments, really appreciated!

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1419 ↗(On Diff #152085)

Should we use the method you have mentioned then? (getPointerDiffType)

Just sticking with left type seems arbitrary.

ddcc added a comment.Jun 20 2018, 11:09 AM

Yeah, I think it would be preferable to fix all three bugs here.

Should we use the method you have mentioned then? (getPointerDiffType)

Just sticking with left type seems arbitrary.

I tried by it seems to be generating a bitvector with the wrong size and a
bunch of assertions fail because of that.

I'm still investigating why, but this is the current stable version of the
patch.

It turns out the assertion failures were not related to this patch, but to some other changes I had locally.

All tests pass and, with this patch, 6 more bugs are removed from sqlite3, using the refutation manager.

george.karpenkov requested changes to this revision.Jun 26 2018, 5:56 PM

@mikhail.ramalho should this still be marked as WIP?
Also [Analyzer] prefix should be all-lowercase ([analyzer])

This revision now requires changes to proceed.Jun 26 2018, 5:56 PM

@mikhail.ramalho should this still be marked as WIP?

Yes, I had to look in the C11 standard about pointer subtraction and it states in 6.5.6:

When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. [...] Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.

So, by changing this code in the SMT backend we'll be allowing undef behavior. On the other hand, these situations should be caught by the CSA, which is the one generating the pointer subtraction with different types in the first place. I'm not sure how to proceed here.

Also [Analyzer] prefix should be all-lowercase ([analyzer])

I'll update it.

mikhail.ramalho retitled this revision from [Analyzer] [WIP] Fix wrong comparison generation of the ranges generated by the refutation manager to [analyzer] [WIP] Fix wrong comparison generation of the ranges generated by the refutation manager.Jun 27 2018, 5:43 AM
mikhail.ramalho retitled this revision from [analyzer] [WIP] Fix wrong comparison generation of the ranges generated by the refutation manager to [analyzer] Fix wrong comparison generation of the ranges generated by the refutation manager.
NoQ added a comment.Jun 27 2018, 11:01 AM

As usual, the analyzer, unlike the compiler, should be prepared to handle three different situations:

  1. We know that the two pointers point to the same array. We can easily perform the computation because most of the time they'd probably be ElementRegions with the same base.
  1. We know that the two pointers don't point to the same array. This is UB and we are allowed to return an Undefined value here in the ExprEngine. But we'd better also have a checker to catch this UB and stop the analysis, because otherwise people won't be able to understand where their undefined values are coming from, because we won't do a good job highlighting the reason why the value is undefined in the path pieces. If we're not yet ready to explain the UB properly to the user, we'd rather proceed as if it's situation 3.
  1. We don't know whether the two pointers point to the same array or not. We need to behave conservatively and trust the programmer that the subtraction is allowed. That's probably where symbolic difference should be returned.
In D48324#1145384, @NoQ wrote:

As usual, the analyzer, unlike the compiler, should be prepared to handle three different situations:

  1. We know that the two pointers point to the same array. We can easily perform the computation because most of the time they'd probably be ElementRegions with the same base.
  1. We know that the two pointers don't point to the same array. This is UB and we are allowed to return an Undefined value here in the ExprEngine. But we'd better also have a checker to catch this UB and stop the analysis, because otherwise people won't be able to understand where their undefined values are coming from, because we won't do a good job highlighting the reason why the value is undefined in the path pieces. If we're not yet ready to explain the UB properly to the user, we'd rather proceed as if it's situation 3.
  1. We don't know whether the two pointers point to the same array or not. We need to behave conservatively and trust the programmer that the subtraction is allowed. That's probably where symbolic difference should be returned.

The case here is that the subtraction is generated by the CSA, as there is no subtraction in the example.

My question is, should we assume that all code reaching the SMT backend is correct and only care if it's the two operands of a subtraction are pointers?

NoQ added a comment.Jun 27 2018, 3:50 PM

The case here is that the subtraction is generated by the CSA, as there is no subtraction in the example.

If the original expression has the exact same semantics, then my previous reply applies.

If not, then we shouldn't have generated the subtraction :)

george.karpenkov accepted this revision.Jun 28 2018, 1:58 PM
This revision is now accepted and ready to land.Jun 28 2018, 1:58 PM
This revision was automatically updated to reflect the committed changes.