This is an archive of the discontinued LLVM Phabricator instance.

Paths

- clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp (34/38)
- clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp (7/7)

Differential D122231

[clang][dataflow] Add support for `value_or` in a comparison.
ClosedPublic

Authored by ymandel on Mar 22 2022, 7:10 AM.


Details

Reviewers

sgatev
xazax.hun

Commits

rG7f076004e941: [clang][dataflow] Add support for `value_or` in a comparison.

Summary

This patch adds limited modeling of the `value_or` method: specifically, its use in a particular comparison idiom that implicitly checks whether the optional holds a value.
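For illustration, here is a minimal C++17 sketch (plain `std::optional`, independent of the Clang dataflow framework) of the comparison idioms discussed in this review: `opt.value_or(X) != X` for a few choices of `X`, and `!opt.value_or("").empty()`. The helper names `valueOrNotEq` and `valueOrNotEmpty` are hypothetical. In each case a true result implies that the optional holds a value, which is the fact the model relies on.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper for the generic `opt.value_or(X) != X` idiom.
// If `opt` is empty, `value_or(X)` yields `X`, so the comparison is false;
// hence a true result implies `opt.has_value()`.
template <typename T, typename U>
bool valueOrNotEq(const std::optional<T> &opt, const U &x) {
  return opt.value_or(x) != x;
}

// Hypothetical helper for `!opt.value_or("").empty()`: a non-empty result
// means `opt` must have held a (non-empty) string.
bool valueOrNotEmpty(const std::optional<std::string> &opt) {
  return !opt.value_or("").empty();
}
```

The converse does not hold: `std::optional<int>(21)` compared against `X = 21` holds a value yet makes the comparison false, so only the one-way implication "comparison true => has_value" is sound.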

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ymandel created this revision. Mar 22 2022, 7:10 AM

Herald added a project: Restricted Project. Mar 22 2022, 7:10 AM

Herald added subscribers: tschuett, steakhal, rnkovacs.

ymandel requested review of this revision. Mar 22 2022, 7:10 AM

Harbormaster completed remote builds in B155621: Diff 417289. Mar 22 2022, 7:45 AM

sgatev added inline comments. Mar 23 2022, 2:20 AM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
127	Why handle negation here? Would it work for `if (opt.value_or("").empty()) { ... } else { opt.value(); }`?
156	Why `0`? How about `opt_p->value_or(21) != 21`?
542	Extreme nit for consistency with all comments above.
544	Why not hard-code this in the `isValueOrCondition` matcher?
545	The `clang` namespace can be removed. Same comment for other instances in the patch.
547	Why not pass `transferOptionalValueOrCall` as argument instead of wrapping it in a lambda? The function can take the "ValueOrCall" node from the `MatchResult`.
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1744	Is the `return` important? I think having `void` return type would be simpler. Same comment for the cases below.
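On the design question at 547: the version of the patch shown in the diff on this page passes each idiom's logic to `transferValueOrImpl` through a function-pointer parameter (`ModelPred`) and supplies a captureless lambda at the call site. The standalone sketch below shows that C++ mechanism with hypothetical names (`ModelPredFn`, `applyModel`, `emptyIdiomModel`) and plain `bool`s in place of the framework's `BoolValue`; it is an illustration, not the patch's code.

```cpp
#include <cassert>

// Hypothetical analogue of the `ModelPred` parameter: a plain function
// pointer, so only non-capturing callables convert to it.
using ModelPredFn = bool (*)(bool ExprVal, bool HasValueVal);

// Hypothetical shared driver: evaluates whichever model it is handed.
bool applyModel(bool ExprVal, bool HasValueVal, ModelPredFn ModelPred) {
  return ModelPred(ExprVal, HasValueVal);
}

// Model for `opt.value_or("").empty()`, where `ExprVal` is the value of the
// `empty()` call: "not empty implies has_value", i.e. `ExprVal || HasValueVal`.
bool emptyIdiomModel(bool ExprVal, bool HasValueVal) {
  // The captureless lambda converts implicitly to `ModelPredFn`.
  return applyModel(ExprVal, HasValueVal,
                    [](bool E, bool H) { return E || H; });
}
```

A function pointer keeps the shared driver non-templated; the trade-off is that the model cannot capture state, which these particular models do not need.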

xazax.hun added inline comments. Mar 23 2022, 9:28 AM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
279	There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework?

Addressed comments.

remove stray comments

adjust logical formula

Thanks for the detailed review!

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
127	The negation is a simpler predicate, but you're right that it's too specific. I've rewritten the code to drop the constraint and handle the more general `opt.value_or("").empty()`. The new code also encodes a more precise relationship in the logic, per Gabor's comments below about the implication in the other direction. It also now directly attaches the formula to the value, rather than dropping an implication in the flow conditions. Overall, I think the new approach is an improvement, but please let me know if you disagree.
156	This is addressed in the comment, but do you think we should add a FIXME to support some amount of expression comparison? Integers, floats, bools and variables would be an easy place to start, for example. But we'd need to drop into regular code -- the matchers can't express that kind of constraint.
279	Yes, good point! Please see my response to Stanislav above. I think the new version handles this by modeling the value_or directly, rather than dropping in an implication.
544	Safety/hygiene. It's easier to see that the ID to which the node is bound is the same that's being used in `getNodeAs`. An alternative which I often use is to use a (static) global constant, so I've changed to that.
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1744	Totally. Thanks for pointing that out.
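The hygiene point at 544, binding and retrieving a node through one shared constant, can be sketched without Clang. Below, a toy `BoundNodes` map and `std::string_view` stand in for the matchers' `bind`/`getNodeAs` and `llvm::StringLiteral`; `ValueOrCallID` mirrors the name used in the diff, and everything else is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Shared ID used at both the bind site and the lookup site, so the two
// cannot silently drift apart through a typo.
constexpr std::string_view ValueOrCallID = "ValueOrCall";

// Toy stand-in for a match result's bound nodes (nodes are ints here).
// `std::less<>` enables lookup by `std::string_view` without a copy.
using BoundNodes = std::map<std::string, int, std::less<>>;

void bindNode(BoundNodes &Nodes, std::string_view ID, int Node) {
  Nodes[std::string(ID)] = Node;
}

// Returns the bound node, or nullptr if nothing is bound under `ID`.
const int *getNode(const BoundNodes &Nodes, std::string_view ID) {
  auto It = Nodes.find(ID);
  return It == Nodes.end() ? nullptr : &It->second;
}
```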

Harbormaster completed remote builds in B156276: Diff 418209. Mar 25 2022, 7:09 AM

ymandel added a subscriber: kinu. Mar 25 2022, 9:24 AM

xazax.hun added inline comments. Mar 25 2022, 11:33 AM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
150	Yeah, I think Clang is in a very sad state in this regard. We have a lot of half done facilities littered all over the codebase, including: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Analysis/CloneDetection.h https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/IdenticalExprChecker.cpp#L306 https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/RedundantExpressionCheck.cpp#L60
155	I wonder if we want to add `""` to support `opt.value_or("") != ""`. Not sure how frequent would this be over the empty call.
281	Is this the right way to initialize `ComparisonValue`? Consider the expression `opt.value_or(nullptr) != nullptr`. When `has_value == false`, `opt.value_or(nullptr)` will return `nullptr`, so `!=` evaluates to false. This case seems to check out. However, when `has_value == true`, `opt` might still hold a `nullptr` and `!=` could still evaluate to false.
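The scenario above can be reproduced with `std::optional<int *>`: an optional that is engaged but holds `nullptr` makes `opt.value_or(nullptr) != nullptr` false, so the comparison cannot simply stand in for `has_value`. A minimal standalone check (the helper name `notNullIdiom` is hypothetical):

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Evaluates the idiom `opt.value_or(nullptr) != nullptr`.
bool notNullIdiom(const std::optional<int *> &opt) {
  return opt.value_or(nullptr) != nullptr;
}
```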

ymandel marked 3 inline comments as done. Mar 25 2022, 12:42 PM

ymandel added inline comments.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	Thanks for digging into this. I think it's correct, but it's helpful to step through. Its correctness depends on `MakeValue`, so I'll focus on that in particular. For the `nullptr` case, we'll get: `HasValueVal && ContentsNotEqX`. So, when `has_value == true`, this basically reduces to `ContentsNotEqX`. Since that's an atom, the result is indeterminate, which I believe is the desired outcome. WDYT? Also, even if I've convinced you, please let me know how I can improve the comments. For that matter, would `MakeValue` be better with a more specific name, like "MakePredicate" or some such?
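The claim that `HasValueVal && ContentsNotEqX` captures `opt.value_or(X) != X` can be sanity-checked exhaustively over a few sample values with plain `std::optional<int>`. In this sketch `ContentsNotEqX` is modelled as the ordinary comparison `*Opt != X`; in the framework it is an unconstrained atom, which is why the result is indeterminate when `has_value` holds. The function name is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Checks that `Opt.value_or(X) != X` coincides with
// `Opt.has_value() && *Opt != X` for every sample optional and sample `X`.
bool idiomMatchesFormula(const std::vector<int> &Samples) {
  std::vector<std::optional<int>> Opts = {std::nullopt};
  for (int S : Samples)
    Opts.emplace_back(S);
  for (const auto &Opt : Opts) {
    for (int X : Samples) {
      bool Idiom = Opt.value_or(X) != X;
      bool HasValueVal = Opt.has_value();
      // Short-circuiting guards the dereference when `Opt` is empty.
      bool Formula = HasValueVal && *Opt != X;
      if (Idiom != Formula)
        return false;
    }
  }
  return true;
}
```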

xazax.hun added inline comments. Mar 25 2022, 12:50 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	I think what confuses me is that we do something different for the 3 cases. You convinced me that `HasValueVal && ContentsNotEqX` is correct. But we only do this for one branch out of the 3. What is the reason for that?

xazax.hun added inline comments. Mar 25 2022, 12:54 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
318	Typo: values.

xazax.hun added inline comments. Mar 25 2022, 12:56 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	Oh, never mind. Yeah, I think changing `MakeValue` to `MakePredicate` would make this a bit clearer. After a second read now I understand better what is going on.

ymandel added inline comments. Mar 25 2022, 12:58 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	Just to be clear: the three cases you mean are lines 273-283, or something else?

ymandel added inline comments. Mar 25 2022, 1:00 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	And never mind my question, then (I replied before I saw your update). I'll change the name and add comments.

xazax.hun added inline comments. Mar 25 2022, 1:02 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
290	I am still wondering a bit about this case. We generate `HasValueVal and ContentsNotEqX and CurrentValue`. I wonder if we want `HasValueVal and (ContentsNotEqX <=> CurrentValue)` instead? Or even `HasValueVal and CurrentValue`?

xazax.hun added inline comments. Mar 25 2022, 1:23 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
306	Not related to this PR, but I think in the future we will want to associate names to the values to make debugging easier (or maybe to generate really nice error messages).

ymandel added inline comments. Mar 25 2022, 1:55 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
290	I don't think that the iff version is right, but `HasValueVal and CurrentValue` could be. My concern is that we're not guaranteed that `CurrentValue` is populated. And, even if we were, it doesn't feel quite right. Assuming it's a high-fidelity model, we get (logically): `HasValue(opt) and Ne(ValueOr(opt,X),X)`. Then, when negated (say, on an else branch), we get `not(HasValue(opt)) or not(Ne(ValueOr(opt,X),X))`, which is equivalent to `not(HasValue(opt)) or Eq(ValueOr(opt,X),X)`. While true, it seems redundant, since the first clause should be derivable from the second (assuming an interpretable semantics for the `ValueOr` predicate). Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.
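Two logical steps in the comment above can be checked over a small domain, with plain `std::optional<int>` semantics standing in for the `HasValue`, `ValueOr`, `Ne` and `Eq` predicates: the negation is De Morgan's law, and `not(HasValue(opt))` implies `Eq(ValueOr(opt,X),X)`, so the disjunction reduces to its `Eq` clause. This is a hypothetical illustration, not part of the framework.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// For each sample optional and X, checks:
//  (1) not(H and N) == (not H or not N)    [De Morgan]
//  (2) (not H or not N) == not N           [holds because not H implies Eq]
// where H = has_value and N = Ne(ValueOr(opt, X), X).
bool negationFactsHold(const std::vector<int> &Samples) {
  std::vector<std::optional<int>> Opts = {std::nullopt};
  for (int S : Samples)
    Opts.emplace_back(S);
  for (const auto &Opt : Opts) {
    for (int X : Samples) {
      bool H = Opt.has_value();
      bool N = Opt.value_or(X) != X;
      if (!(H && N) != (!H || !N))
        return false;
      if ((!H || !N) != !N)
        return false;
    }
  }
  return true;
}
```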

sgatev added inline comments. Mar 28 2022, 3:00 AM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	Can you elaborate on the three cases on lines 273-283? Why not simply do `auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr); Env.setStorageLocation(*ComparisonExpr, ComparisonExprLoc); Env.setValue(ComparisonExprLoc, ComparisonValue);`?

ymandel marked 2 inline comments as done. Mar 28 2022, 5:40 AM

ymandel added inline comments.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	Re "Can you elaborate on the three cases on lines 273-283? Why not simply do `auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr); Env.setStorageLocation(*ComparisonExpr, ComparisonExprLoc); Env.setValue(ComparisonExprLoc, ComparisonValue);`": For the second case, I think we should drop it -- I don't see a reason to maintain the previous value (if there is any). It might be a good idea for compositionality, but we're not doing that anywhere else, so it doesn't make sense here. For the first and third cases, I assumed that if the expression already has a location, we'd want to reuse it. But, based on your question, I take it that's incorrect?
290	Re "Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.": Here's what I have. In general, we're aiming for all models to be a sound over-approximation of reality, and that is what we're doing here as well. Yet that poses a problem for the interpretation of the boolean not operator: if its operand is an over-approximation, then I believe the naive approach gives you an under-approximation. That's the problem we're hitting when reasoning about the negation. I'm not sure how to handle this. Stanislav -- have we dealt with this issue before? That said, if we go back to the previous approach of adding the information to the path condition, I think we avoid this problem, since the path conditions don't get negated. To Gabor's earlier point ("There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework?"): I think this is covered by the condition we're adding, namely `ExprValue => has_value`, where `ExprValue` is the truth value of the boolean expression. The implication in the reverse direction is `!has_value => !ExprValue`; that is, if we know the optional doesn't hold a value, then we know that `opt.value_or(X) == X`. But that implication is the contrapositive of our own, so I think it's already implicitly covered by adding the single implication. Does that sound right?
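The contrapositive argument above (and, read the other way round, the observation that negating an over-approximation yields an under-approximation) rests on the propositional identity `(a => b) == (!b => !a)`. A brute-force check over all truth assignments in plain C++, with hypothetical helper names; the second function checks the modus tollens step (`x => y` together with `not y` forces `not x`) that sgatev describes in his reply.

```cpp
#include <cassert>
#include <initializer_list>

// Material implication: `A => B` is false only when A holds and B does not.
bool implies(bool A, bool B) { return !A || B; }

// `ExprValue => has_value` agrees with `!has_value => !ExprValue` under every
// assignment. With A = Q (exact fact) and B = P (an over-approximation of Q),
// the same identity says that if Q => P then !P => !Q: the negated
// over-approximation only under-approximates the negated fact.
bool contrapositiveHolds() {
  for (bool A : {false, true})
    for (bool B : {false, true})
      if (implies(A, B) != implies(!B, !A))
        return false;
  return true;
}

// Modus tollens: whenever both `X => Y` and `!Y` hold, `X` must be false.
bool modusTollensHolds() {
  for (bool X : {false, true})
    for (bool Y : {false, true})
      if (implies(X, Y) && !Y && X)
        return false;
  return true;
}
```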

sgatev added inline comments. Mar 29 2022, 1:16 AM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281	Dropping the second case makes sense to me. For the rest, `createStorageLocation` returns a stable storage location so the snippet above should be sufficient. However, `setStorageLocation` will fail if we try calling it again with the same expression, even if it's called with the same storage location. What do you think about making `setStorageLocation` not fail if it's called with the same arguments?
290	I'm not following where `Env.makeAnd(*CurrentValue, ComparisonValue)` comes from so I'd question whether it's sound or not. I would have expected to see something like `ExprValue => has_value` (which I believe was the case in the first iteration) and I see no issues with the contrapositive. If you have `x => y` and `not y` in the flow condition, you'll be able to infer that `not x` is true (assuming no other statements for `x`). How we use this to prune branches from the analysis is a question of its own.
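One possible shape for the proposal at 281 (making `setStorageLocation` accept a repeated call with identical arguments) is sketched below with hypothetical stand-ins `ToyEnvironment`, `Expr` and `StorageLocation`; these are not the dataflow framework's classes, and the real API may differ. Re-binding an expression to the same location is a no-op, while re-binding it to a different location is still reported, here via a `false` return rather than a hard failure.

```cpp
#include <cassert>
#include <map>

struct Expr {};             // hypothetical stand-in for clang::Expr
struct StorageLocation {};  // hypothetical stand-in for a storage location

class ToyEnvironment {
public:
  // Stable per expression: repeated calls for the same expression return
  // the same location object.
  StorageLocation &createStorageLocation(const Expr &E) { return Locs[&E]; }

  // Idempotent variant: succeeds if `E` is unbound or already bound to `Loc`;
  // returns false only when asked to rebind `E` to a different location.
  bool setStorageLocation(const Expr &E, StorageLocation &Loc) {
    auto [It, Inserted] = ExprToLoc.try_emplace(&E, &Loc);
    return Inserted || It->second == &Loc;
  }

private:
  std::map<const Expr *, StorageLocation> Locs;
  std::map<const Expr *, StorageLocation *> ExprToLoc;
};
```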

address comments

ymandel marked 7 inline comments as done. Mar 29 2022, 8:15 AM

ymandel added inline comments.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
150	Right. I've added a FIXME. I think an elegant solution in this case would be a relational matcher for built-in types.
155	Makes sense -- done.
281	For the code here: I don't think there's a case where there's no value associated with a comparison, yet there is a loc, so I think that snippet is fine. For the general case, I think that `setStorageLocation` should be no stricter than `createStorageLocation`. It seems strange for the set operation to fail when the create does not (since "set" is more commonly a repeatable operation). Otherwise, no strong opinion, as long as we document the behavior.
290	I think the new version resolves this?
306	Good idea. I've noted that (to myself) as a todo to add a FIXME or somesuch.

Wow. This did take some iterations and I feel like I just added to the confusion at some point :D But the latest iteration looks much simpler and I'm confident it is right this time. Thanks!

This revision is now accepted and ready to land. Mar 29 2022, 8:27 AM

In D122231#3414109, @xazax.hun wrote:

Wow. This did take some iterations and I feel like I just added to the confusion at some point :D But the latest iteration looks much simpler and I'm confident it is right this time. Thanks!

Not at all -- I think you raised some really good questions! Ultimately, my move from implication in the flow condition to tying it directly to the value was the wrong turn, and your questions effectively highlighted the issues. :)

sgatev accepted this revision. Mar 30 2022, 1:02 AM

sgatev added inline comments.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
283	Why do this conditionally? I think we should set a value regardless of whether another model has already done so.
290	Yes, modelling these using implications looks good to me!
546	Call this `isValueOrNotEqX` for consistency?
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1742	I suggest making `opt` a parameter of `target` in all tests because in the current setup a more advanced analysis would identify one of the code paths we exercise as dead.
1804	Let's move this to a string header and remove the definition in the test above.
1833–1834	These can be combined in a `$ns::$optional<int> *opt` parameter.

Harbormaster completed remote builds in B156761: Diff 418880. Mar 30 2022, 2:27 AM

address comments.

delete line

ymandel added inline comments. Mar 30 2022, 12:23 PM

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
283	Why? I figured we're agnostic to the underlying value, and only care about relating it via the implication. We're setting it only so we have something to anchor that implication on. If we always set it, then we're erasing the information from another model.
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1244	Note: this came from sync'ing to HEAD and picking up my other patch.
1833–1834	Unfortunately, that crashes (which must be why I did this to begin with). But, I did reduce to only one var and one param.

sgatev accepted this revision. Mar 31 2022, 12:35 AM

sgatev added inline comments.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
283	Never mind. I probably didn't follow all the comment blocks carefully and thought that `addToFlowCondition` also happens conditionally. The current approach looks good to me.
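The approach settled on here (reuse the value another model may already have attached to the expression, create a fresh atom only when none exists, and add the constraint unconditionally) can be sketched with toy types. `ToyState`, `modelValueOr`, the string-keyed map and the integer atom IDs are all hypothetical; the point is that an existing value is never overwritten, while every visit still records a constraint.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy state: expressions map to symbolic boolean atoms (ints),
// and the "flow condition" just records which atoms were constrained.
struct ToyState {
  std::map<std::string, int> ExprToAtom;
  std::vector<int> ConstrainedAtoms;
  int NextAtom = 0;
};

// Reuses an atom set earlier (e.g. by another model) or creates a fresh one,
// then always records a constraint on it.
int modelValueOr(ToyState &S, const std::string &Expr) {
  auto It = S.ExprToAtom.find(Expr);
  int Atom;
  if (It != S.ExprToAtom.end()) {
    Atom = It->second;  // keep existing information
  } else {
    Atom = S.NextAtom++;
    S.ExprToAtom.emplace(Expr, Atom);
  }
  S.ConstrainedAtoms.push_back(Atom);  // unconditional, like the flow condition
  return Atom;
}
```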

Harbormaster completed remote builds in B157035: Diff 419243. Mar 31 2022, 1:33 AM

Closed by commit rG7f076004e941: [clang][dataflow] Add support for `value_or` in a comparison. (authored by ymandel). Mar 31 2022, 6:22 AM

This revision was automatically updated to reflect the committed changes.

ymandel added a commit: rG7f076004e941: [clang][dataflow] Add support for `value_or` in a comparison.

Revision Contents

- clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp (106 lines)
- clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp (117 lines)

Diff 419424

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp

(115 lines not shown)

}

auto isStdSwapCall() {
  return callExpr(callee(functionDecl(hasName("std::swap"))),
                  argumentCountIs(2), hasArgument(0, hasOptionalType()),
                  hasArgument(1, hasOptionalType()));
}

constexpr llvm::StringLiteral ValueOrCallID = "ValueOrCall";

auto isValueOrStringEmptyCall() {
  // `opt.value_or("").empty()`
  return cxxMemberCallExpr(
      callee(cxxMethodDecl(hasName("empty"))),
      onImplicitObjectArgument(ignoringImplicit(
          cxxMemberCallExpr(on(expr(unless(cxxThisExpr()))),
                            callee(cxxMethodDecl(hasName("value_or"),
                                                 ofClass(optionalClass()))),
                            hasArgument(0, stringLiteral(hasSize(0))))
              .bind(ValueOrCallID))));
}

auto isValueOrNotEqX() {
  auto ComparesToSame = [](ast_matchers::internal::Matcher<Stmt> Arg) {
    return hasOperands(
        ignoringImplicit(
            cxxMemberCallExpr(on(expr(unless(cxxThisExpr()))),
                              callee(cxxMethodDecl(hasName("value_or"),
                                                   ofClass(optionalClass()))),
                              hasArgument(0, Arg))
                .bind(ValueOrCallID)),
        ignoringImplicit(Arg));
  };

  // `opt.value_or(X) != X`, where X is `nullptr`, `""`, or `0`. Ideally, we'd
  // support this pattern for any expression, but the AST does not have a
  // generic expression comparison facility, so we specialize to common cases
  // seen in practice. FIXME: define a matcher that compares values across
  // nodes, which would let us generalize this to any `X`.
  return binaryOperation(hasOperatorName("!="),
                         anyOf(ComparesToSame(cxxNullPtrLiteralExpr()),
                               ComparesToSame(stringLiteral(hasSize(0))),
                               ComparesToSame(integerLiteral(equals(0)))));
}

/// Creates a symbolic value for an `optional` value using `HasValueVal` as the
/// symbolic value of its "has_value" property.
StructValue &createOptionalValue(Environment &Env, BoolValue &HasValueVal) {
  auto OptionalVal = std::make_unique<StructValue>();
  OptionalVal->setProperty("has_value", HasValueVal);
  return Env.takeOwnership(std::move(OptionalVal));
}

(83 lines not shown)

  if (auto *OptionalVal = cast_or_null<StructValue>(
    assert(HasValueVal != nullptr);

    auto &CallExprLoc = State.Env.createStorageLocation(*CallExpr);
    State.Env.setValue(CallExprLoc, *HasValueVal);
    State.Env.setStorageLocation(*CallExpr, CallExprLoc);
  }

/// `ModelPred` builds a logical formula relating the predicate in
/// `ValueOrPredExpr` to the optional's `has_value` property.
void transferValueOrImpl(const clang::Expr *ValueOrPredExpr,
                         const MatchFinder::MatchResult &Result,
                         LatticeTransferState &State,
                         BoolValue &(*ModelPred)(Environment &Env,
                                                 BoolValue &ExprVal,
                                                 BoolValue &HasValueVal)) {
  auto &Env = State.Env;

  const auto *ObjectArgumentExpr =
      Result.Nodes.getNodeAs<clang::CXXMemberCallExpr>(ValueOrCallID)
          ->getImplicitObjectArgument();

  auto *OptionalVal = cast_or_null<StructValue>(
      Env.getValue(*ObjectArgumentExpr, SkipPast::ReferenceThenPointer));
  if (OptionalVal == nullptr)
    return;
  auto *HasValueVal = getHasValue(OptionalVal);
  assert(HasValueVal != nullptr);

  auto *ExprValue = cast_or_null<BoolValue>(
      State.Env.getValue(*ValueOrPredExpr, SkipPast::None));
  if (ExprValue == nullptr) {
    auto &ExprLoc = State.Env.createStorageLocation(*ValueOrPredExpr);
    ExprValue = &State.Env.makeAtomicBoolValue();
    State.Env.setValue(ExprLoc, *ExprValue);
    State.Env.setStorageLocation(*ValueOrPredExpr, ExprLoc);
  }

  Env.addToFlowCondition(ModelPred(Env, *ExprValue, *HasValueVal));

xazax.hunUnsubmitted

Done

I am still wondering a bit about this case.

We generate: HasValueVal and ContentsNotEqX and CurrentValue.'
I wonder if we want: HasValueVal and (ContentsNotEqX <=> CurrentValue) instead? Or even HasValueVal and CurrentValue?

xazax.hun: I am still wondering a bit about this case. We generate: `HasValueVal and ContentsNotEqX and…

ymandelAuthorUnsubmitted

Done

I don't think that the iff version is right, but HasValueVal and CurrentValue could be. My concern is that we're not guaranteed that CurrentValue is populated. And, even if we were, it doesn't feel quite right. Assuming its a high fidelity model, we get (logically): HasValue(opt) and Ne(ValueOr(opt,X),X). Then, when negated (say, on an else branch) we get not(HasValue(opt)) or not(Ne(ValueOr(opt,X),X)) which is equivalent to not(HasValue(opt)) or Eq(ValueOr(opt,X),X). While true, it seems redundant, since the first clause should be derivable from the second (assuming an interpretatable semantics to the ValueOr predicate).

Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.

ymandel: I don't think that the iff version is right, but `HasValueVal and CurrentValue` could be. My…

ymandelAuthorUnsubmitted

Done

Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.

Here's what I have: in general, we're aiming for all models to be a sound (over) approximation of reality. That is what we're doing here as well. Yet, that poses a problem for the interpretation of the boolean not operator. If its operand is an overapproximation, then I believe the naive approach gives you an under approximation. That's the problem we're hitting when reasoning about the negation.

I'm not sure how to handle this. Stanislav -- have we dealt with this issue before?

That said, if we go back to the previous approach, of adding the information to the path condition, I think we avoid this problem, since the path conditions don't get negated. To Gabor's earlier point:

> There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework?

I think this is covered by the condition we're adding, namely:

`ExprValue => has_value`

where `ExprValue` is the truth value of the boolean expression.

So, the implication in the reverse direction is:

`!has_value => !ExprValue`

that is, if we know the optional doesn't hold a value, then we know that `opt.value_or(X) == X`.

But, that implication is the contrapositive of our own, so I think it's already implicitly covered by adding the single implication. Does that sound right?
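A minimal truth-table check of the contrapositive claim, with `E` and `H` standing for `ExprValue` and `has_value`. The `implies` helper is hypothetical and only encodes material implication; it is not a framework function.

```cpp
#include <cassert>

// Material implication: A => B.
constexpr bool implies(bool A, bool B) { return !A || B; }

// True if (E => H) and (!H => !E) agree for every assignment of E and H,
// where E stands for ExprValue and H for has_value.
constexpr bool contrapositiveHolds() {
  for (int E = 0; E <= 1; ++E)
    for (int H = 0; H <= 1; ++H)
      if (implies(E, H) != implies(!H, !E))
        return false;
  return true;
}

static_assert(contrapositiveHolds(), "an implication equals its contrapositive");
```

Because the two formulas agree on all four assignments, adding `ExprValue => has_value` to the flow condition already carries the reverse-direction fact.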

sgatev:

I'm not following where `Env.makeAnd(*CurrentValue, ComparisonValue)` comes from, so I'd question whether it's sound or not. I would have expected to see something like `ExprValue => has_value` (which I believe was the case in the first iteration), and I see no issues with the contrapositive. If you have `x => y` and `not y` in the flow condition, you'll be able to infer that `not x` is true (assuming no other statements about `x`). How we use this to prune branches from the analysis is a question of its own.
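The inference sgatev describes (from `x => y` and `not y`, conclude `not x`) can be illustrated by enumerating assignments. The `Formula` alias and `entails` function are hypothetical; the framework answers such questions with a solver rather than by brute force, so this is only a sketch of the logic, with `X` standing for `ExprValue` and `Y` for `has_value`.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>

// A formula over two atoms X (ExprValue) and Y (has_value).
using Formula = std::function<bool(bool X, bool Y)>;

// FlowCondition entails Goal iff every assignment that satisfies
// FlowCondition also satisfies Goal (checked by enumerating all four).
bool entails(const Formula &FlowCondition, const Formula &Goal) {
  for (bool X : {false, true})
    for (bool Y : {false, true})
      if (FlowCondition(X, Y) && !Goal(X, Y))
        return false;
  return true;
}
```

With only `X => Y` in the flow condition, `not X` is not entailed; adding `not Y` makes it entailed.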

ymandel (author):

I think the new version resolves this?

sgatev:

Yes, modelling these using implications looks good to me!

}

void transferValueOrStringEmptyCall(const clang::Expr *ComparisonExpr,
                                    const MatchFinder::MatchResult &Result,
                                    LatticeTransferState &State) {
  return transferValueOrImpl(ComparisonExpr, Result, State,
                             [](Environment &Env, BoolValue &ExprVal,
                                BoolValue &HasValueVal) -> BoolValue & {
                               // If the result is *not* empty, then we know the
                               // optional must have been holding a value. If
                               // `ExprVal` is true, though, we don't learn
                               // anything definite about `has_value`, so we
                               // don't add any corresponding implications to
                               // the flow condition.
                               return Env.makeImplication(Env.makeNot(ExprVal),
                                                          HasValueVal);

xazax.hun:

Not related to this PR, but I think in the future we will want to associate names to the values to make debugging easier (or maybe to generate really nice error messages).

ymandel (author):

Good idea. I've noted that (to myself) as a todo to add a FIXME or somesuch.

                             });
}
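The comment in `transferValueOrStringEmptyCall` rests on two facts about `std::optional<std::string>`: a non-empty `value_or("")` implies the optional holds a value, while an empty result is possible whether or not it does. The hypothetical helper below (not part of the patch) checks both on three representative values.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Checks, on representative values, the facts behind
// transferValueOrStringEmptyCall:
//  1. !opt.value_or("").empty()  implies  opt.has_value();
//  2. opt.value_or("").empty() is reachable both with and without a value,
//     so the `empty()` branch teaches nothing about has_value.
bool stringEmptyFactsHold() {
  std::vector<std::optional<std::string>> Cases = {
      std::nullopt, std::string(""), std::string("x")};
  bool EmptyWithValue = false;
  bool EmptyWithoutValue = false;
  for (const std::optional<std::string> &Opt : Cases) {
    bool ExprVal = Opt.value_or("").empty();
    if (!ExprVal && !Opt.has_value())
      return false;  // Fact 1 violated.
    if (ExprVal && Opt.has_value())
      EmptyWithValue = true;
    if (ExprVal && !Opt.has_value())
      EmptyWithoutValue = true;
  }
  return EmptyWithValue && EmptyWithoutValue;
}
```

This is why the model adds only `!ExprVal => HasValueVal` and nothing for the case where `ExprVal` is true.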

void transferValueOrNotEqX(const Expr *ComparisonExpr,
                           const MatchFinder::MatchResult &Result,
                           LatticeTransferState &State) {
  transferValueOrImpl(ComparisonExpr, Result, State,
                      [](Environment &Env, BoolValue &ExprVal,
                         BoolValue &HasValueVal) -> BoolValue & {
                        // We know that if `(opt.value_or(X) != X)` then
                        // `opt.hasValue()`, even without knowing further
                        // details about the contents of `opt`.

xazax.hun:

Typo: values.

                        return Env.makeImplication(ExprVal, HasValueVal);
                      });
}
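The implication encoded by `transferValueOrNotEqX` can likewise be checked exhaustively for small integers. The hypothetical helper below (an illustration, not the model's code) confirms that an empty optional never satisfies `value_or(X) != X`, and that the converse fails: an optional holding exactly `X` has a value yet compares equal, so the `else` branch learns nothing about `has_value`.

```cpp
#include <cassert>
#include <optional>

// Checks, for small ints, the fact behind transferValueOrNotEqX:
//   opt.value_or(X) != X  implies  opt.has_value()
// and that the converse fails.
bool notEqXFactsHold() {
  bool ConverseFails = false;
  for (int X = -2; X <= 2; ++X) {
    // An empty optional must never satisfy the comparison.
    if (std::optional<int>().value_or(X) != X)
      return false;
    for (int V = -2; V <= 2; ++V) {
      std::optional<int> Opt = V;
      if (Opt.value_or(X) == X)
        ConverseFails = true;  // Has a value, yet the comparison is false.
    }
  }
  return ConverseFails;
}
```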

void assignOptionalValue(const Expr &E, LatticeTransferState &State,
                         BoolValue &HasValueVal) {
  if (auto *OptionalLoc =
          State.Env.getStorageLocation(E, SkipPast::ReferenceThenPointer)) {
    State.Env.setValue(*OptionalLoc,
                       createOptionalValue(State.Env, HasValueVal));
  }
}

[... 203 lines not shown ...]
  return MatchSwitchBuilder<LatticeTransferState>()
      // optional::swap
      .CaseOf<CXXMemberCallExpr>(isOptionalMemberCallWithName("swap"),
                                 transferSwapCall)
      // std::swap
      .CaseOf<CallExpr>(isStdSwapCall(), transferStdSwapCall)
      // opt.value_or("").empty()

sgatev:

 .CaseOf<CallExpr>(isStdSwapCall(), transferStdSwapCall)
- // opt.value_or(X) != X, !opt.value_or("").empty():
+ // opt.value_or(X) != X, !opt.value_or("").empty()
 .CaseOf<Expr>(

Extreme nit for consistency with all comments above: drop the trailing colon.

      .CaseOf<Expr>(isValueOrStringEmptyCall(), transferValueOrStringEmptyCall)

sgatev:

Why not hard-code this in the `isValueOrCondition` matcher?

ymandel (author):

Safety/hygiene. It's easier to see that the ID to which the node is bound is the same one that's used in `getNodeAs`. An alternative that I often use is a (static) global constant, so I've changed to that.
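A toy sketch of the shared-ID pattern, written without Clang so it stays self-contained. In the real model the binding is done with an AST matcher's `.bind(...)` and the lookup with `Result.Nodes.getNodeAs<T>(...)`; here a `std::map` stands in for the bound-node map, and `NodeMap`, `ValueOrCallID`, `matchValueOrCall`, and `getValueOrCall` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the matcher's bound-node map (ID -> node).
using NodeMap = std::map<std::string, int>;

// Single shared ID: the bind site and the lookup site both use it, so they
// cannot silently drift apart the way two separate string literals could.
static constexpr char ValueOrCallID[] = "ValueOrCall";

// Plays the role of a matcher that binds the matched node under the ID.
NodeMap matchValueOrCall(int Node) { return {{ValueOrCallID, Node}}; }

// Plays the role of the transfer function's lookup by the same ID.
const int *getValueOrCall(const NodeMap &Nodes) {
  auto It = Nodes.find(ValueOrCallID);
  return It == Nodes.end() ? nullptr : &It->second;
}
```

A typo in a duplicated literal would make the lookup return null at run time; with one constant, a misspelled name fails at compile time instead.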

      // opt.value_or(X) != X

sgatev:

The `clang` namespace can be removed. Same comment for other instances in the patch.

      .CaseOf<Expr>(isValueOrNotEqX(), transferValueOrNotEqX)

sgatev:

Call this `isValueOrNotEqX` for consistency?

sgatev:

Why not pass `transferOptionalValueOrCall` as an argument instead of wrapping it in a lambda? The function can take the "ValueOrCall" node from the `MatchResult`.
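sgatev's suggestion relies on the transfer function already having the signature the switch expects, so it can be passed by name. The sketch below mimics that shape with `std::function`; `State`, `Case`, `runCases`, and this `transferValueOrCall` are hypothetical stand-ins and do not reproduce `MatchSwitchBuilder`'s actual API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical analysis state that records which transfer ran.
struct State {
  std::string Log;
};

// Signature every case action must have.
using Action = std::function<void(const std::string &Node, State &S)>;

// One case: a predicate over the node plus the action to run on a match.
struct Case {
  std::function<bool(const std::string &Node)> Matches;
  Action Run;
};

// Already has the `Action` signature, so it can be passed by name.
void transferValueOrCall(const std::string &Node, State &S) {
  S.Log += "value_or:" + Node;
}

// Runs the first case whose predicate matches `Node`.
void runCases(const std::vector<Case> &Cases, const std::string &Node,
              State &S) {
  for (const Case &C : Cases) {
    if (C.Matches(Node)) {
      C.Run(Node, S);
      return;
    }
  }
}
```

Passing the function by name removes a layer of indirection; a wrapping lambda remains useful only when arguments must be reshaped before the call.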


      .Build();
}

} // namespace

UncheckedOptionalAccessModel::UncheckedOptionalAccessModel(
    ASTContext &Ctx, UncheckedOptionalAccessModelOptions Options)
    : DataflowAnalysis<UncheckedOptionalAccessModel, SourceLocationsLattice>(
[... 12 lines not shown ...]

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp

[... 490 lines not shown ...]
template <bool B, typename T = void>
using enable_if_t = typename std::enable_if<B, T>::type;

} // namespace absl

#endif // ABSL_TYPE_TRAITS_H
)";

static constexpr char StdStringHeader[] = R"(
#ifndef STRING_H
#define STRING_H

namespace std {

struct string {
  string(const char*);
  ~string();
  bool empty();
};
bool operator!=(const string &LHS, const char *RHS);

} // namespace std

#endif // STRING_H
)";

static constexpr char StdUtilityHeader[] = R"(
#ifndef UTILITY_H
#define UTILITY_H

#include "std_type_traits.h"

namespace std {

[... 686 lines not shown ...]
  void ExpectLatticeChecksFor(std::string SourceCode,
                              FuncDeclMatcher FuncMatcher,
                              LatticeChecksMatcher MatchesLatticeChecks) {
    ReplaceAllOccurrences(SourceCode, "$ns", GetParam().NamespaceName);
    ReplaceAllOccurrences(SourceCode, "$optional", GetParam().TypeName);

    std::vector<std::pair<std::string, std::string>> Headers;
    Headers.emplace_back("cstddef.h", CSDtdDefHeader);
    Headers.emplace_back("std_initializer_list.h", StdInitializerListHeader);
    Headers.emplace_back("std_string.h", StdStringHeader);
    Headers.emplace_back("std_type_traits.h", StdTypeTraitsHeader);
    Headers.emplace_back("std_utility.h", StdUtilityHeader);
    Headers.emplace_back("std_optional.h", StdOptionalHeader);
    Headers.emplace_back("absl_type_traits.h", AbslTypeTraitsHeader);
    Headers.emplace_back("absl_optional.h", AbslOptionalHeader);
    Headers.emplace_back("base_optional.h", BaseOptionalHeader);
    Headers.emplace_back("unchecked_optional_access_test.h", R"(
#include "absl_optional.h"
#include "base_optional.h"
#include "std_initializer_list.h"
#include "std_optional.h"
#include "std_string.h"
#include "std_utility.h"

template <typename T>
T Make();
)");
    const tooling::FileContentMappings FileContents(Headers.begin(),
                                                    Headers.end());
    llvm::Error Error = checkDataflow<UncheckedOptionalAccessModel>(
        SourceCode, FuncMatcher,
        [](ASTContext &Ctx, Environment &) {
          return UncheckedOptionalAccessModel(
              Ctx, UncheckedOptionalAccessModelOptions{
                       /*IgnoreSmartPointerDereference=*/true});
ymandel (author): Note: this came from sync'ing to HEAD and picking up my other patch.
        },
        [&MatchesLatticeChecks](
            llvm::ArrayRef<std::pair<
                std::string, DataflowAnalysisState<SourceLocationsLattice>>>
                CheckToLatticeMap,
            ASTContext &Ctx) {
          // FIXME: Consider using a matcher instead of translating
          // `CheckToLatticeMap` to `CheckToStringifiedLatticeMap`.
[... 474 lines not shown ...]
    void target() {
      opt.value_or(0);
      (void)0;
      /*[[check]]*/
    }
  )",
                         UnorderedElementsAre(Pair("check", "safe")));
}

TEST_P(UncheckedOptionalAccessTest, ValueOrComparison) {
  // Pointers.
  ExpectLatticeChecksFor(
      R"code(
    #include "unchecked_optional_access_test.h"

    void target($ns::$optional<int*> opt) {
      if (opt.value_or(nullptr) != nullptr) {
sgatev: I suggest making `opt` a parameter of `target` in all tests because, in the current setup, a more advanced analysis would identify one of the code paths we exercise as dead.
        opt.value();
        /*[[check-ptrs-1]]*/
sgatev: Is the `return` important? I think having a `void` return type would be simpler. Same comment for the cases below.
ymandel (author): Totally. Thanks for pointing that out.
      } else {
        opt.value();
        /*[[check-ptrs-2]]*/
      }
    }
  )code",
      UnorderedElementsAre(Pair("check-ptrs-1", "safe"),
                           Pair("check-ptrs-2", "unsafe: input.cc:9:9")));

  // Integers.
  ExpectLatticeChecksFor(
      R"code(
    #include "unchecked_optional_access_test.h"

    void target($ns::$optional<int> opt) {
      if (opt.value_or(0) != 0) {
        opt.value();
        /*[[check-ints-1]]*/
      } else {
        opt.value();
        /*[[check-ints-2]]*/
      }
    }
  )code",
      UnorderedElementsAre(Pair("check-ints-1", "safe"),
                           Pair("check-ints-2", "unsafe: input.cc:9:9")));

  // Strings.
  ExpectLatticeChecksFor(
      R"code(
    #include "unchecked_optional_access_test.h"

    void target($ns::$optional<std::string> opt) {
      if (!opt.value_or("").empty()) {
        opt.value();
        /*[[check-strings-1]]*/
      } else {
        opt.value();
        /*[[check-strings-2]]*/
      }
    }
  )code",
      UnorderedElementsAre(Pair("check-strings-1", "safe"),
                           Pair("check-strings-2", "unsafe: input.cc:9:9")));

  ExpectLatticeChecksFor(
      R"code(
    #include "unchecked_optional_access_test.h"

    void target($ns::$optional<std::string> opt) {
      if (opt.value_or("") != "") {
        opt.value();
        /*[[check-strings-neq-1]]*/
      } else {
        opt.value();
        /*[[check-strings-neq-2]]*/
      }
    }
  )code",
      UnorderedElementsAre(
sgatev: Let's move this to a string header and remove the definition in the test above.
          Pair("check-strings-neq-1", "safe"),
          Pair("check-strings-neq-2", "unsafe: input.cc:9:9")));

  // Pointer-to-optional.
  //
  // FIXME: make `opt` a parameter directly, once we ensure that all `optional`
  // values have a `has_value` property.
  ExpectLatticeChecksFor(
      R"code(
    #include "unchecked_optional_access_test.h"

    void target($ns::$optional<int> p) {
      $ns::$optional<int> *opt = &p;
      if (opt->value_or(0) != 0) {
        opt->value();
        /*[[check-pto-1]]*/
      } else {
        opt->value();
        /*[[check-pto-2]]*/
      }
    }
  )code",
      UnorderedElementsAre(Pair("check-pto-1", "safe"),
                           Pair("check-pto-2", "unsafe: input.cc:10:9")));
}

TEST_P(UncheckedOptionalAccessTest, Emplace) {
  ExpectLatticeChecksFor(R"(
    #include "unchecked_optional_access_test.h"

sgatev: These can be combined in a `$ns::$optional<int> *opt` parameter.
ymandel (author): Unfortunately, that crashes (which must be why I did this to begin with). But I did reduce it to only one var and one param.
    void target() {
      $ns::$optional<int> opt;
      opt.emplace(0);
      opt.value();
      /*[[check]]*/
    }
  )",
                         UnorderedElementsAre(Pair("check", "safe")));
[... 306 lines not shown; the following lines are from TEST_P(UncheckedOptionalAccessTest, UniquePtrToStructWithOptionalField) ...]
  )",
      UnorderedElementsAre(Pair("check-1", "safe"), Pair("check-2", "safe")));
}

// FIXME: Add support for:
// - constructors (copy, move)
// - assignment operators (default, copy, move)
// - invalidation (passing optional by non-const reference/pointer)
- // - `value_or(nullptr) != nullptr`, `value_or(0) != 0`, `value_or("").empty()`
// - nested `optional` values


Revision Contents: Diff 419424
