This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add support for `value_or` in a comparison.
ClosedPublic

Authored by ymandel on Mar 22 2022, 7:10 AM.

Details

Summary

This patch adds limited modeling of the value_or method. Specifically, when
used in a particular idiom in a comparison to implicitly check whether the
optional holds a value.

Diff Detail

Event Timeline

ymandel created this revision.Mar 22 2022, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 7:10 AM
ymandel requested review of this revision.Mar 22 2022, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 7:10 AM
sgatev added inline comments.Mar 23 2022, 2:20 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
127

Why handle negation here? Would it work for if (opt.value_or("").empty()) { ... } else { opt.value(); }?

156

Why 0? How about opt_p->value_or(21) != 21?

542

Extreme nit for consistency with all comments above.

544

Why not hard-code this in the isValueOrCondition matcher?

545

The clang namespace can be removed. Same comment for other instances in the patch.

547

Why not pass transferOptionalValueOrCall as argument instead of wrapping it in a lambda? The function can take the "ValueOrCall" node from the MatchResult.

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1744

Is the return important? I think having void return type would be simpler. Same comment for the cases below.

xazax.hun added inline comments.Mar 23 2022, 9:28 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
279

There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework?

ymandel updated this revision to Diff 418205.Mar 25 2022, 6:23 AM

Addressed comments.

ymandel updated this revision to Diff 418206.Mar 25 2022, 6:25 AM

remove stray comments

ymandel updated this revision to Diff 418209.Mar 25 2022, 6:42 AM
ymandel marked an inline comment as done.

adjust logical formula

ymandel marked 6 inline comments as done and an inline comment as not done.Mar 25 2022, 6:47 AM

Thanks for the detailed review!

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
127

The negation is a simpler predicate, but you're right that it's too specific. I've rewritten the code to drop the constraint and handle the more general opt.value_or("").empty(). The new code also encodes a more precise relationship in the logic, per Gabor's comments below about the implication in the other direction. It also now directly attaches the formula to the value, rather than dropping an implication in the flow conditions.

Overall, I think the new approach is an improvement, but please let me know if you disagree.

156

This is addressed in the comment, but do you think we should add a FIXME to support some amount of expression comparision? Integers, floats, bools and variables would be an easy place to start, for example. But, we'd need to drop into regular code -- the matchers can't express that kind of constraint.

279

Yes, good point! Please see my response to Stanislav above. I think the new version handles this by modeling the value_or directly, rather than dropping in an implication.

544

Safety/hygiene. It's easier to see that the ID to which the node is bound is the same that's being used in getNodeAs. An alternative which I often use is to use a (static) global constant, so I've changed to that.

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1744

Totally. Thanks for pointing that out.

ymandel added a subscriber: kinu.Mar 25 2022, 9:24 AM
xazax.hun added inline comments.Mar 25 2022, 11:33 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
150
155

I wonder if we want to add "" to support opt.value_or("") != "". Not sure how frequent would this be over the empty call.

281

Is this the right way to initialize ComparisonValue?

Considering the expression: opt.value_or(nullptr) != nullptr

  • When has_value == false, opt.value_or(nullptr) will return nullptr, so != evaluates to false. This case seems to check out.
  • However, when has_value == true, opt might still hold an nullptr and != could still evaluate to false.
ymandel marked 3 inline comments as done.Mar 25 2022, 12:42 PM
ymandel added inline comments.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

Thanks for digging into this. I think it's correct, but helpful to step through:

Its correctness depends on MakeValue, so I'll focus on that in particular. For the nullptr case, we'll get:

HasValueVal && ContentsNotEqX

So, when has_value == true, this basically reduces to ContentsNotEqX. Since that's an atom, the result is indeterminate, which I believe is the desired outcome.

WDYT? Also, even if I've convinced you, please let me know how i can improve the comments. For that matter, would MakeValue be better with a more specific name, like "MakePredicate" or somesuch?

xazax.hun added inline comments.Mar 25 2022, 12:50 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

I think what confuses me is that we do something different for the 3 cases. You convinced me that HasValueVal && ContentsNotEqX is correct. But we only do this for one branch out of the 3. What is the reason for that?

xazax.hun added inline comments.Mar 25 2022, 12:54 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
318

Typo: values.

xazax.hun added inline comments.Mar 25 2022, 12:56 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

Oh, never mind. Yeah, I think changing MakeValue to MakePredicate would make this a bit clearer. After a second read now I understand better what is going on.

ymandel added inline comments.Mar 25 2022, 12:58 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

Just to be clear: the three cases you mean are lines 273-283, or something else?

ymandel added inline comments.Mar 25 2022, 1:00 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

and never mind my question, then (I rpelied before I saw your updated). I'll change the name and add comments.

xazax.hun added inline comments.Mar 25 2022, 1:02 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
290

I am still wondering a bit about this case.

We generate: HasValueVal and ContentsNotEqX and CurrentValue.'
I wonder if we want: HasValueVal and (ContentsNotEqX <=> CurrentValue) instead? Or even HasValueVal and CurrentValue?

xazax.hun added inline comments.Mar 25 2022, 1:23 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
306

Not related to this PR, but I think in the future we will want to associate names to the values to make debugging easier (or maybe to generate really nice error messages).

ymandel added inline comments.Mar 25 2022, 1:55 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
290

I don't think that the iff version is right, but HasValueVal and CurrentValue could be. My concern is that we're not guaranteed that CurrentValue is populated. And, even if we were, it doesn't feel quite right. Assuming its a high fidelity model, we get (logically): HasValue(opt) and Ne(ValueOr(opt,X),X). Then, when negated (say, on an else branch) we get not(HasValue(opt)) or not(Ne(ValueOr(opt,X),X)) which is equivalent to not(HasValue(opt)) or Eq(ValueOr(opt,X),X). While true, it seems redundant, since the first clause should be derivable from the second (assuming an interpretatable semantics to the ValueOr predicate).

Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.

sgatev added inline comments.Mar 28 2022, 3:00 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

Can you elaborate on the three cases on lines 273-283? Why not simply do

auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr);
Env.setStorageLocation(ComparisonExpr, ComparisonExprLoc);
Env.setValue(ComparisonExprLoc, ComparisonValue);
ymandel marked 2 inline comments as done.Mar 28 2022, 5:40 AM
ymandel added inline comments.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

Can you elaborate on the three cases on lines 273-283? Why not simply do

auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr);
Env.setStorageLocation(ComparisonExpr, ComparisonExprLoc);
Env.setValue(ComparisonExprLoc, ComparisonValue);

for the second case: I think we should drop it -- I don't see a reason to maintain the previous value (if there is any). It might be a good idea for compositionality, but we're not doing that anywhere else, so it doesn't make sense here.

for the first and third case: I assumed that if the expression already has a location, we'd want to reuse it. But, based on your question, I take it that's incorrect?

290

Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.

Here's what I have: in general, we're aiming for all models to be a sound (over) approximation of reality. That is what we're doing here as well. Yet, that poses a problem for the interpretation of the boolean not operator. If its operand is an overapproximation, then I believe the naive approach gives you an under approximation. That's the problem we're hitting when reasoning about the negation.

I'm not sure how to handle this. Stanislav -- have we dealt with this issue before?

That said, if we go back to the previous approach, of adding the information to the path condition, I think we avoid this problem, since the path conditions don't get negated. To Gabor's earlier point:

There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework?

I think is covered by the condition we're adding. Namely:

ExprValue => has_value

where ExprValue is the truth value of the boolean expression.

So, the implication in the reverse direction is:

!has_value => !ExprValue

that is, if we know the optional doesn't hold a value, then we know that opt.value_or(X) = X

But, that implication is the contrapositive of our own, so I think it's already implicitly covered by adding the single implication. Does that sound right?

sgatev added inline comments.Mar 29 2022, 1:16 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
281

Dropping the second case makes sense to me.

For the rest, createStorageLocation returns a stable storage location so the snippet above should be sufficient. However, setStorageLocation will fail if we try calling it again with the same expression, even if it's called with the same storage location. What do you think about making setStorageLocation not fail if it's called with the same arguments?

290

I'm not following where Env.makeAnd(*CurrentValue, ComparisonValue) comes from so I'd question whether it's sound or not. I would have expected to see something like ExprValue => has_value (which I believe was the case in the first iteration) and I see no issues with the contrapositive. If you have x => y and not y in the flow condition, you'll be able to infer that not x is true (assuming no other statements for x). How we use this to prune branches from the analysis is a question of its own.

ymandel updated this revision to Diff 418880.Mar 29 2022, 7:55 AM
ymandel marked 3 inline comments as done.

address comments

ymandel marked 7 inline comments as done.Mar 29 2022, 8:15 AM
ymandel added inline comments.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
150

Right. I've added a FIXME. I think an elegant solution in this case would be a relational matcher for built-in types.

155

Makes sense -- done.

281

for the code here -- I don't think that there's a case where there's not value associated with a comparison, yet there is a loc, so I think that snippet is fine.

for the general case, I think that setStorageLocation should be be no stricter than createStorageLocation. it seems strange for the set operation to fail when the create does not (since "set" is more commonly a repeatable operation). Otherwise, no strong opinion as long as we document the behavior.

290

I think the new version resolves this?

306

Good idea. I've noted that (to myself) as a todo to add a FIXME or somesuch.

xazax.hun accepted this revision.Mar 29 2022, 8:27 AM

Wow. This did take some iterations and I feel like I just added to the confusion at some point :D But the latest iteration looks much simpler and I'm confident it is right this time. Thanks!

This revision is now accepted and ready to land.Mar 29 2022, 8:27 AM
ymandel marked 3 inline comments as done.Mar 29 2022, 8:56 AM

Wow. This did take some iterations and I feel like I just added to the confusion at some point :D But the latest iteration looks much simpler and I'm confident it is right this time. Thanks!

Not at all -- I think you raised some really good questions! Ultimately, my move from implication in the flow condition to tying it directly to the value was the wrong turn, and your questions effectively highlighted the issues. :)

sgatev accepted this revision.Mar 30 2022, 1:02 AM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
283

Why do this conditionally? I think we should set a value regardless of whether another model has already done so.

290

Yes, modelling these using implications looks good to me!

546

Call this isValueOrNotEqX for consistency?

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1742

I suggest making opt a parameter of target in all tests because in the current setup a more advanced analysis would identify one of the code paths we exercise as dead.

1804

Let's move this to a string header and remove the definition in the test above.

1833–1834

These can be combined in a $ns::$optional<int> *opt parameter.

ymandel updated this revision to Diff 419240.Mar 30 2022, 12:15 PM
ymandel marked 5 inline comments as done.

address comments.

ymandel updated this revision to Diff 419243.Mar 30 2022, 12:22 PM

delete line

ymandel added inline comments.Mar 30 2022, 12:23 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
283

Why? I figured we're agnostic to the underlying value, and only care about relating it via the implication. We're setting it only so we have something to anchor that implication on. If we always set it, then we're erasing the information from another model.

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1244

Note: this came from sync'ing to HEAD and picking up my other patch.

1833–1834

Unfortunately, that crashes (which must be why I did this to begin with). But, I did reduce to only one var and one param.

sgatev accepted this revision.Mar 31 2022, 12:35 AM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
283

Nevermind. I probably didn't follow carefully around all the comment blocks and thought that addToFlowCondition also happens conditionally. The current approach looks good to me.