This patch adds limited modeling of the value_or method. Specifically, when
used in a particular idiom in a comparison to implicitly check whether the
optional holds a value.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
127 | Why handle negation here? Would it work for if (opt.value_or("").empty()) { ... } else { opt.value(); }? | |
156 | Why 0? How about opt_p->value_or(21) != 21? | |
542 | Extreme nit for consistency with all comments above. | |
544 | Why not hard-code this in the isValueOrCondition matcher? | |
545 | The clang namespace can be removed. Same comment for other instances in the patch. | |
547 | Why not pass transferOptionalValueOrCall as argument instead of wrapping it in a lambda? The function can take the "ValueOrCall" node from the MatchResult. | |
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp | ||
1744 | Is the return important? I think having void return type would be simpler. Same comment for the cases below. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
279 | There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework? |
Thanks for the detailed review!
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
127 | The negation is a simpler predicate, but you're right that it's too specific. I've rewritten the code to drop the constraint and handle the more general opt.value_or("").empty(). The new code also encodes a more precise relationship in the logic, per Gabor's comments below about the implication in the other direction. It also now directly attaches the formula to the value, rather than dropping an implication in the flow conditions. Overall, I think the new approach is an improvement, but please let me know if you disagree. | |
156 | This is addressed in the comment, but do you think we should add a FIXME to support some amount of expression comparision? Integers, floats, bools and variables would be an easy place to start, for example. But, we'd need to drop into regular code -- the matchers can't express that kind of constraint. | |
279 | Yes, good point! Please see my response to Stanislav above. I think the new version handles this by modeling the value_or directly, rather than dropping in an implication. | |
544 | Safety/hygiene. It's easier to see that the ID to which the node is bound is the same that's being used in getNodeAs. An alternative which I often use is to use a (static) global constant, so I've changed to that. | |
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp | ||
1744 | Totally. Thanks for pointing that out. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
150 | Yeah, I think Clang is in a very sad state in this regard. We have a lot of half done facilities littered all over the codebase, including: | |
155 | I wonder if we want to add "" to support opt.value_or("") != "". Not sure how frequent would this be over the empty call. | |
281 | Is this the right way to initialize ComparisonValue? Considering the expression: opt.value_or(nullptr) != nullptr
|
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 | Thanks for digging into this. I think it's correct, but helpful to step through: Its correctness depends on MakeValue, so I'll focus on that in particular. For the nullptr case, we'll get: HasValueVal && ContentsNotEqX So, when has_value == true, this basically reduces to ContentsNotEqX. Since that's an atom, the result is indeterminate, which I believe is the desired outcome. WDYT? Also, even if I've convinced you, please let me know how i can improve the comments. For that matter, would MakeValue be better with a more specific name, like "MakePredicate" or somesuch? |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 | I think what confuses me is that we do something different for the 3 cases. You convinced me that HasValueVal && ContentsNotEqX is correct. But we only do this for one branch out of the 3. What is the reason for that? |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
318 | Typo: values. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 | Oh, never mind. Yeah, I think changing MakeValue to MakePredicate would make this a bit clearer. After a second read now I understand better what is going on. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 | Just to be clear: the three cases you mean are lines 273-283, or something else? |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 | and never mind my question, then (I rpelied before I saw your updated). I'll change the name and add comments. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
290 | I am still wondering a bit about this case. We generate: HasValueVal and ContentsNotEqX and CurrentValue.' |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
306 | Not related to this PR, but I think in the future we will want to associate names to the values to make debugging easier (or maybe to generate really nice error messages). |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
290 | I don't think that the iff version is right, but HasValueVal and CurrentValue could be. My concern is that we're not guaranteed that CurrentValue is populated. And, even if we were, it doesn't feel quite right. Assuming its a high fidelity model, we get (logically): HasValue(opt) and Ne(ValueOr(opt,X),X). Then, when negated (say, on an else branch) we get not(HasValue(opt)) or not(Ne(ValueOr(opt,X),X)) which is equivalent to not(HasValue(opt)) or Eq(ValueOr(opt,X),X). While true, it seems redundant, since the first clause should be derivable from the second (assuming an interpretatable semantics to the ValueOr predicate). Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 | Can you elaborate on the three cases on lines 273-283? Why not simply do auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr); Env.setStorageLocation(ComparisonExpr, ComparisonExprLoc); Env.setValue(ComparisonExprLoc, ComparisonValue); |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 |
for the second case: I think we should drop it -- I don't see a reason to maintain the previous value (if there is any). It might be a good idea for compositionality, but we're not doing that anywhere else, so it doesn't make sense here. for the first and third case: I assumed that if the expression already has a location, we'd want to reuse it. But, based on your question, I take it that's incorrect? | |
290 |
Here's what I have: in general, we're aiming for all models to be a sound (over) approximation of reality. That is what we're doing here as well. Yet, that poses a problem for the interpretation of the boolean not operator. If its operand is an overapproximation, then I believe the naive approach gives you an under approximation. That's the problem we're hitting when reasoning about the negation. I'm not sure how to handle this. Stanislav -- have we dealt with this issue before? That said, if we go back to the previous approach, of adding the information to the path condition, I think we avoid this problem, since the path conditions don't get negated. To Gabor's earlier point:
I think is covered by the condition we're adding. Namely: ExprValue => has_value where ExprValue is the truth value of the boolean expression. So, the implication in the reverse direction is: !has_value => !ExprValue that is, if we know the optional doesn't hold a value, then we know that opt.value_or(X) = X But, that implication is the contrapositive of our own, so I think it's already implicitly covered by adding the single implication. Does that sound right? |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
281 | Dropping the second case makes sense to me. For the rest, createStorageLocation returns a stable storage location so the snippet above should be sufficient. However, setStorageLocation will fail if we try calling it again with the same expression, even if it's called with the same storage location. What do you think about making setStorageLocation not fail if it's called with the same arguments? | |
290 | I'm not following where Env.makeAnd(*CurrentValue, ComparisonValue) comes from so I'd question whether it's sound or not. I would have expected to see something like ExprValue => has_value (which I believe was the case in the first iteration) and I see no issues with the contrapositive. If you have x => y and not y in the flow condition, you'll be able to infer that not x is true (assuming no other statements for x). How we use this to prune branches from the analysis is a question of its own. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
150 | Right. I've added a FIXME. I think an elegant solution in this case would be a relational matcher for built-in types. | |
155 | Makes sense -- done. | |
281 | for the code here -- I don't think that there's a case where there's not value associated with a comparison, yet there is a loc, so I think that snippet is fine. for the general case, I think that setStorageLocation should be be no stricter than createStorageLocation. it seems strange for the set operation to fail when the create does not (since "set" is more commonly a repeatable operation). Otherwise, no strong opinion as long as we document the behavior. | |
290 | I think the new version resolves this? | |
306 | Good idea. I've noted that (to myself) as a todo to add a FIXME or somesuch. |
Wow. This did take some iterations and I feel like I just added to the confusion at some point :D But the latest iteration looks much simpler and I'm confident it is right this time. Thanks!
Not at all -- I think you raised some really good questions! Ultimately, my move from implication in the flow condition to tying it directly to the value was the wrong turn, and your questions effectively highlighted the issues. :)
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
283 | Why do this conditionally? I think we should set a value regardless of whether another model has already done so. | |
290 | Yes, modelling these using implications looks good to me! | |
546 | Call this isValueOrNotEqX for consistency? | |
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp | ||
1742 | I suggest making opt a parameter of target in all tests because in the current setup a more advanced analysis would identify one of the code paths we exercise as dead. | |
1804 | Let's move this to a string header and remove the definition in the test above. | |
1833–1834 | These can be combined in a $ns::$optional<int> *opt parameter. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
283 | Why? I figured we're agnostic to the underlying value, and only care about relating it via the implication. We're setting it only so we have something to anchor that implication on. If we always set it, then we're erasing the information from another model. | |
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp | ||
1244 | Note: this came from sync'ing to HEAD and picking up my other patch. | |
1833–1834 | Unfortunately, that crashes (which must be why I did this to begin with). But, I did reduce to only one var and one param. |
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp | ||
---|---|---|
283 | Nevermind. I probably didn't follow carefully around all the comment blocks and thought that addToFlowCondition also happens conditionally. The current approach looks good to me. |
Why handle negation here? Would it work for if (opt.value_or("").empty()) { ... } else { opt.value(); }?