This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add support for correlated branches to optional model
ClosedPublic

Authored by sgatev on May 18 2022, 3:05 PM.

Details

Summary

Add support for correlated branches to the std::optional dataflow model.

Diff Detail

Event Timeline

sgatev created this revision.May 18 2022, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 3:05 PM
sgatev requested review of this revision.May 18 2022, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 3:05 PM

Thanks! Looks good overall, just the one question about loops.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
690–692

What's the plan for loops? This will increase the size of the constraint on every iteration, preventing termination.

xazax.hun added inline comments.May 19 2022, 9:16 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
690–692

I have nothing to add, just a brain dump. In this particular case, the analysis domain has infinite chains, the size of the Boolean formulas can increase indefinitely (at least syntactically). The canonical solution to deal with infinite chains is to do a widening operation such that the widened state over approximates both the state of the old and the new iteration. A very simple way of doing this would be to simply erase all knowledge we accumulated about the Boolean value. On the other hand, I wonder if it is often possible to do something smarter. There might be cases when the value is syntactically larger in the new iteration but actually it has the exact same truth table. The solver might be able to show this. On the other hand, if the loop generates new Boolean symbols in each iteration, we have little hope fore that.

xazax.hun added a comment.EditedMay 19 2022, 11:28 AM

Actually, I think in most cases we want to be consistent on how to merge bool values. So I wonder whether instead of reimplementing the merge operation in this check we should just call a function that does the work. And the same function should be used within the engine to merge states after if statements and so on.

Actually, I think in most cases we want to be consistent on how to merge bool values. So I wonder whether instead of reimplementing the merge operation in this check we should just call a function that does the work. And the same function should be used within the engine to merge states after if statements and so on.

Agreed. I think we made a mistake with bools, to be honest, in supporting correlated conditions, given our lack of distinction between join and widen -- something I didn't appreciate fully at the time of the patch. I propose:

  1. In this patch, we go with a widening operation, but put the relevant logic in the core, so it can be reused for booleans in general.
  2. Follow up with a patch that uses the new logic for general booleans.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
690–692

I have nothing to add, just a brain dump. In this particular case, the analysis domain has infinite chains, the size of the Boolean formulas can increase indefinitely (at least syntactically). The canonical solution to deal with infinite chains is to do a widening operation such that the widened state over approximates both the state of the old and the new iteration. A very simple way of doing this would be to simply erase all knowledge we accumulated about the Boolean value. On the other hand, I wonder if it is often possible to do something smarter. There might be cases when the value is syntactically larger in the new iteration but actually it has the exact same truth table. The solver might be able to show this. On the other hand, if the loop generates new Boolean symbols in each iteration, we have little hope fore that.

Agreed. In our internal implementation, we have something like this -- it consults the solver to see if the two are logically equivalent. That loses the (subtle) support for correlated conditions. However, until we have more precise support for widening (that is, separate from merge), I think this is the only way to go to ensure that loop analysis terminates.

  1. In this patch, we go with a widening operation, but put the relevant logic in the core, so it can be reused for booleans in general.

Depending on how much simplicity are we willing to sacrifice, doing a merge here might be OK. We only need widening to ensure the convergence of loops. Some frameworks are doing the regular merge operator for joins in the CFG and only use the widening operator along the back edges. This way they can be a bit more precise while preserving convergence.

sgatev updated this revision to Diff 436590.Jun 13 2022, 3:57 PM

Rebase main.

sgatev marked 3 inline comments as done.Jun 13 2022, 4:11 PM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
690–692

After discussing this, we decided to go for the internal implementation which is tested on a large codebase and commit to smarter merging that preserves flow condition information only after testing it extensively. I have updated the patch accordingly, please take a look.

ymandel accepted this revision.Jun 13 2022, 7:17 PM

I love the tests (pretty cool examples of what can be handled). Thanks for the thoroughness!

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
174

I believe you can relax this to Value because setProperty is no longer specific to StructValue.

289–291

I think this is no longer necessary now that we don't rely on StructValue for the getProperty call.

696

MergedVal is enough here, assuming the changes suggested above.

This revision is now accepted and ready to land.Jun 13 2022, 7:17 PM
sgatev updated this revision to Diff 436710.Jun 14 2022, 2:07 AM
sgatev marked 4 inline comments as done.

Address comments.

sgatev added inline comments.Jun 14 2022, 2:43 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
174

I did that intentionally because I still think it makes sense to assert that an optional is modeled as a StructValue. I haven't thought about where and how it'd be best to assert that though so I'll happily remove the casts for now.

ymandel added inline comments.Jun 14 2022, 4:38 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
174

Sounds good. I had the same thought. But, it occurs to me that at this point, are we using the StructValue at all? If not, then the only reason we expect it to be a StructValue is because we know the optional type is a record type. But, that's not an assumption of our model. So, maybe we should be agnostic to the underlying representation?

xazax.hun accepted this revision.Jun 14 2022, 8:52 AM
xazax.hun added inline comments.
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
288

I wonder whether NonEmpty is clearer than Engaged. I think Engaged can also be OK, but we probably want to have a comment somewhere to explain the terminology.

696

Would it make sense to have the converese? I.e., when both optionals are empty, create an empty optional here. While currently this modeling does not distinguish between unchecked and empty optionals it might be handy in the future. Although I am also happy with this as is, not adding any code that does not have any utility as of today.

sgatev updated this revision to Diff 437088.Jun 15 2022, 2:27 AM
sgatev marked 3 inline comments as done.

Address comments.

sgatev added inline comments.Jun 15 2022, 2:30 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
174

Agreed. Let's not assume a StructValue representation unless it's necessary.

288

Sounds good. I went with isEmptyOptional and isNonEmptyOptional.

696

Good idea! I think this can be useful. I added the code and a test case based on dead code:

if (b) {
  opt = std::nullopt;
} else {
  opt = std::nullopt;
}
if (opt.has_value()) {
  // unreachable
}
This revision was landed with ongoing or failed builds.Jun 15 2022, 3:04 AM
This revision was automatically updated to reflect the committed changes.