This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Track `optional` contents in `optional` model.
ClosedPublic

Authored by ymandel on May 4 2022, 8:52 AM.

Details

Summary

This patch adds partial support for tracking (i.e. modeling) the contents of an
optional value. Specifically, it supports tracking the value after it is
accessed. We leave tracking constructed/assigned contents to a future patch.

Diff Detail

Event Timeline

ymandel created this revision.May 4 2022, 8:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 8:52 AM
ymandel requested review of this revision.May 4 2022, 8:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 8:52 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ymandel retitled this revision from [clang][dataflow] Track `optional`s' values (i.e. contents) in `optional` model. to [clang][dataflow] Track `optional` contents in `optional` model..May 4 2022, 8:53 AM
ymandel updated this revision to Diff 427029.May 4 2022, 8:57 AM

remove stray newline

Overall looks good to me. I am curious what will the strategy be to properly support construction. Do you plan to introduce a customization point to Env.createValue to give checks/models a way to set properties up? Or do you have something else in mind?

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
229

I recall some problems with convergence due to creating fresh locations lazily in multiple branches. I wonder if doubling down before finding a proper solution could make it harder to fix this in the near future. If this is just a temporary solution until construction is better supported, it might be fine.

301

I wonder how well this strategy works in practice. If we have many unwrap calls to the same optional, we might end up emitting lots of diagnostics. An alternative approach is to assume that the optional has a value after the first unwrap. This would ensure that we only report a particular problem once and can reduce the noise.

ymandel updated this revision to Diff 427296.May 5 2022, 6:15 AM

Fixed the FIXME

Overall looks good to me. I am curious what will the strategy be to properly support construction. Do you plan to introduce a customization point to Env.createValue to give checks/models a way to set properties up? Or do you have something else in mind?

If we want to go the eager route, than I think that's the "right" solution. Anything else is brittle, since it is easy to miss a construction point. However, I'm more inclined to move the whole framework to a lazy initialization process, if we can devise one.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
229

Excellent question. Yes, you're right about the convergence issue this falls into the same category. That said, we intentionally reuse the same location in all branches, which mitigates that issue in this case -- we are at least assured that the two branches will have the contents value at the same location. In fact, the FIXME I added is wrong (and I've adjusted it accordingly). The property will be shared by all blocks that share the same value representing the Optional (which for the most part is done eagerly).

To your larger point, about the advisability of this approach, even for the short term:

  1. I'm pushing on this largely because I have the data to support that it works in practice. We've done multiple surveys of significant size (~1000 files, 10k optional accesses) and found it does very well. So, this is not much of an issue in practice, in large part because the contents of optionals don't tend to further influence the outcome of the check (e.g. in the case of nested optionals).
  2. I thought of redoing this in an eager fashion, but realized that ultimately we want to move everything to a lazy model, so it seemed less than ideal to start trying to be eager here.
301

I see your point, and agree that there does seem room for significantly reducing warnings, even if all are correct. But, in practice, we haven't gotten complaints (which is a little surprising, actually). I wonder if it would be good to pair the report with the surrounding compound statement, and limit to one per compound statement. I fear that shutting it off entirely after the first issue might be _too_ quiet. Either way, this is definitely something that we'll be actively tweaking.

xazax.hun accepted this revision.May 5 2022, 8:24 AM
This revision is now accepted and ready to land.May 5 2022, 8:24 AM

Thanks for the clarifications!

sgatev added inline comments.May 10 2022, 1:53 AM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
180–181

Why is this necessary? Do we really need to create a new value on each getHasValue call? Can we centralize initialization to avoid this? I'm worried that "has_value" and "value" initialization isn't coordinated.

229

Perhaps call this maybeInitializeOptionalValueMember?

229

Let's pass the type instead of Expr, if that's the only thing we need.

232–234

It's unclear to me why we "need" a StorageLocation. Representing the value as a ReferenceValue seems good to me, but there are other options. What issues are we running into if we remove the indirection?

241–246

So we don't lose the ReferenceValue, but we do lose its pointee? Is that the behavior we want when merging?

291

Shouldn't we use getHasValue here?

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
2188–2260

Do we really need tests for all members that check/access/reset the content? These are already covered in tests above.

2280
2287

Do we really need this one or is it already covered by one of the tests in this file?

ymandel marked 8 inline comments as done.May 12 2022, 4:49 PM

I'm having trouble with arcanist -- arc diff is crashing, but wanted to respond in the meantime to your questions. I hope I'll be able to actually upload the new diff soon...

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
180–181

Why is this necessary? Do we really need to create a new value on each getHasValue call? Can we centralize initialization to avoid this? I'm worried that "has_value" and "value" initialization isn't coordinated.

We don't create a new one on each call, only when none is already populated. As for centralizing -- we could, but I'd like to move to lazy initialization anyhow, so it seemed less than ideal to put effort into centralizing at this point.

232–234

It's unclear to me why we "need" a StorageLocation. Representing the value as a ReferenceValue seems good to me, but there are other options. What issues are we running into if we remove the indirection?

We can't address it. Compare with a normal field, where the corresponding AggregateStorageLocation has a child that maps to a storage location for the field. We don't support properties on AggregateStorageLocation, so the location has to exist somewhere.

What other options do you see?

241–246

So we don't lose the ReferenceValue, but we do lose its pointee? Is that the behavior we want when merging?

Correct. No, that's not the behavior we want, hence the FIXME that follows. :) But, it at least accounts for the other two situations, so its a win overall.

291

Shouldn't we use getHasValue here?

Not quite:

  1. its a bit a redundant since we already need to check whether the value is a StructValue, line 274-5. I tried to keep it to where its use would not cause unnecessary additional checks.
  2. We don't want to create a new "has_value" here, given that the access is illegal in that case. We want the condition to fail in that case (line 282) and result in recording an error.
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
2188–2260

Great point. I rethought the tests in general -- I'd mostly been copying them over, without carefully rethinking them. I've reduced the number of tests and added comments explaining their purposes. Essentially, we need to test that the synthetic value can be used like a normal field.

2280

No longer relevant.

2287

Covered.

ymandel updated this revision to Diff 429210.May 13 2022, 5:48 AM
ymandel marked 6 inline comments as done.

address reviewer comments and fix bug in how the nested value as _re_created -- now uses the loc's type rather than the expression's type (which doesn't always match exactly).

ymandel updated this revision to Diff 429211.May 13 2022, 5:54 AM

remove stray FIXME from previous diff

sgatev@ -- gentle ping.

sgatev accepted this revision.May 24 2022, 11:52 PM
ymandel updated this revision to Diff 435528.Jun 9 2022, 6:26 AM

rebase and merge

ymandel updated this revision to Diff 435537.Jun 9 2022, 7:17 AM

add missing blank line to test

This revision was landed with ongoing or failed builds.Jun 9 2022, 7:19 AM
This revision was automatically updated to reflect the committed changes.