This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Eliminate duplication between `AggregateStorageLocation` and `StructValue`.
ClosedPublic

Authored by mboehme on Jul 17 2023, 5:16 AM.

Details

Summary

After this change, StructValue is just a wrapper for an AggregateStorageLocation. For the wider context, see https://discourse.llvm.org/t/70086.

How to review

  • Start by looking at the comments added / changed in Value.h, StorageLocation.h, and DataflowEnvironment.h. This will give you a good overview of the semantic changes.
  • Look at the corresponding .cpp files that implement the semantic changes.
  • Transfer.cpp, TypeErasedDataflowAnalysis.cpp, and RecordOps.cpp show how the core of the framework is affected by the semantic changes.
  • UncheckedOptionalAccessModel.cpp shows how this complex model is affected by the changes.
  • Many of the changes in the rest of the patch are mechanical in nature.

Diff Detail

Event Timeline

mboehme created this revision.Jul 17 2023, 5:16 AM
Herald added a project: Restricted Project. · View Herald Transcript
mboehme requested review of this revision.Jul 17 2023, 5:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 17 2023, 5:17 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
mboehme edited the summary of this revision. (Show Details)Jul 17 2023, 5:17 AM
mboehme updated this revision to Diff 541021.Jul 17 2023, 7:27 AM

Added bugfix for refreshStructValue() as a diffbase.

I got up to Transfer.cpp and figured I should send along what I have. Looks very good so far! I should have the rest of the review in a few hours...

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
337–338

I found this sentence hard to parse. Lots of "of..." chained. I think it might be clearer with a bit of a preamble about record prvalues. Or, refer the reader to Value.h for details? Once I read that, this became a lot clearer.

340
353

Given that this is specific to record types, and that's not reflected in the argument type, should the name somehow reflect that?

413–416
clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
97–99

When assertions are off this will be an odd for loop with no body. Will the optimizer delete it? If not, then consider putting the loop in a lambda inside the assert (or somesuch).

136

which constructor? Also, is this clause clarifying the "other fields cannot change" part of the sentence?

clang/include/clang/Analysis/FlowSensitive/Value.h
202–223

These comments are fantastic!

206–207

nit: consider adding a simple example. Is this one right?
MyStruct S = MyStruct(3);
The intiializer is a prvalue which will result in a StructValue wrapping an ASL. Then, we'll model the decl S using that ASL.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
127

Please add comment explaining why we can assume this invariant.

129

Why do we need a new StructValue rather than reusing one of the existing ones?

690–691

Maybe just an assert?

assert(!isa<StructValue>(&Val) || &cast<StructValue>(&Val)->getAggregateLoc() == &Loc);
690–694

is it worth exposing this in the header (inline) now that it's just one line?

708

How does this work, given that you never called setValue?

811–813

how does this interact with the fact that multiple StructValues can share the same ASL?

821

Consider adding a comment for this given that there's none in the header. the name is a bit confusing on its own since the function also creates a corresponding value, not just the loc.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
351

What happens to ValueVal if we use createObject?

clang/lib/Analysis/FlowSensitive/RecordOps.cpp
88–89

nit: use else if ?

mboehme updated this revision to Diff 541980.Jul 19 2023, 6:03 AM

Various changes in response to review comments

mboehme marked 16 inline comments as done.Jul 19 2023, 6:09 AM
mboehme added inline comments.
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
337–338

Reworded.

I've tried to avoid referring to Value.h because StructValue itself may be going away entirely in the not-too-distant future. Instead, I've adapted some material from there (which also means people don't need to cross-reference Value.h).

WDYT?

353

Done.

clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
97–99

Hm -- I'm not sure. Changed to a lambda.

136

Reworded. Is this clearer now?

clang/include/clang/Analysis/FlowSensitive/Value.h
206–207

Yes, this looks right. Added this as an example below.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
129

Added a comment.

690–694

Good point, done.

708

Agree this is confusing.

The intent here was to check a _pre_condition of the function (i.e. that the caller must already have associated StructVal->getAggregateLoc() with Val in the environment before calling this function) rather than a _post_condition of the function. But looking at this again, this doesn't seem like a good contract. All current callers of the function satisfy this precondition, but it's not documented anywhere, and it's also just simply a strange and unnecessary precondition. So instead, I've simply replaced this with a setValue() that makes sure this is the case if it wasn't before.

811–813

Not sure exactly what you're asking?

First of all, I think the wording on the documentation for StructValue may have been confusing, so I've reworded that to make it explicit that when I say that multiple StructValues can share the same AggregateStorageLocation, what I mean, more precisely, is that the same AggregateStorageLocation can be associated with different StructValues in different environments.

Here, however, we've only just created the AggregateStorageLocation (and a StructValue to go with it), so it definitely makes sense to associate the two through a call to setValue(Loc, StructVal).

821

Yes, the name was bad. Renamed and added a comment in the header.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
351

Oops -- this was left over from refactoring and was superfluous now. Thanks for catching. Removed.

ymandel accepted this revision.Jul 19 2023, 9:39 AM

This is really impressive. Thank you!

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
337–338

Looks great!

clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
136

Yes. thx

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
811–813

Thanks, that clarifies. I missed the "in different environments" bit. Makes sense now.

clang/lib/Analysis/FlowSensitive/Transfer.cpp
466

This diff makes me very happy. :)

495

I thought cast requires a nonnull argument, so this will always be true?

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
397

FIXME, here and below.

clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
30

fixme

This revision is now accepted and ready to land.Jul 19 2023, 9:39 AM
mboehme updated this revision to Diff 542337.Jul 20 2023, 12:28 AM
mboehme marked 11 inline comments as done.

Changes in response to review comments

mboehme marked 3 inline comments as done.Jul 20 2023, 12:34 AM
mboehme added inline comments.
clang/lib/Analysis/FlowSensitive/Transfer.cpp
466

Me too! :)

495

Ah, good catch! Yes, the condition will always be true, so the if statement is superfluous. (We know S->getType() is a record type, and Env.createValue() will always create a value for record types.)

Removed the if statement.

(Confusingly, this comment now looks as if it's anchored to the if statement that does the Env.getValue(). The cast_or_null there _is_ necessary because we might not have a value for the argument.)

mboehme marked an inline comment as done.Jul 20 2023, 4:03 AM

CI failures look unrelated:

  • Windows failure is in SemaCXX/static-assert-cxx26.cpp. I can't see a way in which this patch should affect the failure of this test (the dataflow framework isn't used in Clang itself)
  • check-format failure is in clang/lib/Analysis/UnsafeBufferUsage.cpp, which this patch doesn't touch
xazax.hun accepted this revision.EditedJul 21 2023, 4:19 PM

I did not do a thorough review checking every line, but I read the design paper and skimmed through this patch. Love the direction, and I am OK with landing this as is.

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
677

I am glad to see this gone.

clang/include/clang/Analysis/FlowSensitive/Value.h
201–233

Should we link to the design doc somewhere? The comments are great and self-contained, but I wonder if there is still some value for the doc to be mentioned somewhere in the code. I do not have a preference, just wondering.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
678

Is eliminating SkipPast coming in a follow-up patch?

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
534–549

This is more of a note for the future of these APIs. I wonder if this is the right level of abstraction in this framework. I think it would be great to have something like:

void transferCallReturningOptional(const CallExpr *E,
                                   StorageLocation* RetLoc,
                                   ArrayRef<StorageLocation> Args,
                                   const MatchFinder::MatchResult &Result,
                                   LatticeTransferState &State);

and all the values and storage locations could come from the framework. It would be great if the user would almost never need to get a value or a location from an expression, they would be readily available. What do you think?

mboehme marked 4 inline comments as done.Jul 24 2023, 6:16 AM
mboehme added inline comments.
clang/include/clang/Analysis/FlowSensitive/Value.h
201–233

I'm hestitant do to this, as the doc is likely to become out of date at some point. We've already discussed plans on another patch to eliminate StructValue entirely, for example. So I'd rather make sure we have sufficient documentation with the code itself, where it's most likely to be kept up to date.

Anything in particular from the doc that you think would be worth capturing here? (I intend to submit shortly, but would address this in a followup patch.)

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
678

Yes! I've already got a draft patch, which I'll be submitting for review soon.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
534–549

Hm, interesting idea! This definitely sounds like a way to avoid mistakes, though there would definitely be some details to work out -- for example, a StorageLocation for the arguments only works if they are passed by reference, not by value. Something to think about for sure though.

This revision was landed with ongoing or failed builds.Jul 24 2023, 6:20 AM
This revision was automatically updated to reflect the committed changes.
mboehme marked 3 inline comments as done.
xazax.hun added inline comments.Jul 24 2023, 8:50 AM
clang/include/clang/Analysis/FlowSensitive/Value.h
201–233

I have nothing I would miss from the comments, but there were more code examples in the design doc that helped me understand the problem. I can imagine some people have easier time consuming that document for this reason. But I agree that this is not a strong enough argument to link to something that might go stale in the future.

clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
534–549

Another potential advantage of that approach would be to amortize the lookup costs in case there are multiple checks participating in the same fixed-point iteration.