This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add API to separate analysis from diagnosis
ClosedPublic

Authored by samestep on Jun 15 2022, 12:35 PM.

Details

Summary

This patch adds an optional PostVisitStmt parameter to the runTypeErasedDataflowAnalysis function, which does one more pass over all statements in the CFG after a fixpoint is reached. It then defines a diagnose method for the optional model in a new UncheckedOptionalAccessDiagnosis class, but only integrates that into the tests and not the actual optional check for clang-tidy. That will be done in a followup patch.

The primary motivation is to separate the implementation of the unchecked optional access check into two parts, to allow for further refactoring of just the model part later, while leaving the checking part alone. Currently there is duplication between the transferUnwrapCall and diagnoseUnwrapCall functions, but that will be dealt with in the followup.

Because diagnostics are now all gathered into one collection rather than being populated at each program point like when computing a fixpoint, this patch removes the usage of Pair and UnorderedElementsAre from the optional model tests, and instead modifies all their expectations to simply check the stringified set of diagnostics against a single string, either "safe" or some concatenation of "unsafe: input.cc:y:x". This is not ideal as it loses any connection to the /*[[check]]*/ annotations in the source strings, but it does still retain the source locations from the diagnostic strings themselves.

Diff Detail

Event Timeline

samestep created this revision.Jun 15 2022, 12:35 PM
samestep requested review of this revision.Jun 15 2022, 12:35 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptJun 15 2022, 12:35 PM
samestep edited the summary of this revision. (Show Details)Jun 15 2022, 12:36 PM
samestep added reviewers: ymandel, sgatev.
ymandel added inline comments.Jun 16 2022, 9:37 AM
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
35 ↗(On Diff #437299)

I'm not sure that this is valid style. It's essentially a unit type, which I quite like, but I've never seen this done before.

64 ↗(On Diff #437299)

Please describe this parameter.

103 ↗(On Diff #437299)

The virtual function seems out of place here, given that everything else is statically resolved. Either drop virtual or move to the dyn. typed super class.

That said, I'm not clear on why this isn't a free function. I imagine a model of creating the analysis, running it and that results in blocks annotated with Environments and custome lattices. Then, you run any function you want over that using some visitor template helper. To account for state, we either make it an explicit parameter or take a std::function.

clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h
52 ↗(On Diff #437299)

this implies that we want to support some form of accumulation. I'd argue that any accumulation should already occur in the lattice (or the environment) and this should be a straight read of the state and some output. So, the TransferState struct should be enough.

C++ seems to be tied to templates. I wonder if super classes resp. inheritance would make your life easier. You can explicitly define the API and documentation. You could provide default implementations.

xazax.hun added inline comments.Jun 17 2022, 10:11 AM
clang-tools-extra/clang-tidy/bugprone/UncheckedOptionalAccessCheck.cpp
66 ↗(On Diff #437299)

While doing yet another iteration after we reached the fixed point is a valid approach, many analyses have a monotonic property when it comes to emitting diagnostics. In this case, when the analysis discovered that an optional access is unsafe in one of the iterations I would not expect it to become safe in subsequent iterations. In most of the cases this makes collecting diagnostics eagerly a viable option and saves us from doing one more traversal of the CFG. On the other hand, we need to be careful to not to emit duplicate diagnostics.

I wonder whether, from a user point of view, not having to do one more iteration is a more ergonomic interface.

samestep added inline comments.Jun 21 2022, 8:44 AM
clang-tools-extra/clang-tidy/bugprone/UncheckedOptionalAccessCheck.cpp
66 ↗(On Diff #437299)

@xazax.hun Agreed, I don't think for this particular analysis there's a risk of diagnostics being prematurely added and then becoming invalid as the analysis progresses. However, the broader goal of this patch is to separate the information-gathering part of the analysis to the diagnostic-flagging part of the analysis, with the hope that we can maybe derive the former automatically while still allowing the latter to be written manually. It seems to me that it would be infeasible to automatically derive both.

samestep updated this revision to Diff 438891.Jun 21 2022, 7:22 PM

Only add the new API, don't change the check

samestep edited the summary of this revision. (Show Details)Jun 21 2022, 7:28 PM
samestep retitled this revision from [clang][dataflow] Find unsafe locs after fixpoint to [clang][dataflow] Add API to separate analysis from diagnosis.Jun 21 2022, 7:31 PM
sgatev added inline comments.Jun 22 2022, 8:48 AM
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
26 ↗(On Diff #438891)

Let's add some documentation for Diagnosis and diagnoseCFG.

27 ↗(On Diff #438891)

#include <functional>

32 ↗(On Diff #438891)

#include <vector> and #include "llvm/ADT/Optional.h"

49 ↗(On Diff #438891)

#include <utility>

clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h
49 ↗(On Diff #438891)

Move this to Diagnosis.h?

clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

Move this to a new UncheckedOptionalAccessDiagnosis.(h,cpp)?

samestep updated this revision to Diff 439045.Jun 22 2022, 9:06 AM

Comment Diagnosis.h and add missing headers

samestep marked 4 inline comments as done.Jun 22 2022, 9:07 AM
samestep added inline comments.
clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h
49 ↗(On Diff #438891)

I'd prefer to keep it in MatchSwitch.h alongside TransferState, unless we plan to, e.g., move that type to DataflowAnalysis.h.

clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

OK! I'll do that next.

samestep updated this revision to Diff 439048.Jun 22 2022, 9:10 AM

Try to fix the broken patch

samestep marked an inline comment as not done.Jun 22 2022, 9:40 AM
samestep added inline comments.
clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

Actually, @sgatev would that go under FlowSensitive/Models/ or just under FlowSensitive/?

sgatev added inline comments.Jun 22 2022, 10:15 AM
clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

I suggest keeping it under FlowSensitive/Models/ for now. We can change that at a later point if there's a better alternative.

samestep added inline comments.Jun 22 2022, 11:37 AM
clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

Where should I put the UncheckedOptionalAccessModelOptions type and ignorableOptional function that need to be shared between both the model and the diagnosis?

samestep added inline comments.Jun 22 2022, 11:47 AM
clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

Same question for the following other helper functions:

  • hasOptionalType
  • isOptionalMemberCallWithName
  • isOptionalOperatorCallWithName

Given how much code is shared between the two, I'm really not sure whether it's the best idea to move this into a separate file in this patch...

xazax.hun added inline comments.Jun 22 2022, 4:47 PM
clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
546

I think this should be moved closer to its use.

684–685

This logic is duplicated between the check and the diagnosis. I think it would be nice to factor this out to avoid these two parts getting out of sync.

samestep updated this revision to Diff 439372.Jun 23 2022, 6:24 AM
  • Merge branch 'main' into diagnose-api
  • Incorporate Gábor's suggestions
samestep marked 2 inline comments as done.Jun 23 2022, 6:25 AM
sgatev accepted this revision.Jun 23 2022, 8:21 AM
sgatev added inline comments.
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
57 ↗(On Diff #439372)

Better to remove this and rely on NRVO? https://en.cppreference.com/w/cpp/language/copy_elision

clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h
49 ↗(On Diff #438891)

Fair enough, but generally I see TransferState and DiagnoseState as very specific to the dataflow analysis framework whereas MatchSwitchBuilder seems to be a bit more generic so I'd consider separating them.

clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

I suggest either declaring those in the header and using them from UncheckedOptionalAccessDiagnosis.cpp or moving them to a separate file and including them both in the model and in the diagnosis. In general I think it's fair if the diagnosis depends on the model, but not the other way around. It's perfectly fine if we do this refactoring in a follow up.

This revision is now accepted and ready to land.Jun 23 2022, 8:21 AM
samestep added inline comments.Jun 23 2022, 8:25 AM
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
57 ↗(On Diff #439372)

Ah OK, I'm not a C++ expert so I wasn't sure when to write std::move and when not to. Will do, thanks!

clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h
49 ↗(On Diff #438891)

Sure, if people think it makes sense to put those two types elsewhere then we can do that in a followup.

clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
76

Yep, makes sense; I just didn't want to make declarations for all those in a header file in this patch.

samestep updated this revision to Diff 439411.Jun 23 2022, 8:26 AM
  • Remove unnecessary std::move
gribozavr2 added inline comments.Jun 23 2022, 9:12 AM
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
38 ↗(On Diff #439372)

This function seems pretty general and not necessarily tied to diagnostics.

WDYT about using "visitor" in the name?

For example, "visitDataflowAnalysisResults".

The "Diagnosis" would also become "DataflowAnalysisResultVisitor".

42 ↗(On Diff #439372)

I'm leaning towards std::vector<Diag> instead of a set, to avoid requiring that Diag is hashable. Users who need hash-based deduplication can easily do it themselves (but why would they have duplication? we only visit each Stmt once)

57 ↗(On Diff #439372)
clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h
49 ↗(On Diff #438891)

For now, since there is only a single user, I'd actually prefer to move it into UncheckedOptionalAccessDiagnosis itself. It is a trivial component, basically a std::pair, it seems that exposing it as a public API might not be that helpful.

clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
78

"Diagnosis" sounds like the result. Should this be a "Diagnoser"?

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
78–79

It is better to name a function with a verb. WDYT about "VerifyResults"?

If you agree, please also update the functions below.

147
189

Can we use diagnoseCFG to implement the loop above?

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1841

The original was unsafe, is it an intentional change?

xazax.hun added inline comments.Jun 23 2022, 9:24 AM
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
38–42 ↗(On Diff #439411)

I feel a bit uneasy about this API. The need of helpers in the tests is a signal that this API might not be ergonomic as is. One needs to pass a lot of stuff in, and it is really error prone. For example, the user might get confused what Environment to pass in. Or one might pass the wrong Analysis if there are multiple similar ones in scope. I wonder if there is a way to reduce the ceremony needed to get diagnostics out. How about adding an optional argument to runDataflowAnalysis function to collect the results? I wonder if that would be a bit cleaner. You do not need to change the return type of that function, the collector object could take a reference to the llvm::DenseSet<Diag> that it populates.

clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
85

Do we expect this to modify the ASTContext? Could this be a const ref instead?

xazax.hun added inline comments.Jun 23 2022, 9:26 AM
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
38–42 ↗(On Diff #439411)

Moreover, if we pass the diagnostic function to runDataflowAnalysis, the fact whether we collect diagnostic during the fixed-point iteration or after that in a separate pass becomes an implementation detail. We can easily change back and forth without affecting the callers. The current API mandates an implementation approach and makes it harder to change in the future.

samestep marked an inline comment as done.Jun 23 2022, 9:56 AM
samestep added inline comments.
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
38–42 ↗(On Diff #439411)

I think this makes a lot of sense. I agree that the current set of parameters is unwieldy. I hadn't thought about your point of hiding the separate pass as an implementation detail, but that also makes sense to me. @gribozavr2 @ymandel @sgatev thoughts?

38 ↗(On Diff #439372)

Sure, that sounds fine to me. What do you think about @xazax.hun's suggested alternate API? If we go with that, would we rename it to say "visitor" regardless?

42 ↗(On Diff #439372)

This makes a lot of sense, I'll make that change. Somehow along the way I forgot that the whole reason we even needed a set was because we were using it as a lattice when computing a fixpoint, but you're right that since we're pulling this out of the fixpoint iteration, we a vector would do fine.

57 ↗(On Diff #439372)

Thanks!

clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h
49 ↗(On Diff #438891)

That's also fine; I can make that change here.

clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
78

Sure, I can change it to say "Diagnoser" instead (unless we want to just replace it with "Visitor").

85

I'd guess it shouldn't; I was mostly just copying the ASTContext & field from the DataflowAnalysis class, which is not const. I can try changing it to const here, though.

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
78–79

Sounds good to me, I can do that.

189

Possibly! I'll look into it.

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1841

Oh! Good catch, I didn't notice that. Will fix.

gribozavr2 added inline comments.Jun 23 2022, 10:04 AM
clang/include/clang/Analysis/FlowSensitive/Diagnosis.h
38–42 ↗(On Diff #439411)

I think that's a good idea, it will make a simpler API for typical usage scenarios. I still think there might be space for an API like diagnoseCFG (e.g., if you want to run multiple visitations of the analysis results), but we don't have a use case for it so far.

samestep marked an inline comment as done.Jun 24 2022, 8:16 AM
samestep added inline comments.
clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
85

@xazax.hun Coming back to this today, I remembered why it has to be ASTContext & and not const ASTContext &: because MatchSwitch requires it to not be const.

xazax.hun added inline comments.Jun 24 2022, 9:15 AM
clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
85

Thanks! Do you mean MatchSwitch cannot be refactored to use const ASTContext&? I quickly skimmed through the code and it looks like almost every instance of ASTContext is non-const. So I think it is ok to leave as is for this PR. I am also not sure if making it const at more places would bring any significant value.

samestep updated this revision to Diff 439878.Jun 24 2022, 1:30 PM
  • Merge branch 'main' into diagnose-api
  • WIP
samestep edited the summary of this revision. (Show Details)Jun 24 2022, 1:36 PM
samestep updated this revision to Diff 440211.Jun 27 2022, 7:06 AM
  • Merge branch 'main' into diagnose-api
samestep updated this revision to Diff 440406.Jun 27 2022, 3:00 PM
  • Bring back Diagnosis.h
samestep updated this revision to Diff 440687.Jun 28 2022, 10:35 AM
  • Implement Gábor's suggestion
  • Merge branch 'main' into diagnose-api
samestep edited the summary of this revision. (Show Details)Jun 28 2022, 10:38 AM
samestep added a reviewer: xazax.hun.
ymandel added inline comments.Jun 28 2022, 11:06 AM
clang/include/clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h
128

Please update comment to mention this new param (and that its optional)

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
65–73

nit: since this includes both inputs and results, maybe AnalysisData?

114

nit: i prefer explicit comparisons, so MakePostVisitStmt != nullptr.

161
172–191

any chance that this could be folded into MakePostVisitStmt?

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
1249
xazax.hun accepted this revision.Jun 28 2022, 11:45 AM

Thanks, LGTM!

gribozavr2 accepted this revision.Jun 28 2022, 1:31 PM
gribozavr2 added inline comments.
clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
78

Not done?

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
144–145
155
clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
2335–2336
sgatev accepted this revision.Jun 28 2022, 11:34 PM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
401

Let's harmonize the arguments to transferBlock and runTypeErasedDataflowAnalysis. I think in both cases we only need access to the internal Stmt so there's no need to pass CFGStmt to transferBlock. This way we can pass PostVisitStmt directly here without having to wrap it in a lambda.

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
78–82

Why is this a function that returns a function? Couldn't it be simply a void function, like VerifyResults?

samestep marked 9 inline comments as done.Jun 29 2022, 10:56 AM

Fixed all the minor nits and responded to some comments.

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
401

Makes sense to me; should I do that in a followup?

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
78–82

It needs to take in the ASTContext, and it needs to be able to close over state; see the one we pass from UncheckedOptionalAccessModelTest.cpp.

172–191

Looking into this now.

clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
2335–2336

I don't think this is correct, is it?

samestep updated this revision to Diff 441084.Jun 29 2022, 10:57 AM
  • Merge branch 'main' into diagnose-api
  • Fix minor nits
samestep updated this revision to Diff 441112.Jun 29 2022, 11:54 AM
  • Tidy AnalysisData struct and PostVisitStmt param
This revision was landed with ongoing or failed builds.Jun 29 2022, 12:18 PM
This revision was automatically updated to reflect the committed changes.