This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add modeling of Chromium's CHECK functionality
ClosedPublic

Authored by ymandel on Mar 16 2022, 6:00 AM.

Details

Summary

Chromium's implementation of assertions (CHECK, DCHECK, etc.) are not
annotated with "noreturn", by default. This patch adds a model of the logical
implications of successfully executing one of these assertions.

Diff Detail

Event Timeline

ymandel created this revision.Mar 16 2022, 6:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 6:00 AM
ymandel requested review of this revision.Mar 16 2022, 6:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 6:00 AM
gribozavr2 added inline comments.Mar 16 2022, 7:08 AM
clang/lib/Analysis/FlowSensitive/Models/ChromiumCheckModel.cpp
17–18

Please don't repeat the comment from the header in the cc file.

36
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
207

Sorry, could you explain how this works? I think the flow condition should not be implying 'Foo' since we're supposed to ignore this unrelated Check() call.

ymandel marked 3 inline comments as done.Mar 16 2022, 7:19 AM
ymandel added inline comments.
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
207

Good catch -- that was a typo (and the test was failing -- I'd gone back and forth on the formulation and uploaded at the wrong point).

ymandel updated this revision to Diff 415936.Mar 16 2022, 12:13 PM
ymandel marked an inline comment as done.

removed lattice dependency.

xazax.hun accepted this revision.Mar 16 2022, 12:49 PM
xazax.hun added inline comments.
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
122

I wonder whether the models should actually be called by the framework at some point.
E.g. imagine the following scenario:

void f()
{
    std::optional<int> o(5);
    if (o)
    {
        // dead code here;
    }
}

In an ideal case, an analysis could use the std::optional modeling to realize that the code in the if statement is dead and use this fact to improve its precision. Explicitly request the modeling in the transfer function works OK when we only have a couple things to model. But it might not scale in the future. When we model dozens of standard types and functions we would not want all the analysis clients to invoke all the transfers for all the models individually.

This revision is now accepted and ready to land.Mar 16 2022, 12:49 PM
ymandel marked an inline comment as done.Mar 16 2022, 12:57 PM

Thanks for the review!

clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
122

Agreed. It seems similar the problems that motivated DLLs back in the day. there's clearly a lot to be worked out here in terms of how best to support composition. It's probably worth a RFC or somesuch to discuss in more depth.

xazax.hun added inline comments.Mar 16 2022, 3:23 PM
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
122

Having an RFC and some deeper discussions would be great. I also wonder whether modeling arbitrary Stmts is the right approach. The peculiarities of the language should probably be modelled by the framework itself without any extensions. Maybe we only want the modeling of certain function calls to be customizable?

sgatev accepted this revision.Mar 18 2022, 3:55 AM
sgatev added inline comments.
clang/include/clang/Analysis/FlowSensitive/Models/ChromiumCheckModel.h
15

This is unnecessary.

clang/lib/Analysis/FlowSensitive/Models/ChromiumCheckModel.cpp
49–55
51

Shouldn't this be part of isCheckLikeMethod?

52
57
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
90

Perhaps "Incorrect" instead of "Bad" and comment on what makes it incorrect?

133

We're not testing with other standards so remove this?

ymandel updated this revision to Diff 416479.Mar 18 2022, 6:28 AM
ymandel marked an inline comment as done.

address comments

ymandel updated this revision to Diff 416494.Mar 18 2022, 7:06 AM
ymandel marked 6 inline comments as done.

address comments

ymandel marked an inline comment as done.Mar 18 2022, 7:07 AM
ymandel added inline comments.Mar 18 2022, 7:10 AM
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
122

Good question. It would be really nice if we could draw this line, but I have a bad feeling that it won't be so simple. :) Still, worth looking at our existing models, and new ones that we're developing, to see if we can find a clear "bounding box".

What does CSA do in this regard?

This revision was landed with ongoing or failed builds.Mar 18 2022, 7:40 AM
This revision was automatically updated to reflect the committed changes.

Why this should be maintained and developed by LLVM/Clang developers and not by Chromium?

Why this should be maintained and developed by LLVM/Clang developers and not by Chromium?

That's a good question. I think the short answer, that skirts around the issue, is that this is targeted for use in an upcoming clang-tidy check, and we don't have any framework for clients developing their own pluggable models for individual checks. Chromium could find some way to patch their clang-tidy source, I suppose, but we'd rather not encourage that. This framework is new and under development and the benefit of accomodating Chromium (in this very small way) seems worth the feedback from them applying it to their codebase.

That said, we are *also* in discussions with them to change their implement of CHECK, etc. so as to obviate the need for this model at all.

xazax.hun added inline comments.Mar 18 2022, 9:46 AM
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp
122

In the CSA, there is no clear distinction between modeling and diagnostics, each check can do both. Historically, this turned out to be a bad idea. When we have a check that is doing both modeling and diagnostics, users can end up turning the check off due to some false positives and end up getting worse results from other checks because some critical piece of modeling is also turned off. (E.g., even if there are a couple of false positives from a std::optional check, it might still be beneficial to do the modeling in the background without the diagnostics because it might help discover infeasible paths and that can improve the precision of other checks).

Nowadays, we try to make the distinction clear, some checks are modeling only and others are diagnostic only (they might still have their own modeling but they do not affect the global analysis state, i.e., the diagnostic only checks should not be able to affect each other). This distinction is currently a best effort approach and not enforced by any of the APIs.

In CSA, the checker APIs are very powerful. E.g., if there is a pointer with an unknown value and we see a dereference, we can continue the analysis with the assumption that the pointer is not-null. These assumptions could be added by the framework itself or just a regular check. Over time, we are trying to move as much modeling to (non-diagnostic) checks as possible to keep the framework lightweight but most of the meat is still in the framework.

To model libraries, we are using the evalCall callback: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp#L229

Roughly speaking the model looks like this:

  • The analyzer encounters a function call, so it asks all the checks in a sequence if any of them wants to model it
  • A check gets the evalCall callback and can do whatever it wants to do. Most of them will return false most of the time as they are only expected to handle a small subset of the functions.
  • This first check returning true will short-circuit this process.
  • If none of the checks returned true, the framework will fall back to a default modeling which is a conservative approximation of the call, i.e., invalidating the bindings that could be changes by the function (globals, output arguments, return values etc.)

The model above assumes that when a check return true it will end up modeling all the aspects of a function (like invalidation). A downside might be that it will not solve the composition, when multiple checks want to model the same function, well, the framework will just pick one of them. Also, modeling types is really challenging. E.g., if a modeling check models the constructor of a type but does not model the destructor, the framework will end up using the default modeling for the dtor. The problem is that, in that case the ctor was not modeled by the framework (but the modeling checker) so some invariants were not established. This can result in false positives or even crashes. Overall, it looks like modeling types is almost an all or nothing endeavor. If a check models at least one method of a type it is really likely that it will need to model at least a bunch more to ensure a good experience.

At Microsoft, we have a similar framework to CSA called EspXEngine. In EspXEngine, we have different APIs for modeling and diagnostic so the API enforce that diagnostic checks cannot influence each other. Our model for the modeling checks is very similar to CSA but the idea is that the modeling checks are maintained by the authors of EspXEngine (while diagnostic checks could be written by anyone), so conflicting models are less of a problem.

Both EspXEngine and CSA has problems chaining modeling checks. E.g., if CSA has modeling for unique_ptr and optional, optional<unique_ptr<int>> or unique_ptr<optional<int>> will not model every aspects of the inner type out of the box.

Thanks, Gabor -- that's a really helpful summary!