Page MenuHomePhabricator

[clang-tidy] misc-no-recursion: a new check
ClosedPublic

Authored by lebedev.ri on Jan 7 2020, 2:21 PM.

Details

Summary

Recursion is a powerful tool, but like any tool
without care it can be dangerous. For example,
if the recursion is unbounded, you will
eventually run out of stack and crash.

You can of course track the recursion depth
but if it is hardcoded, there can always be some
other environment when that depth is too large,
so said magic number would need to be env-dependent.
But then your program's behavior is suddenly more env-dependent.

Also, recursion, while it does not outright stop optimization,
recursive calls are less great than normal calls,
for example they hinder inlining.

Recursion is banned in some coding guidelines:

  • SEI CERT DCL56-CPP. Avoid cycles during initialization of static objects
  • JPL 2.4 Do not use direct or indirect recursion.
  • I'd say it is frowned upon in LLVM, although not banned

And is plain unsupported in some cases:

  • OpenCL 1.2, 6.9 Restrictions: i. Recursion is not supported.

So there's clearly a lot of reasons why one might want to
avoid recursion, and replace it with worklist handling.
It would be great to have a enforcement for it though.

This implements such a check.
Here we detect both direct and indirect recursive calls,
although since clang-tidy (unlike clang static analyzer)
is CTU-unaware, if the recursion transcends a single standalone TU,
we will naturally not find it :/

The algorithm is pretty straight-forward:

  1. Build call-graph for the entire TU. For that, the existing clang::CallGraph is re-used, although it had to be modified to also track the location of the call.
  2. Then, the hard problem: how do we detect recursion? Since we have a graph, let's just do the sane thing, and look for Strongly Connected Function Declarations - widely known as SCC. For that LLVM provides llvm::scc_iterator, which is internally an Tarjan's DFS algorithm, and is used throught LLVM, so this should be as performant as possible.
  3. Now that we've got SCC's, we discard those that don't contain loops. Note that there may be more than one loop in SCC!
  4. For each loopy SCC, we call out each function, and print a single example call graph that shows recursion -- it didn't seem worthwhile enumerating every possible loop in SCC, although i suppose it could be implemented.
    • To come up with that call graph cycle example, we start at first SCC node, see which callee of the node is within SCC (and is thus known to be in cycle), and recurse into it until we hit the callee that is already in call stack.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2020, 2:21 PM

It'll be reasonable to add CERT alias.

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
214

true

clang-tools-extra/docs/clang-tidy/checks/misc-no-recursion.rst
9

It'll be reasonable to add links to relevant coding guidelines.

It'll be reasonable to add CERT alias.

I'm not sure about that.
This diagnoses any potential recursion,
while CERT is much more specific than that.
(Avoid cycles during initialization of static objects)

lebedev.ri edited the summary of this revision. (Show Details)Jan 7 2020, 2:55 PM
lebedev.ri updated this revision to Diff 236843.Jan 8 2020, 9:05 AM
lebedev.ri marked an inline comment as done.

s/1/true/

So that is were the CTU question comes from? :)

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
22

That should be in the CallGraph code, adding a private operator overload does not feel right.

32

That stuff, too.

68

Why smart?

74

That method name looks confusing. isEmpty()? If not, why?

243

That should be a Note:

288

Please merge these two notes into one.

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.h
29

nit: the private stuff is usually at the bottom in the other clang-tidy code. not sure if there is something in the coding standard though.

clang-tools-extra/docs/ReleaseNotes.rst
73

The technical details should not be in the release notes. Just that it finds recursion and diagnoses it.

clang-tools-extra/test/clang-tidy/checkers/misc-no-recursion.cpp
128

Does the check find recursion through function pointers? (Probably not? Should be noted as limitation).

Please add cases from lambdas. And cases where recursion happens in templates / only with one of multiple instantiations.

lebedev.ri marked 8 inline comments as done.Jan 11 2020, 10:00 AM

Thanks for taking a look.
Some deflective replies inline.

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
22

I didn't want to put this into callgraph, because this
cements the fact that we do not care about sourceloc,
which may not be true for other users.
I can put it there, but if told so by it's code owners (@NoQ ?)

32

(same reasoning)

68

Because if it's just named ImmutableSet, why should it not just be a using ImmutableSet = DenseSet; const ImmutableSet ......?
This denotes it's different behaviour when in small-size mode and in non-small-size mode.

74

See how it's used, e.g. in count(), see SmallSet also.

243

I'm intentionally emitting it as a warning.

If i // NOLINT the main warning, does that not silence the related Notes?
I don't want NOLINTing the "main" function to silence the report for the rest of the functions in SCC.

bader added a subscriber: bader.Jan 13 2020, 10:38 AM

Does it make sense to implement such diagnostics in clang Sema, considering that OpenCL does not allow recursion?
We implemented similar diagnostics for SYCL programming model and would be like to upstream it to clang later (https://github.com/intel/llvm/commit/4efe9fcf2dc6f6150b5b477b0f8320ea13a7f596). Can we somehow leverage this work for the compiler?

lebedev.ri marked 3 inline comments as done.Jan 13 2020, 11:12 AM

Does it make sense to implement such diagnostics in clang Sema, considering that OpenCL does not allow recursion?
We implemented similar diagnostics for SYCL programming model and would be like to upstream it to clang later (https://github.com/intel/llvm/commit/4efe9fcf2dc6f6150b5b477b0f8320ea13a7f596). Can we somehow leverage this work for the compiler?

Implementing it elsewhere will be more restrictive in the future - somehow i suspect
it will be easier to make clang-tidy CTU-aware rather than clang sema.

That being said, is SYCL inherently single-TU, does it not support
linking multiple separately compiled object files together?

bader added a subscriber: Naghasan.Jan 14 2020, 4:08 AM

Does it make sense to implement such diagnostics in clang Sema, considering that OpenCL does not allow recursion?
We implemented similar diagnostics for SYCL programming model and would be like to upstream it to clang later (https://github.com/intel/llvm/commit/4efe9fcf2dc6f6150b5b477b0f8320ea13a7f596). Can we somehow leverage this work for the compiler?

Implementing it elsewhere will be more restrictive in the future - somehow i suspect
it will be easier to make clang-tidy CTU-aware rather than clang sema.

That being said, is SYCL inherently single-TU, does it not support
linking multiple separately compiled object files together?

SYCL doesn't require multi-TU support. AFAIK, ComputeCPP implementation is signle-TU. +@Naghasan to confirm/clarify.
The open source implementation I referred to, does support linking separately compiled object files, but still I think detecting single-TU recursion in clang is very useful.
Is it possible to have both: intra-TU diagnostics in clang and inter-TU diagnostics in clang-tidy tool? Share any infrastructure (e.g. recursion detection)?

Does it make sense to implement such diagnostics in clang Sema, considering that OpenCL does not allow recursion?
We implemented similar diagnostics for SYCL programming model and would be like to upstream it to clang later (https://github.com/intel/llvm/commit/4efe9fcf2dc6f6150b5b477b0f8320ea13a7f596). Can we somehow leverage this work for the compiler?

Implementing it elsewhere will be more restrictive in the future - somehow i suspect
it will be easier to make clang-tidy CTU-aware rather than clang sema.

That being said, is SYCL inherently single-TU, does it not support
linking multiple separately compiled object files together?

SYCL doesn't require multi-TU support. AFAIK, ComputeCPP implementation is signle-TU. +@Naghasan to confirm/clarify.
The open source implementation I referred to, does support linking separately compiled object files, but still I think detecting single-TU recursion in clang is very useful.
Is it possible to have both: intra-TU diagnostics in clang and inter-TU diagnostics in clang-tidy tool? Share any infrastructure (e.g. recursion detection)?

Feel free to post that as an RFC to cfe-dev. If the feedback is favorable, i suspect i can rework this code somehow.

bader added a comment.Jan 30 2020, 1:57 AM

@lebedev.ri, thanks for the suggestion. We will investigate this option when we ready to upload clang diagnostics patch.

NoQ added a comment.Feb 5 2020, 8:28 AM

CallGraph changes look nice, i don't see any immediate problems and it's a nice-to-have thing that other users may benefit from (the static analyzer wouldn't though, it's only interested in topological sorting order on the graph). I guess you could have as well stored a CallExpr with the CallRecord.

Note that there are certain problems with the call graph itself that are caused by how our very AST is structured. Hopefully they won't cause you too much trouble. I outlined some of them in the inline comments.

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
22

Does this need to be operator==? It looks like DenseMap uses DenseMapInfo::isEqual instead, maybe just call it "isEqual" to avoid confusion?

32

So what exactly happens if you link together two different translation units with different specializations of DenseMapInfo<CallRecord> and start passing such dense maps from one TU to the other?

(ODR violations are definitely not my specialty)

clang/lib/Analysis/CallGraph.cpp
96–101 ↗(On Diff #236843)

Note that you're missing out on destructors. They aren't in the call graph because they aren't in the AST to begin with. Like, declarations are there of course, but expressions are mostly missing. A destructor may call another function that will in turn destroy an object of the same type (maybe even the same object), causing recursion.

(side note: we're missing out on destructors too)

103–106 ↗(On Diff #236843)

Note that if a function void foo(int x = boo()) { ... } is invoked as void bar() { foo(); foo(); }, you'd get two different-in-every-way call records from the same function bar() to the same function boo() with the same source location (the location of call-expression boo() within the ParmVarDecl for x). I'm legit curious if this is a problem for your checker.

lebedev.ri marked 11 inline comments as done.

@NoQ Thank you for taking a look!

Rebased, addressed most of the review notes (some tests could be added), split callgraph changes into a separate review.

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
22

Now that i store Expr* in CallGraphNode::CallRecord,
this comparison operator appears to be automatically provided,
which means we really should provide our own to avoid confusing behavior
as compared to the DenseMap behavior.
So i'd say yes.

32

A good question, for another time.

clang/lib/Analysis/CallGraph.cpp
103–106 ↗(On Diff #236843)

Test case added (is this how you'd imagine it to be a problem?).
I'm not sure what you mean by a problem, it is diagnosed at least.

NoQ added inline comments.Feb 5 2020, 11:56 AM
clang-tools-extra/test/clang-tidy/checkers/misc-no-recursion.cpp
153

Aha, yeah, the warning is present, i guess that part is correct, but look how your note function 'bar' calls function 'boo' here: doesn't really point into the body of 'bar'. In this case it still makes sense because it's easy to see that 'foo' is called from 'bar', but the chain of default arguments may be arbitrarily long (what if 'boo' has yet another default argument?). You might want to add a separate facility just to handle this edge case.

lebedev.ri marked an inline comment as done.

Fixup disclaimer printing, NFC.

clang-tools-extra/test/clang-tidy/checkers/misc-no-recursion.cpp
153

To make this more readable, the diag is:

/builddirs/llvm-project/build-Clang9-unknown/tools/clang/tools/extra/test/clang-tidy/checkers/Output/misc-no-recursion.cpp.tmp.cpp:138:5: warning: function 'boo' is part of call graph loop [misc-no-recursion]
int boo();
    ^
/builddirs/llvm-project/build-Clang9-unknown/tools/clang/tools/extra/test/clang-tidy/checkers/Output/misc-no-recursion.cpp.tmp.cpp:140:6: warning: function 'bar' is part of call graph loop [misc-no-recursion]
void bar() {
     ^
/builddirs/llvm-project/build-Clang9-unknown/tools/clang/tools/extra/test/clang-tidy/checkers/Output/misc-no-recursion.cpp.tmp.cpp:138:5: note: Example call graph loop, starting from function 'boo'
int boo();
    ^
/builddirs/llvm-project/build-Clang9-unknown/tools/clang/tools/extra/test/clang-tidy/checkers/Output/misc-no-recursion.cpp.tmp.cpp:145:3: note: Frame #1: function 'boo' calls function 'bar' here:
  bar();
  ^
/builddirs/llvm-project/build-Clang9-unknown/tools/clang/tools/extra/test/clang-tidy/checkers/Output/misc-no-recursion.cpp.tmp.cpp:139:18: note: Frame #2: function 'bar' calls function 'boo' here:
void foo(int x = boo()) {}
                 ^

Am i looking at this wrong, or all the info is present?

lebedev.ri marked an inline comment as not done.Feb 5 2020, 12:18 PM
lebedev.ri added inline comments.
clang-tools-extra/test/clang-tidy/checkers/misc-no-recursion.cpp
153

Edit: right, bar->foo edge is not shown there.

NoQ added inline comments.Feb 5 2020, 12:45 PM
clang-tools-extra/test/clang-tidy/checkers/misc-no-recursion.cpp
153

Yup. And, well, semantically it's kind of correct, because it's not foo() that calls boo(), but it's bar()'s responsiblity to call boo() directly before calling foo(). So it's more like a three-way relationship:

misc-no-recursion.cpp.tmp.cpp:139:18: note: Frame #1: function 'bar' calls function 'boo' here:
void foo(int x = boo()) {}
                 ^
misc-no-recursion.cpp.tmp.cpp:141:7: note: ...in order to compute default argument for function 'foo' here:
  foo();
      ^

So i'd imagine CallRecords potentially containing arbitrarily long chains of backreferences from a default argument initializer to the respective CXXDefaultArgExpr that we're currently visiting (which may in turn be part of another default argument initializer, given that those are arbitrary expressions).

(it's of course up to you whether you want to deal with this immediately or later, i'm just ranting, don't mind me)

aaron.ballman added inline comments.Feb 12 2020, 10:16 AM
clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
29

"Smart" is not a descriptive term. This seems like it's an ImmutableSmallSet?

65

No else after return.

71

Same complaint here about Smart -- should probably just be a SmallSetVector.

204

I think "call graph loop" is a bit much for a diagnostic -- how about %0 is within a recursive call chain or something along those lines?

205

Drop the call to getName() and remove the single quotes from the diagnostic around %0, and just pass in the NamedDecl -- the diagnostics engine will do the right thing (and it gives better names for strange things like operators).

231

Example -> example

I think "call graph group" might be a bit much for a diagnostic -- how about example recursive call chain, starting from %0

233

Same here (and elsewhere, I'll stop commenting on them).

250–251

Diagnostic text should not be full sentences.

... which was the starting point of the recursive call chain; there may be other cycles

260

Remove commented out code.

clang-tools-extra/docs/clang-tidy/checks/misc-no-recursion.rst
9

Agreed.

lebedev.ri marked 16 inline comments as done and an inline comment as not done.

Ping.

@aaron.ballman thank you for taking a look!
Addressed review notes.

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
29

I was really trying to convey the fact that it can either be in small-sized mode (linear search over vector),
or in set mode (constructed with a large size, search in set).
But then, SmallVector also can be in small-sized mode (stack allocation), and large-sized mode (heap).
So here i guess ImmutableSmallSet name should work..

71

There already is a SmallSetVector, and it does have different semantics,
i've added a code comment explaining their differences, PTAL.

For now, i'd prefer to keep this as-is, but afterwards i suspect
i might be able to "upstream" this into SmallSetVector itself,
thus getting rid of SmartSmallSetVector.

Let me know if this makes sense?

204

Sure, that should be fine too.

205

Nice!

clang-tools-extra/docs/clang-tidy/checks/misc-no-recursion.rst
9

Is this better?

clang-tools-extra/test/clang-tidy/checkers/misc-no-recursion.cpp
128

This indeed gets blinded by function pointers, test added.

As for lambdas, i'm not sure what specifically you mean?
The lambda itself can't be recursive i think?
https://godbolt.org/z/GYLRnB

153

I agree that this isn't perfect, but i'm not very sure how that information could be added to the call stack.
I think i'll leave this as-is for now..

aaron.ballman accepted this revision.Feb 13 2020, 9:32 AM

LGTM aside from some nits.

clang-tools-extra/clang-tidy/misc/NoRecursionCheck.cpp
71

Okay, that's reasonable enough. Can you add a FIXME suggesting that plan in the code?

clang-tools-extra/docs/clang-tidy/checks/misc-no-recursion.rst
7

call graph -> the call graph

And you should probably spell out SCC.

9

Looks great!

9

call graph -> a call graph

clang-tools-extra/test/clang-tidy/checkers/misc-no-recursion.cpp
128

There are techniques to make lambdas recursive, but maybe we don't need to catch those (at least not initially). e.g., https://riptutorial.com/cplusplus/example/8508/recursive-lambdas

This revision is now accepted and ready to land.Feb 13 2020, 9:32 AM
lebedev.ri marked 8 inline comments as done.

LGTM aside from some nits.

Thank you for the review!
Nits addressed.