This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add check fo SEI CERT item ENV32-C
AbandonedPublic

Authored by gamesh411 on Jul 13 2020, 2:11 PM.

Details

Summary

Add check for the SEI CERT item ENV32-C which dictates that exit handler functions should terminate by returning as opposed to calling exit-functions or jumping out of the handler function body with longjmp. The handler registration functions atexit and at_quick_exit, exit functions _Exit, exit, and quick_exit are checked.

Diff Detail

Event Timeline

gamesh411 created this revision.Jul 13 2020, 2:11 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 13 2020, 2:11 PM
gamesh411 edited the summary of this revision. (Show Details)Jul 13 2020, 2:16 PM
gamesh411 updated this revision to Diff 277571.Jul 13 2020, 2:31 PM

update includes

gamesh411 updated this revision to Diff 277575.Jul 13 2020, 2:38 PM

simplify VisitCallExpr

gamesh411 updated this revision to Diff 277581.Jul 13 2020, 3:05 PM

fix test diag location changes resulting from autoformatting

gamesh411 updated this revision to Diff 277588.Jul 13 2020, 3:21 PM

use std::copy algorithm
polish

gamesh411 updated this revision to Diff 277590.Jul 13 2020, 3:22 PM

mention iterator header

It would be a good idea to emit notes to help explain the problem code, I'm thinking something along the lines of:

warning: exit-handler potentially calls a jump function. Handlers should terminate by returning [cert-env32-c]
note: exit-handler declared here
note: call to a jump function declared here

The second note would require you to track the call site along with the FunctionDecl.
I wouldn't go as far as trying to track the entire call stack when functions in exit handlers call functions that exit. While its possible, it could end up looking messy.

clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
27

Listen to the clang-tidy warnings.

73

Is there a reason for disabling format here? How messy is the matcher code when formatted?

79

Use hasAnyName

121–123

nit:

if (!SeenFunctions.insert(Current).second)
  continue;
152

Use llvm::copy instead of std::copy

clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.h
19

Unnecessary empty line

clang-tools-extra/docs/ReleaseNotes.rst
72

s/those/that

clang-tools-extra/docs/clang-tidy/checks/cert-env32-c.rst
115

Space after longjmp()

gamesh411 updated this revision to Diff 277796.Jul 14 2020, 5:50 AM

extend with notes
apply minor fixes
tests are WIP until I figure out how to properly use file-check

gamesh411 marked 10 inline comments as done.Jul 14 2020, 5:55 AM

Thanks @njames93 :) I have extended the check with notes, but I have to figure out how to appease file-check, so its still WIP until I do.

clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
73

It is not that bad after all. Especially with the hasAnyName predicate.

121–123

Neat!

Unfortunately, I'm not qualified enough to have much to say.

clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
2

Env32CCheck.cpp -> ExitHandlerCheck.cpp

39

If EF__Exit, EF_exit and EF_quick_exit referred only once, we could simply inline those,

Same for the isJumpFunction.

61

IMO -> decltype(CalledFunctions.begin()) is unnecessary.
Return type deduction does the right thing for this case.

Same for the auto end() ...

85

If you disabled clang-format for having inline comments, Then you could create smaller matchers and give names for them.

Something similar to this:

const auto DesciptiveName = hasArgument(0, declRefExpr(hasDeclaration(functionDecl().bind("handler_decl"))));
133

Why don't we return here?
Same for the next break.

149–155

nit: I would clear the Collector before the TraverseStmt.

clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.h
21

In the documentation you use the:
clang-tools-extra/docs/clang-tidy/checks/cert-env32-c.rst

All exit handlers must return normally.

You should be consistent.

Eugene.Zelenko added inline comments.Jul 14 2020, 6:47 AM
clang-tools-extra/docs/ReleaseNotes.rst
72

Please separate with empty line.

clang-tools-extra/docs/clang-tidy/checks/cert-env32-c.rst
7

Please synchronize with statement in Release Notes.

8

Reference to original description usually placed at the end.

13

Please enclose all function names in double back-ticks. Same below.

Please don't use lines longer then 80 symbols.

17

Is SIG30-C implemented in Clang-tidy? If not, reference should be removed.

Eugene.Zelenko edited projects, added Restricted Project; removed Restricted Project.
Eugene.Zelenko removed a subscriber: llvm-commits.
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2020, 6:50 AM

@Eugene.Zelenko Thanks for cleaning up revisions -- same as D69560#1725399, we are working in the same office and have worked on some forms of static analysis for a while now. Adding us as reviewers helps this revision being more visible. Its clear that we don't have the authority to approve changes to clang-tidy, but we can confidently request changes or add to the discussion while still respecting that.

I'll re-add ourselves to keep reviewing well oiled.

whisperity edited the summary of this revision. (Show Details)Jul 14 2020, 7:06 AM
whisperity removed projects: Restricted Project, Restricted Project.
whisperity added subscribers: zporky, gsd.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 14 2020, 7:06 AM
njames93 added a comment.EditedJul 14 2020, 10:43 AM

extend with notes
apply minor fixes
tests are WIP until I figure out how to properly use file-check

If you add tests for notes, you will need to use CHECK-NOTES-DAG as notes wont appear in the same order as they are in the file.

CHECK-NOTES-DAG can be used for detecting warnings too

gamesh411 updated this revision to Diff 278116.Jul 15 2020, 2:23 AM
gamesh411 marked 2 inline comments as done.

use move instead of copy
fix documentation issues
fix tests

gamesh411 marked 12 inline comments as done.Jul 15 2020, 2:31 AM
gamesh411 added inline comments.
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
133

I use break, because it is more local to the concept of finishing the loop (ie the visitation algorithm), and that is what I want the reader of the code to receive after reading the line. In this particular case, return is equivalent to break but I would still keep it this way, because that is what I want to say, and if someone were to extend the code later, that early return inside the loop could surprise them.

gamesh411 marked an inline comment as done.Jul 15 2020, 2:40 AM

I have bitten the bullet, and have gone down this route. With relative numbering, the sections themselves are at least translation-invariant. Not the prettiest sight, tho.
Thanks!

@steakhal @Eugene.Zelenko Thanks for checking this patch! I have tried my best to adhere to your suggestions.

Eugene.Zelenko added inline comments.Jul 15 2020, 7:00 AM
clang-tools-extra/docs/clang-tidy/checks/cert-env32-c.rst
7

Please synchronize with statement in Release Notes. Please omit This check.

gamesh411 updated this revision to Diff 278174.Jul 15 2020, 7:06 AM

fix documentation header

gamesh411 marked 2 inline comments as done.Jul 15 2020, 7:06 AM
gamesh411 added inline comments.
clang-tools-extra/docs/clang-tidy/checks/cert-env32-c.rst
7

Thanks! Fixed :)

gamesh411 updated this revision to Diff 278176.Jul 15 2020, 7:07 AM
gamesh411 marked an inline comment as done.Jul 15 2020, 7:07 AM

.

Eugene.Zelenko added inline comments.Jul 15 2020, 7:11 AM
clang-tools-extra/docs/ReleaseNotes.rst
72

This was not done yet.

gamesh411 updated this revision to Diff 278178.Jul 15 2020, 7:15 AM
gamesh411 marked an inline comment as done.

add missing newline in release notes

gamesh411 added inline comments.Jul 15 2020, 7:16 AM
clang-tools-extra/docs/ReleaseNotes.rst
72

Do you mean to separate the header line Finds functions registered... from the link above? I think I was following the entry after this one as a guideline instead of the one before.

Eugene.Zelenko added inline comments.Jul 15 2020, 7:18 AM
clang-tools-extra/docs/ReleaseNotes.rst
72

Yes, I meant that. Please also same problem in entry below.

gamesh411 updated this revision to Diff 278181.Jul 15 2020, 7:21 AM

tidy up release notes, make all new check entries uniform

@Eugene.Zelenko I have just rebase-d, and seen that the release notes page itself was bumped to clang-tidy 12. I have added my check as a new check there. Should I also add the other subsections (like improvements in existing checks, and new check aliases), or authors will add them as needed?

@Eugene.Zelenko I have just rebase-d, and seen that the release notes page itself was bumped to clang-tidy 12. I have added my check as a new check there. Should I also add the other subsections (like improvements in existing checks, and new check aliases), or authors will add them as needed?

No, that sections will be filled when related changes will be made. Release Notes are reset to blank state in master after branching release.

gamesh411 marked 2 inline comments as done.Jul 15 2020, 7:41 AM

@Eugene.Zelenko I have just rebase-d, and seen that the release notes page itself was bumped to clang-tidy 12. I have added my check as a new check there. Should I also add the other subsections (like improvements in existing checks, and new check aliases), or authors will add them as needed?

No, that sections will be filled when related changes will be made. Release Notes are reset to blank state in master after branching release.

Ok, and thank you for your patience with this review! It was fruitful after all, and I have learned a thing or two about the project.

I have bitten the bullet, and have gone down this route. With relative numbering, the sections themselves are at least translation-invariant. Not the prettiest sight, tho.
Thanks!

I almost think it would be nice FileCheck supported some directive like:

// LINE-NAME: <LINENAME>

And corresponding check lines:

[[<LINENAME>]]
[[<LINENAME>+N]] 
[[<LINENAME>-N]]

Would result in the ability to write checks like this:

void longjmp_handler() {// LINE-NAME: LONGJMP
    longjmp(env, 255); 
}

...

void foo(){
  atexit(longjmp_handler);
   // CHECK-NOTES: :[[@LINE-1]]:3: warning: exit-handler potentially calls a jump function. Handlers should terminate by returning [cert-env32-c]
  // CHECK-NOTES: :[[LONGJMP]]:1: note: handler function declared here
  // CHECK-NOTES: :[[LONGJMP+1]]:3: note: jump function called here
}

Anyway that's a story for another day.

I have bitten the bullet, and have gone down this route. With relative numbering, the sections themselves are at least translation-invariant. Not the prettiest sight, tho.
Thanks!

I almost think it would be nice FileCheck supported some directive like:

// LINE-NAME: <LINENAME>

And corresponding check lines:

[[<LINENAME>]]
[[<LINENAME>+N]] 
[[<LINENAME>-N]]

Would result in the ability to write checks like this:

void longjmp_handler() {// LINE-NAME: LONGJMP
    longjmp(env, 255); 
}

...

void foo(){
  atexit(longjmp_handler);
   // CHECK-NOTES: :[[@LINE-1]]:3: warning: exit-handler potentially calls a jump function. Handlers should terminate by returning [cert-env32-c]
  // CHECK-NOTES: :[[LONGJMP]]:1: note: handler function declared here
  // CHECK-NOTES: :[[LONGJMP+1]]:3: note: jump function called here
}

Anyway that's a story for another day.

My thoughts exactly! I also thought about anchor-points as a feature in file-check, as that would immensely increase the readability of the test-code in such cases.

My thoughts exactly! I also thought about anchor-points as a feature in file-check, as that would immensely increase the readability of the test-code in such cases.

I've put in an RFC for that functionality, Think I CC'd you into it, if you have any comments about it I'd appreciate them.

gamesh411 updated this revision to Diff 282529.Aug 3 2020, 1:50 AM

use llvm::move algorithm

gamesh411 updated this revision to Diff 282532.Aug 3 2020, 1:58 AM

use llvm::move algorithm

gamesh411 updated this revision to Diff 282533.Aug 3 2020, 2:01 AM

ensure lint in path

I have updated the revision to use llvm::move algorithm (available thanks to @njames93).
Friendly ping :)

whisperity added inline comments.Aug 3 2020, 3:54 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
64

What happens if this test is run on C++ code calling the same functions? I see the rule is only applicable to C, for some reason... Should it be disabled from registering if by accident the check is enabled on a C++ source file?

68–71

It is customary in most Tidy checks that use multiple binds to have the bind names defined as a symbol, instead of using just two string literals as if the bind has to be renamed, it's easy to mess it up.

112–113

Same as below, suggestion: "exit hander potentially calls exit function instead of terminating normally with a return".

("exit handler" and "exit function" without - is more in line with the SEI CERT rule's phrasing too, they don't say "exit-handler".)

124–125

This semi-sentence structure of starting with lowercase but also terminating the sentence and leaving in another but unterminated sentences looks really odd.

Suggestion: "exit handler potentially jumps instead of terminating normally with a return"

clang-tools-extra/test/clang-tidy/checkers/cert-env32-c.c
31

Which is the standard version this test file is set to analyse with? I don't see any -std= flag in the RUN: line.

aaron.ballman added inline comments.Aug 3 2020, 5:06 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
2

This comment was marked done but appears to still be an issue.

33

Because this rule applies in C++ as well as C, you should protect these names so that calling something like this doesn't trigger the check:

namespace menu_items {
void exit(int);
}

I think we want ::_Exit and ::std::_Exit to check the fully qualified names (I believe this will still work in C, but should be tested). The same goes for the other names (including atexit and at_quick_exit above).

64

The CERT C++ rules inherit many of the C rules, including this one. It's listed towards the bottom of the set of inherited rules here: https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=88046336

70

I am not at all certain whether this is plausible, but I *think* this check can be almost entirely implemented in the matchers rather than having to do manual work. I think you can bind the argument node from the call to at_quick_exit() and then use equalsBoundNode() to find the function calls within the bound functionDecl() node.

However, if that's not workable, I think you can get rid of the CalledFunctionsCollector entirely and just use matchers directly within the check() function because by that point, you'll know exactly which AST nodes you want to traverse through.

112–113

Slight tweak to the suggestion: exit hander potentially calls an exit function instead of terminating normally with a return

124–125

Slight tweak here as well. I don't think we'll ever see a jump function other than longjmp for this rule, so what about writing potentially calls 'longjmp' instead of potentially jumps?

128

If you make the suggested change above, I'd do a similar change here.

whisperity added inline comments.Aug 3 2020, 5:25 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
33

I think we want ::_Exit and ::std::_Exit to check the fully qualified names (I believe this will still work in C, but should be tested).

Tested with Clang-Query:

clang-query> m functionDecl(hasName("::atexit"))

Match #1:

/home/whisperity/LLVM/Build/foo.c:1:1: note: "root" binds here
int atexit(int) {}
^~~~~~~~~~~~~~~~~~
1 match.
64

Right, thanks for the heads-up. This should be somewhat more indicative on the Wiki on the page for the rule (especially because people won't immediately understand why a -c check reports on their cpp code, assuming they understand -c means C.)

124–125

😉 Agree. And if we see, we will have to update the code anyways. One could in theory whip up some inline assembly black magic, but that's a whole other can of worms.

njames93 added inline comments.Aug 3 2020, 7:06 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
32

This should take a pointer to the function to run other checks, like checking its decl context.

33

That only works because the logic inside the hasNamematcher

37

Ditto

gamesh411 updated this revision to Diff 282837.Aug 4 2020, 2:13 AM
gamesh411 marked an inline comment as done.

rename file name in header

aaron.ballman added inline comments.Aug 5 2020, 4:43 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
33

That's the bit I was worried about, too, thanks for confirming @njames93.

64

I would imagine people shouldn't be confused by a C check triggering in C++ given that C++ incorporates much of C so there's a considerable amount of overlap (for instance, this hasn't been an issue with cert-env33-c which applies in both C and C++).

That said, what do you think should be indicated on the wiki (I assume you mean the CERT wiki and not the clang-tidy documentation)? I'm happy to pass the suggestion along to folks I still know at CERT.

whisperity added inline comments.Aug 5 2020, 5:05 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
64

That said, what do you think should be indicated on the wiki (I assume you mean the CERT wiki and not the clang-tidy documentation)?

On the page of a C rule "ABC-00", if it also applies for C++, it should be indicated. Just this fact: "In addition, this rule is applicable for C++ programs [bla bla]." where [bla bla] is something like "calling C standard library operations" or "dealing with C API" or "handling C data structures" or simply no extra "elaboration", depending on what the rule is targeting.

aaron.ballman added inline comments.Aug 5 2020, 5:26 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
64

Thanks, I'll pass this suggestion along!

gamesh411 updated this revision to Diff 289597.Sep 2 2020, 3:46 PM

only consider global and ::std scope handlers

aaron.ballman added inline comments.Sep 3 2020, 6:26 AM
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
38

abort() as well?

How about in C++ mode calling std::terminate()?

One last class of problem functions are ones that are attributed as not returning (this would include user-defined functions).

gamesh411 marked 2 inline comments as done.

Add abort and terminate handling
Extend tests to cover every exit functions
Extract matcher bind labels

gamesh411 marked 13 inline comments as done.Sep 17 2020, 12:21 AM

Note that there are no negative test cases that assert that we do NOT report in case a custom or an anonymous namespace is used. For that I would need a small patch in the testing infrastructure.
Patch needed in check_clang_tidy.py:

--- a/clang-tools-extra/test/clang-tidy/check_clang_tidy.py
+++ b/clang-tools-extra/test/clang-tidy/check_clang_tidy.py
@@ -167,7 +167,7 @@ def run_test_once(args, extra_args):
       subprocess.check_output(
           ['FileCheck', '-input-file=' + temp_file_name, input_file_name,
            '-check-prefixes=' + ','.join(check_fixes_prefixes),
-           '-strict-whitespace'],
+           '-strict-whitespace', '--allow-empty'],
           stderr=subprocess.STDOUT)
     except subprocess.CalledProcessError as e:
       print('FileCheck failed:\n' + e.output.decode())
@@ -180,7 +180,7 @@ def run_test_once(args, extra_args):
       subprocess.check_output(
           ['FileCheck', '-input-file=' + messages_file, input_file_name,
            '-check-prefixes=' + ','.join(check_messages_prefixes),
-           '-implicit-check-not={{warning|error}}:'],
+           '-implicit-check-not={{warning|error}}:', '--allow-empty'],
           stderr=subprocess.STDOUT)
     except subprocess.CalledProcessError as e:
       print('FileCheck failed:\n' + e.output.decode())
@@ -195,7 +195,7 @@ def run_test_once(args, extra_args):
       subprocess.check_output(
           ['FileCheck', '-input-file=' + notes_file, input_file_name,
            '-check-prefixes=' + ','.join(check_notes_prefixes),
-           '-implicit-check-not={{note|warning|error}}:'],
+           '-implicit-check-not={{note|warning|error}}:', '--allow-empty'],
           stderr=subprocess.STDOUT)
     except subprocess.CalledProcessError as e:
       print('FileCheck failed:\n' + e.output.decode())

And then I can assert the non-reports by adding the following runlines:

// Functions in anonymous or custom namespace should not be considered as exit
// functions.
//
// RUN: %check_clang_tidy -assume-filename=%s.cpp %s -check-suffix=CUSTOM \
// RUN:     cert-env32-c %t -- -- -DCPPMODE -DTEST_NS_NAME=custom
// CHECK-NOTES-CUSTOM-NOT: NO DIAGS
//
// RUN: %check_clang_tidy -assume-filename=%s.cpp %s -check-suffix=ANONYMOUS \
// RUN:     cert-env32-c %t -- -- -DCPPMODE -DTEST_NS_NAME=''
// CHECK-NOTES-ANONYMOUS-NOT: NO DIAGS
clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
70

I have investigated, and to do that recursive matching up to indeterminate depth all the while keeping the already matched functions in a set is not something I would implement with a single matcher. I could use a standalone matcher, but as far as I can understand I would need to implement a callback function for handling the matched results out of line for that as well, so I think that the ASTVisitor based solution is at least as good as a standalone ASTMatcher would be. Therefore I'd rather keep this solution.

clang-tools-extra/test/clang-tidy/checkers/cert-env32-c.c
31

Right now there is a run-line for handling it as C, and 2 others for handling it as C++ with different namespaces. The test cases themselves are not dependent on the standard used (AFAIK). The comment contains a standard reference for C, but that is just the first time it was standardized in C so the mention is there for traceability and not to restrict the standard used.

Reformat diagnostic message
Use explicit name longjmp instead of jump function
Fix liberal auto inside Collector

gamesh411 marked 6 inline comments as done.Sep 17 2020, 12:35 AM
gamesh411 marked 2 inline comments as done.Sep 17 2020, 12:46 AM
gamesh411 updated this revision to Diff 292488.Sep 17 2020, 7:09 AM

Update commit message

One of the concerns I have with this not being a flow-sensitive check is that most of the bad situations are not going to be caught by the clang-tidy version of the check. The CERT rules show contrived code examples, but the more frequent issue looks like:

void cleanup(struct whatever *ptr) {
  assert(ptr); // This potentially calls abort()
  free(ptr->buffer);
  free(ptr);
}

void some_cleanup_func(void) {
  for (size_t idx = 0; idx < GlobalElementCount; ++idx) {
    cleanup(GlobalElement[idx]);
  }
}

void some_exit_handler(void) {
  ...
  some_cleanup_func();
  ...
}

The fact that we're not looking through the call sites (even without cross-TU support) means the check isn't going to catch the most problematic cases. You could modify the called function collector to gather this a bit better, but you'd issue false positives in flow-sensitive situations like:

void some_cleanup_func(void) {
  for (size_t idx = 0; idx < GlobalElementCount; ++idx) {
    struct whatever *ptr = GlobalElement[idx];
    if (ptr) {
      // Now we know abort() won't be called
      cleanup(ptr);
    }
  }
}

Have you run this check over any large code bases to see if it currently catches any true positive diagnostics?

clang-tools-extra/clang-tidy/cert/ExitHandlerCheck.cpp
102

I know it was my suggestion originally, but I realize that's just describing the code not what's wrong with it. How about: exit handler potentially terminates the program without running other exit handlers?

115

Similar suggestion here: exit handler potentially calls 'longjmp' which may fail to run other exit handlers?

clang-tools-extra/docs/clang-tidy/checks/cert-env32-c.rst
8

terminate?

10

You should not copy and paste the text from the CERT standard here. There would be copyright questions from that, but also, the CERT standard is a wiki that gets updated with some regularity so these docs are likely to get stale anyway.

We usually handle this by paraphrasing a bit about what the rule is checking for, and then provide a link directly to the CERT rule for the user to get the details.

gamesh411 added a comment.EditedOct 30 2020, 4:03 AM

One of the concerns I have with this not being a cfg-only check is that most of the bad situations are not going to be caught by the clang-tidy version of the check.

...

Have you run this check over any large code bases to see if it currently catches any true positive diagnostics?

I have tried llvm, tmux, curl and tried codesearch.com to look for other sources containing atexit, but no clang-tidy results were found for this check (this is partly because it is hard to manually make the project results buildable). So it is hard to see whether this flow-sensitive approach would result in many false positives.

One of the concerns I have with this not being a cfg-only check is that most of the bad situations are not going to be caught by the clang-tidy version of the check.

...

Have you run this check over any large code bases to see if it currently catches any true positive diagnostics?

I have tried llvm, tmux, curl and tried codesearch.com to look for other sources containing atexit, but no clang-tidy results were found for this check (this is partly because it is hard to manually make the project results buildable). So it is hard to see whether this flow-sensitive approach would result in many false positives.

Just to make sure we're on the same page -- the current approach is not flow-sensitive and so my concern is that it won't report any true positives (not that it will be prone to false positives).

Btw, one trick I've used to make random projects easier to work with in clang-tidy is to either find ones that support CMake already or if they use make, then run the command through bear (https://github.com/rizsotto/Bear) -- this way I can get a compile_commands.json file that I can use to try to get clang-tidy diagnostics out of the project.

In any of the projects that you found which are using atexit(), did you try to inspect the exit handler code paths to see if you could manually identify any problem code (like assert() calls)? I realize that's a lot of effort to go through (especially if you have to consider C++ constructs like constructors or overloaded operators which may be hard to spot by code inspection), but if you find the check doesn't trigger on code bases but there are issues when manually inspecting the code, that strongly suggests this should be a flow-sensitive check that probably lives in the static analyzer rather than clang-tidy.

Just to make sure we're on the same page -- the current approach is not flow-sensitive, and so my concern is that it won't report any true positives (not that it will be prone to false positives).

Sorry about that. You are absolutely right; what I was trying to say is CallGraph-based.

Just some thoughts on this example:

One of the concerns I have with this not being a flow-sensitive check is that most of the bad situations are not going to be caught by the clang-tidy version of the check. The CERT rules show contrived code examples, but the more frequent issue looks like:

void cleanup(struct whatever *ptr) {
  assert(ptr); // This potentially calls abort()
  free(ptr->buffer);
  free(ptr);
}
...

What I have in support of this approach is this reasoning:
If a handler is used where either branch can abort then that branch is expected to be taken. Otherwise it is dead code. I would argue then, that this abortion should be refactored out of the handler function to ensure well-defined behaviour in every possible case.

As a counter-argument; suppose that there is a function that is used as both an exit-handler and as a simple invocation. In this case, I can understand if one would not want to factor the abortion logic out, or possibly pass flags around.

Then to this remark:

The fact that we're not looking through the call sites (even without cross-TU support) means the check isn't going to catch the most problematic cases. You could modify the called function collector to gather this a bit better, but you'd issue false positives in flow-sensitive situations like:

void some_cleanup_func(void) {
  for (size_t idx = 0; idx < GlobalElementCount; ++idx) {
    struct whatever *ptr = GlobalElement[idx];
    if (ptr) {
      // Now we know abort() won't be called
      cleanup(ptr);
    }
  }
}

The current approach definitely does not take 'adjacent' call-sites into account (not to mention CTU ones).
In this regard I also tend to see the benefit of this being a ClangSA checker as that would solve 3 problems at once:

  1. Being path-sensitive, so we can explain how we got to the erroneous program-point
  2. It utilizes CTU mode to take callsites from other TU-s into account
  3. Runtime-stack building is implicitly done by ExprEngine as a side effect of symbolic execution

Counter-argument:
But using ClangSA also introduces a big challenge.
ClangSA analyzes all top-level functions during analysis. However I don't know if it understands the concept of exit-handlers, and I don't know a way of 'triggering' an analysis 'on-exit' so to speak.
So AFAIK this model of analyzing only top-level functions is a limitation when it comes to modelling the program behaviour 'on-exit'.
sidenote:
To validate this claim I have dumped the exploded graph of the following file:

#include <cstdlib>
#include <iostream>

void f() {
  std::cout << "handler f";
};

int main() {
  std::atexit(f);
}

And it has no mention of std::cout being used, so I concluded, that ClangSA does not model the 'on-exit' behaviour.

I wanted to clear these issues before I made the documentation.
Thanks for the effort and the tips on evaluating the solution, I will do some more exploration.

aaron.ballman added a subscriber: NoQ.

Just to make sure we're on the same page -- the current approach is not flow-sensitive, and so my concern is that it won't report any true positives (not that it will be prone to false positives).

Sorry about that. You are absolutely right; what I was trying to say is CallGraph-based.

Just some thoughts on this example:

One of the concerns I have with this not being a flow-sensitive check is that most of the bad situations are not going to be caught by the clang-tidy version of the check. The CERT rules show contrived code examples, but the more frequent issue looks like:

void cleanup(struct whatever *ptr) {
  assert(ptr); // This potentially calls abort()
  free(ptr->buffer);
  free(ptr);
}
...

What I have in support of this approach is this reasoning:
If a handler is used where either branch can abort then that branch is expected to be taken. Otherwise it is dead code. I would argue then, that this abortion should be refactored out of the handler function to ensure well-defined behaviour in every possible case.

If the assert was directly within the handler code, then sure. However, consider a situation like this:

struct Something {
  Something(int *ip) { assert(ip); ... }
  ...
};

where the use of the assertion is far removed from the fact that it's being used within a handler.

As a counter-argument; suppose that there is a function that is used as both an exit-handler and as a simple invocation. In this case, I can understand if one would not want to factor the abortion logic out, or possibly pass flags around.

Yes, this is exactly the situation I'm worried about.

Then to this remark:

The fact that we're not looking through the call sites (even without cross-TU support) means the check isn't going to catch the most problematic cases. You could modify the called function collector to gather this a bit better, but you'd issue false positives in flow-sensitive situations like:

void some_cleanup_func(void) {
  for (size_t idx = 0; idx < GlobalElementCount; ++idx) {
    struct whatever *ptr = GlobalElement[idx];
    if (ptr) {
      // Now we know abort() won't be called
      cleanup(ptr);
    }
  }
}

The current approach definitely does not take 'adjacent' call-sites into account (not to mention CTU ones).
In this regard I also tend to see the benefit of this being a ClangSA checker as that would solve 3 problems at once:

  1. Being path-sensitive, so we can explain how we got to the erroneous program-point
  2. It utilizes CTU mode to take callsites from other TU-s into account
  3. Runtime-stack building is implicitly done by ExprEngine as a side effect of symbolic execution

Agreed.

Counter-argument:
But using ClangSA also introduces a big challenge.
ClangSA analyzes all top-level functions during analysis. However I don't know if it understands the concept of exit-handlers, and I don't know a way of 'triggering' an analysis 'on-exit' so to speak.
So AFAIK this model of analyzing only top-level functions is a limitation when it comes to modelling the program behaviour 'on-exit'.

I'm hoping someone more well-versed in the details of the static analyzer can speak to this point. @NoQ @Szelethus others?

sidenote:
To validate this claim I have dumped the exploded graph of the following file:

#include <cstdlib>
#include <iostream>

void f() {
  std::cout << "handler f";
};

int main() {
  std::atexit(f);
}

And it has no mention of std::cout being used, so I concluded, that ClangSA does not model the 'on-exit' behaviour.

I wanted to clear these issues before I made the documentation.
Thanks for the effort and the tips on evaluating the solution, I will do some more exploration.

Thank you for looking into this! If it turns out that there's some reason why the static analyzer cannot be at least as good of a home for the functionality as clang-tidy, that would be really interesting to learn. Either there are improvements we could consider making to the static analyzer, or we could leave the check in clang-tidy despite the limitations, but there's still a path forward.

NoQ added a comment.Nov 2 2020, 5:56 PM

I don't think you actually need active support for invoking exit handlers path-sensitively at the end of main() in order to implement your checker. You can still find out in a path-insensitive manner whether any given function acts as an exit handler, like you already do. Then during path-sensitive analysis of that function you can warn at any invocation of exit().

I do believe that you want to implement this check as a path-sensitive check as long as you want any sort of interprocedural analysis (i.e., warn when an exit handler calls a function that calls a function ... that calls a function that calls exit() - and you do already have such tests). This is because of the following potential situation:

void foo(bool already_exiting) {
  if (!already_exiting)
    exit();
}

void bar() {
  foo(true);
}

void baz() {
  foo(false);
}

int main() {
  atexit(bar);
  return 0;
}

In this case bar() is an exit handler that calls foo() that calls exit(). However the code is correct and no warning should be emitted, because foo() would never call exit() when called from bar() specifically. Precise reasoning about such problems requires path-sensitive analysis. Of course it's up to you to decide what to do with these false positives - whether you'll be ok with having them, or choose to suppress with an imprecise heuristic - but that's one of the possible reasons to consider reimplementing the checker via path sensitive analysis that the static analyzer provides.

We've had a similar problem with the checker that warns on calling pure virtual functions in constructors/destructors (which is undefined behavior). Such checker had to be path sensitive in order to be interprocedural for the exact same unobvious reason. We've decided to re-implement it with path-sensitive analysis and now we're pretty happy about that decision.

gamesh411 abandoned this revision.Nov 25 2020, 1:01 AM
In D83717#2370154, @NoQ wrote:

I don't think you actually need active support for invoking exit handlers path-sensitively at the end of main() in order to implement your checker. You can still find out in a path-insensitive manner whether any given function acts as an exit handler, like you already do. Then during path-sensitive analysis of that function you can warn at any invocation of exit().

I do believe that you want to implement this check as a path-sensitive check as long as you want any sort of interprocedural analysis (i.e., warn when an exit handler calls a function that calls a function ... that calls a function that calls exit() - and you do already have such tests). This is because of the following potential situation:

void foo(bool already_exiting) {
  if (!already_exiting)
    exit();
}

void bar() {
  foo(true);
}

void baz() {
  foo(false);
}

int main() {
  atexit(bar);
  return 0;
}

In this case bar() is an exit handler that calls foo() that calls exit(). However the code is correct and no warning should be emitted, because foo() would never call exit() when called from bar() specifically. Precise reasoning about such problems requires path-sensitive analysis. Of course it's up to you to decide what to do with these false positives - whether you'll be ok with having them, or choose to suppress with an imprecise heuristic - but that's one of the possible reasons to consider reimplementing the checker via path sensitive analysis that the static analyzer provides.

We've had a similar problem with the checker that warns on calling pure virtual functions in constructors/destructors (which is undefined behavior). Such checker had to be path sensitive in order to be interprocedural for the exact same unobvious reason. We've decided to re-implement it with path-sensitive analysis and now we're pretty happy about that decision.

Thanks! I have checked, and this is definitely doable (however the solution is a bit more involved in case of CTU). This check is definitely better to implement in a path-sensitive way.

For anyone looking to implement this (thanks to @NoQ for the original idea outline above) :

 This is conceptually a 2 phase solution.
- Phase 1: detect the functions used as exit_handlers. As an approximation, this could be done based on syntax (with an ASTMatcher), not unlike the one implemented in this revision. This can be done on the whole TU in any check callback, so the availability of this information is not an issue. In case of CTU the AST is built during the analysis itself, so theoretically we would want to recheck the whole TU every time we use this information (but this seems overkill, and computationally intensive). So doing this only once is also an approximation.
- Phase2: during symbolic execution, we check for call expressions of exit functions (`_Exit`, `exit`, `quick_exit`), and examine the call-stack to identify a parent function declaration, which is in the set of detected handler-functions (built during Phase 1). We report an error in this case.

I guess I will abandon this and will create a patch for ClangSA when I will have some time for it (can't really promise anything as of now).