This is an archive of the discontinued LLVM Phabricator instance.

adds basic -Wfree-nonheap-object functionality
ClosedPublic

Authored by cjdb on Oct 22 2020, 3:01 PM.

Details

Summary

Checks to make sure that stdlib's (std::)free is being appropriately
used. Presently checks for the following misuses:

  • free(&stack_object)
  • free(stack_array)

Diff Detail

Event Timeline

cjdb created this revision.Oct 22 2020, 3:01 PM
cjdb requested review of this revision.Oct 22 2020, 3:01 PM

thanks for this! i like the idea of this warning; seems like a cheap way to catch ugly bugs early. +rtrieu and aaron.ballman, as this is a new diagnostic

most of my comments are just poking at style.

free(&stack_object)
free(stack_array)

along the same lines, have you considered also adding a check for:

void foo();
void baz() {
  free(foo);
}

...which... sadly... i've seen in actual C code. :(

clang/include/clang/Basic/DiagnosticSemaKinds.td
7545

why not include this in -Wall? IMO freeing something that's obviously not from the heap is broken enough to be in -Wall

clang/include/clang/Sema/Sema.h
12404

generally, implementation details like these _try_ to be static functions when their only caller sits in the same file as them, so the API surface here isn't filled with tons of called-once methods. You can do Sema &S as a param and S.Diag(...) << ...

clang/lib/AST/Decl.cpp
3967

looks like the brace addition goes against the style guide: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

there's a fair amount of code that violates that guide (hence 'microscopic detail'), but imo we shouldn't be going out of our way to make more code do so (here & below)

the 'removal of else's part lgtm though; please split into a separate patch though, to keep this one a bit more focused

clang/lib/Sema/SemaChecking.cpp
4501–4505

style nit: at this point, i'd probably turn this into a switch

10112

nit: generally assert(param_that_has_no_real_reason_to_be_null != nullptr); is just left to crash. it'll crash in production anyway, and people who have asserting builds of LLVM generally know how to dig into crash dumps pretty well :)

10168

there could be a number of casts and parens here (free((char *)(&foo));, while ugly, is totally valid). i think we prooobably want E->getArg(0)->IgnoreParenCasts()?

we just have to make sure that, whatever we do here, we're careful around user-defined conversions:

struct Foo { explicit operator char *(); };
void foo() {
  Foo f;
  free(f);
}
clang/test/Sema/warn-free-nonheap-object.c
4

looks like the preferred spelling of this is typedef __SIZE_TYPE__ size_t;, since the type of size_t depends on the target's triple

5

nit: looks like this is forgetting a return. in general, feel free to just say void *malloc(size_t);; the compiler shouldn't care about the definition of these being present for this test

clang/test/Sema/warn-free-nonheap-object.cpp
7

nit: decltype(sizeof(0)), or __SIZE_TYPE__, as mentioned above

18

nit: can we please also have a test for a static function-local variable? (i presume global ones will be handled identically to GI, so that's 'good enough')

manojgupta added inline comments.Oct 22 2020, 3:47 PM
clang/include/clang/Basic/DiagnosticSemaKinds.td
7545

This was my suggestion to Chris, first let this warning land and stay in trunk for some time to get some testing on our platforms. After some soaking time, it can be added to Wall.

clang/lib/Sema/SemaChecking.cpp
4478–4481

Please remove any style changes not needed for this patch.

4489–4491

ditto

clang/test/Sema/warn-free-nonheap-object.c
15–16

Not sure , but I think it'd be better to define and use a different variable since this is a double free and if there is a double free warning in future, it will be triggered by this testcase unintentionally.

cjdb updated this revision to Diff 300384.Oct 23 2020, 12:49 PM
cjdb marked 12 inline comments as done.
cjdb added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
7545

I don't think anyone will disagree on putting it in -Wall (I hope), but I do think landing in what is presumably the most commonly activated flag on the first go is a bit hazardous.

clang/include/clang/Basic/DiagnosticSemaKinds.td
7545

then please add a FIXME to consider adding this to -Wall

cjdb updated this revision to Diff 300724.Oct 26 2020, 10:34 AM
cjdb marked an inline comment as done.
cjdb updated this revision to Diff 300727.Oct 26 2020, 10:41 AM

LGTM, though please wait for a review from someone with more expertise with clang's warnings (maybe Aaron or Richard?) to land.

Thanks again!

This revision is now accepted and ready to land.Oct 26 2020, 10:43 AM

In order to avoid false negatives, I think this needs to be flow-sensitive to track values. e.g., I would love for this to catch code like:

void func(bool b) {
  int i = 12;
  int *ip = b ? &i : (int *)malloc(sizeof(int));

  ...

  free(ip); // Should we warn or not?
}

While this example is incredibly contrived, you'll fine real-world code does this sort of path sensitive stuff relatively often (a common example is a function that sometimes returns a value that's been strdup()'ed except for one bad function that returns a string literal).

clang/lib/Sema/SemaChecking.cpp
10112

Elide braces around single-line compound statements per the usual coding conventions (here and elsewhere in the patch).

10121–10122

Doesn't this cut out a whole bunch of interesting cases, like &some_struct.some_member or other non-variable cases? Also of interest would be freeing member variables within a class member function (including odd things like members of an anonymous struct that have been lifted into the surrounding class context).

10134

No need to call getIdentifier() -- the diagnostics engine will automatically convert any NamedDecl into the properly-quoted diagnostic text.

10143

Same concerns here as above about limiting to only vardecls.

10149

Same issue here about not calling getIdentifier().

10157

This is unsafe -- dyn_cast<> will return nullptr for call expressions that don't originate from a FunctionDecl, like calling a block. Looks like there's test coverage missing for non-function calls.

10163

Not certain how much this matters, but this will miss a case like: free((int[]){1, 2, 3}); -- more importantly, this will also miss other kinds of expressions like member expressions.

clang/test/Sema/warn-free-nonheap-object.cpp
11

I'd like to see test cases for some of the stranger situations you can run into in C++, like:

struct S {
  S() : ptr((char *)malloc(10)) {}
  operator char *() { return ptr; }

private:
  char *ptr;
};

void func() {
  S s;
  free(s); // This is fine
}
cjdb added a comment.EditedOct 26 2020, 10:54 AM

In order to avoid false negatives, I think this needs to be flow-sensitive to track values. e.g., I would love for this to catch code like:

void func(bool b) {
  int i = 12;
  int *ip = b ? &i : (int *)malloc(sizeof(int));

  ...

  free(ip); // Should we warn or not?
}

While this example is incredibly contrived, you'll fine real-world code does this sort of path sensitive stuff relatively often (a common example is a function that sometimes returns a value that's been strdup()'ed except for one bad function that returns a string literal).

Yep, this is what I'm considering "phase 2" or "phase 3". Here's another one of note:

char* p = static_cast<char*>(std::malloc(8));
char& r = *p;
std::free(&p); // should warn
std::free(&r); // shouldn't warn

A key intention of D89988 is to get minimal coverage and then expand from there.

In order to avoid false negatives, I think this needs to be flow-sensitive to track values. e.g., I would love for this to catch code like:

void func(bool b) {
  int i = 12;
  int *ip = b ? &i : (int *)malloc(sizeof(int));

  ...

  free(ip); // Should we warn or not?
}

While this example is incredibly contrived, you'll fine real-world code does this sort of path sensitive stuff relatively often (a common example is a function that sometimes returns a value that's been strdup()'ed except for one bad function that returns a string literal).

Yep, this is what I'm considering "phase 2" or "phase 3". Here's another one of note:

char* p = static_cast<char*>(std::malloc(8));
char& r = *p;
std::free(&p); // should warn
std::free(&r); // shouldn't warn

A key intention of D89988 is to get minimal coverage and then expand from there.

Awesome! However, the part that concerns me relating to D89988 is that doing control-flow sensitive checks is something that's generally done in the Analysis part of the frontend rather than Sema, because generating a CFG and then traversing it to reason about the code can be expensive. So the usual approach is to have a warning flag (like what you've done) that enables the checking, akin to: https://github.com/llvm/llvm-project/blob/52bcd691cb1992187d022161e95977a9db371f51/clang/lib/Sema/AnalysisBasedWarnings.cpp#L2208 which suggests that some of the work here should perhaps be restructured in that way to cut down on the churn when you get to later phases. It also suggests that perhaps adding this check to -Wall is not desired due to expected compilation performance concerns (which aren't here for the initial version of the patch).

If we want _full_ analysis, clang-analyzer-unix.Malloc is what originally flagged the free(function_pointer) case that I referenced above; its complaint was:

warning: Argument to free() is the address of the function 'foo', which is not memory allocated by malloc() [clang-analyzer-unix.Malloc]

I think that whatever we do with this warning is going to be less powerful than what the analyzer does, so the goal is surfacing an appropriate subset of these complaints in clang, potentially with less room for false-positives, etc etc.

Doing fun tricks with the CFG sounds like a reasonable next step here, and doing so will likely be way more powerful (+ CPU-consuming, and invite potentially more false-positives). I agree that that probably shouldn't be under -Wall, but I wonder if these trivial cases aren't worth putting there on their own? It's not a perfect parallel, but our split between -Wuninitialized + -Wsometimes-uninitialized comes to mind.

cjdb updated this revision to Diff 300772.Oct 26 2020, 1:03 PM
cjdb marked 4 inline comments as done.

If we want _full_ analysis, clang-analyzer-unix.Malloc is what originally flagged the free(function_pointer) case that I referenced above; its complaint was:

warning: Argument to free() is the address of the function 'foo', which is not memory allocated by malloc() [clang-analyzer-unix.Malloc]

I think that whatever we do with this warning is going to be less powerful than what the analyzer does, so the goal is surfacing an appropriate subset of these complaints in clang, potentially with less room for false-positives, etc etc.

Doing fun tricks with the CFG sounds like a reasonable next step here, and doing so will likely be way more powerful (+ CPU-consuming, and invite potentially more false-positives). I agree that that probably shouldn't be under -Wall, but I wonder if these trivial cases aren't worth putting there on their own? It's not a perfect parallel, but our split between -Wuninitialized + -Wsometimes-uninitialized comes to mind.

Yep, I can live with that. -Wfree-nonheap-object to do the bare minimum (so it can be turned on by -Wall) and perhaps -Wdeep-free-nonheap-object to do the CFG analysis?

Doing fun tricks with the CFG sounds like a reasonable next step here, and doing so will likely be way more powerful (+ CPU-consuming, and invite potentially more false-positives). I agree that that probably shouldn't be under -Wall, but I wonder if these trivial cases aren't worth putting there on their own? It's not a perfect parallel, but our split between -Wuninitialized + -Wsometimes-uninitialized comes to mind.

I think that approach would work well -- then we get the safety of the trivial-to-check cases without the performance impacts unless the user opts in to those. That would also address my concern about churn for the future changes.

clang/lib/Sema/SemaChecking.cpp
10115

Sorry for not catching this one earlier, but these should be const auto * so we don't have to guess the qualifiers.

cjdb updated this revision to Diff 300779.Oct 26 2020, 1:33 PM
cjdb marked an inline comment as done.Oct 26 2020, 1:35 PM
cjdb added inline comments.
clang/lib/Sema/SemaChecking.cpp
10121–10122

Acknowledged. I'd prefer to keep this patch relatively close to what's present and prioritise this as a follow-up patch.

10157

A key precondition for this function is that E is ::free or std::free. I suppose [](void* p) { std::free(p); }ought to be handled too, but that would need to be implemented in the CFG section you mentioned (and possibly turned on by a more in-depth version of the warning?).

aaron.ballman added inline comments.Oct 27 2020, 10:27 AM
clang/lib/Sema/SemaChecking.cpp
10121–10122

I'm fine rolling things out piecemeal so long as there's agreement that the current form isn't the minimum viable product and we're clear on what we expect to be supported by the time we release. To me, the MVP would need to handle both variables and "typical" class data members, and the other stuff like indirect fields can come along whenever. Otherwise the behavior of the diagnostic is pretty inexplicable:

struct S {
  int i;

  void foo() {
    int j;

    free(&i); // Silent happiness
    free(&j); // Diagnoses
  }
};

Are you okay with that?

10157

That's a good point -- I was thinking more about calling blocks or calling through a function pointer. But as it stands, this is fine.

cjdb added inline comments.Oct 27 2020, 10:59 AM
clang/lib/Sema/SemaChecking.cpp
10121–10122

Agreed. My current plan is to get that patch out by EoW, but thanks for stressing its importance!

10163

Ack, same as above.

aaron.ballman accepted this revision.Oct 27 2020, 11:16 AM

LGTM aside from a few request for comments in the test cases.

clang/lib/Sema/SemaChecking.cpp
10121–10122

Perfect, thanks!

clang/test/Sema/warn-free-nonheap-object.cpp
35

Should probably add a FIXME/TODO comment here.

71

Should probably add a TODO/FIXME comment here.

cjdb added inline comments.Oct 27 2020, 2:52 PM
clang/test/Sema/warn-free-nonheap-object.cpp
35

Assuming you're talking about warning that free(P) is bad because I is a stack variable. Is this something we can cheaply do? I thought this would be in the CFG category.

aaron.ballman added inline comments.Oct 28 2020, 6:06 AM
clang/test/Sema/warn-free-nonheap-object.cpp
35

Assuming you're talking about warning that free(P) is bad because I is a stack variable.

Correct.

Is this something we can cheaply do? I thought this would be in the CFG category.

It would be -- I just figured a comment would make it clear why we're failing to warn here. Something like FIXME diagnosing this would require control flow analysis. or similar to make it clear that the lack of diagnostic isn't an oversight.

cjdb updated this revision to Diff 301434.Oct 28 2020, 2:17 PM
cjdb marked 2 inline comments as done.
cjdb added inline comments.
clang/test/Sema/warn-free-nonheap-object.cpp
35

And done! Also applied to the C test as well.

looks like all comments here are addressed, so i'll land this.

thanks again! :)

cjdb updated this revision to Diff 301467.Oct 28 2020, 4:10 PM
This revision was automatically updated to reflect the committed changes.