This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][CStringChecker] Adjust the invalidation operation on the super region of the destination buffer during string copy
ClosedPublic

Authored by OikawaKirie on Jun 8 2023, 5:21 AM.

Details

Summary

Fixing GitHub issue: https://github.com/llvm/llvm-project/issues/55019
Following the previous fix https://reviews.llvm.org/D12571 on issue https://github.com/llvm/llvm-project/issues/23328

The two issues report false memory leaks after calling string-copy APIs with a buffer field in an object as the destination.
The buffer invalidation incorrectly drops the assignment to a heap memory block when no overflow problems happen.
And the pointer of the dropped assignment is declared in the same object of the destination buffer.

The previous fix only considers the memcpy functions whose copy length is available from arguments.
In this issue, the copy length is inferable from the buffer declaration and string literals being copied.
Therefore, I have adjusted the previous fix to reuse the copy length computed before.

Besides, for APIs that never overflow (strsep) or we never know whether they can overflow (std::copy),
new invalidation operations have been introduced to inform CStringChecker::InvalidateBuffer whether or not to
invalidate the super region that encompasses the destination buffer.

Diff Detail

Event Timeline

OikawaKirie created this revision.Jun 8 2023, 5:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2023, 5:21 AM
OikawaKirie requested review of this revision.Jun 8 2023, 5:21 AM

Thanks for the elaborated answer on the GH issue. Now it makes sense. And the fix is align with the observations, which is good.
I only have concerns about the code quality of the modifier code. It was already in pretty bad shape, and I cannot say we improve it with this patch.
However, I can also see that it might require a bit engineering to refactor it into something cleaner, so I won't object if you push back.

Thanks for the patch!

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
263–280

I don't think this is the cleanest way to achieve this.
To me, it feels like it might make sense to have distinct invalidation functions for each scenario and mentally share the same overload-set.
Right now, the API looks really complicated: using enum values along with default parameters. Not to speak of how many parameters it accepts.
However, I won't object this time given that it was already pretty bad, and unreadable - so in that sense, it's not much worse this way and it was not the point to improve the quality of the code.

992

The type of the SVal might not be always accurate.
in fact, we try to move away from using that in the future - whenever possible.
Consequently, I'd prefer passing down the type as an additional parameter instead, or just keeping the Expr along with the corresponding SVal.

1071–1088

Again, it's not your fault, but this code just looks insane.
I'm not surprised it had a bug, and likely it still has.

clang/test/Analysis/issue-55019.c
3

Why do you need the Wno-strict-prototypes? Can we make it work with it?

8

I guess this include is only for the size_t, right?
In c++, you could use using size_t = decltype(sizeof(int)); to define it.

Ah, I see that it's for c function declarations. If that's the case, have you considered adding the malloc and free declarations to that header?

26

I believe you can use the sizeof(x.arr) instead.

28

I would rather use no-leak-warning here to be more specific about what we don't expect.

clang/test/Analysis/issue-55019.cpp
1–4

I think you can just append the C file to this one.
This shouldn't make a difference. And if it does, you can still guard that test code using macro ifndef trickery in addition to distinct -verify=xxx prefixes.

24–26

So, you say that It should be HeapSymRegion instead because x.arr shouldn't be invalidated because the std::copy` won't modify it. Right?

clang/test/Analysis/pr22954.c
560

Ah, the leak should have been reported at the end of the full-expression of memcpy.
If we have this expect line here it could mislead the reader that it has anything to do with the eval call before.

In fact, any statement after memcpy would have this diagnostic.
I have seen cases where we have a next_line() opaque call after such places to anchor the leak diagnostic.
I think we can do the same here.

OikawaKirie edited the summary of this revision. (Show Details)

Update the implementation of InvalidateBuffer in a multi-entrance-with-callback manner.
Update test cases as suggested.

OikawaKirie added inline comments.Jun 17 2023, 1:09 AM
clang/test/Analysis/issue-55019.cpp
14–15

Ah, I see that it's for c function declarations. If that's the case, have you considered adding the malloc and free declarations to that header?

What about doing this in another patch and updating all test cases that use malloc and free? Maybe other libc APIs as well?
The test case malloc-three-arg.c declares malloc and in a different signature and also includes this header. Simply doing so in this patch will lead to other conflicts.

clang/test/Analysis/pr22954.c
581

Do I need to update all other leak expectation directions in this file as well? Such as this one here.

Let me come back to this once again in the upcoming days to make sure everything is good.

clang/test/Analysis/pr22954.c
581

Definitely no.
Its not advised to have different, but similar looking hunks out of which some are the consequence of the semantic change we made and also have others where they are just refactors.

steakhal added inline comments.Jun 19 2023, 1:08 AM
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
301–302

I'd highly suggest making this a template taking the functor that way.
Given that we modify these functions, we should make them comply with the lower camel case naming convention.
And their variables with upper camel case.

1048

Shouldn't we assert that SizeV is some known value or at least smaller than (or equal to) the extent of the buffer?

1069–1071

The lambdas inline in the call look a bit messy. I would prefer declaring it and passing it as an argument separately.
This applies to the rest of the lambdas as well.

clang/test/Analysis/issue-55019.cpp
14–15

Okay.

BTW, what does the Done checkbox mean in the code comments?

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
301–302

I'd highly suggest making this a template taking the functor that way.

Any examples?
Do you mean

template <typename T>
static ProgramStateRef
InvalidateBufferAux(CheckerContext &C, ProgramStateRef state, const Expr *Ex,
                    SVal V, std::function<T> CallBack);
1048

The calls in CStringChecker::evalCopyCommon and CStringChecker::evalStrcpyCommon seem to have chances able to pass an unknown value. But I am not very sure about this.

If the SizeV is sure to be smaller or equal to the extent of the buffer, the IsFirstBufInBound check seems useless.
Maybe I need to dig deeper into the callers.

1069–1071

And their variables with upper camel case.

What about the variable names for the lambdas? Lower camel or upper camel?
I cannot find the rules for named lambdas from the guidance https://llvm.org/docs/CodingStandards.html.

clang/test/Analysis/issue-55019.cpp
81–88

So, you say that It should be HeapSymRegion instead because x.arr shouldn't be invalidated because the std::copy won't modify it. Right?

Yes, but just in this test case.

The size of x.arr is 4 and the copy length is 1, which means the call to std::copy here never overflows.
Therefore, we should not invalidate the super region x and keep x.ptr still pointing to the original HeapSymRegion unchanged.
However, the copy length computed through iterators is not modeled currently in CStringChecker::evalStdCopyCommon, where InvalidateDestinationBufferAlwaysEscapeSuperRegion is called.
When it is modeled in the future, the function should call InvalidateDestinationBufferBySize instead to make the invalidation of the super region determined by the copy length to report potential leaks.

clang/test/Analysis/pr22954.c
560

Its not advised to have different, but similar looking hunks out of which some are the consequence of the semantic change we made and also have others where they are just refactors.

Do you mean I only need to update the verification direction here as you suggested above and leave others unchanged?

BTW, what does the Done checkbox mean in the code comments?

"Done" means "Resolved" on GitHub - which is basically that some action was taken or the person who raised the issue withdrew.
In general, no revision/pull request should be merged with having open/unresolved discussions.
Usually, the patch author marks comments as "Done" and on the next update (by pressing the "Submit" button) all those threads will be closed automatically.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
301–302

My problem is that std::function owns the callback, and really likely to allocate for doing so.
And in this context a more lightweight approach would be also good.
What I had in mind was:

template <typename Callback>
static ProgramStateRef
InvalidateBufferAux(CheckerContext &C, ProgramStateRef state, const Expr *Ex,
                    SVal V, Callback F);
...
bool asd = F(a, b);

But actually, llvm::function_ref would be probably even better, so use that instead.
Also, add some comments on what the callback's return type means etc.

1048

Indeed.

1069–1071
clang/test/Analysis/issue-55019.cpp
81–88

Makes sense. Thanks.

OikawaKirie marked 12 inline comments as done and an inline comment as not done.
  1. Function and variable names: functions: lower camel, variables: upper camel, lambdas: upper camel
  2. std::function -> llvm::function_ref
  3. update test case verification directions by following others in this file.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
1048

Function CStringChecker::CheckBufferAccess is responsible to check whether the size overflows.
However, when Filter.CheckCStringOutOfBounds is set to false, the function will skip the overflow checking.
This makes that we cannot assert SizeV is always inbound, and the check in IsFirstBufInBound seems to be necessary.
(evalMemset -> CheckBufferAccess(skip overflow checking) -> memsetAux -> InvalidateDestinationBufferBySize with function call of memset(x.arr, 'a', 42))

Besides, if the size argument is originally an unknown value expression (e.g. (unsigned char) 1.0), the unknown value can be passed to this function.
(CStringChecker::evalCopyCommon -> invalidateDestinationBufferBySize, with function call of memcpy(x.arr, "hi", (unsigned char) 1.0))

Hence, we cannot make such an assertion here. : (

Sorry for the formatting error... :(

steakhal accepted this revision.Jul 2 2023, 11:53 PM
steakhal added inline comments.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
1070
This revision is now accepted and ready to land.Jul 2 2023, 11:53 PM

I will submit it again later to update the code as you suggested.

Besides,

Build Status
Buildable 242737
Build 376801: pre-merge checks libcxx CI FAILED

What does this mean? Was this patch correctly compiled and checked in the previous build? How can I avoid this failure?

Thx

I will submit it again later to update the code as you suggested.

Feel free to commit it with that modifications included.

Besides,

Build Status
Buildable 242737
Build 376801: pre-merge checks libcxx CI FAILED

What does this mean? Was this patch correctly compiled and checked in the previous build? How can I avoid this failure?

This is likely some anomaly. In general, you should check those logs to see if it's related to your patch. If not, than you can proceed and check if any of the bots cry after you commit.
If they do, triage what went wrong and decide if you attempt a fix commit or a revert.

This revision was landed with ongoing or failed builds.Jul 3 2023, 1:15 AM
This revision was automatically updated to reflect the committed changes.