Finds cases where an integer expression is added to the result of a memory allocation function instead of its argument.
Is this a common problem? There's a lot of silly code we could try to find, but if people don't actually write it, then we get all downsides of maintenance without the benefits of the checker.
Oh yes, all our checkers are developed upon user request. They only request it if they find out their developers write such silly code. And this kind of bug is nasty to debug. (In case of addition less memory is available behind the pointer while the memory ahead of it is lost. In case of subtraction data before the pointer gets overwritten.)
But how often is it? Is it just one case?
And this kind of bug is nasty to debug. (In case of addition less memory is available behind the pointer while the memory ahead of it is lost. In case of subtraction data before the pointer gets overwritten.)
ASan can help debug this issue, and more.
This check is quite limited. For example, if the addition is done in a separate statement, this check wouldn't catch the problem. ASan would.
ASan is too heavyweight for this simple problem. It does not point out the source of the issue as quickly as this simple check which also provides a fix. ASan is meant for the less trivial cases. Is this really such a performance hit? Clang-Tidy already contains lots of checks which target a very narrow pattern.
I don't think that adding new check will hurt Clang-tidy. After all author may observe such coding patterns in some code bases. Indeed, it'll be reasonable to run this check on LLVM and other big open source projects.
It's not uncommon to ask the author to run the check over several large corpora of code to see what the false positive and true positive rates are. Not finding any true positives may mean that the check needs a bit more justification (like publications talking about the issue, etc).
This might be a strange argument, but I did see this happen several times at different companies.
When a company tries to decide which lint tools to use they are doing an evaluation. Usually, people doing the evaluation are not compiler/static analysis devs. Given the information they have they often end up comparing the checks different tools have and do the checkbox game. So if their current tool does support some checks that the other tool does not they might have the fear of missing out and do not switch the new one.
This methodology is, of course, flawed. They do not know the utility of each check, but they do not have the resources to do a proper comparison and they do not have the resources to support multiple tools. So one additional consideration might be, should clang tidy try to have appeal when those comparisons are made?
We could argue on both sides. More users the better, since we can get more bug reports, contributors etc. But it might come with additional maintenance costs. If a check is very unlikely to have false positives, this cost might be low enough to worth it.
Drive by comment: (I have no vested interested other than seeing cool new checkers)
FWIW, I work on a system with a multi million line legacy 25+ years old C++ codebase, where we now have clang-tidy integrated into a CI system and we have > 100,000 firings of clang-tidy warnings (with most non platform specific checks turned on), these get collated into a database based on checker type so help us triage them.
I find myself focusing on those checks which have < 10 firings more than those that report 1000's, I like very specific checkers, unlike the checkers like readability-convert-member-function-to-static that tends to report hundreds of warnings but the false positive rate is very high. IMHO The more general the checker the more false positives there seems to be (just personal observation, not backed by data)
When I see new checkers being sent for review I like eyeball (normally with grep) some of my core libraries to see what chance could I be hit by the checker, in order to assess its usefulness (to me at least)
Just to give you an example, I pulled out 3 malloc's from one directory.
::malloc(a + b+ sizeof(void *));
(malloc((a+ 1) * sizeof(char *)));
malloc(sizeof(a) + (unsigned)b)
Now I don't think any of these mallocs would fire this checkers, but I think I'm just a fat finger away from it! Especially if this is the kind of code which exists elsewhere in my code base.
We all know writing code like this today isn't the best way, but clang-tidy is being used to check code which is hidden deep inside our repos, in places people fear to tread or rarely visit or more to the point are too scared to visit let alone touch.
I don't think we should measure the usefulness of a checker based just on it checking new modern c++ code.
I also don't think we should measure a checker by the number of firings it gives. It might only take one bug to cause a pacemaker to stop or a car to crash, or to let a hacker in.
If it can happen, it does happen or it has happened, I like the idea of having a checker to prevent it.
I'm not a fan of the "maintenance" is a burden argument to prevent new stuff coming in. No matter how true it might be. Maintenance of software has been my life for 30 years, in fact, I quite enjoy it. And aren't a lot of us here for fun and giving our timing freely to explicitly help with that burden?
As we just changed some of our communication systems to encourage new people in. Thats not going to work if we beat them back at the door with "its not coming in".
While I can't pitch in with actual findings of this check, @MyDeveloperDay, you're right in many aspects, including those specific examples not firing. But an example that actually fires this check indicates a very specific undefined behaviour case. So if such bogus code (that would trigger on this check) did not cause run-time issues so far, it's because they never free() their memory allocations (really bad?), or their platform is particular enough that it allows calling free into the buffer, not only for the start of the buffer.
You must (only at most once) free each and every pointer as they had returned from a function of the *alloc family. free(malloc(X)) is okay, but free(malloc(X) + b) is, as per the rules of the language, clearly undefined behaviour. Which of course may just as well include the OS/runtime going "Oh the stupid freed into this buffer, let me find the beginning of the allocation..." until one day it doesn't anymore.
(Snarky "self"-advertisement: Did you check CodeChecker to be that bug database? 😜 )
alloca is a linux/posix-specific thing, and as such, I don't think it could be part of the std namespace.
If I put the + b on X as in malloc(X + b) instead of malloc(X) + b, then it's not pointer arithmetic anymore, but (hopefully unsigned) arithmetic. Should the warning message really start with "pointer arithmetic"?
Maybe you could consider the check saying
arithmetic operation applied to pointer result of ...() instead of size-like argument
optionally, I'd clarify it further by putting at the end:
resulting in ignoring a prefix of the buffer.
considering you specifically match on the std(-like) allocations. (At least for now.)
"resulting in ignoring a prefix of the buffer" <- this is only true for addition. What should we write for subtraction?
Since I seem to be in the minority about thinking that this check does not pull its weight, I reviewed the code, and will LGTM and push once the few small issues are fixed.
We usually try to make the replacement as small as possible. In the case of this checker, the replacement should move the closing parenthesis or the closing bracket after the RHS of the binary expression.
It does not have to be a single argument, the same typo can happen with the last argument of the constructor:
A *ptr = new A(a, b, c) + 1; // as written A *ptr = new A(a, b, c + 1); // should have been
I don't think the check lines in this file need regexes. You can match literal text and avoid having to escape special characters:
CHECK-FIXES: char *p = ...;