This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add check for implicit widening of multiplication result
ClosedPublic

Authored by lebedev.ri on Dec 26 2020, 5:49 AM.

Details

Summary

Overflows are never fun.
In most cases (in most of the code), they are rare,
because usually you e.g. don't have as many elements.

However, it's exceptionally easy to fall into this pitfail
in code that deals with images, because, assuming 4-channel 32-bit FP data,
you need *just* ~269 megapixel image to case an overflow
when computing at least the total byte count.

In darktable, there is a *long*, painful history of dealing with such bugs:

and yet they clearly keep resurfacing still.

It would be immensely helpful to have a diagnostic for those patterns,
which is what this change proposes.

Currently, i only diagnose the most obvious case, where multiplication
is directly widened with no other expressions inbetween,
(i.e. long r = (int)a * (int)b but not even e.g. long r = ((int)a * (int)b))
however that might be worth relaxing later.

Diff Detail

Event Timeline

lebedev.ri requested review of this revision.Dec 26 2020, 5:49 AM
lebedev.ri created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 26 2020, 5:49 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
  • Properly handle dependent types (bailout instead of crashing)
  • Add ASTContext::getCorrespondingSignedType()
  • Add fix-it for the unsigned long r = int(x) * int(y) case

Haven't reviewed the code in detail, but the idea seems sound. Is there any prior art in other compilers you can draw data/experiences from? & having some data from some reasonably sized codebases on false positive (how often the correct thing is to suppress the warning, versus fixing the warning) would probably be useful. I would guess it doesn't meet the on-by-default bar we would prefer for Clang, but might still be OK.

clang/docs/ReleaseNotes.rst
64–74 ↗(On Diff #313788)

Does this only trigger when the sizeof(char*) > sizeof(int)? (judging by the test coverage I think that's the case)

(ultimately, might be worth committing the two diagnostics separately - usual sort of reasons, separation of concerns, etc)

clang/test/Sema/implicit-widening-of-pointer-offset-in-array-subscript-expression.c
24 ↗(On Diff #313788)

Question is unclear - is there uncertainty about whether this should be tested? Or is this a false negative case? In which case I'd probably include the test showing no diagnostic and mention it could be fixed/improved?

Thank you for taking a look!

Haven't reviewed the code in detail, but the idea seems sound.

That's reassuring!

Is there any prior art in other compilers you can draw data/experiences from?

As far as i can tell, no other compiler diagnoses these cases,
however e.g. GitHub's CodeQL static analysis tool complains about this by default:
https://github.com/darktable-org/rawspeed/security/code-scanning/7?query=342efe49c0bef4bf2450051586d899f9d303ae62

& having some data from some reasonably sized codebases on false positive
(how often the correct thing is to suppress the warning, versus fixing the warning)
would probably be useful.

I don't have those numbers presently (other than the long long diffs of fixes for true-positives i have already linked),
but as a rule of thumb, iff a particular project in question
never allocates more than UINT_MAX bytes / UINT_MAX / sizeof(element) elements,
*all* issued diagnostics for said project *will* be false-positives by definition.
So YMMV.

I would guess it doesn't meet the on-by-default bar we would prefer for Clang,

Yeah, based on my initial observations, it's pretty chatty.

but might still be OK.

I'm open to input here, but it would be good to have this at least in -Wextra (-Weverything always being a last resort option).

clang/docs/ReleaseNotes.rst
64–74 ↗(On Diff #313788)

Does this only trigger when the sizeof(char*) > sizeof(int)? (judging by the test coverage I think that's the case)

That should be the case, since sizeof(size_t) == sizeof(char*) and
// RUN: %clang_cc1 -triple i686-linux-gnu -fsyntax-only -Wimplicit-widening-of-pointer-offset -verify=silent %s

(ultimately, might be worth committing the two diagnostics separately - usual sort of reasons, separation of concerns, etc)

Should i split this into two reviews to begin with?

clang/test/Sema/implicit-widening-of-pointer-offset-in-array-subscript-expression.c
24 ↗(On Diff #313788)

I may be misremembering things, but IIRC a[b] and b[a] is the same thing,
but i'm not sure how to exercise the second spelling.
I.e. i'm just not sure how to write a test for it.

Thank you for taking a look!

Haven't reviewed the code in detail, but the idea seems sound.

That's reassuring!

Is there any prior art in other compilers you can draw data/experiences from?

As far as i can tell, no other compiler diagnoses these cases,
however e.g. GitHub's CodeQL static analysis tool complains about this by default:
https://github.com/darktable-org/rawspeed/security/code-scanning/7?query=342efe49c0bef4bf2450051586d899f9d303ae62

& having some data from some reasonably sized codebases on false positive
(how often the correct thing is to suppress the warning, versus fixing the warning)
would probably be useful.

I don't have those numbers presently (other than the long long diffs of fixes for true-positives i have already linked),
but as a rule of thumb, iff a particular project in question
never allocates more than UINT_MAX bytes / UINT_MAX / sizeof(element) elements,
*all* issued diagnostics for said project *will* be false-positives by definition.
So YMMV.

I would guess it doesn't meet the on-by-default bar we would prefer for Clang,

Yeah, based on my initial observations, it's pretty chatty.

but might still be OK.

I'm open to input here, but it would be good to have this at least in -Wextra (-Weverything always being a last resort option).

Yeah, not sure how @rsmith, @rtrieu, or @aaron.ballman feel about what the bar for warnings is these days - used to be a bit more hardcore about the "if it can't be on by default it shouldn't be in clang" than we are these days, I think. I'd guess -Wextra might be suitable.

clang/docs/ReleaseNotes.rst
64–74 ↗(On Diff #313788)

Should i split this into two reviews to begin with?

Nah, doubt it's worth splitting up right now - see how a few folks feel about the idea in general, and then if there's enough mechanical details to discuss might be worth splitting out one on top of the other.

clang/test/Sema/implicit-widening-of-pointer-offset-in-array-subscript-expression.c
24 ↗(On Diff #313788)

I may be misremembering things, but IIRC a[b] and b[a] is the same thing,

Yep, that's the case - a[b] where one of them is a pointer and teh other is an integer, is equivalent to *(a + b), so that means it's the same as b[a].

I guess you'd write it the same as t0, but with the expressions reversed? return *(a * b)[base];

lebedev.ri marked 2 inline comments as done.

NFC, rebased, added test with multiplication of shorts.

clang/test/Sema/implicit-widening-of-pointer-offset-in-array-subscript-expression.c
24 ↗(On Diff #313788)

That's what i tried, and it does't work - https://godbolt.org/z/as4vP4

dblaikie added inline comments.Dec 30 2020, 5:43 PM
clang/test/Sema/implicit-widening-of-pointer-offset-in-array-subscript-expression.c
24 ↗(On Diff #313788)

Sorry, my mistake - the * was wrong. This works: https://godbolt.org/z/j9jqx3

lebedev.ri marked 3 inline comments as done.
  • Add test for inverse array subscript expression
  • Support it by skipping paren expressions
  • Actually ensure that we only diagnose only truly implicit casts, but not implicit casts that are implicit steps of an explicit cast
clang/test/Sema/implicit-widening-of-pointer-offset-in-array-subscript-expression.c
24 ↗(On Diff #313788)

Aha, thanks.

lebedev.ri edited the summary of this revision. (Show Details)Jan 7 2021, 3:18 PM

Ping

Thank you for taking a look!

I would guess it doesn't meet the on-by-default bar we would prefer for Clang,

Yeah, based on my initial observations, it's pretty chatty.

but might still be OK.

I'm open to input here, but it would be good to have this at least in -Wextra (-Weverything always being a last resort option).

Yeah, not sure how @rsmith, @rtrieu, or @aaron.ballman feel about what the bar for warnings is these days - used to be a bit more hardcore about the "if it can't be on by default it shouldn't be in clang" than we are these days, I think. I'd guess -Wextra might be suitable.

I'm open to input here, but it would be good to have this at least in -Wextra (-Weverything always being a last resort option).

Yeah, not sure how @rsmith, @rtrieu, or @aaron.ballman feel about what the bar for warnings is these days - used to be a bit more hardcore about the "if it can't be on by default it shouldn't be in clang" than we are these days, I think. I'd guess -Wextra might be suitable.

I'd like to see data before making a determination, but my gut instinct is that this diagnostic will be far too chatty to be in Clang (even in -Wextra) but that this would be a great flow-sensitive check for the clang static analyzer (I've added some analyzer folks to the thread for their input). Static analyzers frequently implement this functionality because they can more easily notice things that limit the range of values possible for a given object and use that extra information when deciding whether overflow is likely or not. e.g., if the analyzer encounters if (foo > 10) return; in a method then it can understand that subsequent uses of foo after that statement are bounded such that foo <= 10. I think we'll ultimately need this flow sensitivity to reduce the false positive rate to something more useful for math-heavy code bases.

Assuming that the data shows a high false positive rate, if you don't have the appetite to work on the robust check in the static analyzer, another option would be to implement this functionality in clang-tidy (which is allowed to be more chatty than the Clang frontend or the static analyzer in terms of false positives). However, I still think we eventually will want the static analyzer check and so my preference is for that approach (so that we don't wind up needing to carry around two implementations of effectively the same check).

NoQ added a comment.Jan 11 2021, 9:19 PM

this would be a great flow-sensitive check for the clang static analyzer

You could indeed use static analyzer's path sensitivity to get rid of these obvious false positives where the multiplication is performed pretty much immediately after a branch (or assert) that avoids overflow, where "pretty much immediately" requires the static analyzer to encounter the overflow check while interpreting the function that contains the potential overflow. It's still not going to be good enough to be a good on-by-default checker because such checks may be performed either before the function is invoked, or in a nested function call for which the body is not immediately available in the translation unit. A good on-by-default checker could be achieved if taint analysis is added to the mixture (i.e., only check for overflows of values that are known to definitely be arbitrary) but that'd limit the power of the checker dramatically because such knowledge is rarely available (you may add annotations for that but that's probably a lot of work in your code).

Warnings with false positives may still be useful when they define a clear coding convention that you want the users to follow. In your case such convention seems to be "always use integer types that are wide enough to avoid overflows" and it says nothing about overflow checks being made, so i guess you don't absolutely need to bother with flow/path sensitivity. I think the answer should be data-driven. Does your codebase already contain overflow checks that cause false positives all over the place? If so, static analyzer to the rescue. If most of your multiplications are unchecked and you already rely on the integer type to be wide enough then you're probably good as-is.

@aaron.ballman thank you for taking a look!
@NoQ thank you for commenting.

Indeed, i very much don't want to diagnose just the cases
where the overflow can be proven to happen, doing that
i believe, would remove most of the true-positive reports.

So indeed, this is beginning to look like a coding guideline,
so let's circumvent all this FUD, and move onto clang-tidy.

lebedev.ri retitled this revision from [clang][Sema] Add diagnostics for implicit widening of multiplication result to [clang-tidy] Add check for implicit widening of multiplication result.
lebedev.ri edited the summary of this revision. (Show Details)
lebedev.ri edited reviewers, added: njames93; removed: rsmith, rjmccall, erichkeane, dblaikie.
lebedev.ri added a project: Restricted Project.

Hmm. I kinda dropped the ball here.
As discussed, move it into a clang-tidy check.
Looks good now?

The CI is showing build failures and there are some clang-tidy nits to be addressed as well.

clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
29

I think we should skip parens at the very least. x * y should check the same as (x) * (y).

32

I think that could be handled in a follow-up if we find the need, WDYT?

75

Might be worth it to have tests around signed char, unsigned char, and char explicitly, as that gets awkward.

126

You already are using the way?

clang/lib/AST/ASTContext.cpp
10158

This looks to be getting the same value as in the getCorrespondingUnsignedType() call?

10204

It looks like _ExtInt is missing from both this function and the corresponding unsigned one.

lebedev.ri marked an inline comment as done.Apr 1 2021, 9:55 AM

The CI is showing build failures and there are some clang-tidy nits to be addressed as well.

Thank you for taking a look!

clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
32

Sure.

75

Are you asking for test coverage, or for explicit handling of those types?
I'll add tests.

clang/lib/AST/ASTContext.cpp
10158

Yep. Are both of them incorrect?
I'm not sure what should be returned here.

10204

Likewise, i'm not sure what should be returned for that.
Just the original _ExtInt?

lebedev.ri updated this revision to Diff 334929.Apr 2 2021, 2:56 AM
lebedev.ri marked 5 inline comments as done.

Addressed review comments.
Thanks!

aaron.ballman added inline comments.Apr 4 2021, 7:22 AM
clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
68

We might want to consider letting the user select between static_cast and C-style casts via an option.

75

Just test coverage to make sure we're doing the right thing for them and not triggering these assertions.

131

One thing that's awkward about this is that there's no portable ssize_t type -- that's a POSIX type but it doesn't exist on all platforms (like Windows). We shouldn't print out a typecast that's going to cause compile errors, but we also shouldn't use the underlying type for ssize_t as that may be incorrect for other target architectures.

clang-tools-extra/docs/clang-tidy/checks/bugprone-implicit-widening-of-multiplication-result.rst
7–10

I think the documentation would be stronger if it explained why this diagnostic is helpful (show some examples of bugs that it catches and that the suggested fixes remove the bug).

clang-tools-extra/test/clang-tidy/checkers/bugprone-implicit-widening-of-multiplication-result-char.cpp
10 ↗(On Diff #334929)

Can you also test the fixit behavior in addition to the diagnostic behavior (at least one test for each kind of fix-it)?

clang/lib/AST/ASTContext.cpp
10158

I think I confused myself on the code, I think both are correct.

10204

_ExtInt has signed and unsigned variants. You can use ASTContext::getExtIntType() to get the type with the correct signedness. However, I see now that it's not a builtin type (huh, I thought it was!), so it'd have to be above the switch but after the enum handling.

lebedev.ri updated this revision to Diff 335169.Apr 4 2021, 1:11 PM
lebedev.ri marked 4 inline comments as done.

@aaron.ballman thank you for taking a look!
Addressing all review notes other than ssize_t and static_cast<>().

clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
29

This is more about the future case for which there's FIXME few lines down.

75

It seems to kinda just work, because of the promotion to int.

126

No, that's not it, because SizeTy.getAsString() will not be size_t/ssize_t, but long/etc.

clang-tools-extra/test/clang-tidy/checkers/bugprone-implicit-widening-of-multiplication-result-char.cpp
10 ↗(On Diff #334929)

I would love to do that, but as we have briefly talked with @njames93,
fix-it testing in clang-tidy, as compared to clang proper,
is rudimentary. I basically can not add tests here.
As far as i can see, it doesn't want to apply fix-its, because there are two of them.

So this is the best i can do, take it or leave it. /s

clang/lib/AST/ASTContext.cpp
10158

Actually, this is correct.
Note that we don't return it, but the later code will flip it's sign.

FWIW, it looks like tests are still failing on Windows according to the CI pipeline.

clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
126

Ah, I see what you mean now, thanks.

131

I'm still not quite certain what to do about this. Would it make sense to use the underlying type on platforms that don't have ssize_t? Relatedly, if we're going to suggest this as a replacement, we should also insert an include for the correct header file.

clang-tools-extra/test/clang-tidy/checkers/bugprone-implicit-widening-of-multiplication-result-char.cpp
10 ↗(On Diff #334929)

Ah, thank you for clarifying the issue that the two fixits make it hard to test.

lebedev.ri marked 4 inline comments as done.

@aaron.ballman thank you for taking a look!
Addressed all review notes.

clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
131

I've been thinking about this, and i really can't come up with a better fix than using ptrdiff_t.

Hardcode triple/target in tests, should fix 32-bit CI.

aaron.ballman accepted this revision.Apr 13 2021, 11:14 AM

LGTM aside from some small nits in the documentation.

clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
131

I'm not super excited about using ptrdiff_t as there are not likely to be pointer subtractions in the vicinity of the code being diagnosed, but at the same time, ptrdiff_t is functionally the same as size_t except with a sign bit, which is what ssize_t is. I'll hold my nose for now, but if we get bug reports about this use, we may want to investigate more involved solutions.

clang-tools-extra/docs/clang-tidy/checks/bugprone-implicit-widening-of-multiplication-result.rst
41

Add info about the default behavior.

46

Same here.

This revision is now accepted and ready to land.Apr 13 2021, 11:14 AM
lebedev.ri marked 2 inline comments as done.

Address final nits.

@aaron.ballman thank you so much for the review!

clang-tools-extra/clang-tidy/bugprone/ImplicitWideningOfMultiplicationResultCheck.cpp
131

Yep. signed size_t isn't a valid type, so at best i guess we could avoid emitting such a fixit.
In C++, std::make_signed<size_t>::type is valid, but it's kinda mouthful.

This revision was landed with ongoing or failed builds.Apr 13 2021, 11:41 AM
This revision was automatically updated to reflect the committed changes.