This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add unsigned subtraction warning, with suggestion to convert to unsigned literals.
Needs RevisionPublic

Authored by sorenj on Dec 17 2019, 6:41 AM.

Details

Summary

Many C++ programmers are unaware that an expression of unsigned - signed will
promote the signed argument to unsigned, and the resulting underflow produces
a large unsigned result, rather than signed negative result.
Hence the frequent errors related to the test x.size() - 1 <= 0
when the container x is empty.

This clang tidy detects signed values being subtracted from unsigned values
and warns the user about the potential error. It is not perfect as it is
not always possible at compile time to reason about code when this comparison
is made.

The warning also suggests a fixit change that will append a "u" to numerical
constants - this makes the implicit cast explicit and signals that the
developer knew what they were doing in a subtraction. In other cases it
suggests the rather abhorrent static_cast<>().

The easiest fix is to not do subtraction at all, just move the operation
to the other side of the comparison where it becomes an addition - which
has none of these surprising properties.

Event Timeline

sorenj created this revision.Dec 17 2019, 6:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 17 2019, 6:41 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
lebedev.ri retitled this revision from Add unsigned subtraction warning, with suggestion to convert to unsigned literals. to [clang-tidy] Add unsigned subtraction warning, with suggestion to convert to unsigned literals..Dec 17 2019, 6:46 AM

Please add documentation and mention new check in Release Notes.

clang-tools-extra/clang-tidy/bugprone/UnsignedSubtractionCheck.cpp
11

Unnecessary empty line.

138

Please separate with empty line.

Eugene.Zelenko added a project: Restricted Project.
sorenj updated this revision to Diff 234294.Dec 17 2019, 6:59 AM

Address requested whitespace changes.

sorenj marked 2 inline comments as done.Dec 17 2019, 7:00 AM
lebedev.ri edited the summary of this revision. (Show Details)Dec 17 2019, 9:40 AM
lebedev.ri added inline comments.
clang-tools-extra/clang-tidy/bugprone/UnsignedSubtractionCheck.cpp
22

The check's name is more generic than what it does - it only looks at mixed subtractions in comparisons.
The name implies it looks at all mixed subtractions.

sorenj marked an inline comment as done.Dec 17 2019, 10:12 AM
sorenj added inline comments.
clang-tools-extra/clang-tidy/bugprone/UnsignedSubtractionCheck.cpp
22

But it does look at all mixed subtractions, line 28 of the unit test shows one such example.

lebedev.ri added inline comments.Dec 17 2019, 10:55 AM
clang-tools-extra/clang-tidy/bugprone/UnsignedSubtractionCheck.cpp
22

Oh, i didn't spot that one, sorry for the false alarm.

Somewhat related: https://reviews.llvm.org/D40854
This is a check that tried to enforce not mixing any signed/unsigned arithmetic. there was no feedback from the cppcoreguideline-ppl on how to proceed with edge cases and occassion where mixing is not avoidable (e.g. unsigned short + unsigned short + unsigned short, because of integer promotion).
Just to inform you as it might help with testing or anything like that.

Friendly ping - anything further I need to do here?

Documentation and Release Notes entry are still missing..

sorenj updated this revision to Diff 237440.Jan 10 2020, 2:53 PM

First time so trying to follow similar recent submits. PTAL.

Eugene.Zelenko added inline comments.Jan 10 2020, 2:59 PM
clang-tools-extra/docs/ReleaseNotes.rst
106

Please synchronize with first statement in documentation.

clang-tools-extra/docs/clang-tidy/checks/bugprone-unsigned-subtraction.rst
6–7

This check should be removed.

16

fix-it. Check, not warning.

18

code was intended?

sorenj updated this revision to Diff 237466.Jan 10 2020, 5:53 PM
  • Address documentation comments.
  • Address documentation comments.
Eugene.Zelenko added inline comments.Jan 10 2020, 6:12 PM
clang-tools-extra/docs/clang-tidy/checks/bugprone-unsigned-subtraction.rst
18

Please fix double space.

sorenj updated this revision to Diff 237471.Jan 10 2020, 6:38 PM
  • Remove double space.

Anything further needed?

aaron.ballman requested changes to this revision.Jan 16 2020, 6:49 AM

Given that the compiler already has -Wsign-conversion, I'm not certain what value is added by this check. Can you explain a bit about why this is the correct approach for diagnosing the issue? When you ignore the false positives from the test, the only cases not pointed out by -Wsign-compare are the macro cases (which is reasonable behavior to not diagnose on -- the fix for one macro may make other macros invalid).

clang-tools-extra/test/clang-tidy/checkers/bugprone-unsigned-subtraction.cpp
36–37

I consider this to be a false positive diagnostic -- the 2 is converted from a signed value into an unsigned value and there cannot possibly be a conversion error on that implicit cast.

100

Similarly, I consider this to be a false positive.

This revision now requires changes to proceed.Jan 16 2020, 6:49 AM

Okay, but as you can see the majority of my test cases are intentionally false negatives - -Wsign-conversion triggers so often than many people don't use it. And, unsigned x = 2; does not trigger a sign conversion warning despite there being a conversion form 2 to 2u. This check is targeting a very specific but frequent case of functions that do not guard against containers that might be empty.

Regarding the false positives - I think you are focusing on the semantics of the fix-it which is literally a no-op. The code before and after may be just as wrong, but the salience of the implied conversion is a bit higher. Maybe that's not a good idea for a change that may be applied automatically, even if safe.

In short I'm not sure if you are objecting the principle here, or the implementation. Is this something that can be refined further? I understand the objection in the first case, but what about the second case makes you think this is not a good diagnostic and a false positive? It's the specific case I was targeting: the unprotected subtraction of a non-zero value from a potentially smaller value.

This check was used across our very large code base and was evaluated by examining the code by looking at the context where these warning fired. The vast majority were judged to be true positives. Unfortunately it's not possible to share those results. I will try to run this check on some fraction of the llvm code base and follow up.

So, I ran this check against the cxx directory of llvm, it fired 5 times so let's look at the context and disucss:

There are two identical spots in locale.cpp, the first is around line 2717

uint16_t t = static_cast<uint16_t>(
         0xD800
       | ((((wc & 0x1F0000) >> 16) - 1) << 6)
       |   ((wc & 0x00FC00) >> 10));

the fact that a signed value is being right shifted here surprises me, but looking earlier there's a branch for wc < 0x010000 so we are safe here against wc being 0. So this is a false diagnostic. Still, wc is a uint32_t, it's the 0x1f0000 that converts it to signed. Probably should be 0x1f0000u? Will still false-alarm on this code though unless you use - 1u to make the whole thing unsigned end to end.

the third is in memory.cc

void*
align(size_t alignment, size_t size, void*& ptr, size_t& space)
{
    void* r = nullptr;
    if (size <= space)
    {
        char* p1 = static_cast<char*>(ptr);
        char* p2 = reinterpret_cast<char*>(reinterpret_cast<size_t>(p1 + (alignment - 1)) & -alignment);

here it doesn't make sense for alignment to be zero, and the & -alignment will be zero anyway, so false alarm although some check for alignment > 0 might be an improvement

The last two are in valarray.cpp lines 35 and 43, I've copied a large excerpt here

void
gslice::__init(size_t __start)
{
    valarray<size_t> __indices(__size_.size());
    size_t __k = __size_.size() != 0;
    for (size_t __i = 0; __i < __size_.size(); ++__i)
        __k *= __size_[__i];
    __1d_.resize(__k);
    if (__1d_.size())
    {
        __k = 0;
        __1d_[__k] = __start;
        while (true)
        {
            size_t __i = __indices.size() - 1;
            while (true)
            {
                if (++__indices[__i] < __size_[__i])
                {
                    ++__k;
                    __1d_[__k] = __1d_[__k-1] + __stride_[__i];
                    for (size_t __j = __i + 1; __j != __indices.size(); ++__j)
                        __1d_[__k] -= __stride_[__j] * (__size_[__j] - 1);
                    break;
                }
                else

with some looking I can see that __indices.size() has to be non-zero. It's less clear to me that size_[__j] is strictly positive here and there should probably be some guard against underflow there.

That's a firing rate of about 5/12K lines of code, but I wonder if I were to send patches for these 3 files that silence the warning I wonder how many would be approved.

Okay, but as you can see the majority of my test cases are intentionally false negatives - -Wsign-conversion triggers so often than many people don't use it.

Then we should be addressing that issue rather than duplicating the functionality, no?

And, unsigned x = 2; does not trigger a sign conversion warning despite there being a conversion form 2 to 2u.

That should *not* trigger a sign conversion warning because there is no sign conversion. We know the exact value of the RHS and can see that it does not change signs.

This check is targeting a very specific but frequent case of functions that do not guard against containers that might be empty.

We should consider a better name for the check then and limit its utility to those cases. Truthfully, I think that check would be quite useful, but it would almost certainly be a clang static analyzer check to handle control and data flow. e.g., such a check should be able to handle these situations:

size_t count1 = some_container.size() - 1; // Bad if empty
size_t num_elements = some_container.size();
size_t count2 = num_elements - 1; // Bad if empty
if (num_element)
  size_t count3 = num_elements - 1; // Just fine
size_t count4 = std::size(some_container) - 1; // Bad if empty
size_t count5 = std::distance(some_container.begin(), some_container.end()) - 1; // Bad if empty? (Note, there's no type-based sign conversion here)

num_elements + 1;
size_t count6 = num_elements - 1; // Just fine

Regarding the false positives - I think you are focusing on the semantics of the fix-it which is literally a no-op. The code before and after may be just as wrong, but the salience of the implied conversion is a bit higher. Maybe that's not a good idea for a change that may be applied automatically, even if safe.

In short I'm not sure if you are objecting the principle here, or the implementation.

A bit of both. I don't think clang-tidy should get a -Wsign-conversion check -- the frontend handles that, and if there are deficiencies with that handling, we should address those. However, a check that's focused solely on container and container-like situations where an empty container would cause problems seems like a very interesting check that has value. I'm not certain that implementing it in clang-tidy would catch the cases with a reasonable false-positive rate, but it seems like it wouldn't be bad as a static analyzer check instead.

My colleague pointed out that -Wsigned-conversion will not detect this very frequent mistake

for (size_t i = 0; i < v.size() - 1; ++i)

It is my contention, and I think it's pretty well substantiated by reviewing cases where this detector fails that no coder ever really expects to subtract a signed value from an unsigned value.

MaskRay added subscribers: NoQ, MaskRay.EditedJun 24 2020, 11:58 AM

+@NoQ on comments whether clang static analyzer can catch these cases.

clang++ --analyze a.cc does not warn on a.size()-2 AFAICT.

NoQ added a comment.Jun 24 2020, 7:20 PM

+@NoQ on comments whether clang static analyzer can catch these cases.

clang++ --analyze a.cc does not warn on a.size()-2 AFAICT.

Implementing such check in the static analyzer with the help of path-sensitive analysis would ultimately allow you to potentially eliminate (in a very precise and fairly principled/non-hacky way) false positives such as your example with .empty() or even this one:

void foo(vector &v) {
  v.push_back(a);
  v.push_back(b);

  // Size is known to be at least 2, therefore overflow never occurs.
  for (size_t i = 0; i < v.size() - 2; ++i) {
    // ...
  }
}

That won't happen immediately though; it'll require some routine work that'll consist in teaching the analyzer facts such as "only empty containers have size 0" or "vectors grow when pushed into". The analyzer would automagically refute such false positives (in all of its checkers!) once it acquires such knowledge. That said, it's still a fairly large amount of routine work, so i'd rather not have you blocked on this and recommend committing into clang-tidy. We can always move or duplicate the check later if you decide to proceed with this approach.

NoQ added a comment.Jun 24 2020, 7:46 PM

Mmm, well, on second thought, no, that probably won't be sufficient. As long as we're willing to warn on containers about which we know absolutely nothing, we'll still have some of those false positives. And if we stop warning on such containers, the check will probably become much more quiet and much less useful. So i think this check will work best as an opt-in style/lint guideline. It might still make a bit more sense with path-sensitive analysis but we'll still have to accept the inevitable obvious false positives so it's probably not worth it.

aminya added a subscriber: aminya.Oct 18 2021, 10:52 PM

I just hit this bug in my code where the subtraction of two size_t values resulted in a very large value. Fortunately, the address sanitizer immediately alerted me about the issue. It would be great to have such a warning in clang-tidy.

99dcz added a subscriber: 99dcz.Jan 10 2023, 3:08 PM
Herald added a project: Restricted Project. · View Herald Transcript