Page MenuHomePhabricator

[clang-tidy] New checker for not null-terminated result caused by strlen(), size() or equal length
ClosedPublic

Authored by Charusso on Mar 29 2018, 8:46 AM.

Details

Summary

New checker called bugprone-not-null-terminated-result. This checker finds
function calls where it is possible to cause a not null-terminated result.
Usually the proper length of a string is strlen(src) + 1 or equal length
of this expression, because the null terminator needs an extra space.
Without the null terminator it can result in undefined behaviour when the
string is read.

The following and their respective wchar_t based functions are checked:

memcpy, memcpy_s, memchr, memmove, memmove_s, strerror_s,
strncmp, strxfrm

The following is a real-world example where the programmer forgot to
increase the passed third argument, which is size_t length.
That is why the length of the allocated memory is not enough to hold the
null terminator.

static char *stringCpy(const std::string &str) {
  char *result = reinterpret_cast<char *>(malloc(str.size()));
  memcpy(result, str.data(), str.size());
  return result;
}

In addition to issuing warnings, fix-it rewrites all the necessary code.
It also tries to adjust the capacity of the destination array:

static char *stringCpy(const std::string &str) {
  char *result = reinterpret_cast<char *>(malloc(str.size() + 1));
  strcpy(result, str.data());
  return result;
}

Note: It cannot guarantee to rewrite every of the path-sensitive memory
allocations.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aaron.ballman added inline comments.Sep 25 2018, 1:31 PM
clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
37 ↗(On Diff #166802)

Please drop the LHK_ prefix -- the enum class name is sufficient as a prefix.

44 ↗(On Diff #166802)

This seems like it should be a parameter threaded through the function calls from check() rather than a static data member.

55 ↗(On Diff #166802)

These functions don't really add much benefit given that they're one-liners where the implementation is shorter than the declaration.

75 ↗(On Diff #166802)

Length is could -> Length could

93 ↗(On Diff #166802)

it denoted as unknown -> it is denoted as unknown.

(Be sure to add the full stop at the end of the sentence.)

101 ↗(On Diff #166802)

prevents to increase -> prevents increasing

106 ↗(On Diff #166802)

Missing the full stop at the end of the sentence.

110 ↗(On Diff #166802)

that is not -> that it is not

Missing the full stop at the end of the sentence.

111 ↗(On Diff #166802)

isNoWrongLength should probably be named isNotWrongLength() or perhaps reverse the logic to isCorrectLength().

118 ↗(On Diff #166802)

EQ -> Equal, but perhaps "lenEquaSrcLen()" is more succinct?

131 ↗(On Diff #166802)

Reorder the words and add a full stop: Increase or decrease an integral expression by one.

137 ↗(On Diff #166802)

Similar: Increase or decrease the passed integral argument by one.

143 ↗(On Diff #166802)

Missing full stop.

196 ↗(On Diff #166802)

You can call IgnoreParenImpCasts() instead.

201 ↗(On Diff #166802)

You can remove the local const qualifications here (we don't typically use that for local unless the qualified type is a pointer or a reference).

Same elsewhere in the patch.

268 ↗(On Diff #166802)

case -> cases

331 ↗(On Diff #166802)

case -> cases

383 ↗(On Diff #166802)

You can use 0U here instead of an explicit cast.

387 ↗(On Diff #166802)

case -> cases

459–460 ↗(On Diff #166802)

This is insufficient. Just because the user *wants* the safe functions doesn't mean they're actually available. You also need to check that __STDC_LIB_EXT1__ is defined, as that tells you the implementation has Annex K functionality. If that's defined and the user also defines __STDC_WANT_LIB_EXT1__ before including C standard library headers, then they can access them.

I am betting our include fixer doesn't properly handle ensuring the correct macros are defined before including the headers, but I think that's acceptable for now.

514 ↗(On Diff #166802)

it is cannot -> it cannot

Also, add a full stop at the end of the sentence.

569–571 ↗(On Diff #166802)

This code is duplicated from above -- it'd be nice to refactor it into one place, perhaps.

595–596 ↗(On Diff #166802)

I think it's just as understandable, but more simple, to say "the length is too short to include the null terminator".

700 ↗(On Diff #166802)

Is there a way we can do this without storing duplicate information? This is already tracked by the Preprocessor instance, I believe, so if we had a way to query that rather than store our own list, that would be better.

715 ↗(On Diff #166802)

const auto *

733 ↗(On Diff #166802)

const auto *

843 ↗(On Diff #166802)

Missing full stop.

871–872 ↗(On Diff #166802)

Do not use auto when the type is not spelled out in the initialization.

879–883 ↗(On Diff #166802)

These should not be declared as auto -- can skip the assignment operator. Same suggestion applies elsewhere.

962 ↗(On Diff #166802)

Do not use auto here either.

983–984 ↗(On Diff #166802)

Don't use auto (elsewhere as well).

1038 ↗(On Diff #166802)

This should be done using a Twine.

clang-tidy/bugprone/NotNullTerminatedResultCheck.h
42–45 ↗(On Diff #166802)

These comments are out of date -- you don't need "Yes", just Y or y (e.g.) as the first character. So "Yolo" will also work. :-P

docs/clang-tidy/checks/bugprone-not-null-terminated-result.rst
9 ↗(On Diff #166802)

Drop "an".

whisperity added inline comments.
clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
1038 ↗(On Diff #166802)

Can you please give an example of how to well-approach this? We had issues with using Twines on the stack and then later on running into weird undefined behaviour. The documentation is not clear to a lot of our colleagues on where the Twine's usage barriers are... (Asking this not just for @Charusso but also for a few more colleagues, @baloghadamsoftware e.g.)

JonasToth added inline comments.
clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
1038 ↗(On Diff #166802)

naive approach:

std::string New = Twine("\n").concat(SpaceBeforeStmtStr).concat(exprToStr(..))...).str();

I think retrieving a SmallString is not supported. Please note, that Twine does support taking integer and floats directly without prior conversion.
You can use operator+ instead of concat too.

Did this help or did you have more specific issues?

aaron.ballman added inline comments.Sep 26 2018, 8:57 AM
clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
1038 ↗(On Diff #166802)

The important bit is -- don't try to store a Twine (locally or otherwise). It's fine as a parameter in a function, but otherwise you should try to use it only as part of an expression. e.g., (Twine("foo") + "bar" + baz).str() is a good pattern to get used to.

Charusso updated this revision to Diff 167597.Sep 29 2018, 6:38 AM
Charusso marked 37 inline comments as done.
Charusso edited the summary of this revision. (Show Details)

Thanks for the very in-depth review @aaron.ballman!

I have not found a way to obtain the instance of Preprocessor nicely, but it is working well. Changes:

  • Huge refactor .
  • Macro definition checking is rely on Preprocessor.
  • isUnjectUL() is now a function, not a static variable.
  • WantToUseSafeFunctions is the new integer specifying the AreSafeFunctionsAvailable variable without the override feature rely on the necessary macros has been defined.
  • Added a test case in bugprone-not-null-terminated-result-memcpy-before-safe.c if the user wants to use safe functions but it is unavailable.
  • Extra: the LOC is now a round number.
clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
1038 ↗(On Diff #166802)

+1 for @aaron.ballman's example, it is an unspoken standard.

Charusso marked an inline comment as done.Sep 29 2018, 6:39 AM
whisperity accepted this revision.Oct 12 2018, 3:15 AM

Have been looked at this far and wide too many times now. I say we roll this out and fix and improve later on, lest this only going into Clang 72. Results look promising, let's hope users start reporting good findings too.

Considering Tidy has no alpha group the name bugprone seems like a double-edged sword, as what is bugprone, the checker or what it finds? 😜

This revision is now accepted and ready to land.Oct 12 2018, 3:15 AM

@aaron.ballman Neither I nor @Charusso has commit rights, could you please commit this in our stead?

JonasToth added a comment.EditedOct 12 2018, 10:11 AM

The patch does not apply clean, could you please take a look at it?

  • was a non-issue from arc **

Am 12.10.2018 um 17:21 schrieb Whisperity via Phabricator:

whisperity added a comment.

@aaron.ballman Neither I nor @Charusso has commit rights, could you please commit this in our stead?

https://reviews.llvm.org/D45050

JonasToth closed this revision.Oct 12 2018, 10:25 AM

Thanks everyone for the contribution!

Unfortunatly this check does not compile on some ARM platforms. It seems that a matcher exceeds the recursion limit in template instantations.

http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/4405/steps/build%20stage%201/logs/stdio

I will revert the check for now as I did not find an easy fix now.

JonasToth reopened this revision.Oct 13 2018, 2:34 AM
This revision is now accepted and ready to land.Oct 13 2018, 2:34 AM
JonasToth requested changes to this revision.Oct 13 2018, 2:34 AM
This revision now requires changes to proceed.Oct 13 2018, 2:34 AM

Unfortunatly this check does not compile on some ARM platforms. It seems that a matcher exceeds the recursion limit in template instantations.

http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/4405/steps/build%20stage%201/logs/stdio

Well, I can do nothing, but ask for @klimek's help.

Manuel, could you review this error please?

I am in contact with the guy that actually discovered this breakage. It seems to be a weird and hard to reproduce error, but happens on x86 as well. So it is a compile/header something combination that results in the error. I am pretty sure that will take a while until we figured it out :/

Sorry for responding so late, but i thought i get a shot to look at it. Here is the mail from Douglas Yung who was so kind to isolate the breakage and find a reproducer.
If you have the chance please take a look at it.

Hi Jonas,

I have attached a bzipped preprocessed file that I can confirm will show the failure if built with clang++ version 3.7.1, specifically the version that shipped with ubuntu 14.04.5 LTS. Here is the full version information:

Ubuntu clang version 3.7.1-svn253742-1~exp1 (branches/release_37) (based on LLVM 3.7.1)

Target: x86_64-pc-linux-gnu
Thread model: posix

If you build it with “clang++ -std=c++11 NotNullTerminatedResultCheck.preproc.cpp” you should see the failure.

Hope this helps.
Douglas Yung

The file is in the attachement of this comment.

whisperity added a subscriber: dyung.

I have attached a bzipped preprocessed file that I can confirm will show the failure if built with clang++ version 3.7.1, specifically the version that shipped with ubuntu 14.04.5 LTS. Here is the full version information:

Ubuntu clang version 3.7.1-svn253742-1~exp1 (branches/release_37) (based on LLVM 3.7.1)

Target: x86_64-pc-linux-gnu
Thread model: posix

If you build it with “clang++ -std=c++11 NotNullTerminatedResultCheck.preproc.cpp” you should see the failure.

I have installed said Ubuntu in a virtual machine for testing this, but unfortunately only the following Clangs are available in the package manager for Trusty:

clang - C, C++ and Objective-C compiler (LLVM based)
clang-3.3 - C, C++ and Objective-C compiler (LLVM based)
clang-3.4 - C, C++ and Objective-C compiler (LLVM based)
clang-3.5 - C, C++ and Objective-C compiler (LLVM based)
clang-3.6 - C, C++ and Objective-C compiler (LLVM based)
clang-3.8 - C, C++ and Objective-C compiler (LLVM based)
clang-3.9 - C, C++ and Objective-C compiler (LLVM based)

(Where clang is just a synonym for clang-3.4.) There is no Clang 3.7 in the package upstream, it seems.


However, 16.04 LTS (Xenial) at the time of writing this comment has an clang-3.7 package, specifically this version:

Ubuntu clang version 3.7.1-2ubuntu2 (tags/RELEASE_371/final) (based on LLVM 3.7.1)
Target: x86_64-pc-linux-gnu
Thread model: posix

With this I can confirm I get a huge trace and the template depth overflow failure.

However, from the filename-line mappings of the preprocessed output, I can see that the type_traits header comes from /usr/include/c++/4.8/type_traits, which is a version 4.8 standard library, but installing the clang-3.7 package (through some transitivity in libasan and such) depended on gcc-5-base, upgrading it from the system-default 5.3.1 to 5.4.0.

Isn't this a discrepancy, relying on an older standard library than what is seemingly available on the system?

Interesting, did you try out gcc-4.8 and if it is reproducable with
that one? This version of gcc is still supported and if the problem
occurs with the libstdc++ shipped there, we need to consider it.

However, 16.04 LTS (Xenial) at the time of writing this comment has an clang-3.7 package, specifically this version:

Ubuntu clang version 3.7.1-2ubuntu2 (tags/RELEASE_371/final) (based on LLVM 3.7.1)
Target: x86_64-pc-linux-gnu
Thread model: posix

With this I can confirm I get a huge trace and the template depth overflow failure.

However, from the filename-line mappings of the preprocessed output, I can see that the type_traits header comes from /usr/include/c++/4.8/type_traits, which is a version 4.8 standard library, but installing the clang-3.7 package (through some transitivity in libasan and such) depended on gcc-5-base, upgrading it from the system-default 5.3.1 to 5.4.0.

Isn't this a discrepancy, relying on an older standard library than what is seemingly available on the system?

https://reviews.llvm.org/D45050

dyung added a comment.Oct 30 2018, 3:12 PM

I have installed said Ubuntu in a virtual machine for testing this, but unfortunately only the following Clangs are available in the package manager for Trusty:

clang - C, C++ and Objective-C compiler (LLVM based)
clang-3.3 - C, C++ and Objective-C compiler (LLVM based)
clang-3.4 - C, C++ and Objective-C compiler (LLVM based)
clang-3.5 - C, C++ and Objective-C compiler (LLVM based)
clang-3.6 - C, C++ and Objective-C compiler (LLVM based)
clang-3.8 - C, C++ and Objective-C compiler (LLVM based)
clang-3.9 - C, C++ and Objective-C compiler (LLVM based)

(Where clang is just a synonym for clang-3.4.) There is no Clang 3.7 in the package upstream, it seems.

Hi, I did not initially setup the machine that hit the failure, so I cannot say for certain where it got clang. I did notice though that the llvm.org releases page does seem to include a download link for clang 3.7.1 for ubuntu 14.04 (http://releases.llvm.org/3.7.1/clang+llvm-3.7.1-x86_64-linux-gnu-ubuntu-14.04.tar.xz).


However, 16.04 LTS (Xenial) at the time of writing this comment has an clang-3.7 package, specifically this version:

Ubuntu clang version 3.7.1-2ubuntu2 (tags/RELEASE_371/final) (based on LLVM 3.7.1)
Target: x86_64-pc-linux-gnu
Thread model: posix

With this I can confirm I get a huge trace and the template depth overflow failure.

However, from the filename-line mappings of the preprocessed output, I can see that the type_traits header comes from /usr/include/c++/4.8/type_traits, which is a version 4.8 standard library, but installing the clang-3.7 package (through some transitivity in libasan and such) depended on gcc-5-base, upgrading it from the system-default 5.3.1 to 5.4.0.

Isn't this a discrepancy, relying on an older standard library than what is seemingly available on the system?

Again, I'm not sure how the toolchain was installed on the system, and if it was installed by simply unzipping the tarball above (or similar) then I could see how dependencies could go unmet. Offhand I do not even know if gcc 5+ is installed on the machine, but I can check if you think it is important.

gerazo added a subscriber: gerazo.Nov 26 2018, 7:20 AM
Charusso updated this revision to Diff 220215.Sep 14 2019, 8:23 AM
Charusso edited the summary of this revision. (Show Details)
Charusso removed reviewers: hokein, ilya-biryukov, xbolva00, dyung.
Charusso set the repository for this revision to rCTE Clang Tools Extra.

After a while I try to make this patch arrive. I wanted to split it up to multiple patches, but everything tied together so I decided to fix false positives instead with improving the existing APIs. Please visit the diff of the test cases and the documentation to see the changes.

Here are some interesting findings:

bitcoin/src/leveldb/db/c.cc:
- char* result = reinterpret_cast<char*>(malloc(sizeof(char) * str.size()));
- memcpy(result, str.data(), sizeof(char) * str.size());
+ char* result = reinterpret_cast<char*>(malloc((sizeof(char) * str.size()) + 1));
+ strcpy(result, str.data());

ffmpeg/libavformat/avio.c:
- memmove(start, key+1, strlen(key));
+ memmove(start, key+1, strlen(key) + 1);

ffmpeg/libavformat/mpeg.c:
- memcpy(ext, !strncmp(ext, "IDX", 3) ? "SUB" : "sub", 3);
+ strcpy(ext, !strncmp(ext, "IDX", 3) ? "SUB" : "sub");

ffmpeg/libavformat/oggparseskeleton.c:
- strncmp(buf, "fishead", 8)
+ strncmp(buf, "fishead", 7)

sqlite/shell.c:
#define APND_MARK_PREFIX     "Start-Of-SQLite3-"
#define APND_MARK_PREFIX_SZ  17
unsigned char a[APND_MARK_SIZE];
- memcpy(a, APND_MARK_PREFIX, APND_MARK_PREFIX_SZ);
+ strcpy((char *)a, APND_MARK_PREFIX);
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2019, 8:23 AM

Hm, I wanted to upload the new patch here to see the changes with the diff-mode, but it does not work, sorry.

A couple of drive-by comments.

clang-tools-extra/clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
23–34

Binding names are copied/read a lot (for memoization purposes, see BoundNodesMap) during matching. It makes sense to make them shorter. Ideally they should fit into the inline std::string buffer (15 characters in libc++ on x86-64, https://gcc.godbolt.org/z/oNBghM).

277–278

This conditional statement never affects the result of the function. Either there's a typo somewhere or this statement can be removed.

Charusso updated this revision to Diff 221096.EditedSep 20 2019, 12:58 PM
  • Fix.
  • Tiny refactor.
Charusso marked 3 inline comments as done.Sep 20 2019, 12:59 PM

Thanks!

clang-tools-extra/clang-tidy/bugprone/NotNullTerminatedResultCheck.cpp
23–34

Great explanation, thanks you!

Charusso marked an inline comment as done.Sep 20 2019, 12:59 PM
JonasToth resigned from this revision.Oct 11 2019, 3:47 AM
This revision is now accepted and ready to land.Oct 11 2019, 3:47 AM
This revision was automatically updated to reflect the committed changes.

This fails on Windows: http://45.33.8.238/win/386/step_7.txt

Please take a look, and if it's not immediately clear how to fix, b please revert while you investigate.

This fails on Windows: http://45.33.8.238/win/386/step_7.txt

Please take a look, and if it's not immediately clear how to fix, b please revert while you investigate.

Oh, thanks! We define it like typedef __typeof(sizeof(int)) size_t;, so by rL374715 it needs to work.

xbolva00 added a comment.EditedOct 13 2019, 4:09 AM

typedef __SIZE_TYPE__ size_t;

typedef __SIZE_TYPE__ size_t;

Hm, yes, I have overestimated the uploaded solution:

$ clang/test grep -rn 'typedef __SIZE_TYPE__ size_t;' | wc -l
60
$ clang/test grep -rn 'typedef __typeof(sizeof(int)) size_t;' | wc -l
45

Thanks!

I think everything is working fine on the build-bot side. One tiny issue on Windows left where the getLength() could not obtain the length properly: rL374713, I will look into that.

Shouldn't it catch in curl also this code?

urllen = strlen(url_clone);

....

memcpy(newest, url_clone, urllen);

Edit: if possible, report these bugs to project developers :)

@xbolva00, now I answer it: it is out of the scope of the Tidy, sadly. I really wanted to make this work with tons of heuristics, which made that patch so enormous. I have not forgotten it, I will report the found issues some months later. Now I want to do the same patch with the help of the Analyzer, and research that, whether this type of matching is could be done more appropriately. I have already started that idea by D68725. Thanks again!

Thanks @JonasToth, you really wanted to see my first patch upstreamed, and also thanks everyone, your support really helped me to understand how to do reviews and how to Clang. I apologize for the size, I felt that the larger the better.