This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] add new checker for string literal with NUL character.
ClosedPublic

Authored by etienneb on Apr 4 2016, 6:27 PM.

Details

Summary

This patch adds the support for detecting suspicious string
literals and their incorrect usage.

The following example shows a incorrect character escaping leading
to an embedded NUL character.

std::string str = "\0x42";   // Should be "\x42".

The patch also add detection of truncated literal when a literal
is passed to a string constructor.

Diff Detail

Event Timeline

etienneb updated this revision to Diff 52652.Apr 4 2016, 6:27 PM
etienneb retitled this revision from to [clang-tidy] add new checker for string literal with NUL character..
etienneb updated this object.
etienneb added a reviewer: alexfh.
etienneb added a subscriber: cfe-commits.
etienneb updated this object.Apr 4 2016, 6:29 PM
etienneb updated this revision to Diff 52653.Apr 4 2016, 6:34 PM

Fix list.rst (bad merge)

bkramer added a subscriber: bkramer.Apr 5 2016, 1:46 AM
bkramer added inline comments.
clang-tidy/misc/StringLiteralWithEmbeddedNulCheck.cpp
21

Isn't this identical to StringLiteral::getCodeUnit? Also returning 0 for unknown sizes is not a good idea imo.

etienneb updated this revision to Diff 52707.Apr 5 2016, 9:54 AM
etienneb marked an inline comment as done.

fix comments.

Please mention this check in docs/ReleaseNotes.rst.

etienneb updated this revision to Diff 52708.Apr 5 2016, 10:12 AM

add missing reference in release-notes.

Eugene.Zelenko added inline comments.Apr 5 2016, 10:16 AM
docs/ReleaseNotes.rst
161 ↗(On Diff #52708)

I think will be good idea to sort check alphabetically. At least I did so :-)

etienneb updated this revision to Diff 52710.Apr 5 2016, 10:18 AM

alpha ordering

bcraig added a subscriber: bcraig.Apr 5 2016, 10:27 AM

Is this checker able to connect a std::string with a pre-declared string literal (i.e. constant propagation)? For example...

const char *bad_chars = "\0\1\2\3";
std::string bad_str = bad_chars;

I've had real code make that mistake before.

Is this checker able to connect a std::string with a pre-declared string literal (i.e. constant propagation)? For example...

const char *bad_chars = "\0\1\2\3";
std::string bad_str = bad_chars;

I've had real code make that mistake before.

Not yet, but I already started working in it.
There is more false-positive to remove first. It will follow this patch soon.

alexfh edited edge metadata.Apr 6 2016, 7:10 AM

A couple of nits for now. Will take a closer look later.

clang-tidy/misc/StringLiteralWithEmbeddedNulCheck.cpp
23

Should this be an assert instead?

39

clang-format -style=LLVM (or -style=file), please.

etienneb updated this revision to Diff 52791.Apr 6 2016, 7:24 AM
etienneb marked 2 inline comments as done.
etienneb edited edge metadata.

alexfh comments.

I finally get rid of the GetCharAt function.

clang-tidy/misc/StringLiteralWithEmbeddedNulCheck.cpp
23

No, because we are not doing bounds check later:

if (GetCharAt(SL, i) == 0 &&
    GetCharAt(SL, i + 1) == 'x' &&
    isDigit(GetCharAt(SL, i + 2)) &&
    isDigit(GetCharAt(SL, i + 3))) {

We can move this as an assert, and in this case... we can modify the "if" to check the length.

There are some APIs where a series of strings are passed as a single char * with NUL separators between the strings and the last string is indicated by two NUL characters in a row. For instance, see the MSDN documentation for the lpDependencies argument to [[ https://msdn.microsoft.com/en-us/library/windows/desktop/ms682450(v=vs.85).aspx | CreateService ]]. Such strings literals in the code look like:

LPCTSTR dependencies = _T("one\0two\0three\0four\0");

I don't see any annotation on this argument in <winsvc.h> where this function is declared, so there's no hint from the header about this argument. In this case, the biggest mistake people make is omitting the final NUL character, thinking that the \0 acts as a string separator and not a terminator:

LPCTSTR dependencies = _T("one\0two\0three\0four");

If you're lucky this results in a crash right away. If you're unlucky the compiler placed this constant in an area of memory with zero padding and it works by accident on your machine until it ships to a customer and then breaks on their machine mysteriously.

I was never a fan of these magic string array arguments and they mostly appear in the older portions of the Win32 API, but they are there nonetheless. I would hate for this check to signal a bunch of false positives on such strings.

There are some APIs where a series of strings are passed as a single char * with NUL separators between the strings and the last string is indicated by two NUL characters in a row. For instance, see the MSDN documentation for the lpDependencies argument to [[ https://msdn.microsoft.com/en-us/library/windows/desktop/ms682450(v=vs.85).aspx | CreateService ]]. Such strings literals in the code look like:

LPCTSTR dependencies = _T("one\0two\0three\0four\0");

I don't see any annotation on this argument in <winsvc.h> where this function is declared, so there's no hint from the header about this argument. In this case, the biggest mistake people make is omitting the final NUL character, thinking that the \0 acts as a string separator and not a terminator:

LPCTSTR dependencies = _T("one\0two\0three\0four");

If you're lucky this results in a crash right away. If you're unlucky the compiler placed this constant in an area of memory with zero padding and it works by accident on your machine until it ships to a customer and then breaks on their machine mysteriously.

I was never a fan of these magic string array arguments and they mostly appear in the older portions of the Win32 API, but they are there nonetheless. I would hate for this check to signal a bunch of false positives on such strings.

I know about these kind functions, and they are not covered by the current state of this check.
The current check only apply if there is a conversion from char* -> std::string (implicit constructor).
For now, I want to keep the false-positive ratio low.

I made a prototype to detect instances passed to 'func(char*)' but there are too many false-positives.

etienneb updated this revision to Diff 52856.Apr 6 2016, 3:09 PM

rebased.

alexfh added a comment.Apr 7 2016, 2:20 AM

A couple of nits.

clang-tidy/misc/MisplacedWideningCastCheck.cpp
64 ↗(On Diff #52856)

If clang-format did this, it used an incorrect style. In LLVM style it should put return on the next line. It should do the right thing, if you run it with -style=file and your clang-tools-extra working copy is checked out inside the cfe working copy (the important part here is that there's clang's or llvm's .clang-format file in a parent directory of the files you're running clang-format on).

95 ↗(On Diff #52856)

s/RelativeIntSizes/relativeIntSize/ (singular, first character should be lower-case).

223 ↗(On Diff #52856)

Two spaces before comments are common for Google style, so your clang-format most certainly uses a wrong style.

alexfh added a comment.Apr 7 2016, 2:22 AM

Wait, it looks like you've updated the patch from an incorrect branch: I see only clang-tidy/misc/MisplacedWideningCastCheck.cpp file here.

alexfh requested changes to this revision.Apr 7 2016, 4:56 AM
alexfh edited edge metadata.
This revision now requires changes to proceed.Apr 7 2016, 4:56 AM
etienneb updated this revision to Diff 52926.Apr 7 2016, 8:31 AM
etienneb edited edge metadata.
etienneb marked 2 inline comments as done.

Revert last diff (invalid patchset).

etienneb updated this revision to Diff 52927.Apr 7 2016, 8:33 AM
etienneb edited edge metadata.

nit

alexfh requested changes to this revision.Apr 7 2016, 8:45 AM
alexfh edited edge metadata.
alexfh added inline comments.
docs/clang-tidy/checks/misc-string-literal-with-embedded-nul.rst
7

Please use third-person form (Finds, validates, etc.).

28

s/an/a/

31

There doesn't seem to be anything bad with NUL-terminated. Did you mean with embedded NUL character?

This revision now requires changes to proceed.Apr 7 2016, 8:45 AM
etienneb updated this revision to Diff 52930.Apr 7 2016, 8:51 AM
etienneb edited edge metadata.
etienneb marked 3 inline comments as done.

fix doc.

alexfh accepted this revision.Apr 7 2016, 9:04 AM
alexfh edited edge metadata.

LG with one nit.

clang-tidy/misc/StringLiteralWithEmbeddedNulCheck.cpp
71

nit: Should this be return instead?

This revision is now accepted and ready to land.Apr 7 2016, 9:04 AM
etienneb updated this revision to Diff 52935.Apr 7 2016, 9:21 AM
etienneb edited edge metadata.

nit

etienneb closed this revision.Apr 7 2016, 9:21 AM