This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix use of length in CStringChecker
ClosedPublic

Authored by vabridgers on Jul 7 2022, 3:18 AM.

Details

Summary

CStringChecker is using getByteLength to get the length of a string
literal. For targets where a "char" is 8-bits, getByteLength() and
getLength() will be equal for a C string, but for targets where a "char"
is 16-bits getByteLength() returns the size in octets.

This is verified in our downstream target, but we have no way to add a
test case for this case since there is no target supporting 16-bit
"char" upstream. Since this cannot have a test case, I'm asserted this
change is "correct by construction", and visually inspected to be
correct by way of the following example where this was found.

The case that shows this fails using a target with 16-bit chars is here.
getByteLength() for the string literal returns 4, which fails when
checked against "char x[4]". With the change, the string literal is
evaluated to a size of 2 which is a correct number of "char"'s for a
16-bit target.

void strcpy_no_overflow_2(char *y) {
  char x[4];
  strcpy(x, "12"); // with getByteLength(), returns 4 using 16-bit chars
}

This change exposed that embedded nulls within the string are not
handled. This is documented as a FIXME for a future fix.

void strcpy_no_overflow_3(char *y) {
  char x[3];
  strcpy(x, "12\0");
}

Diff Detail

Event Timeline

vabridgers created this revision.Jul 7 2022, 3:18 AM
Herald added a project: Restricted Project. · View Herald Transcript
vabridgers requested review of this revision.Jul 7 2022, 3:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2022, 3:18 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Actually, there is one more bug there. If the string has an embedded null-terminator; a wrong length will be returned.
Please add tests to demonstrate the wrong behavior. Try different target triples for having irregular char sizes and pin that.

vabridgers added a comment.EditedJul 7 2022, 3:52 PM

Thanks Balazs, you mean something like this correct?

void strcpy_no_overflow_2(char *y) {
  char x[3];
  strcpy(x, "12\0"); // this produces a warning, but should not. 
}

If I understand correctly, we'd want to iterate through the string literal by "CodeUnit" size, checking each character for NULL. Something like this (terribly hackish ...)

for (countx = 0; 
      countx < strLit->getLength() && (strLit->getCodeUnit(countx) != 0); 
      countx++);

Is that what you're referring too? Nice eye by the way, to have seen that.

Best

vabridgers updated this revision to Diff 443101.Jul 7 2022, 6:11 PM

a proposal to handle embedded null case caught by @steakhal

vabridgers edited the summary of this revision. (Show Details)Jul 7 2022, 6:13 PM

Thanks Vince for the patch! I think the implementation is correct. But I am worried that it might do sub-optimal computations on the string literals. I mean, getting the length would require O(N) steps each time, and that could be problematic with long literals, especially in projects that have many literals.

So, I'd rather keep your first implementation with getLength with the new test case which fails (and the failure is documented). I think, this is what @steakhal was also suggesting.

Then, if your time allows, in a second patch, we could have the loop based implementation for catching embedded null terminators. However, that change should have it's own measurement and maybe even it's own command line option to enable.

Thanks @martong , I understand. I'll make the changes and resubmit. Best!

vabridgers edited the summary of this revision. (Show Details)

update per comments from @martong

vabridgers edited the summary of this revision. (Show Details)Jul 11 2022, 8:41 AM
martong accepted this revision.Jul 13 2022, 6:02 AM

LGTM

This revision is now accepted and ready to land.Jul 13 2022, 6:02 AM