Page MenuHomePhabricator

[Sema] Address-space sensitive index check for unbounded arrays
ClosedPublic

Authored by chrish_ericsson_atx on Aug 28 2020, 10:23 AM.

Details

Summary

Check applied to unbounded (incomplete) arrays and pointers
to spot cases where the computed address is beyond the
largest possible addressable extent of the array, based
on the address space in which the array is delcared, or
which the pointer refers to.

Check helps to avoid cases of nonsense pointer math and
array indexing which could lead to linker failures or
runtime exceptions. Of particular interest when building
for embedded systems with small address spaces.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptAug 28 2020, 10:23 AM
chrish_ericsson_atx requested review of this revision.Aug 28 2020, 10:23 AM

Removed Change-Id from commit log message

chrish_ericsson_atx edited the summary of this revision. (Show Details)Aug 28 2020, 10:56 AM
chrish_ericsson_atx added a reviewer: krememek.
Harbormaster completed remote builds in B69940: Diff 288649.

Corrected formatting (per git-clang-format)

ebevhan added a subscriber: ebevhan.Sep 1 2020, 7:13 AM
ebevhan added inline comments.
clang/lib/Sema/SemaChecking.cpp
14133

It might simplify the patch to move this condition out of the tree and just early return for the other case. That is:

if (isUnboundedArray) {
  if (!(index.isUnsigned() || !index.isNegative()))
    return;

  ...
  return;
}

if (index.isUnsigned() ...
14145

The size calculations here could probably be simplified by doing something like this:

  • If getActiveBits of the index is greater than AddrBits, it's indexing outside
  • Construct an AddrBits-wide APInt containing the index value
  • Use umul_ovf with getTypeSizeInChars(ElementType); if that overflows, it's indexing outside
chrish_ericsson_atx marked an inline comment as done.Sep 1 2020, 7:26 AM

I will tinker with the math to simplify as you suggest. Working with APInt and APSInt seems to promulgate sensitive and brittle code, which makes trying alternative expressions more tedious than I'd like (which is why I bailed on an earlier attempt to simplify this). However, that same observation about brittle code supports the goal that simpler math would be safer, as there would presumably be fewer opportunities for AP/APSInt to misbehave as they interact.

clang/lib/Sema/SemaChecking.cpp
14133

There's a bit more code (starting at line 14094 in this patch set) that applies in all cases, so an early return here would prevent the "Array declared here" note from being generated.

Updating D86796: [Sema] Address-space sensitive index check for unbounded arrays

Refactored math as suggested by Bevin Hansson.

chrish_ericsson_atx marked 2 inline comments as done.Sep 2 2020, 10:13 AM
chrish_ericsson_atx added inline comments.
clang/lib/Sema/SemaChecking.cpp
14145

Refactored as suggested. I agree -- it's cleaner this way (eliminates the loop), and avoids a divide when no diag is required. But it's still hard to read because of all the APInt bitwidth hijinks necessary to do the math without tripping asserts.

ebevhan added inline comments.Sep 3 2020, 5:39 AM
clang/lib/Sema/SemaChecking.cpp
14133

Ah, the note.

I wonder if it wouldn't be cleaner (and avoid indenting the entire block) if that was just duplicated.

14141

Should this not be AddrBits + 1 if you want to add 1 below?

14142

Though, why is the +1 here? Isn't this already the maximum number of elements?

14143–14146

MaxElems should already be at a sufficient width here because of the earlier max. You can probably just do apElemBytes = apElemBytes.zextOrTrunc(MaxElems.getBitWidth())?

14148

What if index is wider than AddrBits, but the active bits are fewer? Then you might miss out on triggering the overflow case in the multiplication.

chrish_ericsson_atx marked 3 inline comments as done.Sep 3 2020, 8:52 AM
chrish_ericsson_atx added inline comments.
clang/lib/Sema/SemaChecking.cpp
14141

AddrBits + 1 would also work. I chose AddrBits << 1 figuring that the width would end up being a nice clean power of 2, but that's not necessarily true. Functionally, the effect of either approach is identical. I suppose + 1 is a bit more obvious, though. I'll make this change.

14142

Initial value of MaxElems is APInt::getMaxValue(AddrBits), which is the index of the last addressable CharUnit in the address space. Adding 1 makes it the total number of addressable CharUnits in the address space, which is what we want as the numerator for computing total number of elements of a given size that will fit in that address space.

14143–14146

Yep -- you are right. index and apElemBytes are already guaranteed to have equal width (>= AddrBits), and MaxElems is guaranteed to have the same width or double width. So the single zext of apElemBytes will do fine. Thanks!

14148

Line 13984 checks for active bits of product being less than AddrBits, which is the same case (since product, by definition, has same width as index). So I think this is covered. If I've misunderstood, please re-ping.

chrish_ericsson_atx marked 2 inline comments as done.

Addressed reviewer feedback.

chrish_ericsson_atx marked 2 inline comments as done.Sep 3 2020, 10:27 AM
chrish_ericsson_atx added inline comments.
clang/lib/Sema/SemaChecking.cpp
14133

Hard to say which is cleaner -- it's a tradeoff. Nesting level would be shallower if that code was duplicated, but then again, the duplication increases the chance of an incomplete fix should an issue be discovered in that code. Overall code would be slightly longer, as well (adding about 16 lines of code, but removing only 4). To me, the current strategy feels more surgical, but I'll change it if you feel more strongly about it than I do. Please re-ping if you do.

Refactored as ebevhan suggested to simplify patch.

chrish_ericsson_atx marked an inline comment as done.Sep 3 2020, 12:21 PM
chrish_ericsson_atx added inline comments.
clang/lib/Sema/SemaChecking.cpp
14133

I reconsidered and made the changes you suggested. Thanks.

NC. Pushing null change in hopes of re-triggering testing.

Unit test that failed is low-level LLVM test, which doesn't even exercise the
code I've changed here, so I'm assuming it's a spurious failure in the CI
machinery.

NC push did not resolve failed test. Rebased in hopes that whatever has
broken the build has been resolved in the intervening commits.

ebevhan added inline comments.Sep 4 2020, 4:28 AM
clang/lib/Sema/SemaChecking.cpp
14063

This could be early return to avoid the indentation.

14142

Ah, yes. Got indexes and sizes mixed up again.

14148

The overflow limit for _ovf is determined by the width of the APInt. If index is 32 bits wide but only has 14 bits active, and AddrBits is 16, then an umul_ovf might overflow past 16 bits but not for 32 bits since the product is the same width as the index. Then we won't detect the overflow.

ebevhan added inline comments.Sep 4 2020, 5:43 AM
clang/lib/Sema/SemaChecking.cpp
14148

Scratch this; I missed the second getActiveBits check. With that it should be fine.

Thanks for the excellent feedback, @ebevhan . @aaron.ballman , @krememek , or @rsmith , could one of you take a look at this change and if it's acceptable, please approve it? I have not requested commit privileges yet, either, so I will need your assistance to land this change, assuming it's acceptable.

Ping?

Sorry for the delayed review, but this managed to fall off my radar. Thank you for the ping!

clang/include/clang/Basic/DiagnosticSemaKinds.td
8854–8857

I'd combine this with the above diagnostic given how similar they are:

def warn_exceeds_max_addressable_bounds : Warning<
  "%select{array index %1|the pointer incremented by %1}0 refers past the last possible element for an array in %2-bit "
  "address space containing %3-bit (%4-byte) elements (max possible %5 element%s6)">,
  InGroup<ArrayBounds>;
clang/lib/Sema/SemaChecking.cpp
14043

IsUnboundedArray per our usual coding style rules.

14063

+1 to using early return here. I might even get rid of isUnboundedArray and just use !BaseType

14071

ElemBytes per usual coding style rules.

14077

Overflow, Product (I'll stop commenting on them -- just take a quick pass to make sure the new variables you introduce meet style rules.)

14085

since that -> since that

(removes a whitespace)

14094–14096
unsigned DiagID = ASE ?
  diag::warn_array_index_exceeds_max_addressable_bounds :
  diag::warn_ptr_arith_exceeds_max_addressable_bounds;
14100

It's not clear to me whether we need to pay the extra expense of doing reachability checks for the diagnostic -- do you have evidence that there would be a high false positive rate if we always emit the diagnostic?

14111

You can use const auto * here since the type is spelled out in the initialization. Same below.

14121

Similar comment here about reachability.

clang/test/Sema/const-eval.c
143–146

You should spell out the full form of the diagnostic the first time it appears in a test. You can use the shortened form for subsequent diagnostics.

Thank you for the comments, @aaron.ballman . I'll update with the changes you requested shortly. I did have some requests for clarification of you, though. Thanks!

clang/include/clang/Basic/DiagnosticSemaKinds.td
8854–8857

I was attempting to follow the pattern set by the preceding four definitions, which keep pointer math warnings and array index warnings separated, but are otherwise nearly identical. If there's value in that pattern for those other warnings, I would assume the same value applies to keeping these separated. Please re-ping if you disagree, otherwise, I'll leave these as two separate warnings.

clang/lib/Sema/SemaChecking.cpp
14063

@ebevhan's comment about early return was actually addressed in a previous diff... I should have marked it as such. My apologies.

I'd prefer to keep isUnboundedArray (with the case corrected, of course), as it seems much clearer than !BaseType in terms of expressing what we are actually checking here, and why.

14100

Sorry-- I'm not following you here, @aaron.ballman. What reachability check are you referring to? The diagnostic isn't conditioned on anything (at least not here, where your comment appears...), so I'm not sure what change you are suggesting that would "always emit the diagnostic"... Sorry to be slow on the uptake here... Could you clarify that for me?

14111

Fair point -- this was just cut-and-paste from the code I had to duplicate in order to address the early return comment from @ebevhan. :) I'll change it in both places.

14121

Similar request for clarification. ;)

Addressed feedback from Aaron Ballman

chrish_ericsson_atx marked 11 inline comments as done.Sep 11 2020, 4:45 AM

Addressed all feedback from Aaron, except for two comments about reachability that I don't understand.

chrish_ericsson_atx marked 2 inline comments as not done.Sep 11 2020, 7:21 AM
aaron.ballman accepted this revision.Sep 14 2020, 9:01 AM

LGTM!

clang/include/clang/Basic/DiagnosticSemaKinds.td
8854–8857

I don't think there's a whole lot of value, but I also don't insist on the change.

clang/lib/Sema/SemaChecking.cpp
14063

That's fine by me, and my brain scrambled a bit -- I thought there was an early return possible for !IsUnboundedArray but I was looking at the wrong line.

14100

DiagRuntimeBehavior() does a reachability check on the given expression and will not emit the diagnostic if the code is not reachable. This check can be somewhat expensive and so it's usually only used when we need to eliminate false positives in common code patterns.

Instead of using DiagRuntimeBehavior(), you could use Diag() instead which doesn't do the reachability check. However, looking further at the changes, I think you're correct to use DiagRuntimeBehavior() -- we use it elsewhere in this same function. Sorry for the noise!

This revision is now accepted and ready to land.Sep 14 2020, 9:01 AM