This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Add warnings for unsafe buffer accesses by array subscript operations
ClosedPublic

Authored by ziqingluo-90 on Nov 3 2022, 5:42 PM.

Details

Summary

Adding an unsafe Gadget to represent and search (via matchers) for unsafe buffer accesses through array subscript operations.

Diff Detail

Event Timeline

ziqingluo-90 created this revision.Nov 3 2022, 5:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 5:43 PM
Herald added a subscriber: rnkovacs. · View Herald Transcript
ziqingluo-90 requested review of this revision.Nov 3 2022, 5:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 5:43 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
NoQ added a comment.Nov 3 2022, 6:59 PM

Yes, so this is how you work with the abstraction introduced in D137348. It's very easy to cover more and more patterns incrementally.

ziqingluo-90 retitled this revision from -Wunsafe-buffer-usage: adding warnings for unsafe buffer accesses by array subscript operations to -Wunsafe-buffer-usage: add warnings for unsafe buffer accesses by array subscript operations.Nov 4 2022, 11:48 AM
NoQ retitled this revision from -Wunsafe-buffer-usage: add warnings for unsafe buffer accesses by array subscript operations to [-Wunsafe-buffer-usage] Add warnings for unsafe buffer accesses by array subscript operations.Nov 17 2022, 8:01 PM
xazax.hun added inline comments.Nov 18 2022, 2:43 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
148–149

As per some of the discussions, in the future the compiler might be able to recognize certain safe patterns, e.g., when there is a simple for loop with known bounds, or when both the index and the array size is statically known.

I think here we need to make a very important design decision: Do we want the gadgets to have the right "safety" category when it is created (e.g., we have to be able to decide if a gadget is safe or not using matchers), or do we want some mechanisms to be able to promote an unsafe gadget to be a safe one? (E.g., do we want to be able to prove some unsafe gadgets safe using dataflow analysis in a later pass?)

aaron.ballman added inline comments.Nov 21 2022, 10:30 AM
clang/lib/Analysis/UnsafeBufferUsage.cpp
148–149

(FWIW, this is a great question and I really appreciate you asking it!)

clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
16

One test case I'd like to see is: sizeof(p[0]) -- should code in an unevaluated context be warned?

43

Can you also add tests for function declarations like:

void foo(int not_really_an_array[10]) { ... }

template <int N>
void bar(int (&actually_an_array)[N]) { ... }

// Using a dependent type but we know it's a pointer.
template <typename Ty>
void baz(Ty *ptr) { ... }

// Using a dependent type where we have no idea if it's a pointer.
template <typename Ty>
void quux(Ty ptr) { ... }
steakhal added inline comments.
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

I would expect this test file to grow quite a bit.
As such, I think we should have more self-descriptive names for these functions.

I'm also curious what's the purpose of foo()in the examples.

One test case I'd like to see is: sizeof(p[0]) -- should code in an unevaluated context be warned?

I think they should NOT be warned. We haven't addressed the issue of unevaluated context in our patches. I'm adding a test for code in unevaluated context so that we don't forget about it later.

Can you also add tests for function declarations like:

void foo(int not_really_an_array[10]) { ... }

template <int N>
void bar(int (&actually_an_array)[N]) { ... }

// Using a dependent type but we know it's a pointer.
template <typename Ty>
void baz(Ty *ptr) { ... }

// Using a dependent type where we have no idea if it's a pointer.
template <typename Ty>
void quux(Ty ptr) { ... }

Thanks for suggesting these test cases. They have been added in one of the following patches (https://reviews.llvm.org/D138318). That patch improves the matchers to recognize these cases.

ziqingluo-90 added inline comments.Nov 28 2022, 1:15 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

Thanks for the comment. I agree that they should have better names or at least explaining comments.

I'm also curious what's the purpose of foo()in the examples.

I make all testing expressions arguments of foo so that I do not have to create statements to use these expressions while avoiding irrelevant warnings.

NoQ added inline comments.Nov 28 2022, 1:45 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
148–149

My original design implies that safe gadgets are a separate hierarchy, so there will be a new gadget class for index zero, and this gadget will be changed to skip index zero. But I don't immediately see why such early separation is strictly necessary, other than for a bit of extra type safety (extra virtual methods of the UnsafeGadget class don't make sense on safe gadgets). We *do* have time to make this distinction later, before we get to emitting warnings.

So maybe eventually we'll end up replacing isSafe() with a pure virtual method and deprecate SafeGadget and UnsafeGadget base classes, if we see it cause too much duplication or it turns out that the extra analysis necessary to establish safety can't be nicely implemented in ASTMatchers. In this case I'll admit that I over-engineered it a bit.

clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

That's pretty cool but please note that when foo() is declared this way, it becomes a "C-style variadic function" - a very exotic construct that you don't normally see in code (the only practical example is the printf/scanf family of functions). So it may be good that we cover this exotic case from the start, but it may also be really bad that we don't cover the *basic* case.

C++ offers a different way to declare variadic functions: variadic templates (https://en.cppreference.com/w/cpp/language/parameter_pack). These are less valuable to test because they expand to AST that's very similar to the basic case, but that also allows you to cover the basic case better.

Or you can always make yourself happy with a few overloads that cover all your needs, it's not like we're worried about code duplication in tests:

void foo(int);
void foo(int, int);
void foo(int, int, int);
void foo(int, int, int, int);
void foo(int, int, int, int, int);
void foo(int, int, int, int, int, int);
steakhal added inline comments.Nov 28 2022, 2:22 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

IMO its fine. I would probably call it sink() though. Ive used the same construct for the same reason in CSA tests with this name.

ziqingluo-90 added inline comments.Nov 28 2022, 3:02 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

I don't quite get what "basic case" refers to. Could you explain it to me a little more?

NoQ added inline comments.Nov 28 2022, 10:06 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

By "basic case" I mean the ordinary, non-variadic function call with predefined set of arguments in the prototype.

jkorous added inline comments.Dec 1 2022, 9:26 AM
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

@ziqingluo-90 If the only purpose of foo() is to suppress warnings about unused values then you might also consider just suppressing those warnings with relevant -Wno... flags.

ziqingluo-90 added inline comments.Dec 1 2022, 11:33 AM
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
10–13

I now see that using a C variadic function could bring some unexpected behaviors such as implicit casts from lvalue reference to rvalue. There are following patches whose tests also use foo. I plan to add an NFC patch later to remove all uses of foo, if this sounds ok.

xazax.hun added inline comments.Dec 5 2022, 3:19 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
148–149

I'd strongly prefer isSafe() method over inheritance. While adding safe and unsafe gadgets for zero indices is not too bad, if we plan to expand this to more cases, like known array size + compile time constant index, it will get harder and harder to keep all the gadgets in sync. We can end up missing cases or detecting a code snippet both as safe and unsafe. The isSafe method would basically get rid of a whole class of problems, where safe and unsafe versions of the gadgets are overlapping. That being said, I am not insisting doing this change in this PR, it is OK doing the change later in a follow-up.

NoQ added inline comments.Dec 5 2022, 6:58 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
148–149

Ok so we've had an interesting conversation about this, it's actually harder than that and we'll probably have a much more divergence between unsafe gadgets and safe gadgets covering these operations. The problem is that most unsafe gadgets don't allow fixing code anyway, you typically need more context. For example, ptr[i] may represent unsafe code and ptr[0] may represent safe code, but that doesn't make ptr[0] a safe gadget, because it doesn't make it fixable. In particular, &span[0] crashes when span is empty, whereas ptr[0] doesn't crash when ptr points to one-past-end of the array, so blinding replacing ptr[0] with span[0] will break some code.

Of course even simple ptr, regardless of context, is always fixable to span.data(). But that's a low-quality fix that ruins libc++ runtime checks. So we'll do our best to avoid such fixes unless the extra context makes them inevitable.

So I suspect that we'll allow safe and unsafe gadgets to partially overlap and exclude unsafe gadgets from fixit machine entirely (i.e. move the getFixits() method to the SafeGadget class).

So, like I said in https://reviews.llvm.org/D137348?id=472990#inline-1345337, @malavikasamak is looking into this more deeply.

NoQ added inline comments.Dec 5 2022, 7:03 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
135–156

Carryover from the nearby patch - let's make a named constant for "arraySubscr"!

155

Carryover from nearby patch - let's use covariant return types!

ziqingluo-90 added inline comments.Dec 6 2022, 4:56 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
16

I think they should NOT be warned. We haven't addressed the issue of unevaluated context in our patches. I'm adding a test for code in unevaluated context so that we don't forget about it later.

43

Thanks for suggesting these test cases. They have been added in one of the following patches (https://reviews.llvm.org/D138318). That patch improves the matchers to recognize these cases.

Addressing all the comments we have so far.

NoQ accepted this revision.Dec 15 2022, 6:59 PM

LGTM! Let's land once I land the previous patches.

This revision is now accepted and ready to land.Dec 15 2022, 6:59 PM
This revision was landed with ongoing or failed builds.Dec 16 2022, 5:42 PM
This revision was automatically updated to reflect the committed changes.

This is firing even in checked length codes, is that expected?

example:
https://godbolt.org/z/Todje76ao

std::optional<uint16_t> result;
bool ReadDevice(uint8_t* data, size_t len) {
if (!result)
      return false;
  memset(data, 0, len);
  if (len > 0)     data[0] = (result.value() >> 8) & 0xFF;
  if (len > 1)     data[1] = result.value() & 0xFF;
  return true;
}
<source>:7:26: warning: 'data' is an unsafe pointer used for buffer access [-Wunsafe-buffer-usage]
bool ReadDevice(uint8_t* data, size_t len) {
                ~~~~~~~~~^~~~
<source>:13:20: note: used in buffer access here
  if (len > 1)     data[1] = result.value() & 0xFF;
                   ^~~~

This is firing even in checked length codes, is that expected?

Yes, it is expected. The unsafe buffer analysis is syntax-based. The analysis warns operations that do not follow the buffer-safe programming model we are suggesting. The programming model prohibits pointer arithmetic. In the model, pointer arithmetic and buffer access can be done using hardened libc++ facilities such as std::span.

More information about the analysis and the programming model can be found at https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734.

To suppress the warning, you can either turn the analysis off using -Wno-unsafe-buffer-usage or put code in a pair of opt-out pragmas #pragma clang unsafe_buffer_usage begin & #pragma clang unsafe_buffer_usage end