Adding an unsafe Gadget to represent and search (via matchers) for unsafe buffer accesses through array subscript operations.
Details
Diff Detail
Event Timeline
Yes, so this is how you work with the abstraction introduced in D137348. It's very easy to cover more and more patterns incrementally.
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
168–169 | As per some of the discussions, in the future the compiler might be able to recognize certain safe patterns, e.g., when there is a simple for loop with known bounds, or when both the index and the array size is statically known. I think here we need to make a very important design decision: Do we want the gadgets to have the right "safety" category when it is created (e.g., we have to be able to decide if a gadget is safe or not using matchers), or do we want some mechanisms to be able to promote an unsafe gadget to be a safe one? (E.g., do we want to be able to prove some unsafe gadgets safe using dataflow analysis in a later pass?) |
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
168–169 | (FWIW, this is a great question and I really appreciate you asking it!) | |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
16 | One test case I'd like to see is: sizeof(p[0]) -- should code in an unevaluated context be warned? | |
43 | Can you also add tests for function declarations like: void foo(int not_really_an_array[10]) { ... } template <int N> void bar(int (&actually_an_array)[N]) { ... } // Using a dependent type but we know it's a pointer. template <typename Ty> void baz(Ty *ptr) { ... } // Using a dependent type where we have no idea if it's a pointer. template <typename Ty> void quux(Ty ptr) { ... } |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
10–13 | I would expect this test file to grow quite a bit. I'm also curious what's the purpose of foo()in the examples. |
One test case I'd like to see is: sizeof(p[0]) -- should code in an unevaluated context be warned?
I think they should NOT be warned. We haven't addressed the issue of unevaluated context in our patches. I'm adding a test for code in unevaluated context so that we don't forget about it later.
Can you also add tests for function declarations like:
void foo(int not_really_an_array[10]) { ... } template <int N> void bar(int (&actually_an_array)[N]) { ... } // Using a dependent type but we know it's a pointer. template <typename Ty> void baz(Ty *ptr) { ... } // Using a dependent type where we have no idea if it's a pointer. template <typename Ty> void quux(Ty ptr) { ... }
Thanks for suggesting these test cases. They have been added in one of the following patches (https://reviews.llvm.org/D138318). That patch improves the matchers to recognize these cases.
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
10–13 | Thanks for the comment. I agree that they should have better names or at least explaining comments.
I make all testing expressions arguments of foo so that I do not have to create statements to use these expressions while avoiding irrelevant warnings. |
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
168–169 | My original design implies that safe gadgets are a separate hierarchy, so there will be a new gadget class for index zero, and this gadget will be changed to skip index zero. But I don't immediately see why such early separation is strictly necessary, other than for a bit of extra type safety (extra virtual methods of the UnsafeGadget class don't make sense on safe gadgets). We *do* have time to make this distinction later, before we get to emitting warnings. So maybe eventually we'll end up replacing isSafe() with a pure virtual method and deprecate SafeGadget and UnsafeGadget base classes, if we see it cause too much duplication or it turns out that the extra analysis necessary to establish safety can't be nicely implemented in ASTMatchers. In this case I'll admit that I over-engineered it a bit. | |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
10–13 | That's pretty cool but please note that when foo() is declared this way, it becomes a "C-style variadic function" - a very exotic construct that you don't normally see in code (the only practical example is the printf/scanf family of functions). So it may be good that we cover this exotic case from the start, but it may also be really bad that we don't cover the *basic* case. C++ offers a different way to declare variadic functions: variadic templates (https://en.cppreference.com/w/cpp/language/parameter_pack). These are less valuable to test because they expand to AST that's very similar to the basic case, but that also allows you to cover the basic case better. Or you can always make yourself happy with a few overloads that cover all your needs, it's not like we're worried about code duplication in tests: void foo(int); void foo(int, int); void foo(int, int, int); void foo(int, int, int, int); void foo(int, int, int, int, int); void foo(int, int, int, int, int, int); |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
10–13 | IMO its fine. I would probably call it sink() though. Ive used the same construct for the same reason in CSA tests with this name. |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
10–13 | I don't quite get what "basic case" refers to. Could you explain it to me a little more? |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
10–13 | By "basic case" I mean the ordinary, non-variadic function call with predefined set of arguments in the prototype. |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
10–13 | @ziqingluo-90 If the only purpose of foo() is to suppress warnings about unused values then you might also consider just suppressing those warnings with relevant -Wno... flags. |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
10–13 | I now see that using a C variadic function could bring some unexpected behaviors such as implicit casts from lvalue reference to rvalue. There are following patches whose tests also use foo. I plan to add an NFC patch later to remove all uses of foo, if this sounds ok. |
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
168–169 | I'd strongly prefer isSafe() method over inheritance. While adding safe and unsafe gadgets for zero indices is not too bad, if we plan to expand this to more cases, like known array size + compile time constant index, it will get harder and harder to keep all the gadgets in sync. We can end up missing cases or detecting a code snippet both as safe and unsafe. The isSafe method would basically get rid of a whole class of problems, where safe and unsafe versions of the gadgets are overlapping. That being said, I am not insisting doing this change in this PR, it is OK doing the change later in a follow-up. |
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
168–169 | Ok so we've had an interesting conversation about this, it's actually harder than that and we'll probably have a much more divergence between unsafe gadgets and safe gadgets covering these operations. The problem is that most unsafe gadgets don't allow fixing code anyway, you typically need more context. For example, ptr[i] may represent unsafe code and ptr[0] may represent safe code, but that doesn't make ptr[0] a safe gadget, because it doesn't make it fixable. In particular, &span[0] crashes when span is empty, whereas ptr[0] doesn't crash when ptr points to one-past-end of the array, so blinding replacing ptr[0] with span[0] will break some code. Of course even simple ptr, regardless of context, is always fixable to span.data(). But that's a low-quality fix that ruins libc++ runtime checks. So we'll do our best to avoid such fixes unless the extra context makes them inevitable. So I suspect that we'll allow safe and unsafe gadgets to partially overlap and exclude unsafe gadgets from fixit machine entirely (i.e. move the getFixits() method to the SafeGadget class). So, like I said in https://reviews.llvm.org/D137348?id=472990#inline-1345337, @malavikasamak is looking into this more deeply. |
clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp | ||
---|---|---|
16 | I think they should NOT be warned. We haven't addressed the issue of unevaluated context in our patches. I'm adding a test for code in unevaluated context so that we don't forget about it later. | |
43 | Thanks for suggesting these test cases. They have been added in one of the following patches (https://reviews.llvm.org/D138318). That patch improves the matchers to recognize these cases. |
This is firing even in checked length codes, is that expected?
example:
https://godbolt.org/z/Todje76ao
std::optional<uint16_t> result; bool ReadDevice(uint8_t* data, size_t len) { if (!result) return false; memset(data, 0, len); if (len > 0) data[0] = (result.value() >> 8) & 0xFF; if (len > 1) data[1] = result.value() & 0xFF; return true; }
<source>:7:26: warning: 'data' is an unsafe pointer used for buffer access [-Wunsafe-buffer-usage] bool ReadDevice(uint8_t* data, size_t len) { ~~~~~~~~~^~~~ <source>:13:20: note: used in buffer access here if (len > 1) data[1] = result.value() & 0xFF; ^~~~
Yes, it is expected. The unsafe buffer analysis is syntax-based. The analysis warns operations that do not follow the buffer-safe programming model we are suggesting. The programming model prohibits pointer arithmetic. In the model, pointer arithmetic and buffer access can be done using hardened libc++ facilities such as std::span.
More information about the analysis and the programming model can be found at https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734.
To suppress the warning, you can either turn the analysis off using -Wno-unsafe-buffer-usage or put code in a pair of opt-out pragmas #pragma clang unsafe_buffer_usage begin & #pragma clang unsafe_buffer_usage end
As per some of the discussions, in the future the compiler might be able to recognize certain safe patterns, e.g., when there is a simple for loop with known bounds, or when both the index and the array size is statically known.
I think here we need to make a very important design decision: Do we want the gadgets to have the right "safety" category when it is created (e.g., we have to be able to decide if a gadget is safe or not using matchers), or do we want some mechanisms to be able to promote an unsafe gadget to be a safe one? (E.g., do we want to be able to prove some unsafe gadgets safe using dataflow analysis in a later pass?)