Details
Diff Detail
Event Timeline
Ok let's explain what's going on.
We're basically realizing that most unsafe patterns aren't fixable without additional context. The original design was like
"Suppose we see arr[n] in the code. This is both an indication that the code is unsafe, and an acknowledgement that this use of arr doesn't require fixing as all safe containers will act as a drop-in replacement."
This isn't correct though, because overloaded operator [] is very different from raw operator []. Namely, if the array has N elements, then &arr[N] is valid code when the operator is raw but undefined behavior when the operator is overloaded (which would be caught by -fsanitize=library as a hard crash). Therefore, even though arr[n] is an unsafe pattern, we cannot actually fix it until we discover more context, such as an implicit lvalue-to-rvalue conversion of the subscript result.
Well, technically, we can always "fix" expression arr[n] by replacing it with arr.data()[n]. We don't even need to include [n] in our context, we can replace any occurrence of arr with arr.data() and call it a day. Such "fix", even though it works in every context, is hovewer a "low-quality" fix because it defeats the purpose of safe buffer usage: the container will no longer be able to check bounds in this operation at runtime.
Of course, in some contexts such "low-quality fix" may be the only possible fix.
Which naturally leads to the idea of fixable gadgets acting as "refinements" of each other: out of all gadgets that consume the given use of the variable we could pick the one that acknowledges the most context, because it provides the most high-quality fix. This could potentially save us some work, but it could also lead to sometimes emitting low-quality fixits when manual user intervention would have led to much better code. This would also maintain the uniformity between unsafe and safe gadgets: unsafe gadgets are still fixable gadgets, just most of them will be "refined" later.
Another solution could be to turn gadgets inside out by using ParentMap to discover the context as part of fixit. Such solution is prone to rapidly getting out of control when contexts of different variable use sites start overlapping (eg. in expressions like ptr1 = ptr2).
So for now, in order to avoid overengineering, we stick to a very simple approach instead: keep unsafe gadgets completely separate, not use them for fixit generation at all. Don't have gadgets refine each other, but only implement gadgets that capture *all* the necessary context. Now the only thing they have in common with safe gadgets is that they both are represented as ASTMatcher-based patterns for which the code can be easily scanned. Therefore the getFixits() method gets moved to the safe gadget interface, and unsafe gadgets don't even need to worry about it. If an unsafe gadget turns out to be fixable without additional context, it will have to be duplicated as safe gadget. But this is fine because that's like 1-2 extra classes in a hierarchy of ~30.
I did a superficial pass and the code LGTM (just nits).
Let me add one more thought to our motivation here.
@malavikasamak has pointed out that replacing arr with arr.data() when its type changes from T* to std::span<T> is not a universally correct fix. A counter-example is LHS of an assignment as arr.data() is not an LValue:
arr = nullptr; => arr.data() = nullptr; // error: expression is not assignable
Which means that not only would gadgets that ignore relevant context be of lower quality but in some cases these would also be incorrect.
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
579 | This looks like a debug print in the parent commit. We should remove it from this patch. | |
619 | (unrelated to this patch) | |
651 | Is this a left-over debug print? |
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
200 | Do we still keep these terms "safe gadget" and "unsafe gadget" for referring to something else or we should forget about them from now on? | |
651 | I think an assertion assert(false && "This should not be entered" ) might be better. But why this code should be unreachable? |
Yeah that's right, my bad, it's only a correct fix for ImplicitCastExpr<LValueToRValue>(arr). You cannot turn a std::span or std::array into a pointer lvalue, so it's fine that we don't have a low-quality fix for such cases.
Looks great to me as well! I have a few nitpicks.
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
159 | I already nuked this function, so you probably need to rebase :( | |
185 | getBaseStmt() can probably go to WarningGadget now? | |
190 | Hmm so this function has to stay on the Gadget class because we want to group warnings by variables. Ok right. | |
200 | No let's wipe them out! | |
641–642 | In your case *every* gadget in the list is a warning gadget. So you can simply check whether the list is empty. |
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
223 | Yes. Good catch. Thank you. |
clang/lib/Analysis/UnsafeBufferUsage.cpp | ||
---|---|---|
526 |
Nit: Can we add the newline?