This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Safe-buffers re-architecture to introduce Fixable gadgets
ClosedPublic

Authored by malavikasamak on Dec 14 2022, 2:57 PM.

Diff Detail

Event Timeline

malavikasamak created this revision.Dec 14 2022, 2:57 PM
Herald added a project: Restricted Project. · View Herald Transcript
malavikasamak requested review of this revision.Dec 14 2022, 2:57 PM
NoQ added a comment.EditedDec 14 2022, 9:17 PM

Ok let's explain what's going on.

We're basically realizing that most unsafe patterns aren't fixable without additional context. The original design was like

"Suppose we see arr[n] in the code. This is both an indication that the code is unsafe, and an acknowledgement that this use of arr doesn't require fixing as all safe containers will act as a drop-in replacement."

This isn't correct though, because overloaded operator [] is very different from raw operator []. Namely, if the array has N elements, then &arr[N] is valid code when the operator is raw but undefined behavior when the operator is overloaded (which would be caught by -fsanitize=library as a hard crash). Therefore, even though arr[n] is an unsafe pattern, we cannot actually fix it until we discover more context, such as an implicit lvalue-to-rvalue conversion of the subscript result.

Well, technically, we can always "fix" expression arr[n] by replacing it with arr.data()[n]. We don't even need to include [n] in our context, we can replace any occurrence of arr with arr.data() and call it a day. Such "fix", even though it works in every context, is hovewer a "low-quality" fix because it defeats the purpose of safe buffer usage: the container will no longer be able to check bounds in this operation at runtime.

Of course, in some contexts such "low-quality fix" may be the only possible fix.

Which naturally leads to the idea of fixable gadgets acting as "refinements" of each other: out of all gadgets that consume the given use of the variable we could pick the one that acknowledges the most context, because it provides the most high-quality fix. This could potentially save us some work, but it could also lead to sometimes emitting low-quality fixits when manual user intervention would have led to much better code. This would also maintain the uniformity between unsafe and safe gadgets: unsafe gadgets are still fixable gadgets, just most of them will be "refined" later.

Another solution could be to turn gadgets inside out by using ParentMap to discover the context as part of fixit. Such solution is prone to rapidly getting out of control when contexts of different variable use sites start overlapping (eg. in expressions like ptr1 = ptr2).

So for now, in order to avoid overengineering, we stick to a very simple approach instead: keep unsafe gadgets completely separate, not use them for fixit generation at all. Don't have gadgets refine each other, but only implement gadgets that capture *all* the necessary context. Now the only thing they have in common with safe gadgets is that they both are represented as ASTMatcher-based patterns for which the code can be easily scanned. Therefore the getFixits() method gets moved to the safe gadget interface, and unsafe gadgets don't even need to worry about it. If an unsafe gadget turns out to be fixable without additional context, it will have to be duplicated as safe gadget. But this is fine because that's like 1-2 extra classes in a hierarchy of ~30.

jkorous accepted this revision.Dec 15 2022, 5:36 PM

I did a superficial pass and the code LGTM (just nits).

Let me add one more thought to our motivation here.
@malavikasamak has pointed out that replacing arr with arr.data() when its type changes from T* to std::span<T> is not a universally correct fix. A counter-example is LHS of an assignment as arr.data() is not an LValue:

arr = nullptr;
=>
arr.data() = nullptr; // error: expression is not assignable

Which means that not only would gadgets that ignore relevant context be of lower quality but in some cases these would also be incorrect.

clang/lib/Analysis/UnsafeBufferUsage.cpp
579

This looks like a debug print in the parent commit. We should remove it from this patch.

620

(unrelated to this patch)
In some follow-up patch we should rename Pushed to make the code easier to understand. IIUC it means "is there a variable declaration this warning relates to?".

652

Is this a left-over debug print?

This revision is now accepted and ready to land.Dec 15 2022, 5:36 PM
ziqingluo-90 added inline comments.Dec 15 2022, 5:39 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
202

Do we still keep these terms "safe gadget" and "unsafe gadget" for referring to something else or we should forget about them from now on?

652

I think an assertion assert(false && "This should not be entered" ) might be better. But why this code should be unreachable?

NoQ added a comment.Dec 15 2022, 5:57 PM

Let me add one more thought to our motivation here.
@malavikasamak has pointed out that replacing arr with arr.data() when its type changes from T* to std::span<T> is not a universally correct fix. A counter-example is LHS of an assignment as arr.data() is not an LValue:

arr = nullptr;
=>
arr.data() = nullptr; // error: expression is not assignable

Which means that not only would gadgets that ignore relevant context be of lower quality but in some cases these would also be incorrect.

Yeah that's right, my bad, it's only a correct fix for ImplicitCastExpr<LValueToRValue>(arr). You cannot turn a std::span or std::array into a pointer lvalue, so it's fine that we don't have a low-quality fix for such cases.

NoQ added a comment.Dec 15 2022, 6:07 PM

Looks great to me as well! I have a few nitpicks.

clang/lib/Analysis/UnsafeBufferUsage.cpp
159

I already nuked this function, so you probably need to rebase :(

187

getBaseStmt() can probably go to WarningGadget now?

192

Hmm so this function has to stay on the Gadget class because we want to group warnings by variables. Ok right.

202

No let's wipe them out!

643

In your case *every* gadget in the list is a warning gadget. So you can simply check whether the list is empty.

NoQ retitled this revision from [WIP] Safe-buffers re-architecture to introduce Fixable gadgets to [WIP][-Wunsafe-buffer-usage] Safe-buffers re-architecture to introduce Fixable gadgets.Dec 15 2022, 6:15 PM
malavikasamak marked 6 inline comments as done.

Addressed some comments. Need to rebase to Artem's changes.

malavikasamak added inline comments.Dec 20 2022, 1:03 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
202

I will do a round and wipe these out from comments.

652

This is a left over debug message I added to test the re-architecture works as intended. Currently, it should not be reachable as we don't have any fixable gadgets available.

malavikasamak retitled this revision from [WIP][-Wunsafe-buffer-usage] Safe-buffers re-architecture to introduce Fixable gadgets to [-Wunsafe-buffer-usage] Safe-buffers re-architecture to introduce Fixable gadgets.
jkorous added inline comments.Jan 4 2023, 5:53 PM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
30

Nit: Can we add the newline?

30

Please ignore - this is a stale comment.

clang/lib/Analysis/UnsafeBufferUsage.cpp
224

I think this is a bug.

malavikasamak added inline comments.Jan 4 2023, 5:58 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
224

Yes. Good catch. Thank you.

jkorous added inline comments.Jan 4 2023, 6:17 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
527
malavikasamak marked an inline comment as done.
This revision was landed with ongoing or failed builds.Jan 6 2023, 11:45 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2023, 11:45 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript