This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`
ClosedPublic

Authored by ziqingluo-90 on Feb 1 2023, 6:38 PM.

Details

Summary

For an expression of the form &DRE[any-expr] under an Unspecified Pointer Context (UPC), we generate a fix-it for it with respect to a strategy. In case the strategy is std::span (it is the only supported one for now), the fix-it replaces the expression with &DRE.data()[any-expr].

A UPC includes at least the contexts where

  • the expression is being casted to an integer; and
  • the expression is an argument of a call to a function that is not marked unsafe.

Diff Detail

Event Timeline

ziqingluo-90 created this revision.Feb 1 2023, 6:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2023, 6:38 PM
ziqingluo-90 requested review of this revision.Feb 1 2023, 6:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2023, 6:38 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[*]` to `DRE.data() + *` to [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[any]` to `DRE.data() + any`.
ziqingluo-90 edited the summary of this revision. (Show Details)
jkorous added inline comments.Feb 6 2023, 12:00 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
155

I am just wondering how does the callee matcher work in situation with multiple re-declarations 🤔

Something like this:

void foo(int* ptr);
[[clang::unsafe_buffer_usage]] void foo(int* ptr);
void foo(int* ptr);

void bar(int* ptr) {
  foo(ptr);
}
501

I am wondering what will happen in the weird corner-case of &5[ptr] - I feel the Fix-It we produce would be incorrect.

Here's a suggestion - we could use hasLHS instead of hasBase here and add a FIXME that when we find the time we should also properly support the corner-case. That would be a pretty low-priority though - we definitely have more important patterns to support first.

WDYT?

834

Since we use std::nullopt in getFixits to signal errors - we should either use the same strategy in fixUPCAddressofArraySubscriptWithSpan or translate the empty return value from it to nullopt here.
(FWIWI I am leaning towards the former.)
Forwarding the empty Fix-It would be incorrect.

ziqingluo-90 added inline comments.Feb 6 2023, 3:05 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
501

I'm not sure if I understand your concern. For &5[ptr], we will generate a fix-it ptr.data() + 5 in cases ptr is assigned a span strategy. It is same as the case of &ptr[5].

834

Oh, that's a bug I made! Thank you for finding it for me.

ziqingluo-90 added inline comments.Feb 6 2023, 3:17 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
155

I think we are fine. According to the doc of FunctionDecl:

/// Represents a function declaration or definition.
///
/// Since a given function can be declared several times in a program,
/// there may be several FunctionDecls that correspond to that
/// function. Only one of those FunctionDecls will be found when
/// traversing the list of declarations in the context of the
/// FunctionDecl (e.g., the translation unit); this FunctionDecl
/// contains all of the information known about the function. Other,
/// previous declarations of the function are available via the
/// getPreviousDecl() chain.
NoQ added a comment.Feb 6 2023, 3:46 PM

Why do we prefer DRE.data() + any to &DRE.data()[any]? It could be much less intrusive this way, and the safety guarantees are the same.

Why do we prefer DRE.data() + any to &DRE.data()[any]? It could be much less intrusive this way, and the safety guarantees are the same.

It is actually (DRE.data() + any) versus &DRE.data()[any]. Are they quite the same in terms of being intrusive?

ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[any]` to `DRE.data() + any` to [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[any]` to `(DRE.data() + any)`.

Let fixUPCAddressofArraySubscriptWithSpan return std::nullopt instead of an empty list when we should give up on the fix-it.

Add a few test cases for some corner cases.

jkorous added inline comments.Feb 6 2023, 5:38 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
155

I see! Sound like we should be fine indeed and the test seems to confirm.
Thank you!

501

Oh, my bad! I assumed (AKA didn't check) that we're just replacing the parts of the code around the DRE and index.
You're right. Please ignore me :)

ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[any]` to `(DRE.data() + any)` to [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `(DRE.data() + any)`.Feb 16 2023, 3:42 PM

Now we have different FixableGadgets that may match the same Stmt (representing a context).
So in order to discover all "Fixable"s, we can no longer match anyOf FixableGadgets' matchers. Instead, we match eachOf them.

ziqingluo-90 added inline comments.Feb 17 2023, 1:46 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
678

Change from anyOf to eachOf

NoQ added a comment.Mar 2 2023, 5:10 PM

Why do we prefer DRE.data() + any to &DRE.data()[any]? It could be much less intrusive this way, and the safety guarantees are the same.

It is actually (DRE.data() + any) versus &DRE.data()[any]. Are they quite the same in terms of being intrusive?

They may be equally verbose but I think the latter looks much more similar to the original code in terms of shape. This could be valuable because there's probably a reason why the programmer preferred to write it that way. Or even if there wasn't any reason, the users still may find it surprising that we change the shape for no apparent reason.

jkorous added a comment.EditedMar 3 2023, 10:22 AM

This is an interesting topic. In the abstract I see the question as: "Should the Fix-Its prioritize how the code will fit the desired end state (presumably modern idiomatic C++) or carefully respect the state of the code as is now?"

The only thing I feel pretty strongly about is that no matter what philosophy we decide to use here we should apply it consistently to all our Fix-Its (which might or might not already be the case).

And FWIW I can also imagine at some point in the future we might either have two dialects of the Fix-Its or that a separate modernizer tool (completely independent of Safe Buffers) could suggest transformations like:
"Would you like to change &DRE.data()[any] to (DRE.data() + any)?"

This is an interesting topic. In the abstract I see the question as: "Should the Fix-Its prioritize how the code will fit the desired end state (presumably modern idiomatic C++) or carefully respect the state of the code as is now?"

The only thing I feel pretty strongly about is that no matter what philosophy we decide to use here we should apply it consistently to all our Fix-Its (which might or might not already be the case).

And FWIW I can also imagine at some point in the future we might either have two dialects of the Fix-Its or that a separate modernizer tool (completely independent of Safe Buffers) could suggest transformations like:
"Would you like to change &DRE.data()[any] to (DRE.data() + any)?"

Fantastic topic! :) In our experience at Google, we've generally followed the philosophy of leaving the code at least as good as it was before we touched it. So, if the idiom is archaic, but we're just adjusting it, that's fine (as in this example), but our tools shouldn't generate non-idiomatic (or anti-idiomic) code. We've also often taken the path of "leave cleanups to a separate pass", especially when we already have say, a clang tidy check, that does that kind of clean up running regularly. But, this one is more a judgment call. I'd probably lean towards "make the code better once you're at it", but certainly see the conservative argument.

t-rasmud added inline comments.Mar 6 2023, 10:51 AM
clang/lib/Analysis/UnsafeBufferUsage.cpp
885

Nit: "costly"

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-addressof-arraysubscript.cpp
75

Can we have a test case for &p[0]?
IIUC, this would generate a fixit p.data() + 0 which is correct but we might want to optimize it to p.data() sometime in the future.

NoQ added a comment.Mar 7 2023, 3:06 PM

This is an interesting topic. In the abstract I see the question as: "Should the Fix-Its prioritize how the code will fit the desired end state (presumably modern idiomatic C++) or carefully respect the state of the code as is now?"

The only thing I feel pretty strongly about is that no matter what philosophy we decide to use here we should apply it consistently to all our Fix-Its (which might or might not already be the case).

And FWIW I can also imagine at some point in the future we might either have two dialects of the Fix-Its or that a separate modernizer tool (completely independent of Safe Buffers) could suggest transformations like:
"Would you like to change &DRE.data()[any] to (DRE.data() + any)?"

Fantastic topic! :) In our experience at Google, we've generally followed the philosophy of leaving the code at least as good as it was before we touched it. So, if the idiom is archaic, but we're just adjusting it, that's fine (as in this example), but our tools shouldn't generate non-idiomatic (or anti-idiomic) code. We've also often taken the path of "leave cleanups to a separate pass", especially when we already have say, a clang tidy check, that does that kind of clean up running regularly. But, this one is more a judgment call. I'd probably lean towards "make the code better once you're at it", but certainly see the conservative argument.

That's a fantastic topic in general, but in *this* case I'm really not sure which one's ultimately "better", I'd totally write code both ways and at different times prefer one over the other.

I honestly actually prefer code like &DRE[any]/&DRE.data()[any] to the one with pointer arithmetic, specifically because it avoids the subject of pointer arithmetic, but instead talks about "Let's take this object, for which we already have a name, and take its address". The readers don't have to typecheck the expression in their heads in order to figure out whether correct offset multipliers are applied during pointer arithmetic. Another advantage of &DRE.data()[any] is that it's really easy to transform it to idiomatic &DRE[any] once the user confirms that the bounds checks don't actually get in the way in their case. This is much harder to achieve with DRE.data() + any given that DRE + any isn't valid code anymore.

Thanks for the valuable discussion about the philosophy on the ideal forms of fix-its. In this case, I think &DRE.data()[any] and (DRE.data() + any) are both straightforward enough for the user to tell what has been changed and why we change that. And I believe both forms are idiomatic so that the user are likely happy with the form. I will keep this discussion in mind as we have other cases whose fix-its may be less idiomatic, such as D144304.

Since we have to choose one form for this case, I buy @NoQ 's argument to use &DRE.data()[any].

In addition, thanks to @t-rasmud for reminding me of that &DRE[0] can be converted to DRE.data() optimally. In such a specific case, I think we do want it to be the most concise form without confusing the user.

ziqingluo-90 marked 2 inline comments as done.Mar 7 2023, 5:46 PM
malavikasamak added inline comments.Mar 24 2023, 5:20 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
901

The title and summary of this patch indicates this gadget emits fixit of the form &DRE.data() + any when it encounters &DRE[any], but the code seems to emits &DRE.data()[any] instead? If yes, can you please update the summary?

malavikasamak accepted this revision.Mar 24 2023, 6:39 PM
This revision is now accepted and ready to land.Mar 24 2023, 6:39 PM
ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `(DRE.data() + any)` to [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`.Mar 28 2023, 2:13 PM
ziqingluo-90 edited the summary of this revision. (Show Details)
ziqingluo-90 marked an inline comment as done.
NoQ accepted this revision.Apr 3 2023, 12:27 PM

Looks great! LGTM except there's some dead code.

clang/lib/Analysis/UnsafeBufferUsage.cpp
509–512

These dyn_casts are already checked by the matcher. They can be turned into casts and this function can return {DRE} unconditionally.

830

Similarly, this check is redundant, it's already guaranteed by the matcher.

882–884

Same here!

ziqingluo-90 marked 3 inline comments as done.Apr 3 2023, 6:18 PM
This revision was landed with ongoing or failed builds.Apr 4 2023, 1:27 PM
This revision was automatically updated to reflect the committed changes.

Just a heads up it seems like a lot of premerge checks builds are now showing this test as failing on Windows x64:

********************
Failed Tests (1):
  Clang :: SemaCXX/warn-unsafe-buffer-usage-fixits-addressof-arraysubscript.cpp

Here are some links from buildkite for a few of the builds that failed on this test:
https://buildkite.com/llvm-project/premerge-checks/builds/145054
https://buildkite.com/llvm-project/premerge-checks/builds/145048
https://buildkite.com/llvm-project/premerge-checks/builds/145044

I've reverted this, please try to fix the test then reland.

The full test output can be downloaded from the buildbot page, if you need any more information let me know.

ziqingluo-90 added a comment.EditedApr 5 2023, 3:42 PM

I've reverted this, please try to fix the test then reland.

The full test output can be downloaded from the buildbot page, if you need any more information let me know.

Thank you @DavidSpickett and @emgullufsen for catching this issue and reverting the patch for me. I have added a specific target triple for the test and re-laned it.