This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Use `Strategy` to determine whether to fix a parameter
ClosedPublic

Authored by ziqingluo-90 on Aug 8 2023, 2:35 PM.

Details

Summary
  • Use Strategy to determine whether to fix a parameter.
  • Fix the Strategy construction so that only variables on the graph are assigned the std::span strategy
  • Extend PointerAssignmentGadget and PointerInitGadget to generate fix-its for cases where the left-hand side has won't fix strategy and the right-hand side has std::span strategy.

Diff Detail

Event Timeline

ziqingluo-90 created this revision.Aug 8 2023, 2:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2023, 2:35 PM
ziqingluo-90 requested review of this revision.Aug 8 2023, 2:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2023, 2:35 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ziqingluo-90 added inline comments.Aug 8 2023, 2:50 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
1637–1638

NFC. Just noticed that getPastLoc is the better function to call here.

2743

VisitedVars are the set of variables on the computed graph. The previous UnsafeVars is a superset of it, including safe variables that have FixableGadgets discovered. We do not want to assign strategies other than Won't fix to those safe variables.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
30

Both b and a will be changed to have std::span type. So this assignment needs no fix-it. Same reason applies to similar changes in the rest of this test file.

80

y does not have to be changed to a safe-type while x does. So we need to fix this assignment. Same reason applies to similar changes in the rest of this test file.

clang/test/SemaCXX/warn-unsafe-buffer-usage-multi-decl-ptr-init.cpp
29

We now support to fix int *r = q. So all the unsafe variables can be fixed.

NoQ added a comment.Aug 8 2023, 4:57 PM

Ooo nice!!

Extend PointerAssignmentGadget and PointerInitGadget to generate fix-its for cases where the left-hand side has won't fix strategy and the right-hand side has std::span strategy.

Hmm, is this intertwined with other changes, or is this completely separate? Also my head appears to be blank; did we reach any conclusion about the nature of "single" RHS-es and usefulness of annotating them as such? Which could be an approach that would encourage people to propagate more bounds in the backwards direction by default.

clang/lib/Analysis/UnsafeBufferUsage.cpp
1573

MatchFinder::MatchResult.Context is what you're probably looking for, so it's already kinda passed in. You can stash it in a member variable in the base class Gadget and have it readily available everywhere, i.e.

 class Gadget {
 public:
-  Gadget(Kind K) : K(K) {}
+  Gadget(Kind K, const MatchResult &Result) : K(K), Context(*Result.Context) {
+    assert(Context);
+  }

 private:
   Kind K;
+  const ASTContext &Context;
 };

 class FixableGadget : public Gadget {
 public:
-  FixableGadget(Kind K) : Gadget(K) {}
+  FixableGadget(Kind K, const MatchResult &Result) : Gadget(K, Result) {}
 };

 class PointerAssignmentGadget {
   PointerAssignmentGadget(const MatchResult &Result)
-      : FixableGadget(Kind::PointerAssignment),
+      : FixableGadget(Kind::PointerAssignment, Result),
     PtrLHS(Result.Nodes.getNodeAs<DeclRefExpr>(PointerAssignLHSTag)),
     PtrRHS(Result.Nodes.getNodeAs<DeclRefExpr>(PointerAssignRHSTag)) {}
 };

(the change in individual gadget classes is really tiny!)

2712–2738

This entire loop can probably go away as soon as we stop relying on FixablesForAllVars and treat groups separately.

Then we can probably even have different Strategy objects for different groups, so getNaiveStrategy() could also be invoked per-group?

2753

I guess the next step is to call this function separately for each group?

Extend PointerAssignmentGadget and PointerInitGadget to generate fix-its for cases where the left-hand side
has won't fix strategy and the right-hand side has std::span strategy.

Hmm, is this intertwined with other changes, or is this completely separate? Also my head appears to be blank;

The two FixableGadgets previously gave incorrect fix-its for the cases where only RHS need fix. After the change made to Strategy, they returned std::nullopt as the case was not implemented.
So to avoid regression, I made the extension to the two Gadgets.

did we reach any conclusion about the nature of "single" RHS-es and usefulness of annotating them as such?
Which could be an approach that would encourage people to propagate more bounds in the backwards direction by default.

Can you elaborate the question a bit more?

t-rasmud added inline comments.Aug 17 2023, 3:47 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2743

Would it make more sense (make the code more readable) if we renamed UnsafeVars to say, FixableVars?

NoQ added a comment.Aug 22 2023, 12:59 PM

Extend PointerAssignmentGadget and PointerInitGadget to generate fix-its for cases where the left-hand side
has won't fix strategy and the right-hand side has std::span strategy.

Hmm, is this intertwined with other changes, or is this completely separate? Also my head appears to be blank;

The two FixableGadgets previously gave incorrect fix-its for the cases where only RHS need fix. After the change made to Strategy, they returned std::nullopt as the case was not implemented.
So to avoid regression, I made the extension to the two Gadgets.

did we reach any conclusion about the nature of "single" RHS-es and usefulness of annotating them as such?
Which could be an approach that would encourage people to propagate more bounds in the backwards direction by default.

Can you elaborate the question a bit more?

If previously we were emitting an incorrect fixit and now we're not emitting any fixit, then I think it's a very good improvement, not a regression 🤔

Uhh, I meant LHSes, sorry 😅

Anyway, the point was that it's hard to tell whether .data() is the right fix, so I think we wanted to hold off implementing it, because we probably want to do something better instead in most cases. Like, implement a strategy discovery procedure that would give us a better idea of how exactly is the LHS pointer used safely. Then we might emit different fixes based on that. So I suspect that it should be a separate effort, so it probably makes sense to separate it into a different patch and think more about it.

ziqingluo-90 marked an inline comment as done.
ziqingluo-90 edited the summary of this revision. (Show Details)

In our offline discussion, we realized that p = q.data() is not always a correct fix to p = q for cases where only q is an unsafe pointer. It depends on whether p is being dereferenced later. So I removed my changes to PointerAssignmentGadget and PointerInitGadget.

Note there are many tests adjusted due to the fact that now p = q cannot be fixed if only q is unsafe (previously incorrect fix-it was provided, so it is not a regression).

ziqingluo-90 added inline comments.Aug 24 2023, 1:31 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2587

We cannot fix reference type variable declarations for now.
Besides, it seems that reference type is invisible to Exprs. For example,

void foo(int * &p) {
      p[5];  // the type of the DRE `p` is a pointer-to-int instead of a reference
}

So we ignore such variables explicitly here.

2743

Oh, I just realized that UnsafeVars is now an unused variable. So let's remove it. FixableVars would be a better name for sure, since it is in fact the key set of FixablesForAllVars.byVar.

2743

To correct myself, VisitedVars are the variables In the graph but it may not be a subset of "fixable variables". VisitedVars may contain variables that is not fixable, i.e., variables having no FixableGadget associated with. We should leave those unfixable variables being WON'T FIX.

NoQ accepted this revision.Sep 20 2023, 2:33 PM

Looks amazing now. Thanks for splitting it up and fighting hard against complexity!

clang/lib/Analysis/UnsafeBufferUsage.cpp
2587

Ooo you're one of today's lucky 10,000! (More like today's 0.01, I doubt there are many C++ programmers who are aware of this quirk, it's a very niche compiler gotcha.)

That's right, expressions in C++ can't have reference types. They have "value kinds" instead (lvalue, rvalue, etc) which are more fundamental (not even necessarily C++-specific). As far as I understand, from language formalism point of view, references aren't values per se; they're just names given to other values (glvalues, to be exact). There's no separate operation of "reference dereference" in C++ formalism; "undereferenced reference" expressions don't exist in the first place.

So in this example:

int x = 5;
int &y = x;
int a = x + y;

not only the type of DeclRefExpr y is int, but also the *value* of the DeclRefExpr y is the glvalue named x. And the value of DeclRefExpr x is *also* the glvalue named x. From the point of view of C++ formalism, there's no separate location in memory named y. The declaration y simply declares a different way to refer to x.

This formalism is very different from pointers, where in

int x = 5;
int *y = &x;
int a = x + *y;

you're allowed to assume that there's a separate memory location named y where the address of memory location named x is written. You can further take the address of that location and put it into a third location named w by writing int **z = &y;. You can do that indefinitely, and then roll it all back with sufficient amount of *s. You can also reassign your pointers, eg. y = &a would overwrite the data at location y.

You cannot do any of that with references. You cannot reassign a reference. You cannot take a reference to a reference; that's the same as a reference to the original lvalue, both simply become names given to that actual value. Similarly, the language doesn't let you take an address of the location in which the reference is stored; it doesn't even allow you to assume that such location exists.

Sure you can "convert a reference to a pointer" with the help of the operator &, but that's just taking address of an lvalue it refers to. This doesn't mean that the reference itself "is" an address, it just means that the lvalue itself is technically still just an address. References can often be implemented as pointers by the codegen, eg. reference-type fields in classes or reference-type return values are probably just pointers "under the hood". Conversely, in case of pointers the actual location for y and for z could have also been optimized away by the optimizer, causing the pointers to "act like references" "under the hood": from CodeGen perspective they're just a different way to refer to x. But the purpose of this formalism isn't to describe what happens under the hood; it's just to define what operations are allowed on which expressions. And for that very simple and limited purpose, references don't act like values, and expressions of reference type don't really make sense, which is what we observe in the AST.

(None of this has anything to do with the patch, I just hope this helps you approach this problem in the future.)

This revision is now accepted and ready to land.Sep 20 2023, 2:33 PM
ziqingluo-90 marked an inline comment as done.Sep 21 2023, 2:51 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2587

Thank you for explaining this! You just provided me a brand new view on C++ references . "expressions of reference type don't really make sense" could have confused me but now it makes perfect sense with your explanation.

This revision was landed with ongoing or failed builds.Sep 21 2023, 3:07 PM
This revision was automatically updated to reflect the committed changes.
ziqingluo-90 marked an inline comment as done.