This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Group parameter fix-its
ClosedPublic

Authored by ziqingluo-90 on Jun 15 2023, 11:30 AM.

Details

Summary

For a function F whose parameters need to be fixed, we group fix-its of its' parameters together so that either all of the parameters get fixed or none of them gets fixed.

The reason for doing so is to avoid generating low-quality fix-its. For example,

void f(int * p, int * q) {
   int i = p[5];
   int j = q[5];
}

We would like to fix both parameters p and q with std::span. If we do not group them together, they will be fixed in two steps: 1) fixing p first; and then 2) fixing q. Step 1 yields a new overload of f:

void f(std::span<int> p, int *q) {  ...  }
[[unsafe_buffer_usage]] void f(int *p, int *q) {return f(std::span(p, <# size #>), q);}
}

The new overload of f is, however, still an unsafe function. Then we apply step 2 and have:

void f(std::span<int> p, std::span<int> q) {  ...  }
[[unsafe_buffer_usage]] void f(int *p, int *q) {return f(std::span<int>(p, <# size #>), q);}
[[unsafe_buffer_usage]] void f(std::span<int> p, int *q) {return f(p, std::span<int>(q, <# size #>));}

The "intermediate" overload void f(std::span<int>, int *) stays there but usually is not useful. If there are more than two parameters need to be fixed, there will be more such useless "intermediate" overloads.

With this patch, p and q will be fixed together: either selecting p or q to fix will result in a safe function void f(std::span<int> p, std::span<int> q in one step.

Diff Detail

Event Timeline

ziqingluo-90 created this revision.Jun 15 2023, 11:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 11:30 AM
ziqingluo-90 requested review of this revision.Jun 15 2023, 11:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 11:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ziqingluo-90 added inline comments.Jun 15 2023, 1:10 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2209–2210

I changed the rest of this function drastically so let me explain what I did.

The table FixItsForVariable is initiated with variables whose declarations and associated Fixables can be fixed. So if a variable is not in the table, either 1) its declaration cannot be fixed or 2) one of its Fixables cannot be fixed. Then, the table is further reduced by removing elements such that at least one of its' group members satisfies 1) or 2).
Eventually, the table FixItsForVariable can be used to determine if a variable is ImpossibleToFix (so we no longer need this flag). FixItsForVariable[V], if V exists there, is a collection of fix-its for V's declaration and all Fixables associated to V.

With parameters being fixed, we also generate function overloads as fix-its for the parameters. These overload fix-its are "shared" by the parameters. It means that these fix-its will be added for the group where the parameters belong to, instead of being added to FixItsForVariable[PV] for each such parameter PV.

So finally, for a variable V, FinalFixItsForVariable[V] is a collection of fix-its for the whole group where V is at.

2371–2378

How about changing the variable name to PtrImplicationGraph? For two variables V and W, if W is in PtrImplicationGraph[V], it means fixing V implicates fixing W, right?

ziqingluo-90 added inline comments.Jun 15 2023, 2:44 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2371–2378

@t-rasmud what do you think?

t-rasmud added inline comments.Jun 15 2023, 2:54 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2371–2378

Sounds good! PtrImplicationGraph conveys the same meaning as PtrAssignmentGraph and is more generic.

ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage][WIP] Group parameter fix-its to [-Wunsafe-buffer-usage] Group parameter fix-its.Jun 20 2023, 4:23 PM
NoQ added a comment.Jun 20 2023, 5:14 PM

I'll just leave this tiny bit of analysis here, that we did on the whiteboard a while ago.

The "intermediate" overload void f(std::span<int>, int *) stays there but usually is not useful. If there are more than two parameters need to be fixed, there will be more such useless "intermediate" overloads.

This isn't even the biggest problem. The bigger problem is that the process doesn't actually stop at Step 2.

Let me rewrite this more slowly.

Step 0 - the initial code:

// original body, original signature
void foo(int *p, int *q, size_t sz) {
   int i = p[5]; // warning: unsafe buffer usage over 'p'
   int j = q[5]; // warning: unsafe buffer usage over 'q'
}

Step 1 - fix parameter p:

// FIXED: original body, new signature
void foo(span<int> p, int *q, size_t sz) {
   int i = p[5];
   int j = q[5]; // warning: unsafe buffer usage over 'q'
}

// NEW: compatibility overload from step 1 - original signature
[[unsafe_buffer_usage]]
void f(int *p, int *q, size_t sz) {
   foo(span<int>(p, <#size#>), q, sz);
}

Step 2 - fix parameter q as well:

// FIXED: original code, even newer signature
void foo(span<int> p, span<int> q, size_t sz) {
   int i = p[5];
   int j = q[5];
}

// NEW: compatibility overload from step 2
[[unsafe_buffer_usage]]
void foo(span<int> p, int *q, size_t sz) {
   foo(p, span<int>(q, <#size#>), sz);
}

// UNCHANGED: compatibility overload from step 1 - still carries original signature
[[unsafe_buffer_usage]]
void foo(int *p, int *q, size_t sz) {
   foo(span<int>(p, sz), q, sz); // warning: unsafe buffer usage over 'q'
}

Step 3 - now we have a warning inside the overload from step 1 because it's no longer calling the newest safe function, but it's now calling the compatibility overload from step 2. So we need to change the parameter q of the bottom-most function to span.

// UNCHANGED: original code, even newer signature - unchanged
void foo(span<int> p, span<int> q, size_t sz) {
   int i = p[5];
   int j = q[5];
}

// UNCHANGED: compatibility overload from step 2
[[unsafe_buffer_usage]]
void foo(span<int> p, int *q, size_t sz) {
   foo(p, span<int>(q, sz), sz);
}

// FIXED: compatibility overload from step 1 - new safe signature
[[unsafe_buffer_usage]]
void foo(int *p, span<int> q, size_t sz) {
   foo(span<int>(p, sz), q, sz);
}

// NEW: compatibility overload from step 3 - original signature of the overload from step 1, aka simply original signature
[[unsafe_buffer_usage]]
[[unsafe_buffer_usage]] // sic!
void foo(int *p, int *q, size_t sz) {
   foo(p, span<int>(q, <#size#>), sz); // warning: unsafe buffer usage over 'p'
}

There are a few things that look wrong here.

The set of overloads still makes sense; we get ourselves 4 overloads in all combinations of spans and raw pointers. This on its own makes sense.

We've also somehow got the new raw overload (with two raw pointers) carry two instances of [[unsafe_buffer_usage]]. It lowkey makes sense because the function is unsafe with respect to both parameters, but we probably don't want people to write such code.

Now, here's where it gets really weird. You'd probably expect that new raw overload to call foo(span<int>(p, sz), span<int>(q, sz), sz). If it just did that, it'd be perfectly reasonable code. But it doesn't do that! Indeed, by design the autogenerated compatibility overload calls the freshly-fixed function. Which in case of Step 3 was foo(int *p, span<int> q, size_t sz).

Also, the new raw overload already has a warning in it! This is completely unacceptable. This happened because the function under fix was already annotated as [[unsafe_buffer_usage]]. The autofix wouldn't remove the attribute; the attribute is still completely appropriate there.

Finally, because there's a new warning, the process doesn't stop after Step 3 either. I'll leave imagining what Step 4 would look like as an easy exercise to the reader ^.^


So one way or another, we really don't want to be in the business of incrementally fixing parameters one-by-one. Fixing all parameters at once is much easier than fixing them incrementally and discarding all analysis state on each step. Analysis of "partially spanified" code is much harder than analysis of raw C-style code, so we prefer single-step transformation in most cases.

Another direction that we can explore is to avoid auto-fixing functions that already carry [[unsafe_buffer_usage]]. Indeed, it's already a compatibility overload and we know it; it's not *supposed* to be safe! This will prevent Step 3 from breaking things, so it'll leave us with 3 overloads - not ideal, but not incorrect either. It'll also eliminate the "double attribute" problem.

And we ultimately need to explore this direction because no matter how hard we avoid automatic partial transformations, people will still do manual partial transformations! And when they do, we need to demonstrate some reasonable behavior.

But for now let's focus on the basic use case.

NoQ added inline comments.Jun 20 2023, 6:27 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2201

There's a bug in variable naming: FixablesForUnsafeVarsactually contains fixables for *all* variables. Rashmi correctly renames it in D150489 to reflect its actual behavior. So IIUC you're generating fixits for all variables here, which is probably not what you want.

2392–2395

Does this really go in both directions? Shouldn't it be just one direction?

void foo(int *p) {
  int *q = new int[10];
  p = q;
  q[5] = 7;
}

In this case q is an unsafe local buffer, but p is a (mostly) safe single-object pointer that doesn't need fixing, despite implication from q to p. Of course this example is somewhat unrealistic (people don't usually overwrite their parameters), but it is also the only situation where the other direction matters.

2407

What about the situation where params aren't seen as unsafe yet, but they're discovered to be unsafe later? Eg.

void foo(int *p, int *q) {
  int *p2 = p;
  p2[5] = 7;
  int *q2 = q;
  q2[5] = 7;
}

Will we notice that p and q need to be fixed simultaneously?


I suspect that the problem of parameter grouping can't be solved by pre-populating strategy implications between all parameters. It'll either cause you to implicate all parameters (even the ones that will never need fixing), or cause you to miss connections between parameters that do need fixing (as the connection is discovered later).

Connection between parameters needs to be added *after* the implications algorithm has finished. And once it's there (might be already there if I missed something), then this part of code probably becomes unnecessary.

2471–2473

Note: This is why FixablesForUnsafeVars is actually FixablesForAllVars.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
14–16

QoL thing: "to propagate bounds information between them" isn't the correct reason in this case. Bounds information doesn't propagate between p and q and a. Each of them carries its own, completely unrelated bounds information.

We need to come up with a different reason. Or it might be better to combine the warnings into one warning here, so that we didn't need a reason:

warning: 'p', 'q' and 'a' are unsafe pointers used for buffer access
note: change type of 'p', 'q' and 'a' to 'std::span' to preserve bounds information

This gets a bit worse when parameters are implicated indirectly, but it still mostly works fine, for example:

void foo(int *p, int *q) {
  int *p2 = p;
  p2[5] = 7;
  int *q2 = q;
  q2[5] = 7;
}

would produce:

warning: 'p2' and 'q2' are unsafe pointers used for buffer access
note: change type of 'p2' and 'q2' to 'std::span' to preserve bounds information, and
      change 'p' and 'q' to 'std::span' to propagate bounds information between them

In any case, this shows that the relationship between parameters (grouped together simply for being parameters) is very different from the relationship within groups of variables implicated by assignments (even if two or more of them are parameters). All the way down to the warning/note printer, we probably need to continue giving the parameter relationship a special treatment.

NoQ added a comment.Jun 20 2023, 6:29 PM

Ok whew that's a big patch! I *think* I found a couple of bugs. In any case, this code is quickly becoming very complicated, and the functions we're editing are rapidly growing out of control. Chopping them up into smaller functions with clear separation of concerns would probably help the reader a lot 😅

ziqingluo-90 added inline comments.Jun 21 2023, 12:57 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2201

Thanks for pointing out the incorrectness of the variable name. Actually, whether a variable is unsafe is irrelevant to this function. The function only cares about all variables that need fixes. So I think the incorrect name of the variable does not affect the functionality of my changes here.

2392–2395

You are right! It should be one-direction.

2407

Yes. They will be noticed. Here is why:

The code here first forms a ring over p and q in the assignment (directed) graph:

p <--> q

Then the two initialization statements (implemented in D150489) add two more edges to the graph:

p2 --> p <--> q <-- q2

The algorithm later searches the graph starting from unsafe variables p2 and q2, respectively, Starting from p2, the algorithm discovers that p2 and p, as well as p and q depend on each other. Starting from q2, the algorithm discovers that q2 and q, as well as p and q depend on each other. The dependency graph is sort of undirected. So eventually, the four variables p2, p, q, q2 are in the same group.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
14–16

This is a good argument! An edge from p to q in an assignment graph stands for not only that fixing p implicates fixing q but also that the bounds information is propagated from q to p. In this sense, the way I connect parameters is inappropriate.

But my concern is that although treating parameters specially could improve the quality of the diagnostic messages, the code may be more complex and less re-used.

ziqingluo-90 added inline comments.Jun 22 2023, 11:45 AM
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
14–16

So for such an example

int f(int *p, int *q) {
int * a = p;
int * b = q;

a[5] = 0;
b[5] = 0;
}

We will fix all four variables a, b, p, q together. But bounds information propagation only happens between a and p, and b and q, respectively.
Then what should the diagnostic message be? How about something like

change type of 'a' to 'std::span' to preserve bounds information, and change 'p' to propagate bounds information between them. 
To ensure parameters are changed together, also
change type of 'b' to 'std::span' to preserve bounds information, and change 'q' to propagate bounds information between them;
...

for variable a.

NoQ added inline comments.Jun 22 2023, 12:22 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2407

The code here first forms a ring over p and q in the assignment (directed) graph

I don't think it does. The ring is formed over the ParamsNeedFix list, which is empty because none of these parameters are listed in UnsafeOps.byVar.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
14–16

I think the ideal narrative is to emit a single warning against the entire function, which covers all parameters at once, together with all related variables.

int f(int *p, int *q) {  // warning: function 'f' performs unsafe buffer access over parameters 'p' and 'q'
                         // note: change type of 'p' and 'q' to 'std::span' to receive bounds information from the call site,
                         //       and change local variables 'a' and 'b' to 'std::span' to propagate it to the access site

  int * a = p;           // note: bounds information needs to propagate from 'p' to 'a' here
  int * b = q;           // note: bounds information needs to propagate from 'q' to 'b' here

  a[5] = 0;              // note: used in buffer access here
  b[5] = 0;              // note: used in buffer access here
}

This makes a lot of sense given that we're fixing the entire function. This is quite a major change though, we need to think this through.

ziqingluo-90 added inline comments.Jun 22 2023, 5:58 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2407

Actually p and q will be in ParamsNeedFix. Because they are implicated by p2 and q2, who are unsafe variables. ParamsNeedFix include parameters that need fix, either because they are used in unsafe operations or they are implicated by some Gadgets.

But anyway, I now think it's not a good idea to lump parameters with variables grouped by implication, as you pointed out that variable implication also implies bounds propagation.

Instead, I think it might be more appropriate to make a variable group structured:

  • Let's call it a connected components for a set of variables that need to be fixed together and sharing bounds information. (The term connected components has been used in the code to refer to a group of variables. It keeps its meaning here.)
  • Let's redefine a Variable Group to be a set of mutually disjoint connected componentss (CCs). If there are more than one CCs in a variable group, each of the CCs contains at least one parameter.

For generating fix-its for a variable group, this patch still works as it only requires that variables in a variable group are either all fixed or given up as a whole.
For generating diagnostic messages, we now have more information about relations among grouped variables.

NoQ added inline comments.Jun 22 2023, 6:48 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2407

Actually p and q will be in ParamsNeedFix. Because they are implicated by p2 and q2, who are unsafe variables. ParamsNeedFix include parameters that need fix, either because they are used in unsafe operations or they are implicated by some Gadgets.

Ohh, right, it's populated twice, I even commented on the other part but didn't notice it populates the same list!

But in this case you'll probably join p and q in

void foo(int *p, int *q) {
  int *p2 = p;
  p2[5] = 7;
  int *q2 = q;
}

right? (In this case q is implicated by q2 so it gets added to the list, but q2 itself isn't unsafe or implicated by anything, so it doesn't need fixing, but connecting it to p would cause it to be fixed unnecessarily). So I still think this can't be correct unless you do it after the crawler finishes.

But anyway, I now think it's not a good idea to lump parameters with variables grouped by implication, as you pointed out that variable implication also implies bounds propagation.

Yeah, I think it makes a lot more sense. Just look at variable groups already built by the existing crawler, and if more than one of them contains parameters, report all groups that contain parameters through one combined warning.

ziqingluo-90 added inline comments.Jun 23 2023, 3:33 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2407

Though I'm going to remove this part, achieving some agreement on this algorithm may help me understand the problem better.

right? (In this case q is implicated by q2 so it gets added to the list, but q2 itself isn't unsafe or implicated by anything, so it doesn't need fixing, but connecting it to p would cause it to be fixed unnecessarily)

Right. the code in this patch does not form ParmsNeedFix correctly. ParmsNeedFix should be the set of parameters that are either unsafe or being implicated (transitively) by unsafe variables. Then the group would be "correct" I think.

t-rasmud added inline comments.Jun 29 2023, 11:49 AM
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
6

I have used a workaround for non-determinism by using regular expressions to match on the expected-note. See https://github.com/llvm/llvm-project/commit/db3dcedb9cedcec4a9570fda7406490c642df8ae#diff-8486f0b4ae37871fc0a186f40a41adbc70e5a0ca0134dc9bba762e1f17994460R176.
This is definitely a temporary fix (but helps with passing llvm builedbot tests upon landing on llvm-project\main) and we should probably discuss and put in a more reliable fix (say, using ordered sets) in the future.

t-rasmud added inline comments.Jun 29 2023, 1:03 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
252

Does this patch handle virtual methods? Ideally we'd like the fixed methods to have the same subtyping relations as the original method. Does it make sense to have test cases with virtual methods as part of this patch?

NoQ added inline comments.Jun 29 2023, 2:04 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
2407

Aha ok, whew, gotcha, makes sense then!

  • a major change to the patch: Parameters are now grouped after the grouping of all variables. The groups that contain parameters of the function will form a union group.
  • changes to the grouping algorithm: Groups will reside in the vector Groups and being accessed through their indexes so that we do not have to copy groups. The GrpsUnionForParms is the union group over groups containing parameters. It holds a separate copy. These data structures are wrapped in VariableGroupsManager, which returns the proper group given any variable that needs fix.
  • add a new diagnostic note kind: diag::note_unsafe_buffer_variable_fixit_together. It will replace diag::note_unsafe_buffer_variable_fixit_group when we do not want to explain how a group of variables gets formed.
  • add a "Comparator" to FixableGadgetSets and replace some std::sets with llvm::SetVectors for deterministic diagnostic messages.
clang/lib/Sema/AnalysisBasedWarnings.cpp
2168

The printing process was extracted as a method with some refactoring.

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
252

Sorry for the late response.
That's a good point. Virtual methods could get things more complicated. This is one of the reasons that we do not fix any member functions at this point.

t-rasmud added inline comments.Jul 18 2023, 5:08 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
306

Can we have a test case with qualifiers on parameters and maybe another with a qualifier on the function itself?

NoQ added a comment.Jul 19 2023, 6:33 PM

Alright, I think I managed to fully chew through this patch and I think it's beautiful and everything makes sense to me!

I have a tiny complaint though: It is very large though, 500 lines of code is very hard to digest all at once. Because we aren't coming in with all the context you had when you wrote this code, it's often very hard for us readers to guess the intention behind every section of the diff, and how they are supposed to work together.

In this patch there are a few refactoring steps that weren't responsible for any change in behavior:

  • Introduce the VariableGroupsManager abstraction;
  • Extract eraseVarsForUnfixableGroupMates() into a function;
  • Extract listVariableGroupAsString into a function;

But because I didn't know that at first, I had to carefully look at every line of code to confirm that it actually doesn't change anything.

So it'd help tremendously if these refactorings were extracted into a parallel "no functional change" ("NFC") patch (with zero tests: no failures on existing tests => the patch is indeed NFC) that doesn't do anything other than "prepare the codebase" for this patch. Then in the NFC patch we'd discuss architecture and confirm that it truly doesn't change anything, whereas in this patch every line of code would be filled with meaning and purpose! ^.^

I'm not sure it makes sense split it up now, probably depends on whether Jan and Rashmi have the same troubles as me ^.^

I also has nitpicks.

clang/lib/Analysis/UnsafeBufferUsage.cpp
2130–2131

(.contains() is unfortunately C++20)

2156–2157

Would it make sense to make createFunctionOverloadsForParms accept a FunctionDecl *D from the start?

2162–2163

Another good use for .count().

2224
2430

Why SetVector? Do we care about order? Maybe just SmallSet?

clang/lib/Sema/AnalysisBasedWarnings.cpp
2171

Does the empty string actually ever make sense for the caller? Maybe just assert that this never happens, and force the caller to check that?

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
37–39

The fix is emitted 3 times right? Once for each warning?

Would it make sense to duplicate the three CHECK: lines three times to confirm that?

I'm not sure it makes sense split it up now, probably depends on whether Jan and Rashmi have the same troubles as me ^.^

I have to admit it is a difficult patch to follow. It also has a lot of clever ideas that deserve attention but might be lost because of the complexity of the patch. I would really appreciate if this I can be refactored and split up!

NoQ added a comment.Jul 20 2023, 6:26 PM

I just want to add to this, that the NFC part is actually insanely valuable (despite technically not doing anything). This patch is so complex primarily because UnsafeBufferUsage.cpp already has 1300 lines of code in unstructured static functions - that's more than half of our code! I really appreciate every bit of effort to separate it into components with clear boundaries and contracts, and even the new comments really help 🥺

I vote for splitting the patch to make both the review and any future debugging or git archeology easier.

Thanks for the suggestion of splitting this patch to smaller ones.

I have one such smaller patch ready here: https://reviews.llvm.org/D156474. It does make it easier in describing and discussing about the changes.

clang/lib/Analysis/UnsafeBufferUsage.cpp
2430

We'd like to keep the order of elements in this container deterministic. Otherwise, the diagnostic message that lists those elements could be non-deterministic.

clang/lib/Sema/AnalysisBasedWarnings.cpp
2171

The branch deals with the case where the group has only one member, which is the VD. In that case, we do not list group mates of VD as there is none.
This case is actually possible and conforms to the contract of this method.

The case where the group is empty is not suppose to happen.

ziqingluo-90 marked 4 inline comments as done.

Carved out NFC changes from the original diff.

ziqingluo-90 added inline comments.Jul 31 2023, 5:25 PM
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp
37–39

I wish there are better ways to test this. I tried CHECK-COUNT-3 but it doesn't work because it expects each line repeats 3 times consecutively.

306

good point. I recently added such tests for parameters and local variables in following open patches:

NoQ added a comment.Aug 1 2023, 5:50 PM

OOo down to ±300, I love this!! I'll take a closer look tomorrow.

clang/lib/Analysis/UnsafeBufferUsage.cpp
1855–1856

Speaking of my comment on the other patch.

Can we simply ask Strategy about the strategy for FD->getParamDecl(i) instead of passing a mask?

Eventually we'll have to do that anyway, given how different parameters can be assigned different strategies.

1899–1903

Arguably not a great use of auto, because the return type isn't literally Identifier *.

2430

Ooo right gotcha!

clang/lib/Sema/AnalysisBasedWarnings.cpp
2171

Gotcha, nice!

ziqingluo-90 marked an inline comment as done.

Address comments.

ziqingluo-90 added inline comments.Aug 4 2023, 5:53 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
1855–1856

Yes. I think using Strategy is the correct way to do it.

Can I make it a follow-up patch? I noticed that this line needs to be fixed:

Strategy NaiveStrategy = getNaiveStrategy(UnsafeVars);

The UnsafeVars is the set of variables associated to all matched FixableGadgets. There can be variables that is safe and do not need fix. Their strategy is set to std::span currently which is not correct. With this line being fixed, we can have examples like

void f(int * p) {
  int * q;
   q = p;
   p[5] = 5;
}

where q's strategy is WONTFIX and p's strategy is std::span. The expression q = p can be fixed to q = p.data(). It currently has an empty fix-it, which is incorrect.

NoQ added a comment.Aug 8 2023, 4:55 PM

Ok I think this looks great and almost good to go. I love how the core change in the grouping algorithm is actually tiny and simple!

clang/lib/Analysis/UnsafeBufferUsage.cpp
1855–1856

Yes sure!

2293–2295

The usual idiom to avoid double lookup: use find() to extract an iterator, compare it to end() to see if the item was found, and if it was then dereference the iterator.

I wouldn't use at() given that the whole point of .at() is to throw exceptions, while LLVM is compiled with -fno-exceptions.

clang/lib/Sema/AnalysisBasedWarnings.cpp
2271–2272

IIRC we're supposed to stuff ND directly into the stream and it'll figure out on its own how to name it, or how to put quotes around it. See how there's no quotes around %2!

ziqingluo-90 marked 2 inline comments as done.

Address comments

NoQ accepted this revision.Sep 20 2023, 2:12 PM

Everything looks great now, let's commit!

Again, thank you Ziqing for simplifying and splitting up the patch. It looks like we're able to keep code complexity from exploding. That makes me feel very optimistic about the future of -Wunsafe-buffer-usage!

clang/lib/Analysis/UnsafeBufferUsage.cpp
1371

Formatting is a bit off (clang-format probably needed some human help here).

This revision is now accepted and ready to land.Sep 20 2023, 2:12 PM
This revision was landed with ongoing or failed builds.Sep 21 2023, 12:45 PM
This revision was automatically updated to reflect the committed changes.
ziqingluo-90 edited the summary of this revision. (Show Details)Sep 21 2023, 1:28 PM