This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Introduce an abstraction for fixable code patterns.
ClosedPublic

Authored by NoQ on Nov 3 2022, 11:45 AM.

Details

Summary

This patch introduces a hierarchy of Gadget classes, which are going to be the core concept in our fixit machine.

A gadget is a rigid code pattern that constitutes an operation that the fixit machine "understands" well enough to work with. A gadget may be "unsafe" (i.e., constitute a raw buffer access) or "safe" (constitute a use of a pointer or array variable that isn't unsafe per se, but may need fixing if the participating variable changes type to a safe container/view).

By "rigid" I mean something like "matchable with a sequence of nested if-dyncast-statements or an ASTMatcher without forEachDescendant() or other non-trivial traversal, except maybe ignoringParenImpCasts() and such. The point is to make it very clear what the semantics of that code is.

We'll be able to make decisions about how to fix the code depending on the set of gadgets that we've enumerated. Basically, in order to fix a type of a variable, each use of that variable has to participate in a gadget, and each such gadget has to consent to fixing itself according to the variable's new type. More on that in follow-up patches!

Diff Detail

Event Timeline

NoQ created this revision.Nov 3 2022, 11:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 11:45 AM
NoQ added inline comments.Nov 3 2022, 11:49 AM
clang/lib/Analysis/UnsafeBufferUsage.cpp
37

Whoops typo!

NoQ retitled this revision from -Wunsafe-buffer-usage: Introduce an abstraction for fixable code patterns. to [-Wunsafe-buffer-usage] Introduce an abstraction for fixable code patterns..Nov 17 2022, 8:00 PM
xazax.hun added inline comments.Nov 18 2022, 2:39 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
70
107

How deep will the Gadget Hierarchy be? Using inheritance only to classify safe and unsafe gadgets feels like a very heavy weight solution to a relatively simple problem.

aaron.ballman added inline comments.Nov 21 2022, 10:13 AM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
2

Is this file named properly if it is also going to handle SAFE_GADGET?

10

You have some comments below about what gadgets are and what makes one safe or unsafe; should those comments be hoisted into this .def file?

clang/lib/Analysis/UnsafeBufferUsage.cpp
17–21

Something to think about: ObjC pointers are pointer-like but not actual pointers; should those be covered as well (via isAnyPointerType())?

37

We don't even need the #undef because the .def file already does that for us. Same observation applies below.

63

Should this be an explicit constructor? (Same question applies below to derived classes as well.)

113

Should we make a static constexpr const char * for these strings so we can give it a name and ensure it's used consistently?

steakhal added inline comments.Nov 22 2022, 6:57 AM
clang/lib/Analysis/UnsafeBufferUsage.cpp
40

We could have a GADGET_RANGE(UnsafeGadgets, Increment, Decrement), which could expand to BEGIN_##x = Increment, END_##x = Decrement, when declaring the Kind enum, similarly how SymExpr::Kind::Begin_BINARYSYMEXPRS is defined.
That way this code could look like:

return !(Kind::Begin_UnsafeGadgets <= K && K <= Kind::End_UnsafeGadgets);
43–45

I'd advocate for using the same macro variable name here as an XMACRO argument as it's present in the definition as parameter.
Same for the rest of the XMACRO expansions.

87
102
123

I thought this matcher was already bound as "Increment" by x ## Gadget::matcher().bind(#x) inside GadgetFinderCallback::run()

142

I wonder if we should assert that we only expect exactly one match (entry) inside Result.Nodes.

150

I'd advocate for covariant return types for such cases.

182–183

I was thinking of constructing a std::vector<DynTypedMatcher> of the matchers of the disjunction because the extra comma is allowed inside the initializer expression.
After that, you could probably use constructVariadic(VO_AnyOf, ...) as demonstrated by the TEST(ConstructVariadic, MismatchedTypes_Regression) unit-test. But TBH, I don't think it's more readable than this :D

NoQ added inline comments.Nov 28 2022, 9:36 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
63

What's the value of making it explicit given that it's an abstract class that can't be constructed directly anyway?

113

I think it makes perfect sense to have different bind-names in different classes. They don't all correspond to the same role the bound expression plays.

123

Yeah it's a bit redundant but it makes the class properly incapsulated, not reliant on the external machine to use it in a certain way. A bit more resistant to renaming as well.

And while this time these two bindings coincided, in the general case they can be different. So I wanted to establish an idiom that works in all cases.

If we ever run into performance problems, I'm totally open to optimizing this.

142

You mean like, exactly one if-statement succeeds? That'd require us to run the entire list every time (at least in debug mode) but it's probably not too bad. I'll look for a clean solution^^

182–183

One way or another, D138253 cleans up this FIXME.

aaron.ballman added inline comments.Nov 29 2022, 8:35 AM
clang/lib/Analysis/UnsafeBufferUsage.cpp
63

None, I had missed that this was an abstract class and forgot to delete the comment here on the base class. It does apply to the (non-abstract) derived classes though.

113

Sorry, I was unclear. I meant a private data member of IncrementGadget so that the constructor and the matcher() functions use a named constant rather than a string literal.

NoQ added inline comments.Nov 29 2022, 2:22 PM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
2

UnsafeBufferUsage is the name of the analysis. It's somewhat valuable to keep this file next to UnsafeBufferUsage.h so that it was obvious at a glance that these two files work together.

I'm open to renaming the entire analysis though, a non-judgemental "BufferUsage analysis" totally works for me, WDYT?

Or I can make a folder. But I don't expect more than 2 files in this folder.

clang/lib/Analysis/UnsafeBufferUsage.cpp
17–21

Great question! Technically unsafe operations on ObjC pointers are not a compile error, but they never actually make sense because you can't have buffers of ObjC objects. Technically, you're not even supposed to know their size or layout until runtime (https://en.wikipedia.org/wiki/Objective-C#Non-fragile_instance_variables). So this is an insanely exotic situation that we most likely won't be covering.

37

Whoops right.

113

Oh, fair enough!

NoQ added inline comments.Nov 29 2022, 2:31 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
107

I'm expecting UnsafeGadget and SafeGadget to grow some common methods in the future. But I'm already second-guessing this decision (https://reviews.llvm.org/D137379?id=473092#inline-1340017) so I guess I'll ungrow these intermediate classes if things become too tedious.

NoQ updated this revision to Diff 480310.Dec 5 2022, 6:36 PM
NoQ marked 17 inline comments as done.

Address review comments.

NoQ marked an inline comment as done.Dec 5 2022, 6:37 PM
NoQ added inline comments.
clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
2

I expect @malavikasamak to do some renaming in a follow-up patch in the near future as she seeks to introduce more precise distinction between unsafe and safe gadgets.

clang/lib/Analysis/UnsafeBufferUsage.cpp
40

Nuked this entire facility. It's always possible to simply call isSafe().

87

I love that!

107

To answer the original question, SafeGadget and UnsafeGadget are most likely going to be the only intermediate class. So, it'll be 2 abstract classes deep.

142

Ok I added this huge #ifdef NDEBUG thing. I agree that it's valuable but I'm also somewhat worried that it may diverge from the non-debug code.

NoQ updated this revision to Diff 480396.Dec 6 2022, 2:09 AM

A cleaner way to implement the debug facility.

NoQ added a comment.Dec 15 2022, 7:09 PM

Because we're approaching a stack of ~15 patches and it's starting to become unmanageable, I'm really interested in landing these older patches quickly. I think I addressed the feedback, and I'm definitely committed to addressing more feedback as it comes in, but it's likely that I'd ask for a chance to address it in follow-up patches anyway. Because otherwise the rest of our team will have to undergo annoying rebases, and it's probably also super confusing to review when all patches in the stack are out of sync from each other. So I'll try to land this tomorrow.

I don't mind committing these patches, I have a high confidence in you going back and addressing feedback post-commit if something is coming up.

jkorous accepted this revision.Dec 16 2022, 9:43 AM

LGTM

This revision is now accepted and ready to land.Dec 16 2022, 9:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2022, 3:02 PM
MaskRay added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
164

numFound was undefined in non-assertion builds. Fixed in 893a0ea948a65421013b62bd1855e430ca184739