This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] NFC: Implement fix-strategies and variable-use-claiming.
ClosedPublic

Authored by NoQ on Nov 17 2022, 7:13 PM.

Details

Summary

This patch adds more abstractions that we'll need later for emitting -Wunsafe-buffer-usage fixits. It doesn't emit any actual fixits, so no change is observed behavior, but it introduces a way to emit fixits, and existing tests now verify that the compiler still emits no fixits, despite knowing how to do so.

The purpose of our code transformation analysis is to fix variable types in the code from raw pointer types to C++ standard collection/view types.

The analysis has to decide on its own which specific type is the most appropriate for every variable. This patch introduces the Strategy class that maps variables to their most appropriate types.

In D137348 we've introduced the Gadget abstraction, which describes a rigid AST pattern that the analysis "fully understands" and may need to fix. Which specific fix is actually necessary for a given Gadget, and whether it's necessary at all, and whether it's possible in the first place, depends on the Strategy. So, this patch adds a virtual method which every gadget can implement in order to teach the analysis how to fix that gadget:

Gadget->getFixits(Strategy)

However, even if the analysis knows how to fix every Gadget, doesn't necessarily mean it can fix the variable. Some uses of the variable may have never been covered by Gadgets, which corresponds to the situation that the analysis doesn't fully understand how the variable is used. This patch introduces a Tracker class that tracks all variable uses (i.e. DeclRefExprs) in the function. Additionally, each Gadget now provides a new virtual method

Gadget->getClaimedVarUseSites()

that the Tracker can call to see which DeclRefExprs are "claimed" by the Gadget. In order to fix the variable with a certain Strategy, the Tracker needs to confirm that there are no unclaimed uses, and every Gadget has to provide a fix for that Strategy.

This "conservative" behavior guarantees that fixes emitted by our analysis are correct by construction. We can now be sure that the analysis won't attempt to emit a fix if it doesn't understand the code. Later, as we implement more getFixits() methods in individual Gadget classes, we'll start progressively emitting more and more fixits.

Diff Detail

Event Timeline

NoQ created this revision.Nov 17 2022, 7:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 17 2022, 7:13 PM
NoQ requested review of this revision.Nov 17 2022, 7:13 PM
NoQ added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
218

This extra payload wasn't advertised but it's very useful because it's hard to jump from VarDecl to DeclStmt without it, and we need to do such jump because our analysis starts from unsafe *uses* of a variable, which only give us a VarDecl.

441–445

We're safe for now because none of the currently implemented Gadget classes "claim" more than one DeclRefExpr. We need to address this FIXME before we implement such gadgets.

NoQ added inline comments.Nov 17 2022, 8:00 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
218

Another way to look at this, is to notice that currently DeclStmt of the variable isn't a Gadget on its own, and ask whether it *should* be a Gadget.

I currently believe that yes, it should be a Gadget, because it can potentially be a "multi-variable" Gadget if, for example, the initializer-expression is itself another raw pointer DeclRefExpr. So there needs to be a way to communicate that the fix for the variable declaration may depend on Strategy for more than one variable, and that the Strategy itself should be chosen carefully, considering that choices for different variables may interact this way. And from the point of view of the *other* variable, this definitely needs to be a Gadget, it's no different from any other Gadget.

So, this part needs some more work.

xazax.hun added inline comments.Nov 18 2022, 3:02 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
218

What about DeclStmts that declare multiple variables, some safe some unsafe. Do you plan to generate fixits in that case exploding the DeclStmt?

steakhal added inline comments.Nov 22 2022, 8:27 AM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
31

Unless we have a well-formed idea of how many elements we are going to have, we should probably omit it.

clang/lib/Analysis/UnsafeBufferUsage.cpp
65
72

I wonder if we should prefer std::optional as we are c++17.
IDK if it was ever covered on the community forum.
I'm just leaving this here without expecting any action.

213–215
241–244

Maybe Uses should refer to the canonical decls. If that would be the case we could just call .contains(VD) here.

402–468
422–424
steakhal added inline comments.Nov 23 2022, 12:51 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
72
NoQ added inline comments.Nov 28 2022, 1:46 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
72

Oooo nice.

218

Yes absolutely.

They're annoying because if the variable is in the middle, you'll get 3 DeclStmts instead of one, but yeah, we'll have to handle those.

241–244

That's up to the AST, not up to us. I'd rather be safe here, because such bugs are very hard to notice.

422–424

I can get used to that!

aaron.ballman added inline comments.Nov 30 2022, 7:59 AM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
31

Better yet, use a SmallVectorImpl (since this shows up in an interface)

clang/include/clang/Basic/DiagnosticSemaKinds.td
11761–11765
clang/lib/Analysis/UnsafeBufferUsage.cpp
18–20

Do we have to care about data member accesses (which would be a MemberExpr and not a DeclRefExpr)?

Also, because this shows up in interfaces, it should probably be a SmallVectorImpl so the size doesn't matter.

248–253

Use a specific_decl_iterator<VarDecl> instead?

311–317
clang/lib/Sema/AnalysisBasedWarnings.cpp
2161–2163

Hmmm... llvm::for_each instead?

NoQ updated this revision to Diff 480394.EditedDec 6 2022, 2:07 AM
NoQ marked 9 inline comments as done.

A cleaner way to implement the debug facility.

NoQ updated this revision to Diff 480397.Dec 6 2022, 2:10 AM

Whoops, sorry, wrong patch. I was trying to say, "Address review comments".

clang/lib/Analysis/UnsafeBufferUsage.cpp
18–20

Do we have to care about data member accesses (which would be a MemberExpr and not a DeclRefExpr)?

Unlikely. Auto-fixing member variables without breaking source or binary compatibility is next to impossible.

Also, because this shows up in interfaces, it should probably be a SmallVectorImpl so the size doesn't matter.

This doesn't seem to work on return types, which is where these classes show up in my case (SmallVectorImpl has deleted copy-move constructors so it's only suitable for passing by reference).

248–253

Hmm DeclStmt doesn't seem to provide a neat accessor, would you like me to add it?

aaron.ballman added inline comments.Dec 6 2022, 11:04 AM
clang/lib/Analysis/UnsafeBufferUsage.cpp
18–20

Do we have to care about data member accesses (which would be a MemberExpr and not a DeclRefExpr)?

Unlikely. Auto-fixing member variables without breaking source or binary compatibility is next to impossible.

Okie dokie!

Also, because this shows up in interfaces, it should probably be a SmallVectorImpl so the size doesn't matter.

This doesn't seem to work on return types, which is where these classes show up in my case (SmallVectorImpl has deleted copy-move constructors so it's only suitable for passing by reference).

Ahhh I had forgotten that we didn't allow it as a move-only type.

248–253

Eh, I don't insist. Feel free to do it as an NFC change after landing this patch, if you want to.

NoQ added a comment.Dec 15 2022, 7:09 PM

Because we're approaching a stack of ~15 patches and it's starting to become unmanageable, I'm really interested in landing these older patches quickly. I think I addressed the feedback, and I'm definitely committed to addressing more feedback as it comes in, but it's likely that I'd ask for a chance to address it in follow-up patches anyway. Because otherwise the rest of our team will have to undergo annoying rebases, and it's probably also super confusing to review when all patches in the stack are out of sync from each other. So I'll try to land this tomorrow.

jkorous accepted this revision.Dec 16 2022, 9:43 AM

LGTM

This revision is now accepted and ready to land.Dec 16 2022, 9:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2022, 6:48 PM

Hello,

I noticed that the following crashes with this patch:

clang -Weverything bbi-77071.c

Result:

clang-16: ../../clang/lib/Analysis/UnsafeBufferUsage.cpp:248: void (anonymous namespace)::DeclUseTracker::discoverDecl(const clang::DeclStmt *): Assertion `Defs.count(VD) == 0 && "Definition already discovered!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /repo/uabelho/main-github/llvm/build-all/bin/clang-16 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all --mrelax-relocations -disable-free -clear-ast-before-backend -main-file-name bbi-77071.c -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -tune-cpu generic -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -fcoverage-compilation-dir=/repo/uabelho/llvm-dev2/llvm -resource-dir /repo/uabelho/main-github/llvm/build-all/lib/clang/16 -internal-isystem /repo/uabelho/main-github/llvm/build-all/lib/clang/16/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -Weverything -fdebug-compilation-dir=/repo/uabelho/llvm-dev2/llvm -ferror-limit 19 -fgnuc-version=4.2.1 -fcolor-diagnostics -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/bbi-77071-6173ad.o -x c bbi-77071.c
1.	<eof> parser at end of file
2.	bbi-77071.c:5:9: parsing function body 'fn1'
 #0 0x0000000002fdf843 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x2fdf843)
 #1 0x0000000002fdd56e llvm::sys::RunSignalHandlers() (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x2fdd56e)
 #2 0x0000000002fdfbc6 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fcfa502a630 __restore_rt sigaction.c:0:0
 #4 0x00007fcfa2771387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007fcfa2772a78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x00007fcfa276a1a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #7 0x00007fcfa276a252 (/lib64/libc.so.6+0x2f252)
 #8 0x0000000005520daf findGadgets(clang::Decl const*)::GadgetFinderCallback::run(clang::ast_matchers::MatchFinder::MatchResult const&) UnsafeBufferUsage.cpp:0:0
 #9 0x000000000554c862 clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ASTMatchFinder.cpp:0:0
#10 0x000000000557d27b clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x557d27b)
#11 0x000000000554be73 clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ASTMatchFinder.cpp:0:0
#12 0x0000000005524f9c clang::ast_matchers::MatchFinder::match(clang::DynTypedNode const&, clang::ASTContext&) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x5524f9c)
#13 0x000000000551eb53 clang::checkUnsafeBufferUsage(clang::Decl const*, clang::UnsafeBufferUsageHandler&) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x551eb53)
#14 0x00000000053f48e5 clang::sema::AnalysisBasedWarnings::IssueWarnings(clang::sema::AnalysisBasedWarnings::Policy, clang::sema::FunctionScopeInfo*, clang::Decl const*, clang::QualType) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x53f48e5)
#15 0x0000000004b91a53 clang::Sema::PopFunctionScopeInfo(clang::sema::AnalysisBasedWarnings::Policy const*, clang::Decl const*, clang::QualType) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4b91a53)
#16 0x0000000004d14f6f clang::Sema::ActOnFinishFunctionBody(clang::Decl*, clang::Stmt*, bool) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4d14f6f)
#17 0x0000000004b29135 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4b29135)
#18 0x0000000004a50d95 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4a50d95)
#19 0x0000000004a6d637 clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4a6d637)
#20 0x0000000004a4f9a7 clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4a4f9a7)
#21 0x0000000004a4f238 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4a4f238)
#22 0x0000000004a4e35e clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4a4e35e)
#23 0x0000000004a4b973 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4a4b973)
#24 0x0000000004a45da5 clang::ParseAST(clang::Sema&, bool, bool) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x4a45da5)
#25 0x0000000003acb486 clang::FrontendAction::Execute() (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x3acb486)
#26 0x0000000003a3d534 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x3a3d534)
#27 0x0000000003b8cd92 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x3b8cd92)
#28 0x00000000009f8516 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x9f8516)
#29 0x00000000009f53b0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#30 0x00000000009f4e66 clang_main(int, char**) (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x9f4e66)
#31 0x00007fcfa275d555 __libc_start_main (/lib64/libc.so.6+0x22555)
#32 0x00000000009f0fbb _start (/repo/uabelho/main-github/llvm/build-all/bin/clang-16+0x9f0fbb)
clang-16: error: unable to execute command: Aborted (core dumped)
clang-16: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 16.0.0
Target: x86_64-unknown-linux-gnu
Thread model: posix

NoQ added a comment.Dec 19 2022, 2:11 PM

Ooo thanks I'll fix asap.

NoQ added a comment.Dec 19 2022, 2:24 PM

Also if you're compiling your code with -Weverything by default, I do recommend explicitly disabling -Wunsafe-buffer-usage for at least a month or so, until we land all the basic functionality and you can make an informed decision whether you actually need it (more reading in https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734 and D136811).

But, yeah, we still shouldn't break users who simply want to enable -Weverything temporarily just to see what warnings are there, that's really bad and I hope I'll patch this up today.

Also if you're compiling your code with -Weverything by default, I do recommend explicitly disabling -Wunsafe-buffer-usage for at least a month or so, until we land all the basic functionality and you can make an informed decision whether you actually need it (more reading in https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734 and D136811).

But, yeah, we still shouldn't break users who simply want to enable -Weverything temporarily just to see what warnings are there, that's really bad and I hope I'll patch this up today.

Thanks for the warning. In our fuzzy testing we sometimes randomly use -Weverything just to make sure it doesn't crash. And well, it did crash here, so I guess the testing was succesful :)
So it's no panic for us, but good to sort out the crash.

NoQ added a comment.Dec 20 2022, 4:06 PM

I suppressed the assertion in rG8fd62e7. It looks like we'll need a deeper investigation into this.

It crashes on this as well:

void f(int a[])
{
    int b = {a[1]};
}

with

clang -cc1 foo.c -Weverything

I get

clang: ../../clang/lib/Analysis/UnsafeBufferUsage.cpp:233: void (anonymous namespace)::DeclUseTracker::claimUse(const clang::DeclRefExpr *): Assertion `Uses->count(DRE) && "DRE not found or claimed by multiple matchers!"' failed.

But now I will indeed add -Wno-unsafe-buffer-usage. :)