This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Initial commit - Transition away from raw buffer accesses.
ClosedPublic

Authored by NoQ on Nov 3 2022, 11:25 AM.

Details

Summary

This is the initial commit for -Wunsafe-buffer-usage, a warning that helps codebases (especially modern C++ codebases) transition away from buffer accesses. It's a minimal commit that barely implements anything, mostly adds skeleton for future work; we have a long road ahead of us.

Backstory in https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734, more documentation for the proposed programming model in D136811.

I'm putting the actual implementation into libAnalysis as it's going to be a non-trivial analysis - mostly the fixit part where we try to figure out if we understand a variable's use pattern well enough to suggest a safe container/view replacement. Some parts of it may eventually prove useful for any similar fixit machine that tries to change types of variables. More on that in the next patch.

The interface for the analysis is currently very primitive, the analysis emits operations on raw buffers it thinks are unsafe. The plan is that it'll also emit fixit objects, but then the consuming class will figure out how to present them.

Warning text is currently somewhat lame, it going to improve a lot once we specialize it for different operations, and once we start emitting fixits we'll have to rethink it anyway because fixits can't be attached to a specific operation (but to an entire variable or even group of variables).

The warning is disabled by default.

Diff Detail

Event Timeline

NoQ created this revision.Nov 3 2022, 11:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 11:25 AM
NoQ requested review of this revision.Nov 3 2022, 11:25 AM
NoQ retitled this revision from -Wunsafe-buffer-usage: A way to transition away from raw buffer accesses. to -Wunsafe-buffer-usage: Initial commit - Transition away from raw buffer accesses..Nov 3 2022, 11:30 AM
jkorous accepted this revision.Nov 15 2022, 11:29 AM

LGTM.

@aaron.ballman Do you have any objection to us landing this?

This revision is now accepted and ready to land.Nov 15 2022, 11:29 AM
xazax.hun added inline comments.Nov 15 2022, 11:45 AM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
30

What is the purpose of the handler, would this add the fixit hints? In that case, is this the right abstraction level? E.g., do we want to group these by the declarations so the handler can rewrite the declaration and its uses at once?

clang/lib/Analysis/UnsafeBufferUsage.cpp
23

Is this not a FunctionDecl because of ObjCMethod? At some point I wonder if we should have an abstraction that works for both. Would NamedDecl work?

36–37

Move these close to their uses?

NoQ added inline comments.Nov 15 2022, 3:50 PM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
30

You're absolutely right, we want to group fixits by declarations when fixits are actually available.

But we still need to cover the case when fixits *aren't* available, and in this case it doesn't make sense to group anything, so this interface isn't going away.

And on top of that, there's layers to grouping. For instance, in this code

void foo(int *p, size_t size) {
  int *q = p;
  for (size_t i = 0; i < size; ++i)
    q[i] = 0;
}

we'll need to attach the fix to the declaration p rather than q, as we aim to provide a single fixit for these two variables, because otherwise we end up with a low-quality fix that suppresses the warning but doesn't carry the span all the way from the caller to the use site.

Then if we have two parameters that we want to fix simultaneously this way, the fix will have to be inevitably grouped by *function* declaration.

clang/lib/Analysis/UnsafeBufferUsage.cpp
23

Yupp.

Well, we do have https://clang.llvm.org/doxygen/classclang_1_1AnyCall.html But it's not like the code actually cares at this point. And once it does (eg. for the purposes of fixits), it'll have to consider them separately anyway.

36–37

The code in between is actually instantly deleted by the follow-up patch :)

(D137348)

xazax.hun accepted this revision.Nov 16 2022, 10:30 AM
NoQ retitled this revision from -Wunsafe-buffer-usage: Initial commit - Transition away from raw buffer accesses. to [-Wunsafe-buffer-usage] Initial commit - Transition away from raw buffer accesses..Nov 17 2022, 8:00 PM
aaron.ballman added inline comments.
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
35

Do we need the interface to accept a non-const reference?

clang/include/clang/Basic/DiagnosticSemaKinds.td
11733–11734

The diagnostic wording isn't wrong, but I am a bit worried about complex expressions. Consider something like void func(a, b, c + d, e++, f(&g)); -- if you got this warning on that line of code, how likely would you be to spot what caused the diagnostic? I think we need to be sure that issuing this warning *always* passes in an extra SourceRange for the part of the expression that's caused the issue so users will at least get some underlined squiggles to help them out.

clang/lib/Analysis/UnsafeBufferUsage.cpp
51

Errr, this looks expensive to me, but maybe I'm forgetting (I can't recall if it's ancestor or descendant that's more expensive, I think it's ancestor though). @njames93 @LegalizeAdulthood @klimek @sammccall -- do you have concerns here or know of a better approach?

58

Heh, this is the sort of thing I was worried about, especially when you consider things like lambda bodies or class definitions with member functions being declared in a function, etc.

clang/lib/Sema/AnalysisBasedWarnings.cpp
2152–2154

I think this interface needs an additional SourceRange parameter that can be passed in as a streaming argument, along these lines. This way you'll get squiggles for the problematic part of the expression.

clang/test/SemaCXX/warn-unsafe-buffer-usage.cpp
2

No need to pin to a specific language mode.

NoQ added inline comments.Nov 30 2022, 5:59 PM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
35

It's same as asking whether the handle... methods should be const. Roughly similar to whether MatchFinder::MatchCallback::run() should be const. I don't see a reason why not, but also "Why say lot word when few word do trick", could have been a lambda.

As an opposite example, static analyzer's Checker::check... callbacks really needed to be const, because carrying mutable state in the checker is often *tempting* but always *terrible*, in a way that's not immediately obvious to beginners.

clang/include/clang/Basic/DiagnosticSemaKinds.td
11733–11734

Yeah, I agree. Later we'll additionally specialize this warning to be more specific than "expression" (eg. "pointer arithmetic", "array access", etc.).

Hmm, is there a way to write the .td file entry so that the source range becomes mandatory?

clang/lib/Analysis/UnsafeBufferUsage.cpp
58

Yes, so @ziqingluo-90 is trying to combat this problem in D138329. His solution is non-expensive (single-pass through the AST without any backreference lookups) and I hope it can be standardized into a general-purpose matcher so that we can get rid of the forCallable() idiom entirely, even in existing clang-tidy checks, to win some performance and make the code less convoluted.

NoQ updated this revision to Diff 479134.Nov 30 2022, 6:44 PM
NoQ marked 4 inline comments as done.

Add range highlighting, unhardcode language mode.

clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
30

I'll add comments in D138253 to clarify the separation of concerns more clearly, once the methods [between which concerns are separated] actually get defined.

clang/lib/Sema/AnalysisBasedWarnings.cpp
2152–2154

Operation->getSourceRange() does the trick right? Like this:

warning: unchecked operation on raw buffer in expression

  a[i];
  ~~~~

I suspect that the Operation is forever going to be the exact thing we want to highlight, there's virtually no reason for it to be anything else.

Or do you think it'd be better to do it like this?

warning: unchecked operation on raw buffer in expression

  a[i];
  ~
NoQ added inline comments.Nov 30 2022, 6:46 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
51

I added some speculation to the other comment (https://reviews.llvm.org/D137346?id=472987#inline-1342220) - I think we're going to be fast and safe as long as this other thing gets implemented.

aaron.ballman accepted this revision.Dec 1 2022, 6:33 AM

LGTM

clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
35

Hmmm, I may regret this later (I often do when suggesting laxity with const), but I suppose it's reasonable to leave the interface mutable. I usually prefer things to be const up front and then relax the restriction later once we have a need, but this is a case where relaxing that later would be about as viral as adding const later.

clang/include/clang/Basic/DiagnosticSemaKinds.td
11733–11734

Hmm, is there a way to write the .td file entry so that the source range becomes mandatory?

Not to my knowledge; we could maybe thread through enough information to assert if the diagnostic is called without a source range, but I'm not certain we could reasonably get a compile-time error if the range isn't provided.

clang/lib/Sema/AnalysisBasedWarnings.cpp
2152–2154

Operation->getSourceRange() does the trick right?

Yup!

Or do you think it'd be better to do it like this?

I prefer the first form where the whole operation is highlighted instead of just an operand of the operation. I think that'll make more sense for situations like:

a + i;

to highlight the whole expression instead of just a. It also helpfully means I don't have to ask for a test like: i[a] and see if we highlight the correct "raw buffer" operand. :-D

NoQ added a comment.Dec 1 2022, 1:01 PM

Ok I added D136811 as a parent revision but I don't think it makes sense to land D136811 before this patch, given that D136811's documentation does not *yet* reflect the actual state of things. I think we should start landing patches, and land the documentation later when we think it actually makes sense to inform people about the warning and encourage enabling it.

Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2022, 3:13 PM
NoQ added inline comments.Dec 8 2022, 5:54 PM
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
18

Ok so this is what caused the revert (https://lab.llvm.org/buildbot#builders/121/builds/25877):

FAILED: lib/libclangSema.so.16git 
: && /usr/lib64/ccache/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/./lib  -Wl,--gc-sections -shared -Wl,-soname,libclangSema.so.16git -o lib/libclangSema.so.16git tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/AnalysisBasedWarnings.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/CodeCompleteConsumer.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/DeclSpec.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/DelayedDiagnostic.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/HLSLExternalSemaSource.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/IdentifierResolver.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/JumpDiagnostics.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/MultiplexExternalSemaSource.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/ParsedAttr.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Scope.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/ScopeInfo.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAccess.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAttr.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAvailability.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaCXXScopeSpec.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaCast.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaChecking.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaCodeComplete.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaConcept.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaConsumer.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaCoroutine.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaCUDA.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaDecl.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaDeclAttr.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaDeclCXX.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaDeclObjC.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExceptionSpec.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExpr.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprCXX.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprMember.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaExprObjC.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaFixItUtils.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaHLSL.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaInit.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaLambda.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaLookup.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaModule.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaObjCProperty.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaOpenMP.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaOverload.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaPseudoObject.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaRISCVVectorLookup.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaStmt.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaStmtAsm.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaStmtAttr.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaSYCL.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaTemplate.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaTemplateDeduction.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaTemplateInstantiate.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaTemplateInstantiateDecl.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaTemplateVariadic.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaType.cpp.o tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/TypeLocBuilder.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libclangAnalysis.so.16git  lib/libclangEdit.so.16git  lib/libclangSupport.so.16git  lib/libLLVMFrontendHLSL.so.16git  lib/libclangAST.so.16git  lib/libLLVMFrontendOpenMP.so.16git  lib/libLLVMMC.so.16git  lib/libclangLex.so.16git  lib/libclangBasic.so.16git  lib/libLLVMCore.so.16git  lib/libLLVMSupport.so.16git  -Wl,-rpath-link,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && :
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/AnalysisBasedWarnings.cpp.o: In function `clang::ast_matchers::internal::matcher_hasDecayedType0Matcher::matches(clang::DecayedType const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const':
AnalysisBasedWarnings.cpp:(.text._ZNK5clang12ast_matchers8internal30matcher_hasDecayedType0Matcher7matchesERKNS_11DecayedTypeEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal30matcher_hasDecayedType0Matcher7matchesERKNS_11DecayedTypeEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x2c): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const'
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/AnalysisBasedWarnings.cpp.o: In function `clang::ast_matchers::internal::matcher_namesType0Matcher::matches(clang::ElaboratedType const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const':
AnalysisBasedWarnings.cpp:(.text._ZNK5clang12ast_matchers8internal25matcher_namesType0Matcher7matchesERKNS_14ElaboratedTypeEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal25matcher_namesType0Matcher7matchesERKNS_14ElaboratedTypeEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x2c): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const'
tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/AnalysisBasedWarnings.cpp.o: In function `clang::ast_matchers::internal::matcher_hasImplicitDestinationType0Matcher::matches(clang::ImplicitCastExpr const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const':
AnalysisBasedWarnings.cpp:(.text._ZNK5clang12ast_matchers8internal42matcher_hasImplicitDestinationType0Matcher7matchesERKNS_16ImplicitCastExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE[_ZNK5clang12ast_matchers8internal42matcher_hasImplicitDestinationType0Matcher7matchesERKNS_16ImplicitCastExprEPNS1_14ASTMatchFinderEPNS1_21BoundNodesTreeBuilderE]+0x2c): undefined reference to `clang::ast_matchers::internal::DynTypedMatcher::matches(clang::DynTypedNode const&, clang::ast_matchers::internal::ASTMatchFinder*, clang::ast_matchers::internal::BoundNodesTreeBuilder*) const'
...

I didn't need ASTMatchers in this include, and libAnalysis already links to libASTMatchers but libSema doesn't link to libASTMatchers *directly*, so after this header is included from libSema in some shared libs builds it was producing linker errors.