This is an archive of the discontinued LLVM Phabricator instance.

Differential D143128

[-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`
ClosedPublic

Authored by ziqingluo-90 on Feb 1 2023, 6:38 PM.

Details

Reviewers

t-rasmud
NoQ
jkorous
malavikasamak
aaron.ballman
xazax.hun
gribozavr
ymandel
sgatev

Commits

rG87b5807d3802: [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`

Summary

For an expression of the form &DRE[any-expr] under an Unspecified Pointer Context (UPC), we generate a fix-it for it with respect to a strategy. In case the strategy is std::span (it is the only supported one for now), the fix-it replaces the expression with &DRE.data()[any-expr].

A UPC includes at least the contexts where

the expression is being cast to an integer; and
the expression is an argument of a call to a function that is not marked unsafe.
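
As a rough illustration of the summary, the sketch below shows the shape of the code before and after the fix-it. It is not taken from the patch: `IntSpan` is a hypothetical stand-in for `std::span<int>` (used only so the sketch compiles before C++20), and `readThrough`, `before`, `after`, and `asInteger` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for std::span<int>; only data() and size() matter here.
struct IntSpan {
  int *Ptr;
  std::size_t Len;
  int *data() const { return Ptr; }
  std::size_t size() const { return Len; }
};

// Hypothetical callee that is not marked [[clang::unsafe_buffer_usage]], so its
// pointer argument is an Unspecified Pointer Context in the sense of the summary.
inline int readThrough(const int *Q) { return *Q; }

// Before the fix-it: `P` is a raw pointer and `&P[2]` is passed to a call.
inline int before(int *P) { return readThrough(&P[2]); }

// After the fix-it under the span strategy: `&P[2]` becomes `&P.data()[2]`.
inline int after(IntSpan P) { return readThrough(&P.data()[2]); }

// The other context named in the summary: the expression is cast to an integer.
inline std::uintptr_t asInteger(IntSpan P) {
  return reinterpret_cast<std::uintptr_t>(&P.data()[2]);
}
```

With C++20 available, `std::span<int>` can presumably be substituted for `IntSpan` without changing the rewritten expressions.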

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ziqingluo-90 created this revision. · Feb 1 2023, 6:38 PM

Herald added a project: Restricted Project. · Feb 1 2023, 6:38 PM

ziqingluo-90 requested review of this revision. · Feb 1 2023, 6:38 PM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B211366: Diff 494137. · Feb 1 2023, 6:39 PM

ziqingluo-90 updated this revision to Diff 494711. · Feb 3 2023, 12:50 PM

ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[*]` to `DRE.data() + *` to [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[any]` to `DRE.data() + any`.

ziqingluo-90 edited the summary of this revision.

ziqingluo-90 added a parent revision: D143206: [-Wunsafe-buffer-usage] Add Fixable for simple pointer dereference.

Harbormaster completed remote builds in B211782: Diff 494711. · Feb 3 2023, 12:57 PM

jkorous added inline comments. · Feb 6 2023, 12:00 PM

clang/lib/Analysis/UnsafeBufferUsage.cpp
172	I am just wondering how the callee matcher works in a situation with multiple redeclarations 🤔 Something like this:
	void foo(int* ptr);
	[[clang::unsafe_buffer_usage]] void foo(int* ptr);
	void foo(int* ptr);
	void bar(int* ptr) { foo(ptr); }
517	I am wondering what will happen in the weird corner-case of `&5[ptr]` - I feel the Fix-It we produce would be incorrect. Here's a suggestion - we could use `hasLHS` instead of `hasBase` here and add a FIXME that when we find the time we should also properly support the corner-case. That would be a pretty low-priority though - we definitely have more important patterns to support first. WDYT?
948	Since we use `std::nullopt` in `getFixits` to signal errors, we should either use the same strategy in `fixUPCAddressofArraySubscriptWithSpan` or translate the empty return value from it to `nullopt` here. (FWIW I am leaning towards the former.) Forwarding the empty Fix-It would be incorrect.

ziqingluo-90 added inline comments. · Feb 6 2023, 3:05 PM

clang/lib/Analysis/UnsafeBufferUsage.cpp
517	I'm not sure if I understand your concern. For `&5[ptr]`, we will generate a fix-it `ptr.data() + 5` in case `ptr` is assigned a `span` strategy. It is the same as the case of `&ptr[5]`.
948	Oh, that's a bug I made! Thank you for finding it for me.
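
For readers unfamiliar with the `&5[ptr]` corner case discussed above: for the built-in subscript operator on a pointer, `E1[E2]` means `*((E1) + (E2))`, so the operands commute and both spellings name the same element. The helper name `sameElement` below is made up for illustration.

```cpp
#include <cassert>

// Built-in subscript on a pointer: `5[Ptr]` and `Ptr[5]` both mean `*(Ptr + 5)`,
// so taking the address of either yields the same pointer value.
inline bool sameElement(int *Ptr) {
  return &5[Ptr] == &Ptr[5];
}
```

This is why a fix-it that rewrites the whole `&...[...]` expression, rather than only the text around the base and index, can treat both spellings alike.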

ziqingluo-90 added inline comments. · Feb 6 2023, 3:17 PM

clang/lib/Analysis/UnsafeBufferUsage.cpp

172

I think we are fine. According to the doc of FunctionDecl:

/// Represents a function declaration or definition.
///
/// Since a given function can be declared several times in a program,
/// there may be several FunctionDecls that correspond to that
/// function. Only one of those FunctionDecls will be found when
/// traversing the list of declarations in the context of the
/// FunctionDecl (e.g., the translation unit); this FunctionDecl
/// contains all of the information known about the function. Other,
/// previous declarations of the function are available via the
/// getPreviousDecl() chain.

Why do we prefer DRE.data() + any to &DRE.data()[any]? It could be much less intrusive this way, and the safety guarantees are the same.

In D143128#4108375, @NoQ wrote:

Why do we prefer DRE.data() + any to &DRE.data()[any]? It could be much less intrusive this way, and the safety guarantees are the same.

It is actually (DRE.data() + any) versus &DRE.data()[any]. Are they quite the same in terms of being intrusive?

Let fixUPCAddressofArraySubscriptWithSpan return std::nullopt instead of an empty list when we should give up on the fix-it.
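
The distinction between "no fix-it could be produced" and "an empty list of edits" can be sketched with plain standard-library types. This is only an illustration of the convention described above, not the patch's code: `EditList`, `makeEdits`, and `allFixable` are hypothetical names, and `EditList` merely stands in for clang's `FixItList`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for clang's FixItList.
using EditList = std::vector<std::string>;

// Returns std::nullopt to signal "give up on this fix-it"; an engaged optional
// means the fix succeeded, whatever the list happens to contain.
inline std::optional<EditList> makeEdits(bool CanFix) {
  if (!CanFix)
    return std::nullopt;
  return EditList{"&p.data()[i]"};
}

// A caller treats a disengaged optional as failure for the whole group of uses.
inline bool allFixable(const std::vector<bool> &Uses) {
  for (bool CanFix : Uses)
    if (!makeEdits(CanFix))
      return false;
  return true;
}
```

Had `makeEdits` returned an empty `EditList` on failure, `allFixable` could not tell a failed fix from a fix that needs no edits, which is the problem raised in the inline comment.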

Add a few test cases for some corner cases.

Harbormaster completed remote builds in B212242: Diff 495325. · Feb 6 2023, 5:07 PM

jkorous added inline comments. · Feb 6 2023, 5:38 PM

clang/lib/Analysis/UnsafeBufferUsage.cpp
172	I see! Sounds like we should be fine indeed and the test seems to confirm. Thank you!
517	Oh, my bad! I assumed (AKA didn't check) that we're just replacing the parts of the code around the DRE and index. You're right. Please ignore me :)

malavikasamak added a child revision: D143676: [-Wunsafe-buffer-usage] FixableGadget for handling stand alone pointers under UPC. · Feb 9 2023, 1:06 PM

ziqingluo-90 added a child revision: D143680: [-Wunsafe-buffer-usage] Improve fix-its for local variable declarations with null pointer initializers. · Feb 9 2023, 2:17 PM

ziqingluo-90 removed a child revision: D143680: [-Wunsafe-buffer-usage] Improve fix-its for local variable declarations with null pointer initializers. · Feb 13 2023, 11:17 AM

Rebased.

Harbormaster completed remote builds in B213473: Diff 497057. · Feb 13 2023, 11:29 AM

ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage][WIP] Fix-Its transforming `&DRE[any]` to `(DRE.data() + any)` to [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `(DRE.data() + any)`. · Feb 16 2023, 3:42 PM

ziqingluo-90 added reviewers: aaron.ballman, xazax.hun, gribozavr, ymandel, sgatev.

Herald added a subscriber: rnkovacs. · Feb 16 2023, 3:42 PM

Now we have different FixableGadgets that may match the same Stmt (representing a context).
So in order to discover all "Fixable"s, we can no longer match anyOf FixableGadgets' matchers. Instead, we match eachOf them.
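
The difference between the two combinators can be modeled without the clang libraries. The sketch below is a toy model only and does not use the AST matcher API: a "matcher" is a name plus a predicate over a string, and `ToyMatcher`, `matchAnyOf`, `matchEachOf`, and `toyGadgets` are hypothetical names. It mirrors the documented behavior that `anyOf` stops at the first matching sub-matcher while `eachOf` yields a result for every matching sub-matcher.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only (not the clang API): a gadget name plus a predicate on a node,
// where a node is represented by its source text.
using ToyMatcher =
    std::pair<std::string, std::function<bool(const std::string &)>>;

// anyOf-like behavior: stop at the first matcher that accepts the node.
inline std::vector<std::string>
matchAnyOf(const std::vector<ToyMatcher> &Matchers, const std::string &Node) {
  for (const ToyMatcher &M : Matchers)
    if (M.second(Node))
      return {M.first};
  return {};
}

// eachOf-like behavior: report every matcher that accepts the node.
inline std::vector<std::string>
matchEachOf(const std::vector<ToyMatcher> &Matchers, const std::string &Node) {
  std::vector<std::string> Hits;
  for (const ToyMatcher &M : Matchers)
    if (M.second(Node))
      Hits.push_back(M.first);
  return Hits;
}

// Two hypothetical gadgets that can both accept the same context node.
inline std::vector<ToyMatcher> toyGadgets() {
  return {
      {"GadgetA",
       [](const std::string &N) { return N.find("p[") != std::string::npos; }},
      {"GadgetB",
       [](const std::string &N) { return N.find("&") != std::string::npos; }},
  };
}
```

In this model, a node such as `foo(&p[1])` is claimed only by `GadgetA` under `matchAnyOf`, but by both gadgets under `matchEachOf`, which is the situation the comment above describes.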

Harbormaster completed remote builds in B214504: Diff 498498. · Feb 17 2023, 1:45 PM

ziqingluo-90 added inline comments. · Feb 17 2023, 1:46 PM

clang/lib/Analysis/UnsafeBufferUsage.cpp
778	Change from `anyOf` to `eachOf`

In D143128#4108502, @ziqingluo-90 wrote:

In D143128#4108375, @NoQ wrote:

Why do we prefer DRE.data() + any to &DRE.data()[any]? It could be much less intrusive this way, and the safety guarantees are the same.

It is actually (DRE.data() + any) versus &DRE.data()[any]. Are they quite the same in terms of being intrusive?

They may be equally verbose but I think the latter looks much more similar to the original code in terms of shape. This could be valuable because there's probably a reason why the programmer preferred to write it that way. Or even if there wasn't any reason, the users still may find it surprising that we change the shape for no apparent reason.

This is an interesting topic. In the abstract I see the question as: "Should the Fix-Its prioritize how the code will fit the desired end state (presumably modern idiomatic C++) or carefully respect the state of the code as is now?"

The only thing I feel pretty strongly about is that no matter what philosophy we decide to use here we should apply it consistently to all our Fix-Its (which might or might not already be the case).

And FWIW I can also imagine at some point in the future we might either have two dialects of the Fix-Its or that a separate modernizer tool (completely independent of Safe Buffers) could suggest transformations like:
"Would you like to change &DRE.data()[any] to (DRE.data() + any)?"

In D143128#4167626, @jkorous wrote:

This is an interesting topic. In the abstract I see the question as: "Should the Fix-Its prioritize how the code will fit the desired end state (presumably modern idiomatic C++) or carefully respect the state of the code as is now?"

The only thing I feel pretty strongly about is that no matter what philosophy we decide to use here we should apply it consistently to all our Fix-Its (which might or might not already be the case).

And FWIW I can also imagine at some point in the future we might either have two dialects of the Fix-Its or that a separate modernizer tool (completely independent of Safe Buffers) could suggest transformations like:
"Would you like to change &DRE.data()[any] to (DRE.data() + any)?"

Fantastic topic! :) In our experience at Google, we've generally followed the philosophy of leaving the code at least as good as it was before we touched it. So, if the idiom is archaic, but we're just adjusting it, that's fine (as in this example), but our tools shouldn't generate non-idiomatic (or anti-idiomatic) code. We've also often taken the path of "leave cleanups to a separate pass", especially when we already have, say, a clang-tidy check that does that kind of cleanup running regularly. But this one is more of a judgment call. I'd probably lean towards "make the code better once you're at it", but I certainly see the conservative argument.

t-rasmud added inline comments. · Mar 6 2023, 10:51 AM

clang/lib/Analysis/UnsafeBufferUsage.cpp
1079	Nit: "costly"
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-addressof-arraysubscript.cpp
75	Can we have a test case for `&p[0]`? IIUC, this would generate a fixit `p.data() + 0` which is correct but we might want to optimize it to `p.data()` sometime in the future.

In D143128#4167626, @jkorous wrote:

This is an interesting topic. In the abstract I see the question as: "Should the Fix-Its prioritize how the code will fit the desired end state (presumably modern idiomatic C++) or carefully respect the state of the code as is now?"

The only thing I feel pretty strongly about is that no matter what philosophy we decide to use here we should apply it consistently to all our Fix-Its (which might or might not already be the case).

And FWIW I can also imagine at some point in the future we might either have two dialects of the Fix-Its or that a separate modernizer tool (completely independent of Safe Buffers) could suggest transformations like:
"Would you like to change &DRE.data()[any] to (DRE.data() + any)?"

In D143128#4167655, @ymandel wrote:

Fantastic topic! :) In our experience at Google, we've generally followed the philosophy of leaving the code at least as good as it was before we touched it. So, if the idiom is archaic, but we're just adjusting it, that's fine (as in this example), but our tools shouldn't generate non-idiomatic (or anti-idiomatic) code. We've also often taken the path of "leave cleanups to a separate pass", especially when we already have, say, a clang-tidy check that does that kind of cleanup running regularly. But this one is more of a judgment call. I'd probably lean towards "make the code better once you're at it", but I certainly see the conservative argument.

That's a fantastic topic in general, but in *this* case I'm really not sure which one's ultimately "better", I'd totally write code both ways and at different times prefer one over the other.

I honestly actually prefer code like &DRE[any]/&DRE.data()[any] to the one with pointer arithmetic, specifically because it avoids the subject of pointer arithmetic, but instead talks about "Let's take this object, for which we already have a name, and take its address". The readers don't have to typecheck the expression in their heads in order to figure out whether correct offset multipliers are applied during pointer arithmetic. Another advantage of &DRE.data()[any] is that it's really easy to transform it to idiomatic &DRE[any] once the user confirms that the bounds checks don't actually get in the way in their case. This is much harder to achieve with DRE.data() + any given that DRE + any isn't valid code anymore.
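
To make the comparison concrete, the sketch below puts the two candidate fix-it shapes side by side and shows that they compute the same address. It is an illustration only: `IntSpan` is a hypothetical stand-in for `std::span<int>`, and `subscriptForm` and `arithmeticForm` are made-up names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for std::span<int>; only data() matters here.
struct IntSpan {
  int *Ptr;
  std::size_t Len;
  int *data() const { return Ptr; }
};

// Shape-preserving form: still reads as "take the address of element I".
inline int *subscriptForm(IntSpan S, std::size_t I) { return &S.data()[I]; }

// Pointer-arithmetic form: same address, written as base plus offset.
inline int *arithmeticForm(IntSpan S, std::size_t I) { return S.data() + I; }
```

Both functions return the same pointer for every in-range index, so the choice between them is about readability and how easily the result can later be turned into `&S[I]`, not about behavior.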

NoQ mentioned this in D142795: [-Wunsafe-buffer-usage] Add Fixable for dereference of simple ptr arithmetic. · Mar 7 2023, 5:27 PM

Thanks for the valuable discussion about the philosophy on the ideal forms of fix-its. In this case, I think &DRE.data()[any] and (DRE.data() + any) are both straightforward enough for the user to tell what has been changed and why. I also believe both forms are idiomatic, so users are likely to be happy with either. I will keep this discussion in mind as we have other cases whose fix-its may be less idiomatic, such as D144304.

Since we have to choose one form for this case, I buy @NoQ's argument to use &DRE.data()[any].

In addition, thanks to @t-rasmud for reminding me that &DRE[0] can be simplified to DRE.data(). In such a specific case, I think we do want the most concise form, as long as it does not confuse the user.

Harbormaster completed remote builds in B217974: Diff 503193. · Mar 7 2023, 5:46 PM

ziqingluo-90 marked 2 inline comments as done. · Mar 7 2023, 5:46 PM

malavikasamak added inline comments. · Mar 24 2023, 5:20 PM

clang/lib/Analysis/UnsafeBufferUsage.cpp
1095	The title and summary of this patch indicate that this gadget emits a fix-it of the form &DRE.data() + any when it encounters &DRE[any], but the code seems to emit &DRE.data()[any] instead? If so, can you please update the summary?

malavikasamak accepted this revision. · Mar 24 2023, 6:39 PM

This revision is now accepted and ready to land. · Mar 24 2023, 6:39 PM

ziqingluo-90 retitled this revision from [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `(DRE.data() + any)` to [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`. · Mar 28 2023, 2:13 PM

ziqingluo-90 edited the summary of this revision.

ziqingluo-90 marked an inline comment as done.

Looks great! LGTM except there's some dead code.

clang/lib/Analysis/UnsafeBufferUsage.cpp
563–566	These `dyn_cast`s are already checked by the matcher. They can be turned into `cast`s and this function can return `{DRE}` unconditionally.
944	Similarly, this check is redundant, it's already guaranteed by the matcher.
1076–1078	Same here!

ziqingluo-90 marked 3 inline comments as done. · Apr 3 2023, 6:18 PM

This revision was landed with ongoing or failed builds. · Apr 4 2023, 1:27 PM

Closed by commit rG87b5807d3802: [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]` (authored by ziqingluo-90).

This revision was automatically updated to reflect the committed changes.

ziqingluo-90 added a commit: rG87b5807d3802: [-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data()[any]`.

Just a heads up it seems like a lot of premerge checks builds are now showing this test as failing on Windows x64:

********************
Failed Tests (1):
  Clang :: SemaCXX/warn-unsafe-buffer-usage-fixits-addressof-arraysubscript.cpp

Here are some links from buildkite for a few of the builds that failed on this test:
https://buildkite.com/llvm-project/premerge-checks/builds/145054
https://buildkite.com/llvm-project/premerge-checks/builds/145048
https://buildkite.com/llvm-project/premerge-checks/builds/145044

The test is also failing on our Windows on Arm bot: https://lab.llvm.org/buildbot/#/builders/65/builds/8950

DavidSpickett added a reverting change: rGd5c428356f6e: Revert "[-Wunsafe-buffer-usage] Fix-Its transforming `&DRE[any]` to `&DRE.data…. · Apr 5 2023, 1:08 AM

I've reverted this, please try to fix the test then reland.

The full test output can be downloaded from the buildbot page, if you need any more information let me know.

In D143128#4245112, @DavidSpickett wrote:

I've reverted this, please try to fix the test then reland.

The full test output can be downloaded from the buildbot page, if you need any more information let me know.

Thank you @DavidSpickett and @emgullufsen for catching this issue and reverting the patch for me. I have added a specific target triple for the test and relanded it.

NoQ removed a child revision: D143676: [-Wunsafe-buffer-usage] FixableGadget for handling stand alone pointers under UPC. · Apr 7 2023, 2:59 PM

Revision Contents

Path                                                                               Size
clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h                          2 lines
clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def                 3 lines
clang/lib/Analysis/UnsafeBufferUsage.cpp                                           134 lines
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-addressof-arraysubscript.cpp    83 lines

Diff 510914

clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h

[... 37 lines not shown; context: public: ...]
   virtual void handleFixableVariable(const VarDecl *Variable,
                                      FixItList &&List) = 0;

   /// Returns a reference to the `Preprocessor`:
   virtual bool isSafeBufferOptOut(const SourceLocation &Loc) const = 0;

   /// Returns the text indicating that the user needs to provide input there:
   virtual std::string
-  getUserFillPlaceHolder(StringRef HintTextToUser = "placeholder") {
+  getUserFillPlaceHolder(StringRef HintTextToUser = "placeholder") const {
     std::string s = std::string("<# ");
     s += HintTextToUser;
     s += " #>";
     return s;
   }
 };

 // This function invokes the analysis and allows the caller to react to it
[... 13 lines not shown ...]

clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def

[... 24 lines not shown ...]
 #define FIXABLE_GADGET(name) GADGET(name)
 #endif

 WARNING_GADGET(Increment)
 WARNING_GADGET(Decrement)
 WARNING_GADGET(ArraySubscript)
 WARNING_GADGET(PointerArithmetic)
 WARNING_GADGET(UnsafeBufferUsageAttr)
-FIXABLE_GADGET(ULCArraySubscript)
+FIXABLE_GADGET(ULCArraySubscript) // `DRE[any]` in an Unspecified Lvalue Context
 FIXABLE_GADGET(DerefSimplePtrArithFixable)
 FIXABLE_GADGET(PointerDereference)
+FIXABLE_GADGET(UPCAddressofArraySubscript) // '&DRE[any]' in an Unspecified Pointer Context

 #undef FIXABLE_GADGET
 #undef WARNING_GADGET
 #undef GADGET

clang/lib/Analysis/UnsafeBufferUsage.cpp

Show All 9 Lines
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
#include "clang/AST/RecursiveASTVisitor.h"		#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"		#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Lex/Lexer.h"		#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"		#include "clang/Lex/Preprocessor.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include <memory>		#include <memory>
#include <optional>		#include <optional>
		#include <sstream>

using namespace llvm;		using namespace llvm;
using namespace clang;		using namespace clang;
using namespace ast_matchers;		using namespace ast_matchers;

namespace clang::ast_matchers {		namespace clang::ast_matchers {
// A `RecursiveASTVisitor` that traverses all descendants of a given node "n"		// A `RecursiveASTVisitor` that traverses all descendants of a given node "n"
// except for those belonging to a different callable of "n".		// except for those belonging to a different callable of "n".
▲ Show 20 Lines • Show All 81 Lines • ▼ Show 20 Lines	private:
const internal::DynTypedMatcher *const Matcher;		const internal::DynTypedMatcher *const Matcher;
internal::ASTMatchFinder *const Finder;		internal::ASTMatchFinder *const Finder;
internal::BoundNodesTreeBuilder *const Builder;		internal::BoundNodesTreeBuilder *const Builder;
internal::BoundNodesTreeBuilder ResultBindings;		internal::BoundNodesTreeBuilder ResultBindings;
const internal::ASTMatchFinder::BindKind Bind;		const internal::ASTMatchFinder::BindKind Bind;
bool Matches;		bool Matches;
};		};

		// Because we're dealing with raw pointers, let's define what we mean by that.
		static auto hasPointerType() {
		return hasType(hasCanonicalType(pointerType()));
		}

		static auto hasArrayType() {
		return hasType(hasCanonicalType(arrayType()));
		}

AST_MATCHER_P(Stmt, forEveryDescendant, internal::Matcher<Stmt>, innerMatcher) {		AST_MATCHER_P(Stmt, forEveryDescendant, internal::Matcher<Stmt>, innerMatcher) {
const DynTypedMatcher &DTM = static_cast<DynTypedMatcher>(innerMatcher);		const DynTypedMatcher &DTM = static_cast<DynTypedMatcher>(innerMatcher);

MatchDescendantVisitor Visitor(&DTM, Finder, Builder, ASTMatchFinder::BK_All);		MatchDescendantVisitor Visitor(&DTM, Finder, Builder, ASTMatchFinder::BK_All);
return Visitor.findMatch(DynTypedNode::create(Node));		return Visitor.findMatch(DynTypedNode::create(Node));
}		}

// Matches a `Stmt` node iff the node is in a safe-buffer opt-out region		// Matches a `Stmt` node iff the node is in a safe-buffer opt-out region
Show All 17 Lines	expr(anyOf(
castSubExpr(innerMatcher)),		castSubExpr(innerMatcher)),
binaryOperator(		binaryOperator(
hasAnyOperatorName("="),		hasAnyOperatorName("="),
hasLHS(innerMatcher)		hasLHS(innerMatcher)
)		)
));		));
// clang-format off		// clang-format off
}		}


		// Returns a matcher that matches any expression `e` such that `InnerMatcher`
		// matches `e` and `e` is in an Unspecified Pointer Context (UPC).
		static internal::Matcher<Stmt>
		isInUnspecifiedPointerContext(internal::Matcher<Stmt> InnerMatcher) {
		// A UPC can be
		// 1. an argument of a function call (except the callee has [[unsafe_...]]
		// attribute), or
		// 2. the operand of a cast operation; or
		// ...
		auto CallArgMatcher =
		callExpr(hasAnyArgument(allOf(
		hasPointerType() /* array also decays to pointer type*/,
		InnerMatcher)),
		jkorousUnsubmitted Not Done Reply Inline Actions I am just wondering how does the callee matcher work in situation with multiple re-declarations 🤔 Something like this: void foo(int* ptr); [[clang::unsafe_buffer_usage]] void foo(int* ptr); void foo(int* ptr); void bar(int* ptr) { foo(ptr); } jkorous: I am just wondering how does the callee matcher work in situation with multiple re-declarations…
		ziqingluo-90AuthorUnsubmitted Done Reply Inline Actions I think we are fine. According to the doc of `FunctionDecl`: /// Represents a function declaration or definition. /// /// Since a given function can be declared several times in a program, /// there may be several FunctionDecls that correspond to that /// function. Only one of those FunctionDecls will be found when /// traversing the list of declarations in the context of the /// FunctionDecl (e.g., the translation unit); this FunctionDecl /// contains all of the information known about the function. Other, /// previous declarations of the function are available via the /// getPreviousDecl() chain. ziqingluo-90: I think we are fine. According to the doc of `FunctionDecl`: ``` /// Represents a function…
		jkorousUnsubmitted Not Done Reply Inline Actions I see! Sound like we should be fine indeed and the test seems to confirm. Thank you! jkorous: I see! Sound like we should be fine indeed and the test seems to confirm. Thank you!
		unless(callee(functionDecl(hasAttr(attr::UnsafeBufferUsage)))));
		auto CastOperandMatcher =
		explicitCastExpr(hasCastKind(CastKind::CK_PointerToIntegral),
		castSubExpr(allOf(hasPointerType(), InnerMatcher)));

		return stmt(anyOf(CallArgMatcher, CastOperandMatcher));
		// FIXME: any more cases? (UPC excludes the RHS of an assignment. For now we
		// don't have to check that.)
		}
} // namespace clang::ast_matchers		} // namespace clang::ast_matchers

namespace {		namespace {
// Because the analysis revolves around variables and their types, we'll need to		// Because the analysis revolves around variables and their types, we'll need to
// track uses of variables (aka DeclRefExprs).		// track uses of variables (aka DeclRefExprs).
using DeclUseList = SmallVector<const DeclRefExpr *, 1>;		using DeclUseList = SmallVector<const DeclRefExpr *, 1>;

// Convenience typedef.		// Convenience typedef.
using FixItList = SmallVector<FixItHint, 4>;		using FixItList = SmallVector<FixItHint, 4>;

// Defined below.		// Defined below.
class Strategy;		class Strategy;
} // namespace		} // namespace

// Because we're dealing with raw pointers, let's define what we mean by that.
static auto hasPointerType() {
return hasType(hasCanonicalType(pointerType()));
}

static auto hasArrayType() {
return hasType(hasCanonicalType(arrayType()));
}

namespace {		namespace {
/// Gadget is an individual operation in the code that may be of interest to		/// Gadget is an individual operation in the code that may be of interest to
/// this analysis. Each (non-abstract) subclass corresponds to a specific		/// this analysis. Each (non-abstract) subclass corresponds to a specific
/// rigid AST structure that constitutes an operation on a pointer-type object.		/// rigid AST structure that constitutes an operation on a pointer-type object.
/// Discovery of a gadget in the code corresponds to claiming that we understand		/// Discovery of a gadget in the code corresponds to claiming that we understand
/// what this part of code is doing well enough to potentially improve it.		/// what this part of code is doing well enough to potentially improve it.
/// Gadgets can be warning (immediately deserving a warning) or fixable (not		/// Gadgets can be warning (immediately deserving a warning) or fixable (not
/// always deserving a warning per se, but requires our attention to identify		/// always deserving a warning per se, but requires our attention to identify
▲ Show 20 Lines • Show All 305 Lines • ▼ Show 20 Lines	public:

static Matcher matcher() {		static Matcher matcher() {
auto Target =		auto Target =
unaryOperator(		unaryOperator(
hasOperatorName("*"),		hasOperatorName("*"),
has(expr(ignoringParenImpCasts(		has(expr(ignoringParenImpCasts(
declRefExpr(to(varDecl())).bind(BaseDeclRefExprTag)))))		declRefExpr(to(varDecl())).bind(BaseDeclRefExprTag)))))
.bind(OperatorTag);		.bind(OperatorTag);

		jkorousUnsubmitted Not Done Reply Inline Actions I am wondering what will happen in the weird corner-case of `&5[ptr]` - I feel the Fix-It we produce would be incorrect. Here's a suggestion - we could use `hasLHS` instead of `hasBase` here and add a FIXME that when we find the time we should also properly support the corner-case. That would be a pretty low-priority though - we definitely have more important patterns to support first. WDYT? jkorous: I am wondering what will happen in the weird corner-case of `&5[ptr]` - I feel the Fix-It we…
		ziqingluo-90AuthorUnsubmitted Done Reply Inline Actions I'm not sure if I understand your concern. For `&5[ptr]`, we will generate a fix-it `ptr.data() + 5` in cases `ptr` is assigned a `span` strategy. It is same as the case of `&ptr[5]`. ziqingluo-90: I'm not sure if I understand your concern. For `&5[ptr]`, we will generate a fix-it `ptr.data…
		jkorousUnsubmitted Not Done Reply Inline Actions Oh, my bad! I assumed (AKA didn't check) that we're just replacing the parts of the code around the DRE and index. You're right. Please ignore me :) jkorous: Oh, my bad! I assumed (AKA didn't check) that we're just replacing the parts of the code around…
return expr(isInUnspecifiedLvalueContext(Target));		return expr(isInUnspecifiedLvalueContext(Target));
}		}

DeclUseList getClaimedVarUseSites() const override {		DeclUseList getClaimedVarUseSites() const override {
return {BaseDeclRefExpr};		return {BaseDeclRefExpr};
}		}

virtual const Stmt *getBaseStmt() const final { return Op; }		virtual const Stmt *getBaseStmt() const final { return Op; }

virtual std::optional<FixItList> getFixits(const Strategy &S) const override;		virtual std::optional<FixItList> getFixits(const Strategy &S) const override;
};		};

		// Represents expressions of the form `&DRE[any]` in the Unspecified Pointer
		// Context (see `isInUnspecifiedPointerContext`).
		// Note here `[]` is the built-in subscript operator.
		class UPCAddressofArraySubscriptGadget : public FixableGadget {
		private:
		static constexpr const char *const UPCAddressofArraySubscriptTag =
		"AddressofArraySubscriptUnderUPC";
		const UnaryOperator *Node; // the `&DRE[any]` node

		public:
		UPCAddressofArraySubscriptGadget(const MatchFinder::MatchResult &Result)
		: FixableGadget(Kind::ULCArraySubscript),
		Node(Result.Nodes.getNodeAs<UnaryOperator>(
		UPCAddressofArraySubscriptTag)) {
		assert(Node != nullptr && "Expecting a non-null matching result");
		}

		static bool classof(const Gadget *G) {
		return G->getKind() == Kind::UPCAddressofArraySubscript;
		}

		static Matcher matcher() {
		return expr(isInUnspecifiedPointerContext(expr(ignoringImpCasts(
		unaryOperator(hasOperatorName("&"),
		hasUnaryOperand(arraySubscriptExpr(
		hasBase(ignoringParenImpCasts(declRefExpr())))))
		.bind(UPCAddressofArraySubscriptTag)))));
		}

		virtual std::optional<FixItList> getFixits(const Strategy &) const override;

		virtual const Stmt *getBaseStmt() const override { return Node; }

		virtual DeclUseList getClaimedVarUseSites() const override {
		const auto *ArraySubst = cast<ArraySubscriptExpr>(Node->getSubExpr());
		const auto *DRE =
		cast<DeclRefExpr>(ArraySubst->getBase()->IgnoreImpCasts());
		NoQUnsubmitted Done Reply Inline Actions These `dyn_cast`s are already checked by the matcher. They can be turned into `cast`s and this function can return `{DRE}` unconditionally. NoQ: These `dyn_cast`s are already checked by the matcher. They can be turned into `cast`s and this…
		return {DRE};
		}
		};
} // namespace		} // namespace

namespace {		namespace {
// An auxiliary tracking facility for the fixit analysis. It helps connect		// An auxiliary tracking facility for the fixit analysis. It helps connect
// declarations to its and make sure we've covered all uses with our analysis		// declarations to its and make sure we've covered all uses with our analysis
// before we try to fix the declaration.		// before we try to fix the declaration.
class DeclUseTracker {		class DeclUseTracker {
using UseSetTy = SmallSet<const DeclRefExpr *, 16>;		using UseSetTy = SmallSet<const DeclRefExpr *, 16>;
▲ Show 20 Lines • Show All 192 Lines • ▼ Show 20 Lines	#include "clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def"
M.addMatcher(		M.addMatcher(
stmt(forEveryDescendant(		stmt(forEveryDescendant(
eachOf(		eachOf(
// A `FixableGadget` matcher and a `WarningGadget` matcher should not disable		// A `FixableGadget` matcher and a `WarningGadget` matcher should not disable
// each other (they could if they were put in the same `anyOf` group).		// each other (they could if they were put in the same `anyOf` group).
// We also should make sure no two `FixableGadget` (resp. `WarningGadget`) matchers		// We also should make sure no two `FixableGadget` (resp. `WarningGadget`) matchers
// match for the same node, so that we can group them		// match for the same node, so that we can group them
// in one `anyOf` group (for better performance via short-circuiting).		// in one `anyOf` group (for better performance via short-circuiting).
stmt(anyOf(		stmt(eachOf(
		ziqingluo-90AuthorUnsubmitted Done Reply Inline Actions Change from `anyOf` to `eachOf` ziqingluo-90: Change from `anyOf` to `eachOf`
#define FIXABLE_GADGET(x) \		#define FIXABLE_GADGET(x) \
x ## Gadget::matcher().bind(#x),		x ## Gadget::matcher().bind(#x),
#include "clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def"		#include "clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def"
// Also match DeclStmts because we'll need them when fixing		// Also match DeclStmts because we'll need them when fixing
// their underlying VarDecls that otherwise don't have		// their underlying VarDecls that otherwise don't have
// any backreferences to DeclStmts.		// any backreferences to DeclStmts.
declStmt().bind("any_ds")		declStmt().bind("any_ds")
)),		)),
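The switch from `anyOf` to `eachOf` discussed in this hunk can be illustrated outside of Clang. The following is a simplified model only, not the ASTMatchers API: `NamedMatcher`, `matchAnyOf` and `matchEachOf` are hypothetical names, and a "node" is just an `int`. It shows that an `anyOf`-style combinator stops at the first matcher that accepts a node, while an `eachOf`-style combinator reports every matcher that accepts it, which matters when more than one gadget matcher can match the same node.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bound matcher: a binding name plus a predicate.
using NamedMatcher = std::pair<std::string, std::function<bool(int)>>;

// anyOf-like behaviour: short-circuit at the first matcher that accepts Node.
std::vector<std::string> matchAnyOf(const std::vector<NamedMatcher> &Ms,
                                    int Node) {
  for (const auto &M : Ms)
    if (M.second(Node))
      return {M.first};
  return {};
}

// eachOf-like behaviour: collect every matcher that accepts Node.
std::vector<std::string> matchEachOf(const std::vector<NamedMatcher> &Ms,
                                     int Node) {
  std::vector<std::string> Hits;
  for (const auto &M : Ms)
    if (M.second(Node))
      Hits.push_back(M.first);
  return Hits;
}
```

With two overlapping predicates, the `anyOf`-like version yields a single binding for a node that both accept, so the second matcher is effectively disabled for that node; the `eachOf`-like version yields both.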
[142 lines not shown]	if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
case Strategy::Kind::Array:		case Strategy::Kind::Array:
case Strategy::Kind::Vector:		case Strategy::Kind::Vector:
llvm_unreachable("unsupported strategies for FixableGadgets");		llvm_unreachable("unsupported strategies for FixableGadgets");
}		}
}		}
return std::nullopt;		return std::nullopt;
}		}

		static std::optional<FixItList> // forward declaration
		fixUPCAddressofArraySubscriptWithSpan(const UnaryOperator *Node);

		std::optional<FixItList>
		UPCAddressofArraySubscriptGadget::getFixits(const Strategy &S) const {
		auto DREs = getClaimedVarUseSites();
		const auto *VD = cast<VarDecl>(DREs.front()->getDecl());

		NoQ (inline comment, Done): Similarly, this check is redundant, it's already guaranteed by the matcher.
		switch (S.lookup(VD)) {
		case Strategy::Kind::Span:
		return fixUPCAddressofArraySubscriptWithSpan(Node);
		case Strategy::Kind::Wontfix:
		jkorous (inline comment, Not Done): Since we use `std::nullopt` in `getFixits` to signal errors, we should either use the same strategy in `fixUPCAddressofArraySubscriptWithSpan` or translate the empty return value from it to `nullopt` here. (FWIW, I am leaning towards the former.) Forwarding the empty Fix-It would be incorrect.
		ziqingluo-90 (author, inline comment, Done): Oh, that's a bug I made! Thank you for finding it for me.
		case Strategy::Kind::Iterator:
		case Strategy::Kind::Array:
		case Strategy::Kind::Vector:
		llvm_unreachable("unsupported strategies for FixableGadgets");
		}
		return std::nullopt; // something went wrong, no fix-it
		}
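The distinction jkorous raises above, between `std::nullopt` and an engaged optional holding an empty list, can be sketched without Clang. This is a minimal illustration under stated assumptions: `FixItListSketch`, `fixOrFail` and `combine` are hypothetical names, the fix-its are plain strings, and the "give up if any gadget fails" policy in `combine` is assumed for the example rather than taken from the patch.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for clang's FixItList: just the replacement texts.
using FixItListSketch = std::vector<std::string>;

// Returns std::nullopt to signal "no fix could be produced"; an engaged
// optional (even with an empty list) means "succeeded".
std::optional<FixItListSketch> fixOrFail(bool CanFix) {
  if (!CanFix)
    return std::nullopt;
  return FixItListSketch{"&p.data()[5]"};
}

// Assumed caller policy: if any gadget cannot be fixed, abandon the whole fix.
std::optional<FixItListSketch> combine(const std::vector<bool> &Gadgets) {
  FixItListSketch All;
  for (bool CanFix : Gadgets) {
    std::optional<FixItListSketch> F = fixOrFail(CanFix);
    if (!F)
      return std::nullopt; // failure must not be mistaken for "nothing to do"
    All.insert(All.end(), F->begin(), F->end());
  }
  return All;
}
```

If a failing helper returned an empty list instead of `std::nullopt`, a caller like `combine` could not tell a failed gadget from one that needed no textual change, and would emit an incomplete fix.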

// Return the text representation of the given `APInt Val`:		// Return the text representation of the given `APInt Val`:
static std::string getAPIntText(APInt Val) {		static std::string getAPIntText(APInt Val) {
SmallVector<char> Txt;		SmallVector<char> Txt;
Val.toString(Txt, 10, true);		Val.toString(Txt, 10, true);
// APInt::toString does not add '\0' to the end of the string for us:		// APInt::toString does not add '\0' to the end of the string for us:
Txt.push_back('\0');		Txt.push_back('\0');
return Txt.data();		return Txt.data();
}		}
[100 lines not shown]	case Strategy::Kind::Vector:
llvm_unreachable("Strategy not implemented yet!");		llvm_unreachable("Strategy not implemented yet!");
case Strategy::Kind::Wontfix:		case Strategy::Kind::Wontfix:
llvm_unreachable("Invalid strategy!");		llvm_unreachable("Invalid strategy!");
}		}

return std::nullopt;		return std::nullopt;
}		}

		// Generates fix-its replacing an expression of the form `&DRE[e]` with
		// `&DRE.data()[e]`:
		static std::optional<FixItList>
		fixUPCAddressofArraySubscriptWithSpan(const UnaryOperator *Node) {
		const auto *ArraySub = cast<ArraySubscriptExpr>(Node->getSubExpr());
		const auto *DRE = cast<DeclRefExpr>(ArraySub->getBase()->IgnoreImpCasts());
		NoQ (inline comment, Done): Same here!
		// FIXME: this `getASTContext` call is costly, we should pass the
		t-rasmud (inline comment, Done): Nit: "costly"
		// ASTContext in:
		const ASTContext &Ctx = DRE->getDecl()->getASTContext();
		const Expr *Idx = ArraySub->getIdx();
		const SourceManager &SM = Ctx.getSourceManager();
		const LangOptions &LangOpts = Ctx.getLangOpts();
		std::stringstream SS;
		bool IdxIsLitZero = false;

		if (auto ICE = Idx->getIntegerConstantExpr(Ctx))
		if ((*ICE).isZero())
		IdxIsLitZero = true;
		if (IdxIsLitZero) {
		// If the index is literal zero, we produce the most concise fix-it:
		SS << getExprText(DRE, SM, LangOpts).str() << ".data()";
		} else {
		SS << "&" << getExprText(DRE, SM, LangOpts).str() << ".data()"
		malavikasamak (inline comment, Done): The title and summary of this patch indicate that this gadget emits a fix-it of the form `&DRE.data() + any` when it encounters `&DRE[any]`, but the code seems to emit `&DRE.data()[any]` instead. If so, can you please update the summary?
		<< "[" << getExprText(Idx, SM, LangOpts).str() << "]";
		}
		return FixItList{
		FixItHint::CreateReplacement(Node->getSourceRange(), SS.str())};
		}
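The text that `fixUPCAddressofArraySubscriptWithSpan` assembles can be mirrored with plain strings. The sketch below is not the patch's code: `spanAddrOfSubscriptText` is a hypothetical helper, the base and index are passed in as already-extracted source text, and the zero-index decision is passed as a flag rather than computed from an integer constant expression.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the replacement text for `&Base[Index]` under
// the std::span strategy. When the index is known to be zero, the shorter
// `Base.data()` form is produced instead of `&Base.data()[0]`.
std::string spanAddrOfSubscriptText(const std::string &Base,
                                    const std::string &Index,
                                    bool IndexIsZero) {
  assert(!Base.empty() && "expected the source text of the DeclRefExpr");
  if (IndexIsZero)
    return Base + ".data()";
  return "&" + Base + ".data()[" + Index + "]";
}
```

For example, `spanAddrOfSubscriptText("p", "5", false)` yields `&p.data()[5]` and `spanAddrOfSubscriptText("p", "0", true)` yields `p.data()`, which are the replacement strings the `CHECK` lines in the test file below expect.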

// For a non-null initializer `Init` of `T *` type, this function returns		// For a non-null initializer `Init` of `T *` type, this function returns
// `FixItHint`s producing a list initializer `{Init, S}` as a part of a fix-it		// `FixItHint`s producing a list initializer `{Init, S}` as a part of a fix-it
// to output stream.		// to output stream.
// In many cases, this function cannot figure out the actual extent `S`. It		// In many cases, this function cannot figure out the actual extent `S`. It
// then will use a place holder to replace `S` to ask users to fill `S` in. The		// then will use a place holder to replace `S` to ask users to fill `S` in. The
// initializer shall be used to initialize a variable of type `std::span<T>`.		// initializer shall be used to initialize a variable of type `std::span<T>`.
//		//
// FIXME: Support multi-level pointers		// FIXME: Support multi-level pointers
[263 lines not shown]

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-addressof-arraysubscript.cpp

This file was added.

				// RUN: %clang_cc1 -std=c++20 -Wunsafe-buffer-usage -fdiagnostics-parseable-fixits %s 2>&1 | FileCheck %s

				int f(unsigned long, void *);

				[[clang::unsafe_buffer_usage]]
				int unsafe_f(unsigned long, void *);

				void address_to_integer(int x) {
				int * p = new int[10];
				unsigned long n = (unsigned long) &p[5];
				// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:37-[[@LINE-1]]:42}:"&p.data()[5]"
				unsigned long m = (unsigned long) &p[x];
				// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:37-[[@LINE-1]]:42}:"&p.data()[x]"
				}

				void call_argument(int x) {
				int * p = new int[10];

				f((unsigned long) &p[5], &p[x]);
				// CHECK-DAG: fix-it:"{{.*}}":{[[@LINE-1]]:21-[[@LINE-1]]:26}:"&p.data()[5]"
				// CHECK-DAG: fix-it:"{{.*}}":{[[@LINE-2]]:28-[[@LINE-2]]:33}:"&p.data()[x]"
				}

				void ignore_unsafe_calls(int x) {
				// Cannot fix `&p[x]` for now as it is an argument of an unsafe
				// call. So no fix for variable `p`.
				int * p = new int[10];
				// CHECK-NOT: fix-it:"{{.*}}":{[[@LINE-1]]
				unsafe_f((unsigned long) &p[5],
				// CHECK-NOT: fix-it:"{{.*}}":{[[@LINE-1]]
				&p[x]);

				int * q = new int[10];
				// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:3-[[@LINE-1]]:12}:"std::span<int> q"
				// CHECK: fix-it:"{{.*}}":{[[@LINE-2]]:13-[[@LINE-2]]:13}:"{"
				// CHECK: fix-it:"{{.*}}":{[[@LINE-3]]:24-[[@LINE-3]]:24}:", 10}"
				unsafe_f((unsigned long) &q[5],
				// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:28-[[@LINE-1]]:33}:"&q.data()[5]"
				(void*)0);
				}

				void odd_subscript_form() {
				int * p = new int[10];
				unsigned long n = (unsigned long) &5[p];
				// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:37-[[@LINE-1]]:42}:"&p.data()[5]"
				}

				void index_is_zero() {
				int * p = new int[10];
				int n = p[5];

				f((unsigned long)&p[0],
				// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:20-[[@LINE-1]]:25}:"p.data()"
				&p[0]);
				// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:5-[[@LINE-1]]:10}:"p.data()"
				}

				// CHECK-NOT: fix-it
				// To test multiple function declarations, each of which carries
				// different incomplete informations:
				[[clang::unsafe_buffer_usage]]
				void unsafe_g(void*);

				void unsafe_g(void*);

				void multiple_unsafe_fundecls() {
				int * p = new int[10];

				unsafe_g(&p[5]);
				}

				void unsafe_h(void*);

				[[clang::unsafe_buffer_usage]]
				void unsafe_h(void*);
				t-rasmud (inline comment, Done): Can we have a test case for `&p[0]`? IIUC, this would generate a fixit `p.data() + 0` which is correct but we might want to optimize it to `p.data()` sometime in the future.

				void unsafe_h(void* p) { ((char*)p)[10]; }

				void multiple_unsafe_fundecls2() {
				int * p = new int[10];

				unsafe_h(&p[5]);
				}

Revision Contents

Diff 510914

clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h

clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def

clang/lib/Analysis/UnsafeBufferUsage.cpp

clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-addressof-arraysubscript.cpp
