Details

Reviewers

kazu
void
mtrofin
knaumov
chandlerc
manojgupta
rpbeltran

Commits

rGf1764d5b594f: [InlineCost] model calls to llvm.objectsize.*

Summary

Very similar to https://reviews.llvm.org/D111272. We very often can
evaluate calls to llvm.objectsize.* regardless of inlining. Don't count
calls to llvm.objectsize.* against the InlineCost when we can evaluate
the call to a constant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1302
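
The idea in the summary can be sketched with a toy cost model. This is only an illustration under stated assumptions: `ToyInst`, `kInstrCost`, `kCallPenalty`, and `estimateCost` are hypothetical names and values, not LLVM's `CallAnalyzer` or its real cost constants.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an instruction seen by an inline-cost analysis.
// Not LLVM's real data structures; every name here is hypothetical.
struct ToyInst {
  bool IsCall = false;
  // True when the call is an intrinsic that was evaluated to a constant.
  bool FoldsToConstant = false;
};

// Hypothetical costs, chosen only for illustration.
constexpr int kInstrCost = 5;
constexpr int kCallPenalty = 25;

// Sum the cost of a callee body, charging nothing for calls that fold.
int estimateCost(const std::vector<ToyInst> &Body) {
  int Cost = 0;
  for (const ToyInst &I : Body) {
    if (I.IsCall && I.FoldsToConstant)
      continue; // evaluated to a constant: not counted against the cost
    Cost += I.IsCall ? kCallPenalty : kInstrCost;
  }
  return Cost;
}
```

In this sketch, a three-instruction body with one call costs 35 when the call is opaque and 10 once the call is known to fold, which is the kind of difference that can move a callee under an inline threshold.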

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nickdesaulniers created this revision. · Oct 8 2021, 12:00 PM

Herald added subscribers: haicheng, hiraditya, eraman. · Oct 8 2021, 12:00 PM

nickdesaulniers requested review of this revision. · Oct 8 2021, 12:00 PM

Herald added a project: Restricted Project. · Oct 8 2021, 12:00 PM

Herald added a subscriber: llvm-commits.

nickdesaulniers added a parent revision: D111272: [InlineCost] model calls to llvm.is.constant* more carefully. · Oct 8 2021, 12:00 PM

nickdesaulniers added a reviewer: lebedev.ri.

Harbormaster completed remote builds in B127851: Diff 378331. · Oct 8 2021, 1:02 PM

Simplify the test, following the feedback on D111272.

Harbormaster completed remote builds in B127879: Diff 378372. · Oct 8 2021, 4:32 PM

Looking at https://llvm.org/docs/LangRef.html#id1293: there's a fourth parameter that says the value should be evaluated at runtime. I should perhaps check that and bail early.
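
A minimal sketch of that early bail, modeling the three `i1` flags that follow the pointer operand as described in the LangRef. `ObjectSizeFlags`, `FoldResult`, and `tryFoldObjectSize` are hypothetical names, and the sketch assumes a fold that falls back to the failure value (0 or -1, selected by the second argument) when the size is unknown.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the i1 flags on llvm.objectsize (after the pointer operand).
struct ObjectSizeFlags {
  bool Min;         // second argument: return 0 rather than -1 when unknown
  bool NullUnknown; // third argument: treat null as having unknown size
  bool Dynamic;     // fourth argument: value should be evaluated at runtime
};

// Result of the hypothetical fold attempt.
struct FoldResult {
  bool Folded;
  std::uint64_t Value;
};

// KnownSize < 0 means the size cannot be determined statically.
FoldResult tryFoldObjectSize(ObjectSizeFlags F, long long KnownSize) {
  if (F.Dynamic)
    return {false, 0}; // bail early: leave the call for runtime evaluation
  if (KnownSize >= 0)
    return {true, static_cast<std::uint64_t>(KnownSize)};
  // Unknown size: fall back to the failure value selected by the min flag.
  return {true, F.Min ? 0 : UINT64_MAX};
}
```

With the fourth flag set, the sketch refuses to fold regardless of what is known about the object, so the call would keep its normal cost.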

nickdesaulniers removed a reviewer: lebedev.ri. · Oct 11 2021, 2:40 PM

nickdesaulniers added subscribers: lebedev.ri, echristo, erik.pilkington, hliao.

@craig.topper also mentioned that "it looks like maybe ObjectSizeOffsetVisitor doesn't know how to look through bitcasts?" on IRC. Let me look into that, because I suspect we should be able to evaluate call i64 @llvm.objectsize.i64.p0i8(i8* bitcast (%struct.nodemask_t* @numa_nodes_parsed to i8*), i1 false, i1 false, i1 false) to something other than -1.
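
For reference, the type in question is `%struct.nodemask_t = type { [16 x i64] }` (from the test in this revision), which is 128 bytes. The sketch below only checks that arithmetic with an equivalent C++ struct; the field name `bits` and the helper `objectSizeOfNodemask` are hypothetical, and whether LLVM actually resolves the size of an external global this way is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the test's %struct.nodemask_t = type { [16 x i64] }.
struct nodemask_t {
  std::uint64_t bits[16]; // field name is hypothetical
};

// 16 elements * 8 bytes each = 128 bytes.
static_assert(sizeof(nodemask_t) == 128, "expected a 128-byte object");

// Size a fold would report if it resolved the size from the type alone.
constexpr std::size_t objectSizeOfNodemask() { return sizeof(nodemask_t); }
```

A result of 128 satisfies a `uge 128` comparison, and so does the failure value -1 when it is read as an unsigned 64-bit integer.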

Hmm. I may be a bit pedantic, but please bear with me for a moment. Even though I've already LGTMed https://reviews.llvm.org/D111272, and this patch is quite similar in nature -- "speculative" folding, I think we could run into a problem for the following reason. InlineCost.cpp performs two tasks -- the legality check ("Can we inline?") and the desirability check ("Should we inline?"). Once we "speculatively fold" these intrinsics and update SimplifiedValues, we don't examine those "dead" basic blocks for legality purposes. This can be a problem if those basic blocks come back to life, and they contain un-inlinable instructions. I suspect even the return value from @llvm.objectsize could change from the failure value (like -1) to the actual object size after ThinLTO importing.

I am currently thinking about maintaining a parallel world SimplifiedValues and LikelyValues for operations that matter to folding conditional branches that check @llvm.is.constant and @llvm.objectsize. I would also maintain a set of basic blocks that are "speculatively dead". If I am scanning a basic block that is marked speculatively dead, then I check legality but do not increment Cost.

SimplifiedValues appears 40 times in InlineCost.cpp, so it would be a nightmare to duplicate each one and maintain the perfect parallel world for little gain.

Thoughts?
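
One way to picture the "speculatively dead" bookkeeping described above is the toy walk below. It is only a sketch of the idea in this comment, not code from InlineCost.cpp; `ToyBlock`, `ToyResult`, `analyze`, and the per-instruction cost of 5 are all hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy basic block; not LLVM's BasicBlock. All names are hypothetical.
struct ToyBlock {
  std::string Name;
  int NumInsts;
  bool HasUninlinableInst;
};

struct ToyResult {
  bool Legal;
  int Cost;
};

// Walk every block: always check legality, but skip the cost of blocks that
// are only speculatively dead (dead under a guessed intrinsic result).
ToyResult analyze(const std::vector<ToyBlock> &Blocks,
                  const std::set<std::string> &SpeculativelyDead) {
  ToyResult R{true, 0};
  for (const ToyBlock &BB : Blocks) {
    if (BB.HasUninlinableInst)
      R.Legal = false; // legality is checked even in speculatively dead blocks
    if (SpeculativelyDead.count(BB.Name) == 0)
      R.Cost += BB.NumInsts * 5; // hypothetical cost of 5 per instruction
  }
  return R;
}
```

In this model a block that is folded away speculatively stops contributing to the cost, yet an un-inlinable instruction inside it still makes the whole callee illegal to inline.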

Why not have a separate SimplifiedValues object when checking for "cost"? There could be a canonical SimplifiedValues when checking legality, but when checking for cost, the only values that modify the canonical SimplifiedValues are those we *know* are correct, i.e., llvm.is.constant evaluates to true, or llvm.objectsize evaluates to a constant value. So when checking for "cost", we make a copy of the canonical version and then proceed as normal.
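
One reading of this suggestion, sketched with a plain map instead of LLVM's actual SimplifiedValues container. `ToySimplifiedValues` and `makeCostView` are hypothetical names, and the string keys stand in for IR values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy map from a value's name to the constant it simplifies to.
using ToySimplifiedValues = std::map<std::string, std::uint64_t>;

// Start from the canonical (known-correct) map and layer speculative folds on
// top, leaving the canonical map untouched for legality checks.
ToySimplifiedValues makeCostView(const ToySimplifiedValues &Canonical,
                                 const ToySimplifiedValues &SpeculativeFolds) {
  ToySimplifiedValues CostView = Canonical; // copy of the canonical version
  for (const auto &KV : SpeculativeFolds)
    CostView.insert(KV); // insert() keeps an existing canonical entry
  return CostView;
}
```

The cost pass would then consult the copy, while the legality pass keeps using the canonical map, so a speculative value can never hide a block from the legality check.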

@kazu, any thoughts on my comment?

I like the idea of entering known correct values into SimplifiedValues. Now, when you say a separate SimplifiedValues object, do you mean maintaining something like SimplifiedValuesForCost and SimplifiedValuesForLegality?

Herald added a project: Restricted Project. · Sep 13 2022, 12:54 PM

Herald added a subscriber: ChuanqiXu.

rebase

nickdesaulniers added reviewers: manojgupta, rpbeltran. · Jan 18 2023, 9:58 AM

Bumping for review; this is actively hurting our ability to protect C code with FORTIFY_SOURCE.

The proposed changes sound nice, but they seem like a pretty significant reworking of inline cost modeling, which itself sounds significantly riskier than this patch. I'll leave reworking the inline cost model in such a way to others, who can bear the risk of such a refactoring.

IMO it's a major deficit of inline cost modeling at the moment that it doesn't know about most LLVM intrinsics and penalizes callers for the full cost of a function call at runtime, when many intrinsics don't result in any code being generated at all. This is a simple patch towards improving that problem.

The patch looks simple enough to me, but I do not know this code path well enough to accept it. I agree that inlining cost estimation work should not block this.

Harbormaster completed remote builds in B208538: Diff 490213. · Jan 18 2023, 11:39 AM

The change looks simple enough, and I do not see any issues raised.

This revision is now accepted and ready to land. · Jan 20 2023, 6:48 AM

Update the test to use opaque pointers; the test predated the conversion!

This revision was landed with ongoing or failed builds. · Jan 24 2023, 3:10 PM

Closed by commit rGf1764d5b594f: [InlineCost] model calls to llvm.objectsize.* (authored by nickdesaulniers).

This revision was automatically updated to reflect the committed changes.

nickdesaulniers added a commit: rGf1764d5b594f: [InlineCost] model calls to llvm.objectsize.*.

Harbormaster completed remote builds in B209763: Diff 491931. · Jan 24 2023, 6:06 PM

It looks like a bisect linked the change in behavior seen in https://github.com/llvm/llvm-project/issues/61775 to this commit.

Diff 491932

llvm/lib/Analysis/InlineCost.cpp

@@ (16 lines hidden) @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BlockFrequencyInfo.h"
 #include "llvm/Analysis/CodeMetrics.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/IR/AssemblyAnnotationWriter.h"
 #include "llvm/IR/CallingConv.h"
@@ (381 lines hidden) @@ protected:
   void findDeadBlocks(BasicBlock *CurrBB, BasicBlock *NextBB);
   void disableLoadElimination();
   bool isGEPFree(GetElementPtrInst &GEP);
   bool canFoldInboundsGEP(GetElementPtrInst &I);
   bool accumulateGEPOffset(GEPOperator &GEP, APInt &Offset);
   bool simplifyCallSite(Function *F, CallBase &Call);
   bool simplifyInstruction(Instruction &I);
   bool simplifyIntrinsicCallIsConstant(CallBase &CB);
+  bool simplifyIntrinsicCallObjectSize(CallBase &CB);
   ConstantInt *stripAndComputeInBoundsConstantOffsets(Value *&V);

   /// Return true if the given argument to the function being considered for
   /// inlining has the given attribute set either at the call site or the
   /// function declaration. Primarily used to inspect call site specific
   /// attributes since these can be more precise than the ones on the callee
   /// itself.
   bool paramHasAttr(Argument *A, Attribute::AttrKind Attr);
@@ (1,167 lines hidden) @@ bool CallAnalyzer::simplifyIntrinsicCallIsConstant(CallBase &CB) {
   if (!C)
     C = dyn_cast_or_null<Constant>(SimplifiedValues.lookup(Arg));

   Type *RT = CB.getFunctionType()->getReturnType();
   SimplifiedValues[&CB] = ConstantInt::get(RT, C ? 1 : 0);
   return true;
 }

+bool CallAnalyzer::simplifyIntrinsicCallObjectSize(CallBase &CB) {
+  // As per the langref, "The fourth argument to llvm.objectsize determines if
+  // the value should be evaluated at runtime."
+  if(cast<ConstantInt>(CB.getArgOperand(3))->isOne())
+    return false;
+
+  Value *V = lowerObjectSizeCall(&cast<IntrinsicInst>(CB), DL, nullptr,
+                                 /*MustSucceed=*/true);
+  Constant *C = dyn_cast_or_null<Constant>(V);
+  if (C)
+    SimplifiedValues[&CB] = C;
+  return C;
+}
+
 bool CallAnalyzer::visitBitCast(BitCastInst &I) {
   // Propagate constants through bitcasts.
   if (simplifyInstruction(I))
     return true;

   // Track base/offsets through casts
   std::pair<Value *, APInt> BaseAndOffset =
       ConstantOffsetPtrs.lookup(I.getOperand(0));
@@ (596 lines hidden) @@ case Intrinsic::vastart:
       return false;
     case Intrinsic::launder_invariant_group:
     case Intrinsic::strip_invariant_group:
       if (auto *SROAArg = getSROAArgForValueOrNull(II->getOperand(0)))
         SROAArgValues[II] = SROAArg;
       return true;
     case Intrinsic::is_constant:
       return simplifyIntrinsicCallIsConstant(Call);
+    case Intrinsic::objectsize:
+      return simplifyIntrinsicCallObjectSize(Call);
     }
   }

   if (F == Call.getFunction()) {
     // This flag will fully abort the analysis, so don't bother with anything
     // else.
     IsRecursiveCall = true;
     if (!AllowRecursiveCall)
@@ (939 lines hidden) @@

llvm/test/Transforms/Inline/call-intrinsic-objectsize.ll

This file was added.

; RUN: opt -passes=inline -S %s -inline-threshold=20 2>&1 | FileCheck %s

%struct.nodemask_t = type { [16 x i64] }
@numa_nodes_parsed = external constant %struct.nodemask_t, align 8

declare void @foo()
declare i64 @llvm.objectsize.i64.p0(ptr, i1 immarg, i1 immarg, i1 immarg)

; Test that we inline @callee into @caller.
define i64 @caller() {
; CHECK-LABEL: @caller(
; CHECK-NEXT: [[TMP1:%.*]] = tail call i64 @llvm.objectsize.i64.p0(ptr @numa_nodes_parsed, i1 false, i1 false, i1 false)
; CHECK-NEXT: [[TMP2:%.*]] = icmp uge i64 [[TMP1]], 128
; CHECK-NEXT: br i1 [[TMP2]], label %[[CALLEE_EXIT:.*]], label %[[HANDLER_TYPE_MISMATCH94_I:.*]]
; CHECK: [[HANDLER_TYPE_MISMATCH94_I]]:
; CHECK-NEXT: call void @foo()
; CHECK-NEXT: call void @foo()
; CHECK-NEXT: call void @foo()
; CHECK-NEXT: call void @foo()
; CHECK-NEXT: call void @foo()
; CHECK-NEXT: call void @foo()
; CHECK-NEXT: br label %[[CALLEE_EXIT]]
; CHECK: [[CALLEE_EXIT]]:
; CHECK-NEXT: ret i64 [[TMP1]]
;
  %1 = tail call i64 @callee()
  ret i64 %1
}

; Testing the InlineCost of the call to @llvm.objectsize.i64.p0i8.
; Do not change the linkage of @callee; that will give it a severe discount in
; cost (LastCallToStaticBonus).
define i64 @callee() {
  %1 = tail call i64 @llvm.objectsize.i64.p0(ptr @numa_nodes_parsed, i1 false, i1 false, i1 false)
  %2 = icmp uge i64 %1, 128
  br i1 %2, label %cont95, label %handler.type_mismatch94

handler.type_mismatch94:
  call void @foo()
  call void @foo()
  call void @foo()
  call void @foo()
  call void @foo()
  call void @foo()
  br label %cont95

cont95:
  ret i64 %1
}

This is an archive of the discontinued LLVM Phabricator instance.

[InlineCost] model calls to llvm.objectsize.*
Closed · Public
