This is an archive of the discontinued LLVM Phabricator instance.

Differential D140242

[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack
ClosedPublic

Authored by JanekvO on Dec 16 2022, 12:10 PM.

Details

Reviewers

arsenm
scchan

Group Reviewers

Restricted Project

Commits

rG142c28ffa132: [AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing…

Summary

This fixes a regression that appeared when the new pass manager (new PM) was enabled by default. Functions with a large number of instructions are not inlined, but the inliner does not consider the cost of passing arguments through the stack when a function has many arguments. This patch attempts to add a heuristic for AMDGPU's function calling convention that also considers function arguments passed through the stack.
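
As a rough illustration of the heuristic described above, the sketch below computes a threshold increase from the number of argument registers that no longer fit in registers. It is a standalone model and uses no LLVM API. The function name, parameter names, and `k...` constants are hypothetical; the values 26 and 32 mirror `NrOfSGPRUntilSpill` and `NrOfVGPRUntilSpill` from this diff, and the per-register stack cost is taken as an input rather than queried from the target.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the register limits hard-coded in the patch.
constexpr int kSGPRsUntilStack = 26;
constexpr int kVGPRsUntilStack = 32;

// Returns how much the inline threshold would be raised.
// argStackCost: assumed number of instructions needed per register that is
//               passed through the stack (store in caller, load in callee, ...).
// instrCost:    the inliner's cost for one instruction.
unsigned stackArgThresholdBonus(int sgprsInUse, int vgprsInUse,
                                int argStackCost, int instrCost) {
  assert(sgprsInUse >= 0 && vgprsInUse >= 0);
  assert(argStackCost >= 0 && instrCost >= 0);
  // Only registers beyond the limit end up on the stack.
  int excessSGPRs = std::max(0, sgprsInUse - kSGPRsUntilStack);
  int excessVGPRs = std::max(0, vgprsInUse - kVGPRsUntilStack);
  return static_cast<unsigned>((excessSGPRs + excessVGPRs) * argStackCost *
                               instrCost);
}
```

A callee whose arguments fit in registers gets no bonus, so its inlining decision is unchanged; every register beyond the limit raises the threshold by the same fixed amount.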

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JanekvO created this revision.Dec 16 2022, 12:10 PM

Herald added a project: Restricted Project.Dec 16 2022, 12:10 PM

Herald added subscribers: kosarev, foad, ChuanqiXu and 9 others.

JanekvO requested review of this revision.Dec 16 2022, 12:10 PM

Herald added a project: Restricted Project.Dec 16 2022, 12:10 PM

Herald added subscribers: llvm-commits, wdng.

JanekvO added a reviewer: Restricted Project.Dec 16 2022, 12:11 PM

JanekvO added inline comments.Dec 16 2022, 12:14 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1187	I added function passing through SGPRs as it was described for non-kernel functions in the AMDGPU backend user guide but the calling convention tablegen file for AMDGPU seems to only use VGPRs for passing arguments. Not sure if the aim is to add function argument passing through SGPRs and I should keep it in or whether to remove this.

arsenm requested changes to this revision.Dec 16 2022, 12:17 PM

arsenm added inline comments.

llvm/lib/Analysis/InlineCost.cpp
164–166 ↗	(On Diff #483632)	Not sure why you're adding this but it doesn't belong in this patch
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
88–90	Should be able to compute this directly from the existing costs for stack stores
1173–1175	No reason to specially consider them?
1182	Raw argument counts don't correspond to register counts, need to get the type legalized register size
1249–1252	Typo tInlinig
llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-argument.ll
19–22	Should use pass arguments or flags to set the thresholds to avoid having so many instructions in the test
2028	Don't use anonymous values in tests

This revision now requires changes to proceed.Dec 16 2022, 12:17 PM

scchan requested changes to this revision.Dec 16 2022, 1:23 PM

scchan added a subscriber: scchan.

scchan added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1188	I guess it's subtracting the number of clobbered registers - instead of a hardcoded value, could that be replaced by something more meaningful like a const variable or a getter? Also shouldn't VGPRs have a higher penalty relative to SGPRs since they'd occupy more stack space?

arsenm added inline comments.Dec 16 2022, 1:25 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1188	We only sort of handle SGPR arguments today, and not for compute. We also do not currently implement the optimization of packing SGPRs into a VGPR for the argument spill

scchan added inline comments.Dec 16 2022, 1:38 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1188	I wasn't paying attention to the comments for ArgStackInlinePenalty. The cost model is only based on the number of instructions and it doesn't take storage into account.

Harbormaster completed remote builds in B203689: Diff 483632.Dec 16 2022, 1:48 PM

Address comments
Rebase

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
88–90	Sorry, I couldn't find any constant or cl option for stack store costs. Did you have anything in mind to replace this cl option?
1173–1175	Removed as inlining will never call kernel functions as callee.
1188	I couldn't infer what measurement unit the inliner cost/threshold uses so I took a cost model relative from the cost of a single instruction. Do let me know if the storage cost should be considered (and possibly with what amount).

arsenm added inline comments.Jan 12 2023, 8:44 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
88–90	I mean TargetTransformInfo::getMemoryOpCost (looking now and I don't think we ever implemented this. I know I tried to implement at one point but I guess never pushed it. We should implement this too)
1182	getPrimitiveSizeInBits doesn't work for pointers. You're also going to be repeating a lot of legalization logic. You're better off using getRegisterTypeForCallingConv and getNumRegistersForCallingConv
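
To illustrate why raw argument counts do not correspond to register counts, here is a simplified standalone model in which every argument is split into 32-bit registers. This is an assumption made for illustration only: the real numbers come from the target's legalization rules via getNumRegistersForCallingConv, which this sketch does not call or reproduce. Both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Simplified model: an argument of the given width is split into 32-bit
// registers. The backend's calling-convention queries may give different
// answers for some types.
unsigned approxRegistersForBits(unsigned sizeInBits) {
  assert(sizeInBits > 0);
  return (sizeInBits + 31) / 32;
}

// Sums the approximate register count over all arguments of a function.
unsigned approxTotalRegisters(const std::vector<unsigned> &argBits) {
  unsigned total = 0;
  for (unsigned bits : argBits)
    total += approxRegistersForBits(bits);
  return total;
}
```

Under this model a function taking two 64-bit values and one 32-bit value has three arguments but needs five registers, which is why counting arguments alone underestimates register pressure.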

Harbormaster completed remote builds in B207415: Diff 488665.Jan 12 2023, 10:36 AM

scchan added inline comments.Jan 12 2023, 11:54 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1188	I was thinking about the contribution of a stack's size to the overall size of the scratch since that may add penalty to the launch overhead. A VGPR store would take more stack space than a SGPR store and therefore has a higher cost (relatively speaking)? I don't know how to model it at the moment but just suggesting that would be something to consider.

Address feedback, add tests for ptr and different arg types

Harbormaster completed remote builds in B208080: Diff 489588.Jan 16 2023, 10:50 AM

Rebase

arsenm added inline comments.Jan 25 2023, 2:26 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1183–1184	Can you just go through EVT? This is going to not work for vectors of pointers. You shouldn't have to consider any of these details
1212	assign DL to variable above?
1216–1218	This isn't actually true today, we don't implement SGPR argument packing
1224	Don't refer to this as a spill, it's a stack cost

Harbormaster completed remote builds in B209960: Diff 492233.Jan 25 2023, 2:30 PM

Review comments

Harbormaster completed remote builds in B210490: Diff 492929.Jan 27 2023, 4:54 PM

JanekvO marked an inline comment as done.Jan 30 2023, 4:55 AM

JanekvO added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1183–1184	> This is going to not work for vectors of pointers.
Sorry, is there a way to get vector of pointers working with EVT that I'm missing?

arsenm added inline comments.Jan 30 2023, 5:57 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1183–1184	You should be performing no logic on the type. You should pass the raw type to getEVT (i.e. remove the isPointerTy check, that doesn't cover vectors of pointers)

JanekvO added inline comments.Jan 30 2023, 7:35 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1183–1184	Ah, the reason I put some type logic here is because `getNumRegistersForCallingConv` calls `EVT::getSizeInBits` which is target dependent and will hit an unreachable error (`EVT::getEVT` also doesn't allow vector of pointers).

arsenm added inline comments.Jan 30 2023, 7:46 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1183–1184	I forgot this API was bad. You probably want getValueType (which is what TTI::getRegUsageForType uses, and also won't work correctly for aggregates). The most correct thing would be to call ComputeValueVTs to cover all aggregates correctly, but that feels a bit heavy. I'd go with getTLI()->getValueType(), and add a test with an array and a struct argument just to make sure those don't break too badly

arsenm added inline comments.Jan 30 2023, 7:47 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1183–1184	Actually, we kind of care a lot about getting aggregates correct. We generate a number of them for the basic HIP ABI and should be emitting more of them.
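
The idea behind handling aggregates correctly can be sketched as flattening a struct or array argument into its leaf values before counting registers, which is conceptually what ComputeValueVTs provides. The example below is a toy model, not LLVM code: `ToyType`, `collectLeafBits`, and `registersForType` are hypothetical names, and the 32-bit split per leaf is an assumed simplification of the real legalization.

```cpp
#include <cassert>
#include <vector>

// Toy type tree (not LLVM's Type): a leaf has a bit width, an aggregate
// (struct or array) has members and ignores `bits`.
struct ToyType {
  unsigned bits;
  std::vector<ToyType> members;
};

// Appends the bit width of every leaf of `ty` to `out`, depth first.
void collectLeafBits(const ToyType &ty, std::vector<unsigned> &out) {
  if (ty.members.empty()) {
    out.push_back(ty.bits);
    return;
  }
  for (const ToyType &member : ty.members)
    collectLeafBits(member, out);
}

// Counts registers for a possibly nested type, assuming each leaf is split
// into 32-bit registers.
unsigned registersForType(const ToyType &ty) {
  std::vector<unsigned> leaves;
  collectLeafBits(ty, leaves);
  unsigned regs = 0;
  for (unsigned bits : leaves)
    regs += (bits + 31) / 32;
  return regs;
}
```

Querying only the top-level type would treat a struct as a single value; walking the leaves counts each member separately, which matters when aggregates are passed by value.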

Use ComputeValueVTs for argument EVT retrieval.
Rebase

Harbormaster completed remote builds in B211201: Diff 493907.Feb 1 2023, 5:39 AM

arsenm accepted this revision.Feb 1 2023, 6:28 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1205	Hardcoding 4 for the alignment would be fine (not that this argument will do anything)
llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-ptr-argument.ll
2	REQUIRES should be the first line

Hardcode alignment, switch REQUIRES statement in tests

arsenm accepted this revision.Feb 2 2023, 5:30 AM

Harbormaster completed remote builds in B211462: Diff 494267.Feb 2 2023, 6:07 AM

Format fix

Harbormaster completed remote builds in B211479: Diff 494289.Feb 2 2023, 7:30 AM

Windows failure seems to be unrelated.
@scchan I'm not sure if I've addressed your comments sufficiently, let me know if not.

You don't need to re-post the review for every minor formatting fix

The SGPR argument thing doesn't really matter now, it's not an optimization we perform for arguments

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1206	Align(4), not MaybeAlign

This revision was not accepted when it landed; it landed in state Needs Review.Feb 3 2023, 7:31 AM

This revision was landed with ongoing or failed builds.

Closed by commit rG142c28ffa132: [AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing… (authored by JanekvO).

This revision was automatically updated to reflect the committed changes.

JanekvO added a commit: rG142c28ffa132: [AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing….

Hi Janek, seeing buildbot failures: https://lab.llvm.org/buildbot/#/builders/193/builds/26047

0. Program arguments: /home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --linker-path=/home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/ld.lld -- -pie -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o /home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/libomptarget/test/amdgcn-amd-amdhsa/offloading/Output/bug49021.cpp.tmp /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o -L/home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/libomptarget -L/home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib -L/usr/lib -rpath /home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/libomptarget -rpath /home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -rpath /home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib /tmp/lit-tmp-g5sdw5re/bug49021-57cfcf.o -lstdc++ -lm -lomp -lomptarget -lomptarget.devicertl -L/home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/lib -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/9/crtendS.o /lib/x86_64-linux-gnu/crtn.o
#0 0x000055aaa775a69f llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/omp-vega20-0/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang-linker-wrapper+0x150a69f)
#1 0x000055aaa7758324 SignalHandler(int) Signals.cpp:0:0
#2 0x00007f4ad77ff420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#3 0x000055aaa661d96a llvm::GCNTTIImpl::adjustInliningThreshold(llvm::CallBase const*) const (.cold) AMDGPUTargetTransformInfo.cpp:0:0
#4 0x000055aaa73dcf01 (anonymous namespace)::InlineCostCallAnalyzer::updateThreshold(llvm::CallBase&, llvm::Function&) InlineCost.cpp:0:0
#5 0x000055aaa73de0ab (anonymous namespace)::InlineCostCallAnalyzer::onAnalysisStart() InlineCost.cpp:0:0

JanekvO added a reverting change: rG1beba4452690: Revert "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of….Feb 3 2023, 11:14 AM

JanekvO mentioned this in D143498: Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack".Feb 7 2023, 7:11 AM

JanekvO mentioned this in rGe3515ba3816b: Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of….Feb 13 2023, 4:17 AM

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

AMDGPUTargetTransformInfo.cpp

61 lines

test/

Transforms/

Inline/

AMDGPU/

amdgpu-inline-stack-argument-i64.ll

100 lines

amdgpu-inline-stack-argument.ll

164 lines

amdgpu-inline-stack-array-ptr-argument.ll

118 lines

amdgpu-inline-stack-ptr-argument.ll

112 lines

amdgpu-inline-stack-struct-argument.ll

171 lines

amdgpu-inline-stack-vector-ptr-argument.ll

118 lines

Diff 493907

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

Show All 11 Lines
// more precise answers to certain TTI queries, while letting the target		// more precise answers to certain TTI queries, while letting the target
// independent and default TTI implementations handle the rest.		// independent and default TTI implementations handle the rest.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPUTargetTransformInfo.h"		#include "AMDGPUTargetTransformInfo.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
		#include "llvm/Analysis/InlineCost.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/CodeGen/Analysis.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include <optional>		#include <optional>

using namespace llvm;		using namespace llvm;

▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	cl::desc("Maximum number of BBs allowed in a function after inlining"
" (compile time constraint)"));		" (compile time constraint)"));

static bool dependsOnLocalPhi(const Loop *L, const Value *Cond,		static bool dependsOnLocalPhi(const Loop *L, const Value *Cond,
unsigned Depth = 0) {		unsigned Depth = 0) {
const Instruction *I = dyn_cast<Instruction>(Cond);		const Instruction *I = dyn_cast<Instruction>(Cond);
if (!I)		if (!I)
return false;		return false;

for (const Value *V : I->operand_values()) {		for (const Value *V : I->operand_values()) {
if (!L->contains(I))		if (!L->contains(I))
continue;		continue;
if (const PHINode *PHI = dyn_cast<PHINode>(V)) {		if (const PHINode *PHI = dyn_cast<PHINode>(V)) {
if (llvm::none_of(L->getSubLoops(), [PHI](const Loop* SubLoop) {		if (llvm::none_of(L->getSubLoops(), [PHI](const Loop* SubLoop) {
return SubLoop->contains(PHI); }))		return SubLoop->contains(PHI); }))
return true;		return true;
} else if (Depth < 10 && dependsOnLocalPhi(L, V, Depth+1))		} else if (Depth < 10 && dependsOnLocalPhi(L, V, Depth+1))
return true;		return true;
}		}
return false;		return false;
▲ Show 20 Lines • Show All 1,065 Lines • ▼ Show 20 Lines	if (Callee->size() == 1)
return true;		return true;
size_t BBSize = Caller->size() + Callee->size() - 1;		size_t BBSize = Caller->size() + Callee->size() - 1;
return BBSize <= InlineMaxBB;		return BBSize <= InlineMaxBB;
}		}

return true;		return true;
}		}

		static unsigned
		adjustInliningThresholdUsingCallee(const Function *Callee,
		const SITargetLowering *TLI,
		const GCNTTIImpl *TTIImpl) {
		const int NrOfSGPRUntilSpill = 26;
		const int NrOfVGPRUntilSpill = 32;

		const DataLayout &DL = TTIImpl->getDataLayout();

		unsigned adjustThreshold = 0;
		int SGPRsInUse = 0;
		int VGPRsInUse = 0;
		for (const Argument &A : Callee->args()) {
		SmallVector<EVT, 4> ValueVTs;
		ComputeValueVTs(*TLI, DL, A.getType(), ValueVTs);
		for (auto ArgVT : ValueVTs) {
		unsigned CCRegNum = TLI->getNumRegistersForCallingConv(
		A.getContext(), Callee->getCallingConv(), ArgVT);
		if (AMDGPU::isArgPassedInSGPR(&A))
		SGPRsInUse += CCRegNum;
		else
		VGPRsInUse += CCRegNum;
		}
		}

		// The cost of passing function arguments through the stack:
		// 1 instruction to put a function argument on the stack in the caller.
		// 1 instruction to take a function argument from the stack in callee.
		// 1 instruction to explicitly take care of data dependencies in callee
		// function.
		InstructionCost ArgStackCost(1);
		ArgStackCost += const_cast<GCNTTIImpl *>(TTIImpl)->getMemoryOpCost(
		Instruction::Store, Type::getInt32Ty(Callee->getContext()),
		DL.getPointerPrefAlignment(AMDGPUAS::PRIVATE_ADDRESS),
		AMDGPUAS::PRIVATE_ADDRESS, TTI::TCK_SizeAndLatency);
		ArgStackCost += const_cast<GCNTTIImpl *>(TTIImpl)->getMemoryOpCost(
		Instruction::Load, Type::getInt32Ty(Callee->getContext()),
		DL.getPointerPrefAlignment(AMDGPUAS::PRIVATE_ADDRESS),
		AMDGPUAS::PRIVATE_ADDRESS, TTI::TCK_SizeAndLatency);

		// The penalty cost is computed relative to the cost of instructions and does
		// not model any storage costs.
		adjustThreshold +=
		std::max(0, SGPRsInUse - NrOfSGPRUntilSpill) *
		*ArgStackCost.getValue() * InlineConstants::getInstrCost();
		adjustThreshold +=
		std::max(0, VGPRsInUse - NrOfVGPRUntilSpill) *
		*ArgStackCost.getValue() * InlineConstants::getInstrCost();
		return adjustThreshold;
		}

unsigned GCNTTIImpl::adjustInliningThreshold(const CallBase *CB) const {		unsigned GCNTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
// If we have a pointer to private array passed into a function		// If we have a pointer to private array passed into a function
// it will not be optimized out, leaving scratch usage.		// it will not be optimized out, leaving scratch usage.
// Increase the inline threshold to allow inlining in this case.		// Increase the inline threshold to allow inlining in this case.
		unsigned adjustThreshold = 0;
uint64_t AllocaSize = 0;		uint64_t AllocaSize = 0;
SmallPtrSet<const AllocaInst *, 8> AIVisited;		SmallPtrSet<const AllocaInst *, 8> AIVisited;
for (Value *PtrArg : CB->args()) {		for (Value *PtrArg : CB->args()) {
PointerType *Ty = dyn_cast<PointerType>(PtrArg->getType());		PointerType *Ty = dyn_cast<PointerType>(PtrArg->getType());
if (!Ty || (Ty->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS &&		if (!Ty || (Ty->getAddressSpace() != AMDGPUAS::PRIVATE_ADDRESS &&
Ty->getAddressSpace() != AMDGPUAS::FLAT_ADDRESS))		Ty->getAddressSpace() != AMDGPUAS::FLAT_ADDRESS))
continue;		continue;

PtrArg = getUnderlyingObject(PtrArg);		PtrArg = getUnderlyingObject(PtrArg);
if (const AllocaInst *AI = dyn_cast<AllocaInst>(PtrArg)) {		if (const AllocaInst *AI = dyn_cast<AllocaInst>(PtrArg)) {
if (!AI->isStaticAlloca() || !AIVisited.insert(AI).second)		if (!AI->isStaticAlloca() || !AIVisited.insert(AI).second)
continue;		continue;
AllocaSize += DL.getTypeAllocSize(AI->getAllocatedType());		AllocaSize += DL.getTypeAllocSize(AI->getAllocatedType());
// If the amount of stack memory is excessive we will not be able		// If the amount of stack memory is excessive we will not be able
// to get rid of the scratch anyway, bail out.		// to get rid of the scratch anyway, bail out.
if (AllocaSize > ArgAllocaCutoff) {		if (AllocaSize > ArgAllocaCutoff) {
AllocaSize = 0;		AllocaSize = 0;
break;		break;
}		}
}		}
}		}
if (AllocaSize)		adjustThreshold +=
return ArgAllocaCost;		adjustInliningThresholdUsingCallee(CB->getCalledFunction(), TLI, this);
return 0;		adjustThreshold += AllocaSize ? ArgAllocaCost : AllocaSize;
		return adjustThreshold;
}		}

void GCNTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void GCNTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP,		TTI::UnrollingPreferences &UP,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {
CommonTTI.getUnrollingPreferences(L, SE, UP, ORE);		CommonTTI.getUnrollingPreferences(L, SE, UP, ORE);
}		}

Show All 25 Lines

llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-argument-i64.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-cost-full=true -inline-threshold=0 -inline-instr-cost=5 -inline-call-penalty=0 -debug-only=inline < %s 2>&1 | FileCheck %s
				; REQUIRES: asserts

				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall1 = call noundef i64 @non_inlining_call
				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall2 = call noundef i64 @non_inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall1 = call noundef i64 @inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall2 = call noundef i64 @inlining_call

				define noundef i64 @non_inlining_call(i64 noundef %a0, i64 noundef %b0, i64 noundef %c0, i64 noundef %d0, i64 noundef %e0, i64 noundef %f0, i64 noundef %g0, i64 noundef %h0, i64 noundef %i0, i64 noundef %j0, i64 noundef %k0, i64 noundef %l0, i64 noundef %m0, i64 noundef %n0, i64 noundef %o0, i64 noundef %p0) {
				entry:
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				%xor15 = xor i64 %xor14, 1
				%xor16 = xor i64 %xor15, 2
				ret i64 %xor16
				}

				define noundef i64 @inlining_call(i64 noundef %a0, i64 noundef %b0, i64 noundef %c0, i64 noundef %d0, i64 noundef %e0, i64 noundef %f0, i64 noundef %g0, i64 noundef %h0, i64 noundef %i0, i64 noundef %j0, i64 noundef %k0, i64 noundef %l0, i64 noundef %m0, i64 noundef %n0, i64 noundef %o0, i64 noundef %p0, i64 noundef %q0) {
				entry:
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				%xor15 = xor i64 %xor14, %q0
				%xor16 = xor i64 %xor15, 1
				%xor17 = xor i64 %xor16, 1
				ret i64 %xor17
				}

				; Calling each (non-)inlining function twice to make sure they won't get the sole call inlining cost bonus.
				define i64 @Caller(ptr noundef %in) {
				entry:
				%arrayidx = getelementptr inbounds i64, ptr %in, i64 0
				%a0 = load i64, ptr %arrayidx, align 4
				%arrayidx1 = getelementptr inbounds i64, ptr %in, i64 1
				%b0 = load i64, ptr %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds i64, ptr %in, i64 2
				%c0 = load i64, ptr %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i64, ptr %in, i64 3
				%d0 = load i64, ptr %arrayidx3, align 4
				%arrayidx4 = getelementptr inbounds i64, ptr %in, i64 4
				%e0 = load i64, ptr %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds i64, ptr %in, i64 5
				%f0 = load i64, ptr %arrayidx5, align 4
				%arrayidx6 = getelementptr inbounds i64, ptr %in, i64 6
				%g0 = load i64, ptr %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds i64, ptr %in, i64 7
				%h0 = load i64, ptr %arrayidx7, align 4
				%arrayidx8 = getelementptr inbounds i64, ptr %in, i64 8
				%i0 = load i64, ptr %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds i64, ptr %in, i64 9
				%j0 = load i64, ptr %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds i64, ptr %in, i64 10
				%k0 = load i64, ptr %arrayidx10, align 4
				%arrayidx11 = getelementptr inbounds i64, ptr %in, i64 11
				%l0 = load i64, ptr %arrayidx11, align 4
				%arrayidx12 = getelementptr inbounds i64, ptr %in, i64 12
				%m0 = load i64, ptr %arrayidx12, align 4
				%arrayidx13 = getelementptr inbounds i64, ptr %in, i64 13
				%n0 = load i64, ptr %arrayidx13, align 4
				%arrayidx14 = getelementptr inbounds i64, ptr %in, i64 14
				%o0 = load i64, ptr %arrayidx14, align 4
				%arrayidx15 = getelementptr inbounds i64, ptr %in, i64 15
				%p0 = load i64, ptr %arrayidx15, align 4
				%arrayidx16 = getelementptr inbounds i64, ptr %in, i64 16
				%q0 = load i64, ptr %arrayidx16, align 4
				%noinlinecall1 = call noundef i64 @non_inlining_call(i64 noundef %a0, i64 noundef %b0, i64 noundef %c0, i64 noundef %d0, i64 noundef %e0, i64 noundef %f0, i64 noundef %g0, i64 noundef %h0, i64 noundef %i0, i64 noundef %j0, i64 noundef %k0, i64 noundef %l0, i64 noundef %m0, i64 noundef %n0, i64 noundef %o0, i64 noundef %p0)
				%add = add i64 0, %noinlinecall1
				%noinlinecall2 = call noundef i64 @non_inlining_call(i64 noundef %a0, i64 noundef %b0, i64 noundef %c0, i64 noundef %d0, i64 noundef %e0, i64 noundef %f0, i64 noundef %g0, i64 noundef %h0, i64 noundef %i0, i64 noundef %j0, i64 noundef %k0, i64 noundef %l0, i64 noundef %m0, i64 noundef %n0, i64 noundef %o0, i64 noundef %p0)
				%add2 = add i64 %add, %noinlinecall2
				%inlinecall1 = call noundef i64 @inlining_call(i64 noundef %a0, i64 noundef %b0, i64 noundef %c0, i64 noundef %d0, i64 noundef %e0, i64 noundef %f0, i64 noundef %g0, i64 noundef %h0, i64 noundef %i0, i64 noundef %j0, i64 noundef %k0, i64 noundef %l0, i64 noundef %m0, i64 noundef %n0, i64 noundef %o0, i64 noundef %p0, i64 noundef %q0)
				%add3 = add i64 %add2, %inlinecall1
				%inlinecall2 = call noundef i64 @inlining_call(i64 noundef %a0, i64 noundef %b0, i64 noundef %c0, i64 noundef %d0, i64 noundef %e0, i64 noundef %f0, i64 noundef %g0, i64 noundef %h0, i64 noundef %i0, i64 noundef %j0, i64 noundef %k0, i64 noundef %l0, i64 noundef %m0, i64 noundef %n0, i64 noundef %o0, i64 noundef %p0, i64 noundef %q0)
				%add4 = add i64 %add3, %inlinecall2
				ret i64 %add4
				}

llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-argument.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-cost-full=true -inline-threshold=0 -inline-instr-cost=5 -inline-call-penalty=0 -debug-only=inline < %s 2>&1 | FileCheck %s
				; REQUIRES: asserts

				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall1 = call noundef i32 @non_inlining_call
				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall2 = call noundef i32 @non_inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall1 = call noundef i32 @inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall2 = call noundef i32 @inlining_call

				define noundef i32 @non_inlining_call(i32 noundef %a0, i32 noundef %b0, i32 noundef %c0, i32 noundef %d0, i32 noundef %e0, i32 noundef %f0, i32 noundef %g0, i32 noundef %h0, i32 noundef %i0, i32 noundef %j0, i32 noundef %k0, i32 noundef %l0, i32 noundef %m0, i32 noundef %n0, i32 noundef %o0, i32 noundef %p0, i32 noundef %q0, i32 noundef %r0, i32 noundef %s0, i32 noundef %t0, i32 noundef %u0, i32 noundef %v0, i32 noundef %w0, i32 noundef %x0, i32 noundef %y0, i32 noundef %z0, i32 noundef %a1, i32 noundef %b1, i32 noundef %c1, i32 noundef %d1, i32 noundef %e1, i32 noundef %f1) {
				entry:
				%xor = xor i32 %a0, %b0
				%xor1 = xor i32 %xor, %c0
				%xor2 = xor i32 %xor1, %d0
				%xor3 = xor i32 %xor2, %e0
				%xor4 = xor i32 %xor3, %f0
				%xor5 = xor i32 %xor4, %g0
				%xor6 = xor i32 %xor5, %h0
				%xor7 = xor i32 %xor6, %i0
				%xor8 = xor i32 %xor7, %j0
				%xor9 = xor i32 %xor8, %k0
				%xor10 = xor i32 %xor9, %l0
				%xor11 = xor i32 %xor10, %m0
				arsenm: Should use pass arguments or flags to set the thresholds to avoid having so many instructions in the test
				%xor12 = xor i32 %xor11, %n0
				%xor13 = xor i32 %xor12, %o0
				%xor14 = xor i32 %xor13, %p0
				%xor15 = xor i32 %xor14, %q0
				%xor16 = xor i32 %xor15, %r0
				%xor17 = xor i32 %xor16, %s0
				%xor18 = xor i32 %xor17, %t0
				%xor19 = xor i32 %xor18, %u0
				%xor20 = xor i32 %xor19, %v0
				%xor21 = xor i32 %xor20, %w0
				%xor22 = xor i32 %xor21, %x0
				%xor23 = xor i32 %xor22, %y0
				%xor24 = xor i32 %xor23, %z0
				%xor25 = xor i32 %xor24, %a1
				%xor26 = xor i32 %xor25, %b1
				%xor27 = xor i32 %xor26, %c1
				%xor28 = xor i32 %xor27, %d1
				%xor29 = xor i32 %xor28, %e1
				%xor30 = xor i32 %xor29, %f1
				%xor31 = xor i32 %xor30, 1
				%xor32 = xor i32 %xor31, 2
				ret i32 %xor32
				}

				define noundef i32 @inlining_call(i32 noundef %a0, i32 noundef %b0, i32 noundef %c0, i32 noundef %d0, i32 noundef %e0, i32 noundef %f0, i32 noundef %g0, i32 noundef %h0, i32 noundef %i0, i32 noundef %j0, i32 noundef %k0, i32 noundef %l0, i32 noundef %m0, i32 noundef %n0, i32 noundef %o0, i32 noundef %p0, i32 noundef %q0, i32 noundef %r0, i32 noundef %s0, i32 noundef %t0, i32 noundef %u0, i32 noundef %v0, i32 noundef %w0, i32 noundef %x0, i32 noundef %y0, i32 noundef %z0, i32 noundef %a1, i32 noundef %b1, i32 noundef %c1, i32 noundef %d1, i32 noundef %e1, i32 noundef %f1, i32 noundef %g1) {
				entry:
				%xor = xor i32 %a0, %b0
				%xor1 = xor i32 %xor, %c0
				%xor2 = xor i32 %xor1, %d0
				%xor3 = xor i32 %xor2, %e0
				%xor4 = xor i32 %xor3, %f0
				%xor5 = xor i32 %xor4, %g0
				%xor6 = xor i32 %xor5, %h0
				%xor7 = xor i32 %xor6, %i0
				%xor8 = xor i32 %xor7, %j0
				%xor9 = xor i32 %xor8, %k0
				%xor10 = xor i32 %xor9, %l0
				%xor11 = xor i32 %xor10, %m0
				%xor12 = xor i32 %xor11, %n0
				%xor13 = xor i32 %xor12, %o0
				%xor14 = xor i32 %xor13, %p0
				%xor15 = xor i32 %xor14, %q0
				%xor16 = xor i32 %xor15, %r0
				%xor17 = xor i32 %xor16, %s0
				%xor18 = xor i32 %xor17, %t0
				%xor19 = xor i32 %xor18, %u0
				%xor20 = xor i32 %xor19, %v0
				%xor21 = xor i32 %xor20, %w0
				%xor22 = xor i32 %xor21, %x0
				%xor23 = xor i32 %xor22, %y0
				%xor24 = xor i32 %xor23, %z0
				%xor25 = xor i32 %xor24, %a1
				%xor26 = xor i32 %xor25, %b1
				%xor27 = xor i32 %xor26, %c1
				%xor28 = xor i32 %xor27, %d1
				%xor29 = xor i32 %xor28, %e1
				%xor30 = xor i32 %xor29, %f1
				%xor31 = xor i32 %xor30, %g1
				%xor32 = xor i32 %xor31, 1
				%xor33 = xor i32 %xor32, 2
				ret i32 %xor33
				}

				; Calling each (non-)inlining function twice to make sure they won't get the sole call inlining cost bonus.
				define i32 @Caller(ptr noundef %in) {
				entry:
				%arrayidx = getelementptr inbounds i32, ptr %in, i64 0
				%a0 = load i32, ptr %arrayidx, align 4
				%arrayidx1 = getelementptr inbounds i32, ptr %in, i64 1
				%b0 = load i32, ptr %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds i32, ptr %in, i64 2
				%c0 = load i32, ptr %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i32, ptr %in, i64 3
				%d0 = load i32, ptr %arrayidx3, align 4
				%arrayidx4 = getelementptr inbounds i32, ptr %in, i64 4
				%e0 = load i32, ptr %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds i32, ptr %in, i64 5
				%f0 = load i32, ptr %arrayidx5, align 4
				%arrayidx6 = getelementptr inbounds i32, ptr %in, i64 6
				%g0 = load i32, ptr %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds i32, ptr %in, i64 7
				%h0 = load i32, ptr %arrayidx7, align 4
				%arrayidx8 = getelementptr inbounds i32, ptr %in, i64 8
				%i0 = load i32, ptr %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds i32, ptr %in, i64 9
				%j0 = load i32, ptr %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds i32, ptr %in, i64 10
				%k0 = load i32, ptr %arrayidx10, align 4
				%arrayidx11 = getelementptr inbounds i32, ptr %in, i64 11
				%l0 = load i32, ptr %arrayidx11, align 4
				%arrayidx12 = getelementptr inbounds i32, ptr %in, i64 12
				%m0 = load i32, ptr %arrayidx12, align 4
				%arrayidx13 = getelementptr inbounds i32, ptr %in, i64 13
				%n0 = load i32, ptr %arrayidx13, align 4
				%arrayidx14 = getelementptr inbounds i32, ptr %in, i64 14
				%o0 = load i32, ptr %arrayidx14, align 4
				%arrayidx15 = getelementptr inbounds i32, ptr %in, i64 15
				%p0 = load i32, ptr %arrayidx15, align 4
				%arrayidx16 = getelementptr inbounds i32, ptr %in, i64 16
				%q0 = load i32, ptr %arrayidx16, align 4
				%arrayidx17 = getelementptr inbounds i32, ptr %in, i64 17
				%r0 = load i32, ptr %arrayidx17, align 4
				%arrayidx18 = getelementptr inbounds i32, ptr %in, i64 18
				%s0 = load i32, ptr %arrayidx18, align 4
				%arrayidx19 = getelementptr inbounds i32, ptr %in, i64 19
				%t0 = load i32, ptr %arrayidx19, align 4
				%arrayidx20 = getelementptr inbounds i32, ptr %in, i64 20
				%u0 = load i32, ptr %arrayidx20, align 4
				%arrayidx21 = getelementptr inbounds i32, ptr %in, i64 21
				%v0 = load i32, ptr %arrayidx21, align 4
				%arrayidx22 = getelementptr inbounds i32, ptr %in, i64 22
				%w0 = load i32, ptr %arrayidx22, align 4
				%arrayidx23 = getelementptr inbounds i32, ptr %in, i64 23
				%x0 = load i32, ptr %arrayidx23, align 4
				%arrayidx24 = getelementptr inbounds i32, ptr %in, i64 24
				%y0 = load i32, ptr %arrayidx24, align 4
				%arrayidx25 = getelementptr inbounds i32, ptr %in, i64 25
				%z0 = load i32, ptr %arrayidx25, align 4
				%arrayidx26 = getelementptr inbounds i32, ptr %in, i64 26
				%a1 = load i32, ptr %arrayidx26, align 4
				%arrayidx27 = getelementptr inbounds i32, ptr %in, i64 27
				%b1 = load i32, ptr %arrayidx27, align 4
				%arrayidx28 = getelementptr inbounds i32, ptr %in, i64 28
				%c1 = load i32, ptr %arrayidx28, align 4
				%arrayidx29 = getelementptr inbounds i32, ptr %in, i64 29
				%d1 = load i32, ptr %arrayidx29, align 4
				%arrayidx30 = getelementptr inbounds i32, ptr %in, i64 30
				%e1 = load i32, ptr %arrayidx30, align 4
				%arrayidx31 = getelementptr inbounds i32, ptr %in, i64 31
				%f1 = load i32, ptr %arrayidx31, align 4
				%arrayidx32 = getelementptr inbounds i32, ptr %in, i64 32
				%g1 = load i32, ptr %arrayidx32, align 4
				%noinlinecall1 = call noundef i32 @non_inlining_call(i32 noundef %a0, i32 noundef %b0, i32 noundef %c0, i32 noundef %d0, i32 noundef %e0, i32 noundef %f0, i32 noundef %g0, i32 noundef %h0, i32 noundef %i0, i32 noundef %j0, i32 noundef %k0, i32 noundef %l0, i32 noundef %m0, i32 noundef %n0, i32 noundef %o0, i32 noundef %p0, i32 noundef %q0, i32 noundef %r0, i32 noundef %s0, i32 noundef %t0, i32 noundef %u0, i32 noundef %v0, i32 noundef %w0, i32 noundef %x0, i32 noundef %y0, i32 noundef %z0, i32 noundef %a1, i32 noundef %b1, i32 noundef %c1, i32 noundef %d1, i32 noundef %e1, i32 noundef %f1)
				%add = add i32 0, %noinlinecall1
				%noinlinecall2 = call noundef i32 @non_inlining_call(i32 noundef %a0, i32 noundef %b0, i32 noundef %c0, i32 noundef %d0, i32 noundef %e0, i32 noundef %f0, i32 noundef %g0, i32 noundef %h0, i32 noundef %i0, i32 noundef %j0, i32 noundef %k0, i32 noundef %l0, i32 noundef %m0, i32 noundef %n0, i32 noundef %o0, i32 noundef %p0, i32 noundef %q0, i32 noundef %r0, i32 noundef %s0, i32 noundef %t0, i32 noundef %u0, i32 noundef %v0, i32 noundef %w0, i32 noundef %x0, i32 noundef %y0, i32 noundef %z0, i32 noundef %a1, i32 noundef %b1, i32 noundef %c1, i32 noundef %d1, i32 noundef %e1, i32 noundef %f1)
				%add2 = add i32 %add, %noinlinecall2
				%inlinecall1 = call noundef i32 @inlining_call(i32 noundef %a0, i32 noundef %b0, i32 noundef %c0, i32 noundef %d0, i32 noundef %e0, i32 noundef %f0, i32 noundef %g0, i32 noundef %h0, i32 noundef %i0, i32 noundef %j0, i32 noundef %k0, i32 noundef %l0, i32 noundef %m0, i32 noundef %n0, i32 noundef %o0, i32 noundef %p0, i32 noundef %q0, i32 noundef %r0, i32 noundef %s0, i32 noundef %t0, i32 noundef %u0, i32 noundef %v0, i32 noundef %w0, i32 noundef %x0, i32 noundef %y0, i32 noundef %z0, i32 noundef %a1, i32 noundef %b1, i32 noundef %c1, i32 noundef %d1, i32 noundef %e1, i32 noundef %f1, i32 noundef %g1)
				%add3 = add i32 %add2, %inlinecall1
				%inlinecall2 = call noundef i32 @inlining_call(i32 noundef %a0, i32 noundef %b0, i32 noundef %c0, i32 noundef %d0, i32 noundef %e0, i32 noundef %f0, i32 noundef %g0, i32 noundef %h0, i32 noundef %i0, i32 noundef %j0, i32 noundef %k0, i32 noundef %l0, i32 noundef %m0, i32 noundef %n0, i32 noundef %o0, i32 noundef %p0, i32 noundef %q0, i32 noundef %r0, i32 noundef %s0, i32 noundef %t0, i32 noundef %u0, i32 noundef %v0, i32 noundef %w0, i32 noundef %x0, i32 noundef %y0, i32 noundef %z0, i32 noundef %a1, i32 noundef %b1, i32 noundef %c1, i32 noundef %d1, i32 noundef %e1, i32 noundef %f1, i32 noundef %g1)
				%add4 = add i32 %add3, %inlinecall2
				ret i32 %add4
				}
				arsenm: Don't use anonymous values in tests
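The scalar tests above pair a callee that is expected to stay out of line (32 `i32` or 16 `i64` arguments) with one that takes a single extra argument and is expected to be inlined. A minimal C++ sketch of the register counting this seems to rely on follows. It assumes arguments occupy 32-bit registers and that 32 such registers are available before the rest go to the stack; that budget is inferred from the shape of the tests, not quoted from the patch. The names `kArgRegBudget`, `regsForScalar`, and `stackPassedRegs` are hypothetical and are not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumption: 32 32-bit registers are available for arguments.
constexpr int kArgRegBudget = 32; // hypothetical name

// Number of 32-bit registers a scalar of the given bit width occupies.
int regsForScalar(int bits) {
  assert(bits > 0 && "scalar width must be positive");
  return (bits + 31) / 32;
}

// Number of registers that exceed the budget and would go to the stack.
int stackPassedRegs(const std::vector<int> &argBits) {
  int used = 0;
  for (int bits : argBits)
    used += regsForScalar(bits);
  return std::max(0, used - kArgRegBudget);
}
```

Under these assumptions, 32 `i32` arguments or 16 `i64` arguments fill the budget exactly, while one additional `i32` overflows by one register and one additional `i64` by two.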

llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-array-ptr-argument.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-cost-full=true -inline-threshold=0 -inline-instr-cost=5 -inline-call-penalty=0 -debug-only=inline < %s 2>&1 | FileCheck %s
				; REQUIRES: asserts

				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall1 = call noundef i64 @non_inlining_call
				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall2 = call noundef i64 @non_inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall1 = call noundef i64 @inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall2 = call noundef i64 @inlining_call

				define noundef i64 @non_inlining_call([2 x ptr] noundef %ptrarr, ptr noundef %ptrc0, ptr noundef %ptrd0, ptr noundef %ptre0, ptr noundef %ptrf0, ptr noundef %ptrg0, ptr noundef %ptrh0, ptr noundef %ptri0, ptr noundef %ptrj0, ptr noundef %ptrk0, ptr noundef %ptrl0, ptr noundef %ptrm0, ptr noundef %ptrn0, ptr noundef %ptro0, ptr noundef %ptrp0) {
				entry:
				%ptra0 = extractvalue [2 x ptr] %ptrarr, 0
				%ptrb0 = extractvalue [2 x ptr] %ptrarr, 1
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = load i64, ptr %ptrd0, align 8
				%e0 = load i64, ptr %ptre0, align 8
				%f0 = load i64, ptr %ptrf0, align 8
				%g0 = load i64, ptr %ptrg0, align 8
				%h0 = load i64, ptr %ptrh0, align 8
				%i0 = load i64, ptr %ptri0, align 8
				%j0 = load i64, ptr %ptrj0, align 8
				%k0 = load i64, ptr %ptrk0, align 8
				%l0 = load i64, ptr %ptrl0, align 8
				%m0 = load i64, ptr %ptrm0, align 8
				%n0 = load i64, ptr %ptrn0, align 8
				%o0 = load i64, ptr %ptro0, align 8
				%p0 = load i64, ptr %ptrp0, align 8
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				ret i64 %xor14
				}

				define noundef i64 @inlining_call([2 x ptr] noundef %ptrarr, ptr noundef %ptrc0, ptr noundef %ptrd0, ptr noundef %ptre0, ptr noundef %ptrf0, ptr noundef %ptrg0, ptr noundef %ptrh0, ptr noundef %ptri0, ptr noundef %ptrj0, ptr noundef %ptrk0, ptr noundef %ptrl0, ptr noundef %ptrm0, ptr noundef %ptrn0, ptr noundef %ptro0, ptr noundef %ptrp0, ptr noundef %ptrq0) {
				entry:
				%ptra0 = extractvalue [2 x ptr] %ptrarr, 0
				%ptrb0 = extractvalue [2 x ptr] %ptrarr, 1
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = load i64, ptr %ptrd0, align 8
				%e0 = load i64, ptr %ptre0, align 8
				%f0 = load i64, ptr %ptrf0, align 8
				%g0 = load i64, ptr %ptrg0, align 8
				%h0 = load i64, ptr %ptrh0, align 8
				%i0 = load i64, ptr %ptri0, align 8
				%j0 = load i64, ptr %ptrj0, align 8
				%k0 = load i64, ptr %ptrk0, align 8
				%l0 = load i64, ptr %ptrl0, align 8
				%m0 = load i64, ptr %ptrm0, align 8
				%n0 = load i64, ptr %ptrn0, align 8
				%o0 = load i64, ptr %ptro0, align 8
				%p0 = load i64, ptr %ptrp0, align 8
				%q0 = load i64, ptr %ptrq0, align 8
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				%xor15 = xor i64 %xor14, %q0
				ret i64 %xor15
				}

				; Calling each (non-)inlining function twice to make sure they won't get the sole call inlining cost bonus.
				define i64 @Caller(ptr noundef %in) {
				entry:
				%a0 = getelementptr inbounds i64, ptr %in, i64 0
				%b0 = getelementptr inbounds i64, ptr %in, i64 1
				%arr0 = insertvalue [2 x ptr] undef, ptr %a0, 0
				%arr1 = insertvalue [2 x ptr] %arr0, ptr %b0, 1
				%c0 = getelementptr inbounds i64, ptr %in, i64 2
				%d0 = getelementptr inbounds i64, ptr %in, i64 3
				%e0 = getelementptr inbounds i64, ptr %in, i64 4
				%f0 = getelementptr inbounds i64, ptr %in, i64 5
				%g0 = getelementptr inbounds i64, ptr %in, i64 6
				%h0 = getelementptr inbounds i64, ptr %in, i64 7
				%i0 = getelementptr inbounds i64, ptr %in, i64 8
				%j0 = getelementptr inbounds i64, ptr %in, i64 9
				%k0 = getelementptr inbounds i64, ptr %in, i64 10
				%l0 = getelementptr inbounds i64, ptr %in, i64 11
				%m0 = getelementptr inbounds i64, ptr %in, i64 12
				%n0 = getelementptr inbounds i64, ptr %in, i64 13
				%o0 = getelementptr inbounds i64, ptr %in, i64 14
				%p0 = getelementptr inbounds i64, ptr %in, i64 15
				%q0 = getelementptr inbounds i64, ptr %in, i64 16
				%noinlinecall1 = call noundef i64 @non_inlining_call([2 x ptr] noundef %arr1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0)
				%add = add i64 0, %noinlinecall1
				%noinlinecall2 = call noundef i64 @non_inlining_call([2 x ptr] noundef %arr1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0)
				%add2 = add i64 %add, %noinlinecall2
				%inlinecall1 = call noundef i64 @inlining_call([2 x ptr] noundef %arr1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0, ptr noundef %q0)
				%add3 = add i64 %add2, %inlinecall1
				%inlinecall2 = call noundef i64 @inlining_call([2 x ptr] noundef %arr1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0, ptr noundef %q0)
				%add4 = add i64 %add3, %inlinecall2
				ret i64 %add4
				}
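The array test above passes `[2 x ptr]` as a single argument, so any register count has to look through the aggregate to its leaf values. The sketch below illustrates one way to do that with a recursive walk. It is an illustration only: `Ty`, `ptr`, `array`, `regsFor`, and `stackPassedRegs` are hypothetical helpers, pointers are assumed to be 64 bits wide, and the budget of 32 registers is again inferred from the tests rather than taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical minimal type model: a leaf scalar carries a bit width,
// an aggregate (array or struct) carries members instead.
struct Ty {
  int bits;                // meaningful only when members is empty
  std::vector<Ty> members; // non-empty for arrays and structs
};

Ty scalar(int bits) {
  assert(bits > 0 && "scalar width must be positive");
  return Ty{bits, {}};
}
Ty ptr() { return scalar(64); } // assumption: 64-bit pointers
Ty aggregate(std::vector<Ty> members) { return Ty{0, std::move(members)}; }
Ty array(int n, const Ty &elem) { return aggregate(std::vector<Ty>(n, elem)); }

// Sum the 32-bit registers needed by every leaf of the type.
int regsFor(const Ty &t) {
  if (t.members.empty())
    return (t.bits + 31) / 32;
  int total = 0;
  for (const Ty &m : t.members)
    total += regsFor(m);
  return total;
}

// Registers beyond an assumed budget of 32 that would go to the stack.
int stackPassedRegs(const std::vector<Ty> &args) {
  int used = 0;
  for (const Ty &a : args)
    used += regsFor(a);
  return std::max(0, used - 32);
}
```

With this model, `[2 x ptr]` followed by 14 pointers counts the same as 16 separate pointers, which matches how the array test mirrors the plain pointer variant.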

llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-ptr-argument.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-cost-full=true -inline-threshold=0 -inline-instr-cost=5 -inline-call-penalty=0 -debug-only=inline < %s 2>&1 | FileCheck %s
				; REQUIRES: asserts
				arsenm: REQUIRES should be the first line

				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall1 = call noundef i64 @non_inlining_call
				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall2 = call noundef i64 @non_inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall1 = call noundef i64 @inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall2 = call noundef i64 @inlining_call

				define noundef i64 @non_inlining_call(ptr noundef %ptra0, ptr noundef %ptrb0, ptr noundef %ptrc0, ptr noundef %ptrd0, ptr noundef %ptre0, ptr noundef %ptrf0, ptr noundef %ptrg0, ptr noundef %ptrh0, ptr noundef %ptri0, ptr noundef %ptrj0, ptr noundef %ptrk0, ptr noundef %ptrl0, ptr noundef %ptrm0, ptr noundef %ptrn0, ptr noundef %ptro0, ptr noundef %ptrp0) {
				entry:
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = load i64, ptr %ptrd0, align 8
				%e0 = load i64, ptr %ptre0, align 8
				%f0 = load i64, ptr %ptrf0, align 8
				%g0 = load i64, ptr %ptrg0, align 8
				%h0 = load i64, ptr %ptrh0, align 8
				%i0 = load i64, ptr %ptri0, align 8
				%j0 = load i64, ptr %ptrj0, align 8
				%k0 = load i64, ptr %ptrk0, align 8
				%l0 = load i64, ptr %ptrl0, align 8
				%m0 = load i64, ptr %ptrm0, align 8
				%n0 = load i64, ptr %ptrn0, align 8
				%o0 = load i64, ptr %ptro0, align 8
				%p0 = load i64, ptr %ptrp0, align 8
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				ret i64 %xor14
				}

				define noundef i64 @inlining_call(ptr noundef %ptra0, ptr noundef %ptrb0, ptr noundef %ptrc0, ptr noundef %ptrd0, ptr noundef %ptre0, ptr noundef %ptrf0, ptr noundef %ptrg0, ptr noundef %ptrh0, ptr noundef %ptri0, ptr noundef %ptrj0, ptr noundef %ptrk0, ptr noundef %ptrl0, ptr noundef %ptrm0, ptr noundef %ptrn0, ptr noundef %ptro0, ptr noundef %ptrp0, ptr noundef %ptrq0) {
				entry:
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = load i64, ptr %ptrd0, align 8
				%e0 = load i64, ptr %ptre0, align 8
				%f0 = load i64, ptr %ptrf0, align 8
				%g0 = load i64, ptr %ptrg0, align 8
				%h0 = load i64, ptr %ptrh0, align 8
				%i0 = load i64, ptr %ptri0, align 8
				%j0 = load i64, ptr %ptrj0, align 8
				%k0 = load i64, ptr %ptrk0, align 8
				%l0 = load i64, ptr %ptrl0, align 8
				%m0 = load i64, ptr %ptrm0, align 8
				%n0 = load i64, ptr %ptrn0, align 8
				%o0 = load i64, ptr %ptro0, align 8
				%p0 = load i64, ptr %ptrp0, align 8
				%q0 = load i64, ptr %ptrq0, align 8
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				%xor15 = xor i64 %xor14, %q0
				ret i64 %xor15
				}

				; Calling each (non-)inlining function twice to make sure they won't get the sole call inlining cost bonus.
				define i64 @Caller(ptr noundef %in) {
				entry:
				%a0 = getelementptr inbounds i64, ptr %in, i64 0
				%b0 = getelementptr inbounds i64, ptr %in, i64 1
				%c0 = getelementptr inbounds i64, ptr %in, i64 2
				%d0 = getelementptr inbounds i64, ptr %in, i64 3
				%e0 = getelementptr inbounds i64, ptr %in, i64 4
				%f0 = getelementptr inbounds i64, ptr %in, i64 5
				%g0 = getelementptr inbounds i64, ptr %in, i64 6
				%h0 = getelementptr inbounds i64, ptr %in, i64 7
				%i0 = getelementptr inbounds i64, ptr %in, i64 8
				%j0 = getelementptr inbounds i64, ptr %in, i64 9
				%k0 = getelementptr inbounds i64, ptr %in, i64 10
				%l0 = getelementptr inbounds i64, ptr %in, i64 11
				%m0 = getelementptr inbounds i64, ptr %in, i64 12
				%n0 = getelementptr inbounds i64, ptr %in, i64 13
				%o0 = getelementptr inbounds i64, ptr %in, i64 14
				%p0 = getelementptr inbounds i64, ptr %in, i64 15
				%q0 = getelementptr inbounds i64, ptr %in, i64 16
				%noinlinecall1 = call noundef i64 @non_inlining_call(ptr noundef %a0, ptr noundef %b0, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0)
				%add = add i64 0, %noinlinecall1
				%noinlinecall2 = call noundef i64 @non_inlining_call(ptr noundef %a0, ptr noundef %b0, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0)
				%add2 = add i64 %add, %noinlinecall2
				%inlinecall1 = call noundef i64 @inlining_call(ptr noundef %a0, ptr noundef %b0, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0, ptr noundef %q0)
				%add3 = add i64 %add2, %inlinecall1
				%inlinecall2 = call noundef i64 @inlining_call(ptr noundef %a0, ptr noundef %b0, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0, ptr noundef %q0)
				%add4 = add i64 %add3, %inlinecall2
				ret i64 %add4
				}
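The RUN lines in these tests set `-inline-threshold=0` and `-inline-instr-cost=5`, so a callee can presumably only be inlined if the target raises the threshold for that call site. The sketch below shows how such a bonus could scale with the number of stack-passed registers. It is a toy model, not the formula used by the patch, which is not shown in this section: `InlineParams`, `adjustedThreshold`, and the per-register instruction count are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical struct mirroring two of the flags on the RUN line.
struct InlineParams {
  int baseThreshold; // corresponds to -inline-threshold
  int instrCost;     // corresponds to -inline-instr-cost
};

// Assumption: every register over the budget costs `perRegInstrs`
// instructions (for example a store in the caller and a load in the callee),
// and the threshold is raised by that cost so inlining becomes more likely.
int adjustedThreshold(const InlineParams &p, int regsUsed, int regBudget,
                      int perRegInstrs) {
  assert(perRegInstrs >= 0 && "penalty must not be negative");
  int overflow = std::max(0, regsUsed - regBudget);
  return p.baseThreshold + overflow * perRegInstrs * p.instrCost;
}
```

In this model a call that fits in the register budget keeps the base threshold of 0, while a call that overflows by two registers gets a threshold of 20 when two instructions per register are assumed.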

llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-struct-argument.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-cost-full=true -inline-threshold=0 -inline-instr-cost=5 -inline-call-penalty=0 -debug-only=inline < %s 2>&1 | FileCheck %s
				; REQUIRES: asserts

				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall1 = call noundef i64 @non_inlining_call
				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall2 = call noundef i64 @non_inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall1 = call noundef i64 @inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall2 = call noundef i64 @inlining_call

				%noinlineT = type {{ptr, ptr}, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64}
				%inlineT = type {{ptr, ptr}, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64}

				define noundef i64 @non_inlining_call(%noinlineT noundef %struc) {
				entry:
				%ptra0 = extractvalue %noinlineT %struc, 0, 0
				%ptrb0 = extractvalue %noinlineT %struc, 0, 1
				%ptrc0 = extractvalue %noinlineT %struc, 1
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = extractvalue %noinlineT %struc, 2
				%e0 = extractvalue %noinlineT %struc, 3
				%f0 = extractvalue %noinlineT %struc, 4
				%g0 = extractvalue %noinlineT %struc, 5
				%h0 = extractvalue %noinlineT %struc, 6
				%i0 = extractvalue %noinlineT %struc, 7
				%j0 = extractvalue %noinlineT %struc, 8
				%k0 = extractvalue %noinlineT %struc, 9
				%l0 = extractvalue %noinlineT %struc, 10
				%m0 = extractvalue %noinlineT %struc, 11
				%n0 = extractvalue %noinlineT %struc, 12
				%o0 = extractvalue %noinlineT %struc, 13
				%p0 = extractvalue %noinlineT %struc, 14
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				ret i64 %xor14
				}

				define noundef i64 @inlining_call(%inlineT noundef %struc) {
				entry:
				%ptra0 = extractvalue %inlineT %struc, 0, 0
				%ptrb0 = extractvalue %inlineT %struc, 0, 1
				%ptrc0 = extractvalue %inlineT %struc, 1
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = extractvalue %inlineT %struc, 2
				%e0 = extractvalue %inlineT %struc, 3
				%f0 = extractvalue %inlineT %struc, 4
				%g0 = extractvalue %inlineT %struc, 5
				%h0 = extractvalue %inlineT %struc, 6
				%i0 = extractvalue %inlineT %struc, 7
				%j0 = extractvalue %inlineT %struc, 8
				%k0 = extractvalue %inlineT %struc, 9
				%l0 = extractvalue %inlineT %struc, 10
				%m0 = extractvalue %inlineT %struc, 11
				%n0 = extractvalue %inlineT %struc, 12
				%o0 = extractvalue %inlineT %struc, 13
				%p0 = extractvalue %inlineT %struc, 14
				%q0 = extractvalue %inlineT %struc, 15
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				%xor15 = xor i64 %xor14, %q0
				ret i64 %xor15
				}

				; Calling each (non-)inlining function twice to make sure they won't get the sole call inlining cost bonus.
				define i64 @Caller(ptr noundef %in) {
				entry:
				%ptra0 = getelementptr inbounds i64, ptr %in, i64 0
				%ptrb0 = getelementptr inbounds i64, ptr %in, i64 1
				%ptrc0 = getelementptr inbounds i64, ptr %in, i64 2
				%ptrd0 = getelementptr inbounds i64, ptr %in, i64 3
				%ptre0 = getelementptr inbounds i64, ptr %in, i64 4
				%ptrf0 = getelementptr inbounds i64, ptr %in, i64 5
				%ptrg0 = getelementptr inbounds i64, ptr %in, i64 6
				%ptrh0 = getelementptr inbounds i64, ptr %in, i64 7
				%ptri0 = getelementptr inbounds i64, ptr %in, i64 8
				%ptrj0 = getelementptr inbounds i64, ptr %in, i64 9
				%ptrk0 = getelementptr inbounds i64, ptr %in, i64 10
				%ptrl0 = getelementptr inbounds i64, ptr %in, i64 11
				%ptrm0 = getelementptr inbounds i64, ptr %in, i64 12
				%ptrn0 = getelementptr inbounds i64, ptr %in, i64 13
				%ptro0 = getelementptr inbounds i64, ptr %in, i64 14
				%ptrp0 = getelementptr inbounds i64, ptr %in, i64 15
				%ptrq0 = getelementptr inbounds i64, ptr %in, i64 16
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = load i64, ptr %ptrd0, align 8
				%e0 = load i64, ptr %ptre0, align 8
				%f0 = load i64, ptr %ptrf0, align 8
				%g0 = load i64, ptr %ptrg0, align 8
				%h0 = load i64, ptr %ptrh0, align 8
				%i0 = load i64, ptr %ptri0, align 8
				%j0 = load i64, ptr %ptrj0, align 8
				%k0 = load i64, ptr %ptrk0, align 8
				%l0 = load i64, ptr %ptrl0, align 8
				%m0 = load i64, ptr %ptrm0, align 8
				%n0 = load i64, ptr %ptrn0, align 8
				%o0 = load i64, ptr %ptro0, align 8
				%p0 = load i64, ptr %ptrp0, align 8
				%q0 = load i64, ptr %ptrq0, align 8
				%noinlinestruc1 = insertvalue %noinlineT undef, ptr %ptra0, 0, 0
				%noinlinestruc2 = insertvalue %noinlineT %noinlinestruc1, ptr %ptrb0, 0, 1
				%noinlinestruc3 = insertvalue %noinlineT %noinlinestruc2, ptr %ptrc0, 1
				%noinlinestruc4 = insertvalue %noinlineT %noinlinestruc3, i64 %d0, 2
				%noinlinestruc5 = insertvalue %noinlineT %noinlinestruc4, i64 %e0, 3
				%noinlinestruc6 = insertvalue %noinlineT %noinlinestruc5, i64 %f0, 4
				%noinlinestruc7 = insertvalue %noinlineT %noinlinestruc6, i64 %g0, 5
				%noinlinestruc8 = insertvalue %noinlineT %noinlinestruc7, i64 %h0, 6
				%noinlinestruc9 = insertvalue %noinlineT %noinlinestruc8, i64 %i0, 7
				%noinlinestruc10 = insertvalue %noinlineT %noinlinestruc9, i64 %j0, 8
				%noinlinestruc11 = insertvalue %noinlineT %noinlinestruc10, i64 %k0, 9
				%noinlinestruc12 = insertvalue %noinlineT %noinlinestruc11, i64 %l0, 10
				%noinlinestruc13 = insertvalue %noinlineT %noinlinestruc12, i64 %m0, 11
				%noinlinestruc14 = insertvalue %noinlineT %noinlinestruc13, i64 %n0, 12
				%noinlinestruc15 = insertvalue %noinlineT %noinlinestruc14, i64 %o0, 13
				%noinlinestruc16 = insertvalue %noinlineT %noinlinestruc15, i64 %p0, 14
				%inlinestruc1 = insertvalue %inlineT undef, ptr %ptra0, 0, 0
				%inlinestruc2 = insertvalue %inlineT %inlinestruc1, ptr %ptrb0, 0, 1
				%inlinestruc3 = insertvalue %inlineT %inlinestruc2, ptr %ptrc0, 1
				%inlinestruc4 = insertvalue %inlineT %inlinestruc3, i64 %d0, 2
				%inlinestruc5 = insertvalue %inlineT %inlinestruc4, i64 %e0, 3
				%inlinestruc6 = insertvalue %inlineT %inlinestruc5, i64 %f0, 4
				%inlinestruc7 = insertvalue %inlineT %inlinestruc6, i64 %g0, 5
				%inlinestruc8 = insertvalue %inlineT %inlinestruc7, i64 %h0, 6
				%inlinestruc9 = insertvalue %inlineT %inlinestruc8, i64 %i0, 7
				%inlinestruc10 = insertvalue %inlineT %inlinestruc9, i64 %j0, 8
				%inlinestruc11 = insertvalue %inlineT %inlinestruc10, i64 %k0, 9
				%inlinestruc12 = insertvalue %inlineT %inlinestruc11, i64 %l0, 10
				%inlinestruc13 = insertvalue %inlineT %inlinestruc12, i64 %m0, 11
				%inlinestruc14 = insertvalue %inlineT %inlinestruc13, i64 %n0, 12
				%inlinestruc15 = insertvalue %inlineT %inlinestruc14, i64 %o0, 13
				%inlinestruc16 = insertvalue %inlineT %inlinestruc15, i64 %p0, 14
				%inlinestruc17 = insertvalue %inlineT %inlinestruc16, i64 %q0, 15
				%noinlinecall1 = call noundef i64 @non_inlining_call(%noinlineT noundef %noinlinestruc16)
				%add = add i64 0, %noinlinecall1
				%noinlinecall2 = call noundef i64 @non_inlining_call(%noinlineT noundef %noinlinestruc16)
				%add2 = add i64 %add, %noinlinecall2
				%inlinecall1 = call noundef i64 @inlining_call(%inlineT noundef %inlinestruc17)
				%add3 = add i64 %add2, %inlinecall1
				%inlinecall2 = call noundef i64 @inlining_call(%inlineT noundef %inlinestruc17)
				%add4 = add i64 %add3, %inlinecall2
				ret i64 %add4
				}

llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-vector-ptr-argument.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes=inline -inline-cost-full=true -inline-threshold=0 -inline-instr-cost=5 -inline-call-penalty=0 -debug-only=inline < %s 2>&1 | FileCheck %s
				; REQUIRES: asserts

				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall1 = call noundef i64 @non_inlining_call
				; CHECK: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %noinlinecall2 = call noundef i64 @non_inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall1 = call noundef i64 @inlining_call
				; CHECK-NOT: NOT Inlining (cost={{[0-9]+}}, threshold={{[0-9]+}}), Call: %inlinecall2 = call noundef i64 @inlining_call

				define noundef i64 @non_inlining_call(<2 x ptr> noundef %ptrvec, ptr noundef %ptrc0, ptr noundef %ptrd0, ptr noundef %ptre0, ptr noundef %ptrf0, ptr noundef %ptrg0, ptr noundef %ptrh0, ptr noundef %ptri0, ptr noundef %ptrj0, ptr noundef %ptrk0, ptr noundef %ptrl0, ptr noundef %ptrm0, ptr noundef %ptrn0, ptr noundef %ptro0, ptr noundef %ptrp0) {
				entry:
				%ptra0 = extractelement <2 x ptr> %ptrvec, i32 0
				%ptrb0 = extractelement <2 x ptr> %ptrvec, i32 1
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = load i64, ptr %ptrd0, align 8
				%e0 = load i64, ptr %ptre0, align 8
				%f0 = load i64, ptr %ptrf0, align 8
				%g0 = load i64, ptr %ptrg0, align 8
				%h0 = load i64, ptr %ptrh0, align 8
				%i0 = load i64, ptr %ptri0, align 8
				%j0 = load i64, ptr %ptrj0, align 8
				%k0 = load i64, ptr %ptrk0, align 8
				%l0 = load i64, ptr %ptrl0, align 8
				%m0 = load i64, ptr %ptrm0, align 8
				%n0 = load i64, ptr %ptrn0, align 8
				%o0 = load i64, ptr %ptro0, align 8
				%p0 = load i64, ptr %ptrp0, align 8
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				ret i64 %xor14
				}

				define noundef i64 @inlining_call(<2 x ptr> noundef %ptrvec, ptr noundef %ptrc0, ptr noundef %ptrd0, ptr noundef %ptre0, ptr noundef %ptrf0, ptr noundef %ptrg0, ptr noundef %ptrh0, ptr noundef %ptri0, ptr noundef %ptrj0, ptr noundef %ptrk0, ptr noundef %ptrl0, ptr noundef %ptrm0, ptr noundef %ptrn0, ptr noundef %ptro0, ptr noundef %ptrp0, ptr noundef %ptrq0) {
				entry:
				%ptra0 = extractelement <2 x ptr> %ptrvec, i32 0
				%ptrb0 = extractelement <2 x ptr> %ptrvec, i32 1
				%a0 = load i64, ptr %ptra0, align 8
				%b0 = load i64, ptr %ptrb0, align 8
				%c0 = load i64, ptr %ptrc0, align 8
				%d0 = load i64, ptr %ptrd0, align 8
				%e0 = load i64, ptr %ptre0, align 8
				%f0 = load i64, ptr %ptrf0, align 8
				%g0 = load i64, ptr %ptrg0, align 8
				%h0 = load i64, ptr %ptrh0, align 8
				%i0 = load i64, ptr %ptri0, align 8
				%j0 = load i64, ptr %ptrj0, align 8
				%k0 = load i64, ptr %ptrk0, align 8
				%l0 = load i64, ptr %ptrl0, align 8
				%m0 = load i64, ptr %ptrm0, align 8
				%n0 = load i64, ptr %ptrn0, align 8
				%o0 = load i64, ptr %ptro0, align 8
				%p0 = load i64, ptr %ptrp0, align 8
				%q0 = load i64, ptr %ptrq0, align 8
				%xor = xor i64 %a0, %b0
				%xor1 = xor i64 %xor, %c0
				%xor2 = xor i64 %xor1, %d0
				%xor3 = xor i64 %xor2, %e0
				%xor4 = xor i64 %xor3, %f0
				%xor5 = xor i64 %xor4, %g0
				%xor6 = xor i64 %xor5, %h0
				%xor7 = xor i64 %xor6, %i0
				%xor8 = xor i64 %xor7, %j0
				%xor9 = xor i64 %xor8, %k0
				%xor10 = xor i64 %xor9, %l0
				%xor11 = xor i64 %xor10, %m0
				%xor12 = xor i64 %xor11, %n0
				%xor13 = xor i64 %xor12, %o0
				%xor14 = xor i64 %xor13, %p0
				%xor15 = xor i64 %xor14, %q0
				ret i64 %xor15
				}

				; Calling each (non-)inlining function twice to make sure they won't get the sole call inlining cost bonus.
				define i64 @Caller(ptr noundef %in) {
				entry:
				%a0 = getelementptr inbounds i64, ptr %in, i64 0
				%b0 = getelementptr inbounds i64, ptr %in, i64 1
				%vec0 = insertelement <2 x ptr> undef, ptr %a0, i32 0
				%vec1 = insertelement <2 x ptr> %vec0, ptr %b0, i32 1
				%c0 = getelementptr inbounds i64, ptr %in, i64 2
				%d0 = getelementptr inbounds i64, ptr %in, i64 3
				%e0 = getelementptr inbounds i64, ptr %in, i64 4
				%f0 = getelementptr inbounds i64, ptr %in, i64 5
				%g0 = getelementptr inbounds i64, ptr %in, i64 6
				%h0 = getelementptr inbounds i64, ptr %in, i64 7
				%i0 = getelementptr inbounds i64, ptr %in, i64 8
				%j0 = getelementptr inbounds i64, ptr %in, i64 9
				%k0 = getelementptr inbounds i64, ptr %in, i64 10
				%l0 = getelementptr inbounds i64, ptr %in, i64 11
				%m0 = getelementptr inbounds i64, ptr %in, i64 12
				%n0 = getelementptr inbounds i64, ptr %in, i64 13
				%o0 = getelementptr inbounds i64, ptr %in, i64 14
				%p0 = getelementptr inbounds i64, ptr %in, i64 15
				%q0 = getelementptr inbounds i64, ptr %in, i64 16
				%noinlinecall1 = call noundef i64 @non_inlining_call(<2 x ptr> noundef %vec1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0)
				%add = add i64 0, %noinlinecall1
				%noinlinecall2 = call noundef i64 @non_inlining_call(<2 x ptr> noundef %vec1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0)
				%add2 = add i64 %add, %noinlinecall2
				%inlinecall1 = call noundef i64 @inlining_call(<2 x ptr> noundef %vec1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0, ptr noundef %q0)
				%add3 = add i64 %add2, %inlinecall1
				%inlinecall2 = call noundef i64 @inlining_call(<2 x ptr> noundef %vec1, ptr noundef %c0, ptr noundef %d0, ptr noundef %e0, ptr noundef %f0, ptr noundef %g0, ptr noundef %h0, ptr noundef %i0, ptr noundef %j0, ptr noundef %k0, ptr noundef %l0, ptr noundef %m0, ptr noundef %n0, ptr noundef %o0, ptr noundef %p0, ptr noundef %q0)
				%add4 = add i64 %add3, %inlinecall2
				ret i64 %add4
				}
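
The two callees in this test differ by a single pointer argument, which appears intended to push the argument list just past what fits in registers. The following is a rough sketch of that arithmetic, not the code from the patch. It assumes that up to 32 32-bit VGPRs are available for passing arguments and that each `ptr` here is 64 bits wide; the names `kAssumedArgVGPRs`, `dwordsFor`, `dwordsOnStack` and `checkVectorPtrTestShapes` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <vector>

// Assumption: number of 32-bit VGPRs available for argument passing.
constexpr unsigned kAssumedArgVGPRs = 32;

// Number of 32-bit registers needed to hold a value of the given bit width.
constexpr unsigned dwordsFor(unsigned SizeInBits) {
  return (SizeInBits + 31) / 32;
}

// Sums the dwords needed by all arguments and returns how many of them
// would not fit in the assumed register budget.
unsigned dwordsOnStack(const std::vector<unsigned> &ArgSizesInBits) {
  unsigned Total = 0;
  for (unsigned Bits : ArgSizesInBits)
    Total += dwordsFor(Bits);
  return Total > kAssumedArgVGPRs ? Total - kAssumedArgVGPRs : 0;
}

// Applies the sketch to the argument lists of the two callees above.
void checkVectorPtrTestShapes() {
  // @non_inlining_call: 14 pointers (64 bits each) plus one <2 x ptr> (128 bits).
  std::vector<unsigned> NonInlining(14, 64);
  NonInlining.push_back(128);
  assert(dwordsOnStack(NonInlining) == 0); // 14 * 2 + 4 = 32 dwords

  // @inlining_call: the same list plus one more pointer (%ptrq0).
  std::vector<unsigned> Inlining(15, 64);
  Inlining.push_back(128);
  assert(dwordsOnStack(Inlining) == 2); // 15 * 2 + 4 = 34 dwords
}
```

Under these assumptions, `@non_inlining_call` fits entirely in registers and gets no threshold adjustment, while `@inlining_call` spills two dwords to the stack, which is the case the CHECK-NOT lines expect to be inlined.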