This is an archive of the discontinued LLVM Phabricator instance.

[clang][AMDGPU]: Don't use byval for struct arguments in function ABI
ClosedPublic

Authored by cfang on Jul 21 2023, 11:48 AM.

Details

Reviewers

arsenm
bcahoon
jdoerfert

Group Reviewers

Restricted Project

Commits

rGd77c62053c94: [clang][AMDGPU]: Don't use byval for struct arguments in function ABI

Summary

Byval requires allocating additional stack space and always inserts an implicit copy during codegen,
where it can be difficult to optimize. This patch uses the byref/IndirectAliased method instead of
byval with its implicit copy semantics.
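
As a rough illustration of the trade-off described in the summary, the following C++ sketch models the two conventions by counting struct copies. It is not clang's or LLVM's implementation: `Big`, `copyBig`, `readFirst`, `callByValModel`, and `callByRefModel` are hypothetical names, and the byref model assumes the optimizer has already removed the callee's explicit copy because the callee only reads its argument.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a large aggregate argument.
struct Big {
  int data[100];
};

// Counts struct copies so the two models can be compared.
static std::size_t g_copies = 0;

static void copyBig(Big *dst, const Big *src) {
  std::memcpy(dst, src, sizeof(Big));
  ++g_copies;
}

// A callee that only reads its argument.
static int readFirst(const Big *arg) { return arg->data[0]; }

// byval model (assumption): a copy is made for every call, whether or not
// the callee ever writes to the argument.
static int callByValModel(const Big *callerObj) {
  Big tmp;
  copyBig(&tmp, callerObj);
  return readFirst(&tmp);
}

// byref model (assumption): the callee receives a pointer to existing memory;
// a read-only callee needs no copy once the explicit copy is optimized away.
static int callByRefModel(const Big *callerObj) {
  return readFirst(callerObj);
}
```

In this model, calling `callByValModel` once increments `g_copies`, while `callByRefModel` leaves it unchanged for a read-only callee.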

Diff Detail

Event Timeline

cfang created this revision.Jul 21 2023, 11:48 AM

Herald added a project: Restricted Project.Jul 21 2023, 11:48 AM

Herald added subscribers: kerbowa, tpr, dstuttard and 3 others.

cfang requested review of this revision.Jul 21 2023, 11:48 AM

Herald added a subscriber: wdng.Jul 21 2023, 11:48 AM

arsenm added inline comments.Jul 21 2023, 11:52 AM

clang/lib/CodeGen/Targets/AMDGPU.cpp
253	Why does this need the type checks? Can this just go under the isIndirect handling?
clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
124	These test checks are pretty thin, I'd like to see the memcpys in the IR. In a pre-commit, can you switch these tests to generated checks?

Should also get a mention in the release notes (not sure how much ABI detail we have in AMDGPUUsage too)

arsenm added reviewers: Restricted Project, jdoerfert.Jul 21 2023, 12:07 PM

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

In D155986#4523635, @jdoerfert wrote:

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

The backend still supports byval

In D155986#4523635, @jdoerfert wrote:

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

I think this would be good as an alternative optimization, but this wouldn't change the C ABI to avoid byval. I think we don't want byval for whenever we have an object linkable ABI

In D155986#4523674, @arsenm wrote:

In D155986#4523635, @jdoerfert wrote:

I would have assumed it makes more sense to have an IR pass that changes the ABI of functions with local linkage. The problem with this is that other languages might still emit byval, etc.

I think this would be good as an alternative optimization, but this wouldn't change the C ABI to avoid byval. I think we don't want byval for whenever we have an object linkable ABI

I have two points, I should have been more clear:

This is dangerous since we change something that is tested. Interoperability might not be the biggest thing yet, but special casing clangs behavior seems a slippery slope towards rust/fortran/Julia/... code not being able to talk to C/C++/HIP/... code when both are compiled for AMDGPUs.
We get more bang and less (potential) problems if we transform the IR whenever we can prove that we can, e.g., we see all the call sites. It should be overall better since we target a closed world, at least for the foreseeable future.

In D155986#4524001, @jdoerfert wrote:

This is dangerous since we change something that is tested. Interoperability might not be the biggest thing yet, but special casing clangs behavior seems a slippery slope towards rust/fortran/Julia/... code not being able to talk to C/C++/HIP/... code when both are compiled for AMDGPUs.

It's not special casing clang's behavior. We are defining the C ABI, a thing which currently does not exist. Other languages that want to be compatible with the C ABI have to follow, but they don't have to. This isn't a unique property, and every frontend has to do this for every target. I don't see why we need to carry the albatross of byval simply because llvm didn't well abstract ABI details.

We do not have a defined ABI. We do not have machine code linking. Calling functions from assembly requires considerable care already. We change things like which registers are live per-function.

The time will come where we are sufficiently constrained to be stuck with inefficiency in calling convention but we are not there yet. Other languages that want to link against IR emitted by clang can still do so, though they'll also need to change to the more efficient convention to do so.

You're saying this is not specializing it but it is. AMDGPU now emits different IR than NVPTX targets. Is that by itself a problem, no. Can I imagine this to be a problem down the line, yes.
That aside, I am not arguing this is by itself wrong. What I'm mostly trying to say is that there is a more generic alternative we should implement instead.
My goal is to get the same effect for all languages targeting AMDGPUs, not only the ones that go through Clang.

Anyway, I'm not blocking this. We can deal with the other languages and such later then.

In D155986#4524191, @JonChesterfield wrote:

... though they'll also need to change to the more efficient convention to do so.

Exactly my argument.

In D155986#4524214, @jdoerfert wrote:

That aside, I am not arguing this is by itself wrong. What I'm mostly trying to say is that there is a more generic alternative we should implement instead.

We should optimise more aggressively within a module. There's an idea in mailing lists to deliberately split functions into local versions with a fast calling convention and externally visible shims used for address escapes. That's a good thing that we should do. It's not this thing.

This is a case where the module external functions can be quicker too. So we should do that. I hear that nvptx does the worse thing and you don't like divergence between the GPU targets. Ptx is the stable ABI there, perhaps we should fix nvptx to match in a later patch.

My goal is to get the same effect for all languages targeting AMDGPUs, not only the ones that go through Clang.

Those languages should also not use the convention in place before this patch. The existing behaviour is a design mistake. This frees them to do something better without generating shims to work around the C ABI.

In D155986#4524214, @jdoerfert wrote:

You're saying this is not specializing it but it is. AMDGPU now emits different IR than NVPTX targets. Is that by itself a problem, no. Can I imagine this to be a problem down the line, yes.
That aside, I am not arguing this is by itself wrong. What I'm mostly trying to say is that there is a more generic alternative we should implement instead.

It's not an alternative, it's an entirely orthogonal thing. We should have both. If we commit to a real ABI, it is not going to be limited by trying to pretend that target specific IR is portable, when that's never been the case. NVPTX can switch to match this if it wants. If we want similar IR, why consolidate in the worse direction?

Harbormaster completed remote builds in B247294: Diff 543015.Jul 21 2023, 7:11 PM

Move getIndirectAliased under isAggregateTypeForABI(Ty)
update LIT addr-space-struct-arg.cl to check corresponding alloca and memcpy to the struct.

cfang added inline comments.Jul 24 2023, 2:49 PM

clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
124	Do you mean using "update_cc_test_checks.py" to generate the CHECKs? I am not sure why this does not work as expected. But I am including it as a separate file " addr-space-struct-arg-temp.cl" for reference.

In D155986#4523490, @arsenm wrote:

Should also get a mention in the release notes (not sure how much ABI detail we have in AMDGPUUsage too)

Still trying to figure out where to say what in the document.

arsenm added inline comments.Jul 24 2023, 2:50 PM

clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
124	Yes. You might need to manually delete the checks that are already there

Does this patch cause the callee to skip making a local copy of the struct type argument? What if the callee makes changes to the argument? That is a common use case since users assume the function arguments in C/C++/HIP are passed by value.

In D155986#4529782, @yaxunl wrote:

Does this patch cause the callee to skip making a local copy of the struct type argument? What if the callee makes changes to the argument? That is a common use case since users assume the function arguments in C/C++/HIP are passed by value.

The callee is supposed to use an explicit memcpy. The explicit memcpy can be optimized away most of the time. The current byval is an invisible copy on the caller side which is difficult to eliminate
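
The callee-side copy described in this reply can be sketched in plain C++. This is only a model of the semantics, not the IR clang emits: `Payload` and `bumpFirst` are hypothetical names, and the `const` pointer stands in for a byref argument that the callee must not write through.

```cpp
#include <cstring>

// Hypothetical aggregate; the name and layout are illustrative only.
struct Payload {
  int values[64];
};

// Models a callee under the byref convention: `incoming` points at memory the
// callee must not modify, so it copies into its own local before writing.
static int bumpFirst(const Payload *incoming) {
  Payload local;                                   // callee-side stack slot
  std::memcpy(&local, incoming, sizeof(Payload));  // explicit, visible copy
  local.values[0] += 1;                            // mutation stays local
  return local.values[0];
}
```

Because the copy is an ordinary `memcpy` in the callee body, the source-level pass-by-value behavior is preserved: the caller's object is unchanged after the call even though the callee modified its parameter.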

Harbormaster completed remote builds in B247792: Diff 543703.Jul 24 2023, 10:38 PM

Don't add the temporary test, pre-commit a switch of the existing test so it's easy to see the diff in the review.

This revision now requires changes to proceed.Jul 28 2023, 11:50 AM

add a pre-commit test, CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl, for convenience of review. This test will be added for checking upon commit.

Harbormaster completed remote builds in B249622: Diff 546255.Aug 1 2023, 3:59 PM

Need to add to release notes and document in AMDGPUUsage

clang/test/CodeGenOpenCL/byval.cl
0–1	The test name suggests we should test with a different target that does use byval here

add x86 target to check pass by value in byval.cl
remove -enable-var-scope in the pre-commit test.

TODO: document and release note

Harbormaster completed remote builds in B249634: Diff 546270.Aug 1 2023, 5:04 PM

Add comments in ReleaseNotes and AMDGPUUsage.

Herald added a project: Restricted Project.Aug 3 2023, 1:51 PM

Harbormaster completed remote builds in B250163: Diff 546992.Aug 3 2023, 1:52 PM

arsenm added inline comments.Aug 10 2023, 2:41 PM

llvm/docs/AMDGPUUsage.rst
13812–13813	s/to allocate memory/for allocating stack memory/
13813	Specify C ABI
13814	copying the value of the struct if modified

cfang added inline comments.Aug 11 2023, 11:45 AM

llvm/docs/AMDGPUUsage.rst
13813	I don't understand what "Specify C ABI" asks for. Can you suggest the wording explicitly? Thanks.

arsenm added inline comments.Aug 11 2023, 12:01 PM

llvm/docs/AMDGPUUsage.rst
13813	Add the letter C

cfang added inline comments.Aug 11 2023, 12:05 PM

llvm/docs/AMDGPUUsage.rst
13813	... in function C ABI? Or should we remove "function"?

arsenm added inline comments.Aug 11 2023, 12:06 PM

llvm/docs/AMDGPUUsage.rst
13813	Can just say C, doesn't really matter if you state function or not. function is implied

Update description in docs as suggested.

Harbormaster completed remote builds in B252026: Diff 549487.Aug 11 2023, 12:18 PM

Should look into why noundef was lost, but that can be in a follow up

clang/lib/CodeGen/Targets/AMDGPU.cpp
252	Typo "in stead"
clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257	This lost the noundef, shouldn't lose it

This revision is now accepted and ready to land.Aug 11 2023, 2:43 PM

cfang added inline comments.Aug 11 2023, 3:44 PM

clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257	if (AI.getKind() == ABIArgInfo::Indirect) return "noundef". Should we add an IndirectAliased check and include it in this same patch? Thanks

arsenm added inline comments.Aug 11 2023, 4:19 PM

clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257	Don't understand this snippet, the attribute emission presumably comes from somewhere else

Add "noundef" attribute for IndirectAliased.

Harbormaster completed remote builds in B252064: Diff 549545.Aug 11 2023, 4:24 PM

cfang added inline comments.Aug 11 2023, 4:26 PM

clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257	In function DetermineNoUndef. "noundef" was also missing for kernel byref arguments.

arsenm added inline comments.Aug 11 2023, 4:28 PM

clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
257	Yes, that should probably also include indirect aliased. You should fix that in a second patch
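
The kind check discussed in this thread can be modelled with a small stand-alone sketch. It is an assumption-laden simplification: `ArgKind`, `impliesNoUndefBefore`, and `impliesNoUndefAfter` are hypothetical names that only mirror the early check quoted above, and the real `DetermineNoUndef` contains further logic that is not modelled here.

```cpp
// Hypothetical, simplified mirror of the argument kinds named in the review.
enum class ArgKind { Direct, Indirect, IndirectAliased };

// Model of the check as quoted in the thread: only Indirect implies noundef,
// so a byref (IndirectAliased) argument loses the attribute.
static bool impliesNoUndefBefore(ArgKind kind) {
  return kind == ArgKind::Indirect;
}

// Model of the suggested follow-up: treat IndirectAliased like Indirect.
static bool impliesNoUndefAfter(ArgKind kind) {
  return kind == ArgKind::Indirect || kind == ArgKind::IndirectAliased;
}
```

In this model, a `false` result only means the early check does not apply; it says nothing about what the remaining checks in the real function would decide.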

This revision was landed with ongoing or failed builds.Aug 11 2023, 4:38 PM

Closed by commit rGd77c62053c94: [clang][AMDGPU]: Don't use byval for struct arguments in function ABI (authored by cfang).

This revision was automatically updated to reflect the committed changes.

cfang added a commit: rGd77c62053c94: [clang][AMDGPU]: Don't use byval for struct arguments in function ABI.

Herald added a project: Restricted Project.Aug 11 2023, 4:38 PM

Herald added a subscriber: cfe-commits.

Thanks! Happy to see function calls getting cheaper

Revision Contents

Path

Size

clang/

docs/

ReleaseNotes.rst

4 lines

lib/

CodeGen/

CGCall.cpp

9 lines

Targets/

AMDGPU.cpp

6 lines

test/

CodeGenCXX/

amdgcn-func-arg.cpp

19 lines

CodeGenOpenCL/

addr-space-struct-arg.cl

23 lines

amdgpu-abi-struct-arg-byref.cl

32 lines

amdgpu-abi-struct-coerce.cl

14 lines

byval.cl

10 lines

llvm/

docs/

AMDGPUUsage.rst

4 lines

Diff 546992

clang/docs/ReleaseNotes.rst

	Show All 22 Lines
	Miscellaneous Clang Crashes Fixed			Miscellaneous Clang Crashes Fixed
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Target Specific Changes			Target Specific Changes
	-----------------------			-----------------------

	AMDGPU Support			AMDGPU Support
	^^^^^^^^^^^^^^			^^^^^^^^^^^^^^
				- Use pass-by-reference (byref) instead of pass-by-value (byval) for struct
				arguments in function ABI. Callee is responsible to allocate memory and
				make a copy of the struct. Note that AMDGPU backend still supports byval
				for struct arguments.

	X86 Support			X86 Support
	^^^^^^^^^^^			^^^^^^^^^^^

	Arm and AArch64 Support			Arm and AArch64 Support
	^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^

	Windows Support			Windows Support
	Show All 22 Lines

clang/lib/CodeGen/CGCall.cpp

	Show All 22 Lines
	if (Addr.getAlignment() < Align &&			if (Addr.getAlignment() < Align &&
	llvm::getOrEnforceKnownAlignment(V, Align.getAsAlign(), *TD) <			llvm::getOrEnforceKnownAlignment(V, Align.getAsAlign(), *TD) <
	Align.getAsAlign()) {			Align.getAsAlign()) {
	NeedCopy = true;			NeedCopy = true;
	} else if (I->hasLValue()) {			} else if (I->hasLValue()) {
	auto LV = I->getKnownLValue();			auto LV = I->getKnownLValue();
	auto AS = LV.getAddressSpace();			auto AS = LV.getAddressSpace();

	if (!ArgInfo.getIndirectByVal() ||			bool isByValOrRef =
				ArgInfo.isIndirectAliased() || ArgInfo.getIndirectByVal();

				if (!isByValOrRef ||
	(LV.getAlignment() < getContext().getTypeAlignInChars(I->Ty))) {			(LV.getAlignment() < getContext().getTypeAlignInChars(I->Ty))) {
	NeedCopy = true;			NeedCopy = true;
	}			}
	if (!getLangOpts().OpenCL) {			if (!getLangOpts().OpenCL) {
	if ((ArgInfo.getIndirectByVal() &&			if ((isByValOrRef &&
	(AS != LangAS::Default &&			(AS != LangAS::Default &&
	AS != CGM.getASTAllocaAddressSpace()))) {			AS != CGM.getASTAllocaAddressSpace()))) {
	NeedCopy = true;			NeedCopy = true;
	}			}
	}			}
	// For OpenCL even if RV is located in default or alloca address space			// For OpenCL even if RV is located in default or alloca address space
	// we don't want to perform address space cast for it.			// we don't want to perform address space cast for it.
	else if ((ArgInfo.getIndirectByVal() &&			else if ((isByValOrRef &&
	Addr.getType()->getAddressSpace() != IRFuncTy->			Addr.getType()->getAddressSpace() != IRFuncTy->
	getParamType(FirstIRArg)->getPointerAddressSpace())) {			getParamType(FirstIRArg)->getPointerAddressSpace())) {
	NeedCopy = true;			NeedCopy = true;
	}			}
	}			}

	if (NeedCopy) {			if (NeedCopy) {
	// Create an aligned temporary, and copy to it.			// Create an aligned temporary, and copy to it.
	Show All 22 Lines

clang/lib/CodeGen/Targets/AMDGPU.cpp

	Show All 22 Lines

	if (NumRegsLeft > 0) {			if (NumRegsLeft > 0) {
	unsigned NumRegs = numRegsForType(Ty);			unsigned NumRegs = numRegsForType(Ty);
	if (NumRegsLeft >= NumRegs) {			if (NumRegsLeft >= NumRegs) {
	NumRegsLeft -= NumRegs;			NumRegsLeft -= NumRegs;
	return ABIArgInfo::getDirect();			return ABIArgInfo::getDirect();
	}			}
	}			}

				// Use pass-by-reference in stead of pass-by-value for struct arguments in
				// function ABI.
				return ABIArgInfo::getIndirectAliased(
				getContext().getTypeAlignInChars(Ty),
				getContext().getTargetAddressSpace(LangAS::opencl_private));
	}			}

	// Otherwise just do the default thing.			// Otherwise just do the default thing.
	ABIArgInfo ArgInfo = DefaultABIInfo::classifyArgumentType(Ty);			ABIArgInfo ArgInfo = DefaultABIInfo::classifyArgumentType(Ty);
	if (!ArgInfo.isIndirect()) {			if (!ArgInfo.isIndirect()) {
	unsigned NumRegs = numRegsForType(Ty);			unsigned NumRegs = numRegsForType(Ty);
	NumRegsLeft -= std::min(NumRegs, NumRegsLeft);			NumRegsLeft -= std::min(NumRegs, NumRegsLeft);
	}			}
	Show All 22 Lines
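
The register-budget step visible in the AMDGPU.cpp hunk above can be approximated by a small stand-alone model. This is a sketch under stated assumptions, not the clang code: `PassKind` and `classifyAggregateModel` are hypothetical names, the register count is passed in rather than computed from a type, and the earlier checks hidden behind the elided context are not modelled.

```cpp
// Hypothetical result kinds; the real code returns ABIArgInfo values.
enum class PassKind { Direct, IndirectAliased };

// Model of the visible logic: an aggregate that fits in the remaining
// registers is passed directly and consumes them; otherwise it falls through
// to the indirect-aliased (byref) path and the budget is left untouched.
static PassKind classifyAggregateModel(unsigned numRegs,
                                       unsigned &numRegsLeft) {
  if (numRegsLeft > 0 && numRegsLeft >= numRegs) {
    numRegsLeft -= numRegs;
    return PassKind::Direct;
  }
  return PassKind::IndirectAliased;
}
```

For example, with 16 registers left, a 4-register aggregate is classified as `Direct` and leaves 12, while a following 100-register aggregate is classified as `IndirectAliased` and leaves the budget at 12.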

clang/test/CodeGenCXX/amdgcn-func-arg.cpp

	Show All 13 Lines

	A g_a;			A g_a;
	B g_b;			B g_b;

	void func_with_ref_arg(A &a);			void func_with_ref_arg(A &a);
	void func_with_ref_arg(B &b);			void func_with_ref_arg(B &b);

	// CHECK-LABEL: @_Z22func_with_indirect_arg1A(			// CHECK-LABEL: @_Z22func_with_indirect_arg1A(
	// CHECK-SAME: ptr addrspace(5) noundef [[ARG:%.*]])
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[INDIRECT_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)			// CHECK-NEXT: [[A_INDIRECT_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
	// CHECK-NEXT: [[P:%.*]] = alloca ptr, align 8, addrspace(5)			// CHECK-NEXT: [[P:%.*]] = alloca ptr, align 8, addrspace(5)
	// CHECK-NEXT: [[INDIRECT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[INDIRECT_ADDR]] to ptr			// CHECK-NEXT: [[A_INDIRECT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A_INDIRECT_ADDR]] to ptr
	// CHECK-NEXT: [[P_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P]] to ptr			// CHECK-NEXT: [[P_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P]] to ptr
	// CHECK-NEXT: store ptr addrspace(5) [[ARG]], ptr [[INDIRECT_ADDR_ASCAST]]			// CHECK-NEXT: store ptr addrspace(5) [[A:%.*]], ptr [[A_INDIRECT_ADDR_ASCAST]], align 8
	// CHECK-NEXT: [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A:%.*]] to ptr			// CHECK-NEXT: [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A]] to ptr
	// CHECK-NEXT: store ptr [[A_ASCAST]], ptr [[P_ASCAST]], align 8			// CHECK-NEXT: store ptr [[A_ASCAST]], ptr [[P_ASCAST]], align 8
	// CHECK-NEXT: ret void			// CHECK-NEXT: ret void
	//			//
	void func_with_indirect_arg(A a) {			void func_with_indirect_arg(A a) {
	A *p = &a;			A *p = &a;
	}			}

	// CHECK-LABEL: @_Z22test_indirect_arg_autov(			// CHECK-LABEL: @_Z22test_indirect_arg_autov(
	Show All 30 Lines
	//			//
	void test_indirect_arg_global() {			void test_indirect_arg_global() {
	func_with_indirect_arg(g_a);			func_with_indirect_arg(g_a);
	func_with_ref_arg(g_a);			func_with_ref_arg(g_a);
	}			}

	// CHECK-LABEL: @_Z19func_with_byval_arg1B(			// CHECK-LABEL: @_Z19func_with_byval_arg1B(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
				// CHECK-NEXT: [[COERCE:%.*]] = alloca [[CLASS_B:%.*]], align 4, addrspace(5)
	// CHECK-NEXT: [[P:%.*]] = alloca ptr, align 8, addrspace(5)			// CHECK-NEXT: [[P:%.*]] = alloca ptr, align 8, addrspace(5)
				// CHECK-NEXT: [[B:%.*]] = addrspacecast ptr addrspace(5) [[COERCE]] to ptr
	// CHECK-NEXT: [[P_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P]] to ptr			// CHECK-NEXT: [[P_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[P]] to ptr
	// CHECK-NEXT: [[B_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B:%.*]] to ptr			// CHECK-NEXT: call void @llvm.memcpy.p0.p5.i64(ptr align 4 [[B]], ptr addrspace(5) align 4 [[TMP0:%.*]], i64 400, i1 false)
	// CHECK-NEXT: store ptr [[B_ASCAST]], ptr [[P_ASCAST]], align 8			// CHECK-NEXT: store ptr [[B]], ptr [[P_ASCAST]], align 8
	// CHECK-NEXT: ret void			// CHECK-NEXT: ret void
	//			//
	void func_with_byval_arg(B b) {			void func_with_byval_arg(B b) {
	B *p = &b;			B *p = &b;
	}			}

	// CHECK-LABEL: @_Z19test_byval_arg_autov(			// CHECK-LABEL: @_Z19test_byval_arg_autov(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[B:%.*]] = alloca [[CLASS_B:%.*]], align 4, addrspace(5)			// CHECK-NEXT: [[B:%.*]] = alloca [[CLASS_B:%.*]], align 4, addrspace(5)
	// CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[CLASS_B]], align 4, addrspace(5)			// CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[CLASS_B]], align 4, addrspace(5)
	// CHECK-NEXT: [[B_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B]] to ptr			// CHECK-NEXT: [[B_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[B]] to ptr
	// CHECK-NEXT: [[AGG_TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AGG_TMP]] to ptr			// CHECK-NEXT: [[AGG_TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AGG_TMP]] to ptr
	// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 [[B_ASCAST]], i64 400, i1 false)			// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 [[B_ASCAST]], i64 400, i1 false)
	// CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)			// CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
	// CHECK-NEXT: call void @_Z19func_with_byval_arg1B(ptr addrspace(5) noundef byval([[CLASS_B]]) align 4 [[AGG_TMP_ASCAST_ASCAST]])			// CHECK-NEXT: call void @_Z19func_with_byval_arg1B(ptr addrspace(5) byref([[CLASS_B]]) align 4 [[AGG_TMP_ASCAST_ASCAST]])
	// CHECK-NEXT: call void @_Z17func_with_ref_argR1B(ptr noundef nonnull align 4 dereferenceable(400) [[B_ASCAST]])			// CHECK-NEXT: call void @_Z17func_with_ref_argR1B(ptr noundef nonnull align 4 dereferenceable(400) [[B_ASCAST]])
	// CHECK-NEXT: ret void			// CHECK-NEXT: ret void
	//			//
	void test_byval_arg_auto() {			void test_byval_arg_auto() {
	B b;			B b;
	func_with_byval_arg(b);			func_with_byval_arg(b);
	func_with_ref_arg(b);			func_with_ref_arg(b);
	}			}

	// CHECK-LABEL: @_Z21test_byval_arg_globalv(			// CHECK-LABEL: @_Z21test_byval_arg_globalv(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[CLASS_B:%.*]], align 4, addrspace(5)			// CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[CLASS_B:%.*]], align 4, addrspace(5)
	// CHECK-NEXT: [[AGG_TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AGG_TMP]] to ptr			// CHECK-NEXT: [[AGG_TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AGG_TMP]] to ptr
	// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 addrspacecast (ptr addrspace(1) @g_b to ptr), i64 400, i1 false)			// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 addrspacecast (ptr addrspace(1) @g_b to ptr), i64 400, i1 false)
	// CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)			// CHECK-NEXT: [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
	// CHECK-NEXT: call void @_Z19func_with_byval_arg1B(ptr addrspace(5) noundef byval([[CLASS_B]]) align 4 [[AGG_TMP_ASCAST_ASCAST]])			// CHECK-NEXT: call void @_Z19func_with_byval_arg1B(ptr addrspace(5) byref([[CLASS_B]]) align 4 [[AGG_TMP_ASCAST_ASCAST]])
	// CHECK-NEXT: call void @_Z17func_with_ref_argR1B(ptr noundef nonnull align 4 dereferenceable(400) addrspacecast (ptr addrspace(1) @g_b to ptr))			// CHECK-NEXT: call void @_Z17func_with_ref_argR1B(ptr noundef nonnull align 4 dereferenceable(400) addrspacecast (ptr addrspace(1) @g_b to ptr))
	// CHECK-NEXT: ret void			// CHECK-NEXT: ret void
	//			//
	void test_byval_arg_global() {			void test_byval_arg_global() {
	func_with_byval_arg(g_b);			func_with_byval_arg(g_b);
	func_with_ref_arg(g_b);			func_with_ref_arg(g_b);
	}			}

clang/test/CodeGenOpenCL/addr-space-struct-arg.cl

	Show All 22 Lines
	// AMDGCN: load [9 x i32], ptr addrspace(1)	// AMDGCN: load [9 x i32], ptr addrspace(1)
	// AMDGCN: call %struct.Mat4X4 @foo([9 x i32]	// AMDGCN: call %struct.Mat4X4 @foo([9 x i32]
	// AMDGCN: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1)	// AMDGCN: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1)
	kernel void ker(global Mat3X3 *in, global Mat4X4 *out) {	kernel void ker(global Mat3X3 *in, global Mat4X4 *out) {
	out[0] = foo(in[1]);	out[0] = foo(in[1]);
	}	}

	// X86-LABEL: define{{.*}} void @foo_large(ptr noalias sret(%struct.Mat64X64) align 4 %agg.result, ptr noundef byval(%struct.Mat32X32) align 4 %in)	// X86-LABEL: define{{.*}} void @foo_large(ptr noalias sret(%struct.Mat64X64) align 4 %agg.result, ptr noundef byval(%struct.Mat32X32) align 4 %in)
	// AMDGCN-LABEL: define{{.*}} void @foo_large(ptr addrspace(5) noalias sret(%struct.Mat64X64) align 4 %agg.result, ptr addrspace(5) noundef byval(%struct.Mat32X32) align 4 %in)	// AMDGCN-LABEL: define{{.*}} void @foo_large(ptr addrspace(5) noalias sret(%struct.Mat64X64) align 4 %agg.result, ptr addrspace(5) byref(%struct.Mat32X32) align 4 %{{.*}}
		// AMDGCN: %in = alloca %struct.Mat32X32, align 4, addrspace(5)
		// AMDGCN-NEXT: call void @llvm.memcpy.p5.p5.i64(ptr addrspace(5) align 4 %in, ptr addrspace(5) align 4 %{{.*}}, i64 4096, i1 false)
	Mat64X64 __attribute__((noinline)) foo_large(Mat32X32 in) {	Mat64X64 __attribute__((noinline)) foo_large(Mat32X32 in) {
	Mat64X64 out;	Mat64X64 out;
	return out;	return out;
	}	}

	// ALL-LABEL: define {{.*}} void @ker_large	// ALL-LABEL: define {{.*}} void @ker_large
	// Expect two mem copies: one for the argument "in", and one for	// Expect two mem copies: one for the argument "in", and one for
	// the return value.	// the return value.
	// X86: call void @llvm.memcpy.p0.p1.i32(ptr	// X86: call void @llvm.memcpy.p0.p1.i32(ptr
	// X86: call void @llvm.memcpy.p1.p0.i32(ptr addrspace(1)	// X86: call void @llvm.memcpy.p1.p0.i32(ptr addrspace(1)
	// AMDGCN: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5)	// AMDGCN: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5)
	// AMDGCN: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1)	// AMDGCN: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1)
	kernel void ker_large(global Mat32X32 *in, global Mat64X64 *out) {	kernel void ker_large(global Mat32X32 *in, global Mat64X64 *out) {
	out[0] = foo_large(in[1]);	out[0] = foo_large(in[1]);
	}	}

	// AMDGCN-LABEL: define{{.*}} void @FuncOneMember(<2 x i32> %u.coerce)	// AMDGCN-LABEL: define{{.*}} void @FuncOneMember(<2 x i32> %u.coerce)
	void FuncOneMember(struct StructOneMember u) {	void FuncOneMember(struct StructOneMember u) {
	u.x = (int2)(0, 0);	u.x = (int2)(0, 0);
	}	}

	// AMDGCN-LABEL: define{{.*}} void @FuncOneLargeMember(ptr addrspace(5) noundef byval(%struct.LargeStructOneMember) align 8 %u)	// AMDGCN-LABEL: define{{.*}} void @FuncOneLargeMember(ptr addrspace(5) byref(%struct.LargeStructOneMember) align 8 %{{.*}}
		// AMDGCN: %u = alloca %struct.LargeStructOneMember, align 8, addrspace(5)
		// AMDGCN: call void @llvm.memcpy.p5.p5.i64(ptr addrspace(5) align 8 %u, ptr addrspace(5) align 8 %{{.*}}, i64 800, i1 false)
	// AMDGCN-NOT: addrspacecast	// AMDGCN-NOT: addrspacecast
	// AMDGCN: store <2 x i32> %{{.*}}, ptr addrspace(5)	// AMDGCN: store <2 x i32> %{{.*}}, ptr addrspace(5)
	void FuncOneLargeMember(struct LargeStructOneMember u) {	void FuncOneLargeMember(struct LargeStructOneMember u) {
	u.x[0] = (int2)(0, 0);	u.x[0] = (int2)(0, 0);
	}	}

	// AMDGCN20-LABEL: define{{.*}} void @test_indirect_arg_globl()	// AMDGCN20-LABEL: define{{.*}} void @test_indirect_arg_globl()
	// AMDGCN20: %[[byval_temp:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)	// AMDGCN20: %[[byval_temp:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)
	// AMDGCN20: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 8 %[[byval_temp]], ptr addrspace(1) align 8 @g_s, i64 800, i1 false)	// AMDGCN20: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 8 %[[byval_temp]], ptr addrspace(1) align 8 @g_s, i64 800, i1 false)
	// AMDGCN20: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval(%struct.LargeStructOneMember) align 8 %[[byval_temp]])	// AMDGCN20: call void @FuncOneLargeMember(ptr addrspace(5) byref(%struct.LargeStructOneMember) align 8 %[[byval_temp]])
	#if (__OPENCL_C_VERSION__ == 200) || (__OPENCL_C_VERSION__ >= 300 && defined(__opencl_c_program_scope_global_variables))	#if (__OPENCL_C_VERSION__ == 200) || (__OPENCL_C_VERSION__ >= 300 && defined(__opencl_c_program_scope_global_variables))
	void test_indirect_arg_globl(void) {	void test_indirect_arg_globl(void) {
	FuncOneLargeMember(g_s);	FuncOneLargeMember(g_s);
	}	}
	#endif	#endif

	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @test_indirect_arg_local()	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @test_indirect_arg_local()
	// AMDGCN: %[[byval_temp:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)	// AMDGCN: %[[byval_temp:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)
	// AMDGCN: call void @llvm.memcpy.p5.p3.i64(ptr addrspace(5) align 8 %[[byval_temp]], ptr addrspace(3) align 8 @test_indirect_arg_local.l_s, i64 800, i1 false)	// AMDGCN: call void @llvm.memcpy.p5.p3.i64(ptr addrspace(5) align 8 %[[byval_temp]], ptr addrspace(3) align 8 @test_indirect_arg_local.l_s, i64 800, i1 false)
	// AMDGCN: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval(%struct.LargeStructOneMember) align 8 %[[byval_temp]])	// AMDGCN: call void @FuncOneLargeMember(ptr addrspace(5) byref(%struct.LargeStructOneMember) align 8 %[[byval_temp]])
	kernel void test_indirect_arg_local(void) {	kernel void test_indirect_arg_local(void) {
	local struct LargeStructOneMember l_s;	local struct LargeStructOneMember l_s;
	FuncOneLargeMember(l_s);	FuncOneLargeMember(l_s);
	}	}

	// AMDGCN-LABEL: define{{.*}} void @test_indirect_arg_private()	// AMDGCN-LABEL: define{{.*}} void @test_indirect_arg_private()
	// AMDGCN: %[[p_s:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)	// AMDGCN: %[[p_s:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)
	// AMDGCN-NOT: @llvm.memcpy	// AMDGCN-NOT: @llvm.memcpy
	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval(%struct.LargeStructOneMember) align 8 %[[p_s]])	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) byref(%struct.LargeStructOneMember) align 8 %[[p_s]])
	void test_indirect_arg_private(void) {	void test_indirect_arg_private(void) {
	struct LargeStructOneMember p_s;	struct LargeStructOneMember p_s;
	FuncOneLargeMember(p_s);	FuncOneLargeMember(p_s);
	}	}

	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelOneMember	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelOneMember
	// AMDGCN-SAME: (<2 x i32> %[[u_coerce:.*]])	// AMDGCN-SAME: (<2 x i32> %[[u_coerce:.*]])
	// AMDGCN: %[[u:.*]] = alloca %struct.StructOneMember, align 8, addrspace(5)	// AMDGCN: %[[u:.*]] = alloca %struct.StructOneMember, align 8, addrspace(5)
	// AMDGCN: %[[coerce_dive:.*]] = getelementptr inbounds %struct.StructOneMember, ptr addrspace(5) %[[u]], i32 0, i32 0	// AMDGCN: %[[coerce_dive:.*]] = getelementptr inbounds %struct.StructOneMember, ptr addrspace(5) %[[u]], i32 0, i32 0
	// AMDGCN: store <2 x i32> %[[u_coerce]], ptr addrspace(5) %[[coerce_dive]]	// AMDGCN: store <2 x i32> %[[u_coerce]], ptr addrspace(5) %[[coerce_dive]]
	// AMDGCN: call void @FuncOneMember(<2 x i32>	// AMDGCN: call void @FuncOneMember(<2 x i32>
	kernel void KernelOneMember(struct StructOneMember u) {	kernel void KernelOneMember(struct StructOneMember u) {
	FuncOneMember(u);	FuncOneMember(u);
	}	}

	// SPIR: call void @llvm.memcpy.p0.p1.i32	// SPIR: call void @llvm.memcpy.p0.p1.i32
	// SPIR-NOT: addrspacecast	// SPIR-NOT: addrspacecast
	kernel void KernelOneMemberSpir(global struct StructOneMember* u) {	kernel void KernelOneMemberSpir(global struct StructOneMember* u) {
	FuncOneMember(*u);	FuncOneMember(*u);
	}	}

	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelLargeOneMember(	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelLargeOneMember(
	// AMDGCN: %[[U:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)	// AMDGCN: %[[U:.*]] = alloca %struct.LargeStructOneMember, align 8, addrspace(5)
	// AMDGCN: store %struct.LargeStructOneMember %u.coerce, ptr addrspace(5) %[[U]], align 8	// AMDGCN: store %struct.LargeStructOneMember %u.coerce, ptr addrspace(5) %[[U]], align 8
	// AMDGCN: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval(%struct.LargeStructOneMember) align 8 %[[U]])	// AMDGCN: call void @FuncOneLargeMember(ptr addrspace(5) byref(%struct.LargeStructOneMember) align 8 %[[U]])
	kernel void KernelLargeOneMember(struct LargeStructOneMember u) {	kernel void KernelLargeOneMember(struct LargeStructOneMember u) {
	FuncOneLargeMember(u);	FuncOneLargeMember(u);
	}	}

	// AMDGCN-LABEL: define{{.*}} void @FuncTwoMember(<2 x i32> %u.coerce0, <2 x i32> %u.coerce1)	// AMDGCN-LABEL: define{{.*}} void @FuncTwoMember(<2 x i32> %u.coerce0, <2 x i32> %u.coerce1)
	void FuncTwoMember(struct StructTwoMember u) {	void FuncTwoMember(struct StructTwoMember u) {
	u.y = (int2)(0, 0);	u.y = (int2)(0, 0);
	}	}

	// AMDGCN-LABEL: define{{.*}} void @FuncLargeTwoMember(ptr addrspace(5) noundef byval(%struct.LargeStructTwoMember) align 8 %u)	// AMDGCN-LABEL: define dso_local void @FuncLargeTwoMember
		// AMDGCN-SAME: (ptr addrspace(5) byref([[STRUCT_LARGESTRUCTTWOMEMBER:%.*]]) align 8 [[TMP0:%.*]])
		// AMDGCN: %[[U:.*]] = alloca %struct.LargeStructTwoMember, align 8, addrspace(5)
		// AMDGCN: call void @llvm.memcpy.p5.p5.i64(ptr addrspace(5) align 8 %[[U]], ptr addrspace(5) align 8 [[TMP0]], i64 480, i1 false)
	void FuncLargeTwoMember(struct LargeStructTwoMember u) {	void FuncLargeTwoMember(struct LargeStructTwoMember u) {
	u.y[0] = (int2)(0, 0);	u.y[0] = (int2)(0, 0);
	}	}

	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelTwoMember	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelTwoMember
	// AMDGCN-SAME: (%struct.StructTwoMember %[[u_coerce:.*]])	// AMDGCN-SAME: (%struct.StructTwoMember %[[u_coerce:.*]])
	// AMDGCN: %[[u:.*]] = alloca %struct.StructTwoMember, align 8, addrspace(5)	// AMDGCN: %[[u:.*]] = alloca %struct.StructTwoMember, align 8, addrspace(5)
	// AMDGCN: %[[LD0:.*]] = load <2 x i32>, ptr addrspace(5)	// AMDGCN: %[[LD0:.*]] = load <2 x i32>, ptr addrspace(5)
	// AMDGCN: %[[LD1:.*]] = load <2 x i32>, ptr addrspace(5)	// AMDGCN: %[[LD1:.*]] = load <2 x i32>, ptr addrspace(5)
	// AMDGCN: call void @FuncTwoMember(<2 x i32> %[[LD0]], <2 x i32> %[[LD1]])	// AMDGCN: call void @FuncTwoMember(<2 x i32> %[[LD0]], <2 x i32> %[[LD1]])
	kernel void KernelTwoMember(struct StructTwoMember u) {	kernel void KernelTwoMember(struct StructTwoMember u) {
	FuncTwoMember(u);	FuncTwoMember(u);
	}	}

	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelLargeTwoMember	// AMDGCN-LABEL: define{{.*}} amdgpu_kernel void @KernelLargeTwoMember
	// AMDGCN-SAME: (%struct.LargeStructTwoMember %[[u_coerce:.*]])	// AMDGCN-SAME: (%struct.LargeStructTwoMember %[[u_coerce:.*]])
	// AMDGCN: %[[u:.*]] = alloca %struct.LargeStructTwoMember, align 8, addrspace(5)	// AMDGCN: %[[u:.*]] = alloca %struct.LargeStructTwoMember, align 8, addrspace(5)
	// AMDGCN: store %struct.LargeStructTwoMember %[[u_coerce]], ptr addrspace(5) %[[u]]	// AMDGCN: store %struct.LargeStructTwoMember %[[u_coerce]], ptr addrspace(5) %[[u]]
	// AMDGCN: call void @FuncLargeTwoMember(ptr addrspace(5) noundef byval(%struct.LargeStructTwoMember) align 8 %[[u]])	// AMDGCN: call void @FuncLargeTwoMember(ptr addrspace(5) byref(%struct.LargeStructTwoMember) align 8 %[[u]])
	kernel void KernelLargeTwoMember(struct LargeStructTwoMember u) {	kernel void KernelLargeTwoMember(struct LargeStructTwoMember u) {
	FuncLargeTwoMember(u);	FuncLargeTwoMember(u);
	}	}
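
Note on the checks above: with byref the caller passes the address of its object, and the callee makes its own copy (the alloca plus llvm.memcpy in @FuncLargeTwoMember) before writing to the argument. The following is only a rough C++ analogy of that convention, not Clang or AMDGPU code; `LargeStruct`, `calleeByRefStyle`, and `callerObjectUnchanged` are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a large struct argument (100 elements of two ints).
struct LargeStruct { int x[100][2]; };

// Models the byref-style convention: the callee receives a read-only pointer
// to the caller's storage and makes a private copy before modifying it.
int calleeByRefStyle(const LargeStruct *incoming) {
  LargeStruct u;
  std::memcpy(&u, incoming, sizeof(LargeStruct)); // explicit callee-side copy
  u.x[0][0] = 42;                                 // writes only the private copy
  return u.x[0][0];
}

// Returns true if the caller's object is untouched after the call.
bool callerObjectUnchanged() {
  LargeStruct s{};                 // zero-initialized caller object
  int seen = calleeByRefStyle(&s); // pass the address, no caller-side copy
  assert(seen == 42);
  return s.x[0][0] == 0;
}
```

In this analogy the copy is visible in the callee, so a callee that never writes to its argument could skip it, which is roughly the optimization opportunity the summary describes.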

clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl

	Show All 22 Lines
	// AMDGCN-NEXT: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1) align 4 [[ARRAYIDX]], ptr addrspace(5) align 4 [[TMP]], i64 64, i1 false)	// AMDGCN-NEXT: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1) align 4 [[ARRAYIDX]], ptr addrspace(5) align 4 [[TMP]], i64 64, i1 false)
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	kernel void ker(global Mat3X3 in, global Mat4X4 out) {	kernel void ker(global Mat3X3 in, global Mat4X4 out) {
	out[0] = foo(in[1]);	out[0] = foo(in[1]);
	}	}

	// AMDGCN-LABEL: define dso_local void @foo_large	// AMDGCN-LABEL: define dso_local void @foo_large
	// AMDGCN-SAME: (ptr addrspace(5) noalias sret([[STRUCT_MAT64X64:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr addrspace(5) noundef byval([[STRUCT_MAT32X32:%.*]]) align 4 [[IN:%.*]]) #[[ATTR0]] {	// AMDGCN-SAME: (ptr addrspace(5) noalias sret([[STRUCT_MAT64X64:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr addrspace(5) byref([[STRUCT_MAT32X32:%.*]]) align 4 [[TMP0:%.*]]) #[[ATTR0]] {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
		// AMDGCN-NEXT: [[IN:%.*]] = alloca [[STRUCT_MAT32X32]], align 4, addrspace(5)
		// AMDGCN-NEXT: call void @llvm.memcpy.p5.p5.i64(ptr addrspace(5) align 4 [[IN]], ptr addrspace(5) align 4 [[TMP0]], i64 4096, i1 false)
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	Mat64X64 __attribute__((noinline)) foo_large(Mat32X32 in) {	Mat64X64 __attribute__((noinline)) foo_large(Mat32X32 in) {
	Mat64X64 out;	Mat64X64 out;
	return out;	return out;
	}	}

	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @ker_large	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @ker_large
	// AMDGCN-SAME: (ptr addrspace(1) noundef align 4 [[IN:%.*]], ptr addrspace(1) noundef align 4 [[OUT:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !8 !kernel_arg_base_type !8 !kernel_arg_type_qual !7 {	// AMDGCN-SAME: (ptr addrspace(1) noundef align 4 [[IN:%.*]], ptr addrspace(1) noundef align 4 [[OUT:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !8 !kernel_arg_base_type !8 !kernel_arg_type_qual !7 {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
	// AMDGCN-NEXT: [[IN_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)	// AMDGCN-NEXT: [[IN_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
	// AMDGCN-NEXT: [[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)	// AMDGCN-NEXT: [[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
	// AMDGCN-NEXT: [[TMP:%.*]] = alloca [[STRUCT_MAT64X64:%.*]], align 4, addrspace(5)	// AMDGCN-NEXT: [[TMP:%.*]] = alloca [[STRUCT_MAT64X64:%.*]], align 4, addrspace(5)
	// AMDGCN-NEXT: [[BYVAL_TEMP:%.*]] = alloca [[STRUCT_MAT32X32:%.*]], align 4, addrspace(5)	// AMDGCN-NEXT: [[BYVAL_TEMP:%.*]] = alloca [[STRUCT_MAT32X32:%.*]], align 4, addrspace(5)
	// AMDGCN-NEXT: store ptr addrspace(1) [[IN]], ptr addrspace(5) [[IN_ADDR]], align 8	// AMDGCN-NEXT: store ptr addrspace(1) [[IN]], ptr addrspace(5) [[IN_ADDR]], align 8
	// AMDGCN-NEXT: store ptr addrspace(1) [[OUT]], ptr addrspace(5) [[OUT_ADDR]], align 8	// AMDGCN-NEXT: store ptr addrspace(1) [[OUT]], ptr addrspace(5) [[OUT_ADDR]], align 8
	// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[OUT_ADDR]], align 8	// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[OUT_ADDR]], align 8
	// AMDGCN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_MAT64X64]], ptr addrspace(1) [[TMP0]], i64 0	// AMDGCN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_MAT64X64]], ptr addrspace(1) [[TMP0]], i64 0
	// AMDGCN-NEXT: [[TMP1:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[IN_ADDR]], align 8	// AMDGCN-NEXT: [[TMP1:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[IN_ADDR]], align 8
	// AMDGCN-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [[STRUCT_MAT32X32]], ptr addrspace(1) [[TMP1]], i64 1	// AMDGCN-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [[STRUCT_MAT32X32]], ptr addrspace(1) [[TMP1]], i64 1
	// AMDGCN-NEXT: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 4 [[BYVAL_TEMP]], ptr addrspace(1) align 4 [[ARRAYIDX1]], i64 4096, i1 false)	// AMDGCN-NEXT: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 4 [[BYVAL_TEMP]], ptr addrspace(1) align 4 [[ARRAYIDX1]], i64 4096, i1 false)
	// AMDGCN-NEXT: call void @foo_large(ptr addrspace(5) sret([[STRUCT_MAT64X64]]) align 4 [[TMP]], ptr addrspace(5) noundef byval([[STRUCT_MAT32X32]]) align 4 [[BYVAL_TEMP]]) #[[ATTR3]]	// AMDGCN-NEXT: call void @foo_large(ptr addrspace(5) sret([[STRUCT_MAT64X64]]) align 4 [[TMP]], ptr addrspace(5) byref([[STRUCT_MAT32X32]]) align 4 [[BYVAL_TEMP]]) #[[ATTR3]]
	// AMDGCN-NEXT: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1) align 4 [[ARRAYIDX]], ptr addrspace(5) align 4 [[TMP]], i64 16384, i1 false)	// AMDGCN-NEXT: call void @llvm.memcpy.p1.p5.i64(ptr addrspace(1) align 4 [[ARRAYIDX]], ptr addrspace(5) align 4 [[TMP]], i64 16384, i1 false)
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	kernel void ker_large(global Mat32X32 in, global Mat64X64 out) {	kernel void ker_large(global Mat32X32 in, global Mat64X64 out) {
	out[0] = foo_large(in[1]);	out[0] = foo_large(in[1]);
	}	}

	// AMDGCN-LABEL: define dso_local void @FuncOneMember	// AMDGCN-LABEL: define dso_local void @FuncOneMember
	Show All 9 Lines
	// AMDGCN-NEXT: store <2 x i32> [[TMP0]], ptr addrspace(5) [[X]], align 8	// AMDGCN-NEXT: store <2 x i32> [[TMP0]], ptr addrspace(5) [[X]], align 8
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	void FuncOneMember(struct StructOneMember u) {	void FuncOneMember(struct StructOneMember u) {
	u.x = (int2)(0, 0);	u.x = (int2)(0, 0);
	}	}

	// AMDGCN-LABEL: define dso_local void @FuncOneLargeMember	// AMDGCN-LABEL: define dso_local void @FuncOneLargeMember
	// AMDGCN-SAME: (ptr addrspace(5) noundef byval([[STRUCT_LARGESTRUCTONEMEMBER:%.*]]) align 8 [[U:%.*]]) #[[ATTR0]] {	// AMDGCN-SAME: (ptr addrspace(5) byref([[STRUCT_LARGESTRUCTONEMEMBER:%.*]]) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
		// AMDGCN-NEXT: [[U:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER]], align 8, addrspace(5)
	// AMDGCN-NEXT: [[DOTCOMPOUNDLITERAL:%.*]] = alloca <2 x i32>, align 8, addrspace(5)	// AMDGCN-NEXT: [[DOTCOMPOUNDLITERAL:%.*]] = alloca <2 x i32>, align 8, addrspace(5)
		// AMDGCN-NEXT: call void @llvm.memcpy.p5.p5.i64(ptr addrspace(5) align 8 [[U]], ptr addrspace(5) align 8 [[TMP0]], i64 800, i1 false)
	// AMDGCN-NEXT: store <2 x i32> zeroinitializer, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8	// AMDGCN-NEXT: store <2 x i32> zeroinitializer, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8
	// AMDGCN-NEXT: [[TMP0:%.*]] = load <2 x i32>, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8	// AMDGCN-NEXT: [[TMP1:%.*]] = load <2 x i32>, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8
	// AMDGCN-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_LARGESTRUCTONEMEMBER]], ptr addrspace(5) [[U]], i32 0, i32 0	// AMDGCN-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_LARGESTRUCTONEMEMBER]], ptr addrspace(5) [[U]], i32 0, i32 0
	// AMDGCN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [100 x <2 x i32>], ptr addrspace(5) [[X]], i64 0, i64 0	// AMDGCN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [100 x <2 x i32>], ptr addrspace(5) [[X]], i64 0, i64 0
	// AMDGCN-NEXT: store <2 x i32> [[TMP0]], ptr addrspace(5) [[ARRAYIDX]], align 8	// AMDGCN-NEXT: store <2 x i32> [[TMP1]], ptr addrspace(5) [[ARRAYIDX]], align 8
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	void FuncOneLargeMember(struct LargeStructOneMember u) {	void FuncOneLargeMember(struct LargeStructOneMember u) {
	u.x[0] = (int2)(0, 0);	u.x[0] = (int2)(0, 0);
	}	}

	#if (__OPENCL_C_VERSION__ == 200) || (__OPENCL_C_VERSION__ >= 300 && defined(__opencl_c_program_scope_global_variables))	#if (__OPENCL_C_VERSION__ == 200) || (__OPENCL_C_VERSION__ >= 300 && defined(__opencl_c_program_scope_global_variables))
	// AMDGCN-LABEL: define dso_local void @test_indirect_arg_globl	// AMDGCN-LABEL: define dso_local void @test_indirect_arg_globl
	// AMDGCN-SAME: () #[[ATTR0]] {	// AMDGCN-SAME: () #[[ATTR0]] {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
	// AMDGCN-NEXT: [[BYVAL_TEMP:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER:%.*]], align 8, addrspace(5)	// AMDGCN-NEXT: [[BYVAL_TEMP:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER:%.*]], align 8, addrspace(5)
	// AMDGCN-NEXT: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 8 [[BYVAL_TEMP]], ptr addrspace(1) align 8 @g_s, i64 800, i1 false)	// AMDGCN-NEXT: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 8 [[BYVAL_TEMP]], ptr addrspace(1) align 8 @g_s, i64 800, i1 false)
	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[BYVAL_TEMP]]) #[[ATTR3]]	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) byref([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[BYVAL_TEMP]]) #[[ATTR3]]
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	void test_indirect_arg_globl(void) {	void test_indirect_arg_globl(void) {
	FuncOneLargeMember(g_s);	FuncOneLargeMember(g_s);
	}	}
	#endif	#endif

	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @test_indirect_arg_local	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @test_indirect_arg_local
	// AMDGCN-SAME: () #[[ATTR1]] !kernel_arg_addr_space !9 !kernel_arg_access_qual !9 !kernel_arg_type !9 !kernel_arg_base_type !9 !kernel_arg_type_qual !9 {	// AMDGCN-SAME: () #[[ATTR1]] !kernel_arg_addr_space !9 !kernel_arg_access_qual !9 !kernel_arg_type !9 !kernel_arg_base_type !9 !kernel_arg_type_qual !9 {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
	// AMDGCN-NEXT: [[BYVAL_TEMP:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER:%.*]], align 8, addrspace(5)	// AMDGCN-NEXT: [[BYVAL_TEMP:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER:%.*]], align 8, addrspace(5)
	// AMDGCN-NEXT: call void @llvm.memcpy.p5.p3.i64(ptr addrspace(5) align 8 [[BYVAL_TEMP]], ptr addrspace(3) align 8 @test_indirect_arg_local.l_s, i64 800, i1 false)	// AMDGCN-NEXT: call void @llvm.memcpy.p5.p3.i64(ptr addrspace(5) align 8 [[BYVAL_TEMP]], ptr addrspace(3) align 8 @test_indirect_arg_local.l_s, i64 800, i1 false)
	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[BYVAL_TEMP]]) #[[ATTR3]]	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) byref([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[BYVAL_TEMP]]) #[[ATTR3]]
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	kernel void test_indirect_arg_local(void) {	kernel void test_indirect_arg_local(void) {
	local struct LargeStructOneMember l_s;	local struct LargeStructOneMember l_s;
	FuncOneLargeMember(l_s);	FuncOneLargeMember(l_s);
	}	}

	// AMDGCN-LABEL: define dso_local void @test_indirect_arg_private	// AMDGCN-LABEL: define dso_local void @test_indirect_arg_private
	// AMDGCN-SAME: () #[[ATTR0]] {	// AMDGCN-SAME: () #[[ATTR0]] {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
	// AMDGCN-NEXT: [[P_S:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER:%.*]], align 8, addrspace(5)	// AMDGCN-NEXT: [[P_S:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER:%.*]], align 8, addrspace(5)
	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[P_S]]) #[[ATTR3]]	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) byref([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[P_S]]) #[[ATTR3]]
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	void test_indirect_arg_private(void) {	void test_indirect_arg_private(void) {
	struct LargeStructOneMember p_s;	struct LargeStructOneMember p_s;
	FuncOneLargeMember(p_s);	FuncOneLargeMember(p_s);
	}	}

	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelOneMember	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelOneMember
	Show All 26 Lines
	FuncOneMember(*u);	FuncOneMember(*u);
	}	}

	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelLargeOneMember	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelLargeOneMember
	// AMDGCN-SAME: ([[STRUCT_LARGESTRUCTONEMEMBER:%.*]] [[U_COERCE:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !10 !kernel_arg_access_qual !11 !kernel_arg_type !16 !kernel_arg_base_type !16 !kernel_arg_type_qual !13 {	// AMDGCN-SAME: ([[STRUCT_LARGESTRUCTONEMEMBER:%.*]] [[U_COERCE:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !10 !kernel_arg_access_qual !11 !kernel_arg_type !16 !kernel_arg_base_type !16 !kernel_arg_type_qual !13 {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
	// AMDGCN-NEXT: [[U:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER]], align 8, addrspace(5)	// AMDGCN-NEXT: [[U:%.*]] = alloca [[STRUCT_LARGESTRUCTONEMEMBER]], align 8, addrspace(5)
	// AMDGCN-NEXT: store [[STRUCT_LARGESTRUCTONEMEMBER]] [[U_COERCE]], ptr addrspace(5) [[U]], align 8	// AMDGCN-NEXT: store [[STRUCT_LARGESTRUCTONEMEMBER]] [[U_COERCE]], ptr addrspace(5) [[U]], align 8
	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) noundef byval([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[U]]) #[[ATTR3]]	// AMDGCN-NEXT: call void @FuncOneLargeMember(ptr addrspace(5) byref([[STRUCT_LARGESTRUCTONEMEMBER]]) align 8 [[U]]) #[[ATTR3]]
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	kernel void KernelLargeOneMember(struct LargeStructOneMember u) {	kernel void KernelLargeOneMember(struct LargeStructOneMember u) {
	FuncOneLargeMember(u);	FuncOneLargeMember(u);
	}	}

	// AMDGCN-LABEL: define dso_local void @FuncTwoMember	// AMDGCN-LABEL: define dso_local void @FuncTwoMember
	// AMDGCN-SAME: (<2 x i32> [[U_COERCE0:%.*]], <2 x i32> [[U_COERCE1:%.*]]) #[[ATTR0]] {	// AMDGCN-SAME: (<2 x i32> [[U_COERCE0:%.*]], <2 x i32> [[U_COERCE1:%.*]]) #[[ATTR0]] {
	Show All 10 Lines
	// AMDGCN-NEXT: store <2 x i32> [[TMP2]], ptr addrspace(5) [[Y]], align 8	// AMDGCN-NEXT: store <2 x i32> [[TMP2]], ptr addrspace(5) [[Y]], align 8
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	void FuncTwoMember(struct StructTwoMember u) {	void FuncTwoMember(struct StructTwoMember u) {
	u.y = (int2)(0, 0);	u.y = (int2)(0, 0);
	}	}

	// AMDGCN-LABEL: define dso_local void @FuncLargeTwoMember	// AMDGCN-LABEL: define dso_local void @FuncLargeTwoMember
	// AMDGCN-SAME: (ptr addrspace(5) noundef byval([[STRUCT_LARGESTRUCTTWOMEMBER:%.*]]) align 8 [[U:%.*]]) #[[ATTR0]] {	// AMDGCN-SAME: (ptr addrspace(5) byref([[STRUCT_LARGESTRUCTTWOMEMBER:%.*]]) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
		arsenm: This lost the noundef; it shouldn't lose it.
		cfang: if (AI.getKind() == ABIArgInfo::Indirect) return "noundef". Should we add an IndirectAliased check and include it in this same patch? Thanks.
		arsenm: I don't understand this snippet; the attribute emission presumably comes from somewhere else.
		cfang: In function DetermineNoUndef. "noundef" was also missing for kernel byref arguments.
		arsenm: Yes, that should probably also include indirect aliased. You should fix that in a second patch.
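
To make the suggestion in this thread concrete, here is a minimal, self-contained sketch of a kind check that treats indirect-aliased arguments the same way as indirect ones. It is a toy model under stated assumptions: `ArgKind` and `wantsNoUndef` are hypothetical names, and this is not the actual body of DetermineNoUndef or the real ABIArgInfo API.

```cpp
#include <cassert>

// Hypothetical, simplified argument kinds; Clang's real ABIArgInfo has more.
enum class ArgKind { Direct, Indirect, IndirectAliased };

// Toy decision function: pointer-passed aggregates get noundef whether they
// are passed Indirect (byval) or IndirectAliased (byref).
bool wantsNoUndef(ArgKind kind) {
  switch (kind) {
  case ArgKind::Indirect:
  case ArgKind::IndirectAliased: // the case the thread proposes adding
    return true;
  case ArgKind::Direct:
    return false; // assumption: direct arguments are decided by other rules
  }
  return false;
}
```

Under this model the byref parameters in the checks above would keep noundef, which is the behavior the reviewer asks to restore in a follow-up patch.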
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
		// AMDGCN-NEXT: [[U:%.*]] = alloca [[STRUCT_LARGESTRUCTTWOMEMBER]], align 8, addrspace(5)
	// AMDGCN-NEXT: [[DOTCOMPOUNDLITERAL:%.*]] = alloca <2 x i32>, align 8, addrspace(5)	// AMDGCN-NEXT: [[DOTCOMPOUNDLITERAL:%.*]] = alloca <2 x i32>, align 8, addrspace(5)
		// AMDGCN-NEXT: call void @llvm.memcpy.p5.p5.i64(ptr addrspace(5) align 8 [[U]], ptr addrspace(5) align 8 [[TMP0]], i64 480, i1 false)
	// AMDGCN-NEXT: store <2 x i32> zeroinitializer, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8	// AMDGCN-NEXT: store <2 x i32> zeroinitializer, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8
	// AMDGCN-NEXT: [[TMP0:%.*]] = load <2 x i32>, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8	// AMDGCN-NEXT: [[TMP1:%.*]] = load <2 x i32>, ptr addrspace(5) [[DOTCOMPOUNDLITERAL]], align 8
	// AMDGCN-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_LARGESTRUCTTWOMEMBER]], ptr addrspace(5) [[U]], i32 0, i32 1	// AMDGCN-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_LARGESTRUCTTWOMEMBER]], ptr addrspace(5) [[U]], i32 0, i32 1
	// AMDGCN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [20 x <2 x i32>], ptr addrspace(5) [[Y]], i64 0, i64 0	// AMDGCN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [20 x <2 x i32>], ptr addrspace(5) [[Y]], i64 0, i64 0
	// AMDGCN-NEXT: store <2 x i32> [[TMP0]], ptr addrspace(5) [[ARRAYIDX]], align 8	// AMDGCN-NEXT: store <2 x i32> [[TMP1]], ptr addrspace(5) [[ARRAYIDX]], align 8
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	void FuncLargeTwoMember(struct LargeStructTwoMember u) {	void FuncLargeTwoMember(struct LargeStructTwoMember u) {
	u.y[0] = (int2)(0, 0);	u.y[0] = (int2)(0, 0);
	}	}

	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelTwoMember	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelTwoMember
	// AMDGCN-SAME: ([[STRUCT_STRUCTTWOMEMBER:%.*]] [[U_COERCE:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !10 !kernel_arg_access_qual !11 !kernel_arg_type !17 !kernel_arg_base_type !17 !kernel_arg_type_qual !13 {	// AMDGCN-SAME: ([[STRUCT_STRUCTTWOMEMBER:%.*]] [[U_COERCE:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !10 !kernel_arg_access_qual !11 !kernel_arg_type !17 !kernel_arg_base_type !17 !kernel_arg_type_qual !13 {
	Show All 11 Lines
	FuncTwoMember(u);	FuncTwoMember(u);
	}	}

	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelLargeTwoMember	// AMDGCN-LABEL: define dso_local amdgpu_kernel void @KernelLargeTwoMember
	// AMDGCN-SAME: ([[STRUCT_LARGESTRUCTTWOMEMBER:%.*]] [[U_COERCE:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !10 !kernel_arg_access_qual !11 !kernel_arg_type !18 !kernel_arg_base_type !18 !kernel_arg_type_qual !13 {	// AMDGCN-SAME: ([[STRUCT_LARGESTRUCTTWOMEMBER:%.*]] [[U_COERCE:%.*]]) #[[ATTR1]] !kernel_arg_addr_space !10 !kernel_arg_access_qual !11 !kernel_arg_type !18 !kernel_arg_base_type !18 !kernel_arg_type_qual !13 {
	// AMDGCN-NEXT: entry:	// AMDGCN-NEXT: entry:
	// AMDGCN-NEXT: [[U:%.*]] = alloca [[STRUCT_LARGESTRUCTTWOMEMBER]], align 8, addrspace(5)	// AMDGCN-NEXT: [[U:%.*]] = alloca [[STRUCT_LARGESTRUCTTWOMEMBER]], align 8, addrspace(5)
	// AMDGCN-NEXT: store [[STRUCT_LARGESTRUCTTWOMEMBER]] [[U_COERCE]], ptr addrspace(5) [[U]], align 8	// AMDGCN-NEXT: store [[STRUCT_LARGESTRUCTTWOMEMBER]] [[U_COERCE]], ptr addrspace(5) [[U]], align 8
	// AMDGCN-NEXT: call void @FuncLargeTwoMember(ptr addrspace(5) noundef byval([[STRUCT_LARGESTRUCTTWOMEMBER]]) align 8 [[U]]) #[[ATTR3]]	// AMDGCN-NEXT: call void @FuncLargeTwoMember(ptr addrspace(5) byref([[STRUCT_LARGESTRUCTTWOMEMBER]]) align 8 [[U]]) #[[ATTR3]]
	// AMDGCN-NEXT: ret void	// AMDGCN-NEXT: ret void
	//	//
	kernel void KernelLargeTwoMember(struct LargeStructTwoMember u) {	kernel void KernelLargeTwoMember(struct LargeStructTwoMember u) {
	FuncLargeTwoMember(u);	FuncLargeTwoMember(u);
	}	}

clang/test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl

	Show All 22 Lines
	{	{
	flexible_array s = { 0 };	flexible_array s = { 0 };
	return s;	return s;
	}	}

	// CHECK: define{{.*}} void @func_reg_state_lo(<4 x i32> noundef %arg0, <4 x i32> noundef %arg1, <4 x i32> noundef %arg2, i32 noundef %arg3, i32 %s.coerce0, float %s.coerce1, i32 %s.coerce2)	// CHECK: define{{.*}} void @func_reg_state_lo(<4 x i32> noundef %arg0, <4 x i32> noundef %arg1, <4 x i32> noundef %arg2, i32 noundef %arg3, i32 %s.coerce0, float %s.coerce1, i32 %s.coerce2)
	void func_reg_state_lo(int4 arg0, int4 arg1, int4 arg2, int arg3, struct_arg_t s) { }	void func_reg_state_lo(int4 arg0, int4 arg1, int4 arg2, int arg3, struct_arg_t s) { }

	// CHECK: define{{.*}} void @func_reg_state_hi(<4 x i32> noundef %arg0, <4 x i32> noundef %arg1, <4 x i32> noundef %arg2, i32 noundef %arg3, i32 noundef %arg4, ptr addrspace(5) nocapture noundef byval(%struct.struct_arg) align 4 %s)	// CHECK: define{{.*}} void @func_reg_state_hi(<4 x i32> noundef %arg0, <4 x i32> noundef %arg1, <4 x i32> noundef %arg2, i32 noundef %arg3, i32 noundef %arg4, ptr addrspace(5) nocapture byref(%struct.struct_arg) align 4 %{{.*}})
	void func_reg_state_hi(int4 arg0, int4 arg1, int4 arg2, int arg3, int arg4, struct_arg_t s) { }	void func_reg_state_hi(int4 arg0, int4 arg1, int4 arg2, int arg3, int arg4, struct_arg_t s) { }

	// XXX - Why don't the inner structs flatten?	// XXX - Why don't the inner structs flatten?
	// CHECK: define{{.*}} void @func_reg_state_num_regs_nested_struct(<4 x i32> noundef %arg0, i32 noundef %arg1, i32 %arg2.coerce0, %struct.nested %arg2.coerce1, i32 %arg3.coerce0, %struct.nested %arg3.coerce1, ptr addrspace(5) nocapture noundef byval(%struct.num_regs_nested_struct) align 8 %arg4)	// CHECK: define{{.*}} void @func_reg_state_num_regs_nested_struct(<4 x i32> noundef %arg0, i32 noundef %arg1, i32 %arg2.coerce0, %struct.nested %arg2.coerce1, i32 %arg3.coerce0, %struct.nested %arg3.coerce1, ptr addrspace(5) nocapture byref(%struct.num_regs_nested_struct) align 8 %{{.*}})
	void func_reg_state_num_regs_nested_struct(int4 arg0, int arg1, num_regs_nested_struct arg2, num_regs_nested_struct arg3, num_regs_nested_struct arg4) { }	void func_reg_state_num_regs_nested_struct(int4 arg0, int arg1, num_regs_nested_struct arg2, num_regs_nested_struct arg3, num_regs_nested_struct arg4) { }

	// CHECK: define{{.*}} void @func_double_nested_struct_arg(<4 x i32> noundef %arg0, i32 noundef %arg1, i32 %arg2.coerce0, %struct.double_nested %arg2.coerce1, i16 %arg2.coerce2)	// CHECK: define{{.*}} void @func_double_nested_struct_arg(<4 x i32> noundef %arg0, i32 noundef %arg1, i32 %arg2.coerce0, %struct.double_nested %arg2.coerce1, i16 %arg2.coerce2)
	void func_double_nested_struct_arg(int4 arg0, int arg1, double_nested_struct arg2) { }	void func_double_nested_struct_arg(int4 arg0, int arg1, double_nested_struct arg2) { }

	// CHECK: define{{.*}} %struct.double_nested_struct @func_double_nested_struct_ret(<4 x i32> noundef %arg0, i32 noundef %arg1)	// CHECK: define{{.*}} %struct.double_nested_struct @func_double_nested_struct_ret(<4 x i32> noundef %arg0, i32 noundef %arg1)
	double_nested_struct func_double_nested_struct_ret(int4 arg0, int arg1) {	double_nested_struct func_double_nested_struct_ret(int4 arg0, int arg1) {
	double_nested_struct s = { 0 };	double_nested_struct s = { 0 };
	return s;	return s;
	}	}

	// CHECK: define{{.*}} void @func_large_struct_padding_arg_direct(i8 %arg.coerce0, i32 %arg.coerce1, i8 %arg.coerce2, i32 %arg.coerce3, i8 %arg.coerce4, i8 %arg.coerce5, i16 %arg.coerce6, i16 %arg.coerce7, [3 x i8] %arg.coerce8, i64 %arg.coerce9, i32 %arg.coerce10, i8 %arg.coerce11, i32 %arg.coerce12, i16 %arg.coerce13, i8 %arg.coerce14)	// CHECK: define{{.*}} void @func_large_struct_padding_arg_direct(i8 %arg.coerce0, i32 %arg.coerce1, i8 %arg.coerce2, i32 %arg.coerce3, i8 %arg.coerce4, i8 %arg.coerce5, i16 %arg.coerce6, i16 %arg.coerce7, [3 x i8] %arg.coerce8, i64 %arg.coerce9, i32 %arg.coerce10, i8 %arg.coerce11, i32 %arg.coerce12, i16 %arg.coerce13, i8 %arg.coerce14)
	void func_large_struct_padding_arg_direct(large_struct_padding arg) { }	void func_large_struct_padding_arg_direct(large_struct_padding arg) { }

	// CHECK: define{{.*}} void @func_large_struct_padding_arg_store(ptr addrspace(1) nocapture noundef writeonly %out, ptr addrspace(5) nocapture noundef readonly byval(%struct.large_struct_padding) align 8 %arg)	// CHECK: define{{.*}} void @func_large_struct_padding_arg_store(ptr addrspace(1) nocapture noundef writeonly %out, ptr addrspace(5) nocapture readonly byref(%struct.large_struct_padding) align 8 %{{.*}})
	void func_large_struct_padding_arg_store(global large_struct_padding* out, large_struct_padding arg) {	void func_large_struct_padding_arg_store(global large_struct_padding* out, large_struct_padding arg) {
	*out = arg;	*out = arg;
	}	}

	// CHECK: define{{.*}} void @v3i32_reg_count(<3 x i32> noundef %arg1, <3 x i32> noundef %arg2, <3 x i32> noundef %arg3, <3 x i32> noundef %arg4, i32 %arg5.coerce0, float %arg5.coerce1, i32 %arg5.coerce2)	// CHECK: define{{.*}} void @v3i32_reg_count(<3 x i32> noundef %arg1, <3 x i32> noundef %arg2, <3 x i32> noundef %arg3, <3 x i32> noundef %arg4, i32 %arg5.coerce0, float %arg5.coerce1, i32 %arg5.coerce2)
	void v3i32_reg_count(int3 arg1, int3 arg2, int3 arg3, int3 arg4, struct_arg_t arg5) { }	void v3i32_reg_count(int3 arg1, int3 arg2, int3 arg3, int3 arg4, struct_arg_t arg5) { }

	// Function signature from blender, nothing should be passed byval. The v3i32	// Function signature from blender, nothing should be passed byval. The v3i32
	// should not count as 4 passed registers.	// should not count as 4 passed registers.
	// CHECK: define{{.*}} void @v3i32_pair_reg_count(ptr addrspace(5) nocapture noundef %arg0, <3 x i32> %arg1.coerce0, <3 x i32> %arg1.coerce1, <3 x i32> noundef %arg2, <3 x i32> %arg3.coerce0, <3 x i32> %arg3.coerce1, <3 x i32> noundef %arg4, float noundef %arg5)	// CHECK: define{{.*}} void @v3i32_pair_reg_count(ptr addrspace(5) nocapture noundef %arg0, <3 x i32> %arg1.coerce0, <3 x i32> %arg1.coerce1, <3 x i32> noundef %arg2, <3 x i32> %arg3.coerce0, <3 x i32> %arg3.coerce1, <3 x i32> noundef %arg4, float noundef %arg5)
	void v3i32_pair_reg_count(int3_pair *arg0, int3_pair arg1, int3 arg2, int3_pair arg3, int3 arg4, float arg5) { }	void v3i32_pair_reg_count(int3_pair *arg0, int3_pair arg1, int3 arg2, int3_pair arg3, int3 arg4, float arg5) { }

	// Each short4 should fit pack into 2 registers.	// Each short4 should fit pack into 2 registers.
	// CHECK: define{{.*}} void @v4i16_reg_count(<4 x i16> noundef %arg0, <4 x i16> noundef %arg1, <4 x i16> noundef %arg2, <4 x i16> noundef %arg3, <4 x i16> noundef %arg4, <4 x i16> noundef %arg5, i32 %arg6.coerce0, i32 %arg6.coerce1, i32 %arg6.coerce2, i32 %arg6.coerce3)	// CHECK: define{{.*}} void @v4i16_reg_count(<4 x i16> noundef %arg0, <4 x i16> noundef %arg1, <4 x i16> noundef %arg2, <4 x i16> noundef %arg3, <4 x i16> noundef %arg4, <4 x i16> noundef %arg5, i32 %arg6.coerce0, i32 %arg6.coerce1, i32 %arg6.coerce2, i32 %arg6.coerce3)
	void v4i16_reg_count(short4 arg0, short4 arg1, short4 arg2, short4 arg3,	void v4i16_reg_count(short4 arg0, short4 arg1, short4 arg2, short4 arg3,
	short4 arg4, short4 arg5, struct_4regs arg6) { }	short4 arg4, short4 arg5, struct_4regs arg6) { }

	// CHECK: define{{.*}} void @v4i16_pair_reg_count_over(<4 x i16> noundef %arg0, <4 x i16> noundef %arg1, <4 x i16> noundef %arg2, <4 x i16> noundef %arg3, <4 x i16> noundef %arg4, <4 x i16> noundef %arg5, <4 x i16> noundef %arg6, ptr addrspace(5) nocapture noundef byval(%struct.struct_4regs) align 4 %arg7)	// CHECK: define{{.*}} void @v4i16_pair_reg_count_over(<4 x i16> noundef %arg0, <4 x i16> noundef %arg1, <4 x i16> noundef %arg2, <4 x i16> noundef %arg3, <4 x i16> noundef %arg4, <4 x i16> noundef %arg5, <4 x i16> noundef %arg6, ptr addrspace(5) nocapture byref(%struct.struct_4regs) align 4 %{{.*}})
	void v4i16_pair_reg_count_over(short4 arg0, short4 arg1, short4 arg2, short4 arg3,	void v4i16_pair_reg_count_over(short4 arg0, short4 arg1, short4 arg2, short4 arg3,
	short4 arg4, short4 arg5, short4 arg6, struct_4regs arg7) { }	short4 arg4, short4 arg5, short4 arg6, struct_4regs arg7) { }

	// CHECK: define{{.*}} void @v3i16_reg_count(<3 x i16> noundef %arg0, <3 x i16> noundef %arg1, <3 x i16> noundef %arg2, <3 x i16> noundef %arg3, <3 x i16> noundef %arg4, <3 x i16> noundef %arg5, i32 %arg6.coerce0, i32 %arg6.coerce1, i32 %arg6.coerce2, i32 %arg6.coerce3)	// CHECK: define{{.*}} void @v3i16_reg_count(<3 x i16> noundef %arg0, <3 x i16> noundef %arg1, <3 x i16> noundef %arg2, <3 x i16> noundef %arg3, <3 x i16> noundef %arg4, <3 x i16> noundef %arg5, i32 %arg6.coerce0, i32 %arg6.coerce1, i32 %arg6.coerce2, i32 %arg6.coerce3)
	void v3i16_reg_count(short3 arg0, short3 arg1, short3 arg2, short3 arg3,	void v3i16_reg_count(short3 arg0, short3 arg1, short3 arg2, short3 arg3,
	short3 arg4, short3 arg5, struct_4regs arg6) { }	short3 arg4, short3 arg5, struct_4regs arg6) { }

	// CHECK: define{{.*}} void @v3i16_reg_count_over(<3 x i16> noundef %arg0, <3 x i16> noundef %arg1, <3 x i16> noundef %arg2, <3 x i16> noundef %arg3, <3 x i16> noundef %arg4, <3 x i16> noundef %arg5, <3 x i16> noundef %arg6, ptr addrspace(5) nocapture noundef byval(%struct.struct_4regs) align 4 %arg7)	// CHECK: define{{.*}} void @v3i16_reg_count_over(<3 x i16> noundef %arg0, <3 x i16> noundef %arg1, <3 x i16> noundef %arg2, <3 x i16> noundef %arg3, <3 x i16> noundef %arg4, <3 x i16> noundef %arg5, <3 x i16> noundef %arg6, ptr addrspace(5) nocapture byref(%struct.struct_4regs) align 4 %{{.*}})
	void v3i16_reg_count_over(short3 arg0, short3 arg1, short3 arg2, short3 arg3,	void v3i16_reg_count_over(short3 arg0, short3 arg1, short3 arg2, short3 arg3,
	short3 arg4, short3 arg5, short3 arg6, struct_4regs arg7) { }	short3 arg4, short3 arg5, short3 arg6, struct_4regs arg7) { }

	// CHECK: define{{.*}} void @v2i16_reg_count(<2 x i16> noundef %arg0, <2 x i16> noundef %arg1, <2 x i16> noundef %arg2, <2 x i16> noundef %arg3, <2 x i16> noundef %arg4, <2 x i16> noundef %arg5, <2 x i16> noundef %arg6, <2 x i16> noundef %arg7, <2 x i16> noundef %arg8, <2 x i16> noundef %arg9, <2 x i16> noundef %arg10, <2 x i16> noundef %arg11, i32 %arg13.coerce0, i32 %arg13.coerce1, i32 %arg13.coerce2, i32 %arg13.coerce3)	// CHECK: define{{.*}} void @v2i16_reg_count(<2 x i16> noundef %arg0, <2 x i16> noundef %arg1, <2 x i16> noundef %arg2, <2 x i16> noundef %arg3, <2 x i16> noundef %arg4, <2 x i16> noundef %arg5, <2 x i16> noundef %arg6, <2 x i16> noundef %arg7, <2 x i16> noundef %arg8, <2 x i16> noundef %arg9, <2 x i16> noundef %arg10, <2 x i16> noundef %arg11, i32 %arg13.coerce0, i32 %arg13.coerce1, i32 %arg13.coerce2, i32 %arg13.coerce3)
	void v2i16_reg_count(short2 arg0, short2 arg1, short2 arg2, short2 arg3,	void v2i16_reg_count(short2 arg0, short2 arg1, short2 arg2, short2 arg3,
	short2 arg4, short2 arg5, short2 arg6, short2 arg7,	short2 arg4, short2 arg5, short2 arg6, short2 arg7,
	short2 arg8, short2 arg9, short2 arg10, short2 arg11,	short2 arg8, short2 arg9, short2 arg10, short2 arg11,
	struct_4regs arg13) { }	struct_4regs arg13) { }

	// CHECK: define{{.*}} void @v2i16_reg_count_over(<2 x i16> noundef %arg0, <2 x i16> noundef %arg1, <2 x i16> noundef %arg2, <2 x i16> noundef %arg3, <2 x i16> noundef %arg4, <2 x i16> noundef %arg5, <2 x i16> noundef %arg6, <2 x i16> noundef %arg7, <2 x i16> noundef %arg8, <2 x i16> noundef %arg9, <2 x i16> noundef %arg10, <2 x i16> noundef %arg11, <2 x i16> noundef %arg12, ptr addrspace(5) nocapture noundef byval(%struct.struct_4regs) align 4 %arg13)	// CHECK: define{{.*}} void @v2i16_reg_count_over(<2 x i16> noundef %arg0, <2 x i16> noundef %arg1, <2 x i16> noundef %arg2, <2 x i16> noundef %arg3, <2 x i16> noundef %arg4, <2 x i16> noundef %arg5, <2 x i16> noundef %arg6, <2 x i16> noundef %arg7, <2 x i16> noundef %arg8, <2 x i16> noundef %arg9, <2 x i16> noundef %arg10, <2 x i16> noundef %arg11, <2 x i16> noundef %arg12, ptr addrspace(5) nocapture byref(%struct.struct_4regs) align 4 %{{.*}})
	void v2i16_reg_count_over(short2 arg0, short2 arg1, short2 arg2, short2 arg3,	void v2i16_reg_count_over(short2 arg0, short2 arg1, short2 arg2, short2 arg3,
	short2 arg4, short2 arg5, short2 arg6, short2 arg7,	short2 arg4, short2 arg5, short2 arg6, short2 arg7,
	short2 arg8, short2 arg9, short2 arg10, short2 arg11,	short2 arg8, short2 arg9, short2 arg10, short2 arg11,
	short2 arg12, struct_4regs arg13) { }	short2 arg12, struct_4regs arg13) { }

	// CHECK: define{{.*}} void @v2i8_reg_count(<2 x i8> noundef %arg0, <2 x i8> noundef %arg1, <2 x i8> noundef %arg2, <2 x i8> noundef %arg3, <2 x i8> noundef %arg4, <2 x i8> noundef %arg5, i32 %arg6.coerce0, i32 %arg6.coerce1, i32 %arg6.coerce2, i32 %arg6.coerce3)	// CHECK: define{{.*}} void @v2i8_reg_count(<2 x i8> noundef %arg0, <2 x i8> noundef %arg1, <2 x i8> noundef %arg2, <2 x i8> noundef %arg3, <2 x i8> noundef %arg4, <2 x i8> noundef %arg5, i32 %arg6.coerce0, i32 %arg6.coerce1, i32 %arg6.coerce2, i32 %arg6.coerce3)
	void v2i8_reg_count(char2 arg0, char2 arg1, char2 arg2, char2 arg3,	void v2i8_reg_count(char2 arg0, char2 arg1, char2 arg2, char2 arg3,
	char2 arg4, char2 arg5, struct_4regs arg6) { }	char2 arg4, char2 arg5, struct_4regs arg6) { }

	// CHECK: define{{.*}} void @v2i8_reg_count_over(<2 x i8> noundef %arg0, <2 x i8> noundef %arg1, <2 x i8> noundef %arg2, <2 x i8> noundef %arg3, <2 x i8> noundef %arg4, <2 x i8> noundef %arg5, i32 noundef %arg6, ptr addrspace(5) nocapture noundef byval(%struct.struct_4regs) align 4 %arg7)	// CHECK: define{{.*}} void @v2i8_reg_count_over(<2 x i8> noundef %arg0, <2 x i8> noundef %arg1, <2 x i8> noundef %arg2, <2 x i8> noundef %arg3, <2 x i8> noundef %arg4, <2 x i8> noundef %arg5, i32 noundef %arg6, ptr addrspace(5) nocapture byref(%struct.struct_4regs) align 4 %{{.*}})
	void v2i8_reg_count_over(char2 arg0, char2 arg1, char2 arg2, char2 arg3,	void v2i8_reg_count_over(char2 arg0, char2 arg1, char2 arg2, char2 arg3,
	char2 arg4, char2 arg5, int arg6, struct_4regs arg7) { }	char2 arg4, char2 arg5, int arg6, struct_4regs arg7) { }

	// CHECK: define{{.*}} void @num_regs_left_64bit_aggregate(<4 x i32> noundef %arg0, <4 x i32> noundef %arg1, <4 x i32> noundef %arg2, <3 x i32> noundef %arg3, [2 x i32] %arg4.coerce, i32 noundef %arg5)	// CHECK: define{{.*}} void @num_regs_left_64bit_aggregate(<4 x i32> noundef %arg0, <4 x i32> noundef %arg1, <4 x i32> noundef %arg2, <3 x i32> noundef %arg3, [2 x i32] %arg4.coerce, i32 noundef %arg5)
	void num_regs_left_64bit_aggregate(int4 arg0, int4 arg1, int4 arg2, int3 arg3, struct_char_x8 arg4, int arg5) { }	void num_regs_left_64bit_aggregate(int4 arg0, int4 arg1, int4 arg2, int3 arg3, struct_char_x8 arg4, int arg5) { }

clang/test/CodeGenOpenCL/byval.cl

	// RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn %s | FileCheck %s			// RUN: %clang_cc1 -emit-llvm -o - -triple i686-pc-darwin %s | FileCheck -check-prefix=X86 %s
				arsenm: The test name suggests we should test with a different target that does use byval here
				// RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn %s | FileCheck -check-prefix=AMDGCN %s
	struct A {			struct A {
	int x[100];			int x[100];
	};			};

	int f(struct A a);			int f(struct A a);

	int g() {			int g() {
	struct A a;			struct A a;
	// CHECK: call i32 @f(ptr addrspace(5) noundef byval{{.*}}%a)			// X86: call i32 @f(ptr noundef nonnull byval(%struct.A) align 4 %a)
				// AMDGCN: call i32 @f(ptr addrspace(5) byref{{.*}}%a)
	return f(a);			return f(a);
	}			}

	// CHECK: declare i32 @f(ptr addrspace(5) noundef byval{{.*}})			// X86: declare i32 @f(ptr noundef byval(%struct.A) align 4)
				// AMDGCN: declare i32 @f(ptr addrspace(5) byref{{.*}})
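
A minimal sketch, in plain C rather than LLVM IR, of the distinction the X86 and AMDGCN check lines above are testing. It only models who makes the copy of the struct and is not taken from the patch: the names `f_byref_model`, `f_byval_model`, `g_byval_model`, and `demo` are hypothetical, and the real lowering is done by clang and the backend, not by hand-written code like this.

```c
#include <assert.h>
#include <string.h>

struct A {
  int x[100];
};

/* Hypothetical model of a byref-style callee: it receives the address of the
   caller's object and, because it modifies the argument, copies it first. */
int f_byref_model(const struct A *a) {
  struct A local;
  memcpy(&local, a, sizeof local); /* copy made inside the callee */
  local.x[0] += 1;                 /* the mutation stays in the copy */
  return local.x[0];
}

/* Hypothetical model of a byval-style callee: the copy already exists when
   the callee starts, so it can modify the argument memory directly. */
int f_byval_model(struct A *copy) {
  copy->x[0] += 1;
  return copy->x[0];
}

/* Hypothetical model of the byval-style call site: the copy is made as part
   of the call, whether or not the callee ends up modifying it. */
int g_byval_model(const struct A *a) {
  struct A tmp;
  memcpy(&tmp, a, sizeof tmp);
  return f_byval_model(&tmp);
}

/* In both models the caller's object is left unchanged. */
void demo(void) {
  struct A a;
  memset(&a, 0, sizeof a);
  a.x[0] = 41;
  assert(f_byref_model(&a) == 42);
  assert(a.x[0] == 41);
  assert(g_byval_model(&a) == 42);
  assert(a.x[0] == 41);
}
```

In the byref model a callee that only reads the struct could skip the `memcpy` entirely, which is the saving the summary of this revision is after.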

llvm/docs/AMDGPUUsage.rst


	If the function calls another function, it will place any stack allocated			If the function calls another function, it will place any stack allocated
	arguments after the last local allocation and adjust SGPR32 to the address			arguments after the last local allocation and adjust SGPR32 to the address
	after the last local allocation.			after the last local allocation.

	9. All other registers are unspecified.			9. All other registers are unspecified.
	10. Any necessary ``s_waitcnt`` has been performed to ensure memory is available			10. Any necessary ``s_waitcnt`` has been performed to ensure memory is available
	to the function.			to the function.
				11: Use pass-by-reference (byref) instead of pass-by-value (byval) for struct
				arguments in function ABI. Callee is responsible to allocate memory and
				arsenm: Specify C ABI
				cfang: Do not get what to do to "Specify C ABI"? Can you suggest explicitly? Thanks.
				arsenm: Add the letter C
				cfang: ... in function C ABI? Or should we remove "function"?
				arsenm: Can just say C, doesn't really matter if you state function or not. function is implied
				arsenm: s/to allocate memory/for allocating stack memory/
				make a copy of the struct. Note that the backend still supports byval for
				arsenm: copying the value of the struct if modified
				struct arguments.

	On exit from a function:			On exit from a function:

	1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as			1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as
	described below. Any registers used are considered clobbered registers.			described below. Any registers used are considered clobbered registers.
	2. The following registers are preserved and have the same value as on entry:			2. The following registers are preserved and have the same value as on entry:

	* FLAT_SCRATCH			* FLAT_SCRATCH