- In the previous change, https://reviews.llvm.org/D69826, generic pointers in struct/array types are also replaced with global pointers. But, as no additional addrspacecast is inserted, they are promoted with a ptrtoint/inttoptr pair in SROA/GVN. That breaks address space inference as well as other optimizations. For such cases, we need to recursively dive into these aggregate types and insert an addrspacecast where necessary.
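As a sketch of the problem, consider the following hypothetical, simplified IR (opaque-pointer syntax; the names `%coerced` and `@kern` are illustrative, not from the patch). The kernel argument is coerced so its pointer member is `addrspace(1)`, but since the frontend inserts no addrspacecast, SROA/GVN can only bridge the type mismatch with an integer round-trip:

```llvm
; Hypothetical, simplified sketch. A struct containing a generic pointer is
; coerced to one containing a global (addrspace(1)) pointer at the kernel
; boundary.
%coerced = type { ptr addrspace(1) }

define amdgpu_kernel void @kern(%coerced %arg) {
  %p1 = extractvalue %coerced %arg, 0
  ; Without an explicit addrspacecast from the frontend, promotion
  ; reinterprets the pointer bits through ptrtoint/inttoptr, which hides
  ; the address space from InferAddressSpaces:
  %i  = ptrtoint ptr addrspace(1) %p1 to i64
  %p0 = inttoptr i64 %i to ptr
  %v  = load i32, ptr %p0   ; stuck with the slower generic access
  ret void
}
```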
- Repository: rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/CGCall.cpp

Line 1339: Is there a limit on array size? We may end up here with a potentially unbounded number of addrspacecasts. Perhaps we need a loop which may later be unrolled, if feasible.

clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu

Line 17: Nit: relying on the parameter name `%x.coerce` may be rather fragile. Considering that we don't care about the name itself, it may be better to just trim the match after `%x` or even after `%`.

Lines 62–63: Nit: you could use `CHECK-COUNT-2`: https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-count-directive
addrspacecast might be a real conversion. I feel like this is really going well beyond what argument coercion should be expected to do, and we need to step back and re-evaluate how we're doing this.
addrspacecast *must* be a no-op in terms of argument coercion. Since we change the parameter type but cannot control how an argument is prepared, coercion should only change how we interpret arguments, not how we encode them. If the target chooses to add an addrspacecast, it must ensure that cast is a no-op. On AMDGPU, global pointers have the same encoding as their corresponding generic pointers.
So what does this mean exactly? If the ABI lowering uses argument coercion in a way that changes address spaces, it must ensure that the representations are the same? So it's always *legal* to just do a memcpy here, and we're just trying really hard not to.
If addrspacecast is used in argument coercion, the target must ensure that the addrspacecast does NOT change the representation. Yes, it's always safe to memcpy or load/store directly, but, without target-specific knowledge, SROA/GVN won't create a no-op addrspacecast to help the address space inference pass. SROA/GVN only generates a ptrtoint/inttoptr pair to ensure the representation stays the same. Unfortunately, address space inference doesn't understand that without target-specific knowledge either. As a result, the address space information is not propagated into the final IR. We need a trigger to enable that. That's why we add an addrspacecast for a simple-type argument in lines 1314–1321. In this patch, we need to traverse the struct and add the proper addrspacecasts.
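The intended lowering can be sketched as follows (hypothetical IR; the kernel name and types are illustrative). The frontend unpacks the coerced member and inserts the addrspacecast itself, which is a no-op on AMDGPU:

```llvm
; Hypothetical sketch: the frontend emits an explicit addrspacecast when
; unpacking the coerced aggregate, instead of leaving SROA/GVN to emit a
; ptrtoint/inttoptr pair.
define amdgpu_kernel void @kern({ ptr addrspace(1) } %arg) {
  %p1 = extractvalue { ptr addrspace(1) } %arg, 0
  %p0 = addrspacecast ptr addrspace(1) %p1 to ptr
  ; InferAddressSpaces understands the cast and can rewrite users of %p0
  ; back to addrspace(1) accesses:
  %v  = load i32, ptr %p0
  ret void
}
```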
Okay. Can you explain why we need to coerce in the first place, though? Especially if the representation is the same, why is your target-lowering requiring parameters to be coerced to involve pointers in a different address space?
It's not a requirement, but an optimization. The pointer representation is the same, but there's a penalty to using the generic pointer. From the language context here, we know the flat pointer can never alias the non-global address spaces. InferAddressSpaces won't be able to infer this (maybe it could, but it might break a theoretical language that allows passing some valid 32-bit address in 64-bit flat pointers).
I think the real problem here is that the IR is missing a no-op cast between pointers in different address spaces. With the current promotion, GVN sees the bits of the addrspace(0) ptr get reinterpreted as addrspace(1), and doesn't have a better option than a ptrtoint/inttoptr pair. When addrspacecast was added, pointer bitcasts between different address spaces were disallowed (I think mostly for reasons that no longer apply). If we reallowed pointer bitcast between equivalent-sized address spaces as a known no-op cast, we wouldn't need to change the argument promotion in the frontend.
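Under that proposal, the reinterpretation would look something like this (NOT valid IR today; shown only to illustrate the idea):

```llvm
; NOT currently valid IR -- the verifier rejects pointer bitcasts that
; change the address space. For address spaces with identical pointer
; size and encoding, this would be a known no-op cast that passes like
; GVN could emit directly, instead of a ptrtoint/inttoptr pair:
%p0 = bitcast ptr addrspace(1) %p1 to ptr
```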
Is that because it's a kernel argument? I don't understand how the coercion can work in general for structs that just contain pointers.
> I think the real problem here is the IR is missing a no-op cast between pointers with different address spaces. [...] If we reallowed pointer bitcast between equivalent sized address spaces as known no-op cast, we wouldn't need to change the argument promotion in the frontend.
That seems like it would be a much cleaner way of handling this, yeah. Ideally, GVN would be able to find the target-specific information necessary to know that it can introduce an addrspacecast between two address spaces.
Yes, there's no valid way to pass a non-global address space pointer through a flat pointer into a kernel. This doesn't apply in other contexts.
Okay. And you can't just treat kernel parameters specially in your pass that un-promotes generic pointers back to global pointers? Or is that pass not quite so precise as that?