Diff 353868

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

for (size_t I = 0; I < LocalVars.size(); I++) {
});		});
} else {		} else {
GV->replaceAllUsesWith(GEP);		GV->replaceAllUsesWith(GEP);
}		}
if (GV->use_empty()) {		if (GV->use_empty()) {
UsedList.erase(GV);		UsedList.erase(GV);
GV->eraseFromParent();		GV->eraseFromParent();
}		}

		uint64_t Off = DL.getStructLayout(LDSTy)->getElementOffset(I);
		Align A = commonAlignment(StructAlign, Off);
		hsmhsm: I did not understand the logic behind calling commonAlignment() here. I thought the alignment of GV and GEP is the same and we just need to propagate the alignment of GV, because we have already properly updated the alignment of GV?
		rampitec (author): > Did not understand the logic behind calling commonAlignment() here.
		Assume a struct: { i64 x 16, i32 }. The second field has align 4, and that is the alignment of GV and GEP. But given the structure layout we can tell that the actual effective alignment is 8. That is what the commonAlignment() call is about.
		refineUsesAlignment(GEP, A, DL);
}		}

// Mark kernels with asm that reads the address of the allocated structure		// Mark kernels with asm that reads the address of the allocated structure
// This is not necessary for lowering. This lets other passes, specifically		// This is not necessary for lowering. This lets other passes, specifically
// PromoteAlloca, accurately calculate how much LDS will be used by the		// PromoteAlloca, accurately calculate how much LDS will be used by the
// kernel after lowering.		// kernel after lowering.
if (!F) {		if (!F) {
IRBuilder<> Builder(Ctx);		IRBuilder<> Builder(Ctx);
SmallPtrSet<Function *, 32> Kernels;		SmallPtrSet<Function *, 32> Kernels;
for (auto &I : M.functions()) {		for (auto &I : M.functions()) {
Function *Func = &I;		Function *Func = &I;
if (AMDGPU::isKernelCC(Func) && !Kernels.contains(Func)) {		if (AMDGPU::isKernelCC(Func) && !Kernels.contains(Func)) {
markUsedByKernel(Builder, Func, SGV);		markUsedByKernel(Builder, Func, SGV);
Kernels.insert(Func);		Kernels.insert(Func);
}		}
}		}
}		}
return true;		return true;
}		}
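The alignment arithmetic discussed in the inline comments above can be sketched in isolation. This is a minimal sketch, not the pass's code: `commonAlignmentSketch` is a hypothetical stand-in that mirrors, as far as I understand it, what `commonAlignment(Align, uint64_t)` computes (the largest power of two dividing both the base alignment and the byte offset). The numbers in the comments assume the struct from the reply, with the `i32` field at byte offset 128 and an 8-byte aligned struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::commonAlignment(Align, uint64_t).
// BaseAlign must be a power of two; Offset is a byte offset from the base.
// Returns the largest power of two that divides both values, which is the
// alignment that can be assumed for (base + Offset).
inline uint64_t commonAlignmentSketch(uint64_t BaseAlign, uint64_t Offset) {
  assert(BaseAlign != 0 && (BaseAlign & (BaseAlign - 1)) == 0 &&
         "alignment must be a power of two");
  uint64_t Bits = BaseAlign | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}

// Assumed layout: struct aligned to 8, i32 field at offset 16 * 8 = 128.
// The field's own alignment is 4, but commonAlignmentSketch(8, 128) is 8,
// so accesses through that field may be marked align 8.
```

With an offset of 0 the result is simply the base alignment, which is why the first field of a struct inherits the struct's alignment.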

		void refineUsesAlignment(Value *Ptr, Align A, const DataLayout &DL,
		arsenm: This looks like you are reinventing getOrEnforceKnownAlignment.
		arsenm: I'm not sure why you need to do this. SelectionDAG already tries to increase the alignment after this point.
		rampitec (author): > I'm not sure why you need to do this. SelectionDAG already tries to increase the alignment after this point.
		It actually does not, at least not in this scenario. You may notice there are actual codegen changes. I was checking -print-after-all, and the alignment stays the same all the way past selection. Besides, there is also GlobalISel. I assume the earlier we get better alignment, the better. I have checked: SelectionDAG only calls refineAlignment when it folds a node or for a mem intrinsic, which does not happen with the tests I am using. I hit zero breakpoints in refineAlignment.
		rampitec (author): > This looks like you are reinventing getOrEnforceKnownAlignment.
		It is not the same. I do not need to enforce alignment on a Value itself, but to propagate it down to loads and stores. I did not find such a helper.
		unsigned MaxDepth = 5) {
		if (!MaxDepth)
		return;

		foad: Could early-out here if Align==1 to save a lot of unnecessary work.
		for (User *U : Ptr->users()) {
		if (auto *LI = dyn_cast<LoadInst>(U)) {
		LI->setAlignment(std::max(A, LI->getAlign()));
		continue;
		}
		if (auto *SI = dyn_cast<StoreInst>(U)) {
		foad: For StoreInst, AtomicRMWInst, AtomicCmpXchgInst and GetElementPtrInst you need to check that the use is actually the "address" operand of the instruction.
		foad: Oh, I see you are already checking it for GetElementPtrInst.
		rampitec (author): None of the AtomicRMW operators can use pointers. AtomicCmpXchgInst can. Actually, I cannot come up with a test for GEP either. I think it is only possible if we start to process ConstantExpr here, which is not handled right now. Anyway, D104796 addresses this and the Align == 1 case.
		SI->setAlignment(std::max(A, SI->getAlign()));
		continue;
		}
		if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
		AI->setAlignment(std::max(A, AI->getAlign()));
		continue;
		}
		if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
		AI->setAlignment(std::max(A, AI->getAlign()));
		continue;
		}
		if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
		unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
		APInt Off(BitWidth, 0);
		if (GEP->getPointerOperand() == Ptr &&
		GEP->accumulateConstantOffset(DL, Off)) {
		Align GA = commonAlignment(A, Off.getLimitedValue());
		refineUsesAlignment(GEP, GA, DL, MaxDepth - 1);
		hsmhsm: In the case of a GEP instruction, I thought we should be exploring the uses of the GEP?
		rampitec (author): Yes, this is a recursive call to refineUsesAlignment() with the GEP as Ptr.
		}
		continue;
		}
		if (auto *I = dyn_cast<Instruction>(U)) {
		if (I->getOpcode() == Instruction::BitCast ||
		I->getOpcode() == Instruction::AddrSpaceCast)
		refineUsesAlignment(I, A, DL, MaxDepth - 1);
		hsmhsm: Same here: in the case of a bitcast instruction, I thought we should be exploring the uses of the bitcast?
		rampitec (author): Same here, the recursive call explores the uses of a cast.
		}
		}
		}
};		};

} // namespace		} // namespace
char AMDGPULowerModuleLDS::ID = 0;		char AMDGPULowerModuleLDS::ID = 0;

char &llvm::AMDGPULowerModuleLDSID = AMDGPULowerModuleLDS::ID;		char &llvm::AMDGPULowerModuleLDSID = AMDGPULowerModuleLDS::ID;

INITIALIZE_PASS(AMDGPULowerModuleLDS, DEBUG_TYPE,		INITIALIZE_PASS(AMDGPULowerModuleLDS, DEBUG_TYPE,
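The propagation performed by refineUsesAlignment() in this file, together with the two review suggestions (early-out when the alignment is 1, and checking that the pointer is the address operand), can be modelled on a toy user graph. This is only a sketch under stated assumptions: `Node`, `Kind`, `PtrIsAddress`, `commonAlign`, `refineUsesAlignmentSketch` and `exampleLoadAlign` are hypothetical names for illustration, not LLVM APIs, and a single per-node flag is a simplification of a real operand check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for IR users; none of these names come from LLVM.
enum class Kind { Load, Store, ConstGEP, Cast, Other };

struct Node {
  Kind K = Kind::Other;
  uint64_t Align = 1;       // current alignment (Load/Store only)
  uint64_t Offset = 0;      // constant byte offset (ConstGEP only)
  bool PtrIsAddress = true; // false if the pointer is, e.g., the stored value
  std::vector<Node *> Users;
};

// Largest power of two dividing both A (a power of two) and Off.
inline uint64_t commonAlign(uint64_t A, uint64_t Off) {
  assert(A != 0 && (A & (A - 1)) == 0);
  uint64_t Bits = A | Off;
  return Bits & (~Bits + 1);
}

// Raise the alignment of memory accesses reachable from Ptr, looking
// through constant-offset GEPs and casts up to MaxDepth levels.
inline void refineUsesAlignmentSketch(Node &Ptr, uint64_t A,
                                      unsigned MaxDepth = 5) {
  if (!MaxDepth || A == 1) // align 1 can never improve anything
    return;
  for (Node *U : Ptr.Users) {
    switch (U->K) {
    case Kind::Load:
    case Kind::Store:
      if (U->PtrIsAddress) // only the address operand carries alignment
        U->Align = std::max(U->Align, A);
      break;
    case Kind::ConstGEP:
      if (U->PtrIsAddress)
        refineUsesAlignmentSketch(*U, commonAlign(A, U->Offset), MaxDepth - 1);
      break;
    case Kind::Cast:
      refineUsesAlignmentSketch(*U, A, MaxDepth - 1);
      break;
    case Kind::Other:
      break;
    }
  }
}

// Builds root -> GEP(+8 bytes) -> cast -> load(align 1), refines from the
// root with RootAlign, and returns the load's resulting alignment.
inline uint64_t exampleLoadAlign(uint64_t RootAlign) {
  Node Load;
  Load.K = Kind::Load;
  Node Cast;
  Cast.K = Kind::Cast;
  Cast.Users.push_back(&Load);
  Node GEP;
  GEP.K = Kind::ConstGEP;
  GEP.Offset = 8;
  GEP.Users.push_back(&Cast);
  Node Root;
  Root.Users.push_back(&GEP);
  refineUsesAlignmentSketch(Root, RootAlign);
  return Load.Align;
}
```

In this model a 16-byte aligned root with a load 8 bytes in ends up with an 8-byte aligned load, which is the same kind of change the updated CHECK lines in the tests below record.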

llvm/test/CodeGen/AMDGPU/ds_read2.ll


	@bar = addrspace(3) global [4 x i64] undef, align 4			@bar = addrspace(3) global [4 x i64] undef, align 4

	define amdgpu_kernel void @load_misaligned64_constant_offsets(i64 addrspace(1)* %out) {			define amdgpu_kernel void @load_misaligned64_constant_offsets(i64 addrspace(1)* %out) {
	; CI-LABEL: load_misaligned64_constant_offsets:			; CI-LABEL: load_misaligned64_constant_offsets:
	; CI: ; %bb.0:			; CI: ; %bb.0:
	; CI-NEXT: v_mov_b32_e32 v0, 0			; CI-NEXT: v_mov_b32_e32 v0, 0
	; CI-NEXT: s_mov_b32 m0, -1			; CI-NEXT: s_mov_b32 m0, -1
	; CI-NEXT: ds_read2_b64 v[0:3], v0 offset1:1			; CI-NEXT: ds_read_b128 v[0:3], v0
	; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; CI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; CI-NEXT: s_mov_b32 s3, 0xf000			; CI-NEXT: s_mov_b32 s3, 0xf000
	; CI-NEXT: s_mov_b32 s2, -1			; CI-NEXT: s_mov_b32 s2, -1
	; CI-NEXT: s_waitcnt lgkmcnt(0)			; CI-NEXT: s_waitcnt lgkmcnt(0)
	; CI-NEXT: v_add_i32_e32 v0, vcc, v0, v2			; CI-NEXT: v_add_i32_e32 v0, vcc, v0, v2
	; CI-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc			; CI-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
	; CI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; CI-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	;			;
	; GFX9-ALIGNED-LABEL: load_misaligned64_constant_offsets:			; GFX9-LABEL: load_misaligned64_constant_offsets:
	; GFX9-ALIGNED: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-ALIGNED-NEXT: v_mov_b32_e32 v4, 0			; GFX9-NEXT: v_mov_b32_e32 v4, 0
	; GFX9-ALIGNED-NEXT: ds_read2_b64 v[0:3], v4 offset1:1			; GFX9-NEXT: ds_read_b128 v[0:3], v4
	; GFX9-ALIGNED-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-ALIGNED-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-ALIGNED-NEXT: v_add_co_u32_e32 v0, vcc, v0, v2			; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, v0, v2
	; GFX9-ALIGNED-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v3, vcc			; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v3, vcc
	; GFX9-ALIGNED-NEXT: global_store_dwordx2 v4, v[0:1], s[0:1]			; GFX9-NEXT: global_store_dwordx2 v4, v[0:1], s[0:1]
	; GFX9-ALIGNED-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;
	; GFX9-UNALIGNED-LABEL: load_misaligned64_constant_offsets:
	; GFX9-UNALIGNED: ; %bb.0:
	; GFX9-UNALIGNED-NEXT: v_mov_b32_e32 v4, 0
	; GFX9-UNALIGNED-NEXT: ds_read_b128 v[0:3], v4
	; GFX9-UNALIGNED-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-UNALIGNED-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-UNALIGNED-NEXT: v_add_co_u32_e32 v0, vcc, v0, v2
	; GFX9-UNALIGNED-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v3, vcc
	; GFX9-UNALIGNED-NEXT: global_store_dwordx2 v4, v[0:1], s[0:1]
	; GFX9-UNALIGNED-NEXT: s_endpgm
	%val0 = load i64, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 0), align 4			%val0 = load i64, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 0), align 4
	%val1 = load i64, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 1), align 4			%val1 = load i64, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 1), align 4
	%sum = add i64 %val0, %val1			%sum = add i64 %val0, %val1
	store i64 %sum, i64 addrspace(1)* %out, align 8			store i64 %sum, i64 addrspace(1)* %out, align 8
	ret void			ret void
	}			}

	@bar.large = addrspace(3) global [4096 x i64] undef, align 4			@bar.large = addrspace(3) global [4096 x i64] undef, align 4

llvm/test/CodeGen/AMDGPU/ds_write2.ll

; GFX9-NEXT: s_endpgm
ret void		ret void
}		}

@bar = addrspace(3) global [4 x i64] undef, align 4		@bar = addrspace(3) global [4 x i64] undef, align 4

define amdgpu_kernel void @store_misaligned64_constant_offsets() {		define amdgpu_kernel void @store_misaligned64_constant_offsets() {
; CI-LABEL: store_misaligned64_constant_offsets:		; CI-LABEL: store_misaligned64_constant_offsets:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_movk_i32 s0, 0x7b		; CI-NEXT: v_mov_b32_e32 v0, 0x7b
; CI-NEXT: s_mov_b32 s1, 0		; CI-NEXT: v_mov_b32_e32 v1, 0
; CI-NEXT: v_mov_b32_e32 v0, s0		; CI-NEXT: v_mov_b32_e32 v2, v0
; CI-NEXT: v_mov_b32_e32 v2, 0		; CI-NEXT: v_mov_b32_e32 v3, v1
; CI-NEXT: v_mov_b32_e32 v1, s1
; CI-NEXT: s_mov_b32 m0, -1		; CI-NEXT: s_mov_b32 m0, -1
; CI-NEXT: ds_write2_b64 v2, v[0:1], v[0:1] offset1:1		; CI-NEXT: ds_write_b128 v1, v[0:3]
; CI-NEXT: s_endpgm		; CI-NEXT: s_endpgm
;		;
; GFX9-ALIGNED-LABEL: store_misaligned64_constant_offsets:		; GFX9-LABEL: store_misaligned64_constant_offsets:
; GFX9-ALIGNED: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-ALIGNED-NEXT: s_movk_i32 s0, 0x7b		; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
; GFX9-ALIGNED-NEXT: s_mov_b32 s1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-ALIGNED-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-ALIGNED-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v3, v1
; GFX9-ALIGNED-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: ds_write_b128 v1, v[0:3]
; GFX9-ALIGNED-NEXT: ds_write2_b64 v2, v[0:1], v[0:1] offset1:1		; GFX9-NEXT: s_endpgm
; GFX9-ALIGNED-NEXT: s_endpgm
;
; GFX9-UNALIGNED-LABEL: store_misaligned64_constant_offsets:
; GFX9-UNALIGNED: ; %bb.0:
; GFX9-UNALIGNED-NEXT: v_mov_b32_e32 v0, 0x7b
; GFX9-UNALIGNED-NEXT: v_mov_b32_e32 v1, 0
; GFX9-UNALIGNED-NEXT: v_mov_b32_e32 v2, v0
; GFX9-UNALIGNED-NEXT: v_mov_b32_e32 v3, v1
; GFX9-UNALIGNED-NEXT: ds_write_b128 v1, v[0:3]
; GFX9-UNALIGNED-NEXT: s_endpgm
store i64 123, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 0), align 4		store i64 123, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 0), align 4
store i64 123, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 1), align 4		store i64 123, i64 addrspace(3)* getelementptr inbounds ([4 x i64], [4 x i64] addrspace(3)* @bar, i32 0, i32 1), align 4
ret void		ret void
}		}

@bar.large = addrspace(3) global [4096 x i64] undef, align 4		@bar.large = addrspace(3) global [4096 x i64] undef, align 4

define amdgpu_kernel void @store_misaligned64_constant_large_offsets() {		define amdgpu_kernel void @store_misaligned64_constant_large_offsets() {

llvm/test/CodeGen/AMDGPU/lower-kernel-and-module-lds.ll

; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16		; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16
; CHECK: @llvm.amdgcn.kernel..lds = internal addrspace(3) global %llvm.amdgcn.kernel..lds.t undef, align 2		; CHECK: @llvm.amdgcn.kernel..lds = internal addrspace(3) global %llvm.amdgcn.kernel..lds.t undef, align 2
; CHECK: @llvm.amdgcn.kernel..lds.1 = internal addrspace(3) global %llvm.amdgcn.kernel..lds.t.0 undef, align 4		; CHECK: @llvm.amdgcn.kernel..lds.1 = internal addrspace(3) global %llvm.amdgcn.kernel..lds.t.0 undef, align 4
;.		;.
define amdgpu_kernel void @k0() {		define amdgpu_kernel void @k0() {
; CHECK-LABEL: @k0(		; CHECK-LABEL: @k0(
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]		; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]
; CHECK-NEXT: %lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1) to i8 addrspace(3)*
; CHECK-NEXT: store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1		; CHECK-NEXT: store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 8
; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2) to i8 addrspace(3)*
; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2		; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 4
; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1) to i8 addrspace(3)*
; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4		; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 16
; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 0) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 0) to i8 addrspace(3)*
; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16		; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* @lds.size.1.align.1 to i8 addrspace(3)*		%lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* @lds.size.1.align.1 to i8 addrspace(3)*
store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1		store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1

%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*		%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*
store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2		store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2

%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*		%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*
store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4		store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4

%lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* @lds.size.16.align.16 to i8 addrspace(3)*		%lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* @lds.size.16.align.16 to i8 addrspace(3)*
store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16		store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16

ret void		ret void
}		}

define amdgpu_kernel void @k1() {		define amdgpu_kernel void @k1() {
; CHECK-LABEL: @k1(		; CHECK-LABEL: @k1(
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]		; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]
; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2) to i8 addrspace(3)*
; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2		; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 4
; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1) to i8 addrspace(3)*
; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4		; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 16
; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0) to i8 addrspace(3)*
; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16		; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*		%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*
store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2		store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2

%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*		%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*
;
store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4		store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4

ret void		ret void
}		}

define void @f0() {		define void @f0() {
; CHECK-LABEL: @f0(		; CHECK-LABEL: @f0(
; CHECK-NEXT: %lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1) to i8 addrspace(3)*
; CHECK-NEXT: store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1		; CHECK-NEXT: store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 8
; CHECK-NEXT: %lds.size.8.align.8.bc = bitcast [8 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0) to i8 addrspace(3)*		; CHECK-NEXT: %lds.size.8.align.8.bc = bitcast [8 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0) to i8 addrspace(3)*
; CHECK-NEXT: store i8 8, i8 addrspace(3)* %lds.size.8.align.8.bc, align 4		; CHECK-NEXT: store i8 8, i8 addrspace(3)* %lds.size.8.align.8.bc, align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* @lds.size.1.align.1 to i8 addrspace(3)*		%lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* @lds.size.1.align.1 to i8 addrspace(3)*
store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1		store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1

%lds.size.8.align.8.bc = bitcast [8 x i8] addrspace(3)* @lds.size.8.align.8 to i8 addrspace(3)*		%lds.size.8.align.8.bc = bitcast [8 x i8] addrspace(3)* @lds.size.8.align.8 to i8 addrspace(3)*
store i8 8, i8 addrspace(3)* %lds.size.8.align.8.bc, align 4		store i8 8, i8 addrspace(3)* %lds.size.8.align.8.bc, align 4

ret void		ret void
}		}
;.		;.
; CHECK: attributes #0 = { nofree nosync nounwind readnone willreturn }		; CHECK: attributes #0 = { nofree nosync nounwind readnone willreturn }
;.		;.

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll

	; CHECK-LABEL: @k3(			; CHECK-LABEL: @k3(
	; CHECK-NEXT: %1 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0), i32 0, i32 16			; CHECK-NEXT: %1 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0), i32 0, i32 16
	; CHECK-NEXT: %2 = bitcast i8 addrspace(3)* %1 to i64 addrspace(3)*			; CHECK-NEXT: %2 = bitcast i8 addrspace(3)* %1 to i64 addrspace(3)*
	; CHECK-NEXT: %ptr1 = addrspacecast i64 addrspace(3)* %2 to i64*			; CHECK-NEXT: %ptr1 = addrspacecast i64 addrspace(3)* %2 to i64*
	; CHECK-NEXT: store i64 1, i64* %ptr1, align 1			; CHECK-NEXT: store i64 1, i64* %ptr1, align 1
	; CHECK-NEXT: %3 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0), i32 0, i32 24			; CHECK-NEXT: %3 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0), i32 0, i32 24
	; CHECK-NEXT: %4 = bitcast i8 addrspace(3)* %3 to i64 addrspace(3)*			; CHECK-NEXT: %4 = bitcast i8 addrspace(3)* %3 to i64 addrspace(3)*
	; CHECK-NEXT: %ptr2 = addrspacecast i64 addrspace(3)* %4 to i64*			; CHECK-NEXT: %ptr2 = addrspacecast i64 addrspace(3)* %4 to i64*
	; CHECK-NEXT: store i64 2, i64* %ptr2, align 1			; CHECK-NEXT: store i64 2, i64* %ptr2, align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%ptr1 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 16) to i64 addrspace(3)*) to i64*			%ptr1 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 16) to i64 addrspace(3)*) to i64*
	store i64 1, i64* %ptr1, align 1			store i64 1, i64* %ptr1, align 1
	%ptr2 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 24) to i64 addrspace(3)*) to i64*			%ptr2 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 24) to i64 addrspace(3)*) to i64*
	store i64 2, i64* %ptr2, align 1			store i64 2, i64* %ptr2, align 1
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-global-uses.ll

	; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 4			; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 4

	; CHECK: @llvm.used = appending global [1 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (i16 addrspace(3)* @lds.5 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"			; CHECK: @llvm.used = appending global [1 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (i16 addrspace(3)* @lds.5 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"
	; CHECK: @llvm.compiler.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(1)* bitcast (i64* addrspace(1)* @gptr.4 to i8 addrspace(1)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (i32 addrspace(3)* @lds.6 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"			; CHECK: @llvm.compiler.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(1)* bitcast (i64* addrspace(1)* @gptr.4 to i8 addrspace(1)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (i32 addrspace(3)* @lds.6 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"
	@llvm.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (i16 addrspace(3)* @lds.1 to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (i16 addrspace(3)* @lds.5 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"			@llvm.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (i16 addrspace(3)* @lds.1 to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (i16 addrspace(3)* @lds.5 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"
	@llvm.compiler.used = appending global [3 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (i32 addrspace(3)* @lds.2 to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(1)* bitcast (i64* addrspace(1)* @gptr.4 to i8 addrspace(1)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (i32 addrspace(3)* @lds.6 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"			@llvm.compiler.used = appending global [3 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (i32 addrspace(3)* @lds.2 to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(1)* bitcast (i64* addrspace(1)* @gptr.4 to i8 addrspace(1)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (i32 addrspace(3)* @lds.6 to i8 addrspace(3)*) to i8*)], section "llvm.metadata"

	; CHECK-LABEL: @k0()			; CHECK-LABEL: @k0()
	; CHECK: %ld.lds.1 = load i16, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 2			; CHECK: %ld.lds.1 = load i16, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 4
	; CHECK: %ld.lds.2 = load i32, i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 0), align 4			; CHECK: %ld.lds.2 = load i32, i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 0), align 4
	; CHECK: %ld.lds.3 = load i64, i64 addrspace(3)* @lds.3, align 4			; CHECK: %ld.lds.3 = load i64, i64 addrspace(3)* @lds.3, align 4
	; CHECK: %ld.lds.4 = load float, float addrspace(3)* @lds.4, align 4			; CHECK: %ld.lds.4 = load float, float addrspace(3)* @lds.4, align 4
	; CHECK: ret void			; CHECK: ret void
	define amdgpu_kernel void @k0() {			define amdgpu_kernel void @k0() {
	%ld.lds.1 = load i16, i16 addrspace(3)* @lds.1			%ld.lds.1 = load i16, i16 addrspace(3)* @lds.1
	%ld.lds.2 = load i32, i32 addrspace(3)* @lds.2			%ld.lds.2 = load i32, i32 addrspace(3)* @lds.2
	%ld.lds.3 = load i64, i64 addrspace(3)* @lds.3			%ld.lds.3 = load i64, i64 addrspace(3)* @lds.3
	%ld.lds.4 = load float, float addrspace(3)* @lds.4			%ld.lds.4 = load float, float addrspace(3)* @lds.4
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-super-align.ll

	; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_ON %s			; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_ON %s
	; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_ON %s			; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_ON %s
	; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-super-align-lds-globals=false < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_OFF %s			; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-super-align-lds-globals=false < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_OFF %s
	; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-super-align-lds-globals=false < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_OFF %s			; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-super-align-lds-globals=false < %s | FileCheck --check-prefixes=CHECK,SUPER-ALIGN_OFF %s

	; CHECK: %llvm.amdgcn.kernel.k1.lds.t = type { [32 x i8] }			; CHECK: %llvm.amdgcn.kernel.k1.lds.t = type { [32 x i8] }
				; CHECK: %llvm.amdgcn.kernel.k2.lds.t = type { i16, [2 x i8], i16 }
				; CHECK: %llvm.amdgcn.kernel.k3.lds.t = type { [32 x i64], [32 x i32] }

	; CHECK-NOT: @lds.1			; CHECK-NOT: @lds.1
	@lds.1 = internal unnamed_addr addrspace(3) global [32 x i8] undef, align 1			@lds.1 = internal unnamed_addr addrspace(3) global [32 x i8] undef, align 1

	; SUPER-ALIGN_ON: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16			; SUPER-ALIGN_ON: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16
	; SUPER-ALIGN_OFF: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 1			; SUPER-ALIGN_OFF: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 1

				; CHECK: @llvm.amdgcn.kernel.k2.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k2.lds.t undef, align 4
				; SUPER-ALIGN_ON: @llvm.amdgcn.kernel.k3.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k3.lds.t undef, align 16
				; SUPER-ALIGN_OFF: @llvm.amdgcn.kernel.k3.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k3.lds.t undef, align 8

	; CHECK-LABEL: @k1			; CHECK-LABEL: @k1
	; CHECK: %1 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0), i32 0, i32 0			; CHECK: %1 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0), i32 0, i32 0
	; CHECK: %2 = addrspacecast i8 addrspace(3)* %1 to i8*			; CHECK: %2 = addrspacecast i8 addrspace(3)* %1 to i8*
	; CHECK: %ptr = getelementptr inbounds i8, i8* %2, i64 %x			; CHECK: %ptr = getelementptr inbounds i8, i8* %2, i64 %x
	; CHECK: store i8 1, i8* %ptr, align 1			; CHECK: store i8 1, i8* %ptr, align 1
	define amdgpu_kernel void @k1(i64 %x) {			define amdgpu_kernel void @k1(i64 %x) {
	%ptr = getelementptr inbounds i8, i8* addrspacecast ([32 x i8] addrspace(3)* @lds.1 to i8*), i64 %x			%ptr = getelementptr inbounds i8, i8* addrspacecast ([32 x i8] addrspace(3)* @lds.1 to i8*), i64 %x
	store i8 1, i8 addrspace(0)* %ptr, align 1			store i8 1, i8 addrspace(0)* %ptr, align 1
	ret void			ret void
	}			}

				@lds.2 = internal unnamed_addr addrspace(3) global i16 undef, align 4
				@lds.3 = internal unnamed_addr addrspace(3) global i16 undef, align 4

				; Check that alignment is propagated to uses for scalar variables.

				; CHECK-LABEL: @k2
				; CHECK: store i16 1, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k2.lds.t, %llvm.amdgcn.kernel.k2.lds.t addrspace(3)* @llvm.amdgcn.kernel.k2.lds, i32 0, i32 0), align 4
				; CHECK: store i16 2, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k2.lds.t, %llvm.amdgcn.kernel.k2.lds.t addrspace(3)* @llvm.amdgcn.kernel.k2.lds, i32 0, i32 2), align 4
				define amdgpu_kernel void @k2() {
				store i16 1, i16 addrspace(3)* @lds.2, align 2
				store i16 2, i16 addrspace(3)* @lds.3, align 2
				ret void
				}

				@lds.4 = internal unnamed_addr addrspace(3) global [32 x i64] undef, align 8
				@lds.5 = internal unnamed_addr addrspace(3) global [32 x i32] undef, align 4

				; Check that alignment is propagated to uses for arrays.

				; CHECK-LABEL: @k3
				; CHECK: store i32 1, i32 addrspace(3)* %ptr1, align 8
				; CHECK: store i32 2, i32 addrspace(3)* %ptr2, align 4
				; SUPER-ALIGN_ON: store i32 3, i32 addrspace(3)* %ptr3, align 16
				; SUPER-ALIGN_OFF: store i32 3, i32 addrspace(3)* %ptr3, align 8
				; CHECK: store i32 4, i32 addrspace(3)* %ptr4, align 4
				; CHECK: store i32 5, i32 addrspace(3)* %ptr5, align 4
				; CHECK: %load1 = load i32, i32 addrspace(3)* %ptr1, align 8
				; CHECK: %load2 = load i32, i32 addrspace(3)* %ptr2, align 4
				; SUPER-ALIGN_ON: %load3 = load i32, i32 addrspace(3)* %ptr3, align 16
				; SUPER-ALIGN_OFF: %load3 = load i32, i32 addrspace(3)* %ptr3, align 8
				; CHECK: %load4 = load i32, i32 addrspace(3)* %ptr4, align 4
				; CHECK: %load5 = load i32, i32 addrspace(3)* %ptr5, align 4
				; CHECK: %val1 = atomicrmw volatile add i32 addrspace(3)* %ptr1, i32 1 monotonic, align 8
				; CHECK: %val2 = cmpxchg volatile i32 addrspace(3)* %ptr1, i32 1, i32 2 monotonic monotonic, align 8
				; CHECK: %ptr1.bc = bitcast i32 addrspace(3)* %ptr1 to i16 addrspace(3)*
				; CHECK: %ptr2.bc = bitcast i32 addrspace(3)* %ptr2 to i16 addrspace(3)*
				; CHECK: %ptr3.bc = bitcast i32 addrspace(3)* %ptr3 to i16 addrspace(3)*
				; CHECK: %ptr4.bc = bitcast i32 addrspace(3)* %ptr4 to i16 addrspace(3)*
				; CHECK: store i16 11, i16 addrspace(3)* %ptr1.bc, align 8
				; CHECK: store i16 12, i16 addrspace(3)* %ptr2.bc, align 4
				; SUPER-ALIGN_ON: store i16 13, i16 addrspace(3)* %ptr3.bc, align 16
				; SUPER-ALIGN_OFF: store i16 13, i16 addrspace(3)* %ptr3.bc, align 8
				; CHECK: store i16 14, i16 addrspace(3)* %ptr4.bc, align 4
				; CHECK: %ptr1.ac = addrspacecast i32 addrspace(3)* %ptr1 to i32*
				; CHECK: %ptr2.ac = addrspacecast i32 addrspace(3)* %ptr2 to i32*
				; CHECK: %ptr3.ac = addrspacecast i32 addrspace(3)* %ptr3 to i32*
				; CHECK: %ptr4.ac = addrspacecast i32 addrspace(3)* %ptr4 to i32*
				; CHECK: store i32 21, i32* %ptr1.ac, align 8
				; CHECK: store i32 22, i32* %ptr2.ac, align 4
				; SUPER-ALIGN_ON: store i32 23, i32* %ptr3.ac, align 16
				; SUPER-ALIGN_OFF: store i32 23, i32* %ptr3.ac, align 8
				; CHECK: store i32 24, i32* %ptr4.ac, align 4
				define amdgpu_kernel void @k3(i64 %x) {
				%ptr0 = getelementptr inbounds i64, i64 addrspace(3)* bitcast ([32 x i64] addrspace(3)* @lds.4 to i64 addrspace(3)*), i64 0
				store i64 0, i64 addrspace(3)* %ptr0, align 8

				%ptr1 = getelementptr inbounds i32, i32 addrspace(3)* bitcast ([32 x i32] addrspace(3)* @lds.5 to i32 addrspace(3)*), i64 2
				%ptr2 = getelementptr inbounds i32, i32 addrspace(3)* bitcast ([32 x i32] addrspace(3)* @lds.5 to i32 addrspace(3)*), i64 3
				%ptr3 = getelementptr inbounds i32, i32 addrspace(3)* bitcast ([32 x i32] addrspace(3)* @lds.5 to i32 addrspace(3)*), i64 4
				%ptr4 = getelementptr inbounds i32, i32 addrspace(3)* bitcast ([32 x i32] addrspace(3)* @lds.5 to i32 addrspace(3)*), i64 5
				%ptr5 = getelementptr inbounds i32, i32 addrspace(3)* bitcast ([32 x i32] addrspace(3)* @lds.5 to i32 addrspace(3)*), i64 %x

				store i32 1, i32 addrspace(3)* %ptr1, align 4
				store i32 2, i32 addrspace(3)* %ptr2, align 4
				store i32 3, i32 addrspace(3)* %ptr3, align 4
				store i32 4, i32 addrspace(3)* %ptr4, align 4
				store i32 5, i32 addrspace(3)* %ptr5, align 4

				%load1 = load i32, i32 addrspace(3)* %ptr1, align 4
				%load2 = load i32, i32 addrspace(3)* %ptr2, align 4
				%load3 = load i32, i32 addrspace(3)* %ptr3, align 4
				%load4 = load i32, i32 addrspace(3)* %ptr4, align 4
				%load5 = load i32, i32 addrspace(3)* %ptr5, align 4

				%val1 = atomicrmw volatile add i32 addrspace(3)* %ptr1, i32 1 monotonic, align 4
				%val2 = cmpxchg volatile i32 addrspace(3)* %ptr1, i32 1, i32 2 monotonic monotonic, align 4

				%ptr1.bc = bitcast i32 addrspace(3)* %ptr1 to i16 addrspace(3)*
				%ptr2.bc = bitcast i32 addrspace(3)* %ptr2 to i16 addrspace(3)*
				%ptr3.bc = bitcast i32 addrspace(3)* %ptr3 to i16 addrspace(3)*
				%ptr4.bc = bitcast i32 addrspace(3)* %ptr4 to i16 addrspace(3)*

				store i16 11, i16 addrspace(3)* %ptr1.bc, align 2
				store i16 12, i16 addrspace(3)* %ptr2.bc, align 2
				store i16 13, i16 addrspace(3)* %ptr3.bc, align 2
				store i16 14, i16 addrspace(3)* %ptr4.bc, align 2

				%ptr1.ac = addrspacecast i32 addrspace(3)* %ptr1 to i32*
				%ptr2.ac = addrspacecast i32 addrspace(3)* %ptr2 to i32*
				%ptr3.ac = addrspacecast i32 addrspace(3)* %ptr3 to i32*
				%ptr4.ac = addrspacecast i32 addrspace(3)* %ptr4 to i32*

				store i32 21, i32* %ptr1.ac, align 4
				store i32 22, i32* %ptr2.ac, align 4
				store i32 23, i32* %ptr3.ac, align 4
				store i32 24, i32* %ptr4.ac, align 4

				ret void
				}
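
The alignments expected in @k3 follow from the arithmetic behind commonAlignment(): take the largest power of two that divides both the struct alignment and the byte offset of the access. The following is a minimal standalone sketch, not the pass's code. `commonAlign` and `k3AccessAlign` are hypothetical names, and the sketch assumes @lds.5 starts at byte 256, directly after the [32 x i64] field of %llvm.amdgcn.kernel.k3.lds.t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::commonAlignment(Align, uint64_t):
// the largest power of two that divides both Align and Offset.
// Align must be a nonzero power of two.
inline uint64_t commonAlign(uint64_t Align, uint64_t Offset) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "not a power of two");
  uint64_t Bits = Align | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}

// Alignment that can be assumed for an i32 access at element Index of @lds.5.
// Assumption: @lds.5 follows the [32 x i64] field, so it starts at byte 256.
inline uint64_t k3AccessAlign(uint64_t StructAlign, uint64_t Index) {
  const uint64_t FieldOffset = 32 * 8; // size of [32 x i64] in bytes
  return commonAlign(StructAlign, FieldOffset + Index * 4);
}
```

With a struct alignment of 16 (super-align on), indices 2, 3, 4 and 5 give 8, 4, 16 and 4, which is what the checks for %ptr1 through %ptr4 expect. With a struct alignment of 8 (super-align off), index 4 drops to 8. %ptr5 uses a variable index, so nothing beyond the original align 4 can be inferred for it.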

llvm/test/CodeGen/AMDGPU/lower-kernel-lds.ll

	Show All 12 Lines
	;.			;.
	; CHECK: @lds.size.8.align.8 = internal unnamed_addr addrspace(3) global [8 x i8] undef, align 8			; CHECK: @lds.size.8.align.8 = internal unnamed_addr addrspace(3) global [8 x i8] undef, align 8
	; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16			; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16
	; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16			; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16
	;.			;.
	define amdgpu_kernel void @k0() {			define amdgpu_kernel void @k0() {
	; CHECK-LABEL: @k0(			; CHECK-LABEL: @k0(
	; CHECK-NEXT: %lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3) to i8 addrspace(3)*			; CHECK-NEXT: %lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3) to i8 addrspace(3)*
	; CHECK-NEXT: store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1			; CHECK-NEXT: store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 2
	; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2) to i8 addrspace(3)*			; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2) to i8 addrspace(3)*
	; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2			; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 4
	; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1) to i8 addrspace(3)*			; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1) to i8 addrspace(3)*
	; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4			; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 16
	; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 0) to i8 addrspace(3)*			; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 0) to i8 addrspace(3)*
	; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16			; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* @lds.size.1.align.1 to i8 addrspace(3)*			%lds.size.1.align.1.bc = bitcast [1 x i8] addrspace(3)* @lds.size.1.align.1 to i8 addrspace(3)*
	store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1			store i8 1, i8 addrspace(3)* %lds.size.1.align.1.bc, align 1

	%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*			%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*
	store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2			store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2

	%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*			%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*
	store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4			store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4

	%lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* @lds.size.16.align.16 to i8 addrspace(3)*			%lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* @lds.size.16.align.16 to i8 addrspace(3)*
	store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16			store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16

	ret void			ret void
	}			}

	define amdgpu_kernel void @k1() {			define amdgpu_kernel void @k1() {
	; CHECK-LABEL: @k1(			; CHECK-LABEL: @k1(
	; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2) to i8 addrspace(3)*			; CHECK-NEXT: %lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2) to i8 addrspace(3)*
	; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2			; CHECK-NEXT: store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 4
	; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1) to i8 addrspace(3)*			; CHECK-NEXT: %lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1) to i8 addrspace(3)*
	; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4			; CHECK-NEXT: store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 16
	; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0) to i8 addrspace(3)*			; CHECK-NEXT: %lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0) to i8 addrspace(3)*
	; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16			; CHECK-NEXT: store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*			%lds.size.2.align.2.bc = bitcast [2 x i8] addrspace(3)* @lds.size.2.align.2 to i8 addrspace(3)*
	store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2			store i8 2, i8 addrspace(3)* %lds.size.2.align.2.bc, align 2

	%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*			%lds.size.4.align.4.bc = bitcast [4 x i8] addrspace(3)* @lds.size.4.align.4 to i8 addrspace(3)*
	store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4			store i8 4, i8 addrspace(3)* %lds.size.4.align.4.bc, align 4

	%lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* @lds.size.16.align.16 to i8 addrspace(3)*			%lds.size.16.align.16.bc = bitcast [16 x i8] addrspace(3)* @lds.size.16.align.16 to i8 addrspace(3)*
	store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16			store i8 16, i8 addrspace(3)* %lds.size.16.align.16.bc, align 16

	ret void			ret void
	}			}
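
The updated store alignments in @k0 and @k1 can be reproduced by walking the struct layout field by field. This is a sketch under stated assumptions rather than the pass's implementation: the field order (16, 4, 2, 1 bytes) is read off the GEP indices in the CHECK lines, the i8-array fields are assumed to be packed with no padding between them, and `fieldAligns` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// For consecutive i8-array fields (ABI alignment 1, so assumed unpadded),
// return the alignment that can be assumed at the start of each field:
// the largest power of two dividing both the struct alignment and the
// field's byte offset.
inline std::vector<uint64_t> fieldAligns(uint64_t StructAlign,
                                         const std::vector<uint64_t> &Sizes) {
  assert(StructAlign != 0 && (StructAlign & (StructAlign - 1)) == 0);
  std::vector<uint64_t> Result;
  uint64_t Offset = 0;
  for (uint64_t Size : Sizes) {
    uint64_t Bits = StructAlign | Offset;
    Result.push_back(Bits & (~Bits + 1)); // lowest set bit
    Offset += Size;
  }
  return Result;
}
```

For @k0, `fieldAligns(16, {16, 4, 2, 1})` places the fields at offsets 0, 16, 20 and 22 and yields alignments 16, 16, 4 and 2, matching the new `align` values on the four stores. @k1 uses only the first three fields and gets 16, 16 and 4.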

llvm/test/CodeGen/AMDGPU/lower-module-lds-used-list.ll

	Show All 23 Lines
	@llvm.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (float addrspace(3)* @tolower to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(1)* bitcast (i64 addrspace(1)* @ignored to i8 addrspace(1)*) to i8*)], section "llvm.metadata"			@llvm.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (float addrspace(3)* @tolower to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(1)* bitcast (i64 addrspace(1)* @ignored to i8 addrspace(1)*) to i8*)], section "llvm.metadata"

	; @ignored still in list, @tolower removed, llvm.amdgcn.module.lds appended			; @ignored still in list, @tolower removed, llvm.amdgcn.module.lds appended
	; CHECK: @llvm.compiler.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(1)* bitcast (i64 addrspace(1)* @ignored to i8 addrspace(1)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds to i8 addrspace(3)*) to i8*)], section "llvm.metadata"			; CHECK: @llvm.compiler.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(1)* bitcast (i64 addrspace(1)* @ignored to i8 addrspace(1)*) to i8*), i8* addrspacecast (i8 addrspace(3)* bitcast (%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds to i8 addrspace(3)*) to i8*)], section "llvm.metadata"

	@llvm.compiler.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (float addrspace(3)* @tolower to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(1)* bitcast (i64 addrspace(1)* @ignored to i8 addrspace(1)*) to i8*)], section "llvm.metadata"			@llvm.compiler.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (float addrspace(3)* @tolower to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(1)* bitcast (i64 addrspace(1)* @ignored to i8 addrspace(1)*) to i8*)], section "llvm.metadata"

	; CHECK-LABEL: @func()			; CHECK-LABEL: @func()
	; CHECK: %dec = atomicrmw fsub float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0), float 1.000000e+00 monotonic, align 4			; CHECK: %dec = atomicrmw fsub float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0), float 1.000000e+00 monotonic, align 8
	define void @func() {			define void @func() {
	%dec = atomicrmw fsub float addrspace(3)* @tolower, float 1.0 monotonic			%dec = atomicrmw fsub float addrspace(3)* @tolower, float 1.0 monotonic
	%unused0 = atomicrmw add i64 addrspace(1)* @ignored, i64 1 monotonic			%unused0 = atomicrmw add i64 addrspace(1)* @ignored, i64 1 monotonic
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll

	Show All 18 Lines
	@with_init = addrspace(3) global i64 0			@with_init = addrspace(3) global i64 0

	; Instance of new type, aligned to max of element alignment			; Instance of new type, aligned to max of element alignment
	; CHECK: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 8			; CHECK: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 8

	; Use in func rewritten to access struct at address zero			; Use in func rewritten to access struct at address zero
	; CHECK-LABEL: @func()			; CHECK-LABEL: @func()
	; CHECK: %dec = atomicrmw fsub float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0), float 1.0			; CHECK: %dec = atomicrmw fsub float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0), float 1.0
	; CHECK: %val0 = load i32, i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 2), align 4			; CHECK: %val0 = load i32, i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 2), align 8
	; CHECK: %val1 = add i32 %val0, 4			; CHECK: %val1 = add i32 %val0, 4
	; CHECK: store i32 %val1, i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 2), align 4			; CHECK: store i32 %val1, i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 2), align 8
	; CHECK: %unused0 = atomicrmw add i64 addrspace(3)* @with_init, i64 1 monotonic			; CHECK: %unused0 = atomicrmw add i64 addrspace(3)* @with_init, i64 1 monotonic
	define void @func() {			define void @func() {
	%dec = atomicrmw fsub float addrspace(3)* @var0, float 1.0 monotonic			%dec = atomicrmw fsub float addrspace(3)* @var0, float 1.0 monotonic
	%val0 = load i32, i32 addrspace(3)* @var1, align 4			%val0 = load i32, i32 addrspace(3)* @var1, align 4
	%val1 = add i32 %val0, 4			%val1 = add i32 %val0, 4
	store i32 %val1, i32 addrspace(3)* @var1, align 4			store i32 %val1, i32 addrspace(3)* @var1, align 4
	%unused0 = atomicrmw add i64 addrspace(3)* @with_init, i64 1 monotonic			%unused0 = atomicrmw add i64 addrspace(3)* @with_init, i64 1 monotonic
	ret void			ret void
	}			}

	; This kernel calls a function that uses LDS so needs the block			; This kernel calls a function that uses LDS so needs the block
	; CHECK-LABEL: @kern_call()			; CHECK-LABEL: @kern_call()
	; CHECK: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]			; CHECK: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]
	; CHECK: call void @func()			; CHECK: call void @func()
	; CHECK: %dec = atomicrmw fsub float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0), float 2.000000e+00 monotonic, align 4			; CHECK: %dec = atomicrmw fsub float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 0), float 2.000000e+00 monotonic, align 8
	define amdgpu_kernel void @kern_call() {			define amdgpu_kernel void @kern_call() {
	call void @func()			call void @func()
	%dec = atomicrmw fsub float addrspace(3)* @var0, float 2.0 monotonic			%dec = atomicrmw fsub float addrspace(3)* @var0, float 2.0 monotonic
	ret void			ret void
	}			}

	; This kernel does not need to alloc the LDS block as it makes no calls			; This kernel does not need to alloc the LDS block as it makes no calls
	; CHECK-LABEL: @kern_empty()			; CHECK-LABEL: @kern_empty()
	; CHECK: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]			; CHECK: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]
	define spir_kernel void @kern_empty() {			define spir_kernel void @kern_empty() {
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/replace-lds-by-ptr-lds-offsets.ll

	Show First 20 Lines • Show All 67 Lines • ▼ Show 20 Lines
	; POINTER-REPLACE: %1 = load i16, i16 addrspace(3)* @lds.1.ptr, align 2			; POINTER-REPLACE: %1 = load i16, i16 addrspace(3)* @lds.1.ptr, align 2
	; POINTER-REPLACE: %2 = getelementptr i8, i8 addrspace(3)* null, i16 %1			; POINTER-REPLACE: %2 = getelementptr i8, i8 addrspace(3)* null, i16 %1
	; POINTER-REPLACE: %3 = bitcast i8 addrspace(3)* %2 to i32 addrspace(3)*			; POINTER-REPLACE: %3 = bitcast i8 addrspace(3)* %2 to i32 addrspace(3)*
	; POINTER-REPLACE: store i32 7, i32 addrspace(3)* %3, align 4			; POINTER-REPLACE: store i32 7, i32 addrspace(3)* %3, align 4
	; POINTER-REPLACE: ret void			; POINTER-REPLACE: ret void


	; LOWER_LDS-LABEL: @f1			; LOWER_LDS-LABEL: @f1
	; LOWER_LDS: %1 = load i16, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1), align 2			; LOWER_LDS: %1 = load i16, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1), align 16
	; LOWER_LDS: %2 = getelementptr i8, i8 addrspace(3)* null, i16 %1			; LOWER_LDS: %2 = getelementptr i8, i8 addrspace(3)* null, i16 %1
	; LOWER_LDS: %3 = bitcast i8 addrspace(3)* %2 to i32 addrspace(3)*			; LOWER_LDS: %3 = bitcast i8 addrspace(3)* %2 to i32 addrspace(3)*
	; LOWER_LDS: store i32 7, i32 addrspace(3)* %3, align 4			; LOWER_LDS: store i32 7, i32 addrspace(3)* %3, align 4
	; LOWER_LDS: ret void			; LOWER_LDS: ret void


	; GCN-LABEL: f1:			; GCN-LABEL: f1:
	; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines
	; LOWER_LDS: %1 = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)			; LOWER_LDS: %1 = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
	; LOWER_LDS: %2 = icmp eq i32 %1, 0			; LOWER_LDS: %2 = icmp eq i32 %1, 0
	; LOWER_LDS: br i1 %2, label %3, label %6			; LOWER_LDS: br i1 %2, label %3, label %6
	;			;
	; LOWER_LDS-LABEL: 3:			; LOWER_LDS-LABEL: 3:
	; LOWER_LDS: %4 = ptrtoint i64 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0) to i16			; LOWER_LDS: %4 = ptrtoint i64 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0) to i16
	; LOWER_LDS: store i16 %4, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 2), align 2			; LOWER_LDS: store i16 %4, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 2), align 2
	; LOWER_LDS: %5 = ptrtoint i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1) to i16			; LOWER_LDS: %5 = ptrtoint i32 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1) to i16
	; LOWER_LDS: store i16 %5, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1), align 2			; LOWER_LDS: store i16 %5, i16 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.module.lds.t, %llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds, i32 0, i32 1), align 16
	; LOWER_LDS: br label %6			; LOWER_LDS: br label %6
	;			;
	; LOWER_LDS-LABEL: 6:			; LOWER_LDS-LABEL: 6:
	; LOWER_LDS: call void @llvm.amdgcn.wave.barrier()			; LOWER_LDS: call void @llvm.amdgcn.wave.barrier()
	; LOWER_LDS: %bc = bitcast [2 x i64] addrspace(3)* @alias.to.lds.3 to i8 addrspace(3)*			; LOWER_LDS: %bc = bitcast [2 x i64] addrspace(3)* @alias.to.lds.3 to i8 addrspace(3)*
	; LOWER_LDS: store i8 3, i8 addrspace(3)* %bc, align 2			; LOWER_LDS: store i8 3, i8 addrspace(3)* %bc, align 2
	; LOWER_LDS: call void @f1()			; LOWER_LDS: call void @f1()
	; LOWER_LDS: call void @f2()			; LOWER_LDS: call void @f2()
	; LOWER_LDS: ret void			; LOWER_LDS: ret void


	; GCN-LABEL: k1:			; GCN-LABEL: k1:
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
	; GCN: s_mov_b32 s9, SCRATCH_RSRC_DWORD1			; GCN: s_mov_b32 s9, SCRATCH_RSRC_DWORD1
	; GCN: s_mov_b32 s10, -1			; GCN: s_mov_b32 s10, -1
	; GCN: s_mov_b32 s11, 0xe00000			; GCN: s_mov_b32 s11, 0xe00000
	; GCN: s_add_u32 s8, s8, s1			; GCN: s_add_u32 s8, s8, s1
	; GCN: v_mbcnt_lo_u32_b32 v0, -1, 0			; GCN: v_mbcnt_lo_u32_b32 v0, -1, 0
	; GCN: s_addc_u32 s9, s9, 0			; GCN: s_addc_u32 s9, s9, 0
	; GCN: v_cmp_eq_u32_e32 vcc, 0, v0			; GCN: v_cmp_eq_u32_e32 vcc, 0, v0
	; GCN: s_mov_b32 s32, 0			; GCN: s_mov_b32 s32, 0
	; GCN: s_and_saveexec_b64 s[0:1], vcc			; GCN: s_and_saveexec_b64 s[0:1], vcc
	; GCN: s_cbranch_execz BB2_2			; GCN: s_cbranch_execz BB2_2
	; GCN: v_mov_b32_e32 v0, 24			; GCN: v_mov_b32_e32 v0, 0
	; GCN: v_mov_b32_e32 v1, 0			; GCN: v_mov_b32_e32 v1, 0x180020
	; GCN: ds_write_b16 v1, v0 offset:18			; GCN: ds_write_b32 v0, v1 offset:16
	; GCN: v_mov_b32_e32 v0, 32
	; GCN: ds_write_b16 v1, v0 offset:16
	; GCN-LABEL: BB2_2:			; GCN-LABEL: BB2_2:
	; GCN: s_or_b64 exec, exec, s[0:1]			; GCN: s_or_b64 exec, exec, s[0:1]
	; GCN: s_getpc_b64 s[0:1]			; GCN: s_getpc_b64 s[0:1]
	; GCN: s_add_u32 s0, s0, f1@gotpcrel32@lo+4			; GCN: s_add_u32 s0, s0, f1@gotpcrel32@lo+4
	; GCN: s_addc_u32 s1, s1, f1@gotpcrel32@hi+12			; GCN: s_addc_u32 s1, s1, f1@gotpcrel32@hi+12
	; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GCN: s_mov_b64 s[0:1], s[8:9]			; GCN: s_mov_b64 s[0:1], s[8:9]
	; GCN: s_mov_b64 s[2:3], s[10:11]			; GCN: s_mov_b64 s[2:3], s[10:11]
	Show All 22 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Propagate LDS align into to instructions
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 353868

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

llvm/test/CodeGen/AMDGPU/ds_read2.ll

llvm/test/CodeGen/AMDGPU/ds_write2.ll

llvm/test/CodeGen/AMDGPU/lower-kernel-and-module-lds.ll

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-global-uses.ll

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-super-align.ll

llvm/test/CodeGen/AMDGPU/lower-kernel-lds.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds-used-list.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll

llvm/test/CodeGen/AMDGPU/replace-lds-by-ptr-lds-offsets.ll
