This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗	(On Diff #162317)	Can we check NUW flags on the add instead?
test/CodeGen/AMDGPU/constant-address-space-32bit.ll
289 ↗	(On Diff #162317)	The constant should canonically be i32

mareko added inline comments. Aug 24 2018, 9:19 AM

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗	(On Diff #162317)	Yes, but where would I check for 256 * 1024?

nhaehnle added inline comments. Aug 24 2018, 10:00 AM

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗	(On Diff #162317)	The question is, why the check for 256 * 1024 in the first place? That seems rather fishy. I guess it could be made to work if we define that the 32-bit address space only goes up to 4GB - 256KB, but then that should be added to the address space documentation. The other issue is that I think some hardware generations have the IMM offset as a signed byte offset. At least for gfx9 that's what the docs say. So then the check should depend on the hardware generation.
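To make the 256 KiB question concrete, here is a minimal arithmetic sketch. It assumes, hypothetically, that the 32-bit address space is documented to end at 4 GiB - 256 KiB and that the immediate is an unsigned byte offset below 256 KiB; neither assumption is confirmed in this thread, and all names (`kReserved`, `worstCaseSum`, `immWithinReserved`) are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants: the thread mentions a 256 * 1024 check, but the
// exact rule is not spelled out here.
constexpr uint64_t kFourGiB = 1ull << 32;
constexpr uint64_t kReserved = 256ull * 1024;

// Largest base the assumed rule would allow, and largest accepted immediate.
constexpr uint64_t kMaxBase = kFourGiB - kReserved - 1;
constexpr uint64_t kMaxImm = kReserved - 1;

// Worst-case sum, computed in 64 bits. It stays below 2^32, so a 32-bit add
// of the same operands cannot wrap under the assumed rule.
constexpr uint64_t worstCaseSum() { return kMaxBase + kMaxImm; }

// An immediate would be foldable under this rule only below the reserved size.
constexpr bool immWithinReserved(uint64_t imm) { return imm < kReserved; }
```

The sketch only covers unsigned immediates; a signed immediate, as mentioned for gfx9, would need a separate bound.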

mareko added inline comments. Aug 24 2018, 10:26 PM

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗	(On Diff #162317)	arsenm asked me to check for NUW, which would disable IMM offset for all current users. Do you have a better idea?

I'm not 100% sure the NUW thing works here, because IIRC the actual instruction does the conversion to 64-bit before the add.

Right, that matches my understanding: the SMRD/SMEM instruction does a 64-bit addition, so if the 32-bit (add X, imm) were to have an unsigned wraparound, moving it into the immediate of the SMRD/SMEM would remove the wraparound and therefore be incorrect.

Conversely, if the 32-bit add is nuw, then putting the immediate into the SMEM is okay. So we should definitely add this as a condition that allows the move.
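The argument above can be checked with plain integer arithmetic. The following sketch models the two address computations; `addrViaAdd32`, `addrViaImm64` and `addIsNUW` are illustrative names, and the high 32 bits of the expanded address are assumed to be zero for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Address the IR asks for: a 32-bit add (wrapping mod 2^32), then extension
// to 64 bits. The high half is assumed to be zero in this sketch.
inline uint64_t addrViaAdd32(uint32_t base, uint32_t imm) {
  return static_cast<uint64_t>(static_cast<uint32_t>(base + imm));
}

// Address produced if the constant is folded into the load's immediate:
// the base is extended first and the add happens in 64 bits.
inline uint64_t addrViaImm64(uint32_t base, uint32_t imm) {
  return static_cast<uint64_t>(base) + imm;
}

// Models the nuw promise: true when the 32-bit add does not wrap.
inline bool addIsNUW(uint32_t base, uint32_t imm) {
  return base <= UINT32_MAX - imm;
}
```

With `base = 0xFFFFFFFC` and `imm = 8`, the 32-bit add wraps to 4 while the 64-bit add yields 0x100000004, so folding would change the address. Whenever `addIsNUW` holds, both functions return the same value.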

The question is: what to do about the 256 KiB condition? It is rather hacky, so ideally we'd do without, but that requires us to figure out how earlier passes can be coaxed into generating adds with nuw set (or live with less efficient code, which would suck). This needs looking at the getelementptr lowering.

Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).

Would the combination of:

check NUW here
create inbounds GEP

make good use of SMEM/SMRD immediates?

The DAG isn't good about preserving NUW. I had some old patches to try to improve this, but they never got reviewed.

That would be good, yeah. Note here it's not so much a case of preserving NUW as it is of deducing as much NUW as possible from getelementptrs.

We can ignore old Mesa + new LLVM, because LLVM 7 is the first release to have 32-bit pointers, and I think we can fix that before release.

For internal driver additions, e.g. offset + small constant, the driver can use NUW or NSW. For app additions, we would just use normal Add. As long as NUW or NSW is preserved, we should be fine.

So it sounds like the way to go is to test just the NUW bit here in SelectSMRDOffset?

@arsenm Where do you have the patches that preserve NUW?

In D51203#1214382, @nhaehnle wrote:

Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).

Would the combination of:

check NUW here

create inbounds GEP

make good use of SMEM/SMRD immediates?

The code in SelectionDAGBuilder::visitGetElementPtr is unsafe for our case, because it doesn't know that the addition of 32-bit addresses is performed in 64 bits. Even x+4 can overflow in 32 bits but not 64 bits. The GEP can be "inbounds", but it doesn't change anything. However, we can hackishly use the inbounds flag to mean that the addition is safe, because GEP with inbounds and offset <= INT_MAX is converted to "add nuw".
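As a rough model of the rule described here (an inbounds GEP with a constant offset that is non-negative as a signed value lowers to "add nuw"), the sketch below uses stand-in types. `AddFlags` and `flagsForGepOffset` are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the flags on the lowered add; not an LLVM type.
struct AddFlags {
  bool noUnsignedWrap = false;
};

// Models the rule as described in the thread: a constant GEP offset that is
// non-negative when read as signed, on an inbounds GEP, yields "add nuw".
inline AddFlags flagsForGepOffset(bool inbounds, int64_t byteOffset) {
  AddFlags flags;
  flags.noUnsignedWrap = inbounds && byteOffset >= 0;
  return flags;
}
```

Under this model a plain (non-inbounds) GEP never produces nuw, which is why such an add would not be folded into the immediate.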

This fixes GPU hangs with OpenGL bindless handle arithmetic.

Harbormaster completed remote builds in B22043: Diff 163011. Aug 28 2018, 11:50 PM

In D51203#1216951, @mareko wrote:


The code in SelectionDAGBuilder::visitGetElementPtr is unsafe for our case, because it doesn't know that the addition of 32-bit addresses is performed in 64 bits. Even x+4 can overflow in 32 bits but not 64 bits. The GEP can be "inbounds", but it doesn't change anything. However, we can hackishly use the inbounds flag to mean that the addition is safe, because GEP with inbounds and offset <= INT_MAX is converted to "add nuw".

Yes, x+4 can overflow, but 'inbounds' is basically a promise that that won't happen. I'm pretty sure there's nothing hackish or unsafe about it.

First, 'inbounds' on getelementptr means (according to LangRef): interpreting all array offsets as signed integers and performing all arithmetic with infinite precision (i.e., no wraparound), the resulting pointer stays in bounds of the object pointed to by the base pointer at each step.

Now, visitGetElementPtr translates inbounds getelementptr on addrspace(6) with a non-negative array index into an add nuw i32. This is correct, because memory objects themselves never wrap around (there is no memory object that starts below 2^32 and ends above 0), and so the inbounds promise means that there is indeed no unsigned wrapping when adding in 32 bits.

Second, when we encounter (load (add nuw i32 reg, const)), and we put the const into the SMRD, then yes, the addition will be performed in 64 bits. But that doesn't change the wrapping behavior because there was no wrapping to begin with.
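The two steps of this argument can be sketched as follows, assuming a memory object described by a hypothetical (start, size) pair and an access that lies fully inside it. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Objects never wrap around the end of the 32-bit address space:
// start + size, computed without wrapping, is at most 2^32.
inline bool objectDoesNotWrap(uint32_t start, uint32_t size) {
  return static_cast<uint64_t>(start) + size <= (1ull << 32);
}

// An access of `bytes` bytes at `offset` from the object start is in bounds
// when it ends at or before the end of the object.
inline bool accessIsInBounds(uint32_t size, uint32_t offset, uint32_t bytes) {
  return static_cast<uint64_t>(offset) + bytes <= size;
}

// True when a + b wraps as a 32-bit unsigned add.
inline bool add32Wraps(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(a + b) < a;
}
```

For an object at 0xFFFFFFF0 of 16 bytes, a 4-byte access at offset 12 is in bounds and the add does not wrap. Offset 16 would wrap to address 0, but a load there is not in bounds, so the promise excludes it.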

I think this change is good to go as-is.

This revision is now accepted and ready to land. Aug 29 2018, 6:39 AM

Closed by commit rL340959: AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes (authored by mareko). Aug 29 2018, 1:03 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp	6 lines
llvm/trunk/test/CodeGen/AMDGPU/constant-address-space-32bit.ll	39 lines

Diff 163167

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[1,402 lines not shown; context: SDValue AMDGPUDAGToDAGISel::Expand32BitAddress(SDValue Addr) const {]
   return SDValue(CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, SL, MVT::i64,
                                         Ops), 0);
 }

 bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,
                                     SDValue &Offset, bool &Imm) const {
   SDLoc SL(Addr);

-  if (CurDAG->isBaseWithConstantOffset(Addr)) {
+  // A 32-bit (address + offset) should not cause unsigned 32-bit integer
+  // wraparound, because s_load instructions perform the addition in 64 bits.
+  if ((Addr.getValueType() != MVT::i32 ||
+       Addr->getFlags().hasNoUnsignedWrap()) &&
+      CurDAG->isBaseWithConstantOffset(Addr)) {
     SDValue N0 = Addr.getOperand(0);
     SDValue N1 = Addr.getOperand(1);

     if (SelectSMRDOffset(N1, Offset, Imm)) {
       SBase = Expand32BitAddress(N0);
       return true;
     }
   }
[802 lines not shown]
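For readers who want the new SelectSMRD condition in isolation, here is a small stand-alone model of it. `AddrInfo` and `mayFoldConstantOffset` are hypothetical names; the three fields stand in for `Addr.getValueType() == MVT::i32`, `Addr->getFlags().hasNoUnsignedWrap()` and `CurDAG->isBaseWithConstantOffset(Addr)` from the patch.

```cpp
#include <cassert>

// Illustrative summary of the address node; not an LLVM type.
struct AddrInfo {
  bool is32Bit;                   // value type is i32
  bool hasNoUnsignedWrap;         // nuw flag on the add
  bool isBaseWithConstantOffset;  // node has the shape (base + constant)
};

// Mirrors the condition added by the patch: 64-bit addresses are unaffected,
// while 32-bit addresses additionally need the nuw flag before the constant
// may be considered for the immediate.
inline bool mayFoldConstantOffset(const AddrInfo &a) {
  return (!a.is32Bit || a.hasNoUnsignedWrap) && a.isBaseWithConstantOffset;
}
```

A 32-bit address without nuw is rejected here, which corresponds to the load_addr_no_fold test where the add stays a separate s_add_i32.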

llvm/trunk/test/CodeGen/AMDGPU/constant-address-space-32bit.ll

 ; RUN: llc -march=amdgcn -mcpu=tahiti < %s | FileCheck -check-prefixes=GCN,SICI,SI %s
 ; RUN: llc -march=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefixes=GCN,SICI %s
 ; RUN: llc -march=amdgcn -mcpu=tonga < %s | FileCheck -check-prefixes=GCN,VIGFX9 %s
 ; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,VIGFX9 %s

 ; GCN-LABEL: {{^}}load_i32:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0
 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x2
 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x8
 define amdgpu_vs float @load_i32(i32 addrspace(6)* inreg %p0, i32 addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr i32, i32 addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds i32, i32 addrspace(6)* %p1, i32 2
   %r0 = load i32, i32 addrspace(6)* %p0
   %r1 = load i32, i32 addrspace(6)* %gep1
   %r = add i32 %r0, %r1
   %r2 = bitcast i32 %r to float
   ret float %r2
 }

 ; GCN-LABEL: {{^}}load_v2i32:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x4
 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x10
 define amdgpu_vs <2 x float> @load_v2i32(<2 x i32> addrspace(6)* inreg %p0, <2 x i32> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <2 x i32>, <2 x i32> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <2 x i32>, <2 x i32> addrspace(6)* %p1, i32 2
   %r0 = load <2 x i32>, <2 x i32> addrspace(6)* %p0
   %r1 = load <2 x i32>, <2 x i32> addrspace(6)* %gep1
   %r = add <2 x i32> %r0, %r1
   %r2 = bitcast <2 x i32> %r to <2 x float>
   ret <2 x float> %r2
 }

 ; GCN-LABEL: {{^}}load_v4i32:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x8
 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x20
 define amdgpu_vs <4 x float> @load_v4i32(<4 x i32> addrspace(6)* inreg %p0, <4 x i32> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <4 x i32>, <4 x i32> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(6)* %p1, i32 2
   %r0 = load <4 x i32>, <4 x i32> addrspace(6)* %p0
   %r1 = load <4 x i32>, <4 x i32> addrspace(6)* %gep1
   %r = add <4 x i32> %r0, %r1
   %r2 = bitcast <4 x i32> %r to <4 x float>
   ret <4 x float> %r2
 }

 ; GCN-LABEL: {{^}}load_v8i32:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x10
 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x40
 define amdgpu_vs <8 x float> @load_v8i32(<8 x i32> addrspace(6)* inreg %p0, <8 x i32> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <8 x i32>, <8 x i32> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <8 x i32>, <8 x i32> addrspace(6)* %p1, i32 2
   %r0 = load <8 x i32>, <8 x i32> addrspace(6)* %p0
   %r1 = load <8 x i32>, <8 x i32> addrspace(6)* %gep1
   %r = add <8 x i32> %r0, %r1
   %r2 = bitcast <8 x i32> %r to <8 x float>
   ret <8 x float> %r2
 }

 ; GCN-LABEL: {{^}}load_v16i32:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x20
 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x80
 define amdgpu_vs <16 x float> @load_v16i32(<16 x i32> addrspace(6)* inreg %p0, <16 x i32> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <16 x i32>, <16 x i32> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <16 x i32>, <16 x i32> addrspace(6)* %p1, i32 2
   %r0 = load <16 x i32>, <16 x i32> addrspace(6)* %p0
   %r1 = load <16 x i32>, <16 x i32> addrspace(6)* %gep1
   %r = add <16 x i32> %r0, %r1
   %r2 = bitcast <16 x i32> %r to <16 x float>
   ret <16 x float> %r2
 }

 ; GCN-LABEL: {{^}}load_float:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0
 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x2
 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x8
 define amdgpu_vs float @load_float(float addrspace(6)* inreg %p0, float addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr float, float addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds float, float addrspace(6)* %p1, i32 2
   %r0 = load float, float addrspace(6)* %p0
   %r1 = load float, float addrspace(6)* %gep1
   %r = fadd float %r0, %r1
   ret float %r
 }

 ; GCN-LABEL: {{^}}load_v2float:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x4
 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x10
 define amdgpu_vs <2 x float> @load_v2float(<2 x float> addrspace(6)* inreg %p0, <2 x float> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <2 x float>, <2 x float> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <2 x float>, <2 x float> addrspace(6)* %p1, i32 2
   %r0 = load <2 x float>, <2 x float> addrspace(6)* %p0
   %r1 = load <2 x float>, <2 x float> addrspace(6)* %gep1
   %r = fadd <2 x float> %r0, %r1
   ret <2 x float> %r
 }

 ; GCN-LABEL: {{^}}load_v4float:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x8
 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x20
 define amdgpu_vs <4 x float> @load_v4float(<4 x float> addrspace(6)* inreg %p0, <4 x float> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <4 x float>, <4 x float> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <4 x float>, <4 x float> addrspace(6)* %p1, i32 2
   %r0 = load <4 x float>, <4 x float> addrspace(6)* %p0
   %r1 = load <4 x float>, <4 x float> addrspace(6)* %gep1
   %r = fadd <4 x float> %r0, %r1
   ret <4 x float> %r
 }

 ; GCN-LABEL: {{^}}load_v8float:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x10
 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x40
 define amdgpu_vs <8 x float> @load_v8float(<8 x float> addrspace(6)* inreg %p0, <8 x float> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <8 x float>, <8 x float> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <8 x float>, <8 x float> addrspace(6)* %p1, i32 2
   %r0 = load <8 x float>, <8 x float> addrspace(6)* %p0
   %r1 = load <8 x float>, <8 x float> addrspace(6)* %gep1
   %r = fadd <8 x float> %r0, %r1
   ret <8 x float> %r
 }

 ; GCN-LABEL: {{^}}load_v16float:
 ; GCN-DAG: s_mov_b32 s3, 0
 ; GCN-DAG: s_mov_b32 s2, s1
 ; GCN-DAG: s_mov_b32 s1, s3
 ; SICI-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0
 ; SICI-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x20
 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0
 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x80
 define amdgpu_vs <16 x float> @load_v16float(<16 x float> addrspace(6)* inreg %p0, <16 x float> addrspace(6)* inreg %p1) #0 {
-  %gep1 = getelementptr <16 x float>, <16 x float> addrspace(6)* %p1, i64 2
+  %gep1 = getelementptr inbounds <16 x float>, <16 x float> addrspace(6)* %p1, i32 2
   %r0 = load <16 x float>, <16 x float> addrspace(6)* %p0
   %r1 = load <16 x float>, <16 x float> addrspace(6)* %gep1
   %r = fadd <16 x float> %r0, %r1
   ret <16 x float> %r
 }

 ; GCN-LABEL: {{^}}load_i32_hi0:
 ; GCN: s_mov_b32 s1, 0
[34 lines not shown]
 ; GCN: s_load_dwordx8
 ; GCN-NEXT: s_load_dwordx4
 ; GCN: image_sample
 define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @load_sampler([0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), float inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, i32, i32, float, i32) #5 {
 main_body:
   %22 = call nsz float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %5) #8
   %23 = bitcast float %22 to i32
   %24 = shl i32 %23, 1
-  %25 = getelementptr [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24, !amdgpu.uniform !0
+  %25 = getelementptr inbounds [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24, !amdgpu.uniform !0
   %26 = load <8 x i32>, <8 x i32> addrspace(6)* %25, align 32, !invariant.load !0
   %27 = shl i32 %23, 2
   %28 = or i32 %27, 3
   %29 = bitcast [0 x <8 x i32>] addrspace(6)* %1 to [0 x <4 x i32>] addrspace(6)*
-  %30 = getelementptr [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28, !amdgpu.uniform !0
+  %30 = getelementptr inbounds [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28, !amdgpu.uniform !0
   %31 = load <4 x i32>, <4 x i32> addrspace(6)* %30, align 16, !invariant.load !0
   %32 = call nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float 0.0, <8 x i32> %26, <4 x i32> %31, i1 0, i32 0, i32 0) #8
   %33 = extractelement <4 x float> %32, i32 0
   %34 = extractelement <4 x float> %32, i32 1
   %35 = extractelement <4 x float> %32, i32 2
   %36 = extractelement <4 x float> %32, i32 3
   %37 = bitcast float %4 to i32
   %38 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> undef, i32 %37, 4
[12 lines not shown]
 ; GCN: s_load_dwordx8
 ; GCN-NEXT: s_load_dwordx4
 ; GCN: image_sample
 define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @load_sampler_nouniform([0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), float inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, i32, i32, float, i32) #5 {
 main_body:
   %22 = call nsz float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %5) #8
   %23 = bitcast float %22 to i32
   %24 = shl i32 %23, 1
-  %25 = getelementptr [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24
+  %25 = getelementptr inbounds [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24
   %26 = load <8 x i32>, <8 x i32> addrspace(6)* %25, align 32, !invariant.load !0
   %27 = shl i32 %23, 2
   %28 = or i32 %27, 3
   %29 = bitcast [0 x <8 x i32>] addrspace(6)* %1 to [0 x <4 x i32>] addrspace(6)*
-  %30 = getelementptr [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28
+  %30 = getelementptr inbounds [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28
   %31 = load <4 x i32>, <4 x i32> addrspace(6)* %30, align 16, !invariant.load !0
   %32 = call nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float 0.0, <8 x i32> %26, <4 x i32> %31, i1 0, i32 0, i32 0) #8
   %33 = extractelement <4 x float> %32, i32 0
   %34 = extractelement <4 x float> %32, i32 1
   %35 = extractelement <4 x float> %32, i32 2
   %36 = extractelement <4 x float> %32, i32 3
   %37 = bitcast float %4 to i32
   %38 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> undef, i32 %37, 4
   %39 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %38, float %33, 5
   %40 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %39, float %34, 6
   %41 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %40, float %35, 7
   %42 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %41, float %36, 8
   %43 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %42, float %20, 19
   ret <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %43
 }

+; GCN-LABEL: {{^}}load_addr_no_fold:
+; GCN-DAG: s_add_i32 s0, s0, 4
+; GCN-DAG: s_mov_b32 s1, 0
+; GCN: s_load_dword s{{[0-9]}}, s[0:1], 0x0
+define amdgpu_vs float @load_addr_no_fold(i32 addrspace(6)* inreg noalias %p0) #0 {
+  %gep1 = getelementptr i32, i32 addrspace(6)* %p0, i32 1
+  %r1 = load i32, i32 addrspace(6)* %gep1
+  %r2 = bitcast i32 %r1 to float
+  ret float %r2
+}

 ; Function Attrs: nounwind readnone speculatable
 declare float @llvm.amdgcn.interp.mov(i32, i32, i32, i32) #6

 ; Function Attrs: nounwind readonly
 declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #7


 !0 = !{}
[10 lines not shown]
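The immediates in the s_load_* CHECK lines of this test differ between the SICI and VIGFX9 prefixes because SI/CI encode the SMRD immediate in dwords while VI/GFX9 encode it in bytes. The sketch below reproduces the expected values from the GEP element size and index; `Gen`, `gepByteOffset` and `smrdImmediate` are illustrative names, not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Matches the FileCheck prefixes used in the test.
enum class Gen { SICI, VIGFX9 };

// Byte offset of element `index` for a GEP over elements of `elemBytes` bytes.
inline uint32_t gepByteOffset(uint32_t elemBytes, uint32_t index) {
  return elemBytes * index;
}

// Immediate as it appears in the expected s_load_* line:
// dword units on SI/CI, byte units on VI/GFX9.
inline uint32_t smrdImmediate(Gen gen, uint32_t byteOffset) {
  return gen == Gen::SICI ? byteOffset / 4 : byteOffset;
}
```

For example, index 2 into `<16 x i32>` (64 bytes per element) is byte offset 128, which matches the expected 0x20 on SICI and 0x80 on VIGFX9.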

AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes
Closed, Public