This is an archive of the discontinued LLVM Phabricator instance.

This isn't a correct fix. If there's an issue with 64-bit DS instructions, it's a lowering problem. If we can't use them for some reason, changing this here might be a helpful heuristic, but as-is this is not a real fix.

This revision now requires changes to proceed. (Oct 4 2018, 7:02 PM)

In D52907#1256202, @arsenm wrote:

This isn't a correct fix. If there's an issue with 64-bit DS instructions, it's a lowering problem. If we can't use them for some reason, changing this here might be a helpful heuristic but as-is this is not a real fix

So what are you suggesting?

What is it exactly that breaks with the merging?

Multi-dword LDS opcodes seem to be the culprit.

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

In D52907#1258426, @arsenm wrote:

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

In D52907#1258433, @mareko wrote:

In D52907#1258426, @arsenm wrote:

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known positive
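For readers unfamiliar with the instruction: ds_read2_b32 takes one base address register and two 8-bit unsigned offsets, each counted in 4-byte elements. The sketch below only illustrates that encoding constraint. `Read2Offsets` and `encodeRead2B32` are hypothetical names invented for this example, not LLVM APIs, and the sketch deliberately ignores the base-address sign condition discussed in this comment.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration; not part of LLVM.
struct Read2Offsets {
  bool Valid;
  uint8_t Offset0; // first offset, in dwords
  uint8_t Offset1; // second offset, in dwords
};

// Try to express two dword accesses at Base+Byte0 and Base+Byte1 as one
// ds_read2_b32. Each offset field is 8 bits wide and counts 4-byte elements,
// so both byte offsets must be multiples of 4 and at most 255 * 4.
constexpr Read2Offsets encodeRead2B32(uint32_t Byte0, uint32_t Byte1) {
  return (Byte0 % 4 == 0 && Byte1 % 4 == 0 && Byte0 / 4 <= 255 &&
          Byte1 / 4 <= 255)
             ? Read2Offsets{true, static_cast<uint8_t>(Byte0 / 4),
                            static_cast<uint8_t>(Byte1 / 4)}
             : Read2Offsets{false, 0, 0};
}

static_assert(encodeRead2B32(0, 4).Valid, "adjacent dwords are encodable");
static_assert(encodeRead2B32(0, 4).Offset1 == 1,
              "byte offset 4 becomes dword offset 1");
static_assert(!encodeRead2B32(0, 1024).Valid,
              "1024 / 4 = 256 does not fit in 8 bits");
```

The encoding check alone says nothing about whether the hardware adds the offset to the base correctly, which is the separate concern raised here.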

In D52907#1258465, @arsenm wrote:

In D52907#1258433, @mareko wrote:

In D52907#1258426, @arsenm wrote:

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known positive

Are you saying that on SI, if the base address (from VGPR) is negative, but (base address + offset) is in range, the instruction won't execute correctly? Is there documentation on this somewhere?

In D52907#1260080, @nhaehnle wrote:

In D52907#1258465, @arsenm wrote:

In D52907#1258433, @mareko wrote:

In D52907#1258426, @arsenm wrote:

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known positive

Are you saying that on SI, if the base address (from VGPR) is negative, but (base address + offset) is in range, the instruction won't execute correctly? Is there documentation on this somewhere?

The problem is that, specifically on SI, the adder for the offset is only 16 bits wide, so if a carry happens it computes the wrong address. The overly strong condition we use for this is that the base address is known positive (see isDSOffsetLegal).
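A small sketch may make the carry problem concrete. The `narrowAdd` function below is an assumed model of "the adder is only 16-bit" (the low 16 bits are added and a carry out of bit 15 is dropped). It is one reading of the comment above, not documented hardware behavior, and all names are made up for illustration.

```cpp
#include <cstdint>

// What the compiler assumes: a full 32-bit add of base and immediate offset.
constexpr uint32_t fullAdd(uint32_t Base, uint16_t Offset) {
  return Base + Offset;
}

// ASSUMED model of the SI behavior described above: only the low 16 bits are
// added, and a carry out of bit 15 never reaches the upper half.
constexpr uint32_t narrowAdd(uint32_t Base, uint16_t Offset) {
  return (Base & 0xFFFF0000u) | ((Base + Offset) & 0xFFFFu);
}

// Conservative test in the spirit of the "known positive" condition:
// only fold the offset when the sign bit of the base is clear.
constexpr bool signBitIsZero(uint32_t Base) { return (Base >> 31) == 0; }

static_assert(fullAdd(16, 4) == narrowAdd(16, 4), "no carry: both agree");
static_assert(fullAdd(0xFFFFFFFCu, 4) == 0u, "-4 + 4 wraps to 0 in 32 bits");
static_assert(narrowAdd(0xFFFFFFFCu, 4) == 0xFFFF0000u,
              "dropped carry leaves the high half set");
static_assert(!signBitIsZero(0xFFFFFFFCu), "a negative base is rejected");
```

Under this model, a base of -4 with an offset of 4 should address byte 0 but instead lands far outside LDS, which is the kind of mismatch a sign-bit check on the base would rule out.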

That was a good hint.

It turns out that there is a shader which unconditionally loads from lds_array[n] and lds_array[n+1] in a loop that starts with n == -1...
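To illustrate how such a loop produces a negative base, the sketch below computes the byte address of `lds_array[n]` for 4-byte elements. The array's LDS base address is a parameter because it is not stated in this thread; the value 0 used in the checks is an assumption, as are the function names.

```cpp
#include <cstdint>

// Byte address of lds_array[Index], assuming 4-byte elements and a
// caller-supplied (hypothetical) array base address in LDS.
constexpr uint32_t elementAddr(uint32_t ArrayBase, int32_t Index) {
  return ArrayBase + static_cast<uint32_t>(Index) * 4u;
}

// True when a merged access based at lds_array[N] starts from an address
// whose sign bit is set, even though lds_array[N + 1] may be a valid address.
constexpr bool mergedBaseIsNegative(uint32_t ArrayBase, int32_t N) {
  return (elementAddr(ArrayBase, N) >> 31) != 0;
}

// With the array assumed to sit at LDS address 0, the first iteration
// (n == -1) uses base -4 while the second element is at address 0.
static_assert(mergedBaseIsNegative(0, -1), "n == -1 gives a negative base");
static_assert(elementAddr(0, -1 + 1) == 0u, "lds_array[n + 1] is address 0");
static_assert(!mergedBaseIsNegative(0, 0), "later iterations are fine");
```

If the two loads are merged into one access with base `&lds_array[n]` and the second element reached through the offset field, the n == -1 iteration is exactly the negative-base case described above.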

This patch should be superseded by D53160.

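Revision Contents

Path                                                                Size
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp                     3 lines
test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll          15 lines
test/Transforms/LoadStoreVectorizer/AMDGPU/missing-alignment.ll     13 lines
test/Transforms/LoadStoreVectorizer/AMDGPU/multiple_tails.ll        19 lines
test/Transforms/LoadStoreVectorizer/AMDGPU/pointer-elements.ll      49 lines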

Diff 168378

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

[...]
 unsigned GCNTTIImpl::getLoadStoreVecRegBitWidth(unsigned AddrSpace) const {
   if (AddrSpace == AMDGPUAS::GLOBAL_ADDRESS ||
       AddrSpace == AMDGPUAS::CONSTANT_ADDRESS ||
       AddrSpace == AMDGPUAS::CONSTANT_ADDRESS_32BIT) {
     return 512;
   }

+  // 64-bit DS opcodes cause incorrect rendering in Hitman on SI.
   if (AddrSpace == AMDGPUAS::FLAT_ADDRESS ||
       AddrSpace == AMDGPUAS::LOCAL_ADDRESS ||
       AddrSpace == AMDGPUAS::REGION_ADDRESS)
-    return 128;
+    return ST->getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS ? 32 : 128;

   if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS)
     return 8 * ST->getMaxPrivateElementSize();

   llvm_unreachable("unhandled address space");
 }

 bool GCNTTIImpl::isLegalToVectorizeMemChain(unsigned ChainSizeInBytes,
[...]
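The effect of this change on the tests can be sketched with simplified stand-ins. The enum and function names below are illustrative only and do not match the LLVM classes. The assumption is that the returned register width caps how many same-sized elements one merged LDS access may cover, while the real vectorizer also weighs alignment and other legality checks.

```cpp
// Simplified stand-in for the subtarget generations; names are illustrative.
enum class Generation { SouthernIslands, SeaIslands, VolcanicIslands, GFX9 };

// Mirrors the patched return value for the LDS (local) address space only.
constexpr unsigned ldsVecRegBitWidth(Generation Gen) {
  return Gen == Generation::SouthernIslands ? 32 : 128;
}

// Upper bound on how many same-sized elements one merged access may cover,
// assuming the register width is the only limit.
constexpr unsigned maxElementsPerAccess(unsigned VecRegBits,
                                        unsigned ElemBits) {
  return VecRegBits / ElemBits;
}

// On SI a 32-bit limit leaves room for a single i32, so nothing is merged.
static_assert(maxElementsPerAccess(
                  ldsVecRegBitWidth(Generation::SouthernIslands), 32) == 1,
              "SI: one i32 per LDS access");
// Later generations keep the 128-bit limit, allowing up to <4 x i32>.
static_assert(maxElementsPerAccess(
                  ldsVecRegBitWidth(Generation::SeaIslands), 32) == 4,
              "CI+: up to four i32 per LDS access");
```

This is consistent with the updated tests, where the SI prefix expects scalar LDS loads and stores and the CIPLUS prefix keeps the vector forms.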

test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll

-; RUN: opt -mtriple=amdgcn-amd-amdhsa -load-store-vectorizer -S -o - %s | FileCheck %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,SI
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=bonaire -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
 ; Copy of test/CodeGen/AMDGPU/merge-stores.ll with some additions

 target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"

 ; TODO: Vector element tests
 ; TODO: Non-zero base offset for load and store combinations
 ; TODO: Same base addrspacecasted

[...]
 define amdgpu_kernel void @merge_local_store_2_constants_i8(i8 addrspace(3)* %out) #0 {
   %out.gep.1 = getelementptr i8, i8 addrspace(3)* %out, i32 1

   store i8 123, i8 addrspace(3)* %out.gep.1
   store i8 456, i8 addrspace(3)* %out, align 2
   ret void
 }

 ; CHECK-LABEL: @merge_local_store_2_constants_i32
-; CHECK: store <2 x i32> <i32 456, i32 123>, <2 x i32> addrspace(3)* %{{[0-9]+}}, align 4
+; SI: store i32
+; SI: store i32
+; CIPLUS: store <2 x i32> <i32 456, i32 123>, <2 x i32> addrspace(3)* %{{[0-9]+}}, align 4
 define amdgpu_kernel void @merge_local_store_2_constants_i32(i32 addrspace(3)* %out) #0 {
   %out.gep.1 = getelementptr i32, i32 addrspace(3)* %out, i32 1

   store i32 123, i32 addrspace(3)* %out.gep.1
   store i32 456, i32 addrspace(3)* %out
   ret void
 }

 ; CHECK-LABEL: @merge_local_store_2_constants_i32_align_2
 ; CHECK: store i32
 ; CHECK: store i32
 define amdgpu_kernel void @merge_local_store_2_constants_i32_align_2(i32 addrspace(3)* %out) #0 {
   %out.gep.1 = getelementptr i32, i32 addrspace(3)* %out, i32 1

   store i32 123, i32 addrspace(3)* %out.gep.1, align 2
   store i32 456, i32 addrspace(3)* %out, align 2
   ret void
 }

 ; CHECK-LABEL: @merge_local_store_4_constants_i32
-; CHECK: store <4 x i32> <i32 1234, i32 123, i32 456, i32 333>, <4 x i32> addrspace(3)*
+; SI: store i32
+; SI: store i32
+; SI: store i32
+; SI: store i32
+; CIPLUS: store <4 x i32> <i32 1234, i32 123, i32 456, i32 333>, <4 x i32> addrspace(3)*
 define amdgpu_kernel void @merge_local_store_4_constants_i32(i32 addrspace(3)* %out) #0 {
   %out.gep.1 = getelementptr i32, i32 addrspace(3)* %out, i32 1
   %out.gep.2 = getelementptr i32, i32 addrspace(3)* %out, i32 2
   %out.gep.3 = getelementptr i32, i32 addrspace(3)* %out, i32 3

   store i32 123, i32 addrspace(3)* %out.gep.1
   store i32 456, i32 addrspace(3)* %out.gep.2
   store i32 333, i32 addrspace(3)* %out.gep.3
[...]

test/Transforms/LoadStoreVectorizer/AMDGPU/missing-alignment.ll

-; RUN: opt -mtriple=amdgcn-- -load-store-vectorizer -S -o - %s | FileCheck %s
+; RUN: opt -mtriple=amdgcn-- -mcpu=tahiti -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,SI
+; RUN: opt -mtriple=amdgcn-- -mcpu=bonaire -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-- -mcpu=fiji -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-- -mcpu=gfx900 -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS

 target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"

 @lds = internal addrspace(3) global [512 x float] undef, align 4

 ; The original load has an implicit alignment of 4, and should not
 ; increase to an align 8 load.

 ; CHECK-LABEL: @load_keep_base_alignment_missing_align(
-; CHECK: load <2 x float>, <2 x float> addrspace(3)* %{{[0-9]+}}, align 4
+; SI: load float
+; SI: load float
+; CIPLUS: load <2 x float>, <2 x float> addrspace(3)* %{{[0-9]+}}, align 4
 define amdgpu_kernel void @load_keep_base_alignment_missing_align(float addrspace(1)* %out) {
   %ptr0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 11
   %val0 = load float, float addrspace(3)* %ptr0

   %ptr1 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 12
   %val1 = load float, float addrspace(3)* %ptr1
   %add = fadd float %val0, %val1
   store float %add, float addrspace(1)* %out
   ret void
 }


 ; CHECK-LABEL: @store_keep_base_alignment_missing_align(
-; CHECK: store <2 x float> zeroinitializer, <2 x float> addrspace(3)* %{{[0-9]+}}, align 4
+; SI: store float
+; SI: store float
+; CIPLUS: store <2 x float> zeroinitializer, <2 x float> addrspace(3)* %{{[0-9]+}}, align 4
 define amdgpu_kernel void @store_keep_base_alignment_missing_align() {
   %arrayidx0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 1
   %arrayidx1 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 2
   store float 0.0, float addrspace(3)* %arrayidx0
   store float 0.0, float addrspace(3)* %arrayidx1
   ret void
 }

test/Transforms/LoadStoreVectorizer/AMDGPU/multiple_tails.ll

-; RUN: opt -mtriple=amdgcn-amd-amdhsa -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,SI
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=bonaire -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS

 target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"

 ; Checks that there is no crash when there are multiple tails
 ; for a the same head starting a chain.
 @0 = internal addrspace(3) global [16384 x i32] undef

 ; CHECK-LABEL: @no_crash(
-; CHECK: store <2 x i32> zeroinitializer
+; SI: store i32 0
+; SI: store i32 0
+; CIPLUS: store <2 x i32> zeroinitializer
 ; CHECK: store i32 0
 ; CHECK: store i32 0

 define amdgpu_kernel void @no_crash(i32 %arg) {
   %tmp2 = add i32 %arg, 14
   %tmp3 = getelementptr [16384 x i32], [16384 x i32] addrspace(3)* @0, i32 0, i32 %tmp2
   %tmp4 = add i32 %arg, 15
   %tmp5 = getelementptr [16384 x i32], [16384 x i32] addrspace(3)* @0, i32 0, i32 %tmp4

   store i32 0, i32 addrspace(3)* %tmp3, align 4
   store i32 0, i32 addrspace(3)* %tmp5, align 4
   store i32 0, i32 addrspace(3)* %tmp5, align 4
   store i32 0, i32 addrspace(3)* %tmp5, align 4

   ret void
 }

 ; Check adjiacent memory locations are properly matched and the
 ; longest chain vectorized

 ; CHECK-LABEL: @interleave_get_longest
-; CHECK: load <4 x i32>
+; SI: load i32
+; SI: load i32
+; SI: store i32
+; SI: store i32
+; SI: load i32
+; SI: load i32
+; CIPLUS: load <4 x i32>
 ; CHECK: load i32
-; CHECK: store <2 x i32> zeroinitializer
+; CIPLUS: store <2 x i32> zeroinitializer
 ; CHECK: load i32
 ; CHECK: load i32
 ; CHECK: load i32

 define amdgpu_kernel void @interleave_get_longest(i32 %arg) {
   %a1 = add i32 %arg, 1
   %a2 = add i32 %arg, 2
   %a3 = add i32 %arg, 3
[...]

test/Transforms/LoadStoreVectorizer/AMDGPU/pointer-elements.ll

-; RUN: opt -mtriple=amdgcn-amd-amdhsa -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,SI
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=bonaire -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s --check-prefixes=CHECK,CIPLUS

 target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"

 declare i32 @llvm.amdgcn.workitem.id.x() #1

 ; CHECK-LABEL: @merge_v2p1i8(
 ; CHECK: load <2 x i64>
 ; CHECK: inttoptr i64 %{{[^ ]+}} to i8 addrspace(1)*
[...]
 entry:

   store i8 addrspace(1)* null, i8 addrspace(1)* addrspace(1)* %a, align 4
   store i8 addrspace(1)* null, i8 addrspace(1)* addrspace(1)* %a.1, align 4

   ret void
 }

 ; CHECK-LABEL: @merge_v2p3i8(
-; CHECK: load <2 x i32>
-; CHECK: inttoptr i32 %{{[^ ]+}} to i8 addrspace(3)*
-; CHECK: inttoptr i32 %{{[^ ]+}} to i8 addrspace(3)*
-; CHECK: store <2 x i32> zeroinitializer
+; SI: load i8
+; SI: load i8
+; SI: store i8
+; SI: store i8
+; CIPLUS: load <2 x i32>
+; CIPLUS: inttoptr i32 %{{[^ ]+}} to i8 addrspace(3)*
+; CIPLUS: inttoptr i32 %{{[^ ]+}} to i8 addrspace(3)*
+; CIPLUS: store <2 x i32> zeroinitializer
 define amdgpu_kernel void @merge_v2p3i8(i8 addrspace(3)* addrspace(3)* nocapture %a, i8 addrspace(3)* addrspace(3)* nocapture readonly %b) #0 {
 entry:
   %a.1 = getelementptr inbounds i8 addrspace(3)*, i8 addrspace(3)* addrspace(3)* %a, i64 1
   %b.1 = getelementptr inbounds i8 addrspace(3)*, i8 addrspace(3)* addrspace(3)* %b, i64 1

   %ld.c = load i8 addrspace(3)*, i8 addrspace(3)* addrspace(3)* %b, align 4
   %ld.c.idx.1 = load i8 addrspace(3)*, i8 addrspace(3)* addrspace(3)* %b.1, align 4

[...]
 entry:

   store i64 %val0, i64 addrspace(1)* %a.cast
   store i8 addrspace(1)* %ptr1, i8 addrspace(1)* addrspace(1)* %a.1

   ret void
 }

 ; CHECK-LABEL: @merge_load_i32_ptr32(
-; CHECK: load <2 x i32>
-; CHECK: [[ELT1:%[^ ]+]] = extractelement <2 x i32> %{{[^ ]+}}, i32 1
-; CHECK: inttoptr i32 [[ELT1]] to i8 addrspace(3)*
+; SI: load i32
+; SI: load i8
+; CIPLUS: load <2 x i32>
+; CIPLUS: [[ELT1:%[^ ]+]] = extractelement <2 x i32> %{{[^ ]+}}, i32 1
+; CIPLUS: inttoptr i32 [[ELT1]] to i8 addrspace(3)*
 define amdgpu_kernel void @merge_load_i32_ptr32(i32 addrspace(3)* nocapture %a) #0 {
 entry:
   %a.1 = getelementptr inbounds i32, i32 addrspace(3)* %a, i32 1
   %a.1.cast = bitcast i32 addrspace(3)* %a.1 to i8 addrspace(3)* addrspace(3)*

   %ld.0 = load i32, i32 addrspace(3)* %a
   %ld.1 = load i8 addrspace(3)*, i8 addrspace(3)* addrspace(3)* %a.1.cast

   ret void
 }

 ; CHECK-LABEL: @merge_load_ptr32_i32(
-; CHECK: load <2 x i32>
-; CHECK: [[ELT0:%[^ ]+]] = extractelement <2 x i32> %{{[^ ]+}}, i32 0
-; CHECK: inttoptr i32 [[ELT0]] to i8 addrspace(3)*
+; SI: load i8
+; SI: load i32
+; CIPLUS: load <2 x i32>
+; CIPLUS: [[ELT0:%[^ ]+]] = extractelement <2 x i32> %{{[^ ]+}}, i32 0
+; CIPLUS: inttoptr i32 [[ELT0]] to i8 addrspace(3)*
 define amdgpu_kernel void @merge_load_ptr32_i32(i32 addrspace(3)* nocapture %a) #0 {
 entry:
   %a.cast = bitcast i32 addrspace(3)* %a to i8 addrspace(3)* addrspace(3)*
   %a.1 = getelementptr inbounds i32, i32 addrspace(3)* %a, i32 1

   %ld.0 = load i8 addrspace(3)*, i8 addrspace(3)* addrspace(3)* %a.cast
   %ld.1 = load i32, i32 addrspace(3)* %a.1

   ret void
 }

 ; CHECK-LABEL: @merge_store_ptr32_i32(
-; CHECK: [[ELT0:%[^ ]+]] = ptrtoint i8 addrspace(3)* %ptr0 to i32
-; CHECK: insertelement <2 x i32> undef, i32 [[ELT0]], i32 0
-; CHECK: store <2 x i32>
+; SI: store i8
+; SI: store i32
+; CIPLUS: [[ELT0:%[^ ]+]] = ptrtoint i8 addrspace(3)* %ptr0 to i32
+; CIPLUS: insertelement <2 x i32> undef, i32 [[ELT0]], i32 0
+; CIPLUS: store <2 x i32>
 define amdgpu_kernel void @merge_store_ptr32_i32(i32 addrspace(3)* nocapture %a, i8 addrspace(3)* %ptr0, i32 %val1) #0 {
 entry:
   %a.cast = bitcast i32 addrspace(3)* %a to i8 addrspace(3)* addrspace(3)*
   %a.1 = getelementptr inbounds i32, i32 addrspace(3)* %a, i32 1

   store i8 addrspace(3)* %ptr0, i8 addrspace(3)* addrspace(3)* %a.cast
   store i32 %val1, i32 addrspace(3)* %a.1

   ret void
 }

 ; CHECK-LABEL: @merge_store_i32_ptr32(
-; CHECK: [[ELT1:%[^ ]+]] = ptrtoint i8 addrspace(3)* %ptr1 to i32
-; CHECK: insertelement <2 x i32> %{{[^ ]+}}, i32 [[ELT1]], i32 1
-; CHECK: store <2 x i32>
+; SI: store i32
+; SI: store i8
+; CIPLUS: [[ELT1:%[^ ]+]] = ptrtoint i8 addrspace(3)* %ptr1 to i32
+; CIPLUS: insertelement <2 x i32> %{{[^ ]+}}, i32 [[ELT1]], i32 1
+; CIPLUS: store <2 x i32>
 define amdgpu_kernel void @merge_store_i32_ptr32(i8 addrspace(3)* addrspace(3)* nocapture %a, i32 %val0, i8 addrspace(3)* %ptr1) #0 {
 entry:
   %a.1 = getelementptr inbounds i8 addrspace(3)*, i8 addrspace(3)* addrspace(3)* %a, i32 1
   %a.cast = bitcast i8 addrspace(3)* addrspace(3)* %a to i32 addrspace(3)*

   store i32 %val0, i32 addrspace(3)* %a.cast
   store i8 addrspace(3)* %ptr1, i8 addrspace(3)* addrspace(3)* %a.1

[...]

AMDGPU: Don't merge DS opcodes on SI to fix corruption in Hitman
Abandoned, Public
