This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid splitting FLAT offsets in unsafe ways
Closed · Public

Authored by foad on Jul 8 2020, 6:51 AM.


Details

Reviewers

arsenm
sameerds
nhaehnle
rampitec

Commits

rG760af7a07432: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways

Summary

As explained in the comment:

For a FLAT instruction the hardware decides whether to access
global/scratch/shared memory based on the high bits of vaddr,
ignoring the offset field, so we have to ensure that when we add
remainder to vaddr it still points into the same underlying object.
The easiest way to do that is to make sure that we split the offset
into two pieces that are both >= 0 or both <= 0.

In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have
an unsigned immediate offset field, so we can't use it to help split a
negative offset.
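The rule can be illustrated with a small standalone sketch that mirrors the new code in the AMDGPUISelDAGToDAG.cpp diff. It is not the LLVM implementation. splitFlatOffset and OffsetSplit are hypothetical names, NumBits stands in for what TII->getNumFlatOffsetBits(AS, IsSigned) would return, and a plain mask replaces maskTrailingOnes. The bit widths in the examples are illustrative only.

```cpp
#include <cstdint>

// Hypothetical result type: the part added to vaddr and the part encoded
// in the instruction's immediate offset field.
struct OffsetSplit {
  uint64_t Remainder;
  uint64_t Imm;
};

// Sketch of the split described above (hypothetical helper, not LLVM API).
// NumBits stands in for TII->getNumFlatOffsetBits(AS, IsSigned).
// Invariant: Remainder + Imm == Offset, and read as int64_t the two parts
// are never of opposite sign.
constexpr OffsetSplit splitFlatOffset(uint64_t Offset, unsigned NumBits,
                                      bool IsSigned) {
  uint64_t Remainder = Offset; // default: everything goes into vaddr
  uint64_t Imm = 0;
  if (IsSigned) {
    // Signed division by a power of two truncates towards zero, so the
    // leftover Imm has the same sign as Offset (or is zero).
    const int64_t D = int64_t(1) << (NumBits - 1);
    Remainder = static_cast<uint64_t>((static_cast<int64_t>(Offset) / D) * D);
    Imm = Offset - Remainder;
  } else if (static_cast<int64_t>(Offset) >= 0) {
    // Unsigned field: only a non-negative offset can be split.
    Imm = Offset & ((uint64_t(1) << NumBits) - 1);
    Remainder = Offset - Imm;
  }
  // A negative offset with an unsigned field stays entirely in Remainder.
  return OffsetSplit{Remainder, Imm};
}
```

For example, with a 12-bit unsigned field an offset of -2 is no longer split into -4096 + 4094 (the old gfx9 checks in flat-address-space.ll); the whole offset is added to vaddr and the immediate is 0.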

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Jul 8 2020, 6:51 AM

Herald added a project: Restricted Project. Jul 8 2020, 6:51 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 8 others.

JonChesterfield added a subscriber: JonChesterfield. Jul 8 2020, 7:11 AM

The mirror change is needed for globalisel

arsenm added inline comments. Jul 8 2020, 7:28 AM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	This limitation also only needs to be applied if AS == FLAT_ADDRESS

foad marked an inline comment as done. Jul 8 2020, 7:33 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	The only "limitation" is that we don't try to split negative offsets if the immediate offset field is unsigned, but you're saying we can do that if AS != FLAT_ADDRESS? What would that mean - that we're using a FLAT instruction but we know statically which part of the address space it is accessing??

arsenm added inline comments. Jul 8 2020, 7:42 AM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	Correct. This is always the case pre-gfx9 which did not have the "global" flat instructions

Harbormaster completed remote builds in B63408: Diff 276412. Jul 8 2020, 7:43 AM

arsenm added inline comments. Jul 8 2020, 7:46 AM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	Actually pre-gfx9 also didn't have flat offsets. However gfx10 does have a bug with flat offsets, so I think it would still be correct to model this correctly. The instruction patterns do accept either (and global instructions are only preferred through pattern priority)

> The mirror change is needed for globalisel

AMDGPUInstructionSelector::selectFlatOffsetImpl doesn't attempt to split offsets so I don't think there's anything to fix.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	> This limitation also only needs to be applied if AS == FLAT_ADDRESS

I still don't get this. Surely if we're using a FLAT instruction, even if we know which specific address space the programmer is trying to access, we still have to avoid setting vaddr to an address that might point into the wrong aperture.

foad added a reviewer: nhaehnle. Jul 8 2020, 11:13 AM

foad added a reviewer: rampitec. Jul 9 2020, 9:46 AM

Rebase.
Fix silly mistake in checking for negative offsets.

arsenm added inline comments. Jul 9 2020, 11:47 AM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	My understanding was the aperture only means anything for private or local. If it's a global address, it's neither aperture and behaves as a normal instruction (i.e. there's no aperture for global pointers)

foad marked an inline comment as done. Jul 9 2020, 11:51 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	But in that case, you should still avoid making drastic changes to vaddr in case it ends up accidentally pointing into one of the apertures, when you wanted a global access. E.g. if you're accessing a global that happens to be just past the end of the private or local aperture.

Harbormaster failed remote builds in B63618: Diff 276792! Jul 9 2020, 12:09 PM

arsenm added inline comments. Jul 9 2020, 2:27 PM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	I thought Nicolai mentioned this might not be possible? I guess I don't understand why the aperture is so complicated and not just a couple of flags in the high, unused bits

In D83394#2142277, @foad wrote:

> Rebase.
> Fix silly mistake in checking for negative offsets.

It's hard to see through the rebase, but did fixing the negative offset check add more tests? I assume that the tests in the original patch did not capture this mistake, so it should warrant a new test.

In D83394#2143122, @sameerds wrote:

> It's hard to see through the rebase, but did fixing the negative offset check add more tests? I assume that the tests in the original patch did not capture this mistake, so it should warrant a new test.

There are no new tests. All the testing comes from staring at the changes in offset-split-flat.ll and offset-split-global.ll.

In retrospect I should have noticed that there was something wrong with the original patch, because it didn't cause any changes in offset-split-flat.ll.

foad marked an inline comment as done. Jul 14 2020, 1:14 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	I don't remember hearing that it wasn't possible. @nhaehnle ?

Ping.

arsenm added inline comments. Jul 15 2020, 3:09 PM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	maskTrailingZeros

foad marked an inline comment as done. Jul 16 2020, 12:30 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1705	That would round towards -infinity. As the comment says, we're deliberately rounding towards zero instead.
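A sketch of why the rounding direction matters for a signed offset field. Both helper names are hypothetical: roundByMask clears the low bits (the effect of masking with a maskTrailingZeros-style mask, assuming that is how the suggestion would have been applied), while roundByDivision uses signed division as the patch does. D = 2^(NumBits - 1) as in the patch; the 13-bit width and the offset -5000 are arbitrary examples.

```cpp
#include <cstdint>

// Clearing the low bits rounds towards -infinity for negative offsets
// (hypothetical stand-in for a mask-based approach).
constexpr int64_t roundByMask(int64_t Offset, unsigned NumBits) {
  const int64_t D = int64_t(1) << (NumBits - 1);
  return Offset & ~(D - 1);
}

// Signed division by a power of two truncates towards zero,
// which is the rounding the patch relies on.
constexpr int64_t roundByDivision(int64_t Offset, unsigned NumBits) {
  const int64_t D = int64_t(1) << (NumBits - 1);
  return (Offset / D) * D;
}
```

For non-negative offsets the two helpers agree. For -5000, masking gives a remainder of -8192 and leaves a positive immediate of 3192, the mixed-sign split the patch avoids; division gives -4096 and an immediate of -904.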

arsenm accepted this revision. Jul 16 2020, 11:24 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1698	Can you add a fixme to check if this is needed if we know the address isn't FLAT_ADDRESS?

This revision is now accepted and ready to land. Jul 16 2020, 11:24 AM

nhaehnle added inline comments. Jul 16 2020, 12:38 PM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1698	Yes, I'm pretty sure it's needed. If we generate a FLAT instruction, the hardware will check apertures regardless, and you could run into the following scenario:

end_of_scratch_aperture: 0x2'0000'0000
vaddr: 0x1'ffff'fff0
inst_offset: 32

The hardware executes the flat instruction, sees that vaddr falls into the scratch aperture, and executes the instruction as accessing scratch memory. The only way to fix that would be if we knew that there is never any memory mapped directly after the end of any of the apertures.
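The scenario can be checked with plain arithmetic. This is a deliberately simplified model, not a description of the real hardware: the aperture check is reduced to a range test, and the scratch aperture base (0x1'0000'0000) is a made-up value. Only the end address, vaddr and inst_offset come from the comment above; all names in the code are hypothetical.

```cpp
#include <cstdint>

// Hypothetical aperture bounds: ScratchEnd is from the scenario above,
// ScratchBase is invented for illustration only.
constexpr uint64_t ScratchBase = 0x1'0000'0000;
constexpr uint64_t ScratchEnd = 0x2'0000'0000;

constexpr bool inScratchAperture(uint64_t Addr) {
  return Addr >= ScratchBase && Addr < ScratchEnd;
}

// Simplified model of a FLAT access: the segment is chosen from vaddr
// alone, but the address actually accessed is vaddr + inst_offset.
struct FlatAccess {
  bool UsesScratch;
  uint64_t Address;
};

constexpr FlatAccess modelFlatAccess(uint64_t VAddr, uint64_t InstOffset) {
  return FlatAccess{inScratchAperture(VAddr), VAddr + InstOffset};
}

// Values from the scenario above.
constexpr uint64_t ExampleVAddr = 0x1'ffff'fff0;
constexpr uint64_t ExampleInstOffset = 32;
```

In this model the access is treated as scratch even though the address actually accessed, 0x2'0000'0010, lies past the end of the scratch aperture. If the 32 bytes were added to vaddr instead (inst_offset 0), the model picks the non-scratch segment, consistent with the accessed address.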

Closed by commit rG760af7a07432: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways (authored by foad). Jul 17 2020, 3:46 AM

This revision was automatically updated to reflect the committed changes.

I'm seeing (difficult to minimise) failure modes with gfx10 that look the same as those on gfx9 before this patch. I also see the comment:

"gfx10 does have a bug with flat offsets"

Is there a corresponding patch needed for gfx10?

In D83394#2305735, @JonChesterfield wrote:

> I'm seeing (difficult to minimise) failure modes with gfx10 that look the same as those on gfx9 before this patch. I also see the comment:
>
> "gfx10 does have a bug with flat offsets"
>
> Is there a corresponding patch needed for gfx10?

This patch affected gfx10 just as much as gfx9, so there is no extra patch required for gfx10.

I am confused about the gfx10 bug. I assume it refers to this:

def FeatureFlatSegmentOffsetBug : SubtargetFeature<"flat-segment-offset-bug",
  "HasFlatSegmentOffsetBug",
  "true",
  "GFX10 bug, inst_offset ignored in flat segment"
>;

But (a) I think this is just the intended behaviour of the hardware, so I wouldn't call it a bug, and (b) I think gfx9 works the same way, otherwise there would have been no need for this patch in the first place!

Update: actually I think there is a gfx10-only hardware bug, but the description of that subtarget feature doesn't describe it very well.

Revision Contents

Path                                                      Size
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp             38 lines
llvm/test/CodeGen/AMDGPU/flat-address-space.ll            8 lines
llvm/test/CodeGen/AMDGPU/offset-split-flat.ll             43 lines
llvm/test/CodeGen/AMDGPU/offset-split-global.ll           51 lines
llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll    20 lines
llvm/test/CodeGen/AMDGPU/store-hi16.ll                    12 lines

Diff 278712

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[1,682 lines not shown; enclosing scope: if (N0 && N1) {]
       const SIInstrInfo *TII = Subtarget->getInstrInfo();
       unsigned AS = findMemSDNode(N)->getAddressSpace();
       if (TII->isLegalFLATOffset(COffsetVal, AS, IsSigned)) {
         Addr = N0;
         OffsetVal = COffsetVal;
       } else {
         // If the offset doesn't fit, put the low bits into the offset field and
         // add the rest.
+        //
+        // For a FLAT instruction the hardware decides whether to access
+        // global/scratch/shared memory based on the high bits of vaddr,
+        // ignoring the offset field, so we have to ensure that when we add
+        // remainder to vaddr it still points into the same underlying object.
+        // The easiest way to do that is to make sure that we split the offset
+        // into two pieces that are both >= 0 or both <= 0.
 
         SDLoc DL(N);
-        uint64_t ImmField;
+        uint64_t RemainderOffset = COffsetVal;
+        uint64_t ImmField = 0;
         const unsigned NumBits = TII->getNumFlatOffsetBits(AS, IsSigned);
         if (IsSigned) {
-          ImmField = SignExtend64(COffsetVal, NumBits);
-
-          // Don't use a negative offset field if the base offset is positive.
-          // Since the scheduler currently relies on the offset field, doing so
-          // could result in strange scheduling decisions.
-
-          // TODO: Should we not do this in the opposite direction as well?
-          if (static_cast<int64_t>(COffsetVal) > 0) {
-            if (static_cast<int64_t>(ImmField) < 0) {
-              const uint64_t OffsetMask =
-                  maskTrailingOnes<uint64_t>(NumBits - 1);
-              ImmField = COffsetVal & OffsetMask;
-            }
-          }
-        } else {
-          // TODO: Should we do this for a negative offset?
-          const uint64_t OffsetMask = maskTrailingOnes<uint64_t>(NumBits);
-          ImmField = COffsetVal & OffsetMask;
+          // Use signed division by a power of two to truncate towards 0.
+          int64_t D = 1LL << (NumBits - 1);
+          RemainderOffset = (static_cast<int64_t>(COffsetVal) / D) * D;
+          ImmField = COffsetVal - RemainderOffset;
+        } else if (static_cast<int64_t>(COffsetVal) >= 0) {
+          ImmField = COffsetVal & maskTrailingOnes<uint64_t>(NumBits);
+          RemainderOffset = COffsetVal - ImmField;
         }
-
-        uint64_t RemainderOffset = COffsetVal - ImmField;
-
         assert(TII->isLegalFLATOffset(ImmField, AS, IsSigned));
         assert(RemainderOffset + ImmField == COffsetVal);
 
         OffsetVal = ImmField;
 
         // TODO: Should this try to use a scalar add pseudo if the base address
         // is uniform and saddr is usable?
         SDValue Sub0 = CurDAG->getTargetConstant(AMDGPU::sub0, DL, MVT::i32);
[1,215 lines not shown]

llvm/test/CodeGen/AMDGPU/flat-address-space.ll

[185 lines not shown; context: define amdgpu_kernel void @store_flat_i8_max_offset_p1(i8* %fptr, i8 %x) #0 {]
%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 4096		%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 4096
store volatile i8 %x, i8* %fptr.offset		store volatile i8 %x, i8* %fptr.offset
ret void		ret void
}		}

; CHECK-LABEL: {{^}}store_flat_i8_neg_offset:		; CHECK-LABEL: {{^}}store_flat_i8_neg_offset:
; CIVI: flat_store_byte v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}{{$}}		; CIVI: flat_store_byte v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}{{$}}

; GFX9: v_add_co_u32_e32 v{{[0-9]+}}, vcc, 0xfffff000, v		; GFX9: v_add_co_u32_e64 v{{[0-9]+}}, vcc, -2, s
; GFX9: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1,		; GFX9: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1,
; GFX9: flat_store_byte v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset:4094{{$}}		; GFX9: flat_store_byte v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}{{$}}
define amdgpu_kernel void @store_flat_i8_neg_offset(i8* %fptr, i8 %x) #0 {		define amdgpu_kernel void @store_flat_i8_neg_offset(i8* %fptr, i8 %x) #0 {
%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 -2		%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 -2
store volatile i8 %x, i8* %fptr.offset		store volatile i8 %x, i8* %fptr.offset
ret void		ret void
}		}

; CHECK-LABEL: {{^}}load_flat_i8_max_offset:		; CHECK-LABEL: {{^}}load_flat_i8_max_offset:
; CIVI: flat_load_ubyte v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}{{$}}		; CIVI: flat_load_ubyte v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}{{$}}
[10 lines not shown; context: define amdgpu_kernel void @load_flat_i8_max_offset_p1(i8* %fptr) #0 {]
%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 4096		%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 4096
%val = load volatile i8, i8* %fptr.offset		%val = load volatile i8, i8* %fptr.offset
ret void		ret void
}		}

; CHECK-LABEL: {{^}}load_flat_i8_neg_offset:		; CHECK-LABEL: {{^}}load_flat_i8_neg_offset:
; CIVI: flat_load_ubyte v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}{{$}}		; CIVI: flat_load_ubyte v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}{{$}}

; GFX9: v_add_co_u32_e32 v{{[0-9]+}}, vcc, 0xfffff000, v		; GFX9: v_add_co_u32_e64 v{{[0-9]+}}, vcc, -2, s
; GFX9: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1,		; GFX9: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1,
; GFX9: flat_load_ubyte v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}} offset:4094{{$}}		; GFX9: flat_load_ubyte v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}{{$}}
define amdgpu_kernel void @load_flat_i8_neg_offset(i8* %fptr) #0 {		define amdgpu_kernel void @load_flat_i8_neg_offset(i8* %fptr) #0 {
%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 -2		%fptr.offset = getelementptr inbounds i8, i8* %fptr, i64 -2
%val = load volatile i8, i8* %fptr.offset		%val = load volatile i8, i8* %fptr.offset
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { nounwind convergent }		attributes #1 = { nounwind convergent }

llvm/test/CodeGen/AMDGPU/offset-split-flat.ll

[97 lines not shown; context: ; GFX10-NEXT: s_setpc_b64 s[30:31]]
%load = load i8, i8* %gep, align 4		%load = load i8, i8* %gep, align 4
ret i8 %load		ret i8 %load
}		}

define i8 @flat_inst_valu_offset_neg_11bit_max(i8* %p) {		define i8 @flat_inst_valu_offset_neg_11bit_max(i8* %p) {
; GFX9-LABEL: flat_inst_valu_offset_neg_11bit_max:		; GFX9-LABEL: flat_inst_valu_offset_neg_11bit_max:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0xfffff000, v0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0xfffff800, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, -1, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, -1, v1, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:2048		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: flat_inst_valu_offset_neg_11bit_max:		; GFX10-LABEL: flat_inst_valu_offset_neg_11bit_max:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0xfffff800, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0xfffff800, v0
[361 lines not shown; context: ; GFX10-NEXT: s_setpc_b64 s[30:31]]
ret i8 %load		ret i8 %load
}		}

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047
define i8 @flat_inst_valu_offset_64bit_11bit_neg_high_split0(i8* %p) {		define i8 @flat_inst_valu_offset_64bit_11bit_neg_high_split0(i8* %p) {
; GFX9-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split0:		; GFX9-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x7ff, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:2047		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split0:		; GFX10-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x7ff, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x7ff, v0
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo		; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo
; GFX10-NEXT: flat_load_ubyte v0, v[0:1]		; GFX10-NEXT: flat_load_ubyte v0, v[0:1]
; GFX10-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%gep = getelementptr i8, i8* %p, i64 -9223372036854773761		%gep = getelementptr i8, i8* %p, i64 -9223372036854773761
%load = load i8, i8* %gep, align 4		%load = load i8, i8* %gep, align 4
ret i8 %load		ret i8 %load
}		}

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048
define i8 @flat_inst_valu_offset_64bit_11bit_neg_high_split1(i8* %p) {		define i8 @flat_inst_valu_offset_64bit_11bit_neg_high_split1(i8* %p) {
; GFX9-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split1:		; GFX9-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x800, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:2048		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split1:		; GFX10-LABEL: flat_inst_valu_offset_64bit_11bit_neg_high_split1:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x800, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x800, v0
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo		; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo
; GFX10-NEXT: flat_load_ubyte v0, v[0:1]		; GFX10-NEXT: flat_load_ubyte v0, v[0:1]
; GFX10-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%gep = getelementptr i8, i8* %p, i64 -9223372036854773760		%gep = getelementptr i8, i8* %p, i64 -9223372036854773760
%load = load i8, i8* %gep, align 4		%load = load i8, i8* %gep, align 4
ret i8 %load		ret i8 %load
}		}

; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095		; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095
define i8 @flat_inst_valu_offset_64bit_12bit_neg_high_split0(i8* %p) {		define i8 @flat_inst_valu_offset_64bit_12bit_neg_high_split0(i8* %p) {
; GFX9-LABEL: flat_inst_valu_offset_64bit_12bit_neg_high_split0:		; GFX9-LABEL: flat_inst_valu_offset_64bit_12bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0xfff, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:4095		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: flat_inst_valu_offset_64bit_12bit_neg_high_split0:		; GFX10-LABEL: flat_inst_valu_offset_64bit_12bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0xfff, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0xfff, v0
[34 lines not shown; context: ; GFX10-NEXT: s_setpc_b64 s[30:31]]
ret i8 %load		ret i8 %load
}		}

; Fill 13-bit low-bits, negative high bits (1ull << 63) | 8191		; Fill 13-bit low-bits, negative high bits (1ull << 63) | 8191
define i8 @flat_inst_valu_offset_64bit_13bit_neg_high_split0(i8* %p) {		define i8 @flat_inst_valu_offset_64bit_13bit_neg_high_split0(i8* %p) {
; GFX9-LABEL: flat_inst_valu_offset_64bit_13bit_neg_high_split0:		; GFX9-LABEL: flat_inst_valu_offset_64bit_13bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1fff, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:4095		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: flat_inst_valu_offset_64bit_13bit_neg_high_split0:		; GFX10-LABEL: flat_inst_valu_offset_64bit_13bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1fff, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1fff, v0
[162 lines not shown]

define amdgpu_kernel void @flat_inst_salu_offset_neg_11bit_max(i8* %p) {		define amdgpu_kernel void @flat_inst_salu_offset_neg_11bit_max(i8* %p) {
; GFX9-LABEL: flat_inst_salu_offset_neg_11bit_max:		; GFX9-LABEL: flat_inst_salu_offset_neg_11bit_max:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0xfffff000, v0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0xfffff800, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, -1, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, -1, v1, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:2048		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: flat_store_byte v[0:1], v0		; GFX9-NEXT: flat_store_byte v[0:1], v0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: flat_inst_salu_offset_neg_11bit_max:		; GFX10-LABEL: flat_inst_salu_offset_neg_11bit_max:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
[477 lines not shown]

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047
define amdgpu_kernel void @flat_inst_salu_offset_64bit_11bit_neg_high_split0(i8* %p) {		define amdgpu_kernel void @flat_inst_salu_offset_64bit_11bit_neg_high_split0(i8* %p) {
; GFX9-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split0:		; GFX9-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e64 v0, vcc, 0, s0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x7ff, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:2047		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: flat_store_byte v[0:1], v0		; GFX9-NEXT: flat_store_byte v[0:1], v0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split0:		; GFX10-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
[14 lines not shown]

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048
define amdgpu_kernel void @flat_inst_salu_offset_64bit_11bit_neg_high_split1(i8* %p) {		define amdgpu_kernel void @flat_inst_salu_offset_64bit_11bit_neg_high_split1(i8* %p) {
; GFX9-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split1:		; GFX9-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e64 v0, vcc, 0, s0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x800, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:2048		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: flat_store_byte v[0:1], v0		; GFX9-NEXT: flat_store_byte v[0:1], v0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split1:		; GFX10-LABEL: flat_inst_salu_offset_64bit_11bit_neg_high_split1:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi

; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095		; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095
define amdgpu_kernel void @flat_inst_salu_offset_64bit_12bit_neg_high_split0(i8* %p) {		define amdgpu_kernel void @flat_inst_salu_offset_64bit_12bit_neg_high_split0(i8* %p) {
; GFX9-LABEL: flat_inst_salu_offset_64bit_12bit_neg_high_split0:		; GFX9-LABEL: flat_inst_salu_offset_64bit_12bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e64 v0, vcc, 0, s0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0xfff, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:4095		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: flat_store_byte v[0:1], v0		; GFX9-NEXT: flat_store_byte v[0:1], v0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: flat_inst_salu_offset_64bit_12bit_neg_high_split0:		; GFX10-LABEL: flat_inst_salu_offset_64bit_12bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
define amdgpu_kernel void @flat_inst_salu_offset_64bit_13bit_neg_high_split0(i8* %p) {		define amdgpu_kernel void @flat_inst_salu_offset_64bit_13bit_neg_high_split0(i8* %p) {
; GFX9-LABEL: flat_inst_salu_offset_64bit_13bit_neg_high_split0:		; GFX9-LABEL: flat_inst_salu_offset_64bit_13bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1fff, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: flat_load_ubyte v0, v[0:1] offset:4095		; GFX9-NEXT: flat_load_ubyte v0, v[0:1]
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: flat_store_byte v[0:1], v0		; GFX9-NEXT: flat_store_byte v[0:1], v0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: flat_inst_salu_offset_64bit_13bit_neg_high_split0:		; GFX10-LABEL: flat_inst_salu_offset_64bit_13bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi

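In the new GFX9 FLAT columns above, the whole negative offset goes into the v_add_co_u32/v_addc_co_u32 pair (0x7ff, 0x800, 0xfff or 0x1fff in the low half, 0x80000000 in the high half) and no offset: field is emitted. The following is a minimal C++ sketch of that policy, assuming the GFX9 FLAT immediate is a 12-bit unsigned field (0 to 4095, inferred from the offset:4095 in the old check lines). `UnsignedSplit`, `splitUnsignedOffset` and `lo32` are hypothetical names for illustration, not LLVM APIs, and this is not the committed implementation.

```cpp
#include <cstdint>

// Hypothetical result type, not an LLVM API.
struct UnsignedSplit {
  int64_t remainder; // added to vaddr with v_add_co_u32 / v_addc_co_u32
  int64_t imm;       // encoded as offset:imm, always in [0, 2^numBits)
};

// Sketch for an unsigned numBits-bit immediate field (assumed 12 on GFX9 FLAT).
constexpr UnsignedSplit splitUnsignedOffset(int64_t offset, unsigned numBits) {
  if (offset < 0) {
    // The field cannot hold a negative piece, and any positive imm would pair
    // with an even more negative remainder (opposite signs), so the whole
    // offset is added to vaddr.
    return {offset, 0};
  }
  // Non-negative offset: low bits in the immediate, the rest in vaddr.
  // Both pieces are >= 0.
  const int64_t imm = offset & ((int64_t{1} << numBits) - 1);
  return {offset - imm, imm};
}

// Low 32 bits of the remainder: the constant v_add_co_u32 adds to the low half.
constexpr uint32_t lo32(int64_t v) {
  return static_cast<uint32_t>(static_cast<uint64_t>(v));
}

// (1ull << 63) | 2047: new GFX9 column adds 0x7ff and has no offset: field.
static_assert(lo32(splitUnsignedOffset(INT64_MIN + 2047, 12).remainder) == 0x7ffu, "");
static_assert(splitUnsignedOffset(INT64_MIN + 2047, 12).imm == 0, "");
```

For a non-negative offset such as 6000 the sketch still uses the immediate (remainder 4096, offset:1904), since both pieces are then >= 0. The multi-statement `constexpr` function needs C++14 or later.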
llvm/test/CodeGen/AMDGPU/offset-split-global.ll

ret i8 %load		ret i8 %load
}		}

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047
define i8 @global_inst_valu_offset_64bit_11bit_neg_high_split0(i8 addrspace(1)* %p) {		define i8 @global_inst_valu_offset_64bit_11bit_neg_high_split0(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split0:		; GFX9-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:2047		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-2049
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split0:		; GFX10-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x800, v0
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo		; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:2047		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:-1
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773761		%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773761
%load = load i8, i8 addrspace(1)* %gep, align 4		%load = load i8, i8 addrspace(1)* %gep, align 4
ret i8 %load		ret i8 %load
}		}

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048
define i8 @global_inst_valu_offset_64bit_11bit_neg_high_split1(i8 addrspace(1)* %p) {		define i8 @global_inst_valu_offset_64bit_11bit_neg_high_split1(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split1:		; GFX9-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:2048		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-2048
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split1:		; GFX10-LABEL: global_inst_valu_offset_64bit_11bit_neg_high_split1:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1000, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x800, v0
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo		; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:-2048		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773760		%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773760
%load = load i8, i8 addrspace(1)* %gep, align 4		%load = load i8, i8 addrspace(1)* %gep, align 4
ret i8 %load		ret i8 %load
}		}

; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095		; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095
define i8 @global_inst_valu_offset_64bit_12bit_neg_high_split0(i8 addrspace(1)* %p) {		define i8 @global_inst_valu_offset_64bit_12bit_neg_high_split0(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split0:		; GFX9-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:4095		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-1
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split0:		; GFX10-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1000, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1000, v0
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo		; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:-1		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:-1
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854771713		%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854771713
%load = load i8, i8 addrspace(1)* %gep, align 4		%load = load i8, i8 addrspace(1)* %gep, align 4
ret i8 %load		ret i8 %load
}		}
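The GEP constants in these tests are (1ull << 63) | N reinterpreted as a signed i64; for example -9223372036854773761 is INT64_MIN + 2047. For GLOBAL, whose immediate field is signed, the new check lines are consistent with rounding the offset toward zero to a multiple of 2^(numBits-1) and encoding the difference as the immediate, which keeps both pieces <= 0 here. The sketch below assumes a 13-bit signed field on GFX9 and a 12-bit signed field on GFX10 (inferred from the offsets in the check lines). `SignedSplit` and `splitSignedOffset` are hypothetical names, and this is a reconstruction that matches the outputs above, not necessarily the committed code.

```cpp
#include <cstdint>

// Hypothetical result type, not an LLVM API.
struct SignedSplit {
  int64_t remainder; // added to vaddr with v_add_co_u32 / v_addc_co_u32
  int64_t imm;       // encoded as offset:imm
};

// Sketch for a signed numBits-bit immediate field, range -d to d - 1 with
// d = 2^(numBits-1). C++ integer division truncates toward zero, so
// `remainder` is offset rounded toward zero to a multiple of d: it has the
// same sign as offset (or is 0). imm = offset % d lies strictly between -d
// and d, so it fits the field and also shares offset's sign (or is 0).
constexpr SignedSplit splitSignedOffset(int64_t offset, unsigned numBits) {
  const int64_t d = int64_t{1} << (numBits - 1);
  const int64_t remainder = (offset / d) * d;
  return {remainder, offset - remainder};
}

// (1ull << 63) | 2047 reinterpreted as i64, as in the GEP above.
static_assert(-9223372036854773761 == INT64_MIN + 2047, "");
// GFX9 GLOBAL (assumed 13-bit signed): v_add_co_u32 ... 0x1000 and offset:-2049.
static_assert(splitSignedOffset(INT64_MIN + 2047, 13).remainder == INT64_MIN + 0x1000, "");
static_assert(splitSignedOffset(INT64_MIN + 2047, 13).imm == -2049, "");
```

Under the same assumptions, GFX10 (12-bit) splits INT64_MIN + 2047 into INT64_MIN + 0x800 and offset:-1, which matches the v_add_co_u32 ... 0x800 and offset:-1 in the new GFX10 column.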

; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4096		; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4096
define i8 @global_inst_valu_offset_64bit_12bit_neg_high_split1(i8 addrspace(1)* %p) {		define i8 @global_inst_valu_offset_64bit_12bit_neg_high_split1(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split1:		; GFX9-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x2000, v0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_bfrev_b32_e32 v2, 1		; GFX9-NEXT: v_bfrev_b32_e32 v2, 1
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-4096		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split1:		; GFX10-LABEL: global_inst_valu_offset_64bit_12bit_neg_high_split1:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1000, v0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1000, v0

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2047
define amdgpu_kernel void @global_inst_salu_offset_64bit_11bit_neg_high_split0(i8 addrspace(1)* %p) {		define amdgpu_kernel void @global_inst_salu_offset_64bit_11bit_neg_high_split0(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split0:		; GFX9-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e64 v0, vcc, 0, s0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:2047		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-2049
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: global_store_byte v[0:1], v0, off		; GFX9-NEXT: global_store_byte v[0:1], v0, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split0:		; GFX10-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: v_mov_b32_e32 v1, s1		; GFX10-NEXT: v_mov_b32_e32 v1, s1
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0, s0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x800, s0
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo		; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:2047		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:-1
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: global_store_byte v[0:1], v0, off		; GFX10-NEXT: global_store_byte v[0:1], v0, off
; GFX10-NEXT: s_endpgm		; GFX10-NEXT: s_endpgm
%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773761		%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773761
%load = load volatile i8, i8 addrspace(1)* %gep, align 1		%load = load volatile i8, i8 addrspace(1)* %gep, align 1
store i8 %load, i8 addrspace(1)* undef		store i8 %load, i8 addrspace(1)* undef
ret void		ret void
}		}

; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048		; Fill 11-bit low-bits, negative high bits (1ull << 63) | 2048
define amdgpu_kernel void @global_inst_salu_offset_64bit_11bit_neg_high_split1(i8 addrspace(1)* %p) {		define amdgpu_kernel void @global_inst_salu_offset_64bit_11bit_neg_high_split1(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split1:		; GFX9-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e64 v0, vcc, 0, s0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:2048		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-2048
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: global_store_byte v[0:1], v0, off		; GFX9-NEXT: global_store_byte v[0:1], v0, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split1:		; GFX10-LABEL: global_inst_salu_offset_64bit_11bit_neg_high_split1:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: v_mov_b32_e32 v1, s1		; GFX10-NEXT: v_mov_b32_e32 v1, s1
; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x1000, s0		; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, 0x800, s0
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo		; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0x80000000, v1, vcc_lo
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off offset:-2048		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: global_store_byte v[0:1], v0, off		; GFX10-NEXT: global_store_byte v[0:1], v0, off
; GFX10-NEXT: s_endpgm		; GFX10-NEXT: s_endpgm
%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773760		%gep = getelementptr i8, i8 addrspace(1)* %p, i64 -9223372036854773760
%load = load volatile i8, i8 addrspace(1)* %gep, align 1		%load = load volatile i8, i8 addrspace(1)* %gep, align 1
store i8 %load, i8 addrspace(1)* undef		store i8 %load, i8 addrspace(1)* undef
ret void		ret void
}		}

; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095		; Fill 12-bit low-bits, negative high bits (1ull << 63) | 4095
define amdgpu_kernel void @global_inst_salu_offset_64bit_12bit_neg_high_split0(i8 addrspace(1)* %p) {		define amdgpu_kernel void @global_inst_salu_offset_64bit_12bit_neg_high_split0(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split0:		; GFX9-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e64 v0, vcc, 0, s0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:4095		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-1
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: global_store_byte v[0:1], v0, off		; GFX9-NEXT: global_store_byte v[0:1], v0, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split0:		; GFX10-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split0:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi
define amdgpu_kernel void @global_inst_salu_offset_64bit_12bit_neg_high_split1(i8 addrspace(1)* %p) {		define amdgpu_kernel void @global_inst_salu_offset_64bit_12bit_neg_high_split1(i8 addrspace(1)* %p) {
; GFX9-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split1:		; GFX9-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x2000, v0		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x1000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off offset:-4096		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: global_store_byte v[0:1], v0, off		; GFX9-NEXT: global_store_byte v[0:1], v0, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX10-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split1:		; GFX10-LABEL: global_inst_salu_offset_64bit_12bit_neg_high_split1:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX10-NEXT: ; implicit-def: $vcc_hi		; GFX10-NEXT: ; implicit-def: $vcc_hi

llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll

; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
;		;
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096
		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
;		;
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
entry:		entry:
%call = tail call i64 @_Z13get_global_idj(i32 0)		%call = tail call i64 @_Z13get_global_idj(i32 0)
%conv = and i64 %call, 255		%conv = and i64 %call, 255
%a0 = shl i64 %call, 17		%a0 = shl i64 %call, 17
%idx.ext11 = and i64 %a0, 4261412864		%idx.ext11 = and i64 %a0, 4261412864
%add.ptr12 = getelementptr inbounds i8, i8 addrspace(1)* %buffer, i64 %idx.ext11		%add.ptr12 = getelementptr inbounds i8, i8 addrspace(1)* %buffer, i64 %idx.ext11
ret void		ret void
}		}

define hidden amdgpu_kernel void @negativeoffset(i8 addrspace(1)* nocapture %buffer) {		define hidden amdgpu_kernel void @negativeoffset(i8 addrspace(1)* nocapture %buffer) {
; GCN-LABEL: negativeoffset:		; GCN-LABEL: negativeoffset:
; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]		; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
;		;
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off		; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
;		;
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off		; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
entry:		entry:
%call = tail call i64 @_Z13get_global_idj(i32 0) #2		%call = tail call i64 @_Z13get_global_idj(i32 0) #2
%conv = and i64 %call, 255		%conv = and i64 %call, 255
%0 = shl i64 %call, 7		%0 = shl i64 %call, 7
%idx.ext11 = and i64 %0, 4294934528		%idx.ext11 = and i64 %0, 4294934528
%add.ptr12 = getelementptr inbounds i8, i8 addrspace(1)* %buffer, i64 %idx.ext11		%add.ptr12 = getelementptr inbounds i8, i8 addrspace(1)* %buffer, i64 %idx.ext11
%buffer_head = bitcast i8 addrspace(1)* %add.ptr12 to i64 addrspace(1)*		%buffer_head = bitcast i8 addrspace(1)* %add.ptr12 to i64 addrspace(1)*


llvm/test/CodeGen/AMDGPU/store-hi16.ll

ret void		ret void
}		}

; GCN-LABEL: {{^}}store_flat_hi_v2i16_neg_offset:		; GCN-LABEL: {{^}}store_flat_hi_v2i16_neg_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX803: v_add{{(_co)?}}_{{i|u}}32_e32		; GFX803: v_add{{(_co)?}}_{{i|u}}32_e32
; GFX803: v_addc_u32_e32		; GFX803: v_addc_u32_e32

; GFX9-DAG: v_add_co_u32_e32 v{{[0-9]+}}, vcc, 0xfffff000, v		; GFX9-DAG: v_add_co_u32_e32 v{{[0-9]+}}, vcc, 0xfffff802, v
; GFX9-DAG: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1, v		; GFX9-DAG: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1, v

; GFX906-DAG: v_lshrrev_b32_e32		; GFX906-DAG: v_lshrrev_b32_e32
; GFX906: flat_store_short v[0:1], v2 offset:2050{{$}}		; GFX906: flat_store_short v[0:1], v2{{$}}

; GFX900-NEXT: flat_store_short_d16_hi v[0:1], v2 offset:2050{{$}}		; GFX900-NEXT: flat_store_short_d16_hi v[0:1], v2{{$}}
; GFX803: flat_store_short v[0:1], v2{{$}}		; GFX803: flat_store_short v[0:1], v2{{$}}
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_flat_hi_v2i16_neg_offset(i16* %out, i32 %arg) #0 {		define void @store_flat_hi_v2i16_neg_offset(i16* %out, i32 %arg) #0 {
entry:		entry:
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%gep = getelementptr inbounds i16, i16* %out, i64 -1023		%gep = getelementptr inbounds i16, i16* %out, i64 -1023
}		}
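The store-hi16 changes show the unsafe split directly. In the old GFX9 column the vaddr remainder was 0xfffff000 in the low half with -1 added to the high half, i.e. -4096, and the immediate was offset:2050, so the two pieces had opposite signs even though their sum, -2046 (-1023 i16 elements of 2 bytes), is what the GEP asks for. The new column adds 0xfffff802 (-2046) and drops the immediate. A small C++ check of the "both >= 0 or both <= 0" rule from the summary makes this explicit; `isSignConsistent` and `remainderFromHalves` are hypothetical helpers for illustration, not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical check, not an LLVM API: the part added to vaddr and the part
// in the offset field must both be >= 0 or both be <= 0. A zero piece is
// compatible with either sign.
constexpr bool isSignConsistent(int64_t remainder, int64_t imm) {
  return (remainder >= 0 && imm >= 0) || (remainder <= 0 && imm <= 0);
}

// Reassembles the 64-bit remainder from the constants added to the low half
// (v_add_co_u32) and the high half (v_addc_co_u32) of the address.
// Assumes the usual two's-complement conversion to int64_t.
constexpr int64_t remainderFromHalves(uint32_t lo, uint32_t hi) {
  return static_cast<int64_t>((static_cast<uint64_t>(hi) << 32) | lo);
}

// Old GFX9 column for store_flat_hi_v2i16_neg_offset: -4096 plus offset:2050.
static_assert(remainderFromHalves(0xfffff000u, 0xffffffffu) == -4096, "");
static_assert(!isSignConsistent(-4096, 2050), "opposite signs: unsafe split");
// New GFX9 column: the whole -2046 goes into vaddr.
static_assert(remainderFromHalves(0xfffff802u, 0xffffffffu) == -1023 * 2, "");
static_assert(isSignConsistent(-1023 * 2, 0), "");
```

The same check rejects the old GFX9 GLOBAL split for INT64_MIN + 2047 (low half 0, high half 0x80000000, offset:2047) and the old i8 split in store_flat_hi_v2i16_i8_neg_offset (0xfffff000 with offset:1).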

; GCN-LABEL: {{^}}store_flat_hi_v2i16_i8_neg_offset:		; GCN-LABEL: {{^}}store_flat_hi_v2i16_i8_neg_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX803-DAG: v_add_u32_e32		; GFX803-DAG: v_add_u32_e32
; GFX803-DAG: v_addc_u32_e32		; GFX803-DAG: v_addc_u32_e32

; GFX9-DAG: v_add_co_u32_e32 v{{[0-9]+}}, vcc, 0xfffff000, v		; GFX9-DAG: v_add_co_u32_e32 v{{[0-9]+}}, vcc, 0xfffff001, v
; GFX9-DAG: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1, v{{[0-9]+}}, vcc		; GFX9-DAG: v_addc_co_u32_e32 v{{[0-9]+}}, vcc, -1, v{{[0-9]+}}, vcc

; GFX900-NEXT: flat_store_byte_d16_hi v[0:1], v2 offset:1{{$}}		; GFX900-NEXT: flat_store_byte_d16_hi v[0:1], v2{{$}}

; GFX906-DAG: v_lshrrev_b32_e32 v2, 16, v2		; GFX906-DAG: v_lshrrev_b32_e32 v2, 16, v2
; GFX906: flat_store_byte v[0:1], v2 offset:1{{$}}		; GFX906: flat_store_byte v[0:1], v2{{$}}

; GFX803-DAG: v_lshrrev_b32_e32 v2, 16, v2		; GFX803-DAG: v_lshrrev_b32_e32 v2, 16, v2
; GFX803: flat_store_byte v[0:1], v2{{$}}		; GFX803: flat_store_byte v[0:1], v2{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_flat_hi_v2i16_i8_neg_offset(i8* %out, i32 %arg) #0 {		define void @store_flat_hi_v2i16_i8_neg_offset(i8* %out, i32 %arg) #0 {
entry:		entry:

Revision Contents

Diff 278712

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

llvm/test/CodeGen/AMDGPU/flat-address-space.ll

llvm/test/CodeGen/AMDGPU/offset-split-flat.ll

llvm/test/CodeGen/AMDGPU/offset-split-global.ll

llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll

llvm/test/CodeGen/AMDGPU/store-hi16.ll