Buffer_load does unsigned offset calculations. Don't fold
operands of a 32-bit add that are likely to cause unsigned add
overflow (the common case is when one of the operands is negative).
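For illustration, a minimal standalone sketch (hypothetical helper and names, not code from the patch) of the rule the summary describes: when matching base + immediate for a buffer offset, refuse to fold an immediate whose sign bit is set, since the hardware computes the offset with unsigned arithmetic.

```cpp
#include <cstdint>
#include <optional>

struct BaseAndOffset {
  uint32_t Base;   // stands in for the base register operand
  uint32_t Offset; // immediate folded into the instruction
};

// Hypothetical helper: split `Base + Imm` only when folding is safe.
static std::optional<BaseAndOffset> matchBasePlusOffset(uint32_t Base,
                                                        uint32_t Imm) {
  // A set sign bit usually means a negative offset in the IR; folding
  // it would wrap the unsigned offset calculation, so keep the add.
  if (static_cast<int32_t>(Imm) < 0)
    return std::nullopt;
  return BaseAndOffset{Base, Imm};
}
```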
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.mir
Line 51: Can you pre-commit this test so we can see the diff?
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: I am really not sure about this change. I think Reg is always a 32-bit value here (perhaps we should assert this), so any G_ADD that we decompose will be a 32-bit add, so using unsigned (or perhaps int) for the offset makes sense.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: I think the hardware zero-extends each component of the address from 32 to 64 bits before adding them together. So the question here is: does zext(add(x, y)) == add(zext(x), zext(y))? The answer is "only if add(x, y) has no unsigned overflow". So I don't think "Offset >= 0" is exactly the right condition here, but it does fix the common case of a small negative immediate offset, so I'm happy with this as a short-term fix.
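A quick standalone check of that identity with assumed values: zext(add(x, y)) and add(zext(x), zext(y)) diverge exactly when the 32-bit add overflows.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 16;
  uint32_t Y = static_cast<uint32_t>(-4); // 0xFFFFFFFC, a small negative immediate

  // zext(add(x, y)): the 32-bit add wraps to 12, then zero-extends.
  uint64_t ZextOfAdd = static_cast<uint64_t>(X + Y); // 12

  // add(zext(x), zext(y)): what the hardware computes when it
  // zero-extends each 32-bit address component before adding.
  uint64_t AddOfZext = uint64_t(X) + uint64_t(Y);    // 0x10000000C

  assert(ZextOfAdd != AddOfZext); // the add overflowed, so they differ
}
```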
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: There is range checking for the offset, so the container for the value should not make a big difference. Choosing int or unsigned brings ambiguity for 32-bit values with a 1 in the highest bit (I would assume it is more probably a negative offset than a large positive offset, so int could work). With int64 we know whether the value was positive or negative in the IR. Also see (1) and (2) below:
Line 28: (2) while we use getZExtValue here. Is this intended?
Line 36: (1) m_ICst uses getSExtValue() on APInt.
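To make the (1)/(2) distinction concrete, a small sketch (assumed constant, not from the patch) of how the two APInt accessors extend the same 32-bit value:

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>

void extensionDemo() {
  // A 32-bit constant with the sign bit set, e.g. -16 as written in IR.
  llvm::APInt C(32, static_cast<uint64_t>(-16), /*isSigned=*/true);

  int64_t Sext = C.getSExtValue();  // sign-extends: -16
  uint64_t Zext = C.getZExtValue(); // zero-extends: 0xFFFFFFF0 = 4294967280

  assert(Sext == -16 && Zext == UINT64_C(0xFFFFFFF0));
}
```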
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: No, it's just a 32-bit value, in IR and in MIR. You don't get any extra information by returning int64_t.
Line 28: getSExtValue vs getZExtValue affects how the 32-bit value is extended to int64_t. But if you make this function return a 32-bit result, then that doesn't matter.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: Don't we have the "sign bit is zero" check?
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: I meant that APInt and int64_t (used by G_CONSTANT) keep sign information, so we know what the intended use of that 32-bit value was.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: Not sure what you mean. Are you suggesting only decomposing the ADD if we know the sign bit is zero in both operands? That would be safe, but I'm not sure how often that test would succeed.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: Yes, this is what we do in a few places in the DAG already.
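For reference, a hedged sketch of that DAG-side pattern (hypothetical function and operand names N0/N1; SelectionDAG::SignBitIsZero is the existing query):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"

// Only decompose the ADD when the sign bit of both 32-bit operands is
// known zero: then each operand is < 2^31, the sum is < 2^32, and the
// unsigned add cannot overflow, so zero-extending each component is safe.
static bool canDecomposeAdd(llvm::SelectionDAG &DAG, llvm::SDValue N0,
                            llvm::SDValue N1) {
  return DAG.SignBitIsZero(N0) && DAG.SignBitIsZero(N1);
}
```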
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1324: Shouldn't use unsigned here.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1324: I don't think we need this function at all. Surely (int)Val >= 0 is clear enough?
Sorry about the delay. Use (int)Offset to check for a one in the most significant bit, i.e. potential unsigned overflow. The DAG equivalent in SITargetLowering::setBufferOffsets also uses int for the offset.
LGTM. It is correct in more cases than it was before, and it mostly matches what the DAG version does.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1375: The DAG version of setBufferOffsets doesn't have this case. Is it done later on, when the register classes are known?
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1375: SDAG just gives up and uses the add as the base (one extra instruction compared to what GlobalISel does). See @s_buffer_load_f32_offset_add_vgpr_sgpr in GlobalISel/llvm.amdgcn.s.buffer.load.ll for an example.