This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	I'm not entirely sure we do the right thing here. One concern is that `m_ICst()` seems to match signed values; another is that we ignore the `nuw`/`nsw` flags. I tried adding a check for `nuw`, which led to numerous test failures. That, coupled with the fact that we seem to use this code for non-scalar buffer loads, suggests that it's probably a task of its own. If anyone can confirm that these concerns are valid, I'd like to add a TODO explaining them to this patch.
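
To make the `nuw` concern concrete, here is a minimal standalone sketch. It assumes the offset add is done in 32 bits in the IR while the folded address is computed in wider arithmetic; the helper names are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit add as an `add i32` would compute it: wraps modulo 2^32.
uint32_t add32(uint32_t base, uint32_t offset) { return base + offset; }

// Hypothetical wider address computation: zero-extend both operands,
// then add in 64 bits, so no wraparound can occur.
uint64_t add64(uint32_t base, uint32_t offset) {
  return uint64_t{base} + uint64_t{offset};
}

// Folding (base + offset) into separate operands preserves the result only
// if the 32-bit add does not wrap, which is what `nuw` would guarantee.
bool foldIsSafe(uint32_t base, uint32_t offset) {
  return uint64_t{add32(base, offset)} == add64(base, offset);
}

// Folds the offset only under the no-wrap precondition.
uint64_t foldedAddress(uint32_t base, uint32_t offset) {
  assert(foldIsSafe(base, offset) && "folding requires a non-wrapping add");
  return add64(base, offset);
}
```

For example, `foldIsSafe(100, 77)` holds, while `foldIsSafe(0xFFFFFFF0, 0x20)` does not: the 32-bit sum wraps to 0x10, whereas the 64-bit sum is 0x100000010.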

This affects 53 of our test shaders, reducing the code size of each by ~100 bytes on average. No shaders show increased code size.

arsenm added inline comments. Jul 21 2022, 6:19 AM

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	m_ICst should just match constants; the sign is in the interpretation. nuw/nsw only potentially matters in cases where we're extending a 32-bit offset to 64 bits in one of the offsets that don't handle wrapping. I expect to match the DAG behavior for all of these addressing modes. I don't think you should be trying to match constant offsets in either operand here. This is showing we're missing the standard canonicalization to move constants to the RHS for gMIR.
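
The point that the sign is in the interpretation, not in the constant, can be sketched in standalone C++. The helpers below are hypothetical (not LLVM APIs); they only show that one bit pattern yields different values depending on whether it is read zero-extended or sign-extended.

```cpp
#include <cassert>
#include <cstdint>

// Read the low `bits` bits of a raw constant as an unsigned value
// (zero-extended interpretation).
uint64_t asUnsigned(uint64_t raw, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  return bits == 64 ? raw : raw & ((uint64_t{1} << bits) - 1);
}

// Read the same bits as a two's-complement signed value
// (sign-extended interpretation).
int64_t asSigned(uint64_t raw, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  uint64_t u = asUnsigned(raw, bits);
  if (bits == 64)
    return static_cast<int64_t>(u);
  uint64_t signBit = uint64_t{1} << (bits - 1);
  // XOR-then-subtract propagates the sign bit into the upper bits.
  return static_cast<int64_t>((u ^ signBit) - signBit);
}
```

The 32-bit pattern 0xFFFFFFFC reads as 4294967292 through `asUnsigned` and as -4 through `asSigned`; small positive constants such as 77 read the same either way.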

Harbormaster completed remote builds in B176743: Diff 446462. Jul 21 2022, 6:42 AM

Rework matching offsets.

kosarev added inline comments. Jul 26 2022, 8:08 AM

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	It seems the current approach is to rely on matchers that, for commutative operations, try both cases. Updated to use them.

Harbormaster completed remote builds in B177627: Diff 447715. Jul 26 2022, 9:31 AM

Ping.

foad added inline comments. Aug 2 2022, 6:29 AM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922	Does anything go wrong if you remove the `Offset == 0` check?

arsenm added inline comments. Aug 2 2022, 7:44 AM

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	It's not a current approach, it's a missing feature. I'd rather not handle the commuted case, and instead separately introduce a new combine to canonicalize constants to the RHS.

kosarev added inline comments. Aug 2 2022, 9:02 AM

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	I don't mind looking into why we currently do it the way we do, but now that this patch doesn't do anything special to match both cases, it seems this can be addressed separately, after this has landed? (And as a side note, I admit I struggle to see how this is not the current approach, considering that this is literally what's being done whenever we aim to match commutative nodes.)
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922	The tests still pass (which is what I would expect from reading the relevant code), but we don't have many of them, and GISel doesn't seem to be able to cope with most of our test corpus shaders. Is there any benefit in removing the check?

foad added inline comments. Aug 2 2022, 9:22 AM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922	"Is there any benefit in removing the check?" I have not fully understood the patch, but it seems like it is more complicated than necessary because you insist that this offset has to be non-zero. For example, I assume this is why you had to move the `Offset && !SOffset` case out of SelectSMRDBaseOffset and into SelectSMRD(?). So I am trying to understand where the requirement comes from. Is it because you want to be sure that if the offset is 0 then we select the _SGPR instead of the _SGPR_IMM forms of the instruction? If so, then I think this should be handled by ensuring that the patterns for the _SGPR forms have higher priority. (Perhaps this already happens, just because of the order the patterns appear in the .td file?)

Removed the Offset == 0 check in the GISel matcher.

kosarev added inline comments. Aug 3 2022, 3:47 AM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922	I see. Yes, this makes sense. I've removed the check so that we have a chance to catch it if and when we mess up the matching order. As of right now the ordering looks correct to me, as the SGPR_IMM pattern is already the most complex one and thus, as I understand it, doesn't need any adjustment via `AddedComplexity`. Regarding the `Offset && !SOffset` case, since we now use `SelectSMRDBaseOffset()` to match the SGPR_IMM form and for that form we don't want the invented zero offset to match the IMM part, that code has to be somewhere outside of this function.

Harbormaster completed remote builds in B179002: Diff 449622. Aug 3 2022, 4:06 AM

foad added inline comments. Aug 3 2022, 7:21 AM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922	"Regarding the Offset && !SOffset case, since we now use SelectSMRDBaseOffset() to match the SGPR_IMM form and for that form we don't want the invented zero offset to match the IMM part, that code has to be somewhere outside of this function." I still don't understand. If the _SGPR_IMM form exists then it is in no way inferior to the _SGPR form, so why not use it? (To put it another way, the _SGPR form is redundant on subtargets that have the _SGPR_IMM form; perhaps we should even use predicates to disable it on those subtargets.)

kosarev added inline comments. Aug 3 2022, 9:13 AM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922	OK, that's another matter. One kind of reason I can think of is to make the resulting code look more natural and to make life a bit easier when reading and preparing tests, etc. So there are no obvious reasons not to use SGPR either, it seems? Looking at the bigger picture, in a better world we would have a single instruction with the non-zero offset part being syntactically optional and only available on certain subtargets. But where we have different instructions doing the same thing, we normally use the one that looks least tricky and unexpected in the context?

arsenm added inline comments. Aug 4 2022, 7:30 PM

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	I don't know why I thought this was checking both operands; I see now this is looking through a copy for a constant. This is still also a pattern that should not reach the selector, and if it does, it's demonstrating a combiner failure. The common case where we end up with a copy of a constant is for a VGPR use, which you don't have here, so I don't think you need to worry about the copy here.

kosarev added inline comments. Aug 8 2022, 4:07 AM

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	This patch doesn't add the copy matching either; it just rewrites it to use `mi_match()`. Whether it's a combiner failure or a matcher issue, as your FIXME above says, it doesn't seem to have anything to do with this patch? Removing the copy-matching bit leads to some test failures, so it looks like it's there for a reason.
Failed Tests (5):
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.gws.barrier.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.gws.init.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.s.buffer.load.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.mir

kosarev added inline comments. Aug 12 2022, 4:29 AM

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	Ping.

Ping.

kosarev added a child revision: D132552: [AMDGPU][CodeGen] Support (base | offset) SMEM loads. Aug 24 2022, 5:45 AM

Ping.

kosarev added a child revision: D133021: [AMDGPU][CodeGen] Pre-commit a test on (base | offset) SMEM loads for D132552. Aug 31 2022, 7:36 AM

kosarev removed a child revision: D132552: [AMDGPU][CodeGen] Support (base | offset) SMEM loads..

kosarev mentioned this in D133021: [AMDGPU][CodeGen] Pre-commit a test on (base | offset) SMEM loads for D132552. Sep 1 2022, 1:56 AM

foad added a subscriber: qcolombet. Sep 1 2022, 3:13 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	There is some philosophical discussion about this here: https://discourse.llvm.org/t/globalisel-and-commutative-binops-in-pat/58115 in which @qcolombet says "we don’t do a whole lot because the incoming IR is already supposed to be canonical". So in the interest of making forward progress, perhaps we should drop this hunk from the patch, and instead fix the IR in test_buffer_load_sgpr_plus_imm_offset to be canonical by having the `77` on the RHS of the `add`. This is what `opt -instcombine` would do to it.
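
The canonicalization being discussed can be sketched as follows. This is a simplified standalone model, not LLVM code: `Operand`, `AddExpr`, `Match`, and all function names are hypothetical. The idea is that once a lone constant is always moved to the RHS of a commutative add, a matcher only needs to inspect the RHS.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical operand model: a constant or a virtual register id.
struct Operand {
  bool isConstant;
  int64_t value; // constant value, or register id when !isConstant
};

struct AddExpr {
  Operand lhs;
  Operand rhs;
};

struct Match {
  bool matched;
  int64_t baseReg;
  int64_t imm;
};

Operand reg(int64_t id) { return Operand{false, id}; }
Operand imm(int64_t v) { return Operand{true, v}; }
AddExpr add(Operand l, Operand r) { return AddExpr{l, r}; }

// Canonicalization: for a commutative add, move a lone constant to the RHS.
AddExpr canonicalize(AddExpr e) {
  if (e.lhs.isConstant && !e.rhs.isConstant)
    std::swap(e.lhs, e.rhs);
  return e;
}

// Matcher that inspects only the RHS, relying on canonical input.
Match matchRegPlusImm(const AddExpr &e) {
  if (e.lhs.isConstant || !e.rhs.isConstant)
    return Match{false, 0, 0};
  return Match{true, e.lhs.value, e.rhs.value};
}

// Usage sketch: the commuted form matches only after canonicalization.
void checkExample() {
  AddExpr commuted = add(imm(77), reg(5));
  assert(!matchRegPlusImm(commuted).matched);
  assert(matchRegPlusImm(canonicalize(commuted)).imm == 77);
}
```

With this split, the matcher stays simple and the commuted `77 + reg` form from the test is handled by fixing the input rather than by teaching every matcher to try both operand orders.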

foad added inline comments. Sep 1 2022, 3:48 AM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1886–1887	Would you mind updating this comment since there is no `Imm` argument.
1973–1974	Would you mind updating this comment since there is no `Imm` argument.
1990	Committing most of this hunk as an NFC refactoring would have made the current patch easier to read.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922	It feels like the getType check could be an assertion? I can't see how it would fail, if the tablegen patterns are written correctly.

Addressed review feedback and rebased.

kosarev marked 3 inline comments as done. Sep 1 2022, 7:27 AM

kosarev added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗	(On Diff #446462)	What a useful find that discussion is. Thanks for pointing it out. Updated as suggested.

LGTM, thanks!

This revision is now accepted and ready to land. Sep 1 2022, 7:40 AM

Harbormaster completed remote builds in B184576: Diff 457265. Sep 1 2022, 8:28 AM

I'm going to commit this next Monday if there are no objections. Should there be any further feedback on this as people get back to the office, hopefully we'll be able to address it post-commit. Thanks for reviewing.

Closed by commit rGf33645301e9d: [AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's. (authored by kosarev). Sep 5 2022, 4:56 AM

This revision was automatically updated to reflect the committed changes.

kosarev added a commit: rGf33645301e9d: [AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's..

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUGISel.td	4 lines
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h	14 lines
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp	130 lines
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h	1 line
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp	21 lines
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp	1 line
llvm/lib/Target/AMDGPU/SMInstructions.td	14 lines
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll	24 lines

Diff 457938

llvm/lib/Target/AMDGPU/AMDGPUGISel.td

	def gi_smrd_buffer_imm :			def gi_smrd_buffer_imm :
	GIComplexOperandMatcher<s64, "selectSMRDBufferImm">,			GIComplexOperandMatcher<s64, "selectSMRDBufferImm">,
	GIComplexPatternEquiv<SMRDBufferImm>;			GIComplexPatternEquiv<SMRDBufferImm>;

	def gi_smrd_buffer_imm32 :			def gi_smrd_buffer_imm32 :
	GIComplexOperandMatcher<s64, "selectSMRDBufferImm32">,			GIComplexOperandMatcher<s64, "selectSMRDBufferImm32">,
	GIComplexPatternEquiv<SMRDBufferImm32>;			GIComplexPatternEquiv<SMRDBufferImm32>;

				def gi_smrd_buffer_sgpr_imm :
				GIComplexOperandMatcher<s64, "selectSMRDBufferSgprImm">,
				GIComplexPatternEquiv<SMRDBufferSgprImm>;

	// Separate load nodes are defined to glue m0 initialization in			// Separate load nodes are defined to glue m0 initialization in
	// SelectionDAG. The GISel selector can just insert m0 initialization			// SelectionDAG. The GISel selector can just insert m0 initialization
	// directly before selecting a glue-less load, so hide this			// directly before selecting a glue-less load, so hide this
	// distinction.			// distinction.

	def : GINodeEquiv<G_LOAD, AMDGPUld_glue> {			def : GINodeEquiv<G_LOAD, AMDGPUld_glue> {
	let CheckMMOIsNonAtomic = 1;			let CheckMMOIsNonAtomic = 1;
	let IfSignExtend = G_SEXTLOAD;			let IfSignExtend = G_SEXTLOAD;

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h

bool SelectGlobalSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,
SDValue &VOffset, SDValue &Offset) const;		SDValue &VOffset, SDValue &Offset) const;
bool SelectScratchSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,		bool SelectScratchSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,
SDValue &Offset) const;		SDValue &Offset) const;
bool checkFlatScratchSVSSwizzleBug(SDValue VAddr, SDValue SAddr,		bool checkFlatScratchSVSSwizzleBug(SDValue VAddr, SDValue SAddr,
uint64_t ImmOffset) const;		uint64_t ImmOffset) const;
bool SelectScratchSVAddr(SDNode *N, SDValue Addr, SDValue &VAddr,		bool SelectScratchSVAddr(SDNode *N, SDValue Addr, SDValue &VAddr,
SDValue &SAddr, SDValue &Offset) const;		SDValue &SAddr, SDValue &Offset) const;

bool SelectSMRDOffset(SDValue Base, SDValue ByteOffsetNode, SDValue *SOffset,		bool SelectSMRDOffset(SDValue ByteOffsetNode, SDValue *SOffset,
SDValue *Offset, bool Imm32Only = false) const;		SDValue *Offset, bool Imm32Only = false,
		bool IsBuffer = false) const;
SDValue Expand32BitAddress(SDValue Addr) const;		SDValue Expand32BitAddress(SDValue Addr) const;
bool SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase, SDValue *SOffset,		bool SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase, SDValue *SOffset,
SDValue *Offset, bool Imm32Only = false) const;		SDValue *Offset, bool Imm32Only = false,
		bool IsBuffer = false) const;
bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue *SOffset,		bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue *SOffset,
SDValue *Offset, bool Imm32Only = false) const;		SDValue *Offset, bool Imm32Only = false) const;
bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue &SOffset) const;		bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue &SOffset) const;
bool SelectSMRDSgprImm(SDValue Addr, SDValue &SBase, SDValue &SOffset,		bool SelectSMRDSgprImm(SDValue Addr, SDValue &SBase, SDValue &SOffset,
SDValue &Offset) const;		SDValue &Offset) const;
bool SelectSMRDBufferImm(SDValue Addr, SDValue &Offset) const;		bool SelectSMRDBufferImm(SDValue N, SDValue &Offset) const;
bool SelectSMRDBufferImm32(SDValue Addr, SDValue &Offset) const;		bool SelectSMRDBufferImm32(SDValue N, SDValue &Offset) const;
		bool SelectSMRDBufferSgprImm(SDValue N, SDValue &SOffset,
		SDValue &Offset) const;
bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;		bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;

bool SelectVOP3Mods_NNaN(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3Mods_NNaN(SDValue In, SDValue &Src, SDValue &SrcMods) const;
bool SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &SrcMods,		bool SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &SrcMods,
bool AllowAbs = true) const;		bool AllowAbs = true) const;
bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const;
bool SelectVOP3BMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3BMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;
bool SelectVOP3NoMods(SDValue In, SDValue &Src) const;		bool SelectVOP3NoMods(SDValue In, SDValue &Src) const;

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

bool AMDGPUDAGToDAGISel::SelectScratchSVAddr(SDNode *N, SDValue Addr,
}		}

if (checkFlatScratchSVSSwizzleBug(VAddr, SAddr, ImmOffset))		if (checkFlatScratchSVSSwizzleBug(VAddr, SAddr, ImmOffset))
return false;		return false;
SAddr = SelectSAddrFI(CurDAG, SAddr);		SAddr = SelectSAddrFI(CurDAG, SAddr);
Offset = CurDAG->getTargetConstant(ImmOffset, SDLoc(), MVT::i16);		Offset = CurDAG->getTargetConstant(ImmOffset, SDLoc(), MVT::i16);
return true;		return true;
}		}

// Match an immediate (if Imm is true) or an SGPR (if Imm is false)		// Match an immediate (if Offset is not null) or an SGPR (if SOffset is
// offset. If Imm32Only is true, match only 32-bit immediate offsets		// not null) offset. If Imm32Only is true, match only 32-bit immediate
// available on CI.		// offsets available on CI.
bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue Addr, SDValue ByteOffsetNode,		bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,
SDValue *SOffset, SDValue *Offset,		SDValue *SOffset, SDValue *Offset,
bool Imm32Only) const {		bool Imm32Only, bool IsBuffer) const {
		assert((!SOffset || !Offset) &&
		"Cannot match both soffset and offset at the same time!");

ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);		ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);
if (!C) {		if (!C) {
if (!SOffset)		if (!SOffset)
return false;		return false;
if (ByteOffsetNode.getValueType().isScalarInteger() &&		if (ByteOffsetNode.getValueType().isScalarInteger() &&
ByteOffsetNode.getValueType().getSizeInBits() == 32) {		ByteOffsetNode.getValueType().getSizeInBits() == 32) {
*SOffset = ByteOffsetNode;		*SOffset = ByteOffsetNode;
return true;		return true;
}		}
if (ByteOffsetNode.getOpcode() == ISD::ZERO_EXTEND) {		if (ByteOffsetNode.getOpcode() == ISD::ZERO_EXTEND) {
if (ByteOffsetNode.getOperand(0).getValueType().getSizeInBits() == 32) {		if (ByteOffsetNode.getOperand(0).getValueType().getSizeInBits() == 32) {
*SOffset = ByteOffsetNode.getOperand(0);		*SOffset = ByteOffsetNode.getOperand(0);
return true;		return true;
}		}
}		}
return false;		return false;
}		}

SDLoc SL(ByteOffsetNode);		SDLoc SL(ByteOffsetNode);
// GFX9 and GFX10 have signed byte immediate offsets.
int64_t ByteOffset = C->getSExtValue();		// GFX9 and GFX10 have signed byte immediate offsets. The immediate
		// offset for S_BUFFER instructions is unsigned.
		int64_t ByteOffset = IsBuffer ? C->getZExtValue() : C->getSExtValue();
Optional<int64_t> EncodedOffset =		Optional<int64_t> EncodedOffset =
AMDGPU::getSMRDEncodedOffset(*Subtarget, ByteOffset, false);		AMDGPU::getSMRDEncodedOffset(*Subtarget, ByteOffset, IsBuffer);
if (EncodedOffset && Offset && !Imm32Only) {		if (EncodedOffset && Offset && !Imm32Only) {
*Offset = CurDAG->getTargetConstant(*EncodedOffset, SL, MVT::i32);		*Offset = CurDAG->getTargetConstant(*EncodedOffset, SL, MVT::i32);
return true;		return true;
}		}

// SGPR and literal offsets are unsigned.		// SGPR and literal offsets are unsigned.
if (ByteOffset < 0)		if (ByteOffset < 0)
return false;		return false;
(36 lines not shown)	const SDValue Ops[] = {
SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32, AddrHi),		SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32, AddrHi),
0),		0),
CurDAG->getTargetConstant(AMDGPU::sub1, SL, MVT::i32),		CurDAG->getTargetConstant(AMDGPU::sub1, SL, MVT::i32),
};		};

return SDValue(CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, SL, MVT::i64,		return SDValue(CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, SL, MVT::i64,
Ops), 0);		Ops), 0);
}		}

// Match a base and an immediate (if Imm is true) or an SGPR		// Match a base and an immediate (if Offset is not null) or an SGPR (if
// (if Imm is false) offset. If Imm32Only is true, match only 32-bit		// SOffset is not null) or an immediate+SGPR offset. If Imm32Only is
// immediate offsets available on CI.		// true, match only 32-bit immediate offsets available on CI.
bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,
SDValue *SOffset, SDValue *Offset,		SDValue *SOffset, SDValue *Offset,
bool Imm32Only) const {		bool Imm32Only,
SDLoc SL(Addr);		bool IsBuffer) const {

if (SOffset && Offset) {		if (SOffset && Offset) {
assert(!Imm32Only);		assert(!Imm32Only && !IsBuffer);
SDValue B;		SDValue B;
return SelectSMRDBaseOffset(Addr, B, nullptr, Offset) &&		return SelectSMRDBaseOffset(Addr, B, nullptr, Offset) &&
SelectSMRDBaseOffset(B, SBase, SOffset, nullptr);		SelectSMRDBaseOffset(B, SBase, SOffset, nullptr);
}		}

// A 32-bit (address + offset) should not cause unsigned 32-bit integer		// A 32-bit (address + offset) should not cause unsigned 32-bit integer
// wraparound, because s_load instructions perform the addition in 64 bits.		// wraparound, because s_load instructions perform the addition in 64 bits.
if ((Addr.getValueType() != MVT::i32 ||		if (Addr.getValueType() == MVT::i32 && !Addr->getFlags().hasNoUnsignedWrap())
Addr->getFlags().hasNoUnsignedWrap())) {		return false;

SDValue N0, N1;		SDValue N0, N1;
// Extract the base and offset if possible.		// Extract the base and offset if possible.
if (CurDAG->isBaseWithConstantOffset(Addr) ||		if (CurDAG->isBaseWithConstantOffset(Addr) || Addr.getOpcode() == ISD::ADD) {
Addr.getOpcode() == ISD::ADD) {
N0 = Addr.getOperand(0);		N0 = Addr.getOperand(0);
N1 = Addr.getOperand(1);		N1 = Addr.getOperand(1);
} else if (getBaseWithOffsetUsingSplitOR(*CurDAG, Addr, N0, N1)) {		} else if (getBaseWithOffsetUsingSplitOR(*CurDAG, Addr, N0, N1)) {
assert(N0 && N1 && isa<ConstantSDNode>(N1));		assert(N0 && N1 && isa<ConstantSDNode>(N1));
}		}
if (N0 && N1) {		if (!N0 || !N1)
if (SelectSMRDOffset(N0, N1, SOffset, Offset, Imm32Only)) {		return false;
		if (SelectSMRDOffset(N1, SOffset, Offset, Imm32Only, IsBuffer)) {
SBase = N0;		SBase = N0;
return true;		return true;
}		}
if (SelectSMRDOffset(N1, N0, SOffset, Offset, Imm32Only)) {		if (SelectSMRDOffset(N0, SOffset, Offset, Imm32Only, IsBuffer)) {
SBase = N1;		SBase = N1;
return true;		return true;
}		}
}
return false;
}
if (Offset && !SOffset) {
SBase = Addr;
*Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);
return true;
}
return false;		return false;
}		}

bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,
SDValue *SOffset, SDValue *Offset,		SDValue *SOffset, SDValue *Offset,
bool Imm32Only) const {		bool Imm32Only) const {
if (!SelectSMRDBaseOffset(Addr, SBase, SOffset, Offset, Imm32Only))		if (SelectSMRDBaseOffset(Addr, SBase, SOffset, Offset, Imm32Only)) {
return false;
SBase = Expand32BitAddress(SBase);		SBase = Expand32BitAddress(SBase);
return true;		return true;
}		}

		if (Addr.getValueType() == MVT::i32 && Offset && !SOffset) {
		SBase = Expand32BitAddress(Addr);
		*Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);
		return true;
		}

		return false;
		}

bool AMDGPUDAGToDAGISel::SelectSMRDImm(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDImm(SDValue Addr, SDValue &SBase,
SDValue &Offset) const {		SDValue &Offset) const {
return SelectSMRD(Addr, SBase, /* SOffset */ nullptr, &Offset);		return SelectSMRD(Addr, SBase, /* SOffset */ nullptr, &Offset);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDImm32(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDImm32(SDValue Addr, SDValue &SBase,
SDValue &Offset) const {		SDValue &Offset) const {
assert(Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS);		assert(Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS);
return SelectSMRD(Addr, SBase, /* SOffset */ nullptr, &Offset,		return SelectSMRD(Addr, SBase, /* SOffset */ nullptr, &Offset,
/* Imm32Only */ true);		/* Imm32Only */ true);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDSgpr(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDSgpr(SDValue Addr, SDValue &SBase,
SDValue &SOffset) const {		SDValue &SOffset) const {
return SelectSMRD(Addr, SBase, &SOffset, /* Offset */ nullptr);		return SelectSMRD(Addr, SBase, &SOffset, /* Offset */ nullptr);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDSgprImm(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDSgprImm(SDValue Addr, SDValue &SBase,
SDValue &SOffset,		SDValue &SOffset,
SDValue &Offset) const {		SDValue &Offset) const {
return SelectSMRD(Addr, SBase, &SOffset, &Offset);		return SelectSMRD(Addr, SBase, &SOffset, &Offset);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue Addr,		bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue N, SDValue &Offset) const {
SDValue &Offset) const {		return SelectSMRDOffset(N, /* SOffset */ nullptr, &Offset,
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Addr)) {		/* Imm32Only */ false, /* IsBuffer */ true);
// The immediate offset for S_BUFFER instructions is unsigned.
if (auto Imm =
AMDGPU::getSMRDEncodedOffset(*Subtarget, C->getZExtValue(), true)) {
Offset = CurDAG->getTargetConstant(*Imm, SDLoc(Addr), MVT::i32);
return true;
}
}

return false;
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm32(SDValue Addr,		bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm32(SDValue N,
SDValue &Offset) const {		SDValue &Offset) const {
assert(Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS);		assert(Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS);
		return SelectSMRDOffset(N, /* SOffset */ nullptr, &Offset,
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Addr)) {		/* Imm32Only */ true, /* IsBuffer */ true);
if (auto Imm = AMDGPU::getSMRDEncodedLiteralOffset32(*Subtarget,
C->getZExtValue())) {
Offset = CurDAG->getTargetConstant(*Imm, SDLoc(Addr), MVT::i32);
return true;
}
}		}

return false;		bool AMDGPUDAGToDAGISel::SelectSMRDBufferSgprImm(SDValue N, SDValue &SOffset,
		SDValue &Offset) const {
		// Match the (soffset + offset) pair as a 32-bit register base and
		// an immediate offset.
		return N.getValueType() == MVT::i32 &&
		SelectSMRDBaseOffset(N, /* SBase */ SOffset, /* SOffset*/ nullptr,
		&Offset, /* Imm32Only */ false,
		/* IsBuffer */ true);
}		}

bool AMDGPUDAGToDAGISel::SelectMOVRELOffset(SDValue Index,		bool AMDGPUDAGToDAGISel::SelectMOVRELOffset(SDValue Index,
SDValue &Base,		SDValue &Base,
SDValue &Offset) const {		SDValue &Offset) const {
SDLoc DL(Index);		SDLoc DL(Index);

if (CurDAG->isBaseWithConstantOffset(Index)) {		if (CurDAG->isBaseWithConstantOffset(Index)) {

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h

private:
InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectMUBUFOffsetAtomic(MachineOperand &Root) const;		selectMUBUFOffsetAtomic(MachineOperand &Root) const;

InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectMUBUFAddr64Atomic(MachineOperand &Root) const;		selectMUBUFAddr64Atomic(MachineOperand &Root) const;

ComplexRendererFns selectSMRDBufferImm(MachineOperand &Root) const;		ComplexRendererFns selectSMRDBufferImm(MachineOperand &Root) const;
ComplexRendererFns selectSMRDBufferImm32(MachineOperand &Root) const;		ComplexRendererFns selectSMRDBufferImm32(MachineOperand &Root) const;
		ComplexRendererFns selectSMRDBufferSgprImm(MachineOperand &Root) const;

void renderTruncImm32(MachineInstrBuilder &MIB, const MachineInstr &MI,		void renderTruncImm32(MachineInstrBuilder &MIB, const MachineInstr &MI,
int OpIdx = -1) const;		int OpIdx = -1) const;

void renderTruncTImm(MachineInstrBuilder &MIB, const MachineInstr &MI,		void renderTruncTImm(MachineInstrBuilder &MIB, const MachineInstr &MI,
int OpIdx) const;		int OpIdx) const;

void renderNegateImm(MachineInstrBuilder &MIB, const MachineInstr &MI,		void renderNegateImm(MachineInstrBuilder &MIB, const MachineInstr &MI,

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

AMDGPUInstructionSelector::selectSMRDBufferImm32(MachineOperand &Root) const {
Optional<int64_t> EncodedImm		Optional<int64_t> EncodedImm
= AMDGPU::getSMRDEncodedLiteralOffset32(STI, *OffsetVal);		= AMDGPU::getSMRDEncodedLiteralOffset32(STI, *OffsetVal);
if (!EncodedImm)		if (!EncodedImm)
return {};		return {};

return {{ [=](MachineInstrBuilder &MIB) { MIB.addImm(*EncodedImm); } }};		return {{ [=](MachineInstrBuilder &MIB) { MIB.addImm(*EncodedImm); } }};
}		}

		InstructionSelector::ComplexRendererFns
		AMDGPUInstructionSelector::selectSMRDBufferSgprImm(MachineOperand &Root) const {
		// Match the (soffset + offset) pair as a 32-bit register base and
		// an immediate offset.
		Register SOffset;
		unsigned Offset;
		std::tie(SOffset, Offset) =
		AMDGPU::getBaseWithConstantOffset(*MRI, Root.getReg());
		if (!SOffset)
		return None;

		Optional<int64_t> EncodedOffset =
		AMDGPU::getSMRDEncodedOffset(STI, Offset, /* IsBuffer */ true);
		if (!EncodedOffset)
		return None;

		assert(MRI->getType(SOffset) == LLT::scalar(32));
		return {{[=](MachineInstrBuilder &MIB) { MIB.addReg(SOffset); },
		[=](MachineInstrBuilder &MIB) { MIB.addImm(*EncodedOffset); }}};
		}
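
For readers following the thread above, the shape of the SGPR + immediate selection can be sketched outside of LLVM roughly as follows. This is only an illustration under stated assumptions: `OffsetExpr`, `encodeImm` and `selectSgprImm` are hypothetical names, and the 20-bit unsigned limit in `encodeImm` is an assumed stand-in for the subtarget-dependent check that `AMDGPU::getSMRDEncodedOffset` performs in the real code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical stand-in for an already-split offset: BaseReg + Imm.
struct OffsetExpr {
  unsigned BaseReg; // hypothetical virtual register id holding the SGPR part
  int64_t Imm;      // constant addend matched next to the register
};

// Assumption for illustration only: the immediate is legal when it is a
// non-negative value that fits in 20 bits. The real rule depends on the
// subtarget and lives in AMDGPU::getSMRDEncodedOffset.
std::optional<int64_t> encodeImm(int64_t Imm) {
  if (Imm < 0 || Imm >= (int64_t{1} << 20))
    return std::nullopt;
  return Imm;
}

// Returns (soffset register, encoded immediate) when the SGPR_IMM form
// applies, or std::nullopt so that the caller can fall back to another form.
std::optional<std::pair<unsigned, int64_t>>
selectSgprImm(const OffsetExpr &E) {
  std::optional<int64_t> Encoded = encodeImm(E.Imm);
  if (!Encoded)
    return std::nullopt;
  return std::make_pair(E.BaseReg, *Encoded);
}
```

With these assumptions, `selectSgprImm({4, 77})` yields register 4 with immediate 77, while a negative constant yields no match and leaves the choice to the remaining patterns.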

void AMDGPUInstructionSelector::renderTruncImm32(MachineInstrBuilder &MIB,		void AMDGPUInstructionSelector::renderTruncImm32(MachineInstrBuilder &MIB,
const MachineInstr &MI,		const MachineInstr &MI,
int OpIdx) const {		int OpIdx) const {
assert(MI.getOpcode() == TargetOpcode::G_CONSTANT && OpIdx == -1 &&		assert(MI.getOpcode() == TargetOpcode::G_CONSTANT && OpIdx == -1 &&
"Expected G_CONSTANT");		"Expected G_CONSTANT");
MIB.addImm(MI.getOperand(1).getCImm()->getSExtValue());		MIB.addImm(MI.getOperand(1).getCImm()->getSExtValue());
}		}


llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

	std::pair<Register, unsigned>			std::pair<Register, unsigned>
	AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,			AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,
	Register OrigOffset) const {			Register OrigOffset) const {
	const unsigned MaxImm = 4095;			const unsigned MaxImm = 4095;
	Register BaseReg;			Register BaseReg;
	unsigned ImmOffset;			unsigned ImmOffset;
	const LLT S32 = LLT::scalar(32);			const LLT S32 = LLT::scalar(32);

				// TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
	std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),			std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
	OrigOffset);			OrigOffset);

	unsigned C1 = 0;			unsigned C1 = 0;
	if (ImmOffset != 0) {			if (ImmOffset != 0) {
	// If the immediate value is too big for the immoffset field, put the value			// If the immediate value is too big for the immoffset field, put the value
	// and -4096 into the immoffset field so that the value that is copied/added			// and -4096 into the immoffset field so that the value that is copied/added
	// for the voffset field is a multiple of 4096, and it stands more chance			// for the voffset field is a multiple of 4096, and it stands more chance
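
The idea in the truncated comment above, keeping the low 12 bits in the immediate field and moving the rest to the register part as a multiple of 4096, can be sketched in isolation like this. It is a minimal sketch of the arithmetic only, not the body of `splitBufferOffsets`; the helper name `splitImm` is hypothetical, and the handling of values whose overflow part would be negative as a 32-bit integer is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Splits Imm into (Overflow, Low), where Low fits the 12-bit immoffset
// field (at most 4095) and Overflow is a multiple of 4096 that would be
// added to the register part of the offset.
std::pair<uint32_t, uint32_t> splitImm(uint32_t Imm) {
  const uint32_t MaxImm = 4095;
  uint32_t Overflow = Imm & ~MaxImm; // bits that do not fit the field
  uint32_t Low = Imm & MaxImm;       // bits that stay in the field
  assert(Overflow + Low == Imm && "split must preserve the total offset");
  return {Overflow, Low};
}
```

For example, an offset of 5000 splits into an overflow of 4096 and an immediate of 904, while an offset of 77 stays entirely in the immediate field.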

llvm/lib/Target/AMDGPU/SMInstructions.td

}		}

def SMRDImm : ComplexPattern<iPTR, 2, "SelectSMRDImm">;		def SMRDImm : ComplexPattern<iPTR, 2, "SelectSMRDImm">;
def SMRDImm32 : ComplexPattern<iPTR, 2, "SelectSMRDImm32">;		def SMRDImm32 : ComplexPattern<iPTR, 2, "SelectSMRDImm32">;
def SMRDSgpr : ComplexPattern<iPTR, 2, "SelectSMRDSgpr">;		def SMRDSgpr : ComplexPattern<iPTR, 2, "SelectSMRDSgpr">;
def SMRDSgprImm : ComplexPattern<iPTR, 3, "SelectSMRDSgprImm">;		def SMRDSgprImm : ComplexPattern<iPTR, 3, "SelectSMRDSgprImm">;
def SMRDBufferImm : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;		def SMRDBufferImm : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;		def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
		def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;

multiclass SMRD_Pattern <string Instr, ValueType vt> {		multiclass SMRD_Pattern <string Instr, ValueType vt> {

// 1. IMM offset		// 1. IMM offset
def : GCNPat <		def : GCNPat <
(smrd_load (SMRDImm i64:$sbase, i32:$offset)),		(smrd_load (SMRDImm i64:$sbase, i32:$offset)),
(vt (!cast<SM_Pseudo>(Instr#"_IMM") $sbase, $offset, 0))		(vt (!cast<SM_Pseudo>(Instr#"_IMM") $sbase, $offset, 0))
>;		>;
		def : GCNPat <
(!cast<InstSI>(Instr#"_IMM_ci") SReg_128:$sbase, smrd_literal_offset:$offset,		(!cast<InstSI>(Instr#"_IMM_ci") SReg_128:$sbase, smrd_literal_offset:$offset,
(extract_cpol $cachepolicy))> {		(extract_cpol $cachepolicy))> {
let OtherPredicates = [isGFX7Only];		let OtherPredicates = [isGFX7Only];
let AddedComplexity = 1;		let AddedComplexity = 1;
}		}

// 3. Offset loaded in an 32bit SGPR		// 3. Offset loaded in an 32bit SGPR
def : GCNPat <		def : GCNPat <
(SIsbuffer_load v4i32:$sbase, i32:$offset, timm:$cachepolicy),		(SIsbuffer_load v4i32:$sbase, i32:$soffset, timm:$cachepolicy),
(vt (!cast<SM_Pseudo>(Instr#"_SGPR") SReg_128:$sbase, SReg_32:$offset, (extract_cpol $cachepolicy)))		(vt (!cast<SM_Pseudo>(Instr#"_SGPR") SReg_128:$sbase, SReg_32:$soffset, (extract_cpol $cachepolicy)))
>;		>;

		// 4. Offset as an 32-bit SGPR + immediate
		def : GCNPat <
		(SIsbuffer_load v4i32:$sbase, (SMRDBufferSgprImm i32:$soffset, i32:$offset),
		timm:$cachepolicy),
		(vt (!cast<SM_Pseudo>(Instr#"_SGPR_IMM") SReg_128:$sbase, SReg_32:$soffset, i32imm:$offset,
		(extract_cpol $cachepolicy)))> {
		let OtherPredicates = [isGFX9Plus];
		}
}		}

// Global and constant loads can be selected to either MUBUF or SMRD		// Global and constant loads can be selected to either MUBUF or SMRD
// instructions, but SMRD instructions are faster so we want the instruction		// instructions, but SMRD instructions are faster so we want the instruction
// selector to prefer those.		// selector to prefer those.
let AddedComplexity = 100 in {		let AddedComplexity = 100 in {

foreach vt = Reg32Types.types in {		foreach vt = Reg32Types.types in {

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll

		define amdgpu_ps void @test_sgpr_plus_imm_offset_x2(i8 addrspace(4)* inreg %base, i32 inreg %offset,
%v2 = zext i32 %offset to i64		%v2 = zext i32 %offset to i64
%v3 = getelementptr i8, i8 addrspace(4)* %v1, i64 %v2		%v3 = getelementptr i8, i8 addrspace(4)* %v1, i64 %v2
%v4 = bitcast i8 addrspace(4)* %v3 to <2 x i32> addrspace(4)*		%v4 = bitcast i8 addrspace(4)* %v3 to <2 x i32> addrspace(4)*
%v5 = load <2 x i32>, <2 x i32> addrspace(4)* %v4, align 4		%v5 = load <2 x i32>, <2 x i32> addrspace(4)* %v4, align 4
store <2 x i32> %v5, <2 x i32> addrspace(1)* %out, align 4		store <2 x i32> %v5, <2 x i32> addrspace(1)* %out, align 4
ret void		ret void
}		}

		; GCN-LABEL: name: test_buffer_load_sgpr_plus_imm_offset
		; SDAG-DAG: %[[BASE0:.*]]:sgpr_32 = COPY $sgpr0
		; SDAG-DAG: %[[BASE1:.*]]:sgpr_32 = COPY $sgpr1
		; SDAG-DAG: %[[BASE2:.*]]:sgpr_32 = COPY $sgpr2
		; SDAG-DAG: %[[BASE3:.*]]:sgpr_32 = COPY $sgpr3
		; SDAG-DAG: %[[OFFSET:.*]]:sgpr_32 = COPY $sgpr4
		; SDAG-DAG: %[[BASE:.*]]:sgpr_128 = REG_SEQUENCE %[[BASE0]], %subreg.sub0, %[[BASE1]], %subreg.sub1, %[[BASE2]], %subreg.sub2, %[[BASE3]], %subreg.sub3
		; SDAG: S_BUFFER_LOAD_DWORD_SGPR_IMM killed %[[BASE]], %[[OFFSET]], 77,
		; GISEL-DAG: %[[BASE0:.*]]:sreg_32 = COPY $sgpr0
		; GISEL-DAG: %[[BASE1:.*]]:sreg_32 = COPY $sgpr1
		; GISEL-DAG: %[[BASE2:.*]]:sreg_32 = COPY $sgpr2
		; GISEL-DAG: %[[BASE3:.*]]:sreg_32 = COPY $sgpr3
		; GISEL-DAG: %[[OFFSET:.*]]:sreg_32 = COPY $sgpr4
		; GISEL-DAG: %[[BASE:.*]]:sgpr_128 = REG_SEQUENCE %[[BASE0]], %subreg.sub0, %[[BASE1]], %subreg.sub1, %[[BASE2]], %subreg.sub2, %[[BASE3]], %subreg.sub3
		; GISEL: S_BUFFER_LOAD_DWORD_SGPR_IMM %[[BASE]], %[[OFFSET]], 77,
		define amdgpu_cs void @test_buffer_load_sgpr_plus_imm_offset(<4 x i32> inreg %base, i32 inreg %i, i32 addrspace(1)* inreg %out) {
		%off = add nuw nsw i32 %i, 77
		%v = call i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32> %base, i32 %off, i32 0)
		store i32 %v, i32 addrspace(1)* %out, align 4
		ret void
		}

declare void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32 immarg) #1		declare void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32 immarg) #1

		declare i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32>, i32, i32 immarg) nounwind readnone willreturn

; Function Attrs: nounwind readnone speculatable		; Function Attrs: nounwind readnone speculatable
declare i32 @llvm.amdgcn.reloc.constant(metadata) #3		declare i32 @llvm.amdgcn.reloc.constant(metadata) #3

; Function Attrs: nounwind readnone speculatable		; Function Attrs: nounwind readnone speculatable
declare i64 @llvm.amdgcn.s.getpc() #3		declare i64 @llvm.amdgcn.s.getpc() #3

; Function Attrs: nounwind readnone		; Function Attrs: nounwind readnone
declare <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32>, i32, i32 immarg) #1		declare <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32>, i32, i32 immarg) #1


[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's. (Closed, Public)

Diff 457938

llvm/lib/Target/AMDGPU/AMDGPUGISel.td

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

llvm/lib/Target/AMDGPU/SMInstructions.td

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
