This is an archive of the discontinued LLVM Phabricator instance.

The stats are rather interesting. Of all our test shaders, counting duplicates, 134 are affected by this change, but fewer than 19% of those actually changed in code size. The reason seems to be that we often need the resulting offset in a register pair anyway, so our good old s_add/s_adc pairs keep living alongside the shiny new 'offset:' loads.
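As a minimal sketch of why the add pair survives (a plain C++ model, not LLVM code; `Add64Result`, `addOffsetViaCarry` and `addOffset64` are hypothetical names): forming a 64-bit base plus a zero-extended 32-bit offset in a register pair takes a low-half add that produces a carry and a high-half add that consumes it, which is what s_add_u32 and s_addc_u32 do through SCC. Whenever that 64-bit sum has other users, the pair is needed whether or not the load itself can take an 'offset:'.

```cpp
#include <cstdint>

// Hypothetical model of the two 32-bit scalar adds that build
// base + zext(offset) in a 64-bit register pair. Carry stands in for SCC.
struct Add64Result {
  uint32_t Lo;
  uint32_t Hi;
};

inline Add64Result addOffsetViaCarry(uint32_t BaseLo, uint32_t BaseHi,
                                     uint32_t Offset) {
  // Low half (s_add_u32-like): wraps modulo 2^32; a wrap means carry-out.
  uint32_t Lo = BaseLo + Offset;
  uint32_t Carry = (Lo < BaseLo) ? 1u : 0u;
  // High half (s_addc_u32-like): the offset is zero-extended, so its high
  // half is 0 and only the carry is added.
  uint32_t Hi = BaseHi + 0u + Carry;
  return {Lo, Hi};
}

// Reference: the same sum computed directly in 64 bits.
inline uint64_t addOffset64(uint64_t Base, uint32_t Offset) {
  return Base + static_cast<uint64_t>(Offset);
}
```

For example, `addOffsetViaCarry(0xFFFFFFF0u, 0x1u, 0x20u)` yields `{0x10, 0x2}`, which matches `addOffset64(0x1FFFFFFF0, 0x20) == 0x200000010`.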

Where the code size has changed, it is now smaller, except for one Wolfenstein II shader that grew by 140 bytes with this change applied. I'm looking into what's going on there.

kosarev added a parent revision: D129095: [AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets. (Jul 8 2022, 10:04 AM)

arsenm added inline comments. (Jul 8 2022, 10:24 AM)

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
Lines 1–3: You shouldn't use -stop-after=amdgpu-isel. Use finalize-isel for MIR tests (although I'm not sure why you're testing MIR here).
Line 60: Also test some other load sizes?

In D129381#3639233, @kosarev wrote:

I'm looking into what's going on there.

The reason seems to be that with this change in place we generate more spills, which in turn is caused by a reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage(): 102 registers originally vs. 96 with the change. I'm not sure how precise the analysis is expected to be and whether this needs a ticket. I'd appreciate any advice on this.

kosarev added inline comments. (Jul 11 2022, 7:42 AM)

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
Lines 1–3: This seems to be where we already have checks for a similar case? Is there a better place for that?

kosarev added inline comments. (Jul 11 2022, 9:22 AM)

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
Lines 1–3 (quoting "You shouldn't use -stop-after=amdgpu-isel."): Can you expand a bit more on why this is preferable, please? I see literally 4 tests using finalize-isel, none of which look like pure isel tests, while the other 35 all seem to use amdgpu-isel. The description of finalize-isel as it comes from rG9cac4e6d14035 doesn't suggest anything obviously useful for this test either.

Updated.

kosarev marked an inline comment as done. (Jul 11 2022, 10:02 AM)

In D129381#3642390, @kosarev wrote:

In D129381#3639233, @kosarev wrote:

I'm looking into what's going on there.

The reason seems to be that with this change in place we generate more spills, which in turn is caused by a reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage(): 102 registers originally vs. 96 with the change. I'm not sure how precise the analysis is expected to be and whether this needs a ticket. I'd appreciate any advice on this.

That should be precise except in the presence of indirect/external/unanalyzable calls. I'm a bit surprised there would be any big difference from this, but this is certainly going to be from secondary effects.

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
Lines 1–3: amdgpu-isel is specifically the SDAG pass. Your GISEL check didn't stop anywhere near the same place, and -stop-pass ran off the end and completed register allocation.

kosarev added inline comments. (Jul 11 2022, 10:41 AM)

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
Lines 1–3: What if we just remove the GISEL checks from this test completely, then?

Harbormaster completed remote builds in B174697: Diff 443679. (Jul 11 2022, 10:47 AM)

arsenm added inline comments. (Jul 11 2022, 10:48 AM)

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
Lines 1–3: I'd rather just use -stop-after finalize-isel.

The reason seems to be that with this change in place we generate more spills, which in turn is caused by a reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage()

I don't understand this. I thought AMDGPUResourceUsageAnalysis ran after regalloc, so I don't see how it can affect spilling.

In D129381#3644817, @foad wrote:

I don't understand this. I thought AMDGPUResourceUsageAnalysis ran after regalloc so I don't see how it can affect spilling.

You are right; I was too quick to assume that the reduced number of registers is where the higher pressure comes from. Sorry for misleading you. Further analysis showed that although we do seem to use fewer registers in the end, an additional spill slot is now introduced during register allocation, due to interfering live ranges and a failure to assign a value to a physical register. I don't think that is very surprising, as I see lots of moved code in other shaders as well. For that slot we generate 4 v_writelane_b32s and 16 additional v_readlane_b32s, replacing what previously was 4 s_mov_b64s and an s_waitcnt. So, unfortunately, this doesn't look like something that's easy to avoid.
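A back-of-the-envelope check of those instruction counts against code size, as a hedged sketch: it assumes v_writelane_b32 and v_readlane_b32 use the 8-byte VOP3 encoding (as they do from GFX8 on) and that s_mov_b64 with a register or inline-constant source and s_waitcnt take 4 bytes each. The target the shaders were compiled for isn't stated in the review, so these sizes are assumptions.

```cpp
// Assumed encoding sizes in bytes (not stated in the review):
constexpr int kVop3Bytes = 8;  // v_writelane_b32 / v_readlane_b32, VOP3 form
constexpr int kSop1Bytes = 4;  // s_mov_b64 with a register or inline source
constexpr int kSoppBytes = 4;  // s_waitcnt

// Spill code introduced for the new slot: 4 writelanes + 16 readlanes.
constexpr int kAddedBytes = (4 + 16) * kVop3Bytes;
// Code it replaced: 4 s_mov_b64 + 1 s_waitcnt.
constexpr int kRemovedBytes = 4 * kSop1Bytes + 1 * kSoppBytes;
constexpr int kNetGrowthBytes = kAddedBytes - kRemovedBytes;

static_assert(kAddedBytes == 160, "20 eight-byte instructions");
static_assert(kRemovedBytes == 20, "5 four-byte instructions");
```

Under these assumptions the net growth is 160 - 20 = 140 bytes, which happens to line up with the Wolfenstein II shader's 140 extra bytes; the review itself doesn't spell out this breakdown.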

Changed the .ll test to use -stop-after=finalize-isel.

arsenm added inline comments. (Jul 12 2022, 8:55 AM)

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Line 1978: I don't follow how/why B is being discarded.

kosarev added inline comments. (Jul 12 2022, 9:10 AM)

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Line 1978: It's being split into `SBase` and `SOffset`, both of which we return.
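The two-step split that `SelectSMRDBaseOffset` performs when both offsets are requested can be sketched on a toy expression tree: first peel a constant immediate off the address, leaving an intermediate base (the `B` in the diff), then split that intermediate base into a 64-bit base and a zero-extended 32-bit register offset. `Expr`, `mk`, `splitImm`, `splitSOffset` and `selectSgprImm` are hypothetical stand-ins for SelectionDAG nodes and the real selectors; the sketch skips the no-unsigned-wrap and encodability checks the actual code performs, but like the diff it tries both operand orders of each add.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Toy expression tree standing in for SelectionDAG nodes (hypothetical).
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  enum Kind { Reg64, Reg32, Const, ZExt, Add } K;
  int64_t Val = 0;  // register id, or the value for Const
  ExprPtr A, B;     // operands: Add uses A and B, ZExt uses A
};

inline ExprPtr mk(Expr::Kind K, int64_t Val = 0, ExprPtr A = nullptr,
                  ExprPtr B = nullptr) {
  return std::make_shared<Expr>(Expr{K, Val, std::move(A), std::move(B)});
}

struct SmrdParts {
  ExprPtr Base;     // 64-bit base (SBase)
  ExprPtr SOffset;  // 32-bit register offset
  int64_t Imm;      // immediate offset
};

// Step 1: Addr = B + Imm, trying both operand orders.
inline std::optional<std::pair<ExprPtr, int64_t>> splitImm(const ExprPtr &E) {
  if (E->K != Expr::Add) return std::nullopt;
  if (E->B->K == Expr::Const) return std::make_pair(E->A, E->B->Val);
  if (E->A->K == Expr::Const) return std::make_pair(E->B, E->A->Val);
  return std::nullopt;
}

// Returns the 32-bit register if E is zext(Reg32), else null.
inline ExprPtr asSOffset(const ExprPtr &E) {
  if (E->K == Expr::ZExt && E->A->K == Expr::Reg32) return E->A;
  return nullptr;
}

// Step 2: B = Base + zext(SOffset), trying both operand orders.
inline std::optional<std::pair<ExprPtr, ExprPtr>>
splitSOffset(const ExprPtr &E) {
  if (E->K != Expr::Add) return std::nullopt;
  if (ExprPtr Off = asSOffset(E->B)) return std::make_pair(E->A, Off);
  if (ExprPtr Off = asSOffset(E->A)) return std::make_pair(E->B, Off);
  return std::nullopt;
}

// Nothing is discarded: B only exists between the two steps.
inline std::optional<SmrdParts> selectSgprImm(const ExprPtr &Addr) {
  auto Step1 = splitImm(Addr);
  if (!Step1) return std::nullopt;
  auto Step2 = splitSOffset(Step1->first);
  if (!Step2) return std::nullopt;
  return SmrdParts{Step2->first, Step2->second, Step1->second};
}
```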

arsenm accepted this revision. (Jul 12 2022, 9:25 AM)

This revision is now accepted and ready to land. (Jul 12 2022, 9:25 AM)

Harbormaster completed remote builds in B174890: Diff 443967. (Jul 12 2022, 9:42 AM)

kosarev mentioned this in D129095: [AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets. (Jul 14 2022, 4:35 AM)

This revision was landed with ongoing or failed builds. (Jul 18 2022, 3:31 AM)

Closed by commit rG432cbd782720: [AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. (authored by kosarev)

This revision was automatically updated to reflect the committed changes.

kosarev added a commit: rG432cbd782720: [AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.

Revision Contents

llvm/lib/Target/AMDGPU/AMDGPUGISel.td (4 lines)
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h (14 lines)
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (74 lines)
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h (8 lines)
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (132 lines)
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (13 lines)
llvm/lib/Target/AMDGPU/SMInstructions.td (14 lines)
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-smrd.mir (27 lines)
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll (63 lines)

Diff 445443

llvm/lib/Target/AMDGPU/AMDGPUGISel.td

	def gi_smrd_imm32 :			def gi_smrd_imm32 :
	GIComplexOperandMatcher<s64, "selectSmrdImm32">,			GIComplexOperandMatcher<s64, "selectSmrdImm32">,
	GIComplexPatternEquiv<SMRDImm32>;			GIComplexPatternEquiv<SMRDImm32>;

	def gi_smrd_sgpr :			def gi_smrd_sgpr :
	GIComplexOperandMatcher<s64, "selectSmrdSgpr">,			GIComplexOperandMatcher<s64, "selectSmrdSgpr">,
	GIComplexPatternEquiv<SMRDSgpr>;			GIComplexPatternEquiv<SMRDSgpr>;

				def gi_smrd_sgpr_imm :
				GIComplexOperandMatcher<s64, "selectSmrdSgprImm">,
				GIComplexPatternEquiv<SMRDSgprImm>;

	def gi_flat_offset :			def gi_flat_offset :
	GIComplexOperandMatcher<s64, "selectFlatOffset">,			GIComplexOperandMatcher<s64, "selectFlatOffset">,
	GIComplexPatternEquiv<FlatOffset>;			GIComplexPatternEquiv<FlatOffset>;
	def gi_global_offset :			def gi_global_offset :
	GIComplexOperandMatcher<s64, "selectGlobalOffset">,			GIComplexOperandMatcher<s64, "selectGlobalOffset">,
	GIComplexPatternEquiv<GlobalOffset>;			GIComplexPatternEquiv<GlobalOffset>;
	def gi_global_saddr :			def gi_global_saddr :
	GIComplexOperandMatcher<s64, "selectGlobalSAddr">,			GIComplexOperandMatcher<s64, "selectGlobalSAddr">,

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h

… bool SelectGlobalSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,
SDValue &VOffset, SDValue &Offset) const;		SDValue &VOffset, SDValue &Offset) const;
bool SelectScratchSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,		bool SelectScratchSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,
SDValue &Offset) const;		SDValue &Offset) const;
bool checkFlatScratchSVSSwizzleBug(SDValue VAddr, SDValue SAddr,		bool checkFlatScratchSVSSwizzleBug(SDValue VAddr, SDValue SAddr,
uint64_t ImmOffset) const;		uint64_t ImmOffset) const;
bool SelectScratchSVAddr(SDNode *N, SDValue Addr, SDValue &VAddr,		bool SelectScratchSVAddr(SDNode *N, SDValue Addr, SDValue &VAddr,
SDValue &SAddr, SDValue &Offset) const;		SDValue &SAddr, SDValue &Offset) const;

bool SelectSMRDOffset(SDValue ByteOffsetNode, SDValue &Offset, bool Imm,		bool SelectSMRDOffset(SDValue Base, SDValue ByteOffsetNode, SDValue *SOffset,
bool Imm32Only) const;		SDValue *Offset, bool Imm32Only = false) const;
SDValue Expand32BitAddress(SDValue Addr) const;		SDValue Expand32BitAddress(SDValue Addr) const;
bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue &Offset, bool Imm,		bool SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase, SDValue *SOffset,
bool Imm32Only = false) const;		SDValue *Offset, bool Imm32Only = false) const;
		bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue *SOffset,
		SDValue *Offset, bool Imm32Only = false) const;
bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue &Offset) const;		bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue &SOffset) const;
		bool SelectSMRDSgprImm(SDValue Addr, SDValue &SBase, SDValue &SOffset,
		SDValue &Offset) const;
bool SelectSMRDBufferImm(SDValue Addr, SDValue &Offset) const;		bool SelectSMRDBufferImm(SDValue Addr, SDValue &Offset) const;
bool SelectSMRDBufferImm32(SDValue Addr, SDValue &Offset) const;		bool SelectSMRDBufferImm32(SDValue Addr, SDValue &Offset) const;
bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;		bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;

bool SelectVOP3Mods_NNaN(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3Mods_NNaN(SDValue In, SDValue &Src, SDValue &SrcMods) const;
bool SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &SrcMods,		bool SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &SrcMods,
bool AllowAbs = true) const;		bool AllowAbs = true) const;
bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const;

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

… bool AMDGPUDAGToDAGISel::SelectScratchSVAddr(SDNode *N, SDValue Addr,
SAddr = SelectSAddrFI(CurDAG, SAddr);		SAddr = SelectSAddrFI(CurDAG, SAddr);
Offset = CurDAG->getTargetConstant(ImmOffset, SDLoc(), MVT::i16);		Offset = CurDAG->getTargetConstant(ImmOffset, SDLoc(), MVT::i16);
return true;		return true;
}		}

// Match an immediate (if Imm is true) or an SGPR (if Imm is false)		// Match an immediate (if Imm is true) or an SGPR (if Imm is false)
// offset. If Imm32Only is true, match only 32-bit immediate offsets		// offset. If Imm32Only is true, match only 32-bit immediate offsets
// available on CI.		// available on CI.
bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,		bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue Addr, SDValue ByteOffsetNode,
SDValue &Offset, bool Imm,		SDValue *SOffset, SDValue *Offset,
bool Imm32Only) const {		bool Imm32Only) const {
ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);		ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);
if (!C) {		if (!C) {
if (Imm)		if (!SOffset)
return false;		return false;
if (ByteOffsetNode.getValueType().isScalarInteger() &&		if (ByteOffsetNode.getValueType().isScalarInteger() &&
ByteOffsetNode.getValueType().getSizeInBits() == 32) {		ByteOffsetNode.getValueType().getSizeInBits() == 32) {
Offset = ByteOffsetNode;		*SOffset = ByteOffsetNode;
return true;		return true;
}		}
if (ByteOffsetNode.getOpcode() == ISD::ZERO_EXTEND) {		if (ByteOffsetNode.getOpcode() == ISD::ZERO_EXTEND) {
if (ByteOffsetNode.getOperand(0).getValueType().getSizeInBits() == 32) {		if (ByteOffsetNode.getOperand(0).getValueType().getSizeInBits() == 32) {
Offset = ByteOffsetNode.getOperand(0);		*SOffset = ByteOffsetNode.getOperand(0);
return true;		return true;
}		}
}		}
return false;		return false;
}		}

SDLoc SL(ByteOffsetNode);		SDLoc SL(ByteOffsetNode);
// GFX9 and GFX10 have signed byte immediate offsets.		// GFX9 and GFX10 have signed byte immediate offsets.
int64_t ByteOffset = C->getSExtValue();		int64_t ByteOffset = C->getSExtValue();
Optional<int64_t> EncodedOffset =		Optional<int64_t> EncodedOffset =
AMDGPU::getSMRDEncodedOffset(*Subtarget, ByteOffset, false);		AMDGPU::getSMRDEncodedOffset(*Subtarget, ByteOffset, false);
if (EncodedOffset && Imm && !Imm32Only) {		if (EncodedOffset && Offset && !Imm32Only) {
Offset = CurDAG->getTargetConstant(*EncodedOffset, SL, MVT::i32);		*Offset = CurDAG->getTargetConstant(*EncodedOffset, SL, MVT::i32);
return true;		return true;
}		}

// SGPR and literal offsets are unsigned.		// SGPR and literal offsets are unsigned.
if (ByteOffset < 0)		if (ByteOffset < 0)
return false;		return false;

EncodedOffset = AMDGPU::getSMRDEncodedLiteralOffset32(*Subtarget, ByteOffset);		EncodedOffset = AMDGPU::getSMRDEncodedLiteralOffset32(*Subtarget, ByteOffset);
if (EncodedOffset && Imm32Only) {		if (EncodedOffset && Offset && Imm32Only) {
Offset = CurDAG->getTargetConstant(*EncodedOffset, SL, MVT::i32);		*Offset = CurDAG->getTargetConstant(*EncodedOffset, SL, MVT::i32);
return true;		return true;
}		}

if (!isUInt<32>(ByteOffset) && !isInt<32>(ByteOffset))		if (!isUInt<32>(ByteOffset) && !isInt<32>(ByteOffset))
return false;		return false;

if (!Imm) {		if (SOffset) {
SDValue C32Bit = CurDAG->getTargetConstant(ByteOffset, SL, MVT::i32);		SDValue C32Bit = CurDAG->getTargetConstant(ByteOffset, SL, MVT::i32);
Offset = SDValue(		*SOffset = SDValue(
CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32, C32Bit), 0);		CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32, C32Bit), 0);
return true;		return true;
}		}

return false;		return false;
}		}

SDValue AMDGPUDAGToDAGISel::Expand32BitAddress(SDValue Addr) const {		SDValue AMDGPUDAGToDAGISel::Expand32BitAddress(SDValue Addr) const {
…

return SDValue(CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, SL, MVT::i64,		return SDValue(CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, SL, MVT::i64,
Ops), 0);		Ops), 0);
}		}

// Match a base and an immediate (if Imm is true) or an SGPR		// Match a base and an immediate (if Imm is true) or an SGPR
// (if Imm is false) offset. If Imm32Only is true, match only 32-bit		// (if Imm is false) offset. If Imm32Only is true, match only 32-bit
// immediate offsets available on CI.		// immediate offsets available on CI.
bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,
SDValue &Offset, bool Imm,		SDValue *SOffset, SDValue *Offset,
bool Imm32Only) const {		bool Imm32Only) const {
SDLoc SL(Addr);		SDLoc SL(Addr);

		if (SOffset && Offset) {
		assert(!Imm32Only);
		SDValue B;
		return SelectSMRDBaseOffset(Addr, B, nullptr, Offset) &&
		SelectSMRDBaseOffset(B, SBase, SOffset, nullptr);
		}

// A 32-bit (address + offset) should not cause unsigned 32-bit integer		// A 32-bit (address + offset) should not cause unsigned 32-bit integer
// wraparound, because s_load instructions perform the addition in 64 bits.		// wraparound, because s_load instructions perform the addition in 64 bits.
if ((Addr.getValueType() != MVT::i32 ||		if ((Addr.getValueType() != MVT::i32 ||
Addr->getFlags().hasNoUnsignedWrap())) {		Addr->getFlags().hasNoUnsignedWrap())) {
SDValue N0, N1;		SDValue N0, N1;
// Extract the base and offset if possible.		// Extract the base and offset if possible.
if (CurDAG->isBaseWithConstantOffset(Addr) ||		if (CurDAG->isBaseWithConstantOffset(Addr) ||
Addr.getOpcode() == ISD::ADD) {		Addr.getOpcode() == ISD::ADD) {
N0 = Addr.getOperand(0);		N0 = Addr.getOperand(0);
N1 = Addr.getOperand(1);		N1 = Addr.getOperand(1);
} else if (getBaseWithOffsetUsingSplitOR(*CurDAG, Addr, N0, N1)) {		} else if (getBaseWithOffsetUsingSplitOR(*CurDAG, Addr, N0, N1)) {
assert(N0 && N1 && isa<ConstantSDNode>(N1));		assert(N0 && N1 && isa<ConstantSDNode>(N1));
}		}
if (N0 && N1) {		if (N0 && N1) {
if (SelectSMRDOffset(N1, Offset, Imm, Imm32Only)) {		if (SelectSMRDOffset(N0, N1, SOffset, Offset, Imm32Only)) {
SBase = Expand32BitAddress(N0);		SBase = N0;
return true;		return true;
}		}
if (SelectSMRDOffset(N0, Offset, Imm, Imm32Only)) {		if (SelectSMRDOffset(N1, N0, SOffset, Offset, Imm32Only)) {
SBase = Expand32BitAddress(N1);		SBase = N1;
return true;		return true;
}		}
}		}
return false;		return false;
}		}
if (!Imm)		if (Offset && !SOffset) {
		SBase = Addr;
		*Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);
		return true;
		}
return false;		return false;
SBase = Expand32BitAddress(Addr);		}
Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);
		bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,
		SDValue *SOffset, SDValue *Offset,
		bool Imm32Only) const {
		if (!SelectSMRDBaseOffset(Addr, SBase, SOffset, Offset, Imm32Only))
		return false;
		SBase = Expand32BitAddress(SBase);
return true;		return true;
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDImm(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDImm(SDValue Addr, SDValue &SBase,
SDValue &Offset) const {		SDValue &Offset) const {
return SelectSMRD(Addr, SBase, Offset, /* Imm */ true);		return SelectSMRD(Addr, SBase, /* SOffset */ nullptr, &Offset);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDImm32(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDImm32(SDValue Addr, SDValue &SBase,
SDValue &Offset) const {		SDValue &Offset) const {
assert(Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS);		assert(Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS);
return SelectSMRD(Addr, SBase, Offset, /* Imm */ true, /* Imm32Only */ true);		return SelectSMRD(Addr, SBase, /* SOffset */ nullptr, &Offset,
		/* Imm32Only */ true);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDSgpr(SDValue Addr, SDValue &SBase,		bool AMDGPUDAGToDAGISel::SelectSMRDSgpr(SDValue Addr, SDValue &SBase,
		SDValue &SOffset) const {
		return SelectSMRD(Addr, SBase, &SOffset, /* Offset */ nullptr);
		}

		bool AMDGPUDAGToDAGISel::SelectSMRDSgprImm(SDValue Addr, SDValue &SBase,
		SDValue &SOffset,
SDValue &Offset) const {		SDValue &Offset) const {
return SelectSMRD(Addr, SBase, Offset, /* Imm */ false);		return SelectSMRD(Addr, SBase, &SOffset, &Offset);
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue Addr,		bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue Addr,
SDValue &Offset) const {		SDValue &Offset) const {
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Addr)) {		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Addr)) {
// The immediate offset for S_BUFFER instructions is unsigned.		// The immediate offset for S_BUFFER instructions is unsigned.
if (auto Imm =		if (auto Imm =
AMDGPU::getSMRDEncodedOffset(*Subtarget, C->getZExtValue(), true)) {		AMDGPU::getSMRDEncodedOffset(*Subtarget, C->getZExtValue(), true)) {

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h

… public:
static const char *getName();		static const char *getName();

void setupMF(MachineFunction &MF, GISelKnownBits *KB,		void setupMF(MachineFunction &MF, GISelKnownBits *KB,
CodeGenCoverage &CoverageInfo, ProfileSummaryInfo *PSI,		CodeGenCoverage &CoverageInfo, ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI) override;		BlockFrequencyInfo *BFI) override;

private:		private:
struct GEPInfo {		struct GEPInfo {
const MachineInstr &GEP;
SmallVector<unsigned, 2> SgprParts;		SmallVector<unsigned, 2> SgprParts;
SmallVector<unsigned, 2> VgprParts;		SmallVector<unsigned, 2> VgprParts;
int64_t Imm;		int64_t Imm = 0;
GEPInfo(const MachineInstr &GEP) : GEP(GEP), Imm(0) { }
};		};

bool isSGPR(Register Reg) const;		bool isSGPR(Register Reg) const;

bool isInstrUniform(const MachineInstr &MI) const;		bool isInstrUniform(const MachineInstr &MI) const;
bool isVCC(Register Reg, const MachineRegisterInfo &MRI) const;		bool isVCC(Register Reg, const MachineRegisterInfo &MRI) const;

const RegisterBank *getArtifactRegBank(		const RegisterBank *getArtifactRegBank(
… private:
InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectVOP3OpSelMods(MachineOperand &Root) const;		selectVOP3OpSelMods(MachineOperand &Root) const;

InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectVINTERPMods(MachineOperand &Root) const;		selectVINTERPMods(MachineOperand &Root) const;
InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectVINTERPModsHi(MachineOperand &Root) const;		selectVINTERPModsHi(MachineOperand &Root) const;

		bool selectSmrdOffset(MachineOperand &Root, Register &Base, Register *SOffset,
		int64_t *Offset) const;
InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectSmrdImm(MachineOperand &Root) const;		selectSmrdImm(MachineOperand &Root) const;
InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectSmrdImm32(MachineOperand &Root) const;		selectSmrdImm32(MachineOperand &Root) const;
InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectSmrdSgpr(MachineOperand &Root) const;		selectSmrdSgpr(MachineOperand &Root) const;
		InstructionSelector::ComplexRendererFns
		selectSmrdSgprImm(MachineOperand &Root) const;

std::pair<Register, int> selectFlatOffsetImpl(MachineOperand &Root,		std::pair<Register, int> selectFlatOffsetImpl(MachineOperand &Root,
uint64_t FlatVariant) const;		uint64_t FlatVariant) const;

InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectFlatOffset(MachineOperand &Root) const;		selectFlatOffset(MachineOperand &Root) const;
InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
selectGlobalOffset(MachineOperand &Root) const;		selectGlobalOffset(MachineOperand &Root) const;

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

… void AMDGPUInstructionSelector::getAddrModeInfo(const MachineInstr &Load,

const MachineInstr *PtrMI = MRI.getUniqueVRegDef(Load.getOperand(1).getReg());		const MachineInstr *PtrMI = MRI.getUniqueVRegDef(Load.getOperand(1).getReg());

assert(PtrMI);		assert(PtrMI);

if (PtrMI->getOpcode() != TargetOpcode::G_PTR_ADD)		if (PtrMI->getOpcode() != TargetOpcode::G_PTR_ADD)
return;		return;

GEPInfo GEPInfo(*PtrMI);		GEPInfo GEPInfo;

for (unsigned i = 1; i != 3; ++i) {		for (unsigned i = 1; i != 3; ++i) {
const MachineOperand &GEPOp = PtrMI->getOperand(i);		const MachineOperand &GEPOp = PtrMI->getOperand(i);
const MachineInstr *OpDef = MRI.getUniqueVRegDef(GEPOp.getReg());		const MachineInstr *OpDef = MRI.getUniqueVRegDef(GEPOp.getReg());
assert(OpDef);		assert(OpDef);
if (i == 2 && isConstant(*OpDef)) {		if (i == 2 && isConstant(*OpDef)) {
// TODO: Could handle constant base + variable offset, but a combine		// TODO: Could handle constant base + variable offset, but a combine
// probably should have commuted it.		// probably should have commuted it.
… std::tie(Src, Mods) = selectVOP3ModsImpl(Root,
/* ForceVGPR */ true);		/* ForceVGPR */ true);

return {{		return {{
[=](MachineInstrBuilder &MIB) { MIB.addReg(Src); },		[=](MachineInstrBuilder &MIB) { MIB.addReg(Src); },
[=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); }, // src0_mods		[=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); }, // src0_mods
}};		}};
}		}

InstructionSelector::ComplexRendererFns		bool AMDGPUInstructionSelector::selectSmrdOffset(MachineOperand &Root,
AMDGPUInstructionSelector::selectSmrdImm(MachineOperand &Root) const {		Register &Base,
		Register *SOffset,
		int64_t *Offset) const {
		MachineInstr *MI = Root.getParent();
		MachineBasicBlock *MBB = MI->getParent();

		// FIXME: We should shrink the GEP if the offset is known to be <= 32-bits,
		// then we can select all ptr + 32-bit offsets.
SmallVector<GEPInfo, 4> AddrInfo;		SmallVector<GEPInfo, 4> AddrInfo;
getAddrModeInfo(Root.getParent(), MRI, AddrInfo);		getAddrModeInfo(MI, MRI, AddrInfo);

if (AddrInfo.empty() || AddrInfo[0].SgprParts.size() != 1)		if (AddrInfo.empty())
return None;		return false;

const GEPInfo &GEPInfo = AddrInfo[0];		const GEPInfo &GEPI = AddrInfo[0];
Optional<int64_t> EncodedImm =		Optional<int64_t> EncodedImm =
AMDGPU::getSMRDEncodedOffset(STI, GEPInfo.Imm, false);		AMDGPU::getSMRDEncodedOffset(STI, GEPI.Imm, false);
if (!EncodedImm)
		if (SOffset && Offset) {
		if (GEPI.SgprParts.size() == 1 && GEPI.Imm != 0 && EncodedImm &&
		AddrInfo.size() > 1) {
		const GEPInfo &GEPI2 = AddrInfo[1];
		if (GEPI2.SgprParts.size() == 2 && GEPI2.Imm == 0) {
		if (Register OffsetReg =
		matchZeroExtendFromS32(*MRI, GEPI2.SgprParts[1])) {
		Base = GEPI2.SgprParts[0];
		*SOffset = OffsetReg;
		*Offset = *EncodedImm;
		return true;
		}
		}
		}
		return false;
		}

		if (Offset && GEPI.SgprParts.size() == 1 && EncodedImm) {
		Base = GEPI.SgprParts[0];
		*Offset = *EncodedImm;
		return true;
		}

		// SGPR offset is unsigned.
		if (SOffset && GEPI.SgprParts.size() == 1 && isUInt<32>(GEPI.Imm) &&
		GEPI.Imm != 0) {
		// If we make it this far we have a load with an 32-bit immediate offset.
		// It is OK to select this using a sgpr offset, because we have already
		// failed trying to select this load into one of the _IMM variants since
		// the _IMM Patterns are considered before the _SGPR patterns.
		Base = GEPI.SgprParts[0];
		*SOffset = MRI->createVirtualRegister(&AMDGPU::SReg_32RegClass);
		BuildMI(*MBB, MI, MI->getDebugLoc(), TII.get(AMDGPU::S_MOV_B32), *SOffset)
		.addImm(GEPI.Imm);
		return true;
		}

		if (SOffset && GEPI.SgprParts.size() && GEPI.Imm == 0) {
		if (Register OffsetReg = matchZeroExtendFromS32(*MRI, GEPI.SgprParts[1])) {
		Base = GEPI.SgprParts[0];
		*SOffset = OffsetReg;
		return true;
		}
		}

		return false;
		}

		InstructionSelector::ComplexRendererFns
		AMDGPUInstructionSelector::selectSmrdImm(MachineOperand &Root) const {
		Register Base;
		int64_t Offset;
		if (!selectSmrdOffset(Root, Base, /* SOffset= */ nullptr, &Offset))
return None;		return None;

unsigned PtrReg = GEPInfo.SgprParts[0];		return {{[=](MachineInstrBuilder &MIB) { MIB.addReg(Base); },
return {{		[=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); }}};
[=](MachineInstrBuilder &MIB) { MIB.addReg(PtrReg); },
[=](MachineInstrBuilder &MIB) { MIB.addImm(*EncodedImm); }
}};
}		}

InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
AMDGPUInstructionSelector::selectSmrdImm32(MachineOperand &Root) const {		AMDGPUInstructionSelector::selectSmrdImm32(MachineOperand &Root) const {
SmallVector<GEPInfo, 4> AddrInfo;		SmallVector<GEPInfo, 4> AddrInfo;
getAddrModeInfo(Root.getParent(), MRI, AddrInfo);		getAddrModeInfo(Root.getParent(), MRI, AddrInfo);

if (AddrInfo.empty() || AddrInfo[0].SgprParts.size() != 1)		if (AddrInfo.empty() || AddrInfo[0].SgprParts.size() != 1)
…
return {{		return {{
[=](MachineInstrBuilder &MIB) { MIB.addReg(PtrReg); },		[=](MachineInstrBuilder &MIB) { MIB.addReg(PtrReg); },
[=](MachineInstrBuilder &MIB) { MIB.addImm(*EncodedImm); }		[=](MachineInstrBuilder &MIB) { MIB.addImm(*EncodedImm); }
}};		}};
}		}

InstructionSelector::ComplexRendererFns		InstructionSelector::ComplexRendererFns
AMDGPUInstructionSelector::selectSmrdSgpr(MachineOperand &Root) const {		AMDGPUInstructionSelector::selectSmrdSgpr(MachineOperand &Root) const {
MachineInstr *MI = Root.getParent();		Register Base, SOffset;
MachineBasicBlock *MBB = MI->getParent();		if (!selectSmrdOffset(Root, Base, &SOffset, /* Offset= */ nullptr))

SmallVector<GEPInfo, 4> AddrInfo;
getAddrModeInfo(MI, MRI, AddrInfo);

// FIXME: We should shrink the GEP if the offset is known to be <= 32-bits,
// then we can select all ptr + 32-bit offsets.
if (AddrInfo.empty())
return None;		return None;

const GEPInfo &GEPInfo = AddrInfo[0];		return {{[=](MachineInstrBuilder &MIB) { MIB.addReg(Base); },
Register PtrReg = GEPInfo.SgprParts[0];		[=](MachineInstrBuilder &MIB) { MIB.addReg(SOffset); }}};

// SGPR offset is unsigned.
if (AddrInfo[0].SgprParts.size() == 1 && isUInt<32>(GEPInfo.Imm) &&
GEPInfo.Imm != 0) {
// If we make it this far we have a load with an 32-bit immediate offset.
// It is OK to select this using a sgpr offset, because we have already
// failed trying to select this load into one of the _IMM variants since
// the _IMM Patterns are considered before the _SGPR patterns.
Register OffsetReg = MRI->createVirtualRegister(&AMDGPU::SReg_32RegClass);
BuildMI(*MBB, MI, MI->getDebugLoc(), TII.get(AMDGPU::S_MOV_B32), OffsetReg)
.addImm(GEPInfo.Imm);
return {{[=](MachineInstrBuilder &MIB) { MIB.addReg(PtrReg); },
[=](MachineInstrBuilder &MIB) { MIB.addReg(OffsetReg); }}};
}

if (AddrInfo[0].SgprParts.size() == 2 && GEPInfo.Imm == 0) {
if (Register OffsetReg =
matchZeroExtendFromS32(*MRI, GEPInfo.SgprParts[1])) {
return {{[=](MachineInstrBuilder &MIB) { MIB.addReg(PtrReg); },
[=](MachineInstrBuilder &MIB) { MIB.addReg(OffsetReg); }}};
}
}		}

		InstructionSelector::ComplexRendererFns
		AMDGPUInstructionSelector::selectSmrdSgprImm(MachineOperand &Root) const {
		Register Base, SOffset;
		int64_t Offset;
		if (!selectSmrdOffset(Root, Base, &SOffset, &Offset))
return None;		return None;

		return {{[=](MachineInstrBuilder &MIB) { MIB.addReg(Base); },
		[=](MachineInstrBuilder &MIB) { MIB.addReg(SOffset); },
		[=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); }}};
}		}

std::pair<Register, int>		std::pair<Register, int>
AMDGPUInstructionSelector::selectFlatOffsetImpl(MachineOperand &Root,		AMDGPUInstructionSelector::selectFlatOffsetImpl(MachineOperand &Root,
uint64_t FlatVariant) const {		uint64_t FlatVariant) const {
MachineInstr *MI = Root.getParent();		MachineInstr *MI = Root.getParent();

auto Default = std::make_pair(Root.getReg(), 0);		auto Default = std::make_pair(Root.getReg(), 0);

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp


… bool SIInstrInfo::areLoadsFromSameBasePtr(SDNode *Load0, SDNode *Load1,
}		}

if (isSMRD(Opc0) && isSMRD(Opc1)) {		if (isSMRD(Opc0) && isSMRD(Opc1)) {
// Skip time and cache invalidation instructions.		// Skip time and cache invalidation instructions.
if (AMDGPU::getNamedOperandIdx(Opc0, AMDGPU::OpName::sbase) == -1 ||		if (AMDGPU::getNamedOperandIdx(Opc0, AMDGPU::OpName::sbase) == -1 ||
AMDGPU::getNamedOperandIdx(Opc1, AMDGPU::OpName::sbase) == -1)		AMDGPU::getNamedOperandIdx(Opc1, AMDGPU::OpName::sbase) == -1)
return false;		return false;

assert(getNumOperandsNoGlue(Load0) == getNumOperandsNoGlue(Load1));		unsigned NumOps = getNumOperandsNoGlue(Load0);
		if (NumOps != getNumOperandsNoGlue(Load1))
		return false;

// Check base reg.		// Check base reg.
if (Load0->getOperand(0) != Load1->getOperand(0))		if (Load0->getOperand(0) != Load1->getOperand(0))
return false;		return false;

		// Match register offsets, if both register and immediate offsets present.
		assert(NumOps == 4 \|\| NumOps == 5);
		if (NumOps == 5 && Load0->getOperand(1) != Load1->getOperand(1))
		return false;

const ConstantSDNode *Load0Offset =		const ConstantSDNode *Load0Offset =
dyn_cast<ConstantSDNode>(Load0->getOperand(1));		dyn_cast<ConstantSDNode>(Load0->getOperand(NumOps - 3));
const ConstantSDNode *Load1Offset =		const ConstantSDNode *Load1Offset =
dyn_cast<ConstantSDNode>(Load1->getOperand(1));		dyn_cast<ConstantSDNode>(Load1->getOperand(NumOps - 3));

if (!Load0Offset \|\| !Load1Offset)		if (!Load0Offset \|\| !Load1Offset)
return false;		return false;

Offset0 = Load0Offset->getZExtValue();		Offset0 = Load0Offset->getZExtValue();
Offset1 = Load1Offset->getZExtValue();		Offset1 = Load1Offset->getZExtValue();
return true;		return true;
}		}
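The patched SMRD branch of areLoadsFromSameBasePtr() depends on the operand layout of the selected nodes. Operand 0 is the base. A fifth operand appears only when there is a register offset, which sits at operand 1. The immediate offset is at index NumOps - 3. The sketch below replays that logic on a hypothetical stand-in where each operand is either a register name or a constant. SmrdNode and sameBaseSmrd() are illustrative names, not LLVM API. The two operands after the immediate are assumed here to be a cache-policy immediate and a chain, as the NumOps - 3 indexing implies. They are modelled only so the indices line up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for the non-glue operands of a selected SMRD node:
// each operand is either a register name or a constant.
using Operand = std::variant<std::string, int64_t>;
using SmrdNode = std::vector<Operand>;

// Replays the patched SMRD branch on the toy model. Assumed layouts:
//   4 operands: base, imm offset, cpol, chain
//   5 operands: base, reg offset, imm offset, cpol, chain
bool sameBaseSmrd(const SmrdNode &Load0, const SmrdNode &Load1,
                  int64_t &Offset0, int64_t &Offset1) {
  std::size_t NumOps = Load0.size();
  if (NumOps != Load1.size())
    return false;
  // Checked before any indexing so the toy never reads out of bounds.
  assert(NumOps == 4 || NumOps == 5);

  // Check base reg.
  if (Load0[0] != Load1[0])
    return false;

  // When a register offset is present, it must match as well.
  if (NumOps == 5 && Load0[1] != Load1[1])
    return false;

  // The immediate offset is the third operand from the end.
  const int64_t *Imm0 = std::get_if<int64_t>(&Load0[NumOps - 3]);
  const int64_t *Imm1 = std::get_if<int64_t>(&Load1[NumOps - 3]);
  if (!Imm0 || !Imm1)
    return false;

  Offset0 = *Imm0;
  Offset1 = *Imm1;
  return true;
}
```

In this model, two SGPR_IMM loads are treated as sharing a base only when both the base and the register offset match. Their immediates are then returned through Offset0 and Offset1. A 4-operand load is never paired with a 5-operand one.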

llvm/lib/Target/AMDGPU/SMInstructions.td

@@ if (hasVgprParts(AddrInfo))
       return false;
     return true;
   }];
 }
 
 def SMRDImm : ComplexPattern<iPTR, 2, "SelectSMRDImm">;
 def SMRDImm32 : ComplexPattern<iPTR, 2, "SelectSMRDImm32">;
 def SMRDSgpr : ComplexPattern<iPTR, 2, "SelectSMRDSgpr">;
+def SMRDSgprImm : ComplexPattern<iPTR, 3, "SelectSMRDSgprImm">;
 def SMRDBufferImm : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
 
 multiclass SMRD_Pattern <string Instr, ValueType vt> {
 
   // 1. IMM offset
   def : GCNPat <
     (smrd_load (SMRDImm i64:$sbase, i32:$offset)),
     (vt (!cast<SM_Pseudo>(Instr#"_IMM") $sbase, $offset, 0))
   >;
 
   // 2. 32-bit IMM offset on CI
   def : GCNPat <
     (smrd_load (SMRDImm32 i64:$sbase, i32:$offset)),
     (vt (!cast<InstSI>(Instr#"_IMM_ci") $sbase, $offset, 0))> {
     let OtherPredicates = [isGFX7Only];
   }
 
   // 3. SGPR offset
   def : GCNPat <
-    (smrd_load (SMRDSgpr i64:$sbase, i32:$offset)),
-    (vt (!cast<SM_Pseudo>(Instr#"_SGPR") $sbase, $offset, 0))
+    (smrd_load (SMRDSgpr i64:$sbase, i32:$soffset)),
+    (vt (!cast<SM_Pseudo>(Instr#"_SGPR") $sbase, $soffset, 0))
   >;
 
-  // 4. No offset
+  // 4. SGPR+IMM offset
+  def : GCNPat <
+    (smrd_load (SMRDSgprImm i64:$sbase, i32:$soffset, i32:$offset)),
+    (vt (!cast<SM_Pseudo>(Instr#"_SGPR_IMM") $sbase, $soffset, $offset, 0))> {
+    let OtherPredicates = [isGFX9Plus];
+  }
+
+  // 5. No offset
   def : GCNPat <
     (vt (smrd_load (i64 SReg_64:$sbase))),
     (vt (!cast<SM_Pseudo>(Instr#"_IMM") i64:$sbase, 0, 0))
   >;
 }
 
 multiclass SMLoad_Pattern <string Instr, ValueType vt> {
   // 1. Offset as an immediate
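Pattern 4 maps an address made of a 64-bit SGPR base, a 32-bit SGPR offset and an immediate onto the _SGPR_IMM pseudo. The IR in test_sgpr_plus_imm_offset, however, adds the constant 16 before the zero-extended offset. The C++ sketch below checks that this reassociation yields the same address. It assumes the load reads base + zext(soffset) + imm with 64-bit wrap-around. Both functions are hypothetical, and immediate-width limits and any alignment handling are deliberately not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the address an SGPR+IMM scalar load reads, assuming
// it is simply base + zext(soffset) + imm in 64-bit wrap-around arithmetic.
uint64_t sgprImmAddress(uint64_t SBase, uint32_t SOffset, uint32_t Imm) {
  return SBase + static_cast<uint64_t>(SOffset) + static_cast<uint64_t>(Imm);
}

// The same address as the test IR builds it: first GEP by the constant,
// then GEP by the zero-extended 32-bit offset.
uint64_t irAddress(uint64_t Base, uint32_t Offset, uint32_t Const) {
  uint64_t V1 = Base + static_cast<uint64_t>(Const);  // %v1 = gep %base, 16
  return V1 + static_cast<uint64_t>(Offset);          // %v3 = gep %v1, zext %offset
}
```

Unsigned addition modulo 2^64 is associative and commutative. On this model, the order in which the constant and the register offset are added therefore cannot change the address. That is what allows the GEP by 16 to be folded into the immediate field. The model says nothing about whether a given immediate fits the encoding.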

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-smrd.mir

 # RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck %s -check-prefixes=GCN,SI,SICI,SIVI
 # RUN: llc -march=amdgcn -mcpu=hawaii -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck %s -check-prefixes=GCN,CI,SICI
 # RUN: llc -march=amdgcn -mcpu=fiji -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck %s -check-prefixes=GCN,VI,SIVI
+# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck %s -check-prefixes=GCN,GFX9
 
 --- |
   define amdgpu_kernel void @smrd_imm(i32 addrspace(4)* %const0) { ret void }
   define amdgpu_kernel void @smrd_wide() { ret void }
   define amdgpu_kernel void @constant_address_positive() { ret void }
   define amdgpu_kernel void @smrd_sgpr() { ret void }
+  define amdgpu_kernel void @smrd_sgpr_imm() { ret void }
 ...
 ---
 
 name: smrd_imm
 legalized: true
 regBankSelected: true
 
 # GCN: body:
@@ bb.0:
     liveins: $sgpr0_sgpr1, $sgpr2
     %0:sgpr(p4) = COPY $sgpr0_sgpr1
     %1:sgpr(s32) = COPY $sgpr2
     %2:sgpr(s64) = G_ZEXT %1:sgpr(s32)
     %4:sgpr(p4) = G_PTR_ADD %0, %2
     %5:sgpr(s32) = G_LOAD %4 :: (dereferenceable invariant load (s32), align 4, addrspace 4)
     S_ENDPGM 0, implicit %5
 ...
 
+---
+
+# Test a load with a (register + immediate) offset.
+# GCN-LABEL: name: smrd_sgpr_imm{{$}}
+# GFX9-DAG: %[[BASE:.*]]:sreg_64 = COPY $sgpr0_sgpr1
+# GFX9-DAG: %[[OFFSET:.*]]:sreg_32 = COPY $sgpr2
+# GFX9: S_LOAD_DWORD_SGPR_IMM %[[BASE]], %[[OFFSET]], 16,
+
+name: smrd_sgpr_imm
+legalized: true
+regBankSelected: true
+
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1, $sgpr2
+    %0:sgpr(p4) = COPY $sgpr0_sgpr1
+    %1:sgpr(s32) = COPY $sgpr2
+    %2:sgpr(s64) = G_ZEXT %1:sgpr(s32)
+    %4:sgpr(p4) = G_PTR_ADD %0, %2
+    %5:sgpr(s64) = G_CONSTANT i64 16
+    %6:sgpr(p4) = G_PTR_ADD %4, %5
+    %7:sgpr(s32) = G_LOAD %6 :: (dereferenceable invariant load (s32), align 4, addrspace 4)
+    S_ENDPGM 0, implicit %7
+...

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll

-; RUN: llc -march=amdgcn -global-isel=0 -verify-machineinstrs -stop-after=amdgpu-isel -o - %s | FileCheck -check-prefixes=GCN,SDAG %s
-; RUN: llc -march=amdgcn -global-isel=1 -verify-machineinstrs -stop-after=amdgpu-isel -o - %s | FileCheck -check-prefixes=GCN,GISEL %s
+; RUN: llc -march=amdgcn -mcpu=gfx900 -global-isel=0 -verify-machineinstrs -stop-after=finalize-isel -o - %s | FileCheck -check-prefixes=GCN,SDAG %s
+; RUN: llc -march=amdgcn -mcpu=gfx900 -global-isel=1 -verify-machineinstrs -stop-after=finalize-isel -o - %s | FileCheck -check-prefixes=GCN,GISEL %s

arsenm: You shouldn't use -stop-after=amdgpu-isel. Use finalize-isel for MIR tests (although I'm not sure why you're testing MIR here)
kosarev: This seems to be where we already have checks for a similar case? Is there a better place for that?
kosarev: > You shouldn't use -stop-after=amdgpu-isel.
Can you expand a bit more on why this is preferable, please? I see literally 4 tests utilising finalize-isel, none of them look like purely isel tests, and the other 35 all seem to use amdgpu-isel. The description of finalize-isel as it comes from rG9cac4e6d14035 again doesn't suggest anything obviously useful for this test?
arsenm: amdgpu-isel is specifically the SDAG pass. Your GISEL check didn't stop anywhere near the same place, and -stop-pass ran off the end and completed register allocation.
kosarev: What if we just remove the GISEL checks from this test completely, then?
arsenm: I'd rather just use -stop-after finalize-isel.

 @0 = external dso_local addrspace(4) constant [4 x <2 x float>]
 @1 = external dso_local addrspace(4) constant i32
 
 ; Test that DAG->DAG ISel is able to pick up the S_LOAD_DWORDX4_SGPR instruction that fetches the offset
 ; from a register.
 ; GCN-LABEL: name: test_load_zext
-; SDAG: %[[OFFSET:[0-9]+]]:sreg_32 = S_MOV_B32 target-flags(amdgpu-abs32-lo) @DescriptorBuffer
+; GCN: %[[OFFSET:[0-9]+]]:sreg_32 = S_MOV_B32 target-flags(amdgpu-abs32-lo) @DescriptorBuffer
 ; SDAG: %{{[0-9]+}}:sgpr_128 = S_LOAD_DWORDX4_SGPR killed %{{[0-9]+}}, killed %[[OFFSET]], 0 :: (invariant load (s128) from %ir.13, addrspace 4)
-; GISEL: $[[OFFSET:.*]] = S_MOV_B32 target-flags(amdgpu-abs32-lo) @DescriptorBuffer
-; GISEL: S_LOAD_DWORDX4_SGPR killed renamable {{.*}}, killed renamable $[[OFFSET]], 0 :: (invariant load (<4 x s32>) from {{.*}}, addrspace 4)
+; GISEL: %{{[0-9]+}}:sgpr_128 = S_LOAD_DWORDX4_SGPR %{{[0-9]+}}, %[[OFFSET]], 0 :: (invariant load (<4 x s32>) from {{.*}}, addrspace 4)
 define amdgpu_cs void @test_load_zext(i32 inreg %0, i32 inreg %1, i32 inreg %resNode0, i32 inreg %resNode1, <3 x i32> inreg %2, i32 inreg %3, <3 x i32> %4) local_unnamed_addr #2 {
 .entry:
   %5 = call i64 @llvm.amdgcn.s.getpc() #3
   %6 = bitcast i64 %5 to <2 x i32>
   %7 = insertelement <2 x i32> %6, i32 %resNode0, i32 0
   %8 = bitcast <2 x i32> %7 to i64
   %9 = inttoptr i64 %8 to [4294967295 x i8] addrspace(4)*
   %10 = call i32 @llvm.amdgcn.reloc.constant(metadata !4)
   %11 = zext i32 %10 to i64
   %12 = getelementptr [4294967295 x i8], [4294967295 x i8] addrspace(4)* %9, i64 0, i64 %11
   %13 = bitcast i8 addrspace(4)* %12 to <4 x i32> addrspace(4)*, !amdgpu.uniform !5
   %14 = load <4 x i32>, <4 x i32> addrspace(4)* %13, align 16, !invariant.load !5
   %15 = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> %14, i32 0, i32 0)
   call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %15, <4 x i32> %14, i32 0, i32 0, i32 0)
   ret void
 }
 
 ; Make sure we match constant bases with register offsets, in which case
 ; the base may be the RHS operand of the load in SDAG.
 ; GCN-LABEL: name: test_complex_reg_offset
-; SDAG-DAG: %[[BASE:.*]]:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @0 + 4,
-; SDAG-DAG: %[[OFFSET:.*]]:sreg_32 = S_LSHL_B32
+; GCN-DAG: %[[BASE:.*]]:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @0 + 4,
+; GCN-DAG: %[[OFFSET:.*]]:sreg_32 = S_LSHL_B32
 ; SDAG: S_LOAD_DWORD_SGPR killed %[[BASE]], killed %[[OFFSET]],
-; GISEL-DAG: $[[BASE0:.*]] = S_ADD_U32 internal $sgpr0, target-flags(amdgpu-rel32-lo) @0 + 4,
-; GISEL-DAG: $[[BASE1:.*]] = S_ADDC_U32 internal $sgpr1, target-flags(amdgpu-rel32-hi) @0 + 12,
-; GISEL-DAG: $[[OFFSET:.*]] = S_LSHL_B32
-; GISEL-NOT: [[OFFSET]] =
-; GISEL: S_LOAD_DWORD_SGPR killed renamable $[[BASE0]]_[[BASE1]], killed renamable $[[OFFSET]],
+; GISEL: S_LOAD_DWORD_SGPR %[[BASE]], %[[OFFSET]],
 define amdgpu_ps void @test_complex_reg_offset(float addrspace(1)* %out) {
   %i = load i32, i32 addrspace(4)* @1
   %i1 = and i32 %i, 3
   %i2 = zext i32 %i1 to i64
   %i3 = getelementptr [4 x <2 x float>], [4 x <2 x float>] addrspace(4)* @0, i64 0, i64 %i2, i64 0
   %i4 = load float, float addrspace(4)* %i3, align 4
   store float %i4, float addrspace(1)* %out
   ret void
 }
 
+; GCN-LABEL: name: test_sgpr_plus_imm_offset
+; SDAG-DAG: %[[BASE0:.*]]:sgpr_32 = COPY $sgpr0
+; SDAG-DAG: %[[BASE1:.*]]:sgpr_32 = COPY $sgpr1
+; SDAG-DAG: %[[OFFSET:.*]]:sgpr_32 = COPY $sgpr2
+; SDAG-DAG: %[[BASE:.*]]:sgpr_64 = REG_SEQUENCE %[[BASE0]], %subreg.sub0, %[[BASE1]], %subreg.sub1
+; SDAG: S_LOAD_DWORD_SGPR_IMM killed %[[BASE]], %[[OFFSET]], 16,
+; GISEL-DAG: %[[BASE0:.*]]:sreg_32 = COPY $sgpr0
+; GISEL-DAG: %[[BASE1:.*]]:sreg_32 = COPY $sgpr1
+; GISEL-DAG: %[[OFFSET:.*]]:sreg_32 = COPY $sgpr2
+; GISEL-DAG: %[[BASE:.*]]:sreg_64 = REG_SEQUENCE %[[BASE0]], %subreg.sub0, %[[BASE1]], %subreg.sub1
+; GISEL: S_LOAD_DWORD_SGPR_IMM %[[BASE]], %[[OFFSET]], 16,
+define amdgpu_ps void @test_sgpr_plus_imm_offset(i8 addrspace(4)* inreg %base, i32 inreg %offset,
+    i32 addrspace(1)* inreg %out) {
+  %v1 = getelementptr i8, i8 addrspace(4)* %base, i64 16
arsenm: Also test some other load sizes?
+  %v2 = zext i32 %offset to i64
+  %v3 = getelementptr i8, i8 addrspace(4)* %v1, i64 %v2
+  %v4 = bitcast i8 addrspace(4)* %v3 to i32 addrspace(4)*
+  %v5 = load i32, i32 addrspace(4)* %v4, align 4
+  store i32 %v5, i32 addrspace(1)* %out, align 4
+  ret void
+}
+
+; GCN-LABEL: name: test_sgpr_plus_imm_offset_x2
+; SDAG-DAG: %[[BASE0:.*]]:sgpr_32 = COPY $sgpr0
+; SDAG-DAG: %[[BASE1:.*]]:sgpr_32 = COPY $sgpr1
+; SDAG-DAG: %[[OFFSET:.*]]:sgpr_32 = COPY $sgpr2
+; SDAG-DAG: %[[BASE:.*]]:sgpr_64 = REG_SEQUENCE %[[BASE0]], %subreg.sub0, %[[BASE1]], %subreg.sub1
+; SDAG: S_LOAD_DWORDX2_SGPR_IMM killed %[[BASE]], %[[OFFSET]], 16,
+; GISEL-DAG: %[[BASE0:.*]]:sreg_32 = COPY $sgpr0
+; GISEL-DAG: %[[BASE1:.*]]:sreg_32 = COPY $sgpr1
+; GISEL-DAG: %[[OFFSET:.*]]:sreg_32 = COPY $sgpr2
+; GISEL-DAG: %[[BASE:.*]]:sreg_64 = REG_SEQUENCE %[[BASE0]], %subreg.sub0, %[[BASE1]], %subreg.sub1
+; GISEL: S_LOAD_DWORDX2_SGPR_IMM %[[BASE]], %[[OFFSET]], 16,
+define amdgpu_ps void @test_sgpr_plus_imm_offset_x2(i8 addrspace(4)* inreg %base, i32 inreg %offset,
+    <2 x i32> addrspace(1)* inreg %out) {
+  %v1 = getelementptr i8, i8 addrspace(4)* %base, i64 16
+  %v2 = zext i32 %offset to i64
+  %v3 = getelementptr i8, i8 addrspace(4)* %v1, i64 %v2
+  %v4 = bitcast i8 addrspace(4)* %v3 to <2 x i32> addrspace(4)*
+  %v5 = load <2 x i32>, <2 x i32> addrspace(4)* %v4, align 4
+  store <2 x i32> %v5, <2 x i32> addrspace(1)* %out, align 4
+  ret void
+}
+
 declare void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32 immarg) #1
 
 ; Function Attrs: nounwind readnone speculatable
 declare i32 @llvm.amdgcn.reloc.constant(metadata) #3
 
 ; Function Attrs: nounwind readnone speculatable
 declare i64 @llvm.amdgcn.s.getpc() #3
