This is an archive of the discontinued LLVM Phabricator instance.


Differential D154083

[AMDGPU] Rematerialize scalar loads
Needs Review · Public

Authored by piotr on Jun 29 2023, 6:59 AM.

Details

Reviewers

arsenm
rampitec
foad

Summary

Extend the list of instructions that can be rematerialized in
SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar loads.

Try shrinking instructions to remat only the part needed for current
context. Add SIInstrInfo::reMaterialize target hook, and handle shrinking
of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof of concept.
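
As a rough illustration of the shrinking step, the standalone sketch below models the arithmetic only: choosing a narrower load from the width of the single sub-register that is used, and adjusting the immediate offset and access size. It does not use any LLVM API. `ShrunkLoad` and `shrinkScalarLoad` are hypothetical names introduced here; the size-to-opcode mapping and the bits-to-bytes conversion follow the SIInstrInfo.cpp changes in this revision.

```cpp
#include <cstdint>
#include <string>

// Hypothetical result type for this sketch; not part of the patch.
struct ShrunkLoad {
  std::string Opcode;   // name of the narrower scalar load
  int64_t ByteOffset;   // immediate offset of the narrower load, in bytes
  unsigned SizeInBytes; // number of bytes the narrower load reads
};

// Given the immediate offset of the original wide load and the position and
// width (both in bits) of the only sub-register that is used, pick a narrower
// load. Returns false when no shrinking is attempted for that width.
bool shrinkScalarLoad(int64_t OrigByteOffset, unsigned SubRegOffsetBits,
                      unsigned SubRegSizeBits, ShrunkLoad &Out) {
  if (SubRegSizeBits == 256)
    Out.Opcode = "S_LOAD_DWORDX8_IMM";
  else if (SubRegSizeBits == 128)
    Out.Opcode = "S_LOAD_DWORDX4_IMM";
  else
    return false;

  // Sub-register offsets and sizes are expressed in bits; loads use bytes.
  Out.ByteOffset = OrigByteOffset + SubRegOffsetBits / 8;
  Out.SizeInBytes = SubRegSizeBits / 8;
  return true;
}
```

For example, if only the upper 256 bits of a 512-bit result are used, the sketch selects the 8-dword load and moves the offset forward by 32 bytes.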

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

piotr created this revision. Jun 29 2023, 6:59 AM

Herald added a project: Restricted Project. Jun 29 2023, 6:59 AM

Herald added subscribers: foad, kerbowa, hiraditya and 6 others.

piotr requested review of this revision. Jun 29 2023, 6:59 AM

Herald added subscribers: llvm-commits, wdng.

piotr added inline comments. Jun 29 2023, 7:23 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
109	This is a pre-existent issue, but I am not very comfortable with the fact that we override `isReallyTriviallyReMaterializable` the way we do. It works because we rely on extra checks being present before the method is invoked (for example on calling `LiveRangeEdit::allUsesAvailableAt()`). However, the comment in the base class says `isReallyTriviallyReMaterializable` "must return false if (..) or if it requres any address registers that are not always available." Having said that, I am not sure if there is a practical solution for that (apart from softening the comment in the base class).
114	Ironically, the addition of the check `isDereferenceableInvariantLoad` makes the examples that drive this work not handled, as the S_LOADs are not marked as `dereferenceable` at the moment (but that is another issue).
llvm/test/CodeGen/AMDGPU/remat-smrd.mir
4	The patch handles the positive cases involving loads, and does not affect the negative cases.
4	(Need to add a test for shrinking)

piotr added reviewers: arsenm, rampitec, foad. Jun 29 2023, 7:24 AM

Herald added a subscriber: StephenFan. Jun 29 2023, 7:24 AM

Harbormaster completed remote builds in B242075: Diff 535774. Jun 29 2023, 8:06 AM

arsenm added inline comments. Jun 29 2023, 9:01 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
109	Yes, I don't like this either. The way the rematerializable hooks are structured is confusing and part of why this never got done before.
114	Don't really need the isSMRD with the isDereferenceableInvariantLoad. We should also be able to handle any load
2381	Don't name TRI, should use RI member here
2387–2388	Fix weird line breaks, comment before switch
2393	I'd hope we would have access to a live lane mask that we need to handle. Failing that, do you really need the copy opcode exception?
2396	don't use auto here, or at least use a const *
2400	s/TRI/RI/
2407	Don't think this can happen
2419	Can you go through getSubRegisterClass (possibly with getMatchingSuperRegClass and getSubClassWithSubReg) to avoid hardcoding this
2427	keep this all as int64_t

piotr added inline comments. Jun 29 2023, 10:00 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	That was my intuition as well, but I added this extra check for these reasons:
- the negative tests in remat-smrd.mir contain cases where the loads would be rematted otherwise, e.g. test_no_remat_s_load_dword_not_dereferenceable
- the generic hook that we bypass has a similar condition for loads (https://github.com/llvm/llvm-project/blob/c304be7cfdd2261811671feb252e31222365b475/llvm/lib/CodeGen/TargetInstrInfo.cpp#L1103)
Maybe checking for MMO->isInvariant() is enough? And that would mean the negative test would turn to a positive one.

arsenm added inline comments. Jun 29 2023, 11:13 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	Would rematerialization ever happen in a context where the use didn't have the same dominating conditions as at the def? My intuition is that it couldn't, and thus the dereferenceable check (including the generic one) would be too strong
2419	Easier yet would be to just use the result class from the instruction desc. Also, is this safe from other users with a different class?

piotr added inline comments. Jun 30 2023, 9:15 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2419	Thanks - will rewrite this, but what exactly do you mean here by "safe from other users"?

arsenm added inline comments. Jun 30 2023, 12:26 PM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2419	I mean other instructions using the same virtual register that aren't expecting the class to change. You're mutating the existing register and not creating a new one with the new class

Addressed review comments.
Relaxed check to include all invariant loads, not only dereferenceable ones.
Rebased patch over the commit with new/changed tests.
Updated MMO with new size in the shrinking path.

Harbormaster completed remote builds in B244060: Diff 538530. Jul 10 2023, 12:45 AM

piotr marked 9 inline comments as done. Jul 10 2023, 12:51 AM

piotr added inline comments.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2419	I'm mutating DestReg which does not have users. Adding the assertion, but maybe an early out would be more future-proof.
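
To make the "early out" alternative concrete, here is a minimal standalone sketch of the idea: narrow the register class only when the register has no users, and otherwise report failure so the caller can fall back to rematerializing the unshrunk instruction. This is a toy model, not LLVM code; `VRegInfo`, `tryNarrowRegClass`, and the class-name strings are hypothetical placeholders.

```cpp
#include <map>
#include <string>

// Hypothetical model of a virtual register: its class name and user count.
struct VRegInfo {
  std::string RegClass;
  unsigned NumUsers;
};

// Narrow the class of DestReg only if nothing uses it yet. Returns false and
// leaves the register untouched otherwise, instead of asserting.
bool tryNarrowRegClass(std::map<unsigned, VRegInfo> &Regs, unsigned DestReg,
                       const std::string &NewClass) {
  std::map<unsigned, VRegInfo>::iterator It = Regs.find(DestReg);
  if (It == Regs.end() || It->second.NumUsers != 0)
    return false; // Unknown register or existing users: do not mutate.
  It->second.RegClass = NewClass;
  return true;
}
```

Compared with an assertion, this variant keeps release builds safe if a future caller passes a register that already has users, at the cost of silently skipping the optimization in that case.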

arsenm added inline comments. Jul 10 2023, 6:01 PM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	As a pre-commit can you post a patch to weaken the generic check? Best to be on the same page as all targets for this

Could really use a MIR test that shows this. Also would be nice to have some evil cases, where the result register is tied to the input pointer register

In D154083#4490691, @arsenm wrote:

Could really use a MIR test that shows this. Also would be nice to have some evil cases, where the result register is tied to the input pointer register

This patch is now based on a test update (https://reviews.llvm.org/D154816), where I am also adding a new test that exercises the shrinking - test_remat_s_load_dword_immx16_subreg.

Can you describe the evil case(s) in more detail? Do you mean S_LOAD_DWORDX16_IMM with tied-def, or something else?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	Ok, will take a look.

Ran extensive testing on graphics workloads, which uncovered some bugs. Added fixes and more tests for those interesting cases in D154816.

Harbormaster completed remote builds in B246704: Diff 542230. Jul 19 2023, 4:05 PM

In D154083#4493643, @piotr wrote:

This patch is now based on a test update (https://reviews.llvm.org/D154816), where I am also adding a new test that exercises the shrinking - test_remat_s_load_dword_immx16_subreg.

Can you describe the evil case(s) in more detail? Do you mean S_LOAD_DWORDX16_IMM with tied-def, or something else?

Yes. I don't think subregisters with tied operands are particularly well defined, but I was thinking something like %0:sreg_256 = S_LOAD_DWORDX16 %0.sub0_sub1

In D154083#4516643, @arsenm wrote:

Yes. I don't think subregisters with tied operands are particularly well defined, but I was thinking something like %0:sreg_256 = S_LOAD_DWORDX16 %0.sub0_sub1

Added test case in D154816.

piotr added inline comments. Jul 21 2023, 12:59 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	I looked closer at that. Checked that if I modify `TargetInstrInfo::isReallyTriviallyReMaterializableGeneric` with a relaxed check (copy of MI.isDereferenceableInvariantLoad with `&& MMO->isDereferenceable()` removed) - then that makes no difference in the existing tests. X86 relies on further checks in that function "A load from a constant PseudoSourceValue is invariant". So while that boosted my confidence that the generic code does not rely on "dereferenceable", I am not sure if any changes to the generic check are practical. That would be pretty ugly, as I'd have to pretty much copy `isDereferenceableInvariantLoad` only to have `isDereferenceable()` check removed.

piotr added inline comments. Jul 21 2023, 1:01 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	(And AMDGPU would not benefit from any of this as we rely on our own logic anyway.)

arsenm added inline comments. Jul 26 2023, 10:15 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	I don't follow how it's difficult. It's a straightforward relaxing of a restriction? I don't understand how the x86 case complicates matters (also, x86 shouldn't need to special case that instance either)

piotr added inline comments. Jul 26 2023, 11:59 PM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
114	It is not difficult, but I am unsure if we really want to extend MachineInstr with isInvariantLoad() that would be a copy of isDereferenceableInvariantLoad() but with the relaxed condition. I can add a private common function to avoid duplicating the code, but I do not see how we can avoid adding a new function. We cannot just edit isDereferenceableInvariantLoad because it has more uses than just the one inside isReallyTriviallyReMaterializableGeneric(), and some of them really need isDereferenceable check. As an alternative, instead of modifying MachineInstr I could just add a new private function in TargetInstrInfo, just for the use in isReallyTriviallyReMaterializableGeneric().
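
The "private common function" option mentioned above could look roughly like the following standalone sketch: one private helper parameterized on whether dereferenceability is required, and two thin public queries on top of it. This is a simplified model rather than LLVM code. `InstrModel`, `MemOperandModel`, and `isInvariantLoadImpl` are hypothetical names, and the real `MachineInstr::isDereferenceableInvariantLoad` may check more conditions than are modeled here.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for the flags carried by a memory operand.
struct MemOperandModel {
  bool IsLoad;
  bool IsInvariant;
  bool IsDereferenceable;
};

// Hypothetical stand-in for an instruction with memory operands.
class InstrModel {
public:
  explicit InstrModel(std::vector<MemOperandModel> Ops)
      : MMOs(std::move(Ops)) {}

  // Strict query: callers that need dereferenceability keep using this.
  bool isDereferenceableInvariantLoad() const {
    return isInvariantLoadImpl(/*RequireDereferenceable=*/true);
  }

  // Relaxed query, intended for the rematerialization check only.
  bool isInvariantLoad() const {
    return isInvariantLoadImpl(/*RequireDereferenceable=*/false);
  }

private:
  // Shared logic, so the two public queries do not duplicate code.
  bool isInvariantLoadImpl(bool RequireDereferenceable) const {
    if (MMOs.empty())
      return false; // No memory operands: be conservative.
    return std::all_of(
        MMOs.begin(), MMOs.end(),
        [RequireDereferenceable](const MemOperandModel &M) {
          return M.IsLoad && M.IsInvariant &&
                 (!RequireDereferenceable || M.IsDereferenceable);
        });
  }

  std::vector<MemOperandModel> MMOs;
};
```

With this shape, existing users of the strict query are unaffected, and only the rematerialization path opts into the relaxed condition.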

piotr mentioned this in D156998: [NFC] Pre-commit test for dead bundle bug. Aug 3 2023, 6:37 AM

piotr mentioned this in D156999: [Inline Spiller] Consider bundles when marking defs as dead. Aug 3 2023, 6:39 AM

Rebased.

Harbormaster completed remote builds in B254902: Diff 553492. Aug 25 2023, 8:28 AM

Ping - I think the only unresolved point is potential weakening of the generic check.

Ping.

Tested on 10k pipelines from Vulkan games, this patch reduces the number of v_writelane instructions from 9842 to 6440 (at the expense of using more loads of course).

Any objections to pushing this?

Ping.

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

SIInstrInfo.h

5 lines

SIInstrInfo.cpp

110 lines

test/

CodeGen/

AMDGPU/

hsa-metadata-kernel-code-props-v3.ll

4 lines

hsa-metadata-kernel-code-props.ll

4 lines

remat-smrd.mir

116 lines

snippet-copy-bundle-regression.mir

42 lines

Diff 553492

llvm/lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 266 Lines • ▼ Show 20 Lines	public:
void loadRegFromStackSlot(MachineBasicBlock &MBB,		void loadRegFromStackSlot(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI, Register DestReg,		MachineBasicBlock::iterator MI, Register DestReg,
int FrameIndex, const TargetRegisterClass *RC,		int FrameIndex, const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI,		const TargetRegisterInfo *TRI,
Register VReg) const override;		Register VReg) const override;

bool expandPostRAPseudo(MachineInstr &MI) const override;		bool expandPostRAPseudo(MachineInstr &MI) const override;

		void reMaterialize(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
		Register DestReg, unsigned SubIdx,
		const MachineInstr &Orig,
		const TargetRegisterInfo &TRI) const override;

// Splits a V_MOV_B64_DPP_PSEUDO opcode into a pair of v_mov_b32_dpp		// Splits a V_MOV_B64_DPP_PSEUDO opcode into a pair of v_mov_b32_dpp
// instructions. Returns a pair of generated instructions.		// instructions. Returns a pair of generated instructions.
// Can split either post-RA with physical registers or pre-RA with		// Can split either post-RA with physical registers or pre-RA with
// virtual registers. In latter case IR needs to be in SSA form and		// virtual registers. In latter case IR needs to be in SSA form and
// and a REG_SEQUENCE is produced to define original register.		// and a REG_SEQUENCE is produced to define original register.
std::pair<MachineInstr *, MachineInstr *>		std::pair<MachineInstr *, MachineInstr *>
expandMovDPP64(MachineInstr &MI) const;		expandMovDPP64(MachineInstr &MI) const;

▲ Show 20 Lines • Show All 1,139 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	static bool nodesHaveSameOperandValue(SDNode *N0, SDNode *N1, unsigned OpName) {
// MachineSDNode's operands, so we need to skip the result operand to get		// MachineSDNode's operands, so we need to skip the result operand to get
// the real index.		// the real index.
--Op0Idx;		--Op0Idx;
--Op1Idx;		--Op1Idx;

return N0->getOperand(Op0Idx) == N1->getOperand(Op1Idx);		return N0->getOperand(Op0Idx) == N1->getOperand(Op1Idx);
}		}

bool SIInstrInfo::isReallyTriviallyReMaterializable(		bool SIInstrInfo::isReallyTriviallyReMaterializable(
const MachineInstr &MI) const {		const MachineInstr &MI) const {

		bool CanRemat = false;
if (isVOP1(MI) || isVOP2(MI) || isVOP3(MI) || isSDWA(MI) || isSALU(MI)) {		if (isVOP1(MI) || isVOP2(MI) || isVOP3(MI) || isSDWA(MI) || isSALU(MI)) {
		CanRemat = true;
		} else if (isSMRD(MI)) {
		CanRemat = !MI.memoperands_empty() &&
		llvm::all_of(MI.memoperands(), [](const MachineMemOperand *MMO) {
		return MMO->isLoad() && MMO->isInvariant();
		});
		}

		if (CanRemat) {
// Normally VALU use of exec would block the rematerialization, but that		// Normally VALU use of exec would block the rematerialization, but that
// is OK in this case to have an implicit exec read as all VALU do.		// is OK in this case to have an implicit exec read as all VALU do.
// We really want all of the generic logic for this except for this.		// We really want all of the generic logic for this except for this.

// Another potential implicit use is mode register. The core logic of		// Another potential implicit use is mode register. The core logic of
// the RA will not attempt rematerialization if mode is set anywhere		// the RA will not attempt rematerialization if mode is set anywhere
// in the function, otherwise it is safe since mode is not changed.		// in the function, otherwise it is safe since mode is not changed.

▲ Show 20 Lines • Show All 2,239 Lines • ▼ Show 20 Lines	case AMDGPU::SI_RETURN: {
MIB.copyImplicitOps(MI);		MIB.copyImplicitOps(MI);
MI.eraseFromParent();		MI.eraseFromParent();
break;		break;
}		}
}		}
return true;		return true;
}		}

		void SIInstrInfo::reMaterialize(MachineBasicBlock &MBB,
		MachineBasicBlock::iterator I, Register DestReg,
		unsigned SubIdx, const MachineInstr &Orig,
		const TargetRegisterInfo &RI) const {

		// Try shrinking the instruction to remat only the part needed for current
		// context.
		// TODO: Handle more cases.
		unsigned Opcode = Orig.getOpcode();
		switch (Opcode) {
		case AMDGPU::S_LOAD_DWORDX16_IMM:
		case AMDGPU::S_LOAD_DWORDX8_IMM: {
		if (SubIdx != 0)
		break;

		if (I == MBB.end())
		break;

		if (I->isBundled())
		break;

		// Look for a single use of the register that is also a subreg.
		Register RegToFind = Orig.getOperand(0).getReg();
		int SingleUseIdx = -1;
		for (unsigned i = 0, e = I->getNumOperands(); i != e; ++i) {
		const MachineOperand &CandMO = I->getOperand(i);
		if (!CandMO.isReg())
		continue;
		Register CandReg = CandMO.getReg();
		if (!CandReg)
		continue;

		if (CandReg == RegToFind || RI.regsOverlap(CandReg, RegToFind)) {
		if (SingleUseIdx == -1 && CandMO.isUse()) {
		SingleUseIdx = i;
		} else {
		SingleUseIdx = -1;
		break;
		}
		}
		}
		if (SingleUseIdx == -1)
		break;
		MachineOperand *UseMO = &I->getOperand(SingleUseIdx);
		if (UseMO->getSubReg() == AMDGPU::NoSubRegister)
		break;

		unsigned Offset = RI.getSubRegIdxOffset(UseMO->getSubReg());
		unsigned SubregSize = RI.getSubRegIdxSize(UseMO->getSubReg());

		MachineFunction *MF = MBB.getParent();
		MachineRegisterInfo &MRI = MF->getRegInfo();
		assert(MRI.hasAtMostUserInstrs(DestReg, 0) &&
		"DestReg should have no users yet.");

		unsigned NewOpcode = -1;
		if (SubregSize == 256)
		NewOpcode = AMDGPU::S_LOAD_DWORDX8_IMM;
		else if (SubregSize == 128)
		NewOpcode = AMDGPU::S_LOAD_DWORDX4_IMM;
		else
		break;

		const MCInstrDesc &TID = get(NewOpcode);
		const TargetRegisterClass *NewRC =
		RI.getAllocatableClass(getRegClass(TID, 0, &RI, *MF));
		MRI.setRegClass(DestReg, NewRC);

		UseMO->setReg(DestReg);
		UseMO->setSubReg(AMDGPU::NoSubRegister);

		// Use a smaller load with the desired size, possibly with updated offset.
		MachineInstr *MI = MF->CloneMachineInstr(&Orig);
		MI->setDesc(TID);
		MI->getOperand(0).setReg(DestReg);
		MI->getOperand(0).setSubReg(AMDGPU::NoSubRegister);
		if (Offset) {
		MachineOperand *OffsetMO = getNamedOperand(*MI, AMDGPU::OpName::offset);
		int64_t FinalOffset = OffsetMO->getImm() + Offset / 8;
		OffsetMO->setImm(FinalOffset);
		}
		SmallVector<MachineMemOperand *> NewMMOs;
		for (const MachineMemOperand *MemOp : Orig.memoperands())
		NewMMOs.push_back(MF->getMachineMemOperand(MemOp, MemOp->getPointerInfo(),
		SubregSize / 8));
		MI->setMemRefs(*MF, NewMMOs);

		MBB.insert(I, MI);
		return;
		}

		default:
		break;
		}
		MachineInstr *MI = MBB.getParent()->CloneMachineInstr(&Orig);
		MI->substituteRegister(MI->getOperand(0).getReg(), DestReg, SubIdx, RI);
		MBB.insert(I, MI);
		}

std::pair<MachineInstr *, MachineInstr *>		std::pair<MachineInstr *, MachineInstr *>
SIInstrInfo::expandMovDPP64(MachineInstr &MI) const {		SIInstrInfo::expandMovDPP64(MachineInstr &MI) const {
assert (MI.getOpcode() == AMDGPU::V_MOV_B64_DPP_PSEUDO);		assert (MI.getOpcode() == AMDGPU::V_MOV_B64_DPP_PSEUDO);

if (ST.hasMovB64() &&		if (ST.hasMovB64() &&
AMDGPU::isLegalDPALU_DPPControl(		AMDGPU::isLegalDPALU_DPPControl(
getNamedOperand(MI, AMDGPU::OpName::dpp_ctrl)->getImm())) {		getNamedOperand(MI, AMDGPU::OpName::dpp_ctrl)->getImm())) {
MI.setDesc(get(AMDGPU::V_MOV_B64_dpp));		MI.setDesc(get(AMDGPU::V_MOV_B64_dpp));
▲ Show 20 Lines • Show All 6,630 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	entry:
%a.val = load half, ptr addrspace(1) %a		%a.val = load half, ptr addrspace(1) %a
%b.val = load half, ptr addrspace(1) %b		%b.val = load half, ptr addrspace(1) %b
%r.val = fadd half %a.val, %b.val		%r.val = fadd half %a.val, %b.val
store half %r.val, ptr addrspace(1) %r		store half %r.val, ptr addrspace(1) %r
ret void		ret void
}		}

; CHECK: .name: num_spilled_sgprs		; CHECK: .name: num_spilled_sgprs
; GFX700: .sgpr_spill_count: 38		; GFX700: .sgpr_spill_count: 12
; GFX803: .sgpr_spill_count: 22		; GFX803: .sgpr_spill_count: 12
; GFX900: .sgpr_spill_count: 48		; GFX900: .sgpr_spill_count: 48
; GFX1010: .sgpr_spill_count: 48		; GFX1010: .sgpr_spill_count: 48
; CHECK: .symbol: num_spilled_sgprs.kd		; CHECK: .symbol: num_spilled_sgprs.kd
define amdgpu_kernel void @num_spilled_sgprs(		define amdgpu_kernel void @num_spilled_sgprs(
ptr addrspace(1) %out0, ptr addrspace(1) %out1, [8 x i32],		ptr addrspace(1) %out0, ptr addrspace(1) %out1, [8 x i32],
ptr addrspace(1) %out2, ptr addrspace(1) %out3, [8 x i32],		ptr addrspace(1) %out2, ptr addrspace(1) %out3, [8 x i32],
ptr addrspace(1) %out4, ptr addrspace(1) %out5, [8 x i32],		ptr addrspace(1) %out4, ptr addrspace(1) %out5, [8 x i32],
ptr addrspace(1) %out6, ptr addrspace(1) %out7, [8 x i32],		ptr addrspace(1) %out6, ptr addrspace(1) %out7, [8 x i32],
▲ Show 20 Lines • Show All 109 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props.ll

Show First 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	entry:
%r.val = fadd half %a.val, %b.val		%r.val = fadd half %a.val, %b.val
store half %r.val, ptr addrspace(1) %r		store half %r.val, ptr addrspace(1) %r
ret void		ret void
}		}

; CHECK-LABEL: - Name: num_spilled_sgprs		; CHECK-LABEL: - Name: num_spilled_sgprs
; CHECK: SymbolName: 'num_spilled_sgprs@kd'		; CHECK: SymbolName: 'num_spilled_sgprs@kd'
; CHECK: CodeProps:		; CHECK: CodeProps:
; GFX700: NumSpilledSGPRs: 38		; GFX700: NumSpilledSGPRs: 12
; GFX803: NumSpilledSGPRs: 22		; GFX803: NumSpilledSGPRs: 12
; GFX900: NumSpilledSGPRs: {{22|48}}		; GFX900: NumSpilledSGPRs: {{22|48}}
define amdgpu_kernel void @num_spilled_sgprs(		define amdgpu_kernel void @num_spilled_sgprs(
ptr addrspace(1) %out0, ptr addrspace(1) %out1, [8 x i32],		ptr addrspace(1) %out0, ptr addrspace(1) %out1, [8 x i32],
ptr addrspace(1) %out2, ptr addrspace(1) %out3, [8 x i32],		ptr addrspace(1) %out2, ptr addrspace(1) %out3, [8 x i32],
ptr addrspace(1) %out4, ptr addrspace(1) %out5, [8 x i32],		ptr addrspace(1) %out4, ptr addrspace(1) %out5, [8 x i32],
ptr addrspace(1) %out6, ptr addrspace(1) %out7, [8 x i32],		ptr addrspace(1) %out6, ptr addrspace(1) %out7, [8 x i32],
ptr addrspace(1) %out8, ptr addrspace(1) %out9, [8 x i32],		ptr addrspace(1) %out8, ptr addrspace(1) %out9, [8 x i32],
ptr addrspace(1) %outa, ptr addrspace(1) %outb, [8 x i32],		ptr addrspace(1) %outa, ptr addrspace(1) %outb, [8 x i32],
▲ Show 20 Lines • Show All 104 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/remat-smrd.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs --stress-regalloc=2 -start-before=greedy -stop-after=virtregrewriter -o - %s | FileCheck -check-prefix=GCN %s		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs --stress-regalloc=2 -start-before=greedy -stop-after=virtregrewriter -o - %s | FileCheck -check-prefix=GCN %s

# Case that should really rematerialize		# Case that should really rematerialize
---		---
name: test_remat_s_load_dword_imm		name: test_remat_s_load_dword_imm
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9		liveins: $sgpr8_sgpr9
; GCN-LABEL: name: test_remat_s_load_dword_imm		; GCN-LABEL: name: test_remat_s_load_dword_imm
; GCN: liveins: $sgpr8_sgpr9		; GCN: liveins: $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.0, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 4, 0 :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 4, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 8, 0 :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.1, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.0, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.1, addrspace 5)		; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 8, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 0, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 0, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%2:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 4, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%2:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 4, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 8, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 8, 0 :: (invariant dereferenceable load (s32), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
S_NOP 0, implicit %3		S_NOP 0, implicit %3
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx2		name: test_remat_s_load_dword_immx2
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9		liveins: $sgpr8_sgpr9
; GCN-LABEL: name: test_remat_s_load_dword_immx2		; GCN-LABEL: name: test_remat_s_load_dword_immx2
; GCN: liveins: $sgpr8_sgpr9		; GCN: liveins: $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr2_sgpr3 = S_LOAD_DWORDX2_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s64), addrspace 4)		; GCN-NEXT: renamable $sgpr2_sgpr3 = S_LOAD_DWORDX2_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s64), addrspace 4)
; GCN-NEXT: SI_SPILL_S64_SAVE killed renamable $sgpr2_sgpr3, %stack.1, implicit $exec, implicit $sp_reg :: (store (s64) into %stack.1, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr2_sgpr3 = S_LOAD_DWORDX2_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s64), addrspace 4)
; GCN-NEXT: SI_SPILL_S64_SAVE killed renamable $sgpr2_sgpr3, %stack.0, implicit $exec, implicit $sp_reg :: (store (s64) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr2_sgpr3 = S_LOAD_DWORDX2_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s64), addrspace 4)
; GCN-NEXT: SI_SPILL_S64_SAVE killed renamable $sgpr2_sgpr3, %stack.2, implicit $exec, implicit $sp_reg :: (store (s64) into %stack.2, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr2_sgpr3 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s64) from %stack.1, align 4, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3
; GCN-NEXT: renamable $sgpr2_sgpr3 = SI_SPILL_S64_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s64) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr2_sgpr3 = S_LOAD_DWORDX2_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s64), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3
; GCN-NEXT: renamable $sgpr2_sgpr3 = SI_SPILL_S64_RESTORE %stack.2, implicit $exec, implicit $sp_reg :: (load (s64) from %stack.2, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr2_sgpr3 = S_LOAD_DWORDX2_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s64), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 0, 0 :: (invariant dereferenceable load (s64), addrspace 4)		%1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 0, 0 :: (invariant dereferenceable load (s64), addrspace 4)
%2:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 4, 0 :: (invariant dereferenceable load (s64), addrspace 4)		%2:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 4, 0 :: (invariant dereferenceable load (s64), addrspace 4)
%3:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 8, 0 :: (invariant dereferenceable load (s64), addrspace 4)		%3:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 8, 0 :: (invariant dereferenceable load (s64), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
S_NOP 0, implicit %3		S_NOP 0, implicit %3
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx4		name: test_remat_s_load_dword_immx4
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9		liveins: $sgpr8_sgpr9
; GCN-LABEL: name: test_remat_s_load_dword_immx4		; GCN-LABEL: name: test_remat_s_load_dword_immx4
; GCN: liveins: $sgpr8_sgpr9		; GCN: liveins: $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s128), addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s128), addrspace 4)
; GCN-NEXT: SI_SPILL_S128_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sp_reg :: (store (s128) into %stack.1, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s128), addrspace 4)
; GCN-NEXT: SI_SPILL_S128_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7, %stack.0, implicit $exec, implicit $sp_reg :: (store (s128) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s128), addrspace 4)
; GCN-NEXT: SI_SPILL_S128_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7, %stack.2, implicit $exec, implicit $sp_reg :: (store (s128) into %stack.2, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = SI_SPILL_S128_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s128) from %stack.1, align 4, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = SI_SPILL_S128_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s128) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s128), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = SI_SPILL_S128_RESTORE %stack.2, implicit $exec, implicit $sp_reg :: (load (s128) from %stack.2, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s128), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_128 = S_LOAD_DWORDX4_IMM %0, 0, 0 :: (invariant dereferenceable load (s128), addrspace 4)		%1:sgpr_128 = S_LOAD_DWORDX4_IMM %0, 0, 0 :: (invariant dereferenceable load (s128), addrspace 4)
%2:sgpr_128 = S_LOAD_DWORDX4_IMM %0, 4, 0 :: (invariant dereferenceable load (s128), addrspace 4)		%2:sgpr_128 = S_LOAD_DWORDX4_IMM %0, 4, 0 :: (invariant dereferenceable load (s128), addrspace 4)
%3:sgpr_128 = S_LOAD_DWORDX4_IMM %0, 8, 0 :: (invariant dereferenceable load (s128), addrspace 4)		%3:sgpr_128 = S_LOAD_DWORDX4_IMM %0, 8, 0 :: (invariant dereferenceable load (s128), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
S_NOP 0, implicit %3		S_NOP 0, implicit %3
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx8		name: test_remat_s_load_dword_immx8
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9		liveins: $sgpr8_sgpr9
; GCN-LABEL: name: test_remat_s_load_dword_immx8		; GCN-LABEL: name: test_remat_s_load_dword_immx8
; GCN: liveins: $sgpr8_sgpr9		; GCN: liveins: $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s256), addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s256), addrspace 4)
; GCN-NEXT: SI_SPILL_S256_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11, %stack.1, implicit $exec, implicit $sp_reg :: (store (s256) into %stack.1, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s256), addrspace 4)
; GCN-NEXT: SI_SPILL_S256_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11, %stack.0, implicit $exec, implicit $sp_reg :: (store (s256) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s256), addrspace 4)
; GCN-NEXT: SI_SPILL_S256_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11, %stack.2, implicit $exec, implicit $sp_reg :: (store (s256) into %stack.2, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = SI_SPILL_S256_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s256) from %stack.1, align 4, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = SI_SPILL_S256_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s256) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s256), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = SI_SPILL_S256_RESTORE %stack.2, implicit $exec, implicit $sp_reg :: (load (s256) from %stack.2, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s256), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 0, 0 :: (invariant dereferenceable load (s256), addrspace 4)		%1:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 0, 0 :: (invariant dereferenceable load (s256), addrspace 4)
%2:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 4, 0 :: (invariant dereferenceable load (s256), addrspace 4)		%2:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 4, 0 :: (invariant dereferenceable load (s256), addrspace 4)
%3:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 8, 0 :: (invariant dereferenceable load (s256), addrspace 4)		%3:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 8, 0 :: (invariant dereferenceable load (s256), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
S_NOP 0, implicit %3		S_NOP 0, implicit %3
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx16		name: test_remat_s_load_dword_immx16
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9		liveins: $sgpr8_sgpr9
; GCN-LABEL: name: test_remat_s_load_dword_immx16		; GCN-LABEL: name: test_remat_s_load_dword_immx16
; GCN: liveins: $sgpr8_sgpr9		; GCN: liveins: $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s512), addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s512), addrspace 4)
; GCN-NEXT: SI_SPILL_S512_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, %stack.1, implicit $exec, implicit $sp_reg :: (store (s512) into %stack.1, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s512), addrspace 4)
; GCN-NEXT: SI_SPILL_S512_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, %stack.0, implicit $exec, implicit $sp_reg :: (store (s512) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s512), addrspace 4)
; GCN-NEXT: SI_SPILL_S512_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, %stack.2, implicit $exec, implicit $sp_reg :: (store (s512) into %stack.2, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.1, align 4, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 4, 0 :: (dereferenceable invariant load (s512), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.2, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.2, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 8, 0 :: (dereferenceable invariant load (s512), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), addrspace 4)		%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), addrspace 4)
%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 4, 0 :: (invariant dereferenceable load (s512), addrspace 4)		%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 4, 0 :: (invariant dereferenceable load (s512), addrspace 4)
%3:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 8, 0 :: (invariant dereferenceable load (s512), addrspace 4)		%3:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 8, 0 :: (invariant dereferenceable load (s512), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
S_NOP 0, implicit %3		S_NOP 0, implicit %3
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx16_subreg		name: test_remat_s_load_dword_immx16_subreg
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-LABEL: name: test_remat_s_load_dword_immx16_subreg		; GCN-LABEL: name: test_remat_s_load_dword_immx16_subreg
; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: SI_SPILL_S512_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, %stack.0, implicit $exec, implicit $sp_reg :: (store (s512) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 32, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)
		; GCN-NEXT: renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = COPY killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512		%2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512
S_NOP 0, implicit %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15		S_NOP 0, implicit %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15
%3:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%3:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512 = COPY %1.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512		%3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512 = COPY %1.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512
S_NOP 0, implicit %3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512		S_NOP 0, implicit %3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512

S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx16_subreg_no_shrinking_double_use		name: test_remat_s_load_dword_immx16_subreg_no_shrinking_double_use
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-LABEL: name: test_remat_s_load_dword_immx16_subreg_no_shrinking_double_use		; GCN-LABEL: name: test_remat_s_load_dword_immx16_subreg_no_shrinking_double_use
; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: SI_SPILL_S512_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, %stack.0, implicit $exec, implicit $sp_reg :: (store (s512) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = KILL killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, implicit renamable $sgpr4_sgpr5_sgpr6_sgpr7		; GCN-NEXT: renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = KILL killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, implicit renamable $sgpr4_sgpr5_sgpr6_sgpr7
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = KILL killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11, implicit renamable $sgpr0_sgpr1		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = KILL killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11, implicit renamable $sgpr0_sgpr1
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512, implicit %1.sub0_sub1_sub2_sub3:sgpr_512		%2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512, implicit %1.sub0_sub1_sub2_sub3:sgpr_512
S_NOP 0, implicit %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15		S_NOP 0, implicit %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15
%3:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%3:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512 = COPY %1.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512, implicit %0		%3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512 = COPY %1.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512, implicit %0
S_NOP 0, implicit %3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512		S_NOP 0, implicit %3.sub0_sub1_sub2_sub3_sub4_sub5_sub6_sub7:sgpr_512
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx16_subreg_no_shrinking_bundle		name: test_remat_s_load_dword_immx16_subreg_no_shrinking_bundle
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-LABEL: name: test_remat_s_load_dword_immx16_subreg_no_shrinking_bundle		; GCN-LABEL: name: test_remat_s_load_dword_immx16_subreg_no_shrinking_bundle
; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: SI_SPILL_S512_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19, %stack.0, implicit $exec, implicit $sp_reg :: (store (s512) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = SI_SPILL_S512_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s512) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19 = S_LOAD_DWORDX16_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s512), align 4, addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%1:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 0, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)		%2:sgpr_512 = S_LOAD_DWORDX16_IMM %0, 128, 0 :: (invariant dereferenceable load (s512), align 4, addrspace 4)
%2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 {		%2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 {
internal %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512		internal %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512 = COPY %1.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512
}		}
S_NOP 0, implicit %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15		S_NOP 0, implicit %2.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15

S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_immx8_subreg		name: test_remat_s_load_dword_immx8_subreg
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-LABEL: name: test_remat_s_load_dword_immx8_subreg		; GCN-LABEL: name: test_remat_s_load_dword_immx8_subreg
; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1		; GCN: liveins: $sgpr8_sgpr9, $vgpr0_vgpr1
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr0_sgpr1 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)
; GCN-NEXT: SI_SPILL_S256_SAVE killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11, %stack.0, implicit $exec, implicit $sp_reg :: (store (s256) into %stack.0, align 4, addrspace 5)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = SI_SPILL_S256_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s256) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 16, 0 :: (dereferenceable invariant load (s128), align 4, addrspace 4)
		; GCN-NEXT: renamable $sgpr8_sgpr9_sgpr10_sgpr11 = COPY killed renamable $sgpr4_sgpr5_sgpr6_sgpr7
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr8_sgpr9_sgpr10_sgpr11		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 128, 0 :: (dereferenceable invariant load (s256), align 4, addrspace 4)
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = SI_SPILL_S256_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s256) from %stack.0, align 4, addrspace 5)		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 0, 0 :: (dereferenceable invariant load (s128), align 4, addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr0_sgpr1
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 0, 0 :: (invariant dereferenceable load (s256), align 4, addrspace 4)		%1:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 0, 0 :: (invariant dereferenceable load (s256), align 4, addrspace 4)
%2:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 128, 0 :: (invariant dereferenceable load (s256), align 4, addrspace 4)		%2:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 128, 0 :: (invariant dereferenceable load (s256), align 4, addrspace 4)
%2.sub4_sub5_sub6_sub7:sgpr_256 = COPY %1.sub4_sub5_sub6_sub7:sgpr_256		%2.sub4_sub5_sub6_sub7:sgpr_256 = COPY %1.sub4_sub5_sub6_sub7:sgpr_256
S_NOP 0, implicit %2.sub4_sub5_sub6_sub7		S_NOP 0, implicit %2.sub4_sub5_sub6_sub7
%3:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 128, 0 :: (invariant dereferenceable load (s256), align 4, addrspace 4)		%3:sgpr_256 = S_LOAD_DWORDX8_IMM %0, 128, 0 :: (invariant dereferenceable load (s256), align 4, addrspace 4)
(43 lines not shown)
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9, $sgpr10		liveins: $sgpr8_sgpr9, $sgpr10
; GCN-LABEL: name: test_remat_s_load_dword_sgpr		; GCN-LABEL: name: test_remat_s_load_dword_sgpr
; GCN: liveins: $sgpr10, $sgpr8_sgpr9		; GCN: liveins: $sgpr10, $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr0 = COPY $sgpr10		; GCN-NEXT: renamable $sgpr0 = COPY $sgpr10
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR renamable $sgpr2_sgpr3, renamable $sgpr0, 0 :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR renamable $sgpr2_sgpr3, renamable $sgpr0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR renamable $sgpr2_sgpr3, renamable $sgpr0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR renamable $sgpr2_sgpr3, renamable $sgpr0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.2, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.2, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.1, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.0, addrspace 5)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR renamable $sgpr2_sgpr3, renamable $sgpr0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.2, addrspace 5)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR renamable $sgpr2_sgpr3, renamable $sgpr0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3, implicit killed renamable $sgpr0		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3, implicit killed renamable $sgpr0
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_32 = COPY $sgpr10		%1:sgpr_32 = COPY $sgpr10
%2:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR %0, %1, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%2:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR %0, %1, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%3:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR %0, %1, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%3:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR %0, %1, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%4:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR %0, %1, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%4:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR %0, %1, 0 :: (invariant dereferenceable load (s32), addrspace 4)
S_NOP 0, implicit %2		S_NOP 0, implicit %2
(9 lines not shown)
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9, $sgpr10		liveins: $sgpr8_sgpr9, $sgpr10
; GCN-LABEL: name: test_remat_s_load_dword_sgpr_imm		; GCN-LABEL: name: test_remat_s_load_dword_sgpr_imm
; GCN: liveins: $sgpr10, $sgpr8_sgpr9		; GCN: liveins: $sgpr10, $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr0 = COPY $sgpr10		; GCN-NEXT: renamable $sgpr0 = COPY $sgpr10
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR_IMM renamable $sgpr2_sgpr3, renamable $sgpr0, 0, 0 :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR_IMM renamable $sgpr2_sgpr3, renamable $sgpr0, 0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR_IMM renamable $sgpr2_sgpr3, renamable $sgpr0, 4, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR_IMM renamable $sgpr2_sgpr3, renamable $sgpr0, 8, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.2, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.2, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.1, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.0, addrspace 5)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR_IMM renamable $sgpr2_sgpr3, renamable $sgpr0, 4, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.2, addrspace 5)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_SGPR_IMM renamable $sgpr2_sgpr3, renamable $sgpr0, 8, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3, implicit killed renamable $sgpr0		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3, implicit killed renamable $sgpr0
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sgpr_32 = COPY $sgpr10		%1:sgpr_32 = COPY $sgpr10
%2:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR_IMM %0, %1, 0, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%2:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR_IMM %0, %1, 0, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%3:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR_IMM %0, %1, 4, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%3:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR_IMM %0, %1, 4, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%4:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR_IMM %0, %1, 8, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%4:sreg_32_xm0_xexec = S_LOAD_DWORD_SGPR_IMM %0, %1, 8, 0 :: (invariant dereferenceable load (s32), addrspace 4)
S_NOP 0, implicit %2		S_NOP 0, implicit %2
(38 lines not shown)
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9_sgpr10_sgpr11		liveins: $sgpr8_sgpr9_sgpr10_sgpr11
; GCN-LABEL: name: test_remat_s_buffer_load_dword_imm		; GCN-LABEL: name: test_remat_s_buffer_load_dword_imm
; GCN: liveins: $sgpr8_sgpr9_sgpr10_sgpr11		; GCN: liveins: $sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = COPY $sgpr8_sgpr9_sgpr10_sgpr11		; GCN-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7 = COPY $sgpr8_sgpr9_sgpr10_sgpr11
; GCN-NEXT: renamable $sgpr0 = S_BUFFER_LOAD_DWORD_IMM renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.0, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_BUFFER_LOAD_DWORD_IMM renamable $sgpr4_sgpr5_sgpr6_sgpr7, 4, 0 :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr1 = S_BUFFER_LOAD_DWORD_IMM renamable $sgpr4_sgpr5_sgpr6_sgpr7, 4, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: renamable $sgpr0 = S_BUFFER_LOAD_DWORD_IMM renamable $sgpr4_sgpr5_sgpr6_sgpr7, 8, 0 :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr0 = S_BUFFER_LOAD_DWORD_IMM renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.1, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.0, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.1, addrspace 5)		; GCN-NEXT: renamable $sgpr0 = S_BUFFER_LOAD_DWORD_IMM renamable $sgpr4_sgpr5_sgpr6_sgpr7, 8, 0 :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7
%0:sgpr_128 = COPY $sgpr8_sgpr9_sgpr10_sgpr11		%0:sgpr_128 = COPY $sgpr8_sgpr9_sgpr10_sgpr11
%1:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM %0, 0, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%1:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM %0, 0, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%2:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM %0, 4, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%2:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM %0, 4, 0 :: (invariant dereferenceable load (s32), addrspace 4)
%3:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM %0, 8, 0 :: (invariant dereferenceable load (s32), addrspace 4)		%3:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_IMM %0, 8, 0 :: (invariant dereferenceable load (s32), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
S_NOP 0, implicit %3		S_NOP 0, implicit %3
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_scratch_load_dword_imm		name: test_remat_s_scratch_load_dword_imm
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9		liveins: $sgpr8_sgpr9
; GCN-LABEL: name: test_remat_s_scratch_load_dword_imm		; GCN-LABEL: name: test_remat_s_scratch_load_dword_imm
; GCN: liveins: $sgpr8_sgpr9		; GCN: liveins: $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr0 = S_SCRATCH_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 0, 0, implicit $flat_scr :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.0, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_SCRATCH_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 4, 0, implicit $flat_scr :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr1 = S_SCRATCH_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 4, 0, implicit $flat_scr :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: renamable $sgpr0 = S_SCRATCH_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 8, 0, implicit $flat_scr :: (dereferenceable invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr0 = S_SCRATCH_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 0, 0, implicit $flat_scr :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.1, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.0, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.1, addrspace 5)		; GCN-NEXT: renamable $sgpr0 = S_SCRATCH_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 8, 0, implicit $flat_scr :: (dereferenceable invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sreg_32_xm0_xexec = S_SCRATCH_LOAD_DWORD_IMM %0, 0, 0, implicit $flat_scr :: (invariant dereferenceable load (s32), addrspace 4)		%1:sreg_32_xm0_xexec = S_SCRATCH_LOAD_DWORD_IMM %0, 0, 0, implicit $flat_scr :: (invariant dereferenceable load (s32), addrspace 4)
%2:sreg_32_xm0_xexec = S_SCRATCH_LOAD_DWORD_IMM %0, 4, 0, implicit $flat_scr :: (invariant dereferenceable load (s32), addrspace 4)		%2:sreg_32_xm0_xexec = S_SCRATCH_LOAD_DWORD_IMM %0, 4, 0, implicit $flat_scr :: (invariant dereferenceable load (s32), addrspace 4)
%3:sreg_32_xm0_xexec = S_SCRATCH_LOAD_DWORD_IMM %0, 8, 0, implicit $flat_scr :: (invariant dereferenceable load (s32), addrspace 4)		%3:sreg_32_xm0_xexec = S_SCRATCH_LOAD_DWORD_IMM %0, 8, 0, implicit $flat_scr :: (invariant dereferenceable load (s32), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
S_NOP 0, implicit %3		S_NOP 0, implicit %3
S_ENDPGM 0, implicit %0		S_ENDPGM 0, implicit %0
...		...

---		---
name: test_remat_s_load_dword_invariant_not_dereferenceable		name: test_remat_s_load_dword_invariant_not_dereferenceable
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8_sgpr9		liveins: $sgpr8_sgpr9
; GCN-LABEL: name: test_remat_s_load_dword_invariant_not_dereferenceable		; GCN-LABEL: name: test_remat_s_load_dword_invariant_not_dereferenceable
; GCN: liveins: $sgpr8_sgpr9		; GCN: liveins: $sgpr8_sgpr9
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9		; GCN-NEXT: renamable $sgpr2_sgpr3 = COPY $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 0, 0 :: (invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.0, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 4, 0 :: (invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr1 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 4, 0 :: (invariant load (s32), addrspace 4)
; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 8, 0 :: (invariant load (s32), addrspace 4)		; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 0, 0 :: (invariant load (s32), addrspace 4)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.1, implicit $exec, implicit $sp_reg :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.0, addrspace 5)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr0 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sp_reg :: (load (s32) from %stack.1, addrspace 5)		; GCN-NEXT: renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr2_sgpr3, 8, 0 :: (invariant load (s32), addrspace 4)
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0		; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3		; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $sgpr2_sgpr3
%0:sreg_64_xexec = COPY $sgpr8_sgpr9		%0:sreg_64_xexec = COPY $sgpr8_sgpr9
%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 0, 0 :: (invariant load (s32), addrspace 4)		%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 0, 0 :: (invariant load (s32), addrspace 4)
%2:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 4, 0 :: (invariant load (s32), addrspace 4)		%2:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 4, 0 :: (invariant load (s32), addrspace 4)
%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 8, 0 :: (invariant load (s32), addrspace 4)		%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0, 8, 0 :: (invariant load (s32), addrspace 4)
S_NOP 0, implicit %1		S_NOP 0, implicit %1
S_NOP 0, implicit %2		S_NOP 0, implicit %2
(132 lines not shown)

llvm/test/CodeGen/AMDGPU/snippet-copy-bundle-regression.mir

(27 lines not shown)
machineFunctionInfo:		machineFunctionInfo:
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
occupancy: 8		occupancy: 8
body: |		body: |
; CHECK-LABEL: name: kernel		; CHECK-LABEL: name: kernel
; CHECK: bb.0:		; CHECK: bb.0:
; CHECK-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; CHECK-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; CHECK-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $vgpr0, $vgpr1, $vgpr2, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11		; CHECK-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $vgpr0, $vgpr1, $vgpr2, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $vgpr1 = IMPLICIT_DEF
; CHECK-NEXT: renamable $sgpr34_sgpr35 = IMPLICIT_DEF		; CHECK-NEXT: renamable $sgpr34_sgpr35 = IMPLICIT_DEF
; CHECK-NEXT: dead renamable $vgpr0 = IMPLICIT_DEF		; CHECK-NEXT: dead renamable $vgpr0 = IMPLICIT_DEF
; CHECK-NEXT: renamable $sgpr41 = IMPLICIT_DEF		; CHECK-NEXT: renamable $sgpr41 = IMPLICIT_DEF
; CHECK-NEXT: renamable $sgpr38_sgpr39 = COPY undef $sgpr8_sgpr9		; CHECK-NEXT: renamable $sgpr38_sgpr39 = COPY undef $sgpr8_sgpr9
; CHECK-NEXT: renamable $sgpr36_sgpr37 = IMPLICIT_DEF		; CHECK-NEXT: renamable $sgpr36_sgpr37 = IMPLICIT_DEF
; CHECK-NEXT: renamable $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51 = S_LOAD_DWORDX8_IMM renamable $sgpr38_sgpr39, 0, 0 :: (dereferenceable invariant load (s256), align 16, addrspace 4)		; CHECK-NEXT: renamable $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51 = S_LOAD_DWORDX8_IMM renamable $sgpr38_sgpr39, 0, 0 :: (dereferenceable invariant load (s256), align 16, addrspace 4)
; CHECK-NEXT: dead renamable $sgpr4 = S_LOAD_DWORD_IMM renamable $sgpr38_sgpr39, 48, 0 :: (dereferenceable invariant load (s32), align 16, addrspace 4)		; CHECK-NEXT: dead renamable $sgpr4 = S_LOAD_DWORD_IMM renamable $sgpr38_sgpr39, 48, 0 :: (dereferenceable invariant load (s32), align 16, addrspace 4)
; CHECK-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr38_sgpr39, 56, 0 :: (dereferenceable invariant load (s256), align 8, addrspace 4)
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 $sgpr4, 0, killed $vgpr1, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 $sgpr5, 1, killed $vgpr1
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 $sgpr6, 2, killed $vgpr1
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 $sgpr7, 3, killed $vgpr1
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 $sgpr8, 4, killed $vgpr1
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 $sgpr9, 5, killed $vgpr1
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 $sgpr10, 6, killed $vgpr1
; CHECK-NEXT: renamable $vgpr1 = V_WRITELANE_B32 killed $sgpr11, 7, killed $vgpr1, implicit killed $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; CHECK-NEXT: SI_SPILL_WWM_V32_SAVE killed $vgpr1, %stack.1, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.1, addrspace 5)
; CHECK-NEXT: dead renamable $sgpr4_sgpr5 = S_LOAD_DWORDX2_IMM renamable $sgpr44_sgpr45, 0, 0 :: (invariant load (s64), align 16, addrspace 4)		; CHECK-NEXT: dead renamable $sgpr4_sgpr5 = S_LOAD_DWORDX2_IMM renamable $sgpr44_sgpr45, 0, 0 :: (invariant load (s64), align 16, addrspace 4)
; CHECK-NEXT: ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32		; CHECK-NEXT: ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
; CHECK-NEXT: $vgpr1 = COPY renamable $sgpr51		; CHECK-NEXT: $vgpr1 = COPY renamable $sgpr51
; CHECK-NEXT: dead $sgpr30_sgpr31 = SI_CALL undef renamable $sgpr4_sgpr5, 0, csr_amdgpu, implicit undef $sgpr15, implicit $vgpr31, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; CHECK-NEXT: dead $sgpr30_sgpr31 = SI_CALL undef renamable $sgpr4_sgpr5, 0, csr_amdgpu, implicit undef $sgpr15, implicit $vgpr31, implicit $sgpr0_sgpr1_sgpr2_sgpr3
; CHECK-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32		; CHECK-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
; CHECK-NEXT: $vcc = COPY renamable $sgpr40_sgpr41		; CHECK-NEXT: $vcc = COPY renamable $sgpr40_sgpr41
; CHECK-NEXT: S_CBRANCH_VCCZ %bb.2, implicit undef $vcc		; CHECK-NEXT: S_CBRANCH_VCCZ %bb.2, implicit undef $vcc
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:		; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)		; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: liveins: $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00		; CHECK-NEXT: liveins: $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $vgpr1 = SI_SPILL_WWM_V32_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; CHECK-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr38_sgpr39, 56, 0 :: (dereferenceable invariant load (s256), align 8, addrspace 4)
; CHECK-NEXT: $sgpr4 = V_READLANE_B32 $vgpr1, 0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; CHECK-NEXT: $sgpr5 = V_READLANE_B32 $vgpr1, 1
; CHECK-NEXT: $sgpr6 = V_READLANE_B32 $vgpr1, 2
; CHECK-NEXT: $sgpr7 = V_READLANE_B32 $vgpr1, 3
; CHECK-NEXT: $sgpr8 = V_READLANE_B32 $vgpr1, 4
; CHECK-NEXT: $sgpr9 = V_READLANE_B32 $vgpr1, 5
; CHECK-NEXT: $sgpr10 = V_READLANE_B32 $vgpr1, 6
; CHECK-NEXT: $sgpr11 = V_READLANE_B32 $vgpr1, 7
; CHECK-NEXT: $noreg = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; CHECK-NEXT: $exec = S_MOV_B64 killed $noreg
; CHECK-NEXT: S_BRANCH %bb.3		; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:		; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)		; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: liveins: $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00		; CHECK-NEXT: liveins: $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $vgpr1 = SI_SPILL_WWM_V32_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; CHECK-NEXT: renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11 = S_LOAD_DWORDX8_IMM renamable $sgpr38_sgpr39, 56, 0 :: (dereferenceable invariant load (s256), align 8, addrspace 4)
; CHECK-NEXT: $sgpr4 = V_READLANE_B32 $vgpr1, 0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11
; CHECK-NEXT: $sgpr5 = V_READLANE_B32 $vgpr1, 1
; CHECK-NEXT: $sgpr6 = V_READLANE_B32 $vgpr1, 2
; CHECK-NEXT: $sgpr7 = V_READLANE_B32 $vgpr1, 3
; CHECK-NEXT: $sgpr8 = V_READLANE_B32 $vgpr1, 4
; CHECK-NEXT: $sgpr9 = V_READLANE_B32 $vgpr1, 5
; CHECK-NEXT: $sgpr10 = V_READLANE_B32 $vgpr1, 6
; CHECK-NEXT: $sgpr11 = V_READLANE_B32 $vgpr1, 7
; CHECK-NEXT: S_CMP_LG_U64 renamable $sgpr4_sgpr5, 0, implicit-def $scc		; CHECK-NEXT: S_CMP_LG_U64 renamable $sgpr4_sgpr5, 0, implicit-def $scc
; CHECK-NEXT: $noreg = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; CHECK-NEXT: $exec = S_MOV_B64 killed $noreg
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:		; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.5(0x40000000), %bb.4(0x40000000)		; CHECK-NEXT: successors: %bb.5(0x40000000), %bb.4(0x40000000)
; CHECK-NEXT: liveins: $vgpr1, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11:0x00000000000003F0, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00		; CHECK-NEXT: liveins: $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11:0x00000000000003F0, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: S_CBRANCH_VCCZ %bb.5, implicit undef $vcc		; CHECK-NEXT: S_CBRANCH_VCCZ %bb.5, implicit undef $vcc
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:		; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)		; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: liveins: $vgpr1, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11:0x00000000000003F0, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00		; CHECK-NEXT: liveins: $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11:0x00000000000003F0, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: S_CMP_EQ_U32 renamable $sgpr8, 0, implicit-def $scc		; CHECK-NEXT: S_CMP_EQ_U32 renamable $sgpr8, 0, implicit-def $scc
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:		; CHECK-NEXT: bb.5:
; CHECK-NEXT: liveins: $vgpr1, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11:0x00000000000000F0, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00		; CHECK-NEXT: liveins: $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11:0x00000000000000F0, $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000FC00
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: dead renamable $sgpr4_sgpr5 = S_LOAD_DWORDX2_IMM killed renamable $sgpr38_sgpr39, 40, 0 :: (dereferenceable invariant load (s64), addrspace 4)		; CHECK-NEXT: dead renamable $sgpr4_sgpr5 = S_LOAD_DWORDX2_IMM killed renamable $sgpr38_sgpr39, 40, 0 :: (dereferenceable invariant load (s64), addrspace 4)
; CHECK-NEXT: GLOBAL_STORE_DWORD_SADDR undef renamable $vgpr0, undef renamable $vgpr0, killed renamable $sgpr6_sgpr7, 0, 0, implicit $exec :: (store (s32), addrspace 1)		; CHECK-NEXT: GLOBAL_STORE_DWORD_SADDR undef renamable $vgpr0, undef renamable $vgpr0, killed renamable $sgpr6_sgpr7, 0, 0, implicit $exec :: (store (s32), addrspace 1)
; CHECK-NEXT: GLOBAL_STORE_DWORD_SADDR undef renamable $vgpr0, undef renamable $vgpr0, renamable $sgpr50_sgpr51, 0, 0, implicit $exec :: (store (s32), addrspace 1)		; CHECK-NEXT: GLOBAL_STORE_DWORD_SADDR undef renamable $vgpr0, undef renamable $vgpr0, renamable $sgpr50_sgpr51, 0, 0, implicit $exec :: (store (s32), addrspace 1)
; CHECK-NEXT: dead renamable $vgpr0 = COPY killed renamable $sgpr49		; CHECK-NEXT: dead renamable $vgpr0 = COPY killed renamable $sgpr49
; CHECK-NEXT: ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32		; CHECK-NEXT: ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
; CHECK-NEXT: $sgpr6_sgpr7 = COPY killed renamable $sgpr36_sgpr37		; CHECK-NEXT: $sgpr6_sgpr7 = COPY killed renamable $sgpr36_sgpr37
; CHECK-NEXT: $sgpr10_sgpr11 = COPY killed renamable $sgpr34_sgpr35		; CHECK-NEXT: $sgpr10_sgpr11 = COPY killed renamable $sgpr34_sgpr35
; CHECK-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32		; CHECK-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
; CHECK-NEXT: KILL killed renamable $vgpr1
; CHECK-NEXT: S_ENDPGM 0		; CHECK-NEXT: S_ENDPGM 0
bb.0:		bb.0:
liveins: $vgpr0, $vgpr1, $vgpr2, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr14, $sgpr15, $sgpr16		liveins: $vgpr0, $vgpr1, $vgpr2, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr14, $sgpr15, $sgpr16

%0:sgpr_64 = IMPLICIT_DEF		%0:sgpr_64 = IMPLICIT_DEF
%1:vgpr_32 = IMPLICIT_DEF		%1:vgpr_32 = IMPLICIT_DEF
undef %2.sub1:sreg_64 = IMPLICIT_DEF		undef %2.sub1:sreg_64 = IMPLICIT_DEF
%3:sgpr_64 = COPY undef $sgpr8_sgpr9		%3:sgpr_64 = COPY undef $sgpr8_sgpr9