This is an archive of the discontinued LLVM Phabricator instance.

Differential D153879

[AMDGPU] Handle Additional Cases in tryFoldPhiAGPR
Closed · Public

Authored by Pierre-vh on Jun 27 2023, 7:50 AM.

Details

Reviewers

arsenm

Group Reviewers

Restricted Project

Commits

rG026fc9e9c41d: [AMDGPU] Handle Additional Cases in tryFoldPhiAGPR

Summary

Sometimes a PHI's incoming value comes from an AGPR through two copies instead of one, such as:

%1:vgpr_256 = COPY %0:agpr_256
%2:vgpr_32 = COPY %1:vgpr_256.sub0

Those weren't handled, which could lead to massive performance issues when break-large-PHIs kicked in and AGPRs were used (MFMA).
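The pattern above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyFunction`, `CopyDef`, `AGPRSource`, `findAGPRSource` and `makeSummaryExample` are hypothetical names invented for illustration and are not LLVM types or APIs. Registers are plain strings, and every register is assumed to be defined as a whole (no subregister defs). The sketch only shows how a lookup can see through one intermediate full-register copy to reach the AGPR.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Toy stand-ins for the machine IR; these are NOT LLVM types.
enum class Bank { AGPR, VGPR };

struct CopyDef {
  std::string Src;    // source virtual register of the COPY
  std::string SrcSub; // subregister read from Src; empty means whole register
};

struct ToyFunction {
  std::unordered_map<std::string, Bank> Banks;     // register -> register bank
  std::unordered_map<std::string, CopyDef> Copies; // register -> defining COPY
};

struct AGPRSource {
  std::string Reg;    // the AGPR that ultimately feeds the copy
  std::string SubReg; // subregister of that AGPR being read, if any
};

// Returns the AGPR feeding the COPY that defines Dst, looking through at most
// one intermediate full-register copy. Returns nullopt if Dst is not defined
// by such a copy chain.
std::optional<AGPRSource> findAGPRSource(const ToyFunction &F,
                                         const std::string &Dst) {
  auto It = F.Copies.find(Dst);
  if (It == F.Copies.end())
    return std::nullopt;
  const CopyDef &Copy = It->second;

  // Common case: the copy reads an AGPR directly.
  auto BankIt = F.Banks.find(Copy.Src);
  if (BankIt != F.Banks.end() && BankIt->second == Bank::AGPR)
    return AGPRSource{Copy.Src, Copy.SrcSub};

  // Two-copy case: the source is itself a full-register copy of an AGPR.
  auto InnerIt = F.Copies.find(Copy.Src);
  if (InnerIt == F.Copies.end())
    return std::nullopt;
  const CopyDef &Inner = InnerIt->second;
  if (!Inner.SrcSub.empty())
    return std::nullopt;
  auto InnerBank = F.Banks.find(Inner.Src);
  if (InnerBank == F.Banks.end() || InnerBank->second != Bank::AGPR)
    return std::nullopt;

  // The subregister comes from the outer copy, the register from the inner one.
  return AGPRSource{Inner.Src, Copy.SrcSub};
}

// Builds the two-copy example from the summary:
//   %1:vgpr_256 = COPY %0:agpr_256
//   %2:vgpr_32  = COPY %1:vgpr_256.sub0
ToyFunction makeSummaryExample() {
  ToyFunction F;
  F.Banks = {{"%0", Bank::AGPR}, {"%1", Bank::VGPR}, {"%2", Bank::VGPR}};
  F.Copies["%1"] = CopyDef{"%0", ""};
  F.Copies["%2"] = CopyDef{"%1", "sub0"};
  return F;
}
```

In this model, querying `%2` yields `%0` with subregister `sub0`, which is the operand a PHI could then use directly instead of the VGPR copy.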

Fixes SWDEV-407986

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Pierre-vh created this revision. Jun 27 2023, 7:50 AM

Herald added a project: Restricted Project. Jun 27 2023, 7:50 AM

Herald added subscribers: foad, kerbowa, hiraditya and 5 others.

Pierre-vh requested review of this revision. Jun 27 2023, 7:50 AM

Herald added a project: Restricted Project. Jun 27 2023, 7:50 AM

Herald added subscribers: llvm-commits, wdng.

arsenm added inline comments. Jun 27 2023, 8:27 AM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1644–1684	Doc comment
1647	Don't call it SubRegMask, it's just the SubReg
1667	We really ought to ban defless registers in SSA but you do need this for undef sources
1764	It's not a regmask, it's just a subreg index
llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505	Could also use an end-to-end IR test

Harbormaster completed remote builds in B241489: Diff 534988. Jun 27 2023, 9:33 AM

Pierre-vh marked 4 inline comments as done. Jun 28 2023, 12:03 AM

Pierre-vh added inline comments.

llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505	I don't have one; I can try reducing the original app's code but that may take a while, the kernel is big. I tried reproducing the pattern I thought was relevant using some functions from mfma_loop but I can't get it to create that two-copy pattern. I will try digging a bit more, but is an end-to-end test necessary here?

Harbormaster completed remote builds in B241699: Diff 535256. Jun 28 2023, 12:53 AM

arsenm added inline comments. Jun 28 2023, 8:15 AM

llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505	I'd like an end to end test because eventually all this code should be deleted but we should preserve the copy-avoiding behavior. In the ideal future RegBankSelect would take care of this

Pierre-vh added inline comments. Jun 29 2023, 12:41 AM

llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505	`mfma-loop` already has quite a few end-to-end testcases that will fail if this fold doesn't work while break-large-PHIs is applied. I can try to come up with another case that _maybe_ reproduces the two-copies pattern on current trunk LLVM, but I can't guarantee it will always produce that pattern in an end-to-end testcase. Changes anywhere in the pipeline may cause the lowering to be different and this new code path to no longer be hit, which is why I don't think it's particularly useful to have an end-to-end case here. I don't think it's worth the effort to come up with a testcase that may be entirely irrelevant very soon.

Now I'm thinking, something that may be useful would be to run `mfma-loop.ll` through CGP and add that to the tests. That way, if break-large-PHIs is disabled at some point in the future, we still have a stress test for AGPR folding like SIFoldOperand does. Would you like me to add that?

Lastly, note that all of this is very specific to break-large-PHIs + DAGISel. There's a very good chance that if we switch to GISel (and no longer break PHIs in IR), this fold will be entirely irrelevant because we won't break PHIs that way anymore, or some combine will take care of it, etc.

arsenm accepted this revision. Jun 29 2023, 5:06 AM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505	It doesn't really need to be guaranteed; if you can come up with something, add it.

This revision is now accepted and ready to land. Jun 29 2023, 5:06 AM

This revision was landed with ongoing or failed builds. Jun 29 2023, 5:49 AM

Closed by commit rG026fc9e9c41d: [AMDGPU] Handle Additional Cases in tryFoldPhiAGPR (authored by Pierre-vh).

This revision was automatically updated to reflect the committed changes.

Pierre-vh added a commit: rG026fc9e9c41d: [AMDGPU] Handle Additional Cases in tryFoldPhiAGPR.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp	78 lines
llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir	96 lines

Diff 535751

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

@@ 1,635 lines hidden @@ bool SIFoldOperands::tryFoldRegSequence(MachineInstr &MI) {
   LLVM_DEBUG(dbgs() << "Folded " << RS << " into " << UseMI);

   // Erase the REG_SEQUENCE eagerly, unless we followed a chain of COPY users,
   // in which case we can erase them all later in runOnMachineFunction.
   if (MRI->use_nodbg_empty(MI.getOperand(0).getReg()))
     MI.eraseFromParent();
   return true;
 }

+/// Checks whether \p Copy is a AGPR -> VGPR copy. Returns `true` on success and
+/// stores the AGPR register in \p OutReg and the subreg in \p OutSubReg
+static bool isAGPRCopy(const SIRegisterInfo &TRI,
+                       const MachineRegisterInfo &MRI, const MachineInstr &Copy,
+                       Register &OutReg, unsigned &OutSubReg) {
+  assert(Copy.isCopy());
+
+  const MachineOperand &CopySrc = Copy.getOperand(1);
+  Register CopySrcReg = CopySrc.getReg();
+  if (!CopySrcReg.isVirtual())
+    return false;
+
+  // Common case: copy from AGPR directly, e.g.
+  // %1:vgpr_32 = COPY %0:agpr_32
+  if (TRI.isAGPR(MRI, CopySrcReg)) {
+    OutReg = CopySrcReg;
+    OutSubReg = CopySrc.getSubReg();
+    return true;
+  }
+
+  // Sometimes it can also involve two copies, e.g.
+  // %1:vgpr_256 = COPY %0:agpr_256
+  // %2:vgpr_32 = COPY %1:vgpr_256.sub0
+  const MachineInstr *CopySrcDef = MRI.getVRegDef(CopySrcReg);
+  if (!CopySrcDef || !CopySrcDef->isCopy())
+    return false;
+
+  const MachineOperand &OtherCopySrc = CopySrcDef->getOperand(1);
+  Register OtherCopySrcReg = OtherCopySrc.getReg();
+  if (!OtherCopySrcReg.isVirtual() ||
+      CopySrcDef->getOperand(0).getSubReg() != AMDGPU::NoSubRegister ||
+      OtherCopySrc.getSubReg() != AMDGPU::NoSubRegister ||
+      !TRI.isAGPR(MRI, OtherCopySrcReg))
+    return false;
+
+  OutReg = OtherCopySrcReg;
+  OutSubReg = CopySrc.getSubReg();
+  return true;
+}
+
 // Try to hoist an AGPR to VGPR copy across a PHI.
 // This should allow folding of an AGPR into a consumer which may support it.
 //
 // Example 1: LCSSA PHI
 // loop:
 // %1:vreg = COPY %0:areg
 // exit:
 // %2:vreg = PHI %1:vreg, %loop
@@ 25 lines hidden @@ bool SIFoldOperands::tryFoldPhiAGPR(MachineInstr &PHI) {
   if (!TRI->isVGPR(*MRI, PhiOut))
     return false;

   // Iterate once over all incoming values of the PHI to check if this PHI is
   // eligible, and determine the exact AGPR RC we'll target.
   const TargetRegisterClass *ARC = nullptr;
   for (unsigned K = 1; K < PHI.getNumExplicitOperands(); K += 2) {
     MachineOperand &MO = PHI.getOperand(K);
-    Register PhiIn = MO.getReg();
-    if (MO.getSubReg() || !TRI->isVGPR(*MRI, PhiIn))
-      return false;
-
-    MachineInstr *Copy = MRI->getVRegDef(PhiIn);
+    MachineInstr *Copy = MRI->getVRegDef(MO.getReg());
     if (!Copy || !Copy->isCopy())
       continue;

-    Register CopyIn = Copy->getOperand(1).getReg();
-    if (CopyIn.isVirtual() && TRI->isAGPR(*MRI, CopyIn)) {
-      const TargetRegisterClass *CopyInRC =
-          getRegOpRC(MRI, TRI, Copy->getOperand(1));
-      if (ARC && !ARC->hasSubClassEq(CopyInRC))
-        return false;
-      ARC = CopyInRC;
-    }
+    Register AGPRSrc;
+    unsigned AGPRRegMask = AMDGPU::NoSubRegister;
+    if (!isAGPRCopy(TRI, MRI, *Copy, AGPRSrc, AGPRRegMask))
+      continue;
+
+    const TargetRegisterClass *CopyInRC = MRI->getRegClass(AGPRSrc);
+    if (const auto *SubRC = TRI->getSubRegisterClass(CopyInRC, AGPRRegMask))
+      CopyInRC = SubRC;
+
+    if (ARC && !ARC->hasSubClassEq(CopyInRC))
+      return false;
+    ARC = CopyInRC;
   }

   if (!ARC)
     return false;

   // Rewrite the PHI's incoming values to ARC.
   LLVM_DEBUG(dbgs() << "Folding AGPR copies into: " << PHI);
   for (unsigned K = 1; K < PHI.getNumExplicitOperands(); K += 2) {
     MachineOperand &MO = PHI.getOperand(K);
     Register Reg = MO.getReg();

     MachineBasicBlock::iterator InsertPt;
     MachineBasicBlock *InsertMBB = nullptr;

     // Look at the def of Reg, ignoring all copies.
     bool UseAccVGPRWrite = false;
     if (MachineInstr *Def = MRI->getVRegDef(Reg)) {

       // Look at pre-existing COPY instructions from ARC: Steal the operand. If
       // the copy was single-use, it will be removed by DCE later.
       if (Def->isCopy()) {
-        MachineOperand &CopyIn = Def->getOperand(1);
-        if (CopyIn.getReg().isVirtual() &&
-            getRegOpRC(MRI, TRI, CopyIn)->hasSubClassEq(ARC)) {
-          MO.setReg(CopyIn.getReg());
-          MO.setSubReg(CopyIn.getSubReg());
+        Register AGPRSrc;
+        unsigned AGPRSubReg = AMDGPU::NoSubRegister;
+        if (isAGPRCopy(TRI, MRI, *Def, AGPRSrc, AGPRSubReg)) {
+          MO.setReg(AGPRSrc);
+          MO.setSubReg(AGPRSubReg);
           continue;
         }

         // If this is a multi-use SGPR -> VGPR copy, use V_ACCVGPR_WRITE on
         // GFX908 directly instead of a COPY. Otherwise, SIFoldOperand may try
         // to fold the sgpr -> vgpr -> agpr copy into a sgpr -> agpr copy which
         // is unlikely to be profitable.
+        MachineOperand &CopyIn = Def->getOperand(1);
         if (!ST->hasGFX90AInsts() && !MRI->hasOneNonDBGUse(Reg) &&
             TRI->isSGPRReg(*MRI, CopyIn.getReg()))
           UseAccVGPRWrite = true;
       }

       InsertPt = ++Def->getIterator();
       InsertMBB = Def->getParent();
     } else {
@@ 246 lines hidden @@
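The eligibility scan in tryFoldPhiAGPR walks the PHI operands with `K = 1; K += 2` because a machine PHI lists its result first, followed by (incoming value, predecessor block) pairs. The following standalone sketch models that scan under loud assumptions: `ToyPhi` and `pickTargetClass` are hypothetical names, register classes are plain strings, and class compatibility is reduced to string equality, whereas the real code uses `hasSubClassEq`. It is meant only to show the control flow, not the actual LLVM implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Toy PHI operand list (NOT an LLVM type): Ops[0] is the result register,
// followed by (incoming value, predecessor block) pairs.
using ToyPhi = std::vector<std::string>;

// AGPRClassOf maps an incoming value to the class of the AGPR that feeds it
// through copies; values that are not fed by an AGPR are simply absent.
// Returns the single class all AGPR-fed inputs agree on, or nullopt when no
// input is AGPR-fed or when two inputs disagree.
std::optional<std::string>
pickTargetClass(const ToyPhi &Phi,
                const std::unordered_map<std::string, std::string> &AGPRClassOf) {
  std::optional<std::string> Target;
  // Visit only the incoming values: indices 1, 3, 5, ...
  for (std::size_t K = 1; K < Phi.size(); K += 2) {
    auto It = AGPRClassOf.find(Phi[K]);
    if (It == AGPRClassOf.end())
      continue; // not an AGPR copy; it does not constrain the choice
    // Simplification: equality stands in for a sub-class compatibility check.
    if (Target && *Target != It->second)
      return std::nullopt;
    Target = It->second;
  }
  return Target;
}
```

Applied to the first PHI of the new MIR test, `%8:vgpr_32 = PHI %0, %bb.0, %17, %bb.1`, only `%17` is fed by an AGPR (a 32-bit subregister of the MFMA result), so the model picks `agpr_32`, matching the `agpr_32` PHIs in the test's CHECK lines.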

llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir

@@ 402 lines hidden @@ bb.1:
     %16:vgpr_32 = COPY %15.sub0
     %17:vgpr_32 = COPY %15.sub1
     %18:vgpr_32 = COPY %15.sub2
     %19:vgpr_32 = COPY %15.sub3
     S_CBRANCH_SCC1 %bb.1, implicit $scc
   bb.2:
     S_ENDPGM 0
 ...

+---
+name: test_vgpr_init_two_copies
+tracksRegLiveness: true
+
+body: |
+  ; GFX908-LABEL: name: test_vgpr_init_two_copies
+  ; GFX908: bb.0:
+  ; GFX908-NEXT: successors: %bb.1(0x80000000)
+  ; GFX908-NEXT: liveins: $vgpr0, $scc
+  ; GFX908-NEXT: {{ $}}
+  ; GFX908-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX908-NEXT: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX908-NEXT: [[COPY2:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX908-NEXT: [[COPY3:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX908-NEXT: [[COPY4:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX908-NEXT: {{ $}}
+  ; GFX908-NEXT: bb.1:
+  ; GFX908-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; GFX908-NEXT: liveins: $scc
+  ; GFX908-NEXT: {{ $}}
+  ; GFX908-NEXT: [[PHI:%[0-9]+]]:agpr_32 = PHI [[COPY4]], %bb.0, %12.sub0, %bb.1
+  ; GFX908-NEXT: [[PHI1:%[0-9]+]]:agpr_32 = PHI [[COPY3]], %bb.0, %12.sub1, %bb.1
+  ; GFX908-NEXT: [[PHI2:%[0-9]+]]:agpr_32 = PHI [[COPY2]], %bb.0, %12.sub2, %bb.1
+  ; GFX908-NEXT: [[PHI3:%[0-9]+]]:agpr_32 = PHI [[COPY1]], %bb.0, %12.sub3, %bb.1
+  ; GFX908-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[PHI3]]
+  ; GFX908-NEXT: [[COPY6:%[0-9]+]]:vgpr_32 = COPY [[PHI2]]
+  ; GFX908-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[PHI1]]
+  ; GFX908-NEXT: [[COPY8:%[0-9]+]]:vgpr_32 = COPY [[PHI]]
+  ; GFX908-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128_align2 = REG_SEQUENCE [[COPY8]], %subreg.sub0, [[COPY7]], %subreg.sub1, [[COPY6]], %subreg.sub2, [[COPY5]], %subreg.sub3
+  ; GFX908-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec
+  ; GFX908-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
+  ; GFX908-NEXT: [[V_MFMA_F32_4X4X1F32_e64_:%[0-9]+]]:areg_128_align2 = V_MFMA_F32_4X4X1F32_e64 [[V_MOV_B32_e32_1]], [[V_MOV_B32_e32_]], [[REG_SEQUENCE]], 0, 0, 0, implicit $mode, implicit $exec
+  ; GFX908-NEXT: S_CBRANCH_SCC1 %bb.1, implicit $scc
+  ; GFX908-NEXT: {{ $}}
+  ; GFX908-NEXT: bb.2:
+  ; GFX908-NEXT: S_ENDPGM 0
+  ; GFX90A-LABEL: name: test_vgpr_init_two_copies
+  ; GFX90A: bb.0:
+  ; GFX90A-NEXT: successors: %bb.1(0x80000000)
+  ; GFX90A-NEXT: liveins: $vgpr0, $scc
+  ; GFX90A-NEXT: {{ $}}
+  ; GFX90A-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+  ; GFX90A-NEXT: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX90A-NEXT: [[COPY2:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX90A-NEXT: [[COPY3:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX90A-NEXT: [[COPY4:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+  ; GFX90A-NEXT: {{ $}}
+  ; GFX90A-NEXT: bb.1:
+  ; GFX90A-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; GFX90A-NEXT: liveins: $scc
+  ; GFX90A-NEXT: {{ $}}
+  ; GFX90A-NEXT: [[PHI:%[0-9]+]]:agpr_32 = PHI [[COPY4]], %bb.0, %12.sub0, %bb.1
+  ; GFX90A-NEXT: [[PHI1:%[0-9]+]]:agpr_32 = PHI [[COPY3]], %bb.0, %12.sub1, %bb.1
+  ; GFX90A-NEXT: [[PHI2:%[0-9]+]]:agpr_32 = PHI [[COPY2]], %bb.0, %12.sub2, %bb.1
+  ; GFX90A-NEXT: [[PHI3:%[0-9]+]]:agpr_32 = PHI [[COPY1]], %bb.0, %12.sub3, %bb.1
+  ; GFX90A-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[PHI3]]
+  ; GFX90A-NEXT: [[COPY6:%[0-9]+]]:vgpr_32 = COPY [[PHI2]]
+  ; GFX90A-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[PHI1]]
+  ; GFX90A-NEXT: [[COPY8:%[0-9]+]]:vgpr_32 = COPY [[PHI]]
+  ; GFX90A-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128_align2 = REG_SEQUENCE [[COPY8]], %subreg.sub0, [[COPY7]], %subreg.sub1, [[COPY6]], %subreg.sub2, [[COPY5]], %subreg.sub3
+  ; GFX90A-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec
+  ; GFX90A-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
+  ; GFX90A-NEXT: [[V_MFMA_F32_4X4X1F32_e64_:%[0-9]+]]:areg_128_align2 = V_MFMA_F32_4X4X1F32_e64 [[V_MOV_B32_e32_1]], [[V_MOV_B32_e32_]], [[REG_SEQUENCE]], 0, 0, 0, implicit $mode, implicit $exec
+  ; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.1, implicit $scc
+  ; GFX90A-NEXT: {{ $}}
+  ; GFX90A-NEXT: bb.2:
+  ; GFX90A-NEXT: S_ENDPGM 0
+  bb.0:
+    liveins: $vgpr0, $scc
+    successors: %bb.1
+
+    %0:vgpr_32 = COPY $vgpr0
+
+  bb.1:
+    liveins: $scc
+    successors: %bb.1, %bb.2
+
+    %8:vgpr_32 = PHI %0, %bb.0, %17, %bb.1
+    %9:vgpr_32 = PHI %0, %bb.0, %18, %bb.1
+    %10:vgpr_32 = PHI %0, %bb.0, %19, %bb.1
+    %11:vgpr_32 = PHI %0, %bb.0, %20, %bb.1
+    %12:areg_128_align2 = REG_SEQUENCE %8, %subreg.sub0, %9, %subreg.sub1, %10, %subreg.sub2, %11, %subreg.sub3
+    %13:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec
+    %14:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
+    %15:areg_128_align2 = V_MFMA_F32_4X4X1F32_e64 %14:vgpr_32, %13:vgpr_32, %12:areg_128_align2, 0, 0, 0, implicit $mode, implicit $exec
+    %16:vreg_128_align2 = COPY %15:areg_128_align2
+    %17:vgpr_32 = COPY %16.sub0:vreg_128_align2
+    %18:vgpr_32 = COPY %16.sub1:vreg_128_align2
+    %19:vgpr_32 = COPY %16.sub2:vreg_128_align2
+    %20:vgpr_32 = COPY %16.sub3:vreg_128_align2
+    S_CBRANCH_SCC1 %bb.1, implicit $scc
+
+  bb.2:
+    S_ENDPGM 0
+...