This is an archive of the discontinued LLVM Phabricator instance.


Differential D93174

[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.
Closed · Public

Authored by hliao on Dec 12 2020, 9:41 PM.


Details

Reviewers

rampitec
arsenm

Commits

rG1fd1f638b68c: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.

Summary

Once an instruction is simplified, any remaining fold candidates that refer to it should be invalidated or skipped, because their recorded operand indices are no longer valid.
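The summary can be made concrete with a small, self-contained C++ model of the problem and of the skip-set fix. It does not use LLVM: `ToyInstr`, `ToyCandidate`, `toySimplify`, `applyFolds` and `runScenario` are hypothetical stand-ins for `MachineInstr`, `FoldCandidate`, `tryFoldInst` and the fold loop. The simplification rule (identical sources with zero modifiers turn the cndmask into a two-operand move) is an assumption modeled loosely on the discussion, not the pass's exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for a MachineInstr: an opcode plus a flat operand list.
// Layout loosely follows the V_CNDMASK_B32_e64 in the MIR test:
//   [0] dst, [1] src0_modifiers, [2] src0, [3] src1_modifiers, [4] src1, [5] mask
struct ToyInstr {
  std::string opcode;
  std::vector<std::string> ops;
};

// Toy stand-in for FoldCandidate: target instruction, operand index, new value.
struct ToyCandidate {
  ToyInstr *useMI;
  std::size_t useOpNo;
  std::string value;
};

// Hypothetical analogue of tryFoldInst: if both sources match and both
// modifiers are "0", rewrite the instruction in place as a two-operand move.
// Dropping operands invalidates any operand index recorded before the rewrite.
bool toySimplify(ToyInstr &mi) {
  if (mi.opcode != "V_CNDMASK_B32_e64" || mi.ops.size() != 6)
    return false;
  if (mi.ops[2] != mi.ops[4] || mi.ops[1] != "0" || mi.ops[3] != "0")
    return false;
  mi.opcode = "MOV";
  mi.ops = {mi.ops[0], mi.ops[2]};  // keep only dst and src0
  return true;
}

// Applies the candidates in order and returns how many referred to an operand
// index that no longer exists (where the real pass could crash).
// useSkipSet mirrors the `Folded` set added by this patch.
int applyFolds(std::vector<ToyCandidate> &foldList, bool useSkipSet) {
  std::unordered_set<ToyInstr *> folded;
  int staleIndices = 0;
  for (ToyCandidate &fold : foldList) {
    assert(fold.useMI != nullptr);
    if (useSkipSet && folded.count(fold.useMI))
      continue;  // instruction was simplified; its remaining candidates are stale
    if (fold.useOpNo >= fold.useMI->ops.size()) {
      ++staleIndices;
      continue;
    }
    fold.useMI->ops[fold.useOpNo] = fold.value;
    if (toySimplify(*fold.useMI))
      folded.insert(fold.useMI);
  }
  return staleIndices;
}

// Two candidates target the same instruction. The first makes both sources
// identical, so the instruction is simplified before the second is applied.
int runScenario(bool useSkipSet) {
  ToyInstr cndmask{"V_CNDMASK_B32_e64", {"%3", "0", "%1", "0", "%2", "%0"}};
  std::vector<ToyCandidate> foldList = {
      {&cndmask, 4, "%1"},  // fold a register into src1
      {&cndmask, 2, "0"},   // hypothetical second candidate on src0
  };
  return applyFolds(foldList, useSkipSet);
}
```

In this model `runScenario(false)` reports one stale operand index and `runScenario(true)` reports none, because the second candidate is skipped once its instruction has been simplified. In the real pass, an invalid operand index can trip an assertion or touch the wrong operand.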

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
300 ms	x64 windows > LLVM.CodeGen/XCore::threads.ll

Event Timeline

hliao created this revision. Dec 12 2020, 9:41 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others. Dec 12 2020, 9:41 PM

hliao requested review of this revision. Dec 12 2020, 9:41 PM

Herald added a project: Restricted Project. Dec 12 2020, 9:41 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B82173: Diff 311429. Dec 12 2020, 10:19 PM

arsenm added inline comments. Dec 14 2020, 6:29 AM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Line 1260: Doesn't this add a second mechanism to avoid the same problem? We already check isUseMIInFoldList to avoid revisiting.

hliao added inline comments. Dec 14 2020, 6:58 AM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Line 1260: That one is used to prevent adding a commuted instr into the candidate list. But the case here is that both operands could be folded. Also, wouldn't that check be too expensive if used here? The candidate list would be scanned in quadratic time.
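The cost argument can be checked with a quick count. The sketch below assumes a linear membership scan, the hypothetical `inFoldList` (standing in for a helper such as `isUseMIInFoldList`), run once per candidate in the fold loop, with instructions modeled as plain `int`s. It counts how many list entries are inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical linear membership check over a candidate list, standing in for
// a helper such as isUseMIInFoldList. Instructions are modeled as ints, and
// `comparisons` accumulates how many list entries were inspected.
bool inFoldList(const std::vector<int> &foldList, int useMI,
                std::size_t &comparisons) {
  for (int entry : foldList) {
    ++comparisons;
    if (entry == useMI)
      return true;
  }
  return false;
}

// Total comparisons if the fold loop ran one such scan per candidate.
// With n distinct entries, the k-th query stops after k comparisons,
// so the total is 1 + 2 + ... + n = n(n+1)/2, which grows quadratically.
std::size_t perCandidateScanCost(const std::vector<int> &foldList) {
  std::size_t comparisons = 0;
  for (int useMI : foldList)
    inFoldList(foldList, useMI, comparisons);
  return comparisons;
}
```

Under these assumptions, 4 candidates cost 10 comparisons and 100 candidates cost 5,050, whereas a hashed set such as the patch's `SmallPtrSet` needs about one lookup per candidate.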

arsenm added inline comments. Dec 14 2020, 7:24 AM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Line 1260: Should we just use a SetVector for FoldList then?

hliao added inline comments. Dec 14 2020, 8:11 AM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Line 1260: That list is a list of foldable operands, i.e., pairs of an MI and one of its operands to be folded. If we change it to be keyed by MI, besides requiring a major data structure change, we may add extra overhead (set vs. list) when building the candidate list. Considering that `tryFoldInst` only simplifies `V_CNDMASK` so far, isn't that overhead too big compared with the skip list here?
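The difference between a list of (instruction, operand) pairs and a collection keyed by instruction can be sketched with standard containers. `ToyFold`, `keptByPair` and `keptByInstr` are hypothetical names, and the example assumes one V_CNDMASK-like instruction (id 3) with two foldable source operands (indices 2 and 4).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy fold candidate: (instruction id, operand index being folded).
using ToyFold = std::pair<int, std::size_t>;

// Candidates that survive deduplication by the full (instruction, operand)
// pair: distinct operands of the same instruction are all kept.
std::size_t keptByPair(const std::vector<ToyFold> &folds) {
  return std::set<ToyFold>(folds.begin(), folds.end()).size();
}

// Candidates that survive when the collection is keyed by instruction only:
// a second foldable operand on the same instruction is silently dropped.
std::size_t keptByInstr(const std::vector<ToyFold> &folds) {
  std::set<int> seen;
  for (const ToyFold &fold : folds)
    seen.insert(fold.first);
  return seen.size();
}
```

For the two candidates above, keying by the full pair keeps both while keying by instruction keeps only one, which is the loss hliao is pointing to when arguing for a separate skip set instead of restructuring the list.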

arsenm accepted this revision. Dec 14 2020, 9:44 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Line 1260: Hmmm. In principle tryFoldInst would erase / replace the instruction, but it just happens not to. I do think the way this pass works is backwards, but I guess this is fine for now.

This revision is now accepted and ready to land. Dec 14 2020, 9:44 AM

This revision was landed with ongoing or failed builds. Dec 14 2020, 10:08 AM

Closed by commit rG1fd1f638b68c: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. (authored by hliao).

This revision was automatically updated to reflect the committed changes.

hliao added a commit: rG1fd1f638b68c: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.

hliao added inline comments. Dec 14 2020, 10:10 AM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Line 1260: Thanks for the code review.

foad mentioned this in D100100: [AMDGPU] SIFoldOperands: try harder to fold cndmask instructions. Apr 8 2021, 6:19 AM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp	6 lines
llvm/test/CodeGen/AMDGPU/fold-cndmask-wave32.mir	20 lines

Diff 311429

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

(1,251 lines collapsed; hunk context: if (FoldingImm) {)
     }
   }
 
   MachineFunction *MF = MI.getParent()->getParent();
   // Make sure we add EXEC uses to any new v_mov instructions created.
   for (MachineInstr *Copy : CopiesToReplace)
     Copy->addImplicitDefUseOperands(*MF);
 
+  SmallPtrSet<MachineInstr *, 16> Folded;
   for (FoldCandidate &Fold : FoldList) {
     assert(!Fold.isReg() || Fold.OpToFold);
+    if (Folded.count(Fold.UseMI))
+      continue;
     if (Fold.isReg() && Fold.OpToFold->getReg().isVirtual()) {
       Register Reg = Fold.OpToFold->getReg();
       MachineInstr *DefMI = Fold.OpToFold->getParent();
       if (DefMI->readsRegister(AMDGPU::EXEC, TRI) &&
           execMayBeModifiedBeforeUse(*MRI, Reg, *DefMI, *Fold.UseMI))
         continue;
     }
     if (updateOperand(Fold, *TII, *TRI, *ST)) {
       // Clear kill flags.
       if (Fold.isReg()) {
         assert(Fold.OpToFold && Fold.OpToFold->isReg());
         // FIXME: Probably shouldn't bother trying to fold if not an
         // SGPR. PeepholeOptimizer can eliminate redundant VGPR->VGPR
         // copies.
         MRI->clearKillFlags(Fold.OpToFold->getReg());
       }
       LLVM_DEBUG(dbgs() << "Folded source from " << MI << " into OpNo "
                         << static_cast<int>(Fold.UseOpNo) << " of "
                         << *Fold.UseMI << '\n');
-      tryFoldInst(TII, Fold.UseMI);
+      if (tryFoldInst(TII, Fold.UseMI))
+        Folded.insert(Fold.UseMI);
     } else if (Fold.isCommuted()) {
       // Restoring instruction's original operand order if fold has failed.
       TII->commuteInstruction(*Fold.UseMI, false);
     }
   }
 }
 
 // Clamp patterns are canonically selected to v_max_* instructions, so only
(285 lines collapsed)

llvm/test/CodeGen/AMDGPU/fold-cndmask-wave32.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -march=amdgcn -mcpu=gfx1030 -run-pass si-fold-operands -verify-machineinstrs -o - %s | FileCheck %s

---
name: fold_cndmask
tracksRegLiveness: true
registers:
body: |
  bb.0.entry:
    ; CHECK-LABEL: name: fold_cndmask
    ; CHECK: [[DEF:%[0-9]+]]:sreg_32_xm0_xexec = IMPLICIT_DEF
    ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
    ; CHECK: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
    ; CHECK: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
    %0:sreg_32_xm0_xexec = IMPLICIT_DEF
    %1:sreg_32 = S_MOV_B32 0
    %2:vgpr_32 = COPY %1:sreg_32
    %3:vgpr_32 = V_CNDMASK_B32_e64 0, %1:sreg_32, 0, %2:vgpr_32, %0:sreg_32_xm0_xexec, implicit $exec

...