This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp
- llvm/test/CodeGen/AMDGPU/dpp_combine.mir

Differential D100760

[AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used
Closed (Public)

Authored by foad on Apr 19 2021, 6:55 AM.


Details

Reviewers

vpykhtin
rampitec
dstuttard

Commits

rGb22721f01a58: [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used

Summary

Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.
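As a rough illustration of the rule in the summary, here is a self-contained toy model. It is not LLVM code: ToyInst, hasNonDebugUses and carryOutAllowsShrink are hypothetical names. The model assumes a single straight-line block and, in the spirit of MachineRegisterInfo::use_nodbg_empty, ignores debug instructions when looking for readers of the carry-out.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a MachineInstr.
// Registers are plain integers standing for virtual registers (%N).
struct ToyInst {
  std::string Opcode;          // e.g. "V_ADD_CO_U32_e64"
  std::vector<int> Defs;       // virtual registers written
  std::vector<int> Uses;       // virtual registers read
  std::optional<int> CarryOut; // the sdst (carry-out) register, if any
  bool IsDebug = false;        // debug instructions do not count as uses
};

// True if any non-debug instruction in Block reads Reg.
bool hasNonDebugUses(const std::vector<ToyInst> &Block, int Reg) {
  for (const ToyInst &I : Block) {
    if (I.IsDebug)
      continue;
    for (int U : I.Uses)
      if (U == Reg)
        return true;
  }
  return false;
}

// Models only the check added by this patch: a VOP3 instruction whose
// carry-out is really used must not be shrunk, because the e32 form would
// write the carry-out to vcc instead of to the virtual register.
bool carryOutAllowsShrink(const std::vector<ToyInst> &Block,
                          const ToyInst &MI) {
  if (MI.CarryOut && hasNonDebugUses(Block, *MI.CarryOut))
    return false;
  return true;
}
```

For example, a block where V_ADD_CO_U32_e64 defines %4 and %5 and is followed by S_NOP 0, implicit %5 is rejected, while the same block with only a debug reader of %5, or an instruction with no carry-out at all, passes this particular check.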

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Apr 19 2021, 6:55 AM

Herald added subscribers: kerbowa, shchenz, kbarton and 9 others. Apr 19 2021, 6:55 AM

foad requested review of this revision. Apr 19 2021, 6:55 AM

Herald added a project: Restricted Project. Apr 19 2021, 6:55 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B99467: Diff 338508. Apr 19 2021, 7:29 AM

LGTM

You might want to wait for Valery or Stas to comment too.

Only a minor suggestion: maybe put in a TODO (or equivalent) noting that we could handle this scenario as well, even with the shrinking (I think).

This revision is now accepted and ready to land. Apr 19 2021, 8:14 AM

In D100760#2698668, @dstuttard wrote:

Only a minor suggestion: maybe put in a TODO (or equivalent) noting that we could handle this scenario as well, even with the shrinking (I think).

I'm reluctant to do that because I don't know how to handle it.

In D100760#2698681, @foad wrote:

I'm reluctant to do that because I don't know how to handle it.

OK, that's reasonable (I don't either).

Is that a functional or performance problem if vcc is used?

In D100760#2698969, @rampitec wrote:

Is that a functional or performance problem if vcc is used?

It's a correctness problem. When we shrink an instruction like

  %4:vgpr_32, %5:sreg_64_xexec = V_ADD_CO_U32_e64 %3, %1, 0, implicit $exec

the def of %5 turns into an implicit def of vcc, but we don't touch the uses of %5. Those uses are left with no def, which fails MIR verification.
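The failure described above can be reproduced with a toy def-use check. This is a hypothetical sketch: Inst, undefinedUses and shrinkCarryOut are not LLVM APIs, and the real MachineVerifier checks far more. It assumes a single straight-line block and models shrinking as renaming the carry-out def to $vcc while leaving every reader of the old register untouched.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical instruction: just the registers it writes and reads.
struct Inst {
  std::vector<std::string> Defs;
  std::vector<std::string> Uses;
};

// Walks a straight-line block and returns every register that is read
// before (or without) being defined. Defined starts as the live-in set.
std::set<std::string> undefinedUses(const std::vector<Inst> &Block,
                                    std::set<std::string> Defined) {
  std::set<std::string> Missing;
  for (const Inst &I : Block) {
    for (const std::string &U : I.Uses)
      if (Defined.count(U) == 0)
        Missing.insert(U);
    for (const std::string &D : I.Defs)
      Defined.insert(D);
  }
  return Missing;
}

// Models the shrink of V_ADD_CO_U32_e64 to its e32 form: the explicit
// carry-out def becomes a def of the physical register $vcc. Readers of
// CarryOut elsewhere in the block are deliberately NOT rewritten.
Inst shrinkCarryOut(const Inst &MI, const std::string &CarryOut) {
  Inst Out;
  bool Found = false;
  for (const std::string &D : MI.Defs) {
    if (D == CarryOut) {
      Out.Defs.push_back("$vcc");
      Found = true;
    } else {
      Out.Defs.push_back(D);
    }
  }
  assert(Found && "carry-out must be defined by the instruction");
  Out.Uses = MI.Uses; // source operands are unchanged
  return Out;
}
```

Before shrinking, every use in the two-instruction block (the add followed by S_NOP 0, implicit %5) has a def; afterwards %5 is still read but nothing defines it, which is the situation the verifier rejects.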

LGTM

Closed by commit rGb22721f01a58: [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used (authored by foad). Apr 20 2021, 1:18 AM

This revision was automatically updated to reflect the committed changes.

foad added a commit: rGb22721f01a58: [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used.

Revision Contents

Path                                          Size
llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp      7 lines
llvm/test/CodeGen/AMDGPU/dpp_combine.mir      20 lines

Diff 338749

llvm/lib/Target/AMDGPU/GCNDPPCombine.cpp

[...]
 bool GCNDPPCombine::isShrinkable(MachineInstr &MI) const {
   unsigned Op = MI.getOpcode();
   if (!TII->isVOP3(Op)) {
     return false;
   }
   if (!TII->hasVALU32BitEncoding(Op)) {
     LLVM_DEBUG(dbgs() << " Inst hasn't e32 equivalent\n");
     return false;
   }
+  if (const auto *SDst = TII->getNamedOperand(MI, AMDGPU::OpName::sdst)) {
+    // Give up if there are any uses of the carry-out from instructions like
+    // V_ADD_CO_U32. The shrunken form of the instruction would write it to vcc
+    // instead of to a virtual register.
+    if (!MRI->use_nodbg_empty(SDst->getReg()))
+      return false;
+  }
   // check if other than abs|neg modifiers are set (opsel for example)
   const int64_t Mask = ~(SISrcMods::ABS | SISrcMods::NEG);
   if (!hasNoImmOrEqual(MI, AMDGPU::OpName::src0_modifiers, 0, Mask) ||
       !hasNoImmOrEqual(MI, AMDGPU::OpName::src1_modifiers, 0, Mask) ||
       !hasNoImmOrEqual(MI, AMDGPU::OpName::clamp, 0) ||
       !hasNoImmOrEqual(MI, AMDGPU::OpName::omod, 0)) {
     LLVM_DEBUG(dbgs() << " Inst has non-default modifiers\n");
     return false;
[...]
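The modifier check in isShrinkable reads as "each named operand is either absent or, after masking, equal to the default value". A minimal sketch of that predicate follows, assuming hasNoImmOrEqual treats a missing operand as passing and otherwise compares (Imm & Mask) with Value using an all-ones default mask. The function noImmOrEqual and the ToySrcMods bit values are hypothetical stand-ins; check the real SISrcMods definitions in SIDefines.h before relying on the numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed bit layout for source modifiers (illustrative only).
namespace ToySrcMods {
constexpr int64_t NEG = 1 << 0;
constexpr int64_t ABS = 1 << 1;
constexpr int64_t OP_SEL_0 = 1 << 2;
} // namespace ToySrcMods

// An absent operand passes; a present immediate passes only if its
// masked bits equal Value. Mask = -1 compares every bit.
bool noImmOrEqual(std::optional<int64_t> Imm, int64_t Value,
                  int64_t Mask = -1) {
  if (!Imm)
    return true;
  return (*Imm & Mask) == Value;
}
```

With Mask = ~(ABS | NEG), a source-modifier immediate that sets only abs and/or neg passes, while any other bit (an op_sel bit, say) makes the instruction non-shrinkable. clamp and omod are checked with the default all-ones mask, so they must be exactly 0.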

llvm/test/CodeGen/AMDGPU/dpp_combine.mir

[...]
   bb.0:
     %3:vgpr_32 = V_MOV_B32_dpp undef %2, %0, 1, 15, 15, 1, implicit $exec
     %4:vgpr_32 = V_ADD_U32_e64 %3, %1, 0, implicit $exec
 
     ; this shouldn't be combined as clamp is set
     %5:vgpr_32 = V_MOV_B32_dpp undef %2, %0, 1, 15, 15, 1, implicit $exec
     %6:vgpr_32 = V_ADD_U32_e64 %5, %1, 1, implicit $exec
 ...
 
+# GCN-LABEL: name: add_co_u32_e64
+# GCN: %4:vgpr_32, %5:sreg_64_xexec = V_ADD_CO_U32_e64 %3, %1, 0, implicit $exec
+
+name: add_co_u32_e64
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    %0:vgpr_32 = COPY $vgpr0
+    %1:vgpr_32 = COPY $vgpr1
+    %2:vgpr_32 = IMPLICIT_DEF
+
+    ; this shouldn't be combined as the carry-out is used
+    %3:vgpr_32 = V_MOV_B32_dpp undef %2, %0, 1, 15, 15, 1, implicit $exec
+    %4:vgpr_32, %5:sreg_64_xexec = V_ADD_CO_U32_e64 %3, %1, 0, implicit $exec
+
+    S_NOP 0, implicit %5
+...
+
 # tests on sequences of dpp consumers
 # GCN-LABEL: name: dpp_seq
 # GCN: %4:vgpr_32 = V_ADD_CO_U32_dpp %1, %0, %1, 1, 14, 15, 0, implicit-def $vcc, implicit $exec
 # GCN: %5:vgpr_32 = V_SUBREV_CO_U32_dpp %1, %0, %1, 1, 14, 15, 0, implicit-def $vcc, implicit $exec
 # GCN: %6:vgpr_32 = V_OR_B32_dpp %1, %0, %1, 1, 14, 15, 0, implicit $exec
 # broken sequence:
 # GCN: %7:vgpr_32 = V_MOV_B32_dpp %2, %0, 1, 14, 15, 0, implicit $exec
 
[...]