This allows the DPP combiner to kick in more often. For example, the
exclusive scan generated by the atomic optimizer for a divergent atomic
add used to look like this:
  v_mov_b32_e32 v3, v1
  v_mov_b32_e32 v5, v1
  v_mov_b32_e32 v6, v1
  v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
  v_mov_b32_dpp v6, v3 row_shr:3 row_mask:0xf bank_mask:0xf
  v_add3_u32 v3, v4, v5, v6
  v_mov_b32_e32 v4, v1
  s_nop 1
  v_mov_b32_dpp v4, v3 row_shr:4 row_mask:0xf bank_mask:0xe
  v_add_u32_e32 v3, v3, v4
  v_mov_b32_e32 v4, v1
  s_nop 1
  v_mov_b32_dpp v4, v3 row_shr:8 row_mask:0xf bank_mask:0xc
  v_add_u32_e32 v3, v3, v4
  v_mov_b32_e32 v4, v1
  s_nop 1
  v_mov_b32_dpp v4, v3 row_bcast:15 row_mask:0xa bank_mask:0xf
  v_add_u32_e32 v3, v3, v4
  s_nop 1
  v_mov_b32_dpp v1, v3 row_bcast:31 row_mask:0xc bank_mask:0xf
  v_add_u32_e32 v1, v3, v1
  v_add_u32_e32 v1, v2, v1
  v_readlane_b32 s0, v1, 63
But now most of the DPP movs are combined into adds:
  v_mov_b32_e32 v3, v1
  v_mov_b32_e32 v5, v1
  s_nop 0
  v_mov_b32_dpp v3, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
  v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
  v_add3_u32 v1, v4, v5, v1
  s_nop 1
  v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
  v_add_u32_e32 v1, v2, v1
  v_readlane_b32 s0, v1, 63
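For reference, the kind of source that produces such a scan is an atomic
add whose value operand varies per lane. A minimal CUDA/HIP-style sketch
(illustrative only, not part of this change; the kernel name and arguments
are made up) would be:

  // Hypothetical kernel for illustration: the value operand differs per
  // lane, so the atomic optimizer rewrites the atomicAdd into a wavefront
  // exclusive scan (the DPP sequence above) plus a single atomic.
  __global__ void divergent_add(int *out, const int *in) {
    atomicAdd(out, in[threadIdx.x]);
  }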
Also fix some typos in comments and debug output.