Page MenuHomePhabricator

AMDGPU: Combine DPP mov with use instuctions (VOP1/2/3)

Authored by vpykhtin on Oct 26 2018, 7:44 AM.



The change adds DPP instruction pseudos and the pass combining V_MOV_B32_dpp instruction with its VALU uses as a DPP src0 operand. If any of the use instruction cannot be combined with the mov the whole sequence is reverted.

$old = ...
$dpp_value = V_MOV_B32_dpp $old, $vgpr_to_be_read_from_other_lane, dpp_controls..., $bound_ctrl
$res = VALU $dpp_value, ...


$res = VALU_DPP $folded_old, $dpp_value, ..., dpp_controls..., $folded_bound_ctrl

Combining rules:

$bound_ctrl is DPP_BOUND_ZERO, $old is any
$bound_ctrl is DPP_BOUND_OFF, $old is 0

-> $folded_old = undef, $folded_bound_ctrl = DPP_BOUND_ZERO

$bound_ctrl is DPP_BOUND_OFF, $old is undef

-> $folded_old = undef, $folded_bound_ctrl = DPP_BOUND_OFF

$bound_ctrl is DPP_BOUND_OFF, $old is foldable

-> $folded_old = folded value, $folded_bound_ctrl = DPP_BOUND_OFF

FMAC and MAC instructions doesn't have $old operand, as it already have tied $src2.

Combining into VOPC instructions ins't supported with this patch (to be added later)

Diff Detail


Event Timeline

vpykhtin created this revision.Oct 26 2018, 7:44 AM
rampitec added inline comments.Oct 26 2018, 10:06 AM
286 ↗(On Diff #171300)


294 ↗(On Diff #171300)


316 ↗(On Diff #171300)


321 ↗(On Diff #171300)

There must be a check somewhere that use is not a SDWA or non-trivial OpSel, as these are incompatible with DPP. Also there must be a negative test for these.

5306 ↗(On Diff #171300)


5313 ↗(On Diff #171300)

I think it should be a part of SIInstrInfo, not llvm namespace.

5350 ↗(On Diff #171300)

Also a part of SIInstrInfo.

914 ↗(On Diff #171300)

Doesn't it and getRegSubRegPair belong to SIRegisterInfo?

vpykhtin added inline comments.Oct 26 2018, 10:13 AM
286 ↗(On Diff #171300)

I wish I knew that before :-)

321 ↗(On Diff #171300)

this is handled by by non-dpp -> dpp instuction map, see getDPPOp at line 131.

914 ↗(On Diff #171300)

You're right but it rather belongs to MachineRegisterInfo. I din't put it into SIRegisterInfo so that there is no need to cast to it every time.

rampitec added inline comments.Oct 26 2018, 10:23 AM
321 ↗(On Diff #171300)

SDWA maybe, but you can still use it with OpSel when it is trivial (selects [lo, hi] for all operands).

In anyway negative tests are needed for that.

914 ↗(On Diff #171300)

They are global helpers, I do not see what to cast ;)
I also agree they better fit common MRI.

vpykhtin added inline comments.Oct 26 2018, 10:45 AM
321 ↗(On Diff #171300)

I agree with tests, but I'm not sure I understood. There is no way to convert SDWA to DPP, right?

914 ↗(On Diff #171300)

I mean MRI should be casted to SIRegiterInfo before use

rampitec added inline comments.Oct 26 2018, 11:53 AM
321 ↗(On Diff #171300)

SDWA not, but can be possible with opsel if it does not select a partial operand or shuffle, i.e. if it is only used as opsel because encoding requires it.

914 ↗(On Diff #171300)


vpykhtin updated this revision to Diff 172353.Nov 2 2018, 7:31 AM
vpykhtin retitled this revision from AMDGPU: Combine DPP mov with use instuctions to AMDGPU: Combine DPP mov with use instuctions (VOP1/2/3).
vpykhtin edited the summary of this revision. (Show Details)

Fixed per review issues.

Added checks on non-default VOP3 modifiers with negative mir test on these. Src operand modifiers are checked to have only ABS or NEG (allowed by DPP) which effectively prevents other modifiers such non-default OpSel modifiers from being combined. OpSel tests aren't included as there're no opsel pseudos yet.

vpykhtin marked 17 inline comments as done.Nov 2 2018, 7:34 AM

You do not need a special pseudo for opsel, just use VOP3 and immediate value for modifier. This is much like you did above for omod.

350 ↗(On Diff #172353)

AFAIR proper default value for opsel has 1 bit set (which means use high bits for high part).

arsenm added inline comments.Nov 2 2018, 10:09 AM
9–10 ↗(On Diff #172353)

The comments in your commit message would be more useful here

228–229 ↗(On Diff #172353)

Why do you need to handle these cases? I would also use uint32_t instead of unsigned

323–324 ↗(On Diff #172353)

New lines

408 ↗(On Diff #172353)

Why do you need to collect a separate list of the moves in the whole function? Can you just use the dfs iterator to avoid this?

rampitec added inline comments.Nov 2 2018, 10:26 AM
350 ↗(On Diff #172353)

After discussion that is not the case here. You cannot process VOP3P, so you do not have op_sel_hi for which default is 1. I.e. this mask is ok, just add a test with non-zero opsel.

vpykhtin added inline comments.Nov 2 2018, 10:40 AM
228–229 ↗(On Diff #172353)

These cases are for the situation when bound_ctrl = 0 (result write disable), meaning that mov result would be the value of 'old' for inactive lanes. If we know the immediate for old we can calculate (is some cases) old value for the VALU operation which isn't the same as for the mov. Otherwise the combining would fail.

For example:

v1 = ...
v10 = ... // other lane reg

v0 = v_mov_b32 1
v0 = v_mov_b32_dpp v10, ..., 0 // bound_ctrl == write disable
v2 = v_mul_u32_u24_e32 v0, v1

in this case we know v0 for inactive lanes would be 1 (identity for mul). This makes possible to use v1 value as the result of the mul for inactive lanes:

v1 = v_mul_u32_u24_dpp v10, v1, ..., 0 // bound_ctrl == write disable
v2 = v_mov_b32 v1

Othervise the combining isn't possible for bound_ctrl == write disable.

408 ↗(On Diff #172353)


vpykhtin updated this revision to Diff 173169.Nov 8 2018, 8:09 AM

fixed per review issues:

  • Added test on non-abs|neg modifiers,
  • replaced accumulation vector for DPP moves with reverse iteration on BB.
vpykhtin marked 7 inline comments as done.Nov 8 2018, 8:11 AM
This revision is now accepted and ready to land.Nov 8 2018, 11:49 AM
This revision was automatically updated to reflect the committed changes.