This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Combine DPP mov even if old reg def is in different BB
ClosedPublic

Authored by foad on Apr 21 2022, 9:22 AM.

Details

Summary

Given a DPP mov like this:

%2:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
...
%3:vgpr_32 = V_MOV_B32_dpp %2, %1, 1, 1, 1, 0, implicit $exec

this patch just removes a check that %2 (the "old reg") was defined in
the same BB as the DPP mov instruction. GCNDPPCombine requires that the
MIR is in SSA form so I don't understand why the BB matters.

This lets the optimization work in more real world cases when the
definition of %2 gets hoisted out of a loop.

Diff Detail

Event Timeline

foad created this revision.Apr 21 2022, 9:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 9:22 AM
foad requested review of this revision.Apr 21 2022, 9:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 9:22 AM
foad added a comment.Apr 21 2022, 9:23 AM

This has been tested by running the Vulkan CTS subgroups tests on gfx1030 (./deqp-vk --deqp-case="dEQP-VK.subgroups*") - thanks @piotr!

The worry is changes in exec between the def and the use

The worry is changes in exec between the def and the use

There is llvm::execMayBeModifiedBeforeUse() in the SIInstrInfo.cpp. Maybe it shall be used here?

foad added a subscriber: cwabbott.Apr 21 2022, 11:11 AM

The worry is changes in exec between the def and the use

I don't get it. The full sequence is something like this:

%2:vgpr_32 = V_MOV_B32_e32 0, implicit $exec // A
...
%3:vgpr_32 = V_MOV_B32_dpp %2, %1, 1, 1, 1, 0, implicit $exec // B
...
%4:vgpr_32 = V_ADD_U32_e32 %3, %0, implicit $exec // C

A sets up the value to be used when reading inactive lanes, which will either be don't-care or must be an identity for the operation done in C.

I can kind of understand that exec changes between B and C might be a problem. That is what @cwabbott called out in D55314#1321303, and @vpykhtin added a check for in D55444. But they also added a check for A and B being in the same BB.

If we're in SSA form then the only exec change that makes sense between A and B is narrowing the exec mask, as if B was inside the body of an "if", and that does not change the fact that B uses the constant value defined by A. I don't see how this can go wrong.

vpykhtin accepted this revision.May 5 2022, 3:25 AM

Yep, in SSA use of A means it dominates B so it seems correct to expect only narrowing of the exec mask. Let's try and see what happens.

This revision is now accepted and ready to land.May 5 2022, 3:25 AM
This revision was landed with ongoing or failed builds.May 5 2022, 3:33 AM
This revision was automatically updated to reflect the committed changes.