This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] MachineLICM cannot hoist VALU
ClosedPublic

Authored by rampitec on Aug 10 2021, 12:12 PM.

Details

Summary

MachineLoop::isLoopInvariant() returns false for all VALU
because of the exec use. Check TII::isIgnorableUse() to
allow hoisting.

That unfortunately results in higher register consumption
since MachineLICM does not adequately estimate pressure.
Therefor I think it shall only be enabled after D107677 even
though it does not depend on it.

Diff Detail

Event Timeline

rampitec created this revision.Aug 10 2021, 12:12 PM
rampitec requested review of this revision.Aug 10 2021, 12:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2021, 12:12 PM
Herald added a subscriber: wdng. · View Herald Transcript

The D107677 has landed so it shall be OK to use this patch at this point I believe.

PSDB and ePSDB passed.

foad added a comment.Aug 27 2021, 7:36 AM

Are there any tests that show a benefit from this, not just churn?

Are there any tests that show a benefit from this, not just churn?

Most of the changed tests show progressions. For sure most of the loop invariants are hoisted yet in the IR LICM pass, so MachineLICM only picks what was created during or after lowering and that is mostly service or index registers with constants.
For example in the dag-divergence-atomic.ll that is v_mov_b32_e32 v0, 0 hoisted out of the loop at BB5_1, and similar changes in the rest of the tests.

arsenm added inline comments.Aug 27 2021, 10:54 AM
llvm/lib/CodeGen/MachineLoopInfo.cpp
179

Should cache TII like TRI

rampitec updated this revision to Diff 369144.Aug 27 2021, 11:03 AM
rampitec marked an inline comment as done.

Cached TTI.

foad added a comment.Sep 1 2021, 2:54 AM

This might be a silly question but I have to ask it: why do we have implicit uses of exec in the first place, if we have to keep teaching passes to ignore them? (Maybe we could only add the implicit uses later in the pass pipeline, which control flow intrinsics are lowered to explicit exec mask manipulations?)

This might be a silly question but I have to ask it: why do we have implicit uses of exec in the first place, if we have to keep teaching passes to ignore them? (Maybe we could only add the implicit uses later in the pass pipeline, which control flow intrinsics are lowered to explicit exec mask manipulations?)

That is hard to predict what transformations would be illegal and it is very easy to lose an implicit operand if not written in the MCDesc. It takes some effort to prove it is safe to ignore this operand in each case. In case of the LICM we always hoist an instruction to a block with a mask which shall completely cover the mask used inside loop, so it must be safe. I am not so sure about the rest of potential cases.

rampitec updated this revision to Diff 374353.Sep 22 2021, 1:38 PM

Rebased.
Ping.

foad accepted this revision.Oct 20 2021, 2:28 AM

Looks OK to me.

This revision is now accepted and ready to land.Oct 20 2021, 2:28 AM
This revision was landed with ongoing or failed builds.Oct 20 2021, 12:06 PM
This revision was automatically updated to reflect the committed changes.