This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Allow SDWA in instructions with immediates and SGPRs
ClosedPublic

Authored by rampitec on May 25 2017, 8:53 PM.

Details

Summary

An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.

Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.May 25 2017, 8:53 PM
SamWot added inline comments.May 25 2017, 10:16 PM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
590 ↗(On Diff #100362)

I think there should be some heuristic to check if this should be done. E.g. if we would fold only one SDWA operand in this instruction then creating this copy would only increase code size.

596–597 ↗(On Diff #100362)

Why this check is needed? It seems redundant for me.

731 ↗(On Diff #100362)

You should check for subregs

rampitec added inline comments.May 25 2017, 10:36 PM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
590 ↗(On Diff #100362)

Copy can be hoisted out of a loop. In this situation it is still profitable even if code has grown. I.e. there is a chance to improve, but if not it does not really hurt.

596–597 ↗(On Diff #100362)

Is there any guarantee that all operands can be VGPRs?

rampitec updated this revision to Diff 100366.May 25 2017, 10:39 PM
rampitec marked an inline comment as done.

Copied subreg along with the operand.

rampitec marked 2 inline comments as done.May 25 2017, 10:48 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
596–597 ↗(On Diff #100362)

I have checked the list of supported instructions. I believe all of them accept VGPRs. I will remove the check shortly.

731 ↗(On Diff #100362)

Thanks, nice catch.

rampitec updated this revision to Diff 100367.May 25 2017, 10:55 PM
rampitec marked 3 inline comments as done.

Removed check for operands to support VGPRs. If MI has an SDWA opcode they all support VGPRs as corresponding source operands.

rampitec marked 2 inline comments as done.May 25 2017, 11:02 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
590 ↗(On Diff #100362)

In fact the case which inspired this change has exactly this situation. Transformation by itself does not bring a big improvement, just below 1%. But with MachineLICM added which only hoists the immediate move out of the loop it yields 11% improvement. Of course the case is compute bound and the loop is small.

SamWot accepted this revision.May 30 2017, 1:43 AM
This revision is now accepted and ready to land.May 30 2017, 1:43 AM
This revision was automatically updated to reflect the committed changes.
rampitec marked an inline comment as done.