This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Fold operands with sub-registers
Closed, Public

Authored by nhaehnle on Jan 4 2016, 3:00 PM.

Details

Summary

Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.

Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.
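
As a rough illustration of the fold described above, here is a toy model in Python. It is not the code from this revision: the `Operand` and `Inst` classes, the `s`/`v` register naming, and the rule of at most one distinct scalar operand per instruction are all simplifying assumptions standing in for the real operand legality checks. The sketch assumes SSA-like input, where each register is defined once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Operand:
    reg: str                   # toy register name: "s..." is scalar, "v..." is vector
    sub: Optional[str] = None  # sub-register index such as "sub1", if any

    def is_scalar(self) -> bool:
        return self.reg.startswith("s")


@dataclass
class Inst:
    opcode: str
    dst: Operand
    srcs: List[Operand]


def fold_copies(insts: List[Inst]) -> List[Inst]:
    """Fold scalar-to-vector COPYs (including sub-register sources) into users."""
    # Map each vector copy destination to the scalar operand it was copied from.
    copies = {
        i.dst.reg: i.srcs[0]
        for i in insts
        if i.opcode == "COPY" and not i.dst.is_scalar() and i.srcs[0].is_scalar()
    }
    still_used = set()  # copy destinations that some user could not fold
    rewritten = []
    for inst in insts:
        if inst.opcode == "COPY" and inst.dst.reg in copies:
            rewritten.append(inst)  # decide later whether the copy is dead
            continue
        # Assumed legality rule: at most one distinct scalar operand per instruction.
        used_scalars = {s for s in inst.srcs if s.is_scalar()}
        new_srcs = []
        for src in inst.srcs:
            cand = copies.get(src.reg) if src.sub is None else None
            if cand is not None and used_scalars <= {cand}:
                used_scalars.add(cand)
                new_srcs.append(cand)  # use the scalar (sub-)register directly
            else:
                new_srcs.append(src)
                if src.reg in copies:
                    still_used.add(src.reg)
        rewritten.append(Inst(inst.opcode, inst.dst, new_srcs))
    # Drop copies whose destination no longer has any user.
    return [
        i for i in rewritten
        if not (i.opcode == "COPY" and i.dst.reg in copies
                and i.dst.reg not in still_used)
    ]


# A two-dword scalar load whose second dword feeds a vector instruction.
prog = [
    Inst("S_LOAD_DWORDX2", Operand("s0"), [Operand("s_ptr")]),
    Inst("COPY", Operand("v0"), [Operand("s0", "sub1")]),
    Inst("V_ADD_F32", Operand("v2"), [Operand("v0"), Operand("v1")]),
]
folded = fold_copies(prog)
```

In this sketch the `COPY` disappears and the vector instruction reads `s0:sub1` directly, which is the kind of move elimination the summary refers to. When a fold would break the assumed one-scalar-operand rule, the copy is kept.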

Some tests are updated. In particular, the fsub.ll test now explicitly checks
that the move is elided.

With the IR generated by current Mesa, the measured changes are relatively
minor:

7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)

Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)
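
The percentages above can be re-derived from the raw counts. The following Python sketch copies the (before, after) pairs from the two tables and recomputes the relative change. The dictionary layout and the `pct_change` helper are illustrative only and are not part of the tool that produced the report.

```python
# (before, after) pairs copied from the two tables above.
totals = {
    "SGPRS": (351872, 352560),
    "VGPRS": (199984, 200732),
    "Code Size": (9876968, 9881112),
    "LDS": (91, 91),
    "Scratch": (1779712, 1767424),
    "Wait states": (295164, 295337),
}
affected = {
    "SGPRS": (65784, 66472),
    "VGPRS": (38064, 38812),
    "Code Size": (1993828, 1997972),
    "LDS": (42, 42),
    "Scratch": (795648, 783360),
    "Wait states": (54026, 54199),
}


def pct_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


for name, (before, after) in affected.items():
    print(f"{name}: {before} -> {after} ({pct_change(before, after):.2f} %)")

# The absolute deltas are the same in both tables, since shaders that were
# not affected contribute no change to the totals.
deltas_match = all(
    totals[k][1] - totals[k][0] == affected[k][1] - affected[k][0]
    for k in totals
)
```

The matching absolute deltas are a quick consistency check on the two tables: only the denominators differ, which is why the relative changes are larger for the affected subset.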

Event Timeline

nhaehnle updated this revision to Diff 43929. (Jan 4 2016, 3:00 PM)
nhaehnle retitled this revision to AMDGPU/SI: Fold operands with sub-registers.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
tstellarAMD accepted this revision. (Jan 6 2016, 8:14 PM)
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land. (Jan 6 2016, 8:14 PM)
This revision was automatically updated to reflect the committed changes.