This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Select VOP3 form of add
ClosedPublic

Authored by arsenm on May 5 2019, 2:26 PM.

Details

Reviewers
rampitec
Summary

The VOP3 form should always be the preferred selection form to be
shrunk later. This should only be an optimization issue, but this
partially works around a problem from clobbering VCC when
IFixSGPRCopies rewrites an SCC defining operation directly to VCC.

3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.

The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

Diff Detail

Event Timeline

arsenm created this revision.May 5 2019, 2:26 PM

I am in favor of this change in general, but can we fix folding issues before? We may have unwanted performance regressions otherwise.

arsenm added a comment.May 6 2019, 1:11 PM

I am in favor of this change in general, but can we fix folding issues before? We may have unwanted performance regressions otherwise.

I put a lot of time into trying, but fixing all of the issues will take time and this is an important workaround. The regression in the clmem lit test from increasing the folding threshold was worse. The folding pass needs more work to track VCC accurately, and the scheduler needs work to not regress it. An alternative might be to force shrinking of the addc

I am in favor of this change in general, but can we fix folding issues before? We may have unwanted performance regressions otherwise.

I put a lot of time into trying, but fixing all of the issues will take time and this is an important workaround. The regression in the clmem lit test from increasing the folding threshold was worse. The folding pass needs more work to track VCC accurately, and the scheduler needs work to not regress it. An alternative might be to force shrinking of the addc

We should probably wait with this change then.

rampitec added inline comments.May 8 2019, 11:57 AM
lib/Target/AMDGPU/VOP2Instructions.td
537

While you are here, please move this brace back couple lines up.

I am in favor of this change in general, but can we fix folding issues before? We may have unwanted performance regressions otherwise.

I put a lot of time into trying, but fixing all of the issues will take time and this is an important workaround. The regression in the clmem lit test from increasing the folding threshold was worse. The folding pass needs more work to track VCC accurately, and the scheduler needs work to not regress it. An alternative might be to force shrinking of the addc

We should probably wait with this change then.

I thought about forcing shrinking addc here, but I think it's not really good, and is purely a scheduler workaround. It isn't naturally better, and introduces new physical register constraints. I'm looking at how to fix the scheduler, but that could take a while

rampitec accepted this revision.May 8 2019, 2:27 PM

LGTM

This revision is now accepted and ready to land.May 8 2019, 2:27 PM
arsenm closed this revision.May 8 2019, 3:07 PM

r360293