This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] si-wqm: Skip only LiveMask COPY
ClosedPublic

Authored by rovka on Aug 25 2023, 6:07 AM.

Details

Reviewers
arsenm
Group Reviewers
Restricted Project
Commits
rG20e9e4f797e7: [AMDGPU] si-wqm: Skip only LiveMask COPY
Summary

si-wqm sometimes needs to save the LiveMask in the entry block. Later
on, while looking for a place to enter WQM/WWM, it unconditionally
skips over the first COPY instruction in the entry block. This is
incorrect for functions where the LiveMask doesn't need to be saved, and
therefore the first COPY is more likely a COPY from a function argument
and might need to be in some non-exact mode.

This patch fixes the issue by also checking that the source of the COPY
is the EXEC register.
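
To illustrate the difference, here is a minimal, self-contained sketch of the old and new skip conditions. It is a toy model, not the code from SIWholeQuadMode.cpp: the `Instr`, `Opcode` and `Reg` types and both `firstCandidate*` helpers are hypothetical stand-ins for the real MachineInstr-based logic, and the real pass queries the target's EXEC register rather than comparing against an enum value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for machine instructions; none of these names are LLVM API.
enum class Opcode { Copy, Other };
enum class Reg { Exec, Arg0, V0, V1 };

struct Instr {
  Opcode Op;
  Reg Dst;
  Reg Src;
};

// Old behaviour (as described in the summary): skip the first instruction
// of the entry block whenever it is a COPY, whatever its source is.
// Returns the index of the first instruction that may be placed in WQM/WWM.
std::size_t firstCandidateOld(const std::vector<Instr> &Entry) {
  if (!Entry.empty() && Entry[0].Op == Opcode::Copy)
    return 1;
  return 0;
}

// Patched behaviour: skip the first COPY only if it reads EXEC, i.e. only
// if it is plausibly the instruction that saves the LiveMask.
std::size_t firstCandidateNew(const std::vector<Instr> &Entry) {
  if (!Entry.empty() && Entry[0].Op == Opcode::Copy &&
      Entry[0].Src == Reg::Exec)
    return 1;
  return 0;
}
```

With an entry block that starts with a COPY from a function argument, the old check returns index 1 and leaves that COPY outside the region, while the new check returns index 0 so the COPY can be placed in a non-exact mode if needed.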

This produces different code in 2 of the existing tests:

  • In wave32.ll, we end up with an extra register copy. This is because
    the first COPY in the block is now part of the WWM block, so
    si-pre-allocate-wwm-regs will allocate a new register for its
    destination (when it was outside of the WWM region, the register
    allocator could just re-use the same register). We might be able to
    improve this in si-pre-allocate-wwm-regs but I haven't looked into it.

  • The same thing happens in dual-source-blend-export.ll, but for that
    one it's harder to see because of the scheduling changes. I've uploaded
    the before/after si-wqm output for it here:
    https://reviews.llvm.org/differential/diff/553445/

Event Timeline

rovka created this revision. · Aug 25 2023, 6:07 AM
Herald added a project: Restricted Project. · Aug 25 2023, 6:07 AM
rovka requested review of this revision. · Aug 25 2023, 6:07 AM
rovka added a reviewer: Restricted Project.
arsenm accepted this revision. · Aug 25 2023, 1:22 PM
This revision is now accepted and ready to land. · Aug 25 2023, 1:22 PM
This revision was landed with ongoing or failed builds. · Nov 10 2023, 12:33 AM
This revision was automatically updated to reflect the committed changes.