This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Improve v_cmpx usage on GFX10.3.
Closed · Public

Authored by tsymalla on Mar 23 2022, 10:48 AM.

Details

Summary

On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

causes a fairly long stall: the VALU instruction writes an SGPR, and the
following SALU instruction has to wait until that SGPR is available.

An equivalent sequence saves the exec mask manually instead of letting
s_and_saveexec do it, and uses a v_cmpx instruction to perform the comparison.
Because v_cmpx writes its result directly to exec, no SGPR has to be passed
from the VALU to the SALU.
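To see why the two sequences are interchangeable, here is a minimal C++ model of a wave32 exec mask (bit N = lane N). It is a sketch under stated assumptions, not AMDGPU code: it assumes v_cmp_* writes 0 for inactive lanes, s_and_saveexec_b32 saves exec and then ANDs it with its operand, v_cmpx_* writes its per-lane result straight to exec, and the manual save is an s_mov_b32 from exec_lo. It ignores SCC, which s_and_saveexec sets and v_cmpx does not. The WaveState struct and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wave32 model: bit N of each mask corresponds to lane N.
struct WaveState {
  uint32_t exec;      // active-lane mask (exec_lo)
  uint32_t savedExec; // SGPR that receives the old exec mask
};

// v_cmp_*: per-lane result for active lanes, 0 for inactive lanes.
// laneCond holds the comparison outcome each lane would compute.
inline uint32_t vCmp(uint32_t laneCond, uint32_t exec) { return laneCond & exec; }

// Models: v_cmp_* CMP, ... ; s_and_saveexec_b32 SAVE, CMP
inline WaveState originalSequence(WaveState s, uint32_t laneCond) {
  uint32_t cmp = vCmp(laneCond, s.exec); // VALU writes the SGPR CMP
  s.savedExec = s.exec;                  // SALU saves the old exec
  s.exec = cmp & s.exec;                 // SALU reads CMP: this is where it stalls
  return s;                              // (SCC update not modeled)
}

// Models: s_mov_b32 SAVE, exec_lo ; v_cmpx_* ...
inline WaveState rewrittenSequence(WaveState s, uint32_t laneCond) {
  s.savedExec = s.exec;            // SALU saves exec without reading any VALU result
  s.exec = vCmp(laneCond, s.exec); // v_cmpx writes its result directly to exec
  return s;
}

// True when both sequences leave the same exec mask and saved mask.
inline bool sequencesAgree(uint32_t exec, uint32_t laneCond) {
  WaveState start{exec, 0u};
  WaveState a = originalSequence(start, laneCond);
  WaveState b = rewrittenSequence(start, laneCond);
  return a.exec == b.exec && a.savedExec == b.savedExec;
}
```

For example, with exec = 0x0000FFFF and laneCond = 0x00FF00F0, both sequences leave exec = 0x000000F0 and save 0x0000FFFF. In this model the only difference is which unit reads what: the rewritten form never makes the SALU wait on a VALU-written SGPR.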

This patch modifies the SIOptimizeExecMasking pass, because it is the last
point at which s_and_saveexec instructions are inserted. The pass looks for
the pattern, extracts its operands, and generates the new instruction
sequence.
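The match-and-rewrite step can be illustrated with a small C++ peephole over a toy instruction list. This is not the SIOptimizeExecMasking implementation and does not use LLVM's MachineInstr API: the Inst struct, the rewriteCmpSaveExec and isReadAfter helpers, and the string-based opcode matching are all hypothetical. The sketch assumes wave32 (s_and_saveexec_b32 / exec_lo), assumes the GFX10.3 target check happens elsewhere, only handles the v_cmp and s_and_saveexec being adjacent, and conservatively skips the rewrite if the compare's SGPR is read again later, since the rewritten v_cmpx writes only exec.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record; not LLVM's MachineInstr.
struct Inst {
  std::string opcode;            // e.g. "v_cmp_lt_f32", "s_and_saveexec_b32"
  std::string dst;               // destination register, "" if none
  std::vector<std::string> srcs; // source operands
};

// Conservative liveness check: true if Reg is read anywhere in [From, end).
static bool isReadAfter(const std::vector<Inst> &Prog, std::size_t From,
                        const std::string &Reg) {
  for (std::size_t I = From; I < Prog.size(); ++I)
    for (const std::string &S : Prog[I].srcs)
      if (S == Reg)
        return true;
  return false;
}

// Rewrites   v_cmp_<cc> CMP, A, B ; s_and_saveexec_b32 SAVE, CMP
// into       s_mov_b32 SAVE, exec_lo ; v_cmpx_<cc> A, B
// Returns the number of rewrites performed.
int rewriteCmpSaveExec(std::vector<Inst> &Prog) {
  const std::string CmpPrefix = "v_cmp_";
  int Count = 0;
  for (std::size_t I = 0; I + 1 < Prog.size(); ++I) {
    const Inst &Cmp = Prog[I];
    const Inst &Save = Prog[I + 1];
    // Match the pattern: a VALU compare feeding the saveexec directly.
    if (Cmp.opcode.rfind(CmpPrefix, 0) != 0) // "v_cmpx_..." does not match
      continue;
    if (Save.opcode != "s_and_saveexec_b32")
      continue;
    if (Save.srcs.size() != 1 || Save.srcs[0] != Cmp.dst)
      continue;
    // The compare result must be dead afterwards: v_cmpx will not write it.
    if (isReadAfter(Prog, I + 2, Cmp.dst))
      continue;

    // Extract the operands, then build the replacement sequence.
    std::string Cond = Cmp.opcode.substr(CmpPrefix.size()); // e.g. "lt_f32"
    Inst Mov{"s_mov_b32", Save.dst, {"exec_lo"}};
    Inst Cmpx{"v_cmpx_" + Cond, "", Cmp.srcs};
    Prog[I] = Mov;
    Prog[I + 1] = Cmpx;
    ++Count;
    ++I; // skip the newly generated v_cmpx
  }
  return Count;
}
```

Requiring adjacency keeps the sketch short; a real pass that allows instructions in between would also have to check that neither exec nor the compare's operands are modified in that gap.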

It also updates some existing lit tests and adds a few new ones to show the
changed behavior on GFX10.3 targets.

Same as D119696, plus fixes for the buildbot failure and the MIR test.

Event Timeline

tsymalla created this revision. Mar 23 2022, 10:48 AM
Herald added a project: Restricted Project. Mar 23 2022, 10:48 AM
tsymalla requested review of this revision. Mar 23 2022, 10:48 AM

The previous revision was missing an include (LivePhysRegs.h), and its MIR test failed on the buildbot; both should be fixed here.

critson accepted this revision. Mar 23 2022, 7:40 PM

LGTM

Please run "arc lint" or otherwise apply the clang-format rules to fix the formatting errors highlighted by the Lint pre-merge checks.

This revision is now accepted and ready to land. Mar 23 2022, 7:40 PM
tsymalla updated this revision to Diff 418144. Mar 25 2022, 12:11 AM

Ran clang-format.

This revision was landed with ongoing or failed builds. Mar 25 2022, 3:40 AM
This revision was automatically updated to reflect the committed changes.