This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Increase detection range for s_mov, v_cmpx transformation.
Closed · Public

Authored by tsymalla on Apr 8 2022, 12:54 AM.

Details

Summary

We found that it might be beneficial to have the SIOptimizeExecMasking
pass detect more cases where v_cmp, s_and_saveexec patterns can be
transformed to s_mov, v_cmpx patterns. Currently, the pass searches at
most 5 instructions for a fitting v_cmp instruction; this patch doubles
that limit to 10.
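
To illustrate why the size of the search range matters, here is a minimal, self-contained sketch of a bounded backward search. It is not the code from SIOptimizeExecMasking.cpp: the `Instr` struct, the `findCompareBefore` function, and the string-based opcode matching are hypothetical stand-ins for this example, and the real pass applies further legality checks that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
struct Instr {
  std::string opcode;
};

// Walk backwards from the instruction just before index `from` and return
// the index of the nearest instruction whose opcode starts with "v_cmp".
// At most `maxInstructions` instructions are inspected; if none of them
// matches, the search gives up and returns std::nullopt.
std::optional<std::size_t> findCompareBefore(const std::vector<Instr> &block,
                                             std::size_t from,
                                             unsigned maxInstructions) {
  assert(from <= block.size() && "start index must lie within the block");
  unsigned inspected = 0;
  for (std::size_t i = from; i-- > 0;) {
    if (inspected++ == maxInstructions)
      return std::nullopt; // search budget exhausted
    if (block[i].opcode.rfind("v_cmp", 0) == 0)
      return i; // opcode begins with "v_cmp"
  }
  return std::nullopt; // reached the start of the block
}
```

In this model, if seven unrelated instructions sit between the compare and the s_and_saveexec, a limit of 5 misses the compare, while a limit of 10 finds it.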

Event Timeline

tsymalla created this revision. Apr 8 2022, 12:54 AM
Herald added a project: Restricted Project. Apr 8 2022, 12:54 AM
tsymalla requested review of this revision. Apr 8 2022, 12:54 AM
foad accepted this revision. Apr 8 2022, 1:40 AM

Seems obviously fine, since we have seen real world cases where it helps.

This revision is now accepted and ready to land. Apr 8 2022, 1:40 AM
tsymalla updated this revision to Diff 421459. Apr 8 2022, 2:04 AM

Put the limit up to 20 instructions.

foad accepted this revision. Apr 8 2022, 2:08 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:319

Pre-existing problem: debug instructions (A->isDebugInstr()) should not count towards the search limit, because we need to get identical codegen with and without debug instructions present.
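
The point about debug instructions can be sketched with a small model. This is an illustration only, not the code that landed: `DbgAwareInstr`, its `isDebug` flag, and `findCompareSkippingDebug` are hypothetical names for this example, with the flag standing in for the `isDebugInstr()` query mentioned in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
// `isDebug` models what the real query isDebugInstr() would report.
struct DbgAwareInstr {
  std::string opcode;
  bool isDebug = false;
};

// Bounded backward search that does not charge debug instructions against
// the limit, so the result is the same whether or not debug instructions
// are interleaved with the real ones.
std::optional<std::size_t>
findCompareSkippingDebug(const std::vector<DbgAwareInstr> &block,
                         std::size_t from, unsigned maxInstructions) {
  assert(from <= block.size() && "start index must lie within the block");
  unsigned counted = 0;
  for (std::size_t i = from; i-- > 0;) {
    if (block[i].isDebug)
      continue; // skipped without consuming any of the search budget
    if (counted++ == maxInstructions)
      return std::nullopt; // search budget exhausted
    if (block[i].opcode.rfind("v_cmp", 0) == 0)
      return i; // opcode begins with "v_cmp"
  }
  return std::nullopt; // reached the start of the block
}
```

If the debug instructions were counted, a build with debug info could exhaust the limit before reaching the compare, and the transformation would fire in one build but not the other.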

tsymalla updated this revision to Diff 421480. Apr 8 2022, 3:37 AM

Added handling for debug instrs.

foad accepted this revision. Apr 8 2022, 3:40 AM

LGTM, thanks.

This revision was landed with ongoing or failed builds. Apr 8 2022, 3:47 AM
This revision was automatically updated to reflect the committed changes.