This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Introduce optimizeCompareInstr
ClosedPublic

Authored by rampitec on Aug 31 2021, 4:26 PM.

Details

Summary

The following patterns are currently handled:

s_cmp_eq_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_u64 (s_and_b64 $src, 1), 1 => s_and_b64 $src, 1
s_cmp_ge_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_ge_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_lg_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_u64 (s_and_b64 $src, 1), 0 => s_and_b64 $src, 1
s_cmp_gt_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_gt_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1

Diff Detail

Event Timeline

rampitec created this revision.Aug 31 2021, 4:26 PM
rampitec requested review of this revision.Aug 31 2021, 4:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2021, 4:26 PM
Herald added a subscriber: wdng. · View Herald Transcript

Missing test for a non-1 and value and commuted and

Missing test for a non-1 and value and commuted and

Non-1 and value exists: and_3_cmp_eq_1

Will add commuted case. I guess we do not really need to handle it, just bail, since at this point we hardly commute it.

I probably also want to invert the logic, remove the threshold, check for vreg def and scan to def. Even without threshold it is probably faster since looking for def is O(1) and most of the cases will be rejected just by looking at def.

rampitec updated this revision to Diff 369838.Aug 31 2021, 9:14 PM

Added support and test for commuted and.

rampitec updated this revision to Diff 369841.Aug 31 2021, 9:44 PM
  • Use getUniqueVRegDef() to find Def and check the Def before the scan.
  • Drop checks for defines/modifies SrcReg since we are in SSA.
foad added inline comments.Sep 1 2021, 2:04 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8007

Did you mean SrcReg2 || SrcReg.isPhysical()?

8014–8015

These two should be ge instead of lt?

rampitec updated this revision to Diff 369973.Sep 1 2021, 10:19 AM
rampitec marked 2 inline comments as done.
rampitec edited the summary of this revision. (Show Details)
  • Addressed review comments.
  • Reverse iterator is not needed anymore.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8014–8015

Thanks Jay, good catch!

rampitec updated this revision to Diff 370004.Sep 1 2021, 11:45 AM

Added missing check for killed SCC.

rampitec updated this revision to Diff 370007.Sep 1 2021, 11:48 AM

Added one more test for clobbered SCC.

With D108925 landed it shows real impact now.

arsenm accepted this revision.Sep 1 2021, 3:35 PM

LGTM with nit

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
7974

Should use Register()

7994

Register()

This revision is now accepted and ready to land.Sep 1 2021, 3:35 PM
rampitec updated this revision to Diff 370091.Sep 1 2021, 3:47 PM
rampitec marked 2 inline comments as done.

Use Register() instead of 0.

This revision was landed with ongoing or failed builds.Sep 1 2021, 3:57 PM
This revision was automatically updated to reflect the committed changes.