This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] improve folds for icmp gt/lt (shr X, C1), C2
ClosedPublic

Authored by spatel on Oct 3 2017, 2:14 PM.

Details

Summary

We can always eliminate the shift in: icmp gt/lt (shr X, C1), C2 --> icmp gt/lt X, C'
This patch was intended to be just an efficiency improvement, because we were going through this 3-step process to fold:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: ADD:   %1 = udiv i4 %x, 2
IC: Old =   %c = icmp ugt i4 %1, 1
    New =   <badref> = icmp uge i4 %x, 4
IC: ADD:   %c = icmp uge i4 %x, 4
IC: ERASE   %2 = icmp ugt i4 %1, 1
IC: Visiting:   %c = icmp uge i4 %x, 4
IC: Old =   %c = icmp uge i4 %x, 4
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %2 = icmp uge i4 %x, 4
IC: Visiting:   %c = icmp ugt i4 %x, 3
IC: DCE:   %1 = udiv i4 %x, 2
IC: ERASE   %1 = udiv i4 %x, 2
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   ret i1 %c

Instead, we can go directly to the canonical icmp form:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: Old =   %c = icmp ugt i4 %s, 1
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %1 = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   %c = icmp ugt i4 %x, 3

...but then I noticed that the folds were incomplete too:
https://godbolt.org/g/aB2hLE

Here are attempts to prove the logic with Alive:
https://rise4fun.com/Alive/92o

Name: lshr_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

Name: ashr_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_ugt
Pre: (C2 != 255) && (((C2+1) << C1) u>> C1) == (C2+1)
%sh = lshr i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, ((C2+1) << C1) - 1

Name: ashr_sgt
Pre: (C2 != 127) && ((C2+1) << C1 != -128) && (((C2+1) << C1) >> C1) == (C2+1)
%sh = ashr i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, ((C2+1) << C1) - 1
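
To see why the greater-than rules inc/dec the constant while the less-than rules do not, consider a naive version of lshr_ugt that only shifts C2: on i4, (x u>> 1) u> 1 must become x u> 3, not x u> 2, and x = 3 is a counterexample for the naive fold. As a cross-check independent of Alive, here is a small brute-force harness in C++ - a sketch for illustration only, not part of the patch - that exhaustively tests the lshr_ugt rule on i8:

#include <cstdint>
#include <cstdio>

// Exhaustively verify lshr_ugt on i8: whenever the preconditions hold,
// (x u>> C1) u> C2 must agree with x u> ((C2+1) << C1) - 1 for every x.
int main() {
  for (unsigned c1 = 0; c1 < 8; ++c1) {
    for (unsigned c2 = 0; c2 < 255; ++c2) { // precondition: C2 != 255
      uint8_t shifted = (uint8_t)((c2 + 1) << c1);
      if ((uint8_t)(shifted >> c1) != (uint8_t)(c2 + 1))
        continue; // precondition: (C2+1) << C1 must round-trip
      uint8_t newC = (uint8_t)(shifted - 1);
      for (unsigned x = 0; x < 256; ++x) {
        bool before = (uint8_t)(x >> c1) > (uint8_t)c2;
        bool after = (uint8_t)x > newC;
        if (before != after)
          printf("mismatch: C1=%u C2=%u x=%u\n", c1, c2, x);
      }
    }
  }
  puts("done"); // expect no mismatches
  return 0;
}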

We did something similar for 'shl' in D28406.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision. Oct 3 2017, 2:14 PM
craig.topper edited the summary of this revision. Oct 3 2017, 7:17 PM
craig.topper edited edge metadata. Oct 3 2017, 8:10 PM

I think this changes behavior of the current code with "exact" shifts. Did we not have tests for that?

For example:

define i1 @lshrugt_01_01_exact(i4 %x) {
; CHECK-LABEL: @lshrugt_01_01_exact(
; CHECK-NEXT:    [[C:%.*]] = icmp ugt i4 %x, 2
; CHECK-NEXT:    ret i1 [[C]]
;
  %s = lshr exact i4 %x, 1
  %c = icmp ugt i4 %s, 1
  ret i1 %c
}

define i1 @lshrugt_01_01(i4 %x) {
; CHECK-LABEL: @lshrugt_01_01(
; CHECK-NEXT:    [[C:%.*]] = icmp ugt i4 %x, 3
; CHECK-NEXT:    ret i1 [[C]]
;
  %s = lshr i4 %x, 1
  %c = icmp ugt i4 %s, 1
  ret i1 %c
}
spatel added a comment. Oct 4 2017, 6:29 AM

> I think this changes behavior of the current code with "exact" shifts. Did we not have tests for that?

Nice catch - let me add some handling for that. And no, there are no tests that expose the difference (there's only one candidate 'exact.ll' test for this fold currently, and its constant didn't change). Before rL314837, you could have had completely bogus logic for this fold (as I did, of course) and not broken anything.

spatel updated this revision to Diff 117680. Oct 4 2017, 9:09 AM

Patch updated:

  1. Account for exact shifts in the same way as the existing code does. It's not clear to me that using 'exact' universally produces a better result, but I'm not proposing to change that here. In rL314907, I duplicated the entire range of tests with exactness to show that nothing changes for any of those tests.
  2. Rearrange the code to make the relationship between all of these variants clearer. It's only the non-exact greater-than cases where we have to inc/dec the constant; all other cases are just a plain shl of the compare constant (see the sketch after this list).
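
To make that case split concrete, here is a hypothetical sketch (plain C++ operating on i8, not the patch's actual APInt-based code) of how the new unsigned compare constant is derived:

#include <cstdint>
#include <optional>

// Derive the new compare constant for icmp pred (lshr [exact] x, C1), C2.
// Exact shifts and less-than predicates take the plain 'C2 << C1';
// only the non-exact greater-than case needs the inc/shl/dec dance.
std::optional<uint8_t> newLshrCmpConstant(bool isExact, bool isUgt,
                                          uint8_t c1, uint8_t c2) {
  if (isExact || !isUgt) {
    uint8_t shifted = (uint8_t)(c2 << c1);
    if ((uint8_t)(shifted >> c1) != c2)
      return std::nullopt; // C2 does not survive the round-trip
    return shifted;        // plain shl of the compare constant
  }
  // Non-exact ugt: (x u>> C1) u> C2  ==>  x u> ((C2+1) << C1) - 1
  if (c2 == 255)
    return std::nullopt;   // C2+1 would wrap
  uint8_t shifted = (uint8_t)((c2 + 1) << c1);
  if ((uint8_t)(shifted >> c1) != (uint8_t)(c2 + 1))
    return std::nullopt;   // precondition from the Alive proofs fails
  return (uint8_t)(shifted - 1);
}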

Here's Alive code for the exact cases:

Name: ashr_exact_sgt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, (C2 << C1)

Name: ashr_exact_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_exact_ugt
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, (C2 << C1)

Name: lshr_exact_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

https://rise4fun.com/Alive/Erp
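
Feeding the constants from the exact/non-exact test pair above through the hypothetical newLshrCmpConstant sketch reproduces the 2-vs-3 difference that started this thread:

#include <cstdio>
// Reuses newLshrCmpConstant from the earlier sketch.
int main() {
  // non-exact: (x u>> 1) u> 1  ==>  x u> 3
  if (auto c = newLshrCmpConstant(/*isExact=*/false, /*isUgt=*/true, 1, 1))
    printf("non-exact: icmp ugt i8 %%x, %d\n", *c); // prints 3
  // exact: (lshr exact x, 1) u> 1  ==>  x u> 2
  if (auto c = newLshrCmpConstant(/*isExact=*/true, /*isUgt=*/true, 1, 1))
    printf("exact:     icmp ugt i8 %%x, %d\n", *c); // prints 2
  return 0;
}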

This revision is now accepted and ready to land. Oct 4 2017, 4:04 PM
spatel added a comment. Oct 5 2017, 1:31 PM

Given the possibility raised in PR34838 (D38591), I'm going to change the asserts to checks here.
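
In sketch form - hypothetical code, not the committed change - that means turning the round-trip precondition from an assumption into a test:

#include <cstdint>
#include <cstdio>

// Before: the round-trip was asserted, e.g.
//   assert((uint8_t)((uint8_t)(newC << amt) >> amt) == newC);
// After: it is checked, and the fold is simply skipped when it fails.
static bool foldIsSafe(uint8_t newC, unsigned amt) {
  return (uint8_t)((uint8_t)(newC << amt) >> amt) == newC;
}

int main() {
  printf("%d %d\n", foldIsSafe(3, 1), foldIsSafe(200, 1)); // prints: 1 0
  return 0;
}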

This revision was automatically updated to reflect the committed changes.