This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)
ClosedPublic

Authored by lebedev.ri on Jun 16 2019, 3:25 PM.

Details

Summary

I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...

This implements an optimization described in Hacker's Delight 10-17: when C is constant,
the result of X % C == 0 can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

Original patch D50222 by @hermord (Dmytro Shynkevych)

Notes:

  • In principle, it's possible to also handle the X % C1 == C2 case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now.
  • An explicit check for when the REM can be reduced to just its LHS is included: the X % C == 0 optimization breaks test1 in test/CodeGen/X86/jump_sign.ll otherwise. I hadn't managed to find a better way to not generate worse output in this case.
  • The test/CodeGen/X86/jump_sign.ll regresses, and is being fixed by a followup patch D63390.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Jun 16 2019, 3:25 PM

Thanks, looks fine now (with fixed regression).

Are you going to implement also signed case? And possibly general cases?

GCC’s patch contains quite useful details for “general” rem folds, which can help you:
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/expr.c?r1=264248&r2=264247&pathrev=264248

Thanks, looks fine now (with fixed regression).

Are you going to implement also signed case? And possibly general cases?

I haven't decided yet. I don't have a motivational pattern at hand
as of this moment, and all this handwritten pattern-matching is
so depressingly futile that it is all kinda pointless.

GCC’s patch contains quite useful details for “general” rem folds, which can help you:
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/expr.c?r1=264248&r2=264247&pathrev=264248

Uhm, is it okay license-wise to look at that code even?

I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

Add Action | Revision Actions | Commandeer Revision

I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

Add Action | Revision Actions | Commandeer Revision

Well, yeah, that's the thing

I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

Add Action | Revision Actions | Commandeer Revision

Well, yeah, that's the thing

Revision is closed, so you cannot see it. Anyway, just continue this revision.

spatel added inline comments.Jun 17 2019, 10:10 AM
include/llvm/CodeGen/TargetLowering.h
4069–4073

Formatting: use the current guideline 'lowerCamelCase' on these function names?

lib/CodeGen/SelectionDAG/TargetLowering.cpp
4433–4434

Not sure if this legality check is necessary to avoid infinite looping and/or pick up extra diffs, but we don't normally want to distinguish 'custom' from 'legal' like this.

Can we use this simpler check?

if (!isOperationLegalOrCustom(ISD::MUL, REMVT)) 
  return SDValue()

That is, it doesn't matter what stage of combining we're in. If the target supports MUL for this type, this is always a good transfom for performance. That might allow removing the earlier 'isTypeLegal()' check too.

4477–4478

See earlier comment about legality of the MUL. Can we use the simpler check independent of combining stage?

lebedev.ri marked 3 inline comments as done.

Address review notes - just use isOperationLegalOrCustom(), entirety of check-llvm-codegen still passes at least.

Logic looks right; I just pointed out some more formatting nits.

@RKSimon started reviewing the original patch, so let's see if he has comments.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
3398

"a a"
Period at end of sentence.

4400–4401

Could shorten:
REMNode -> Rem
CompNode -> check that this operand is zero before we call this, rather than pass it as a param?

4430

Period at end of sentence.

4434

Period at end of sentence.

4454

Period at end of sentence.

4471

Period at end of sentence.

4473

Period at end of sentence.

xbolva00 added inline comments.Jun 25 2019, 3:21 AM
lib/CodeGen/SelectionDAG/TargetLowering.cpp
4400–4401

In terms of possible future general case of this REM == C fold... I think it is okay it to pass it.

lebedev.ri marked 6 inline comments as done.

Landed regression fix, rebased, addressed comment comments. ping.

xbolva00 accepted this revision.Jun 25 2019, 4:14 AM

Looks ok

This revision is now accepted and ready to land.Jun 25 2019, 4:14 AM
lebedev.ri requested review of this revision.Jun 27 2019, 4:31 AM
RKSimon accepted this revision.Jun 27 2019, 8:27 AM

LGTM - I'd have preferred that non-uniform vectors were supported straight away but this is alright for now.

This revision is now accepted and ready to land.Jun 27 2019, 8:27 AM

Thank you for the reviews!

This revision was automatically updated to reflect the committed changes.
lebedev.ri reopened this revision.Jun 27 2019, 10:22 AM

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23789 complains about this/rL364564.
Since new .exec.status entries are present i'm gonna guess the tests are actually failing.

This revision is now accepted and ready to land.Jun 27 2019, 10:22 AM
lebedev.ri planned changes to this revision.Jun 27 2019, 11:17 AM

Yep, this clearly miscompiles things.

So the first thought about multiplicativeInverse() being broken does not appear to be correct
(although that interface is extremely susceptible to misuse),

The miscompiles in test-suite happened for urem by 100 (and 10, did not check for more cases).
This code produces:

define i32 @test_urem_100(i32 %X) nounwind readnone {
; X86-LABEL: test_urem_100:
; X86:       # %bb.0:
; X86-NEXT:    imull $-1030792151, {{[0-9]+}}(%esp), %ecx # imm = 0xC28F5C29
; X86-NEXT:    rorl $2, %ecx
; X86-NEXT:    xorl %eax, %eax
; X86-NEXT:    cmpl $171798692, %ecx # imm = 0xA3D70A4
; X86-NEXT:    setb %al
; X86-NEXT:    retl
;
; X64-LABEL: test_urem_100:
; X64:       # %bb.0:
; X64-NEXT:    imull $-1030792151, %edi, %ecx # imm = 0xC28F5C29
; X64-NEXT:    rorl $2, %ecx
; X64-NEXT:    xorl %eax, %eax
; X64-NEXT:    cmpl $171798692, %ecx # imm = 0xA3D70A4
; X64-NEXT:    setb %al
; X64-NEXT:    retq
  %urem = urem i32 %X, 100
  %cmp = icmp eq i32 %urem, 0
  %ret = zext i1 %cmp to i32
  ret i32 %ret
}

GCC: https://godbolt.org/z/ks6Wm-

imull   $-1030792151, %edi, %edi
rorl    $2, %edi
cmpl    $42949672, %edi
setbe   %al

The difference is in the constant we compare with: 171798692 (patch) vs 42949672 (GCC).

lebedev.ri marked an inline comment as done.Jun 27 2019, 2:00 PM
lebedev.ri added inline comments.
llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp
4525–4526 ↗(On Diff #206884)

Got it, this should be dividing by D, not D l>> cttz(D)

lebedev.ri edited the summary of this revision. (Show Details)

Fix miscompile (it was even visible in already existing tests!!!) by properly calculating the constant for new comparison.

This revision is now accepted and ready to land.Jun 27 2019, 2:40 PM
This revision was automatically updated to reflect the committed changes.