This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] allow speculation of div/rem when sibling op exists (PR31028)
AbandonedPublic

Authored by spatel on Mar 13 2017, 2:18 PM.

Details

Summary

As noted in:
https://bugs.llvm.org/show_bug.cgi?id=31028
...I initially thought this would be a CGP patch and limited to targets like x86. But doing this in SimplifyCFG improves code even for targets like AArch64 that don't have a divrem instruction. That's because we can replace compare and branch with csel.

I know that the hoisting of a rem may be a stretch of the conservative limits of SimplifyCFG, but the benefits of collapsing the blocks seems like a worthy transform. We could (as we do for other expensive ops) split this back up in CGP if it is a concern.

For the example in the PR, AArch64 had:

  mov      w8, w0
  sdiv    w0, w8, w1
  msub    w8, w0, w1, w8
  cmp       w8, #42         // =42
  b.eq    .LBB0_2    <--- nothing in the backend is flattening this
  orr     w0, wzr, #0x3
.LBB0_2:
  ret

After:

sdiv	w8, w0, w1
msub	w9, w8, w1, w0
cmp		w9, #42         // =42
orr	w9, wzr, #0x3
csel	w0, w8, w9, eq
ret

On x86, we had:

  movl	%edi, %ecx
  movl	%ecx, %eax
  cltd
  idivl	%esi
  movl	$3, %eax
  cmpl	$42, %edx
  jne	.LBB0_2
# BB#1:                            
  movl	%ecx, %eax
  cltd
  idivl	%esi    <--- very expensive and useless instruction
.LBB0_2:                
  retq

After:

movl	%edi, %eax
cltd
idivl	%esi
cmpl	$42, %edx
movl	$3, %ecx
cmovnel	%ecx, %eax
retq

Diff Detail

Event Timeline

spatel created this revision.Mar 13 2017, 2:18 PM
filcab added inline comments.Mar 13 2017, 3:05 PM
lib/Transforms/Utils/SimplifyCFG.cpp
1921

Please make one test per pair, too.

filcab edited edge metadata.Mar 13 2017, 3:26 PM

Nevermind, thanks for pointing out you already have the tests I asked for. I somehow skipped over them when reviewing.

efriedma edited edge metadata.Mar 13 2017, 3:49 PM

Every CPU can lower divrem to something cheaper than a separate div and rem; that seems fine.

My one concern here is that for a target without a hardware divrem for the width in question, if the rem is inside the conditional, you're speculating a multiply. This is fine if it gets lowered to a hardware instruction, but might be problematic if it gets lowered to a libcall.

Every CPU can lower divrem to something cheaper than a separate div and rem; that seems fine.

My one concern here is that for a target without a hardware divrem for the width in question, if the rem is inside the conditional, you're speculating a multiply. This is fine if it gets lowered to a hardware instruction, but might be problematic if it gets lowered to a libcall.

Right - I figured I was close to the edge for the hoisting rem case. I see 3 possible solutions:

  1. Fix that up in CGP (the despeculation machinery already exists for things like llvm.cttz).
  2. Limit the transform to legal ops/types here in SimplifyCFG (TTI should give us that?)
  3. Don't hoist rem in this patch, just div. We would still catch the case in the bug report.

Any prefs or other options?

TTI::getUserCost, maybe? Not sure our costs are accurate for illegal types.


I wonder if it would make sense to perform this transform in EarlyCSE or something like that instead; it might be worthwhile even if we can't speculate the whole basic block.

Err, I meant TTI::getOperationCost.

spatel abandoned this revision.Sep 11 2017, 4:07 PM

Abandoning. We've solved this in a more complete way with a dedicated pass in D37121, so I don't think there's a need for a lesser hoisting transform. I'll update the comments in the test file.