Page MenuHomePhabricator

[DivRemPairs] Handling for expanded-form rem - recomposition (PR42673)
ClosedPublic

Authored by lebedev.ri on Jul 25 2019, 1:37 PM.

Details

Summary

While -div-rem-pairs pass can decompose rem in div+rem pair when div-rem pair
is unsupported by target, nothing performs the opposite fold.
We can't do that in InstCombine or DAGCombine since neither of those has access to TTI.
So it makes most sense to teach -div-rem-pairs about it.

If we matched rem in expanded form, we know we will be able to place div-rem pair
next to each other so we won't regress the situation.
Also, we shouldn't decompose rem if we matched already-decomposed form.
This is surprisingly straight-forward otherwise.

https://bugs.llvm.org/show_bug.cgi?id=42673

Diff Detail

Event Timeline

lebedev.ri created this revision.Jul 25 2019, 1:37 PM

LGTM - but I'd prefer that someone with a better grasp of C++ / PointerIntPair also have a look to make sure I'm not missing any implementation subtleties.

One question: should we keep track of the dead multiply instructions and remove them in this pass (for example, using RecursivelyDeleteTriviallyDeadInstructions())?

LGTM - but I'd prefer that someone with a better grasp of C++ / PointerIntPair also have a look to make sure I'm not missing any implementation subtleties.

Thank you. I personally don't think there's any subtleties there.

One question: should we keep track of the dead multiply instructions and remove them in this pass (for example, using RecursivelyDeleteTriviallyDeadInstructions())?

I have thought about that. I kind-of don't see how to nicely fit that here.
The even bigger issue is that we can now rewrite that ((x / y) * y) as x - (x % y)
thus replacing mul with sub, but this also doesn't exactly fit here..
So i'd personally prefer to leave it for further passes to deal with
(yes, i have seen comment in git log about running after instcombine,
so i guess that leaves dagcombine.)

bogner accepted this revision.Jul 29 2019, 1:40 PM
bogner added a subscriber: bogner.

LGTM

This revision is now accepted and ready to land.Jul 29 2019, 1:40 PM

@spatel @bogner thank you for the review!

This revision was automatically updated to reflect the committed changes.
lebedev.ri reopened this revision.Jul 30 2019, 12:44 AM

And reverted.
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/25150/steps/test-suite/logs/test.log

Only PHI nodes may reference their own value!
  %sub33 = srem i32 %sub33, %ranks_in_i

Not sure what's going on yet.

This revision is now accepted and ready to land.Jul 30 2019, 12:44 AM

Reduced:

$ cat zz.ll 
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @d(i32 %X, i32 %Y, i32 %Z) {
bb:
  %t0 = mul nsw i32 %Z, %Y
  %t1 = sdiv i32 %X, %t0
  %t2 = mul nsw i32 %t0, %t1
  %t3 = sub nsw i32 %X, %t2
  %t4 = sdiv i32 %t3, %Y
  %t5 = mul nsw i32 %t4, %Y
  %t6 = sub nsw i32 %t3, %t5
  ret i32 %t6
}
$ /builddirs/llvm-project/build-Clang8-unknown/bin/opt -div-rem-pairs zz.ll -o - -S 
Only PHI nodes may reference their own value!
  %t6 = srem i32 %t6, %Y
in function d
LLVM ERROR: Broken function found, compilation aborted!
$ /builddirs/llvm-project/build-Clang8-unknown/bin/opt -instcombine zz.ll -o - -S 
; ModuleID = 'zz.ll'
source_filename = "zz.ll"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @d(i32 %X, i32 %Y, i32 %Z) {
bb:
  %t0 = mul nsw i32 %Z, %Y
  %0 = srem i32 %X, %t0
  %1 = srem i32 %0, %Y
  ret i32 %1
}
lebedev.ri planned changes to this revision.Jul 30 2019, 4:42 AM

So yeah, that obviously doesn't work.

We do RAUW but we ignore the maps we created, so if one of the values we RAUW'd
happens to be dividend/divisor (and so is used as part of the key in those maps), we get UB.

While this patch exposed the issue, the same bug already exists in trunk: https://bugs.llvm.org/show_bug.cgi?id=42823

It is trivial to catch this via PoisoningVH<>/AssertingVH<>,
but the fix is not obvious - they are used as a key in a map,
so just using TrackingVH/WeakTrackingVH won't work.

There's ValueMap, but we use DivRemMapKey as key, not a single Value.

Also, this pass assumes that there is at most one div instruction and at most one rem instruction with given pair of arguments.

$ cat zz2.ll 
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define void @d(i32 %X, i32 %Y, i32 %Z, i1 %c, i32* %dst00, i32* %dst01, i32* %dst10, i32* %dst11) {
bb:
  %t0 = mul nsw i32 %Z, %Y
  br i1 %c, label %bb1, label %bb2
bb1:
  %t1 = sdiv i32 %X, %t0
  %t3.recomposed = srem i32 %X, %t0
  store i32 %t1, i32* %dst00
  store i32 %t3.recomposed, i32* %dst01
  br label %end
bb2:
  %t12 = sdiv i32 %X, %t0
  %t32.recomposed = srem i32 %X, %t0
  store i32 %t12, i32* %dst10
  store i32 %t32.recomposed, i32* %dst11
  br label %end
end:
  ret void
}
$ /builddirs/llvm-project/build-Clang8-unknown/bin/opt -div-rem-pairs zz2.ll -o - -S 
; ModuleID = 'zz2.ll'
source_filename = "zz2.ll"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define void @d(i32 %X, i32 %Y, i32 %Z, i1 %c, i32* %dst00, i32* %dst01, i32* %dst10, i32* %dst11) {
bb:
  %t0 = mul nsw i32 %Z, %Y
  br i1 %c, label %bb1, label %bb2

bb1:                                              ; preds = %bb
  %t1 = sdiv i32 %X, %t0
  %t3.recomposed = srem i32 %X, %t0
  store i32 %t1, i32* %dst00
  store i32 %t3.recomposed, i32* %dst01
  br label %end

bb2:                                              ; preds = %bb
  %t12 = sdiv i32 %X, %t0
  %0 = mul i32 %t12, %t0
  %1 = sub i32 %X, %0
  store i32 %t12, i32* %dst10
  store i32 %1, i32* %dst11
  br label %end

end:                                              ; preds = %bb2, %bb1
  ret void
}

Rebased ontop of D65451, simplified code.

This revision is now accepted and ready to land.Jul 30 2019, 11:07 AM

NFC, tidy up the diff a bit more.

This revision was automatically updated to reflect the committed changes.