This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] `select + mul` -> `select + shl` with power of twos.
Needs ReviewPublic

Authored by danlark on Dec 16 2019, 1:50 PM.

Details

Summary

Now select + mul transforms to select + shl with power of twos known in select.

Patch by Danila Kutenin. email:danilak at google.com, github:danlark1@ . I don't have commit rights.

Event Timeline

danlark created this revision.Dec 16 2019, 1:50 PM
danlark updated this revision to Diff 234143.Dec 16 2019, 1:56 PM

Remove TODO

lebedev.ri added inline comments.Dec 16 2019, 2:47 PM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
266–268

What happens with undef lanes?
(they should become 0, do not keep them as undef.
there was some helper for that, can't find it now "replace undef values with")

273–274

This isn't sound.
We want to not propagate NSW if *any* lane matches this predicate, not just if they all do..
So you want

if (I.hasNoSignedWrap() && !C1.isNotMinSignedValue())
  Shl->setHasNoSignedWrap();
danlark updated this revision to Diff 234181.Dec 16 2019, 3:55 PM
danlark marked 2 inline comments as done.

Fix the issues and rebase

danlark added inline comments.Dec 16 2019, 3:57 PM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
266–268

They will remain undef as in division, why is 0 better?

273–274

Yeah, I agree. Maybe you meant !cast<Constant>(Op1)->isMinSignedValue()

danlark updated this revision to Diff 234183.Dec 16 2019, 4:00 PM

Add simple min integer test

danlark marked 2 inline comments as not done.Dec 16 2019, 4:01 PM
danlark updated this revision to Diff 234187.Dec 16 2019, 4:34 PM

Add zext test

danlark updated this revision to Diff 234188.Dec 16 2019, 4:41 PM

More robust zext tests

danlark updated this revision to Diff 234189.Dec 16 2019, 4:44 PM

Even more robust tests

Harbormaster completed remote builds in B42624: Diff 234189.
danlark updated this revision to Diff 234236.Dec 17 2019, 12:49 AM

Correct INT_MIN test

lebedev.ri requested changes to this revision.Dec 17 2019, 1:07 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
266–268

Keeping undef in this situation is (a very common) miscompilation.

define <2 x i8> @shift_if_different_lanes_undef_vector(<2 x i8> %px, i1 %cond) {
%0:
  %sel = select i1 %cond, <2 x i8> { 16, undef }, <2 x i8> { undef, 32 }
  %r = mul <2 x i8> %px, %sel
  ret <2 x i8> %r
}
=>
define <2 x i8> @shift_if_different_lanes_undef_vector(<2 x i8> %px, i1 %cond) {
%0:
  %sel = select i1 %cond, <2 x i8> { 4, undef }, <2 x i8> { undef, 5 }
  %r = shl <2 x i8> %px, %sel
  ret <2 x i8> %r
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
<2 x i8> %px = < undef, undef >
i1 %cond = undef

Source:
<2 x i8> %sel = < undef, undef >
<2 x i8> %r = < undef, undef >

Target:
<2 x i8> %sel = < #x04 (4), #x08 (8) >
<2 x i8> %r = < #x00 (0)        [based on undef value], poison >
Source value: < undef, undef >
Target value: < #x00 (0), poison >
define <2 x i8> @shift_if_different_lanes_undef_vector(<2 x i8> %px, i1 %cond) {
%0:
  %sel = select i1 %cond, <2 x i8> { 16, undef }, <2 x i8> { undef, 32 }
  %r = mul <2 x i8> %px, %sel
  ret <2 x i8> %r
}
=>
define <2 x i8> @shift_if_different_lanes_undef_vector(<2 x i8> %px, i1 %cond) {
%0:
  %sel = select i1 %cond, <2 x i8> { 4, 0 }, <2 x i8> { 0, 5 }
  %r = shl <2 x i8> %px, %sel
  ret <2 x i8> %r
}
Transformation seems to be correct!
273–274

I'm very sure i didn't - that has the same problem of only checking for splat INT_MIN while we want to check if any channel is INT_MIN.

This revision now requires changes to proceed.Dec 17 2019, 1:07 AM
danlark updated this revision to Diff 234245.Dec 17 2019, 1:48 AM
danlark marked an inline comment as done.

undef should return zero in select+shl

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
273–274

I believe for now it is correct, C1 is a logarithm of Op1, if Op1 is INT_MIN, then it should not have nsw. So, if all are not int_min, it will be propagated, otherwise, it will not.

define i32 @shift_if_power2_nuw_nsw_min(i32 %x, i1 %cond) {
  %sel = select i1 %cond, i32 2, i32 -2147483648
  %r = mul nuw nsw i32 %sel, %x
  ret i32 %r
}

===>

define i32 @shift_if_power2_nuw_nsw_min(i32 %x, i1 %cond) {
  %sel = select i1 %cond, i32 1, i32 31
  %r =  shl nuw i32 i32 %x, %sel
  ret i32 %r
}
danlark marked 4 inline comments as done.Dec 17 2019, 1:48 AM
danlark added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
266–268

Done.

lebedev.ri added inline comments.Dec 17 2019, 2:03 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
116

You really want to only use Constant::replaceUndefsWith() to fixup that single case,
not pessimize everything that uses this generic utility..

273–274

Can you point me to the (vector) test that shows it working as expected?

danlark updated this revision to Diff 234248.Dec 17 2019, 2:25 AM

Update vector test

danlark marked an inline comment as done.Dec 17 2019, 2:26 AM
danlark added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
273–274

Ok, I changed to isNot

define <2 x i32> @shift_if_power2_vector_nsw(<2 x i32> %px, i1 %cond) {
  %sel = select i1 %cond, <2 x i32> <i32 4, i32 -2147483648>, <2 x i32> <i32 1, i32 32>
  %r = mul nsw <2 x i32> %px, %sel
  ret <2 x i32> %r
}
=>
define <2 x i32> @shift_if_power2_vector_nsw(<2 x i32> %px, i1 %cond) {
  %sel = select i1 %cond, <2 x i32> <i32 2, i32 31>, <2 x i32> <i32 0, i32 5>
  %r = shl <2 x i32> %px, %sel
  ret <2 x i32> %r
}
danlark updated this revision to Diff 234254.Dec 17 2019, 2:59 AM

Rebase and style

Now I think review is in a good shape

lebedev.ri requested changes to this revision.Dec 17 2019, 2:18 PM

Thank you for working on this.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
264

I think it would be better to change Op1 type to Constant, and make the caller do the cast.

267–268

Add a comment explaining that we really do need to sanitize undef to 0 here.

280

// X mul (zext (C1 << N)), where C1 is "1<<C2" --> X << (zext (add N, C2))

319–324

If i'm reading this correctly, these are miscompilations:
https://rise4fun.com/Alive/IKH

llvm/test/Transforms/InstCombine/mul.ll
669–682

This is a miscompile:

----------------------------------------
define i32 @shift_if_power2_double_select(i32 %x, i32 %y, i1 %cond1, i1 %cond2) {
%0:
  %shl.res = shl i32 8, %y
  %sel1 = select i1 %cond1, i32 %shl.res, i32 1024
  %sel2 = select i1 %cond2, i32 16, i32 %sel1
  %r = mul nuw i32 %x, %sel2
  ret i32 %r
}
=>
define i32 @shift_if_power2_double_select(i32 %X, i32 %Y, i1 %COND1, i1 %COND2) {
%0:
  %TMP1 = add i32 %Y, 3
  %DOTV = select i1 %COND1, i32 %TMP1, i32 10
  %R_V = select i1 %COND2, i32 4, i32 %DOTV
  %R = shl nuw i32 %X, %R_V
  ret i32 %R
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i32 %x = #x0f360904 (255199492)
i32 %y = #x0000001d (29)
i1 %cond1 = #x1 (1)
i1 %cond2 = undef

Source:
i32 %shl.res = #x00000000 (0)
i32 %sel1 = #x00000000 (0)
i32 %sel2 = #x00000000 (0)      [based on undef value]
i32 %r = #x00000000 (0)

Target:
i32 %TMP1 = #x00000020 (32)
i32 %DOTV = #x00000020 (32)
i32 %R_V = #x00000020 (32)
i32 %R = poison
Source value: #x00000000 (0)
Target value: poison

Summary:
  0 correct transformations
  1 incorrect transformations
  0 errors
This revision now requires changes to proceed.Dec 17 2019, 2:18 PM
danlark updated this revision to Diff 234416.Dec 17 2019, 5:03 PM
danlark marked 6 inline comments as done.

Fix overflow in shifting and further propagation

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
264

It would be harder to combine OperandFoldActions then in udiv and mul at the same time, though I changed to only one call

319–324

Yeah, thank you for the tool. if Shl NSW, this should not happen and this is correct I believe https://rise4fun.com/Alive/sog

llvm/test/Transforms/InstCombine/mul.ll
669–682

Same issue with nsw for shift, fixed

Thank you for working on this.

I want to say sorry if I am asking/doing stupid things, I am only getting used to things

danlark updated this revision to Diff 234424.Dec 17 2019, 5:21 PM

Also add NUW

danlark updated this revision to Diff 234552.Dec 18 2019, 8:46 AM

Don't add zero

danlark added a comment.EditedDec 22 2019, 11:23 AM

Ping.

Another hopeless ping before NY

xbolva00 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
297

Dtto

299

assert is better choice here?

323

emplace_back?

Another hopeless ping before NY

It is a holiday time so please be patient

danlark updated this revision to Diff 235604.Dec 30 2019, 8:15 AM
danlark marked 3 inline comments as done.

Fix asserts

danlark updated this revision to Diff 235605.Dec 30 2019, 8:17 AM

Remove nullptr assignment

Harbormaster completed remote builds in B43049: Diff 235605.
danlark updated this revision to Diff 236172.Jan 4 2020, 4:57 AM

Replace finally everything with asserts

@lebedev.ri Can you please take a look and possibly submit?

Sorry, i lost track of this patch.
I was hoping someone more familiar with that code would comment, too.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
332–337

You can at least cut this in half by using m_ZExtOrSelf()

danlark updated this revision to Diff 237081.Jan 9 2020, 7:25 AM
danlark marked an inline comment as done.

Applying m_ZExtOrSelf

danlark marked 2 inline comments as done.Jan 9 2020, 7:26 AM

@spatel You are doing a lot of stuff around this code, PTAL.

spatel added a comment.Jan 9 2020, 8:48 AM

There's a lot going on here, and the patch doesn't apply cleanly to current source. Can you pre-commit any NFC changes/tests to make this patch smaller? For example, the udiv change/tests are not affected by the mul code?

danlark updated this revision to Diff 237110.Jan 9 2020, 9:29 AM

Removing div tests, we don't need them

spatel added a comment.Jan 9 2020, 9:29 AM

There's a lot going on here, and the patch doesn't apply cleanly to current source. Can you pre-commit any NFC changes/tests to make this patch smaller? For example, the udiv change/tests are not affected by the mul code?

Sorry - I didn't see earlier that this patch is part of a sequence.
I'm a bit skeptical about the need to extend the "udiv action" machine. Are the motivating cases really that complicated or could we get away with a simpler pattern match of mul(select...)?

danlark updated this revision to Diff 237113.Jan 9 2020, 9:35 AM

Rebase and make tests committed before

Harbormaster completed remote builds in B43610: Diff 237113.

There's a lot going on here, and the patch doesn't apply cleanly to current source. Can you pre-commit any NFC changes/tests to make this patch smaller? For example, the udiv change/tests are not affected by the mul code?

Sorry - I didn't see earlier that this patch is part of a sequence.
I'm a bit skeptical about the need to extend the "udiv action" machine. Are the motivating cases really that complicated or could we get away with a simpler pattern match of mul(select...)?

I am not sure but I just reused the code for both division and multiplication -- from my perspective the idea is the same.

Also, I rebased a bit and made the patch smaller as you asked.

There's a lot going on here, and the patch doesn't apply cleanly to current source. Can you pre-commit any NFC changes/tests to make this patch smaller? For example, the udiv change/tests are not affected by the mul code?

Sorry - I didn't see earlier that this patch is part of a sequence.
I'm a bit skeptical about the need to extend the "udiv action" machine. Are the motivating cases really that complicated or could we get away with a simpler pattern match of mul(select...)?

I am not sure but I just reused the code for both division and multiplication -- from my perspective the idea is the same.

Also, I rebased a bit and made the patch smaller as you asked.

So, I basically don't see any reason division machine is doing a lot of things but as abstractions and semantics are mostly the same, I decided to reuse it. And to generalize power 2 multiplication then.

That's true that most of patterns are just one mul(select) but I saw at least one where the depth level was two.

There's a lot going on here, and the patch doesn't apply cleanly to current source. Can you pre-commit any NFC changes/tests to make this patch smaller? For example, the udiv change/tests are not affected by the mul code?

Sorry - I didn't see earlier that this patch is part of a sequence.
I'm a bit skeptical about the need to extend the "udiv action" machine. Are the motivating cases really that complicated or could we get away with a simpler pattern match of mul(select...)?

I am not sure but I just reused the code for both division and multiplication -- from my perspective the idea is the same.

Also, I rebased a bit and made the patch smaller as you asked.

So, I basically don't see any reason division machine is doing a lot of things but as abstractions and semantics are mostly the same, I decided to reuse it. And to generalize power 2 multiplication then.

That's true that most of patterns are just one mul(select) but I saw at least one where the depth level was two.

Yes, I understand the similarity, I'm just not sold on the idea that we ever needed the div complexity in the first place. If @lebedev.ri is comfortable with all of the poison-related corner cases, then we can proceed. AFAIK, there are no active contributers on this code, so if you've looked it over - then you're the expert. :)

@lebedev.ri Friendly ping and see message from spatel@ above :)

I'm struggling to keep all of the poison potential straight. Can we make this patch only include foldMulPow2Cst() (remove foldMulShl())?

Double-check to make sure I've translated this correctly:

define i32 @shift_if_power2_double_select_zext(i32 %x, i32 %y, i1 %cond1, i1 %cond2) {
  %shl = shl nsw i32 2147483648, %y
  %sel1 = select i1 %cond1, i32 %shl, i32 1024
  %sel2 = select i1 %cond2, i32 16, i32 %sel1
  %r = mul nsw i32 %x, %sel2
  ret i32 %r
}

$ opt -instcombine mul.ll -S

define i32 @shift_if_power2_double_select_zext(i32 %x, i32 %y, i1 %cond1, i1 %cond2) {
  %1 = add i32 %y, 31
  %.v = select i1 %cond1, i32 %1, i32 10
  %r.v = select i1 %cond2, i32 4, i32 %.v
  %r = shl nsw i32 %x, %r.v
  ret i32 %r
}

Miscompile?
https://rise4fun.com/Alive/YcG

lebedev.ri resigned from this revision.Jan 12 2023, 5:15 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 5:15 PM
Herald added a subscriber: StephenFan. · View Herald Transcript