This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y).
ClosedPublic

Authored by fzhinkin on Aug 19 2021, 2:08 PM.

Diff Detail

Event Timeline

fzhinkin created this revision.Aug 19 2021, 2:08 PM
fzhinkin requested review of this revision.Aug 19 2021, 2:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2021, 2:08 PM
lebedev.ri requested changes to this revision.Aug 19 2021, 2:22 PM
lebedev.ri added a subscriber: lebedev.ri.

InstSimplify can not create new instructions
This needs to go into instcombine
While there, please let's create the multiply, it's less ugly.

This revision now requires changes to proceed.Aug 19 2021, 2:22 PM
fzhinkin updated this revision to Diff 367902.Aug 20 2021, 2:38 PM

Moved fold optimization to instcombine, updated tests.

InstSimplify can not create new instructions
This needs to go into instcombine
While there, please let's create the multiply, it's less ugly.

Thanks for suggestion!

fzhinkin retitled this revision from [InstSimplify] Transform X == 0 ? 0 : X * Y --> X * freeze(Y). to [InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y)..Aug 20 2021, 2:50 PM
  1. what's wrong with vectors?
  2. can't we preserve no-wrap flags?
  3. for vectors, what if some element is not zero but undef? (see Constant::mergeUndefsWith())
  1. what's wrong with vectors?

There's nothing wrong, I'll support it, thanks!

  1. can't we preserve no-wrap flags?

Definitely.

  1. for vectors, what if some element is not zero but undef? (see Constant::mergeUndefsWith())

Not sure that I fully understand the issue.
If %a in the snippet below contains some undefs then the select may choose either 0 or %m depending on particular value of a (which could be 0 for undef elements).
Probably %a should be frozen to ensure that both icmp and mul will see the same a's value, but (if I understood it correctly) absence of freeze does not compromise the correctness.

define <2 x i4> @src(<2 x i4> %a, <2 x i4> %b) {
  %c = icmp eq <2 x i4> %a, zeroinitializer
  %m = mul <2 x i4> %a, %b
  %r = select <2 x i1> %c, <2 x i4> zeroinitializer, <2 x i4> %m
  ret <2 x i4> %r
}

Or you meant that the code could be folded even if icmp's RHS is not only the all-zeros vector, but a constant-vector containing some undefs?

RKSimon added inline comments.Aug 21 2021, 5:39 AM
llvm/test/Transforms/InstCombine/select.ll
3028

please can you not enumerate tests, and give them names that describe the purpose of the test - e.g. mul_select_eq_zero, mul_select_eq_zero_commute, mul_select_eq_zero_vector, mul_select_ne_zero

3046

I'm not sure this test does anything useful? instcombine will canonicalize the 0 to the RHS

spatel added a subscriber: spatel.Aug 21 2021, 9:27 AM
fzhinkin updated this revision to Diff 368396.Aug 24 2021, 10:44 AM

Enabled folding for vectors, copied flags for mul, renamed tests.

  1. what's wrong with vectors?

There's nothing wrong, I'll support it, thanks!

  1. can't we preserve no-wrap flags?

Definitely.

  1. for vectors, what if some element is not zero but undef? (see Constant::mergeUndefsWith())

Not sure that I fully understand the issue.
If %a in the snippet below contains some undefs then the select may choose either 0 or %m depending on particular value of a (which could be 0 for undef elements).
Probably %a should be frozen to ensure that both icmp and mul will see the same a's value, but (if I understood it correctly) absence of freeze does not compromise the correctness.

define <2 x i4> @src(<2 x i4> %a, <2 x i4> %b) {
  %c = icmp eq <2 x i4> %a, zeroinitializer
  %m = mul <2 x i4> %a, %b
  %r = select <2 x i1> %c, <2 x i4> zeroinitializer, <2 x i4> %m
  ret <2 x i4> %r
}

Or you meant that the code could be folded even if icmp's RHS is not only the all-zeros vector, but a constant-vector containing some undefs?

I mean that if in either zero constant vector, a particular element is undef, then in the other constant vector, the same element can be anything:
https://alive2.llvm.org/ce/z/-gt253
https://alive2.llvm.org/ce/z/3PAU3E

So i think you want if(!match(Constant::mergeUndefsWith(TrueVal, CondVal.getOperand(1)), m_Zero)) return;

  1. what's wrong with vectors?

There's nothing wrong, I'll support it, thanks!

  1. can't we preserve no-wrap flags?

Definitely.

  1. for vectors, what if some element is not zero but undef? (see Constant::mergeUndefsWith())

Not sure that I fully understand the issue.
If %a in the snippet below contains some undefs then the select may choose either 0 or %m depending on particular value of a (which could be 0 for undef elements).
Probably %a should be frozen to ensure that both icmp and mul will see the same a's value, but (if I understood it correctly) absence of freeze does not compromise the correctness.

define <2 x i4> @src(<2 x i4> %a, <2 x i4> %b) {
  %c = icmp eq <2 x i4> %a, zeroinitializer
  %m = mul <2 x i4> %a, %b
  %r = select <2 x i1> %c, <2 x i4> zeroinitializer, <2 x i4> %m
  ret <2 x i4> %r
}

Or you meant that the code could be folded even if icmp's RHS is not only the all-zeros vector, but a constant-vector containing some undefs?

I mean that if in either zero constant vector, a particular element is undef, then in the other constant vector, the same element can be anything:
https://alive2.llvm.org/ce/z/-gt253
https://alive2.llvm.org/ce/z/3PAU3E

So i think you want if(!match(Constant::mergeUndefsWith(TrueVal, CondVal.getOperand(1)), m_Zero)) return;

Got it, thanks for explanation!

fzhinkin updated this revision to Diff 368694.Aug 25 2021, 11:45 AM

Supproted undef values in both condition's and select's arguments.

fzhinkin edited the summary of this revision. (Show Details)Aug 25 2021, 11:45 AM
nikic added inline comments.Aug 25 2021, 11:51 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
752

Should either check that the mul only has one use, or else make sure that other mul users get replaced with the new frozen form as well, to avoid having two muls.

fzhinkin updated this revision to Diff 368723.Aug 25 2021, 2:17 PM

Replace original mul with mul using frozen input.

nikic added inline comments.Aug 25 2021, 2:32 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
758

What ensures that the other case is Y == CmpLHS? Couldn't this be comparing a completely unrelated value?

761

In InstCombine, replaceInstUsesWith should be used in place of replaceAllUsesWith for worklist management reasons.

nikic added inline comments.Aug 26 2021, 12:09 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
761

Also, can you please test the case where the other user is before the select? I think as written you're inserting the mul before the select, so it may not dominate other users. You would have to adjust the insertion point.

(This might also be one of the rare cases where modifying the mul in place to freeze one operand is more elegant than creating a new one and replacing users...)

spatel added inline comments.Aug 26 2021, 7:18 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
741

Why match an arbitrary constant if this pattern requires a zero constant? Use m_Zero() or some variation of that.

742

Use ICmpInst::isEquality(Pred)

758

General note: each review comment like this one is a requirement for a regression test - either positive (ie, the transform works) or negative (the transform fails).

Instead of matching 2 arbitrary values with m_Mul(), consider using m_c_Mul(m_Specific(CmpLHS), m_Value(Y)).
(Adjust the variable names and/or code comments, so they line up.)

fzhinkin added inline comments.Aug 28 2021, 4:05 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
741

I'm capturing it to avoid bunch of casts to extract it from CondVal later on (to check if TrueVal merges with CmpRHS into zero). But as it rises some confusion I will use m_Zero here and will extract CmpRHS before actual use.

758

@nikic You're right, thanks!

@spatel thanks for suggestion!

761

Thanks! Mul's modification requires less code in that case. Not sure about elegance, but I'll stick with it.

fzhinkin updated this revision to Diff 369256.Aug 28 2021, 6:26 AM

Cleaned up the code, updated tests.

spatel added inline comments.Sep 9 2021, 5:33 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

I'm still confused about this: we want to match a zero constant as the TrueVal based on the block comment for this function, so why not use m_Zero()?
Usually, we can match vectors that include undef elements, but we may not be able to propagate those undefs through a transform (typically, we would replace with a ConstantInt::getNullValue() to get a "full" zero).

If the partial undef logic is complicated, I think that would be ok to defer to a follow-up patch. Ie, let's get the basic/common case right, then look at corner cases if that really matters.

lebedev.ri added inline comments.Sep 9 2021, 5:44 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

Please read previous review comments in this patch. We are going in circles here.

spatel added inline comments.Sep 9 2021, 5:56 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

Ah, sorry I missed that. And now I see that @mul_select_eq_undef_vector is testing a non-zero constant.
I think it would be good to pre-commit the tests and add some explanatory comments, so that's easier to spot.
Also, the multi-use test would be easier to follow if it took the more common call void @use() pattern.

fzhinkin updated this revision to Diff 371676.Sep 9 2021, 11:44 AM

Added comments to both transformation and tests, made multi-user test more readable.

fzhinkin added inline comments.Sep 9 2021, 11:48 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

it would be good to pre-commit the tests

Could you please explain what did you meant here?

As of other suggestions - I added comments to both code and tests and refactored multi-use test case to use @use_i32.

Thanks!

spatel added inline comments.Sep 9 2021, 11:57 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

I mean we can commit the tests with the current (not transformed) output verified by the CHECK lines. Then, we apply the code change here and update the tests. That way, we show the diffs on the tests rather than just the new output.

If you do not have commit access, let me know. I can push the baseline tests for you.

fzhinkin added inline comments.Sep 9 2021, 1:15 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

I don't have commit access, so I'll very appreciate if you commit the baseline tests.
Besides changing checks I also removed tests' comments as comments does not make much sense without the transformation.

Here is the patch:

.

Thanks!

spatel added inline comments.Sep 10 2021, 6:10 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

Thanks - I pushed that:
745f82b8d909

So now you can update here by regenerating the CHECK lines and adding the test comments back.

fzhinkin updated this revision to Diff 371980.Sep 10 2021, 11:42 AM

Rebased changes.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
746–749

Thank you!

spatel accepted this revision.Sep 10 2021, 12:07 PM

LGTM, but let's wait at least a day to commit in case there are any other comments or suggestions.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
754

Should that be something like:
"...when it is a scalar undef value or a vector containing non-zero elements that are masked by undef elements in the compare constant."

nikic added inline comments.Sep 10 2021, 1:44 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
768

I believe this may fail if the multiplication is a constant expression (we should bail in that case).

769

It would be better to create the freeze using IRBuilder, because it will be queued for reprocessing that way. Or alternatively using InsertNewInstBefore, which also handles this correctly.

fzhinkin updated this revision to Diff 372063.Sep 11 2021, 1:52 AM

Updated comments, fixed few bugs.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
754

Yes, it looks better, thanks!

768

I was not able to reproduce that case w/ tests because all my const expressions were folded before invocation of newly added folding, but I'll add the check, thanks.

769

Thanks for suggestion, I'll use InsertNewInstBefore.

nikic accepted this revision.Sep 11 2021, 12:21 PM

LG

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
768

The trick to that is usually embedding globals in to the constant expression, something like mul (i64 ptrtoint (i8* @g1 to i64). i64 ptrtoint (i8* @g2 to i64)) might do.

llvm/test/Transforms/InstCombine/select.ll
2888

As you can see, this doesn't actually check the ne case, as the icmp is canonicalize to eq first. I think it may be possible to prevent this by adding an extra use of the icmp.

fzhinkin updated this revision to Diff 372110.Sep 12 2021, 2:37 AM

Fixed the test w/ icmp ne by preventing icmp's transformation.

fzhinkin added inline comments.Sep 12 2021, 2:48 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
768

It seems impossible to have ConstantExpr mul at the point where cast to Instruction is done:

  • if mul is ConstantExpr then both its arguments are constant expressions too;
  • to reach the cast one of the mul's operands should be compared with 0 by icmp and if both mul's operands are constant expressions then icmp should be a constant expression too;
  • select's operands are icmp, mul and constant so select should be a constant expression.

So either mul is not a constant expression or the whole select is a constant expression. In latter case InstCombinerImpl::visitSelectInst will not be invoked.

llvm/test/Transforms/InstCombine/select.ll
2888

Thanks for pointing to it, fixed.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 15 2021, 6:05 AM
This revision was automatically updated to reflect the committed changes.