This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] use demanded vector elements to eliminate partially redundant instructions
ClosedPublic

Authored by spatel on Feb 24 2023, 2:24 PM.

Details

Summary

In issue #60632, we have vector math ops that differ because an operand is shuffled, but the math has limited demanded elements, so it can be replaced by another instruction:
https://alive2.llvm.org/ce/z/TKqq7H

I don't think we have anything like this yet - it's like a CSE/GVN fold, but driven by demanded elements of a vector op.
This is limited to splat-0 as a first step to keep it simple and to save time in case it would be better to try this somewhere else.

Diff Detail

Event Timeline

spatel created this revision.Feb 24 2023, 2:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 2:24 PM
spatel requested review of this revision.Feb 24 2023, 2:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 2:24 PM
goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1740

Think this loop should be a lambda that takes Value * V so you don't need to dup code for X/Y.

1743

Think:
if(Commute ? match(U, m_c_BinOp(Opcode, m_Specific(X), Shuf)) : match(U, m_BinOp(Opcode, m_Specific(X), Shuf)) probably make more sense here.

goldstein.w.n added inline comments.Feb 24 2023, 2:36 PM
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1731

This also would work for unary operator? I.e you have Unary(X) and Unary(Shuf(X))?

spatel marked 3 inline comments as done.Feb 26 2023, 7:22 AM
spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1731

fneg is the only true unary inst; the fabs intrinsic can be considered a pseudo unary inst.

But both of those have shuffle canonicalized after the op, so it doesn't seem like we'd encounter a pattern like this? Let me know if you see a pattern that escapes.

Eg, we also canonicalize fneg after extractelement, but that fold seems to be missing for fabs.

spatel updated this revision to Diff 500580.Feb 26 2023, 7:27 AM
spatel marked an inline comment as done.

Patch updated:

  1. Use lambda and m_c_Binop to reduce code duplication (I tried a few variations of juggling the conditions - this was the best I could find).
  2. Added a pair of negative tests with mismatched demand/shuffle mask.
goldstein.w.n added inline comments.Feb 27 2023, 10:45 AM
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1715

Why doesn't this work for div/rem/shift?

spatel marked an inline comment as done.Feb 27 2023, 11:56 AM
spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1715

This patch should be fine with any binop since we're not altering any instructions, but I left it inside this block as a first step to try to be safer. I can put a TODO comment on it.

The existing code is definitely not safe for div/rem because it can do this:

%u = udiv <2 x i32> %x, <i32 42, i32 1>
%s = shufflevector <2 x i32> %u, <2 x i32> poison, <2 x i32> zeroinitializer
-->
%u = udiv <2 x i32> %x, <i32 42, i32 poison>  ; we don't demand element 1, so replace constant 1
%s = shufflevector <2 x i32> %u, <2 x i32> poison, <2 x i32> zeroinitializer

...but now we have a divide-by-poison element which can be replaced by a divide-by-0 element which means the whole instruction is UB/poison, and that's wrong of course.

I need to look through the old patches to see if there's still potential to do bad things with shifts.

goldstein.w.n added inline comments.Feb 27 2023, 12:02 PM
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1715

This patch should be fine with any binop since we're not altering any instructions, but I left it inside this block as a first step to try to be safer. I can put a TODO comment on it.

The existing code is definitely not safe for div/rem because it can do this:

%u = udiv <2 x i32> %x, <i32 42, i32 1>
%s = shufflevector <2 x i32> %u, <2 x i32> poison, <2 x i32> zeroinitializer
-->
%u = udiv <2 x i32> %x, <i32 42, i32 poison>  ; we don't demand element 1, so replace constant 1
%s = shufflevector <2 x i32> %u, <2 x i32> poison, <2 x i32> zeroinitializer

...but now we have a divide-by-poison element which can be replaced by a divide-by-0 element which means the whole instruction is UB/poison, and that's wrong of course.

I need to look through the old patches to see if there's still potential to do bad things with shifts.

Okay, add TODO then patch lgtm.

spatel updated this revision to Diff 500870.Feb 27 2023, 12:12 PM
spatel marked an inline comment as done.

Patch updated:
Added TODO comment to allow this for any binop (currently bails out on div/rem/shift).

goldstein.w.n accepted this revision.Feb 27 2023, 12:18 PM

LGTM. (note i'm not maintainer so maybe sit a day or so incase anyone else wants to way in).

This revision is now accepted and ready to land.Feb 27 2023, 12:18 PM
RKSimon accepted this revision.Feb 28 2023, 2:17 AM

LGTM - cheers

This revision was landed with ongoing or failed builds.Feb 28 2023, 6:45 AM
This revision was automatically updated to reflect the committed changes.