Page MenuHomePhabricator

[InstCombine] Fold compare of int constant against a splatted vector of ints

Authored by dneilson on Mar 28 2018, 1:14 PM.



Folding patterns like:

%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0


%ext = extractelement <4 x i8> %insvec, i32 0
%cond = icmp eq i32 %ext, 0

Combined with existing rules, this allows us to fold patterns like:

%insvec = insertelement <4 x i8> undef, i8 %val, i32 0
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0


%cond = icmp eq i8 %val, 0

When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.

Diff Detail


Event Timeline

dneilson created this revision.Mar 28 2018, 1:14 PM

I don't get why we need the last comparison. Can we just replace %cast with %val in your example?

Ah, I see the type mismatch, but still. What prevents us from doing the same for less, greater etc. comparisons? Arithmetics?

Ah, I see the type mismatch, but still. What prevents us from doing the same for less, greater etc. comparisons? Arithmetics?

Nothing prevents us from doing so. I'll relax the restriction.

dneilson updated this revision to Diff 140258.Mar 29 2018, 8:10 AM
  • Generalize to predicates other than just (in)equality.
efriedma added inline comments.
114 ↗(On Diff #140258)

This can be optimized: . Not sure if that's useful in practice.

dneilson added inline comments.Mar 29 2018, 11:51 AM
114 ↗(On Diff #140258)

Interesting. Not useful for the direct application that I have in mind for this peephole optimization (I really only need eq/ne), but good to keep in mind for the future. Thanks for pointing it out!

mkazantsev added inline comments.Mar 30 2018, 12:29 AM
2488 ↗(On Diff #140258)

I agree that it is true if C.isSplat, but how it will work in else branch?

The matching seems narrower than required. It won't catch cases where we splat some existing vector element (rather than a scalar value). Can we start the match from the shuffle instead:

%splat = shufflevector <M x iN> %x, <4 x iN> undef, <4 x i32> <i32 C, i32 C, i32 C, i32 C>
%cast = bitcast <M x iN> %splat to iMN
%cond = icmp Pred iMN %cast, SplatCmpC

And create this:

%ext = extractelement <4 x iN> %x, i32 C
%cond = icmp Pred iN %ext, CmpC

If there's an insertelement ahead of the extractelement, existing folds should take care of eliminating that.

dneilson updated this revision to Diff 140650.Apr 2 2018, 11:22 AM
  • Following spatel's suggestion -- making the pattern match a little more general by only matching the bitcast + shuffle.
dneilson retitled this revision from [InstCombine] Fold compare of int constant against an integer vector splat to [InstCombine] Fold compare of int constant against a splatted vector of ints.Apr 2 2018, 11:23 AM
dneilson edited the summary of this revision. (Show Details)
spatel added inline comments.Apr 2 2018, 3:20 PM
2470–2475 ↗(On Diff #140650)

Early exit if the bitcast types aren't suitable:

if (!Bitcast->getType()->isIntegerTy() || !Bitcast->getSrcTy()->isIntOrIntVectorTy())
  return nullptr;
2478–2483 ↗(On Diff #140650)

Constant::getSplatValue() ?

2497–2501 ↗(On Diff #140650)

None of this is necessary AFAIK. A splat shuffle will be canonicalized so that the 2nd operand is undef and the mask constant is adjusted to choose from the 1st operand.

2509–2512 ↗(On Diff #140650)

Probably better to handle this part independently in InstSimplify (if it really matters?).

193–201 ↗(On Diff #140650)

This test isn't changing with this patch. Move it to instsimplify if there's no existing coverage?
It would be easier to see the transforms if you check in all of the tests first based on trunk. Then, just run the update script again with this patch in place locally to rebase.

dneilson added inline comments.Apr 3 2018, 6:53 AM
2478–2483 ↗(On Diff #140650)

Completely missed that interface. Thank you! I was not happy with the gymnastics I had to do here; I'm glad there's a better way.

2497–2501 ↗(On Diff #140650)

Right you are. I had in my head, for some reason, that it wasn't getting simplified.

2509–2512 ↗(On Diff #140650)

Seemed like a nice bonus; we've already done all of the heavy lifting doing the match and such here. Replicating that into inst-simplify doesn't seem worth the bother.

dneilson updated this revision to Diff 140789.Apr 3 2018, 7:50 AM
  • Simplify implementation
  • Prune tests (remove redundant & unneccessary); all added tests have now been triple checked that they do not simplify/fold even a single instruction without this patch.
spatel accepted this revision.Apr 3 2018, 9:19 AM

LGTM - see inline comments for more small improvements.

2475–2476 ↗(On Diff #140789)

No need to explicitly set these to nullptr; if the match fails, we don't use those variables.

2482 ↗(On Diff #140789)

Slightly less generic if you name this 'EltTy' or similar to indicate that it's the element's type.

2492–2493 ↗(On Diff #140789)

Less code if we let instcombine handle the replacement:

return new ICmpInst(Pred, Extract, NewC);
12–13 ↗(On Diff #140789)

We should have a test that shows this most basic transform (the extractelt is in the output). I'd also include a test with a shuffle that changes the number of elements, and a test with weird all-in-one:

define i1 @extending_shuffle_with_weird_types(<2 x i9> %v) {
  %splat = shufflevector <2 x i9> %v, <2 x i9> undef, <3 x i32> zeroinitializer
  %cast = bitcast <3 x i9> %splat to i27
  %cmp = icmp slt i27 %cast, 262657 ; 0x040201
  ret i1 %cmp
This revision is now accepted and ready to land.Apr 3 2018, 9:19 AM
dneilson updated this revision to Diff 140817.Apr 3 2018, 10:14 AM
  • Final updates suggested. Just updating the differential review to match what will be landed.
This revision was automatically updated to reflect the committed changes.