Page MenuHomePhabricator

[VectorCombine] scalarize binop of inserted elements into vector constants
ClosedPublic

Authored by spatel on Tue, May 5, 3:58 PM.

Details

Summary

As with the extractelement patterns that are currently in vector-combine, there are going to be several possible variations on this theme. This should be the clearest, simplest example.

Scalarization is the right direction for target-independent canonicalization, and InstCombine has some of those folds already, but it doesn't do this. I proposed a similar transform in D50992. Here in vector-combine, we can check the cost model to be sure it's profitable, so there should be less risk.

Diff Detail

Event Timeline

spatel created this revision.Tue, May 5, 3:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptTue, May 5, 3:58 PM
lebedev.ri accepted this revision.Wed, May 6, 12:25 AM

Seems fairly uncontroversial to me.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
320

Hm, do we have an interface to ask for cost of InsertElement with variable insert index?

340–341

// We want to scalarize unless vector variant actually has lower cost

357

This should retain the name from I.

This revision is now accepted and ready to land.Wed, May 6, 12:25 AM
RKSimon added inline comments.Wed, May 6, 2:41 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
320

Yes we set Index to -1 in the getVectorInstrCost call if its unknown/variable - we don't do much with it in x86 at least though........

Random question - what about special case undef handling for the upper elements?

opt -instcombine

define i32 @square()  {
    %1 = xor i32 undef, undef
    ret i32 %1
}
->
define i32 @square() {
  ret i32 0
}

Random question - what about special case undef handling for the upper elements?

opt -instcombine

define i32 @square()  {
    %1 = xor i32 undef, undef
    ret i32 %1
}
->
define i32 @square() {
  ret i32 0
}

cc'ing @aqjune @nlopes @regehr

undef ^ undef is actually undef:
http://volta.cs.utah.edu:8080/z/GAs5QJ

But I think we choose to fold to "0" on that to avoid breaking too much existing source code and/or expose possible semantic incompatibility with source code?

In the original vector code here, we're not performing the binop on the *same* value because 'undef' is an arbitrary bit pattern in each instruction:
http://volta.cs.utah.edu:8080/z/eqrVkm

I was assuming that we don't have the same concern about vector code as the scalar code here, but we could use ConstantFolding to generate the result for each element of the output vector.

But that will definitely need to be handled when we generalize this to deal with non-undef constants with something like the code from D50992.

spatel updated this revision to Diff 262367.Wed, May 6, 6:54 AM
spatel marked 4 inline comments as done.

Patch updated - no logic changes but:

  1. Added code comment to make cost metric explicit - scalarize unless original vector code is cheaper.
  2. Added/retained value names to improve debugging experience (tests regenerated to show diffs).
spatel added a comment.Wed, May 6, 8:30 AM

undef ^ undef is actually undef:
http://volta.cs.utah.edu:8080/z/GAs5QJ

But I think we choose to fold to "0" on that to avoid breaking too much existing source code and/or expose possible semantic incompatibility with source code?

https://groups.google.com/forum/#!topic/llvm-dev/C5ydxnn-r0o

aqjune added a comment.Wed, May 6, 4:32 PM

undef ^ undef is actually undef:
http://volta.cs.utah.edu:8080/z/GAs5QJ

But I think we choose to fold to "0" on that to avoid breaking too much existing source code and/or expose possible semantic incompatibility with source code?

https://groups.google.com/forum/#!topic/llvm-dev/C5ydxnn-r0o

This seems interesting.

I modified llvm/lib/IR/ConstantFold.cpp to return undef on xor undef undef, and it did not miscompile testsuite & SPEC on my machine, at least.
I can make a patch for this.

I was assuming that we don't have the same concern about vector code as the scalar code here, but we could use ConstantFolding to generate the result for each element of the output vector.

Inserting into a binop(undef, undef) constant fold makes sense as a safer option.

spatel added a comment.Thu, May 7, 7:26 AM

I was assuming that we don't have the same concern about vector code as the scalar code here, but we could use ConstantFolding to generate the result for each element of the output vector.

Inserting into a binop(undef, undef) constant fold makes sense as a safer option.

Ok - we can generalize the base vector constant matching then. I'll add some tests and update the code.

spatel updated this revision to Diff 262652.Thu, May 7, 7:45 AM
spatel retitled this revision from [VectorCombine] scalarize binop of inserted elements into undef to [VectorCombine] scalarize binop of inserted elements into vector constants.

Patch updated:

  1. Loosen pattern matching to allow any vector constant rather than just undef.
  2. Use constant folding to generate the new base vector for insertion.
  3. Add tests/comments.
  4. Add TODO code comments for potential enhancements.
spatel requested review of this revision.Thu, May 7, 7:57 AM

(requesting re-review because the logic changed slightly)

RKSimon added inline comments.Thu, May 7, 8:40 AM
llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102

Can we ever have a base vector with undef elements here? sdiv/udiv won't like that but I'm not sure if we can ever get in that state.

%i0 = insertelement <2 x i64> <i64 42, i64 undef>, i64 %x, i64 1
%i1 = insertelement <2 x i64> <i64 -7, i64 undef>, i64 %y, i32 1
spatel marked an inline comment as done.Thu, May 7, 10:11 AM
spatel added inline comments.
llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102

Yes, it's ok to have undefs in the lane that we're inserting into, and ConstantFolding will deal with that. Demanded elements will eventually replace the unused constants that we see in this test with undefs.

We can't, however, have undefs in lanes we are not inserting into with div/rem - that would be immediate UB, so we would have simplified that before we reach here. But I can add tests with non-canonical IR to make sure there's nothing crazy happening here.

RKSimon added inline comments.Thu, May 7, 10:37 AM
llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102

Thanks, I think I was thinking that an undef value had the same effect as divide by zero in sdiv/udiv (causing the entire vector to become undef).

spatel marked 3 inline comments as done.Thu, May 7, 12:53 PM
spatel added inline comments.
llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102

It does, and I was wrong about this getting folded sooner. I thought that analysis existed, but we don't have an InstSimplify that tries to recursively find undef/zero divisor elements and zap the entire value. We just do a simple check on vector constants. So the transform here could mask that potential simplification (although that seems like a rare possibility). I'll update with some test examples.

spatel updated this revision to Diff 262739.Thu, May 7, 1:19 PM
spatel marked an inline comment as done.

Patch updated:
No code changes, but added 2 more tests with undef and udiv/urem to better demonstrate constant folding.

RKSimon accepted this revision.Thu, May 7, 2:17 PM

LGTM - cheers!

This revision is now accepted and ready to land.Thu, May 7, 2:17 PM
This revision was automatically updated to reflect the committed changes.