This is a replacement for the patch in D13207.
It turns out that there are actually separate two bugs. One is the bug that D13207 originally handled.
The other is that the number of lanes it computes may not actually be 1 or 2. See the AVX case of the constant_pblendvb_avx2 test for an example. I'm not sure how exactly it should behave, so I've decided to pessimize it back to safety.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Michael
The patch looks good to me.
Unrelated to your patch (but still important in my opinion).
Currently, function BUILD_VECTORtoBlendMask is only called by function 'transformVSELECTtoBlendVECTOR_SHUFFLE'. The goal of that function is (if possible) to convert a VSELECT into a target independent vector shuffle node.
Function transformVSELECTtoBlendVECTOR_SHUFFLE performs the folling steps:
- At first it checks if the VSELECT condition is constant (i.e. a build_vector of ConstantSDNode (or Undef));
- It then delegates to function 'BUILD_VECTORtoBlendMask' the computation of a bitmask (an unsigned value) which represents some sort of 'compressed' vector shuffle mask;
- The bitmask is eventually decoded into a mask which is suitable for a target independent vector shuffle node.
The more I look at that code, the more I think that point 2. could have been entirely avoided.
What I am trying to say (correct me if I am wrong) is that we probably don't need to go through a two-step process where we firstly convert the select mask to a bitmask, and then we convert the bitmask to a vector shuffle mask...
Presumably this design made sense in the past (more than one year ago) since we didn't have the new shuffle lowering logic in place. Nowadays we should probably check if it would make more sense to just remove function BUILD_VECTORtoBlendMask and simplify the logic in transformVSELECTtoBlendVECTOR_SHUFFLE. This is just an idea and - as I said - unrelated to your patch.. I hope it makes sense.
-Andrea
Thanks, Andrea.
What you wrote makes sense to me, but I'm not familiar enough with this code (or its history) to have a strong opinion either way.
Filipe and Chandler probably understand the bigger picture better.
I have a question:
This looks like it's to generate BLENDI nodes, in order to match the Intel instructions for blends with immediates...
But there are no byte-shuffles with immediates, AFAICT.
What did I miss?
P.S: If I change that function to bail out on ElemType == MVT::i8, I see the masks reversed (but both lanes are equal)...
test/CodeGen/X86/vector-blend.ll | ||
---|---|---|
664 ↗ | (On Diff #36721) | Why are the "first lanes" changing here? Either it was completely wrong, and you're fixing it, or the replacement is wrong (the "second lane" was always wrong). |
The user of that function delegates the selection of shuffle blend nodes to the shuffle lowering logic.
That said, you are right and there are no byte shuffles with immediate mask. However, I don't think it is in general a bad idea to canonicalize byte shuffles to shuffle vector node since it may potentially enable more combining of shuffles.
test/CodeGen/X86/vector-blend.ll | ||
---|---|---|
664 ↗ | (On Diff #36721) | The operands to the vpblendvb have been commuted. So the mask has been changed accordingly. I don't think this is wrong. |