This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix bad treatment of multi-lane blends in BUILD_VECTORtoBlendMask()
ClosedPublic

Authored by mkuper on Oct 7 2015, 2:47 AM.

Details

Summary

This is a replacement for the patch in D13207.
It turns out that there are actually separate two bugs. One is the bug that D13207 originally handled.
The other is that the number of lanes it computes may not actually be 1 or 2. See the AVX case of the constant_pblendvb_avx2 test for an example. I'm not sure how exactly it should behave, so I've decided to pessimize it back to safety.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 36721.Oct 7 2015, 2:47 AM
mkuper retitled this revision from to [X86] Fix bad treatment of multi-lane blends in BUILD_VECTORtoBlendMask().
mkuper updated this object.
mkuper added reviewers: filcab, chandlerc, aivchenk.
mkuper added subscribers: qcolombet, llvm-commits.
andreadb accepted this revision.Oct 7 2015, 6:33 AM
andreadb added a reviewer: andreadb.

Hi Michael

The patch looks good to me.

Unrelated to your patch (but still important in my opinion).

Currently, function BUILD_VECTORtoBlendMask is only called by function 'transformVSELECTtoBlendVECTOR_SHUFFLE'. The goal of that function is (if possible) to convert a VSELECT into a target independent vector shuffle node.

Function transformVSELECTtoBlendVECTOR_SHUFFLE performs the folling steps:

  1. At first it checks if the VSELECT condition is constant (i.e. a build_vector of ConstantSDNode (or Undef));
  2. It then delegates to function 'BUILD_VECTORtoBlendMask' the computation of a bitmask (an unsigned value) which represents some sort of 'compressed' vector shuffle mask;
  3. The bitmask is eventually decoded into a mask which is suitable for a target independent vector shuffle node.

The more I look at that code, the more I think that point 2. could have been entirely avoided.
What I am trying to say (correct me if I am wrong) is that we probably don't need to go through a two-step process where we firstly convert the select mask to a bitmask, and then we convert the bitmask to a vector shuffle mask...

Presumably this design made sense in the past (more than one year ago) since we didn't have the new shuffle lowering logic in place. Nowadays we should probably check if it would make more sense to just remove function BUILD_VECTORtoBlendMask and simplify the logic in transformVSELECTtoBlendVECTOR_SHUFFLE. This is just an idea and - as I said - unrelated to your patch.. I hope it makes sense.

-Andrea

This revision is now accepted and ready to land.Oct 7 2015, 6:33 AM
mkuper added a comment.Oct 7 2015, 6:44 AM

Thanks, Andrea.

What you wrote makes sense to me, but I'm not familiar enough with this code (or its history) to have a strong opinion either way.
Filipe and Chandler probably understand the bigger picture better.

filcab edited edge metadata.Oct 7 2015, 8:49 AM

I have a question:
This looks like it's to generate BLENDI nodes, in order to match the Intel instructions for blends with immediates...
But there are no byte-shuffles with immediates, AFAICT.

What did I miss?

P.S: If I change that function to bail out on ElemType == MVT::i8, I see the masks reversed (but both lanes are equal)...

test/CodeGen/X86/vector-blend.ll
664 ↗(On Diff #36721)

Why are the "first lanes" changing here? Either it was completely wrong, and you're fixing it, or the replacement is wrong (the "second lane" was always wrong).

I have a question:
This looks like it's to generate BLENDI nodes, in order to match the Intel instructions for blends with immediates...
But there are no byte-shuffles with immediates, AFAICT.

What did I miss?

The user of that function delegates the selection of shuffle blend nodes to the shuffle lowering logic.
That said, you are right and there are no byte shuffles with immediate mask. However, I don't think it is in general a bad idea to canonicalize byte shuffles to shuffle vector node since it may potentially enable more combining of shuffles.

test/CodeGen/X86/vector-blend.ll
664 ↗(On Diff #36721)

The operands to the vpblendvb have been commuted. So the mask has been changed accordingly. I don't think this is wrong.

This revision was automatically updated to reflect the committed changes.