This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements.
ClosedPublic

Authored by RKSimon on Mar 26 2016, 12:15 PM.

Details

Summary

If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors.

The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.

Its probably not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 51719.Mar 26 2016, 12:15 PM
RKSimon retitled this revision from to [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements..
RKSimon updated this object.
RKSimon added reviewers: chandlerc, delena, andreadb, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
chandlerc edited edge metadata.Mar 26 2016, 12:48 PM

Thanks for explaining why this is a good idea. I really don't want us doing more at this layer than is necessary to clean up after other legalization transforms, but your explanation clarifies that this is specifically the intent. Also, the test case updates certainly make it very clear that this situation comes up all the time.

Couple of high level points:

  1. Why just AND, OR, and XOR? Maybe also support shifts? (fine to leave a TODO for now)
  2. We need to leave documentation of the rationale in comments as well. I marked where it seems to fit below.
lib/Target/X86/X86ISelLowering.cpp
6628–6630 ↗(On Diff #51719)

This seems like a good place to add comments documenting why its important to do this at this level.

6660–6665 ↗(On Diff #51719)

I think using the lambda helper here is a bit overkill. It forces us to do a lot of work that we don't really need to do IMO.

I would get the first element opcode, check that it is one of AND, OR, or XOR, check that it is legal on the target.

Then do the lowering with that opcode.

Shouldn't need a lambda at all, and can use either a switch or 3 tests against ==. I actually think even the switch is likely overkill for just 3 opcodes.

RKSimon updated this revision to Diff 51737.Mar 27 2016, 8:41 AM
RKSimon edited edge metadata.

Updated based on Chandler's feedback.

Regarding adding support for shifts - I have looked at this (and for add/sub too), but the problem I was seeing was that typicallly one of the elements (#0 most often) didn't share the opcode (i.e. zero shift / offset). I'd be willing to do this for a single 'mis-match' but am worried that it will set a precedent for further relaxations to try and force a vectorization.

chandlerc accepted this revision.Mar 28 2016, 1:09 PM
chandlerc edited edge metadata.

Looks good with a minor adjustment below.

lib/Target/X86/X86ISelLowering.cpp
6654–6670 ↗(On Diff #51737)

I would 'break' from the 3 cases you accept, and add a default that bails out. That will allow you to early-exit and reduce indentation for the entire block here.

This revision is now accepted and ready to land.Mar 28 2016, 1:09 PM
This revision was automatically updated to reflect the committed changes.