This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
ClosedPublic

Authored by RKSimon on Mar 20 2016, 4:29 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 51137.Mar 20 2016, 4:29 PM
RKSimon retitled this revision from to [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, congh, andreadb, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
qcolombet added inline comments.Mar 21 2016, 2:00 PM
lib/Target/X86/X86ISelLowering.cpp
19053 ↗(On Diff #51137)

Right now, Subtarget.hasInt256() should be true for 256 bit vector, since we add the custom lowering for those only when this predicate is true.
I.e., this is dead code at the moment, right?

19078 ↗(On Diff #51137)

This returns the low value, right?
I believe we miss a shift before the truncate.

19094 ↗(On Diff #51137)

A few comments on how we achieve the extraction wouldn’t hurt :).

E.g., Fill the 8 high bits of a v8i16 vector with the low input vector.
Shift that vector right by 8, to put the 8 relevant bits at the right place and fill the 8 high bits with 0 or sign extension.

RKSimon updated this revision to Diff 51350.Mar 22 2016, 3:11 PM

Revised based on Quentin's feedback.

Enabled v32i8 and v16i16 MULHS/MULHU custom lowering on AVX1 targets to prevent scalarization - I can commit the v16i16 support separately if necessary.

Improved the AVX 32i8 lowerings to make use ymm PACKUS (saves 2cy on my Carrizo tests) and added a missing shift to the v16i8 version (I had stupidly only locally tested 32i8 on AVX2 hardware).

Tried to improve comments on what is going on.

qcolombet accepted this revision.Mar 24 2016, 2:08 PM
qcolombet edited edge metadata.

Hi Simon,

LGTM. Couple of nits inlined.

Cheers,
-Quentin

lib/Target/X86/X86ISelLowering.cpp
19094 ↗(On Diff #51350)

Add that unlike the smaller PACKUS, the ymm variant interleaves the 128 bits of the both sources.
Without that in mind, the shuffle does not make sense, whereas it is definitely required :).

19117 ↗(On Diff #51350)

Period.

19133 ↗(On Diff #51350)

Period.

This revision is now accepted and ready to land.Mar 24 2016, 2:08 PM
This revision was automatically updated to reflect the committed changes.