This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets.
ClosedPublic

Authored by RKSimon on May 30 2018, 11:14 AM.

Details

Summary

Similar to v4i32 SHL, convert v8i16 shift amounts to multiplication scale factors to improve performance and reduce instruction count. We were already doing this for constant shifts; this patch adds variable shift support.

Reduces the serial nature of the codegen, which currently relies on a chain of pblendvb/pand+pandn+por shifts.
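
For illustration, a minimal scalar sketch of the identity behind the transform: a left shift by `amt` equals a multiply by `1 << amt`, keeping only the low 16 bits of each lane (which is what a lane-wise low multiply such as pmullw produces). The names `V8i16`, `shlPerLane` and `shlViaScale` are hypothetical and not taken from the patch. Out-of-range shift amounts are mapped to zero here purely so the two functions agree; that is an assumption of this sketch, not a statement about the lowering.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a v8i16 vector: eight 16-bit lanes.
using V8i16 = std::array<uint16_t, 8>;

// Reference form: shift every lane by its own amount, one at a time.
V8i16 shlPerLane(const V8i16 &x, const V8i16 &amt) {
  V8i16 r{};
  for (int i = 0; i < 8; ++i) {
    // Assumption of this sketch: amounts >= 16 yield 0.
    r[i] = amt[i] < 16
               ? static_cast<uint16_t>(static_cast<uint32_t>(x[i]) << amt[i])
               : 0;
  }
  return r;
}

// Scale-factor form: turn each amount into 1 << amt, then do a single
// lane-wise multiply that keeps the low 16 bits of each product.
V8i16 shlViaScale(const V8i16 &x, const V8i16 &amt) {
  V8i16 r{};
  for (int i = 0; i < 8; ++i) {
    uint32_t scale = amt[i] < 16 ? (1u << amt[i]) : 0u;
    r[i] = static_cast<uint16_t>(static_cast<uint32_t>(x[i]) * scale);
  }
  return r;
}
```

In the multiply form the lanes do not depend on one another, which is the property the summary refers to when it contrasts this with a serial chain of blend steps.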

This is a step towards adding support for vXi16 vector rotates.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. May 30 2018, 11:14 AM
spatel accepted this revision. Jun 5 2018, 7:19 AM

LGTM - see inline for a couple of potential code comment clarifications.

lib/Target/X86/X86ISelLowering.cpp
23241 ↗(On Diff #149163)

A comment around here or maybe as a function comment to describe the transforms would be good.

// If the target doesn't support variable shifts, use either FP conversion 
// or integer multiplication to avoid shifting each element individually.

?
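
As a rough scalar sketch of the "FP conversion" option mentioned in the suggested comment: one well-known way to get `1 << amt` for a 32-bit lane without a variable shift is to place `amt` in the exponent field of an IEEE-754 single and convert the result back to an integer. The function names below are hypothetical, and the sketch assumes `float` is IEEE-754 binary32 and `amt` is in the range 0..31; it is not copied from X86ISelLowering.cpp.

```cpp
#include <cstdint>
#include <cstring>

// Build 2^amt as a float by writing the biased exponent (127 + amt)
// directly into the bit pattern, then convert to an integer scale.
// Assumes IEEE-754 binary32 and 0 <= amt <= 31.
uint32_t scaleFromFloatBits(uint32_t amt) {
  uint32_t bits = (amt << 23) + 0x3f800000u; // 0x3f800000 is 1.0f
  float f;
  std::memcpy(&f, &bits, sizeof f); // bit cast without aliasing issues
  return static_cast<uint32_t>(f);  // exact: 2^amt fits in uint32_t
}

// Left shift expressed as a multiply by the scale factor above;
// unsigned multiplication wraps modulo 2^32, matching a 32-bit shift.
uint32_t shlViaFloatScale(uint32_t x, uint32_t amt) {
  return x * scaleFromFloatBits(amt);
}
```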

23250 ↗(On Diff #149163)

Add a note about why AVX2 doesn't want this?

This revision is now accepted and ready to land. Jun 5 2018, 7:19 AM
This revision was automatically updated to reflect the committed changes.