This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3)
ClosedPublic

Authored by RKSimon on Jul 4 2018, 6:40 AM.

Details

Summary

Now that rL336250 has (hopefully) landed, I'd like to prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instruction count, this is quicker (especially for slow v4i32 multiplies) and avoids loads and constant pool usage. It does, however, increase register pressure. The code size will go up a little, but by less than what we save in constant pool data.
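The identity behind the title can be checked with a small scalar model. It holds when every lane of the multiplier c3 is either 1<<c1 or 1<<c2. In that case MUL(v, c3) produces the same bits as shifting the whole vector by c1 and by c2 and blending per lane. The sketch below models 8 x i16 lanes as Python lists. The helper names (`shl`, `pblendw`, `mul`, `mul_as_two_shifts`) are hypothetical illustrations and are not LLVM functions.

```python
# Scalar model of 8 x i16 vector lanes; a sketch, not LLVM's lowering code.
MASK16 = 0xFFFF


def shl(v, amt):
    """Uniform (immediate) left shift of every lane, like PSLLW."""
    return [(x << amt) & MASK16 for x in v]


def pblendw(a, b, imm):
    """Lane i comes from b if bit i of imm is set, else from a (PBLENDW-style)."""
    return [b[i] if (imm >> i) & 1 else a[i] for i in range(len(a))]


def mul(a, c):
    """Lane-wise multiply keeping the low 16 bits, like PMULLW."""
    return [(x * y) & MASK16 for x, y in zip(a, c)]


def mul_as_two_shifts(v, c1, c2, imm):
    """BLEND(SHL(v, c1), SHL(v, c2)): lanes whose imm bit is set use shift c2."""
    return pblendw(shl(v, c1), shl(v, c2), imm)


v = [1, 2, 3, 0x8001, 100, 0xFFFF, 7, 0x1234]
imm = 0xAA  # odd lanes take the c2 = 3 shift, even lanes the c1 = 1 shift
# The equivalent per-lane multiplier constant: [2, 8, 2, 8, ...]
c3 = [1 << (3 if (imm >> i) & 1 else 1) for i in range(8)]
assert mul(v, c3) == mul_as_two_shifts(v, 1, 3, imm)
```

In this model, c3 is exactly the constant vector that the multiply would need to load from the constant pool. The two-shift form needs only the immediates c1 and c2 and the blend mask.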

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Jul 4 2018, 6:40 AM
RKSimon planned changes to this revision. Jul 4 2018, 6:52 AM

Dammit, just realised that pre-SSE41 targets might introduce AND/ANDN/OR blend masks, which will be even more costly. I'll see if there is a better way to do this.

RKSimon updated this revision to Diff 154197. Jul 5 2018, 3:55 AM

Make vXi16 "2shifts+select" more selective: only do it on pre-SSE41 targets if the shuffle can be widened, and only do it on SSE41+ targets if a single PBLENDW can be used.
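The two conditions in the update above can be sketched as checks on a per-lane select mask. The mask is a list of booleans, and True means that lane takes the second shift. The first check tests whether a vXi16 blend can be widened to 32-bit lanes, which requires each adjacent pair of i16 lanes to pick the same source. The second check tests whether a single PBLENDW immediate covers the mask. This assumes that, for 256-bit vectors, VPBLENDW applies the same 8-bit immediate to both 128-bit halves, so the mask must repeat every 8 lanes. Both function names are hypothetical and are not LLVM APIs.

```python
def can_widen_blend(mask):
    """True if each adjacent pair of i16 lanes selects the same source,
    so the blend can be expressed as a shuffle on i32 lanes."""
    return all(mask[i] == mask[i + 1] for i in range(0, len(mask), 2))


def single_pblendw_imm(mask):
    """Return the 8-bit PBLENDW immediate for the mask, or None if one
    immediate cannot cover it (assumes the immediate is reused for
    every 128-bit half of a wider vector)."""
    lo = mask[:8]
    for start in range(8, len(mask), 8):
        if mask[start:start + 8] != lo:
            return None
    return sum(1 << i for i, sel in enumerate(lo) if sel)


pairs = [False, False, True, True, False, False, True, True]
alternating = [False, True] * 4
print(can_widen_blend(pairs), can_widen_blend(alternating))
print(single_pblendw_imm(alternating), single_pblendw_imm(pairs * 2))
```

Under these assumptions, an alternating i16 mask still maps to one PBLENDW (immediate 0xAA). However, it cannot be widened, so on pre-SSE41 targets it would fall back to a bitmask blend.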

lebedev.ri added inline comments. Jul 8 2018, 9:46 AM
test/CodeGen/X86/lower-vec-shift.ll
211–232 (On Diff #154197)

The subject only talks about mul, but this is a div.
Is this intended to be changed by this patch?
If so, there is no lshr test as far as I can tell.

RKSimon added inline comments. Jul 8 2018, 10:20 AM
test/CodeGen/X86/lower-vec-shift.ll
211–232 (On Diff #154197)

This is a side effect of only accepting v8i16 2shifts+blend on pre-SSE41 targets (no PBLENDW) if the shuffle can be widened to v4i32. Without PBLENDW we have to perform a bitmask blend with OR(ANDN,AND), but for other shifts we'd end up doing that anyway. I suppose I could limit this to SHL cases only?
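The pre-SSE41 fallback mentioned above can be modelled as a bitmask blend. An all-ones/all-zeros lane mask m selects the second input with AND and clears those lanes of the first input with ANDN, and OR merges the two results. This costs three logic ops plus the mask itself, whereas PBLENDW needs a single instruction. The sketch follows PANDN semantics, where the first operand is the one that gets inverted. The function name is a hypothetical illustration.

```python
MASK16 = 0xFFFF


def blend_by_bitmask(a, b, sel):
    """Pre-SSE4.1 style blend on 16-bit lanes: OR(ANDN(m, a), AND(m, b)).
    m is all-ones in lanes that take b and all-zeros in lanes that take a."""
    m = [MASK16 if s else 0 for s in sel]
    andn = [~mi & ai & MASK16 for mi, ai in zip(m, a)]  # PANDN: (~m) & a
    and_ = [mi & bi for mi, bi in zip(m, b)]            # PAND:  m & b
    return [x | y for x, y in zip(andn, and_)]          # POR


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(blend_by_bitmask(a, b, [False, True] * 4))
```

The result matches a PBLENDW with immediate 0xAA on the same inputs. Only the instruction count and the need for a mask constant differ.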

RKSimon updated this revision to Diff 154525. Jul 8 2018, 1:28 PM

Still perform 2shifts+blend for non-SHL shifts without requiring PBLENDW / v4i32 widening

This revision is now accepted and ready to land. Jul 9 2018, 12:00 PM
This revision was automatically updated to reflect the committed changes.