This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Support variable-index float/double vector insertion on SSE41+ targets (PR47924)
Closed, Public

Authored by RKSimon on Feb 2 2021, 6:12 AM.

Details

Summary

Extends D95779 to permit insertion into float/double vectors while avoiding a lot of aliased memory traffic.

The scalar value is already on the SIMD unit, so we only need to transfer the index value to it, splat it, and then perform the select.
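As a rough illustration, the portable C sketch below models a 4 x float vector as a plain array and compares two lowerings of a variable-index insert. `insert_via_memory` is the aliased memory round-trip that this patch avoids: vector store, scalar store into the spill slot, vector reload. `insert_via_select` splats the index, compares it against a constant vector of lane indices to build an all-ones/all-zeros mask, and then selects lane by lane using the mask sign bit, as BLENDVPS does. The compare-against-lane-indices mask construction and all names here (`v4f`, `insert_via_memory`, `insert_via_select`) are illustrative assumptions, not the code LLVM actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Hypothetical model of a 4 x float SIMD register. */
typedef struct { float v[LANES]; } v4f;

/* Memory lowering: the scalar store aliases the surrounding vector
   store and the vector reload that follows it. */
static v4f insert_via_memory(v4f vec, float x, unsigned idx) {
    assert(idx < LANES);               /* out-of-range index is not modelled */
    float slot[LANES];
    memcpy(slot, vec.v, sizeof slot);  /* vector store to a stack slot */
    slot[idx] = x;                     /* scalar store into the same slot */
    v4f out;
    memcpy(out.v, slot, sizeof slot);  /* vector reload */
    return out;
}

/* Select lowering: no stack traffic, only lane-wise compare + blend. */
static v4f insert_via_select(v4f vec, float x, unsigned idx) {
    assert(idx < LANES);
    const uint32_t lane_ids[LANES] = {0, 1, 2, 3}; /* constant vector */
    uint32_t mask[LANES];
    float splat_x[LANES];
    v4f out;
    for (int i = 0; i < LANES; ++i) {
        /* x already lives in a vector register, so broadcasting it is an
           in-register shuffle; only idx has to be moved over and splatted. */
        splat_x[i] = x;
        /* PCMPEQD-style compare of splat(idx) against the lane ids. */
        mask[i] = (idx == lane_ids[i]) ? 0xFFFFFFFFu : 0u;
    }
    for (int i = 0; i < LANES; ++i)
        /* BLENDVPS-style select: the mask lane's sign bit picks the source. */
        out.v[i] = (mask[i] >> 31) ? splat_x[i] : vec.v[i];
    return out;
}
```

For every in-range index both functions produce the same vector. The select form trades the store/reload round-trip for one GPR-to-vector transfer plus a few in-register operations.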

SSE4 codegen is a little bulky due to the tied-register requirements of (non-VEX) BLENDVPS/PD: the destination is also the first source, and the mask is implicitly XMM0. The extra moves are cheap, though, so they shouldn't be a problem in practice.
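To show where the extra moves come from, here is a small C model of the legacy-SSE form `BLENDVPD xmm1, xmm2, <XMM0>`. Its first operand is both source and destination, and its mask is implicitly XMM0: a lane of `xmm1` is replaced by the `xmm2` lane when bit 63 of the corresponding XMM0 lane is set. Because the instruction overwrites its first operand, keeping the original vector live forces a copy first, which corresponds to the MOVAPD-style moves mentioned above. The function names `blendvpd_model` and `insert_f64` are hypothetical. This is a sketch of the instruction's semantics, not of LLVM's actual register allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of non-VEX BLENDVPD: xmm1 is tied (read and written), and the
   mask operand is implicitly XMM0. Bit 63 of each mask lane selects. */
static void blendvpd_model(double xmm1[2], const double xmm2[2],
                           const uint64_t xmm0[2]) {
    for (int i = 0; i < 2; ++i)
        if (xmm0[i] >> 63)
            xmm1[i] = xmm2[i];
}

/* Insert x into lane idx of vec, writing the result to out and leaving
   vec untouched (i.e. vec is still "live" after the insert). */
static void insert_f64(const double vec[2], double x, unsigned idx,
                       double out[2]) {
    assert(idx < 2);
    uint64_t mask[2];              /* would have to be placed in XMM0 */
    double splat_x[2] = {x, x};    /* in-register broadcast of the scalar */
    for (unsigned i = 0; i < 2; ++i)
        mask[i] = (i == idx) ? UINT64_MAX : 0; /* splat(idx) == {0, 1} */
    /* The tied destination would clobber vec, so copy it first: this is
       the cheap extra register move in the SSE4 (non-VEX) sequence. */
    memcpy(out, vec, 2 * sizeof(double));
    blendvpd_model(out, splat_x, mask);
}
```

With the VEX form (VBLENDVPD) the destination is a separate register and the mask can be any register, so neither the copy nor the XMM0 constraint applies.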


Event Timeline

RKSimon created this revision. Feb 2 2021, 6:12 AM
RKSimon requested review of this revision. Feb 2 2021, 6:12 AM
Herald added a project: Restricted Project. Feb 2 2021, 6:12 AM
pengfei accepted this revision. Feb 2 2021, 5:14 PM

LGTM.

This revision is now accepted and ready to land. Feb 2 2021, 5:14 PM

One more question: if the simd registers are in high pressure, can we get the benefit as expected?


I'm going to say "probably" :) The big benefit is that we avoid a scalar write aliasing a vector write, which can stall memory optimizations such as store-to-load forwarding (STLF). Instead, we're likely to end up with a single vector spill/reload, which should be a lot less painful. However, the (V)BLENDVPS/PD op isn't commutable, so if we end up spilling the wrong vector (or, in the SSE case, the tied XMM0 register causes problems), we could see additional stack traffic. I don't think that is a showstopper, though, and it's something I think we could address in regalloc if it does happen.

This revision was landed with ongoing or failed builds. Feb 3 2021, 6:14 AM
This revision was automatically updated to reflect the committed changes.