This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Generalize X86ISD::BLENDI support to more value types with fix for revert from r354713
ClosedPublic

Authored by craig.topper on Feb 23 2019, 1:58 AM.

Details

Summary

This is a modified version of D57888 which was reverted.

It now uses PBLENDW for 128-bit integer blends using SDNodeXForms to rewrite the immediate. This keeps the instruction in the integer domain and avoids the possiblity of it being commuted and becoming movss/movsd when optsize is enabled. See more information in D57888.

I plan to reduce the test case that came with the revert and include it, but its late here and I wanted to get this patch up so Simon could look at it. I'll also make a separate patch to fix the underlying issue in the two address instruction pass.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Feb 23 2019, 1:58 AM

Update comment to indicate use pblendw or an fp blend.

FWICT from the discussion on D57888, doesn't that mean the problem is really in the commutation code and we should be fixing that?

I will fix the commuting code too.

But doing only that means we’ll turn v4i32 blends into movss/movsd sometimes on pre-avx2 targets. For v2i64 it can happen on avx2 targets as well.

I will fix the commuting code too.

But doing only that means we’ll turn v4i32 blends into movss/movsd sometimes on pre-avx2 targets. For v2i64 it can happen on avx2 targets as well.

Good point - I imagine even for OptSize we typically want to avoid the domain swap.

I've put the original patch back in with the fix to the TwoAddressInstructionPass and a reduced test case. I'll rebase this to just the change to use PBLENDW instead.

Rebase. Now this just changes from blendps/blendpd to pblendw. The original patch was recommitted

Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2019, 7:22 PM
RKSimon accepted this revision.Feb 24 2019, 4:30 AM

LGTM

This revision is now accepted and ready to land.Feb 24 2019, 4:30 AM

As a follow up, its worth investigating how to get the domain pass to prefer pblendd for integers on avx2+ targets - afaict intel targets can dispatch it to more pipes than pblendw, and I don't think bdver4/znver1 cares as much.

This revision was automatically updated to reflect the committed changes.