This is an archive of the discontinued LLVM Phabricator instance.

[x86] flatten packss+movmsk into 256-bit movmsk

Authored by RKSimon on Mar 27 2019, 5:04 PM.



I think we can end up with packss+movmsk sequences either because the code was written that way with intrinsics or because we have a likely over-enthusiastic DAG transform that is seeking to prevent something like this with AVX1:

@@ -377,8 +386,8 @@ define i64 @test_v4i64_legal_sext(<4 x i64> %a0, <4 x i64> %a1) {
 ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
 ; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
 ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm0
-; AVX1-NEXT:    vpackssdw %xmm2, %xmm0, %xmm0
-; AVX1-NEXT:    vmovmskps %xmm0, %eax
+; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
+; AVX1-NEXT:    vmovmskps %ymm0, %eax

That's also why *this* patch is limited for AVX1. I'm not sure yet what it would take to get that right in all cases.

There's potentially a better way to solve more of these patterns generally: always sink extends after shuffles, so we're shuffling bool vectors early in SDAG. That's almost certainly needed in IR to unlock some missed vector optimization, and we could repeat it here in the DAG (possibly with a hook), but I don't think it obviates the need for this patch.

Diff Detail

Event Timeline

spatel created this revision.Mar 27 2019, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2019, 5:04 PM

The movmsk being used on vXi16 is worrying me


Haven't check it thoroughly - but how come this isn't vmovmskpd ymm0? For both AVX1 and AVX2.


I don't think this is going to work - we end up with a movmsk of a 32i8 (i32 instead of a i16 zext).

spatel planned changes to this revision.Mar 28 2019, 6:01 AM
spatel marked an inline comment as done.
spatel added inline comments.

Argh...yes, this is wrong. I need to rethink this patch.

RKSimon commandeered this revision.Apr 12 2019, 7:06 AM
RKSimon edited reviewers, added: spatel; removed: RKSimon.

After chatting with @spatel I'm going to deal with this as a followup to D60610

RKSimon abandoned this revision.Jul 4 2020, 7:03 AM

Abandoning - we've now covered this with a mixture of combineSetCCMOVMSK and PromoteMaskArithmetic