This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks.
ClosedPublic

Authored by RKSimon on Jun 13 2020, 11:12 AM.

Details

Summary

Pulled out from the ongoing work on D66004, currently we don't do a good job of simplifying variable shuffle masks that have already lowered to constant pool entries.

This patch adds SimplifyDemandedVectorEltsForTargetShuffle (a custom x86 helper) to first try SimplifyDemandedVectorElts (which we already do) and then constant pool simplification to help mark undefined elements.

To prevent lowering/combines infinite loops, we only handle basic constant pool loads instead of creating new BUILD_VECTOR nodes for lowering - e.g. we don't try to convert them to broadcast/vzext_load - there might be some benefit to this but if so I'd rather we come up with some way to reuse existing code than reimplement a lot of BUILD_VECTOR code.

Diff Detail

Event Timeline

RKSimon created this revision.Jun 13 2020, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2020, 11:12 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

Any objections to this patch? This will almost unblock D66004 (and then hopefully D56387)

This revision is now accepted and ready to land.Jun 20 2020, 12:10 PM
This revision was automatically updated to reflect the committed changes.
craig.topper added inline comments.Jul 7 2020, 7:37 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
36727

I think this check isn't enough if the load is narrower than the constant pool vector. For example a v16i8 load with a v32i8 constant pool. So NumCstElts == NumElts * 2 and we'll proceed.

I think this is the cause of some failures we're seeing, but I don't have a reduced case yet.

pengfei added a subscriber: pengfei.Jul 7 2020, 9:58 PM
RKSimon marked an inline comment as done.Jul 8 2020, 12:15 AM
RKSimon added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
36727

Does checking that Mask.getValueSizeInBits() == C->getSizeInBits() as well help?

craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
36727

Yeah I ended up trying that right after I wrote my earlier message. That fixed the failing tests we had. I think we may have a reduced test case now. @pengfei or @yubing can you share it?

pengfei added inline comments.Jul 8 2020, 12:42 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
36727

Sure. There's a small one here https://godbolt.org/z/hsh5_K

yubing added inline comments.Jul 8 2020, 12:53 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
36736

Hi, Simon. I'm just wondering why we divide i by scale here. In my case:
When SimplifyDemandedVectorEltsForTargetShuffle visit t150, demandedElts is 0xff0f, scale is 2. so when i=8, DemandedElts[i / Scale] is false, but DemandedElts[i] is true. Thus the t146[8] will become undef while the previous value is -1.

t146: i64 = X86ISD::Wrapper TargetConstantPool:i64<<32 x i8> <i8 4, i8 5, i8 6, i8 7, i8 undef, i8 undef, i8 undef, i8 undef, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>> 0
t154: v16i8,ch = load<(load 16 from constant-pool, align 32)> t0, t146, undef:i64
t150: v16i8 = X86ISD::PSHUFB t156, t154

RKSimon marked an inline comment as done.Jul 8 2020, 2:46 AM
RKSimon added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
36736

Scale should only be used to handle vXi64 <-> v2Xi32 style issues on 32-bit targets - that we're hitting this on other types is a bug because we're not dealing with the fact that the Constant might be a different size to the mask

@yubing @pengfei @craig.topper Please can you confirm the regressions have now been addressed?

yubing added a comment.Jul 8 2020, 7:06 PM

@yubing @pengfei @craig.topper Please can you confirm the regressions have now been addressed?

Thanks, Simon~ Your patch can solve our bug.