As discussed on D41323.
I've avoided binding the FeatureFastVariableShuffle feature to FeatureAVX512 directly, and just made it part of combineX86ShuffleChain.
Paths
| Differential D41436
[X86][AVX512] Enable variable shuffle combining by default on AVX512 targets AbandonedPublic Authored by RKSimon on Dec 20 2017, 5:05 AM.
Details Summary As discussed on D41323. I've avoided binding the FeatureFastVariableShuffle feature to FeatureAVX512 directly, and just made it part of combineX86ShuffleChain.
Diff Detail
Event TimelineComment Actions @gadi.haber This change means that KNL will be more aggressive with shuffle combining as well - is that OK? Comment Actions
See my reply here, https://reviews.llvm.org/D41323#961494. I don't know of any reason why applying this for KNL would not be beneficial. @gadi.haber, do you have any means of getting an answer to this?
Comment Actions Still trying to get hold of a KNL expert that will answer whether KNL should be included. Can we for now conservatively assume no and exclude KNL from this patch just so this patch can make progress? I want to follow-up on updating the AVX2 tests with FastVariableShuffle configurations. Comment Actions
Isn't that what we have already? Skylake etc all have FeatureFastVariableShuffle enabled, the issue with this patch was whether we should enable it for the avx512 attribute and not just on a per-cpu basis. Comment Actions
Yes, you are right. Then nothing prevents me from updating the tests... Comment Actions
@hfinkel IIRC you've been using KNL, should we be trying to combine to a variable shuffle to combine 2 shuffles or more? Comment Actions Abandoning old patch - we've never formed a consistent picture of fast/slow variable shuffles for KNL vs SKX families
Revision Contents
Diff 127686 lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/avx512-shuffles/broadcast-vector-int.ll
test/CodeGen/X86/avx512-shuffles/partial_permute.ll
test/CodeGen/X86/avx512-trunc.ll
test/CodeGen/X86/broadcastm-lowering.ll
test/CodeGen/X86/shuffle-strided-with-offset-128.ll
test/CodeGen/X86/shuffle-strided-with-offset-256.ll
test/CodeGen/X86/shuffle-strided-with-offset-512.ll
test/CodeGen/X86/shuffle-vs-trunc-128.ll
test/CodeGen/X86/shuffle-vs-trunc-256.ll
test/CodeGen/X86/shuffle-vs-trunc-512.ll
test/CodeGen/X86/vector-half-conversions.ll
test/CodeGen/X86/vector-shuffle-128-v4.ll
test/CodeGen/X86/vector-shuffle-128-v8.ll
test/CodeGen/X86/vector-shuffle-256-v16.ll
test/CodeGen/X86/vector-shuffle-256-v32.ll
test/CodeGen/X86/vector-shuffle-256-v4.ll
test/CodeGen/X86/vector-shuffle-256-v8.ll
test/CodeGen/X86/vector-shuffle-512-v32.ll
test/CodeGen/X86/vector-zext.ll
|
Should we just do the OR in the implementation of hasFastVariableShuffle()? That way if we use this method in other places they'll have consistent behavior.