- User Since
- Jun 8 2016, 12:50 PM (89 w, 15 h)
Thu, Feb 15
Addressing Simon's comments
Wed, Feb 14
LGTM after fixing the signed/unsigned mismatches.
Sun, Feb 11
- matchBinaryPredicate -> matchUnaryPredicate
- Use Simon's uniform scalar/vector code suggestion for computing INEXACT
Tue, Feb 6
Sun, Feb 4
Rebase + ping
Following Simon's suggestions, dropping the TLI hook seems to improve all cases except for v2i64 on SSE/AVX1.
I think you are right. Probably all cases will profit except for v2i64. Will try to drop the TLI hook.
Tue, Jan 30
Thu, Jan 25
Wed, Jan 24
LGTM with a minor request:
Tue, Jan 23
Jan 23 2018
Thanks for the fix.
Jan 19 2018
Jan 17 2018
Add some basic encoding tests?
I would appreciate suggestions for alternative solutions.
Jan 16 2018
This seems like a real issue. With no version info in the module, how can AutoUpgrade tell whether a divide with no 'nof' attribute is of the old form or the new form? This is only a performance issue, not a correctness one, because AutoUpgrade can always pessimistically omit 'nof' if the version of the incoming module is unknown. Possible solutions:
Jan 14 2018
Jan 13 2018
Generalize to account for commutativity of add and mul
Jan 12 2018
Check both BUILD_VECTOR nodes together if one is composed of odd-indexed extracts and the other is composed of even-indexed extracts.
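A simplified model of that check, for illustration only: each BUILD_VECTOR is represented as a list of hypothetical `(source, index)` extract pairs from a single source vector (ignoring lane and multi-source details), and the pair is accepted only if one list holds the even-indexed extracts and the other the odd-indexed extracts of the same source. The function names are made up; this is not the actual SelectionDAG code.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each BUILD_VECTOR operand is an extract_vector_elt,
# represented as (source_vector_id, extract_index); None means "not an extract".
Extract = Optional[Tuple[str, int]]


def uses_stride2_extracts(ops: List[Extract], start: int) -> Optional[str]:
    """Return the common source if ops[i] extracts element start + 2*i of it."""
    source = None
    for i, op in enumerate(ops):
        if op is None:
            return None
        src, idx = op
        if idx != start + 2 * i:
            return None
        if source is None:
            source = src
        elif src != source:
            return None
    return source


def is_even_odd_pair(a: List[Extract], b: List[Extract]) -> bool:
    """Check both BUILD_VECTORs together: one must hold the even-indexed
    extracts and the other the odd-indexed extracts of the same source.
    Either order is accepted, which assumes the combined operation is commutative."""
    if len(a) != len(b) or not a:
        return False
    for evens, odds in ((a, b), (b, a)):
        src_even = uses_stride2_extracts(evens, 0)
        src_odd = uses_stride2_extracts(odds, 1)
        if src_even is not None and src_even == src_odd:
            return True
    return False
```

Checking the two nodes as a pair, rather than each one in isolation, is what guarantees that together they cover every element of the source exactly once.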
Jan 11 2018
Rebase after adding the missing zext cases
Add assertions for type sizes and fix typo in comment
Jan 10 2018
Rebase on top of D41925
There are some calls to getMaskedGather in DAGCombine.cpp that I do not see being addressed by this patch. I guess they are not covered by tests?
Fix issue identified by Simon: use original vector type for the insert_vector
Average lowering now fully uses the refactored type-splitting code.
- Following Simon's suggestion, refactored the code that splits the vector into legal types out into 'LowerBinTo' (the function name probably needs revision) and applied it to PMADDWD.
- Added a missing DAGCombine to let a truncate negate a sext through an EXTRACT_SUBVECTOR.
Jan 9 2018
Fixes for Craig's comments
Added test with source vector larger than indices vector
Sure, but looking at your example the return type should have the same number of elements as the indices vector, right?
Rebase + apply fixes for Simon's comments. Will commit this change right away to avoid conflicts.
Jan 8 2018
Jan 7 2018
There are some regressions that need to be addressed (or we decide to accept), but overall your approach seems right to me.
Still trying to get hold of a KNL expert who can answer whether KNL should be included. For now, can we conservatively assume not and exclude KNL from this patch, just so it can make progress? I want to follow up by updating the AVX2 tests with FastVariableShuffle configurations.
Dec 24 2017
Dec 23 2017
LGTM, but I would like a more seasoned InstCombine contributor to take a look before giving final approval.
Dec 21 2017
Dec 20 2017
Since all known processors with AVX512 will prefer this new feature turned on, can we make AVX512 imply Fast-var-shuffles?
Dec 18 2017
Here's the full list of tests that are affected by setting AllowVariableMask for Depth=2. I think that we should have the full list covered with the new configuration.
I would be happy to assist with the work involved.
On second thought, I think we need to update the tests with a -mattr=+fast-variable-shuffle configuration, right?
Dec 12 2017
Dec 11 2017
Dec 7 2017
@spatel, this patch is for lowerV8I32VectorShuffle(), which won't be called for AVX1-only targets. It would be nice if we could somehow get AVX covered as well, if profitable.
I did not observe any speedups with this patch, but FWIW IACA reports that (for Intel processors, of course) the throughput can be higher even if the load is not hoisted.
What triggered this patch was a case I discovered while working on the deprecation of llvm.x86.avx2.permd and llvm.x86.avx2.permps. After removing these intrinsics, that case ends up with: