Split off from D108253
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
12707 | At least a summary comment would be useful. | |
12708 | Instead of passing Input by reference - why not return it? It just makes it look messy imo. | |
12714 | Would it be better to assert(isBroadcastShuffleMask(InputMask)) ? The isNoopOrBroadcastShuffleMask checks below should ensure it no? | |
12715 | Why not just create a X86ISD::VBROADCAST node? This code is AVX only and we have isel patterns that handle AVX1 cases where load folding fails. |
@RKSimon thank you for taking a look!
Hopefully addressed review notes.
Note that i'm not quite sure that this is the right way.
This paves road for D108253 that fixes https://bugs.llvm.org/show_bug.cgi?id=50971
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
12708 | Note that we also modify InputMask - we turn it into an identity mask. | |
12714 | Sure, that can work now. | |
12715 | Oh, hmm. I have not considered that, and that changes the results somewhat... |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
12709 | All my commits are clang-formatted, so this did fit within 80-col limit. |
llvm/test/CodeGen/X86/oddshuffles.ll | ||
---|---|---|
2284 ↗ | (On Diff #368087) | I wrote a comment here, and phab just lost it :( This seems like demandedelts failure. Wild guess: perhaps in SimplifyMultipleUseDemandedBitsForTargetNode() after getTargetShuffleInputs(), |
llvm/test/CodeGen/X86/oddshuffles.ll | ||
---|---|---|
2284 ↗ | (On Diff #368087) | Actually, that won't work either. Optimized legalized selection DAG: %bb.0 'splat_v3i32:' SelectionDAG has 32 nodes: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t24: v8i32 = BUILD_VECTOR Constant:i32<0>, undef:i32, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0> t55: v8i32 = X86ISD::BLENDI t24, t58, TargetConstant:i8<2> t19: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t55 t69: v32i8 = bitcast t58 t76: i64 = X86ISD::Wrapper TargetConstantPool:i64<<32 x i8> <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 0, i8 1, i8 2, i8 3, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>> 0 t74: v32i8,ch = load<(load (s256) from constant-pool)> t0, t76, undef:i64 t71: v32i8 = X86ISD::PSHUFB t69, t74 t72: v8i32 = bitcast t71 t21: ch,glue = CopyToReg t19, Register:v8i32 $ymm1, t72, t19:1 t27: i64,ch = load<(load (s64) from %ir.ptr, align 1)> t0, t2, undef:i64 t30: v2i64 = scalar_to_vector t27 t31: v4i32 = bitcast t30 t28: i64 = add nuw t2, Constant:i64<8> t29: i32,ch = load<(load (s32) from %ir.ptr + 8, align 1)> t0, t28, undef:i64 t61: v4i32 = insert_vector_elt t31, t29, Constant:i64<2> t58: v8i32 = insert_subvector undef:v8i32, t61, Constant:i64<0> t22: ch = X86ISD::RET_FLAG t21, TargetConstant:i32<0>, Register:v8i32 $ymm0, Register:v8i32 $ymm1, t21:1 ===== Instruction selection begins: %bb.0 '' t29/t61 is what we want to drop, but even if we could recreate subreg widening, t58 has two uses. |
llvm/test/CodeGen/X86/oddshuffles.ll | ||
---|---|---|
2284 ↗ | (On Diff #368087) | What might work is calling SimplifyMultipleUseDemandedVectorElts on each operand at the end of the combineX86ShufflesRecursively recursion just before the calls into combineX86ShuffleChain? |
llvm/test/CodeGen/X86/oddshuffles.ll | ||
---|---|---|
2284 ↗ | (On Diff #368087) | Let's see... |
Aha! Thank you for the review.
This depends on D109065, so these two accepted patches in this patch series can't land just yet.
At least a summary comment would be useful.