tryToFoldExtendSelectLoad can be called in DAG combine2 which has already finish legalization
So we should conservatively check the Operation Action for new VSELECT.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/X86/vselect-post-combine.ll | ||
---|---|---|
27 | You might be able to further simplify this test? |
llvm/test/CodeGen/X86/select-ext.ll | ||
---|---|---|
97 ↗ | (On Diff #507607) | It seems generate more code. Probably you need check "Level <= AfterLegalizeVectorOps" in tryToFoldExtendSelectLoad(), so that there is more combine opportunity. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
12246 | Check "Level <= AfterLegalizeVectorOps" instead? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
12246 | Maybe changed as below. if ((N0->getOpcode() == ISD::VSELECT && Level > AfterLegalizeVectorOps && TLI.getOperationAction(ISD::VSELECT, VT) != TargetLowering::Legal )) return SDValue(); |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
1043 | On the other hand, why can't use VPBLENDVB ymm1, ymm2, ymm3/m256 (https://www.felixcloutier.com/x86/pblendvb) to vselect v16i16? We can bitcast v16i16 to v32i8. |
llvm/test/CodeGen/X86/vselect-post-combine.ll | ||
---|---|---|
3 | This problem is not limited to Windows. Linux has the some problem if passing a ptr and load it to <32 x i8>. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
12246 | Yes, I also had a quick think about it, just need to pass Level here. | |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
1043 | Yes, but not only bitcast, it also need to bring new instructions to transfer value of ymm1. | |
llvm/test/CodeGen/X86/vselect-post-combine.ll | ||
3 | Yes, I tended to rm windows then I found it is not triggered, duo to the previous optimization difference. | |
5 | Sure | |
27 | Let me have a try, in fact I spend a lot time to simplify the raw test to current status : ) |
llvm/test/CodeGen/X86/vselect-post-combine.ll | ||
---|---|---|
3 | The difference first comes from ABI, so you can remove it by define ptr @test_mul(ptr %0) { %vecinit.i.i.i.i.i92 = load <32 x i8>, ptr %0 |
llvm/test/CodeGen/X86/vselect-post-combine.ll | ||
---|---|---|
3 | Sorry, there is a delay for me to see this comment, I have update it : ), the test options is mainly to trigger the combine, now we can trigger it. Can we go on now? |
llvm/test/CodeGen/X86/vselect-post-combine.ll | ||
---|---|---|
19 | Could you simplify %shuffle.i.i.i6.i.i as well? |
Check "Level <= AfterLegalizeVectorOps" instead?