Pulled this combine out of the PowerPC backend into the generic DAGCombiner, and added ABDS support as well (hence the additional v4i32 PPC matches).
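For readers less familiar with the nodes: ISD::ABDU and ISD::ABDS return the absolute difference |x - y| of two values interpreted as unsigned or signed. The sketch below models them on 32-bit scalars and checks them against a select(setcc(x, y), sub(x, y), sub(y, x)) form. Two things are assumptions: that this is the exact pattern the combine matches, and every function name, which is hypothetical plain C++ rather than SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of ISD::ABDU / ISD::ABDS on i32 (plain C++, not LLVM code).
// The exact |x - y| always fits in 32 unsigned bits.
uint32_t abdu32(uint32_t x, uint32_t y) { return x > y ? x - y : y - x; }

uint32_t abds32(int32_t x, int32_t y) {
  // Widen so the subtraction cannot overflow, i.e. trunc(abs(sext(x) - sext(y))).
  int64_t d = static_cast<int64_t>(x) - static_cast<int64_t>(y);
  return static_cast<uint32_t>(d < 0 ? -d : d);
}

// Assumed source pattern: select(setcc(x, y, gt), sub(x, y), sub(y, x)), with
// a signed or unsigned compare. The subs wrap, so they are done in uint32_t.
uint32_t selectOfSubs(int32_t x, int32_t y, bool isSigned) {
  uint32_t ux = static_cast<uint32_t>(x), uy = static_cast<uint32_t>(y);
  bool gt = isSigned ? x > y : ux > uy;
  return gt ? ux - uy : uy - ux;
}

// Spot-check the equivalence on boundary values.
void checkSelectOfSubs() {
  const int32_t vals[] = {INT32_MIN, -7, -1, 0, 1, 7, INT32_MAX};
  for (int32_t x : vals)
    for (int32_t y : vals) {
      assert(selectOfSubs(x, y, /*isSigned=*/true) == abds32(x, y));
      assert(selectOfSubs(x, y, /*isSigned=*/false) ==
             abdu32(static_cast<uint32_t>(x), static_cast<uint32_t>(y)));
    }
}
```

The signed and unsigned variants differ only in the predicate of the compare, which is why the same match can produce either ABDS or ABDU.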
Repository: rG LLVM Github Monorepo
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp, line 11807:
Maybe there should be one-use checks? Is abd more efficient if the sub operations are already needed?
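One way to frame the one-use question: under the assumed pattern select(setcc(x,y), sub(x,y), sub(y,x)), a node used only by the select disappears with the fold, while a node with other users survives it. The sketch below counts operations under that naive model, treating every node as one instruction and ABD as a single native instruction (not true on x86). ToyNode and the fold functions are hypothetical; in SelectionDAG the corresponding query would be SDValue::hasOneUse().

```cpp
#include <cassert>

// Hypothetical stand-in for a DAG node; only the number of users matters here.
struct ToyNode {
  unsigned numUses;
  bool hasOneUse() const { return numUses == 1; }
};

// Naive op count for select(cmp, subXY, subYX) before the fold:
// cmp, sub(x,y), sub(y,x), select.
unsigned opsBeforeFold() { return 4; }

// After the fold: the abd itself, plus every node that another user still needs.
// A node with exactly one use is only used by the select, so it dies with the fold.
unsigned opsAfterFold(const ToyNode &cmp, const ToyNode &subXY,
                      const ToyNode &subYX) {
  unsigned ops = 1; // the abd
  if (!cmp.hasOneUse())
    ++ops;
  if (!subXY.hasOneUse())
    ++ops;
  if (!subYX.hasOneUse())
    ++ops;
  return ops;
}

bool foldReducesOps(const ToyNode &cmp, const ToyNode &subXY,
                    const ToyNode &subYX) {
  return opsAfterFold(cmp, subXY, subYX) < opsBeforeFold();
}
```

Under this count the fold still wins when only the subs are reused, and merely breaks even once the compare is shared too; a real cost model would also have to weigh latency and ILP.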
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp, line 11807:
X86 is probably the worst case for this, as it usually folds to sub(max(x,y),min(x,y)); most other cases currently either have an ABD instruction or would gain from better ILP (see PPC's ABDS v4i32 variant). I'll add more extensive x86 test coverage and see what it looks like.
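The x86 fallback mentioned above computes the absolute difference as sub(max(x,y), min(x,y)). A scalar sketch of why that expansion is correct for both signednesses: the larger operand minus the smaller is the non-negative difference, and doing the subtraction modulo 2^32 keeps the signed case exact even at the extremes. The function names are hypothetical, and this is plain C++ rather than the X86 lowering code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scalar sketch of the max/min expansion:
// abds(x, y) = smax(x, y) - smin(x, y), abdu(x, y) = umax(x, y) - umin(x, y).
uint32_t abdsViaMaxMin(int32_t x, int32_t y) {
  // Subtract in uint32_t so INT32_MAX - INT32_MIN gives the exact 0xFFFFFFFF
  // instead of overflowing a signed int.
  return static_cast<uint32_t>(std::max(x, y)) -
         static_cast<uint32_t>(std::min(x, y));
}

uint32_t abduViaMaxMin(uint32_t x, uint32_t y) {
  return std::max(x, y) - std::min(x, y); // never wraps: max >= min
}

// Compare both forms against the exact difference computed in 64 bits.
void checkMaxMin() {
  const int32_t vals[] = {INT32_MIN, -7, 0, 7, INT32_MAX};
  for (int32_t x : vals)
    for (int32_t y : vals) {
      int64_t d = static_cast<int64_t>(x) - y;
      assert(abdsViaMaxMin(x, y) == static_cast<uint32_t>(d < 0 ? -d : d));
      uint32_t ux = static_cast<uint32_t>(x), uy = static_cast<uint32_t>(y);
      assert(abduViaMaxMin(ux, uy) == (ux > uy ? ux - uy : uy - ux));
    }
}
```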
llvm/test/CodeGen/X86/abds-vector-128.ll, lines 1106-1108:
So it seems the multi-use cmp is a slight regression. I think it could be fixed: isn't the sequence pcmpgtq %xmm2, %xmm1; pcmpeqd %xmm0, %xmm0; pxor %xmm1, %xmm0 just a duplicate of pcmpgtq %xmm1, %xmm0? Likewise for the AVX/AVX2 tests. Do you know what's going on there? But I guess at the end of the day this can be fixed in LowerABD by looking for existing DAG nodes, and it will probably be easier to do after this patch. Maybe add a TODO in the x86 LowerABD to fix up the missed optimizations. (Likewise, we always blendv on the cmp result, but it should be doable on the sub, and for AVX512 it should be vpternlogd instead of blendv.)
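A lane-level sketch of the question about the quoted three-instruction sequence, assuming %xmm0 and %xmm2 hold the same value a and %xmm1 holds b (an assumption; the test output alone does not show this). The sequence then builds the mask NOT(b > a), while pcmpgtq %xmm1, %xmm0 builds a > b. The masks are not identical, since they differ when a == b, but for the abd select that lane is 0 either way, so they are interchangeable there. All functions are hypothetical scalar models of one 64-bit lane, not SSE intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// PCMPGTQ lane model: all-ones when a > b (signed), zero otherwise.
uint64_t pcmpgtqLane(int64_t a, int64_t b) { return a > b ? ~0ULL : 0ULL; }

// BLENDV-style select on an all-ones/zero mask: t where the mask is set, else f.
uint64_t blendLane(uint64_t mask, uint64_t t, uint64_t f) { return mask ? t : f; }

// Absolute-difference lane chosen by a mask meaning "a is the larger operand".
// The subtractions wrap in uint64_t, like a 64-bit vector subtract.
uint64_t abdLane(uint64_t mask, int64_t a, int64_t b) {
  uint64_t ua = static_cast<uint64_t>(a), ub = static_cast<uint64_t>(b);
  return blendLane(mask, ua - ub, ub - ua);
}

// NOT(b > a), as built by pcmpgtq + pcmpeqd (all-ones) + pxor, versus (a > b).
bool masksAgreeForAbd(int64_t a, int64_t b) {
  uint64_t notBGtA = ~pcmpgtqLane(b, a);
  uint64_t aGtB = pcmpgtqLane(a, b);
  return abdLane(notBGtA, a, b) == abdLane(aGtB, a, b);
}
```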
llvm/test/CodeGen/X86/abds-vector-128.ll, line 1108:
Yes, IIRC this is an existing problem with other patterns that use min/max pairs: we don't do enough to share SETCC nodes.
llvm/test/CodeGen/X86/abds-vector-128.ll, line 1108:
We last looked at this for rG813459ed2b0b, but I wonder if we should really be handling this more generically with SETCC nodes before we get this far.
Ping: apart from the x86 multi-use cmp issues, does anyone have any more comments?
llvm/test/CodeGen/X86/abds-vector-128.ll, line 1108:
We're also being hit by freeze nodes, which make it tricky to match setcc(x,y) against setcc(freeze(y),freeze(x)). I think I'd prefer to add this to the list of existing issues we have with duplicate equivalent compares.
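A toy sketch of what sharing duplicate equivalent compares involves: give each compare a canonical key, ordering the operands and swapping the predicate to match, so that setcc(x, y, gt) and setcc(y, x, lt) map to the same entry. This is a hypothetical model, not LLVM's ISD::CondCode or SelectionDAG CSE. It deliberately ignores freeze: freeze(x) is not interchangeable with x in general, because freezing a poison or undef value pins it to some fixed value, which is part of what makes the real matching harder.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical toy predicate set, not LLVM's ISD::CondCode.
enum class Cond { GT, LT, GE, LE, EQ, NE };

// Swapping a compare's operands requires swapping the predicate:
// (x > y) == (y < x), (x >= y) == (y <= x); EQ and NE are symmetric.
Cond swapOperands(Cond cc) {
  switch (cc) {
  case Cond::GT: return Cond::LT;
  case Cond::LT: return Cond::GT;
  case Cond::GE: return Cond::LE;
  case Cond::LE: return Cond::GE;
  default: return cc;
  }
}

using CmpKey = std::tuple<std::string, std::string, Cond>;

// Canonical form: operands in lexicographic order, predicate adjusted to match.
CmpKey canonicalCompare(std::string lhs, std::string rhs, Cond cc) {
  if (rhs < lhs) {
    std::swap(lhs, rhs);
    cc = swapOperands(cc);
  }
  return std::make_tuple(lhs, rhs, cc);
}

// Returns the id of an existing equivalent compare, or registers a new one.
int getOrCreateCompare(std::map<CmpKey, int> &table, const std::string &lhs,
                       const std::string &rhs, Cond cc) {
  CmpKey key = canonicalCompare(lhs, rhs, cc);
  auto it = table.find(key);
  if (it != table.end())
    return it->second;
  int id = static_cast<int>(table.size());
  table.emplace(key, id);
  return id;
}
```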