Instead of the abstract cost of the scalar reduction ops, try to use the
cost of the actual reduction instructions, where possible. Also,
remove the cost estimation of the vectorized GEP pointers for reduced loads,
since it is already handled in the tree.
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll | ||
---|---|---|
73 | CMP+CMOV is quick even on ancient x86 - the smax.i32 throughput cost of 1 is realistic. The issue is the predicted smax.v16i32 reduction cost, which is currently 33 (based on the expansion of costs in getMinMaxReductionCost) but is realistically closer to 12 cycles (based on some quick llvm-mca tests). |
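
To make the arithmetic behind this comment concrete, here is a minimal sketch of the comparison. It only uses the numbers quoted above (scalar smax.i32 cost of 1, smax.v16i32 reduction cost of 33 modelled versus roughly 12 measured). The function names are hypothetical, and the model is an assumption: it ignores the rest of the vectorized tree (loads, extracts) and any cost threshold, so it is not the SLP vectorizer's actual computation.

```python
def scalar_reduction_cost(num_elements: int, scalar_op_cost: int) -> int:
    """Cost of reducing num_elements values with a chain of scalar ops."""
    # Reducing N values takes N - 1 binary operations.
    return (num_elements - 1) * scalar_op_cost


def reduction_cost_delta(num_elements: int, scalar_op_cost: int,
                         vector_reduction_cost: int) -> int:
    """Vector cost minus scalar cost; negative means the vector form looks cheaper."""
    return vector_reduction_cost - scalar_reduction_cost(num_elements, scalar_op_cost)


# Numbers quoted in the comment above: smax.i32 costs 1, and the
# smax.v16i32 reduction is modelled as 33 but measured closer to 12.
modelled_delta = reduction_cost_delta(16, 1, 33)
measured_delta = reduction_cost_delta(16, 1, 12)
print(modelled_delta)  # 18
print(measured_delta)  # -3
```

Under these assumptions, the modelled cost of 33 makes the vector reduction look 18 units worse than the 15 scalar ops, while a cost near 12 would make it look slightly better, which is why the overestimate can block vectorization in this test.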
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll | ||
---|---|---|
73 | Can you fix it? |
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll | ||
---|---|---|
73 | I'll try to fix some of the obvious issues to unstick this patch, but a more complete fix will take more time. |
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll | ||
---|---|---|
73 | Please can you rebase after rG63c3895327839ba5b57f5b99ec9e888abf976ac6 ? |
llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll | ||
---|---|---|
1126 | Makes sense: https://gcc.godbolt.org/z/fKbGnzEr8 |
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll | ||
---|---|---|
37 | This looks to be about right: https://gcc.godbolt.org/z/sq99696Y7 You can add additional SSE test levels if you want to be certain. |
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll | ||
---|---|---|
37 | You mean add some extra tests for smin/umin/umax/fmin/fmax? |
llvm/test/Transforms/SLPVectorizer/X86/horizontal-smax.ll | ||
---|---|---|
37 | No - extra SSE test levels - I've added them at rG162284b2e1a970a01144d1d8e7f8d4fd1e03c5bf |