The comment says we need 3 extracts and a select at the end. But didn't we already account for the select in the vector cost above? Aren't we just extracting a single element after taking the min/max in the vector register?
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll | ||
---|---|---|
1108 | (On Diff #177400) | This codegen looks like the final icmp+selects are being done with the scalars. |
test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll | ||
---|---|---|
1097 | (On Diff #177400) | Isn't this shuffle, and the vector cmp+sel after it, the last part of a reduction for elements 0 and 1? |
LGTM
test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll | ||
---|---|---|
1097 | (On Diff #177400) | Yes, you're right; I misread the test IR. It has to do a <4 x i32> reduction, then a min/max with the remaining 4 scalar elements (2 in the previous block + 2 in this block). |
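For anyone following the thread, here is a rough Python model of the reduction shape described above. It is a sketch under stated assumptions: it uses signed max (smax) for concreteness, the helper names (`shuffle`, `cmp_select_smax`, `reduce_smax_v4i32`, `reduce_with_scalars`) are hypothetical, and it models lane semantics only, not the SLP cost model or the exact IR in horizontal-minmax.ll. It shows the two shuffle + cmp+sel steps of a <4 x i32> reduction (the second step being the "elements 0 and 1" part), the single extract of lane 0, and the scalar icmp+select chain over the 4 leftover elements.

```python
from typing import List, Optional, Sequence

Lane = Optional[int]  # None models an undef lane


def shuffle(vec: Sequence[Lane], mask: Sequence[Optional[int]]) -> List[Lane]:
    """Single-source shufflevector: lane i takes vec[mask[i]]; a None mask entry yields undef."""
    return [None if m is None else vec[m] for m in mask]


def cmp_select_smax(a: Sequence[Lane], b: Sequence[Lane]) -> List[Lane]:
    """Lane-wise icmp sgt + select (a vector smax); undef in either input stays undef."""
    return [None if x is None or y is None else (x if x > y else y) for x, y in zip(a, b)]


def reduce_smax_v4i32(vec: Sequence[int]) -> int:
    """Log2-style shuffle reduction of a 4-lane vector, then one extract of lane 0."""
    assert len(vec) == 4
    # Step 1: fold lanes 2,3 onto lanes 0,1.
    half = cmp_select_smax(vec, shuffle(vec, [2, 3, None, None]))
    # Step 2: fold lane 1 onto lane 0 (the final vector step covering elements 0 and 1).
    last = cmp_select_smax(half, shuffle(half, [1, None, None, None]))
    # Only a single extractelement of lane 0 is needed after the vector part.
    result = last[0]
    assert result is not None
    return result


def reduce_with_scalars(vec: Sequence[int], scalars: Sequence[int]) -> int:
    """Vector reduction first, then a scalar icmp+select per leftover element."""
    acc = reduce_smax_v4i32(vec)
    for s in scalars:
        acc = acc if acc > s else s  # scalar icmp sgt + select
    return acc


if __name__ == "__main__":
    print(reduce_with_scalars([3, -7, 12, 5], [1, 20, -4, 9]))
```

In this model the vector part costs two shuffles, two cmp+sel pairs and one extract regardless of the values, while each leftover scalar adds one icmp+select; whether the cost model charges the final select to the vector or scalar side is exactly the question raised at the top of the thread.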