The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above? Aren't we just extracting the single element after taking the min/max in the vector register?
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 25860 Build 25859: arc lint + arc unit
Event Timeline
test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll | ||
---|---|---|
1108 | This codegen looks like the final icmp+selects are being done with the scalars. |
test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll | ||
---|---|---|
1097 | Isn’t this shuffle and the vector cmp+sel after it the last part of a reduction for elements 0 and 1? |
LGTM
test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll | ||
---|---|---|
1097 | Yes, you're right - I mis-read the test IR. It has to do a <4 x i32> reduction, then a min/max with the remaining 4 scalar elements (2 in the previous block + 2 in this block). |
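
For readers following the thread, here is a rough C++ model of the shape described in the reply above: a <4 x i32> reduction done with shuffle + cmp + select steps, a single extract, then scalar min operations with the 4 remaining elements. This is only a sketch under assumptions: `Vec4`, `shuffle`, `vecMin`, `reduceMin4` and `minOfEight` are hypothetical names, the shuffle masks are illustrative (the real IR would typically leave unused lanes undef), and signed min stands in for whichever min/max the test actually uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a <4 x i32> value; not an LLVM type.
using Vec4 = std::array<int, 4>;

// Models shufflevector: lane i of the result takes lane mask[i] of v.
Vec4 shuffle(const Vec4 &v, const std::array<std::size_t, 4> &mask) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[mask[i]];
  return r;
}

// Models a lane-wise icmp slt followed by select (signed min).
Vec4 vecMin(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = a[i] < b[i] ? a[i] : b[i];
  return r;
}

// Log2-step horizontal reduction of 4 lanes, ending in one extract.
int reduceMin4(const Vec4 &v) {
  // Step 1: fold lanes 2 and 3 onto lanes 0 and 1.
  Vec4 s1 = vecMin(v, shuffle(v, {2, 3, 2, 3}));
  // Step 2: fold lane 1 onto lane 0. This is the last shuffle plus
  // vector cmp+sel, i.e. the reduction of elements 0 and 1.
  Vec4 s2 = vecMin(s1, shuffle(s1, {1, 1, 1, 1}));
  // Only lane 0 is extracted from the vector register.
  return s2[0];
}

// Vector reduction followed by scalar cmp+select with 4 leftover scalars.
int minOfEight(const Vec4 &vecPart, const std::array<int, 4> &scalars) {
  int m = reduceMin4(vecPart);
  for (int s : scalars)
    m = s < m ? s : m;
  return m;
}
```

In this model the final cmp+select operations on scalars belong to the leftover elements, not to the vector reduction itself, which is consistent with the observation about the codegen at line 1108.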