The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This sounds OK. I went down a bit of a rabbit hole though.
- An FCMP_UNE will be cheap too.
- The other fcmp operators are not much more expensive, and it is the compare that takes 2 compares and an Or - you can argue that the select is still cheap.
- With fast the others are cheap too.
- A default cost of 13 for a vselect seems quite high to me. Given a mask (as vector compares generate on AArch64), the cost of a select will usually be a bif/bic/bsl. (In the not super distant past they would be and's and nots and ors, but things are generally better since then). If the compare isn't operating on elements of the same size as the select, the mask may need to be extended or truncated first though.
- Does this account for times when the compare and select type sizes do not match? That becomes harder to get correct.
Most of that would be better left for other patches though (if it turns out to be useful at-all!)
I would guess that FCMP_UNE are worth adding here, at least.
Thanks for that! Unfortuantely there still are many other cases where we assign sub-optimal costs (usually too high).
- An FCMP_UNE will be cheap too.
Great point, I included it in the pathc.
- The other fcmp operators are not much more expensive, and it is the compare that takes 2 compares and an Or - you can argue that the select is still cheap.
Agreed! But I think that's best improved separately, as all cases included in the patch at the moment follow exactly the same reasoning as the integer cases (select will cost exactly one instruction).
- With fast the others are cheap too.
Agreed, cmp + select might only be single min/max instruction. At the moment, I don't think there's a convenient way to check if the compare had the right fast-math flags here. I am also not sure if changing the cost from 2 to 1 would make a huge difference in practice.
j
- A default cost of 13 for a vselect seems quite high to me. Given a mask (as vector compares generate on AArch64), the cost of a select will usually be a bif/bic/bsl. (In the not super distant past they would be and's and nots and ors, but things are generally better since then). If the compare isn't operating on elements of the same size as the select, the mask may need to be extended or truncated first though.
Yeah, the default cost seems too high, especially for vectors with more than 2 elements. But that's a separate issue I think.
- Does this account for times when the compare and select type sizes do not match? That becomes harder to get correct.
Good point, and no I don't think so. We have the same issue for integer predicates I think. So maybe that could be fixed independently, possibly by looking at the context instruction?
Most of that would be better left for other patches though (if it turns out to be useful at-all!)
I would guess that FCMP_UNE are worth adding here, at least.