This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix costs of float vector compare/selects pairs.
ClosedPublic

Authored by fhahn on Jan 26 2022, 8:41 AM.

Details

Summary

The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.

Diff Detail

Event Timeline

fhahn created this revision.Jan 26 2022, 8:41 AM
fhahn requested review of this revision.Jan 26 2022, 8:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 26 2022, 8:41 AM

This sounds OK. I went down a bit of a rabbit hole though.

  • An FCMP_UNE will be cheap too.
  • The other fcmp operators are not much more expensive, and it is the compare that takes 2 compares and an Or - you can argue that the select is still cheap.
  • With fast the others are cheap too.
  • A default cost of 13 for a vselect seems quite high to me. Given a mask (as vector compares generate on AArch64), the cost of a select will usually be a bif/bic/bsl. (In the not super distant past they would be and's and nots and ors, but things are generally better since then). If the compare isn't operating on elements of the same size as the select, the mask may need to be extended or truncated first though.
  • Does this account for times when the compare and select type sizes do not match? That becomes harder to get correct.

Most of that would be better left for other patches though (if it turns out to be useful at-all!)
I would guess that FCMP_UNE are worth adding here, at least.

fhahn updated this revision to Diff 403638.Jan 27 2022, 6:45 AM

This sounds OK. I went down a bit of a rabbit hole though.

Thanks for that! Unfortuantely there still are many other cases where we assign sub-optimal costs (usually too high).

  • An FCMP_UNE will be cheap too.

Great point, I included it in the pathc.

  • The other fcmp operators are not much more expensive, and it is the compare that takes 2 compares and an Or - you can argue that the select is still cheap.

Agreed! But I think that's best improved separately, as all cases included in the patch at the moment follow exactly the same reasoning as the integer cases (select will cost exactly one instruction).

  • With fast the others are cheap too.

Agreed, cmp + select might only be single min/max instruction. At the moment, I don't think there's a convenient way to check if the compare had the right fast-math flags here. I am also not sure if changing the cost from 2 to 1 would make a huge difference in practice.
j

  • A default cost of 13 for a vselect seems quite high to me. Given a mask (as vector compares generate on AArch64), the cost of a select will usually be a bif/bic/bsl. (In the not super distant past they would be and's and nots and ors, but things are generally better since then). If the compare isn't operating on elements of the same size as the select, the mask may need to be extended or truncated first though.

Yeah, the default cost seems too high, especially for vectors with more than 2 elements. But that's a separate issue I think.

  • Does this account for times when the compare and select type sizes do not match? That becomes harder to get correct.

Good point, and no I don't think so. We have the same issue for integer predicates I think. So maybe that could be fixed independently, possibly by looking at the context instruction?

Most of that would be better left for other patches though (if it turns out to be useful at-all!)
I would guess that FCMP_UNE are worth adding here, at least.

dmgreen accepted this revision.Jan 27 2022, 8:00 AM

Thanks. LGTM

This revision is now accepted and ready to land.Jan 27 2022, 8:00 AM
This revision was landed with ongoing or failed builds.Jan 31 2022, 2:18 AM
This revision was automatically updated to reflect the committed changes.