This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CostModel] Add instruction cost for operations on scalable vectors
ClosedPublic

Authored by nasherm on Mar 19 2021, 2:37 AM.

Details

Summary

The following operations have no associated cost for them
when applied to scalable vectors, and as a consequence
can trigger a crash when a call is made to
AArch64TTIImpl::getCastInstrCost():

  • fptrunc
  • trunc
  • fpext
  • fpto(u,s)i

This patch adds costs for these operations and
relevant regression tests.

Diff Detail

Event Timeline

nasherm created this revision.Mar 19 2021, 2:37 AM
nasherm requested review of this revision.Mar 19 2021, 2:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2021, 2:37 AM
sdesmalen added inline comments.Mar 19 2021, 2:44 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
461

Please avoid changing the formatting of the table, because that makes this diff unnecessarily big.

llvm/test/CodeGen/AArch64/fptrunc-cost.ll
1

The test is in the wrong directory. SVE Cost-model tests should be added in llvm/test/Analysis/CostModel/AArch64

17

I think you may be able to write the test very similar to what was done in D97758, where you can invoke the cost-model directly on the fptrunc operation.
Or is there a specific reason that invoking the loop-vectorizer is required here?

david-arm added inline comments.Mar 19 2021, 2:48 AM
llvm/test/CodeGen/AArch64/fptrunc-cost.ll
17

I think that opt -cost-model -analyze doesn't crash because it calculates the cost in a different way. The loop vectoriser decides to do things "differently" by passing the fptrunc op to the getCastInstrCost function. :(

nasherm added inline comments.Mar 19 2021, 4:06 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
461

I believe this was a clang-format change.

llvm/test/CodeGen/AArch64/fptrunc-cost.ll
1

Ah my mistake, will fix!

david-arm added inline comments.Mar 19 2021, 4:16 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
461

Hi @nasherm, yeah unfortunately it does happen on my patches too. Typically we try to avoid reformatting large tables like this or large switch-case statements that have a pre-existing format.

nasherm updated this revision to Diff 331896.Mar 19 2021, 8:48 AM
nasherm marked 2 inline comments as done.

Refactored regression test. Removed clang-format changes

nasherm updated this revision to Diff 331897.Mar 19 2021, 8:49 AM

Remove lint

Harbormaster completed remote builds in B94723: Diff 331897.
david-arm added inline comments.Mar 22 2021, 2:28 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
376

nit: I think this should be:

// From nxvmf32 to nxvmf64
379

Are we missing other types/combinations here too? For example:

{ ISD::FP_ROUND, MVT::nxv2f16, MVT::nxv2f64, 1 },

and

{ ISD::FP_ROUND, MVT::nxv4f16, MVT::nxv4f32, 1 },
nasherm updated this revision to Diff 332302.Mar 22 2021, 8:48 AM
nasherm marked 2 inline comments as done.

Responded to David's comments to take into account half-precision
floats.

Hi @nasherm, thanks for addressing the previous review comments - it looks much better now! I just have one more comment about the estimated costs in the table.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
379

I think any conversions that involve illegal types that are too large for a single register will be split up into multiple instructions. For example, nxv8f32 is twice the size of a normal SVE register, which means we actually need 2 instructions that convert nxv4f32 -> nxv4f16. Then, we need a final third instruction to interleave these two results together. I'd expect something a bit like:

{ ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f32, 3 }, (2 converts + interleave)
...
{ ISD::FP_ROUND, MVT::nxv4f16, MVT::nxv4f64, 3 }, (2 converts + interleave)
...
{ ISD::FP_ROUND, MVT::nxv8f16, MVT::nxv8f64, 7 }, (4 converts + 3 interleaves)
...
{ ISD::FP_ROUND, MVT::nxv4f32, MVT::nxv4f64, 3 }, (2 converts + interleave)
...
{ ISD::FP_ROUND, MVT::nxv8f32, MVT::nxv8f64, 6 }, (4 converts + 2 interleaves)

It's worth pointing out that these costs are just estimates - the most important thing is the costs should be higher to reflect the increased complexity of the operation.

nasherm updated this revision to Diff 332989.Mar 24 2021, 7:46 AM
nasherm marked an inline comment as done.

Combined several patches which all deal with the same problem: instruction
costs for scalable vectors.

nasherm retitled this revision from [SVE] Add instruction cost for fptrunc in loops to [SVE][CostModel] Add instruction cost for operations on scalable vectors.Mar 24 2021, 7:47 AM
nasherm edited the summary of this revision. (Show Details)
nasherm edited the summary of this revision. (Show Details)Mar 24 2021, 7:49 AM

Hi @nasherm, thanks for all the good fixes here! I think it looks very close now - just have a couple of minor comments. I'm fine with most of the costs as the most important thing for now is to avoid crashing and we can refine these later if necessary. However, I think the truncation costs probably do need fixing.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
375

I'm not sure that these instructions are 'free', i.e. a cost of 0. For SVE truncation usually involves a cmpXX instruction to generate a predicate, i.e.

cmpne p1.d, p0/g, z0.d, z1.d

I think we probably want at least a cost of 1 here.

408

I think for conversions from nxv8f32->nxv8i16 there are two instructions + interleaving required here - so maybe a cost of 3?

nasherm updated this revision to Diff 333543.Mar 26 2021, 6:08 AM
nasherm marked 2 inline comments as done.
nasherm edited the summary of this revision. (Show Details)

Response to David's cost comments, minor changes.

llvm/test/Analysis/CostModel/AArch64/sve-fpext.ll
10 ↗(On Diff #332989)

I believe you can remove the space before ;
and I think you can use CHECK-NEXT

20 ↗(On Diff #332989)

Also add space between the last CHECK and the first instruction.

nasherm updated this revision to Diff 333553.Mar 26 2021, 7:26 AM
nasherm marked 2 inline comments as done.

Implemented Carol's suggestions.

david-arm accepted this revision.Mar 26 2021, 9:24 AM

LGTM with nit addressed!

llvm/test/Analysis/CostModel/AArch64/sve-fpext.ll
10 ↗(On Diff #332989)

nit: Can you address @CarolineConcatto's comment before merging? For example, I think all these tests just a need a simple sed replace:

sed -i 's/  ;CHECK/; CHECK/g'
This revision is now accepted and ready to land.Mar 26 2021, 9:24 AM
This revision was landed with ongoing or failed builds.Mar 29 2021, 3:16 AM
This revision was automatically updated to reflect the committed changes.