This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Prefer SIMD&FP variant of clast[ab]
Closed · Public

Authored by c-rhodes on Jul 11 2022, 3:51 AM.

Details

Summary

The scalar variant with GPR source/dest has considerably higher latency
than the SIMD&FP scalar variant across a variety of micro-architectures:

Core           GPR scalar   SIMD&FP scalar
------------------------------------------
Neoverse V1    9 cyc        3 cyc
Neoverse N2    8 cyc        3 cyc
Cortex A510    8 cyc        4 cyc
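To make the semantics of clast[ab] concrete, here is a minimal Python model of the scalar forms of CLASTA and CLASTB, based on the Arm SVE architecture description. It is a sketch, not LLVM code. It models only the value computed: lanes are Python ints, the governing predicate is a list of booleans, and element widths and zero-extension are ignored. The GPR and SIMD&FP forms extract the same element bits. They differ only in which register file holds the fallback operand and the result, and that is where the latency difference in the table comes from.

```python
from typing import Optional, Sequence


def _last_active(pred: Sequence[bool]) -> Optional[int]:
    """Index of the last true lane in the governing predicate, or None."""
    for i in range(len(pred) - 1, -1, -1):
        if pred[i]:
            return i
    return None


def clastb(fallback: int, pred: Sequence[bool], vec: Sequence[int]) -> int:
    """CLASTB: the last active element, or `fallback` if no lane is active."""
    last = _last_active(pred)
    return fallback if last is None else vec[last]


def clasta(fallback: int, pred: Sequence[bool], vec: Sequence[int]) -> int:
    """CLASTA: the element after the last active one (wrapping to lane 0 when
    the last active lane is the final lane), or `fallback` if none is active."""
    last = _last_active(pred)
    return fallback if last is None else vec[(last + 1) % len(vec)]


vec = [10, 20, 30, 40]
print(clastb(7, [True, True, False, False], vec))   # 20
print(clasta(7, [True, True, False, False], vec))   # 30
print(clasta(7, [False, False, False, True], vec))  # 10 (wraps to lane 0)
print(clastb(7, [False] * 4, vec))                  # 7 (no active lane: fallback)
```

Because the fallback is returned when no lane is active, a loop that feeds each result back in as the next fallback forms a serial dependency chain through the scalar register.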

Event Timeline

c-rhodes created this revision. Jul 11 2022, 3:51 AM
c-rhodes requested review of this revision. Jul 11 2022, 3:51 AM
Herald added projects: Restricted Project, Restricted Project. Jul 11 2022, 3:51 AM
Herald added a subscriber: cfe-commits.
c-rhodes updated this revision to Diff 443594. Jul 11 2022, 4:05 AM

Add full patch context.

Are you saying that it's faster to use clasta targeting a float register, then move the result to an integer register, rather than use the integer form directly? Or is the issue just that we want to split the operations in case we can simplify the resulting bitcast?

Do we expect SelectionDAG to combine a clasta+bitcast to a single instruction? Do we have test coverage for that?

Matt added a subscriber: Matt. Jul 11 2022, 3:27 PM

> Are you saying that it's faster to use clasta targeting a float register, then move the result to an integer register, rather than use the integer form directly? Or is the issue just that we want to split the operations in case we can simplify the resulting bitcast?

The former, for the most part. If clast[ab] is inside a loop and is a loop-carried dependency, the SIMD&FP variant is considerably faster. If it's a straight bitcast-to-fp + clast[ab] + bitcast-to-int, that costs a cycle or two more depending on the micro-architecture, but in slightly more complicated code, from what I've observed, the SIMD&FP variant with bitcasts is as fast as the integer variant, if not faster.
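The loop-carried case can be illustrated with a simple critical-path model. This is a back-of-the-envelope sketch, not a scheduling model. It assumes each clast[ab] depends only on the previous one (vector and predicate inputs already available) and takes the per-op latencies from the summary table. It also uses a hypothetical `ASSUMED_TRANSFER` for the combined GPR-to-FPR and FPR-to-GPR moves, paid once outside the loop. That value is not a measurement.

```python
# Per-instruction latencies (cycles) from the summary table:
# (GPR scalar form, SIMD&FP scalar form).
CLAST_LATENCY = {
    "Neoverse V1": (9, 3),
    "Neoverse N2": (8, 3),
    "Cortex A510": (8, 4),
}

# HYPOTHETICAL: combined cost of moving the value GPR->FPR before the chain
# and FPR->GPR after it. Not a measured value; adjust per core.
ASSUMED_TRANSFER = 6


def chain_latency(iters: int, clast: int, transfer: int = 0) -> int:
    """Critical path of `iters` clast[ab] ops where each result feeds the next
    op's fallback operand (a loop-carried dependency), plus a one-off
    `transfer` cost paid once outside the loop."""
    return iters * clast + transfer


def break_even_iters(gpr: int, fp: int, transfer: int) -> int:
    """Smallest iteration count at which the SIMD&FP chain (including the
    transfers) is strictly shorter than the all-GPR chain."""
    if fp >= gpr:
        raise ValueError("SIMD&FP form is not faster per op; no break-even")
    n = 1
    while chain_latency(n, fp, transfer) >= chain_latency(n, gpr):
        n += 1
    return n


for core, (gpr, fp) in CLAST_LATENCY.items():
    n = break_even_iters(gpr, fp, ASSUMED_TRANSFER)
    print(f"{core}: SIMD&FP shorter from {n} iterations; 100 iterations: "
          f"{chain_latency(100, gpr)} vs "
          f"{chain_latency(100, fp, ASSUMED_TRANSFER)} cycles")
```

With this assumed transfer cost, a single straight-line clast[ab] (n = 1) comes out 0 to 2 cycles slower in SIMD&FP form on these cores. Two or more dependent iterations already favour it, because the transfer cost is paid once while the per-op saving accumulates.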

> Do we expect SelectionDAG to combine a clasta+bitcast to a single instruction? Do we have test coverage for that?

No, there'll be a mov to an integer register.

efriedma accepted this revision. Jul 12 2022, 9:18 AM

Please update the comment, then LGTM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp, line 801

Maybe put a bit of the explanation you just gave into a comment here, for reference.

This revision is now accepted and ready to land. Jul 12 2022, 9:18 AM
c-rhodes closed this revision. Jul 13 2022, 1:56 AM
c-rhodes marked an inline comment as done.

> Please update the comment, then LGTM

Done, cheers Eli.