This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix N2 SchedModel for arithmetic and logic ops with cheap LSL
ClosedPublic

Authored by rjj on Mar 6 2023, 4:57 AM.

Details

Summary

According to the N2 Software Optimization Guide, arithmetic ops with LSL ≤ 4, no flagset logical ops, and flagset logical ops with LSL = 0 have a latency of 1 and use pipeline group I. However, most of these ops were being modelled as having a latency of 2 and using pipeline M. The affected instructions include the "unshifted" versions of ADD/SUB, among others.

Diff Detail

Event Timeline

rjj created this revision.Mar 6 2023, 4:57 AM
Herald added a project: Restricted Project. · View Herald Transcript
rjj requested review of this revision.Mar 6 2023, 4:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2023, 4:57 AM

The add/sub sounds like a nice change.

From what I can tell all the logical operators (and/orr/etc) that don't set flags are latency 1, throughput 4. The ones that set flags (ands/bics) should be lat:2 throughput:2.

llvm/lib/Target/AArch64/AArch64SchedPredNeoverse.td
17

This is a quite common pattern and I think already exists somewhere. Can you move it somewhere shared?

rjj updated this revision to Diff 503709.Mar 9 2023, 3:00 AM
rjj retitled this revision from [AArch64] Fix N2 SchedModel for arithmetic ops with LSL ≤ 4 to [AArch64] Fix N2 SchedModel for arithmetic and logic ops with cheap LSL.
rjj edited the summary of this revision. (Show Details)

Also address the latency and throughput of logical ops as suggested by @dmgreen.

rjj added inline comments.Mar 9 2023, 3:08 AM
llvm/lib/Target/AArch64/AArch64SchedPredNeoverse.td
17

I believe you are referring to the Exynos and Ampere versions? Though similar in spirit, they actually implement a slightly different logic (I'm not sure if intentionally or not). In particular, the Exynos considers a "cheap shift" any shift = 0, or LSL <= 3. The Ampere considers cheap shifts to be any shift = 0, or LSL <= 4. As far as I could test, this is not accurate for the N2. For example, on N2 an LSR = 0 is still an "expensive shift". Since I was not sure if these slight differences were intentional, I decided to create a separate predicate for the Neoverses.

Having said that, I agree it would be good to move it to a shared location! Do you have any suggestions as to where that could be?

dmgreen accepted this revision.Mar 10 2023, 4:07 AM

LGTM. Thanks

llvm/lib/Target/AArch64/AArch64SchedPredNeoverse.td
17

Ah I see. It was the Ampere version I was thinking of. I would guess a lsr with a shift of 0 isn't too important, and combining the two wouldn't be a problem in practice, but if they are different then that's fine.

This revision is now accepted and ready to land.Mar 10 2023, 4:07 AM
rjj marked 2 inline comments as done.Mar 10 2023, 4:14 AM

LGTM. Thanks

Thanks very much! 👍

This revision was landed with ongoing or failed builds.Mar 10 2023, 4:39 AM
This revision was automatically updated to reflect the committed changes.