This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add a target feature for AArch64StorePairSuppress
ClosedPublic

Authored by dmgreen on Sep 26 2022, 7:43 AM.

Details

Summary

The AArch64StorePairSuppress pass prevents the creation of STP under some heuristics. Unfortunately it often prevents the creation of STP in cases where it is obviously beneficial, and it doesn't match my understanding of scheduling/cpu pipelining to prevent the creation of STP. From some benchmarking, even on an in-order cpu where the scheduling is most important I don't see it giving better results. In general the lower instruction count for STP would be expected to give a slightly better cycle count.

As the pass specifically mentions the cyclone cpu, this patch adds a target feature for FeatureStorePairSuppress, enabled for all the non-Arm cpus. This has the effect of disabling it for all Arm cpus.

Diff Detail

Event Timeline

dmgreen created this revision.Sep 26 2022, 7:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2022, 7:43 AM
dmgreen requested review of this revision.Sep 26 2022, 7:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2022, 7:43 AM
Allen added inline comments.Sep 27 2022, 3:46 AM
llvm/lib/Target/AArch64/AArch64.td
1031

TSV110 target is also expected to suppress pair stores

dmgreen added inline comments.Sep 27 2022, 5:46 AM
llvm/lib/Target/AArch64/AArch64.td
1031

OK good to know, I can change it. I am a little surprised it expected to be beneficial though. Do you have an example where it helped? Perhaps I am just running the wrong code to show where it is useful..

Allen added inline comments.Sep 27 2022, 6:58 AM
llvm/lib/Target/AArch64/AArch64.td
1031

I don't have small case, but according the history record, this will benefical to 649.fotonik3d_s in spec2006.

dmgreen added inline comments.Sep 29 2022, 12:12 AM
llvm/lib/Target/AArch64/AArch64.td
1031

Hmm. 649.fotonik3d_s seems to be a fortran benchmark in spec2017. I would not usually measure those. I will see if I can measure some numbers.

dmgreen planned changes to this revision.Sep 29 2022, 12:13 AM
dmgreen updated this revision to Diff 464934.Oct 4 2022, 3:27 AM
dmgreen edited the summary of this revision. (Show Details)

OK - I've managed to run the fortran benchmarks on a range of micro-architectures, and only saw the same behaviour as before. The STP being less instructions gives slightly less time for the benchmark overall, and I don't see a reason to keep the store-pair-suppress option for the cpus I tried.

I've changed this to keep the feature for non-Arm cpus. I would be surprised if any of them really benefit from it, but this makes it Arm-only (and cpu=generic).

Allen added a comment.Oct 8 2022, 6:52 PM

OK - I've managed to run the fortran benchmarks on a range of micro-architectures, and only saw the same behaviour as before. The STP being less instructions gives slightly less time for the benchmark overall, and I don't see a reason to keep the store-pair-suppress option for the cpus I tried.

I've changed this to keep the feature for non-Arm cpus. I would be surprised if any of them really benefit from it, but this makes it Arm-only (and cpu=generic).

Thanks for the check. I also recheck the performance and find this change doesn't affect performance in our newest downstream branch, so LGTM for tsv110 now.

samtebbs accepted this revision.Dec 7 2022, 8:48 AM

Nice, LGTM

This revision is now accepted and ready to land.Dec 7 2022, 8:48 AM
Matt added a subscriber: Matt.Dec 8 2022, 7:18 AM
This revision was landed with ongoing or failed builds.Sep 30 2023, 3:40 AM
This revision was automatically updated to reflect the committed changes.