This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Set maximum vscale VF with shouldMaximizeVectorBandwidth
Abandoned · Public

Authored by Allen on Jul 14 2023, 9:48 PM.

Details

Summary

Set the maximum scalable (vscale-based) VF for AArch64 to 128 divided by the size
in bits of the smallest type in the loop, provided there is no register usage
overflow. This is similar to what D118979 did for the Neon VF.
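
As a rough illustration (not part of the patch), the loops this affects are those whose bodies mix element widths. A minimal C++ sketch with hypothetical names, assuming 128-bit vectors:

  #include <cstddef>
  #include <cstdint>

  // Hypothetical kernel: loads i8, widens to i32, stores i32.
  // A maximum VF derived from the widest type (i32) is 128 / 32 = 4;
  // maximizing vector bandwidth derives it from the smallest type (i8)
  // instead, i.e. 128 / 8 = 16, as long as register usage allows it.
  void widen(const uint8_t *src, uint32_t *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
      dst[i] = static_cast<uint32_t>(src[i]) * 3u;
  }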

Diff Detail

Event Timeline

Allen created this revision.Jul 14 2023, 9:48 PM
Allen requested review of this revision.Jul 14 2023, 9:48 PM
Herald added a project: Restricted Project.Jul 14 2023, 9:48 PM

For Neon we enabled shouldMaximizeVectorBandwidth so that the backend could make use of instructions like umull/umull2 and the narrowing instructions. Extending into larger types is quite natural for Neon in places and can lead to fewer instructions overall. SVE has instructions like UMULLB/T that work on the top/bottom lanes of a pair, but I don't believe the backend makes any use of them at the moment.
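
As a sketch of the kind of loop where that pays off for Neon (an illustrative example, not taken from the patch): a widening multiply such as the one below, vectorized with a VF taken from the 16-bit inputs rather than the 32-bit results, can be lowered to umull/umull2 pairs.

  #include <cstddef>
  #include <cstdint>

  // Hypothetical example: 16-bit inputs, 32-bit products. With a VF of 8
  // (chosen from the i16 inputs), the multiplies can map to umull for the
  // low halves and umull2 for the high halves of the input vectors.
  void mul_widen(const uint16_t *a, const uint16_t *b, uint32_t *out,
                 size_t n) {
    for (size_t i = 0; i < n; ++i)
      out[i] = static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i]);
  }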

The description is a bit light on details. What is the reasoning behind enabling this for SVE too? And do you have any benchmark results?

Allen added a comment.EditedJul 17 2023, 1:01 AM

I don't have access to an SVE server on which to run a large benchmark such as SPEC CPU 2017.
But when I run LAMMPS with the INTEL package (https://www.lammps.org/#gsc.tab=0) on an emulator, the
hot function PairLJCutCoulLongIntel::eval in file pair_lj_cut_coul_long_intel.cpp:337 has its VF enlarged from 2 to 4.
Because the kernel loop body contains both float and double types, choosing a wider VF gives
wider parallelism, and the performance gain is about 16% (https://github.com/lammps/lammps/blob/develop/src/INTEL/pair_lj_cut_coul_long_intel.cpp#L337).
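
To make that reasoning concrete (a simplified sketch, not the actual LAMMPS kernel): in a loop that mixes float and double, a 128-bit VF derived from the widest type (double) is 2, while deriving it from the smallest type (float) raises it to 4.

  #include <cstddef>

  // Hypothetical mixed-precision loop in the spirit of the LAMMPS kernel:
  // float inputs accumulated into double results. A VF based on the
  // 64-bit double elements is 2 for 128-bit vectors; basing it on the
  // 32-bit float elements raises it to 4, giving wider parallelism.
  void accumulate(const float *f, double *acc, size_t n) {
    for (size_t i = 0; i < n; ++i)
      acc[i] += static_cast<double>(f[i]) * 0.5;
  }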

Matt added a subscriber: Matt.Jul 17 2023, 4:27 PM
Allen abandoned this revision.Jul 30 2023, 7:45 PM

For the record: SVE2 has a number of instructions that can use the top/bottom lanes, provided the backend does some form of lane interleaving. Once that is in place this change might make a lot of sense, but it might be better to address that first.
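
For readers unfamiliar with the top/bottom instructions, a hedged scalar model of the assumed UMULLB/UMULLT semantics (an illustration, not LLVM code): "bottom" multiplies the even-indexed lanes of each source and "top" the odd-indexed lanes, each producing double-width results, so a plain widening-multiply loop gets its results split across two de-interleaved vectors unless the backend interleaves lanes first.

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Scalar model of the assumed UMULLB/UMULLT behaviour on 16-bit lanes.
  std::vector<uint32_t> umullb(const std::vector<uint16_t> &a,
                               const std::vector<uint16_t> &b) {
    std::vector<uint32_t> r;
    for (size_t i = 0; i < a.size(); i += 2) // even ("bottom") lanes
      r.push_back(uint32_t(a[i]) * uint32_t(b[i]));
    return r;
  }

  std::vector<uint32_t> umullt(const std::vector<uint16_t> &a,
                               const std::vector<uint16_t> &b) {
    std::vector<uint32_t> r;
    for (size_t i = 1; i < a.size(); i += 2) // odd ("top") lanes
      r.push_back(uint32_t(a[i]) * uint32_t(b[i]));
    return r;
  }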