This is an archive of the discontinued LLVM Phabricator instance.

[LV][AArch64] Disable maximising bandwidth for streaming compatible sve
ClosedPublic

Authored by dtemirbulatov on May 10 2023, 6:09 PM.

Details

Summary

We noticed some runtime performance improvements by disabling maximising bandwidth for streaming compatible sve and here is the patch that disables that feature.

Diff Detail

Event Timeline

dtemirbulatov created this revision.May 10 2023, 6:09 PM
dtemirbulatov requested review of this revision.May 10 2023, 6:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 6:09 PM
Matt added a subscriber: Matt.May 11 2023, 12:07 PM
david-arm added inline comments.May 12 2023, 1:01 AM
llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll
2

I don't think you should add this NOTE if you are manually deleting the CHECK lines for prefix NO_SC_SVE.

Also, in the RUN lines below you've added the flag -debug-only=loop-vectorize, but this requires an assert build. It looks like you only want the debug output to check the value of VF chosen. I think you have two choices here:

  1. Add an extra ; REQUIRES: asserts line here so the tests only run with debug builds, or
  2. Remove the -debug-only=loop-vectorize and the Selecting VF CHECK lines. If you autogenerate the vectorised IR for both RUN lines then you are automatically testing the VF anyway because the output IR will either contain <2 x i32> or <8 x i32>.
3

Hi @dtemirbulatov, perhaps instead of specifying -aarch64-sve-vector-bits-min=128 you can just add a vscale_range(1,16) attribute to the function instead? For example,

define void @foo() vscale_range(1,16)  {

Or perhaps even better you can also remove the -mattr=+sve flag by doing something like

; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -force-streaming-compatible-sve -scalable-vectorization=off -S 2>&1 | FileCheck %s --check-prefix=SC_SVE
...

define void @foo() #0  {

...

attributes #0 = { "target-features"="+sve" vscale_range(1,16) }
11

Can you rename this function to something else please? It looks like it came from an existing program. This is just a suggestion, but you could call it reduc_max_bandwidth or something like that?

17

The vectorised IR here doesn't match the scalar IR in the test. Can you decide what IR you actually need in the function in order to defend the change in this patch? For example, it looks like the load, sext and mul are all unnecessary for the test to work. I'm just a bit worried about the test being a bit fragile.

57

Hi @dtemirbulatov, the IR here looks wrong to me. We're sign-extending the constant value i16 0 here. Did you mean this instead?

%0 = load i16, ptr null, align 2
%conv10 = sext i16 %0 to i32
58

We're multiplying by 0 and orring with 0 below too. Is this right?

dtemirbulatov marked 6 inline comments as done.

Resolved comments for the test.

Thanks for making these changes @dtemirbulatov, the tests look a lot better now! I just had a couple of minor suggestions for improving the test a bit further and reducing the CHECK lines, but I think it's almost ready to go!

llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll
3

nit: This is just a suggestion, but if you add -force-vector-interleave=1 to each of the RUN lines it should significantly reduce the number of CHECK lines.

226

I think you can fold these two blocks into one and remove the >0 check, i.e.:

entry:
  %0 = sext i32 %lag to i64
  %wide.trip.count = zext i32 %n to i64
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
...
  br i1 %exitcond.not, label %for.end, label %for.body

for.end:
  %ret.0.lcssa = phi i32 [ %add9, %for.body ]
  ret i32 %ret.0.lcssa
dtemirbulatov marked 2 inline comments as done.

Addressed comments.

david-arm accepted this revision.May 22 2023, 1:00 AM

LGTM! Thanks for making the changes @dtemirbulatov. :)

This revision is now accepted and ready to land.May 22 2023, 1:00 AM