We noticed some runtime performance improvements by disabling maximising bandwidth for streaming compatible sve and here is the patch that disables that feature.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll | ||
---|---|---|
2 | I don't think you should add this NOTE if you are manually deleting the CHECK lines for prefix NO_SC_SVE. Also, in the RUN lines below you've added the flag -debug-only=loop-vectorize, but this requires an assert build. It looks like you only want the debug output to check the value of VF chosen. I think you have two choices here:
| |
3 | Hi @dtemirbulatov, perhaps instead of specifying -aarch64-sve-vector-bits-min=128 you can just add a vscale_range(1,16) attribute to the function instead? For example, define void @foo() vscale_range(1,16) { Or perhaps even better you can also remove the -mattr=+sve flag by doing something like ; RUN: opt < %s -passes=loop-vectorize -debug-only=loop-vectorize -force-streaming-compatible-sve -scalable-vectorization=off -S 2>&1 | FileCheck %s --check-prefix=SC_SVE ... define void @foo() #0 { ... attributes #0 = { "target-features"="+sve" vscale_range(1,16) } | |
11 | Can you rename this function to something else please? It looks like it came from an existing program. This is just a suggestion, but you could call it reduc_max_bandwidth or something like that? | |
17 | The vectorised IR here doesn't match the scalar IR in the test. Can you decide what IR you actually need in the function in order to defend the change in this patch? For example, it looks like the load, sext and mul are all unnecessary for the test to work. I'm just a bit worried about the test being a bit fragile. | |
57 | Hi @dtemirbulatov, the IR here looks wrong to me. We're sign-extending the constant value i16 0 here. Did you mean this instead? %0 = load i16, ptr null, align 2 %conv10 = sext i16 %0 to i32 | |
58 | We're multiplying by 0 and orring with 0 below too. Is this right? |
Thanks for making these changes @dtemirbulatov, the tests look a lot better now! I just had a couple of minor suggestions for improving the test a bit further and reducing the CHECK lines, but I think it's almost ready to go!
llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll | ||
---|---|---|
3 | nit: This is just a suggestion, but if you add -force-vector-interleave=1 to each of the RUN lines it should significantly reduce the number of CHECK lines. | |
226 | I think you can fold these two blocks into one and remove the >0 check, i.e.: entry: %0 = sext i32 %lag to i64 %wide.trip.count = zext i32 %n to i64 br label %for.body for.body: %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] ... br i1 %exitcond.not, label %for.end, label %for.body for.end: %ret.0.lcssa = phi i32 [ %add9, %for.body ] ret i32 %ret.0.lcssa |
I don't think you should add this NOTE if you are manually deleting the CHECK lines for prefix NO_SC_SVE.
Also, in the RUN lines below you've added the flag -debug-only=loop-vectorize, but this requires an assert build. It looks like you only want the debug output to check the value of VF chosen. I think you have two choices here: