Page MenuHomePhabricator

[RISCV] Turn on SeparateConstOffsetFromGEPPass for RISC-V target
Needs ReviewPublic

Authored by eklepilkina on Jun 15 2022, 7:25 AM.

Details

Summary

Now clang for RISC-V doesn't use offset addressing in generated assembly. Example from Dhrystone

 addiw   a0, s1, 5
 slli    a1, a0, 0x2
 add     a2, s4, a1
 sw      s2, 0(a2)
 addiw   a3, s1, 6
 slli    a3, a3, 0x2
 add     a3, a3, s4
 sw      s2, 0(a3)
 addiw   a3, s1, 35
 slli    a3, a3, 0x2
add     a3, a3, s4
sw      a0, 0(a3)

It's inefficient because we can use offsets.
Adding this pass allows to generate the next code

    addiw   a4, a2, 5
    slli    a5, a2, 2
    add a0, a0, a5
    sw  a3, 20(a0)
    sw  a3, 24(a0)
    sw  a4, 140(a0)
...

SeparateConstOffsetFromGEPPass is used to solve this problem in targets with limited addressing modes.

Diff Detail

Unit TestsFailed

TimeTest
60 msx64 debian > LLVM.Transforms/SeparateConstOffsetFromGEP/RISCV::split-gep.ll
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt < /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll -mtriple=riscv64-unknown-elf -passes='separate-const-offset-from-gep<check-profit>,early-cse' -S | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll

Event Timeline

eklepilkina created this revision.Jun 15 2022, 7:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 7:25 AM
eklepilkina requested review of this revision.Jun 15 2022, 7:25 AM

Now clang for RISC-V doesn't use offset addressing in generated assembly. Example from Dhrystone

 addiw   a0, s1, 5
 slli    a1, a0, 0x2
 add     a2, s4, a1
 sw      s2, 0(a2)
 addiw   a3, s1, 6
 slli    a3, a3, 0x2
 add     a3, a3, s4
 sw      s2, 0(a3)
 addiw   a3, s1, 35
 slli    a3, a3, 0x2
add     a3, a3, s4
sw      a0, 0(a3)

It's inefficient because we can use offsets.
Adding this pass allows to generate the next code

    addiw   a4, a2, 5
    slli    a5, a2, 2
    add a0, a0, a5
    sw  a3, 20(a0)
    sw  a3, 24(a0)
    sw  a4, 140(a0)
...

SeparateConstOffsetFromGEPPass is used to solve this problem in targets with limited addressing modes.

eklepilkina edited the summary of this revision. (Show Details)Jun 16 2022, 6:25 AM
asb added a comment.Jun 20 2022, 3:42 AM

My previous experience looking at whether it's worth enabling SeparateConstOffsetFromGEPPass was that there seemed to be cases where codegen was made worse and it was hard to resolve this (it was disabled for AArch64 by default in rGcd2334e86e018757b5a51eb16e5a814a6d95bded but D113284 started to look at changing this).

Are you seeing any instances of worse code in any of the tests / benchmarks you're looking at? Presumably the changes in D127727 were motivated by such cases?

Sidenote: You need to add Scalar to the RISCVCodeGen target in llvm/lib/Target/RISCV/CMakeLists.txt so the build still works in the case BUILD_SHARED_LIBS=True.

Are you seeing any instances of worse code in any of the tests / benchmarks you're looking at?

Yes, I ran almost all llvm test-suite benchmarks. There were some unstable benchmarks, but regressions on them weren't reproduced in the next runs. Do you have the cases with previous regressions you've found out?

Presumably the changes in D127727 were motivated by such cases?

Exactly, besides unstable cases we also found several regressions and my investigation mostly shows that the current pass originally modifies all GEPs althought sometimes it increase register pressure, so new startegy was added. Of course, it's possible I missed something.

I'll be grateful if you provide some list of benchmarks that I should pay more attention.

  • Added Scalar to CMakeLists
asb added a comment.Jun 22 2022, 3:34 AM

Are you seeing any instances of worse code in any of the tests / benchmarks you're looking at?

Yes, I ran almost all llvm test-suite benchmarks. There were some unstable benchmarks, but regressions on them weren't reproduced in the next runs. Do you have the cases with previous regressions you've found out?

Presumably the changes in D127727 were motivated by such cases?

Exactly, besides unstable cases we also found several regressions and my investigation mostly shows that the current pass originally modifies all GEPs althought sometimes it increase register pressure, so new startegy was added. Of course, it's possible I missed something.

I'll be grateful if you provide some list of benchmarks that I should pay more attention.

I don't have anything in particular in mind. I'd just looked at turning it on, and found the resulting codegen story was more mixed than I hoped and so shelved it for the time being. It might be worth looking at diffing the generated assembly (either objdump output or .s from -save-temps) for your benchmarks. Regressions in generated code quality that don't negatively impact benchmark results are obviously less worrying than those that do, but if this change is introducing more cases of obviously "bad" code, it's likely going to eat up time over the longer term as users report it or it's investigated as a potential performance issue.

Run one more time benchmarks from test-suite.

Runs show some quite big regressions, but all these regressions I checked and all of them are unstable. I ran them several times and the best time isn't worse than base run.

test-suite :: SingleSource/Benchmarks/Misc/ffbench.test   1.98    2.40   21.3%  (unstable - the best result for patched LLVM - 1.7)
test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists.test  25.35   28.04   10.6%  (unstable - the best result for patched LLVM - 25.22)
test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test  16.11   17.16    6.5%     (unstable - the best result for patched LLVM - 16.06)

Other regressions are small and also aren't reprodusible.

Runs were made on Alibaba T-Head RVB-ICE development board featuring a RISC-V dual-core 1.2GHz XuanTie C910 ICE SoC with a Vivante 3D GPU, an NPU, 4GB RAM

  • Added Scalar to CMakeLists
  • [RISCV] Turn off SeparateConstOffsetFromGEP by default

I decided to make flag turned off by default, because even if SeparateConstOffsetFromGEP doesn't change anything EarlyCSE and LICM can change and this isn't always useful.