Page MenuHomePhabricator

[RISCV] Ensure target features get passed to the LTO linker for RISC-V
Needs ReviewPublic

Authored by lewis-revill on Aug 29 2022, 3:49 AM.

Details

Summary

This patch fixes an issue whereby the LTO linker would not receive any information about target features, so things like architecture extensions would not be accounted for during codegen.

There is perhaps an outstanding question as to why this is required versus obtaining the target features from bitcode. This is the approach I implemented for now, but would be happy to look into whether I can make it work via bitcode if desired.

Diff Detail

Event Timeline

lewis-revill created this revision.Aug 29 2022, 3:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2022, 3:49 AM
lewis-revill requested review of this revision.Aug 29 2022, 3:49 AM
lenary resigned from this revision.Aug 29 2022, 5:45 AM

I'd prefer to use an approach with less moving parts: the bitcode should contain enough information to generate code without specifying anything on the command-line that wouldn't normally be passed to the linker. Among other issues, it's hard to understand the intended meaning if the flags contradict information encoded in the bitcode.

This is dump from my mailbox, few month ago I written a offlist mail to describe about RISC-V LTO status:


LTO for RISC-V is really kind of a long long story. @khchen has been fighting for that for a long time, but he is no longer focusing on RISC-V.

A short summary is we still have issues with dealing with ABI, target features, and datalayout,

Datalayout is determined by target triple in most other targets, but RISC-V can't only use target triple to determine the datalayout.

So we need to keep ABI in somewhere and read that at LTO phase, the most ideal place is the module flags. We already did that[6], but that comes with a problem is it's too late to update datalayout when we start to read a module, because LLVM will use datalayout to infer some info like the alignment of loads[7], and that means we might re-read the whole LLVM IR again to get the IR with the right info, and that requires fixing multiple places in LLVM (see[2]. Still, I am not sure it's enough or not).

There is also an issue with how to store and determine the final LTO target features. For example, A object built with -march=rv64g and B object built with -march=rv64gc, so which arch should we use in the LTO code generation stage? see [5] for more discussion around this issue.

Currently, our downstream work-around is passing -march and -mabi into LTO by -Wl,-plugin-opt=, and ABI and ISA are determined by the linking stage to avoid (or work-around) the above issues.

Related open revision:
[1] https://reviews.llvm.org/D71387
[2] https://reviews.llvm.org/D72624 (This one is abandoned, but it
still necessary if we need store ABI info in module flags)
[3] https://reviews.llvm.org/D72245
[4] https://reviews.llvm.org/D102582
[5] https://reviews.llvm.org/D106347
[6] https://reviews.llvm.org/D72755

Other revision for ref.

[7] https://reviews.llvm.org/D78403

So we need to keep ABI in somewhere and read that at LTO phase, the most ideal place is the module flags. We already did that[6], but that comes with a problem is it's too late to update datalayout when we start to read a module, because LLVM will use datalayout to infer some info like the alignment of loads[7], and that means we might re-read the whole LLVM IR again to get the IR with the right info, and that requires fixing multiple places in LLVM (see[2]. Still, I am not sure it's enough or not).

I'm not sure how the issues with datalayout in particular end up being an issue in practice.

  • clang shouldn't be writing out object files without a datalayout.
  • The code to infer alignment on load/store/etc. only exists for compatibility with old bitcode/IR; current LLVM bitcode always marks up alignment for all operations.

Or do you mean something else when you say "datalayout"?

There is also an issue with how to store and determine the final LTO target features. For example, A object built with -march=rv64g and B object built with -march=rv64gc, so which arch should we use in the LTO code generation stage? see [5] for more discussion around this issue.

On other targets, the instruction set used is encoded at a per-function level. So on x86, for example, you can have an "AVX" codepath and an "SSE" codepath, and use runtime CPU detection to figure out which to use.

jrtc27 added a comment.Sep 5 2022, 2:46 PM

So we need to keep ABI in somewhere and read that at LTO phase, the most ideal place is the module flags. We already did that[6], but that comes with a problem is it's too late to update datalayout when we start to read a module, because LLVM will use datalayout to infer some info like the alignment of loads[7], and that means we might re-read the whole LLVM IR again to get the IR with the right info, and that requires fixing multiple places in LLVM (see[2]. Still, I am not sure it's enough or not).

I'm not sure how the issues with datalayout in particular end up being an issue in practice.

Maybe for IR DataLayout upgrades, but those are rather rare.

  • clang shouldn't be writing out object files without a datalayout.
  • The code to infer alignment on load/store/etc. only exists for compatibility with old bitcode/IR; current LLVM bitcode always marks up alignment for all operations.

Or do you mean something else when you say "datalayout"?

There is also an issue with how to store and determine the final LTO target features. For example, A object built with -march=rv64g and B object built with -march=rv64gc, so which arch should we use in the LTO code generation stage? see [5] for more discussion around this issue.

On other targets, the instruction set used is encoded at a per-function level. So on x86, for example, you can have an "AVX" codepath and an "SSE" codepath, and use runtime CPU detection to figure out which to use.

The IR records that too, but the "default" set of extensions gets encoded in the output file so loaders can know what instruction sets are _required_ (as opposed to optional via IFUNCs). It's not a perfect match, but it's about as good as you can get. With GNU ld these get merged together, though currently LLD just picks the one from the first file and ignores the rest (I think?); ideally the LLVM module linker would do similar merging.

There is also an issue with how to store and determine the final LTO target features. For example, A object built with -march=rv64g and B object built with -march=rv64gc, so which arch should we use in the LTO code generation stage? see [5] for more discussion around this issue.

On other targets, the instruction set used is encoded at a per-function level. So on x86, for example, you can have an "AVX" codepath and an "SSE" codepath, and use runtime CPU detection to figure out which to use.

The IR records that too, but the "default" set of extensions gets encoded in the output file so loaders can know what instruction sets are _required_ (as opposed to optional via IFUNCs). It's not a perfect match, but it's about as good as you can get. With GNU ld these get merged together, though currently LLD just picks the one from the first file and ignores the rest (I think?); ideally the LLVM module linker would do similar merging.

Oh, I see. Most other targets don't have ELF attributes like that. 32-bit ARM does, but I'm not sure how the interaction with LTO works there off the top of my head.

We probably want to encode the architecture into the IR in some way that allows us to produce the same attributes whether or not LTO is enabled.

I'm not sure how the issues with datalayout in particular end up being an issue in practice.

clang shouldn't be writing out object files without a datalayout.
The code to infer alignment on load/store/etc. only exists for compatibility with old bitcode/IR; current LLVM bitcode always marks up alignment for all operations.
Or do you mean something else when you say "datalayout"?

ilp32/ilp32f/ilp32d and ilp32e(D70401) having different data layout, so when we try to merge those stuffs will run into trouble, although I admit build object file with ilp32/ilp32f/ilp32d then LTO with ilp32e is generally a bad idea.

Another issue is the LLVM isn't compatible between different ABI, e.g. ilp32 and ilp32f having different LLVM IR when passing a struct with a int and a float[2].

[1] https://reviews.llvm.org/D71387#1792169


On other targets, the instruction set used is encoded at a per-function level. So on x86, for example, you can have an "AVX" codepath and an "SSE" codepath, and use runtime CPU detection to figure out which to use.

Give an example to explain what problem we have now and what option we have:

$ clang -target riscv64-elf -flto a.c -o a.o -march=rv64gc            # a.o build with -march=rv64gc
$ clang -target riscv64-elf -flto b.c -o b.o -march=rv64g             # b.o build with -march=rv64g
$ clang -target riscv64-elf  a.o b.o -o a.out -flto -march=rv64gc_zba # and LTO phase use -march=rv64gc_zba, what target feature (or ISA extensions) should be used for all function 

Possible solution/results:

  1. All functions in a.o and b.o using same target features during the first build stage, -march=rv64gc for a.o, -march=rv64g for b.o, and -march option given in LTO CodeGen stage is ignored, it only used for ELF attribute use (this revision).
  2. All functions in a.o and b.o using same target features during the first build stage, -march=rv64gc for a.o, -march=rv64g for b.o, and deduced arch info from those .o for ELF attribute use (D106347), -march
  3. All functions in a.o and b.o re-compile with -march=rv64gc_zba and ELF attribute use rv64gc_zba.

Option 1: Require user use right -march option during LTO stage, and might fill wrong/unexpected ELF attribute if give wrong -march or not even not given in LTO stage.
Option 2: Should be more ideal, but D106347 seems no progress for a while.
Option 3: This option will break IFUNC usage.

Possible solution/results:

  1. All functions in a.o and b.o using same target features during the first build stage, -march=rv64gc for a.o, -march=rv64g for b.o, and -march option given in LTO CodeGen stage is ignored, it only used for ELF attribute use (this revision).
  2. All functions in a.o and b.o using same target features during the first build stage, -march=rv64gc for a.o, -march=rv64g for b.o, and deduced arch info from those .o for ELF attribute use (D106347), -march
  3. All functions in a.o and b.o re-compile with -march=rv64gc_zba and ELF attribute use rv64gc_zba.

Option 1: Require user use right -march option during LTO stage, and might fill wrong/unexpected ELF attribute if give wrong -march or not even not given in LTO stage.
Option 2: Should be more ideal, but D106347 seems no progress for a while.
Option 3: This option will break IFUNC usage.

This patch (Option 1) is look good to me, but maybe we need to report a warning in linking stage if possible.
I think users may not easy to specific the right -march string when they're using external libraries.
We had discussed that before in here and here, it's why I proposed Option 2 which encodes a module scope arch features in IR. IIRC, it's similar to what gcc did.

MTC added a subscriber: MTC.Sep 16 2022, 4:56 AM