This is an archive of the discontinued LLVM Phabricator instance.

RISCV: relax the ABI mismatch checking
Needs ReviewPublic

Authored by compnerd on Jul 20 2021, 9:42 AM.

Details

Reviewers
MaskRay
Summary

The linker validates that the inputs that it is merging follow the same
ABI restrictions (i.e. floating point ABI and RVE). However, the check
is not required or applicable when the input does not have code content
such as a data file. This relaxes the check to only perform the checks
when the input is non-empty and has loaded code sections. This matches
the behaviour of the BFD linker and enables adding data files which may
have mismatched ABI tags.

This also changes the behaviour of the RVC tagging to match BFD: the RVC
flag is only preserved if the input contributes code to the final
binary.

Rewrite the flag checking to use XOR which feels easier to read than the
equality check.

Diff Detail

Event Timeline

compnerd created this revision.Jul 20 2021, 9:42 AM
compnerd requested review of this revision.Jul 20 2021, 9:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2021, 9:42 AM
MaskRay added a comment.EditedJul 20 2021, 9:57 AM

I agree with this comment https://github.com/riscv/riscv-elf-psabi-doc/pull/200#issuecomment-883006301

There is a layering difference. e_flags should not need a section header table to provide a diagnostic.
At least from a first glance the GNU ld behavior looks strange to me.

This is indeed the simplest fix(workaround) since we don't have a way to control e_flags with llvm-objcopy (and likely GNU objcopy).
But I am not clear adding a condition to the linkers is correct.

The original intention of this patch is likely the reason I am concerned with so-called compatibility checks (e.g. https://github.com/riscv/riscv-elf-psabi-doc/pull/71) because people may need a way to suppress them.

I've never liked this, because it _is_ possible to screw up here. It is quite plausible that struct layout will be conditional on __riscv_float_abi_* and that you could have a data-only C file.

IMO, using objcopy to put a binary blob back into an ELF is broken by design without the ability to also set the ELF flags in it. There is already a much better option than this: use an assembly file that .incbin's the binary blob, and assemble it with Clang as normal. This removes a reliance on miscellaneous binary utilities and it ensures the file gets _exactly_ the same ELF header etc as any other file in your build. This is what we have switched to in FreeBSD. If the motivation is to support Linux, I highly suggest you just fix Linux to avoid this kind of hack.

I agree that it is unfortunate that RISCV opted to put these flags into the e_flags field in the ELF header. These are machine dependent and is ideally handled by something like .riscv.attributes. In fact, ARM has such a model which does allow for the merging of the incompatible code, with a warning and all. The problem here is that there is existing code which relies on this relaxed check.

This needs linker&psABI discussions on what the best path forward.

If we do need the lld/ELF change, we need to change error to errorOrWarn so that --noinhibit-exec without --fatal-warnings will be a warning...

I agree that it is unfortunate that RISCV opted to put these flags into the e_flags field in the ELF header. These are machine dependent and is ideally handled by something like .riscv.attributes. In fact, ARM has such a model which does allow for the merging of the incompatible code, with a warning and all. The problem here is that there is existing code which relies on this relaxed check.

Where the bits are makes no difference. It's solely a policy for how to interpret them. (Yes, there's also the fact that an ELF note has an additional state of "not present", but that can also be done with an e_flags bit, there is no fundamental difference between them).

e_flags is always machine dependent. I don't get your point there.

This needs linker&psABI discussions on what the best path forward.

If we do need the lld/ELF change, we need to change error to errorOrWarn so that --noinhibit-exec without --fatal-warnings will be a warning...

The right path forward is to not use tools that create ELF files that don't adhere to the psABI (i.e. say they use one ABI when you mean for them to be another ABI). Whether that's by changing the tools to support specifying the ABI for the output file, or by using existing alternative tools (clang + .incbin / gcc + as + .incbin) that don't suffer from those shortcomings, I don't really care, but I see this as a tooling issue not an ABI issue.

Sorry, you are correct that the flags are always machine dependent. I was referring more to @MaskRay's point about these flags wanting an opt-out, and that this really feels like it is better handled with the attributes that can be merged/handled in a more flexible manner. Additionally, these options don't directly impact the ELF object but rather are details on the contents of the code section.

jca added a subscriber: jca.Jan 3 2022, 2:07 PM
jca added a comment.Jan 4 2022, 4:28 AM

I think there is a problem with how code sections are matched.

lld/ELF/Arch/RISCV.cpp
128

This test seems too broad to only match loaded machine code sections. You need to check whether both flags are set.

jca added a comment.Jan 4 2022, 4:46 AM

This needs linker&psABI discussions on what the best path forward.

If we do need the lld/ELF change, we need to change error to errorOrWarn so that --noinhibit-exec without --fatal-warnings will be a warning...

The right path forward is to not use tools that create ELF files that don't adhere to the psABI (i.e. say they use one ABI when you mean for them to be another ABI). Whether that's by changing the tools to support specifying the ABI for the output file, or by using existing alternative tools (clang + .incbin / gcc + as + .incbin) that don't suffer from those shortcomings, I don't really care, but I see this as a tooling issue not an ABI issue.

Regarding "tools that create ELF files that don't adhere to the psABI": it's not just about objcopy. Another well-known way to embed binary data in object files is to use ld -b binary. With lld, such object files end up with e_flags equal to 0 (https://reviews.llvm.org/D71101), and that triggers the fatal error in the code discussed here. There is no way to create object files with a more precise ABI/e_flags using -m (https://reviews.llvm.org/D95755).

I agree that this looks like a tooling issue, and I would love a way to solve or work around this problem ; without resorting to patching the build system of other projects to use lower-level alternatives like .incbin, which may be rejected by upstreams.
My experience comes from building packages for the OpenBSD/riscv64 architecture, and the upstream projects which hit this failure are, so far: mupdf (embeds font files), postgresql-lua, eduke32, nblood (all three want to embed lua bytecode) and utox.