This is an archive of the discontinued LLVM Phabricator instance.

[lld][RISCV] Avoid error when encountering unrecognised ISA extensions/versions in RISC-V attributes
ClosedPublic

Authored by asb on Feb 19 2023, 12:05 PM.

Details

Summary

This patch follows on from this RFC thread https://discourse.llvm.org/t/rfc-resolving-issues-related-to-extension-versioning-in-risc-v/68472/ and the ensuing discussion in the RISC-V LLVM sync-up call. The consensus agreed that the behaviour change in LLD introduced in D138550 that results in object files including arch attributes with unrecognised extensions or versions of extensions is a regression and should be treated as such. As it stands, this logic means that LLD will error out if trying to link a RISC-V object file from LLVM HEAD (rv32i2p0/rv64i2p0) with one from current GCC (rv32i2p1/rv64i2p1 by default).

There's a bigger discussion about exactly when to warn vs error and so on, and how to control that, and this patch doesn't attempt to address all those questions. It simply tries to fix the problem with a minimally invasive change, intended to be cherry-picked for 16.0.x (ideally 16.0.0, but queued for 16.0.1 if we're too late in the release cycle).

As you can see from the test changes, although the changed logic is mostly more permissive, it will reject some embedded arch strings that were accepted before. Because the same logic was previously used for parsing human-written -march as for the arch attribute (intended to be stored in normalised form), various short-hand formats were previously accepted. Maintaining support for such ill-formed attributes would substantially complicate the logic, and given the previous implementation was so much stricter in every other way, was surely a bug rather than a design choice.

Surprisingly, the precise rules for how numbers can be embedded into extension names isn't fully defined (there is a PR to the ISA manual that is not yet merged https://github.com/riscv/riscv-isa-manual/pull/718). In the absence of specific criteria for rejecting extensions names that would be ambiguous in abbreviated form, RISCVISAInfo::parseArchStringNormalized just pulls out the version information from the end of each extension description.

Diff Detail

Event Timeline

asb created this revision.Feb 19 2023, 12:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 19 2023, 12:05 PM
asb requested review of this revision.Feb 19 2023, 12:05 PM
Possowsk removed a subscriber: Possowsk.
Possowsk added a subscriber: Possowsk.
asb added a comment.EditedFeb 22 2023, 2:13 PM

An important additional clarification: not errorring or warning when encountering unrecognised extensions or extension versions (as implemented in this patch) matches current binutils ld behaviour.

$ cat a.s
.section .riscv.attributes,"",@0x70000003
.byte 0x41
.long .Lend-.riscv.attributes-1
.asciz "riscv"  # vendor
.Lbegin:
.byte 1  # Tag_File
.long .Lend-.Lbegin
.byte 5  # Tag_RISCV_arch
.asciz "rv64i2p0_zmadeup1p0"
.Lend:

$ cat b.s
.section .riscv.attributes,"",@0x70000003
.byte 0x41
.long .Lend-.riscv.attributes-1
.asciz "riscv"  # vendor
.Lbegin:
.byte 1  # Tag_File
.long .Lend-.Lbegin
.byte 5  # Tag_RISCV_arch
.asciz "rv64i50p0_zanotherfake1p0"
.Lend:

$ riscv64-linux-gnu-as a.s -o a.o
$ riscv64-linux-gnu-as b.s -o b.o
$ riscv64-linux-gnu-ld a.o b.o
riscv64-linux-gnu-ld: warning: cannot find entry symbol _start; defaulting to 0000000000010078
$ llvm-readobj --arch-specific a.out
File: a.out
Format: elf64-littleriscv
Arch: riscv64
AddressSize: 64bit
LoadName: <Not found>
BuildAttributes {
  FormatVersion: 0x41
  Section 1 {
    SectionLength: 53
    Vendor: riscv
    Tag: Tag_File (0x1)
    Size: 43
    FileAttributes {
      Attribute {
        Tag: 5
        TagName: arch
        Value: rv64i50p0_zmadeup1p0_zanotherfake1p0
      }
    }
  }
}

I don't feel comfortable signing off on this as I don't know this part of code well enough, but I want to chime in with support for treating this like a regression. If we can get a change merged into the release branch without compromising quality in other ways, I think we should.

Alex, on the code, can you explain why you took the approach of writing a new parser specifically for this case? The existing routine has the IgnoreUnknown flag. How does that differ from the semantics you need here?

asb added a comment.Feb 22 2023, 2:58 PM

Alex, on the code, can you explain why you took the approach of writing a new parser specifically for this case? The existing routine has the IgnoreUnknown flag. How does that differ from the semantics you need here?

That's a good question and a consideration I should have mentioned in the patch description. The issue with IgnoreUnknown is the extension is literally ignored - it isn't added to the returned RISCVISAInfo, and so will be silently dropped rather than merged according to the merging rules implemented in LLD and specified in the psABI. There's also the fact that the existing routine accepts a much wider range of inputs (and performs additional normalisation) than is required/desired for the inputs from the ELF attributes - though if parseArchString with IgnoreUnknown was closer to what we want, I'd have strongly considered just using that as a stop-gap.

LG and I support back porting this to the release/16.x branch.

I think the new API needs a unittest under llvm/unittests/Support/, which the original patch adding parseArchString omitted.

lld/test/ELF/riscv-attributes.s
18

ditto below

MaskRay added inline comments.Feb 22 2023, 4:46 PM
llvm/lib/Support/RISCVISAInfo.cpp
519

This is untested. You can test it with a test under llvm/unittest/Support

545

getExtensionVersion uses unsigned instead of unsigned long long.

555

Merge this into the loop condition

558

pre-increment

LGTM

I like the idea which decoupling the parsing logic of the normal march string parsing, and we can have a simple parser logic here since ELF attribute has specified that should have explicitly version number and separated by _.

This also enable us could implement further complicated merge rule like merging unknown extension, different extension version and unsupported extension version - not just limited to handle knowing extension and supported version.

Reverse ping. We've got two LGTMs and this is a serious regression. Is there anything holding this back?

asb marked 4 inline comments as done.Feb 24 2023, 8:25 AM

I'll post an updated version with the requested C++ unit tests shortly. I understood MaskRay's LGTM to be conditional on addressing those comments.

llvm/lib/Support/RISCVISAInfo.cpp
545

unsigned long long needs to be used for StringRef::getAsUnsignedInteger. Changing to unsigned gives:

/home/asb/llvm-project/llvm/lib/Support/RISCVISAInfo.cpp:552:9: error: no matching function for call to 'getAsUnsignedInteger'
    if (getAsUnsignedInteger(MinorVersionStr, 10, MinorVersion))
        ^~~~~~~~~~~~~~~~~~~~
/home/asb/llvm-project/llvm/include/llvm/ADT/StringRef.h:34:8: note: candidate function not viable: no known conversion from 'unsigned int' to 'unsigned long long &' for 3rd argument
  bool getAsUnsignedInteger(StringRef Str, unsigned Radix,
       ^
asb updated this revision to Diff 500250.Feb 24 2023, 10:23 AM
asb marked an inline comment as done.

Addresses review comments and adds C++ unit test.

asb added a comment.Feb 24 2023, 12:48 PM

If anyone can confirm there's nothing egregiously wrong (or obvious gaps) with the unit test and other review feedback responses I'd appreciate it. I interpreted the reviews as LGTM or LGTM once minor comments are addressed, without necessarily requiring another review loop. So given that and the time pressure for the backport request, I'll commit tomorrow morning UK time unless there are objections or further review comments.

MaskRay accepted this revision.Feb 24 2023, 1:02 PM
MaskRay added inline comments.
llvm/lib/Support/RISCVISAInfo.cpp
545

OK. We can use the template member function StringRef::getAsInteger to avoid the implicit unsigned long long to unsigned cast below.

llvm/unittests/Support/RISCVISAInfoTest.cpp
1 ↗(On Diff #500250)

//===-- or 3 dashes is more common

22 ↗(On Diff #500250)
30 ↗(On Diff #500250)
38 ↗(On Diff #500250)
51 ↗(On Diff #500250)

Use EXPECT_ for non-fatal checks.

67 ↗(On Diff #500250)

Use EXPECT_ for non-fatal checks.

75 ↗(On Diff #500250)

Use EXPECT_ for non-fatal checks.

86 ↗(On Diff #500250)

Use EXPECT_ for non-fatal checks.

101 ↗(On Diff #500250)

Use EXPECT_ for non-fatal checks.

This revision is now accepted and ready to land.Feb 24 2023, 1:02 PM
asb updated this revision to Diff 500363.Feb 24 2023, 10:16 PM
asb marked 9 inline comments as done.

Address review comments (thanks!)

llvm/unittests/Support/RISCVISAInfoTest.cpp
22 ↗(On Diff #500250)

I've gone ahead and changed these, but I do consider this a case of "the type is already obvious from the context" and hence in-line with the recommendation. Clearly not a big deal either way.

This revision was landed with ongoing or failed builds.Feb 24 2023, 10:18 PM
This revision was automatically updated to reflect the committed changes.