This patch follows on from this RFC thread https://discourse.llvm.org/t/rfc-resolving-issues-related-to-extension-versioning-in-risc-v/68472/ and the ensuing discussion in the RISC-V LLVM sync-up call. The consensus agreed that the behaviour change in LLD introduced in D138550 that results in object files including arch attributes with unrecognised extensions or versions of extensions is a regression and should be treated as such. As it stands, this logic means that LLD will error out if trying to link a RISC-V object file from LLVM HEAD (rv32i2p0/rv64i2p0) with one from current GCC (rv32i2p1/rv64i2p1 by default).
There's a bigger discussion about exactly when to warn vs error and so on, and how to control that, and this patch doesn't attempt to address all those questions. It simply tries to fix the problem with a minimally invasive change, intended to be cherry-picked for 16.0.x (ideally 16.0.0, but queued for 16.0.1 if we're too late in the release cycle).
As you can see from the test changes, although the changed logic is mostly more permissive, it will reject some embedded arch strings that were accepted before. Because the same logic was previously used for parsing human-written -march as for the arch attribute (intended to be stored in normalised form), various short-hand formats were previously accepted. Maintaining support for such ill-formed attributes would substantially complicate the logic, and given the previous implementation was so much stricter in every other way, was surely a bug rather than a design choice.
Surprisingly, the precise rules for how numbers can be embedded into extension names isn't fully defined (there is a PR to the ISA manual that is not yet merged https://github.com/riscv/riscv-isa-manual/pull/718). In the absence of specific criteria for rejecting extensions names that would be ambiguous in abbreviated form, RISCVISAInfo::parseArchStringNormalized just pulls out the version information from the end of each extension description.
ditto below