This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Modify arch string parsing order according to latest riscv spec
ClosedPublic

Authored by joshua-arch1 on Apr 14 2023, 2:25 AM.

Details

Summary

According to latest risc-v spec, the canonical order in which extension names must appear in the name string specified in Table 29.1 is different from before. In the latest table, non-standard extensions must be listed after all standard extensions. To keep consistent, we now change the parsing order in parseArchString().

Diff Detail

Event Timeline

joshua-arch1 created this revision.Apr 14 2023, 2:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2023, 2:25 AM
joshua-arch1 requested review of this revision.Apr 14 2023, 2:25 AM
joshua-arch1 edited the summary of this revision. (Show Details)Apr 14 2023, 2:27 AM
joshua-arch1 edited the summary of this revision. (Show Details)
joshua-arch1 edited the summary of this revision. (Show Details)Apr 14 2023, 2:38 AM

LGTM, but I would like wait @asb ack here to confirm this change.


I hate those rule changes...:(

The damage is probably limited in scope from a build system perspective given it’s only noticeable if you’re writing software that needs S extensions, ie privileged ones? This is still completely crap from RVI though and continues to make them look incompetent by making more breaking changes. @philipp.tomsich @cmuellner How can we push back against RVI doing this? Is GCC/binutils going to follow suit? And I guess we also have an issue on the psABI side since the arch string there is meant to be canonical… so maybe toolchains should NAK this and force RVI to backtrack?

Hm, I’m struggling to find a recent change in riscv-isa-manual. Did we just get this wrong before rather than it being a breaking change in the spec like you say in your summary? Please give more information as I don’t want to accuse RVI of bad practice if your statement isn’t correct…

https://github.com/riscv/riscv-isa-manual/commit/84a7edb256d290cb436227c35a8a4f814fa41898

I did some archaeology today, seems like it not only change the order but also removed sx prefix.

GCC and binutils will follow that, we didn't follow that just because we didn't know that change…

We need to remember that all of this is currently a draft document.
Ratification currently only ratifies extensions, but not changes to the entire document.

Next time this comes up for public comment, we need to highlight incompatible changes appropriately.

joshua-arch1 added a comment.EditedApr 16 2023, 7:13 PM

We need to remember that all of this is currently a draft document.
Ratification currently only ratifies extensions, but not changes to the entire document.

Next time this comes up for public comment, we need to highlight incompatible changes appropriately.

I can see the order change mentioned in https://github.com/riscv/riscv-isa-manual/commit/84a7edb256d290cb436227c35a8a4f814fa41898 was commited early in Dec, 2018. I'm wondering why this change has not been ratified yet within more than 4 years.

asb accepted this revision.EditedApr 17 2023, 10:22 PM

LGTM. I suppose a sensible follow-up patch would be to remove Sx altogether. I'm not aware of the prefix being used at all.

This specific patch only affects how we ingest ISA naming strings. The logic used for OrderedExtensionMap is separate, though looking at that seems to have an issue - it sorts S* then Z* then X*. I think this is probably due to misreading the table in the ISA manual (which I did when first composing this response!) - it's only the "Zxm*" machine-level extensions that should be sorted after S extensions, all other Z extensions should be before S.

parseNormalizedArchString, now used by LLD and llvm-objdump etc isn't strict about order, which is overall an advantage in this case as LLVM tooling isn't broken by changing (or fixing) the order. (EDIT: when ingesting binaries at least - you'd see problems with a .s that sets the attributes)

This revision is now accepted and ready to land.Apr 17 2023, 10:22 PM
joshua-arch1 added a comment.EditedApr 17 2023, 11:57 PM

LGTM. I suppose a sensible follow-up patch would be to remove Sx altogether. I'm not aware of the prefix being used at all.

This specific patch only affects how we ingest ISA naming strings. The logic used for OrderedExtensionMap is separate, though looking at that seems to have an issue - it sorts S* then Z* then X*. I think this is probably due to misreading the table in the ISA manual (which I did when first composing this response!) - it's only the "Zxm*" machine-level extensions that should be sorted after S extensions, all other Z extensions should be before S.

parseNormalizedArchString, now used by LLD and llvm-objdump etc isn't strict about order, which is overall an advantage in this case as LLVM tooling isn't broken by changing (or fixing) the order. (EDIT: when ingesting binaries at least - you'd see problems with a .s that sets the attributes)

Do we need to modify the values in RankFlags() for OrderedExtensionMap?

This revision was landed with ongoing or failed builds.Apr 18 2023, 1:20 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2023, 1:20 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
asb added a comment.Apr 18 2023, 1:46 AM

@joshua-arch1: I've posted D146815 to fix the canonical ordering and directly committed rGa35e67fc5be654a7efdfa6125343b90f8960a487 to add some test coverage.

asb added a comment.Apr 18 2023, 11:49 PM

I'm starting to think we should just remove the ordering rules for z/s/x altogether when parsing arch strings I see that gcc 12.2.0 actually requires s and then z:

[asb@purge ~]$ riscv64-linux-gnu-gcc -march=rv64imafdc_svinval_zicbom t.c -c
[asb@purge ~]$ riscv64-linux-gnu-gcc -march=rv64imafdc_zicbom_svinval t.c -c
riscv64-linux-gnu-gcc: error: '-march=rv64imafdc_zicbom_svinval': unexpected ISA string at end: 'svinval'

So ISA naming strings aren't easily portable between clang and GCC right now, even if the same extension names are recognised.

joshua-arch1 added a comment.EditedApr 19 2023, 12:14 AM

I'm starting to think we should just remove the ordering rules for z/s/x altogether when parsing arch strings I see that gcc 12.2.0 actually requires s and then z:

[asb@purge ~]$ riscv64-linux-gnu-gcc -march=rv64imafdc_svinval_zicbom t.c -c
[asb@purge ~]$ riscv64-linux-gnu-gcc -march=rv64imafdc_zicbom_svinval t.c -c
riscv64-linux-gnu-gcc: error: '-march=rv64imafdc_zicbom_svinval': unexpected ISA string at end: 'svinval'

So ISA naming strings aren't easily portable between clang and GCC right now, even if the same extension names are recognised.

One of my colleagues working on GCC posted a patch to modify this order in GCC yesterday.
https://github.com/gcc-mirror/gcc/commit/4204ed2dc74390ab3689d1d6a53001761338baf6#diff-d6f7db0db31bfb339b01bec450f1b905381eb4730cc5ab2b2794971e34647d64
Also, Binutils requires z and then s, which is inconsistent with GCC now.