Current implementation can't parse extension names that contains digits
correctly (e.g. zvl128b). This patch fixes it.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
RISC-V manual, §27.6:
Standard extensions can also be named using a single “Z” followed by an alphabetical name and an optional version number.
Numbers are not in the alphabet, therefore they cannot be part of a valid name. If you start allowing numbers in the extension name then this introduces the need for all kinds of weird edge cases to remove ambiguity, but leads to huge amounts of confusion. Just don't do it.
Hi Jessica,
I was told that upcoming rvv sub-extensions will contain digits in the extension name. (see the v1.0-rc1).
zvl* and zve* would be appended as architecture names inside -march.
Therefore I think your concern should be raised as an issue there?
+1. on this. Thanks for working on this patch @eopXD, but per the current spec, zvl128b isn't a legal extension name. I suppose ISA manual could be updated to allow extensions that include numbers as long as they end with an alphabetical character (other than p!) but as Jessica says, it gets messy...
I've created an issue on the V extension GitHub repo to discuss further https://github.com/riscv/riscv-v-spec/issues/729
Yes. I agree that the ISA manual does not seem to align with the upcoming sub-extensions. Thank you @asb for submitting an issue.
This patch will wait until further discussion is resolved.
llvm/lib/Support/RISCVISAInfo.cpp | ||
---|---|---|
74 | Please put short comment which pattern is the correct one. From implementation it looks like [0-9]*p?[0-9]*, right? For now function is not allowed to get empty strings. If it's deliberate would be good to put information about it in comment too. Last but not least - for some input function returns max size_t value due to Pos = -1 conversion to size_t. I'm not sure how paranoid about this we should be since I don't know how limited is set of potential Ext input. |
Address comment
- Add comment to explain patterns that the function match
- Avoid Pos from turning into negative number
@asb @jrtc27
Hi Alex and Jessica,
Although the issue on relaxing naming rules aren't resolved in riscv-isa-manual [1], the v-spec v1.0 had come out including zve and zvl extensions and is frozen [2].
I have just finished a series of patch for adding extension zve, zvl support onto LLVM. (Please check the related revisions)
The patches rely on this patch to land, may you share your thought on this.
Thank you.
[1] https://github.com/riscv/riscv-isa-manual/pull/718
[2] https://github.com/riscv/riscv-v-spec/releases/tag/v1.0
This seems to match the emerging consensus in the various RISC-V GitHub issue threads and as you point out, is needed to support extensions in the 1.0 V spec. Looks good to me - thanks!
clang/test/Driver/riscv-arch.c | ||
---|---|---|
395 ↗ | (On Diff #383773) | This does highlight that the error reporting is picking the wrong thing to complain about it - it would be better to complain that 'zbb1p0zbp1 is an unrecognised extension. Though I don't think this is an issue introduced by your patch. |
Please put short comment which pattern is the correct one. From implementation it looks like [0-9]*p?[0-9]*, right? For now function is not allowed to get empty strings. If it's deliberate would be good to put information about it in comment too. Last but not least - for some input function returns max size_t value due to Pos = -1 conversion to size_t. I'm not sure how paranoid about this we should be since I don't know how limited is set of potential Ext input.