Page MenuHomePhabricator

[RISCV] Fix arch string parsing for multi-character extensions
ClosedPublic

Authored by eopXD on Sep 3 2021, 2:16 AM.

Details

Summary

Current implementation can't parse extension names that contains digits
correctly (e.g. zvl128b). This patch fixes it.

Diff Detail

Unit TestsFailed

TimeTest
1,360 msx64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::exitcode.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -shared-libasan -g -Wno-deprecated-declarations /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/exitcode.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxDynamicConfig/TestCases/Output/exitcode.cpp.tmp
1,980 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::exitcode.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -g -Wno-deprecated-declarations /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/exitcode.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/exitcode.cpp.tmp

Event Timeline

eopXD created this revision.Sep 3 2021, 2:16 AM
eopXD requested review of this revision.Sep 3 2021, 2:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 3 2021, 2:16 AM
eopXD updated this revision to Diff 370511.Sep 3 2021, 2:17 AM

Update: Remove redundant codes added.

jrtc27 added a comment.Sep 3 2021, 2:19 AM

RISC-V manual, §27.6:

Standard extensions can also be named using a single “Z” followed by an alphabetical name and an optional version number.

Numbers are not in the alphabet, therefore they cannot be part of a valid name. If you start allowing numbers in the extension name then this introduces the need for all kinds of weird edge cases to remove ambiguity, but leads to huge amounts of confusion. Just don't do it.

eopXD added a comment.Sep 3 2021, 2:28 AM

RISC-V manual, §27.6:

Standard extensions can also be named using a single “Z” followed by an alphabetical name and an optional version number.

Numbers are not in the alphabet, therefore they cannot be part of a valid name. If you start allowing numbers in the extension name then this introduces the need for all kinds of weird edge cases to remove ambiguity, but leads to huge amounts of confusion. Just don't do it.

Hi Jessica,
I was told that upcoming rvv sub-extensions will contain digits in the extension name. (see the v1.0-rc1).
zvl* and zve* would be appended as architecture names inside -march.
Therefore I think your concern should be raised as an issue there?

asb added a comment.Sep 3 2021, 2:35 AM

RISC-V manual, §27.6:

Standard extensions can also be named using a single “Z” followed by an alphabetical name and an optional version number.

Numbers are not in the alphabet, therefore they cannot be part of a valid name. If you start allowing numbers in the extension name then this introduces the need for all kinds of weird edge cases to remove ambiguity, but leads to huge amounts of confusion. Just don't do it.

+1. on this. Thanks for working on this patch @eopXD, but per the current spec, zvl128b isn't a legal extension name. I suppose ISA manual could be updated to allow extensions that include numbers as long as they end with an alphabetical character (other than p!) but as Jessica says, it gets messy...

I've created an issue on the V extension GitHub repo to discuss further https://github.com/riscv/riscv-v-spec/issues/729

eopXD added a comment.Sep 3 2021, 2:41 AM

RISC-V manual, §27.6:

Standard extensions can also be named using a single “Z” followed by an alphabetical name and an optional version number.

Numbers are not in the alphabet, therefore they cannot be part of a valid name. If you start allowing numbers in the extension name then this introduces the need for all kinds of weird edge cases to remove ambiguity, but leads to huge amounts of confusion. Just don't do it.

+1. on this. Thanks for working on this patch @eopXD, but per the current spec, zvl128b isn't a legal extension name. I suppose ISA manual could be updated to allow extensions that include numbers as long as they end with an alphabetical character (other than p!) but as Jessica says, it gets messy...

I've created an issue on the V extension GitHub repo to discuss further https://github.com/riscv/riscv-v-spec/issues/729

Yes. I agree that the ISA manual does not seem to align with the upcoming sub-extensions. Thank you @asb for submitting an issue.
This patch will wait until further discussion is resolved.

eopXD retitled this revision from [RISCV] Fix arch sring parsing for multi-character extensions to [RISCV] Fix arch string parsing for multi-character extensions.Sep 4 2021, 12:23 AM
eopXD updated this revision to Diff 370703.Sep 4 2021, 12:24 AM

Add const-ness to function.

yurai007 added inline comments.
llvm/lib/Support/RISCVISAInfo.cpp
74

Please put short comment which pattern is the correct one. From implementation it looks like [0-9]*p?[0-9]*, right? For now function is not allowed to get empty strings. If it's deliberate would be good to put information about it in comment too. Last but not least - for some input function returns max size_t value due to Pos = -1 conversion to size_t. I'm not sure how paranoid about this we should be since I don't know how limited is set of potential Ext input.

eopXD updated this revision to Diff 381720.Oct 23 2021, 3:46 AM

Address comment

  • Add comment to explain patterns that the function match
  • Avoid Pos from turning into negative number
eopXD updated this revision to Diff 381721.Oct 23 2021, 3:47 AM

Fix minor grammatical error in comment.

eopXD updated this revision to Diff 381724.Oct 23 2021, 4:31 AM

clang-format

eopXD marked an inline comment as done.Oct 23 2021, 4:34 AM
eopXD updated this revision to Diff 381738.Oct 23 2021, 6:43 AM

Fix test case fail.

Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2021, 6:43 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
eopXD updated this revision to Diff 383732.EditedNov 1 2021, 1:30 AM

rebase.

eopXD added a reviewer: asb.Nov 2 2021, 1:34 AM

@asb @jrtc27
Hi Alex and Jessica,

Although the issue on relaxing naming rules aren't resolved in riscv-isa-manual [1], the v-spec v1.0 had come out including zve and zvl extensions and is frozen [2].
I have just finished a series of patch for adding extension zve, zvl support onto LLVM. (Please check the related revisions)
The patches rely on this patch to land, may you share your thought on this.

Thank you.

[1] https://github.com/riscv/riscv-isa-manual/pull/718
[2] https://github.com/riscv/riscv-v-spec/releases/tag/v1.0

eopXD added a reviewer: jrtc27.Nov 2 2021, 1:34 AM
asb accepted this revision.Nov 6 2021, 8:00 AM

This seems to match the emerging consensus in the various RISC-V GitHub issue threads and as you point out, is needed to support extensions in the 1.0 V spec. Looks good to me - thanks!

clang/test/Driver/riscv-arch.c
395

This does highlight that the error reporting is picking the wrong thing to complain about it - it would be better to complain that 'zbb1p0zbp1 is an unrecognised extension. Though I don't think this is an issue introduced by your patch.

This revision is now accepted and ready to land.Nov 6 2021, 8:00 AM
eopXD updated this revision to Diff 385329.Nov 7 2021, 12:55 AM

Rebase.

eopXD updated this revision to Diff 393174.Dec 9 2021, 8:17 AM

One last rebase before landing.