This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] More validations on the input value of -march=
ClosedPublic

Authored by apazos on Apr 4 2018, 1:28 PM.

Details

Summary

[RISCV] More validations on the input value of -march=

  • RISC-V ISA strings must be lowercase.

E.g.: rv32IMC is not supported, rv32imc is correct.

  • Multi-letter extensions are to be separated by a single underscore '_'.

The extension prefix counts as a letter.
This means extensions that start with S, X and SX are all multi-letter.
E.g.:
xasb is a single non-standard extension named 'xasb'
xa_sb are two extensions, the non-standard user level extension 'xa',
and the supervisor level extension 'sb'.

  • Extensions might have a version number.

Underscores may be used to separate ISA subset components to improve
readability and to provide disambiguation.
E.g.: rv32i2_m3_a1_f2_d2

  • Additional checks for dependent extensions and invalid extensions combinations. E.g.: 'e' requires rv32 'e' can't be combined with 'f' nor 'd' 'q' requires rv64

Diff Detail

Event Timeline

apazos created this revision.Apr 4 2018, 1:28 PM
apazos updated this revision to Diff 141043.Apr 4 2018, 1:43 PM

updated test case

apazos updated this revision to Diff 141048.Apr 4 2018, 1:49 PM

test fix

apazos updated this revision to Diff 141050.Apr 4 2018, 1:52 PM

fixed test merged line

asb added inline comments.Apr 4 2018, 1:56 PM
test/Driver/riscv-arch.c
151

But rv32e is a valid arch name.

161

Both GCC and the current clang ISA string logic expect lowercase letters. However, the spec does state that the string should be treated as case insensitive. We're probably best following GCC, though perhaps we should discuss this with them

Long time ago, GCC also accept upper case too, but I have no idea why Andrew change that? I guess one possible reason is because multi-lib?

[1] https://github.com/riscv/riscv-gcc/commit/6531a11f03ec3a95cd8b9033daeab0ebf23b5472

asb added a comment.Apr 5 2018, 1:34 AM

Based on Andrew's response (thanks Kito for sending the query) it looks like GCC accepting lowercase only is intentional, and we should follow that. In which case, it might be an improvement to reject uppercase letters in the ISA string with a message saying that only lowercase letters are accepted.

apazos updated this revision to Diff 141717.Apr 9 2018, 12:46 PM
apazos edited the summary of this revision. (Show Details)

Updated code according to the ISA string rules that have been clarified.

asb added a comment.Apr 12 2018, 5:46 AM

I've added a few comments on tweaking the error messages based on your tests.

test/Driver/riscv-arch.c
157

But the given string does begin with rv32. 'string must begin with rv32{i,e,g} or rv64{i,g}'?

167

I don't think this message is clear, specifically 'unsupported canonical order...'. How about 'error: standard extension not given in canonical order'

203

I think this would be clearer if you said "name missing after 'sx'"

208

Ditto.

213

Ditto.

mgrang added a subscriber: mgrang.Apr 12 2018, 8:10 AM
apazos updated this revision to Diff 142279.Apr 12 2018, 3:30 PM

Updated error messages and fixed getExtensionVersion

apazos updated this revision to Diff 142459.Apr 13 2018, 1:12 PM
  • Simplified hasOtherExtensions() now that we clarified non-standard extensions are separated by '_'. We just need to find the first occurrence of the prefixes.
  • updated error message, removed "unsupported" from some messages.
apazos updated this revision to Diff 142502.Apr 13 2018, 6:32 PM

Fixed failure in release mode

asb added inline comments.Apr 16 2018, 4:22 AM
lib/Driver/ToolChains/Arch/RISCV.cpp
50

You should probably document the limitation that this doesn't currently parse minor versions e.g. i2p0.

157–159

This could be tightened up by also rejected rv64e as invalid.

197–198

I'd suggest either just using StringRef::contains and getting rid of hasExtension, or adding a doc comment to hasExtension to explain its semantics.

It might also be worth adding a comment to explain why you want to check the extension is present in the StdExts string (e.g. We have reached the end of the StdExts string. Either the current extension was given outside of the canonical order (in which case issue an error), or else no canonical ordering is defined meaning no error should be generated'.

200

Won't this now iterate StdExtsItr past StdExtsEnd if StdExtsItr == StdExtsEnd but the hasExtension call is false?

234–239

Add a TODO about other dependencies perhaps? e.g. ef and efd are invalid, and q requires rv64.

apazos added inline comments.Apr 16 2018, 4:34 PM
lib/Driver/ToolChains/Arch/RISCV.cpp
50

Correct, it is parsing the major version for each standard extension. Will make note of how to parse minor version.

157–159

OK, will add an error message for the invalid combo 'rv64' and 'e', though e is not supported yet for rv32.

197–198

When we reach here, either c contains a valid extension but it was not given in
canonical order or it is an invalid extension. The code that follows was checking for the former, while the check for the latter happens in the switch default statement right below. But we can handle both here and print the appropriate messages, and leave the the check in switch statement just error out if LLVM does not support the extension. Yes, we can also get rid of hasExtension, it is not used any other place anymore.

200

At this point, if StdExtsItr is StdExtsEnd, the code will error out in the switch case default statement. It means you found an invalid extension. Otherwise it will return due to invalid canonical order check above.

I willl move both error conditions to the same place to make the logic clearer.

234–239

will make a note of the additional dependency checks.

apazos updated this revision to Diff 142716.Apr 16 2018, 4:38 PM
apazos edited the summary of this revision. (Show Details)

Addressed the latest review comments.
Added TODOs for validations we cannot do now.

asb added a comment.Apr 19 2018, 5:50 AM

This is looking great, the only remaining code comment I have is that getExtensionFeatures needs a comment describing it.

The remaining issue I have is more of a spec issue - do canonical ordering requirements apply to extension categories? e.g. is sfoo_xbar illegal because it should be ordered xbar_sfoo? From my reading of the note below table 22.1 in the spec I think it is illegal and ordering of non-standard user-level vs standard supervisor-level vs nons-standard supervisor-level should be checked. Do you agree with my interpretation?

Hi Alex, it seems the table expects these extensions in a canonical order too: all x extensions, followed by all s extensions, and then all sx extensions.

I can make the change, no problem. I have also coded other error situations described below.

But f I cannot push a test we can enable, because we error out when we find the first non-supported extension in the string with unsupported extension message, and stop parsing the string.

Any suggestion?

Examples:

clang -target riscv32-unknown-elf -march=rv32ixabc_ -###
extension name missing after separator '_'

clang -target riscv32-unknown-elf -march=rv32ixabc_a -###
invalid extension prefix 'a'

clang -target riscv32-unknown-elf -march=rv32isabc_xdef -###
non-standard user-level extension not given in canonical order 'xdef'

clang -target riscv32-unknown-elf -march=rv32isxabc_sdef -###
standard supervisor-level extension not given in canonical order 'sdef'

clang -target riscv32-unknown-elf -march=rv32ixabc_xabc -###
duplicated non-standard user-level extension 'xabc'

clang -target riscv32-unknown-elf -march=rv32ixabc_xdef -###
no parsing error, should be accepted if xabc and xdef are valid extensions

clang -target riscv32-unknown-elf -march=rv32ixabc_sdef_sxghi -###
no parsing error, should be accepted if xabc sdef sxghi are valid extensions

asb added a comment.Apr 23 2018, 8:54 AM

Hi Alex, it seems the table expects these extensions in a canonical order too: all x extensions, followed by all s extensions, and then all sx extensions.

I can make the change, no problem. I have also coded other error situations described below.

But f I cannot push a test we can enable, because we error out when we find the first non-supported extension in the string with unsupported extension message, and stop parsing the string.

In the best case we would first parse the string and give errors based on the string being malformed before determining which extensions the compiler supports and complaining if the extension is not supported. This would probably move us towards a two-pass scheme where structs are are generated from the string, (parsing stage) which are then processed. e.g. { enum ExtType Type; StringRef name; int MajorVersion; int MinorVersion; }. The parser can issue errors encountered when extracting this struct (e.g. malformed version identifiers), including erroring if the ordering of the ExtType relative to the previous ExtType is unexpected. The next step is complaining about semantic issues (clashing extensions or extensions with missing dependencies) as well as extensions that aren't supported.

However I realise that might be a larger refactoring. This is already the most complete RISC-V ISA string parser in existence (as far as I know), which goes some way beyond our near-term requirements. I'd be happy to merge this current version and evolve it in-tree if you prefer, or of course I'll happily review further updates to this patch.

apazos updated this revision to Diff 143674.Apr 23 2018, 6:13 PM

Hi Alex, the refactoring will be simple and can be done later when we need it, all the pieces are already parsed (type, name, major, minor) and are in strings, we will only need to convert to the preferred type (enum, int, etc).
I changed the code to not error out on unsupported extension, allowing all the non-standard extensions to be parsed and the format errors reported. Only when we try to set target features I then enforce unsupported extension.
This way I could push all my tests.

asb accepted this revision.Apr 25 2018, 6:06 AM

Looks good to me - thanks!

This revision is now accepted and ready to land.Apr 25 2018, 6:06 AM
This revision was automatically updated to reflect the committed changes.