This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Decode vtype with reserved fields to raw immediate
ClosedPublic

Authored by benshi001 on Nov 25 2021, 4:14 AM.

Details

Summary

This patch fixes a crash when doing "llvm-objdump -D --mattr=+experimental-v"
against an object file which happens to keep a word that can be decoded to
VSETVLI & VSETIVLI with reserved vlmul[2:0]=4. All vtype values with
reserved fields (vlmul[2:0]=4, vsew[2:0]=0b1xx, non-zero bits 8/9/10) are
printed to raw immediate.

Diff Detail

Event Timeline

benshi001 created this revision.Nov 25 2021, 4:14 AM
benshi001 requested review of this revision.Nov 25 2021, 4:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 25 2021, 4:14 AM
benshi001 edited the summary of this revision. (Show Details)Nov 25 2021, 4:15 AM
This comment was removed by benshi001.
This comment was removed by benshi001.
jhenderson added inline comments.Nov 25 2021, 4:51 AM
llvm/test/tools/llvm-objdump/RISCV/vsetvli.test
1 ↗(On Diff #389730)

I'm not at all familiar with what's going on in the disassembly, so can't review the rest of this patch, but could you use llvm-mc to create the test input file at runtime, rather than checking in an opaque binary?

Also, I recommend:
a) Use the same comment markers for your RUN and CHECK lines (I prefer '#').
b) Don't bother checking the addresses or raw bytes, unless they're important to the test.
c) If you're just disassembling a regular function, use -d rather than -D.
d) Don't add --check-prefix=CHECK: it's the default, so there's no need to explicitly specify it.

benshi001 updated this revision to Diff 389748.Nov 25 2021, 5:37 AM
benshi001 marked an inline comment as done.Nov 25 2021, 5:39 AM
benshi001 added inline comments.
llvm/test/tools/llvm-objdump/RISCV/vsetvli.test
1 ↗(On Diff #389730)
  1. I have changed all markers to #
  1. I have eliminated the binary test file, and replaced it with text form.
  1. I have to use "-D" than "-d", since the llvm-objdump crash can only be triggered by "-D"
  1. I have eliminated the explicit "--prefix=CHECK"
jhenderson added inline comments.Nov 25 2021, 5:54 AM
llvm/test/tools/llvm-objdump/RISCV/vsetvli.s
5–7 ↗(On Diff #389748)

Do you need to test the addresses/raw instruction bytes for this test to be valid?

Also, can you use CHECK-NEXT for the second and third lines here?

benshi001 updated this revision to Diff 389767.Nov 25 2021, 6:46 AM
benshi001 marked an inline comment as done.
benshi001 marked an inline comment as done.
jhenderson added inline comments.Nov 25 2021, 7:12 AM
llvm/test/tools/llvm-objdump/RISCV/vsetvli.s
4–6 ↗(On Diff #389767)

Unless you specify --match-full-lines to FileCheck, there's an implicit {{.*}} at the start and end of the line, so you just omit that regex pattern.

benshi001 updated this revision to Diff 389795.Nov 25 2021, 7:49 AM
benshi001 marked an inline comment as done.

Thanks, test looks fine. I'll leave it to someone who understands the rest of the code to review that though.

jrtc27 added inline comments.Nov 25 2021, 8:13 AM
llvm/test/tools/llvm-objdump/RISCV/vsetvli.s
1 ↗(On Diff #389795)

You don't need a temporary file

5 ↗(On Diff #389795)

Is this really the right way to disassemble this? I'd expect it to just be disassembled like any other illegal instruction, not as a real instruction with some arbitrary hard-coded human-readable string injected in the middle of it. For all we know the encoding could be used for something completely different in future.

8 ↗(On Diff #389795)

This section directive is not necessary

benshi001 updated this revision to Diff 389941.Nov 26 2021, 1:38 AM
benshi001 edited the summary of this revision. (Show Details)
benshi001 marked 3 inline comments as done.Nov 26 2021, 1:41 AM
benshi001 added inline comments.
llvm/test/tools/llvm-objdump/RISCV/vsetvli.s
5 ↗(On Diff #389795)

So I will decode it to <unknown>, which I think it is better than crash.

benshi001 marked an inline comment as done.Nov 26 2021, 1:41 AM
benshi001 updated this revision to Diff 389943.Nov 26 2021, 1:49 AM
benshi001 updated this revision to Diff 389982.Nov 26 2021, 3:47 AM
benshi001 edited the summary of this revision. (Show Details)
craig.topper added inline comments.Nov 26 2021, 8:58 PM
llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
426

I don't think this is the right way to do this. I think the right way is to change the DecoderMethod for VTypeIOp to a new function instead of decodeUImmOperand<11>. The new decoder function can return MCDisassembler::Fail;

benshi001 updated this revision to Diff 390157.Nov 27 2021, 5:03 AM
benshi001 marked an inline comment as done.
benshi001 updated this revision to Diff 390290.Nov 29 2021, 1:58 AM
benshi001 edited the summary of this revision. (Show Details)
benshi001 edited the summary of this revision. (Show Details)Nov 29 2021, 2:07 AM
benshi001 updated this revision to Diff 390296.Nov 29 2021, 3:06 AM
This revision is now accepted and ready to land.Nov 29 2021, 12:58 PM

Looking at the vector spec, what about SEW? All of 100-111 are currently reserved. And why's vtypei 11 bits; isn't it 9 in the spec (12 I could understand if zero-extending to fill the I-type value, but 11 is odd)? Maybe I'm missing something, just glancing through.

llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
280–281

I'd just stop at "is reserved", the latter part of the sentence applies to any reserved encoding so doesn't need special mention, just that it's LMUL=4 (technically vlmul[2:0]; LMUL is the name for the decoded value, vlmul is the actual encoded thing) we're checking for here.

craig.topper requested changes to this revision.Nov 29 2021, 2:05 PM
This revision now requires changes to proceed.Nov 29 2021, 2:05 PM

Looking at the vector spec, what about SEW? All of 100-111 are currently reserved. And why's vtypei 11 bits; isn't it 9 in the spec (12 I could understand if zero-extending to fill the I-type value, but 11 is odd)? Maybe I'm missing something, just glancing through.

vsetvli uses 11 bits zimm[10:0] (bits 30:20 of the instruction), vsetivli uses 10 bits zimm[9:0] (bits 29:20 of the instruction). vsetvli is the older instruction and uses more bits. I assume that's where the 11 came from.

llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
280–281

Agree this should mention vlmul not LMUL. I should have caught that.

Looking at the vector spec, what about SEW? All of 100-111 are currently reserved. And why's vtypei 11 bits; isn't it 9 in the spec (12 I could understand if zero-extending to fill the I-type value, but 11 is odd)? Maybe I'm missing something, just glancing through.

vsetvli uses 11 bits zimm[10:0] (bits 30:20 of the instruction), vsetivli uses 10 bits zimm[9:0] (bits 29:20 of the instruction). vsetvli is the older instruction and uses more bits. I assume that's where the 11 came from.

Ok. What do the other bit(s) mean? Are they ignored or treated as invalid (until such time as they are given meaning)? I can't obviously see a specification for how the bits are actually encoded in zimm[10:0], just "The new vtype setting is encoded in the immediate fields of vsetvli and vsetivli", which I guess just means it's the low bits of vtype, including bits 8 through 9/10 which are defined to be 0 and so should be checked?

The Spike implementation seems to be that all immediates are legal instructions, but vill gets set if vlmul, vsew or bits 8 and up are invalid. So god knows if we should in fact be disassembling it as vsetvli somehow after all, but if so then the output it gives needs to be valid input, and there is no assembly defined in the vector spec that would give rise to such encodings. This is what you get for not having a complete spec, and not having any Sail to act as a stand in for complete prose...

One option would be to define an alternative non-canonical (except for illegal values) vsetvli rd, rs1, 0xabc syntax, much like we do for CSRs (but they don't have the weirdness of lmul/sew/ta/ma syntax).

In fact, that seems to be exactly what binutils does, from a quick grep of vsetvli in the codebase (see gas/testsuite/gas/riscv/vector-insns.[sd] for their test that includes testing precisely this issue).

Looking at the vector spec, what about SEW? All of 100-111 are currently reserved. And why's vtypei 11 bits; isn't it 9 in the spec (12 I could understand if zero-extending to fill the I-type value, but 11 is odd)? Maybe I'm missing something, just glancing through.

vsetvli uses 11 bits zimm[10:0] (bits 30:20 of the instruction), vsetivli uses 10 bits zimm[9:0] (bits 29:20 of the instruction). vsetvli is the older instruction and uses more bits. I assume that's where the 11 came from.

Ok. What do the other bit(s) mean? Are they ignored or treated as invalid (until such time as they are given meaning)? I can't obviously see a specification for how the bits are actually encoded in zimm[10:0], just "The new vtype setting is encoded in the immediate fields of vsetvli and vsetivli", which I guess just means it's the low bits of vtype, including bits 8 through 9/10 which are defined to be 0 and so should be checked?

The Spike implementation seems to be that all immediates are legal instructions, but vill gets set if vlmul, vsew or bits 8 and up are invalid. So god knows if we should in fact be disassembling it as vsetvli somehow after all, but if so then the output it gives needs to be valid input, and there is no assembly defined in the vector spec that would give rise to such encodings. This is what you get for not having a complete spec, and not having any Sail to act as a stand in for complete prose...

One option would be to define an alternative non-canonical (except for illegal values) vsetvli rd, rs1, 0xabc syntax, much like we do for CSRs (but they don't have the weirdness of lmul/sew/ta/ma syntax).

In fact, that seems to be exactly what binutils does, from a quick grep of vsetvli in the codebase (see gas/testsuite/gas/riscv/vector-insns.[sd] for their test that includes testing precisely this issue).

I think there's something funny about the objdump behavior. The disassembly only prints the immediate for 0x400-0x7ff. Not for 0x100-0x3ff which also have extra bits.

I've notified some of SiFive's binutils developers about the issues. Including that vlmul=0b100 causes objdump to print a regular vsetvli with (null) where the lmul string should go. They will be printing the immediate form.

So I think the LLVM disassembler should decode any immediate value. The RISCVInstPrinter should print the immediate form if any extra bits or reserved encodings are used. The assembly parser should be updated to accept the immediate form if it doesn't already.

benshi001 updated this revision to Diff 390554.Nov 29 2021, 7:31 PM
benshi001 edited the summary of this revision. (Show Details)
benshi001 marked 2 inline comments as done.Nov 29 2021, 7:34 PM

I've notified some of SiFive's binutils developers about the issues. Including that vlmul=0b100 causes objdump to print a regular vsetvli with (null) where the lmul string should go. They will be printing the immediate form.

So I think the LLVM disassembler should decode any immediate value. The RISCVInstPrinter should print the immediate form if any extra bits or reserved encodings are used. The assembly parser should be updated to accept the immediate form if it doesn't already.

So how about I do "RISCVInstPrinter should print the immediate form" in current patch? And I will make another patch for "assembly parser should be updated to accept the immediate form" ?

I've notified some of SiFive's binutils developers about the issues. Including that vlmul=0b100 causes objdump to print a regular vsetvli with (null) where the lmul string should go. They will be printing the immediate form.

So I think the LLVM disassembler should decode any immediate value. The RISCVInstPrinter should print the immediate form if any extra bits or reserved encodings are used. The assembly parser should be updated to accept the immediate form if it doesn't already.

So how about I do "RISCVInstPrinter should print the immediate form" in current patch? And I will make another patch for "assembly parser should be updated to accept the immediate form" ?

Sure.

benshi001 updated this revision to Diff 390559.Nov 29 2021, 8:15 PM

I've notified some of SiFive's binutils developers about the issues. Including that vlmul=0b100 causes objdump to print a regular vsetvli with (null) where the lmul string should go. They will be printing the immediate form.

So I think the LLVM disassembler should decode any immediate value. The RISCVInstPrinter should print the immediate form if any extra bits or reserved encodings are used. The assembly parser should be updated to accept the immediate form if it doesn't already.

So how about I do "RISCVInstPrinter should print the immediate form" in current patch? And I will make another patch for "assembly parser should be updated to accept the immediate form" ?

Sure.

Done. I have uploaded the patch about RISCVInstPrinter should print the immediate form, you are appreciated to have a check.

Do you plan to address all of the other reserved encodings in the printer too? Bits 8, 9, or 10 being non-zero. vsew=0b1xx. They can be other patches I would just like to know if I need to find someone to do it.

llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
174 ↗(On Diff #390559)

Use RISCVVType::getVLMUL and check for VLMUL::LMUL_RESERVED

benshi001 edited the summary of this revision. (Show Details)
benshi001 marked an inline comment as done.
benshi001 marked an inline comment as not done.Nov 29 2021, 10:57 PM
benshi001 updated this revision to Diff 390579.EditedNov 29 2021, 11:21 PM
benshi001 edited the summary of this revision. (Show Details)
benshi001 marked an inline comment as done.

Do you plan to address all of the other reserved encodings in the printer too? Bits 8, 9, or 10 being non-zero. vsew=0b1xx. They can be other patches I would just like to know if I need to find someone to do it.

I have handled all reserved cases in current patch, along with corresponding tests for each. Thanks for your suggestion.

craig.topper accepted this revision.Nov 30 2021, 12:12 AM

LGTM Please update the title to more directly describe the final change.

This revision is now accepted and ready to land.Nov 30 2021, 12:12 AM
benshi001 retitled this revision from [RISCV] Fix a crash in decoding LMUL in VTYPE to [RISCV] Fix a crash in decoding reserved vtype fields.Nov 30 2021, 12:20 AM
benshi001 retitled this revision from [RISCV] Fix a crash in decoding reserved vtype fields to [RISCV] Decode vtype with reserved fields to raw immediate.Nov 30 2021, 12:28 AM
benshi001 edited the summary of this revision. (Show Details)
jrtc27 added inline comments.Nov 30 2021, 5:47 AM
llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
176 ↗(On Diff #390599)

Bit 10 doesn't exist in the immediate for vsetivli, though presumably it's always zero extended due to being a uimm so it doesn't matter. I'd have done (Imm >> 8) != 0 instead though as it avoids needing to care about exactly how many bits to check, and matches the assert in printFenceArg.

This does also make the use of decodeUImmOperand<11> for vsetivli's operand a bit odd, but it does work given the N is just for an assert. When adding the immediate integer parsing support though you will discover that VTypeIAsmOperand, and thus VTypeIOp, need splitting so that you can ban 11-bit immediate arguments to vsetivli.

And not sure why you committed this with "Reviewed By: jhenderson, jrtc27, craig.topper", Craig was the only one to approve it, and James somewhat by saying an older version of the test looked fine

And not sure why you committed this with "Reviewed By: jhenderson, jrtc27, craig.topper", Craig was the only one to approve it, and James somewhat by saying an older version of the test looked fine

I am not clear about the rule of the Reviewed By: list is only for who approve patches. Since you and Craig and James made comments, I thought you all have read (==reviewed) the code. I will pay attention to that next time.

craig.topper added inline comments.Dec 7 2021, 9:42 AM
llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
176 ↗(On Diff #390599)

Bit 10 doesn't exist in the immediate for vsetivli, though presumably it's always zero extended due to being a uimm so it doesn't matter. I'd have done (Imm >> 8) != 0 instead though as it avoids needing to care about exactly how many bits to check, and matches the assert in printFenceArg.

Changed to Imm >> 8 in 622d6894801b19e795737e133f5963d248de45af