Details on the proposed extension: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
---|---|---|
18 | I'm concerned that with this definition you won't be able to enforce the immediate fitting in the field. A better approach would probably be to add an instruction format from scratch with a uimm6 shamt operand. Likewise for RVBInstFunct5, RVBInstFunct6, RVBInstFunct7, with the appropriate operand types for however many bits the shamt operand requires. Invalid immediate tests should also then be added. | |
68 | Is there an explanation for why this instruction is not re-using RVInstR4 from RISCVInstrFormats.td? | |
89 | Slight typo here, this instruction format is for more than one instruction so should use opcodestr rather than the hard coded "fsriw". | |
108 | Vice-versa here, there is still an opcodestr passed to this instruction format. I prefer the approach of keeping the instruction format generic even if there will only ever be one opcodestr, and having the instruction specify it. |
Fixed typos, bad indentations and capitalization.
Added uimm6 and uimm7 riscv asm operand types in order to have more precise classes to represent the templates of the instructions with 5-bit, 6-bit and 7-bit immediate fields. The riscv asm operand uimm5 already existed.
Added missing rev pseudo instructions for riscv64.
Fixed the tests accordingly.
Added comments.
Added the latest updates to the encodings of the instructions, including the gorc, gorci, bfp instructions, the 'orc' pseudo instructions based on gorci and the zip/unzip pseudo instructions based on shfli/unshfli. Changed the encoding of grev and bmatxor to make space for gorc. Added the subtarget feature Zbf to support bfp. Added the necessary MC tests.
Update the encodings to the latest version of the spec. Fix the tests and encoding conflicts.
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
---|---|---|
68 | Yes, that was because with RVInstR4 I couldn't use funct3 for encoding. As it is declared in RISCVInstrFormats.td: class RVInstR4<bits<2> funct2, RISCVOpcode opcode, dag outs, dag ins, string opcodestr, string argstr> : RVInst<outs, ins, opcodestr, argstr, [], InstFormatR4> { bits<5> rs3; bits<5> rs2; bits<5> rs1; bits<3> funct3; bits<5> rd; let Inst{31-27} = rs3; let Inst{26-25} = funct2; let Inst{24-20} = rs2; let Inst{19-15} = rs1; let Inst{14-12} = funct3; let Inst{11-7} = rd; let Opcode = opcode.Value; } any parameter passed as funct3 would be ignored. |
Update encodings to RISCV BitManip v0.92:
add the encodings of intstructions packu[w], packh, sext.b, sext.h,
update the encoding of instructions bfp[w] and bdep[w],
update tests accordingly.
Added empty scheduling descriptions, just to let it build with the upstream LLVM.
Added zext.* and sext.w pseudo instructions.
llvm/lib/Target/RISCV/RISCV.td | ||
---|---|---|
107 | This isn't doing what you want it to. This patch currently causes MC/RISCV/invalid-instruction-spellcheck.s to fail because of the lack of an AssemblyPredicate so the instruction is always enabled. Unfortunately it looks like AsmPredicate doesn't support saying if Feature X or Y is enabled, and we can only compose them as needing X and Y on the Instruction definitions themselves. I couldn't find any other backend needing something like this, so it might be that we need a dummy SubtargetFeature, or teach TableGen that AsmPredicates can be composed in an or fashion instead. |
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
155 ↗ | (On Diff #243329) | The more granular check (hasStdExtZbb()?) should be used here instead. |
- Update patch now AssemblerPredicate update has landed
- Add "experimental-" to all bitmanip SubtargetFeatures
This looks good to me, as long as it covers the spec. Can you add more reviewers though?
This is starting to look good. I've checked all the encodings, and other than c.zext.w having the wrong value all the encodings are right.
One thing I would consider is to make reviewing/future changes easier, I would define all the instructions in the order they appear in the Chapter 2.11 Opcode Encodings tables. This way, it will be easier to see for future revisions what has changed and make sure the encodings are up to date. This might end up with a few more let Predicates = directives, but we can rearrange and tidy this up when it is all ratified.
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
---|---|---|
86 | It seems like this is just RVInstR but with a opcode flag, but its only used with OPC_OP_IMM_32, can this just assume that opcode type and have a better name (assuming the opcode type makes sense)? | |
116 | This looks like it mostly used for just the 'W' variants of instructions, in that case is this the most appropriate name if the opcode type is OPC_OP_32 by default? It seems for others you're doing the right thing and using ALU_rr as the class you derive from | |
118 | remove {} | |
184 | I don't know if splitting up the 12-bit immediate into a 7 and a 5 might make future maintenance of this easier, given that that's how its expressed int he manual? | |
184 | I'm also unsure on the use of the opcode enum when it represents the wrong thing, I don't know if this causes some confusion, but we can see if anyone else has any comments here? | |
388 | This block is missing C.NEG with funct2 01? | |
392 | c.zext.w should have 10 as its funct2. Also zest -> zext | |
llvm/test/MC/RISCV/rv64bc-valid.s | ||
13 | zest -> zext |
Strong +1 on this. Throughout the RISC-V backend we order the instruction definitions to match the order in the ISA manual, with codegen patterns listed separately in a more meaningful order. As Simon say, it can lead to a few more Predicates directives needed, but makes cross-referencing vs the spec _much_ easier.
Looks tidy and a really decent set of tests. I've added a few comments, though they are mainly me agreeing with Simon's points.
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
---|---|---|
182 | Do these properties need to be specified, or do the values match the values which would be inferred by TableGen anyway (since all the DAG patterns are empty)? | |
184 | I would also recommend avoiding the opcode enum values. It implies a connection to the base instruction encodings which is entirely incidental, and the name is also misleading. | |
184 | I also think it is worth matching the manual and splitting up the 12 bit encoding into separate 7 and 5 bit fields. | |
388 | c.neg will need to be added to the tests too. |
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
---|---|---|
86 | Well, it is used by immediate instructions, so I think and indication of that should appear in the name. I wouldn't base it on RVInstR if that's what you meant. Anyway, in line with what you suggested about RVInstR I could call it ALU_ri_32, but ALU_ri uses a 12 bits immediate field, unlike the instructions using RVBInstImm5 that only use 5 bits immediates. So it's not just a matter of opcode but also of different fields. I could call it RVBInstShift32, as some of the 32 bit counterparts of such instructions use RVBInstShift. The problem is that instructions like GREVIW and GORCIW, or SBSETIW, SBCLRIW, SBINVIW have little to do with shifts. While SLOIW and SROIW have. At the time RVBInstImm5 seemed a good way to express the main difference of such template: the immediate operand bit width. I'll think about better names. | |
116 | I could relate to ALU_rr with the name ALU_rr_32 or ALU_rr_w, just to specify the difference. | |
184 | I agree. The explicit encoding would avoid some confusion here, even though the fact that this instruction uses the number OPC_OP_IMM for its opcode raises the clash between its immediate format and its "not immediate" usage. This might be subject to some adjustments once we move closer to have the encodings ratified. | |
184 | Splitting the immediate field into fields of 7 and 5 fields seems interesting. I mean, I guess it could prove useful in the future. Never the less it can't be applied anyway to all those templates like RVBInstImm5, RVBInstImm6 and RVBInstImm7, as they expect one of the two fields to be used as an immediate operand. So yes, I could do it, but it applies anyway only to the instructions that use those 12 bits only for flags. Do you agree? | |
388 | Correct. The compressed instructions seem outdated. I'll fix them. |
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
---|---|---|
116 | Never mind, I'll use ALUW_rr that was already available. Must have missed it. | |
388 | Correction. Adding any of c.not, c.neg, c.zext.w or c.zext.d creates a decoding conflict with the current encoding of c.addi16sp, c.lui and c.lui (hint) already implemented in the C extension. In the current proposed encoding of the compressed instructions a decoding conflict is due. Since these instructions though are meant to be part of the C extensions, not the B extension I think it's better to leave them out of this patch. Once a definitive encoding will be defined for them they should be added with no problems to the C extension. |
- Sorted the definitions of the instructions and the MC tests according to the order of the encodings in the manual.
- Improved the names of the instruction templates.
- Removed unused operand types and instruction templates.
- The features hasSideEffects, mayLoad and mayStore are now set only on the templates that need it.
- Renamed all the immediate parameters in the templates as 'shamt', for uniformity.
- Split the funct12 field of the unary instruction template into two fields of 7 and 5 bits. For conformity to the manual and for maintainability.
- Added a 4-bit immediate operand for the 32 bit shuffle.
- Added constraints on the size of the immediate operands.
- Added tests on the constraints of the immediate operands.
- Renamed most registers in the tests for uniformity.
- Removed the compressed instructions as they are still a proposal for the C extension and still cause encoding conflicts with the ones aready implemented there.
- Removed tests of compressed instructions.
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
---|---|---|
182 | Yes they are needed. TableGen gives error if they are not specified. |
- Added bitmanip compressed instructions.
- Added compressed instruction emission.
- Added mc tests.
- Added assembler predicate strings to B extension and subextensions.
- Fixed typos.
- Fixed the order of the single bit instructions.
I'm reviewing this with an eye to merging it, but one big thing that comes to mind is the compressed instructions. The draft bitmanip spec describes these under "Future compressed instructions" and says "It presumably would make sense for a future revision of the “C” extension to include compressed opcodes for those instructions." My reading is that this is more of a sketch of potential encodings and a less firm proposal than the 32-bit encodings described elsewhere in the spec. Do you disagree with that assessment?
This is one thing I've been discussing with Paolo a bit and I'm not 100% sure what the correct approach is here.
I agree, it seems more like a sketch of what they should be, but the mentioning of the size improvements suggested to me perhaps they should be, since evaluating the potential "what the final B extension should look like" it probably should be included. There is the question of where this should be enabled, my first thought (which this patch implements) is Zbb because it seemed the most logical, but maybe given the instructions stated state does another subextension make sense for its evaluation? These are all experimental so I think there's some leeway to some option, but I think it would be good to land it if we think people may be using it/want to evaluate it. I suspect there's many valid routes with this being experimental, we just need to choose one we have concensus around making sense. Perhaps a topic for today's call?
Submitting the comments I have so far - need to continue going through in more detail.
llvm/lib/Target/RISCV/RISCV.td | ||
---|---|---|
123 | Nit: indentation of this list seems arbitrary. I'd have that [ should be aligned with the first " on the line above? | |
llvm/lib/Target/RISCV/RISCVInstrInfoB.td | ||
2 | Doesn't match the filename | |
10 | This should probably say something like "This file describes the RISC-V instructions from the standard 'B' Bitmanip extension, version 0.92. This extension has not yet been ratified (meaning LLVM support is experimental)." | |
49 | You're using a different style here than elsewhere in the backend, where we try to use the basic RISC-V formats as the superclass wherever possible (and where necessary, defining a subclass in RISCVInstrInfo*.td). RVBUnary for instance just seems to be an R-type instruction (albeit, one where the rs2 field is hard coded). Also, for the base instr format class we don't set hasSideEffects/mayLoad/mayStore (these are semantic details that aren't necessarily implied by the isntr format). | |
52 | Should be InstFormatR? |
Yes, let's discuss today. I agree there's value in making it easy to evaluate the compressed instructions too. My concern would be that people implementing experimental support in simulators or CPUs might skip over the section as more speculative, meaning that if we emitted those encodings those users might have a bad experience.
- Changed some instruction template classes so that they are not based directly on the basic RVInst.
- Changed the indentation where possible so that it fits in 80 columns.
- Fixed some obsolete comments.
- Added some runs to the Zbp tests.
- Fixed other indentation problems.
I think we should land this and continue further development in-tree. There might be a few minor refactorings I want to propose after the fact, but these shouldn't block this going forwards.
LGTM. Thank you for all your work in putting this together and the care you've taken to do so in a way that matches the rest of the backend.
llvm/test/MC/RISCV/rvZbproposedc-invalid.s | ||
---|---|---|
1 ↗ | (On Diff #253455) | Nit: should be "-mattr=+experimental-zbproposedc,+c" given that +experimental-b won't turn on the proposed C additions by itself. |
nitpick: the capitalisation here looks weird.