Page MenuHomePhabricator

[RISCV 8/10] Add support for all RV32I instructions

Authored by asb on Aug 16 2016, 8:40 AM.



This is overall, fairly mechanical. Instructions are ordered in the same way as in the RISC-V User-Level ISA Specification.

A future patch will add support for pseudoinstructions and other instruction expansions (e.g. 0-arg fence -> fence iorw, iorw).

Diff Detail


Event Timeline

asb updated this revision to Diff 68192.Aug 16 2016, 8:40 AM
asb retitled this revision from to [RISCV 8/10] Add support for all RV32I instructions.
asb updated this object.
asb added reviewers: theraven, jyknight.
asb added a subscriber: llvm-commits.
emaste added a subscriber: emaste.Aug 16 2016, 2:43 PM
reames added a subscriber: reames.Aug 21 2016, 12:38 PM
reames added inline comments.
139 ↗(On Diff #68192)

style: shouldn't this be isImm12?

143 ↗(On Diff #68192)

Style: sort these by immediate size.

148 ↗(On Diff #68192)

style: duplicated code, possible an isImm<X>? With wrappers potentially?

160 ↗(On Diff #68192)

I saw this being discussed in a previous review, but I won't know what these were from the names. Possible a comment? Or a pointer to a design decision?

297 ↗(On Diff #68192)

Style: Hard coding these values seems slightly error prone. Could we generate these messages from the immediate size and common all of this code?

109 ↗(On Diff #68192)

Shouldn't this simply be two different instructions with disambiguation living in the disassembler?

asb updated this revision to Diff 69355.Aug 26 2016, 5:28 AM
asb marked 5 inline comments as done.

Address comments from @reames. AsmOperand definitions in RISCVInstrInfo have changed. Error reporting code has been commoned. Also describe the CSR instructions that were added in the v2.1 RISC-V ISA spec.

139 ↗(On Diff #68192)

I've reworked things so that tablegen will call isImm12, in fact I also went ahead and moved to UImm and SImm for greater clarity.

148 ↗(On Diff #68192)

In D23568 some of the isImm methods get a little more involved. For the isImm methods that are trivial, just having it as a wrapper to a templated function actually help readability? My concern is that it's less easy to see at a glance that a trivial check is taking place rather than something more complex.

160 ↗(On Diff #68192)

I've added a comment to the relevant definition in and added a comment to this file that points to the definitions in

297 ↗(On Diff #68192)

I've added a common error message generator, which I think is an improvement. I'm not sure whether it's really clearer or not now that the desired range isn't hard-coded.

theraven edited edge metadata.Aug 26 2016, 6:59 AM

Comments inline.

162 ↗(On Diff #69355)

This probably wants to be SImmScaled<12,1>(getConstantImm()).

It would also be nice to use the same terminology (scaled immediate) as other back ends and the generic code, rather than simm-mask.

171 ↗(On Diff #69355)

As above, should use SImmScaled.

262 ↗(On Diff #69355)

If this is only used in this file, it might be better off as a function in an anonymous namespace rather than a method exposed in the header.

314 ↗(On Diff #69355)

I agree with reames that it would be nicer to have the ranges come sensibly from TableGen. If you figure out a way to do this, let me know as we are currently specifying the same ranges in three different ways for a few things in the CHERI back end...

I'd expect to see some PrintMethods and InstPrinter adaptations for these (specifically to wrangle the correct immediates from the MCInst representation).

Ah, never mind. Somehow I hadn't realised you were keeping the CodeGen immediate in the MCInst.

asb added inline comments.Oct 8 2016, 1:18 AM
162 ↗(On Diff #69355)

MIPS uses the SImmScaled functions, but naming is of the form simm19_lsl2. Both the lsl naming and the {U,I}ImmScaled functions are unique to MIPS currently so there's not broad consensus here - though I agree unifying terminology is useful.

I feel the naming is perhaps slightly confusing in that the decision to describe a transformation from the encoded to the 'actual' value seems arbitrary vs describing the transformation from 'actual' value to encoded value. The options I considered were:

  • simm20_lsl1:$imm20 (describes how to go from encoded value to logical value. Matches MIPS)
  • simm21_asr1:$imm20 (describes going from logical value to encoded value)
  • simm21_mask1:$imm20 (current approach, describes the constraints on the encoded value)

Are you suggesting simm20_scaled1:$imm20? Or perhaps simm21_scaled1:$imm20?

asb added inline comments.Oct 8 2016, 1:26 AM
162 ↗(On Diff #69355)

Or another alternative: given that in the RISC-V ISA the 'scaled' immediates only shift by 1 bit (UJ and SB instruction forms) we could go with simm21_lsb0:$imm20 to indicate that the least significant bit is known 0.

theraven added inline comments.Oct 8 2016, 3:03 AM
162 ↗(On Diff #69355)

I'm happy with either name, as long as there's a comment explaining what it means where it's first introduced. I'm more concerned to avoid the reimplementation of SImmScaled than what you call the result.

asb updated this revision to Diff 74026.Oct 8 2016, 6:20 AM
asb edited edge metadata.
asb marked 5 inline comments as done.

Make use of isShiftedInt from MathExtras.h. Rename {simm21,simm13}_mask1 to {simm21,simm13}_lsb0. Tests are updated to check instruction printing.

asb added inline comments.Oct 8 2016, 6:25 AM
262 ↗(On Diff #69355)

RISCVAsmParser is itself already in an anonymous namespace. Unless I'm misunderstanding the suggestion, I'm not seeing much advantage in splitting out this and other helpers. It also wouldn't match standard practice in other backends.

asb updated this revision to Diff 74379.Oct 12 2016, 7:55 AM

Update test style as suggested by @jyknight in D23564

jyknight added inline comments.Oct 13 2016, 12:12 PM
162 ↗(On Diff #69355)

(This continues the thread from D23561, but it's more relevant here.)

I find it confusing to talk about the meanings as a transformation. I think the name ought to describe the value itself -- it *is* a 20 bit value, whose meaning is shifted by 1 bit.

Thus, the convention that makes the most sense to me would be to call it an "sImm20s1" -- that is, 20 bits, shifted 1. This is the convention that the arm and aarch64 backends use.

Scaled is an okay word too, except that you might think it's a multiplier, not a shift. (does "scaled2" mean "times 2" or "shifted by 2"?)

Adding "left" or "right" into the name (as in "lsl" or "asr") is seems to me to be unnecessary clarification, that actually adds confusion instead of clarifying. When you specifies direction, I start thinking about what that's trying to say, and about which direction is which, and such. But there's only one sensible direction in the first place, so not saying it somewhat unintuitively seems to be less confusing -- at least for me.

(BTW, since the Aarch64 backend is a pretty new backend, and was written by many of the long-time core contributors to LLVM, I tend to look at it to guide style in preference to other backends. Of course it's not 100% the case that it's always doing things the best way, but I think it's probably more likely than others at the moment.)

156 ↗(On Diff #74026)

This looks like it's actually an "FI" format instruction. I suggest the following:

def FENCE : FI<0b000, 0b0001111, (outs), (ins uimm4:$pred, uimm4:$succ), "fence\t$pred, $succ", []>
  bits<4> pred;
  bits<4> succ;  
  let rs1 = 0;
  let rd = 0;  
  let imm12 = {0b0000,pred,succ};  
168 ↗(On Diff #74026)
def FENCEI : FI<0b001, 0b0001111, (outs), (ins), "fence.i", []> {
  let rs1 = 0;
  let rd = 0;
  let imm12 = 0;
190 ↗(On Diff #74026)

Missing the csrr and csrw aliases.

118 ↗(On Diff #74026)

This can be supported easily via adding:

def : InstAlias<"fence", (FENCE 0, 15)>;

(That also makes disassembly of "fence 0, 15" show up as "fence", automatically.

Razer6 added a subscriber: Razer6.Feb 1 2017, 5:08 AM
asb updated this revision to Diff 88307.Feb 13 2017, 10:25 PM
asb marked 3 inline comments as done.

Refresh patch and incorporate suggestion from @jyknight regarding FENCE and FENCEI (thanks!).

I _think_ the discussion about naming immediate types was resolved with the use of simm13_lsb0, but let me know if there are still concerns. Using semantic names like branchimm or similar isn't ideal as the names may not hold for further RISC-V extensions (out-of-tree custom extensions or future standard extensions). imm_frm_r, imm_frm_i or similar could be an option, but I'm not really seeing a strong advantage. Input welcome though.

190 ↗(On Diff #74026)

I'm intentionally missing aliases in this patch. I'd rather introduce them all together later.

asb updated this revision to Diff 88328.Feb 14 2017, 1:01 AM

The diff I attached a few hours ago didn't include all context, this update fixes that. Sorry for the noise.

Florob added a subscriber: Florob.Feb 20 2017, 3:11 PM
Florob added inline comments.
169 ↗(On Diff #88328)

Upstream GAS also requires the arguments to be a substring of iorw and apparently doesn't accept integers.

asb updated this revision to Diff 111329.Aug 16 2017, 5:31 AM
asb marked an inline comment as done.
asb edited the summary of this revision. (Show Details)

Thanks to @Florob for noting that gas doesn't accept integer arguments to fence. I've updating this patch so that 'iorw' are accepted under the same conditions as gas (no repeated letters, must be given in that order).

I believe this patch is ready for merging.

asb added a comment.Aug 16 2017, 5:35 AM

I should have said - please do take a look at the handling of the fence arguments in RISCVAsmParser. I've actually avoided adding a new RISCVOperand type or directly modifying the operand parsing machinery (as AArch64 does for CondCodes). Allowing whatever is there to be parsed, then working out if it's valid or not seemed to more in line with the rest of the MC assembler parser.

apazos added a subscriber: apazos.Aug 25 2017, 1:31 PM
apazos added inline comments.
272 ↗(On Diff #111329)

Another code standard note: be consistent where you put the default case, people usually put it as the first case to avoid forgetting it.

452 ↗(On Diff #111329)

code standard reminder: {} are unnecessary with one line statement.

asb updated this revision to Diff 112747.Aug 25 2017, 2:25 PM
asb marked 2 inline comments as done.

Address comments from @apazos (thanks!). I've also converted a few if conditions to use MCAsmLexer::{is,isNot}.

theraven edited edge metadata.Sep 6 2017, 2:48 AM

I think that it's probably about time to move the RISC-V back end code to post-commit review.

asb added a comment.Sep 6 2017, 7:39 AM

I think a number of future RISC-V backend patches will be straight-forward enough to just use post-commit review. However, the developer policy specifically warns against abandoning the review process and committing directly once a patch has been submitted

mgrang added a subscriber: mgrang.Sep 6 2017, 4:55 PM
mgrang added inline comments.
146 ↗(On Diff #112747)
266 ↗(On Diff #112747)


theraven added inline comments.Sep 7 2017, 12:52 AM
146 ↗(On Diff #112747)

Do you really think that making a copy of a char and reusing it in a register is more expensive than taking a reference to a char in the middle of the string and relying on alias analysis to ensure that we only load via that pointer once?

References to small (register or pair of register) POD values are likely to have more overhead in a range-based for loop than copies and should only be used if you need to modify the value in place.

asb added inline comments.Sep 7 2017, 3:12 AM
146 ↗(On Diff #112747)

Thanks for the feedback Mandeep. I think using a reference is unnecessary here for the same reason you typically wouldn't use a reference when declaring a function taking an int or char argument. Making c const would perhaps be a better incremental improvement, but given c's short scope it wouldn't add much to readability. [I don't think LLVM has a consistent policy on declaring local PODs as const, but could be wrong]

psnobl added a subscriber: psnobl.Sep 8 2017, 11:35 AM

LGTM w/comments applied before commit.

Note: I'm LGTM this after looking for mostly stylistic issues. I did not closely review the ISA specification to confirm the RISCV specific instruction details. I'm mostly LGTMing this because it's been stuck in review for a while, I want to get it unblocked, and I don't see any obvious reasons to hold it back.

135 ↗(On Diff #112747)

/// comment describing function

136 ↗(On Diff #112747)

I think you're missing an isImm check here?

147 ↗(On Diff #112747)

really minor: a switch would be more clear

149 ↗(On Diff #112747)

Should this check be inverted for an ascending order?

262 ↗(On Diff #112747)

Just use a cast<> and drop the separate assert.

451 ↗(On Diff #112747)

Better to invert this and make the error the early return.

26 ↗(On Diff #112747)

Is there a need to be particularly short here? If not, something like InstFormatR might be more clear.

reames accepted this revision.Sep 10 2017, 7:17 PM
This revision is now accepted and ready to land.Sep 10 2017, 7:17 PM
asb marked 5 inline comments as done.Sep 17 2017, 7:28 AM
asb added inline comments.
136 ↗(On Diff #112747)

It shouldn't be necessary, but yes - let's add it in case things change in the future. Thanks.

147 ↗(On Diff #112747)

Do you find this clearer? It seems slightly less clear to me, but obviously these things are very subjective.

for (char c : Str) {
   if (c <= Prev)
      return false;
  switch (c) {
    return false;
  case 'i':
  case 'o':
  case 'r':
  case 'w':
    Prev = c;
149 ↗(On Diff #112747)

'iorw' is accepted, but 'wroi' would not be, matching the GCC behaviour. Reading the *AsmParser.cpp files is made somewhat confusing by the fact methods like ParseInstruction use false for success, unlike these predicates (which are called by tablegenned code).

451 ↗(On Diff #112747)

I played around with this, and think early-exit for success reads more clearly, particularly as I want to consistently early exit on the same condition (e.g. a couple of lines above we also early-exist on success). There are many more possible incorrect inputs than correct ones, so filtering out the correct ones and having a catch-all for failures at the end makes more sense to me. Happy to change if you feel strongly otherwise.

This revision was automatically updated to reflect the committed changes.
asb marked 2 inline comments as done.