This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Implment pseudo instructions for load/store from a symbol address.
ClosedPublic

Authored by kito-cheng on Aug 9 2018, 1:54 AM.

Details

Summary

Those pseudo-instructions are making load/store instructions able to load/store from/to a symbol, and its always using PC-relative addressing to generating a symbol address.

Diff Detail

Event Timeline

kito-cheng created this revision.Aug 9 2018, 1:54 AM
kito-cheng updated this revision to Diff 159879.Aug 9 2018, 1:59 AM
  • Add default case in switch.

Hi, thanks a lot for the patch. A few comments inline.

Could you add some negative tests so the instructions reject symbols that are not bare?

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
995

You will need to extend this case otherwise symbols named like registers fail to parse. I ran into this same problem! :)

That said, maybe we want to rethink this because your change adds a bunch of pseudos to this case.

t.s:1:8: error: immediate must be an integer in the range [-2048, 2047]
lb a0, zero
       ^
t.s:2:8: error: immediate must be an integer in the range [-2048, 2047]
sb a3, zero, a1
       ^
1252

Suggestion: what about renaming it emitLoadSymbolAddressHi, it might look like the full symbol address will be computed otherwise. Then you can rename the new emitLoadWithSymbol to just emitLoadSymbol.

Changes:

  • Rename functions
    • emitLoadSymbolAddress -> emitLoadSymbolAddressHi
    • emitLoadWithSymbol -> emitLoadSymbol
    • emitStoreWithSymbol -> emitStoreSymbol
  • Return shouldForceImediateOperand to true for new pseudo instructions.
  • Add negative test
kito-cheng marked 2 inline comments as done.Aug 13 2018, 2:30 AM
kito-cheng added inline comments.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
995

Thanks, I was worried it will not work for those pseudo load/store, because the have another non-pseudo version, but it seems work fine :)

1252

Thanks your suggestion, naming is always hard to me :P

Thanks for the update @kito-cheng . Some more comments inline as the change is propagating a mistake of mine I introduced in rL339314 (fixed in rL339654).

You may have to rebase the patch. Apologies.

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1263–1264

This introduces the same bug I introduced in my earlier patch (fixed in rL339654). The easiest way to avoid it is to use MCInstBuilder as an argument of a function call.

1270–1288

Ditto.

1324

Ditto.

1359

Ditto.

kito-cheng marked 2 inline comments as done.

Changes:

  • Rebase.
kito-cheng marked 6 inline comments as done.Aug 22 2018, 8:27 AM

Hi Roger:

Thanks, I don't know can't do that too, it teach me a lesson :)

asb added a comment.Nov 29 2018, 3:21 AM
  • I'd like to find a solution that doesn't get rid of the helpful error reporting we currently have for load/store. Downgrading all errors to just "invalid operand for instruction" is an unfortunate regression
  • The F/D load/store are wrong as this patch has them accepting GPRs rather than FPR32/FPR64. The outs for PseudoSTORE also has a mistake
  • We no longer have the shouldForceImmediate logic because associating custom parsers with operands was found to be superior. From some playing with a modified version of this patch it looks like more logic is needed to get these aliases to parse though...

I think defining the load/store classes like this makes sense:

class PseudoLoadSym<RegisterClass rdty, string opcodestr>
    : Pseudo<(outs rdty:$rd), (ins bare_symbol:$addr), [],
              opcodestr, "$rd, $addr"> {
  let hasSideEffects = 0;
  let mayLoad = 1;
  let mayStore = 0;
  let isAsmParserOnly = 1;
}

class PseudoStoreSym<RegisterClass rs1ty, string opcodestr>
    : Pseudo<(outs GPR:$tmp), (ins rs1ty:$rs1, bare_symbol:$addr), [],
              opcodestr, "$rs1, $addr, $tmp"> {
  let hasSideEffects = 0;
  let mayLoad = 0;
  let mayStore = 1;
  let isAsmParserOnly = 1;
}
jrtc27 added inline comments.Dec 4 2018, 6:37 AM
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1252

It strikes me that, apart from choosing the opcode (add/load/store) and store having a temporary register, emitLoadLocalAddress/emitLoadSymbol/emitStoreSymbol all do exactly the same thing. Can we not move all that into emitLoadSymbolAddressHi, rename it appropriately (something like emitAuipcInstPair as an idea) and then remove emitLoadLocalAddress/emitLoadSymbol/emitStoreSymbol? Then processInstruction would look something like:

// ...
case RISCV::PseudoLLA:
  emitAuipcInstPair(Inst, RISCV::ADDI, IDLoc, Out);
  return false;
case RISCV::PseudoLB:
  emitAuipcInstPair(Inst, RISCV::LB, IDLoc, Out);
  return false;
// ...
case RISCV::PseudoSB:
  emitAuipcInstPair(Inst, RISCV::SB, IDLoc, Out);
  return false;
// ...

Note that Inst.getNumOperands() tells you whether there is a temporary register to use.

Alternatively, you can make emitAuipcInstPair take unsigned Opcode, MCOperand DestReg, MCOperand TmpReg, MCExpr *Symbol, SMLoc IDLoc, MCStreamer &Out and keep your wrappers, but their only job is to unpack DestReg, Symbol and TmpReg (set to DestReg for add/load). I'd still push the opcode selection out into processInstruction as you already have to enumerate the pseudo opcodes there. Then you'd have something like:

void RISCVAsmParser::emitLoadLocalAddress(MCInst &Inst, SMLoc IDLoc,
                                          MCStreamer &Out) {
  // The local load address pseudo-instruction "lla" is used in PC-relative
  // addressing of symbols:
  //   lla rdest, symbol
  // expands to
  //   TmpLabel: AUIPC rdest, %pcrel_hi(symbol)
  //             ADDI rdest, %pcrel_lo(TmpLabel)
  MCOperand DestReg = Inst.getOperand(0);
  const MCExpr *Symbol = Inst.getOperand(1).getExpr();
  emitAuipcInstPair(RISCV::ADDI, DestReg, DestReg, Symbol, IDLoc, Out);
}

void RISCVAsmParser::emitLoadSymbol(MCInst &Inst, unsigned Opcode, SMLoc IDLoc,
                                    MCStreamer &Out) {
  // The load pseudo-instruction does a pc-relative load with
  // a symbol.
  //
  // The expansion looks like this
  //
  //   TmpLabel: AUIPC rd, %pcrel_hi(symbol)
  //             LX    rd, %pcrel_lo(TmpLabel)(rd)
  MCOperand DestReg = Inst.getOperand(0);
  const MCExpr *Symbol = Inst.getOperand(1).getExpr();
  emitAuipcInstPair(Opcode, DestReg, DestReg, Symbol, IDLoc, Out);
}

void RISCVAsmParser::emitStoreSymbol(MCInst &Inst, unsigned Opcode, SMLoc IDLoc,
                                    MCStreamer &Out) {
  // The store pseudo-instruction does a pc-relative store with
  // a symbol.
  //
  // The expansion looks like this
  //
  //   TmpLabel: AUIPC tmp, %pcrel_hi(symbol)
  //             SX    rd, %pcrel_lo(TmpLabel)(tmp)
  MCOperand DestReg = Inst.getOperand(0);
  MCOperand TmpReg = Inst.getOperand(1);
  const MCExpr *Symbol = Inst.getOperand(2).getExpr();
  emitAuipcInstPair(Opcode, DestReg, TmpReg, Symbol, IDLoc, Out);
}

Although we can be smart and distinguish between stores and non-stores as described above, I'm actually leaning towards this one as it's a bit more explicit, clearer, and will catch cases where we do something stupid (e.g. stores without a temporary register).

jrtc27 added inline comments.Dec 6 2018, 5:12 PM
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1341

Further investigation reveals that, for GNU as, the symbol is still Operands[1], and the temporary register is Operands[2], i.e. sw a0, foo, t0.

Changes:

  • Merge emitLoadLocalAddress/emitLoadSymbol/emitStoreSymbol
  • Fix wrong operand type and syntax for floating point load instruction.

However this item is still a kind of regression:

  • Downgrading all errors to just "invalid operand for instruction" is an unfortunate regression
kito-cheng marked an inline comment as done.Jan 10 2019, 7:56 AM
kito-cheng added inline comments.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1341

The order of Operands not literally order in the asm syntax, it's ordered by (outs) (ins) in the pattern, it seems a little counter-intuitive here.

Changes:

  • Add missing tests

Just noted that this is failing:

"sw zero, (a0)"
<stdin>:1:15: error: operand must be a symbol with %lo/%pcrel_lo modifier or an integer in the range [-2048, 2047]
sw zero, (a0)

^

We should also accept it as "sw zero, 0(a0)".
I verified GNU accepts it.

Just noted that this is failing:

"sw zero, (a0)"   
<stdin>:1:15: error: operand must be a symbol with %lo/%pcrel_lo modifier or an integer in the range [-2048, 2047]
sw      zero, (a0)
              ^

We should also accept it as sw zero, 0(a0).
I verified GNU accepts it.

[To clarify, we only accept 0(a0), whereas GNU as will also accept (a0) as having an implicit zero immediate.]

Yes; I have a local patch to add aliases with a missing immediate offset for all the IFDC loads/stores. I have a bunch of patch tidying, updating and submitting to do but it's on my list. However, such support is unrelated to this revision, which is all about expanding pseudo-instructions that use symbols.

lewis-revill added inline comments.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1252

Can this implementation of emitAuipcInstPair be brought in line with the same function as implemented in D55325, or vice-versa? I would prefer @jrtc27's second suggestion here as has been implemented in D55325, simply because emitLoadAddress requires a wrapper anyway, and emitLoadLocalAddress would have to be re-added in that patch if it were to be deleted here.

This patch doesnt interact well with instructions like sw a3, CONST(a4) introduced by D52298. Looks like MatchOperandParserImpl is happy to match the 'CONST' as a bare symbol for the pseudo before parseImmediate can be called. Perhaps one way to fix it is to have parseBareSymbol check for a trailing parenthesis?

lewis-revill added inline comments.Jan 24 2019, 6:03 AM
test/MC/RISCV/rvi-pseudos-invalid.s
10

Typo 'few'

Changes:

  • Fix parseBareSymbol with a constant symbols.
  • Rebase to D55325

The patch is not applying cleanly due to the added test/MC/RISCV/rv64i-pseudos.s which was first added in https://reviews.llvm.org/D55325

Changes:

  • Update test/MC/RISCV/rvi-pseudos-invalid.s
asb accepted this revision.Feb 14 2019, 2:06 AM

This looks good to me, thanks! Just a few minor comments inline.

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
933

Should be Tok and I don't think this is a good case for using auto (see https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable). AsmToken Tok = getLexer().getTok(); would IMHO be more readable and in line with the style used elsewhere in LLVM.

934

const MCExpr *V =

lib/Target/RISCV/RISCVInstrFormats.td
111

It would be good to add a comment here to indicate that these are used for load/store to a symbol address.

This revision is now accepted and ready to land.Feb 14 2019, 2:06 AM
asb added a comment.Feb 15 2019, 1:56 AM

I believe the pre-requisite patches for this are now committed, so please go ahead and commit once you've addressed the minor nits in my previous review. Thanks!

jrtc27 added inline comments.Feb 15 2019, 5:03 AM
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1294

Extra blank line is not necessary

1306

Typo: should be SymbolOpIdx

1386

The most common convention in the code base by my limited grepping is /*HasTmpReg=*/false; that's certainly most common in the RISC-V backend.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 19 2019, 7:31 PM