Those pseudo-instructions are making load/store instructions able to load/store from/to a symbol, and its always using PC-relative addressing to generating a symbol address.
Details
Diff Detail
Event Timeline
Hi, thanks a lot for the patch. A few comments inline.
Could you add some negative tests so the instructions reject symbols that are not bare?
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
995 | You will need to extend this case otherwise symbols named like registers fail to parse. I ran into this same problem! :) That said, maybe we want to rethink this because your change adds a bunch of pseudos to this case. t.s:1:8: error: immediate must be an integer in the range [-2048, 2047] lb a0, zero ^ t.s:2:8: error: immediate must be an integer in the range [-2048, 2047] sb a3, zero, a1 ^ | |
1252 | Suggestion: what about renaming it emitLoadSymbolAddressHi, it might look like the full symbol address will be computed otherwise. Then you can rename the new emitLoadWithSymbol to just emitLoadSymbol. |
Changes:
- Rename functions
- emitLoadSymbolAddress -> emitLoadSymbolAddressHi
- emitLoadWithSymbol -> emitLoadSymbol
- emitStoreWithSymbol -> emitStoreSymbol
- Return shouldForceImediateOperand to true for new pseudo instructions.
- Add negative test
Thanks for the update @kito-cheng . Some more comments inline as the change is propagating a mistake of mine I introduced in rL339314 (fixed in rL339654).
You may have to rebase the patch. Apologies.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1263–1264 | This introduces the same bug I introduced in my earlier patch (fixed in rL339654). The easiest way to avoid it is to use MCInstBuilder as an argument of a function call. | |
1270–1288 | Ditto. | |
1324 | Ditto. | |
1359 | Ditto. |
- I'd like to find a solution that doesn't get rid of the helpful error reporting we currently have for load/store. Downgrading all errors to just "invalid operand for instruction" is an unfortunate regression
- The F/D load/store are wrong as this patch has them accepting GPRs rather than FPR32/FPR64. The outs for PseudoSTORE also has a mistake
- We no longer have the shouldForceImmediate logic because associating custom parsers with operands was found to be superior. From some playing with a modified version of this patch it looks like more logic is needed to get these aliases to parse though...
I think defining the load/store classes like this makes sense:
class PseudoLoadSym<RegisterClass rdty, string opcodestr> : Pseudo<(outs rdty:$rd), (ins bare_symbol:$addr), [], opcodestr, "$rd, $addr"> { let hasSideEffects = 0; let mayLoad = 1; let mayStore = 0; let isAsmParserOnly = 1; } class PseudoStoreSym<RegisterClass rs1ty, string opcodestr> : Pseudo<(outs GPR:$tmp), (ins rs1ty:$rs1, bare_symbol:$addr), [], opcodestr, "$rs1, $addr, $tmp"> { let hasSideEffects = 0; let mayLoad = 0; let mayStore = 1; let isAsmParserOnly = 1; }
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1252 | It strikes me that, apart from choosing the opcode (add/load/store) and store having a temporary register, emitLoadLocalAddress/emitLoadSymbol/emitStoreSymbol all do exactly the same thing. Can we not move all that into emitLoadSymbolAddressHi, rename it appropriately (something like emitAuipcInstPair as an idea) and then remove emitLoadLocalAddress/emitLoadSymbol/emitStoreSymbol? Then processInstruction would look something like: // ... case RISCV::PseudoLLA: emitAuipcInstPair(Inst, RISCV::ADDI, IDLoc, Out); return false; case RISCV::PseudoLB: emitAuipcInstPair(Inst, RISCV::LB, IDLoc, Out); return false; // ... case RISCV::PseudoSB: emitAuipcInstPair(Inst, RISCV::SB, IDLoc, Out); return false; // ... Note that Inst.getNumOperands() tells you whether there is a temporary register to use. Alternatively, you can make emitAuipcInstPair take unsigned Opcode, MCOperand DestReg, MCOperand TmpReg, MCExpr *Symbol, SMLoc IDLoc, MCStreamer &Out and keep your wrappers, but their only job is to unpack DestReg, Symbol and TmpReg (set to DestReg for add/load). I'd still push the opcode selection out into processInstruction as you already have to enumerate the pseudo opcodes there. Then you'd have something like: void RISCVAsmParser::emitLoadLocalAddress(MCInst &Inst, SMLoc IDLoc, MCStreamer &Out) { // The local load address pseudo-instruction "lla" is used in PC-relative // addressing of symbols: // lla rdest, symbol // expands to // TmpLabel: AUIPC rdest, %pcrel_hi(symbol) // ADDI rdest, %pcrel_lo(TmpLabel) MCOperand DestReg = Inst.getOperand(0); const MCExpr *Symbol = Inst.getOperand(1).getExpr(); emitAuipcInstPair(RISCV::ADDI, DestReg, DestReg, Symbol, IDLoc, Out); } void RISCVAsmParser::emitLoadSymbol(MCInst &Inst, unsigned Opcode, SMLoc IDLoc, MCStreamer &Out) { // The load pseudo-instruction does a pc-relative load with // a symbol. // // The expansion looks like this // // TmpLabel: AUIPC rd, %pcrel_hi(symbol) // LX rd, %pcrel_lo(TmpLabel)(rd) MCOperand DestReg = Inst.getOperand(0); const MCExpr *Symbol = Inst.getOperand(1).getExpr(); emitAuipcInstPair(Opcode, DestReg, DestReg, Symbol, IDLoc, Out); } void RISCVAsmParser::emitStoreSymbol(MCInst &Inst, unsigned Opcode, SMLoc IDLoc, MCStreamer &Out) { // The store pseudo-instruction does a pc-relative store with // a symbol. // // The expansion looks like this // // TmpLabel: AUIPC tmp, %pcrel_hi(symbol) // SX rd, %pcrel_lo(TmpLabel)(tmp) MCOperand DestReg = Inst.getOperand(0); MCOperand TmpReg = Inst.getOperand(1); const MCExpr *Symbol = Inst.getOperand(2).getExpr(); emitAuipcInstPair(Opcode, DestReg, TmpReg, Symbol, IDLoc, Out); } Although we can be smart and distinguish between stores and non-stores as described above, I'm actually leaning towards this one as it's a bit more explicit, clearer, and will catch cases where we do something stupid (e.g. stores without a temporary register). |
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1341 | Further investigation reveals that, for GNU as, the symbol is still Operands[1], and the temporary register is Operands[2], i.e. sw a0, foo, t0. |
Changes:
- Merge emitLoadLocalAddress/emitLoadSymbol/emitStoreSymbol
- Fix wrong operand type and syntax for floating point load instruction.
However this item is still a kind of regression:
- Downgrading all errors to just "invalid operand for instruction" is an unfortunate regression
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1341 | The order of Operands not literally order in the asm syntax, it's ordered by (outs) (ins) in the pattern, it seems a little counter-intuitive here. |
Just noted that this is failing:
"sw zero, (a0)"
<stdin>:1:15: error: operand must be a symbol with %lo/%pcrel_lo modifier or an integer in the range [-2048, 2047]
sw zero, (a0)
^
We should also accept it as "sw zero, 0(a0)".
I verified GNU accepts it.
[To clarify, we only accept 0(a0), whereas GNU as will also accept (a0) as having an implicit zero immediate.]
Yes; I have a local patch to add aliases with a missing immediate offset for all the IFDC loads/stores. I have a bunch of patch tidying, updating and submitting to do but it's on my list. However, such support is unrelated to this revision, which is all about expanding pseudo-instructions that use symbols.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1252 | Can this implementation of emitAuipcInstPair be brought in line with the same function as implemented in D55325, or vice-versa? I would prefer @jrtc27's second suggestion here as has been implemented in D55325, simply because emitLoadAddress requires a wrapper anyway, and emitLoadLocalAddress would have to be re-added in that patch if it were to be deleted here. |
This patch doesnt interact well with instructions like sw a3, CONST(a4) introduced by D52298. Looks like MatchOperandParserImpl is happy to match the 'CONST' as a bare symbol for the pseudo before parseImmediate can be called. Perhaps one way to fix it is to have parseBareSymbol check for a trailing parenthesis?
test/MC/RISCV/rvi-pseudos-invalid.s | ||
---|---|---|
10 | Typo 'few' |
The patch is not applying cleanly due to the added test/MC/RISCV/rv64i-pseudos.s which was first added in https://reviews.llvm.org/D55325
This looks good to me, thanks! Just a few minor comments inline.
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
933 | Should be Tok and I don't think this is a good case for using auto (see https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable). AsmToken Tok = getLexer().getTok(); would IMHO be more readable and in line with the style used elsewhere in LLVM. | |
934 | const MCExpr *V = | |
lib/Target/RISCV/RISCVInstrFormats.td | ||
111 | It would be good to add a comment here to indicate that these are used for load/store to a symbol address. |
I believe the pre-requisite patches for this are now committed, so please go ahead and commit once you've addressed the minor nits in my previous review. Thanks!
Should be Tok and I don't think this is a good case for using auto (see https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable). AsmToken Tok = getLexer().getTok(); would IMHO be more readable and in line with the style used elsewhere in LLVM.