This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add "lla" pseudo-instruction to assembler
ClosedPublic

Authored by rogfer01 on Jul 23 2018, 3:46 AM.

Details

Summary

This pseudo-instruction is similar to la but uses PC-relative addressing unconditionally. This is, la is only different to lla when using -fPIC. This pseudo-instruction seems often forgotten in several specs but it is definitely mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both in page 37 of the "RISC-V Reader" book but also in function macro found in gas/config/tc-riscv.c.

This is a very first step towards adding PIC support for Linux in the RISC-V backend.

The lla pseudo-instruction expands to a sequence of auipc + addi with a couple of pc-rel relocations where the second points to the first one. This is described in https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses

For now, this patch only introduces support of that pseudo instruction at the assembler parser.

As part of the remaining PIC changes and to avoid code duplication between the codegen flow and the assembler flow, I plan to lower PC-rel addressing (the one that does not use the GOT, e.g. loading a file-scope static variable) into a PseudoLLA. But this means the expansion of the pseudo-instruction still has to happen elsewhere for the codegen flow. Again to avoid code duplication, my idea was to introduce a RISCVELFStreamer which knows how to expand the pseudo-MC-instruction when generating an object (I'm a bit worried though that I might be tying this to ELF). I'd do a similar thing with la (but that will require some more stuff not in upstream yet).

I considered using MCCodeEmitter as we do now for PseudoCALL and PseudoTAIL but I don't think this can work as I need to emit a label, something that the MCCodeEmitter doesn't seem (to the best of my knowledge) able to do.

Hope my plan above makes sense.

Diff Detail

Event Timeline

rogfer01 created this revision.Jul 23 2018, 3:46 AM
asb added a comment.Jul 26 2018, 5:09 AM

Thanks for the patch, just a few minor comments in-line.

Could you add a test that shows non-bare symbols are rejected? e.g. lla a0, %lo(sym)?

This will fail to parse lla where the symbol coincides with a register name. Take a look at 'ForceImmediate' as used when parsing call and tail - you'll need to do something similar here. Due to the lack of clear separation between the register and symbol name namespaces for these pseudo-instructions, I think this kind of hack in the parser is the only way to go.

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1192

Could you make the comment more descriptive? e.g. explaining the load local address pseudo-instruction.

1214–1216

clang-format prefers slightly different indentation here

lib/Target/RISCV/RISCVInstrInfo.td
715–717

It would be slightly more consistent with other instruction definitions to write this as:

def PseudoLLA : Pseudo<(outs GPR:$dst), (ins bare_symbol:$src), [],
                       "lla", "$dst, $src">;
rogfer01 updated this revision to Diff 157647.Jul 26 2018, 11:52 PM
rogfer01 marked 3 inline comments as done.

Thanks a lot for the review @asb !

ChangeLog:

  • Refactor a bit ForceImmediate flag to include lla in the second operand
  • Extend test to include symbols named like registers
  • New negative test for non-bare symbols
  • Addressed code consistency issues. Update comment.
asb added a comment.Aug 2 2018, 6:25 AM

Thanks for the update. The helper function for force immediate is a good idea - just a couple of minor comments there.

It would be nice if there was a sensible way to merge the tests in to rvi-aliases-valid.s and a new rvi-aliases-invalid.s, but I see that CHECK-EXPAND (as used by li) isn't appropriate for lla, because of course we won't see %pcrel_hi/%pcrel_lo in the objdump output.

With the suggested tweaks to the helper function I think this is good to go. Thanks!

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
969

I think we'd benefit from a comment here describing the function. How about:

Return true if the operand at the OperandIdx for opcode Name should be 'forced' to be parsed as an immediate. This is required for pseudoinstructions such as tail or call, which allow bare symbols to be used that could clash with register names.

970

Function names should start with a lower-case letter. ParseInstruction etc don't do this as they override an existing method, which doesn't conform with this naming convention. https://llvm.org/docs/CodingStandards.html#id42. Maybe shouldForceImediateOperand would be better given the bool return type?

rogfer01 updated this revision to Diff 158938.Aug 3 2018, 12:55 AM
rogfer01 marked 2 inline comments as done.

Thanks @asb for the review!

ChangeLog:

  • Rename ForceImmediateOperandshouldForceImediateOperand.
  • Add comment to shouldForceImediateOperand
  • Fix mayLoad it can be mayLoad = 0 because this is like li, no memory is actually read.
  • Update comments to avoid suggesting that lla loads memory.
asb accepted this revision.Aug 8 2018, 7:35 AM

Thanks, this looks good to me.

This revision is now accepted and ready to land.Aug 8 2018, 7:35 AM

Thanks for the review @asb . I will commit soon.

This revision was automatically updated to reflect the committed changes.