This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add diff relocation support for RISC-V
ClosedPublic

Authored by simoncook on Apr 2 2018, 1:24 PM.

Details

Summary

For RISC-V it is desirable to have relaxation happen in the linker once addresses are known, and as such the size between two instructions/byte sequences in a section could change.

For most assembler expressions, this is fine, as the absolute address results in the expression being converted to a fixup, and finally relocations. However, for expressions such as .quad .L2-.L1, the assembler folds this down to a constant once fragments are laid out, under the assumption that the difference can no longer change, although in the case of linker relaxation the differences can change at link time, so the constant is incorrect. One place where this commonly appears is in debug information, where the size of a function expression is in a form similar to the above.

This patch extends the assembler to allow an AsmBackend to declare that it does not want the assembler to fold down this expression, and instead generate a pair of relocations that allow the linker to carry out the calculation. In this case, the expression is not folded, but when it comes to emitting a fixup, the generic FK_Data_* fixups are converted into a pair, one for the addition half, one for the subtraction, and this is passed to the relocation generating methods as usual. I have named these FK_Data_Add_* and FK_Data_Sub_* to indicate which half these are for.

For RISC-V, which supports this via e.g. the R_RISCV_ADD64, R_RISCV_SUB64 pair of relocations, these are also set to always emit relocations relative to local symbols rather than section offsets. This is to deal with the fact that if relocations were calculated on e.g. .text+8 and .text+4, the result 12 would be stored rather than 4 as both addends are added in the linker.

Diff Detail

Repository
rL LLVM

Event Timeline

simoncook created this revision.Apr 2 2018, 1:24 PM
mgrang added a subscriber: mgrang.Apr 4 2018, 5:13 PM
apazos added inline comments.Apr 4 2018, 6:01 PM
include/llvm/MC/MCFixup.h
48 ↗(On Diff #140678)

Comments need to be updated:
FK_Data_Add_8 -> A eight byte..
FK_Data_Sub_1 -> A one-byte..

lib/MC/MCAssembler.cpp
686 ↗(On Diff #140678)

Comment update:
one for the add, and one for the addition, -> one for the add, and one for the sub,

lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
39 ↗(On Diff #140678)

Should this setting be conditional on the relax flag?
The relax flag is being introduced in https://reviews.llvm.org/D44888

asb added a comment.Apr 5 2018, 4:05 AM

Thanks for this Simon.

Perhaps requiresDiffExpressionRelocations or similar might be better than supportsEmittingDiffExpressions? As Ana says, you might want to stick with the current behaviour when -mno-relax is set.

Just to make sure I understand the scope of this patch properly:

  • As far as you currently understand, does this cover all cases where constants are emitted where relocations are needed in the presence of linker relaxation. i.e. are there other known cases where you have planned follow-ups?
  • Does anyone know of other backends that could be affected by this issue? Linker relaxation seems to used somewhat sparingly as far as I can see, with targets like ARM preferring the less invasive 'range thunks'.
lib/MC/MCAssembler.cpp
684 ↗(On Diff #140678)

different -> difference

asb added a comment.Apr 5 2018, 4:12 AM

From a quick look in binutils, it seems micromips could be tripped up by similar issues.

mgrang added inline comments.Apr 5 2018, 8:13 AM
lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
40 ↗(On Diff #140678)

Remove empty line.

simoncook updated this revision to Diff 141454.Apr 6 2018, 4:56 PM
simoncook edited the summary of this revision. (Show Details)

I've updated the patch to respond to comments, and deal with the couple of test failures I was seeing.

The only one I think may need some thoughts on is the changes I made to hilo-constaddr.s: Using %lo on an expression won't work in this case, gas doesn't accept it as input, and in the relaxation case I don't believe there are RISC-V relocations that could represent such an expression. I've split out the test case into two files, one for the working case, and one for the input I would consider invalid.

Does this seem reasonable, I think we might want to reject this all the time, and not just when linker relaxation is enabled. Otherwise I think it could cause confusion that some expressions are accepted by the assembler only when relaxation is on. (For reference, this is the change in RISCVMCExpr.cpp) Any thoughts on this?

Alex, as far as I'm aware, this covers all cases where constants are emitted from what I've seen so far.

Ana, we can tie this to linker relaxation, looking at D44886 which I think adds the MC part of conditional relaxation, my change in the AsmBackend can be easily swapped from return true to querying the SubTargetInfo for return enableLinkerRelax() when that lands.

simoncook marked 4 inline comments as done.Apr 6 2018, 4:57 PM
simoncook updated this revision to Diff 141458.Apr 6 2018, 5:09 PM

The previous diff had one change missing from my original patch, that is now restored.

simoncook updated this revision to Diff 141512.Apr 7 2018, 2:16 PM
simoncook retitled this revision from [RISCV WIP] Add diff relocation support for RISC-V to [RISCV] Add diff relocation support for RISC-V.

Take part of test I removed as it now produces and error, and turn it into an explicit test for an error message.

asb added a comment.Apr 12 2018, 6:26 AM

I think this patch and its tests would be best if based on D44886, so it can be clear which behaviour is conditional on linker relaxation being enabled.

I'd be happy for hilo-constaddr-invalid.s to always be rejected. It never worked in gas, and we can always support it for builds without linker relaxation in the future. Perhaps add a note to the test file to document that we're unconditionally rejecting for now for consistency, but that it would be easy in the future to support when linker relaxation is disabled.

lib/MC/MCAssembler.cpp
685 ↗(On Diff #141512)

indiciated -> indicated

simoncook updated this revision to Diff 142607.Apr 16 2018, 3:29 AM

I've rebased the change on top of D44886 to indicate what is conditional based on linker relaxation.

I haven't made hilo-constaddr-invalid.s rejected unconditionally. I took a look at how to do this, and it seems it's more difficult to do that (considering fixup types on parent expressions when evaluating subexpressions) than it may be worth it, than to have this accepted based on the linker flag. If the latter is the behavior we want in the end, we already have that and I propose leaving it this way around.

simoncook marked 2 inline comments as done.Apr 16 2018, 3:29 AM
asb added inline comments.Apr 16 2018, 8:10 AM
test/MC/RISCV/hilo-constaddr.s
3 ↗(On Diff #142607)

We shouldn't lose the CHECK-REL check from this file, and probably would want to keep the check for the difference of two labels.

asb added a subscriber: sdardis.Apr 17 2018, 5:59 AM

CC @sdardis - micromips seems to use aggressive linker relaxation in binutils ld meaning this may have some current or future relevance to the Mips LLVM backend.

simoncook updated this revision to Diff 143061.Apr 19 2018, 2:05 AM

I've readded the reloc check line to hilo-constaddr.s and renamed hilo-constaddr-invalid.s to hilo-constaddr-expr.s since it is no longer checking something that is always invalid.

simoncook marked an inline comment as done.Apr 19 2018, 2:05 AM
simoncook added inline comments.May 3 2018, 8:51 AM
lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
42 ↗(On Diff #143061)

I've been doing some additional thinking on this, and I don't know if having this depend on the relax feature bit might introduce a problem going into the future.

Looking at D45864, the way .option rvc and .option norvc are implemented is they add and remove feature bits. If in the future we support .option norelax through the same approach we can end up with a situation where we end up with correct DWARF depending on which of these two is used last.

If we want to do this unconditionally, then we need to track whether feature relax was ever enabled, possibly via a second feature bit that is set by .option relax but not cleared by .option norelax and have return true if either is set. Does this seem like a good approach?

asb added inline comments.May 3 2018, 9:20 AM
lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
42 ↗(On Diff #143061)

I could see that argument. I could certainly see a counter-argument that says that if the user uses .option relax and .option norelax in the same section then they're doing something wrong and if things break they get to keep both pieces.

Maybe an alternative is to warn if relax/norelax is switched within the same section?

shiva0217 added inline comments.May 10 2018, 11:44 PM
lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
42 ↗(On Diff #143061)

The array value in the following case will always generate a fixup instead of absolute value after the commit.

commit 4d344626681a31cd8993f97eceb68419bfcc4c54
Author: Nirav Dave <niravd@google.com>
AuthorDate: Mon Apr 30 19:22:40 2018 +0000
Commit: Nirav Dave <niravd@google.com>
CommitDate: Mon Apr 30 19:22:40 2018 +0000
[MC] Change AsmParser to leverage Assembler during evaluation

int foo (int a)
{
  static const short ar[] = { &&l1 - &&l1, &&l2 - &&l1 };
  void *p = &&l1 + ar[a];
  goto *p;
 l1:
  return 1;
 l2:
  return 2;
}

We probably have two options.

  1. requiresDiffExpressionRelocations() always return true to emit ADD16 and SUB16 relocations.
  2. Fix up the above commit and then create a new patch to force this case to emit a relocation type when requiresDiffExpressionRelocations return true.

Any thoughts?

simoncook added inline comments.May 11 2018, 4:36 AM
lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
42 ↗(On Diff #143061)

I'm not quite sure what you mean. Compiling this test case I see ADD16 and SUB16 relocations in the table when relaxation is enabled, which I would expect. I did notice clang crashes assembling this case with relaxation disabled, I've submitted a fix to this as D46746, was it this crash you were referring to/the fixup being applied there is fine?

shiva0217 added inline comments.May 13 2018, 10:18 PM
lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.cpp
42 ↗(On Diff #143061)

Hi Simon. Yes, clang crashes with relaxation disabled in this case. I thought we could let requiresDiffExpressionRelocations always return true to fix this case. But I think https://reviews.llvm.org/D46746 would be a better fixed. Thanks. Another observation is that before the "[MC] Change AsmParser to leverage Assembler" commit,
The case will insert absolute value instead of generating FK_DATA_2 fixup. So the case didn't fail. Anyway, we could apply https://reviews.llvm.org/D46746 on current trunk to pass this case.

asb added a comment.May 16 2018, 9:28 PM

Thanks Simon, I think this is in pretty good shape. I've added a few comments on code style and expanding the tests. The only part of the patch I'm unsure about is the reference to InSet, so if you can clarify I'd really appreciate it.

include/llvm/MC/MCFixup.h
101 ↗(On Diff #143061)

autobrief is enabled in the LLVM Doxygen config so \brief isn't needed in any of these newly added functions.

142–165 ↗(On Diff #143061)

LLVM coding style is not to indent case labels (see getKindForSize).

lib/MC/MCAssembler.cpp
699–701 ↗(On Diff #143061)

This comment applies to both the if and else branches it should be brought up to just below if (!IsResolved) { and possibly given a minor rephrasing to mention the requiresDiffExpressionRelocation case.

702 ↗(On Diff #143061)

Given that the if() has a brace, the else should probably have one too. [Note: personal preference - can't see guidance one way or another in the LLVM coding guidelines).

lib/MC/MCExpr.cpp
574 ↗(On Diff #143061)

Why is InSet needed in this if statement? That's a genuine question rather than a suggestion it's wrong - the meaning of InSet is unclear.

test/MC/RISCV/fixups-expr.s
5 ↗(On Diff #143061)

Could you please make this test more exhaustive:

  1. Add riscv64 check lines
  2. Check R_RISCV_{ADD,SUB}{8,16,32} as well
test/MC/RISCV/hilo-constaddr-expr.s
1 ↗(On Diff #143061)

Could you add a RUN line for when relax is disabled, and adapt the CHECK lines that were previously there when this was in hilo-constaddr.s for this example?

asb added inline comments.May 16 2018, 11:04 PM
test/MC/RISCV/fixups-expr.s
5 ↗(On Diff #143061)

I committed rL332574 which adds support for the same data directive names used by binutils, so you should be able to use .word etc

simoncook updated this revision to Diff 148007.May 22 2018, 8:05 AM

Rebased on top of tree, and applied changes as per Alex's review.

Regarding InSet, I've tried to put a comment that explains its usage, but I'm not sure if it explains why it's there too well (or whether this is really should have a FIXME: The use of InSet = SymbolSize is a hack, similiar to evaluateAsAbsolute (line 461) ).

As far as I can tell, InSet is used as a flag to determine whether to allow expressions to evaluate as an absolute across sections (as this is otherwise an error), and the ELF object emitter re-uses this when emitting the size of a symbol in the symbol table (this seems the most reliable way of identifying an explicit "IsSymbolSize" through the expression evaluator).

The way our linker relaxtion works is that the size in the symbol table is a constant which represents the current symbol size, and the linker adjusts this before emitting its final binary based on how much relaxation occurs. Without this change we would emit a symbol size of zero, and a pair of relocations in the .symtab section, which I don't think the linker understands, or is able to support easily (if at all). Using this approach generates the binary we would expect.

asb accepted this revision.May 23 2018, 5:30 AM

Thanks Simon. A few minor nitpicks, but this looks good to me.

You forgot to include full context in the updated patch version.

include/llvm/MC/MCFixup.h
112 ↗(On Diff #148007)

This \brief still needs removing

lib/MC/MCAssembler.cpp
739–740 ↗(On Diff #148007)

Should be on a single line } else {

lib/MC/MCExpr.cpp
583 ↗(On Diff #148007)

reloations -> relocations

Nitpick: end with full stop

This revision is now accepted and ready to land.May 23 2018, 5:30 AM
This revision was automatically updated to reflect the committed changes.

I don't like resurrecting old reviews. This is just a warning in advance of a future issue we will have with the RISC-V backend when exception handling is enabled (it is not yet upstream).

DwarfDebug is using AsmPrinter::EmitLabelDifferenceAsULEB128 which ends calling MCObjectStreamer::emitAbsoluteSymbolDiffAsULEB128 which calls MCObjectStreamer::absoluteSymbolDiff. That last function checks MCAsmBackend::requiresDiffExpressionRelocations to tell if it can simplify the label difference.

It happens that the function AsmPrinter::EmitLabelDifferenceAsULEB128 is also used by EHStreamer.

When relaxation is enabled, we end with an invalid .eh_frame section. GNU ld rejects those and so do tools like llvm-objdump --dwarf=frames.

After some analysis, I think the reason why the .eh_frame is rejected is because, some of the label differences must be resolved in the final object. At least for the Length field (I'm using this reference: https://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-PDA/LSB-PDA/ehframechpt.html) but maybe other values currently emitted as label differences by EHStreamer too (I'm not well versed in this topic).

Now, the RISC-V backend returns true for MCAsmBackend::requiresDiffExpressionRelocations when relaxation is enabled. This means no label difference is folded in .eh_frame. Instead we end with a field Length that is just emitted as zeroes and should be patched up by a relocation (emitted in .rela.eh_frame), but the code that looks at the .eh_frame section does not seem to care with relocations in that field.

@simoncook @asb perhaps you were already aware of this issue?

Thank you!

Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2019, 6:18 AM

The problem is that Length is the length of the CIE, i.e. the data structure itself; this means it isn't subject to relaxations and so the difference expression can be folded. I don't see much merit in teaching all the tools out there to check relocations for Length when we can instead just emit the "right" thing. I think the fix is to make sure we only return true from requiresDiffExpressionRelocations if a target is in a code section? MCSection::UseCodeAlign will tell you this (or you can use MCSectionELF::getFlags() & ELF::SHF_EXECINSTR to be less general), though perhaps it should be renamed to isCodeSection to match isVirtualSection.

asb added a comment.Apr 17 2019, 6:55 AM

Thanks for the report Roger. I agree with @jrtc27's analysis, it makes sense to limit requiresDiffExpressionRelocations to returning true only for code sections.

James, Alex: Thanks a lot for the comments. I'll look into what I can do here.

James, Alex: Thanks a lot for the comments. I'll look into what I can do here.

This issue also affects the debug information in .debug_info. Relocations (R_RISCV_ADD32 and R_RISCV_SUB32) are emitted for DWARF's compilation unit header Length field. While binutils accepts that, LLVM does not (and it's not trivial to fix that because DWARFContext currently assumes that only one relocation would exist per offset, while these label differences require two), which means that tools like llvm-objdump and llvm-dwarfdump don't correctly parse the debug information. Returning false in requiresDiffExpressionRelocations does not suffice to solve that issue because an absolute R_RISCV_32 relocation is still emitted.

@rogfer01 Are you still looking into this? My efforts to fix the problem with the debug information would also touch this. How do you want to proceed?

@rogfer01 Are you still looking into this? My efforts to fix the problem with the debug information would also touch this. How do you want to proceed?

I made some changes in my downstream so things link successfully but I'm still seeing issues with exceptions at runtime. So nothing very useful at the moment for upstream, feel free to make changes here.

@rogfer01 Are you still looking into this? My efforts to fix the problem with the debug information would also touch this. How do you want to proceed?

I made some changes in my downstream so things link successfully but I'm still seeing issues with exceptions at runtime. So nothing very useful at the moment for upstream, feel free to make changes here.

I added a flag to MCExpr to decide to resolve the symbol difference or not. If the flag is set, the symbol difference will be resolved even when relaxation is enabled. I upstream the idea to D61584. If it is acceptable to add flags to MCExpr, I think it is a solution for the problem. Any suggestions?

By the way, I also found another problem when dealing with exception handling table. The "call site encoding" is uleb128 by default in LLVM. However, LLVM does not support variable length encoding in symbol difference relocations in current implementation. So, when relaxation is enabled, I think it is simpler to use udata4 as "call site encoding".

The problem is that Length is the length of the CIE, i.e. the data structure itself; this means it isn't subject to relaxations and so the difference expression can be folded. I don't see much merit in teaching all the tools out there to check relocations for Length when we can instead just emit the "right" thing. I think the fix is to make sure we only return true from requiresDiffExpressionRelocations if a target is in a code section? MCSection::UseCodeAlign will tell you this (or you can use MCSectionELF::getFlags() & ELF::SHF_EXECINSTR to be less general), though perhaps it should be renamed to isCodeSection to match isVirtualSection.

Use SHF_EXECINSTR flag may be not enough. Several debug information sections also need relocations. For example, although it has no need to generate relocations for Length in FDE, it needs relocations for Address Range in FDE. These two attributes are located in the same section and the section has no SHF_EXECINSTR flag.

Use SHF_EXECINSTR flag may be not enough. Several debug information sections also need relocations. For example, although it has no need to generate relocations for Length in FDE, it needs relocations for Address Range in FDE. These two attributes are located in the same section and the section has no SHF_EXECINSTR flag.

I think the point is to check for the SHF_EXECINSTR flag *not* in the section where the value is going to be generated but for the sections of the referenced symbols. If the symbols are in a code section then their addresses can change due to the relaxations. Since relaxations only occur in code sections (right?) then this isn't an issue for symbols in non-code sections, and therefore you can evaluate the fixup and avoid emitting a relocation.