Download Raw Diff

Details

Reviewers

probinson
dblaikie
aprantl
rogfer01
jrtc27
asb

Commits

rGc5ecdd3c5a98: [DebugInfo] Some fields do not need relocations even relax is enabled.
rL366531: [DebugInfo] Some fields do not need relocations even relax is enabled.

Summary

In debug frame information, some fields, e.g., Length in CIE/FDE and
Offset in FDE are attributes to describe the structure of CIE/FDE. They
are not related to the relaxed code. However, these attributes are
symbol differences. So, in current design, these attributes will be
filled as zero and LLVM generates relocations for them.

Diff Detail

Repository: rL LLVM

Event Timeline

HsiangKai created this revision.May 6 2019, 12:34 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2019, 12:34 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

HsiangKai mentioned this in D45181: [RISCV] Add diff relocation support for RISC-V.May 6 2019, 12:46 AM

I don't claim to be an MCExpr expert, but I think this should not be necessary. There are other situations where symbol differences are computed and emitted as constants, for example when DW_AT_high_pc is a length rather than an address, and they clearly don't require this tactic.

In D61584#1491778, @probinson wrote:

I don't claim to be an MCExpr expert, but I think this should not be necessary. There are other situations where symbol differences are computed and emitted as constants, for example when DW_AT_high_pc is a length rather than an address, and they clearly don't require this tactic.

This patch is provided for the issue discussed in https://reviews.llvm.org/D45181#1470143.
When relaxation is enabled, linker may modify the code sequence according to relocation types. So, DW_AT_high_pc will be changed even it is used as a length instead of an address.

In current implementation, LLVM will generate relocations for ALL binary expressions if relaxation is enabled. However, not all binary expressions need relocations even relaxation is enabled. This attributes in MCExpr is designed for the values which are not expected to be changed after relaxation.

If the flag in MCExpr is an acceptable way to solve the problem, I will refine the patch, e.g., add test cases, until it is ready to land.

Thanks for the reference, that helps some.
However I still think this isn't the right way to go about fixing the problem. This started because RISC-V decided not to resolve certain kinds of expressions, and then it turns out that applies to too many expressions. Somebody suggested that the asm backend decision should be more refined, based on symbol section perhaps, and that feels like a more principled way to go about it.

rogfer01 added a subscriber: rogfer01.May 8 2019, 12:49 PM

dblaikie added a subscriber: dblaikie.May 13 2019, 8:12 AM

In D61584#1493600, @probinson wrote:

Thanks for the reference, that helps some.
However I still think this isn't the right way to go about fixing the problem. This started because RISC-V decided not to resolve certain kinds of expressions, and then it turns out that applies to too many expressions. Somebody suggested that the asm backend decision should be more refined, based on symbol section perhaps, and that feels like a more principled way to go about it.

I don't think to refine the backend decision is a solution for the problem. I have explained the reason in https://reviews.llvm.org/D45181#1495288.

In` EmitFDE` example, Length value will be converted to a Fixup. It is reasonable to convert the expression to fixup due to the symbol values are unknown then.

When finalization, these fixups will be converted to relocations if relaxation is enabled. However, there is no way to decide to generate relocations or generate absolute value in MCAssembler::evaluateFixup(). There is no information to distinguish these two cases.

The call stack when evaluating Length in FDE is

MCExpr::evaluateAsAbsolute()
MCObjectStreamer::EmitValueImpl()  <= generate MCFixup
MCELFStreamer::EmitValueImpl()
MCStreamer::EmitValue()
emitAbsValue()

Another way to solve the problem is to modify the interfaces of emitAbsValue, EmitValue, and EmitValueImpl. Carry the information to MCObjectStreamer::EmitValueImpl and attach the information to MCFixup when generating fixups.

Attach a test case.

Herald added subscribers: jocewei, PkmX, the_o and 12 others. · View Herald TranscriptMay 13 2019, 7:37 PM

HsiangKai added reviewers: probinson, dblaikie, aprantl.May 13 2019, 7:43 PM

Add a flag to MCFixup instead of MCExpr.

Herald added subscribers: atanasyan, aheejin, sbc100 and 2 others. · View Herald TranscriptMay 14 2019, 5:44 AM

It still bothers me that MCDwarf has to know exactly when to override the target's decision to do relaxation.

Why is it appropriate to do that in the two methods where you have made that change in this patch, and not in many other places within MCDwarf that call the same helper? What guides me to know it's correct in the first three calls to emitAbsValue within FrameEmitterImpl::EmitFDE, but not in the fourth? Without becoming an expert in all the fine details of every target.

include/llvm/MC/MCFixup.h
92 ↗	(On Diff #199412)	"Require the assembler to evaluate the fixup." (It is going to become an absolute value regardless, in the contexts where this is set to true; the only question is where the evaluation happens.)
include/llvm/MC/MCStreamer.h
621 ↗	(On Diff #199412)	Need to add Fixed to the doxygen comment for this method.

In D61584#1503168, @probinson wrote:

It still bothers me that MCDwarf has to know exactly when to override the target's decision to do relaxation.

Why is it appropriate to do that in the two methods where you have made that change in this patch, and not in many other places within MCDwarf that call the same helper? What guides me to know it's correct in the first three calls to emitAbsValue within FrameEmitterImpl::EmitFDE, but not in the fourth? Without becoming an expert in all the fine details of every target.

It is not about target's decision. It is about whether the symbol difference will be changed after linking. For example, length field in FDE is used to describe the size of this FDE structure. It will not be affected by linker even relaxation is enabled. However, address_range in FDE is used to describe the range of instructions related to this FDE. If the linker modifies the instructions in the range, address_range will need be updated by linker accordingly. That's why it needs to set Fixed flag in length field instead of address_range field in FDE.

I will review other uses of emitAbsValue to ensure the flag is set correctly.

Set Fixed flag to length of debug_info and debug_line. These fields should be fixed even the linker has modified the instructions in the program.

Update comments.

Check Fixup is nullptr or not before calling its member functions.

Why do we need this flag; can we not already determine this (either in generic code or the target backends)? Anything where all symbols referenced are not in SHF_EXECINSTR sections can be evaluated even with relaxations, which covers all these cases:

MCDwarfLineTableHeader::Emit's LineStartSym/LineEndSym/ProEndSym are in .debug_line.dwo which is just SHF_EXCLUDE
EmitGenDwarfInfo's InfoStart/InfoEnd are in .debug_info.dwo which is just SHF_EXCLUDE
FrameEmitterImpl::EmitCIE's sectionStart/sectionEnd are either in .eh_frame which is SHF_ALLOC (and also SHF_WRITE on non-x86 Solaris), or in .debug_frame which has no flags
FrameEmitterImpl::EmitFDE's fdeStart/fdeEnd/cieStart/SectionStart should all also be in the same section as above

In D61584#1504553, @jrtc27 wrote:

Why do we need this flag; can we not already determine this (either in generic code or the target backends)? Anything where all symbols referenced are not in SHF_EXECINSTR sections can be evaluated even with relaxations, which covers all these cases:

MCDwarfLineTableHeader::Emit's LineStartSym/LineEndSym/ProEndSym are in .debug_line.dwo which is just SHF_EXCLUDE

EmitGenDwarfInfo's InfoStart/InfoEnd are in .debug_info.dwo which is just SHF_EXCLUDE

FrameEmitterImpl::EmitCIE's sectionStart/sectionEnd are either in .eh_frame which is SHF_ALLOC (and also SHF_WRITE on non-x86 Solaris), or in .debug_frame which has no flags

FrameEmitterImpl::EmitFDE's fdeStart/fdeEnd/cieStart/SectionStart should all also be in the same section as above

I got it. I missed the information in MCSymbol. Every symbol should have information about the section it is located. So, we could use something like sym.getSection().getKind().isText() to determine the symbol is located in text or not. You are right. I will abandon this revision. Thanks.

Fix the bug according to @jrtc27's suggestions.

edward-jones added a child revision: D63409: [RISCV] Specify various encodings for DWARF exception handling.Jun 17 2019, 4:39 AM

edward-jones added inline comments.Jun 19 2019, 7:03 AM

lib/MC/MCExpr.cpp
610	I found a bug with these latest changes when compiling the following program: func: .cfi_startproc .cfi_endproc With the command: `llvm-mc -filetype=obj tmp.s -o tmp.o -triple riscv32` The calls to `...->getSymbol().getSection()` can trigger an assertion if the symbol is not defined in a section, which seems to happen in the above testcase.

Just a quick comment, I'm assuming this depends on D58335 based on the test file you've modified?

Fix bugs.

In D61584#1550273, @lewis-revill wrote:

Just a quick comment, I'm assuming this depends on D58335 based on the test file you've modified?

Yes, the test case is based on D58335.

lib/MC/MCExpr.cpp
610	Thank you for your information. I have fixed the bug. Please help me to review it and verify the solution. Thanks.

edward-jones added inline comments.Jun 25 2019, 8:06 AM

lib/MC/MCExpr.cpp
610	Yep, this all works for me now. Looks good!

lenary added a parent revision: D58335: [DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame..Jul 5 2019, 7:58 AM

Herald added a subscriber: MaskRay. · View Herald TranscriptJul 5 2019, 7:58 AM

I added D58335 as a parent revision, as the relax-debug-frame.ll file is created in that revision, and this patch does not apply without it.

edward-jones added a child revision: D63417: [RISCV] Specify registers used for exception handling.Jul 8 2019, 8:49 AM

edward-jones removed a child revision: D63417: [RISCV] Specify registers used for exception handling.

edward-jones added a child revision: D63404: [RISCV] Don't force absolute FK_Data_X fixups to relocs.

edward-jones mentioned this in D63404: [RISCV] Don't force absolute FK_Data_X fixups to relocs.Jul 8 2019, 9:01 AM

Does this solution extend/address other label deltas, like those that would appear in the other DWARF/non-frame sections? (eg: currently we use a length in the DWARF high_pc because it means not having to use a relocation for high_pc, which reduces object size - so, does this fix cause relocations to be used for that length expression to ensure they're correct if the linker changes the instructions in that range?)

In D61584#1574203, @dblaikie wrote:

Does this solution extend/address other label deltas, like those that would appear in the other DWARF/non-frame sections? (eg: currently we use a length in the DWARF high_pc because it means not having to use a relocation for high_pc, which reduces object size - so, does this fix cause relocations to be used for that length expression to ensure they're correct if the linker changes the instructions in that range?)

After D45181 committed, LLVM will generate relocation pairs for symbol difference expressions if target enables relaxation. So, it will generate relocations for high_pc due to it is a symbol difference expression of end symbol and start symbol of the function range. Such as,

.word   .Lfunc_begin0           # DW_AT_low_pc
.word   .Lfunc_end4-.Lfunc_begin0 # DW_AT_high_pc

This commit modified the condition to generate relocations for symbol difference expressions. Not all symbol difference expressions are related to executable code. However, only executable code will be modified by linker for relaxation. So, we do not need to generate relocations for symbol differences not located in executable code sections.

In D61584#1579860, @HsiangKai wrote:
In D61584#1574203, @dblaikie wrote:

Does this solution extend/address other label deltas, like those that would appear in the other DWARF/non-frame sections? (eg: currently we use a length in the DWARF high_pc because it means not having to use a relocation for high_pc, which reduces object size - so, does this fix cause relocations to be used for that length expression to ensure they're correct if the linker changes the instructions in that range?)

After D45181 committed, LLVM will generate relocation pairs for symbol difference expressions if target enables relaxation. So, it will generate relocations for high_pc due to it is a symbol difference expression of end symbol and start symbol of the function range. Such as,
.word   .Lfunc_begin0           # DW_AT_low_pc
.word   .Lfunc_end4-.Lfunc_begin0 # DW_AT_high_pc
This commit modified the condition to generate relocations for symbol difference expressions. Not all symbol difference expressions are related to executable code. However, only executable code will be modified by linker for relaxation. So, we do not need to generate relocations for symbol differences not located in executable code sections.

Good to know - so has LLVM been creating incorrect DWARF in these cases so far - or has no target enabled linker relaxation yet?

Is there any way to make this more targeted - does linker relaxation allow arbitrary machine code changes? Or is there something more fine-grained - so we could compute compile-time constants for some label differences, but only use linker relocations for cases where we know the instruction sequence includes something relaxable?

In D61584#1579871, @dblaikie wrote:
In D61584#1579860, @HsiangKai wrote:
In D61584#1574203, @dblaikie wrote:

Does this solution extend/address other label deltas, like those that would appear in the other DWARF/non-frame sections? (eg: currently we use a length in the DWARF high_pc because it means not having to use a relocation for high_pc, which reduces object size - so, does this fix cause relocations to be used for that length expression to ensure they're correct if the linker changes the instructions in that range?)

After D45181 committed, LLVM will generate relocation pairs for symbol difference expressions if target enables relaxation. So, it will generate relocations for high_pc due to it is a symbol difference expression of end symbol and start symbol of the function range. Such as,
.word   .Lfunc_begin0           # DW_AT_low_pc
.word   .Lfunc_end4-.Lfunc_begin0 # DW_AT_high_pc
This commit modified the condition to generate relocations for symbol difference expressions. Not all symbol difference expressions are related to executable code. However, only executable code will be modified by linker for relaxation. So, we do not need to generate relocations for symbol differences not located in executable code sections.
Good to know - so has LLVM been creating incorrect DWARF in these cases so far - or has no target enabled linker relaxation yet?

Is there any way to make this more targeted - does linker relaxation allow arbitrary machine code changes? Or is there something more fine-grained - so we could compute compile-time constants for some label differences, but only use linker relocations for cases where we know the instruction sequence includes something relaxable?

AFAIK only RISC-V enables this feature. Currently, .debug_frame/.eh_frame is wrong after relaxation. I already fixed parts of the problem in D58335. In D58335, .debug_frame/.eh_frame will generate relocations if relaxation is enabled. Otherwise, it will not aware relaxation at all and it doesn't work to do frame inspection when debugging.

Linker relaxation processes some patterns of instruction sequences. For example, RISC-V could use auipc/jalr to jump to 32-bits pc-relative address. If linker knows the jump target is within +/-1 Mib, it could replace auipc/jalr to jal. The replacement will shorten the instruction sequence. So, the label differences between these instruction sequences may be changed after relaxation. That is why we need to annotate relocation types for these symbol differences in code sections.

Currently, there is no way to make it more targeted. When relaxation is enabled, it assumes all code sequences may be changed after relaxation. If we want to make placement of relocations more specific, we need to inspect every instruction between two symbols when we create symbol difference expressions for them. I think it will do a lot of work to gain few benefits. (Fewer relocations, smaller object code, less linker efforts, etc.)

In D61584#1579992, @HsiangKai wrote:
In D61584#1579871, @dblaikie wrote:
In D61584#1579860, @HsiangKai wrote:
In D61584#1574203, @dblaikie wrote:

Does this solution extend/address other label deltas, like those that would appear in the other DWARF/non-frame sections? (eg: currently we use a length in the DWARF high_pc because it means not having to use a relocation for high_pc, which reduces object size - so, does this fix cause relocations to be used for that length expression to ensure they're correct if the linker changes the instructions in that range?)

After D45181 committed, LLVM will generate relocation pairs for symbol difference expressions if target enables relaxation. So, it will generate relocations for high_pc due to it is a symbol difference expression of end symbol and start symbol of the function range. Such as,
.word   .Lfunc_begin0           # DW_AT_low_pc
.word   .Lfunc_end4-.Lfunc_begin0 # DW_AT_high_pc
This commit modified the condition to generate relocations for symbol difference expressions. Not all symbol difference expressions are related to executable code. However, only executable code will be modified by linker for relaxation. So, we do not need to generate relocations for symbol differences not located in executable code sections.
Good to know - so has LLVM been creating incorrect DWARF in these cases so far - or has no target enabled linker relaxation yet?

Is there any way to make this more targeted - does linker relaxation allow arbitrary machine code changes? Or is there something more fine-grained - so we could compute compile-time constants for some label differences, but only use linker relocations for cases where we know the instruction sequence includes something relaxable?
AFAIK only RISC-V enables this feature. Currently, .debug_frame/.eh_frame is wrong after relaxation. I already fixed parts of the problem in D58335. In D58335, .debug_frame/.eh_frame will generate relocations if relaxation is enabled. Otherwise, it will not aware relaxation at all and it doesn't work to do frame inspection when debugging.

Linker relaxation processes some patterns of instruction sequences. For example, RISC-V could use auipc/jalr to jump to 32-bits pc-relative address. If linker knows the jump target is within +/-1 Mib, it could replace auipc/jalr to jal. The replacement will shorten the instruction sequence. So, the label differences between these instruction sequences may be changed after relaxation. That is why we need to annotate relocation types for these symbol differences in code sections.

Currently, there is no way to make it more targeted. When relaxation is enabled, it assumes all code sequences may be changed after relaxation. If we want to make placement of relocations more specific, we need to inspect every instruction between two symbols when we create symbol difference expressions for them. I think it will do a lot of work to gain few benefits. (Fewer relocations, smaller object code, less linker efforts, etc.)

Thanks for helping me understand the context/support here.

I'm aware of some folks (should be announced/discussed upstream soon) working on a feature that'd require linker relaxation for x86 - but only under some fairly narrow circumstances, so there would be potentially more benefit to be gained by avoiding these label difference relocations where the address range is known not to be modifiable by the linker.

But all good - not your puzzle to solve, good for me to understand so I can keep it in mind as further work for this x86 situation/project.

LGTM, thanks! Some minor rebasing is needed for relax-debug-frame.ll. Please go ahead and commit.

This revision is now accepted and ready to land.Jul 15 2019, 10:16 PM

Closed by commit rL366531: [DebugInfo] Some fields do not need relocations even relax is enabled. (authored by HsiangKai). · Explain WhyJul 18 2019, 11:12 PM

This revision was automatically updated to reflect the committed changes.

@reames for D150373

For MCObjectStreamer::emitDwarfAdvanceLineAddr and MCObjectStreamer::emitDwarfAdvanceFrameAddr, we have:

fast path: used by assembly input (commit 57ab708bdd3231b23a8ef4978b11ff07616034a2 (2010))
slow path (a relaxable fragment): used by code generation (much more important case)

This piece of code (canFold) blocks label difference folding, so that we will take the slow path.
The code was removed by @compnerd's linker relaxation refactoring D103539 (RISCV: adjust handling of relocation emission for RISCV)

I think we really just want to get rid of the "fast path": D150004.

With both fast path and slow path, we have diverging behaviors for RISC-V: how RISC-V linker relaxation leads to relocations happens in different code paths.

The slow path tends to be tested more and more robust, since it controls direct object code emission (more likely noticed by debuggers, etc), while the fast path is only for assembly output (assembly input files are less common & clang -S a.c; clang -c a.s users are uncommon).

I have tried reading MCExpr.cpp multiple times and I always feel that some parts of it need more love to address some problems (some are documented as FIXME).

(If the fast path is for simple operations like LEB128 (see D150004's analysis), I will more likely accept.
However, I'd prefer that we pay more attention to make the "slow path" faster.)

Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 2:12 PM

Herald added subscribers: luke, • pcwang-thead, frasercrmck and 4 others. · View Herald Transcript

MaskRay mentioned this in D150004: [RISCV][MC] .debug_line/.debug_frame/.eh_frame: emit relocations for assembly input files with relaxation.May 12 2023, 2:37 PM

MaskRay mentioned this in D155357: [RISCV] Allow delayed decision for ADD/SUB relocations.Jul 14 2023, 11:45 PM

MaskRay mentioned this in rGffa829c4c5d1: [RISCV] Allow delayed decision for ADD/SUB relocations.Jul 21 2023, 8:38 AM

This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Some fields do not need relocations even relax is enabled.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 198227

include/llvm/MC/MCExpr.h

lib/MC/MCDwarf.cpp

lib/MC/MCExpr.cpp

This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Some fields do not need relocations even relax is enabled.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 198227

include/llvm/MC/MCExpr.h

lib/MC/MCDwarf.cpp

lib/MC/MCExpr.cpp

[DebugInfo] Some fields do not need relocations even relax is enabled.
ClosedPublic