This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][MC] .debug_line/.debug_frame/.eh_frame: emit relocations for assembly input files with relaxation
ClosedPublic

Authored by MaskRay on May 5 2023, 3:29 PM.

Details

Summary

When assembling .debug_line for both explicitly specified and synthesized
.loc directives. the integrated assembler may incorrectly omit relocations for
-mrelax.

For an assembly file, we have a MCAssembler object and evaluateAsAbsolute
will incorrectly fold AddrDelta to a constant (which is not true in the
presence of linker relaxation).
MCDwarfLineAddr::Emit will emit a special opcode, which does not take into
account of linker relaxation. This is a sufficiently complex function that
I think should be called in any "fast paths" for linker relaxation aware assembling.

The following script demonstrates the bugs.

cat > x.c <<eof
void f();
void _start() {
  f();
  f();
  f();
}
eof
# C to object file: correct DW_LNS_fixed_advance_pc
clang --target=riscv64 -g -c x.c
llvm-dwarfdump --debug-line -v x.o | grep \ DW_LNS_fixed_advance_pc -q

# Assembly to object file with synthesized line number information: incorrect special opcodes
clang --target=riscv64 -S x.c && clang --target=riscv64 -g -c x.s
llvm-dwarfdump --debug-line -v x.o | grep \ DW_LNS_fixed_advance_pc -q; test $? -eq 1

# Assembly with .loc to object file: incorrect special opcodes
clang --target=riscv64 -S -g x.c && clang --target=riscv64 -c x.s
llvm-dwarfdump --debug-line -v x.o | grep \ DW_LNS_fixed_advance_pc -q; test $? -eq 1

The MCDwarfLineAddr::Emit code path is an old optimization in commit
57ab708bdd3231b23a8ef4978b11ff07616034a2 (2010) that seems no longer relevant.
It don't trigger for direct machine code emission (label differences are not
foldable without a MCAssembler). MCDwarfLineAddr::Emit does complex operations
that are repeated in MCAssembler::relaxDwarfLineAddr, which an intricate RISCV
override.

Let's remove the "fast path". Assembling the assembly output of
X86ISelLowering.cpp with -g may be 2% slower, but I think the cost is fine.
There are opportunities to make the "slow path" faster, e.g.

  • Optimizing the current new MC*Fragment pattern that allocates new fragments on the heap.
  • Reducing the number of relaxation times for .debug_line and .debug_frame, as well as possibly other sections using LEB128. For instance, LEB128 can have a one-byte estimate to avoid the second relaxation iteration.

For assembly input with -mno-relax, in theory we can prefer special opcodes to
DW_LNS_fixed_advance_pc to decrease the size of .debug_line, but such a change
may be overkill and unnecessarily diverge from -mrelax behaviors and GCC.


For .debug_frame/.eh_frame, MCDwarf currently emits DW_CFA_advance_loc without
relocations. Remove the special case to enable relocations. Similar to
.debug_line, even without the bug fix, the MCDwarfFrameEmitter::encodeAdvanceLoc
special case is a sufficiently complex code path that should be avoided.


When there are more than one section, we generate .debug_rnglists for
DWARF v5. We currently emit DW_RLE_start_length using ULEB128, which
is incorrect. The new test gen-dwarf.s adds a TODO.


About other evaluateAsAbsolute uses. MCObjectStreamer::emit[SU]LEB128Value
have similar code to MCDwarfLineAddr. They are fine to keep as we don't have
LEB128 relocations to correctly represent link-time non-constants anyway.


In the future, we should investigate ending a MCFragment for a relaxable
instruction, to further clean up the assembler support for linker relaxation
and fix evaluateAsAbsolute.

See bbea64250f65480d787e1c5ff45c4de3ec2dcda8 for some of the related code.

Diff Detail

Event Timeline

MaskRay created this revision.May 5 2023, 3:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 3:29 PM
MaskRay requested review of this revision.May 5 2023, 3:29 PM
MaskRay edited the summary of this revision. (Show Details)May 5 2023, 4:10 PM
enh accepted this revision.May 8 2023, 12:51 PM

sgtm, but you probably want a grownup like craig.topper to do a real review :-)

This revision is now accepted and ready to land.May 8 2023, 12:51 PM
reames added a subscriber: reames.May 8 2023, 12:59 PM
reames added inline comments.
llvm/lib/MC/MCObjectStreamer.cpp
545

The code structure here makes it seem like evaluateAsAbsolute should be returning false in this case. Removing the fastpath here certainly hides this bug, but do we have others lurking due to incorrect return results from evaluateAsAbsolute?

Not my area, so totally possible I'm off base here.

MaskRay marked an inline comment as done.May 8 2023, 1:24 PM

@reames My original attempt is this:

diff --git a/llvm/include/llvm/MC/MCAsmBackend.h b/llvm/include/llvm/MC/MCAsmBackend.h
index 354f9d8e993f..f200d5ef3aa8 100644
--- a/llvm/include/llvm/MC/MCAsmBackend.h
+++ b/llvm/include/llvm/MC/MCAsmBackend.h
@@ -44,2 +44,5 @@ protected: // Can only create subclasses.

+  // If true, force a MCFixup to emit a relocation.
+  bool ForceRelocs = false;
+
 public:
@@ -51,2 +54,4 @@ public:

+  bool getForceRelocs() { return ForceRelocs; }
+
   /// Return true if this target might automatically pad instructions and thus
diff --git a/llvm/lib/MC/MCObjectStreamer.cpp b/llvm/lib/MC/MCObjectStreamer.cpp
index d609194992a5..fa65fb2857b9 100644
--- a/llvm/lib/MC/MCObjectStreamer.cpp
+++ b/llvm/lib/MC/MCObjectStreamer.cpp
@@ -544,3 +544,5 @@ void MCObjectStreamer::emitDwarfAdvanceLineAddr(int64_t LineDelta,
   int64_t Res;
-  if (AddrDelta->evaluateAsAbsolute(Res, getAssemblerPtr())) {
+  MCAssembler *As = getAssemblerPtr();
+  bool ForceRelocs = As && As->getBackend().getForceRelocs();
+  if (!ForceRelocs && AddrDelta->evaluateAsAbsolute(Res, As)) {
     MCDwarfLineAddr::Emit(this, Assembler->getDWARFLinetableParams(), LineDelta,
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.h
index b5670b6214c2..96bd6694aa6e 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVAsmBackend.h
@@ -27,3 +27,2 @@ class RISCVAsmBackend : public MCAsmBackend {
   bool Is64Bit;
-  bool ForceRelocs = false;
   const MCTargetOptions &TargetOptions;

I.e. toggling the fast path to be aware of RISC-V linker relaxation. However, I switched to the current version as it simplifies code and ensures an involved RISC-V special case (getBackend().relaxDwarfLineAddr(DF, Layout, WasRelaxed) in MCAssembler::relaxDwarfLineAddr) occurs in just one place. This simplification helps reasoning about the code.

As a penalty, we pay the clang -c a.s overhead (~2%) but the more common clang -g -c a.c is unaffected (possibly slightly faster since we avoid likely-fail AddrDelta->evaluateAsAbsolute(Res, getAssemblerPtr())).


We have a similar issue in .debug_frame/.eh_frame. I will send another patch to fix it.

llvm/lib/MC/MCObjectStreamer.cpp
545

See my main comment.

MaskRay marked an inline comment as done.May 8 2023, 1:25 PM

Some overlap (re: relaxation) with D140478 - it would be good to have some utility for checking whether a given instruction range may be relaxed (even if it's coarse-grained to the whole function) and make choices about encoding on the basis of that utility/test.

Some overlap (re: relaxation) with D140478 - it would be good to have some utility for checking whether a given instruction range may be relaxed (even if it's coarse-grained to the whole function) and make choices about encoding on the basis of that utility/test.

I can read D140478 more closely, but I think D140478 and this patch are different.

For .debug_line and .debug_frame special handling, we really just want to get rid of the legacy "fast path" (faster for assembling but slow for direct machine code emission, added circa 2010 or before).

I think this is more related to MC which I frequently contribute to and less related to RISC-V and debug-info specifics.

This patch is a simplification of a "fast patch" which now gets in the way for linker relaxation. I'll land this patch on Friday with the existing stamp from enh if no second opinion is heard (happy to land sooner if I can get another stamp:) )

barannikov88 added inline comments.
llvm/lib/MC/MCObjectStreamer.cpp
545

I share @reames' concern.
There is emitDwarfAdvanceFrameAddr below that has the same issue.
MCPseudoProbe::emit does exactly the same, too.
The solution as-is looks incomplete.

Removing this code fragment does not look totally wrong, but it should be done
consistently in all places and compile time should not be neglected.
Ideally, evaluateAsAbsolute should return false when the expression cannot be
evaluated as absolute.

I agree with @reames and @barannikov88 that it seems like evaluateAsAbsolute is the right place for this.
Unless you can demonstrate that it gets something else wrong to put the check there.

I share @reames' concern. There is emitDwarfAdvanceFrameAddr below that has the same issue. MCPseudoProbe::emit does exactly the same, too. The solution as-is looks incomplete.

Removing this code fragment does not look totally wrong, but it should be done consistently in all places and compile time should not be neglected. Ideally, evaluateAsAbsolute should return false when the expression cannot be

evaluated as absolute.

.debug_frame/.eh_frame is what I am experimenting with. I think it can be fixed separately without a new set of patches.
Assembly source files typically don't write CFI directives, and a bug is mostly unnoticeable anyway.

The .debug_line is more harmful as some assemblers synthesize .debug_line for assembly source files without .file and .loc.

I disagree that we want to adjust the code of evaluateAsAbsolute or the call site. The evaluateAsAbsolute code is false for "direct object emission".
It makes clang -g -c a.c slower (though the difference is difficult to notice since it isn't a bottleneck).

MaskRay added a comment.EditedMay 10 2023, 11:04 PM

I agree with @reames and @barannikov88 that it seems like evaluateAsAbsolute is the right place for this.
Unless you can demonstrate that it gets something else wrong to put the check there.

Perhaps the question is why we can't teach evaluateAsAbsolute about RISC-V linker relaxation.
This route will go against @compnerd's bbea64250f65480d787e1c5ff45c4de3ec2dcda8 (RISCV: adjust handling of relocation emission for RISCV).
My gut feeling says we should try avoid such reliance, although I don't have a very good enough theory about it.
Reintroducing this folding to MCExpr may cause brittleness.

Eliminating evaluateAsAbsolute with a MCAssembler uses, if the performance overhead is acceptable, is the preferred approach IMO.

MCDwarfLineAddr::Emit introduces another problem that it duplicates some functionality of MCAssembler::relaxDwarfLineAddr.
Both operations are quite complex. It'd be better to remove a "fast path" that does complex things.

I'll soon provide an updated summary about MCObjectStreamer::emit[SU]LEB128Value.

MaskRay updated this revision to Diff 521210.May 10 2023, 11:10 PM
MaskRay retitled this revision from [RISCV] .debug_line: emit relocations for assembly input files to [RISCV][MC] .debug_line/.debug_frame/.eh_frame: emit relocations for assembly input files with relaxation.
MaskRay edited the summary of this revision. (Show Details)

better test. .debug_line/.eh_frame/.debug_rnglists in one file

remove special code path in MCObjectStreamer::emitDwarfAdvanceFrameAddr

Explain that leaving MCObjectStreamer::emit[SU]LEB128Value unchanged is fine.

Hmmm. I can't say I'm 100% convinced but I won't object.
It would be good to know if this has a size impact on (non-RISCV) debug info.

I decided to take a deeper look at this through the lens of the evaluateAbsoluteDiff approach, and unless I'm missing something, it does appear to work. I mocked up a patch (https://reviews.llvm.org/D150373). This looks like simply a bug in https://reviews.llvm.org/rGbbea64250f65480d787e1c5ff45c4de3ec2dcda8. Specifically, we are allowing folding of a subtract expression early as opposed to waiting until the relocations can be fully resolved. If we're going to resolve such expressions late, we need to be consistent about that.

Hmmm. I can't say I'm 100% convinced but I won't object.
It would be good to know if this has a size impact on (non-RISCV) debug info.

There is no difference on non-RISC-V debug info.

Assembly source files typically don't write CFI directives, and a bug is mostly unnoticeable anyway.

They do, if you use -save-temps :) But I agree that it is much less common.

There is also MCObjectStreamer::emitValueImpl, which potentially has the same issue.
I guess this is one of the reasons it was overridden for RISC-V, and fixing the base
version would impose much more significant slowdowns for other targets.

I'm wondering if RISCVELFStreamer::requiresFixups used in emitValueImpl could be
promoted to a virtual function in MCAsmBackend or MCObjectStreamer and used
in evaluateAsRelocatable? Or am I talking nonsense?

evandro removed a subscriber: evandro.May 12 2023, 2:10 PM
MaskRay added a comment.EditedMay 12 2023, 2:37 PM

Assembly source files typically don't write CFI directives, and a bug is mostly unnoticeable anyway.

They do, if you use -save-temps :) But I agree that it is much less common.

Yes, -save-temps is like clang -S a.c && clang -c a.s with more intermediate steps, uncommon in projects' build systems.

There is also MCObjectStreamer::emitValueImpl, which potentially has the same issue.
I guess this is one of the reasons it was overridden for RISC-V, and fixing the base
version would impose much more significant slowdowns for other targets.

The fast path of MCObjectStreamer::emitValueImpl does a simple operation, so keeping the fast path is more justified.
In addition, it has a value evaluated as xxx is out of range assembler error which is not implemented in the slow path yet.

On the contrary, the fast paths removed by this patch do fairly complex things, and I am concerned of keeping them around.

I made a comment to D61584 which introduced canFold in MCExpr.cpp.

I'm wondering if RISCVELFStreamer::requiresFixups used in emitValueImpl could be
promoted to a virtual function in MCAsmBackend or MCObjectStreamer and used
in evaluateAsRelocatable? Or am I talking nonsense?

Making RISCVELFStreamer::requiresFixups available for MCAsmBackend makes sense.
I wish that we try improving the robustness of RISCVELFStreamer::requiresFixups first.

To provide an overview of how I perceive this problem:

We currently have a fragile assembler infrastructure for RISC-V linker relaxation, and the code being removed in this patch builds on top of that. https://xkcd.com/2347/

By removing some code paths, this patch will address the known bugs in .debug_line and .debug_frame without affecting the output for non-RISC-V targets, as well as decreasing the complexity.

While it is possible to address the bugs in RISCVELFStreamer::requiresFixups and MCExpr::evaluateAsAbsolute without applying this patch, it would require additional effort.
Furthermore, there is a concern that making the ambitious changes might introduce further fragility to linker relaxation infrastructure.

I can't say that I'm convinced, but I ran out of arguments and I'm cowardly retreating.
I hope this discussion will be helpful if something goes wrong in the future.

I can't say that I'm convinced, but I ran out of arguments and I'm cowardly retreating.
I hope this discussion will be helpful if something goes wrong in the future.

It's unfortunate that you feel the need to retreat. Could you please elaborate on the specific points that left you unconvinced? Otherwise there is a risk of causing confusion or misunderstanding.

(a) my statement that the linker relaxation support in MC is fragile
(b) my statement that the "fast paths" removed by this patch cause complexity and are unneeded
(c) my statement that adding an RISC-V check in evaluateAsAbsolute needs extreme attention

barannikov88 added a comment.EditedMay 13 2023, 2:20 AM

I did more digging and found two related patches https://reviews.llvm.org/D45181 and https://reviews.llvm.org/D61584.
The third patch, already mentioned here (https://reviews.llvm.org/D103539)
effectively undoes the first two. This may justify why it removed canFold.
I also like the direction taken in the third patch - push as much as possible
out of generic code.


It's unfortunate that you feel the need to retreat. Could you please elaborate on the specific points that left you unconvinced? Otherwise there is a risk of causing confusion or misunderstanding.

The thing that concerns me is that evaluateAs* methods do not behave
as expected. They may "incorrectly" fold expressions if asm layout is provided.
This functionality either should be removed completely, or the methods
should be taught not to fold expressions when it is wrong to do so.
This patch goes in the middle by avoiding calls, which I don't like.

The evaluateAsAbsolute code is false for "direct object emission".

is false because it returns true for constant expressions. I don't see why we
should take the slow path in this case. 2% degradation out of the blue does
not help justifying this change.

Eliminating evaluateAsAbsolute with a MCAssembler uses, if the performance overhead is acceptable, is the preferred approach IMO.

IMO the preferred approach is to not remove the call, but rather remove
the argument that allows generating invalid code. Or, teach this method
not to generate invalid code. This may be as simple as adding a boolean
flag that forbids folding, but this looks equivalent to removing the
Layout argument.

The fast path of MCObjectStreamer::emitValueImpl does a simple operation, so keeping the fast path is more justified.

I do not agree. It may be more justified from the performance point of view,
but not from the correctness point of vew. I believe correctness should be
the foremost factor. If it was correct, RISC-V wouldn't need to override it.

Furthermore, there is a concern that making the ambitious changes might introduce further fragility to linker relaxation infrastructure.

This statement needs some support. I could as well say that it might make linker
relaxation infrastructure more robust.

I have tried reading MCExpr.cpp multiple times and I always feel that some parts of it need more love to address some problems (some are documented as FIXME).

Here, I totally agree. evaluateAs* methods are broken, and we should fix them rather than avoid using them.

I did more digging and found two related patches https://reviews.llvm.org/D45181 and https://reviews.llvm.org/D61584.
The third patch, already mentioned here (https://reviews.llvm.org/D103539)
effectively undoes the first two. This may justify why it removed canFold.
I also like the direction taken in the third patch - push as much as possible
out of generic code.

Thanks for investigating! Knowing the two related patches will be useful and give more clue for future incremental improvement.


It's unfortunate that you feel the need to retreat. Could you please elaborate on the specific points that left you unconvinced? Otherwise there is a risk of causing confusion or misunderstanding.

The thing that concerns me is that evaluateAs* methods do not behave
as expected. They may "incorrectly" fold expressions if asm layout is provided.
This functionality either should be removed completely, or the methods
should be taught not to fold expressions when it is wrong to do so.

Yes, evaluateAs* methods need to be fixed. See the last paragraph of this comment.

This patch goes in the middle by avoiding calls, which I don't like.

The evaluateAsAbsolute code is false for "direct object emission".

is false because it returns true for constant expressions. I don't see why we
should take the slow path in this case. 2% degradation out of the blue does
not help justifying this change.

The "slow path" is slow for assembly input. By removing the "fast path," we can slightly improve the speed of "direct object emission" since we save time on a path that is not expected to be taken.

Therefore, I question the merit of retaining the "fast path."
In my conclusion, it is not necessary (https://reviews.llvm.org/D150004#4339113), at least not for now[now].

MCDwarfLineAddr::Emit and MCDwarfFrameEmitter::encodeAdvanceLoc do complex operations that are repeated in MCAssembler::relaxDwarfLineAddr and MCAssembler::relaxDwarfCallFrameFragment.
The latter two have intricate RISCV overrides.

There are opportunities to make the "slow path" faster, e.g.

  • Optimizing the current new MC*Fragment pattern that allocates new fragments on the heap.
  • Reducing the number of relaxation times for .debug_line and .debug_frame, as well as possibly other sections using LEB128. For instance, LEB128 can have a one-byte estimate to avoid the second relaxation iteration.

As a result, the benefits of the "fast path" (which only applies to assembly input) may become even smaller.

[now]: I mentioned this as a temporary decision. After revisiting how the RISC-V overrides should be implemented, there might still be a chance to reintroduce the "fast path" just for assembly input.
I can create an issue (https://github.com/llvm/llvm-project/issues) in case it is overlooked.

Eliminating evaluateAsAbsolute with a MCAssembler uses, if the performance overhead is acceptable, is the preferred approach IMO.

IMO the preferred approach is to not remove the call, but rather remove
the argument that allows generating invalid code. Or, teach this method
not to generate invalid code. This may be as simple as adding a boolean
flag that forbids folding, but this looks equivalent to removing the
Layout argument.

Are you suggesting to remove the getAssemblerPtr() argument? AIUI this would prevent the fast path from ever being triggered, even for assembly input.

The fast path of MCObjectStreamer::emitValueImpl does a simple operation, so keeping the fast path is more justified.

I do not agree. It may be more justified from the performance point of view,
but not from the correctness point of vew. I believe correctness should be
the foremost factor. If it was correct, RISC-V wouldn't need to override it.

Furthermore, there is a concern that making the ambitious changes might introduce further fragility to linker relaxation infrastructure.

This statement needs some support. I could as well say that it might make linker
relaxation infrastructure more robust.

I have tried reading MCExpr.cpp multiple times and I always feel that some parts of it need more love to address some problems (some are documented as FIXME).

Here, I totally agree. evaluateAs* methods are broken, and we should fix them rather than avoid using them.

Instead of D150373 (which reintroduces D61584), I am considering an ambitious refactoring that involves separating relaxable instructions into their own fragment.
This is how x86 JMP/JCC/etc are relaxed.

Currently, relaxable and non-relaxable instructions are placed together in one MCDataFragment. However, this is not the intended design of MCDataFragment.
By placing relaxable instructions in their own fragments, we can utilize the AttemptToFoldSymbolOffsetDifference generic code to avoid folding for assembly input.

@MaskRay
Thank you for the explanation. Weighting all pros and cons I tend to agree with you.
I also apologize for taking so much of your time for this simple change.

One point I'd like to clarify, because it was one of my main concerns:

Are you suggesting to remove the getAssemblerPtr() argument? AIUI this would prevent the fast path from ever being triggered, even for assembly input.

This will not prevent the fast path from being triggered because not all
expressions contain symbols. An expression can be as simple as 42. I'm not sure
if this can be the case for special directives like cfa / eh_frame though.

barannikov88 accepted this revision.May 13 2023, 2:08 PM

@MaskRay
Thank you for the explanation. Weighting all pros and cons I tend to agree with you.
I also apologize for taking so much of your time for this simple change.

One point I'd like to clarify, because it was one of my main concerns:

Are you suggesting to remove the getAssemblerPtr() argument? AIUI this would prevent the fast path from ever being triggered, even for assembly input.

This will not prevent the fast path from being triggered because not all
expressions contain symbols. An expression can be as simple as 42. I'm not sure
if this can be the case for special directives like cfa / eh_frame though.

Thank you! Yes, evaluateAsAbsolute works on immediate integers like 42.
I think that does not happen with .debug_line or .eh_frame/.debug_frame. The majority (all?) expressions are the difference between two labels...

I'll wait a bit and land this patch.

MaskRay edited the summary of this revision. (Show Details)May 15 2023, 6:30 PM