This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Make linker-relaxable instructions terminate MCDataFragment
ClosedPublic

Authored by MaskRay on Jun 15 2023, 9:47 PM.

Details

Summary

MCExpr::evaluateAsAbsolute has a longstanding bug. When the MCAssembler is
non-null and the MCAsmLayout is null, it may incorrectly fold A-B even if A and
B are separated by a linker-relaxable instruction. This behavior can suppress
some ADD/SUB relocations and lead to wrong results if the linker performs
relaxation.

To fix the bug, ensure that linker-relaxable instructions only appear at the end
of an MCDataFragment, thereby making them terminate the fragment. When computing
A-B, suppress folding if A and B are separated by a linker-relaxable
instruction.

  • .subsection now correctly give errors for non-foldable expressions.
  • gen-dwarf.s will pass even if we add back the .debug_line or .eh_frame/.debug_frame code from D150004
  • This will fix suppressed relocation when we add R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128.

In the future, we should investigate the desired behavior for
MCExpr::evaluateAsAbsolute when both MCAssembler and MCAsmLayout are non-null.

(Note: MCRelaxableFragment is only for assembler-relaxation. If we ever need
linker-relaxable MCRelaxableFragment, we would need to adjust RISCVMCExpr.cpp
(D58943/D73211).)

Depends on D153096

Diff Detail

Event Timeline

MaskRay created this revision.Jun 15 2023, 9:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 9:47 PM
MaskRay requested review of this revision.Jun 15 2023, 9:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 9:47 PM
barannikov88 added inline comments.Jun 15 2023, 10:03 PM
llvm/lib/MC/MCELFStreamer.cpp
634
  • Maybe add a new fixup flag to MCFixupKindInfo.h and check if it is set?
  • Can Fixups contain more than one element? If it can, why are you checking only the last one?
MaskRay marked an inline comment as done.Jun 15 2023, 10:15 PM
MaskRay added inline comments.
llvm/lib/MC/MCELFStreamer.cpp
634

Linker-relaxable instructions have two relocations and the second (last) one is R_RISCV_RELAX. There is just one relocation type, so adding a FixupKindFlags flag appears to be overkill. In addition, calling getFixupKindInfo adds some overhead to non-RISCV targets.

The current way causes the least overhead to non-RISCV targets.

MCObjectStreamer::emitInstructionImpl has some logic deciding whether the instruction can be appended to a data fragment or should go in its own fragment.
I would expect the check to be there instead of MCELFStreamer::emitInstToData.
For this to be possible encodeInstruction should be pulled out too, so that you can examine Fixups.
As I can see, all implementations of emitInstrToData call encodeInstruction. (Except MCDXContainerStreamer,
but I guess this method was overridden just for the vtable to be pinned to cpp file.)

llvm/lib/MC/MCELFStreamer.cpp
634

There is not much of overhead, but makes the solution scalable and avoids target-specific changes to common code.

barannikov88 added inline comments.Jun 15 2023, 10:23 PM
llvm/lib/MC/MCELFStreamer.cpp
634

To support my point of view further, FKF_Constant is also used only by ARM backend.

MaskRay marked an inline comment as done.Jun 15 2023, 11:51 PM

MCObjectStreamer::emitInstructionImpl has some logic deciding whether the instruction can be appended to a data fragment or should go in its own fragment.
I would expect the check to be there instead of MCELFStreamer::emitInstToData.
For this to be possible encodeInstruction should be pulled out too, so that you can examine Fixups.
As I can see, all implementations of emitInstrToData call encodeInstruction. (Except MCDXContainerStreamer,
but I guess this method was overridden just for the vtable to be pinned to cpp file.)

Actually, adding the logic to MCObjectStreamer::emitInstructionImpl was my original approach before sticking with the current version.
My previous attempt placed linker-relaxable instruction in its own fragment, instead of the end of a MCDataFragment.
However, it increased the number of fragments.

Then I realized that the preferred solution is to place the linker-relaxable instruction at the end of a MCDataFragment.
With this choice, I don't think moving DF->setLinkerRelaxable(); into MCObjectStreamer::emitInstructionImpl makes cleaner code.
I think it can be argued either way. Moving DF->setLinkerRelaxable(); closer to DF->setHasInstructions(STI); has some value as well.

llvm/lib/MC/MCELFStreamer.cpp
634

FKF_Constant (my D72892) is probably not the best reference. I suspect that we can remove it but learning AArch32 is not my priority...

makes the solution scalable and avoids target-specific changes to common code.

I am not sure I agree. Both getTargetKind and a FixupKindFlags leak into the generic code and do the same amount of "pollution" in my viewpoint. FixupKindFlags adds slightly more abstraction, so I'd avoid it.

MCObjectStreamer::emitInstructionImpl has some logic deciding whether the instruction can be appended to a data fragment or should go in its own fragment.
I would expect the check to be there instead of MCELFStreamer::emitInstToData.
For this to be possible encodeInstruction should be pulled out too, so that you can examine Fixups.
As I can see, all implementations of emitInstrToData call encodeInstruction. (Except MCDXContainerStreamer,
but I guess this method was overridden just for the vtable to be pinned to cpp file.)

Actually, adding the logic to MCObjectStreamer::emitInstructionImpl was my original approach before sticking with the current version.
My previous attempt placed linker-relaxable instruction in its own fragment, instead of the end of a MCDataFragment.
However, it increased the number of fragments.

Sadly you reconsidered, because I thought the same way.

Then I realized that the preferred solution is to place the linker-relaxable instruction at the end of a MCDataFragment.
With this choice, I don't think moving DF->setLinkerRelaxable(); into MCObjectStreamer::emitInstructionImpl makes cleaner code.
I think it can be argued either way. Moving DF->setLinkerRelaxable(); closer to DF->setHasInstructions(STI); has some value as well.

Moving it into emitInstructionImpl would generalize the changes to non-ELF targets.

llvm/include/llvm/MC/MCFragment.h
251

A brief comment would be nice.

llvm/lib/MC/MCELFStreamer.cpp
634

FKF_Constant (my D72892) is probably not the best reference. I suspect that we can remove it but learning AArch32 is not my priority...

FixupKindFlags adds slightly more abstraction, so I'd avoid it.

Not sure I get it. Abstraction is almost always a win. Well, can't force you.

MaskRay marked an inline comment as done.EditedJun 16 2023, 10:25 AM

MCObjectStreamer::emitInstructionImpl has some logic deciding whether the instruction can be appended to a data fragment or should go in its own fragment.
I would expect the check to be there instead of MCELFStreamer::emitInstToData.
For this to be possible encodeInstruction should be pulled out too, so that you can examine Fixups.
As I can see, all implementations of emitInstrToData call encodeInstruction. (Except MCDXContainerStreamer,
but I guess this method was overridden just for the vtable to be pinned to cpp file.)

Actually, adding the logic to MCObjectStreamer::emitInstructionImpl was my original approach before sticking with the current version.
My previous attempt placed linker-relaxable instruction in its own fragment, instead of the end of a MCDataFragment.
However, it increased the number of fragments.

Sadly you reconsidered, because I thought the same way.

Then I realized that the preferred solution is to place the linker-relaxable instruction at the end of a MCDataFragment.
With this choice, I don't think moving DF->setLinkerRelaxable(); into MCObjectStreamer::emitInstructionImpl makes cleaner code.
I think it can be argued either way. Moving DF->setLinkerRelaxable(); closer to DF->setHasInstructions(STI); has some value as well.

Moving it into emitInstructionImpl would generalize the changes to non-ELF targets.

Unfortunately, RISC-V linker relaxation is very ELF-specific and I don't think other object file formats can benefit from a generalized interface.
LoongArch folks may copy the design (I am a bit concerned of this as the assembler infrastructure for linker relaxation needs significant re-architecture. If they try upstreaming such stuff at this point, it would make fixing linker relaxation more difficult), otherwise no other ELF target will use the relaxation feature.
I think adding a FKF_ flag is over-engineering without a clear benefit.

Moving the setLinkerRelaxable call into MCObjectStreamer::emitInstructionImpl, as you mentioned previously, would require significant refactoring to pull out encodeInstruction out of all derived emitInstToData.
(MCDXContainerStreamer::emitInstToData doesn't call encodeInstruction. The function is unfortunately not tested by check-llvm.)
Yet I don't see a value in terms of various aspects I can think of.

llvm/lib/MC/MCELFStreamer.cpp
634

Well, can't force you.

Thank you:) I think the abstraction will just add a bit more cognitive load to readers. They will need to figure out that this is just about a specific fixup kind.

MaskRay marked an inline comment as done.Jun 16 2023, 10:26 AM
MaskRay updated this revision to Diff 532493.Jun 18 2023, 5:28 PM
MaskRay marked an inline comment as done.

Rebase.
Rename to isLinkerRelaxable.
Improve comment.

Is the presubmit failure related?

llvm/lib/MC/MCExpr.cpp
663

could use your new variables here.

686

once set, we probably don't need to check these other conditions? maybe check !BBeforeRelax first, and !AAfterRelax below (L690)?

if (!BBeforeRelax && (&*FI != FB || ...))
  ...
MaskRay updated this revision to Diff 533727.Jun 22 2023, 12:27 PM
MaskRay edited the summary of this revision. (Show Details)

rebase

Is it possible to tell which instructions are linker-relaxable? If so, perhaps fixing MCExpr::evaluateAsAbsolute to scan fragments for those is a better fix?

Is it possible to tell which instructions are linker-relaxable? If so, perhaps fixing MCExpr::evaluateAsAbsolute to scan fragments for those is a better fix?

Thanks for taking a look.
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#linker-relaxation has listed some instructions that are linker-relaxable (they generate a R_RISCV_RELAX relocation with the default -mrelax).

AttemptToFoldSymbolOffsetDifference is eventually called by MCExpr::evaluateAsAbsolute.
Fragment scanning should generally be avoided. Adding another fragment scan to MCExpr::evaluateAsAbsolute would increase complexity.
After intensive investigation, I conclude that AttemptToFoldSymbolOffsetDifference is the place to fix...

llvm/lib/MC/MCExpr.cpp
686

We could add !BBeforeRelax && but I feel that this just makes the code slightly more complex without providing a performance improvement, so I think we may just afford redundant BBeforeRelax assignment...

llvm/test/MC/ELF/RISCV/subsection.s
1–38

riscv64-linux-gnu-as seems able to assemble this test case just fine. That's curious.

MaskRay added inline comments.Jun 22 2023, 2:52 PM
llvm/test/MC/ELF/RISCV/subsection.s
1–38

Yes, GNU assembler allows negative .subsection numbers but that we don't.
The negative subsection numbers may be error-prone, so seems fine to reject.
Here I just use .subsection as a convenient feature to test the behavior of evaluateAsAbsolute...

asb accepted this revision.Jun 29 2023, 7:22 AM

I've stepped through the patch description and the code - both of which make sense to me. I'm not familiar enough with the code to be certain there's not an alternative way of achieving the same thing that would be better....but then I'd normally looking to @MaskRay for such suggestions :) LGTM, just left a minor query regarding tweaking a doc comment.

llvm/include/llvm/MC/MCAsmBackend.h
54

I assume the point of labelling this as RISC-V thing is to note that it hasn't been put to use across multiple targets. So perhaps "Fixup kind used for linker relaxation. Currently only used by RISC-V" might be a better description? (i.e. in theory general code, but hasn't been used in practice outside of one target).

This revision is now accepted and ready to land.Jun 29 2023, 7:22 AM

I've stepped through the patch description and the code - both of which make sense to me. I'm not familiar enough with the code to be certain there's not an alternative way of achieving the same thing that would be better....but then I'd normally looking to @MaskRay for such suggestions :) LGTM, just left a minor query regarding tweaking a doc comment.

This was my main concern. The patch assumes there is only one possible fixup that may need linker relaxation, and only supported on ELF targets.
I can imagine that RISC-V could be supported by non-unix OS, or that more fixups needing linker relaxation can be added, possibly by other targets.
I hoped my concern could be addressed, but I might as well be nitpicking, as it can easily be addressed in the future.

llvm/lib/MC/MCExpr.cpp
663

(nit) It could be slightly easier to follow if the Displacement was calculated after the fragments are sorted.

MaskRay added a comment.EditedJun 29 2023, 9:27 AM

I've stepped through the patch description and the code - both of which make sense to me. I'm not familiar enough with the code to be certain there's not an alternative way of achieving the same thing that would be better....but then I'd normally looking to @MaskRay for such suggestions :) LGTM, just left a minor query regarding tweaking a doc comment.

This was my main concern. The patch assumes there is only one possible fixup that may need linker relaxation, and only supported on ELF targets.
I can imagine that RISC-V could be supported by non-unix OS, or that more fixups needing linker relaxation can be added, possibly by other targets.
I hoped my concern could be addressed, but I might as well be nitpicking, as it can easily be addressed in the future.

Thank you for your thoughts. I think our bikeshedding on this point is a fairly minor thing... The two object file formats LLVM supports that will mostly likely add RISC-V support (even when we say "most likely", it can be multiple years) are Mach-O and PE/COFF.

Mach-O has 16 available r_type values, which are very limited, (along with other assembler issues and subsections semantics) make it fairly difficult to add relaxation based on a relocation type.
If they do enable relaxation, they'll likely use .loh directive, with is entirely independent of this relocation based mechanism.

PE/COFF has an uint16_t relocation type field. If it does add RISC-V relaxation, it can reuse a relocation type for this purpose.
However, it does not provide an addend field. I haven't thought much about what this will entail, but I think adding relaxation could be quite challenging as well.

Anyhow, even if Mach-O and PE/COFF add RISC-V support and enable relaxation, the mechanism will be quite unclear.
The MC code may require larger changes. At that point, we can rethink how to re-architecture our code to make the most sense.
For now, I am not comfortable adding some abstraction to make reasoning about the code slightly more difficult.

MaskRay updated this revision to Diff 535850.Jun 29 2023, 9:34 AM
MaskRay marked 5 inline comments as done.

Move SAOffset and Displacement
Improve a comment

MaskRay edited the summary of this revision. (Show Details)Jun 29 2023, 9:37 AM
This revision was landed with ongoing or failed builds.Jun 29 2023, 9:40 AM
This revision was automatically updated to reflect the committed changes.