This is an archive of the discontinued LLVM Phabricator instance.

[MC] Recursively calculate symbol offset
ClosedPublic

Authored by lgrey on Sep 1 2021, 4:22 PM.

Details

Summary

This is speculative since I'm not sure if there's some implicit contract that a variable symbol must not have another variable symbol in its evaluation tree.

Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23.

Test is based on alias.s (removed checks since we just need to know it didn't crash).

Diff Detail

Event Timeline

lgrey created this revision.Sep 1 2021, 4:22 PM
lgrey requested review of this revision.Sep 1 2021, 4:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2021, 4:22 PM
lgrey edited reviewers, added: MaskRay; removed: grosbach.Sep 7 2021, 8:56 AM
MaskRay added a comment.EditedSep 10 2021, 12:35 PM

With a recursive getSymbolOffsetImpl, I am worried about infinite loop if two symbols are variables referencing each other (it seems impossible, but symbol reassignments can sometimes happen, unfortunately the conditions are pretty complex, so the risk exists).

chained-alias-offset.s works for ELF where l_c is l_a + 1.

So we should understand why it fails on Mach-O.

I think the root cause is: https://bugs.llvm.org/show_bug.cgi?id=19203. The proximate cause is https://github.com/llvm/llvm-project/blob/main/llvm/lib/MC/MCExpr.cpp#L820.

In the real case that motivated this, we have something like BinaryExpr(SymRefExpr(BinaryExpr(SymRefExpr(Constant) + Constant) + Constant)). Without the MachO special case in MCExpr, this would get reduced to an expression without nested symbols before it makes it to the offset calculation.

MaskRay added a comment.EditedSep 10 2021, 2:42 PM

Ah, right. I looked at this piece of code and wondered as well last year...

I looked at some ninja check-llvm-mc failures if we remove the hack:

--- i/llvm/lib/MC/MCExpr.cpp
+++ w/llvm/lib/MC/MCExpr.cpp
@@ -812,6 +812,7 @@ bool MCExpr::evaluateAsRelocatableImpl(MCValue &Res, const MCAssembler *Asm,
                                                    Kind, Asm->getContext()),
                            Res.getSymB(), Res.getConstant(), Res.getRefKind());
         }
+        return true;
         if (!IsMachO)  /// a proper fix should remove this block completely
           return true;

For

  • diff-with-two-sections.s
  • variable-exprs.s

I guess removing the hack may fix bugs?!

For eh-frame-reloc.s, there will be a pair of X86_64_RELOC_SUBTRACTOR __eh_frame and X86_64_RELOC_UNSIGNED _bar which could(?) be cancelled but not by MC. I think this is benign.

My Mach-O knowledge is fairly insufficient.

@lhames Can you check if the hack can be deleted? If it works, please just push the change:) Then this patch can be abandoned.

lgrey added a comment.Sep 10 2021, 2:48 PM

So I tried this and the eh_frame thing is a real problem at link time. lld errors because the X86_64_RELOC_SUBTRACTOR ends up being local. The link succeeds in ld64, but produces a weirdly large binary.

lgrey added a comment.Sep 14 2021, 1:27 PM

lhames@ ping :) Any idea what the right approach is here?

I think the MachO behavior described in https://bugs.llvm.org/show_bug.cgi?id=19203 is broken, but I still don't think that we can change this until we have a plan to deal with instances of the buggy pattern in the wild -- that's a much bigger issue than underlying bug.

lgrey added a comment.Sep 22 2021, 7:24 AM

I think the MachO behavior described in https://bugs.llvm.org/show_bug.cgi?id=19203 is broken, but I still don't think that we can change this until we have a plan to deal with instances of the buggy pattern in the wild -- that's a much bigger issue than underlying bug.

What's a good path forward to fix that?

I think the MachO behavior described in https://bugs.llvm.org/show_bug.cgi?id=19203 is broken, but I still don't think that we can change this until we have a plan to deal with instances of the buggy pattern in the wild -- that's a much bigger issue than underlying bug.

What's a good path forward to fix that?

I don't thing there's a good path. Given that in-tree assembly is often associated with sensitive code (e.g. kernels and drivers) I think that we would have to put significant effort into making any changes in behavior visible. I can think of two long, awkward paths for this:

(1) We introduce an assembly directive that describes the expected assignment behavior: .llvm_pr19203=fixed|compat. On MachO, we could emit a warning when we encounter a buggy expression if that directive is unset, or set to compat: "Warning: Using llvm_pr19203 compat mode. Please audit or or regenerate your assembly, then switch to fixed". This directive would probably hang around forever (once it's added to the source it's going to be hard to get rid of again).

(2) We rely on assembler tool options, rather than a directive. We lose the ability to determine intended behavior from the source, but don't have to keep an oddball directive around forever. The option would go through a few phases:
Initially: --pr19203=compat (by default) -- If omitted or explicitly set to compat then this will issue the same warning as above. Explicitly setting '--pr19203=fixed' will silence the warning.
Then: --pr19203=fixed (by default) -- If set to compat this will issue a warning that pr203 compat support is going away, and the code should be urgently reviewed/fixed. If set to fixed this should issue a warning: "--pr19203=fixed is now the default. Please remove this option from your assembler invocation".
Sunset: --pr19203 generates an error if set to compat ("compat support is gone"). In 'fixed' mode it now warns that the option should be removed urgently, as the option itself is going away.
Finally: --pr19203 option is removed.

Given how infrequently some groups update compilers the length between the phases should probably be measured in years. This is especially true between 'Initially' and 'Then', when we change the default behavior.

There may be problems with these approaches that I haven't considered too -- either one would need to be discussed with the wider community before work started.

Are there any workarounds that you could apply to solve your issue? If so I'd be inclined to try those rather than tackling this.

Oof! That does sound prohibitively difficult.

@MaskRay is the mutual reference issue that you referred to tractable?

MaskRay added a subscriber: nickdesaulniers.EditedSep 27 2021, 1:00 PM

Oof! That does sound prohibitively difficult.

@MaskRay is the mutual reference issue that you referred to tractable?

@lgrey I am just concerned that some configurations of the Linux kernel's Clang build may hit a behavior change here (I do remember symbol reassignments from some assembly files), but it is possible that I am overly worried.

@nickdesaulniers may assist on testing the patch (with just the llvm/lib/MC/MCFragment.cpp change)

In any case, the llvm/lib/MC/MCFragment.cpp changes need a FIXME comment that it works around the Mach-O issue and is not needed for other binary formats.


(1) We introduce an assembly directive that describes the expected assignment behavior: .llvm_pr19203=fixed|compat. On MachO, we could emit a warning when we encounter a buggy expression if that directive is unset, or set to compat: "Warning: Using llvm_pr19203 compat mode. Please audit or or regenerate your assembly, then switch to fixed". This directive would probably hang around forever (once it's added to the source it's going to be hard to get rid of again).

I think such a Mach-O specific diagnostic will be useful in evaluateAsRelocatableImpl. Hope either of you (who work on Mach-O :) can implement it.

(2) We rely on assembler tool options, rather than a directive. We lose the ability to determine intended behavior from the source, but don't have to keep an oddball directive around forever. The option would go through a few phases:

LGTM, too. This can avoid FIXME in this patch.
If my assessment that "I guess removing the hack may fix bugs?!" is correct, I actually prefer this approach.

Gerolf added a subscriber: Gerolf.Sep 27 2021, 5:27 PM

Agree on the painpoints, but from the user perspective Lang's proposal (2) is a considerate approach to handle this.

nickdesaulniers accepted this revision.Oct 4 2021, 9:59 AM

I built the mainline Linux kernel for x86_64, arm, and arm64 (defconfigs) with clang's integrated assembler with this patch without any problem. I also boot tested the resulting images in qemu. So LGTM from that perspective, but please wait for additional sign off from other reviewers.

This revision is now accepted and ready to land.Oct 4 2021, 9:59 AM

@MaskRay are you OK with this (with added FIXME) given the comment above? I think just removing the hack (and doing the migration work) isn't enough due to the issues I saw with relocations after removing it.

MaskRay requested changes to this revision.Oct 15 2021, 10:55 AM

Please add a comment and run arc diff 'HEAD^' or manually upload the diff.
The diff as it currently displays cannot be accepted.

llvm/test/MC/MachO/chained-alias-offset.s
2

Use llvm-readobj -s to check the symbol values, instead of checking it doesn't crash.

This revision now requires changes to proceed.Oct 15 2021, 10:55 AM
MaskRay added inline comments.Oct 15 2021, 10:57 AM
llvm/test/MC/MachO/chained-alias-offset.s
3

Test discovery trick: update MCFragment.cpp to use a wrong value and run check-llvm-mc

Consider adding the new test to an existing file.

lgrey updated this revision to Diff 380745.Oct 19 2021, 11:30 AM
lgrey marked an inline comment as done.

Added comment, updated test

lgrey marked an inline comment as not done.Oct 19 2021, 11:30 AM
lgrey added inline comments.
llvm/test/MC/MachO/chained-alias-offset.s
3

Unfortunately, that breaks a great deal of tests. I glanced through the broken Mach tests specifically and didn't find anything that looks like a good fit.

MaskRay accepted this revision.Oct 19 2021, 11:50 AM
This revision is now accepted and ready to land.Oct 19 2021, 11:50 AM
This revision was automatically updated to reflect the committed changes.