This is an archive of the discontinued LLVM Phabricator instance.

MCExpr::evaluateAsRelocatableImpl : allow evaluation of non-VK_None MCSymbolRefExpr when MCAsmLayout is available
ClosedPublic

Authored by MaskRay on Sep 30 2020, 5:24 PM.

Details

Summary

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=4acf8c78e659833be8be047ba2f8561386a11d4b
(1994) introduced this behavior:
if a fixup symbol is equated to an expression with an undefined symbol, convert
the fixup to be against the target symbol. glibc relies on this behavior to perform
assembly level indirection

asm("memcpy = __GI_memcpy"); // from sysdeps/generic/symbol-hacks.h

...
  // call memcpy@PLT
  // The relocation references __GI_memcpy in GNU as, but memcpy in MC (without the patch)
  memcpy (...);

(1) It complements extern __typeof(memcpy) memcpy asm("__GI_memcpy"); The frontend asm label does not redirect synthesized memcpy in the middle-end. (See D88712 for details)
(2) asm("memcpy = __GI_memcpy"); is in every translation unit, but the memcpy declaration may not be visible in the translation unit where memcpy is synthesized.

MC already redirects memcpy = __GI_memcpy; call memcpy but not memcpy = __GI_memcpy; call memcpy@plt.
This patch fixes the latter by allowing MCExpr::evaluateAsRelocatableImpl to
evaluate a non-VK_None MCSymbolRefExpr, which is only done after the layout is available.

GNU as allows memcpy = __GI_memcpy+1; call memcpy@PLT which seems nonsensical, so we don't allow it.

MC/PowerPC/pr38945.s NUMBER = 0x6ffffff9; cmpwi 8,NUMBER@l requires the
symbol@l form in AsmMatcher, so evaluation needs to be deferred. This is the
place whether future simplification may be possible.

Note, if we suppress the VM_None evaluation when MCAsmLayout is nullptr, we may
lose the invalid reassignment of non-absolute variable diagnostic
(ARM/thumb_set-diagnostics.s and MC/AsmParser/variables-invalid.s).
We know that this diagnostic is troublesome in some cases
(https://github.com/ClangBuiltLinux/linux/issues/1008), so we can consider
making simplification in the future.

Diff Detail

Event Timeline

MaskRay created this revision.Sep 30 2020, 5:24 PM
MaskRay requested review of this revision.Sep 30 2020, 5:24 PM
MaskRay edited the summary of this revision. (Show Details)Oct 7 2020, 12:49 PM

I have tried adding the expression folding stuff to`MCExpr::evaluateAsRelocatableImpl` but I abandoned that idea because evaluateAsRelocatableImpl does not like a modifier different from VK_None (@PLT). (What does a@plt + 1 do? Relaxing the condition (== VK_None) may work... but the semantics are not very clear.

MaskRay updated this revision to Diff 297384.Oct 9 2020, 9:31 PM
MaskRay edited the summary of this revision. (Show Details)

Fix clang-tidy issues

Oh, you're waiting on me. Uh, I'll get to this tomorrow. :)

I have tried adding the expression folding stuff to`MCExpr::evaluateAsRelocatableImpl` but I abandoned that idea because evaluateAsRelocatableImpl does not like a modifier different from VK_None (@PLT). (What does a@plt + 1 do? Relaxing the condition (== VK_None) may work... but the semantics are not very clear.

a@plt + 1 is perfectly fine -- that turns into a R_X86_64_PLT32 relocation with symbol "a" and addend 1.

The interesting question is what this should do:

b=a+1 # no @plt!
... b@plt

It's certainly impossible to create a PLT referring to a function at address "a+1", in current object formats. So, probably this should be invalid. However, GAS seems to consider the modifier as more of an overall-modifier, resulting in the same thing as a@plt + 1. I agree that's probably not something we ought to support, since it seems wrong.

But, it does still seem reasonable to put the !VK_None support into evaluateAsRelocatableImpl -- as long as the resolved MCValue has only a single symbol, no offset and no kind.

MaskRay added a comment.EditedOct 13 2020, 12:15 PM

I have tried adding the expression folding stuff to`MCExpr::evaluateAsRelocatableImpl` but I abandoned that idea because evaluateAsRelocatableImpl does not like a modifier different from VK_None (@PLT). (What does a@plt + 1 do? Relaxing the condition (== VK_None) may work... but the semantics are not very clear.

a@plt + 1 is perfectly fine -- that turns into a R_X86_64_PLT32 relocation with symbol "a" and addend 1.

The interesting question is what this should do:

b=a+1 # no @plt!
... b@plt

It's certainly impossible to create a PLT referring to a function at address "a+1", in current object formats. So, probably this should be invalid. However, GAS seems to consider the modifier as more of an overall-modifier, resulting in the same thing as a@plt + 1. I agree that's probably not something we ought to support, since it seems wrong.

But, it does still seem reasonable to put the !VK_None support into evaluateAsRelocatableImpl -- as long as the resolved MCValue has only a single symbol, no offset and no kind.

The current way the patch implements the "if a fixup symbol is equated to an expression with an undefined symbol" logic is consistent with the step GNU as implements the check.
GNU as implements the logic when it is about to emit a relocation, not when folding expressions.

One major problem by moving the condition to MCExpr::evaluateAsRelocatableImpl is that we may do isUndefined() check too early (this special case does not kick in when the symbol is defined (memcpy = __GI_memcpy; call memcpy ... __GI_memcpy: ... should emit a relocation referencing memcpy, instead of __GI_memcpy)).
The second problem is indeed the cooperation with the VK_None special rule on MCExpr.cpp:805. Another minor problem is that this will add another rule to the existing if (!IsMachO) hack on 810.

MaskRay retitled this revision from [MC] Redirect fixup symbol to the aliasee (if exists) to [MC] Redirect fixup symbol to the aliasee (if undefined).Oct 15 2020, 3:32 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay edited the summary of this revision. (Show Details)
MaskRay edited the summary of this revision. (Show Details)Oct 15 2020, 3:56 PM
MaskRay edited the summary of this revision. (Show Details)Oct 15 2020, 3:59 PM
MaskRay added a comment.EditedOct 26 2020, 11:24 PM

@jyknight Ping😹

This is not so important as D88712 when used for glibc, but still it'd be good to have this patch:)

I know that not supporting MCBinaryExpr may make you feel a bit nervous: binary expr on an undefined symbol is very rarely used.

MaskRay edited the summary of this revision. (Show Details)Nov 2 2020, 11:24 AM

I have tried adding the expression folding stuff to`MCExpr::evaluateAsRelocatableImpl` but I abandoned that idea because evaluateAsRelocatableImpl does not like a modifier different from VK_None (@PLT). (What does a@plt + 1 do? Relaxing the condition (== VK_None) may work... but the semantics are not very clear.

a@plt + 1 is perfectly fine -- that turns into a R_X86_64_PLT32 relocation with symbol "a" and addend 1.

The interesting question is what this should do:

b=a+1 # no @plt!
... b@plt

It's certainly impossible to create a PLT referring to a function at address "a+1", in current object formats. So, probably this should be invalid. However, GAS seems to consider the modifier as more of an overall-modifier, resulting in the same thing as a@plt + 1. I agree that's probably not something we ought to support, since it seems wrong.

But, it does still seem reasonable to put the !VK_None support into evaluateAsRelocatableImpl -- as long as the resolved MCValue has only a single symbol, no offset and no kind.

The current way the patch implements the "if a fixup symbol is equated to an expression with an undefined symbol" logic is consistent with the step GNU as implements the check.
GNU as implements the logic when it is about to emit a relocation, not when folding expressions.

All of this is pretty confusing. We do need to expand variable references, including those to undefined symbols, during expression evaluation in certain contexts, such as .if. We have some code in there to do this now (conditioning expansion based on "InSet"), but e.g.

x = a - 1
.if x + 1 == a
nop
.endif

doesn't work with llvm, today, while it does in gas.

OTOH, for something like the following, we definitely need to preserve the reference to b, and NOT try to expand it to a+1 too early (because we need to not emit the PLT relocation to global symbol a). Binutils' as compiles this as a PC32 relocation off of .text.a, while we generate a PLT32 relocation off of b; both of those seem probably fine.

b = a+1

.section .text.foo
foo:
jmp b@PLT


.section .text.a
.globl a
a:
nop
nop

One major problem by moving the condition to MCExpr::evaluateAsRelocatableImpl is that we may do isUndefined() check too early (this special case does not kick in when the symbol is defined (memcpy = __GI_memcpy; call memcpy ... __GI_memcpy: ... should emit a relocation referencing memcpy, instead of __GI_memcpy)).

I think we can ensure that we only do it at the right point, by checking whether we have the optional Layout object passed in. evaluateFixup calls evaluateAsRelocatable with a Layout object, but other uses that occur pre-layout will not.

MaskRay updated this revision to Diff 302429.Nov 2 2020, 4:19 PM
MaskRay retitled this revision from [MC] Redirect fixup symbol to the aliasee (if undefined) to MCExpr::evaluateAsRelocatableImpl : allow evaluation of non-VK_None MCSymbolRefExpr when MCAsmLayout is available.
MaskRay edited the summary of this revision. (Show Details)

Change MCExpr::evaluateAsRelocatableImpl instead. This works for MCBinaryExpr automatically

Changed to patch MCExpr::evaluateAsRelocatableImpl instead

I have tried adding the expression folding stuff to`MCExpr::evaluateAsRelocatableImpl` but I abandoned that idea because evaluateAsRelocatableImpl does not like a modifier different from VK_None (@PLT). (What does a@plt + 1 do? Relaxing the condition (== VK_None) may work... but the semantics are not very clear.

a@plt + 1 is perfectly fine -- that turns into a R_X86_64_PLT32 relocation with symbol "a" and addend 1.

The interesting question is what this should do:

b=a+1 # no @plt!
... b@plt

It's certainly impossible to create a PLT referring to a function at address "a+1", in current object formats. So, probably this should be invalid. However, GAS seems to consider the modifier as more of an overall-modifier, resulting in the same thing as a@plt + 1. I agree that's probably not something we ought to support, since it seems wrong.

But, it does still seem reasonable to put the !VK_None support into evaluateAsRelocatableImpl -- as long as the resolved MCValue has only a single symbol, no offset and no kind.

The current way the patch implements the "if a fixup symbol is equated to an expression with an undefined symbol" logic is consistent with the step GNU as implements the check.
GNU as implements the logic when it is about to emit a relocation, not when folding expressions.

All of this is pretty confusing. We do need to expand variable references, including those to undefined symbols, during expression evaluation in certain contexts, such as .if. We have some code in there to do this now (conditioning expansion based on "InSet"), but e.g.

x = a - 1
.if x + 1 == a
nop
.endif

doesn't work with llvm, today, while it does in gas.

OTOH, for something like the following, we definitely need to preserve the reference to b, and NOT try to expand it to a+1 too early (because we need to not emit the PLT relocation to global symbol a). Binutils' as compiles this as a PC32 relocation off of .text.a, while we generate a PLT32 relocation off of b; both of those seem probably fine.

b = a+1

.section .text.foo
foo:
jmp b@PLT


.section .text.a
.globl a
a:
nop
nop

b is local (a symbol assignment does not copy the binding), so GNU as emits a relocation referencing the section symbol.
This patch does not change the behavior.

One major problem by moving the condition to MCExpr::evaluateAsRelocatableImpl is that we may do isUndefined() check too early (this special case does not kick in when the symbol is defined (memcpy = __GI_memcpy; call memcpy ... __GI_memcpy: ... should emit a relocation referencing memcpy, instead of __GI_memcpy)).

I think we can ensure that we only do it at the right point, by checking whether we have the optional Layout object passed in. evaluateFixup calls evaluateAsRelocatable with a Layout object, but other uses that occur pre-layout will not.

It seems that the parse-time MCExpr::evaluateAsRelocatableImpl only takes actions if the value is absolute (not isInSection()),,,,, so MCExpr::evaluateAsRelocatableImpl actually works.
The whole infrastructure does give me an impression that it can have subtle behaviors in a few places....

MaskRay edited the summary of this revision. (Show Details)Nov 9 2020, 10:21 AM

@jyknight Ping:)

jyknight added inline comments.Nov 12 2020, 2:54 PM
llvm/lib/MC/MCExpr.cpp
811

Maybe to prevent the nonsensical result in the test below, have a condition, like:

// If the reference has a variant kind, we can only handle expressions
// which evaluate exactly to a single unadorned symbol.
if (Kind != MCSymbolRefExpr::VK_None) {
  if (Res.getRefKind() != MCSymbolRefExpr::VK_None || !Res.getSymA() || Res.getSymB() || Res.getConstant())
    return false;
  Res = ...
}
llvm/test/MC/ARM/ehabi-personality-abs.s
12–14

", because the target is an absolute symbol."

llvm/test/MC/ELF/relocation-alias.s
19

This is compatible with what GNU as does, but is nonsensical behavior. Do you think we really want to support it?

MaskRay updated this revision to Diff 304993.Nov 12 2020, 4:37 PM
MaskRay marked an inline comment as done.
MaskRay edited the summary of this revision. (Show Details)

Comments

MaskRay marked 2 inline comments as done.Nov 12 2020, 4:37 PM
MaskRay added inline comments.
llvm/lib/MC/MCExpr.cpp
816

I cannot test Res.getSymB in the conditions because (1) foo - bar can result into an early error (2) foo + foo will cause a report_fatal_error crash.

jyknight added inline comments.Nov 13 2020, 7:16 AM
llvm/lib/MC/MCExpr.cpp
816

Are you saying that the code, as it is now, has this problem? (If so, I don't understand what you mean).

But, looking at the test failure shown for lld/test/ELF/x86-64-relax-got-abs.s shows that "return false" was wrong here -- we shouldn't be _failing_ to evaluate, rather, we should be returning the symbol ref expr as-is. (Res = MCValue::get(SRE, nullptr, 0); return true; -- the fallthrough case below.

MaskRay updated this revision to Diff 305293.Nov 13 2020, 7:03 PM
MaskRay edited the summary of this revision. (Show Details)

Special case the Res.isAbsolute() case

MaskRay marked an inline comment as done.Nov 13 2020, 7:05 PM
MaskRay added inline comments.
llvm/lib/MC/MCExpr.cpp
816

I mean I cannot construct a test case to exercise the "Res.getSymB()" condition.

Thanks for spotting the lld/test/ELF/x86-64-relax-got-abs.s issue. I'll special case Res.isAbsolute and add a test for abs@gotpcrelx... abs = 42

MaskRay marked an inline comment as done.Nov 17 2020, 12:42 PM

Forgot to resolve a comment.

Does it look good now:)

jyknight accepted this revision.Nov 18 2020, 12:26 PM

AFAICT, this looks good now. I guess we'll see if anything breaks after it's submitted. =)

This revision is now accepted and ready to land.Nov 18 2020, 12:26 PM

Thanks for taking time to perform the thorough reviews:)

I have done two Linux kernel builds, make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=/tmp/out/arm64 LLVM=1 LLVM_IAS=1 defconfig vmlinux and make O=/tmp/out/x86_64 LLVM=1 LLVM_IAS=1 defconfig vmlinux -> passed.

This revision was landed with ongoing or failed builds.Nov 18 2020, 1:52 PM
This revision was automatically updated to reflect the committed changes.