This is an archive of the discontinued LLVM Phabricator instance.

[COFF] Add MC support for emitting IMAGE_WEAK_EXTERN_ANTI_DEPENDENCY symbols
ClosedPublic

Authored by efriedma on Mar 2 2023, 6:16 PM.

Details

Summary

This is mostly useful for ARM64EC, which uses such symbols extensively.

One interesting quirk of ARM64EC is that we need to be able to emit weak symbols that point at each other (so if either symbol is defined elsewhere, both symbols point at the definition). This required a few changes to the way we handle weak symbols on Windows to avoid recursively visiting weak symbols.

Diff Detail

Event Timeline

efriedma created this revision.Mar 2 2023, 6:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 6:16 PM
efriedma requested review of this revision.Mar 2 2023, 6:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 6:16 PM
bcl5980 added inline comments.Mar 3 2023, 12:06 AM
llvm/lib/MC/MCExpr.cpp
764

Do we need to limit it to AntiDep only? This code looks means any weak symbol can form a cycle.

efriedma added inline comments.Mar 3 2023, 4:57 PM
llvm/lib/MC/MCExpr.cpp
764

isWeakExternal should be true for any symbol where we call setIsWeakExternal, which should be all weak symbols on Windows.

The whole recursion this is just much less likely to be relevant for non-AntiDep weak symbols; I've never seen this sort of usage of aliases elsewhere.

Looks reasonable to me, but I don't feel entirely confident about what this does in the common MC layer changes.

llvm/test/MC/COFF/alias.s
27

This looks curious - why does this change here?

llvm/test/MC/COFF/weak-anti-dep.s
24

I guess it'd be good to teach llvm-readobj about this new mode too.

efriedma added inline comments.
llvm/test/MC/COFF/alias.s
27

We were resolving ".long weak_aliased_to_external" to external2; the new isWeakExternal() checks prevent us from resolving that, I think. I wasn't really trying for that, specifically... but I think it's the behavior we want? (If a function refers to weak_aliased_to_external, we should allow it to be overridden whether or not the reference is in the same object file.)

llvm/test/MC/COFF/weak-anti-dep.s
24

You mean you want a readable name for the "0x4"? Not sure where that comes from, but I can look.

mstorsjo added inline comments.Mar 14 2023, 1:49 PM
llvm/test/MC/COFF/alias.s
27

Ah, I see - yeah, that makes sense and sounds like a desireable result indeed.

llvm/test/MC/COFF/weak-anti-dep.s
24

Yeah, llvm-readobj has got a list of its own.

efriedma updated this revision to Diff 506689.Mar 20 2023, 1:14 PM

Rebased on updated llvm-readobj dumping

Looks good to me at a glance. Is this something we should support in MASM as well?

Is this something we should support in MASM as well?

I don't think Microsoft currently exposes any masm syntax for this, so you'd need to invent your own.

mstorsjo accepted this revision.Apr 6 2023, 1:05 PM

Looks reasonable to me, but I don't feel entirely confident about what this does in the common MC layer changes.

Ok, if there's nobody else who wants to chime in here, then I guess this is fine, and I trust your judgement on it. Sorry for not following up earlier.

This revision is now accepted and ready to land.Apr 6 2023, 1:05 PM
This revision was landed with ongoing or failed builds.Apr 7 2023, 2:07 PM
This revision was automatically updated to reflect the committed changes.

this causes the following to crash

$ cat /tmp/a.ll
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.16.0"

@"__profc_?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ" = external global [1 x i64]
@"__profd_?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ" = weak global { i64, i64, i64, ptr, ptr, i32, [2 x i16] } { i64 -1867311221809715712, i64 742261418966908927, i64 sub (i64 ptrtoint (ptr @"__profc_?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ" to i64), i64 ptrtoint (ptr @"__profd_?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ" to i64)), ptr @"?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ.local", ptr null, i32 1, [2 x i16] zeroinitializer }

@"?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ.local" = private alias void (ptr, ...), ptr @"?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ"

define weak void @"?__libcpp_verbose_abort@Cr@std@@YAXPEBDZZ"(...) {
entry:
  unreachable
}
$ clang-cl -cc1 -triple x86_64-pc-windows-msvc19.16.0        -faddrsig -o /dev/null /tmp/a.ll -emit-obj
clang-cl: ../../llvm/include/llvm/MC/MCSymbol.h:271: MCSection &llvm::MCSymbol::getSection() const: Assertion `isInSection() && "Invalid accessor!"' failed.                                       
efriedma reopened this revision.Apr 13 2023, 2:12 PM
This revision is now accepted and ready to land.Apr 13 2023, 2:12 PM
efriedma updated this revision to Diff 513359.Apr 13 2023, 2:21 PM

Update to fix the addrsig crash.

If an alias points to a weak symbol (which is not itself an alias), the alias is referring to the address of the definition, not the symbol itself. Make sure MCExpr::findAssociatedFragment continues to handle that case correctly.

MaskRay added inline comments.Apr 13 2023, 7:13 PM
llvm/include/llvm/MC/MCDirectives.h
51

Group this with other .weak* directives.

MaskRay accepted this revision.Apr 13 2023, 10:54 PM
MaskRay added inline comments.
llvm/lib/MC/MCExpr.cpp
1003

Add a comment

efriedma updated this revision to Diff 514314.Apr 17 2023, 10:54 AM

Address review comments.

After looking at MCSymbol::getFragment() a bit more, I decided it makes more sense to check isWeakExternal() there, instead of in MCExpr::findAssociatedFragment(). End result is sort of similar because getFragment() is the only caller of findAssociatedFragment(), but the semantics make more sense if the recursion stops earlier.

MaskRay accepted this revision.Apr 17 2023, 12:48 PM
efriedma marked 7 inline comments as done.Apr 17 2023, 1:14 PM
This revision was landed with ongoing or failed builds.Apr 17 2023, 1:17 PM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Apr 24 2023, 5:13 AM

Heads up that this broke linking of Chromium: http://crbug.com/1435480 We don't have any analysis of what's happening yet, though.

aeubanks added inline comments.Apr 26 2023, 12:49 PM
llvm/lib/MC/MCExpr.cpp
764

in https://crbug.com/1435480 we're getting undefined symbol errors with symbols introduced via whole program devirtualization. removing this change resolves the issue. any thoughts?

the undefined symbol is a hidden alias to a weak_odr alias to a private ptr array constant

efriedma added inline comments.Apr 26 2023, 8:02 PM
llvm/lib/MC/MCExpr.cpp
764

So you're talking about something like the following, I guess?

baz:
        .word   1
        .weak   bar
.set bar, baz
        .globl  foo
.set foo, bar

Without the patch, we resolve "foo" to point directly at "baz". That seems dubious; we're basically ignoring the linkage of the alias "bar".

With the patch, "foo" is undefined, which is pretty clearly wrong. I'll need to take another look; I guess the code that handles aliases isn't expecting a non-weak alias to point at a weak alias? You can revert in the meantime if you need to...

What I'd expect is that we actually just emit the alias into the object file.

Do you know what code is emitting this construct? It seems like whatever the original IR was trying to do, it wasn't actually getting the effect it wanted.

efriedma added inline comments.Apr 26 2023, 8:40 PM
llvm/lib/MC/MCExpr.cpp
764

Actually, thinking about it a bit more, I'm tempted to just declare that combining .set with .weak is defined to do whatever horribly broken thing we do right now, and introduce new syntax to represent the actual raw semantics of COFF weak externals. Because I can't find any documentation or tests, and apparently whatever I do is going to break something...

efriedma added inline comments.Apr 26 2023, 9:26 PM
llvm/lib/MC/MCExpr.cpp
764

For reference, the rules without this patch are roughly:

If a name A is .set to another name B, all reference to A instead refer to B. If B is itself .set, this happens recursively.
If a name A is .set to another name B, and the name B is defined, define A to refer to the same address as B.
If a name A is .set to another name B, the name B is undefined, and A is marked .weak, emit a weak external for A referring to B.

With this patch, the rules for .weak symbols .set to another symbol change:
If a name A is .set to another name B, and A is marked .weak, do not resolve references to the symbol A; just emit a weak external for A referring to B. If another alias refers to A, treat A as if it's undefined.


Given that, the possible things we can do here:

  • Revert the rules for ".weak" to the original rules; introduce a new ".weak_search_alias" or something like that to get the new semantics.
  • Come up with some sort of compromise rule, so a non-weak alias referring to a weak alias produces some sort of definition.
  • Declare that the current behavior is correct, and fix whole program devirtualization so it doesn't produce the construct in question. (Which is maybe worth doing anyway, so the resulting IR doesn't depend on confusing edge cases of alias semantics.)

Hi, I got a repro of the undefined symbol error.

Commands:

$ clang-cl /c -flto=thin -fsplit-lto-unit -fwhole-program-vtables /GR- /O1 /w main.cpp impl_lite.cc impl.cc coded_stream.cc
$ lld-link /lib /out:full.lib  /WX coded_stream.obj impl.obj impl_lite.obj
$ lld-link main.obj full.lib /out:main.exe /WX

aeubanks added inline comments.Apr 27 2023, 1:13 PM
llvm/lib/MC/MCExpr.cpp
764

I'll take a look at fixing WPD

efriedma reopened this revision.May 3 2023, 7:03 PM
This revision is now accepted and ready to land.May 3 2023, 7:03 PM
efriedma updated this revision to Diff 519332.May 3 2023, 7:04 PM

Drop change to .weak semantics pending resolution of https://github.com/llvm/llvm-project/issues/62439

This revision was landed with ongoing or failed builds.Jun 7 2023, 11:08 AM
This revision was automatically updated to reflect the committed changes.