This is an archive of the discontinued LLVM Phabricator instance.


Differential D135710

[lto] Do not try to internalize symbols with escaped name
ClosedPublic

Authored by serge-sans-paille on Oct 11 2022, 1:35 PM.

Details

Reviewers

int3
glandium
steven_wu
mehdi_amini
tejohnson

Group Reviewers

Restricted Project

Commits

rG232e0a011e8c: [lto] Do not try to internalize symbols with escaped name

Summary

Because of LLVM's mangling escape sequence (the '\01' prefix), it is possible
for a single symbol to have two different IR representations.

For instance, consider @symbol and @"\01_symbol". On OSX, because of the system
mangling rules, these two IR names are converted to the same final symbol
upon linkage.
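
To make the collision concrete, here is a minimal sketch of the naming rule described above. The function name `machoLinkerName` is hypothetical and the model is deliberately simplified (non-private globals only); it is not LLVM's Mangler, only an illustration of why the two IR names end up as one linker symbol.

```cpp
#include <string>
#include <string_view>

// Hypothetical, simplified model of Mach-O mangling for non-private globals.
// A leading '\1' means "emit the rest of the name verbatim"; any other name
// gets the Mach-O global prefix '_' prepended.
std::string machoLinkerName(std::string_view IRName) {
  if (!IRName.empty() && IRName.front() == '\1')
    return std::string(IRName.substr(1)); // escaped: no further mangling
  return "_" + std::string(IRName);       // regular: add the '_' prefix
}
```

Under this model both `machoLinkerName("symbol")` and `machoLinkerName("\01_symbol")` yield `_symbol`, which is the situation the summary describes.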

LTO doesn't model this behavior, which may result in symbols being incorrectly
internalized (if all references use the escape sequence while the definition
doesn't).

The proper approach is probably to use the mangled name to compute GUID to
avoid the dual representation, but we can also avoid discarding symbols that are
bound to two different IR names. This is an approximation, but it's less
intrusive on the codebase.
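
The approximation can be sketched roughly as follows. This is a trimmed-down illustration, not the code in LTO.cpp: the `Resolution` struct and `recordSymbol` helper are hypothetical stand-ins, and the real bookkeeping tracks more state (prevailing copies, partitions, and so on).

```cpp
#include <map>
#include <string>

// Hypothetical, trimmed-down stand-in for LTO's per-symbol resolution record.
struct Resolution {
  std::string IRName;    // IR name first seen for this linker-level symbol
  bool External = false; // true: keep visible, do not internalize
};

// Records one occurrence of a symbol, keyed by its linker-level name.
// If an earlier occurrence of the same linker symbol used a different IR
// name, conservatively mark the symbol external.
void recordSymbol(std::map<std::string, Resolution> &Table,
                  const std::string &LinkerName, const std::string &IRName) {
  Resolution &Res = Table[LinkerName];
  if (Res.IRName.empty())
    Res.IRName = IRName;
  else if (Res.IRName != IRName)
    Res.External = true;
}
```

The design choice is to give up an optimization (internalization) for the affected symbols rather than to unify their identities, which is why the summary calls it an approximation.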

Fix #57864

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

serge-sans-paille created this revision. Oct 11 2022, 1:35 PM

Herald added projects: Restricted Project, Restricted Project. Oct 11 2022, 1:35 PM

Herald added a reviewer: Restricted Project.

Herald added subscribers: ormris, hiraditya, inglorion.

serge-sans-paille requested review of this revision. Oct 11 2022, 1:35 PM

Herald added a project: Restricted Project. Oct 11 2022, 1:35 PM

Herald added a subscriber: llvm-commits.

serge-sans-paille added a reviewer: tejohnson. Oct 11 2022, 1:36 PM

Did you see my latest comment on the task? https://github.com/llvm/llvm-project/issues/57864#issuecomment-1274971642

My bad, I had a proper look and yeah this seems to do the right thing.

I'm wondering if the test should use llvm-lto instead of ld64.lld (and placed under llvm/test/LTO) since this is technically not an LLD-specific issue. I don't feel strongly about it though.

If we are keeping the test under lld/, I would prefer we combined both test files into one (relying more on split-file).

Harbormaster completed remote builds in B191565: Diff 466902. Oct 11 2022, 2:35 PM

We call GlobalValue::dropLLVMManglingEscape on the IRName before computing the GUID for both regular and thin LTO in LTO.cpp. Do you know why this isn't handling this case?

dropLLVMManglingEscape drops the \01 but doesn't drop the _ prefix. I *think* this prefix is only used on Mach-O and not ELF?

Basically in Mach-O, the un-mangled names \01_foo and foo map to the same mangled name _foo. dropLLVMManglingEscape("\01_foo") gives us back _foo which doesn't match foo.
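
A small sketch of the mismatch described here. `dropEscape` is a standalone stand-in that mirrors what the thread says `GlobalValue::dropLLVMManglingEscape` does (strip a leading `\01`, nothing else); `sameGuidKey` is a hypothetical helper that only compares the strings a GUID would be computed from, without modelling the hash itself.

```cpp
#include <string_view>

// Standalone stand-in for the escape-dropping step: strip a leading '\1'
// if present and leave the rest of the name untouched.
constexpr std::string_view dropEscape(std::string_view Name) {
  if (!Name.empty() && Name.front() == '\1')
    return Name.substr(1);
  return Name;
}

// Hypothetical helper: true when two IR names would feed the same string
// into the GUID computation after the escape is dropped.
constexpr bool sameGuidKey(std::string_view A, std::string_view B) {
  return dropEscape(A) == dropEscape(B);
}
```

With these definitions, `sameGuidKey("\01_foo", "foo")` is false: the escape is gone but the `_` prefix remains, so the two names still look like different symbols.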

In D135710#3851010, @int3 wrote:

dropLLVMManglingEscape drops the \01 but doesn't drop the _ prefix. I *think* this prefix is only used on Mach-O and not ELF?

It's also used on COFF for x86 (but not x86-64 or arm32 or arm64), if that's at all relevant here.

@tejohnson for a bit more context, see https://discourse.llvm.org/t/thinlto-does-not-account-for-mangle-suppression-prefix-mach-o-issue/65686

In D135710#3850973, @tejohnson wrote:

We call GlobalValue::dropLLVMManglingEscape on the IRName before computing the GUID for both regular and thin LTO in LTO.cpp. Do you know why this isn't handling this case?

It's because link time mangling can add various prefixes to symbols, see https://llvm.org/docs/LangRef.html#data-layout , m:<mangling> section. Basically for visible symbols, GlobalValue::dropLLVMManglingEscape acts as an unmangler for ELF symbols. But it's not the case for other manglers.
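
As a rough illustration of the point about manglers, the sketch below encodes the global-symbol prefix for a few of the `m:<mangling>` modes listed in the LangRef and derives when dropping the escape alone recovers the unmangled name. Only the modes mentioned in this thread are modelled, and `dropEscapeIsUnmangler` is a hypothetical name introduced for this example.

```cpp
// Prefix added to regular (non-private) global symbols for a data-layout
// mangling mode character. Only a subset of modes is modelled here.
constexpr char globalPrefix(char Mode) {
  switch (Mode) {
  case 'o': // Mach-O: other symbols get a '_' prefix
  case 'x': // Windows x86 COFF: regular C symbols get a '_' prefix
    return '_';
  default:  // e.g. 'e' (ELF) and 'w' (Windows COFF): no global prefix
    return '\0';
  }
}

// Hypothetical predicate: dropping the '\1' escape is enough to get back the
// unmangled name only when the mangler adds no prefix to global symbols.
constexpr bool dropEscapeIsUnmangler(char Mode) {
  return globalPrefix(Mode) == '\0';
}
```

This matches the observations above: the predicate holds for ELF but not for Mach-O or 32-bit x86 COFF.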

Ok thanks all for the context. I spent some time poking around this morning and I think this approach is fine for now but have a couple of questions and comments.

First, I'm wondering why this isn't an issue for regular LTO which by default does index based liveness. Specifically, see the following which is for both regular and thin LTO:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/LTO/LTO.cpp#L1006-L1032

Since it sounds like we get separate combined index entries for the 2 names, I'm wondering why one of them (the one that doesn't match the global resolution IRName) isn't being incorrectly marked dead by that analysis and then subsequently removed by regular LTO here:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/LTO/LTO.cpp#L861-L873

Second, I see that the big problem we have is that we compute the GUID during ThinLTO in particular using the static version of getGlobalIdentifier, since we don't have IR, and thus it isn't easy to modify that to take the mangling into account. But it looks like we would just need the triple, which is available in the InputFile object created for all bitcode files during linking. That could be used here in the linker interface to LTO and then perhaps stored in the index alongside the corresponding module path for later use during thinlto. It looks like there is handling in llvm/lib/IR/Mangler.cpp that does this mangling that could be modified to then work off that information instead of accessing it from the GV. If this is correct I wonder if we should document this idea somewhere for future work. It is fairly invasive though so I'm fine with the workaround here for now.

Couple more comments below.

In D135710#3850614, @int3 wrote:

My bad, I had a proper look and yeah this seems to do the right thing.

I'm wondering if the test should use llvm-lto instead of ld64.lld (and placed under llvm/test/LTO) since this is technically not an LLD-specific issue. I don't feel strongly about it though.

Yeah, I think llvm-lto2 (not llvm-lto, which tests the legacy LTO interface) would be a better tool to test this with, since it doesn't seem lld-specific. Either under llvm/test/ThinLTO, or if this is also an issue with regular LTO, then under llvm/test/LTO.

If we are keeping the test under lld/, I would prefer we combined both test files into one (relying more on split-file).

Looks like the tests already use split-file? Maybe that was changed after this comment though.

lld/test/MachO/hidden-escaped-symbols-alt.ll
Line 5 (on Diff #466902): I know this is just checking that it links correctly without error, but it would be better to add an explicit CHECK for the linked symbol too. Also please add a comment at the top about what is being tested. Ditto for other new test.
llvm/lib/LTO/LTO.cpp
Line 571: Can you put in more details on why (with the MachO mangling as an example).
Line 573: I assume by "dismissal" you mean "internalization" or "dead code elimination". Can you change this to mention which of those is happening incorrectly, since "dismissal" isn't really a term used elsewhere in LTO afaik.

Looks like the tests already use split-file? Maybe that was changed after this comment though.

I was just trying to suggest that hidden-escaped-symbols-alt.ll and hidden-escaped-symbols.ll be combined into one test file.

Update test cases:

- use llvm-lto2
- illustrate behavior on full and thin LTO

Also update comments on the extra condition, as suggested by @tejohnson

Fix typo + add a FIXME to keep track of potential enhancements.

Harbormaster completed remote builds in B191914: Diff 467394. Oct 13 2022, 3:20 AM

lgtm, but please wait to see if @int3 has any more comments.
Suggest adding a comment to the top of the tests about what it is testing (that LTO is properly handling the MachO mangling and treating @hide_me and @"\01_hide_me" as the same symbol).

This revision is now accepted and ready to land. Oct 13 2022, 5:48 AM

lgtm, thanks for doing this!

llvm/lib/LTO/LTO.cpp
Line 571: nit: technically this is macOS + iOS, not just OSX

Closed by commit rG232e0a011e8c: [lto] Do not try to internalize symbols with escaped name (authored by serge-sans-paille). Oct 14 2022, 1:36 PM

This revision was automatically updated to reflect the committed changes.

serge-sans-paille added a commit: rG232e0a011e8c: [lto] Do not try to internalize symbols with escaped name.

Revision Contents

Path	Size
llvm/lib/LTO/LTO.cpp	16 lines
llvm/test/LTO/X86/hidden-escaped-symbols-alt.ll	41 lines
llvm/test/LTO/X86/hidden-escaped-symbols.ll	41 lines
llvm/test/ThinLTO/X86/hidden-escaped-symbols-alt.ll	42 lines
llvm/test/ThinLTO/X86/hidden-escaped-symbols.ll	42 lines

Diff 467898

llvm/lib/LTO/LTO.cpp

if (Res.Prevailing) {
  // symbol can have no IR name. That might happen if symbol is defined in
  // module level inline asm block. In case we have multiple modules with
  // the same symbol we want to use IR name of the prevailing symbol.
  // Otherwise, if we haven't seen a prevailing symbol, set the name so that
  // we can later use it to check if there is any prevailing copy in IR.
  GlobalRes.IRName = std::string(Sym.getIRName());
}

// In rare occasion, the symbol used to initialize GlobalRes has a different
// IRName from the inspected Symbol. This can happen on macOS + iOS, when a
// symbol is referenced through its mangled name, say @"\01_symbol" while
// the IRName is @symbol (the prefix underscore comes from MachO mangling).
// In that case, we have the same actual Symbol that can get two different
// GUID, leading to some invalid internalization. Workaround this by marking
// the GlobalRes external.
// FIXME: instead of this check, it would be desirable to compute GUIDs
// based on mangled name, but this requires an access to the Target Triple
// and would be relatively invasive on the codebase.
if (GlobalRes.IRName != Sym.getIRName()) {
  GlobalRes.Partition = GlobalResolution::External;
  GlobalRes.VisibleOutsideSummary = true;
}

// Set the partition to external if we know it is re-defined by the linker
// with -defsym or -wrap options, used elsewhere, e.g. it is visible to a
// regular object, is referenced from llvm.compiler.used/llvm.used, or was
// already recorded as being referenced from a different partition.
if (Res.LinkerRedefined || Res.VisibleToRegularObj || Sym.isUsed() ||
    (GlobalRes.Partition != GlobalResolution::Unknown &&
     GlobalRes.Partition != Partition)) {
  GlobalRes.Partition = GlobalResolution::External;

Inline comments on the hunk above:

tejohnson: Can you put in more details on why (with the MachO mangling as an example).

int3: nit: technically this is macOS + iOS, not just OSX. Suggested edit:

- // IRName from the inspected Symbol. This can happen on OSX when a symbol
+ // IRName from the inspected Symbol. This can happen with Mach-O when a symbol

tejohnson: I assume by "dismissal" you mean "internalization" or "dead code elimination". Can you change this to mention which of those is happening incorrectly, since "dismissal" isn't really a term used elsewhere in LTO afaik.

llvm/test/LTO/X86/hidden-escaped-symbols-alt.ll

This file was added.

				; Check interaction between LTO and LLVM mangling escape char, see #57864.

				; RUN: split-file %s %t
				; RUN: opt %t/hide-me.ll -o %t/hide-me.bc
				; RUN: opt %t/ref.ll -o %t/ref.bc
				; RUN: llvm-lto2 run \
				; RUN: -r %t/hide-me.bc,_hide_me,p \
				; RUN: -r %t/ref.bc,_main,plx \
				; RUN: -r %t/ref.bc,_hide_me,l \
				; RUN: --select-save-temps=precodegen \
				; RUN: -o %t/out \
				; RUN: %t/hide-me.bc %t/ref.bc
				; RUN: llvm-dis %t/out.0.5.precodegen.bc -o - | FileCheck %s


				;--- hide-me.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@"\01_hide_me" = hidden local_unnamed_addr global i8 8, align 1

				;--- ref.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@hide_me = external local_unnamed_addr global i8

				define i8 @main() {
				%1 = load i8, ptr @hide_me, align 1
				ret i8 %1
				}


				; CHECK: @"\01_hide_me" = hidden local_unnamed_addr global i8 8, align 1
				; CHECK: @hide_me = external dso_local local_unnamed_addr global i8

				; CHECK: define dso_local i8 @main() local_unnamed_addr #0 {
				; CHECK: %1 = load i8, ptr @hide_me, align 1
				; CHECK: ret i8 %1
				; CHECK: }

llvm/test/LTO/X86/hidden-escaped-symbols.ll

This file was added.

				; Check interaction between LTO and LLVM mangling escape char, see #57864.

				; RUN: split-file %s %t
				; RUN: opt %t/hide-me.ll -o %t/hide-me.bc
				; RUN: opt %t/ref.ll -o %t/ref.bc
				; RUN: llvm-lto2 run \
				; RUN: -r %t/hide-me.bc,_hide_me,p \
				; RUN: -r %t/ref.bc,_main,plx \
				; RUN: -r %t/ref.bc,_hide_me,l \
				; RUN: --select-save-temps=precodegen \
				; RUN: -o %t/out \
				; RUN: %t/hide-me.bc %t/ref.bc
				; RUN: llvm-dis %t/out.0.5.precodegen.bc -o - | FileCheck %s


				;--- hide-me.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@hide_me = hidden local_unnamed_addr global i8 8, align 1

				;--- ref.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@"\01_hide_me" = external local_unnamed_addr global i8

				define i8 @main() {
				%1 = load i8, ptr @"\01_hide_me", align 1
				ret i8 %1
				}


				; CHECK: @hide_me = hidden local_unnamed_addr global i8 8, align 1
				; CHECK: @"\01_hide_me" = external dso_local local_unnamed_addr global i8

				; CHECK: define dso_local i8 @main() local_unnamed_addr #0 {
				; CHECK: %1 = load i8, ptr @"\01_hide_me", align 1
				; CHECK: ret i8 %1
				; CHECK: }

llvm/test/ThinLTO/X86/hidden-escaped-symbols-alt.ll

This file was added.

				; Check interaction between LTO and LLVM mangling escape char, see #57864.

				; RUN: split-file %s %t
				; RUN: opt -module-summary %t/hide-me.ll -o %t/hide-me.bc
				; RUN: opt -module-summary %t/ref.ll -o %t/ref.bc
				; RUN: llvm-lto2 run \
				; RUN: -r %t/hide-me.bc,_hide_me,p \
				; RUN: -r %t/ref.bc,_main,plx \
				; RUN: -r %t/ref.bc,_hide_me,l \
				; RUN: --select-save-temps=precodegen \
				; RUN: -o %t/out \
				; RUN: %t/hide-me.bc %t/ref.bc
				; RUN: llvm-dis %t/out.1.5.precodegen.bc -o - | FileCheck --check-prefix=CHECK-HIDE %s
				; RUN: llvm-dis %t/out.2.5.precodegen.bc -o - | FileCheck --check-prefix=CHECK-REF %s


				;--- hide-me.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@"\01_hide_me" = hidden local_unnamed_addr global i8 8, align 1

				;--- ref.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@hide_me = external local_unnamed_addr global i8

				define i8 @main() {
				%1 = load i8, ptr @hide_me, align 1
				ret i8 %1
				}


				; CHECK-HIDE: @"\01_hide_me" = hidden local_unnamed_addr global i8 8, align 1

				; CHECK-REF: @hide_me = external local_unnamed_addr global i8
				; CHECK-REF: define dso_local i8 @main() local_unnamed_addr #0 {
				; CHECK-REF: %1 = load i8, ptr @hide_me, align 1
				; CHECK-REF: ret i8 %1
				; CHECK-REF: }

llvm/test/ThinLTO/X86/hidden-escaped-symbols.ll

This file was added.

				; Check interaction between LTO and LLVM mangling escape char, see #57864.

				; RUN: split-file %s %t
				; RUN: opt -module-summary %t/hide-me.ll -o %t/hide-me.bc
				; RUN: opt -module-summary %t/ref.ll -o %t/ref.bc
				; RUN: llvm-lto2 run \
				; RUN: -r %t/hide-me.bc,_hide_me,p \
				; RUN: -r %t/ref.bc,_main,plx \
				; RUN: -r %t/ref.bc,_hide_me,l \
				; RUN: --select-save-temps=precodegen \
				; RUN: -o %t/out \
				; RUN: %t/hide-me.bc %t/ref.bc
				; RUN: llvm-dis %t/out.1.5.precodegen.bc -o - | FileCheck --check-prefix=CHECK-HIDE %s
				; RUN: llvm-dis %t/out.2.5.precodegen.bc -o - | FileCheck --check-prefix=CHECK-REF %s


				;--- hide-me.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@hide_me = hidden local_unnamed_addr global i8 8, align 1

				;--- ref.ll
				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.7.0"

				@"\01_hide_me" = external local_unnamed_addr global i8

				define i8 @main() {
				%1 = load i8, ptr @"\01_hide_me", align 1
				ret i8 %1
				}


				; CHECK-HIDE: @hide_me = hidden local_unnamed_addr global i8 8, align 1

				; CHECK-REF: @"\01_hide_me" = external local_unnamed_addr global i8
				; CHECK-REF: define dso_local i8 @main() local_unnamed_addr #0 {
				; CHECK-REF: %1 = load i8, ptr @"\01_hide_me", align 1
				; CHECK-REF: ret i8 %1
				; CHECK-REF: }
