This is an archive of the discontinued LLVM Phabricator instance.

[lld][WebAssembly] Retain data segments referenced via __start/__stop
ClosedPublic

Authored by kateinoigakukun on Jun 3 2022, 12:03 AM.

Details

Summary

As well as ELF linker does, retain all data segments named X referenced
through __start_X or __stop_X.

For example, FOO_MD should not be stripped in the below case, but it's currently mis-stripped

llvm
@FOO_MD  = global [4 x i8] c"bar\00", section "foo_md", align 1
@__start_foo_md = external constant i8*
@__stop_foo_md = external constant i8*
@llvm.used = appending global [1 x i8*] [i8* bitcast (i32 ()* @foo_md_size to i8*)], section "llvm.metadata"

define i32 @foo_md_size()  {
entry:
  ret i32 sub (
    i32 ptrtoint (i8** @__stop_foo_md to i32),
    i32 ptrtoint (i8** @__start_foo_md to i32)
  )
}

This fixes https://github.com/llvm/llvm-project/issues/55839

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2022, 12:03 AM
kateinoigakukun requested review of this revision.Jun 3 2022, 12:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2022, 12:03 AM
kateinoigakukun edited the summary of this revision. (Show Details)Jun 3 2022, 12:52 AM
sbc100 added inline comments.Jun 3 2022, 8:08 AM
lld/test/wasm/gc-sections-metadata-startstop.ll
1 ↗(On Diff #433970)

We are trying to move away from .ll tests and stick to use .s tests as they tend to be easier to read/faster/fewer dependencies.

Would you mind converting this test to .s?

lld/wasm/InputChunks.cpp
531

I don't think you need the this-> qualifiers here .

lld/wasm/MarkLive.cpp
89

Just enqueueChunk seems descriptive enough, or maybe even just overload enqueue?

121

So, does this mean that if I reference the symbol __start_data, then all the normal data segment in my whole program will be included (non will be GC'd)?

Is this how ELF works? It seems like a shame to loose the ability to GC data segments just because i want to know where my data starts and/or stops. I suppose we have other symbols such as _edata which can be used to get the start/end of the static data region without disabling GC?

addressed review comments

kateinoigakukun marked 2 inline comments as done and an inline comment as not done.Jun 3 2022, 10:06 AM
kateinoigakukun added inline comments.
lld/wasm/MarkLive.cpp
121

So, does this mean that if I reference the symbol __start_data, then all the normal data segment in my whole program will be included (non will be GC'd)?

Yes, I agree that's unfortunate situation...

Is this how ELF works?

To be precise, according to https://lld.llvm.org/ELF/start-stop-gc, whether __start/__stop symbols retain all input data segments is controlled by -z start-stop-gc / -z nostart-stop-gc.

The latest ELF lld defaults to -z start-stop-gc, but GNU ld defaults to -z nostart-stop-gc to be conservative.

So, how about adding the option in wasm-ld also? If it's reasonable, which behavior should be defaulted?

sbc100 added inline comments.Jun 3 2022, 10:10 AM
lld/wasm/MarkLive.cpp
121

Yes, adding start-stop-gc as part of this change makes sense to me.

So simplify this you could split out of the moving of the getOutputDataSegmentName function amd land that first? It seems like a good refactoring anyway.

sbc100 added inline comments.Jun 3 2022, 10:13 AM
lld/test/wasm/gc-sections-metadata-startstop.s
2

--gc-sections is the default under wasm so you don't need that here.

90

I think you can drop the .globl for everything but _start without effecting the test.

I'm also fine with start-stop-gc being added a followup.

I'm also fine with start-stop-gc being added a followup.

OK, let me do it in a follow-up patch 🙏

lld/test/wasm/gc-sections-metadata-startstop.s
2

Ah, I've intentionally added the option to indicate the main testing target is the option, but it's ok to remove indeed.

lld/wasm/MarkLive.cpp
121

OK, I'll make a separate patch for the refactoring first 👍

addressed review comments, and simplify by separating NFC patch

kateinoigakukun marked 2 inline comments as done.Jun 3 2022, 10:35 AM
sbc100 added inline comments.Jun 3 2022, 10:54 AM
lld/test/wasm/gc-sections-metadata-startstop.s
2

Why not just call this gc-sections-startstop.s.. what is the metadata part referring too?

lld/wasm/MarkLive.cpp
117

I think you can just use segment->name here. That is what ELF seems to do

lld/test/wasm/gc-sections-metadata-startstop.s
2

startstop feature is often used as metadata mechanism, so some other test cases have the metadata part as well, but it's fine to remove that part since it's descriptive enough.

addressed review comments

sbc100 accepted this revision.Jun 3 2022, 11:10 AM
This revision is now accepted and ready to land.Jun 3 2022, 11:10 AM

Thank you for your review @sbc100 !
I'll make a follow-up patch for start-stop-gc option next.

MaskRay added a subscriber: MaskRay.EditedJun 3 2022, 7:47 PM

As well as ELF linker does, retain all data segments named X referenced through start_X or stop_X.

lld doesn't do this by default. This behavior requires -z nostart-stop-gc.

Not force retaining X allows fine-grained GC of metadata sections: https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order

As well as ELF linker does, retain all data segments named X referenced through start_X or stop_X.

lld doesn't do this by default. This behavior requires -z nostart-stop-gc.

Not force retaining X allows fine-grained GC of metadata sections: https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order

Ah right, so my description is valid with GNU ld sorry. I'm going to add the option and make it default in wasm-ld.

As well as ELF linker does, retain all data segments named X referenced through start_X or stop_X.

lld doesn't do this by default. This behavior requires -z nostart-stop-gc.

Not force retaining X allows fine-grained GC of metadata sections: https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order

Ah right, so my description is valid with GNU ld sorry. I'm going to add the option and make it default in wasm-ld.

-z start-stop-gc is the ideal behavior. I am concerned the behavior as defaulted by this patch will make incur unnecessary size costs to metadata sections.

As well as ELF linker does, retain all data segments named X referenced through start_X or stop_X.

lld doesn't do this by default. This behavior requires -z nostart-stop-gc.

Not force retaining X allows fine-grained GC of metadata sections: https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order

Ah right, so my description is valid with GNU ld sorry. I'm going to add the option and make it default in wasm-ld.

-z start-stop-gc is the ideal behavior. I am concerned the behavior as defaulted by this patch will make incur unnecessary size costs to metadata sections.

So should we revert this patch ASAP and reland with hiding it behind -z nostart-stop-gc? Or is it fine to wait for my upcoming follow-up patch?

MaskRay added a comment.EditedJun 3 2022, 9:04 PM

I'd suggest a revert. Note that Mach-O ld64 matches the ld.lld behavior, too.

I think it needs justification to add a new option. If you want to retain a section, consider implementing something similar to SHF_GNU_RETAIN in ELF, instead of changing the linker GC to force retain sections.