This is an archive of the discontinued LLVM Phabricator instance.

IMO we should default to optimizing for speed (speed is one of our big selling points after all), but I'm open to changing this behavior if you or other people feel strongly about it :)

Yea this case is tough. We could always recommend folks enable it with this same warning instead. My biggest worry, similar to the case we hit with this, is that many iOS apps do not control _all_ the code that ships in their apps. Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

That's a fair argument. @thakis @oontvoo @MaskRay, do y'all have any thoughts on this?

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I think I agree with int3 here that de-duping by default is more likely to lead to bugs. (in fact, we've already found a lot of cases in our codebase where people incorrectly depended on it; none of which would likely have been identified had we not tried switching over to lld)

In D117250#3241888, @int3 wrote:

That's a fair argument. @thakis @oontvoo @MaskRay, do y'all have any thoughts on this?

I guess my vote is on the current behaviour :) 🗳

In D117250#3242166, @oontvoo wrote:

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I mean I guess literally anything right? The case I found for us was luckily a crasher (repro here https://github.com/llvm/llvm-project/issues/51822) but given some logic error who knows I guess?

In D117250#3242166, @oontvoo wrote:

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

I think I agree with int3 here that de-duping by default is more likely to lead to bugs. (in fact, we've already found a lot of cases in our codebase where people incorrectly depended on it; none of which would likely have been identified had we not tried switching over to lld)

It's definitely true that you would be more likely to introduce bugs in your own code with deduping on, but in the closed source binary case, where the producers of these aren't likely to be using lld themselves, us finding those bugs downstream isn't super helpful. Tbh even if we did find it I'm not sure all vendors would be receptive to bugs that only reproduce with the non-default linker, but that's tbd if we can actually even spot them given they could be much more subtle logic errors. It's too bad that there isn't a clang warning for this case AFAICT.

Regardless I still think this is a question of whether usage defaults should mirror ld64, or prefer performance.

Non-string constant deduplication isn't that useful.
Tail string merge has very little benefit.

On the ELF land, I have some brief notes on https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster#shf_merge-duplicate-elimination and compressed debug info.
ld.lld -O1 (default) performs SHF_MERGE|SHF_STRINGS deduplication. It has a huge impact on the size of .debug_str.

% ld.lld @response.txt -o clang.0 -O0
% ld.lld @response.txt -o clang.1 -O1
% stat -c %s clang.0 clang.1
2126774248
1389546048

% ~/projects/bloaty/Release/bloaty clang.0 -- clang.1
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +286%  +661Mi  [ = ]       0    .debug_str
   +87% +41.3Mi   +87% +41.3Mi    .rodata
 +18e4%  +266Ki  [ = ]       0    .comment
  +0.0%      +8  [ = ]       0    .eh_frame
  -0.0%      -5  [ = ]       0    .debug_line
   +53%  +703Mi   +16% +41.3Mi    TOTAL

% hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 /tmp/out/custom2/bin/ld.lld "{-O0,-O1}" @response.txt --threads=8 -o clang"                                                                                                                                                   
Benchmark 1: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang
 ⠧ Current estimate: 4.992 s 
  Time (mean ± σ):      5.006 s ±  0.032 s    [User: 5.289 s, System: 3.048 s]
  Range (min … max):    4.958 s …  5.079 s    10 runs
 
Benchmark 2: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang
  Time (mean ± σ):      6.030 s ±  0.044 s    [User: 11.633 s, System: 2.822 s]
  Range (min … max):    5.936 s …  6.066 s    10 runs
 
Summary
  'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang' ran
    1.20 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang'

.debug_str is ~3.86x (1+286%=3.86) as large if you suppress deduplication.

There are users preferring size and users preferring speed.
If you do parallelism on string deduplication, the speed may not differ too much.
(I have tried poor man's concurrent hash map, but don't find a noticeable improvement.)

There is also an ecosystem problem. Does dsymutil deduplicate strings? If debug executable user will invoke it anyway, then the size probably doesn't matter that much.
In ELF land, if the output is too large, there are definitely users complaining against your linker...

(Oops. I cannot test dwz on the Debug build of clang)

% dwz clang.0 -o clang.0.dwz
dwz: clang.0: Too many DIEs, not optimizing

Perhaps I can contribute to the parallel part of DeduplicatedCStringSection::finalizeContents? :)
If I can make my cbdr work (https://reviews.llvm.org/D114735#3236110). Currently it seems to always print the help message

% cbdr -V  
cbdr 0.2.3
Tools for comparitive benchmarking

USAGE:
    cbdr <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    analyze    For each pair of benchmarks (x and y), shows, for each metric (̄x and ̄y), the CI of (̄y - ̄x) / ̄x
    help       Prints this message or the help of the given subcommand(s)
    plot       Takes CSV data on stdin and produces a vega-lite plot specification on stdout
    sample     Repeatedly runs benchmarks chosen at random and prints results as CSV

In D117250#3242211, @keith wrote:

In D117250#3242166, @oontvoo wrote:

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I mean I guess literally anything right? The case I found for us was luckily a crasher (repro here https://github.com/llvm/llvm-project/issues/51822) but given some logic error who knows I guess?
It's definitely true that you would be more likely to introduce bugs in your own code with deduping on, but in the closed source binary case, where the producers of these aren't likely to be using lld themselves, us finding those bugs downstream isn't super helpful.
Tbh even if we did find it I'm not sure all vendors would be receptive to bugs that only reproduce with the non-default linker, but that's tbd if we can actually even spot them given they could be much more subtle logic errors. I

I can appreciate the complexity+frustration in chasing and fixing bugs like this, having done that for a few dozens of projects in our code base.
But this is clearly UB (both C/C++ and ObjC never guarantee pointer equality for strings with the same values).

I don't see the argument here being about speed (even though in practice, it kind of is). To me, it's more about whether LLD should make it easy for people to rely on UB - and I'm really not convinced that it should.
Also, we can always fall back to setting this de-dup flag if the bug is in code that's not fixable, no?

It's too bad that there isn't a clang warning for this case AFAICT.

IIRC, there is only one complier-warning for this kind of thing in ObjC - and I've recently added a clang-tidy check for another scenario - but yeah, that's about it.

In D117250#3242329, @oontvoo wrote:

In D117250#3242211, @keith wrote:

In D117250#3242166, @oontvoo wrote:

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I mean I guess literally anything right? The case I found for us was luckily a crasher (repro here https://github.com/llvm/llvm-project/issues/51822) but given some logic error who knows I guess?
It's definitely true that you would be more likely to introduce bugs in your own code with deduping on, but in the closed source binary case, where the producers of these aren't likely to be using lld themselves, us finding those bugs downstream isn't super helpful.
Tbh even if we did find it I'm not sure all vendors would be receptive to bugs that only reproduce with the non-default linker, but that's tbd if we can actually even spot them given they could be much more subtle logic errors. I

I can appreciate the complexity+frustration in chasing and fixing bugs like this, having done that for a few dozens of projects in our code base.
But this is clearly UB (both C/C++ and ObjC never guarantee pointer equality for strings with the same values).

I don't see the argument here being about speed (even though in practice, it kind of is). To me, it's more about whether LLD should make it easy for people to rely on UB - and I'm really not convinced that it should.

I agree with this but it's possible, especially w/o a compiler warning for this, some folks might see this as idealistic given with the standard linker it's never surfaced.

Also, we can always fall back to setting this de-dup flag if the bug is in code that's not fixable, no?

Yes but I think for new users it's nicer to have 1:1 (as much as is realistic) behavior with ld64 to start, and then they can start investigating flags for performance. Versus finding some runtime misbehavior (hopefully folks wouldn't just give up at this point) and then having to find the flag to fix it.

It's too bad that there isn't a clang warning for this case AFAICT.

IIRC, there is only one complier-warning for this kind of thing in ObjC - and I've recently added a clang-tidy check for another scenario - but yeah, that's about it.

I searched around a bit for this as well, any idea if there's a previous conversation about it anywhere? You get the warning for foo == "a" but not foo == bar, I wonder how folks would feel about adding one for that case. I also tested with ubsan and it doesn't warn for this either.

In D117250#3242407, @keith wrote:

, I wonder how folks would feel about adding one for that case.

From slightly biased opinions of my colleagues, this would cause quite a bit of false positives. (eg., code that handles buffers)

I also tested with ubsan and it doesn't warn for this either.

it's not UB to compare two const char* - it is kind of UB to *expect* their equality, which, of course, is hard to tell.

oontvoo added inline comments.Jan 13 2022, 6:29 PM

lld/MachO/ld64-vs-lld.rst
13	s/compared/compare ? (not sure why past tense is needed ...)

After reading through this thread, I'm going to try and argue for both fronts here and then give my final opinion at the end.

For the case of matching ld64, we always advertise (or at least try to) say on diff reviews that we should always be matching ld64's behavior. We say things like "What does ld64 produce?" or "Why isn't LLD outputting the same results as ld64". We care very much matching with ld64's behavior because that's the one of the ways we verify that LLD is working correctly. To that end, perhaps we should be matching with ld64 if we want to keep saying "LLD should be a drop-in replacement for ld64". Our current behavior clearly deviates from this, but we continue advertising this statement. On top of that, as @keith pointed out, it's fewer surprises.

For the case of build speed, we also advertise that "LLD is the fastest linker to exist" (barring the existence of mold). By configuring default behaviors of LLD to achieve the fastest build speed is also aligned with that claim, but this introduces subtleties and behavioral differences that will certainly cause new users to be confused. We could enforce some that they read the documentation on the differences between ld64 and LLD, but chances are that newer folks looking to do a drop in replacement would've sunk a good amount of time debugging before consulting the manual. As @oontvoo also argues, users shouldn't be relying on linker behavior to have their code working correctly.

What do I think about this? I lean towards keeping what we have today because I, more or less, agree that we should be actively fixing these issues. I'm also biased since I had to do this within our codebase as well. In the event where it can't be solved, the fallback flag does exist to offer users a way out. The way I see all this is that people adopt LLD for its speed, not because it's more correct (otherwise people would stick with ld64). If you take mold as an example, it doesn't care about backwards compatibility with older linkers. It just cares that it's the fastest linker. LLD follows the same footsteps, if not similar. QED.

In D117250#3242488, @thevinster wrote:

For the case of matching ld64, we always advertise (or at least try to) say on diff reviews that we should always be matching ld64's behavior. We say things like "What does ld64 produce?" or "Why isn't LLD outputting the same results as ld64". We care very much matching with ld64's behavior because that's the one of the ways we verify that LLD is working correctly. To that end, perhaps we should be matching with ld64 if we want to keep saying "LLD should be a drop-in replacement for ld64". Our current behavior clearly deviates from this, but we continue advertising this statement. On top of that, as @keith pointed out, it's fewer surprises.

For the case of build speed, we also advertise that "LLD is the fastest linker to exist" (barring the existence of mold). By configuring default behaviors of LLD to achieve the fastest build speed is also aligned with that claim, but this introduces subtleties and behavioral differences that will certainly cause new users to be confused. We could enforce some that they read the documentation on the differences between ld64 and LLD, but chances are that newer folks looking to do a drop in replacement would've sunk a good amount of time debugging before consulting the manual. As @oontvoo also argues, users shouldn't be relying on linker behavior to have their code working correctly.

What do I think about this? I lean towards keeping what we have today because I, more or less, agree that we should be actively fixing these issues. I'm also biased since I had to do this within our codebase as well. In the event where it can't be solved, the fallback flag does exist to offer users a way out. The way I see all this is that people adopt LLD for its speed, not because it's more correct (otherwise people would stick with ld64). If you take mold as an example, it doesn't care about backwards compatibility with older linkers. It just cares that it's the fastest linker. LLD follows the same footsteps, if not similar. QED.

My final thoughts: I think this is a good summary of us trying to achieve 2 conflicting things at the same time. My opinion for the iOS community is that prioritizing ease of onboarding (meaning ld64 compat) is more important than providing faster links out of the box (on our project specifically this flag doesn't even have a significant performance impact, so likely they'll still be faster out of the box anyways). Most of the tools at this depth of the stack are black boxes to iOS devs, but they still want improved performance, so being relatively confident they'll get the same results as ld64 is a high priority, and the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk. On the flip side people who are deeper into this and want to optimize further won't mind adding flags if necessary. So I think prioritizing compatibility is the best way to get new users in, and then lead them to the docs so they can optimize.

thevinster mentioned this in D117387: [lld-macho] Allow deduplicate-literals to be overridden.Jan 14 2022, 11:18 PM

thevinster mentioned this in rGe5347f2556cb: [lld-macho] Allow deduplicate-literals to be overridden.Jan 18 2022, 3:43 PM

the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk

How about introducing a new --ld64-compat flag that developers can use if they want to adopt LLD with minimal differences from ld64? I agree that it's not reasonable to expect the average end-user to understand the linking process in detail just to adopt LLD, but I think presenting them a higher-level flag that maximizes LD64 compatibility (and putting it front-and-center in our docs) makes things more approachable. Right now, that flag would simply alias to --deduplicate-literals, but we could expand it to other behaviors in the future.

In D117250#3253510, @int3 wrote:

the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk

How about introducing a new --ld64-compat flag that developers can use if they want to adopt LLD with minimal differences from ld64? I agree that it's not reasonable to expect the average end-user to understand the linking process in detail just to adopt LLD, but I think presenting them a higher-level flag that maximizes LD64 compatibility (and putting it front-and-center in our docs) makes things more approachable. Right now, that flag would simply alias to --deduplicate-literals, but we could expand it to other behaviors in the future.

👍 Good idea!

In D117250#3254563, @oontvoo wrote:

In D117250#3253510, @int3 wrote:

the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk

How about introducing a new --ld64-compat flag that developers can use if they want to adopt LLD with minimal differences from ld64? I agree that it's not reasonable to expect the average end-user to understand the linking process in detail just to adopt LLD, but I think presenting them a higher-level flag that maximizes LD64 compatibility (and putting it front-and-center in our docs) makes things more approachable. Right now, that flag would simply alias to --deduplicate-literals, but we could expand it to other behaviors in the future.

👍 Good idea!

I like this idea too, but I wonder if we should wait until we have some more use cases for this? Because I do wonder if over time there will be very many of these that we can easily have a flag like this for.

It seems like the consensus has been reached to keep this flag as is, so we should land this so folks reading it have this info.

Yep, let's wait till we have more use cases before adding the flag.

Closed by commit rGef95d45138ec: [lld-macho] Mention string literal deduplication as a difference from ld64 (authored by int3). · Explain WhyJan 19 2022, 4:31 PM

This revision was automatically updated to reflect the committed changes.

int3 added a commit: rGef95d45138ec: [lld-macho] Mention string literal deduplication as a difference from ld64.

int3 marked an inline comment as done.Jan 19 2022, 11:59 PM

int3 added inline comments.

lld/MachO/ld64-vs-lld.rst
13	whoops, forgot to address this. done in 8f811effac48fdbc1ac5ddbd944884a52866ec14

keith mentioned this in D140517: [lld-macho] Flip string deduplication default.Dec 21 2022, 3:55 PM

Revision Contents

Path

Size

lld/

MachO/

ld64-vs-lld.rst

9 lines

Diff 401439

lld/MachO/ld64-vs-lld.rst

	==================			==================
	LD64 vs LLD-MACHO			LD64 vs LLD-MACHO
	==================			==================

	This doc lists all significant deliberate differences in behavior between LD64 and LLD-MachO.			This doc lists all significant deliberate differences in behavior between LD64 and LLD-MachO.

				String literal deduplication
				****************************
				LD64 always deduplicates string literals. LLD only does it when the `--icf=` or
				the `--deduplicate-literals` flag is passed. Omitting deduplication by default
				ensures that our link is as fast as possible. However, it may also break some
				programs which have (incorrectly) relied on string deduplication always
				occurring. In particular, programs which compared string literals via pointer
				oontvooUnsubmitted Not Done Reply Inline Actions s/compared/compare ? (not sure why past tense is needed ...) oontvoo: s/compared/compare ? (not sure why past tense is needed ...)
				int3AuthorUnsubmitted Done Reply Inline Actions whoops, forgot to address this. done in 8f811effac48fdbc1ac5ddbd944884a52866ec14 int3: whoops, forgot to address this. done in 8f811effac48fdbc1ac5ddbd944884a52866ec14
				equality must be fixed to use value equality instead.

	``-no_deduplicate`` Flag			``-no_deduplicate`` Flag
	**********************			**********************
	- LD64:			- LD64:
	* This turns off ICF (deduplication pass) in the linker.			* This turns off ICF (deduplication pass) in the linker.
	- LLD			- LLD
	* This turns off ICF and string merging in the linker.			* This turns off ICF and string merging in the linker.

	ObjC symbols treatment			ObjC symbols treatment
	**********************			**********************
	There are differences in how LLD and LD64 handle ObjC symbols loaded from archives.			There are differences in how LLD and LD64 handle ObjC symbols loaded from archives.

	- LD64:			- LD64:
	* Duplicate ObjC symbols from the same archives will not raise an error. LD64 will pick the first one.			* Duplicate ObjC symbols from the same archives will not raise an error. LD64 will pick the first one.
	* Duplicate ObjC symbols from different archives will raise a "duplicate symbol" error.			* Duplicate ObjC symbols from different archives will raise a "duplicate symbol" error.
	- LLD:			- LLD:
	* Duplicate symbols, regardless of which archives they are from, will raise errors.			* Duplicate symbols, regardless of which archives they are from, will raise errors.

This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Mention string literal deduplication as a difference from ld64ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 401439

lld/MachO/ld64-vs-lld.rst

[lld-macho] Mention string literal deduplication as a difference from ld64
ClosedPublic