This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Mention string literal deduplication as a difference from ld64
ClosedPublic

Authored by int3 on Jan 13 2022, 1:38 PM.

Diff Detail

Event Timeline

int3 created this revision.Jan 13 2022, 1:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2022, 1:38 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
int3 requested review of this revision.Jan 13 2022, 1:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2022, 1:38 PM
keith accepted this revision as: keith.Jan 13 2022, 2:10 PM
keith added a subscriber: keith.

Docs seem good, but I do wonder if we should flip the default to be like LD64 instead?

This revision is now accepted and ready to land.Jan 13 2022, 2:10 PM
int3 added a comment.EditedJan 13 2022, 2:12 PM

IMO we should default to optimizing for speed (speed is one of our big selling points after all), but I'm open to changing this behavior if you or other people feel strongly about it :)

keith added a comment.Jan 13 2022, 2:18 PM

Yea this case is tough. We could always recommend folks enable it with this same warning instead. My biggest worry, similar to the case we hit with this, is that many iOS apps do not control _all_ the code that ships in their apps. Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

That's a fair argument. @thakis @oontvoo @MaskRay, do y'all have any thoughts on this?

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I think I agree with int3 here that de-duping by default is more likely to lead to bugs. (in fact, we've already found a lot of cases in our codebase where people incorrectly depended on it; none of which would likely have been identified had we not tried switching over to lld)

That's a fair argument. @thakis @oontvoo @MaskRay, do y'all have any thoughts on this?

I guess my vote is on the current behaviour :) 🗳

keith added a comment.Jan 13 2022, 4:31 PM

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I mean I guess literally anything right? The case I found for us was luckily a crasher (repro here https://github.com/llvm/llvm-project/issues/51822) but given some logic error who knows I guess?

keith added a comment.Jan 13 2022, 4:38 PM

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

I think I agree with int3 here that de-duping by default is more likely to lead to bugs. (in fact, we've already found a lot of cases in our codebase where people incorrectly depended on it; none of which would likely have been identified had we not tried switching over to lld)

It's definitely true that you would be more likely to introduce bugs in your own code with deduping on, but in the closed source binary case, where the producers of these aren't likely to be using lld themselves, us finding those bugs downstream isn't super helpful. Tbh even if we did find it I'm not sure all vendors would be receptive to bugs that only reproduce with the non-default linker, but that's tbd if we can actually even spot them given they could be much more subtle logic errors. It's too bad that there isn't a clang warning for this case AFAICT.

Regardless I still think this is a question of whether usage defaults should mirror ld64, or prefer performance.

MaskRay added a comment.EditedJan 13 2022, 5:09 PM

Non-string constant deduplication isn't that useful.
Tail string merge has very little benefit.

On the ELF land, I have some brief notes on https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster#shf_merge-duplicate-elimination and compressed debug info.
ld.lld -O1 (default) performs SHF_MERGE|SHF_STRINGS deduplication. It has a huge impact on the size of .debug_str.

% ld.lld @response.txt -o clang.0 -O0
% ld.lld @response.txt -o clang.1 -O1
% stat -c %s clang.0 clang.1
2126774248
1389546048

% ~/projects/bloaty/Release/bloaty clang.0 -- clang.1
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +286%  +661Mi  [ = ]       0    .debug_str
   +87% +41.3Mi   +87% +41.3Mi    .rodata
 +18e4%  +266Ki  [ = ]       0    .comment
  +0.0%      +8  [ = ]       0    .eh_frame
  -0.0%      -5  [ = ]       0    .debug_line
   +53%  +703Mi   +16% +41.3Mi    TOTAL

% hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 /tmp/out/custom2/bin/ld.lld "{-O0,-O1}" @response.txt --threads=8 -o clang"                                                                                                                                                   
Benchmark 1: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang
 ⠧ Current estimate: 4.992 s 
  Time (mean ± σ):      5.006 s ±  0.032 s    [User: 5.289 s, System: 3.048 s]
  Range (min … max):    4.958 s …  5.079 s    10 runs
 
Benchmark 2: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang
  Time (mean ± σ):      6.030 s ±  0.044 s    [User: 11.633 s, System: 2.822 s]
  Range (min … max):    5.936 s …  6.066 s    10 runs
 
Summary
  'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang' ran
    1.20 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang'

.debug_str is ~3.86x (1+286%=3.86) as large if you suppress deduplication.

There are users preferring size and users preferring speed.
If you do parallelism on string deduplication, the speed may not differ too much.
(I have tried poor man's concurrent hash map, but don't find a noticeable improvement.)

There is also an ecosystem problem. Does dsymutil deduplicate strings? If debug executable user will invoke it anyway, then the size probably doesn't matter that much.
In ELF land, if the output is too large, there are definitely users complaining against your linker...

(Oops. I cannot test dwz on the Debug build of clang)

% dwz clang.0 -o clang.0.dwz
dwz: clang.0: Too many DIEs, not optimizing

Perhaps I can contribute to the parallel part of DeduplicatedCStringSection::finalizeContents? :)
If I can make my cbdr work (https://reviews.llvm.org/D114735#3236110). Currently it seems to always print the help message

% cbdr -V  
cbdr 0.2.3
Tools for comparitive benchmarking

USAGE:
    cbdr <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    analyze    For each pair of benchmarks (x and y), shows, for each metric (̄x and ̄y), the CI of (̄y - ̄x) / ̄x
    help       Prints this message or the help of the given subcommand(s)
    plot       Takes CSV data on stdin and produces a vega-lite plot specification on stdout
    sample     Repeatedly runs benchmarks chosen at random and prints results as CSV

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I mean I guess literally anything right? The case I found for us was luckily a crasher (repro here https://github.com/llvm/llvm-project/issues/51822) but given some logic error who knows I guess?
It's definitely true that you would be more likely to introduce bugs in your own code with deduping on, but in the closed source binary case, where the producers of these aren't likely to be using lld themselves, us finding those bugs downstream isn't super helpful.
Tbh even if we did find it I'm not sure all vendors would be receptive to bugs that only reproduce with the non-default linker, but that's tbd if we can actually even spot them given they could be much more subtle logic errors. I

I can appreciate the complexity+frustration in chasing and fixing bugs like this, having done that for a few dozens of projects in our code base.
But this is clearly UB (both C/C++ and ObjC never guarantee pointer equality for strings with the same values).

I don't see the argument here being about speed (even though in practice, it kind of is). To me, it's more about whether LLD should make it easy for people to rely on UB - and I'm really not convinced that it should.
Also, we can always fall back to setting this de-dup flag if the bug is in code that's not fixable, no?

It's too bad that there isn't a clang warning for this case AFAICT.

IIRC, there is only one complier-warning for this kind of thing in ObjC - and I've recently added a clang-tidy check for another scenario - but yeah, that's about it.

keith added a comment.Jan 13 2022, 5:50 PM

Specifically the proliferation of closed source libraries for analytics / fraud / payments are very common, and might have these subtle bugs that developers cannot easily fix. Really I think flipping the default would just be a benefit for folks who were onboarding.

Can you elaborate what kind of bugs *not* deduping literals by default could lead to?

I mean I guess literally anything right? The case I found for us was luckily a crasher (repro here https://github.com/llvm/llvm-project/issues/51822) but given some logic error who knows I guess?
It's definitely true that you would be more likely to introduce bugs in your own code with deduping on, but in the closed source binary case, where the producers of these aren't likely to be using lld themselves, us finding those bugs downstream isn't super helpful.
Tbh even if we did find it I'm not sure all vendors would be receptive to bugs that only reproduce with the non-default linker, but that's tbd if we can actually even spot them given they could be much more subtle logic errors. I

I can appreciate the complexity+frustration in chasing and fixing bugs like this, having done that for a few dozens of projects in our code base.
But this is clearly UB (both C/C++ and ObjC never guarantee pointer equality for strings with the same values).

I don't see the argument here being about speed (even though in practice, it kind of is). To me, it's more about whether LLD should make it easy for people to rely on UB - and I'm really not convinced that it should.

I agree with this but it's possible, especially w/o a compiler warning for this, some folks might see this as idealistic given with the standard linker it's never surfaced.

Also, we can always fall back to setting this de-dup flag if the bug is in code that's not fixable, no?

Yes but I think for new users it's nicer to have 1:1 (as much as is realistic) behavior with ld64 to start, and then they can start investigating flags for performance. Versus finding some runtime misbehavior (hopefully folks wouldn't just give up at this point) and then having to find the flag to fix it.

It's too bad that there isn't a clang warning for this case AFAICT.

IIRC, there is only one complier-warning for this kind of thing in ObjC - and I've recently added a clang-tidy check for another scenario - but yeah, that's about it.

I searched around a bit for this as well, any idea if there's a previous conversation about it anywhere? You get the warning for foo == "a" but not foo == bar, I wonder how folks would feel about adding one for that case. I also tested with ubsan and it doesn't warn for this either.

, I wonder how folks would feel about adding one for that case.

From slightly biased opinions of my colleagues, this would cause quite a bit of false positives. (eg., code that handles buffers)

I also tested with ubsan and it doesn't warn for this either.

it's not UB to compare two const char* - it is kind of UB to *expect* their equality, which, of course, is hard to tell.

oontvoo added inline comments.Jan 13 2022, 6:29 PM
lld/MachO/ld64-vs-lld.rst
13

s/compared/compare ?
(not sure why past tense is needed ...)

thevinster added a subscriber: thevinster.EditedJan 13 2022, 6:41 PM

After reading through this thread, I'm going to try and argue for both fronts here and then give my final opinion at the end.

For the case of matching ld64, we always advertise (or at least try to) say on diff reviews that we should always be matching ld64's behavior. We say things like "What does ld64 produce?" or "Why isn't LLD outputting the same results as ld64". We care very much matching with ld64's behavior because that's the one of the ways we verify that LLD is working correctly. To that end, perhaps we should be matching with ld64 if we want to keep saying "LLD should be a drop-in replacement for ld64". Our current behavior clearly deviates from this, but we continue advertising this statement. On top of that, as @keith pointed out, it's fewer surprises.

For the case of build speed, we also advertise that "LLD is the fastest linker to exist" (barring the existence of mold). By configuring default behaviors of LLD to achieve the fastest build speed is also aligned with that claim, but this introduces subtleties and behavioral differences that will certainly cause new users to be confused. We could enforce some that they read the documentation on the differences between ld64 and LLD, but chances are that newer folks looking to do a drop in replacement would've sunk a good amount of time debugging before consulting the manual. As @oontvoo also argues, users shouldn't be relying on linker behavior to have their code working correctly.

What do I think about this? I lean towards keeping what we have today because I, more or less, agree that we should be actively fixing these issues. I'm also biased since I had to do this within our codebase as well. In the event where it can't be solved, the fallback flag does exist to offer users a way out. The way I see all this is that people adopt LLD for its speed, not because it's more correct (otherwise people would stick with ld64). If you take mold as an example, it doesn't care about backwards compatibility with older linkers. It just cares that it's the fastest linker. LLD follows the same footsteps, if not similar. QED.

For the case of matching ld64, we always advertise (or at least try to) say on diff reviews that we should always be matching ld64's behavior. We say things like "What does ld64 produce?" or "Why isn't LLD outputting the same results as ld64". We care very much matching with ld64's behavior because that's the one of the ways we verify that LLD is working correctly. To that end, perhaps we should be matching with ld64 if we want to keep saying "LLD should be a drop-in replacement for ld64". Our current behavior clearly deviates from this, but we continue advertising this statement. On top of that, as @keith pointed out, it's fewer surprises.

For the case of build speed, we also advertise that "LLD is the fastest linker to exist" (barring the existence of mold). By configuring default behaviors of LLD to achieve the fastest build speed is also aligned with that claim, but this introduces subtleties and behavioral differences that will certainly cause new users to be confused. We could enforce some that they read the documentation on the differences between ld64 and LLD, but chances are that newer folks looking to do a drop in replacement would've sunk a good amount of time debugging before consulting the manual. As @oontvoo also argues, users shouldn't be relying on linker behavior to have their code working correctly.

What do I think about this? I lean towards keeping what we have today because I, more or less, agree that we should be actively fixing these issues. I'm also biased since I had to do this within our codebase as well. In the event where it can't be solved, the fallback flag does exist to offer users a way out. The way I see all this is that people adopt LLD for its speed, not because it's more correct (otherwise people would stick with ld64). If you take mold as an example, it doesn't care about backwards compatibility with older linkers. It just cares that it's the fastest linker. LLD follows the same footsteps, if not similar. QED.

My final thoughts: I think this is a good summary of us trying to achieve 2 conflicting things at the same time. My opinion for the iOS community is that prioritizing ease of onboarding (meaning ld64 compat) is more important than providing faster links out of the box (on our project specifically this flag doesn't even have a significant performance impact, so likely they'll still be faster out of the box anyways). Most of the tools at this depth of the stack are black boxes to iOS devs, but they still want improved performance, so being relatively confident they'll get the same results as ld64 is a high priority, and the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk. On the flip side people who are deeper into this and want to optimize further won't mind adding flags if necessary. So I think prioritizing compatibility is the best way to get new users in, and then lead them to the docs so they can optimize.

int3 added a comment.Jan 18 2022, 8:19 PM

the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk

How about introducing a new --ld64-compat flag that developers can use if they want to adopt LLD with minimal differences from ld64? I agree that it's not reasonable to expect the average end-user to understand the linking process in detail just to adopt LLD, but I think presenting them a higher-level flag that maximizes LD64 compatibility (and putting it front-and-center in our docs) makes things more approachable. Right now, that flag would simply alias to --deduplicate-literals, but we could expand it to other behaviors in the future.

the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk

How about introducing a new --ld64-compat flag that developers can use if they want to adopt LLD with minimal differences from ld64? I agree that it's not reasonable to expect the average end-user to understand the linking process in detail just to adopt LLD, but I think presenting them a higher-level flag that maximizes LD64 compatibility (and putting it front-and-center in our docs) makes things more approachable. Right now, that flag would simply alias to --deduplicate-literals, but we could expand it to other behaviors in the future.

👍 Good idea!

keith added a comment.Jan 19 2022, 3:25 PM

the chances of them diving deep enough to understand why they're not are slim if they think it introduces too much risk

How about introducing a new --ld64-compat flag that developers can use if they want to adopt LLD with minimal differences from ld64? I agree that it's not reasonable to expect the average end-user to understand the linking process in detail just to adopt LLD, but I think presenting them a higher-level flag that maximizes LD64 compatibility (and putting it front-and-center in our docs) makes things more approachable. Right now, that flag would simply alias to --deduplicate-literals, but we could expand it to other behaviors in the future.

👍 Good idea!

I like this idea too, but I wonder if we should wait until we have some more use cases for this? Because I do wonder if over time there will be very many of these that we can easily have a flag like this for.

It seems like the consensus has been reached to keep this flag as is, so we should land this so folks reading it have this info.

int3 added a comment.Jan 19 2022, 4:29 PM

Yep, let's wait till we have more use cases before adding the flag.

int3 marked an inline comment as done.Jan 19 2022, 11:59 PM
int3 added inline comments.
lld/MachO/ld64-vs-lld.rst
13

whoops, forgot to address this. done in 8f811effac48fdbc1ac5ddbd944884a52866ec14