This is an archive of the discontinued LLVM Phabricator instance.

[lld/mac] Port typo correction for undefined symbols from ELF port
ClosedPublic

Authored by thakis on Oct 2 2022, 9:57 AM.

Details

Summary

Ports:

Does not port https://reviews.llvm.org/D71735 since I believe that that doesn't
apply to lld/Mach-O.

I was a bit skeptical of the feature when it was initially added, but I think it's
proved itself in the ELF port.

Diff Detail

Event Timeline

thakis created this revision.Oct 2 2022, 9:57 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 2 2022, 9:57 AM
thakis requested review of this revision.Oct 2 2022, 9:57 AM
thakis updated this revision to Diff 464674.Oct 3 2022, 7:09 AM

make extern "C" suggestions actually work

need to account for the additional leading _ that darwin mangling inserts

thakis updated this revision to Diff 464691.Oct 3 2022, 8:12 AM

rebase to try and make presubmit happy

thakis updated this revision to Diff 464692.Oct 3 2022, 8:15 AM

reupload same diff without parent rev set in an attempt to unconfuse the precommit bot

oontvoo accepted this revision.Oct 3 2022, 8:24 AM
oontvoo added a subscriber: oontvoo.

Nice! LGTM
(it seems patch-application still failed - probably need sync'ing again?)

lld/MachO/SymbolTable.cpp
413

This function does some intensive works - is it worth caching the results?

(OTOTH, maybe performance isn't important in case of failures, anyways 🤔 )

This revision is now accepted and ready to land.Oct 3 2022, 8:24 AM
thakis added a comment.Oct 3 2022, 8:31 AM

Thanks!

Presubmit looks happier now that I removed the committed parent rev. Yesterday, when the parent wasn't committed yet, it was green too. Maybe a presubmit bug with landed parent revs.

lld/MachO/SymbolTable.cpp
413

We only call it once per undefined symbol, so a cache would have a very low hit rate i think (?). We only call this for the first 2 undefined symbols for perf reasons too. So even for fairly long symbol names, this should add less than 100k symbol lookups (which is not nothing! But it only happens on failing links, and it's still small compared to total number of lookups during a link)

Herald added a project: Restricted Project. · View Herald TranscriptOct 3 2022, 3:54 PM
int3 added a subscriber: int3.Oct 3 2022, 3:57 PM

I was wondering if we could have more code sharing with the ELF port, or if it's too messy to be worth the effort of refactoring

thakis added a comment.Oct 3 2022, 4:02 PM

I was wondering if we could have more code sharing with the ELF port, or if it's too messy to be worth the effort of refactoring

Some things could be shared if we wanted, but there's also a bunch of differences (some silly, like it being sym.file in ELF vs sym.getFile() in MachO and file->getSymbols() in ELF vs file->symbols in MachO, or there not being an isUndefined() in MachO; some more structural, such as MachO having to deal with the extra _ prefix). The code that generates all the candidates is pretty similar, but if we share it we make it harder to change the symtab impl of each port. I thought duplicating it is nicer, but *shrug*. Happy to change concrete things if you have something concrete that you feel would be better.

int3 added a comment.Oct 3 2022, 4:04 PM

Nah that's fair enough

I agree that code duplication is better engineering if consider object file format differences and updating convenience.

hans added a subscriber: hans.Oct 7 2022, 4:40 AM

Developers are reporting crashes/asserts from this change: http://crbug.com/1372389
I'll back it out for now until it can be fixed.

(FWIW, my 2c & this dates back to the founding of the 'new' (now not-so-new) lld - that the code duplication between ports is pretty unfortunate & while it does give some agility to the ports to work differently because they are different - it also means code diverges, bugfixes get made in one port and not shared by others - all the usual reasons code duplication is problematic & that LLVM as a whole has been founded on reusable library abstractions that have been quite effective/beneficial overall - I'd love to see even small parts of lld start to be abstracted into common helpers even with some trait adapters to handle the trivial differences in naming and/or some pressure to make the naming consistent and keep it that way at least in subsets of the API (eg: enough to make this work as a template) - but you folks can take this for whatever it's worth/weigh it against all your other goals/needs)

(FWIW, my 2c & this dates back to the founding of the 'new' (now not-so-new) lld - that the code duplication between ports is pretty unfortunate & while it does give some agility

Not sure if this is the right forum to re-litigate this :) I'm sure you're aware of the history: The original lld used a design like this and didn't go anywhere. The new lld uses this somewhat uncommon approach, and it's led to lld becoming the fastest production quality linker that can target COFF, ELF, and MachO. Abstractly I kind of agree with you, but you also can't argue too much with the impact the current approach had.

Unrelatedly, I relanded the patch, and I made D135889 that has a test for the crash. (It depends on D135888, so I didn't include it in the reland directly, to make the reland almost identical to the original commit.)

(FWIW, my 2c & this dates back to the founding of the 'new' (now not-so-new) lld - that the code duplication between ports is pretty unfortunate & while it does give some agility

Not sure if this is the right forum to re-litigate this :) I'm sure you're aware of the history: The original lld used a design like this and didn't go anywhere. The new lld uses this somewhat uncommon approach, and it's led to lld becoming the fastest production quality linker that can target COFF, ELF, and MachO. Abstractly I kind of agree with you, but you also can't argue too much with the impact the current approach had.

mold might disagree about speed.

mold might disagree about speed.

mold is great, but it doesn't have a COFF port yet and I think it's seen little production use for its MachO port so far. (mold uses the same internal design as lld – distinct code for ELF and MachO, so I think bringing it up doesn't add anything to the discussion here – but as I said I also think it's not terribly useful to have this discussion again on this random review.)

(FWIW, my 2c & this dates back to the founding of the 'new' (now not-so-new) lld - that the code duplication between ports is pretty unfortunate & while it does give some agility

Not sure if this is the right forum to re-litigate this :) I'm sure you're aware of the history: The original lld used a design like this and didn't go anywhere. The new lld uses this somewhat uncommon approach, and it's led to lld becoming the fastest production quality linker that can target COFF, ELF, and MachO. Abstractly I kind of agree with you, but you also can't argue too much with the impact the current approach had.

One of the arguments made early on was that unlibrary-ification would be necessary for early velocity (not one I agree with, but that's how it went) & that library-ification could occur once development was over that hump/had got concrete results. Given that lld's been proved out and is a production linker, I think now's a good time to start considering looking for commonality and migrating towards more common infrastructure and library-like design.

One of the arguments made early on was that unlibrary-ification would be necessary for early velocity (not one I agree with, but that's how it went) & that library-ification could occur once development was over that hump/had got concrete results. Given that lld's been proved out and is a production linker, I think now's a good time to start considering looking for commonality and migrating towards more common infrastructure and library-like design.

I'm somewhat familiar with three of the ports, and I think there isn't a lot of commonality really.

One of the arguments made early on was that unlibrary-ification would be necessary for early velocity (not one I agree with, but that's how it went) & that library-ification could occur once development was over that hump/had got concrete results. Given that lld's been proved out and is a production linker, I think now's a good time to start considering looking for commonality and migrating towards more common infrastructure and library-like design.

I'm somewhat familiar with three of the ports, and I think there isn't a lot of commonality really.

Fair enough - this seemed like a place that was relatively close enough to consider, judging by the description/purpose of the code, etc. Ah well.