This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Fix loading same libraries from both LC_LINKER_OPTION and command line
ClosedPublic

Authored by PRESIDENT810 on Jul 12 2022, 3:27 AM.

Details

Summary

This fixes https://github.com/llvm/llvm-project/issues/56059 and https://github.com/llvm/llvm-project/issues/56440. This is inspired by tapthaker's patch (https://reviews.llvm.org/D127941), and has reused his test cases. This patch adds an bool "isCommandLineLoad" to indicate where archives are from. If lld tries to load the same library loaded previously by LC_LINKER_OPTION from CLI, it will use this isCommandLineLoad to determine if it should be affected by -all_load & -ObjC flags. This also prevents -force_load from affecting archives loaded previously from CLI without such flag, whereas tapthaker's patch will fail such test case (introduced by https://reviews.llvm.org/D128025).

Diff Detail

Event Timeline

PRESIDENT810 created this revision.Jul 12 2022, 3:27 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 12 2022, 3:27 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
PRESIDENT810 requested review of this revision.Jul 12 2022, 3:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2022, 3:27 AM
PRESIDENT810 edited the summary of this revision. (Show Details)
int3 added a comment.Jul 12 2022, 6:02 PM

Nice fixes, thank you!

While looking back at my own diff, I noticed that one of the tests wasn't quite testing anything new... fixed in rG61ace8f78b1c52fd63fc8c2d800f08d7ddd87b0d. You might want to pull in that fix and check that your changes pass (though I'm guessing they do).

lld/MachO/Driver.cpp
278

I think this should work since LoadScope is just an enum and not an enum class?

1023–1026

hmm why hoist this out of addFile? feels like it'd make sense to have all the archive load logic in one place

lld/MachO/InputFiles.h
270–272 ↗(On Diff #443912)

our enums are usually capitalized

I just realized that -force_load should not work only if the same library is already explicitly passed from command line, but not when it is previously passed from LC_LINKER_OPTION. I revised how I determine if -force_load should work by saving libraries shown in command line in a DenseSet.

int3 added a comment.Jul 13 2022, 4:12 PM

Hm on seeing the latest iteration + thinking more about the problem, I'm wondering if we could simplify things a bit. This is my understanding of the requirements, correct me if I'm wrong:

  1. -ObjC and -force_load only have an effect on command line arguments. LC_LINKER_OPTION libraries are only ever lazy-loaded
  2. -force_load and -all_load take precedence over -ObjC, but only if the library hasn't already been loaded on the command line. Notably, if something has been loaded with -ObjC, we will never need to force-load it again later (so we should never need to compare LoadScope::AllLoad against LoadScope::ObjCLoad.)
  3. Since lazy-loading an archive twice (or lazy-loading after an eager load) is redundant, we don't need to do anything if we see an LC_LINKER_OPTION of some libfoo.a if we have already loaded libfoo.a via a command-line argument
  4. But if we have loaded libfoo.a via LC_LINKER_OPTION and then later see it on the command line, we need to load it if either -ObjC or force_load / all_load is used
  5. If none of those flags are used, we could harmlessly lazy-load it a second time, but that would be unnecessary perf overhead

Prior to this fix, we only had the ForceLoad enum. ForceLoad::Yes and ForceLoad::Default only ever get passed to addFile if something is being loaded from the CLI. LC_LINKER_OPTION loads are always ForceLoad::No. So addFile already has enough information to determine whether something is a command-line load vs a LC load (although it's not super obvious from the name ForceLoad::No that this applies only to LC loads).

So here's my suggestion... we could rename ForceLoad to something more descriptive. enum LoadType { CommandLineLoad, CommandLineForceLoad, LCLinkerLoad } maybe. Also change loadedArchives to store a value of struct { ArchiveFile *file, bool isCommandLineLoad }. Now if we try to load an archive and find it already cached in loadedArchives, we only need to load it again if isCommandLineLoad is false and the current load is not another lazy load. In that case we can fall back to the existing logic; there isn't a need to calculate and store a LoadScope as isCommandLineLoad should be enough.

Pros of this design:

  • No extra commandLineInputs cache
  • Storing bool isCommandLineLoad rather than LoadScope makes it clear (from looking at the data structures alone) that our cache update policy depends only on whether something is being loaded from LC_LINKER_OPTION vs the CLI

Thoughts?

lld/MachO/Driver.cpp
278

nit: I don't know if nesting the enum members under two scopes is really necessary... why not leave it as a raw enum?

I removed ForceLoad enum and add LoadFrom & LoadScope. LoadFrom indicates currently where is this archive from and LoadScope means what symbols should we load from this archive (only lazy load, load ObjC symbols, load all symbols or just skip loading this time). Now addFile accepts a LoadFrom enum so it is able to know if this archive is from CLI or LC_LINKER_OPTION. Based on this LoadFrom enum and the LoadFrom enum in its cache (if any), we can decide what symbols we need to load this time. I think this could be clearer than our previous ForceLoad? Also, I added a table in Config.h's comment to explain my implementation when encountered with these flags, please check it for me to make sure that I understand this right!

I didn't make loadedArchives to store extra information, instead I make LoadFrom an attribute in ArchiveFile class, so when we access loadedArchives and obtain the cached files, we are able to know if this archive is from CLI or not. That should do the same. Tell me if you think it's better to store LoadFrom in loadedArchives and I will change it, thanks!

int3 added a comment.Jul 17 2022, 8:55 PM

Thanks for the revision, but I still think this is more complicated than it needs to be. I think maybe my initial comment was not clear enough... in particular

there isn't a need to calculate and store a LoadScope as isCommandLineLoad should be enough.

i.e. we don't need the LoadScope enum at all. I've now suggested the code change to show how it would work...

Tell me if you think it's better to store LoadFrom in loadedArchives and I will change it, thanks!

I think it's better because we will never access it outside of Driver.cpp. So no need for it to be in the header. Also, from my previous comment:

Storing bool isCommandLineLoad rather than LoadScope makes it clear (from looking at the data structures alone) that our cache update policy depends only on whether something is being loaded from LC_LINKER_OPTION vs the CLI

i.e. could store just a boolean instead of the LoadFrom enum. This one is not such a big deal, if you prefer storing the enum that is fine too

lld/MachO/Driver.cpp
277–295
297
311
PRESIDENT810 edited the summary of this revision. (Show Details)

Updated according to int3's guide. Yeah this way is much more simplified indeed...

int3 accepted this revision.Jul 19 2022, 2:41 PM

lgtm, thanks!

This revision is now accepted and ready to land.Jul 19 2022, 2:41 PM