This fixes https://github.com/llvm/llvm-project/issues/56059 and https://github.com/llvm/llvm-project/issues/56440. This is inspired by tapthaker's patch (https://reviews.llvm.org/D127941), and has reused his test cases. This patch adds an bool "isCommandLineLoad" to indicate where archives are from. If lld tries to load the same library loaded previously by LC_LINKER_OPTION from CLI, it will use this isCommandLineLoad to determine if it should be affected by -all_load & -ObjC flags. This also prevents -force_load from affecting archives loaded previously from CLI without such flag, whereas tapthaker's patch will fail such test case (introduced by https://reviews.llvm.org/D128025).
Details
- Reviewers
int3 oontvoo thakis gkm - Group Reviewers
Restricted Project - Commits
- rGdd5635541cd7: [lld-macho] Fix loading same libraries from both LC_LINKER_OPTION and command…
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Nice fixes, thank you!
While looking back at my own diff, I noticed that one of the tests wasn't quite testing anything new... fixed in rG61ace8f78b1c52fd63fc8c2d800f08d7ddd87b0d. You might want to pull in that fix and check that your changes pass (though I'm guessing they do).
lld/MachO/Driver.cpp | ||
---|---|---|
278 | I think this should work since LoadScope is just an enum and not an enum class? | |
1023–1026 | hmm why hoist this out of addFile? feels like it'd make sense to have all the archive load logic in one place | |
lld/MachO/InputFiles.h | ||
270–272 ↗ | (On Diff #443912) | our enums are usually capitalized |
I just realized that -force_load should not work only if the same library is already explicitly passed from command line, but not when it is previously passed from LC_LINKER_OPTION. I revised how I determine if -force_load should work by saving libraries shown in command line in a DenseSet.
Hm on seeing the latest iteration + thinking more about the problem, I'm wondering if we could simplify things a bit. This is my understanding of the requirements, correct me if I'm wrong:
- -ObjC and -force_load only have an effect on command line arguments. LC_LINKER_OPTION libraries are only ever lazy-loaded
- -force_load and -all_load take precedence over -ObjC, but only if the library hasn't already been loaded on the command line. Notably, if something has been loaded with -ObjC, we will never need to force-load it again later (so we should never need to compare LoadScope::AllLoad against LoadScope::ObjCLoad.)
- Since lazy-loading an archive twice (or lazy-loading after an eager load) is redundant, we don't need to do anything if we see an LC_LINKER_OPTION of some libfoo.a if we have already loaded libfoo.a via a command-line argument
- But if we have loaded libfoo.a via LC_LINKER_OPTION and then later see it on the command line, we need to load it if either -ObjC or force_load / all_load is used
- If none of those flags are used, we could harmlessly lazy-load it a second time, but that would be unnecessary perf overhead
Prior to this fix, we only had the ForceLoad enum. ForceLoad::Yes and ForceLoad::Default only ever get passed to addFile if something is being loaded from the CLI. LC_LINKER_OPTION loads are always ForceLoad::No. So addFile already has enough information to determine whether something is a command-line load vs a LC load (although it's not super obvious from the name ForceLoad::No that this applies only to LC loads).
So here's my suggestion... we could rename ForceLoad to something more descriptive. enum LoadType { CommandLineLoad, CommandLineForceLoad, LCLinkerLoad } maybe. Also change loadedArchives to store a value of struct { ArchiveFile *file, bool isCommandLineLoad }. Now if we try to load an archive and find it already cached in loadedArchives, we only need to load it again if isCommandLineLoad is false and the current load is not another lazy load. In that case we can fall back to the existing logic; there isn't a need to calculate and store a LoadScope as isCommandLineLoad should be enough.
Pros of this design:
- No extra commandLineInputs cache
- Storing bool isCommandLineLoad rather than LoadScope makes it clear (from looking at the data structures alone) that our cache update policy depends only on whether something is being loaded from LC_LINKER_OPTION vs the CLI
Thoughts?
lld/MachO/Driver.cpp | ||
---|---|---|
278 | nit: I don't know if nesting the enum members under two scopes is really necessary... why not leave it as a raw enum? |
I removed ForceLoad enum and add LoadFrom & LoadScope. LoadFrom indicates currently where is this archive from and LoadScope means what symbols should we load from this archive (only lazy load, load ObjC symbols, load all symbols or just skip loading this time). Now addFile accepts a LoadFrom enum so it is able to know if this archive is from CLI or LC_LINKER_OPTION. Based on this LoadFrom enum and the LoadFrom enum in its cache (if any), we can decide what symbols we need to load this time. I think this could be clearer than our previous ForceLoad? Also, I added a table in Config.h's comment to explain my implementation when encountered with these flags, please check it for me to make sure that I understand this right!
I didn't make loadedArchives to store extra information, instead I make LoadFrom an attribute in ArchiveFile class, so when we access loadedArchives and obtain the cached files, we are able to know if this archive is from CLI or not. That should do the same. Tell me if you think it's better to store LoadFrom in loadedArchives and I will change it, thanks!
Thanks for the revision, but I still think this is more complicated than it needs to be. I think maybe my initial comment was not clear enough... in particular
there isn't a need to calculate and store a LoadScope as isCommandLineLoad should be enough.
i.e. we don't need the LoadScope enum at all. I've now suggested the code change to show how it would work...
Tell me if you think it's better to store LoadFrom in loadedArchives and I will change it, thanks!
I think it's better because we will never access it outside of Driver.cpp. So no need for it to be in the header. Also, from my previous comment:
Storing bool isCommandLineLoad rather than LoadScope makes it clear (from looking at the data structures alone) that our cache update policy depends only on whether something is being loaded from LC_LINKER_OPTION vs the CLI
i.e. could store just a boolean instead of the LoadFrom enum. This one is not such a big deal, if you prefer storing the enum that is fine too
lld/MachO/Driver.cpp | ||
---|---|---|
277–295 | ||
297 | ||
311 |
I think this should work since LoadScope is just an enum and not an enum class?