This is an archive of the discontinued LLVM Phabricator instance.

[LinkerWrapper] Only import static libraries with needed symbols
ClosedPublic

Authored by jhuber6 on Jan 24 2023, 9:10 AM.

Details

Summary

Currently, we pull in every single static archive member as long as we
have an offloading architecture that requires it. This goes against the
standard sematnics of static libraries that only pull in symbols that
define currently undefined symbols. In order to support this we roll
some custom symbol resolution logic to check if a static library is
needed. Because of offloading semantics, this requires an extra check
for externally visibile symbols. E.g. if a static member defines a
kernel we should import it.

The main benefit to this is that we can now link against the
libomptarget.devicertl.a library unconditionally. This removes the
requirement for users to specify LTO on the link command. This will also
allow us to stop using the amdgcn bitcode versions of the libraries.

clang foo.c -fopenmp --offload-arch=gfx1030 -foffload-lto -c
clang foo.o -fopenmp --offload-arch=gfx1030 -foffload-lto

Diff Detail

Event Timeline

jhuber6 created this revision.Jan 24 2023, 9:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 9:10 AM
Herald added a subscriber: kosarev. · View Herald Transcript
jhuber6 requested review of this revision.Jan 24 2023, 9:10 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 24 2023, 9:10 AM
tra added a subscriber: MaskRay.Jan 24 2023, 10:18 AM

@MaskRay - we seem to be reinventing the linker here and could use your expertise.

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1217–1218

How do we handle conflicting symbols defined more than once?

jhuber6 added a comment.EditedJan 24 2023, 10:29 AM

@MaskRay - we seem to be reinventing the linker here and could use your expertise.

Yeah, this reinvents a lot of logic. But I don't think there's an easy way to get around this without duplicating some ugly logic. I would absolutely love to shove the vast majority of this LTO / static library handling directly into the linker. E.g. we package the extracted libraries into a new one and let lld handle it. The problem is pretty firmly in Nvidia's hands because they don't have a real linker. They use nvlink which does some magic to .cubin files and doesn't understand our LTO or static library linking. So in order to make this common I needed to reproduce a lot of that logic here.

Also, the one subtle difference in this and the regular rules for static libraries is that we still want to extract symbols that would normally be considered "ExportDynamic". This is just because most of the time, these symbols are going to be read from the user's offloading app.

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1217–1218

I just assume that will be caught by the actual linker, it's not relevant for knowing which symbols to pull out.

tra added inline comments.Jan 24 2023, 10:36 AM
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1217–1218

That would be the case when we were loading all libraries at once.
However, now that we load them as needed one by one, we may end up loading only the first file which provides such symbol and would potentially ignore others, if no other symbol requires loading them. Given that we're checking the symbols anyways, it would be useful to diagnose it, IMO.

jhuber6 added inline comments.Jan 24 2023, 10:38 AM
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1217–1218

This is the expected behavior of static libraries right? lld has some option to print out why a certain library was extracted, we could do something similar?

tra added a comment.Jan 24 2023, 11:44 AM

We could also use more test cases. E.g. weak symbols (should not cause object extraction)

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1197

This could use some comments.
@MaskRay's blog post https://maskray.me/blog/2021-06-20-symbol-processing#archive-processing would be helpful to explain what's going on here.

I'd also refactor the condition a bit for readability:

bool ResolvesStrongReference = (OldSym & Sym_Undefined) &&
                               !(OldSym & Sym_Weak) &&
                               !(*FlagsOrErr & SymbolRef::SF_Undefined);
bool NewGlobalSymbol = ((NewSymbol || (OldSym & Sym_Undefined)) &&
                        !(*FlagsOrErr & SymbolRef::SF_Undefined) &&
                        !(*FlagsOrErr & SymbolRef::SF_Hidden));
Resolved |= ResolvesStrongReference || NewGlobalSymbol;
1217–1218

Yes, it is expected. It's also a common source of surprises. I'm not sure whether lld diagnoses it these days. We should follow whatever it does.

jhuber6 marked an inline comment as done.Jan 24 2023, 12:09 PM

We could also use more test cases. E.g. weak symbols (should not cause object extraction)

Yeah, I'll try to add a reasonable test here. It's a little difficult due to the indirect nature of this however.

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1217–1218

I just tested it with the same library defining a global x for main.c.

> clang main.c libx.a libx.a libx.a -Wl,--why-extract=-
reference	extracted	symbol
/tmp/main-8a5566.o	libx.a(x.o)	x

I could maybe add a similar option here to at least let us know if it extracts. Maybe for some tests.

jhuber6 updated this revision to Diff 491883.Jan 24 2023, 1:03 PM

Adding test and making the logic a bit more readable.

tra accepted this revision.Jan 24 2023, 2:17 PM

LGTM. Please wait a bit before landing it, in case @MaskRay has something to say.

clang/test/Driver/linker-wrapper-libs.c
39–40

I'm puzzled about how exactly we're checking the that we extract a static library defining an undefined symbol part here.

Presumably the trailing .o would be the object extracted from the library. I'm not sure where the first .o/.s argument come from? Is that the 'main' GPU object/executable embedded in the host object?

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1213

Nit: perhaps we should rename Resolved -> ShouldExtractObject, it would be a better match. After all, defining/including a new symbol is not quite the same as resolving a reference.

This revision is now accepted and ready to land.Jan 24 2023, 2:17 PM
jhuber6 marked an inline comment as done.Jan 24 2023, 2:23 PM

LGTM. Please wait a bit before landing it, in case @MaskRay has something to say.

I'm somewhat hoping to get this in before the fork that happens in a few hours.

clang/test/Driver/linker-wrapper-libs.c
39–40

Yes, if we extract it we'll have two object files. The .s comes from the fact that I deliberately made the main binary an ELF while the library is a .bc. This is why we get a .s for Nvidia, it's PTX.

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
1213

Good point, I'll change it.

tra added a comment.Jan 24 2023, 2:32 PM

I'm somewhat hoping to get this in before the fork that happens in a few hours.

OK. We can fix it later if it turns out that we've missed something.

This revision was landed with ongoing or failed builds.Jan 24 2023, 3:01 PM
This revision was automatically updated to reflect the committed changes.
jhuber6 marked an inline comment as done.

I'm actually not sure why the test it failing this way. The failure suggests that it's not extracting the members, but I have absolutely no clue what could be different on AIX versus all the other systems the tests pass on. Nothing in this logic should be target dependent, unless there's something weird with how we handle llvm-ar or similar.

I might just put that this test requires Linux. Because I don't think I can debug it without access to the system. We don't really support offloading on anything but Linux anyway.

I might just put that this test requires Linux. Because I don't think I can debug it without access to the system. We don't really support offloading on anything but Linux anyway.

I think that makes sense. We'd need time to investigate it on AIX anyways.

I might just put that this test requires Linux. Because I don't think I can debug it without access to the system. We don't really support offloading on anything but Linux anyway.

I think that makes sense. We'd need time to investigate it on AIX anyways.

I put a patch to fix the problem. https://reviews.llvm.org/D145600