This is an archive of the discontinued LLVM Phabricator instance.

[3/3] [COFF] Emit embedded -exclude-symbols: directives for hidden visibility for MinGW
ClosedPublic

Authored by mstorsjo on Jul 19 2022, 1:27 PM.

Details

Summary

This works with the automatic export of all symbols; in MinGW mode,
when a DLL has no explicit dllexports, it exports all symbols (except
for some that are hardcoded to be excluded, including some toolchain
libraries).

By hooking up the hidden visibility to the -exclude-symbols: directive,
the automatic export of all symbols can be controlled in an easier
way (with a mechanism that doesn't require strict annotation of every
single symbol, but which allows gradually marking more unnecessary
symbols as hidden).

The primary use case is dylib builds of LLVM/Clang. These can be done
in MinGW mode but not in MSVC mode, as MinGW builds can export all
symbols (and the calling code can use APIs without corresponding
dllimport directives). However, as all symbols are exported, it can
easily overflow the max number of exported symbols in a DLL (65536).

In llvm-mingw, only the X86, ARM and AArch64 backends are enabled;
for the LLVM 13.0.0 release, libLLVM-13.dll ended up with 58112 exported
symbols. For LLVM 14.0.0, it was 62015 symbols. Current builds of
git main end up at around 64650 symbols - i.e. extremely close to the
limit.

The msys2 packages of LLVM have had to progressively disable more
of their backends in their builds, to be able to keep building with a
dylib.

While the proper path forward would be to add complete visibility
annotations to all the APIs (an effort which was started in
https://reviews.llvm.org/D109192, but seems to have stalled currently),
this allows improving the current mingw dylib situation significantly,
by using the same -fvisibility-inlines-hidden option as on Unix
platforms, and by making LLVM_LIBRARY_VISIBILITY expand to
attribute ((visibility("hidden"))).

With those in place, a current build of LLVM git main ends up at
35142 symbols instead of 64650. Without this, a mingw dylib build of
MLIR is impossible (as it exceeds the symbol limit).

Diff Detail

Event Timeline

mstorsjo created this revision.Jul 19 2022, 1:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2022, 1:27 PM
mstorsjo requested review of this revision.Jul 19 2022, 1:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2022, 1:27 PM

I'm a little bit concerned this could break some projects in the wild but for projects that use it properly this sounds like huge win (as your LLVM results show).

llvm/lib/IR/Mangler.cpp
246

This could be simplified to TT.isOSCygMing().

I'm a little bit concerned this could break some projects in the wild but for projects that use it properly this sounds like huge win (as your LLVM results show).

Yes, there's some risk with this.

Any project that uses hidden visibility will break linking with older LLD, as LLD errors out on unhandled embedded directives. As the patches to LLD would be landed before, that's probably fine as long as a user doesn't update Clang without updating LLD. And the binutils patch seems well received so far too.

Secondly, as you mention - the risk is with projects using visibility inconsistently in mingw configurations so far. Ironically, it seems like LLVM is such a case... https://github.com/llvm/llvm-project/blob/bc9b964f8f38bdaa4574847fc59527ea597dbc0a/llvm/lib/Target/CMakeLists.txt#L24-L30 enables hidden visibility by default for all code under llvm/lib/Target, with the expectation that LLVM_EXTERNAL_VISIBILITY overrides it - but the LLVM_EXTERNAL_VISIBILITY define doesn't expand to anything on mingw: https://github.com/llvm/llvm-project/blob/bc9b964f8f38bdaa4574847fc59527ea597dbc0a/llvm/include/llvm/Support/Compiler.h#L116-L127

Other than that, I haven't run into any issues in my testing with these patches, but I don't build all that many projects with tricky DLL setups either.

llvm/lib/IR/Mangler.cpp
246

Ah, indeed, thanks!

mstorsjo updated this revision to Diff 446788.Jul 22 2022, 4:43 AM

Use .isOSCygMing() in the new code (not updating the existing case in the function).

I'm a little bit concerned this could break some projects in the wild but for projects that use it properly this sounds like huge win (as your LLVM results show).

Do you want any more investigation into the breakage risk here? I don’t really know of any other good ways of doing that, other than including the patch in msys2’s Clang and rebuilding packages… In any case, if there’s breakage, it should be caused by inconsistent use of visibility attributes, which hopefully is easy to fix.

@rnk - can I get your opinion on this one too?

As the patches to LLD would be landed before, that's probably fine as long as a user doesn't update Clang without updating LLD.

For MSYS2 it's unsupported scenario so I don't fear it.

Do you want any more investigation into the breakage risk here?

I'm away right now but even when I'm back I strongly doubt I'll have time to test it.
I think fixing builds will be trivial (using sed) while waiting for upstreams to release fixed versions so it should be good to go.

I think this is reasonable, but it's a big change worth discussing more widely. The basic idea is to try to make hidden visibility work on GNU COFF targets.

One major reservation that I have about this solution is that we've discovered that the .drective section is extremely inefficient. Consider that, with heavy use of templates, most C++ symbols are inline. Roughly speaking, O(all) strings in the symbol table will be duplicated into the .drectve section. C++ symbols, can in many cases, dominate symbol file size. This was a big problem for us in PGO builds, which use llvm.used, which uses the /include: directive. I think I filed a feature request to implement a more efficient scheme similar to the address-significance tables. This is a much more efficient way to tag symbols with some boolean property, and you can see it was discussed a few times recently with regard to the ELF address significance tables because it interacts with strip and objcopy in bad ways. However, I've never really seen those tools as part of the mainstream for COFF targets, so we kept the address significance protocol as it is.

I have never coordinated with binutils folks, and I think compatibility probably outweighs efficiency here, but if they are open to improvements, it's worth looking into this before we lock in this inefficient format.

Various interested parties: +@MaskRay @hans @aeubanks

I think this is reasonable, but it's a big change worth discussing more widely. The basic idea is to try to make hidden visibility work on GNU COFF targets.

Thanks for picking up the thread here - this is exactly the discussion I'd like to have. Yes, it's a bigger change than it seems originally.

FWIW, the scope of my effort is that I've added support to binutils and lld for handling the directive, and with this patch, LLVM would produce such directives for hidden visibility. Currently, setting hidden visibility with Clang/LLVM produces no warnings but doesn't have any effect. With GCC, using __attribute__((visibility("hidden"))) currently gives warnings about not having any effect, but passing e.g. -fvisibility=hidden doesn't - so projects may be passing that accidentally somewhere (like we did, before D130200). I'm not planning on making patches myself to GCC to map the visibility to this directive there though.

One major reservation that I have about this solution is that we've discovered that the .drective section is extremely inefficient.

Yup. What led me to this solution was that it turned out to be extremely straightforward to implement (which shouldn't be an argument for defining new features, but alas...) as it's modelled exactly 1:1 after the regular dllexport mechanism. In lld, I used the same early handling of the directive just like what has been done for exports for performance reasons.

Consider that, with heavy use of templates, most C++ symbols are inline. Roughly speaking, O(all) strings in the symbol table will be duplicated into the .drectve section. C++ symbols, can in many cases, dominate symbol file size.

On one hand, one could argue that with dllexport, you can also have such large .drectve sections with all the exported symbols. However in practice, I guess the norm would be that a fraction of the symbols are exported and the majority aren't. And in the default/hidden visibility scenario, we have the inverse - in the typical case, most symbols would be hidden (ideally at least). So I agree that the issue is on a different scale here, and this could be worse than current day dllexport.

I can try to do a test build of LLVM with and without this feature (with a separate patch that takes the hidden visibility into use) and see how much it inflates the build directory. What would be the best measuring point for that? Maybe the size of the largest object file and/or largest static library?

This was a big problem for us in PGO builds, which use llvm.used, which uses the /include: directive.

I think I filed a feature request to implement a more efficient scheme similar to the address-significance tables. This is a much more efficient way to tag symbols with some boolean property,

Where did you file this request? It'd be interesting to track how it fares.

and you can see it was discussed a few times recently with regard to the ELF address significance tables because it interacts with strip and objcopy in bad ways. However, I've never really seen those tools as part of the mainstream for COFF targets, so we kept the address significance protocol as it is.

Strip and objcopy are used a lot for GNU COFF though, mainly to get rid of symbol tables and embedded dwarf debug info, neither which are a thing in the MSVC ecosystem.

I have never coordinated with binutils folks, and I think compatibility probably outweighs efficiency here, but if they are open to improvements, it's worth looking into this before we lock in this inefficient format.

FWIW on the binutils side, the support for the new directive (equivalent to D130120) was merged recently, but it was just past the 2.39 branch, so I think we've got another 5 months or so to settle things finally. I'm sorry I went ahead and merged that patch there before we've had this discussion here (I wasn't sure if we'd get any good discussion though). On the other hand, I guess nothing says we must remove support for the directive if we settle on a more efficient mechanism either, but it'd feel a bit unnecessary to have a brand new directive that won't be used for what it was meant for.

In practice, there doesn't seem to be much activity from others to coordinate with there either (there wasn't any discussion like this one) - I guess it's all up to writing an acceptable patch for whatever format can be argued to be the best way forward.

Initially, I had hoped to get this landed by the 15.x branch, but it's clearly for the best to delay it to get things sorted out properly. I tried to do a mingw dylib build including mlir+flang, and that requires this feature to get libMLIR.dll under the export limit.

For the mingw dylib builds, we're awfully close to the export limit for libLLVM.dll right now for the 3 architectures I support in llvm-mingw - while msys2 have had to successively disable more and more targets the last few releases to fit under that limit. So I think msys2 might want to cherrypick this already to their 15.x release in order not to need to cut down the list of targets entirely - if we've settled on a way forward.

rnk added a comment.Aug 9 2022, 4:08 PM

On one hand, one could argue that with dllexport, you can also have such large .drectve sections with all the exported symbols. However in practice, I guess the norm would be that a fraction of the symbols are exported and the majority aren't.

Surprisingly, the exports can get pretty large as well. Without /Zc:dllexportinlines-, every inline method of every dllexported class is emitted into every object file in which it is included, and it is referenced from .drectve. I don't remember breaking out the cost specifically, but this was part of why we implemented the flag: 30% faster Chrome/Win dev builds.

Strip and objcopy are used a lot for GNU COFF though, mainly to get rid of symbol tables and embedded dwarf debug info, neither which are a thing in the MSVC ecosystem

Sure, this is a very important use case, but these tools are not typically used on relocatable object files before linking. FB/Meta for some reason ran these tools after running ld -r, and ran into issues. See the most recent thread.

Where did you file this request? It'd be interesting to track how it fares.

I wasn't able to find it, sorry.


The binutils folks seemed to really not like the design of encoding symbol table indices in object file sections, so they might not like this proposed design. See this bug. However, I still think it's a good way to extend the symbol table with new flags. As long as it's a handshake between the compiler and linker and not some open ended set of other tools (objcopy & strip), it's straightforward.

I follow COFF development passively. Representing hidden visibility with -exclude-symbols: in .drectve LGTM.
Due to the limitation of the format, this seems the best possible emulation.
The performance issue seems to mitigated by having a fast path following D78845.

llvm/test/CodeGen/X86/mingw-hidden.ll
3

Use --check-prefixes

73

-SAME

MaskRay added inline comments.Aug 9 2022, 5:42 PM
llvm/lib/IR/Mangler.cpp
248

Is it useful to encode several symbols joined by , with one -exclude-symbols:? Just to make the section content smaller.

One major reservation that I have about this solution is that we've discovered that the .drective section is extremely inefficient.

I can try to do a test build of LLVM with and without this feature (with a separate patch that takes the hidden visibility into use) and see how much it inflates the build directory. What would be the best measuring point for that? Maybe the size of the largest object file and/or largest static library?

Ok, now I've tested actually measuring this and comparing the tradeoff. I've tested building llvm+clang with -DLLVM_LINK_LLVM_DYLIB=TRUE, with a patch to take the hidden visibility into use.

  • Individual object files do grow - undoubtedly. One large object file is tools/clang/lib/ASTMatchers/Dynamic/CMakeFiles/obj.clangDynamicASTMatchers.dir/Registry.cpp.obj, which grew from 4.2 MB to 4.6 MB. (And to doublecheck things; this object file now has a .drectve section at 409 KB.) I.e. this grew by around 10%.
  • The total build directory actually ended up slightly smaller (but by a very small margin); originally it was 1234 MB, now it is 1232 MB. One factor here is that since we're exporting much fewer symbols (and possibly due to better codegen due to knowing about hidden visibility?), some files shrink - a notable amount. libLLVM-16.git.dll ends up at 58.9 MB where it originally was 62.3 MB. The import library, libLLVM-16git.dll.a, is now 12 MB instead of 24 MB.
  • Linking time with lld changes negligibly. Linking libLLVM-16git.dll takes somewhere around 1.0-1.4 seconds on the system I'm testing, and the change is much much smaller than my measurement noise here.
  • Linking time with binutils changes a lot though. The implementation I made in ld.bfd used the same naive data structure used for e.g. aligncomm directives (a linked list of strings, so just inserting strings into it ends up as O(n^2)), so this ends up quite inefficient. Previously, linking libLLVM-16git.dll took 28 seconds, now it takes 66 seconds. But that's something that could be improved on the binutils side, not a limitation of the format itself per se.

All in all, it doesn't look too bad to me.

mstorsjo updated this revision to Diff 451431.Aug 10 2022, 5:57 AM

Updated to use --check-prefixes=.

mstorsjo marked an inline comment as done.Aug 10 2022, 5:59 AM
mstorsjo added inline comments.
llvm/lib/IR/Mangler.cpp
248

Yes, that would be useful given that the option does support multiple symbols per directive (contrary to -export). But I think that would require some amount of extra refactoring, because emitLinkerFlagsForGlobalCOFF is called with one GlobalValue at a time, and it is called from at least 3 different locations.

llvm/test/CodeGen/X86/mingw-hidden.ll
73

I don't see how -SAME would help anything here? All the .section .drectve and all the .ascii lines are on separate lines in the output.

MaskRay accepted this revision.Aug 10 2022, 1:31 PM

Thanks!

llvm/lib/IR/Mangler.cpp
248

You mentioned

And to doublecheck things; this object file now has a .drectve section at 409 KB.)

So I think it's worth doing some refactoring, but the change can be separate.

llvm/test/CodeGen/X86/mingw-hidden.ll
73

Sorry, ignore my comment.

This revision is now accepted and ready to land.Aug 10 2022, 1:31 PM

Thanks for the reviews! FWIW, as this is a rather notable change, I'd add this to the release notes along with the patch:

+Changes to the Windows Target
+-----------------------------
+
+* For MinGW, generate embedded ``-exclude-symbols:`` directives for symbols
+  with hidden visibility, omitting them from automatic export of all symbols.
+  This roughly makes hidden visibility work like it does for other object
+  file formats.
+

What do you think, @rnk @MaskRay @mati865, should we try to backport this set to 15.x already, or is it too disruptive? I think msys2 might end up backporting it to their distribution in any case.

I don't know about LLVM policy for such potentially disruptive change but regardless of what is decided here MSYS2 will almost surely backport this change, otherwise there is no way to provide shared libraries.
On the other hand even if it landed earlier it probably wouldn't have received a lot of testing anyway.

MaskRay added a comment.EditedAug 10 2022, 6:08 PM

I think the lld changes (1) (2) are pretty safe to cherry pick. I like that idea that consumers support something earlier than the producers.
The Chrome folks may want to analyze (3).

This revision was landed with ongoing or failed builds.Aug 11 2022, 2:01 AM
This revision was automatically updated to reflect the committed changes.

I think the lld changes (1) (2) are pretty safe to cherry pick. I like that idea that consumers support something earlier than the producers.

Yes, that sounds reasonable, (1) and (2) are safe and harmless to make available, while (3) is the one that potentially can cause surprises.

The Chrome folks may want to analyze (3).

Not sure how much there's to analyze for them in it, as they don't use the mingw mode at all. At most, I guess they'd be concerned if the extra branches in the fast path affect performance of their linking, but I would expect that it doesn't.

Reasonable to back port this patch, too.

Reasonable to back port this patch, too.

Oh, do you think so? In one sense, I guess if we're going to do this, it doesn't really matter that much if we do it now or in 16.x. (Although by the time 16.x is out, binutils 2.40 with support for the linker directive might be out too, so it'd work better in Clang+ld.bfd cases.)

rnk added a comment.Aug 15 2022, 10:21 AM

I wanted to ACK the performance measurements, thanks for looking! I think it looks good.

Some day in the future we can invent a special LLD-only compact directive encoding scheme for all of these if someone profiles and sees it as a problem. The solution generalizes to all of these per-symbol flags (-export: -include:, -exclude-symbol:, some I probably forgot).

rnk added a comment.Aug 15 2022, 10:23 AM

Regarding backporting, I thought we had tried to backport bug fixes, not features, and this is definitely a new feature. OTOH it is quite useful. So, either way seems fine to me.

Regarding backporting, I thought we had tried to backport bug fixes, not features, and this is definitely a new feature. OTOH it is quite useful. So, either way seems fine to me.

I think it makes sense to leave this one out for now still, but let downstream distros (i.e. msys2) play with the patch until the next release round. FWIW I found one detail where we can improve this patch too.

Also, regarding the necessity for this patch:

In llvm-mingw, only the X86, ARM and AArch64 backends are enabled; for the LLVM 13.0.0 release, libLLVM-13.dll ended up with 58112 exported symbols. For LLVM 14.0.0, it was 62015 symbols. Current builds of git main end up at around 64650 symbols - i.e. extremely close to the limit.

Current git main of llvm as of today finally broke this threshold: If trying to build a dylib with these targets enabled, with a toolchain that doesn't handle exluding symbols based on hidden visibility, it now fails due to exporting too many symbols. (https://github.com/mstorsjo/llvm-mingw/runs/8125805911?check_suite_focus=true#logs) I.e., to build main/16.x of llvm as a mingw dylib, you need clang from main/16.x or a patched 15.x. Not a big issue in practice as msys2 does carry these patches downstream already in their current 14.x version though - but it shows how really how urgent this feature turned out. (The github actions build should be fixable by just setting a flag to force it to update the msys2 install before trying to build.)