This is an archive of the discontinued LLVM Phabricator instance.

[llvm-lib] Use COFF archive format in llvm-lib (other archive tools don't use this format).
ClosedPublic

Authored by jacek on Feb 7 2023, 4:52 PM.

Details

Summary

We currently just use GNU format for llvm-lib. This mostly works, but ARM64EC needs an additional section that does not really fit GNU format. This patch implements writing in COFF format (as in, it's what archive reader considers as K_COFF). This mostly requires symbol emitting symbol map. Note that, just like in case of MSVC, symbols are de-duplicated in both usual symbol table and the new symbol map.

Diff Detail

Event Timeline

jacek created this revision.Feb 7 2023, 4:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2023, 4:52 PM
jacek requested review of this revision.Feb 7 2023, 4:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2023, 4:52 PM
jacek retitled this revision from [llvm-lib] Add support for writing COFF archives. to [llvm-lib 4/5] Add support for writing COFF archives..
jacek retitled this revision from [llvm-lib 4/5] Add support for writing COFF archives. to [llvm-lib] Add support for writing COFF archives..Feb 8 2023, 12:47 PM

So - I'm not very familiar with the internals of static archives, can you elaborate a bit more on what K_COFF vs K_GNU means here. Is this a format flag which is written to the static archives themselves (I believe not?), or just different conventions for what details are written and found in the archives? Currently, when reading such archives, are they identified as K_GNU or K_COFF? And currently, before this patch, which mode is used? If I read the patch correctly (I just eyed it quickly) - llvm-ar will continue to write things like it used to, while llvm-lib will pick K_COFF (unless it's a thin library)?

mstorsjo added inline comments.Feb 8 2023, 2:20 PM
llvm/test/tools/llvm-lib/duplicate.test
20

Before this change, would --print-armap not have printed anything at all, or what's the change in externally visible functionality here?

jacek added a comment.Feb 8 2023, 3:08 PM

So - I'm not very familiar with the internals of static archives, can you elaborate a bit more on what K_COFF vs K_GNU means here. Is this a format flag which is written to the static archives themselves (I believe not?), or just different conventions for what details are written and found in the archives?

It's about how details are written to the archives, the meaning in this patch it analogical to how reader uses them, see Archive class constructor. Essentially, if the second member is another symbol table, then the archive treated as K_COFF. The other subtle difference is that strings in string table are null-terminated, see ArchiveMemberHeader::getName. This patch makes it possible to write archives that will be recognized as K_COFF.

Currently, when reading such archives, are they identified as K_GNU or K_COFF? And currently, before this patch, which mode is used?

Libraries generated by current llvm-lib are identified as K_GNU. Libraries generated by llvm-lib with this patch are identified as K_COFF.

If I read the patch correctly (I just eyed it quickly) - llvm-ar will continue to write things like it used to, while llvm-lib will pick K_COFF (unless it's a thin library)?

Yes, nothing changes for llvm-ar yet. I guess it would make sense to add new llvm-ar --format argument for that, but we probably shouldn't change the default. If we changed the default for Windows host, it would require explicit --format when cross compiling from Windows, which is currently not required.

jacek added inline comments.Feb 8 2023, 3:16 PM
llvm/test/tools/llvm-lib/duplicate.test
20

Without this patch, it would print duplicated and unsorted symbols:

Archive map
b in bar.o
c in abc.o
a in abc.o
b in abc.o
a in foo.o

So - I'm not very familiar with the internals of static archives, can you elaborate a bit more on what K_COFF vs K_GNU means here. Is this a format flag which is written to the static archives themselves (I believe not?), or just different conventions for what details are written and found in the archives?

It's about how details are written to the archives, the meaning in this patch it analogical to how reader uses them, see Archive class constructor. Essentially, if the second member is another symbol table, then the archive treated as K_COFF. The other subtle difference is that strings in string table are null-terminated, see ArchiveMemberHeader::getName. This patch makes it possible to write archives that will be recognized as K_COFF.

Ok, thanks for the explanation! (So far in my endeavours, I've diven into most of the internals of PE/COFF, but never had to worry about the details of static archives.)

If I read the patch correctly (I just eyed it quickly) - llvm-ar will continue to write things like it used to, while llvm-lib will pick K_COFF (unless it's a thin library)?

Yes, nothing changes for llvm-ar yet. I guess it would make sense to add new llvm-ar --format argument for that, but we probably shouldn't change the default. If we changed the default for Windows host, it would require explicit --format when cross compiling from Windows, which is currently not required.

Yeah we shouldn't make such differences based on the host where the tool is running. There's some predecent for autodetecting the mode based on other context (preexisting archive or archive members, or maybe also objects passed on the command line), which mostly should avoid the cross compilation issues. See https://github.com/llvm/llvm-project/commit/a0d42c86dbcb150c2a7a3d8f2a8500dcfb98a5d1 for the original implementation of that (I believe it has evolved a bit still through the years since).

llvm/test/tools/llvm-lib/duplicate.test
20

I see, thanks!

jacek updated this revision to Diff 496117.Feb 9 2023, 7:15 AM

rebased

jacek added a comment.Feb 9 2023, 7:29 AM

Yeah we shouldn't make such differences based on the host where the tool is running. There's some predecent for autodetecting the mode based on other context (preexisting archive or archive members, or maybe also objects passed on the command line), which mostly should avoid the cross compilation issues. See https://github.com/llvm/llvm-project/commit/a0d42c86dbcb150c2a7a3d8f2a8500dcfb98a5d1 for the original implementation of that (I believe it has evolved a bit still through the years since).

Interesting. Actually, that change was broken around K_COFF handling. If you try to modify an archive recognized as K_COFF, it would hit llvm_unreachable in isBSDLike when trying to write an archive. I verified that it's fixed by this patch series and llvm-ar writes K_COFF archive.

jacek updated this revision to Diff 507382.Mar 22 2023, 9:02 AM

rebased

efriedma accepted this revision.Mar 22 2023, 12:52 PM

Please explicitly note in the commit message that this doesn't affect other archive-writing tools, like llvm-ar.

Otherwise LGTM

llvm/lib/Object/ArchiveWriter.cpp
768

I'm curious, what does MSVC lib do here? Refuse to construct the archive? Skip creating the symbol table? Use some other output format?

This revision is now accepted and ready to land.Mar 22 2023, 12:52 PM
jacek added inline comments.Mar 22 2023, 3:22 PM
llvm/lib/Object/ArchiveWriter.cpp
768

According to my testing, it has a limit of 4090 object files. When I try to pass more, I get an ambiguous error:
$ lib -out:test.lib -nologo @inputs/files.txt
LINK : fatal error LNK1104: cannot open file 'inputs\a4091.obj'

jacek retitled this revision from [llvm-lib] Add support for writing COFF archives. to [llvm-lib] Use COFF archive format in llvm-lib (other archive tools don't use this format)..Mar 22 2023, 3:32 PM
efriedma added inline comments.Mar 22 2023, 4:40 PM
llvm/lib/Object/ArchiveWriter.cpp
768

That's strange... maybe accidentally running into the C library's file handle limit. What happens if you try adding additional files to a library that already contains 4090 object files?

jacek added inline comments.Mar 22 2023, 5:04 PM
llvm/lib/Object/ArchiveWriter.cpp
768

Oh, right, I can get larger libraries this way. I can get up to 65535 object files by merging multiple libraries until I get an error:

LINK : fatal error LNK1189: library limit of 65535 objects exceeded

efriedma added inline comments.Mar 22 2023, 8:27 PM
llvm/lib/Object/ArchiveWriter.cpp
768

I guess we don't need to worry about that for compatibility, then.

This revision was landed with ongoing or failed builds.Mar 23 2023, 4:44 AM
This revision was automatically updated to reflect the committed changes.