Page MenuHomePhabricator

COFF: drop the dependency on LIB.EXE for implibs

Authored by compnerd on Jul 10 2016, 3:30 PM.



lld currently relies on lib.exe in order to generate an empty import library.
The "empty" import library consists of 5 members:

  • first linker member
  • second linker member
  • Import Descriptor
  • NULL Import Descriptor
  • NULl Thunk

The first two entries (first and second linker members) are string tables which
are never updated. Therefore, they may as well as not be present. A subsequent
change to add that is probably warranted. However, this does not prevent the
use of the linker.

The Import Descriptor is the content which is most important. It provides an
Import Name Table entry for the library (as specified by the LIBRARY directive
in the DEF file). Additionally, it contains undefined references to the NULL
Import Descriptor and the library NULL Thunk Data. This ensures that the linker
will pull in the subsequent objects from the import library for the link. The
Import Descriptor has a single symbol (__IMPORT_DESCRIPTOR_<Library>) which
contains 3 relocations, one to the INT (Import Name Table) entry, one to the ILT
(Import Lookup Table) entry, and one to the IAT (Import Address Table) entry.

The NULL Import Descriptor is the last import descriptor and terminates the
import descriptor array. It contains a single symbol

The NULL Thunk contains a single symbol (\x7f<Library>_NULL_THUNK_DATA) and
provides the terminator for the ILT and IAT.

These files are currently constructed manually following the example of the
Short Import Library format. This is arguably less than ideal, and it may be
possible to use MCAssembler and feed it the fragments to construct the object.

The major difference between the LIB (LINK) generated objects and the ones
generated here is that they are all one section shorter (.debug$S) as they do
not contain the debug information and one symbol shorter ( as they do
not contain the RICH signature.

Move the logic related to the librarian into a new source file (Librarian.cpp).

Diff Detail

Event Timeline

compnerd updated this revision to Diff 63441.Jul 10 2016, 3:30 PM
compnerd retitled this revision from to COFF: drop the dependency on LIB.EXE for implibs.
compnerd updated this object.
compnerd added reviewers: rui314, ruiu.
compnerd set the repository for this revision to rL LLVM.
compnerd added a project: lld.
compnerd added a subscriber: llvm-commits.
ruiu removed a reviewer: rui314.Jul 10 2016, 3:55 PM
ruiu edited edge metadata.Jul 10 2016, 3:58 PM

I didn't expect that you needed this much code. Can you create a new file and move the new code there?

In D22206#479771, @ruiu wrote:

I didn't expect that you needed this much code. Can you create a new file and move the new code there?

I wanted to get an early version of it up for review in the hopes that someone can see something that Im missing and see if this can be written more compactly. Ill go ahead and split it out into a separate file in the mean time.

compnerd updated this revision to Diff 63445.Jul 10 2016, 4:35 PM
compnerd updated this object.
compnerd edited edge metadata.

Move the librarian logic into a new file.

ruiu added a comment.Jul 10 2016, 5:24 PM

I think you want to create a Librarian.h for functions/classes in Librarian.cpp.


You want to describe what Librarian is. (Readers already know that this file is for "Librarian" whatever it is because this is Librarian.cpp.)


It is recommended by the LLVM coding style that you minimize the region that an anonymous namespace encloses. Since static functions don't have to be in an anonymous namespace, you want to exclude them from this anonymous namespace.


Add a newline before this line.


Since this is actually a constant, I'd make this a non-member constant string.


It seems that you don't need to compute the size beforehand. Instead, you can construct a std::vector or something like that and then get the size of the constructed object at end of this function. (This would be a bit inefficient, but because this function is not hot at all, it should be fine.)

Or you can create a buffer that is large enough to store any import library file and truncate it at end of this funciton.

compnerd added inline comments.Jul 10 2016, 6:32 PM

Im not sure I understand. The size is dependent on the name of the library, which is user input. The size is used for the buffer allocation.

compnerd marked 3 inline comments as done.Jul 10 2016, 8:05 PM
In D22206#479798, @ruiu wrote:

I think you want to create a Librarian.h for functions/classes in Librarian.cpp.

Right now the only externally visible function is writeImportLibrary. I can create the header if you like, but I don't see it as being useful at this time. If we end up with more functions in the future, we could also introduce the header then.

compnerd marked an inline comment as done.Jul 10 2016, 8:12 PM
compnerd updated this revision to Diff 63453.Jul 10 2016, 8:16 PM
compnerd removed rL LLVM as the repository for this revision.
ruiu added inline comments.Jul 11 2016, 1:31 PM

What I was trying to say is something like this. If you define a function to append data to a std::vector,

template <class T> append(std::vector<uint8_t> &Vec, const T &Data) {
  size_t S = Vec.size();
  Vec.resize(S + sizeof(T));
  memcpy(&Vec[S], Data, sizeof(T));

and use that to append data, then you don't need to compute the buffer size beforehand.

compnerd added inline comments.Jul 11 2016, 5:29 PM

Okay. So, you want to replace the local bump ptr allocator with a std::vector<char>?

I guess that the write* functions would do the allocation + write.

ruiu added inline comments.Jul 11 2016, 5:33 PM

Yeah. More precisely, I want to replace the buffers allocated using the bump ptr allocator with std::vector<uint8_t>.

compnerd updated this revision to Diff 63642.Jul 11 2016, 11:31 PM

Replaced the use of the BumpPtr allocator with the std::vector. Note that we pass the buffer in as otherwise the content will be discarded at the end of the function since the NewArchiveMember does not copy the contents. Since we no longer use the bump allocator, we need to ensure that the content will be preserved during the lifetime of the object until the archive is written to disk.

compnerd marked an inline comment as done.Jul 11 2016, 11:31 PM
ruiu added inline comments.Jul 12 2016, 1:01 PM

Why don't you add using namespace lld::coff at beginning of this file?


Can you describe the structure of the string table briefly? It consists of the size field followed by null-terminated strings.


I prefer size_t over auto.


Add a comment that this is to backfill the size field.


I'd be nice to add comments for this class to say that most code in this class is to create an empty import table which has almost fixed contents and you don't need to understand the details unless you are particularly interested in the import table structure.


Can you define this and other member functions as out-of-line functions?


Please use C++-style comments (//) instead of C-style (/**/).


You can define this as

void lld::coff::writeImportLibrary()

to remove the namespace definitions surrounding this.

compnerd marked 7 inline comments as done.Jul 12 2016, 5:48 PM
compnerd added inline comments.

No strong reason, I just tend to prefer explicit namespace, fine by me to use that though :-).


If we are using the type, Id rather use std::vector<uint8_t>::size_type. Changed to that.


This conflicts with the other comment to move the functions out of the class. Once we decide between the two, Ill update the patch.


We could, it was meant as a means to avoid re-creating the strings for the ImportDescriptor and NullThunk symbol name. This isn't the hot path, so we can certainly do that. I don't have too strong of an opinion.

compnerd updated this revision to Diff 63763.Jul 12 2016, 6:19 PM
compnerd marked 2 inline comments as done.
ruiu accepted this revision.Jul 12 2016, 6:56 PM
ruiu edited edge metadata.

LGTM with a few nits.


Ugh, this seems too generic to me - I really prefer size_t because an architecture where sizeof(std::vector<uint8_t>::size_type) != sizeof(size_t) seems really exotic.


memcpy(reinterpret_cast<char *>(&B[Pos]),, S.length()) would be better because it's cheaper if the string is not NUL-terminated.


Move } to the next line.


Add } to close the anonymous namespace.

This revision is now accepted and ready to land.Jul 12 2016, 6:56 PM
compnerd closed this revision.Jul 12 2016, 8:27 PM

Fixed and commited as SVN r275242!