Page MenuHomePhabricator

[LLD] [COFF] Fix automatic export of symbols from LTO objects

Authored by mstorsjo on Apr 29 2021, 1:47 PM.



This isn't quite pretty, but it seems to work at least. Suggestions
for how to do things better are very welcome. I'll make a few inline
comments about a few surprising bits.

Diff Detail

Event Timeline

mstorsjo created this revision.Apr 29 2021, 1:47 PM
mstorsjo requested review of this revision.Apr 29 2021, 1:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2021, 1:47 PM
mstorsjo added inline comments.Apr 29 2021, 1:51 PM

It turns out we can't actually call def->getChunk() on a DefinedRegular from a bitcode object file, because that symbol has got the SectionChunk **data; member set to nullptr, and getChunk() dereferences *data, so we have to do this dance so we won't crash on calling def->getChunk().


Here we end up iterating over all symbols in the input bitcode file, to find the one we're looking at, to figure out whether it's data or a function. Not pretty...


Here we need to decide what symbols to export before we run addCombinedLTOObjects().

126 ↗(On Diff #341632)

The ->getChunk() call from here stemmed from the original iplementation of this feature, where it was added just as a guesstimate of what would be a decent condition - for LTO symbols we must tolerate quite different symbols.

rnk added inline comments.Apr 29 2021, 2:22 PM

Exciting. I took a look at LTO support a while back, and I wanted to rearrange how some of this stuff worked, but I never got around to it. I think I wanted to make some kind of DefinedBitcode symbol or something like that so that the fields would be less overloaded and more uniform.


Yeah, it would be nice to figure out how to avoid this, since it's O(n^2)-ish.

One way would be to store this bit directly on the DefinedRegular symbol object.

Another way would be to start making fake, synthetic SectionChunk objects for symbols from bitcode files. We could have the standard text, data, rdata, bss sections, for example. This would make the previous conditional unnecessary. That seems better.

The least invasive way would be to separately iterate over the bitcode input files and look their symbols up in the symbol table and use that to create exports. However, symbol resolution is already expensive, and this is basically a second symbol resolution phase.

I kind of lean towards the second option. It leads to less null checks in the long run.


Yes, that makes sense: we can't really do effective LTO without knowing what the entry points are going to be.

mstorsjo added inline comments.Apr 29 2021, 2:31 PM

You mean your preference is synthetic section chunks?

That does sound nice as it'd avoid null checks, but wouldn't it be a huge waste (both mem and time) to do that for all bitcode object file symbols, for the odd chance of wanting to inspect some of them for exporting - just for the short period of time until we replace them with the LTO compiled output?

Or should there be like 1 common fake section chunk used for all LTO symbols, for each symbol type (func/data)?

rnk added inline comments.Apr 29 2021, 2:53 PM

I was thinking we could have one fake section chunk for the standard types of LLVM IR GlobalValues: .text for Function, .data for GlobalVariable, .rdata for Constant, etc. Something like that. I think it's just a matter of passing extra parameters to addRegular in BitcodeFile::parse.

mstorsjo updated this revision to Diff 343865.May 8 2021, 2:03 PM

Simplified (somewhat) by adding fake SectionChunks for LTO symbols until they're compiled.

There's still some amount of ugliness here; the SectionChunk ctor needs to be made to cope with a null ObjFile, and that this runs before config is initialized. Any ideas for how to make that bit neater?

rnk added a comment.May 11 2021, 11:17 AM

Nice, I think this is a reasonable direction.


Please put this in an anonymous namespace and add a comment that this is just adding a constructor to help stamp out COFF section headers. For some reason I thought it was adding fields. Actually, maybe it would be less surprising if the coff_section was a plain field, and then we can take its address to construct the SectionChunk objects.


Can this be marked constexpr? We want to avoid any dynamic initialization if possible.


Please mark the globals static.


The SectionChunk constructor is non-trivial, so this will definitely create dynamic initialization. I guess that's OK, LLD isn't really structured as a library anyway.

mstorsjo added inline comments.May 11 2021, 11:23 AM

Hmm, I'll have to see.


Yeah this bit isn't ideal. I guess it's still tolerable for a library, even if it isn't optimal.

mstorsjo added inline comments.May 12 2021, 3:45 AM

Doesn't seem like it's doable to make it constexpr in this form, it'd have to be e.g. constexpr FakeSection(int c) : section(<initializer expression>) {}, it seems (at least in C++11).

The most straightforward would be C99 style designated intializers, i.e coff_section fakeSection = { .Characteristics = 42; };, but that's only available in C++20. And I'd rather not enumerate the members by hand to create something like coff_section fakeSection = { 0, 0, 0, 0, 0, 0, importantValue }.

Also, coff_section consists of support::ulittle16_t, which don't seem to be constexpr'able, so I'm not sure if it'd even work like that in C++20?

(This is all kinda roundabout, when what we really want just is a couple static bytes laid out in a specific way, but accessible for the caller via the coff_section class.)

mstorsjo updated this revision to Diff 344755.May 12 2021, 3:50 AM

Added a comment about the fake section class, made the coff_section a member instead of subclassing it, wrapped the globals in an anonymous namespace.

mstorsjo updated this revision to Diff 344764.May 12 2021, 4:23 AM

Apply clang-format suggestions.

rnk accepted this revision.May 20 2021, 10:25 AM

lgtm, with a nit


Oh well.


Wouldn't it be simpler to make the field public, and take the address of the field in the constructor below?

This revision is now accepted and ready to land.May 20 2021, 10:25 AM
mstorsjo added inline comments.May 20 2021, 12:09 PM

I guess that would work just as well, and be clearer. Will change before pushing.