Page MenuHomePhabricator

Introduce libLTO C APIs to target the "resolution-based" new LTO API
Needs ReviewPublic

Authored by mehdi_amini on Apr 10 2017, 10:42 AM.



The goal is to deprecate the existing C API exposed by libLTO and
remove it 2 or 3 releases from now.
This commit introduces the basic replacement APIs, and it not intended
to be a complete replacement at this point.

A new tool is introduced: llvm-liblto. This tool links statically
only to libSupport, and link dynamically to libLTO. It offers a way
to test the new C API in tree without having a linker using this API
available. This was a major painpoint of the existing API.

This is a work-in-progress, I'm sharing it now to show the general
direction I took until now.

Event Timeline

mehdi_amini created this revision.Apr 10 2017, 10:42 AM
pcc edited edge metadata.Apr 10 2017, 12:02 PM

A few comments, mostly on the interface.


What is the minimum level of C we support in our C interfaces? If it is C89 we should use /* */ in this file.


Please be more consistent about type naming (i.e. lto_foo vs LTOFoo).


How are you planning to expose other symbol attributes? I'd use an array-of-tagged-unions approach for that as well.


Is this data structure needed? I think it would be sufficient to just have a way of querying for the number of symbols and symbol attributes for the nth symbol given an lto_input_t.


I would imagine that a client would use this function to destroy the input file after retrieving the list of symbols without adding it to an LTO object, for example if the client is ar or wants to support gold's --start-lib/--end-lib flags.


Maybe there should be a tag field in lto_symbol_resolutions_t so that we can introduce new versions of the lto_symbol_resolution_t data structure by incrementing the tag.


Document that this function takes ownership of the lto_input_t.


Is this change necessary?


This isn't guaranteed to be null terminated (especially after the string table changes), so this probably needs to return a pointer and a length.

tobiasvk added inline comments.

InputFile::symbols() only contains global and non-format specific symbols, so InputFile::Symbol::isGlobal will always be true (and ::isFormatSpecific always false).

If you do need local and/or private symbols too, you could add a symbolsWithLocals() or similar to InputFile. I would have a use for that with LTO + Linker Scripts :)

Thanks, I'll address all these.


Right, I planned to adopt the C like comments.


That's another one that I noticed in the middle of the development: the existing LTO C API was likely based on ld64 convention, but I rather use the LLVM one. I'll likely update all the APIs as well.


I was planning to add a new API for each accessor. An array isn't great: it requires to allocate memory (which mean deallocating as well) and search in the array for the name when it is the only thing you want. I expect the client side to be more annoying to implement with an array.
(I may have misunderstood what you meant here).


I could do it now, I wrote this when the symbols were not backed by an array but some kind of list that we had to iterate through. So having an API with get_ith_symbol(int n) was requiring to iterate n times every call.
(we discuss it on IRC last month).

Now that you added the IR symtab, this does not add much value anymore.


Good point, I'll document this!


No in this patch, at some point I was populating a special LTOSymbol structure containing all the information here.

pcc added inline comments.Apr 10 2017, 1:23 PM

Yes, the allocation could be a problem if it were the responsibility of the API. But another possibility would be for the client to allocate the data structure and have the API fill it in.

The client side would look like this:

lto_symbol_attr attrs[2];
attrs[0].tag = tag_name;
attrs[1].tag = tag_visibility;

for (int i = 0; i != n; ++i) {
  lto_get_symbol_attrs(input_file, i, attrs, sizeof(attrs));
  char *name = attrs[0].value.u_ptr;
  lto_visibility vis = attrs[1].value.u_int;
  // process the symbol

What do you think?


Right, I expect that we will continue to provide random access via the irsymtab (or whatever we replace it with).


I think you can move this check into lto2.cpp now that the symbol list is random access.

Wallbraker added inline comments.

These will need to be prefixed with LTO or something like that. See the LLVM-C enums.


Add to the comment a note about threading. If you use multiple threads does the callback happen on different threads?

Is there a way to avoid using a callback?


This API feels very un LLVM-Cy, maybe make the lto_config_t object opaque and have functions to add params to it?


Why not just destroy it automatically?

mehdi_amini added inline comments.Apr 10 2017, 3:51 PM

That looks nicer this way.

That said I forgot to mention that I planned to also have a c++ wrapper for the C API, so that the client could have a C++-like API that would target the C API exposed by LTO.
Something like:

namespace {
class LTOSymbol {
  lto_symbol_t Sym;
  LTOStringRef getName() {
    return lto_symbol_name(Sym);

But also something that would allow to write:

LTOInput Input = ...;
for (LTOSymbol Sym : Input) {
  auto Name = Sym.getName();

So I'd like to have an API that allows to write a nice wrapper for the client.


Not auto-destroying leaves the possibility to introduce new APIs that can be called after lto_run without breaking existing client.