This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Avoid archive member collisions
ClosedPublic

Authored by davide on Oct 11 2016, 7:10 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

davide updated this revision to Diff 74318.Oct 11 2016, 7:10 PM
davide retitled this revision from to [ThinLTO] Avoid archive member collisions.
davide updated this object.
davide added a reviewer: pcc.
davide added a subscriber: llvm-commits.
pcc edited edge metadata.Oct 11 2016, 7:32 PM

It seems that this does not entirely solve the problem -- you will still have collisions if two files happen to have the same size.

The gold plugin solves this problem by incorporating the byte offset within the archive file into the filename it passes to the LTO library. Would it be possible to do something similar here?

In D25495#567829, @pcc wrote:

It seems that this does not entirely solve the problem -- you will still have collisions if two files happen to have the same size.

The gold plugin solves this problem by incorporating the byte offset within the archive file into the filename it passes to the LTO library. Would it be possible to do something similar here?

Will it solve the problem?
If you have two archives with the same size, and the two member with the same name are the first one in the archive, they'll likely start at the same offset, wouldn't you get a collision again?

pcc added a comment.Oct 11 2016, 9:10 PM

This patch is already including the archive name as part of the filename, which would seem to be enough to avoid a collision in that case.

Oh sure, as long as the full path to the archive is kept and not only the archive name itself, that should be fine!

ELF/InputFiles.cpp
718 ↗(On Diff #74318)

s/size of the archive/size of the member buffer/

davide updated this revision to Diff 74393.Oct 12 2016, 9:14 AM
davide edited edge metadata.

Thanks for clarifying, Peter. Here's a new version of the patch. I decided to store the information about the offset in the archive in class InputFile as we do for the archive name already.

ruiu added a subscriber: ruiu.Oct 12 2016, 10:58 AM
ruiu added inline comments.
ELF/InputFiles.cpp
726–727 ↗(On Diff #74393)

Can you use the object address as a unique identifier instead? I mean utostr((uint64_t)&MB) is always unique.

davide added inline comments.Oct 12 2016, 11:06 AM
ELF/InputFiles.cpp
726–727 ↗(On Diff #74393)

Maybe, but I'd like to double check with Peter to make sure it's fine on lib/LTO side (probably it is).

pcc added inline comments.Oct 12 2016, 11:34 AM
ELF/InputFiles.cpp
726–727 ↗(On Diff #74393)

I'd prefer not to do that, for a couple of reasons:

  1. these names are used by lib/LTO to form file names for save-temps, and it's probably best if that remains deterministic
  2. eventually I think we will need to plumb the file name and offset into lib/LTO so that ThinLTO can use it to reload the input file in another process.

OK, makes sense. Are you fine with the rest of the patch?

mehdi_amini added inline comments.Oct 12 2016, 11:36 AM
ELF/InputFiles.cpp
726–727 ↗(On Diff #74393)

For 2), that would tie us to a "file based" API, I'll look closely at the benefit before going this route.

davide updated this revision to Diff 74411.Oct 12 2016, 11:50 AM
ruiu added a comment.Oct 12 2016, 11:50 AM

If we put aside (2) for now, all the plumbing of OffsetInArchive is just to create a unique filename, which I think I don't like. There are other ways to generate a deterministic unique filename. For example, you could add static int Counter = 0 to BitcodeFile and use the counter value as part of a filename (and increment it every time you use it).

ruiu added a comment.Oct 12 2016, 11:54 AM

Isn't ArchiveName a full path? If you concatenate a full path archive name and a filename inside of the archive, does it become a unique identifier?

In D25495#568482, @ruiu wrote:

If we put aside (2) for now, all the plumbing of OffsetInArchive is just to create a unique filename, which I think I don't like. There are other ways to generate a deterministic unique filename. For example, you could add static int Counter = 0 to BitcodeFile and use the counter value as part of a filename (and increment it every time you use it).

I'm not sure I like exposing a global with static lifetime to avoid threading informations properly into the linker. If it would be a awful lot of work, I'd agree with you, but it seems it's not.

In D25495#568483, @ruiu wrote:

Isn't ArchiveName a full path? If you concatenate a full path archive name and a filename inside of the archive, does it become a unique identifier?

No, it does not, because you can have a valid archive which contains two member functions with the same name.

In D25495#568483, @ruiu wrote:

Isn't ArchiveName a full path? If you concatenate a full path archive name and a filename inside of the archive, does it become a unique identifier?

No, it does not, because you can have a valid archive which contains two member functions with the same name.

ruiu added a comment.Oct 12 2016, 12:02 PM

That's true. That means when we print out an error message, we have the same problem that users can't identify which file is wrong. I'm wondering if InputFile::getName() should return a unique name, such as "some/path/foobar.a:123:x.o" where 123 is an offset in the archive.

pcc added a comment.Oct 12 2016, 12:26 PM
In D25495#568493, @ruiu wrote:

I'm wondering if InputFile::getName() should return a unique name, such as "some/path/foobar.a:123:x.o" where 123 is an offset in the archive.

That might be something we could do in lld::elf::getFilename(). It could be confusing in the normal case, so maybe we could do it only if the archive contains duplicate names?

ELF/InputFiles.cpp
491 ↗(On Diff #74411)

If this is a thin archive, I don't think getChildOffset will give us the right value (looking at the implementation, it appears that it will return a garbage value for thin archive members). Maybe return 0 if thin?

davide updated this revision to Diff 74419.Oct 12 2016, 12:36 PM
In D25495#568522, @pcc wrote:
In D25495#568493, @ruiu wrote:

I'm wondering if InputFile::getName() should return a unique name, such as "some/path/foobar.a:123:x.o" where 123 is an offset in the archive.

That might be something we could do in lld::elf::getFilename(). It could be confusing in the normal case, so maybe we could do it only if the archive contains duplicate names?

Maybe, but I still think the change belongs to a different patch

ELF/InputFiles.cpp
491 ↗(On Diff #74411)

Oh yeah, good catch.

pcc accepted this revision.Oct 12 2016, 12:40 PM
pcc edited edge metadata.
In D25495#568522, @pcc wrote:
In D25495#568493, @ruiu wrote:

I'm wondering if InputFile::getName() should return a unique name, such as "some/path/foobar.a:123:x.o" where 123 is an offset in the archive.

That might be something we could do in lld::elf::getFilename(). It could be confusing in the normal case, so maybe we could do it only if the archive contains duplicate names?

Maybe, but I still think the change belongs to a different patch

Agreed. Rui, what do you think?

This LGTM as far as I'm concerned though.

This revision is now accepted and ready to land.Oct 12 2016, 12:40 PM
ruiu added a comment.Oct 12 2016, 12:42 PM

I will try to see how my proposal work but it doesn't have to block this. LGTM.

ruiu accepted this revision.Oct 12 2016, 12:43 PM
ruiu added a reviewer: ruiu.
This revision was automatically updated to reflect the committed changes.
ruiu added a comment.Oct 12 2016, 1:07 PM

Does this work for --whole-archive?