This is an archive of the discontinued LLVM Phabricator instance.

LTO: introduce object file-based on-disk module format.
ClosedPublic

Authored by pcc on Jul 3 2014, 12:56 AM.

Details

Summary

This format is simply a regular object file with the bitcode stored in a
section named ".llvmbc", plus any number of other (non-allocated) sections.

One immediate use case for this is to accommodate compilation processes
which expect the object file to contain metadata in non-allocated sections,
such as the ".go_export" section used by some Go compilers [1], although I
imagine that in the future we could consider compiling parts of the module
(such as large non-inlinable functions) directly into the object file to
improve LTO efficiency.

[1] http://golang.org/doc/install/gccgo#Imports

Diff Detail

Event Timeline

pcc updated this revision to Diff 11045.Jul 3 2014, 12:56 AM
pcc retitled this revision from to LTO: introduce object file-based on-disk module format..
pcc updated this object.
pcc edited the test plan for this revision. (Show Details)
pcc added a reviewer: rafael.
pcc added a subscriber: Unknown Object (MLST).
rafael edited edge metadata.Jul 3 2014, 7:41 AM

+ /// Returns 'true' if the bitcode bc is for the specified target triple.
+ static bool isTargetMatch(StringRef bc, const char *triplePrefix);

Using a StringRef is a nice independent improvement in here
(isTargetMatch), please commit it first. Nit: take the opportunity to
rename the argument names to match the llvm style.

LTOModule *LTOModule::createFromBuffer(const void *mem, size_t length,

                                     TargetOptions options,
                                     std::string &errMsg, StringRef path) {
std::unique_ptr<MemoryBuffer> buffer(makeBuffer(mem, length, path));
if (!buffer)
  return nullptr;
  • return makeLTOModule(buffer.release(), options, errMsg);

+ return makeLTOModule(StringRef((const char *)mem, length), options, errMsg);

The buffer is not need in here anymore, is it?

  • return makeLTOModule(buffer.release(), options, errMsg);

+ return makeLTOModule(buffer->getBuffer(), options, errMsg);

The MemoryBuffer is now destroyed immediately instead of being
transferred into the produced LTOModule. This would work with the
current setup, except you also do

  • m->materializeAllPermanently();

Which means the IRReader will be pointed to unmapped memory, no?

BTW, the materializeAllPermanently is a very unfortunate but necessary
thing for the C api. The reason it is there is the way ld64 implements
an optimization:

  • We want to avoid putting linkonce_odr symbols in the symbol table

unless their address is significant.

  • The way ld64 implements this is being told for every symbol if it

can be dropped from the symbol table.

  • To tell ld64 about that we need to look at every use of a symbol.
  • To look at every use of a symbol we have to materialize all function

bodies since they may contain an use.

Two higher level questions about this approach:

  • It is effectively a way of doing fat object files. GCC does

something similar. One small but recurring annoyance I had when
working with GCC LTO is that it can end up falling back silently to
non-LTO since tho outer container is ELF, with IR in a section. With
LLVM right now we get an error, since without our help the linker is
clueless as to what LLVM IR is. Can't you invert this? Put the ELF in
the IR (or even represent .go_export directly in some metadata) and
extent the gold plugin API to handle this? Another advantage is that
it should be easier to support fat COFF and MachO objects too.

  • If we do need to go with IR in ELF instead of ELF in IR, we also

need to update lib/Object to handle it. We still want to get an
IRObjectFile (by default at least) when given one of those files so
that llvm-nm/llvm-ar do the right thing.

pcc updated this revision to Diff 13563.Sep 10 2014, 3:31 PM
pcc edited edge metadata.

Rebase, add LLVMObject support

It feels like these new format support should be documented somewhere. BitCodeFormat.rst maybe?

Also, an entry in the release notes would be nice.

include/llvm/Object/IRObjectFile.h
54 ↗(On Diff #13563)

Some comments saying what these function do would be nice.

I wonder if findBitcodeIn... might be a more appropriate name. Not sure.

lib/LTO/LTOModule.cpp
109

I don't think we use auto for cases like this.

111

Is the empty check needed? It seems better to say that an empty section would be a broken bitcode. That is, this should return true and we would print an error afterwards.

114

It seems tempting to just get the buffer and forward to IRObjectFile::getBitcodeFromMemBuffer. Would avoid the duplicated switch over the type.

lib/Object/IRObjectFile.cpp
268 ↗(On Diff #13563)

Maybe this should take an ObjectFile&?

271 ↗(On Diff #13563)

You don't need the {

276 ↗(On Diff #13563)

You don't need the {

lib/Object/SymbolicFile.cpp
65 ↗(On Diff #13563)

Explicit type, not auto I think.

66 ↗(On Diff #13563)

The empty check looks odd.

tools/gold/gold-plugin.cpp
551 ↗(On Diff #13563)

You have to handle the case where we don't have a get_view (bfd using the plugin).

On trunk we create a BufferRef in both branches of the if, so you can probably just move this after the end of the else.

pcc updated this revision to Diff 13798.Sep 17 2014, 1:12 PM

Address comments.

pcc added a comment.Sep 17 2014, 1:13 PM

Added docs.

include/llvm/Object/IRObjectFile.h
54 ↗(On Diff #13563)

Some comments saying what these function do would be nice.

Added.

I wonder if findBitcodeIn... might be a more appropriate name. Not sure.

Ok, I like that name better. Done.

lib/LTO/LTOModule.cpp
109

Changed.

111

Previously I returned an empty memory buffer from getBitcodeFromMemBuffer if the file was of the wrong type or there was no .llvmbc section present. That was a bad decision in retrospect; that function now returns error codes in those cases, which lets me drop the empty check here and elsewhere.

114

Done. Now that I think about it, the previous way wasn't saving us much.

lib/Object/IRObjectFile.cpp
268 ↗(On Diff #13563)

It can even take a const ObjectFile &. Done.

271 ↗(On Diff #13563)

Done.

276 ↗(On Diff #13563)

Done.

lib/Object/SymbolicFile.cpp
65 ↗(On Diff #13563)

Done.

66 ↗(On Diff #13563)

Removed.

tools/gold/gold-plugin.cpp
551 ↗(On Diff #13563)

Are you thinking of another function (claim_file_hook?) It looks like the code in this function (getModuleForFile) uses get_view unconditionally. I don't know if this is correct or not, but it seems to have worked up to now.

rafael accepted this revision.Sep 18 2014, 12:27 PM
rafael edited edge metadata.

LGTM with a nit.

test/tools/gold/bcsection.ll
5 ↗(On Diff #13798)

Since the intention is to check that %T/bcsection.o is not just a copy of %T/bcsection.bco, how about extending this to:

llvm-nm -no-llvm-bc %T/bcsection.bco | FileCheck %s --check-prefix=NO-MAIN
lvm-nm %T/bcsection.bco | FileCheck %s
lvm-nm %T/bcsection.o| FileCheck %s

?

This revision is now accepted and ready to land.Sep 18 2014, 12:27 PM
pcc added inline comments.Sep 18 2014, 2:38 PM
test/tools/gold/bcsection.ll
5 ↗(On Diff #13798)

The first test seems useful (as a negative test for -no-llvm-bc); added.

The LTO test suite already has variations of the second test.

I don't see what the third one would accomplish. More specifically, I don't see how it could fail if "llvm-nm -no-llvm-bc %T/bcsection.o | FileCheck %s" passes. Maybe if the object file had an .llvmbc section whose bitcode does not define 'main'? But this seems unlikely.

pcc closed this revision.Sep 18 2014, 2:38 PM
pcc updated this revision to Diff 13850.

Closed by commit rL218078 (authored by @pcc).

Hi Peter,

Sorry to perform some thread necromancy here, but I didn't see this go around the first time. What are you trying to accomplish with this? I'm not sure I see how this is necessary to push metadata around, but maybe I'm missing something. Can you explain a bit?

Thanks!

-eric

pcc added a comment.May 13 2015, 2:59 PM

Hi Eric,

Have you seen my original proposal? (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074540.html) In the second paragraph in particular it explains how llgo uses this feature.

Peter

Hi Peter,

If the imports are for .o files and the bitcode is for before compilation
is complete? If it's post compilation then we'll have a .o file and can put
the metadata in as normal? Otherwise if we're using lazy compilation you'll
need to teach something how to read llvm-ir anyhow and so might as well not
bother storing it in an elf file?

I could easily be missing something of the compilation pipeline here.

-eric