This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Emit individual index files for distributed backends
ClosedPublic

Authored by tejohnson on Apr 26 2016, 1:02 PM.

Details

Summary

When launching ThinLTO backends in a distributed build (currently
supported in gold via the thinlto-index-only plugin option), emit
an individual index file for each backend process as described here:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html

The individual index file encodes the summary and module information
required for implementing the importing/exporting decisions made
for a given module in the thin link step.
This is in place of the current mechanism that uses the combined index
to make importing decisions in each back end independently. It is an
enabler for doing global summary based optimizations in the thin link
step (which will be recorded in the individual index files), and reduces
the size of the index that must be sent to each backend process, and
the amount of work to scan it in the backends.

Rather than create entirely new ModuleSummaryIndex structures (and all
the included unique_ptrs) for each backend index file, a map is created
to record all of the GUID and summary pointers needed for a particular
index file. The IndexBitcodeWriter walks this map instead of the full
index (hiding the details of managing the appropriate summary iteration
in a new iterator subclass). This is more efficient than walking the
entire combined index and filtering out just the needed summaries during
each backend bitcode index write.

Depends on D19481.

Diff Detail

Event Timeline

tejohnson updated this revision to Diff 55076.Apr 26 2016, 1:02 PM
tejohnson retitled this revision from to [ThinLTO] Emit individual index files for distributed backends.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: llvm-commits.
tejohnson updated this revision to Diff 55244.Apr 27 2016, 9:02 AM
  • Refactor ThinLTO handling out of allSymbolsReadHook as it is getting big
mehdi_amini edited edge metadata.Apr 27 2016, 10:37 PM

Is it possible to have some coverage in a test based on llvm-lto (or other standard tools)?
The availability of the gold plugin is rather limited usually.

lib/Bitcode/Writer/BitcodeWriter.cpp
396

no else after return.

Same as before, it'd be nice to have a llvm-lto based test to cover this.

lib/Bitcode/Writer/BitcodeWriter.cpp
283

Why not use directly the ImportMapTy populated by ComputeCrossModuleImportForModule?
That would avoid restructuring the data. If the only thing is the use of std::map instead of StringMap for the purpose of stable ordering, let's change what ComputeCrossModuleImportForModule operates on.

tools/gold/gold-plugin.cpp
1232

Why?

In D19556#415035, @joker.eph wrote:

Is it possible to have some coverage in a test based on llvm-lto (or other standard tools)?
The availability of the gold plugin is rather limited usually.

Ok, looking at the code added here to gold-plugin, I think it makes most sense to move the setup of the ModuleToSummariesForIndex map into a helper method in FunctionImport.cpp. I would leave the actual invocation of WriteIndexToFile here, so that I don't need to add a dependence from Transforms/IPO to libBitWriter (which is doable, won't introduce a circular dependence, but seems somewhat strange). I would also leave the invocation of collectDefinedGVSummariesPerModule and ComputeCrossModuleImport here for now - eventually we will want to do similar optimizations in gold to what you are doing in libLTO, which means refactoring that handling out of libLTO, but that is an orthogonal thing I would like to leave for another day.

Then I can add a similar mechanism to libLTO (similar interface to e.g. ThinLTOGenerator::internalize - take a module and the combined index as input, but then invoke the new mechanism to compute and return the individual index summaries map. Then I can add a mechanism to llvm-lto to invoke it and write out the individual index files, and add a llvm-lto based test.

lib/Bitcode/Writer/BitcodeWriter.cpp
283

Initially I planned to do exactly that. It required refactoring ImportMapTy out of FunctionImport.h, since I needed to use it in BitcodeWriter.cpp, so I moved it to ModuleSummaryIndex.h and made the necessary changes. I put it back when I realized the following:

But the bigger issue is that it didn't really give me the information I needed:

  • The ImportMapTy is a map to FunctionsToImportTy, which provides the GUID of the functions to import. I would then need to perform an extra step in the bitcode writer to find the corresponding summary object for that module. It was more natural and efficient to get that directly from the ModuleToDefinedGVSummaries map already populated in the client (and in fact this map and ModuleToDefinedGVSummaries use the same GVSummaryMapTy).
  • I also need to emit the summaries for the module we are importing into (see response to your other comment below on why), which are not in the import map, so I would have had to pass in additional info and handle it separately inside the bitcode writer.
  • I don't need the other information in the FunctionsToImportTy (the threshold), although that is not a big issue
396

Will fix.

tools/gold/gold-plugin.cpp
1232

Because when we make global optimization decisions here (i.e. promotion, ODR resolution, etc), we will need to note the info (e.g. updated linkage type) in the summaries for this module in its individual backend index, so that the linkage changes can be made in the backend process.

See also the "Individual Module Index Files" overview section in http://lists.llvm.org/pipermail/llvm-dev/2016-April/098272.html.

tejohnson updated this revision to Diff 55488.Apr 28 2016, 2:24 PM
tejohnson edited edge metadata.

Address comments, including refactoring some of the handling out of
gold-plugins and into FunctionImport. Invoke the refactored helper in
libLTO and add llvm-lto support and test.

tejohnson updated this revision to Diff 55527.Apr 28 2016, 9:38 PM
  • Refine libLTO support: parsing module not needed for computing summaries.
mehdi_amini accepted this revision.May 4 2016, 1:49 PM
mehdi_amini edited edge metadata.

LGTM (see one bikeshed suggestion inline)

lib/Bitcode/Writer/BitcodeWriter.cpp
283

The ImportMapTy is a map to FunctionsToImportTy, which provides the GUID of the functions to import. I would then need to perform an extra step in the bitcode writer to find the corresponding summary object for that module. It was more natural and efficient to get that directly from the ModuleToDefinedGVSummaries map already populated in the client (and in fact this map and ModuleToDefinedGVSummaries use the same GVSummaryMapTy).

OK I missed this the first time, seems obvious afterward...
It seems like something that could be changed in whatever ComputeCrossModuleImportForModule produces, but we can refactor/change that later (indeed this is the kind of information I was expected to gather from your current series of patches to feed the design a common API for linkers).

lib/Transforms/IPO/FunctionImport.cpp
423 ↗(On Diff #55527)

bikeshed: what about llvm::gatherImportedSummariesForModule() (i.e. we don't really "compute", and "File" seems odd compare to the usual use of "Module")

This revision is now accepted and ready to land.May 4 2016, 1:49 PM
mehdi_amini added inline comments.May 4 2016, 2:01 PM
lib/LTO/ThinLTOCodeGenerator.cpp
745 ↗(On Diff #55527)

I am wondering if this need to sit on the ThinLTOCodeGenerator though: it does not use any member, does it?

tejohnson added inline comments.May 4 2016, 2:04 PM
lib/Bitcode/Writer/BitcodeWriter.cpp
283

Yes, I agree that some of these steps can/should be combined when we refactor into a common library.

lib/Transforms/IPO/FunctionImport.cpp
423 ↗(On Diff #55527)

Good idea, will switch to that name.

tejohnson added inline comments.May 4 2016, 2:12 PM
lib/LTO/ThinLTOCodeGenerator.cpp
745 ↗(On Diff #55527)

You're right it doesn't. It is invoked from llvm-lto - would it be better to just create a global function outside the class but still in ThinLTOCodeGenerator.h/cpp (in the llvm namespace)?

mehdi_amini added inline comments.May 4 2016, 3:18 PM
lib/LTO/ThinLTOCodeGenerator.cpp
745 ↗(On Diff #55527)

I'm not totally sure what is "the right thing to do", maybe just turning it into a static method in the class would be enough for now?

I don't have a strong opinion on this, I figured I'll just mention it.

Initially I had this class only having addModule and run. The individual method I added were here to support breaking down testing of all of what's going on in run(). It is slightly slipping here, but we'll refactor all of that later, hopefully not in too long.

tejohnson added inline comments.May 4 2016, 4:10 PM
lib/LTO/ThinLTOCodeGenerator.cpp
745 ↗(On Diff #55527)

I think using a static class member makes the most sense for now, will do that.

This revision was automatically updated to reflect the committed changes.

For the MSAN issue:

llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp
347 ↗(On Diff #56279)

Here if Writer.Index.begin() == Writer.Index.end() we end up with IndexGVSummariesIter being uninitialized.

363 ↗(On Diff #56279)

Here we don't check for the same conditions as in the ctor.

tejohnson added inline comments.May 5 2016, 10:41 AM
llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp
363 ↗(On Diff #56279)

(Copying from IRC for posterity)

This was also my first instinct, until I realized that operator++ should never be executed here.

It's coming from the "for (const auto &I : *this)" on line 291 for an empty index. The == end() check should be done before attempting to do the operator++. Confirmed with a normal llvm-lto build that it detects it is at the end and exits (operator++ is never entered for the test case). So something else is going wrong. Attempting to build an msan compiler to track this down better.

As a sanitizers build cop, I am going to revert this path. It should be easy to re-apply the patch when a fix is ready.

LGTM, it does not hurt to revert during the investigation.

Reverted in r268660

I'm at a loss as to what is going on with the msan failures. Maybe it will be obvious to someone else, so posting what I found so far here.

Unfortunately, I can make the msan failure go away any number of ways (including building at a lower level of optimization or adding any print statements, which is making debugging challenging). I also put a log from a debugging session below to show what info I can get.

With my own msan built compiler, I can reproduce most but not all of the failures from the bot. The case I have been looking at is with the distributed_indexes.ll test added with this patch. In that case when we create the IndexBitcodeWriter we have a non-null ModuleToSummariesForIndex pointer, so we don't initialize the IndexSummaries* pointers. When I step through in the debugger, the range based for loop in the IndexBitcodeWriter constructor executes the expected paths in the begin() and end() iterator constructors (not shown in the below trace). After the first for loop body executes, I can see where we execute the iterator's operator++() and operator==() paths for the expected ModuleToSummariesForIndex!=null case. But then oddly we seem to jump back to operator++() handling on the wrong path (that we would execute if ModuleToSummariesForIndex==null), and is where we attempt to access and get the msan error on the IndexSummaries* iterator accesses.

A few things that make it go away, but I don't understand why (may be luck):

  1. Initializing the IndexSummariesBack and IndexSummariesIter unconditionally in the constructor (we can do this safely since the Index is always passed in).
  2. Re-initializing IndexSummariesBack in the operator++() code right before we access it (although that access shouldn't happen as noted above).

For both 1) and 2) I don't understand why these would help except luck, since we shouldn't be executing the path that accesses these iterators. Is there something about the way they are declared that would make the compiler think it can hoist the accesses so that they are unconditionally executed?

  1. Change the range based for loop to be a non-range for loop: for (IndexBitcodeWriter::iterator I = begin(); I != end(); ++I) GUIDToValueIdMap[(*I).first] = ++GlobalValueId;
  2. Change the range based for loop to make a copy of the value instead of assigning to a const reference: for (auto I : *this) GUIDToValueIdMap[I.first] = ++GlobalValueId;

For both 3) and 4) this avoids binding the return of the iterator::operator* (which is a std::pair) to a const reference. However, looking at operator* which invokes make_pair on a GUID and summary pointer to create the return value, I think the const reference should extend the lifetime of the returned temporary through the loop body shouldn't it? In any case, this doesn't explain why I am seeing the wrong path of the operator++() being executed after we appear to already have done the operator++ and operator==.

Here's the debugging log, with some comments sprinkled in:

Starting program: /usr/local/google/home/tejohnson/llvm/llvm_msan/bin/llvm-lto -thinlto-action=distributedindexes -thinlto-index /usr/local/google/home/tejohnson/llvm/llvm_msan/test/ThinLTO/X86/Output/distributed_indexes.ll.tmp.index.bc /usr/local/google/home/tejohnson/llvm/llvm_msan/test/ThinLTO/X86/Output/distributed_indexes.ll.tmp1.bc /usr/local/google/home/tejohnson/llvm/llvm_msan/test/ThinLTO/X86/Output/distributed_indexes.ll.tmp2.bc

Breakpoint 3, IndexBitcodeWriter () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:285
285 : BitcodeWriter(Buffer), Index(Index),

// entered the IndexBitcodeWriter constructor, use next to get to the body of the first iteration:

(gdb) n
286 ModuleToSummariesForIndex(ModuleToSummariesForIndex) {
(gdb)
285 : BitcodeWriter(Buffer), Index(Index),
(gdb)
286 ModuleToSummariesForIndex(ModuleToSummariesForIndex) {
(gdb)
281 IndexBitcodeWriter(SmallVectorImpl<char> &Buffer,
(gdb)
275 unsigned GlobalValueId = 0;
(gdb)
291 for (const auto &I : *this)
(gdb)
292 GUIDToValueIdMap[I.first] = ++GlobalValueId;

// single step from here. First set of instructions are the emplace into map from line 292:

(gdb) s
operator[] () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:292
292 GUIDToValueIdMap[I.first] = ++GlobalValueId;
(gdb)
emplace_unique_key_args<unsigned long, const std::1::piecewise_construct_t &, std::1::tuple<const unsigned long &>, std::1::tuple<> > () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:292
292 GUIDToValueIdMap[I.first] = ++GlobalValueId;
(gdb)
insert_node_at () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:292
292 GUIDToValueIdMap[I.first] = ++GlobalValueId;
(gdb)
size () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:292
292 GUIDToValueIdMap[I.first] = ++GlobalValueId;
(gdb)
first () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:292
292 GUIDToValueIdMap[I.first] = ++GlobalValueId;
(gdb)
first () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/memory:2312
2312 _LIBCPP_INLINE_VISIBILITY _T1_reference first() _NOEXCEPT {return
first_;}
(gdb)
emplace_unique_key_args<unsigned long, const std::1::piecewise_construct_t &, std::1::tuple<const unsigned long &>, std::1::tuple<> > () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/__tree:2013
2013 node_base_pointer& child = find_equal(parent, k);
(gdb)
find_equal<unsigned long> () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/__tree:1887
1887 if (nd != nullptr)
(gdb)
emplace_unique_key_args<unsigned long, const std::1::piecewise_construct_t &, std::1::tuple<const unsigned long &>, std::1::tuple<> > () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/tree:2023
2023 insert_node_at(parent, child, static_cast<node_base_pointer>(h.get()));
(gdb)
insert_node_at () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/__tree:1997
1997 ++size();

// Now enter the operator++() in the range for loop, executes the Writer.ModuleToSummariesForIndex!=null case as expected:

(gdb)
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
(gdb) list
352 First the inner iterator is incremented, then if it is at the end
353
and there are more outer iterations to go, the inner is reset to
354 // the start of the next inner list.
355 if (Writer.ModuleToSummariesForIndex) {
356 ++ModuleGVSummariesIter;
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
358 ModuleSummariesIter != ModuleSummariesBack) {
359 ++ModuleSummariesIter;
360 ModuleGVSummariesIter = ModuleSummariesIter->second.begin();
361 }
(gdb) up
#1 IndexBitcodeWriter () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)
(gdb) down
#0 operator++ () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:357
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
(gdb) s
end () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:357
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
(gdb)
end () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:357
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
(gdb)
end_node () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:357
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
(gdb)
first () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:357
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
(gdb)
first () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/memory:2312
2312 _LIBCPP_INLINE_VISIBILITY _T1_reference first() _NOEXCEPT {return
first_;}
(gdb)
operator++ () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:358
358 ModuleSummariesIter != ModuleSummariesBack) {
(gdb) s
operator!= () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:358
358 ModuleSummariesIter != ModuleSummariesBack) {
(gdb) s
operator!= () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/__tree:810
810 {return !(x == y);}
(gdb)
operator++ () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:357
357 if (ModuleGVSummariesIter == ModuleSummariesIter->second.end() &&
(gdb)
IndexBitcodeWriter () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)

// Next we execute the operator!= to check if for loop at end():

(gdb) s
operator!= () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)
(gdb)
operator== () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)
(gdb) up
#1 operator!= () at llvm/include/llvm/ADT/iterator.h:97
97 return !static_cast<const DerivedT *>(this)->operator==(RHS);
(gdb) up
#2 IndexBitcodeWriter () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)
(gdb) down
#1 operator!= () at llvm/include/llvm/ADT/iterator.h:97
97 return !static_cast<const DerivedT *>(this)->operator==(RHS);
(gdb) down
#0 operator== () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)
(gdb) s
operator== () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)

// This is the Writer.ModuleToSummariesForIndex!=null case of operator==() as expected:

(gdb) up
#1 operator== () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:395
395 return ModuleGVSummariesIter == RHS.ModuleGVSummariesIter;
(gdb) s
operator== () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/__tree:807
807 {return x.ptr_ == y.ptr_;}
(gdb) s
IndexBitcodeWriter () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)

// Here's where things go weird. This is the Writer.ModuleToSummariesForIndex==null case of operator++():

(gdb) s
operator++ () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:364
364 if (IndexGVSummariesIter == IndexSummariesIter->second.end() &&
(gdb) up
#1 IndexBitcodeWriter () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:291
291 for (const auto &I : *this)
(gdb) down
#0 operator++ () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:364
364 if (IndexGVSummariesIter == IndexSummariesIter->second.end() &&
(gdb) s
365 IndexSummariesIter != IndexSummariesBack) {
(gdb) s
operator!= () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:365
365 IndexSummariesIter != IndexSummariesBack) {
(gdb) s
operator!= () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/__tree:886
886 {return !(x == y);}
(gdb) up
#1 operator!= () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/map:798
798 {return x.i_ != y.i_;}
(gdb) up
#2 operator++ () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:365
365 IndexSummariesIter != IndexSummariesBack) {
(gdb) down
#1 operator!= () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/map:798
798 {return x.i_ != y.i_;}
(gdb) down
#0 operator!= () at /usr/local/google/home/tejohnson/llvm/llvm_msan/build-libcxx-msan/include/c++/v1/__tree:886
886 {return !(x == y);}
(gdb) s
operator++ () at llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:364
364 if (IndexGVSummariesIter == IndexSummariesIter->second.end() &&
(gdb) s

Breakpoint 2, __msan_warning_noreturn ()

at /usr/local/google/home/tejohnson/llvm/llvm_msan/llvm/projects/compiler-rt/lib/msan/msan.cc:362

362 void __msan_warning_noreturn() {

For the assembly just posted, the issue is around line 1211. Note that earlier in that block we are supposedly executing BitcodeWriter.cpp:357 (which is in the expected path of operator++). Then we have some code that is attributed to line 364. The je .LBB91_91 is not taken, but je .LBB91_310 is, which branches to the msan warning call. Note that if we didn't take this second branch, the next thing would be the operator* code at line 377 (which is from the expected Writer.ModuleToSummariesForIndex!=null path).

Upload of the module IR before and after unswitching the loop in question. The message with the debug info about the unswitch in this function (there was only one) is at the end of the beforeunswitch file. I had made some minor edits to BitcodeWriter.cpp for this compile, but nothing that had any effect and the line numbers should be the same.

Here is the assembly from -O3 without msan instrumentation. It looks more sane. I.e. the execution of the BitcodeWriter.cpp:364 (line 733) appears to be guarded by an earlier conditional branch from the 355 (see the jne .LBB91_77 just above).