
[ThinLTO] Launch importing backends in parallel threads from gold plugin
ClosedPublic

Authored by tejohnson on Dec 9 2015, 11:07 AM.

Details

Summary

Instead of exiting after creating the combined index in the gold plugin,
unless requested otherwise via a new option, we will now launch the
ThinLTO backends (LTO and codegen pipelines with importing) in parallel
threads. The number of threads is controlled by the existing -jobs gold
plugin option, or defaults to the hardware concurrency if not specified.

As discussed on IRC with Rafael, pull split codegen into gold-plugin and
use the ThreadPool support. Refactor both the split codegen and ThinLTO
handling to utilize a new CodeGen class that encapsulates the
optimization and code generation handling for each module (split or
not). This allows better code reuse between the ThinLTO and split
codegen cases. For now I have included this along with the ThinLTO thread patch, to
show how it all fits together. I can commit the split code gen changes
first though, followed by the ThinLTO backend support. Let me know if
you would like to review these separately.

Along with follow-on fixes D16173 and D16120, all of the SPEC cpu2006 C/C++ benchmarks now build and run correctly with -flto=thin.

Diff Detail

Event Timeline

tejohnson added inline comments.Dec 17 2015, 11:16 AM
tools/gold/gold-plugin.cpp
65–66

Will do and upload the new one shortly.

66

I could and I considered that. As currently defined by gold it would be safe to memcpy. However, I thought it would be better to use a unique_ptr since it doesn't assume anything about the structure, which isn't defined here, and it seemed clearer and cleaner to avoid copying. Note that we pass a reference to this member to ThreadPool::async for use by the thread; that would have to be changed to a memcpy as well.

rafael added inline comments.Dec 17 2015, 1:35 PM
tools/gold/gold-plugin.cpp
66

OK. If we have a std::unique_ptr, we can use it instead of the Valid field, no? Valid is false iff File is null.

I got this warning:

/home/espindola/llvm/llvm/tools/gold/gold-plugin.cpp:133:17: warning: private field 'F' is not used [-Wunused-private-field]

claimed_file *F
tools/gold/gold-plugin.cpp
943

If you change codegenImpl to take an ArrayRef you don't have to do this.

I got this warning:

/home/espindola/llvm/llvm/tools/gold/gold-plugin.cpp:133:17: warning: private field 'F' is not used [-Wunused-private-field]

claimed_file *F

Will fix. Looks like there are some stale comments about using this class for the join, which isn't necessary after switching to the ThreadPool. Will clean that up.

tools/gold/gold-plugin.cpp
66

Good point, will fix this.

943

Ok, will change.

We should probably refactor splitCodeGen. It is odd that now we have
two parallel codegen paths. With ThinLTO we already have multiple BC
files, so it should probably look something like

if (SplitForParallelCodeGen)

ProduceMultipleModules();

Create the tasks.

Each task handles one bc file, which may be one of the original ones
if using thinLto or one of the split ones.

tejohnson updated this revision to Diff 43184.Dec 17 2015, 2:19 PM
  • Address review comments/suggestions

We should probably refactor splitCodeGen. It is odd that now we have
two parallel codegen paths. With ThinLTO we already have multiple BC
files, so it should probably look something like

if (SplitForParallelCodeGen)

ProduceMultipleModules();

Create the tasks.

Each task handles one bc file, which may be one of the original ones
if using thinLto or one of the split ones.

This will require some refactoring of SplitModule() as well, which currently takes a callback (that actually creates each thread) and does the module splitting. For the case where we don't want multiple split modules, like in ThinLTO, we simply pass a single output stream. Note that in both the split and non-split case the same codegen() routine is called to do the actual codegen part.

I think I've addressed all of your other comments. PTAL. Thanks!

rafael added inline comments.Dec 22 2015, 12:21 PM
tools/gold/gold-plugin.cpp
859

splitCodeGen can take an ArrayRef too. Why do you need the vec?

946

It seems odd how much work the destructor of TaskInfo is doing.

Most of the work is here because gold is not thread safe, correct? If so, it seems better to write this code explicitly after

ThinLTOThreadPool.wait();
969

Why do you need a worklist?

Can't you just use a simple loop over Modules?

1004

This can be just

Tasks.emplace_back(new TaskInfo(std::move(InputFile), std::move(OS),
                                NewFilename.c_str(), TempOutFile));

Per IRC discussion, will do some refactoring of splitCodeGen next, then subsequently rebase this patch on top of that. But I wanted to reply to the latest comments here and upload a new patch that addresses them first.

tools/gold/gold-plugin.cpp
859

Ah ok, fixed.

946

Changed TaskInfo::~TaskInfo into TaskInfo::cleanup and invoked explicitly on each task after the wait().

969

I think this was leftover from my original pre-ThreadPool implementation. Good point that it isn't needed. Updated to iterate over Modules as suggested.

1004

Fixed

tejohnson updated this revision to Diff 43489.Dec 22 2015, 3:34 PM
  • Address latest feedback.
tejohnson updated this revision to Diff 43745.Dec 29 2015, 10:17 AM

As discussed on IRC with Rafael, pull split codegen into gold-plugin and
use the ThreadPool support. Refactor both the split codegen and ThinLTO
handling to utilize a new CodeGen class that encapsulates the
optimization and code generation handling for each module (split or
not). This allows better code reuse between the ThinLTO and split
codegen cases.

For now I have included this along with the ThinLTO thread patch, to
show how it all fits together. I can commit the split code gen changes
first though, followed by the ThinLTO backend support. Let me know if
you would like to review these separately.

mehdi_amini added inline comments.Jan 2 2016, 7:28 PM
include/llvm/Support/thread.h
60 ↗(On Diff #43184)

Can be committed separately I think.

Ping.

include/llvm/Support/thread.h
60 ↗(On Diff #43745)

Will do.

tejohnson updated this revision to Diff 44824.Jan 13 2016, 6:19 PM
  • Rebase and improve -save-temps behavior with ThinLTO
tejohnson updated this object.Jan 13 2016, 8:15 PM

Ping.

Using this support extensively in my own ThinLTO spec testing. Would be great to get this reviewed and in tree. =)

Note it involves some refactoring of the split codegen path as suggested by Rafael on IRC (see the comment history for details, specifically Dec 29 update).

mehdi_amini edited edge metadata.Jan 28 2016, 11:14 AM

I'm not familiar with Gold, but here are a few minor comments

tools/gold/gold-plugin.cpp
66–67

(same =default here)

78

Any difference with PluginInputFile(PluginInputFile &&RHS) = default; ?

830

Note: you could reuse the TargetMachine for the next module processed by this thread.

961

There is a bunch of duplicated code above (used in regular LTO as well I think)

In D15390#338559, @joker.eph wrote:

I'm not familiar with Gold, but here are a few minor comments

Thanks for the comments!

tools/gold/gold-plugin.cpp
66–67

Ditto.

78

Good point, will change to default

830

That's an interesting idea. But I don't think I have any ability to control this once I send tasks to the thread pool. Is there a good way to share things across tasks assigned to the same thread by the pool?

961

True, the LTO handling in allSymbolsReadHook does some of the same things. But the LLVMContext and IRMover constructors are outside the loop over the modules since they can be shared in that case. And the invocation of getModuleForFile is a bit different. I could probably create a helper that does the getModuleForFile, setting of the target triple, and invoke IRMover::move though. I'm not sure if that ends up being clearer, but let me see what I could do here.

tejohnson updated this revision to Diff 46408.Jan 29 2016, 12:34 PM
tejohnson updated this object.
tejohnson edited edge metadata.

Address review comments. Use default move constructors, and refactor
common code into a helper.

Some more comments.

tools/gold/gold-plugin.cpp
815

It took me some time to understand what was going on; this "recursive" use of the CodeGen class can be confusing. As long as it is limited to this file I won't object.

830

Yeah, it is annoying; in my local implementation I store a "per-thread context" in a global map and protect retrieving the Context with a mutex.
(I'm not asking you to do the same here and now)

909

Could this be done in the TaskInfo dtor?

939

Usually I prefer RAII (i.e. using a new scope).

1017

This could be

std::vector<ThinLTOTaskInfo> Tasks;
Tasks.reserve(Modules.size());

(same above for std::vector<std::unique_ptr<TaskInfo>> Tasks; around line 1023)

tejohnson added inline comments.Feb 1 2016, 4:17 PM
tools/gold/gold-plugin.cpp
815

Yeah, it was unfortunately hard to get the refactoring and code sharing between the different modes without doing this. So I tried to document it as well as I could.

909

I previously had it there, but Rafael thought the dtor was too heavy-weight and wanted it more explicit. =)

939

Oh I see, the wait() is unnecessary if I provoke the ThreadPool destructor via RAII. Will do that here and for the ThinLTO thread pool as well.

1017

Ok

tejohnson updated this revision to Diff 46593.Feb 1 2016, 4:36 PM

Address more review comments: Use RAII on ThreadPool instead of explicit
wait(), and reserve TaskInfo vectors rather than emplacing unique_ptrs.

rafael added inline comments.Feb 9 2016, 4:54 PM
tools/gold/gold-plugin.cpp
876

s/thread/task/

925

It is nice that now we always use a ThreadPool.

It would be awesome if this could be refactored so that there was just one ThreadPool for thinlto and conventional parallel codegen.

tejohnson added inline comments.Feb 10 2016, 8:10 AM
tools/gold/gold-plugin.cpp
876

Fixed here and a couple other places.

925

The task type is different, and the iteration/handling to add tasks is different - I'm not sure how much code reuse we would get by sharing the thread pool creation and management code. The code to create the thread pool and insert into it is pretty minimal by itself. Also note that you are never using both thread pools in a single compilation.

tejohnson updated this revision to Diff 47458.Feb 10 2016, 8:12 AM
  • s/thread/task/ in a couple places
pcc added inline comments.Feb 12 2016, 12:24 PM
tools/gold/gold-plugin.cpp
915

I don't think thread pools are necessary for split code gen, as we can already perfectly assign the right amount of work to individual threads. Also, this implementation loses the pipelining feature from the original code (i.e. worker threads can work on codegen'ing while the main thread is still splitting). I would prefer you to use the existing implementation in llvm/CodeGen/ParallelCG.h.

mehdi_amini added inline comments.Feb 12 2016, 2:07 PM
tools/gold/gold-plugin.cpp
915

The thread that is doing the splitting can issue other jobs to the thread pool, providing the desired pipeline.

Just fuse the loop body below within the lambda...

mehdi_amini added inline comments.Feb 12 2016, 2:10 PM
tools/gold/gold-plugin.cpp
915

I'll add that while the pooling is not necessary if you only queue as many jobs as you have threads, that is not by itself a reason not to use it: the paradigm is fairly clear, and it decouples the actual splitting granularity from the number of actual worker threads, allowing experimentation with different numbers for each (providing better pipelining, for instance).

pcc added inline comments.Feb 12 2016, 2:22 PM
tools/gold/gold-plugin.cpp
915

Yes, but the current implementation doesn't need any of that. If we experimentally find that decoupling would provide some benefit, then by all means we can start using thread pools here.

In any case, if there is a compelling reason to use thread pools, the right place to make the change is in lib/CodeGen/ParallelCG.cpp rather than in a duplicate implementation here. We can defer what the design for that should look like simply by not using thread pools yet.

tejohnson updated this revision to Diff 49061.Feb 25 2016, 7:41 AM

Modify the patch to implement what is hopefully a compromise solution on
split code gen. I modified lib/CodeGen/ParallelCG.cpp to use a
ThreadPool, and go back to invoking it from the gold plugin.

This has a few nice effects:

  • ThreadPool used by all ParallelCG consumers.
  • Restores the pipelining of splitting and codegen (although note that, with a tweak, the old version of this patch could have attained this in the gold-plugin implementation as well).
  • Avoids the recursive construction of the CodeGen object on the split code gen path.

Can one of you take a look and see if this is acceptable, and if so and
there are not other comments, mark it accepted?

tejohnson updated this revision to Diff 49078.Feb 25 2016, 9:05 AM

Update a comment to match new version. Also, rename the CodeGen Filename
member to SaveTempsFilename to make it clearer and disambiguate from
places that use Filename as a local var, and initialize it as expected
for ThinLTO. Found this issue while testing changes to dependent patch
D16173.

pcc edited edge metadata.Feb 26 2016, 11:46 AM

Seems reasonable to me. Mehdi?

Great, thanks. Do either of you have any other comments or if not can one of you mark this accepted?

pcc accepted this revision.Mar 3 2016, 10:57 AM
pcc edited edge metadata.

LGTM

tools/gold/gold-plugin.cpp
864–865

I don't think this should be dependent on a property of the host machine, as there are behavioral differences between parallelism levels (e.g. symbol ordering will be different, and some uses of inline asm won't work with parallelism >1, although some of that is arguably a bug). Can you please update the comment to reflect that?

This revision is now accepted and ready to land.Mar 3 2016, 10:57 AM
In D15390#367460, @pcc wrote:

LGTM

Thanks!

tools/gold/gold-plugin.cpp
864–865

Ok, will do.