This is an archive of the discontinued LLVM Phabricator instance.

Let clang driver support parallel jobs
Needs ReviewPublic

Authored by yaxunl on Oct 29 2019, 1:26 PM.

Details

Summary

When clang compiles CUDA/HIP programs, device code compilation takes most of the
compilation time, since device code usually contains complicated computation. Such
code is often highly coupled, resulting in a few large source files that become
bottlenecks for the whole project. Things get worse when such code is compiled for
multiple GPU archs, since clang compiles for each GPU arch sequentially. In practice,
it is common to compile for more than 5 GPU archs.

To alleviate this issue, this patch implements a simple scheduler that lets the clang
driver compile independent jobs in parallel.

This patch tries to minimize its impact on the existing clang driver: there are no
changes to the action builder or the tool chains. It introduces a driver option
-parallel-jobs=n to control the number of parallel jobs to launch. The default is 1,
which leaves clang driver behavior unchanged (NFC). If llvm/clang is built with
LLVM_ENABLE_THREADS off, this change is also NFC.

The basic design of the scheduler is to discover the dependencies among the jobs and
launch each job on its own thread once the jobs it depends on are done.
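
A minimal sketch of that idea, assuming the job list is already topologically ordered (as the driver's job list is) and using hypothetical Job/ExecuteJobs names rather than the actual classes in the patch:

```cpp
// Minimal sketch (not the actual patch): each job runs on its own thread
// once all of its dependencies are done, with at most MaxParallel jobs
// in flight. Assumes Jobs is topologically ordered.
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct Job {
  std::function<void()> Run;   // the work, e.g. spawning a sub-command
  std::vector<size_t> Deps;    // indices of jobs this one depends on
};

void ExecuteJobs(std::vector<Job> &Jobs, unsigned MaxParallel) {
  std::mutex M;
  std::condition_variable CV;
  std::vector<bool> Done(Jobs.size(), false);
  unsigned Running = 0;
  std::vector<std::thread> Threads;

  for (size_t I = 0; I != Jobs.size(); ++I) {
    std::unique_lock<std::mutex> Lock(M);
    // Launch a job only when a slot is free and its dependencies are done.
    CV.wait(Lock, [&] {
      if (Running >= MaxParallel)
        return false;
      for (size_t D : Jobs[I].Deps)
        if (!Done[D])
          return false;
      return true;
    });
    ++Running;
    Threads.emplace_back([&, I] {
      Jobs[I].Run();
      std::lock_guard<std::mutex> G(M);
      Done[I] = true;
      --Running;
      CV.notify_all();
    });
  }
  for (std::thread &T : Threads)
    T.join();
}
```

With -parallel-jobs=4, MaxParallel would be 4; the default of 1 degenerates to today's sequential behavior.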

Diff Detail

Event Timeline

yaxunl created this revision.Oct 29 2019, 1:26 PM
tra added a subscriber: echristo.

@echristo Eric, any thoughts/concerns on the overall direction for the driver?

@yaxunl One concern I have is diagnostics. When the jobs are executed in parallel, I assume all of their output will go to standard error and will be interleaved randomly across all parallel compilations. Figuring out what went wrong will be hard. Ideally we may want to collect the output from individual sub-commands and print it once each job has finished, so there's no confusion about the source of an error.

> When clang compiles CUDA/HIP programs, device code compilation takes most of the
> compilation time, since device code usually contains complicated computation. Such
> code is often highly coupled, resulting in a few large source files that become
> bottlenecks for the whole project. Things get worse when such code is compiled for
> multiple GPU archs, since clang compiles for each GPU arch sequentially. In practice,
> it is common to compile for more than 5 GPU archs.

I think this change will only help with relatively small local builds that have a few relatively large CUDA/HIP files. We did talk internally about parallelizing CUDA builds in the past and came to the conclusion that it's not very useful in practice, at least for us. We have a lot of non-CUDA files to compile, too, and that usually provides enough work for the build to hide the long CUDA compilations. Distributed builds (and, I guess, local ones too) often assume one compilation per CPU, so launching multiple parallel sub-compilations for each top-level job may not be that helpful in practice beyond manually compiling one file. That said, the change would be a nice improvement for quick rebuilds where only one or a few CUDA files need to be recompiled. In that case, however, being able to get comprehensible error messages would also be very important.

Overall I'm on the fence about this change. It may be more trouble than it's worth.

clang/include/clang/Driver/Job.h
77

Nit: Using a pointer as a key will result in sub-compilations being done in a different order from run to run, and that may result in build results changing from run to run.

I can't think of a realistic scenario yet. One case where it may make a difference is the generation of the dependency file.

We currently leak some output file name flags to device-side compilations. E.g., -fsyntax-only -MD -MF foo.d will write foo.d for each sub-compilation. At best we'll end up with the result of whichever sub-compilation finished last; at worst we'll end up with corrupt output. In this case it's the leaked output argument that's the problem, but I suspect there may be other cases where the execution order is observable.
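
For illustration only (these names are not from the patch), the difference between the two keying schemes:

```cpp
// Hypothetical illustration of the nit above. Iterating a map keyed by
// pointer visits sub-compilations in an address-dependent order that
// varies between runs; a stable sequence number keeps the launch order
// reproducible.
#include <map>

struct Command; // stand-in for the driver's Command class

// Nondeterministic iteration order: pointer values differ run to run.
std::map<const Command *, unsigned> StatusByPointer;

// Deterministic: key by the order in which the jobs were created.
std::map<unsigned /*SequenceNo*/, const Command *> JobsByIndex;
```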

clang/lib/Driver/Compilation.cpp
284–286

Indentation seems to be off. Run through clang-format?

thakis added a subscriber: thakis.Oct 30 2019, 11:15 AM
thakis added inline comments.
clang/lib/Driver/Compilation.cpp
303

Maybe a select()/fork() loop would be a better approach than spawning one thread per subprocess? This is doing thread-level parallelism in addition to process-level parallelism :)

If llvm doesn't have a subprocess pool abstraction yet, ninja's is pretty small, self-contained, battle-tested, and open-source; maybe you could copy that over (and remove the bits you don't need)? A rough sketch of the fork()/wait() idea follows the links below.

https://github.com/ninja-build/ninja/blob/master/src/subprocess.h
https://github.com/ninja-build/ninja/blob/master/src/subprocess-win32.cc
https://github.com/ninja-build/ninja/blob/master/src/subprocess-posix.cc
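
As a rough POSIX-only illustration of that suggestion (the command lines and job limit below are made up; ninja's version linked above is the portable, production-quality one):

```cpp
// Sketch of a fork()/wait() subprocess pool: fill free slots with new
// children, then block until any child exits. No extra threads needed.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <map>
#include <vector>

int main() {
  std::vector<std::vector<const char *>> Cmds = {
      {"/bin/echo", "job1", nullptr},
      {"/bin/echo", "job2", nullptr},
  };
  const unsigned MaxJobs = 2;
  std::map<pid_t, unsigned> Live; // pid -> index of the running command
  unsigned Next = 0;

  while (Next < Cmds.size() || !Live.empty()) {
    // Fill free slots by forking new subprocesses.
    while (Next < Cmds.size() && Live.size() < MaxJobs) {
      pid_t Pid = fork();
      if (Pid < 0)
        return 1;   // fork failed
      if (Pid == 0) {
        // Child: exec the sub-command.
        execv(Cmds[Next][0], const_cast<char **>(Cmds[Next].data()));
        _exit(127);  // exec failed
      }
      Live[Pid] = Next++;
    }
    // Block until *any* child exits; no spinning, no per-job thread.
    int Status = 0;
    pid_t Pid = wait(&Status);
    if (Pid > 0) {
      std::printf("job %u exited with %d\n", Live[Pid],
                  WIFEXITED(Status) ? WEXITSTATUS(Status) : -1);
      Live.erase(Pid);
    }
  }
}
```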

+@aganea @amccarth
Users have been asking for /MP support in clang-cl for a while, and this is basically that.

Is there anything in JobScheduler that could reasonably be moved down to llvm/lib/Support? I would also like to be able to use it to implement multi-process ThinLTO instead of multi-threaded ThinLTO.

This is somewhat similar to what I was proposing in D52193.

Would you possibly provide tests and/or an example of your usage please?

clang/lib/Driver/Compilation.cpp
303

@thakis How would this new Subprocess interface work with the existing llvm/include/llvm/Support/Program.h APIs? Wouldn't it be better to simply extend what is already there with a WaitMany() and a Terminate() API, as I was suggesting in D52193? That would cover everything that's needed (a possible shape is sketched below). Or are you suggesting to further stub out ExecuteAndWait() with this new Subprocess API?
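
For concreteness, the kind of shape such an extension might take (this is a hypothetical declaration in the spirit of D52193, not an existing LLVM API):

```cpp
// Hypothetical addition next to ExecuteNoWait()/Wait() in
// llvm/Support/Program.h; not an existing API.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/Program.h"
#include <string>

namespace llvm {
namespace sys {
/// Block until at least one of the given processes exits, and return its
/// ProcessInfo; report failures through ErrMsg, as Wait() does.
ProcessInfo WaitMany(ArrayRef<ProcessInfo> PIs, std::string *ErrMsg);
} // namespace sys
} // namespace llvm
```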

332

In addition to what @thakis said above, yielding here is maybe not a good idea. It causes the process to spin and remain in the OS's active process list, which uselessly eats CPU cycles. This can become significant over the course of several minutes of compilation.

Here's a tiny example of what happens when threads are waiting for something to happen:
[profiler capture: the top part yields frequently; the bottom part does not yield - see D68820]

You would need to go through an OS primitive here that suspends the process until at least one job in the pool completes. On Windows this can be achieved through WaitForMultipleObjects() or I/O completion ports like those in the ninja code @thakis linked (a minimal sketch follows). You can take a look at Compilation::executeJobs() in D52193 and, further down the line, WaitMany(), which waits for at least one job/process to complete.
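
A minimal sketch of that Windows-side idea, assuming each running job exposes its process HANDLE (the surrounding bookkeeping is omitted):

```cpp
// Block the driver until at least one sub-process exits, instead of
// spinning with yield(); returns the index of the finished job.
// Note: WaitForMultipleObjects() is limited to MAXIMUM_WAIT_OBJECTS
// (64) handles per call, so a real pool needs to batch or use I/O
// completion ports instead.
#include <windows.h>
#include <vector>

DWORD WaitForAnyJob(const std::vector<HANDLE> &Handles) {
  DWORD R = WaitForMultipleObjects(static_cast<DWORD>(Handles.size()),
                                   Handles.data(), /*bWaitAll=*/FALSE,
                                   INFINITE);
  return R - WAIT_OBJECT_0;
}
```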

354

It's wasteful to start a new thread here just because ExecuteAndWait() is used inside Command::Execute(). An async mechanism, as described above, would be a lot better.

tycho added a subscriber: tycho.Nov 20 2019, 10:21 AM
yaxunl updated this revision to Diff 232405.Dec 5 2019, 11:07 AM

split the patch

Trass3r added a subscriber: Trass3r.Jan 8 2020, 5:52 AM
yaxunl marked an inline comment as done.Feb 4 2020, 1:23 PM
yaxunl added inline comments.
clang/lib/Driver/Compilation.cpp
332

Sorry for the delay.

If D52193 is committed, I will probably need only some minor changes to support parallel compilation for HIP. Therefore I hope D52193 can be committed soon.

I am wondering what the current status of D52193 is and what is blocking it. Is there any chance it could be committed soon?

Thanks.

aganea added inline comments.Feb 5 2020, 6:14 AM
clang/lib/Driver/Compilation.cpp
332

Hi @yaxunl! Nothing prevents us from finishing D52193 :-) It was meant as a prototype, but I could transform it into a more desirable state.
I left it aside because we made another (unpublished) prototype where the process invocations were collapsed into the calling process, i.e. run in a thread pool in the manner of the recent -fintegrated-cc1 flag (sketched below). But that requires cl::opt to support different contexts, as opposed to just one global state (an RFC was discussed about a year ago, but there was no consensus).
Having a thread pool instead of a process pool is faster when compiling .C/.CPP files with clang-cl /MP, but perhaps in your case that won't work: you need to call external binaries, don't you? Binaries that are not part of LLVM? If so, then landing D52193 first would make sense.
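
A rough sketch of that in-process variant, using llvm::ThreadPool as it existed around the time of this review; RunCC1OnArgs is a hypothetical stand-in for an integrated-cc1 entry point:

```cpp
// Sketch only: run each cc1-like invocation on a thread pool inside the
// driver process instead of spawning child processes. The blocker noted
// above is real: all of these invocations would share one global cl::opt
// state.
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"
#include <vector>

// Hypothetical integrated-cc1 entry point.
int RunCC1OnArgs(const std::vector<const char *> &Args);

void CompileAllInProcess(
    const std::vector<std::vector<const char *>> &Invocations,
    unsigned Parallelism) {
  llvm::ThreadPool Pool(llvm::hardware_concurrency(Parallelism));
  for (const auto &Args : Invocations)
    Pool.async([&Args] { RunCC1OnArgs(Args); });
  Pool.wait(); // join all sub-compilations
}
```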

yaxunl marked an inline comment as done.Feb 5 2020, 7:22 AM
yaxunl added inline comments.
clang/lib/Driver/Compilation.cpp
332

The HIP toolchain needs to launch executables other than clang during a compilation, so D52193 is more relevant to us. I believe this is also the case for CUDA, OpenMP, and probably for more general situations involving a linker. I think both thread-level and process-level parallelism are useful, but process-level parallelism is probably more generic. Therefore, landing D52193 first would help a lot.

ashi1 added a subscriber: ashi1.Aug 24 2020, 12:32 PM