This is an archive of the discontinued LLVM Phabricator instance.

Differential D99683

[HIP] Support ThinLTO
ClosedPublic

Authored by yaxunl on Mar 31 2021, 1:48 PM.

Details

Reviewers

tra
ashi1
scchan
tejohnson

Commits

rGbf6124580dfb: [HIP] support ThinLTO

Summary

Add options -f[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for the AMDGPU target.

The AMDGPU target does not support codegen of object files containing
calls to external functions, therefore the LLVM module passed to the
AMDGPU backend needs to contain definitions of all the callees.
An LLVM option is added to allow the function importer to import
functions with the noinline attribute.

HIP toolchain passes proper LLVM options to lld to make sure
function importer imports definitions of all the callees.
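
As a rough illustration of the option handling described above, here is a minimal, self-contained sketch. It is a simplified model and not the Clang driver code: `DriverModel`, the plain-string argument list, and the standalone `parseLTOMode` signature are assumptions made for the example. It follows the behavior visible in the Driver.cpp changes in this revision: the last of the positive, `=<mode>`, and negative forms decides whether LTO is on, and the last `=<mode>` value (default `full`) selects the kind.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for clang::driver::LTOKind.
enum class LTOKind { None, Full, Thin };

// Simplified model of parsing one option family. Prefix is "lto" or
// "offload-lto". Returns LTOKind::None when LTO is disabled or the
// requested mode is not recognized (the real driver also emits a diagnostic).
LTOKind parseLTOMode(const std::vector<std::string> &Args,
                     const std::string &Prefix) {
  const std::string Pos = "-f" + Prefix;
  const std::string Neg = "-fno-" + Prefix;
  const std::string Eq = "-f" + Prefix + "=";

  bool Enabled = false;      // state after the last matching flag
  std::string Name = "full"; // mode used when no "=<mode>" form is given
  for (const std::string &Arg : Args) {
    if (Arg == Pos) {
      Enabled = true;
    } else if (Arg == Neg) {
      Enabled = false;
    } else if (Arg.compare(0, Eq.size(), Eq) == 0) {
      Enabled = true;
      Name = Arg.substr(Eq.size()); // the last "=<mode>" value wins
    }
  }
  if (!Enabled)
    return LTOKind::None;
  if (Name == "full")
    return LTOKind::Full;
  if (Name == "thin")
    return LTOKind::Thin;
  return LTOKind::None;
}

// Hypothetical model of the two independent modes kept by the driver.
struct DriverModel {
  LTOKind LTOMode = LTOKind::None;
  LTOKind OffloadLTOMode = LTOKind::None;

  void setLTOMode(const std::vector<std::string> &Args) {
    LTOMode = parseLTOMode(Args, "lto");
    OffloadLTOMode = parseLTOMode(Args, "offload-lto");
  }
  LTOKind getLTOMode(bool IsOffload = false) const {
    return IsOffload ? OffloadLTOMode : LTOMode;
  }
  bool isUsingLTO(bool IsOffload = false) const {
    return getLTOMode(IsOffload) != LTOKind::None;
  }
};
```

In this model the host and offload settings do not affect each other, so `-foffload-lto=thin` alone leaves host LTO disabled.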

Event Timeline

yaxunl created this revision. Mar 31 2021, 1:48 PM

Herald added subscribers: jansvoboda11, dang, hiraditya, inglorion. Mar 31 2021, 1:48 PM

yaxunl requested review of this revision. Mar 31 2021, 1:48 PM

Herald added a project: Restricted Project. Mar 31 2021, 1:48 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B96585: Diff 334525. Mar 31 2021, 1:51 PM

LGTM in general. Please give LTO folks some time to chime in, in case they have any feedback.

@tejohnson: Just a FYI that we're tinkering with LTO on GPUs here.

clang/include/clang/Driver/Options.td
1948–1951	Should it be `BoolFOption` ?
clang/lib/Driver/Driver.cpp
612	Leftover debug printout?
clang/lib/Driver/ToolChains/Clang.cpp
4433–4434	Nit: rephrase it as `Only AMDGPU supports device-side LTO` ?
llvm/test/Transforms/FunctionImport/noinline.ll
5	I'd add a meaningful suffix to the binaries we'll use to run the checks on. E.g `%t3` -> `%t.lto.bc`, `%t2` -> `%t.inputs.noinline.bc`, `%t` -> `%t.main.bc`.

This revision is now accepted and ready to land. Apr 1 2021, 10:29 AM

I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.

In D99683#2664674, @tejohnson wrote:

I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.

AMDGPU backend does not support linking of object files containing external symbols, i.e. one object file calling a function defined in another object file. Therefore the LLVM module passed to AMDGPU backend needs to contain definitions of all callees, even if a callee has noinline attribute. To support backends like this, the function importer needs to be able to import functions with noinline attribute. Therefore we add an LLVM option for allowing that, which is off by default. We have comments at line 70 of HIP.cpp about this.

Will add a patch description.

clang/include/clang/Driver/Options.td
1948–1951	Yes. will fix
clang/lib/Driver/Driver.cpp
612	will remove
clang/lib/Driver/ToolChains/Clang.cpp
4433–4434	will do
llvm/test/Transforms/FunctionImport/noinline.ll
5	will rename %t and %t2 as suggested. However, llvm-lto will postfix the output file name with .thinlto.bc, therefore I would rename %t3 -> %t.summary

In D99683#2669047, @yaxunl wrote:

AMDGPU backend does not support linking of object files containing external symbols, i.e. one object file calling a function defined in another object file. Therefore the LLVM module passed to AMDGPU backend needs to contain definitions of all callees, even if a callee has noinline attribute. To support backends like this, the function importer needs to be able to import functions with noinline attribute. Therefore we add an LLVM option for allowing that, which is off by default. We have comments at line 70 of HIP.cpp about this.

How does a non-LTO build work, or is (full) LTO currently required? Because with ThinLTO we only import functions that are externally defined but referenced in the current module. Also, when ThinLTO imports functions it makes them available_externally, which means they are dropped and made external symbols again after inlining. So anything imported but not inlined will go back to being an external symbol.

Revise by Artem's comments. Add patch description.

Herald added a subscriber: tpr. Apr 5 2021, 9:01 AM

In D99683#2669080, @tejohnson wrote:

How does a non-LTO build work, or is (full) LTO currently required? Because with ThinLTO we only import functions that are externally defined but referenced in the current module. Also, when ThinLTO imports functions it makes them available_externally, which means they are dropped and made external symbols again after inlining. So anything imported but not inlined will go back to being an external symbol.

AMDGPU backend by default uses full LTO for linking. It does not support non-LTO linking. Currently, it inlines all functions except kernels. However, we want to be able not to inline all functions. Is it OK to add an LLVM option to mark imported functions as linkonce_odr so that AMDGPU backend can keep the definitions of the imported functions?

Harbormaster completed remote builds in B97136: Diff 335268. Apr 5 2021, 10:06 AM

jansvoboda11 added inline comments. Apr 6 2021, 1:31 AM

clang/include/clang/Driver/Options.td
1948–1951	The `BoolFOption` marshalling multiclass should be only used for flags where either the positive or the negative (or both) are -cc1 options and map to a field in `CompilerInvocation`. Since this patch only deals with the driver (not the cc1 frontend) using `BoolFOption` is not correct. Please, revert this change to the previous state. I might need to explicitly call this out in the documentation https://clang.llvm.org/docs/InternalsManual.html#adding-new-command-line-option.

In D99683#2669136, @yaxunl wrote:

AMDGPU backend by default uses full LTO for linking. It does not support non-LTO linking. Currently, it inlines all functions except kernels. However, we want to be able not to inline all functions. Is it OK to add an LLVM option to mark imported functions as linkonce_odr so that AMDGPU backend can keep the definitions of the imported functions?

Actually AMDGPU backend will internalize all non-kernel functions before codegen. Those functions with available_externally linkage will have internal linkage before codegen, therefore they will not be dropped.
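
The linkage argument in this exchange can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Function` and `Module` types and all helper names are hypothetical and do not use LLVM's API, and the pass ordering is the one described in the comment above (internalization happens before available_externally definitions would be dropped).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy linkage model; not LLVM's GlobalValue::LinkageTypes.
enum class Linkage { External, AvailableExternally, Internal };

struct Function {
  std::string Name;
  Linkage Link;
  bool HasBody;
  bool IsKernel;
};

using Module = std::vector<Function>;

// ThinLTO-style import: the body is copied in, but the definition is only
// kept around as an inlining candidate (available_externally).
void importDefinition(Function &F) {
  F.HasBody = true;
  F.Link = Linkage::AvailableExternally;
}

// Model of internalizing every non-kernel definition before codegen.
void internalizeNonKernels(Module &M) {
  for (Function &F : M)
    if (!F.IsKernel && F.HasBody)
      F.Link = Linkage::Internal;
}

// Model of dropping available_externally bodies: the function becomes an
// external declaration again.
void dropAvailableExternally(Module &M) {
  for (Function &F : M)
    if (F.Link == Linkage::AvailableExternally) {
      F.HasBody = false;
      F.Link = Linkage::External;
    }
}

// True if some function has no definition left in the module.
bool hasUnresolvedCallee(const Module &M) {
  for (const Function &F : M)
    if (!F.HasBody)
      return true;
  return false;
}
```

With internalization first, the imported helper ends up with internal linkage and keeps its body; without it, the helper reverts to an external declaration, which is the case the AMDGPU backend cannot handle.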

clang/include/clang/Driver/Options.td
1948–1951	will do

revert the change about option

rebase

tra added inline comments. Apr 6 2021, 10:23 AM

clang/include/clang/Driver/Options.td
1948–1951	<off-topic for the patch> @jansvoboda11 Thank you for the explanation. Updating the docs would indeed be useful. I would also suggest describing the restrictions next to the `BoolFOption` definition. Developers tend to read the sources way more often than the docs, and the comments in the source code make it look like a general-purpose multiclass for any boolean `-f` option. Would it also be possible to add some sort of compile-time safeguards to enforce intended constraints on the CLI tablegen?

Harbormaster completed remote builds in B97338: Diff 335571. Apr 6 2021, 11:20 AM

Harbormaster completed remote builds in B97340: Diff 335573. Apr 6 2021, 11:46 AM

In D99683#2671727, @yaxunl wrote:

Actually AMDGPU backend will internalize all non-kernel functions before codegen. Those functions with available_externally linkage will have internal linkage before codegen, therefore they will not be dropped.

This raises some higher level questions for me:

First, how will you deal with other corner cases that won't or cannot be imported right now? While enabling importing of noinline functions and cranking up the threshold will get the majority of functions imported, there are cases that we still won't import (functions/vars that are interposable, certain funcs/vars that cannot be renamed, most non-const variables with non-trivial initializers).

Second, force importing of everything transitively referenced defeats the purpose of ThinLTO and would probably make it worse than regular LTO. The main entry module will need to import everything transitively referenced from there, so everything not dead in the binary, which should make that module post importing equivalent to a regular LTO module. In addition, every other module needs to transitively import everything referenced from those modules, making them very large depending on how many leaf vs non-leaf functions and variables they contain. What is the goal of doing ThinLTO in this case?

In D99683#2672554, @tejohnson wrote:

This raises some higher level questions for me:

First, how will you deal with other corner cases that won't or cannot be imported right now? While enabling importing of noinline functions and cranking up the threshold will get the majority of functions imported, there are cases that we still won't import (functions/vars that are interposable, certain funcs/vars that cannot be renamed, most non-const variables with non-trivial initializers).

We will document the limitation of thinLTO support of HIP toolchain and recommend users not to use thinLTO in those corner cases.

Second, force importing of everything transitively referenced defeats the purpose of ThinLTO and would probably make it worse than regular LTO. The main entry module will need to import everything transitively referenced from there, so everything not dead in the binary, which should make that module post importing equivalent to a regular LTO module. In addition, every other module needs to transitively import everything referenced from those modules, making them very large depending on how many leaf vs non-leaf functions and variables they contain. What is the goal of doing ThinLTO in this case?

The objective is to improve optimization/codegen time by using multi-threads of thinLTO. For example, I have 10 modules each containing a kernel. In full LTO linking, I get one big module containing 10 kernels with all functions inlined, and I have one thread for optimization/codegen. With thinLTO, I get one kernel in each module, with all functions inlined. AMDGPU internalization and global DCE will remove functions not used by that kernel in each module. I will get 10 threads, each doing optimization/codegen for one kernel. Theoretically, there could be 10 times speed up.
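
To make the per-kernel picture concrete, the following sketch computes what each ThinLTO backend module would have to contain if every transitively referenced function is imported and everything unreachable from the kernel is removed. The call graph representation and the function name `reachableFrom` are assumptions made for this example; this is not the function importer's actual algorithm.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller name -> names of direct callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns the kernel plus every function transitively called from it,
// i.e. the definitions that must be present in that kernel's module.
std::set<std::string> reachableFrom(const CallGraph &G,
                                    const std::string &Kernel) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{Kernel};
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(F).second)
      continue; // already visited
    auto It = G.find(F);
    if (It == G.end())
      continue; // leaf function: no callees recorded
    for (const std::string &Callee : It->second)
      Worklist.push_back(Callee);
  }
  return Seen;
}
```

Functions shared by several kernels appear in every module that needs them, which is the duplication cost raised in the question above; the modules can then be optimized and compiled independently of each other.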

In D99683#2672578, @yaxunl wrote:

The objective is to improve optimization/codegen time by using multi-threads of thinLTO. For example, I have 10 modules each containing a kernel. In full LTO linking, I get one big module containing 10 kernels with all functions inlined, and I have one thread for optimization/codegen. With thinLTO, I get one kernel in each module, with all functions inlined. AMDGPU internalization and global DCE will remove functions not used by that kernel in each module. I will get 10 threads, each doing optimization/codegen for one kernel. Theoretically, there could be 10 times speed up.

That will work as long as there are no dependence edges anywhere between the kernels. Is this a library that has a bunch of totally independent kernels only called externally?

In D99683#2672668, @tejohnson wrote:

That will work as long as there are no dependence edges anywhere between the kernels. Is this a library that has a bunch of totally independent kernels only called externally?

There are no dependence edges between the kernels since they cannot call each other. The HIP device compilation output is always a shared library which contains multiple independent kernels which can be launched by a HIP program.

Any other concerns? Thanks.

In D99683#2677357, @yaxunl wrote:

Any other concerns? Thanks.

I have some concerns around the fragility of this, for the reasons I mentioned earlier where it may not always be able to import everything needed.

We will document the limitation of thinLTO support of HIP toolchain and recommend users not to use thinLTO in those corner cases.

Where will this be documented?

My concern is that we start getting bugs filed for these corner cases, and it burns a bunch of someone's time to dig in only to discover that it is an unsupported corner case. Since we can detect in the function importer when we cannot import something successfully, I think it would therefore be worthwhile to issue a hard error with a meaningful error message in the AMDGPU case.

clang/lib/Driver/ToolChains/Clang.cpp
4434	Should there be an error (or is there one already) emitted somewhere if LTO is requested along with device offloading and this isn't AMDGPU?

tejohnson added a reviewer: tejohnson. Apr 12 2021, 8:52 AM

To do what I suggested in the prior comment, you'd probably want to add a new index-wide flag (since we don't read IR in the thin link). See for example how EnableSplitLTOUnit is set and used. You could add a flag like ForceImportAll or something like that. Then you don't necessarily even need to bump up the importing threshold or add the new import-noinline flag. Just key off of that in the importer to try to force import everything. If something cannot be imported, fail with a clear error.
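
The suggestion above can be sketched as follows. This is a toy illustration of the idea, not the code that eventually landed: `ForceImportAll` is the name floated in the comment, while `Callee`, `ImportFailure`, `describe`, and `selectImports` are hypothetical, and a C++ exception stands in for whatever error reporting the importer would actually use.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reasons why a callee cannot be imported.
enum class ImportFailure { None, Interposable, NotEligible, NotLive };

struct Callee {
  std::string Name;
  ImportFailure Reason; // None means the callee can be imported
};

const char *describe(ImportFailure R) {
  switch (R) {
  case ImportFailure::None:
    return "no failure";
  case ImportFailure::Interposable:
    return "interposable linkage";
  case ImportFailure::NotEligible:
    return "not eligible for import";
  case ImportFailure::NotLive:
    return "not live";
  }
  return "unknown";
}

// Without ForceImportAll, callees that cannot be imported are skipped.
// With ForceImportAll, the first such callee is reported as a hard error
// that names the callee and the reason.
std::vector<std::string> selectImports(const std::vector<Callee> &Callees,
                                       bool ForceImportAll) {
  std::vector<std::string> Imported;
  for (const Callee &C : Callees) {
    if (C.Reason == ImportFailure::None) {
      Imported.push_back(C.Name);
      continue;
    }
    if (ForceImportAll)
      throw std::runtime_error("cannot import '" + C.Name +
                               "': " + describe(C.Reason));
  }
  return Imported;
}
```

Failing at selection time means the user sees which callee blocked the import and why, instead of a later backend failure on an unresolved external call.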

This revision now requires changes to proceed. Apr 12 2021, 8:58 AM

In D99683#2683308, @tejohnson wrote:

To do what I suggested in the prior comment, you'd probably want to add a new index-wide flag (since we don't read IR in the thin link). See for example how EnableSplitLTOUnit is set and used. You could add a flag like ForceImportAll or something like that. Then you don't necessarily even need to bump up the importing threshold or add the new import-noinline flag. Just key off of that in the importer to try to force import everything. If something cannot be imported, fail with a clear error.

will do

clang/lib/Driver/ToolChains/Clang.cpp
4434	yes. will do

revised by Teresa's comments

Harbormaster completed remote builds in B103211: Diff 343696. May 7 2021, 9:42 AM

In D99683#2744764, @yaxunl wrote:

will do

I noticed you implemented with an internal error rather than a flag in the index. I think this is ok for now, especially if the support will eventually be removed because the linker will support external functions as noted in your TODO (note in order to do this in the index you would need to set up the flag during the compile step, not the linker invocation as you are doing here, which has some advantages if this will persist longer term). I have a suggestion about the error detection below, so that you can report errors earlier along with the failure reason.

llvm/lib/Transforms/IPO/FunctionImport.cpp
494–507	Probably better to issue an error here with the import failure reason?

yaxunl marked an inline comment as done. May 12 2021, 7:00 AM

yaxunl added inline comments.

llvm/lib/Transforms/IPO/FunctionImport.cpp
494–507	My understanding is that the import failure reason is only available if PrintImportFailures is enabled. Also it can only print the GUID and can not print the failed callee name since it is not available, therefore the information is cryptic. It seems to me the current error msg at line 1332 is more suitable for common users. For compiler developers, they can enable PrintImportFailures and see the reason of failed imports.

tejohnson added inline comments. May 12 2021, 7:26 AM

llvm/lib/Transforms/IPO/FunctionImport.cpp
494–507	selectCallee always sets the Reason. And we have the name in addition to the GUID in normal circumstances (linking from modules). It would only not be available in certain debugging situations (e.g. linking from an existing combined module with llvm-lto). Also, by failing here, you don't need to wait until the LTO backends to issue the error, so it fails a little earlier.

yaxunl marked 2 inline comments as done. May 12 2021, 1:43 PM

yaxunl added inline comments.

llvm/lib/Transforms/IPO/FunctionImport.cpp
494–507	I see. Will do.

Revised by Teresa's comments

Harbormaster completed remote builds in B104134: Diff 344949. May 12 2021, 3:32 PM

lgtm

This revision is now accepted and ready to land. May 21 2021, 8:29 AM

Closed by commit rGbf6124580dfb: [HIP] support ThinLTO (authored by yaxunl). May 22 2021, 7:49 AM

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rGbf6124580dfb: [HIP] support ThinLTO.

Herald added a project: Restricted Project. May 22 2021, 7:49 AM

tejohnson mentioned this in D103579: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch. Jun 2 2021, 5:33 PM

tejohnson mentioned this in rGd0ee8b64ecf3: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch. Jun 3 2021, 2:25 PM

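Revision Contents

Path                                                          Size
clang/include/clang/Driver/Driver.h                           11 lines
clang/include/clang/Driver/Options.td                         6 lines
clang/lib/Driver/Driver.cpp                                   41 lines
clang/lib/Driver/ToolChains/Clang.cpp                         42 lines
clang/lib/Driver/ToolChains/HIP.cpp                           10 lines
clang/test/Driver/hip-options.hip                             14 lines
llvm/lib/Transforms/IPO/FunctionImport.cpp                    29 lines
llvm/test/Transforms/FunctionImport/Inputs/funcimport.ll      3 lines
llvm/test/Transforms/FunctionImport/Inputs/noinline.ll        8 lines
llvm/test/Transforms/FunctionImport/adjustable_threshold.ll   10 lines
llvm/test/Transforms/FunctionImport/funcimport.ll             9 lines
llvm/test/Transforms/FunctionImport/noinline.ll               23 lines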

Diff 344949

clang/include/clang/Driver/Driver.h

@@ 78 lines not shown @@ enum BitcodeEmbedMode {
     EmbedNone,
     EmbedMarker,
     EmbedBitcode
   } BitcodeEmbed;

   /// LTO mode selected via -f(no-)?lto(=.*)? options.
   LTOKind LTOMode;

+  /// LTO mode selected via -f(no-offload-)?lto(=.*)? options.
+  LTOKind OffloadLTOMode;
+
 public:
   enum OpenMPRuntimeKind {
     /// An unknown OpenMP runtime. We can't generate effective OpenMP code
     /// without knowing what runtime to target.
     OMPRT_Unknown,

     /// The LLVM OpenMP runtime. When completed and integrated, this will become
     /// the default for Clang.
@@ 459 lines not shown @@ public:
   /// ShouldUseFlangCompiler - Should the flang compiler be used to
   /// handle this action.
   bool ShouldUseFlangCompiler(const JobAction &JA) const;

   /// ShouldEmitStaticLibrary - Should the linker emit a static library.
   bool ShouldEmitStaticLibrary(const llvm::opt::ArgList &Args) const;

   /// Returns true if we are performing any kind of LTO.
-  bool isUsingLTO() const { return LTOMode != LTOK_None; }
+  bool isUsingLTO(bool IsOffload = false) const {
+    return getLTOMode(IsOffload) != LTOK_None;
+  }

   /// Get the specific kind of LTO being performed.
-  LTOKind getLTOMode() const { return LTOMode; }
+  LTOKind getLTOMode(bool IsOffload = false) const {
+    return IsOffload ? OffloadLTOMode : LTOMode;
+  }

 private:

   /// Tries to load options from configuration file.
   ///
   /// \returns true if error occurred.
   bool loadConfigFile();

@@ 72 lines not shown @@

clang/include/clang/Driver/Options.td

@@ 1,937 lines not shown @@
 def fapple_link_rtlib : Flag<["-"], "fapple-link-rtlib">, Group<f_Group>,
   HelpText<"Force linking the clang builtins runtime library">;
 def flto_EQ : Joined<["-"], "flto=">, Flags<[CoreOption, CC1Option]>, Group<f_Group>,
   HelpText<"Set LTO mode to either 'full' or 'thin'">, Values<"thin,full">;
 def flto : Flag<["-"], "flto">, Flags<[CoreOption, CC1Option]>, Group<f_Group>,
   HelpText<"Enable LTO in 'full' mode">;
 def fno_lto : Flag<["-"], "fno-lto">, Flags<[CoreOption, CC1Option]>, Group<f_Group>,
   HelpText<"Disable LTO mode (default)">;
+def foffload_lto_EQ : Joined<["-"], "foffload-lto=">, Flags<[CoreOption]>, Group<f_Group>,
+  HelpText<"Set LTO mode to either 'full' or 'thin' for offload compilation">, Values<"thin,full">;
+def foffload_lto : Flag<["-"], "foffload-lto">, Flags<[CoreOption]>, Group<f_Group>,
+  HelpText<"Enable LTO in 'full' mode for offload compilation">;
+def fno_offload_lto : Flag<["-"], "fno-offload-lto">, Flags<[CoreOption]>, Group<f_Group>,
+  HelpText<"Disable LTO mode (default) for offload compilation">;
 def flto_jobs_EQ : Joined<["-"], "flto-jobs=">,
   Flags<[CC1Option]>, Group<f_Group>,
   HelpText<"Controls the backend parallelism of -flto=thin (default "
            "of 0 means the number of threads will be derived from "
            "the number of CPUs detected)">;
 def fthinlto_index_EQ : Joined<["-"], "fthinlto-index=">,
   Flags<[CoreOption, CC1Option]>, Group<f_Group>,
   HelpText<"Perform ThinLTO importing using provided function summary index">;
@@ 4,266 lines not shown @@

clang/lib/Driver/Driver.cpp

Show First 20 Lines • Show All 588 Lines • ▼ Show 20 Lines	if (A && Target.isRISCV()) {
else if (ArchName.startswith_lower("rv64"))		else if (ArchName.startswith_lower("rv64"))
Target.setArch(llvm::Triple::riscv64);		Target.setArch(llvm::Triple::riscv64);
}		}

return Target;		return Target;
}		}

// Parse the LTO options and record the type of LTO compilation		// Parse the LTO options and record the type of LTO compilation
// based on which -f(no-)?lto(=.*)? option occurs last.		// based on which -f(no-)?lto(=.)? or -f(no-)?offload-lto(=.)?
void Driver::setLTOMode(const llvm::opt::ArgList &Args) {		// option occurs last.
LTOMode = LTOK_None;		static llvm::Optional<driver::LTOKind>
if (!Args.hasFlag(options::OPT_flto, options::OPT_flto_EQ,		parseLTOMode(Driver &D, const llvm::opt::ArgList &Args, OptSpecifier OptPos,
options::OPT_fno_lto, false))		OptSpecifier OptNeg, OptSpecifier OptEq) {
return;		driver::LTOKind LTOMode = LTOK_None;
		if (!Args.hasFlag(OptPos, OptEq, OptNeg, false))
		return None;

StringRef LTOName("full");		StringRef LTOName("full");

const Arg *A = Args.getLastArg(options::OPT_flto_EQ);		const Arg *A = Args.getLastArg(OptEq);
if (A)		if (A)
LTOName = A->getValue();		LTOName = A->getValue();

LTOMode = llvm::StringSwitch<LTOKind>(LTOName)		LTOMode = llvm::StringSwitch<LTOKind>(LTOName)
		tra: Leftover debug printout?
		yaxunl (author): will remove
.Case("full", LTOK_Full)		.Case("full", LTOK_Full)
.Case("thin", LTOK_Thin)		.Case("thin", LTOK_Thin)
.Default(LTOK_Unknown);		.Default(LTOK_Unknown);

if (LTOMode == LTOK_Unknown) {		if (LTOMode == LTOK_Unknown) {
assert(A);		assert(A);
Diag(diag::err_drv_unsupported_option_argument) << A->getOption().getName()		D.Diag(diag::err_drv_unsupported_option_argument)
<< A->getValue();		<< A->getOption().getName() << A->getValue();
		return None;
		}
		return LTOMode;
}		}

		// Parse the LTO options.
		void Driver::setLTOMode(const llvm::opt::ArgList &Args) {
		LTOMode = LTOK_None;
		if (auto M = parseLTOMode(*this, Args, options::OPT_flto,
		options::OPT_fno_lto, options::OPT_flto_EQ))
		LTOMode = M.getValue();

		OffloadLTOMode = LTOK_None;
		if (auto M = parseLTOMode(*this, Args, options::OPT_foffload_lto,
		options::OPT_fno_offload_lto,
		options::OPT_foffload_lto_EQ))
		OffloadLTOMode = M.getValue();
}		}
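The option handling above can be sketched outside of LLVM as follows. This is a minimal standalone illustration, not the driver's code: `parseLTOModeSketch`, the two-value `LTOKind` enum, and the plain string matching are assumptions made for the example. It mirrors the behaviour of combining `hasFlag` with `getLastArg(OptEq)`: the last of `-f<stem>`, `-f<stem>=...`, `-fno-<stem>` decides whether LTO is on, and the value comes from the last `-f<stem>=...` seen, defaulting to `full`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for driver::LTOKind (only the two valid modes).
enum class LTOKind { Full, Thin };

// Stem is the option name without "-f", e.g. "lto" or "offload-lto".
// Returns std::nullopt when LTO is disabled or the value is not recognised
// (the real driver additionally emits a diagnostic in the latter case).
std::optional<LTOKind> parseLTOModeSketch(const std::vector<std::string> &Args,
                                          const std::string &Stem) {
  const std::string Pos = "-f" + Stem;      // e.g. -foffload-lto
  const std::string Neg = "-fno-" + Stem;   // e.g. -fno-offload-lto
  const std::string Eq = "-f" + Stem + "="; // e.g. -foffload-lto=thin

  bool Enabled = false;
  std::optional<std::string> LastValue; // value of the last -f<stem>=...
  for (const std::string &A : Args) {
    if (A == Pos) {
      Enabled = true;
    } else if (A == Neg) {
      Enabled = false;
    } else if (A.compare(0, Eq.size(), Eq) == 0) {
      Enabled = true;
      LastValue = A.substr(Eq.size());
    }
  }
  if (!Enabled)
    return std::nullopt;

  const std::string Name = LastValue.value_or("full");
  if (Name == "full")
    return LTOKind::Full;
  if (Name == "thin")
    return LTOKind::Thin;
  return std::nullopt;
}
```

Calling the helper once with `"lto"` and once with `"offload-lto"` corresponds to how `setLTOMode` fills `LTOMode` and `OffloadLTOMode` independently, so host and device LTO can be chosen separately.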

/// Compute the desired OpenMP runtime from the flags provided.		/// Compute the desired OpenMP runtime from the flags provided.
Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const {		Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const {
StringRef RuntimeName(CLANG_DEFAULT_OPENMP_RUNTIME);		StringRef RuntimeName(CLANG_DEFAULT_OPENMP_RUNTIME);

const Arg *A = Args.getLastArg(options::OPT_fopenmp_EQ);		const Arg *A = Args.getLastArg(options::OPT_fopenmp_EQ);
if (A)		if (A)
▲ Show 20 Lines • Show All 2,297 Lines • ▼ Show 20 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
if (!Relocatable && CurPhase == phases::Backend && !EmitLLVM &&		if (!Relocatable && CurPhase == phases::Backend && !EmitLLVM &&
!EmitAsm) {		!EmitAsm) {
// If we are in backend phase, we attempt to generate the fat binary.		// If we are in backend phase, we attempt to generate the fat binary.
// We compile each arch to IR and use a link action to generate code		// We compile each arch to IR and use a link action to generate code
// object containing ISA. Then we use a special "link" action to create		// object containing ISA. Then we use a special "link" action to create
// a fat binary containing all the code objects for different GPU's.		// a fat binary containing all the code objects for different GPU's.
// The fat binary is then an input to the host action.		// The fat binary is then an input to the host action.
for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {		for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
if (GPUSanitize) {		if (GPUSanitize || C.getDriver().isUsingLTO(/*IsOffload=*/true)) {
// When GPU sanitizer is enabled, since we need to link in the		// When GPU sanitizer is enabled, since we need to link in the
// the sanitizer runtime library after the sanitize pass, we have		// the sanitizer runtime library after the sanitize pass, we have
// to skip the backend and assemble phases and use lld to link		// to skip the backend and assemble phases and use lld to link
// the bitcode.		// the bitcode. The same happens if users request to use LTO
		// explicitly.
ActionList AL;		ActionList AL;
AL.push_back(CudaDeviceActions[I]);		AL.push_back(CudaDeviceActions[I]);
// Create a link action to link device IR with device library		// Create a link action to link device IR with device library
// and generate ISA.		// and generate ISA.
CudaDeviceActions[I] =		CudaDeviceActions[I] =
C.MakeAction<LinkJobAction>(AL, types::TY_Image);		C.MakeAction<LinkJobAction>(AL, types::TY_Image);
} else {		} else {
// When GPU sanitizer is not enabled, we follow the conventional		// When GPU sanitizer is not enabled, we follow the conventional
▲ Show 20 Lines • Show All 2,566 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/Clang.cpp


Show First 20 Lines • Show All 4,170 Lines • ▼ Show 20 Lines	void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// include as part of the module. All other jobs are expected to have exactly		// include as part of the module. All other jobs are expected to have exactly
// one input.		// one input.
bool IsCuda = JA.isOffloading(Action::OFK_Cuda);		bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
bool IsCudaDevice = JA.isDeviceOffloading(Action::OFK_Cuda);		bool IsCudaDevice = JA.isDeviceOffloading(Action::OFK_Cuda);
bool IsHIP = JA.isOffloading(Action::OFK_HIP);		bool IsHIP = JA.isOffloading(Action::OFK_HIP);
bool IsHIPDevice = JA.isDeviceOffloading(Action::OFK_HIP);		bool IsHIPDevice = JA.isDeviceOffloading(Action::OFK_HIP);
bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP);		bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP);
bool IsHeaderModulePrecompile = isa<HeaderModulePrecompileJobAction>(JA);		bool IsHeaderModulePrecompile = isa<HeaderModulePrecompileJobAction>(JA);
		bool IsDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
		JA.isDeviceOffloading(Action::OFK_Host));
		bool IsUsingLTO = D.isUsingLTO(IsDeviceOffloadAction);
		auto LTOMode = D.getLTOMode(IsDeviceOffloadAction);

// A header module compilation doesn't have a main input file, so invent a		// A header module compilation doesn't have a main input file, so invent a
// fake one as a placeholder.		// fake one as a placeholder.
const char *ModuleName = [&]{		const char *ModuleName = [&]{
auto *ModuleNameArg = Args.getLastArg(options::OPT_fmodule_name_EQ);		auto *ModuleNameArg = Args.getLastArg(options::OPT_fmodule_name_EQ);
return ModuleNameArg ? ModuleNameArg->getValue() : "";		return ModuleNameArg ? ModuleNameArg->getValue() : "";
}();		}();
InputInfo HeaderModuleInput(Inputs[0].getType(), ModuleName, ModuleName);		InputInfo HeaderModuleInput(Inputs[0].getType(), ModuleName, ModuleName);
▲ Show 20 Lines • Show All 234 Lines • ▼ Show 20 Lines	if (isa<AnalyzeJobAction>(JA)) {
}		}

// Preserve use-list order by default when emitting bitcode, so that		// Preserve use-list order by default when emitting bitcode, so that
// loading the bitcode up in 'opt' or 'llc' and running passes gives the		// loading the bitcode up in 'opt' or 'llc' and running passes gives the
// same result as running passes here. For LTO, we don't need to preserve		// same result as running passes here. For LTO, we don't need to preserve
// the use-list order, since serialization to bitcode is part of the flow.		// the use-list order, since serialization to bitcode is part of the flow.
if (JA.getType() == types::TY_LLVM_BC)		if (JA.getType() == types::TY_LLVM_BC)
CmdArgs.push_back("-emit-llvm-uselists");		CmdArgs.push_back("-emit-llvm-uselists");

// Device-side jobs do not support LTO.		if (IsUsingLTO) {
		tra: Nit: rephrase it as `Only AMDGPU supports device-side LTO` ?
		yaxunl (author): will do
		tejohnson: Should there be an error (or is there one already) emitted somewhere if LTO is requested along with device offloading and this isn't AMDGPU?
		yaxunl (author): yes. will do
bool isDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||		if (!IsDeviceOffloadAction) {
JA.isDeviceOffloading(Action::OFK_Host));

if (D.isUsingLTO() && !isDeviceOffloadAction) {
Args.AddLastArg(CmdArgs, options::OPT_flto, options::OPT_flto_EQ);		Args.AddLastArg(CmdArgs, options::OPT_flto, options::OPT_flto_EQ);
CmdArgs.push_back("-flto-unit");		CmdArgs.push_back("-flto-unit");
		} else if (Triple.isAMDGPU()) {
		// Only AMDGPU supports device-side LTO
		assert(LTOMode == LTOK_Full || LTOMode == LTOK_Thin);
		CmdArgs.push_back(Args.MakeArgString(
		Twine("-flto=") + (LTOMode == LTOK_Thin ? "thin" : "full")));
		CmdArgs.push_back("-flto-unit");
		} else {
		D.Diag(diag::err_drv_unsupported_opt_for_target)
		<< Args.getLastArg(options::OPT_foffload_lto,
		options::OPT_foffload_lto_EQ)
		->getAsString(Args)
		<< Triple.getTriple();
		}
}		}
}		}
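The branch structure above (host job, AMDGPU device job, any other device job) can be summarised in a small standalone sketch. The names `JobSketch` and `ltoFlagsFor` are hypothetical, and the host branch is simplified to forwarding a single already-selected argument instead of calling `Args.AddLastArg`; a `std::nullopt` result stands for the place where the patch emits `err_drv_unsupported_opt_for_target`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for driver::LTOKind.
enum class LTOKind { None, Full, Thin };

// Hypothetical summary of the facts the decision depends on.
struct JobSketch {
  bool IsDeviceOffload; // true for a device-side offloading job
  bool IsAMDGPU;        // true when the job targets an AMDGPU triple
  LTOKind Mode;         // LTO mode chosen for this side (host or device)
};

// Returns the LTO-related cc1 flags, or std::nullopt where the driver would
// report an unsupported option for the target.
std::optional<std::vector<std::string>>
ltoFlagsFor(const JobSketch &J, const std::string &HostLTOArg) {
  std::vector<std::string> Flags;
  if (J.Mode == LTOKind::None)
    return Flags; // LTO not requested for this side: nothing to add

  if (!J.IsDeviceOffload) {
    // Host job: forward the user's -flto / -flto=... argument unchanged.
    Flags.push_back(HostLTOArg);
    Flags.push_back("-flto-unit");
    return Flags;
  }
  if (J.IsAMDGPU) {
    // Device job: translate the offload LTO mode into a plain -flto=<mode>.
    assert(J.Mode == LTOKind::Full || J.Mode == LTOKind::Thin);
    Flags.push_back(J.Mode == LTOKind::Thin ? "-flto=thin" : "-flto=full");
    Flags.push_back("-flto-unit");
    return Flags;
  }
  return std::nullopt; // device-side LTO on a non-AMDGPU target
}
```

The point of the translation is that the device cc1 invocation never sees `-foffload-lto=...` itself; it receives an ordinary `-flto=<mode>` derived from the offload-specific mode.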

if (const Arg *A = Args.getLastArg(options::OPT_fthinlto_index_EQ)) {		if (const Arg *A = Args.getLastArg(options::OPT_fthinlto_index_EQ)) {
if (!types::isLLVMIR(Input.getType()))		if (!types::isLLVMIR(Input.getType()))
D.Diag(diag::err_drv_arg_requires_bitcode_input) << A->getAsString(Args);		D.Diag(diag::err_drv_arg_requires_bitcode_input) << A->getAsString(Args);
Args.AddLastArg(CmdArgs, options::OPT_fthinlto_index_EQ);		Args.AddLastArg(CmdArgs, options::OPT_fthinlto_index_EQ);
}		}

if (Args.getLastArg(options::OPT_fthin_link_bitcode_EQ))		if (Args.getLastArg(options::OPT_fthin_link_bitcode_EQ))
Args.AddLastArg(CmdArgs, options::OPT_fthin_link_bitcode_EQ);		Args.AddLastArg(CmdArgs, options::OPT_fthin_link_bitcode_EQ);

if (Args.getLastArg(options::OPT_save_temps_EQ))		if (Args.getLastArg(options::OPT_save_temps_EQ))
Args.AddLastArg(CmdArgs, options::OPT_save_temps_EQ);		Args.AddLastArg(CmdArgs, options::OPT_save_temps_EQ);

auto *MemProfArg = Args.getLastArg(options::OPT_fmemory_profile,		auto *MemProfArg = Args.getLastArg(options::OPT_fmemory_profile,
options::OPT_fmemory_profile_EQ,		options::OPT_fmemory_profile_EQ,
options::OPT_fno_memory_profile);		options::OPT_fno_memory_profile);
if (MemProfArg &&		if (MemProfArg &&
!MemProfArg->getOption().matches(options::OPT_fno_memory_profile))		!MemProfArg->getOption().matches(options::OPT_fno_memory_profile))
MemProfArg->render(Args, CmdArgs);		MemProfArg->render(Args, CmdArgs);

// Embed-bitcode option.		// Embed-bitcode option.
// Only white-listed flags below are allowed to be embedded.		// Only white-listed flags below are allowed to be embedded.
if (C.getDriver().embedBitcodeInObject() && !C.getDriver().isUsingLTO() &&		if (C.getDriver().embedBitcodeInObject() && !IsUsingLTO &&
(isa<BackendJobAction>(JA) || isa<AssembleJobAction>(JA))) {		(isa<BackendJobAction>(JA) || isa<AssembleJobAction>(JA))) {
// Add flags implied by -fembed-bitcode.		// Add flags implied by -fembed-bitcode.
Args.AddLastArg(CmdArgs, options::OPT_fembed_bitcode_EQ);		Args.AddLastArg(CmdArgs, options::OPT_fembed_bitcode_EQ);
// Disable all llvm IR level optimizations.		// Disable all llvm IR level optimizations.
CmdArgs.push_back("-disable-llvm-passes");		CmdArgs.push_back("-disable-llvm-passes");

// Render target options.		// Render target options.
TC.addClangTargetOptions(Args, CmdArgs, JA.getOffloadingDeviceKind());		TC.addClangTargetOptions(Args, CmdArgs, JA.getOffloadingDeviceKind());
▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	if (C.getDriver().embedBitcodeInObject() && !IsUsingLTO &&
}		}

C.addCommand(std::make_unique<Command>(		C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::AtFileUTF8(), D.getClangProgramPath(),		JA, *this, ResponseFileSupport::AtFileUTF8(), D.getClangProgramPath(),
CmdArgs, Inputs, Output));		CmdArgs, Inputs, Output));
return;		return;
}		}

if (C.getDriver().embedBitcodeMarkerOnly() && !C.getDriver().isUsingLTO())		if (C.getDriver().embedBitcodeMarkerOnly() && !IsUsingLTO)
CmdArgs.push_back("-fembed-bitcode=marker");		CmdArgs.push_back("-fembed-bitcode=marker");

// We normally speed up the clang process a bit by skipping destructors at		// We normally speed up the clang process a bit by skipping destructors at
// exit, but when we're generating diagnostics we can rely on some of the		// exit, but when we're generating diagnostics we can rely on some of the
// cleanup.		// cleanup.
if (!C.isForDiagnostics())		if (!C.isForDiagnostics())
CmdArgs.push_back("-disable-free");		CmdArgs.push_back("-disable-free");

▲ Show 20 Lines • Show All 1,820 Lines • ▼ Show 20 Lines
// by the frontend.		// by the frontend.
// When -fembed-bitcode is enabled, optimized bitcode is emitted because it		// When -fembed-bitcode is enabled, optimized bitcode is emitted because it
// has slightly different breakdown between stages.		// has slightly different breakdown between stages.
// FIXME: -fembed-bitcode -save-temps will save optimized bitcode instead of		// FIXME: -fembed-bitcode -save-temps will save optimized bitcode instead of
// pristine IR generated by the frontend. Ideally, a new compile action should		// pristine IR generated by the frontend. Ideally, a new compile action should
// be added so both IR can be captured.		// be added so both IR can be captured.
if ((C.getDriver().isSaveTempsEnabled() ||		if ((C.getDriver().isSaveTempsEnabled() ||
JA.isHostOffloading(Action::OFK_OpenMP)) &&		JA.isHostOffloading(Action::OFK_OpenMP)) &&
!(C.getDriver().embedBitcodeInObject() && !C.getDriver().isUsingLTO()) &&		!(C.getDriver().embedBitcodeInObject() && !IsUsingLTO) &&
isa<CompileJobAction>(JA))		isa<CompileJobAction>(JA))
CmdArgs.push_back("-disable-llvm-passes");		CmdArgs.push_back("-disable-llvm-passes");

Args.AddAllArgs(CmdArgs, options::OPT_undef);		Args.AddAllArgs(CmdArgs, options::OPT_undef);

const char *Exec = D.getClangProgramPath();		const char *Exec = D.getClangProgramPath();

// Optionally embed the -cc1 level arguments into the debug info or a		// Optionally embed the -cc1 level arguments into the debug info or a
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines
}		}

bool VirtualFunctionElimination =		bool VirtualFunctionElimination =
Args.hasFlag(options::OPT_fvirtual_function_elimination,		Args.hasFlag(options::OPT_fvirtual_function_elimination,
options::OPT_fno_virtual_function_elimination, false);		options::OPT_fno_virtual_function_elimination, false);
if (VirtualFunctionElimination) {		if (VirtualFunctionElimination) {
// VFE requires full LTO (currently, this might be relaxed to allow ThinLTO		// VFE requires full LTO (currently, this might be relaxed to allow ThinLTO
// in the future).		// in the future).
if (D.getLTOMode() != LTOK_Full)		if (LTOMode != LTOK_Full)
D.Diag(diag::err_drv_argument_only_allowed_with)		D.Diag(diag::err_drv_argument_only_allowed_with)
<< "-fvirtual-function-elimination"		<< "-fvirtual-function-elimination"
<< "-flto=full";		<< "-flto=full";

CmdArgs.push_back("-fvirtual-function-elimination");		CmdArgs.push_back("-fvirtual-function-elimination");
}		}

// VFE requires whole-program-vtables, and enables it by default.		// VFE requires whole-program-vtables, and enables it by default.
bool WholeProgramVTables = Args.hasFlag(		bool WholeProgramVTables = Args.hasFlag(
options::OPT_fwhole_program_vtables,		options::OPT_fwhole_program_vtables,
options::OPT_fno_whole_program_vtables, VirtualFunctionElimination);		options::OPT_fno_whole_program_vtables, VirtualFunctionElimination);
if (VirtualFunctionElimination && !WholeProgramVTables) {		if (VirtualFunctionElimination && !WholeProgramVTables) {
D.Diag(diag::err_drv_argument_not_allowed_with)		D.Diag(diag::err_drv_argument_not_allowed_with)
<< "-fno-whole-program-vtables"		<< "-fno-whole-program-vtables"
<< "-fvirtual-function-elimination";		<< "-fvirtual-function-elimination";
}		}

if (WholeProgramVTables) {		if (WholeProgramVTables) {
if (!D.isUsingLTO())		if (!IsUsingLTO)
D.Diag(diag::err_drv_argument_only_allowed_with)		D.Diag(diag::err_drv_argument_only_allowed_with)
<< "-fwhole-program-vtables"		<< "-fwhole-program-vtables"
<< "-flto";		<< "-flto";
CmdArgs.push_back("-fwhole-program-vtables");		CmdArgs.push_back("-fwhole-program-vtables");
}		}

bool DefaultsSplitLTOUnit =		bool DefaultsSplitLTOUnit =
(WholeProgramVTables || Sanitize.needsLTO()) &&		(WholeProgramVTables || Sanitize.needsLTO()) &&
(D.getLTOMode() == LTOK_Full || TC.canSplitThinLTOUnit());		(LTOMode == LTOK_Full || TC.canSplitThinLTOUnit());
bool SplitLTOUnit =		bool SplitLTOUnit =
Args.hasFlag(options::OPT_fsplit_lto_unit,		Args.hasFlag(options::OPT_fsplit_lto_unit,
options::OPT_fno_split_lto_unit, DefaultsSplitLTOUnit);		options::OPT_fno_split_lto_unit, DefaultsSplitLTOUnit);
if (Sanitize.needsLTO() && !SplitLTOUnit)		if (Sanitize.needsLTO() && !SplitLTOUnit)
D.Diag(diag::err_drv_argument_not_allowed_with) << "-fno-split-lto-unit"		D.Diag(diag::err_drv_argument_not_allowed_with) << "-fno-split-lto-unit"
<< "-fsanitize=cfi";		<< "-fsanitize=cfi";
if (SplitLTOUnit)		if (SplitLTOUnit)
CmdArgs.push_back("-fsplit-lto-unit");		CmdArgs.push_back("-fsplit-lto-unit");
Show All 29 Lines	if (Arg *A = Args.getLastArg(options::OPT_fglobal_isel,
}		}
}		}

if (Args.hasArg(options::OPT_forder_file_instrumentation)) {		if (Args.hasArg(options::OPT_forder_file_instrumentation)) {
CmdArgs.push_back("-forder-file-instrumentation");		CmdArgs.push_back("-forder-file-instrumentation");
// Enable order file instrumentation when ThinLTO is not on. When ThinLTO is		// Enable order file instrumentation when ThinLTO is not on. When ThinLTO is
// on, we need to pass these flags as linker flags and that will be handled		// on, we need to pass these flags as linker flags and that will be handled
// outside of the compiler.		// outside of the compiler.
if (!D.isUsingLTO()) {		if (!IsUsingLTO) {
CmdArgs.push_back("-mllvm");		CmdArgs.push_back("-mllvm");
CmdArgs.push_back("-enable-order-file-instrumentation");		CmdArgs.push_back("-enable-order-file-instrumentation");
}		}
}		}

if (Arg *A = Args.getLastArg(options::OPT_fforce_enable_int128,		if (Arg *A = Args.getLastArg(options::OPT_fforce_enable_int128,
options::OPT_fno_force_enable_int128)) {		options::OPT_fno_force_enable_int128)) {
if (A->getOption().matches(options::OPT_fforce_enable_int128))		if (A->getOption().matches(options::OPT_fforce_enable_int128))
▲ Show 20 Lines • Show All 1,032 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/HIP.cpp

Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	void AMDGCN::Linker::constructLldCommand(Compilation &C, const JobAction &JA,
// Construct lld command.		// Construct lld command.
// The output from ld.lld is an HSA code object file.		// The output from ld.lld is an HSA code object file.
ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined", "-shared",		ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined", "-shared",
"-plugin-opt=-amdgpu-internalize-symbols"};		"-plugin-opt=-amdgpu-internalize-symbols"};

auto &TC = getToolChain();		auto &TC = getToolChain();
auto &D = TC.getDriver();		auto &D = TC.getDriver();
assert(!Inputs.empty() && "Must have at least one input.");		assert(!Inputs.empty() && "Must have at least one input.");
addLTOOptions(TC, Args, LldArgs, Output, Inputs[0],		bool IsThinLTO = D.getLTOMode(/*IsOffload=*/true) == LTOK_Thin;
D.getLTOMode() == LTOK_Thin);		addLTOOptions(TC, Args, LldArgs, Output, Inputs[0], IsThinLTO);

// Extract all the -m options		// Extract all the -m options
std::vector<llvm::StringRef> Features;		std::vector<llvm::StringRef> Features;
amdgpu::getAMDGPUTargetFeatures(D, TC.getTriple(), Args, Features);		amdgpu::getAMDGPUTargetFeatures(D, TC.getTriple(), Args, Features);

// Add features to mattr such as cumode		// Add features to mattr such as cumode
std::string MAttrString = "-plugin-opt=-mattr=";		std::string MAttrString = "-plugin-opt=-mattr=";
for (auto OneFeature : unifyTargetFeatures(Features)) {		for (auto OneFeature : unifyTargetFeatures(Features)) {
MAttrString.append(Args.MakeArgString(OneFeature));		MAttrString.append(Args.MakeArgString(OneFeature));
if (OneFeature != Features.back())		if (OneFeature != Features.back())
MAttrString.append(",");		MAttrString.append(",");
}		}
if (!Features.empty())		if (!Features.empty())
LldArgs.push_back(Args.MakeArgString(MAttrString));		LldArgs.push_back(Args.MakeArgString(MAttrString));

		// ToDo: Remove this option after AMDGPU backend supports ISA-level linking.
		// Since AMDGPU backend currently does not support ISA-level linking, all
		// called functions need to be imported.
		if (IsThinLTO)
		LldArgs.push_back(Args.MakeArgString("-plugin-opt=-force-import-all"));

for (const Arg *A : Args.filtered(options::OPT_mllvm)) {		for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
LldArgs.push_back(		LldArgs.push_back(
Args.MakeArgString(Twine("-plugin-opt=") + A->getValue(0)));		Args.MakeArgString(Twine("-plugin-opt=") + A->getValue(0)));
}		}

if (C.getDriver().isSaveTempsEnabled())		if (C.getDriver().isSaveTempsEnabled())
LldArgs.push_back("-save-temps");		LldArgs.push_back("-save-temps");

▲ Show 20 Lines • Show All 379 Lines • Show Last 20 Lines
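As a rough illustration of the lld command line assembled in `constructLldCommand`, the following standalone sketch builds the argument list from plain strings. `buildLldArgsSketch` is a hypothetical helper; it leaves out everything `addLTOOptions` contributes (such as the `-plugin-opt=mcpu=...` and `-plugin-opt=thinlto` entries checked in the test below) and joins features by index, which is a deliberate simplification and differs from the `OneFeature != Features.back()` comparison in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: Features are target features such as "+cumode",
// MLLVMArgs are the values of user-supplied -mllvm options.
std::vector<std::string>
buildLldArgsSketch(const std::vector<std::string> &Features, bool IsThinLTO,
                   const std::vector<std::string> &MLLVMArgs) {
  std::vector<std::string> LldArgs{"-flavor", "gnu", "--no-undefined",
                                   "-shared",
                                   "-plugin-opt=-amdgpu-internalize-symbols"};

  // Fold all features into one comma-separated -mattr plugin option.
  if (!Features.empty()) {
    std::string MAttr = "-plugin-opt=-mattr=";
    for (std::size_t I = 0; I < Features.size(); ++I) {
      if (I != 0)
        MAttr += ",";
      MAttr += Features[I];
    }
    LldArgs.push_back(MAttr);
  }

  // ThinLTO only: ask the function importer to import every callee, since
  // the backend cannot emit calls to functions defined elsewhere.
  if (IsThinLTO)
    LldArgs.push_back("-plugin-opt=-force-import-all");

  // Forward each -mllvm value to the LTO plugin.
  for (const std::string &A : MLLVMArgs)
    LldArgs.push_back("-plugin-opt=" + A);

  return LldArgs;
}
```

With full LTO the whole program is merged into one module anyway, which is why the extra option is only needed in the thin case.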

clang/test/Driver/hip-options.hip

	Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines
	// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=CTA %s			// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=CTA %s
	// CTA: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-mconstructor-aliases"			// CTA: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-mconstructor-aliases"
	// CTA-NOT: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-mconstructor-aliases"			// CTA-NOT: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-mconstructor-aliases"

	// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \			// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
	// RUN: --offload-arch=gfx906 -fgpu-inline-threshold=1000 %s 2>&1 | FileCheck -check-prefix=THRESH %s			// RUN: --offload-arch=gfx906 -fgpu-inline-threshold=1000 %s 2>&1 | FileCheck -check-prefix=THRESH %s
	// THRESH: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-mllvm" "-inline-threshold=1000"			// THRESH: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-mllvm" "-inline-threshold=1000"
	// THRESH-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-inline-threshold=1000"			// THRESH-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-inline-threshold=1000"

				// Check -foffload-lto=thin translated correctly.

				// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
				// RUN: --cuda-gpu-arch=gfx906 -foffload-lto=thin %s 2>&1 \
				// RUN: | FileCheck -check-prefix=THINLTO %s

				// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
				// RUN: --cuda-gpu-arch=gfx906 -fgpu-rdc -foffload-lto=thin %s 2>&1 \
				// RUN: | FileCheck -check-prefix=THINLTO %s

				// THINLTO-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
				// THINLTO: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-flto=thin" "-flto-unit"
				// THINLTO: lld{{.*}}"-plugin-opt=mcpu=gfx906" "-plugin-opt=thinlto" "-plugin-opt=-force-import-all"

llvm/lib/Transforms/IPO/FunctionImport.cpp

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines
static cl::opt<unsigned> ImportInstrLimit(		static cl::opt<unsigned> ImportInstrLimit(
"import-instr-limit", cl::init(100), cl::Hidden, cl::value_desc("N"),		"import-instr-limit", cl::init(100), cl::Hidden, cl::value_desc("N"),
cl::desc("Only import functions with less than N instructions"));		cl::desc("Only import functions with less than N instructions"));

static cl::opt<int> ImportCutoff(		static cl::opt<int> ImportCutoff(
"import-cutoff", cl::init(-1), cl::Hidden, cl::value_desc("N"),		"import-cutoff", cl::init(-1), cl::Hidden, cl::value_desc("N"),
cl::desc("Only import first N functions if N>=0 (default -1)"));		cl::desc("Only import first N functions if N>=0 (default -1)"));

		static cl::opt<bool>
		ForceImportAll("force-import-all", cl::init(false), cl::Hidden,
		cl::desc("Import functions with noinline attribute"));

static cl::opt<float>		static cl::opt<float>
ImportInstrFactor("import-instr-evolution-factor", cl::init(0.7),		ImportInstrFactor("import-instr-evolution-factor", cl::init(0.7),
cl::Hidden, cl::value_desc("x"),		cl::Hidden, cl::value_desc("x"),
cl::desc("As we import functions, multiply the "		cl::desc("As we import functions, multiply the "
"`import-instr-limit` threshold by this factor "		"`import-instr-limit` threshold by this factor "
"before processing newly imported functions"));		"before processing newly imported functions"));

static cl::opt<float> ImportHotInstrFactor(		static cl::opt<float> ImportHotInstrFactor(
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	auto It = llvm::find_if(
CalleeSummaryList.size() > 1 &&		CalleeSummaryList.size() > 1 &&
Summary->modulePath() != CallerModulePath) {		Summary->modulePath() != CallerModulePath) {
Reason =		Reason =
FunctionImporter::ImportFailureReason::LocalLinkageNotInModule;		FunctionImporter::ImportFailureReason::LocalLinkageNotInModule;
return false;		return false;
}		}

if ((Summary->instCount() > Threshold) &&		if ((Summary->instCount() > Threshold) &&
!Summary->fflags().AlwaysInline) {		!Summary->fflags().AlwaysInline && !ForceImportAll) {
Reason = FunctionImporter::ImportFailureReason::TooLarge;		Reason = FunctionImporter::ImportFailureReason::TooLarge;
return false;		return false;
}		}

// Skip if it isn't legal to import (e.g. may reference unpromotable		// Skip if it isn't legal to import (e.g. may reference unpromotable
// locals).		// locals).
if (Summary->notEligibleToImport()) {		if (Summary->notEligibleToImport()) {
Reason = FunctionImporter::ImportFailureReason::NotEligible;		Reason = FunctionImporter::ImportFailureReason::NotEligible;
return false;		return false;
}		}

// Don't bother importing if we can't inline it anyway.		// Don't bother importing if we can't inline it anyway.
if (Summary->fflags().NoInline) {		if (Summary->fflags().NoInline && !ForceImportAll) {
Reason = FunctionImporter::ImportFailureReason::NoInline;		Reason = FunctionImporter::ImportFailureReason::NoInline;
return false;		return false;
}		}

return true;		return true;
});		});
if (It == CalleeSummaryList.end())		if (It == CalleeSummaryList.end())
return nullptr;		return nullptr;
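The effect of `ForceImportAll` on the candidate filter can be restated as a small standalone predicate. `SummarySketch` and `checkCandidate` are hypothetical names, the enum is a reduced stand-in for `FunctionImporter::ImportFailureReason`, and the local-linkage check that precedes these tests in the real lambda is omitted. The sketch shows that the flag bypasses the size threshold and the `noinline` rejection, while the legality check still applies.

```cpp
#include <cassert>

// Reduced stand-in for FunctionImporter::ImportFailureReason.
enum class ImportFailureReason { None, TooLarge, NotEligible, NoInline };

// Hypothetical subset of the function summary fields the filter reads.
struct SummarySketch {
  unsigned InstCount;
  bool AlwaysInline;
  bool NoInline;
  bool NotEligibleToImport;
};

// Returns None when the candidate may be imported, otherwise the reason
// it is rejected. Checks run in the same order as in the diff.
ImportFailureReason checkCandidate(const SummarySketch &S, unsigned Threshold,
                                   bool ForceImportAll) {
  // Size limit: skipped for always_inline callees and under force-import-all.
  if (S.InstCount > Threshold && !S.AlwaysInline && !ForceImportAll)
    return ImportFailureReason::TooLarge;

  // Legality: never bypassed, not even by force-import-all.
  if (S.NotEligibleToImport)
    return ImportFailureReason::NotEligible;

  // noinline callees are normally not worth importing; force-import-all
  // imports them anyway because the definition itself is required.
  if (S.NoInline && !ForceImportAll)
    return ImportFailureReason::NoInline;

  return ImportFailureReason::None;
}
```

Because the legality check is kept, an import can still fail with the flag set, which is the case the later hunk turns into a reported error.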
▲ Show 20 Lines • Show All 230 Lines • ▼ Show 20 Lines	if (CalleeSummary) {
std::max(FailureInfo->MaxHotness, Edge.second.getHotness());		std::max(FailureInfo->MaxHotness, Edge.second.getHotness());
}		}
} else if (PrintImportFailures) {		} else if (PrintImportFailures) {
assert(!FailureInfo &&		assert(!FailureInfo &&
"Expected no FailureInfo for newly rejected candidate");		"Expected no FailureInfo for newly rejected candidate");
FailureInfo = std::make_unique<FunctionImporter::ImportFailureInfo>(		FailureInfo = std::make_unique<FunctionImporter::ImportFailureInfo>(
VI, Edge.second.getHotness(), Reason, 1);		VI, Edge.second.getHotness(), Reason, 1);
}		}
LLVM_DEBUG(		if (ForceImportAll) {
dbgs() << "ignored! No qualifying callee with summary found.\n");		std::string Msg = std::string("Failed to import function ") +
		VI.name().str() + " due to " +
		getFailureName(Reason);
		auto Error = make_error<StringError>(
		Msg, std::make_error_code(std::errc::operation_not_supported));
		logAllUnhandledErrors(std::move(Error), errs(),
		"Error importing module: ");
		break;
		} else {
		LLVM_DEBUG(dbgs()
		<< "ignored! No qualifying callee with summary found.\n");
continue;		continue;
}		}
		tejohnson: Probably better to issue an error here with the import failure reason?
		yaxunl (author): My understanding is that the import failure reason is only available if PrintImportFailures is enabled. Also, it can only print the GUID and cannot print the failed callee name, since the name is not available, so the information is cryptic. It seems to me the current error message at line 1332 is more suitable for common users. Compiler developers can enable PrintImportFailures and see the reason for failed imports.
		tejohnson: selectCallee always sets the Reason. And we have the name in addition to the GUID in normal circumstances (linking from modules). It would only not be available in certain debugging situations (e.g. linking from an existing combined module with llvm-lto). Also, by failing here, you don't need to wait until the LTO backends to issue the error, so it fails a little earlier.
		yaxunl (author): I see. Will do.
		}

// "Resolve" the summary		// "Resolve" the summary
CalleeSummary = CalleeSummary->getBaseObject();		CalleeSummary = CalleeSummary->getBaseObject();
ResolvedCalleeSummary = cast<FunctionSummary>(CalleeSummary);		ResolvedCalleeSummary = cast<FunctionSummary>(CalleeSummary);

assert((ResolvedCalleeSummary->fflags().AlwaysInline ||		assert((ResolvedCalleeSummary->fflags().AlwaysInline || ForceImportAll ||
(ResolvedCalleeSummary->instCount() <= NewThreshold)) &&		(ResolvedCalleeSummary->instCount() <= NewThreshold)) &&
"selectCallee() didn't honor the threshold");		"selectCallee() didn't honor the threshold");

auto ExportModulePath = ResolvedCalleeSummary->modulePath();		auto ExportModulePath = ResolvedCalleeSummary->modulePath();
auto ILI = ImportList[ExportModulePath].insert(VI.getGUID());		auto ILI = ImportList[ExportModulePath].insert(VI.getGUID());
// We previously decided to import this GUID definition if it was already		// We previously decided to import this GUID definition if it was already
// inserted in the set of imports from the exporting module.		// inserted in the set of imports from the exporting module.
bool PreviouslyImported = !ILI.second;		bool PreviouslyImported = !ILI.second;
if (!PreviouslyImported) {		if (!PreviouslyImported) {
▲ Show 20 Lines • Show All 906 Lines • Show Last 20 Lines

llvm/test/Transforms/FunctionImport/Inputs/funcimport.ll

				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.11.0"

	@globalvar = global i32 1, align 4			@globalvar = global i32 1, align 4
	@staticvar = internal global i32 1, align 4			@staticvar = internal global i32 1, align 4
	@staticconstvar = internal unnamed_addr constant [2 x i32] [i32 10, i32 20], align 4			@staticconstvar = internal unnamed_addr constant [2 x i32] [i32 10, i32 20], align 4
	@commonvar = common global i32 0, align 4			@commonvar = common global i32 0, align 4
	@P = internal global void ()* null, align 8			@P = internal global void ()* null, align 8

	@weakalias = weak alias void (...), bitcast (void ()* @globalfunc1 to void (...)*)			@weakalias = weak alias void (...), bitcast (void ()* @globalfunc1 to void (...)*)
	@analias = alias void (...), bitcast (void ()* @globalfunc2 to void (...)*)			@analias = alias void (...), bitcast (void ()* @globalfunc2 to void (...)*)
	▲ Show 20 Lines • Show All 157 Lines • Show Last 20 Lines

llvm/test/Transforms/FunctionImport/Inputs/noinline.ll

This file was added.

				define void @foo(i64* %v) #0 {
				entry:
				%v.addr = alloca i64*, align 8
				store i64* %v, i64** %v.addr, align 8
				ret void
				}

				attributes #0 = { noinline }
				No newline at end of file

llvm/test/Transforms/FunctionImport/adjustable_threshold.ll

	; Do setup work for all below tests: generate bitcode and combined index
	; RUN: opt -module-summary %s -o %t.bc
	; RUN: opt -module-summary %p/Inputs/adjustable_threshold.ll -o %t2.bc
	; RUN: llvm-lto -thinlto -o %t3 %t.bc %t2.bc

	; Test import with default progressive instruction factor
	; RUN: opt -function-import -summary-file %t3.thinlto.bc %t.bc -import-instr-limit=10 -S | FileCheck %s --check-prefix=INSTLIM-DEFAULT
	; INSTLIM-DEFAULT: call void @staticfunc2.llvm.

	; Test import with a reduced progressive instruction factor
	; RUN: opt -function-import -summary-file %t3.thinlto.bc %t.bc -import-instr-limit=10 -import-instr-evolution-factor=0.5 -S | FileCheck %s --check-prefix=INSTLIM-PROGRESSIVE
	; INSTLIM-PROGRESSIVE-NOT: call void @staticfunc

				; Test force import all
				; RUN: opt -function-import -summary-file %t3.thinlto.bc %t.bc \
				; RUN: -import-instr-limit=1 -force-import-all -S \
				; RUN: | FileCheck %s --check-prefix=IMPORTALL
				; IMPORTALL-DAG: define available_externally void @globalfunc1()
				; IMPORTALL-DAG: define available_externally void @trampoline()
				; IMPORTALL-DAG: define available_externally void @largefunction()
				; IMPORTALL-DAG: define available_externally hidden void @staticfunc2.llvm.0()
				; IMPORTALL-DAG: define available_externally void @globalfunc2()

	declare void @globalfunc1()
	declare void @globalfunc2()

	define void @entry() {
	entry:
	; Call site are processed in reversed order!

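The RUN lines in this test vary `-import-instr-limit` and `-import-instr-evolution-factor`. A simplified way to read them: the instruction limit is the budget for callees reached directly, and the budget is multiplied by the evolution factor at each further step down the call chain. The sketch below models only that decay. `importThresholds` and `fitsThreshold` are hypothetical helper names, and the real importer applies further adjustments that are not modelled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the threshold available at each successive
// call-chain depth, starting from `limit` and scaled by `factor` per step.
std::vector<double> importThresholds(double limit, double factor,
                                     unsigned depth) {
  assert(factor > 0.0 && "evolution factor must be positive");
  std::vector<double> thresholds;
  double current = limit;
  for (unsigned i = 0; i < depth; ++i) {
    thresholds.push_back(current);
    current *= factor; // each deeper level gets a smaller budget
  }
  return thresholds;
}

// A callee fits when its instruction count does not exceed the threshold.
bool fitsThreshold(unsigned instCount, double threshold) {
  return instCount <= threshold;
}
```

With a limit of 10 and a factor of 0.5, the budgets at the first three depths are 10, 5, and 2.5, so a callee that fits when reached directly can be rejected once it is only reachable a few calls deep. The `-force-import-all` run in the test sets the limit to 1 and still expects every callee to be imported, which indicates that the option bypasses this size check.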

llvm/test/Transforms/FunctionImport/funcimport.ll

				; REQUIRES: x86-registered-target

	; Do setup work for all below tests: generate bitcode and combined index
	; RUN: opt -module-summary %s -o %t.bc
	; RUN: opt -module-summary %p/Inputs/funcimport.ll -o %t2.bc
	; RUN: llvm-lto -thinlto -print-summary-global-ids -o %t3 %t.bc %t2.bc 2>&1 | FileCheck %s --check-prefix=GUID

	; Do the import now
	; RUN: opt -function-import -stats -print-imports -enable-import-metadata -summary-file %t3.thinlto.bc %t.bc -S 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=INSTLIMDEF
	; Try again with new pass manager
	; RUN: opt -passes='function-import' -stats -print-imports -enable-import-metadata -summary-file %t3.thinlto.bc %t.bc -S 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=INSTLIMDEF
	; RUN: opt -passes='function-import' -debug-only=function-import -enable-import-metadata -summary-file %t3.thinlto.bc %t.bc -S 2>&1 | FileCheck %s --check-prefix=DUMP
	; "-stats" and "-debug-only" require +Asserts.
	; REQUIRES: asserts

	; Test import with smaller instruction limit
	; RUN: opt -function-import -enable-import-metadata -summary-file %t3.thinlto.bc %t.bc -import-instr-limit=5 -S | FileCheck %s --check-prefix=CHECK --check-prefix=INSTLIM5
	; INSTLIM5-NOT: @staticfunc.llvm.

				; Test force import all
				; RUN: llvm-lto -thinlto-action=run -force-import-all %t.bc %t2.bc 2>&1 \
				; RUN: | FileCheck %s --check-prefix=IMPORTALL
				; IMPORTALL-DAG: Error importing module: Failed to import function weakalias due to InterposableLinkage

				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.11.0"

	define i32 @main() #0 {
	entry:
	call void (...) @weakalias()
	call void (...) @analias()
	call void (...) @linkoncealias()
	%call = call i32 (...) @referencestatics()
	%call1 = call i32 (...) @referenceglobals()

llvm/test/Transforms/FunctionImport/noinline.ll

This file was added.

				; Do setup work for all below tests: generate bitcode and combined index
				; RUN: opt -module-summary %s -o %t.main.bc
				; RUN: opt -module-summary %p/Inputs/noinline.ll -o %t.inputs.noinline.bc
				; RUN: llvm-lto -thinlto -o %t.summary %t.main.bc %t.inputs.noinline.bc

				tra: I'd add a meaningful suffix to the binaries we'll use to run the checks on. E.g. `%t3` -> `%t.lto.bc`, `%t2` -> `%t.inputs.noinline.bc`, `%t` -> `%t.main.bc`.
				yaxunl (author): Will rename %t and %t2 as suggested. However, llvm-lto will postfix the output file name with .thinlto.bc, therefore I would rename %t3 -> %t.summary.
				; Attempt the import now, ensure below that file containing noinline
				; is not imported by default but imported with -force-import-all.

				; RUN: opt -function-import -summary-file %t.summary.thinlto.bc %t.main.bc -S 2>&1 \
				; RUN: | FileCheck -check-prefix=NOIMPORT %s
				; RUN: opt -function-import -force-import-all -summary-file %t.summary.thinlto.bc \
				; RUN: %t.main.bc -S 2>&1 | FileCheck -check-prefix=IMPORT %s

				define i32 @main() #0 {
				entry:
				%f = alloca i64, align 8
				call void @foo(i64* %f)
				ret i32 0
				}

				; NOIMPORT: declare void @foo(i64*)
				; IMPORT: define available_externally void @foo
				declare void @foo(i64*) #1
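
Taken together, the test expectations above suggest the following behaviour: a `noinline` or oversized callee is skipped by default, `-force-import-all` imports it anyway, and a callee that cannot be imported at all (the interposable `weakalias` in funcimport.ll) becomes a hard error under `-force-import-all`. The sketch below is a toy model of that decision inferred from the tests. `CalleeSummary`, `Decision`, and `decideImport` are hypothetical names and do not reproduce the code in FunctionImport.cpp.

```cpp
// Hypothetical per-callee facts (assumption: a simplification of what a
// ThinLTO function summary records).
struct CalleeSummary {
  unsigned instCount; // number of instructions in the callee
  bool noInline;      // callee carries the noinline attribute
  bool interposable;  // linkage allows the definition to be replaced
};

enum class Decision { Import, Skip, Error };

// Toy decision procedure inferred from the NOIMPORT / IMPORT / IMPORTALL
// expectations in the tests; not the actual LLVM logic.
Decision decideImport(const CalleeSummary &callee, unsigned instrLimit,
                      bool forceImportAll) {
  // An interposable definition can never be imported; when every callee
  // is required, that is reported as an error instead of silently skipped.
  if (callee.interposable)
    return forceImportAll ? Decision::Error : Decision::Skip;
  // Forcing import bypasses the noinline and size heuristics.
  if (forceImportAll)
    return Decision::Import;
  if (callee.noInline || callee.instCount > instrLimit)
    return Decision::Skip;
  return Decision::Import;
}
```

In this model, `@foo` from Inputs/noinline.ll maps to `Skip` without the flag and to `Import` with it, matching the `NOIMPORT` and `IMPORT` check lines.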

Revision Contents

Diff 344949