This is an archive of the discontinued LLVM Phabricator instance.

Differential D69878

Consolidate internal denormal flushing controls
ClosedPublic

Authored by arsenm on Nov 5 2019, 9:06 PM.

Details

Reviewers

scanon
spatel
cameron.mcinally
andrew.w.kaylor
tra
jlebar
Anastasia
yaxunl

Summary

Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
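
The three mode names used above (ieee, preserve-sign, positive-zero) can be illustrated with a minimal sketch. The enum and function names here are hypothetical and chosen for illustration only; they are not the LLVM API, and the function only models what a value looks like after flushing.

```cpp
#include <cmath>

// Hypothetical names for the three modes mentioned in the summary.
enum class FlushMode { IEEE, PreserveSign, PositiveZero };

// Models how a single float value is treated under the given mode.
inline float applyDenormalMode(float x, FlushMode mode) {
  if (std::fpclassify(x) != FP_SUBNORMAL)
    return x; // Normal values, zeros, infinities and NaNs are unaffected.
  switch (mode) {
  case FlushMode::IEEE:
    return x; // Denormals are kept as-is.
  case FlushMode::PreserveSign:
    return std::copysign(0.0f, x); // Flush to +0.0 or -0.0, keeping the sign.
  case FlushMode::PositiveZero:
    return 0.0f; // Flush to +0.0 regardless of the input sign.
  }
  return x;
}
```

With a negative denormal input, the ieee mode returns the input unchanged, preserve-sign returns -0.0, and positive-zero returns +0.0.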

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fdenormal-fp-math-f32=ieee or preserve-sign rather than the old
attributes.
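
As a rough sketch of that mapping: the helper below is hypothetical and is not the driver's code. It assumes the last matching flag wins and returns an empty string when none of the three flags is present, in which case the toolchain default would apply.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: picks the value that would follow
// "-fdenormal-fp-math-f32=" (the cc1 spelling in CC1Options.td).
// Returns "" when no relevant flag is present.
inline std::string
f32DenormalModeFromFlags(const std::vector<std::string> &args) {
  std::string mode;
  for (const std::string &arg : args) {
    if (arg == "-cl-denorms-are-zero" ||
        arg == "-fcuda-flush-denormals-to-zero")
      mode = "preserve-sign"; // Flushing requested.
    else if (arg == "-fno-cuda-flush-denormals-to-zero")
      mode = "ieee"; // Flushing explicitly disabled.
  }
  return mode;
}
```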

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attributes are now emitted. A future change will
switch this and remove the subtarget features.

Diff Detail

Event Timeline

arsenm created this revision. Nov 5 2019, 9:06 PM

Herald added a project: Restricted Project. Nov 5 2019, 9:06 PM

Herald added subscribers: hiraditya, kristof.beyls, tpr and 4 others.

arsenm added parent revisions: D69598: Work on cleaning up denormal mode handling, D69583: AMDGPU: Refactor treatment of denormal mode, D69729: AMDGPU: Be explicit about denormal mode in MIR tests, D69547: DAG: Add function context to isFMAFasterThanFMulAndFAdd. Nov 5 2019, 9:07 PM

arsenm marked an inline comment as done. Nov 6 2019, 10:28 AM

arsenm added inline comments.

llvm/docs/LangRef.rst
1828–1831	On second thought I think this may be too permissive. I think based on the use in DAGCombiner, that flushing of outputs is compulsory.

sanjoy.google added a subscriber: sanjoy.google. Nov 6 2019, 10:35 AM

arsenm marked an inline comment as done. Nov 6 2019, 3:00 PM

arsenm added inline comments.

llvm/docs/LangRef.rst
1828–1831	It turns out the fast sqrt usage really cares about input denormals being implicitly treated as 0, not the output flushing (i.e. this only needs DAZ, not FTZ). I think being permissive on the output is OK, but if implicit input flushing is required then it's compulsory and a target is responsible for inserting a flush of some kind if the use instruction isn't known to follow this mode. Because of this, I do think it's necessary to treat this as two separate modes. I'm thinking to comma separate output-mode,input-mode, and assume input-mode=output-mode if the second half isn't specified for compatibility with the existing attribute.
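
A minimal sketch of the comma-separated form proposed in this comment, where a missing second half means input-mode equals output-mode. All names are hypothetical; this illustrates the proposal only and is not the parser from any follow-up patch.

```cpp
#include <string>
#include <utility>

// Hypothetical mode enum; Invalid covers unrecognized strings.
enum class Mode { Invalid, IEEE, PreserveSign, PositiveZero };

inline Mode parseMode(const std::string &s) {
  if (s == "ieee")
    return Mode::IEEE;
  if (s == "preserve-sign")
    return Mode::PreserveSign;
  if (s == "positive-zero")
    return Mode::PositiveZero;
  return Mode::Invalid;
}

// Returns {output-mode, input-mode}. When the attribute has no comma,
// the input mode is assumed to equal the output mode, which keeps the
// existing single-value attribute strings meaningful.
inline std::pair<Mode, Mode> parseDenormalAttr(const std::string &attr) {
  const std::string::size_type comma = attr.find(',');
  const Mode out = parseMode(attr.substr(0, comma));
  const Mode in = comma == std::string::npos
                      ? out
                      : parseMode(attr.substr(comma + 1));
  return {out, in};
}
```

Under this sketch, "preserve-sign" yields the same mode for both halves, while "ieee,preserve-sign" describes IEEE outputs with inputs implicitly treated as zero, which is the DAZ-only case the fast sqrt usage cares about.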

Rename subnormal to denormal. Will defer splitting input and output setting into a future patch before switching default behavior

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

Would the targets supporting OpenCL need to define their own behavior in getDefaultDenormalModeForType?

clang/include/clang/Driver/ToolChain.h
619	Can you elaborate what has to be done in order to fix this?

In D69878#1736865, @Anastasia wrote:

Would the targets supporting OpenCL need to define their own behavior in getDefaultDenormalModeForType?

Yes. The future ieee default should be conservatively correct though

clang/include/clang/Driver/ToolChain.h
619	The main problem is the current user assumes non-ieee by default. The main blocker is knowing what platforms should default to something different to avoid performance regressions. I have the patch almost ready to switch the default, it’s just missing toolchain overrides

arsenm added a child revision: D69978: Separately track input and output denormal mode. Nov 7 2019, 5:12 PM

Fix name in documentation

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name).

I may be corrected, but I believe nvptx only supports ftz for f32.

Double-precision instructions support subnormal inputs and results. Single-precision instructions support subnormal inputs and results by default for sm_20 and subsequent targets, and flush subnormal inputs and results to sign-preserving zero for sm_1x targets. The optional .ftz modifier on single-precision instructions provides backward compatibility with sm_1x targets by flushing subnormal inputs and results to sign-preserving zero regardless of the target architecture.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions

arsenm added a child revision: D69982: PPC: Prepare tests for switch of default denormal-fp-math. Nov 14 2019, 6:04 PM

Anastasia added inline comments. Nov 19 2019, 8:23 AM

clang/lib/CodeGen/CGCall.cpp
1775	so where would `denorms-are-zero` be emitted now (in case some out of tree implementations rely on this)?

arsenm marked an inline comment as done. Nov 19 2019, 8:42 AM

arsenm added inline comments.

clang/lib/CodeGen/CGCall.cpp
1775	Rely on in what sense? Do you have a concrete use of this?

Anastasia added inline comments. Nov 26 2019, 9:14 AM

clang/lib/CodeGen/CGCall.cpp
1775	Since it has been emitted before in the module potentially some LLVM implementations could be using that attribute?

arsenm marked an inline comment as done. Dec 1 2019, 10:47 PM

arsenm added inline comments.

clang/lib/CodeGen/CGCall.cpp
1775	I'm disinclined to leave things around just in case some unknown user might have been using them. We've dropped attributes like this before (I think the less-precise-fp-mad one for disuse). This also isn't needed for correctness, so it should be pretty safe to drop

pengfei added a subscriber: pengfei. Dec 2 2019, 4:43 PM

scanon added inline comments. Dec 4 2019, 5:35 AM

llvm/docs/LangRef.rst
1834	Can you clarify this a little bit? I'd prefer something like "Same as `"denorm-fp-math"`, but only controls the behavior of the 32-bit float type.".

Reword langref, fix name in langref

Anastasia added inline comments. Dec 10 2019, 8:58 AM

clang/lib/CodeGen/CGCall.cpp
1775	Ok, fair enough!

arsenm added a child revision: D71353: Fix denormal-fp-math flag and attribute interaction. Dec 11 2019, 6:34 AM

arsenm added a child revision: D71354: CodeGen: Add -denormal-fp-math-f32 flag.

arsenm added a child revision: D71357: AMDGPU: Assume f32 denormals are enabled by default. Dec 11 2019, 6:43 AM

ping

This is looking pretty good to me, but I'm ignoring some of the target specific code that I'm not familiar with.

Is denormal-fp-math influenced by -Ofast? Or are there plans for that? Seems like -Ofast should imply DAZ and FTZ (if supported by target).

I think we discussed this before, but it's worth repeating. If denormal-fp-math isn't specified, we default to IEEE behavior, right? When this lands in master, there could be an unexpected performance hit for targets that aren't paying attention. E.g. I want to use denormal-fp-math to toggle whether a FSUB(-0.0,X) is converted to a FNEG(X) in SelectionDAGBuilder.

Apologies in advance if this has been discussed recently. I've been distracted with another project for the past few months...

clang/lib/Driver/ToolChains/Clang.cpp
2240	Last line of comment was not removed. Also, is it safe to remove `TrappingMathPresent`? Is that part of the work-in-progress to support `ffp-exception-behavior`?

In D69878#1801508, @cameron.mcinally wrote:

This is looking pretty good to me, but I'm ignoring some of the target specific code that I'm not familiar with.

Is denormal-fp-math influenced by -Ofast? Or are there plans for that? Seems like -Ofast should imply DAZ and FTZ (if supported by target).

Yes, through the toolchain handling. I copied the logic for when crtfastmath is linked for the default mode for x86.

I think we discussed this before, but it's worth repeating. If denormal-fp-math isn't specified, we default to IEEE behavior, right? When this lands in master, there could be an unexpected performance hit for targets that aren't paying attention. E.g. I want to use denormal-fp-math to toggle whether a FSUB(-0.0,X) is converted to a FNEG(X) in SelectionDAGBuilder.

Apologies in advance if this has been discussed recently. I've been distracted with another project for the past few months...

Yes, ieee should be the default. The dependent patches start adding the attribute by default for platforms with flushing enabled with fast math

clang/lib/Driver/ToolChains/Clang.cpp
2240	I think this is a rebase gone bad. The patch changing the strict math was reverted and recommitted and I probably broke this

In D69878#1805804, @arsenm wrote:

Yes, ieee should be the default. The dependent patches start adding the attribute by default for platforms with flushing enabled with fast math

To clarify this patch leaves the default and defers changing that to a later patch

Ok, thanks for the clarifications. Looks good to me, but it would be good to have experts in OpenCL/Cuda/AMDGPU review the target specific changes.

andrew.w.kaylor added inline comments. Jan 6 2020, 10:01 AM

llvm/docs/LangRef.rst
1837	Can you document which targets do support the option? What happens if I try to use the option on a target where it is not supported?

arsenm marked an inline comment as done. Jan 6 2020, 10:52 AM

arsenm added inline comments.

llvm/docs/LangRef.rst
1837	I'm not sure where to document this, or if/how/where to diagnose it. I don't think the high level LangRef description is the right place to discuss specific target handling. Currently it won't error or anything. Code checking the denorm mode will see the f32 specific mode, even if the target in the end isn't really going to respect this. One problem is this potentially does require coordination with other toolchain components. For AMDGPU, the compiler can directly tell the driver what FP mode to set on each entry point, but for x86 it requires linking in crtfastmath to set the default mode bits. If another target had a similar runtime environment requirement, I don't think we can be sure the attribute is correct or not.

andrew.w.kaylor added inline comments. Jan 6 2020, 11:11 AM

llvm/docs/LangRef.rst
1837	There is precedent for describing target-specific behavior in LangRef. It just doesn't seem useful to say that not all targets support the attribute without saying which ones do. We should also say what is expected if a target doesn't support the attribute. It seems reasonable for the function attribute to be silently ignored.

"One problem is this potentially does require coordination with other toolchain components. For AMDGPU, the compiler can directly tell the driver what FP mode to set on each entry point, but for x86 it requires linking in crtfastmath to set the default mode bits."

This is a point I'm interested in. I don't like the current crtfastmath.o handling. It feels almost accidental when FTZ works as expected. My understanding is we link crtfastmath.o if we find it but if not everything just goes about its business. The Intel compiler injects code into main() to explicitly set the FTZ/DAZ control modes. That obviously has problems too, but it's at least consistent and predictable. As I understand it, crtfastmath.o sets these modes from a static initializer, but I'm not sure anything is done to determine the order of that initializer relative to others.

How does the compiler identify entry points for AMDGPU? And does it emit code to set FTZ based on the function attribute here?

arsenm marked an inline comment as done. Jan 9 2020, 1:46 PM

arsenm added inline comments.

llvm/docs/LangRef.rst
1837	The entry points are a specific calling convention. There's no real concept of main. Each kernel has an associated blob of metadata the driver uses to set up various config registers on dispatch. I don't think specially recognizing main in the compiler is fundamentally different than having it done in a static constructor. It's still a construct not associated with any particular function or anything.

andrew.w.kaylor added inline comments. Jan 9 2020, 6:43 PM

llvm/docs/LangRef.rst
1837	The problem with having it done in a static constructor is that you have no certainty of when it will be done relative to other static constructors. If it's in main you can at least say that it's after all the static constructors (assuming main is your entry point).

cameron.mcinally added inline comments. Jan 10 2020, 10:48 AM

llvm/docs/LangRef.rst
1837	Yes and no. The linker should honor static constructor priorities. But, yeah, there's no guarantee that this constructor will run before other priority 101 constructors. The performance penalty for setting denormal flushing in main could be significant (think C++). Also, there's precedent for using static constructors, like GCC's crtfastmath.o.

andrew.w.kaylor added inline comments. Jan 10 2020, 11:09 AM

llvm/docs/LangRef.rst
1837	Fair enough. I don't necessarily like how icc handles this. I don't have a problem with how gcc handles it. I just really don't like how LLVM does it. If we want to take the static constructor approach we should define our own, not depend on whether or not the GNU object file happens to be around. Static initialization doesn't help for AMDGPU, and I suppose that's likely to be the case for any offload execution model. Since this patch is moving us toward a more consistent implementation I'm wondering if we can define some general rules for how this is supposed to work. Like when the function attribute will result in injected instructions setting the control flags and when it won't.

arsenm marked an inline comment as done. Jan 10 2020, 11:38 AM

arsenm added inline comments.

llvm/docs/LangRef.rst
1837	I think the most we can expect of this attribute is informing codegen of the expected FP denormal handling mode, and not something responsible for ensuring the mode will really be set. AMDGPU conceptually could have a separate set of attributes for setting the denormal FP mode, but since it would look identical, this gets a bonus usage for setting it for kernels. This doesn't protect you from calling functions in modules compiled with different attributes, so similar problems outside the view of the compiler still exist

cameron.mcinally added inline comments. Jan 10 2020, 11:47 AM

llvm/docs/LangRef.rst
1837	"If we want to take the static constructor approach we should define our own, not depend on whether or not the GNU object file happens to be around."

That's a good idea. There are subtle differences between targets in the GNU implementation. It would be good to standardize them.
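
For reference, the static-constructor approach discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of crtfastmath.o or anything in LLVM: it assumes an x86 target with SSE and a GCC-compatible compiler, the function names are hypothetical, and on other targets it compiles to a stub that reports flushing as disabled.

```cpp
#if defined(__GNUC__) && defined(__SSE__)
#include <xmmintrin.h>

// MXCSR bit 15 is flush-to-zero (FTZ); bit 6 is denormals-are-zero (DAZ).
constexpr unsigned kFtzDazBits = 0x8000u | 0x0040u;

// Runs before main(). Priorities up to 100 are reserved for the
// implementation, so 101 is the earliest slot available to user code;
// the order relative to other priority-101 constructors is unspecified.
__attribute__((constructor(101))) static void enableDenormalFlushing() {
  _mm_setcsr(_mm_getcsr() | kFtzDazBits);
}

// Reports whether both FTZ and DAZ are set for the calling thread.
inline bool denormalFlushingEnabled() {
  return (_mm_getcsr() & kFtzDazBits) == kFtzDazBits;
}
#else
// Assumption: no portable control register on other targets in this sketch.
inline bool denormalFlushingEnabled() { return false; }
#endif
```

The mode is set through a control register at startup and is not tied to any function, which is why the attribute can only inform codegen of the expected mode rather than guarantee it.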

Mention support in langref

ping

Herald added a subscriber: kerbowa. Jan 15 2020, 10:50 AM

andrew.w.kaylor added inline comments. Jan 16 2020, 1:45 PM

clang/lib/Driver/ToolChains/Clang.cpp
2240	Looks like this is still wrong. You didn't intend to change either TrappingMath flag, did you?
2459	Shouldn't this also restore DenormalFP32Math to its default value?

Forgot clang parts

arsenm marked an inline comment as done. Jan 16 2020, 5:40 PM

arsenm added inline comments.

clang/lib/Driver/ToolChains/Clang.cpp
2499	I think this should just follow along with DenormalFPMath, but I'll put this off to the later patch since this one still is slightly awkward trying to avoid changing the meaning of the absence of the option

I don't know if there were other reviewers who haven't commented on how you addressed their concerns, but this looks good to me.

Thanks for taking the time to improve our handling of this!

This revision is now accepted and ready to land. Jan 17 2020, 11:26 AM

LGTM too. Would be good if an expert reviewed the target-specific changes, but they seem safe enough either way.

a4451d88ee456304c26d552749aea6a7f5154bde

Revision Contents

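Path	Size
clang/include/clang/Basic/CodeGenOptions.h	3 lines
clang/include/clang/Basic/CodeGenOptions.def	1 line
clang/include/clang/Driver/CC1Options.td	3 lines
clang/include/clang/Driver/Options.td	4 lines
clang/include/clang/Driver/ToolChain.h	13 lines
clang/lib/Basic/Targets/AMDGPU.cpp	3 lines
clang/lib/CodeGen/CGCall.cpp	16 lines
clang/lib/CodeGen/CodeGenModule.cpp	3 lines
clang/lib/Driver/ToolChains/AMDGPU.h	5 lines
clang/lib/Driver/ToolChains/AMDGPU.cpp	35 lines
clang/lib/Driver/ToolChains/Clang.cpp	49 lines
clang/lib/Driver/ToolChains/Cuda.h	5 lines
clang/lib/Driver/ToolChains/Cuda.cpp	20 lines
clang/lib/Driver/ToolChains/HIP.cpp	4 lines
clang/lib/Frontend/CompilerInvocation.cpp	10 lines
clang/test/CodeGenCUDA/flush-denormals.cu	40 lines
clang/test/CodeGenCUDA/propagate-metadata.cu	18 lines
clang/test/CodeGenOpenCL/amdgpu-features.cl	14 lines
clang/test/CodeGenOpenCL/denorms-are-zero.cl	
clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl	
clang/test/Driver/cl-denorms-are-zero.cl	20 lines
clang/test/Driver/cuda-flush-denormals-to-zero.cu	13 lines
clang/test/Driver/denormal-fp-math.c	2 lines
clang/test/Driver/opencl.cl	5 lines
llvm/docs/LangRef.rst	16 lines
llvm/lib/CodeGen/MachineFunction.cpp	10 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp	10 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	8 lines
llvm/test/CodeGen/NVPTX/fast-math.ll	2 lines
llvm/test/CodeGen/NVPTX/math-intrins.ll	2 lines
llvm/test/CodeGen/NVPTX/sqrt-approx.ll	2 lines
llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll	4 lines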

Diff 228348

clang/include/clang/Basic/CodeGenOptions.h

@@ public:
 std::map<std::string, std::string> DebugPrefixMap;

 /// The ABI to use for passing floating point arguments.
 std::string FloatABI;

 /// The floating-point denormal mode to use.
 llvm::DenormalMode FPDenormalMode = llvm::DenormalMode::Invalid;

+/// The floating-point subnormal mode to use, for float.
+llvm::DenormalMode FP32DenormalMode = llvm::DenormalMode::Invalid;
+
 /// The float precision limit to use, if non-empty.
 std::string LimitFloatPrecision;

 struct BitcodeFileToLink {
 /// The filename of the bitcode file to link in.
 std::string Filename;
 /// If true, we set attributes functions in the bitcode library according to
 /// our CodeGenOptions, much as we set attrs on functions that we generate

clang/include/clang/Basic/CodeGenOptions.def

@@
 CODEGENOPT(NoImplicitFloat , 1, 0) ///< Set when -mno-implicit-float is enabled.
 CODEGENOPT(NoInfsFPMath , 1, 0) ///< Assume FP arguments, results not +-Inf.
 CODEGENOPT(NoSignedZeros , 1, 0) ///< Allow ignoring the signedness of FP zero
 CODEGENOPT(NullPointerIsValid , 1, 0) ///< Assume Null pointer deference is defined.
 CODEGENOPT(Reassociate , 1, 0) ///< Allow reassociation of FP math ops
 CODEGENOPT(ReciprocalMath , 1, 0) ///< Allow FP divisions to be reassociated.
 CODEGENOPT(NoTrappingMath , 1, 0) ///< Set when -fno-trapping-math is enabled.
 CODEGENOPT(NoNaNsFPMath , 1, 0) ///< Assume FP arguments, results not NaN.
-CODEGENOPT(FlushDenorm , 1, 0) ///< Allow FP denorm numbers to be flushed to zero
 CODEGENOPT(CorrectlyRoundedDivSqrt, 1, 0) ///< -cl-fp32-correctly-rounded-divide-sqrt

 /// When false, this attempts to generate code as if the result of an
 /// overflowing conversion matches the overflowing behavior of a target's native
 /// float-to-int conversion instructions.
 CODEGENOPT(StrictFloatCastOverflow, 1, 1)

 CODEGENOPT(UniformWGSize , 1, 0) ///< -cl-uniform-work-group-size

clang/include/clang/Driver/CC1Options.td

@@ def msign_return_address_key_EQ : Joined<["-"], "msign-return-address-key=">,
 Values<"a_key,b_key">;
 def mbranch_target_enforce : Flag<["-"], "mbranch-target-enforce">;
 def fno_dllexport_inlines : Flag<["-"], "fno-dllexport-inlines">;
 def cfguard_no_checks : Flag<["-"], "cfguard-no-checks">,
 HelpText<"Emit Windows Control Flow Guard tables only (no checks)">;
 def cfguard : Flag<["-"], "cfguard">,
 HelpText<"Emit Windows Control Flow Guard tables and checks">;

+def fdenormal_fp_math_f32_EQ : Joined<["-"], "fdenormal-fp-math-f32=">,
+Group<f_Group>;
+
 //===----------------------------------------------------------------------===//
 // Dependency Output Options
 //===----------------------------------------------------------------------===//

 def sys_header_deps : Flag<["-"], "sys-header-deps">,
 HelpText<"Include system headers in dependency output">;
 def module_file_deps : Flag<["-"], "module-file-deps">,
 HelpText<"Include module files in dependency output">;

clang/include/clang/Driver/Options.td

	Show First 20 Lines • Show All 517 Lines • ▼ Show 20 Lines
	def cl_fast_relaxed_math : Flag<["-"], "cl-fast-relaxed-math">, Group<opencl_Group>, Flags<[CC1Option]>,			def cl_fast_relaxed_math : Flag<["-"], "cl-fast-relaxed-math">, Group<opencl_Group>, Flags<[CC1Option]>,
	HelpText<"OpenCL only. Sets -cl-finite-math-only and -cl-unsafe-math-optimizations, and defines __FAST_RELAXED_MATH__.">;			HelpText<"OpenCL only. Sets -cl-finite-math-only and -cl-unsafe-math-optimizations, and defines __FAST_RELAXED_MATH__.">;
	def cl_mad_enable : Flag<["-"], "cl-mad-enable">, Group<opencl_Group>, Flags<[CC1Option]>,			def cl_mad_enable : Flag<["-"], "cl-mad-enable">, Group<opencl_Group>, Flags<[CC1Option]>,
	HelpText<"OpenCL only. Allow use of less precise MAD computations in the generated binary.">;			HelpText<"OpenCL only. Allow use of less precise MAD computations in the generated binary.">;
	def cl_no_signed_zeros : Flag<["-"], "cl-no-signed-zeros">, Group<opencl_Group>, Flags<[CC1Option]>,			def cl_no_signed_zeros : Flag<["-"], "cl-no-signed-zeros">, Group<opencl_Group>, Flags<[CC1Option]>,
	HelpText<"OpenCL only. Allow use of less precise no signed zeros computations in the generated binary.">;			HelpText<"OpenCL only. Allow use of less precise no signed zeros computations in the generated binary.">;
	def cl_std_EQ : Joined<["-"], "cl-std=">, Group<opencl_Group>, Flags<[CC1Option]>,			def cl_std_EQ : Joined<["-"], "cl-std=">, Group<opencl_Group>, Flags<[CC1Option]>,
	HelpText<"OpenCL language standard to compile for.">, Values<"cl,CL,cl1.1,CL1.1,cl1.2,CL1.2,cl2.0,CL2.0,clc++,CLC++">;			HelpText<"OpenCL language standard to compile for.">, Values<"cl,CL,cl1.1,CL1.1,cl1.2,CL1.2,cl2.0,CL2.0,clc++,CLC++">;
	def cl_denorms_are_zero : Flag<["-"], "cl-denorms-are-zero">, Group<opencl_Group>, Flags<[CC1Option]>,			def cl_denorms_are_zero : Flag<["-"], "cl-denorms-are-zero">, Group<opencl_Group>,
	HelpText<"OpenCL only. Allow denormals to be flushed to zero.">;			HelpText<"OpenCL only. Allow denormals to be flushed to zero.">;
	def cl_fp32_correctly_rounded_divide_sqrt : Flag<["-"], "cl-fp32-correctly-rounded-divide-sqrt">, Group<opencl_Group>, Flags<[CC1Option]>,			def cl_fp32_correctly_rounded_divide_sqrt : Flag<["-"], "cl-fp32-correctly-rounded-divide-sqrt">, Group<opencl_Group>, Flags<[CC1Option]>,
	HelpText<"OpenCL only. Specify that single precision floating-point divide and sqrt used in the program source are correctly rounded.">;			HelpText<"OpenCL only. Specify that single precision floating-point divide and sqrt used in the program source are correctly rounded.">;
	def cl_uniform_work_group_size : Flag<["-"], "cl-uniform-work-group-size">, Group<opencl_Group>, Flags<[CC1Option]>,			def cl_uniform_work_group_size : Flag<["-"], "cl-uniform-work-group-size">, Group<opencl_Group>, Flags<[CC1Option]>,
	HelpText<"OpenCL only. Defines that the global work-size be a multiple of the work-group size specified to clEnqueueNDRangeKernel">;			HelpText<"OpenCL only. Defines that the global work-size be a multiple of the work-group size specified to clEnqueueNDRangeKernel">;
	def client__name : JoinedOrSeparate<["-"], "client_name">;			def client__name : JoinedOrSeparate<["-"], "client_name">;
	def combine : Flag<["-", "--"], "combine">, Flags<[DriverOption, Unsupported]>;			def combine : Flag<["-", "--"], "combine">, Flags<[DriverOption, Unsupported]>;
	def compatibility__version : JoinedOrSeparate<["-"], "compatibility_version">;			def compatibility__version : JoinedOrSeparate<["-"], "compatibility_version">;
	Show All 38 Lines
	def no_cuda_noopt_device_debug : Flag<["--"], "no-cuda-noopt-device-debug">;			def no_cuda_noopt_device_debug : Flag<["--"], "no-cuda-noopt-device-debug">;
	def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,			def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,
	HelpText<"CUDA installation path">;			HelpText<"CUDA installation path">;
	def cuda_path_ignore_env : Flag<["--"], "cuda-path-ignore-env">, Group<i_Group>,			def cuda_path_ignore_env : Flag<["--"], "cuda-path-ignore-env">, Group<i_Group>,
	HelpText<"Ignore environment variables to detect CUDA installation">;			HelpText<"Ignore environment variables to detect CUDA installation">;
	def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Group<i_Group>,			def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Group<i_Group>,
	HelpText<"Path to ptxas (used for compiling CUDA code)">;			HelpText<"Path to ptxas (used for compiling CUDA code)">;
	def fcuda_flush_denormals_to_zero : Flag<["-"], "fcuda-flush-denormals-to-zero">,			def fcuda_flush_denormals_to_zero : Flag<["-"], "fcuda-flush-denormals-to-zero">,
	Flags<[CC1Option]>, HelpText<"Flush denormal floating point values to zero in CUDA device mode.">;			HelpText<"Flush denormal floating point values to zero in CUDA device mode.">;
	def fno_cuda_flush_denormals_to_zero : Flag<["-"], "fno-cuda-flush-denormals-to-zero">;			def fno_cuda_flush_denormals_to_zero : Flag<["-"], "fno-cuda-flush-denormals-to-zero">;
	def fcuda_approx_transcendentals : Flag<["-"], "fcuda-approx-transcendentals">,			def fcuda_approx_transcendentals : Flag<["-"], "fcuda-approx-transcendentals">,
	Flags<[CC1Option]>, HelpText<"Use approximate transcendental functions">;			Flags<[CC1Option]>, HelpText<"Use approximate transcendental functions">;
	def fno_cuda_approx_transcendentals : Flag<["-"], "fno-cuda-approx-transcendentals">;			def fno_cuda_approx_transcendentals : Flag<["-"], "fno-cuda-approx-transcendentals">;
	def fgpu_rdc : Flag<["-"], "fgpu-rdc">, Flags<[CC1Option]>,			def fgpu_rdc : Flag<["-"], "fgpu-rdc">, Flags<[CC1Option]>,
	HelpText<"Generate relocatable device code, also known as separate compilation mode.">;			HelpText<"Generate relocatable device code, also known as separate compilation mode.">;
	def fno_gpu_rdc : Flag<["-"], "fno-gpu-rdc">;			def fno_gpu_rdc : Flag<["-"], "fno-gpu-rdc">;
	def : Flag<["-"], "fcuda-rdc">, Alias<fgpu_rdc>;			def : Flag<["-"], "fcuda-rdc">, Alias<fgpu_rdc>;

clang/include/clang/Driver/ToolChain.h


#include "clang/Basic/DebugInfoOptions.h"		#include "clang/Basic/DebugInfoOptions.h"
#include "clang/Basic/LLVM.h"		#include "clang/Basic/LLVM.h"
#include "clang/Basic/LangOptions.h"		#include "clang/Basic/LangOptions.h"
#include "clang/Basic/Sanitizers.h"		#include "clang/Basic/Sanitizers.h"
#include "clang/Driver/Action.h"		#include "clang/Driver/Action.h"
#include "clang/Driver/Multilib.h"		#include "clang/Driver/Multilib.h"
#include "clang/Driver/Types.h"		#include "clang/Driver/Types.h"
		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
		#include "llvm/ADT/FloatingPointMode.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/MC/MCTargetOptions.h"		#include "llvm/MC/MCTargetOptions.h"
#include "llvm/Option/Option.h"		#include "llvm/Option/Option.h"
#include "llvm/Support/VersionTuple.h"		#include "llvm/Support/VersionTuple.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include <cassert>		#include <cassert>
▲ Show 20 Lines • Show All 573 Lines • ▼ Show 20 Lines	public:
/// Return sanitizers which are enabled by default.		/// Return sanitizers which are enabled by default.
virtual SanitizerMask getDefaultSanitizers() const {		virtual SanitizerMask getDefaultSanitizers() const {
return SanitizerMask();		return SanitizerMask();
}		}

/// Returns true when it's possible to split LTO unit to use whole		/// Returns true when it's possible to split LTO unit to use whole
/// program devirtualization and CFI santiizers.		/// program devirtualization and CFI santiizers.
virtual bool canSplitThinLTOUnit() const { return true; }		virtual bool canSplitThinLTOUnit() const { return true; }

		/// Returns the output denormal handling type in the default floating point
		/// environment for the given \p FPType if given. Otherwise, the default
		/// assumed mode for any floating point type.
		virtual llvm::DenormalMode getDefaultDenormalModeForType(
		const llvm::opt::ArgList &DriverArgs,
		Action::OffloadKind DeviceOffloadKind,
		const llvm::fltSemantics *FPType = nullptr) const {
		// FIXME: This should be IEEE when default handling is fixed.
		Anastasia: Can you elaborate what has to be done in order to fix this?
		arsenm (Author): The main problem is the current user assumes non-ieee by default. The main blocker is knowing what platforms should default to something different to avoid performance regressions. I have the patch almost ready to switch the default, it’s just missing toolchain overrides
		return llvm::DenormalMode::Invalid;
		}
};		};

/// Set a ToolChain's effective triple. Reset it when the registration object		/// Set a ToolChain's effective triple. Reset it when the registration object
/// is destroyed.		/// is destroyed.
class RegisterEffectiveTriple {		class RegisterEffectiveTriple {
const ToolChain &TC;		const ToolChain &TC;

public:		public:
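
The new `getDefaultDenormalModeForType` hook above can be illustrated with a small standalone sketch. It does not use the real Clang or LLVM headers: `DenormalMode` here is a local stand-in for `llvm::DenormalMode`, and `FPKind`, `ToolChainSketch`, and `FlushF32ToolChain` are hypothetical names invented for illustration (the patch passes a `const llvm::fltSemantics *` rather than an enum). It only shows the shape of the design: the base class returns `Invalid` (no opinion), and a target toolchain overrides the answer per type.

```cpp
#include <cassert>

// Local stand-in for llvm::DenormalMode; enumerator names follow the diff.
enum class DenormalMode { Invalid = -1, IEEE, PreserveSign, PositiveZero };

// Hypothetical tag for the queried FP type (the patch uses fltSemantics*,
// with nullptr meaning "any type").
enum class FPKind { Any, Half, Single, Double };

struct ToolChainSketch {
  virtual ~ToolChainSketch() = default;

  // Base behavior: no toolchain-specific default, mirroring the
  // `return llvm::DenormalMode::Invalid;` in the patch.
  virtual DenormalMode
  getDefaultDenormalModeForType(FPKind Kind = FPKind::Any) const {
    (void)Kind;
    return DenormalMode::Invalid;
  }
};

// A hypothetical GPU-like toolchain: flush f32 by default, keep IEEE
// behavior for every other type.
struct FlushF32ToolChain : ToolChainSketch {
  DenormalMode
  getDefaultDenormalModeForType(FPKind Kind = FPKind::Any) const override {
    return Kind == FPKind::Single ? DenormalMode::PreserveSign
                                  : DenormalMode::IEEE;
  }
};
```

A caller such as the driver would query the hook twice, once with no type for the general default and once for f32, which is how `RenderFloatingPointOptions` uses it later in this diff.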

clang/lib/Basic/Targets/AMDGPU.cpp

Show First 20 Lines • Show All 232 Lines • ▼ Show 20 Lines	void AMDGPUTargetInfo::adjustTargetOptions(const CodeGenOptions &CGOpts,
for (auto &I : TargetOpts.FeaturesAsWritten) {		for (auto &I : TargetOpts.FeaturesAsWritten) {
if (I == "+fp32-denormals" || I == "-fp32-denormals")		if (I == "+fp32-denormals" || I == "-fp32-denormals")
hasFP32Denormals = true;		hasFP32Denormals = true;
if (I == "+fp64-fp16-denormals" || I == "-fp64-fp16-denormals")		if (I == "+fp64-fp16-denormals" || I == "-fp64-fp16-denormals")
hasFP64Denormals = true;		hasFP64Denormals = true;
}		}
if (!hasFP32Denormals)		if (!hasFP32Denormals)
TargetOpts.Features.push_back(		TargetOpts.Features.push_back(
(Twine(hasFastFMAF() && hasFullRateDenormalsF32() && !CGOpts.FlushDenorm		(Twine(hasFastFMAF() && hasFullRateDenormalsF32() &&
		CGOpts.FP32DenormalMode == llvm::DenormalMode::IEEE
? '+' : '-') + Twine("fp32-denormals"))		? '+' : '-') + Twine("fp32-denormals"))
.str());		.str());
// Always do not flush fp64 or fp16 denorms.		// Always do not flush fp64 or fp16 denorms.
if (!hasFP64Denormals && hasFP64())		if (!hasFP64Denormals && hasFP64())
TargetOpts.Features.push_back("+fp64-fp16-denormals");		TargetOpts.Features.push_back("+fp64-fp16-denormals");
}		}

void AMDGPUTargetInfo::fillValidCPUList(		void AMDGPUTargetInfo::fillValidCPUList(

clang/lib/CodeGen/CGCall.cpp

Show First 20 Lines • Show All 1,735 Lines • ▼ Show 20 Lines	if (AttrOnCallSite) {
}		}
FuncAttrs.addAttribute("frame-pointer", FpKind);		FuncAttrs.addAttribute("frame-pointer", FpKind);

FuncAttrs.addAttribute("less-precise-fpmad",		FuncAttrs.addAttribute("less-precise-fpmad",
llvm::toStringRef(CodeGenOpts.LessPreciseFPMAD));		llvm::toStringRef(CodeGenOpts.LessPreciseFPMAD));

if (CodeGenOpts.NullPointerIsValid)		if (CodeGenOpts.NullPointerIsValid)
FuncAttrs.addAttribute("null-pointer-is-valid", "true");		FuncAttrs.addAttribute("null-pointer-is-valid", "true");

		// TODO: Omit attribute when the default is IEEE.
if (CodeGenOpts.FPDenormalMode != llvm::DenormalMode::Invalid)		if (CodeGenOpts.FPDenormalMode != llvm::DenormalMode::Invalid)
FuncAttrs.addAttribute("denormal-fp-math",		FuncAttrs.addAttribute("denormal-fp-math",
llvm::subnormalModeName(CodeGenOpts.FPDenormalMode));		llvm::denormalModeName(CodeGenOpts.FPDenormalMode));
		if (CodeGenOpts.FP32DenormalMode != llvm::DenormalMode::Invalid)
		FuncAttrs.addAttribute(
		"denormal-fp-math-f32",
		llvm::denormalModeName(CodeGenOpts.FP32DenormalMode));

FuncAttrs.addAttribute("no-trapping-math",		FuncAttrs.addAttribute("no-trapping-math",
llvm::toStringRef(CodeGenOpts.NoTrappingMath));		llvm::toStringRef(CodeGenOpts.NoTrappingMath));

// Strict (compliant) code is the default, so only add this attribute to		// Strict (compliant) code is the default, so only add this attribute to
// indicate that we are trying to workaround a problem case.		// indicate that we are trying to workaround a problem case.
if (!CodeGenOpts.StrictFloatCastOverflow)		if (!CodeGenOpts.StrictFloatCastOverflow)
FuncAttrs.addAttribute("strict-float-cast-overflow", "false");		FuncAttrs.addAttribute("strict-float-cast-overflow", "false");
Show All 11 Lines	if (AttrOnCallSite) {
FuncAttrs.addAttribute("stack-protector-buffer-size",		FuncAttrs.addAttribute("stack-protector-buffer-size",
llvm::utostr(CodeGenOpts.SSPBufferSize));		llvm::utostr(CodeGenOpts.SSPBufferSize));
FuncAttrs.addAttribute("no-signed-zeros-fp-math",		FuncAttrs.addAttribute("no-signed-zeros-fp-math",
llvm::toStringRef(CodeGenOpts.NoSignedZeros));		llvm::toStringRef(CodeGenOpts.NoSignedZeros));
FuncAttrs.addAttribute(		FuncAttrs.addAttribute(
"correctly-rounded-divide-sqrt-fp-math",		"correctly-rounded-divide-sqrt-fp-math",
llvm::toStringRef(CodeGenOpts.CorrectlyRoundedDivSqrt));		llvm::toStringRef(CodeGenOpts.CorrectlyRoundedDivSqrt));

if (getLangOpts().OpenCL)
FuncAttrs.addAttribute("denorms-are-zero",
Anastasia: so where would `denorms-are-zero` be emitted now (in case some out of tree implementations rely on this)?
arsenm (Author): Rely on in what sense? Do you have a concrete use of this?
Anastasia: Since it has been emitted before in the module potentially some LLVM implementations could be using that attribute?
arsenm (Author): I'm disinclined to leave things around just in case some unknown user might have been using them. We've dropped attributes like this before (I think the less-precise-fp-mad one for disuse). This also isn't needed for correctness, so it should be pretty safe to drop
Anastasia: Ok, fair enough!
llvm::toStringRef(CodeGenOpts.FlushDenorm));

// TODO: Reciprocal estimate codegen options should apply to instructions?		// TODO: Reciprocal estimate codegen options should apply to instructions?
const std::vector<std::string> &Recips = CodeGenOpts.Reciprocals;		const std::vector<std::string> &Recips = CodeGenOpts.Reciprocals;
if (!Recips.empty())		if (!Recips.empty())
FuncAttrs.addAttribute("reciprocal-estimates",		FuncAttrs.addAttribute("reciprocal-estimates",
llvm::join(Recips, ","));		llvm::join(Recips, ","));

if (!CodeGenOpts.PreferVectorWidth.empty() &&		if (!CodeGenOpts.PreferVectorWidth.empty() &&
CodeGenOpts.PreferVectorWidth != "none")		CodeGenOpts.PreferVectorWidth != "none")
Show All 16 Lines	if (getLangOpts().assumeFunctionsAreConvergent()) {
// applied around them). LLVM will remove this attribute where it safely		// applied around them). LLVM will remove this attribute where it safely
// can.		// can.
FuncAttrs.addAttribute(llvm::Attribute::Convergent);		FuncAttrs.addAttribute(llvm::Attribute::Convergent);
}		}

if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {		if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
// Exceptions aren't supported in CUDA device code.		// Exceptions aren't supported in CUDA device code.
FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);		FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);

// Respect -fcuda-flush-denormals-to-zero.
if (CodeGenOpts.FlushDenorm)
FuncAttrs.addAttribute("nvptx-f32ftz", "true");
}		}

for (StringRef Attr : CodeGenOpts.DefaultFunctionAttrs) {		for (StringRef Attr : CodeGenOpts.DefaultFunctionAttrs) {
StringRef Var, Value;		StringRef Var, Value;
std::tie(Var, Value) = Attr.split('=');		std::tie(Var, Value) = Attr.split('=');
FuncAttrs.addAttribute(Var, Value);		FuncAttrs.addAttribute(Var, Value);
}		}
}		}
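
The attribute emission in the CGCall.cpp hunk above can be sketched without LLVM as follows. This is an illustration under assumptions, not the patch's code: `DenormalMode` and `denormalModeName` are local stand-ins for the LLVM declarations of the same names, and `AttrMap` and `buildDenormalAttrs` are hypothetical helpers. The point it shows is that each of the two attributes, `denormal-fp-math` and `denormal-fp-math-f32`, is added only when its mode is not `Invalid`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Local stand-in for llvm::DenormalMode.
enum class DenormalMode { Invalid = -1, IEEE, PreserveSign, PositiveZero };

// Stand-in for llvm::denormalModeName: maps a mode to its attribute string.
std::string denormalModeName(DenormalMode Mode) {
  switch (Mode) {
  case DenormalMode::IEEE:
    return "ieee";
  case DenormalMode::PreserveSign:
    return "preserve-sign";
  case DenormalMode::PositiveZero:
    return "positive-zero";
  default:
    return ""; // Invalid has no spelling.
  }
}

// Hypothetical container standing in for the function attribute builder.
using AttrMap = std::map<std::string, std::string>;

// Add the generic and the f32-specific attribute independently, skipping
// any mode that is Invalid (i.e. not specified).
AttrMap buildDenormalAttrs(DenormalMode AllTypes, DenormalMode F32) {
  AttrMap Attrs;
  if (AllTypes != DenormalMode::Invalid)
    Attrs["denormal-fp-math"] = denormalModeName(AllTypes);
  if (F32 != DenormalMode::Invalid)
    Attrs["denormal-fp-math-f32"] = denormalModeName(F32);
  return Attrs;
}
```

For example, a target that only has an opinion about f32 would produce a single `denormal-fp-math-f32` entry and leave `denormal-fp-math` absent.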

clang/lib/CodeGen/CodeGenModule.cpp

Show First 20 Lines • Show All 552 Lines • ▼ Show 20 Lines	getModule().addModuleFlag(llvm::Module::Override, "cf-protection-branch",
1);		1);
}		}

if (LangOpts.CUDAIsDevice && getTriple().isNVPTX()) {		if (LangOpts.CUDAIsDevice && getTriple().isNVPTX()) {
// Indicate whether __nvvm_reflect should be configured to flush denormal		// Indicate whether __nvvm_reflect should be configured to flush denormal
// floating point values to 0. (This corresponds to its "__CUDA_FTZ"		// floating point values to 0. (This corresponds to its "__CUDA_FTZ"
// property.)		// property.)
getModule().addModuleFlag(llvm::Module::Override, "nvvm-reflect-ftz",		getModule().addModuleFlag(llvm::Module::Override, "nvvm-reflect-ftz",
CodeGenOpts.FlushDenorm ? 1 : 0);		CodeGenOpts.FP32DenormalMode !=
		llvm::DenormalMode::IEEE);
}		}

// Emit OpenCL specific module metadata: OpenCL/SPIR version.		// Emit OpenCL specific module metadata: OpenCL/SPIR version.
if (LangOpts.OpenCL) {		if (LangOpts.OpenCL) {
EmitOpenCLMetadata();		EmitOpenCLMetadata();
// Emit SPIR version.		// Emit SPIR version.
if (getTriple().isSPIR()) {		if (getTriple().isSPIR()) {
// SPIR v2.0 s2.12 - The SPIR version used by the module is stored in the		// SPIR v2.0 s2.12 - The SPIR version used by the module is stored in the
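
As a reading aid for the `nvvm-reflect-ftz` change above, here is a minimal standalone sketch of the new condition. `DenormalMode` is a local stand-in for `llvm::DenormalMode` and `nvvmReflectFtzFlag` is a hypothetical helper name. As written in the hunk, the flag is set for every f32 mode other than `IEEE`, which includes `Invalid`; whether `Invalid` can actually reach this point for NVPTX depends on toolchain code not shown in this section.

```cpp
#include <cassert>

// Local stand-in for llvm::DenormalMode.
enum class DenormalMode { Invalid = -1, IEEE, PreserveSign, PositiveZero };

// Mirrors `CodeGenOpts.FP32DenormalMode != llvm::DenormalMode::IEEE`:
// returns 1 when __nvvm_reflect should report flush-to-zero, 0 otherwise.
int nvvmReflectFtzFlag(DenormalMode F32Mode) {
  return F32Mode != DenormalMode::IEEE ? 1 : 0;
}
```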

clang/lib/Driver/ToolChains/AMDGPU.h

Show First 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	public:

llvm::opt::DerivedArgList *		llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,		TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override;		Action::OffloadKind DeviceOffloadKind) const override;

void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,		void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override;		Action::OffloadKind DeviceOffloadKind) const override;

		llvm::DenormalMode getDefaultDenormalModeForType(
		const llvm::opt::ArgList &DriverArgs,
		Action::OffloadKind DeviceOffloadKind,
		const llvm::fltSemantics *FPType = nullptr) const override;
};		};

} // end namespace toolchains		} // end namespace toolchains
} // end namespace driver		} // end namespace driver
} // end namespace clang		} // end namespace clang

#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPU_H		#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPU_H

clang/lib/Driver/ToolChains/AMDGPU.cpp

//===--- AMDGPU.cpp - AMDGPU ToolChain Implementations ----------- C++ --===//		//===--- AMDGPU.cpp - AMDGPU ToolChain Implementations ----------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "CommonArgs.h"		#include "CommonArgs.h"
#include "InputInfo.h"		#include "InputInfo.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "llvm/Option/ArgList.h"		#include "llvm/Option/ArgList.h"
		#include "llvm/Support/TargetParser.h"

using namespace clang::driver;		using namespace clang::driver;
using namespace clang::driver::tools;		using namespace clang::driver::tools;
using namespace clang::driver::toolchains;		using namespace clang::driver::toolchains;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;

void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,		void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	if (!Args.hasArg(options::OPT_O, options::OPT_O0, options::OPT_O4,
options::OPT_Ofast))		options::OPT_Ofast))
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_O),		DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_O),
getOptionDefault(options::OPT_O));		getOptionDefault(options::OPT_O));
}		}

return DAL;		return DAL;
}		}

		llvm::DenormalMode AMDGPUToolChain::getDefaultDenormalModeForType(
		const llvm::opt::ArgList &DriverArgs, Action::OffloadKind DeviceOffloadKind,
		const llvm::fltSemantics *FPType) const {
		// Denormals should always be enabled for f16 and f64.
		if (!FPType || FPType != &llvm::APFloat::IEEEsingle())
		return llvm::DenormalMode::IEEE;

		if (DeviceOffloadKind == Action::OFK_Cuda) {
		if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&
		DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
		options::OPT_fno_cuda_flush_denormals_to_zero,
		false))
		return llvm::DenormalMode::PreserveSign;
		}

		const StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);
		auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);

		// Default to enabling f32 denormals by default on subtargets where fma is
		// fast with denormals

		const unsigned ArchAttr = llvm::AMDGPU::getArchAttrAMDGCN(Kind);
		const bool DefaultDenormsAreZeroForTarget =
		(ArchAttr & llvm::AMDGPU::FEATURE_FAST_FMA_F32) &&
		(ArchAttr & llvm::AMDGPU::FEATURE_FAST_DENORMAL_F32);

		// TODO: There are way too many flags that change this. Do we need to check
		// them all?
		bool DAZ = DriverArgs.hasArg(options::OPT_cl_denorms_are_zero) ||
		!DefaultDenormsAreZeroForTarget;
		// Outputs are flushed to zero, preserving sign
		return DAZ ? llvm::DenormalMode::PreserveSign : llvm::DenormalMode::IEEE;
		}

void AMDGPUToolChain::addClangTargetOptions(		void AMDGPUToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
// Default to "hidden" visibility, as object level linking will not be		// Default to "hidden" visibility, as object level linking will not be
// supported for the foreseeable future.		// supported for the foreseeable future.
if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,		if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,
options::OPT_fvisibility_ms_compat)) {		options::OPT_fvisibility_ms_compat)) {
CC1Args.push_back("-fvisibility");		CC1Args.push_back("-fvisibility");
CC1Args.push_back("hidden");		CC1Args.push_back("hidden");
CC1Args.push_back("-fapply-global-visibility-to-externs");		CC1Args.push_back("-fapply-global-visibility-to-externs");
}		}
}		}
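
The decision order in `AMDGPUToolChain::getDefaultDenormalModeForType` above can be restated as a standalone sketch. Everything here is a stand-in: `DenormalMode` imitates `llvm::DenormalMode`, and `DriverInputs` and `defaultModeSketch` are hypothetical names that replace the real argument list, offload kind, and `llvm::AMDGPU` feature queries with plain booleans. The logic follows the hunk: non-f32 queries are always IEEE, the CUDA flush flag wins next, and otherwise f32 is flushed when `-cl-denorms-are-zero` is given or the subtarget lacks fast FMA with denormals.

```cpp
#include <cassert>

// Local stand-in for llvm::DenormalMode.
enum class DenormalMode { Invalid = -1, IEEE, PreserveSign, PositiveZero };

// Hypothetical flattened view of what the driver hook inspects.
struct DriverInputs {
  bool IsF32Query;         // FPType == &APFloat::IEEEsingle()
  bool IsCudaOffload;      // DeviceOffloadKind == Action::OFK_Cuda
  bool CudaFlushDenormals; // -fcuda-flush-denormals-to-zero in effect
  bool ClDenormsAreZero;   // -cl-denorms-are-zero present
  bool FastFmaF32;         // FEATURE_FAST_FMA_F32 for the selected -mcpu
  bool FastDenormalF32;    // FEATURE_FAST_DENORMAL_F32 for the selected -mcpu
};

DenormalMode defaultModeSketch(const DriverInputs &In) {
  // Denormals stay enabled for f16/f64 and for the untyped query.
  if (!In.IsF32Query)
    return DenormalMode::IEEE;

  // CUDA offload honors the explicit flush flag first.
  if (In.IsCudaOffload && In.CudaFlushDenormals)
    return DenormalMode::PreserveSign;

  // Keep f32 denormals only where FMA is fast with denormals.
  const bool TargetKeepsDenormals = In.FastFmaF32 && In.FastDenormalF32;
  const bool Flush = In.ClDenormsAreZero || !TargetKeepsDenormals;
  return Flush ? DenormalMode::PreserveSign : DenormalMode::IEEE;
}
```

Note that the flush result is `PreserveSign`, matching the "Outputs are flushed to zero, preserving sign" comment in the patch.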

clang/lib/Driver/ToolChains/Clang.cpp

Show First 20 Lines • Show All 2,215 Lines • ▼ Show 20 Lines	static void CollectArgsForIntegratedAssembler(Compilation &C,
// forward -fembed-bitcode to assmebler		// forward -fembed-bitcode to assmebler
if (C.getDriver().embedBitcodeEnabled() ||		if (C.getDriver().embedBitcodeEnabled() ||
C.getDriver().embedBitcodeMarkerOnly())		C.getDriver().embedBitcodeMarkerOnly())
Args.AddLastArg(CmdArgs, options::OPT_fembed_bitcode_EQ);		Args.AddLastArg(CmdArgs, options::OPT_fembed_bitcode_EQ);
}		}

static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,		static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
bool OFastEnabled, const ArgList &Args,		bool OFastEnabled, const ArgList &Args,
ArgStringList &CmdArgs) {		ArgStringList &CmdArgs,
		Action::OffloadKind DeviceOffloadKind) {
// Handle various floating point optimization flags, mapping them to the		// Handle various floating point optimization flags, mapping them to the
// appropriate LLVM code generation flags. This is complicated by several		// appropriate LLVM code generation flags. This is complicated by several
// "umbrella" flags, so we do this by stepping through the flags incrementally		// "umbrella" flags, so we do this by stepping through the flags incrementally
// adjusting what we think is enabled/disabled, then at the end setting the		// adjusting what we think is enabled/disabled, then at the end setting the
// LLVM flags based on the final state.		// LLVM flags based on the final state.
bool HonorINFs = true;		bool HonorINFs = true;
bool HonorNaNs = true;		bool HonorNaNs = true;
// -fmath-errno is the default on some platforms, e.g. BSD-derived OSes.		// -fmath-errno is the default on some platforms, e.g. BSD-derived OSes.
bool MathErrno = TC.IsMathErrnoDefault();		bool MathErrno = TC.IsMathErrnoDefault();
bool AssociativeMath = false;		bool AssociativeMath = false;
bool ReciprocalMath = false;		bool ReciprocalMath = false;
bool SignedZeros = true;		bool SignedZeros = true;
bool TrappingMath = false; // Implemented via -ffp-exception-behavior		bool TrappingMath = false; // Implemented via -ffp-exception-behavior
bool TrappingMathPresent = false; // Is trapping-math in args, and not		bool TrappingMathPresent = false; // Is trapping-math in args, and not
// overriden by ffp-exception-behavior?		// overriden by ffp-exception-behavior?
		cameron.mcinally: Last line of comment was not removed. Also, is it safe to remove `TrappingMathPresent`? Is that part of the work-in-progress to support `ffp-exception-behavior`?
		arsenm (Author): I think this is a rebase gone bad. The patch changing the strict math was reverted and recommitted and I probably broke this
		andrew.w.kaylor: Looks like this is still wrong. You didn't intend to change either TrappingMath flag, did you?
bool RoundingFPMath = false;		bool RoundingFPMath = false;
bool RoundingMathPresent = false; // Is rounding-math in args?		bool RoundingMathPresent = false; // Is rounding-math in args?
// -ffp-model values: strict, fast, precise		// -ffp-model values: strict, fast, precise
StringRef FPModel = "";		StringRef FPModel = "";
// -ffp-exception-behavior options: strict, maytrap, ignore		// -ffp-exception-behavior options: strict, maytrap, ignore
StringRef FPExceptionBehavior = "";		StringRef FPExceptionBehavior = "";
StringRef DenormalFPMath = "";		const llvm::DenormalMode DefaultDenormalFPMath =
		TC.getDefaultDenormalModeForType(Args, DeviceOffloadKind);
		llvm::DenormalMode DenormalFPMath = DefaultDenormalFPMath;
StringRef FPContract = "";		StringRef FPContract = "";
bool StrictFPModel = false;		bool StrictFPModel = false;

		llvm::DenormalMode DenormalFP32Math = TC.getDefaultDenormalModeForType(
		Args, DeviceOffloadKind, &llvm::APFloat::IEEEsingle());

if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) {		if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) {
CmdArgs.push_back("-mlimit-float-precision");		CmdArgs.push_back("-mlimit-float-precision");
CmdArgs.push_back(A->getValue());		CmdArgs.push_back(A->getValue());
}		}

for (const Arg *A : Args) {		for (const Arg *A : Args) {
auto optID = A->getOption().getID();		auto optID = A->getOption().getID();
bool PreciseFPModel = false;		bool PreciseFPModel = false;
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	case options::OPT_frounding_math:
RoundingMathPresent = true;		RoundingMathPresent = true;
break;		break;
case options::OPT_fno_rounding_math:		case options::OPT_fno_rounding_math:
RoundingFPMath = false;		RoundingFPMath = false;
RoundingMathPresent = false;		RoundingMathPresent = false;
break;		break;

case options::OPT_fdenormal_fp_math_EQ:		case options::OPT_fdenormal_fp_math_EQ:
DenormalFPMath = A->getValue();		DenormalFPMath = llvm::parseDenormalFPAttribute(A->getValue());
		if (DenormalFPMath == llvm::DenormalMode::Invalid) {
		D.Diag(diag::err_drv_invalid_value)
		<< A->getAsString(Args) << A->getValue();
		}
		break;

		case options::OPT_fdenormal_fp_math_f32_EQ:
		DenormalFP32Math = llvm::parseDenormalFPAttribute(A->getValue());
		if (DenormalFP32Math == llvm::DenormalMode::Invalid) {
		D.Diag(diag::err_drv_invalid_value)
		<< A->getAsString(Args) << A->getValue();
		}
break;		break;

// Validate and pass through -ffp-contract option.		// Validate and pass through -ffp-contract option.
case options::OPT_ffp_contract: {		case options::OPT_ffp_contract: {
StringRef Val = A->getValue();		StringRef Val = A->getValue();
if (PreciseFPModel) {		if (PreciseFPModel) {
// -ffp-model=precise enables ffp-contract=fast as a side effect		// -ffp-model=precise enables ffp-contract=fast as a side effect
// the FPContract value has already been set to a string literal		// the FPContract value has already been set to a string literal
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	case options::OPT_funsafe_math_optimizations:
break;		break;
case options::OPT_fno_unsafe_math_optimizations:		case options::OPT_fno_unsafe_math_optimizations:
AssociativeMath = false;		AssociativeMath = false;
ReciprocalMath = false;		ReciprocalMath = false;
SignedZeros = true;		SignedZeros = true;
TrappingMath = true;		TrappingMath = true;
FPExceptionBehavior = "strict";		FPExceptionBehavior = "strict";
// -fno_unsafe_math_optimizations restores default denormal handling		// -fno_unsafe_math_optimizations restores default denormal handling
DenormalFPMath = "";		DenormalFPMath = DefaultDenormalFPMath;
		andrew.w.kaylor: Shouldn't this also restore DenormalFP32Math to its default value?
break;		break;

case options::OPT_Ofast:		case options::OPT_Ofast:
// If -Ofast is the optimization level, then -ffast-math should be enabled		// If -Ofast is the optimization level, then -ffast-math should be enabled
if (!OFastEnabled)		if (!OFastEnabled)
continue;		continue;
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case options::OPT_ffast_math:		case options::OPT_ffast_math:
Show All 16 Lines	case options::OPT_fno_fast_math:
// toolchain default (which may be false).		// toolchain default (which may be false).
MathErrno = TC.IsMathErrnoDefault();		MathErrno = TC.IsMathErrnoDefault();
AssociativeMath = false;		AssociativeMath = false;
ReciprocalMath = false;		ReciprocalMath = false;
SignedZeros = true;		SignedZeros = true;
TrappingMath = false;		TrappingMath = false;
RoundingFPMath = false;		RoundingFPMath = false;
// -fno_fast_math restores default denormal and fpcontract handling		// -fno_fast_math restores default denormal and fpcontract handling
DenormalFPMath = "";		DenormalFPMath = DefaultDenormalFPMath;
FPContract = "";		FPContract = "";
break;		break;
}		}
if (StrictFPModel) {		if (StrictFPModel) {
// If -ffp-model=strict has been specified on command line but		// If -ffp-model=strict has been specified on command line but
// subsequent options conflict then emit warning diagnostic.		// subsequent options conflict then emit warning diagnostic.
if (HonorINFs && HonorNaNs &&		if (HonorINFs && HonorNaNs &&
		arsenm (Author): I think this should just follow along with DenormalFPMath, but I'll put this off to the later patch since this one still is slightly awkward trying to avoid changing the meaning of the absence of the option
!AssociativeMath && !ReciprocalMath &&		!AssociativeMath && !ReciprocalMath &&
SignedZeros && TrappingMath && RoundingFPMath &&		SignedZeros && TrappingMath && RoundingFPMath &&
DenormalFPMath.empty() && FPContract.empty())		DenormalFPMath.empty() && FPContract.empty())
// OK: Current Arg doesn't conflict with -ffp-model=strict		// OK: Current Arg doesn't conflict with -ffp-model=strict
;		;
else {		else {
StrictFPModel = false;		StrictFPModel = false;
FPModel = "";		FPModel = "";
Show All 32 Lines	static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,

if (TrappingMath) {		if (TrappingMath) {
// FP Exception Behavior is also set to strict		// FP Exception Behavior is also set to strict
assert(FPExceptionBehavior.equals("strict"));		assert(FPExceptionBehavior.equals("strict"));
CmdArgs.push_back("-ftrapping-math");		CmdArgs.push_back("-ftrapping-math");
} else if (TrappingMathPresent)		} else if (TrappingMathPresent)
CmdArgs.push_back("-fno-trapping-math");		CmdArgs.push_back("-fno-trapping-math");

if (!DenormalFPMath.empty())		// TODO: Omit flag for the default IEEE instead
CmdArgs.push_back(		if (DenormalFPMath != llvm::DenormalMode::Invalid) {
Args.MakeArgString("-fdenormal-fp-math=" + DenormalFPMath));		CmdArgs.push_back(Args.MakeArgString(
		"-fdenormal-fp-math=" + llvm::subnormalModeName(DenormalFPMath)));
		}

		if (DenormalFP32Math != llvm::DenormalMode::Invalid) {
		CmdArgs.push_back(Args.MakeArgString(
		"-fdenormal-fp-math-f32=" + llvm::subnormalModeName(DenormalFP32Math)));
		}

if (!FPContract.empty())		if (!FPContract.empty())
CmdArgs.push_back(Args.MakeArgString("-ffp-contract=" + FPContract));		CmdArgs.push_back(Args.MakeArgString("-ffp-contract=" + FPContract));

if (!RoundingFPMath)		if (!RoundingFPMath)
CmdArgs.push_back(Args.MakeArgString("-fno-rounding-math"));		CmdArgs.push_back(Args.MakeArgString("-fno-rounding-math"));

if (RoundingFPMath && RoundingMathPresent)		if (RoundingFPMath && RoundingMathPresent)
▲ Show 20 Lines • Show All 198 Lines • ▼ Show 20 Lines	if (!TrivialAutoVarInit.empty()) {
if (TrivialAutoVarInit == "zero" && !Args.hasArg(options::OPT_enable_trivial_var_init_zero))		if (TrivialAutoVarInit == "zero" && !Args.hasArg(options::OPT_enable_trivial_var_init_zero))
D.Diag(diag::err_drv_trivial_auto_var_init_zero_disabled);		D.Diag(diag::err_drv_trivial_auto_var_init_zero_disabled);
CmdArgs.push_back(		CmdArgs.push_back(
Args.MakeArgString("-ftrivial-auto-var-init=" + TrivialAutoVarInit));		Args.MakeArgString("-ftrivial-auto-var-init=" + TrivialAutoVarInit));
}		}
}		}

static void RenderOpenCLOptions(const ArgList &Args, ArgStringList &CmdArgs) {		static void RenderOpenCLOptions(const ArgList &Args, ArgStringList &CmdArgs) {
		// cl-denorms-are-zero is not forwarded. It is translated into a generic flag
		// for denormal flushing handling based on the target.
const unsigned ForwardedArguments[] = {		const unsigned ForwardedArguments[] = {
options::OPT_cl_opt_disable,		options::OPT_cl_opt_disable,
options::OPT_cl_strict_aliasing,		options::OPT_cl_strict_aliasing,
options::OPT_cl_single_precision_constant,		options::OPT_cl_single_precision_constant,
options::OPT_cl_finite_math_only,		options::OPT_cl_finite_math_only,
options::OPT_cl_kernel_arg_info,		options::OPT_cl_kernel_arg_info,
options::OPT_cl_unsafe_math_optimizations,		options::OPT_cl_unsafe_math_optimizations,
options::OPT_cl_fast_relaxed_math,		options::OPT_cl_fast_relaxed_math,
options::OPT_cl_mad_enable,		options::OPT_cl_mad_enable,
options::OPT_cl_no_signed_zeros,		options::OPT_cl_no_signed_zeros,
options::OPT_cl_denorms_are_zero,
options::OPT_cl_fp32_correctly_rounded_divide_sqrt,		options::OPT_cl_fp32_correctly_rounded_divide_sqrt,
options::OPT_cl_uniform_work_group_size		options::OPT_cl_uniform_work_group_size
};		};

if (Arg *A = Args.getLastArg(options::OPT_cl_std_EQ)) {		if (Arg *A = Args.getLastArg(options::OPT_cl_std_EQ)) {
std::string CLStdStr = std::string("-cl-std=") + A->getValue();		std::string CLStdStr = std::string("-cl-std=") + A->getValue();
CmdArgs.push_back(Args.MakeArgString(CLStdStr));		CmdArgs.push_back(Args.MakeArgString(CLStdStr));
}		}
▲ Show 20 Lines • Show All 1,161 Lines • ▼ Show 20 Lines	for (const auto &A : Args)
D.Diag(diag::err_drv_unsupported_embed_bitcode) << A->getSpelling();		D.Diag(diag::err_drv_unsupported_embed_bitcode) << A->getSpelling();

// Render the CodeGen options that need to be passed.		// Render the CodeGen options that need to be passed.
if (!Args.hasFlag(options::OPT_foptimize_sibling_calls,		if (!Args.hasFlag(options::OPT_foptimize_sibling_calls,
options::OPT_fno_optimize_sibling_calls))		options::OPT_fno_optimize_sibling_calls))
CmdArgs.push_back("-mdisable-tail-calls");		CmdArgs.push_back("-mdisable-tail-calls");

RenderFloatingPointOptions(TC, D, isOptimizationLevelFast(Args), Args,		RenderFloatingPointOptions(TC, D, isOptimizationLevelFast(Args), Args,
CmdArgs);		CmdArgs, JA.getOffloadingDeviceKind());

// Render ABI arguments		// Render ABI arguments
switch (TC.getArch()) {		switch (TC.getArch()) {
default: break;		default: break;
case llvm::Triple::arm:		case llvm::Triple::arm:
case llvm::Triple::armeb:		case llvm::Triple::armeb:
case llvm::Triple::thumbeb:		case llvm::Triple::thumbeb:
RenderARMABI(Triple, Args, CmdArgs);		RenderARMABI(Triple, Args, CmdArgs);
[...] #endif

Args.AddLastArg(CmdArgs, options::OPT_ffine_grained_bitfield_accesses,		Args.AddLastArg(CmdArgs, options::OPT_ffine_grained_bitfield_accesses,
options::OPT_fno_fine_grained_bitfield_accesses);		options::OPT_fno_fine_grained_bitfield_accesses);

// Handle segmented stacks.		// Handle segmented stacks.
if (Args.hasArg(options::OPT_fsplit_stack))		if (Args.hasArg(options::OPT_fsplit_stack))
CmdArgs.push_back("-split-stacks");		CmdArgs.push_back("-split-stacks");

RenderFloatingPointOptions(TC, D, OFastEnabled, Args, CmdArgs);		RenderFloatingPointOptions(TC, D, OFastEnabled, Args, CmdArgs,
		JA.getOffloadingDeviceKind());

if (Arg *A = Args.getLastArg(options::OPT_LongDouble_Group)) {		if (Arg *A = Args.getLastArg(options::OPT_LongDouble_Group)) {
if (TC.getArch() == llvm::Triple::x86 ||		if (TC.getArch() == llvm::Triple::x86 ||
TC.getArch() == llvm::Triple::x86_64)		TC.getArch() == llvm::Triple::x86_64)
A->render(Args, CmdArgs);		A->render(Args, CmdArgs);
else if ((TC.getArch() == llvm::Triple::ppc || TC.getTriple().isPPC64()) &&		else if ((TC.getArch() == llvm::Triple::ppc || TC.getTriple().isPPC64()) &&
(A->getOption().getID() != options::OPT_mlong_double_80))		(A->getOption().getID() != options::OPT_mlong_double_80))
A->render(Args, CmdArgs);		A->render(Args, CmdArgs);

clang/lib/Driver/ToolChains/Cuda.h

[...] public:

llvm::opt::DerivedArgList *		llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,		TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override;		Action::OffloadKind DeviceOffloadKind) const override;
void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,		void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override;		Action::OffloadKind DeviceOffloadKind) const override;

		llvm::DenormalMode getDefaultDenormalModeForType(
		const llvm::opt::ArgList &DriverArgs,
		Action::OffloadKind DeviceOffloadKind,
		const llvm::fltSemantics *FPType = nullptr) const override;

// Never try to use the integrated assembler with CUDA; always fork out to		// Never try to use the integrated assembler with CUDA; always fork out to
// ptxas.		// ptxas.
bool useIntegratedAs() const override { return false; }		bool useIntegratedAs() const override { return false; }
bool isCrossCompiling() const override { return true; }		bool isCrossCompiling() const override { return true; }
bool isPICDefault() const override { return false; }		bool isPICDefault() const override { return false; }
bool isPIEDefault() const override { return false; }		bool isPIEDefault() const override { return false; }
bool isPICDefaultForced() const override { return false; }		bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }		bool SupportsProfiling() const override { return false; }

clang/lib/Driver/ToolChains/Cuda.cpp

[...]
#include "clang/Driver/Driver.h"		#include "clang/Driver/Driver.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
#include "llvm/Option/ArgList.h"		#include "llvm/Option/ArgList.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/Process.h"		#include "llvm/Support/Process.h"
#include "llvm/Support/Program.h"		#include "llvm/Support/Program.h"
		#include "llvm/Support/TargetParser.h"
#include "llvm/Support/VirtualFileSystem.h"		#include "llvm/Support/VirtualFileSystem.h"
#include <system_error>		#include <system_error>

using namespace clang::driver;		using namespace clang::driver;
using namespace clang::driver::toolchains;		using namespace clang::driver::toolchains;
using namespace clang::driver::tools;		using namespace clang::driver::tools;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;
[...] void CudaToolChain::addClangTargetOptions(
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");		assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
assert((DeviceOffloadingKind == Action::OFK_OpenMP ||		assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
DeviceOffloadingKind == Action::OFK_Cuda) &&		DeviceOffloadingKind == Action::OFK_Cuda) &&
"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");		"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");

if (DeviceOffloadingKind == Action::OFK_Cuda) {		if (DeviceOffloadingKind == Action::OFK_Cuda) {
CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");

if (DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
options::OPT_fno_cuda_flush_denormals_to_zero, false))
CC1Args.push_back("-fcuda-flush-denormals-to-zero");

if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,		if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,
options::OPT_fno_cuda_approx_transcendentals, false))		options::OPT_fno_cuda_approx_transcendentals, false))
CC1Args.push_back("-fcuda-approx-transcendentals");		CC1Args.push_back("-fcuda-approx-transcendentals");

if (DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,		if (DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
false))		false))
CC1Args.push_back("-fgpu-rdc");		CC1Args.push_back("-fgpu-rdc");
}		}
[...] for (StringRef LibraryPath : LibraryPaths) {
}		}
}		}
if (!FoundBCLibrary)		if (!FoundBCLibrary)
getDriver().Diag(diag::warn_drv_omp_offload_target_missingbcruntime)		getDriver().Diag(diag::warn_drv_omp_offload_target_missingbcruntime)
<< LibOmpTargetName;		<< LibOmpTargetName;
}		}
}		}

		llvm::DenormalMode CudaToolChain::getDefaultDenormalModeForType(
		const llvm::opt::ArgList &DriverArgs, Action::OffloadKind DeviceOffloadKind,
		const llvm::fltSemantics *FPType) const {
		if (DeviceOffloadKind == Action::OFK_Cuda) {
		if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&
		DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
		options::OPT_fno_cuda_flush_denormals_to_zero,
		false))
		return llvm::DenormalMode::PreserveSign;
		}

		assert(DeviceOffloadKind != Action::OFK_Host);
		return llvm::DenormalMode::IEEE;
		}

bool CudaToolChain::supportsDebugInfoOption(const llvm::opt::Arg *A) const {		bool CudaToolChain::supportsDebugInfoOption(const llvm::opt::Arg *A) const {
const Option &O = A->getOption();		const Option &O = A->getOption();
return (O.matches(options::OPT_gN_Group) &&		return (O.matches(options::OPT_gN_Group) &&
!O.matches(options::OPT_gmodules)) ||		!O.matches(options::OPT_gmodules)) ||
O.matches(options::OPT_g_Flag) ||		O.matches(options::OPT_g_Flag) ||
O.matches(options::OPT_ggdbN_Group) || O.matches(options::OPT_ggdb) ||		O.matches(options::OPT_ggdbN_Group) || O.matches(options::OPT_ggdb) ||
O.matches(options::OPT_gdwarf) || O.matches(options::OPT_gdwarf_2) ||		O.matches(options::OPT_gdwarf) || O.matches(options::OPT_gdwarf_2) ||
O.matches(options::OPT_gdwarf_3) || O.matches(options::OPT_gdwarf_4) ||		O.matches(options::OPT_gdwarf_3) || O.matches(options::OPT_gdwarf_4) ||
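The new CudaToolChain::getDefaultDenormalModeForType hunk above reduces to a small decision: only f32 is flushed, only for CUDA device compilation, and only when flushing was requested. A minimal standalone sketch of that decision follows. The enums and the function name are hypothetical stand-ins, not the real llvm::DenormalMode, Action::OffloadKind, or llvm::fltSemantics types, and the boolean parameter stands in for the DriverArgs.hasFlag query on -fcuda-flush-denormals-to-zero.

```cpp
#include <cassert>

// Hypothetical stand-ins for llvm::DenormalMode, Action::OffloadKind and
// llvm::fltSemantics. None of these are the real Clang/LLVM types.
enum class DenormalMode { IEEE, PreserveSign };
enum class OffloadKind { Host, Cuda, OpenMP };
enum class FPType { Unspecified, Single, Double };

// Mirrors the shape of the function in the diff: f32 on the CUDA device side
// is flushed (sign preserved) when requested; everything else stays IEEE.
DenormalMode defaultDenormalModeForType(OffloadKind Kind, FPType Type,
                                        bool FlushDenormalsToZero) {
  if (Kind == OffloadKind::Cuda && Type == FPType::Single &&
      FlushDenormalsToZero)
    return DenormalMode::PreserveSign;

  // As in the diff, the device toolchain is not expected to be queried for
  // host compilation.
  assert(Kind != OffloadKind::Host);
  return DenormalMode::IEEE;
}
```

Under these assumptions, a query for f64, or for f32 under OpenMP offloading, yields IEEE even when flushing was requested, which matches the f32-only intent stated in the summary.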

clang/lib/Driver/ToolChains/HIP.cpp

[...] void HIPToolChain::addClangTargetOptions(
(void) GpuArch;		(void) GpuArch;
assert(DeviceOffloadingKind == Action::OFK_HIP &&		assert(DeviceOffloadingKind == Action::OFK_HIP &&
"Only HIP offloading kinds are supported for GPUs.");		"Only HIP offloading kinds are supported for GPUs.");

CC1Args.push_back("-target-cpu");		CC1Args.push_back("-target-cpu");
CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));		CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));
CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");

if (DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
options::OPT_fno_cuda_flush_denormals_to_zero, false))
CC1Args.push_back("-fcuda-flush-denormals-to-zero");

if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,		if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,
options::OPT_fno_cuda_approx_transcendentals, false))		options::OPT_fno_cuda_approx_transcendentals, false))
CC1Args.push_back("-fcuda-approx-transcendentals");		CC1Args.push_back("-fcuda-approx-transcendentals");

if (DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,		if (DriverArgs.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
false))		false))
CC1Args.push_back("-fgpu-rdc");		CC1Args.push_back("-fgpu-rdc");


clang/lib/Frontend/CompilerInvocation.cpp

[...] Opts.NoNaNsFPMath = (Args.hasArg(OPT_menable_no_nans) ||
Args.hasArg(OPT_cl_unsafe_math_optimizations) ||		Args.hasArg(OPT_cl_unsafe_math_optimizations) ||
Args.hasArg(OPT_cl_finite_math_only) ||		Args.hasArg(OPT_cl_finite_math_only) ||
Args.hasArg(OPT_cl_fast_relaxed_math));		Args.hasArg(OPT_cl_fast_relaxed_math));
Opts.NoSignedZeros = (Args.hasArg(OPT_fno_signed_zeros) ||		Opts.NoSignedZeros = (Args.hasArg(OPT_fno_signed_zeros) ||
Args.hasArg(OPT_cl_no_signed_zeros) ||		Args.hasArg(OPT_cl_no_signed_zeros) ||
Args.hasArg(OPT_cl_unsafe_math_optimizations) ||		Args.hasArg(OPT_cl_unsafe_math_optimizations) ||
Args.hasArg(OPT_cl_fast_relaxed_math));		Args.hasArg(OPT_cl_fast_relaxed_math));
Opts.Reassociate = Args.hasArg(OPT_mreassociate);		Opts.Reassociate = Args.hasArg(OPT_mreassociate);
Opts.FlushDenorm = Args.hasArg(OPT_cl_denorms_are_zero) ||
(Args.hasArg(OPT_fcuda_is_device) &&
Args.hasArg(OPT_fcuda_flush_denormals_to_zero));
Opts.CorrectlyRoundedDivSqrt =		Opts.CorrectlyRoundedDivSqrt =
Args.hasArg(OPT_cl_fp32_correctly_rounded_divide_sqrt);		Args.hasArg(OPT_cl_fp32_correctly_rounded_divide_sqrt);
Opts.UniformWGSize =		Opts.UniformWGSize =
Args.hasArg(OPT_cl_uniform_work_group_size);		Args.hasArg(OPT_cl_uniform_work_group_size);
Opts.Reciprocals = Args.getAllArgValues(OPT_mrecip_EQ);		Opts.Reciprocals = Args.getAllArgValues(OPT_mrecip_EQ);
Opts.ReciprocalMath = Args.hasArg(OPT_freciprocal_math);		Opts.ReciprocalMath = Args.hasArg(OPT_freciprocal_math);
Opts.NoTrappingMath = Args.hasArg(OPT_fno_trapping_math);		Opts.NoTrappingMath = Args.hasArg(OPT_fno_trapping_math);
Opts.StrictFloatCastOverflow =		Opts.StrictFloatCastOverflow =
[...] static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK,

if (Arg *A = Args.getLastArg(OPT_fdenormal_fp_math_EQ)) {		if (Arg *A = Args.getLastArg(OPT_fdenormal_fp_math_EQ)) {
StringRef Val = A->getValue();		StringRef Val = A->getValue();
Opts.FPDenormalMode = llvm::parseDenormalFPAttribute(Val);		Opts.FPDenormalMode = llvm::parseDenormalFPAttribute(Val);
if (Opts.FPDenormalMode == llvm::DenormalMode::Invalid)		if (Opts.FPDenormalMode == llvm::DenormalMode::Invalid)
Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;		Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;
}		}

		if (Arg *A = Args.getLastArg(OPT_fdenormal_fp_math_f32_EQ)) {
		StringRef Val = A->getValue();
		Opts.FP32DenormalMode = llvm::parseDenormalFPAttribute(Val);
		if (Opts.FP32DenormalMode == llvm::DenormalMode::Invalid)
		Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;
		}

if (Arg *A = Args.getLastArg(OPT_fpcc_struct_return, OPT_freg_struct_return)) {		if (Arg *A = Args.getLastArg(OPT_fpcc_struct_return, OPT_freg_struct_return)) {
if (A->getOption().matches(OPT_fpcc_struct_return)) {		if (A->getOption().matches(OPT_fpcc_struct_return)) {
Opts.setStructReturnConvention(CodeGenOptions::SRCK_OnStack);		Opts.setStructReturnConvention(CodeGenOptions::SRCK_OnStack);
} else {		} else {
assert(A->getOption().matches(OPT_freg_struct_return));		assert(A->getOption().matches(OPT_freg_struct_return));
Opts.setStructReturnConvention(CodeGenOptions::SRCK_InRegs);		Opts.setStructReturnConvention(CodeGenOptions::SRCK_InRegs);
}		}
}		}
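The two option-parsing hunks above follow the same pattern: read the last occurrence of the flag, parse its string value into a denormal mode, and report an error if the result is invalid. A rough standalone sketch of that pattern follows, using a hypothetical enum and helper names rather than the real llvm::parseDenormalFPAttribute or CodeGenOptions. The three accepted spellings are the values LLVM documents for the denormal-fp-math attribute; the Invalid default for unset fields is an assumption made only for this illustration.

```cpp
#include <string_view>

// Hypothetical stand-in for llvm::DenormalMode; not the real LLVM type.
enum class DenormalMode { Invalid, IEEE, PreserveSign, PositiveZero };

// Maps the textual value of -fdenormal-fp-math= or -fdenormal-fp-math-f32=
// to a mode. Unknown spellings yield Invalid so the caller can diagnose them.
DenormalMode parseDenormalMode(std::string_view Val) {
  if (Val == "ieee")
    return DenormalMode::IEEE;
  if (Val == "preserve-sign")
    return DenormalMode::PreserveSign;
  if (Val == "positive-zero")
    return DenormalMode::PositiveZero;
  return DenormalMode::Invalid;
}

// Hypothetical options record with separate generic and f32-specific modes,
// echoing Opts.FPDenormalMode and Opts.FP32DenormalMode in the diff. The
// Invalid defaults are an assumption of this sketch.
struct DenormalOptions {
  DenormalMode Generic = DenormalMode::Invalid;
  DenormalMode F32 = DenormalMode::Invalid;
};

// Stores the parsed mode and returns false when the value was not recognised,
// which is the point where the real code emits err_drv_invalid_value.
bool setF32DenormalMode(DenormalOptions &Opts, std::string_view Val) {
  Opts.F32 = parseDenormalMode(Val);
  return Opts.F32 != DenormalMode::Invalid;
}
```

Keeping the f32 mode in its own field lets a target flush f32 denormals without changing how f16 and f64 are handled, which is the split the summary asks for.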

clang/test/CodeGenCUDA/flush-denormals.cu

	// RUN: %clang_cc1 -fcuda-is-device \			// RUN: %clang_cc1 -fcuda-is-device \
	// RUN: -triple nvptx-nvidia-cuda -emit-llvm -o - %s | \			// RUN: -triple nvptx-nvidia-cuda -emit-llvm -o - %s | \
	// RUN: FileCheck %s -check-prefix CHECK -check-prefix NOFTZ			// RUN: FileCheck -check-prefix=DEFAULT %s
	// RUN: %clang_cc1 -fcuda-is-device -fcuda-flush-denormals-to-zero \
				// RUN: %clang_cc1 -fcuda-is-device -fdenormal-fp-math-f32=ieee \
				// RUN: -triple nvptx-nvidia-cuda -emit-llvm -o - %s | \
				// RUN: FileCheck -check-prefix=NOFTZ %s

				// RUN: %clang_cc1 -fcuda-is-device -fdenormal-fp-math-f32=preserve-sign \
	// RUN: -triple nvptx-nvidia-cuda -emit-llvm -o - %s | \			// RUN: -triple nvptx-nvidia-cuda -emit-llvm -o - %s | \
	// RUN: FileCheck %s -check-prefix CHECK -check-prefix FTZ			// RUN: FileCheck -check-prefix=FTZ %s

				// FIXME: Unspecified should default to ieee
	// RUN: %clang_cc1 -fcuda-is-device -x hip \			// RUN: %clang_cc1 -fcuda-is-device -x hip \
	// RUN: -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm -o - %s | \			// RUN: -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm -o - %s | \
	// RUN: FileCheck %s -check-prefix CHECK -check-prefix AMDNOFTZ			// RUN: FileCheck -check-prefix=AMDFTZ %s
	// RUN: %clang_cc1 -fcuda-is-device -x hip -fcuda-flush-denormals-to-zero \
				// RUN: %clang_cc1 -fcuda-is-device -x hip \
				// RUN: -triple amdgcn-amd-amdhsa -target-cpu gfx900 -fdenormal-fp-math-f32=ieee -emit-llvm -o - %s | \
				// RUN: FileCheck -check-prefix=AMDNOFTZ %s

				// RUN: %clang_cc1 -fcuda-is-device -x hip -fdenormal-fp-math-f32=preserve-sign \
	// RUN: -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm -o - %s | \			// RUN: -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm -o - %s | \
	// RUN: FileCheck %s -check-prefix CHECK -check-prefix AMDFTZ			// RUN: FileCheck -check-prefix=AMDFTZ %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// Checks that device function calls get emitted with the "ntpvx-f32ftz"			// Checks that device function calls get emitted with the "denormal-fp-math-f32"
	// attribute set to "true" when we compile CUDA device code with			// attribute set when we compile CUDA device code with
	// -fcuda-flush-denormals-to-zero. Further, check that we reflect the presence			// -fdenormal-fp-math-f32. Further, check that we reflect the presence or
	// or absence of -fcuda-flush-denormals-to-zero in a module flag.			// absence of -fcuda-flush-denormals-to-zero in a module flag.

	// AMDGCN targets always have +fp64-fp16-denormals.			// AMDGCN targets always have +fp64-fp16-denormals.
	// AMDGCN targets without fast FMAF (e.g. gfx803) always have +fp32-denormals.			// AMDGCN targets without fast FMAF (e.g. gfx803) always have +fp32-denormals.
	// For AMDGCN target with fast FMAF (e.g. gfx900), it has +fp32-denormals			// For AMDGCN target with fast FMAF (e.g. gfx900), it has +fp32-denormals
	// by default and -fp32-denormals when there is option			// by default and -fp32-denormals when there is option
	// -fcuda-flush-denormals-to-zero.			// -fcuda-flush-denormals-to-zero.

	// CHECK-LABEL: define void @foo() #0			// CHECK-LABEL: define void @foo() #0
	extern "C" __device__ void foo() {}			extern "C" __device__ void foo() {}

	// FTZ: attributes #0 = {{.*}} "nvptx-f32ftz"="true"			// FTZ: attributes #0 = {{.*}} "denormal-fp-math-f32"="preserve-sign"
	// NOFTZ-NOT: attributes #0 = {{.*}} "nvptx-f32ftz"			// NOFTZ: attributes #0 = {{.*}} "denormal-fp-math-f32"="ieee"


				// FIXME: This should be removed
				// DEFAULT-NOT: "denormal-fp-math-f32"

	// AMDNOFTZ: attributes #0 = {{.*}}+fp32-denormals{{.*}}+fp64-fp16-denormals			// AMDNOFTZ: attributes #0 = {{.*}}+fp32-denormals{{.*}}+fp64-fp16-denormals
	// AMDFTZ: attributes #0 = {{.*}}+fp64-fp16-denormals{{.*}}-fp32-denormals			// AMDFTZ: attributes #0 = {{.*}}+fp64-fp16-denormals{{.*}}-fp32-denormals

	// FTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}			// FTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}
	// FTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 1}			// FTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 1}

	// NOFTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}			// NOFTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}
	// NOFTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 0}			// NOFTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 0}

clang/test/CodeGenCUDA/propagate-metadata.cu

	[...]
	// overrides the flag on the bitcode library.			// overrides the flag on the bitcode library.

	// Build the bitcode library. This is not built in CUDA mode, otherwise it			// Build the bitcode library. This is not built in CUDA mode, otherwise it
	// might have incompatible attributes. This mirrors how libdevice is built.			// might have incompatible attributes. This mirrors how libdevice is built.
	// RUN: %clang_cc1 -x c++ -emit-llvm-bc -ftrapping-math -DLIB \			// RUN: %clang_cc1 -x c++ -emit-llvm-bc -ftrapping-math -DLIB \
	// RUN: %s -o %t.bc -triple nvptx-unknown-unknown			// RUN: %s -o %t.bc -triple nvptx-unknown-unknown

	// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-builtin-bitcode %t.bc -o - \			// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-builtin-bitcode %t.bc -o - \
	// RUN: -fno-trapping-math -fcuda-is-device -triple nvptx-unknown-unknown \			// RUN: -fno-trapping-math -fcuda-is-device -fdenormal-fp-math-f32=ieee -triple nvptx-unknown-unknown \
	// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ --check-prefix=NOFAST			// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ --check-prefix=NOFAST

	// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-builtin-bitcode %t.bc \			// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-builtin-bitcode %t.bc \
	// RUN: -fno-trapping-math -fcuda-flush-denormals-to-zero -o - \			// RUN: -fno-trapping-math -fdenormal-fp-math-f32=preserve-sign -o - \
	// RUN: -fcuda-is-device -triple nvptx-unknown-unknown \			// RUN: -fcuda-is-device -triple nvptx-unknown-unknown \
	// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ \			// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ \
	// RUN: --check-prefix=NOFAST			// RUN: --check-prefix=NOFAST

	// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-builtin-bitcode %t.bc \			// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-builtin-bitcode %t.bc \
	// RUN: -fno-trapping-math -fcuda-flush-denormals-to-zero -o - \			// RUN: -fno-trapping-math -fdenormal-fp-math-f32=preserve-sign -o - \
	// RUN: -fcuda-is-device -menable-unsafe-fp-math -triple nvptx-unknown-unknown \			// RUN: -fcuda-is-device -menable-unsafe-fp-math -triple nvptx-unknown-unknown \
	// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=FAST			// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=FAST

	// Wrap everything in extern "C" so we don't have to worry about name mangling			// Wrap everything in extern "C" so we don't have to worry about name mangling
	// in the IR.			// in the IR.
	extern "C" {			extern "C" {
	#ifdef LIB			#ifdef LIB

	[...]

	#endif			#endif
	}			}

	// The kernel and lib function should have the same attributes.			// The kernel and lib function should have the same attributes.
	// CHECK: define void @kernel() [[attr:#[0-9]+]]			// CHECK: define void @kernel() [[attr:#[0-9]+]]
	// CHECK: define internal void @lib_fn() [[attr]]			// CHECK: define internal void @lib_fn() [[attr]]

				// FIXME: These -NOT checks do not work as intended and do not check on the same
				// line.

	// Check the attribute list.			// Check the attribute list.
	// CHECK: attributes [[attr]] = {			// CHECK: attributes [[attr]] = {
	// CHECK: "no-trapping-math"="true"

	// FTZ-SAME: "nvptx-f32ftz"="true"			// FTZ-NOT: "denormal-fp-math"
	// NOFTZ-NOT: "nvptx-f32ftz"="true"
				// FTZ-SAME: "denormal-fp-math-f32"="preserve-sign"
				// NOFTZ-SAME: "denormal-fp-math-f32"="ieee"

				// CHECK-SAME: "no-trapping-math"="true"

	// FAST-SAME: "unsafe-fp-math"="true"			// FAST-SAME: "unsafe-fp-math"="true"
	// NOFAST-NOT: "unsafe-fp-math"="true"			// NOFAST-NOT: "unsafe-fp-math"="true"

clang/test/CodeGenOpenCL/amdgpu-features.cl

	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target

	// Check that appropriate features are defined for every supported AMDGPU			// Check that appropriate features are defined for every supported AMDGPU
	// "-target" and "-mcpu" options.			// "-target" and "-mcpu" options.

	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX904 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx904 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX904 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX906 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx906 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX906 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx908 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX908 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx908 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX908 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1010 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1010 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1010 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1011 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1011 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1011 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1011 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1012 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1012 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx1012 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX1012 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx801 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX801 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx700 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX700 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx600 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX600 %s
	// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s			// RUN: %clang_cc1 -triple amdgcn -target-cpu gfx601 -S -emit-llvm -o - %s | FileCheck --check-prefix=GFX601 %s

	// GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime"			// GFX904: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime,-fp32-denormals"
	// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+flat-address-space,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime"			// GFX906: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dpp,+flat-address-space,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime,-fp32-denormals"
	// GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime"			// GFX908: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+fp64-fp16-denormals,+gfx8-insts,+gfx9-insts,+s-memrealtime,-fp32-denormals"
	// GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+fp32-denormals,+fp64-fp16-denormals,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime"			// GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+flat-address-space,+fp64-fp16-denormals,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,-fp32-denormals"
	// GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+fp32-denormals,+fp64-fp16-denormals,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime"			// GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+fp64-fp16-denormals,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,-fp32-denormals"
	// GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+fp32-denormals,+fp64-fp16-denormals,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime"			// GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dpp,+flat-address-space,+fp64-fp16-denormals,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,-fp32-denormals"
	// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+fp32-denormals,+fp64-fp16-denormals,+gfx8-insts,+s-memrealtime"			// GFX801: "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+fp64-fp16-denormals,+gfx8-insts,+s-memrealtime,-fp32-denormals"
	// GFX700: "target-features"="+ci-insts,+flat-address-space,+fp64-fp16-denormals,-fp32-denormals"			// GFX700: "target-features"="+ci-insts,+flat-address-space,+fp64-fp16-denormals,-fp32-denormals"
	// GFX600: "target-features"="+fp64-fp16-denormals,-fp32-denormals"			// GFX600: "target-features"="+fp64-fp16-denormals,-fp32-denormals"
	// GFX601: "target-features"="+fp64-fp16-denormals,-fp32-denormals"			// GFX601: "target-features"="+fp64-fp16-denormals,-fp32-denormals"

	kernel void test() {}			kernel void test() {}

clang/test/CodeGenOpenCL/denorms-are-zero.cl

This file was deleted.

	// RUN: %clang_cc1 -emit-llvm -o - %s | FileCheck %s
	// RUN: %clang_cc1 -emit-llvm -cl-denorms-are-zero -o - %s | FileCheck -check-prefix=DENORM-ZERO %s

	// Slow FMAF and slow f32 denormals
	// RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn--amdhsa -target-cpu pitcairn %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
	// RUN: %clang_cc1 -emit-llvm -cl-denorms-are-zero -o - -triple amdgcn--amdhsa -target-cpu pitcairn %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH-OPT %s

	// Fast FMAF, but slow f32 denormals
	// RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn--amdhsa -target-cpu tahiti %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
	// RUN: %clang_cc1 -emit-llvm -cl-denorms-are-zero -o - -triple amdgcn--amdhsa -target-cpu tahiti %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH-OPT %s

	// Fast F32 denormals, but slow FMAF
	// RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn--amdhsa -target-cpu fiji %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
	// RUN: %clang_cc1 -emit-llvm -cl-denorms-are-zero -o - -triple amdgcn--amdhsa -target-cpu fiji %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH-OPT %s

	// Fast F32 denormals and fast FMAF
	// RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn--amdhsa -target-cpu gfx900 %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-DENORM %s
	// RUN: %clang_cc1 -emit-llvm -cl-denorms-are-zero -o - -triple amdgcn--amdhsa -target-cpu gfx900 %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH-OPT %s

	// RUN: %clang_cc1 -emit-llvm -target-feature +fp32-denormals -target-feature -fp64-fp16-denormals -cl-denorms-are-zero -o - -triple amdgcn--amdhsa -target-cpu fiji %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FEATURE %s
	// RUN: %clang_cc1 -emit-llvm -target-feature +fp32-denormals -target-feature -fp64-fp16-denormals -cl-denorms-are-zero -o - -triple amdgcn--amdhsa -target-cpu pitcairn %s | FileCheck -check-prefixes=AMDGCN,AMDGCN-FEATURE %s



	// For all targets 'denorms-are-zero' attribute is set to 'true'
	// if '-cl-denorms-are-zero' was specified and to 'false' otherwise.

	// CHECK-LABEL: define {{(dso_local )?}}void @f()
	// CHECK: attributes #{{[0-9]*}} = {{{[^}]*}} "denorms-are-zero"="false"
	//
	// DENORM-ZERO-LABEL: define {{(dso_local )?}}void @f()
	// DENORM-ZERO: attributes #{{[0-9]*}} = {{{[^}]*}} "denorms-are-zero"="true"

	// For amdgcn target cpu fiji, fp32 should be flushed since fiji does not support fp32 denormals, unless +fp32-denormals is
	// explicitly set. amdgcn target always do not flush fp64 denormals. The control for fp64 and fp16 denormals is the same.

	// AMDGCN-LABEL: define void @f()

	// AMDGCN-FLUSH: attributes #{{[0-9]*}} = {{{[^}]*}} "denorms-are-zero"="false" {{.*}} "target-features"="{{[^"]*}}+fp64-fp16-denormals,{{[^"]*}}-fp32-denormals{{[^"]*}}"
	// AMDGCN-FLUSH-OPT: attributes #{{[0-9]*}} = {{{[^}]*}} "denorms-are-zero"="true" {{.*}} "target-features"="{{[^"]*}}+fp64-fp16-denormals,{{[^"]*}}-fp32-denormals{{[^"]*}}"

	// AMDGCN-DENORM: attributes #{{[0-9]*}} = {{{[^}]*}} "denorms-are-zero"="false" {{.*}} "target-features"="{{[^"]*}}+fp32-denormals,{{[^"]*}}+fp64-fp16-denormals{{[^"]*}}"

	// AMDGCN-FEATURE: attributes #{{[0-9]*}} = {{{[^}]*}} "denorms-are-zero"="true" {{.*}} "target-features"="{{[^"]*}}+fp32-denormals,{{[^"]*}}-fp64-fp16-denormals{{[^"]*}}"
	void f() {}

clang/test/CodeGenOpenCL/gfx9-fp32-denorms.cl

This file was deleted.

	// REQUIRES: amdgpu-registered-target

	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx900 -S -emit-llvm -o - %s | FileCheck --check-prefix=DEFAULT %s
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx900 -S -emit-llvm -o - -target-feature +fp32-denormals %s | FileCheck --check-prefix=FEATURE_FP32_DENORMALS_ON %s
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx900 -S -emit-llvm -o - -target-feature -fp32-denormals %s | FileCheck --check-prefix=FEATURE_FP32_DENORMALS_OFF %s
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx900 -S -emit-llvm -o - -cl-denorms-are-zero %s | FileCheck --check-prefix=OPT_DENORMS_ARE_ZERO %s

	// DEFAULT: +fp32-denormals
	// FEATURE_FP32_DENORMALS_ON: +fp32-denormals
	// FEATURE_FP32_DENORMALS_OFF: -fp32-denormals
	// OPT_DENORMS_ARE_ZERO: -fp32-denormals

	kernel void gfx9_fp32_denorms() {}

clang/test/Driver/cl-denorms-are-zero.cl

This file was added.

				// Slow FMAF and slow f32 denormals
				// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
				// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

				// Fast FMAF, but slow f32 denormals
				// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
				// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

				// Fast F32 denormals, but slow FMAF
				// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
				// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

				// Fast F32 denormals and fast FMAF
				// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-DENORM %s
				// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 | FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

				// AMDGCN-FLUSH: "-fdenormal-fp-math-f32=preserve-sign"

				// This should be omitted and default to ieee
				// AMDGCN-DENORM-NOT: "-fdenormal-fp-math-f32"
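The expectations in this driver test can be read as a small decision rule. The following is a minimal sketch of that rule, inferred only from the test comments and CHECK lines above, not from the toolchain code itself. `CpuTraits`, `lookupCpu`, and `f32DenormalArg` are hypothetical names, and treating unknown CPUs as "slow" is an assumption.

```cpp
#include <string_view>

// Hypothetical per-CPU traits, taken from the comments in the test above.
struct CpuTraits {
  bool FastFMAF;
  bool FastF32Denormals;
};

// Hypothetical lookup covering only the four CPUs the test exercises.
// Assumption: any other CPU name is treated as slow on both counts.
constexpr CpuTraits lookupCpu(std::string_view Cpu) {
  if (Cpu == "tahiti") return {true, false};
  if (Cpu == "fiji") return {false, true};
  if (Cpu == "gfx900") return {true, true};
  return {false, false}; // includes "pitcairn"
}

// Returns the cc1 argument the test expects, or an empty view when the
// argument is expected to be omitted (the test notes cc1 then defaults to ieee).
constexpr std::string_view f32DenormalArg(std::string_view Cpu,
                                          bool ClDenormsAreZero) {
  const CpuTraits T = lookupCpu(Cpu);
  const bool KeepDenormals =
      T.FastFMAF && T.FastF32Denormals && !ClDenormsAreZero;
  return KeepDenormals
             ? std::string_view{}
             : std::string_view{"-fdenormal-fp-math-f32=preserve-sign"};
}
```

Under this reading, only gfx900 without `-cl-denorms-are-zero` omits the flag; every other RUN line in the test expects `preserve-sign`.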

clang/test/Driver/cuda-flush-denormals-to-zero.cu

This file was added.

				// Checks that cuda compilation does the right thing when passed
				// -fcuda-flush-denormals-to-zero. This should be translated to
				// -fdenormal-fp-math-f32=preserve-sign

				// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 -fcuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 \| FileCheck -check-prefix=FTZ %s
				// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 -fno-cuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 \| FileCheck -check-prefix=NOFTZ %s
				// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_10 -fcuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 \| FileCheck -check-prefix=FTZ %s
				// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_10 -fno-cuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 \| FileCheck -check-prefix=NOFTZ %s

				// CPUFTZ-NOT: -fdenormal-fp-math

				// FTZ: "-fdenormal-fp-math-f32=preserve-sign"
				// NOFTZ: "-fdenormal-fp-math=ieee"

clang/test/Driver/denormal-fp-math.c

	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -v 2>&1 \| FileCheck -check-prefix=CHECK-IEEE %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -v 2>&1 \| FileCheck -check-prefix=CHECK-IEEE %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=preserve-sign -v 2>&1 \| FileCheck -check-prefix=CHECK-PS %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=preserve-sign -v 2>&1 \| FileCheck -check-prefix=CHECK-PS %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=positive-zero -v 2>&1 \| FileCheck -check-prefix=CHECK-PZ %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=positive-zero -v 2>&1 \| FileCheck -check-prefix=CHECK-PZ %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-fast-math -v 2>&1 \| FileCheck -check-prefix=CHECK-NO-UNSAFE %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-fast-math -v 2>&1 \| FileCheck -check-prefix=CHECK-NO-UNSAFE %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-unsafe-math-optimizations -v 2>&1 \| FileCheck -check-prefix=CHECK-NO-UNSAFE %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-unsafe-math-optimizations -v 2>&1 \| FileCheck -check-prefix=CHECK-NO-UNSAFE %s
	// RUN: not %clang -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=foo -v 2>&1 \| FileCheck -check-prefix=CHECK-INVALID %s			// RUN: not %clang -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=foo -v 2>&1 \| FileCheck -check-prefix=CHECK-INVALID %s

	// CHECK-IEEE: "-fdenormal-fp-math=ieee"			// CHECK-IEEE: -fdenormal-fp-math=ieee
	// CHECK-PS: "-fdenormal-fp-math=preserve-sign"			// CHECK-PS: "-fdenormal-fp-math=preserve-sign"
	// CHECK-PZ: "-fdenormal-fp-math=positive-zero"			// CHECK-PZ: "-fdenormal-fp-math=positive-zero"
	// CHECK-NO-UNSAFE-NOT: "-fdenormal-fp-math=ieee"			// CHECK-NO-UNSAFE-NOT: "-fdenormal-fp-math=ieee"
	// CHECK-INVALID: error: invalid value 'foo' in '-fdenormal-fp-math=foo'			// CHECK-INVALID: error: invalid value 'foo' in '-fdenormal-fp-math=foo'

clang/test/Driver/opencl.cl

	// CHECK-STRICT-ALIASING: "-cc1" {{.*}} "-cl-strict-aliasing"			// CHECK-STRICT-ALIASING: "-cc1" {{.*}} "-cl-strict-aliasing"
	// CHECK-SINGLE-PRECISION-CONST: "-cc1" {{.*}} "-cl-single-precision-constant"			// CHECK-SINGLE-PRECISION-CONST: "-cc1" {{.*}} "-cl-single-precision-constant"
	// CHECK-FINITE-MATH-ONLY: "-cc1" {{.*}} "-cl-finite-math-only"			// CHECK-FINITE-MATH-ONLY: "-cc1" {{.*}} "-cl-finite-math-only"
	// CHECK-KERNEL-ARG-INFO: "-cc1" {{.*}} "-cl-kernel-arg-info"			// CHECK-KERNEL-ARG-INFO: "-cc1" {{.*}} "-cl-kernel-arg-info"
	// CHECK-UNSAFE-MATH-OPT: "-cc1" {{.*}} "-cl-unsafe-math-optimizations"			// CHECK-UNSAFE-MATH-OPT: "-cc1" {{.*}} "-cl-unsafe-math-optimizations"
	// CHECK-FAST-RELAXED-MATH: "-cc1" {{.*}} "-cl-fast-relaxed-math"			// CHECK-FAST-RELAXED-MATH: "-cc1" {{.*}} "-cl-fast-relaxed-math"
	// CHECK-MAD-ENABLE: "-cc1" {{.*}} "-cl-mad-enable"			// CHECK-MAD-ENABLE: "-cc1" {{.*}} "-cl-mad-enable"
	// CHECK-NO-SIGNED-ZEROS: "-cc1" {{.*}} "-cl-no-signed-zeros"			// CHECK-NO-SIGNED-ZEROS: "-cc1" {{.*}} "-cl-no-signed-zeros"
	// CHECK-DENORMS-ARE-ZERO: "-cc1" {{.*}} "-cl-denorms-are-zero"
				// This is not forwarded
				// CHECK-DENORMS-ARE-ZERO-NOT: "-cl-denorms-are-zero"

	// CHECK-ROUND-DIV: "-cc1" {{.*}} "-cl-fp32-correctly-rounded-divide-sqrt"			// CHECK-ROUND-DIV: "-cc1" {{.*}} "-cl-fp32-correctly-rounded-divide-sqrt"
	// CHECK-UNIFORM-WG: "-cc1" {{.*}} "-cl-uniform-work-group-size"			// CHECK-UNIFORM-WG: "-cc1" {{.*}} "-cl-uniform-work-group-size"
	// CHECK-C99: error: invalid value 'c99' in '-cl-std=c99'			// CHECK-C99: error: invalid value 'c99' in '-cl-std=c99'
	// CHECK-INVALID: error: invalid value 'invalid' in '-cl-std=invalid'			// CHECK-INVALID: error: invalid value 'invalid' in '-cl-std=invalid'

	kernel void func(void);			kernel void func(void);

llvm/docs/LangRef.rst


``sspstrong``
resulting function will have an ``sspstrong`` attribute.		resulting function will have an ``sspstrong`` attribute.
``strictfp``		``strictfp``
This attribute indicates that the function was called from a scope that		This attribute indicates that the function was called from a scope that
requires strict floating-point semantics. LLVM will not attempt any		requires strict floating-point semantics. LLVM will not attempt any
optimizations that require assumptions about the floating-point rounding		optimizations that require assumptions about the floating-point rounding
mode or that might alter the state of floating-point status flags that		mode or that might alter the state of floating-point status flags that
might otherwise be set or cleared by calling this function. LLVM will		might otherwise be set or cleared by calling this function. LLVM will
not introduce any new floating-point instructions that may trap.		not introduce any new floating-point instructions that may trap.

		``"denormal-fp-math"``
		This indicates the subnormal handling that may be assumed for the
		default floating-point environment. This may be one of ``"ieee"``,
		``"preserve-sign"``, or ``"positive-zero"``. If this is attribute
		is not specified, the default is ``"ieee"``. If the mode is
		``"preserve-sign"``, or ``"positive-zero"``, subnormal outputs may
		be flushed to zero by standard floating point operations. It is not
		mandated that flushing to zero occurs, but if a subnormal output is
		flushed to zero, it must respect the sign mode. Not all targets
		support all modes.
		arsenm (author): On second thought I think this may be too permissive. I think based on the use in DAGCombiner, that flushing of outputs is compulsory.
		arsenm (author): It turns out the fast sqrt usage really cares about input denormals being implicitly treated as 0, not the output flushing (i.e. this only needs DAZ, not FTZ). I think being permissive on the output is OK, but if implicit input flushing is required then it's compulsory and a target is responsible for inserting a flush of some kind if the use instruction isn't known to follow this mode. Because of this, I do think it's necessary to treat this as two separate modes. I'm thinking to comma separate output-mode,input-mode, and assume input-mode=output-mode if the second half isn't specified for compatibility with the existing attribute.
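The comma-separated form proposed in the comment above could be parsed roughly as follows. This is only a sketch of the proposal as described, not code from this revision: `DenormalKind`, `DenormalModePair`, `parseKind`, and `parseDenormalPair` are hypothetical stand-ins, and the empty-string default follows the LangRef wording ("the default is ``"ieee"``") rather than any existing parser.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the three documented modes plus an error value.
enum class DenormalKind { Invalid, IEEE, PreserveSign, PositiveZero };

// Hypothetical pair: how outputs are flushed and how inputs are treated.
struct DenormalModePair {
  DenormalKind Output;
  DenormalKind Input;
};

constexpr DenormalKind parseKind(std::string_view S) {
  if (S == "ieee") return DenormalKind::IEEE;
  if (S == "preserve-sign") return DenormalKind::PreserveSign;
  if (S == "positive-zero") return DenormalKind::PositiveZero;
  return DenormalKind::Invalid;
}

// Parses "output-mode" or "output-mode,input-mode". When the second half is
// absent, the input mode is assumed to equal the output mode, which keeps
// single-valued attributes meaningful.
constexpr DenormalModePair parseDenormalPair(std::string_view Attr) {
  // Assumption: a missing attribute means ieee for both halves.
  if (Attr.empty()) return {DenormalKind::IEEE, DenormalKind::IEEE};
  const std::size_t Comma = Attr.find(',');
  const DenormalKind Out = parseKind(Attr.substr(0, Comma));
  if (Comma == std::string_view::npos) return {Out, Out};
  return {Out, parseKind(Attr.substr(Comma + 1))};
}
```

With this shape, `"preserve-sign"` yields the same mode for inputs and outputs, while `"ieee,preserve-sign"` would describe a DAZ-only configuration.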

		``"denormal-fp-math-f32"``
		Same as ``"denorm-fp-math-f32"``, except for float types. If both
		scanon: Can you clarify this a little bit? I'd prefer something like "Same as ``"denorm-fp-math"``, but only controls the behavior of the 32-bit float type."
		are present, this overrides ``"denorm-fp-math"``.

``"thunk"``		``"thunk"``
		andrew.w.kaylor: Can you document which targets do support the option? What happens if I try to use the option on a target where it is not supported?
		arsenm (author): I'm not sure where to document this, or if/how/where to diagnose it. I don't think the high level LangRef description is the right place to discuss specific target handling. Currently it won't error or anything. Code checking the denorm mode will see the f32 specific mode, even if the target in the end isn't really going to respect this. One problem is this potentially does require coordination with other toolchain components. For AMDGPU, the compiler can directly tell the driver what FP mode to set on each entry point, but for x86 it requires linking in crtfastmath to set the default mode bits. If another target had a similar runtime environment requirement, I don't think we can be sure the attribute is correct or not.
		andrew.w.kaylor: There is precedent for describing target-specific behavior in LangRef. It just doesn't seem useful to say that not all targets support the attribute without saying which ones do. We should also say what is expected if a target doesn't support the attribute. It seems reasonable for the function attribute to be silently ignored.
		> One problem is this potentially does require coordination with other toolchain components. For AMDGPU, the compiler can directly tell the driver what FP mode to set on each entry point, but for x86 it requires linking in crtfastmath to set the default mode bits.
		This is a point I'm interested in. I don't like the current crtfastmath.o handling. It feels almost accidental when FTZ works as expected. My understanding is we link crtfastmath.o if we find it but if not everything just goes about its business. The Intel compiler injects code into main() to explicitly set the FTZ/DAZ control modes. That obviously has problems too, but it's at least consistent and predictable. As I understand it, crtfastmath.o sets these modes from a static initializer, but I'm not sure anything is done to determine the order of that initializer relative to others. How does the compiler identify entry points for AMDGPU? And does it emit code to set FTZ based on the function attribute here?
		arsenm (author): The entry points are a specific calling convention. There's no real concept of main. Each kernel has an associated blob of metadata the driver uses to set up various config registers on dispatch. I don't think specially recognizing main in the compiler is fundamentally different than having it done in a static constructor. It's still a construct not associated with any particular function or anything.
		andrew.w.kaylor: The problem with having it done in a static constructor is that you have no certainty of when it will be done relative to other static constructors. If it's in main you can at least say that it's after all the static constructors (assuming main is your entry point).
		cameron.mcinally: Yes and no. The linker should honor static constructor priorities. But, yeah, there's no guarantee that this constructor will run before other priority 101 constructors. The performance penalty for setting denormal flushing in main could be significant (think C++). Also, there's precedent for using static constructors, like GCC's crtfastmath.o.
		andrew.w.kaylor: Fair enough. I don't necessarily like how icc handles this. I don't have a problem with how gcc handles it. I just really don't like how LLVM does it. If we want to take the static constructor approach we should define our own, not depend on whether or not the GNU object file happens to be around. Static initialization doesn't help for AMDGPU, and I suppose that's likely to be the case for any offload execution model. Since this patch is moving us toward a more consistent implementation I'm wondering if we can define some general rules for how this is supposed to work. Like when the function attribute will result in injected instructions setting the control flags and when it won't.
		arsenm (author): I think the most we can expect of this attribute is informing codegen of the expected FP denormal handling mode, and not something responsible for ensuring the mode will really be set. AMDGPU conceptually could have a separate set of attributes for setting the denormal FP mode, but since it would look identical, this gets a bonus usage for setting it for kernels. This doesn't protect you from calling functions in modules compiled with different attributes, so similar problems outside the view of the compiler still exist.
		cameron.mcinally:
		> If we want to take the static constructor approach we should define our own, not depend on whether or not the GNU object file happens to be around.
		That's a good idea. There's subtle differences between targets in the GNU implementation. It would be good to standardize them.
This attribute indicates that the function will delegate to some other		This attribute indicates that the function will delegate to some other
function with a tail call. The prototype of a thunk should not be used for		function with a tail call. The prototype of a thunk should not be used for
optimization purposes. The caller is expected to cast the thunk prototype to		optimization purposes. The caller is expected to cast the thunk prototype to
match the thunk target prototype.		match the thunk target prototype.
``uwtable``		``uwtable``
This attribute indicates that the ABI being targeted requires that		This attribute indicates that the ABI being targeted requires that
an unwind table entry be produced for this function even if we can		an unwind table entry be produced for this function even if we can
show that no exceptions passes by it. This is normally the case for		show that no exceptions passes by it. This is normally the case for

llvm/lib/CodeGen/MachineFunction.cpp

getOrCreateJumpTableInfo(unsigned EntryKind) {
if (JumpTableInfo) return JumpTableInfo;		if (JumpTableInfo) return JumpTableInfo;

JumpTableInfo = new (Allocator)		JumpTableInfo = new (Allocator)
MachineJumpTableInfo((MachineJumpTableInfo::JTEntryKind)EntryKind);		MachineJumpTableInfo((MachineJumpTableInfo::JTEntryKind)EntryKind);
return JumpTableInfo;		return JumpTableInfo;
}		}

DenormalMode MachineFunction::getDenormalMode(const fltSemantics &FPType) const {		DenormalMode MachineFunction::getDenormalMode(const fltSemantics &FPType) const {
		if (&FPType == &APFloat::IEEEsingle()) {
		Attribute Attr = F.getFnAttribute("denormal-fp-math-f32");
		StringRef Val = Attr.getValueAsString();
		if (!Val.empty())
		return parseDenormalFPAttribute(Val);

		// If the f32 variant of the attribute isn't specified, try to use the
		// generic one.
		}

// TODO: Should probably avoid the connection to the IR and store directly		// TODO: Should probably avoid the connection to the IR and store directly
// in the MachineFunction.		// in the MachineFunction.
Attribute Attr = F.getFnAttribute("denormal-fp-math");		Attribute Attr = F.getFnAttribute("denormal-fp-math");

// FIXME: This should assume IEEE behavior on an unspecified		// FIXME: This should assume IEEE behavior on an unspecified
// attribute. However, the one current user incorrectly assumes a non-IEEE		// attribute. However, the one current user incorrectly assumes a non-IEEE
// target by default.		// target by default.
StringRef Val = Attr.getValueAsString();		StringRef Val = Attr.getValueAsString();
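The lookup order added to `getDenormalMode` can be sketched in isolation: for single precision, a non-empty `denormal-fp-math-f32` value wins, otherwise the generic `denormal-fp-math` value is used. The sketch below is a simplified model, not the LLVM code. `FnAttrs`, `FloatType`, and `effectiveDenormalAttr` are hypothetical names, and the function returns the attribute string that would then be parsed rather than a parsed mode.

```cpp
#include <string_view>

// Hypothetical stand-in for the two string attributes on a function.
// An empty view models "attribute not present".
struct FnAttrs {
  std::string_view DenormalFPMath;    // "denormal-fp-math"
  std::string_view DenormalFPMathF32; // "denormal-fp-math-f32"
};

// Hypothetical stand-in for the fltSemantics argument.
enum class FloatType { Half, Single, Double };

// Picks the attribute value that governs the given type: the f32-specific
// value for single precision when present, otherwise the generic one.
constexpr std::string_view effectiveDenormalAttr(const FnAttrs &A,
                                                 FloatType Ty) {
  if (Ty == FloatType::Single && !A.DenormalFPMathF32.empty())
    return A.DenormalFPMathF32;
  return A.DenormalFPMath;
}
```

For example, with `denormal-fp-math=ieee` and `denormal-fp-math-f32=preserve-sign`, single precision resolves to `preserve-sign` while double precision stays at `ieee`.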

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

bool NVPTXTargetLowering::usePrecSqrtF32() const {
}		}
}		}

bool NVPTXTargetLowering::useF32FTZ(const MachineFunction &MF) const {		bool NVPTXTargetLowering::useF32FTZ(const MachineFunction &MF) const {
// TODO: Get rid of this flag; there can be only one way to do this.		// TODO: Get rid of this flag; there can be only one way to do this.
if (FtzEnabled.getNumOccurrences() > 0) {		if (FtzEnabled.getNumOccurrences() > 0) {
// If nvptx-f32ftz is used on the command-line, always honor it		// If nvptx-f32ftz is used on the command-line, always honor it
return FtzEnabled;		return FtzEnabled;
} else {
const Function &F = MF.getFunction();
// Otherwise, check for an nvptx-f32ftz attribute on the function
if (F.hasFnAttribute("nvptx-f32ftz"))
return F.getFnAttribute("nvptx-f32ftz").getValueAsString() == "true";
else
return false;
}		}

		return MF.getDenormalMode(APFloat::IEEEsingle()) ==
		DenormalMode::PreserveSign;
}		}
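The precedence in the rewritten `useF32FTZ` is: an explicit command-line setting always wins, and otherwise the function's resolved f32 denormal mode decides. A minimal standalone sketch of that precedence follows, assuming the mode has already been resolved to its attribute string. The free function and its parameters are hypothetical and only model the logic shown in the diff.

```cpp
#include <optional>
#include <string_view>

// CommandLineFtz models the nvptx-f32ftz option: empty when the option was
// not given on the command line, otherwise its value.
// F32Mode models the resolved f32 denormal mode as an attribute string.
constexpr bool useF32FTZ(std::optional<bool> CommandLineFtz,
                         std::string_view F32Mode) {
  // An explicit command-line setting is always honored.
  if (CommandLineFtz)
    return *CommandLineFtz;
  // Otherwise only preserve-sign enables flush-to-zero, matching the
  // comparison against DenormalMode::PreserveSign in the diff.
  return F32Mode == "preserve-sign";
}
```

Note that under this comparison `positive-zero` does not enable FTZ lowering, whereas the InstCombine change further down treats any non-IEEE mode as FTZ.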

static bool IsPTXVectorType(MVT VT) {		static bool IsPTXVectorType(MVT VT) {
switch (VT.SimpleTy) {		switch (VT.SimpleTy) {
default:		default:
return false;		return false;
case MVT::v2i1:		case MVT::v2i1:
case MVT::v4i1:		case MVT::v4i1:

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/APSInt.h"		#include "llvm/ADT/APSInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
		#include "llvm/ADT/FloatingPointMode.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
const SimplifyAction Action = [II]() -> SimplifyAction {
}		}
}();		}();

// If Action.FtzRequirementTy is not satisfied by the module's ftz state, we		// If Action.FtzRequirementTy is not satisfied by the module's ftz state, we
// can bail out now. (Notice that in the case that IID is not an NVVM		// can bail out now. (Notice that in the case that IID is not an NVVM
// intrinsic, we don't have to look up any module metadata, as		// intrinsic, we don't have to look up any module metadata, as
// FtzRequirementTy will be FTZ_Any.)		// FtzRequirementTy will be FTZ_Any.)
if (Action.FtzRequirement != FTZ_Any) {		if (Action.FtzRequirement != FTZ_Any) {
bool FtzEnabled =		StringRef Attr = II->getFunction()
II->getFunction()->getFnAttribute("nvptx-f32ftz").getValueAsString() ==		->getFnAttribute("denormal-fp-math-f32")
"true";		.getValueAsString();
		bool FtzEnabled = parseDenormalFPAttribute(Attr) != DenormalMode::IEEE;

if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))		if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
return nullptr;		return nullptr;
}		}
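The gate above compares the function's FTZ state against what the intrinsic requires. A standalone sketch of that check is given below. The enum mirrors the `FTZ_Any` / `FTZ_MustBeOn` names visible in the diff (the `MustBeOff` enumerator is assumed), `ftzRequirementSatisfied` is a hypothetical helper, and treating an absent attribute as `ieee` is an assumption about the parser.

```cpp
#include <string_view>

// Mirrors the requirement kinds referenced in the diff; MustBeOff is assumed.
enum class FtzRequirement { Any, MustBeOn, MustBeOff };

// Returns true when an intrinsic with the given requirement may be simplified
// in a function whose "denormal-fp-math-f32" attribute has the given value.
constexpr bool ftzRequirementSatisfied(FtzRequirement Req,
                                       std::string_view F32Attr) {
  if (Req == FtzRequirement::Any)
    return true;
  // Assumption: a missing attribute parses as ieee. Any other mode counts as
  // FTZ being enabled, matching the "!= DenormalMode::IEEE" test in the diff.
  const bool FtzEnabled = !(F32Attr.empty() || F32Attr == "ieee");
  return FtzEnabled == (Req == FtzRequirement::MustBeOn);
}
```

This corresponds to the `nvvm-intrins.ll` expectations: with `preserve-sign` only the ftz variants are simplified, and with `ieee` only the non-ftz variants are.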

// Simplify to target-generic intrinsic.		// Simplify to target-generic intrinsic.
if (Action.IID) {		if (Action.IID) {
SmallVector<Value *, 4> Args(II->arg_operands());		SmallVector<Value *, 4> Args(II->arg_operands());

llvm/test/CodeGen/NVPTX/fast-math.ll

	; CHECK: mul.ftz.f32			; CHECK: mul.ftz.f32
	%x = fdiv float %a, %divisor			%x = fdiv float %a, %divisor
	%y = fdiv float %b, %divisor			%y = fdiv float %b, %divisor
	%z = select i1 %pred, float %x, float %y			%z = select i1 %pred, float %x, float %y
	ret float %z			ret float %z
	}			}

	attributes #0 = { "unsafe-fp-math" = "true" }			attributes #0 = { "unsafe-fp-math" = "true" }
	attributes #1 = { "nvptx-f32ftz" = "true" }			attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }

llvm/test/CodeGen/NVPTX/math-intrins.ll

	; CHECK-LABEL: @fma_double			; CHECK-LABEL: @fma_double
	define double @fma_double(double %a, double %b, double %c) {			define double @fma_double(double %a, double %b, double %c) {
	; CHECK: fma.rn.f64			; CHECK: fma.rn.f64
	%x = call double @llvm.fma.f64(double %a, double %b, double %c)			%x = call double @llvm.fma.f64(double %a, double %b, double %c)
	ret double %x			ret double %x
	}			}

	attributes #0 = { nounwind readnone }			attributes #0 = { nounwind readnone }
	attributes #1 = { "nvptx-f32ftz" = "true" }			attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }

llvm/test/CodeGen/NVPTX/sqrt-approx.ll

	; CHECK-LABEL: test_sqrt64_refined_ftz			; CHECK-LABEL: test_sqrt64_refined_ftz
	define double @test_sqrt64_refined_ftz(double %a) #0 #1 #2 {			define double @test_sqrt64_refined_ftz(double %a) #0 #1 #2 {
	; CHECK: rsqrt.approx.f64			; CHECK: rsqrt.approx.f64
	%ret = tail call double @llvm.sqrt.f64(double %a)			%ret = tail call double @llvm.sqrt.f64(double %a)
	ret double %ret			ret double %ret
	}			}

	attributes #0 = { "unsafe-fp-math" = "true" }			attributes #0 = { "unsafe-fp-math" = "true" }
	attributes #1 = { "nvptx-f32ftz" = "true" }			attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }
	attributes #2 = { "reciprocal-estimates" = "rsqrtf:1,rsqrtd:1,sqrtf:1,sqrtd:1" }			attributes #2 = { "reciprocal-estimates" = "rsqrtf:1,rsqrtd:1,sqrtf:1,sqrtd:1" }

llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

	; Check that nvvm intrinsics get simplified to target-generic intrinsics where			; Check that nvvm intrinsics get simplified to target-generic intrinsics where
	; possible.			; possible.
	;			;
	; We run this test twice; once with ftz on, and again with ftz off. Behold the			; We run this test twice; once with ftz on, and again with ftz off. Behold the
	; hackery:			; hackery:

	; RUN: cat %s > %t.ftz			; RUN: cat %s > %t.ftz
	; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "true" }' >> %t.ftz			; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "preserve-sign" }' >> %t.ftz
	; RUN: opt < %t.ftz -instcombine -S \| FileCheck %s --check-prefix=CHECK --check-prefix=FTZ			; RUN: opt < %t.ftz -instcombine -S \| FileCheck %s --check-prefix=CHECK --check-prefix=FTZ

	; RUN: cat %s > %t.noftz			; RUN: cat %s > %t.noftz
	; RUN: echo 'attributes #0 = { "nvptx-f32ftz" = "false" }' >> %t.noftz			; RUN: echo 'attributes #0 = { "denormal-fp-math-f32" = "ieee" }' >> %t.noftz
	; RUN: opt < %t.noftz -instcombine -S \| FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ			; RUN: opt < %t.noftz -instcombine -S \| FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ

	; We handle nvvm intrinsics with ftz variants as follows:			; We handle nvvm intrinsics with ftz variants as follows:
	; - If the module is in ftz mode, the ftz variant is transformed into the			; - If the module is in ftz mode, the ftz variant is transformed into the
	; regular llvm intrinsic, and the non-ftz variant is left alone.			; regular llvm intrinsic, and the non-ftz variant is left alone.
	; - If the module is not in ftz mode, it's the reverse: Only the non-ftz			; - If the module is not in ftz mode, it's the reverse: Only the non-ftz
	; variant is transformed, and the ftz variant is left alone.			; variant is transformed, and the ftz variant is left alone.


Revision Contents

Diff 228348
