This is an archive of the discontinued LLVM Phabricator instance.

Separately track input and output denormal mode
ClosedPublic

Authored by arsenm on Nov 7 2019, 5:11 PM.

Details

Reviewers

scanon
cameron.mcinally
spatel
andrew.w.kaylor
mibintc
SjoerdMeijer

Summary

AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
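The split described above can be pictured as a pair of independent settings. The following is a minimal sketch only: `DenormalKind`, `kindName`, and `toAttributeString` are hypothetical names chosen for illustration, and the struct is a stand-in, not the contents of `FloatingPointMode.h`. The ordering (output first, input second) follows the review discussion and the updated test expectations such as "ieee,ieee".

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the three handling kinds named in the tests.
enum class DenormalKind { Invalid, IEEE, PreserveSign, PositiveZero };

// One setting for results (FTZ-like) and one for operands (DAZ-like).
struct DenormalMode {
  DenormalKind Output = DenormalKind::Invalid;
  DenormalKind Input = DenormalKind::Invalid;

  // Valid only when both halves have been set.
  bool isValid() const {
    return Output != DenormalKind::Invalid && Input != DenormalKind::Invalid;
  }
};

// Spelling of each kind as it appears in the attribute strings.
inline const char *kindName(DenormalKind K) {
  switch (K) {
  case DenormalKind::IEEE:
    return "ieee";
  case DenormalKind::PreserveSign:
    return "preserve-sign";
  case DenormalKind::PositiveZero:
    return "positive-zero";
  case DenormalKind::Invalid:
    break;
  }
  return "";
}

// Renders the pair as "<output>,<input>".
inline std::string toAttributeString(const DenormalMode &M) {
  return std::string(kindName(M.Output)) + "," + kindName(M.Input);
}
```

With this shape, a target that treats denormal operands as zero but does not flush results would be written as output `ieee`, input `preserve-sign`.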

Diff Detail

Event Timeline

arsenm created this revision. Nov 7 2019, 5:11 PM

Herald added a project: Restricted Project. Nov 7 2019, 5:11 PM

Herald added subscribers: dexonsmith, hiraditya, tpr and 4 others.

arsenm added parent revisions: D69878: Consoldiate internal denormal flushing controls, D69598: Work on cleaning up denormal mode handling. Nov 7 2019, 5:12 PM

arsenm added a child revision: D69979: clang: Guess at some platform FTZ/DAZ default settings. Nov 7 2019, 5:46 PM

andrew.w.kaylor added inline comments. Nov 12 2019, 5:58 PM

llvm/docs/LangRef.rst
1821–1822	I don't like the definition of this attribute. It's not reader-friendly. The comma-separated pair format has no indication which value refers to inputs and which refers to outputs. Also, while this predates your changes, I think the meanings of the current choices are unclear. What would you think of a comma-separated list with the following possibilities?

allow-denormals (default)
inputs-are-zero (outputs not flushed)
inputs-are-zero, outputs-are-zero
inputs-are-zero, outputs-are-positive-zero
inputs-are-positivezero (outputs not flushed)
inputs-are-positivezero, outputs-are-zero
inputs-are-positivezero, outputs-are-positive-zero
denormal-outputs-are-zero (inputs are unchanged)
denormal-outputs-are-positive-zero (inputs are unchanged)

I'd also be open to abbreviations. I don't know if "daz" and "ftz" are readable to everyone, but I'm more comfortable with them. That would make the options something like this.

allow-denormals
daz
daz, ftz
daz, ftz+
daz+
daz+, ftz
daz+, ftz+
ftz
ftz+

arsenm marked an inline comment as done. Nov 18 2019, 3:47 AM

arsenm added inline comments.

llvm/docs/LangRef.rst
1821–1822	I'm trying to avoid needing to autoupgrade bitcode at this point, which leaving the names as-is accomplishes. I'm worried this could still end up not in the right place, and then we would need another level of auto upgrade to deal with it later. I think these are overly verbose (I'm also keeping in mind the fact that any use of these does a linear scan through all string attributes, and then needs to parse these). I'm also unclear on what this weird ARM positive-zero really means. Does it mean inputs and outputs ignored the sign? Is there value in representing positive-zero on both sides?

pengfei added a subscriber: pengfei. Dec 2 2019, 4:43 PM

arsenm added a child revision: D71353: Fix denormal-fp-math flag and attribute interaction. Dec 11 2019, 6:34 AM

arsenm added a child revision: D71354: CodeGen: Add -denormal-fp-math-f32 flag.

arsenm added a child revision: D71357: AMDGPU: Assume f32 denormals are enabled by default. Dec 11 2019, 6:43 AM

Rebase

Herald added a subscriber: kerbowa. Jan 20 2020, 1:39 PM

andrew.w.kaylor added inline comments. Jan 29 2020, 5:52 PM

llvm/docs/LangRef.rst
1829	Based on the changes below, if the second value is omitted the input mode will be assumed to be the same as the output mode. That should probably be documented. I guess you intend for that not to happen, but the documentation here leaves the result ambiguous if it does happen.
1850	Is this saying that if a backend generates an instruction that doesn't handle the hardware daz mode then it must insert instructions to check for denormals and convert them to zero? If so, do you intend this to apply to all such instructions or only instructions that aren't able to accept denormal inputs?

nhaehnle removed a subscriber: nhaehnle. Jan 30 2020, 1:25 AM

arsenm marked 3 inline comments as done. Jan 31 2020, 6:18 AM

arsenm added inline comments.

llvm/docs/LangRef.rst
1829	I've added a note for this
1850	Only in cases where denormal inputs are invalid or unhandled. The case I'm thinking of is the one user in DAGCombiner, where if denormals are not flushed the result ends up incorrect (see https://bugs.llvm.org/show_bug.cgi?id=34994)
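To make the input-side question concrete, here is a small sketch of what "denormal inputs are implicitly treated as 0" means for a single `float` operand. It is an illustration of the semantics under discussion, not the DAGCombiner logic; the function name `treatedAsZeroInput` and the boolean parameter are assumptions introduced for this example, and sign handling is ignored.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns true when an operation would see X as zero.
// InputsFlushed models a DAZ-style input mode (hypothetical parameter).
inline bool treatedAsZeroInput(float X, bool InputsFlushed) {
  if (X == 0.0f)
    return true; // Real zeros are zero in every mode.
  // numeric_limits<float>::min() is the smallest positive *normal* value,
  // so anything nonzero with a smaller magnitude is a denormal.
  return InputsFlushed &&
         std::fabs(X) < std::numeric_limits<float>::min();
}
```

A transform that assumes one of these two behaviours has to consult the input mode rather than the output mode, which is the distinction this revision introduces.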

Tweak langref

lgtm

This revision is now accepted and ready to land. Jan 31 2020, 11:46 AM

a3c814d23497bc71b8ed53c35f773366aff02922

Revision Contents

Path	Size
clang/include/clang/Basic/CodeGenOptions.h	4 lines
clang/include/clang/Driver/ToolChain.h	2 lines
clang/lib/Basic/Targets/AMDGPU.cpp	2 lines
clang/lib/CodeGen/CGCall.cpp	10 lines
clang/lib/CodeGen/CodeGenModule.cpp	2 lines
clang/lib/Driver/ToolChains/AMDGPU.cpp	7 lines
clang/lib/Driver/ToolChains/Clang.cpp	26 lines
clang/lib/Driver/ToolChains/Cuda.cpp	4 lines
clang/lib/Frontend/CompilerInvocation.cpp	4 lines
clang/test/CodeGen/denormalfpmode.c	6 lines
clang/test/CodeGenCUDA/flush-denormals.cu	4 lines
clang/test/CodeGenCUDA/propagate-metadata.cu	4 lines
clang/test/Driver/cl-denorms-are-zero.cl	2 lines
clang/test/Driver/cuda-flush-denormals-to-zero.cu	4 lines
clang/test/Driver/denormal-fp-math.c	16 lines
llvm/docs/LangRef.rst	45 lines
llvm/include/llvm/ADT/FloatingPointMode.h	115 lines
llvm/lib/CodeGen/MachineFunction.cpp	2 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	25 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp	2 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	3 lines
llvm/unittests/ADT/FloatingPointMode.cpp	121 lines

Diff 241714

clang/include/clang/Basic/CodeGenOptions.h

[...] public:
std::string RecordCommandLine;		std::string RecordCommandLine;

std::map<std::string, std::string> DebugPrefixMap;		std::map<std::string, std::string> DebugPrefixMap;

/// The ABI to use for passing floating point arguments.		/// The ABI to use for passing floating point arguments.
std::string FloatABI;		std::string FloatABI;

/// The floating-point denormal mode to use.		/// The floating-point denormal mode to use.
llvm::DenormalMode FPDenormalMode = llvm::DenormalMode::Invalid;		llvm::DenormalMode FPDenormalMode;

/// The floating-point subnormal mode to use, for float.		/// The floating-point subnormal mode to use, for float.
llvm::DenormalMode FP32DenormalMode = llvm::DenormalMode::Invalid;		llvm::DenormalMode FP32DenormalMode;

/// The float precision limit to use, if non-empty.		/// The float precision limit to use, if non-empty.
std::string LimitFloatPrecision;		std::string LimitFloatPrecision;

struct BitcodeFileToLink {		struct BitcodeFileToLink {
/// The filename of the bitcode file to link in.		/// The filename of the bitcode file to link in.
std::string Filename;		std::string Filename;
/// If true, we set attributes functions in the bitcode library according to		/// If true, we set attributes functions in the bitcode library according to
[...]

clang/include/clang/Driver/ToolChain.h

[...] public:
/// Returns the output denormal handling type in the default floating point		/// Returns the output denormal handling type in the default floating point
/// environment for the given \p FPType if given. Otherwise, the default		/// environment for the given \p FPType if given. Otherwise, the default
/// assumed mode for any floating point type.		/// assumed mode for any floating point type.
virtual llvm::DenormalMode getDefaultDenormalModeForType(		virtual llvm::DenormalMode getDefaultDenormalModeForType(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
Action::OffloadKind DeviceOffloadKind,		Action::OffloadKind DeviceOffloadKind,
const llvm::fltSemantics *FPType = nullptr) const {		const llvm::fltSemantics *FPType = nullptr) const {
// FIXME: This should be IEEE when default handling is fixed.		// FIXME: This should be IEEE when default handling is fixed.
return llvm::DenormalMode::Invalid;		return llvm::DenormalMode::getInvalid();
}		}
};		};

/// Set a ToolChain's effective triple. Reset it when the registration object		/// Set a ToolChain's effective triple. Reset it when the registration object
/// is destroyed.		/// is destroyed.
class RegisterEffectiveTriple {		class RegisterEffectiveTriple {
const ToolChain &TC;		const ToolChain &TC;

[...]

clang/lib/Basic/Targets/AMDGPU.cpp

[...] for (auto &I : TargetOpts.FeaturesAsWritten) {
if (I == "+fp32-denormals" \|\| I == "-fp32-denormals")		if (I == "+fp32-denormals" \|\| I == "-fp32-denormals")
hasFP32Denormals = true;		hasFP32Denormals = true;
if (I == "+fp64-fp16-denormals" \|\| I == "-fp64-fp16-denormals")		if (I == "+fp64-fp16-denormals" \|\| I == "-fp64-fp16-denormals")
hasFP64Denormals = true;		hasFP64Denormals = true;
}		}
if (!hasFP32Denormals)		if (!hasFP32Denormals)
TargetOpts.Features.push_back(		TargetOpts.Features.push_back(
(Twine(hasFastFMAF() && hasFullRateDenormalsF32() &&		(Twine(hasFastFMAF() && hasFullRateDenormalsF32() &&
CGOpts.FP32DenormalMode == llvm::DenormalMode::IEEE		CGOpts.FP32DenormalMode.Output == llvm::DenormalMode::IEEE
? '+' : '-') + Twine("fp32-denormals"))		? '+' : '-') + Twine("fp32-denormals"))
.str());		.str());
// Always do not flush fp64 or fp16 denorms.		// Always do not flush fp64 or fp16 denorms.
if (!hasFP64Denormals && hasFP64())		if (!hasFP64Denormals && hasFP64())
TargetOpts.Features.push_back("+fp64-fp16-denormals");		TargetOpts.Features.push_back("+fp64-fp16-denormals");
}		}

void AMDGPUTargetInfo::fillValidCPUList(		void AMDGPUTargetInfo::fillValidCPUList(
[...]

clang/lib/CodeGen/CGCall.cpp

[...] if (AttrOnCallSite) {

FuncAttrs.addAttribute("less-precise-fpmad",		FuncAttrs.addAttribute("less-precise-fpmad",
llvm::toStringRef(CodeGenOpts.LessPreciseFPMAD));		llvm::toStringRef(CodeGenOpts.LessPreciseFPMAD));

if (CodeGenOpts.NullPointerIsValid)		if (CodeGenOpts.NullPointerIsValid)
FuncAttrs.addAttribute("null-pointer-is-valid", "true");		FuncAttrs.addAttribute("null-pointer-is-valid", "true");

// TODO: Omit attribute when the default is IEEE.		// TODO: Omit attribute when the default is IEEE.
if (CodeGenOpts.FPDenormalMode != llvm::DenormalMode::Invalid)		if (CodeGenOpts.FPDenormalMode.isValid())
FuncAttrs.addAttribute("denormal-fp-math",		FuncAttrs.addAttribute("denormal-fp-math",
llvm::denormalModeName(CodeGenOpts.FPDenormalMode));		CodeGenOpts.FPDenormalMode.str());
		if (CodeGenOpts.FP32DenormalMode.isValid()) {
if (CodeGenOpts.FP32DenormalMode != llvm::DenormalMode::Invalid)
FuncAttrs.addAttribute(		FuncAttrs.addAttribute(
"denormal-fp-math-f32",		"denormal-fp-math-f32",
llvm::denormalModeName(CodeGenOpts.FP32DenormalMode));		CodeGenOpts.FP32DenormalMode.str());
		}

FuncAttrs.addAttribute("no-trapping-math",		FuncAttrs.addAttribute("no-trapping-math",
llvm::toStringRef(CodeGenOpts.NoTrappingMath));		llvm::toStringRef(CodeGenOpts.NoTrappingMath));

// Strict (compliant) code is the default, so only add this attribute to		// Strict (compliant) code is the default, so only add this attribute to
// indicate that we are trying to workaround a problem case.		// indicate that we are trying to workaround a problem case.
if (!CodeGenOpts.StrictFloatCastOverflow)		if (!CodeGenOpts.StrictFloatCastOverflow)
FuncAttrs.addAttribute("strict-float-cast-overflow", "false");		FuncAttrs.addAttribute("strict-float-cast-overflow", "false");
[...]

clang/lib/CodeGen/CodeGenModule.cpp

[...] getModule().addModuleFlag(llvm::Module::Override, "cf-protection-branch",
1);		1);
}		}

if (LangOpts.CUDAIsDevice && getTriple().isNVPTX()) {		if (LangOpts.CUDAIsDevice && getTriple().isNVPTX()) {
// Indicate whether __nvvm_reflect should be configured to flush denormal		// Indicate whether __nvvm_reflect should be configured to flush denormal
// floating point values to 0. (This corresponds to its "__CUDA_FTZ"		// floating point values to 0. (This corresponds to its "__CUDA_FTZ"
// property.)		// property.)
getModule().addModuleFlag(llvm::Module::Override, "nvvm-reflect-ftz",		getModule().addModuleFlag(llvm::Module::Override, "nvvm-reflect-ftz",
CodeGenOpts.FP32DenormalMode !=		CodeGenOpts.FP32DenormalMode.Output !=
llvm::DenormalMode::IEEE);		llvm::DenormalMode::IEEE);
}		}

// Emit OpenCL specific module metadata: OpenCL/SPIR version.		// Emit OpenCL specific module metadata: OpenCL/SPIR version.
if (LangOpts.OpenCL) {		if (LangOpts.OpenCL) {
EmitOpenCLMetadata();		EmitOpenCLMetadata();
// Emit SPIR version.		// Emit SPIR version.
if (getTriple().isSPIR()) {		if (getTriple().isSPIR()) {
[...]

clang/lib/Driver/ToolChains/AMDGPU.cpp

[...] AMDGPUToolChain::TranslateArgs(const DerivedArgList &Args, StringRef BoundArch,
return DAL;		return DAL;
}		}

llvm::DenormalMode AMDGPUToolChain::getDefaultDenormalModeForType(		llvm::DenormalMode AMDGPUToolChain::getDefaultDenormalModeForType(
const llvm::opt::ArgList &DriverArgs, Action::OffloadKind DeviceOffloadKind,		const llvm::opt::ArgList &DriverArgs, Action::OffloadKind DeviceOffloadKind,
const llvm::fltSemantics *FPType) const {		const llvm::fltSemantics *FPType) const {
// Denormals should always be enabled for f16 and f64.		// Denormals should always be enabled for f16 and f64.
if (!FPType \|\| FPType != &llvm::APFloat::IEEEsingle())		if (!FPType \|\| FPType != &llvm::APFloat::IEEEsingle())
return llvm::DenormalMode::IEEE;		return llvm::DenormalMode::getIEEE();

if (DeviceOffloadKind == Action::OFK_Cuda) {		if (DeviceOffloadKind == Action::OFK_Cuda) {
if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&		if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&
DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,		DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
options::OPT_fno_cuda_flush_denormals_to_zero,		options::OPT_fno_cuda_flush_denormals_to_zero,
false))		false))
return llvm::DenormalMode::PreserveSign;		return llvm::DenormalMode::getPreserveSign();
}		}

const StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);		const StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);
auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);		auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);

// Default to enabling f32 denormals by default on subtargets where fma is		// Default to enabling f32 denormals by default on subtargets where fma is
// fast with denormals		// fast with denormals

const unsigned ArchAttr = llvm::AMDGPU::getArchAttrAMDGCN(Kind);		const unsigned ArchAttr = llvm::AMDGPU::getArchAttrAMDGCN(Kind);
const bool DefaultDenormsAreZeroForTarget =		const bool DefaultDenormsAreZeroForTarget =
(ArchAttr & llvm::AMDGPU::FEATURE_FAST_FMA_F32) &&		(ArchAttr & llvm::AMDGPU::FEATURE_FAST_FMA_F32) &&
(ArchAttr & llvm::AMDGPU::FEATURE_FAST_DENORMAL_F32);		(ArchAttr & llvm::AMDGPU::FEATURE_FAST_DENORMAL_F32);

// TODO: There are way too many flags that change this. Do we need to check		// TODO: There are way too many flags that change this. Do we need to check
// them all?		// them all?
bool DAZ = DriverArgs.hasArg(options::OPT_cl_denorms_are_zero) \|\|		bool DAZ = DriverArgs.hasArg(options::OPT_cl_denorms_are_zero) \|\|
!DefaultDenormsAreZeroForTarget;		!DefaultDenormsAreZeroForTarget;
// Outputs are flushed to zero, preserving sign		// Outputs are flushed to zero, preserving sign
return DAZ ? llvm::DenormalMode::PreserveSign : llvm::DenormalMode::IEEE;		return DAZ ? llvm::DenormalMode::getPreserveSign() :
		llvm::DenormalMode::getIEEE();
}		}

void AMDGPUToolChain::addClangTargetOptions(		void AMDGPUToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
// Default to "hidden" visibility, as object level linking will not be		// Default to "hidden" visibility, as object level linking will not be
// supported for the foreseeable future.		// supported for the foreseeable future.
if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,		if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,
options::OPT_fvisibility_ms_compat)) {		options::OPT_fvisibility_ms_compat)) {
CC1Args.push_back("-fvisibility");		CC1Args.push_back("-fvisibility");
CC1Args.push_back("hidden");		CC1Args.push_back("hidden");
CC1Args.push_back("-fapply-global-visibility-to-externs");		CC1Args.push_back("-fapply-global-visibility-to-externs");
}		}
}		}
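The default-selection logic in `AMDGPUToolChain::getDefaultDenormalModeForType` above can be summarised as a small decision function. This is a simplified sketch with the driver queries replaced by plain booleans; `Query`, `Mode`, and `defaultModeFor` are hypothetical names for illustration, and the real function returns an `llvm::DenormalMode` built from `getIEEE()` or `getPreserveSign()`.

```cpp
#include <cassert>

enum class Mode { IEEE, PreserveSign };

// Inputs to the decision, flattened to booleans for illustration.
struct Query {
  bool IsF32;            // FPType is IEEE single precision
  bool CudaOffload;      // DeviceOffloadKind == OFK_Cuda
  bool CudaFlushFlag;    // -fcuda-flush-denormals-to-zero is in effect
  bool ClDenormsAreZero; // -cl-denorms-are-zero was passed
  bool FastFmaF32;       // subtarget has FEATURE_FAST_FMA_F32
  bool FastDenormalF32;  // subtarget has FEATURE_FAST_DENORMAL_F32
};

inline Mode defaultModeFor(const Query &Q) {
  // f16 and f64 (and the "any type" query) always keep denormals.
  if (!Q.IsF32)
    return Mode::IEEE;
  // An explicit CUDA flush request wins for f32.
  if (Q.CudaOffload && Q.CudaFlushFlag)
    return Mode::PreserveSign;
  // Keep f32 denormals only where both FMA and denormals are fast,
  // and only if the OpenCL flag did not ask for flushing.
  const bool DenormalsAreCheap = Q.FastFmaF32 && Q.FastDenormalF32;
  const bool Flush = Q.ClDenormsAreZero || !DenormalsAreCheap;
  return Flush ? Mode::PreserveSign : Mode::IEEE;
}
```

The four subtarget classes exercised in `cl-denorms-are-zero.cl` correspond to the four combinations of the last two fields; only the fast/fast case without `-cl-denorms-are-zero` yields IEEE.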

clang/lib/Driver/ToolChains/Clang.cpp


[...] for (const Arg *A : Args) {

case options::OPT_fno_rounding_math:		case options::OPT_fno_rounding_math:
RoundingFPMath = false;		RoundingFPMath = false;
RoundingMathPresent = false;		RoundingMathPresent = false;
break;		break;

case options::OPT_fdenormal_fp_math_EQ:		case options::OPT_fdenormal_fp_math_EQ:
DenormalFPMath = llvm::parseDenormalFPAttribute(A->getValue());		DenormalFPMath = llvm::parseDenormalFPAttribute(A->getValue());
if (DenormalFPMath == llvm::DenormalMode::Invalid) {		if (!DenormalFPMath.isValid()) {
D.Diag(diag::err_drv_invalid_value)		D.Diag(diag::err_drv_invalid_value)
<< A->getAsString(Args) << A->getValue();		<< A->getAsString(Args) << A->getValue();
}		}
break;		break;

case options::OPT_fdenormal_fp_math_f32_EQ:		case options::OPT_fdenormal_fp_math_f32_EQ:
DenormalFP32Math = llvm::parseDenormalFPAttribute(A->getValue());		DenormalFP32Math = llvm::parseDenormalFPAttribute(A->getValue());
if (DenormalFP32Math == llvm::DenormalMode::Invalid) {		if (!DenormalFP32Math.isValid()) {
D.Diag(diag::err_drv_invalid_value)		D.Diag(diag::err_drv_invalid_value)
<< A->getAsString(Args) << A->getValue();		<< A->getAsString(Args) << A->getValue();
}		}
break;		break;

// Validate and pass through -ffp-contract option.		// Validate and pass through -ffp-contract option.
case options::OPT_ffp_contract: {		case options::OPT_ffp_contract: {
StringRef Val = A->getValue();		StringRef Val = A->getValue();
[...] for (const Arg *A : Args) {
}		}
if (StrictFPModel) {		if (StrictFPModel) {
// If -ffp-model=strict has been specified on command line but		// If -ffp-model=strict has been specified on command line but
// subsequent options conflict then emit warning diagnostic.		// subsequent options conflict then emit warning diagnostic.
// TODO: How should this interact with DenormalFP32Math?		// TODO: How should this interact with DenormalFP32Math?
if (HonorINFs && HonorNaNs &&		if (HonorINFs && HonorNaNs &&
!AssociativeMath && !ReciprocalMath &&		!AssociativeMath && !ReciprocalMath &&
SignedZeros && TrappingMath && RoundingFPMath &&		SignedZeros && TrappingMath && RoundingFPMath &&
DenormalFPMath != llvm::DenormalMode::IEEE &&		DenormalFPMath != llvm::DenormalMode::getIEEE() &&
FPContract.empty())		FPContract.empty())
// OK: Current Arg doesn't conflict with -ffp-model=strict		// OK: Current Arg doesn't conflict with -ffp-model=strict
;		;
else {		else {
StrictFPModel = false;		StrictFPModel = false;
FPModel = "";		FPModel = "";
D.Diag(clang::diag::warn_drv_overriding_flag_option)		D.Diag(clang::diag::warn_drv_overriding_flag_option)
<< "-ffp-model=strict" <<		<< "-ffp-model=strict" <<
[...] static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
if (TrappingMath) {		if (TrappingMath) {
// FP Exception Behavior is also set to strict		// FP Exception Behavior is also set to strict
assert(FPExceptionBehavior.equals("strict"));		assert(FPExceptionBehavior.equals("strict"));
CmdArgs.push_back("-ftrapping-math");		CmdArgs.push_back("-ftrapping-math");
} else if (TrappingMathPresent)		} else if (TrappingMathPresent)
CmdArgs.push_back("-fno-trapping-math");		CmdArgs.push_back("-fno-trapping-math");

// TODO: Omit flag for the default IEEE instead		// TODO: Omit flag for the default IEEE instead
if (DenormalFPMath != llvm::DenormalMode::Invalid) {		if (DenormalFPMath.isValid()) {
CmdArgs.push_back(Args.MakeArgString(		llvm::SmallString<64> DenormFlag;
"-fdenormal-fp-math=" + llvm::denormalModeName(DenormalFPMath)));		llvm::raw_svector_ostream ArgStr(DenormFlag);
		ArgStr << "-fdenormal-fp-math=" << DenormalFPMath;
		CmdArgs.push_back(Args.MakeArgString(ArgStr.str()));
}		}

if (DenormalFP32Math != llvm::DenormalMode::Invalid) {		if (DenormalFP32Math.isValid()) {
CmdArgs.push_back(Args.MakeArgString(		llvm::SmallString<64> DenormFlag;
"-fdenormal-fp-math-f32=" + llvm::denormalModeName(DenormalFP32Math)));		llvm::raw_svector_ostream ArgStr(DenormFlag);
		ArgStr << "-fdenormal-fp-math-f32=" << DenormalFP32Math;
		CmdArgs.push_back(Args.MakeArgString(ArgStr.str()));
}		}

if (!FPContract.empty())		if (!FPContract.empty())
CmdArgs.push_back(Args.MakeArgString("-ffp-contract=" + FPContract));		CmdArgs.push_back(Args.MakeArgString("-ffp-contract=" + FPContract));

if (!RoundingFPMath)		if (!RoundingFPMath)
CmdArgs.push_back(Args.MakeArgString("-fno-rounding-math"));		CmdArgs.push_back(Args.MakeArgString("-fno-rounding-math"));

[...]

clang/lib/Driver/ToolChains/Cuda.cpp

	[...]
	llvm::DenormalMode CudaToolChain::getDefaultDenormalModeForType(			llvm::DenormalMode CudaToolChain::getDefaultDenormalModeForType(
	const llvm::opt::ArgList &DriverArgs, Action::OffloadKind DeviceOffloadKind,			const llvm::opt::ArgList &DriverArgs, Action::OffloadKind DeviceOffloadKind,
	const llvm::fltSemantics *FPType) const {			const llvm::fltSemantics *FPType) const {
	if (DeviceOffloadKind == Action::OFK_Cuda) {			if (DeviceOffloadKind == Action::OFK_Cuda) {
	if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&			if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&
	DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,			DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
	options::OPT_fno_cuda_flush_denormals_to_zero,			options::OPT_fno_cuda_flush_denormals_to_zero,
	false))			false))
	return llvm::DenormalMode::PreserveSign;			return llvm::DenormalMode::getPreserveSign();
	}			}

	assert(DeviceOffloadKind != Action::OFK_Host);			assert(DeviceOffloadKind != Action::OFK_Host);
	return llvm::DenormalMode::IEEE;			return llvm::DenormalMode::getIEEE();
	}			}

	bool CudaToolChain::supportsDebugInfoOption(const llvm::opt::Arg *A) const {			bool CudaToolChain::supportsDebugInfoOption(const llvm::opt::Arg *A) const {
	const Option &O = A->getOption();			const Option &O = A->getOption();
	return (O.matches(options::OPT_gN_Group) &&			return (O.matches(options::OPT_gN_Group) &&
	!O.matches(options::OPT_gmodules)) \|\|			!O.matches(options::OPT_gmodules)) \|\|
	O.matches(options::OPT_g_Flag) \|\|			O.matches(options::OPT_g_Flag) \|\|
	O.matches(options::OPT_ggdbN_Group) \|\| O.matches(options::OPT_ggdb) \|\|			O.matches(options::OPT_ggdbN_Group) \|\| O.matches(options::OPT_ggdb) \|\|
	[...]

clang/lib/Frontend/CompilerInvocation.cpp

[...] if (Arg *A = Args.getLastArg(OPT_ftlsmodel_EQ)) {
}		}
}		}

Opts.TLSSize = getLastArgIntValue(Args, OPT_mtls_size_EQ, 0, Diags);		Opts.TLSSize = getLastArgIntValue(Args, OPT_mtls_size_EQ, 0, Diags);

if (Arg *A = Args.getLastArg(OPT_fdenormal_fp_math_EQ)) {		if (Arg *A = Args.getLastArg(OPT_fdenormal_fp_math_EQ)) {
StringRef Val = A->getValue();		StringRef Val = A->getValue();
Opts.FPDenormalMode = llvm::parseDenormalFPAttribute(Val);		Opts.FPDenormalMode = llvm::parseDenormalFPAttribute(Val);
if (Opts.FPDenormalMode == llvm::DenormalMode::Invalid)		if (!Opts.FPDenormalMode.isValid())
Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;		Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;
}		}

if (Arg *A = Args.getLastArg(OPT_fdenormal_fp_math_f32_EQ)) {		if (Arg *A = Args.getLastArg(OPT_fdenormal_fp_math_f32_EQ)) {
StringRef Val = A->getValue();		StringRef Val = A->getValue();
Opts.FP32DenormalMode = llvm::parseDenormalFPAttribute(Val);		Opts.FP32DenormalMode = llvm::parseDenormalFPAttribute(Val);
if (Opts.FP32DenormalMode == llvm::DenormalMode::Invalid)		if (!Opts.FP32DenormalMode.isValid())
Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;		Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;
}		}

if (Arg *A = Args.getLastArg(OPT_fpcc_struct_return, OPT_freg_struct_return)) {		if (Arg *A = Args.getLastArg(OPT_fpcc_struct_return, OPT_freg_struct_return)) {
if (A->getOption().matches(OPT_fpcc_struct_return)) {		if (A->getOption().matches(OPT_fpcc_struct_return)) {
Opts.setStructReturnConvention(CodeGenOptions::SRCK_OnStack);		Opts.setStructReturnConvention(CodeGenOptions::SRCK_OnStack);
} else {		} else {
assert(A->getOption().matches(OPT_freg_struct_return));		assert(A->getOption().matches(OPT_freg_struct_return));
[...]

clang/test/CodeGen/denormalfpmode.c

	// RUN: %clang_cc1 -S -fdenormal-fp-math=ieee %s -emit-llvm -o - \| FileCheck %s --check-prefix=CHECK-IEEE			// RUN: %clang_cc1 -S -fdenormal-fp-math=ieee %s -emit-llvm -o - \| FileCheck %s --check-prefix=CHECK-IEEE
	// RUN: %clang_cc1 -S -fdenormal-fp-math=preserve-sign %s -emit-llvm -o - \| FileCheck %s --check-prefix=CHECK-PS			// RUN: %clang_cc1 -S -fdenormal-fp-math=preserve-sign %s -emit-llvm -o - \| FileCheck %s --check-prefix=CHECK-PS
	// RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero %s -emit-llvm -o - \| FileCheck %s --check-prefix=CHECK-PZ			// RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero %s -emit-llvm -o - \| FileCheck %s --check-prefix=CHECK-PZ

	// CHECK-LABEL: main			// CHECK-LABEL: main
	// CHECK-IEEE: attributes #0 = {{.*}}"denormal-fp-math"="ieee"{{.*}}			// CHECK-IEEE: attributes #0 = {{.*}}"denormal-fp-math"="ieee,ieee"{{.*}}
	// CHECK-PS: attributes #0 = {{.*}}"denormal-fp-math"="preserve-sign"{{.*}}			// CHECK-PS: attributes #0 = {{.*}}"denormal-fp-math"="preserve-sign,preserve-sign"{{.*}}
	// CHECK-PZ: attributes #0 = {{.*}}"denormal-fp-math"="positive-zero"{{.*}}			// CHECK-PZ: attributes #0 = {{.*}}"denormal-fp-math"="positive-zero,positive-zero"{{.*}}

	int main() {			int main() {
	return 0;			return 0;
	}			}
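The test above passes a single value such as `-fdenormal-fp-math=ieee` and expects the pair "ieee,ieee" in the attribute, which matches the review note that an omitted second value makes the input mode follow the output mode. A minimal sketch of that parsing rule follows. `Kind`, `parseKind`, and `parseDenormalPair` are hypothetical names, and returning `Invalid` for unrecognised text is an assumption of this sketch, not a statement about `llvm::parseDenormalFPAttribute`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum class Kind { Invalid, IEEE, PreserveSign, PositiveZero };

// Maps one spelling to its kind; anything else is Invalid (assumption).
inline Kind parseKind(const std::string &S) {
  if (S == "ieee")
    return Kind::IEEE;
  if (S == "preserve-sign")
    return Kind::PreserveSign;
  if (S == "positive-zero")
    return Kind::PositiveZero;
  return Kind::Invalid;
}

// Parses "<output>[,<input>]" and returns {output, input}.
// When the input half is omitted it defaults to the output half.
inline std::pair<Kind, Kind> parseDenormalPair(const std::string &S) {
  const std::size_t Comma = S.find(',');
  const Kind Out = parseKind(S.substr(0, Comma));
  const Kind In = (Comma == std::string::npos)
                      ? Out
                      : parseKind(S.substr(Comma + 1));
  return std::make_pair(Out, In);
}
```

Keeping the single-value spelling valid is what lets existing attribute strings keep their meaning, which is the reason given in the review for not renaming the values.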

clang/test/CodeGenCUDA/flush-denormals.cu

	[...]
	// AMDGCN targets without fast FMAF (e.g. gfx803) always have +fp32-denormals.			// AMDGCN targets without fast FMAF (e.g. gfx803) always have +fp32-denormals.
	// For AMDGCN target with fast FMAF (e.g. gfx900), it has +fp32-denormals			// For AMDGCN target with fast FMAF (e.g. gfx900), it has +fp32-denormals
	// by default and -fp32-denormals when there is option			// by default and -fp32-denormals when there is option
	// -fcuda-flush-denormals-to-zero.			// -fcuda-flush-denormals-to-zero.

	// CHECK-LABEL: define void @foo() #0			// CHECK-LABEL: define void @foo() #0
	extern "C" __device__ void foo() {}			extern "C" __device__ void foo() {}

	// FTZ: attributes #0 = {{.*}} "denormal-fp-math-f32"="preserve-sign"			// FTZ: attributes #0 = {{.*}} "denormal-fp-math-f32"="preserve-sign,preserve-sign"
	// NOFTZ: attributes #0 = {{.*}} "denormal-fp-math-f32"="ieee"			// NOFTZ: attributes #0 = {{.*}} "denormal-fp-math-f32"="ieee,ieee"


	// FIXME: This should be removed			// FIXME: This should be removed
	// DEFAULT-NOT: "denormal-fp-math-f32"			// DEFAULT-NOT: "denormal-fp-math-f32"

	// AMDNOFTZ: attributes #0 = {{.*}}+fp32-denormals{{.*}}+fp64-fp16-denormals			// AMDNOFTZ: attributes #0 = {{.*}}+fp32-denormals{{.*}}+fp64-fp16-denormals
	// AMDFTZ: attributes #0 = {{.*}}+fp64-fp16-denormals{{.*}}-fp32-denormals			// AMDFTZ: attributes #0 = {{.*}}+fp64-fp16-denormals{{.*}}-fp32-denormals

	// FTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}			// FTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}
	// FTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 1}			// FTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 1}

	// NOFTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}			// NOFTZ:!llvm.module.flags = !{{{.*}}[[MODFLAG:![0-9]+]]}
	// NOFTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 0}			// NOFTZ:[[MODFLAG]] = !{i32 4, !"nvvm-reflect-ftz", i32 0}

clang/test/CodeGenCUDA/propagate-metadata.cu

	[...]

	// Check the attribute list.			// Check the attribute list.
	// CHECK: attributes [[attr]] = {			// CHECK: attributes [[attr]] = {

	// CHECK-SAME: convergent			// CHECK-SAME: convergent

	// FTZ-NOT: "denormal-fp-math"			// FTZ-NOT: "denormal-fp-math"

	// FTZ-SAME: "denormal-fp-math-f32"="preserve-sign"			// FTZ-SAME: "denormal-fp-math-f32"="preserve-sign,preserve-sign"
	// NOFTZ-SAME: "denormal-fp-math-f32"="ieee"			// NOFTZ-SAME: "denormal-fp-math-f32"="ieee,ieee"

	// CHECK-SAME: "no-trapping-math"="true"			// CHECK-SAME: "no-trapping-math"="true"

	// FAST-SAME: "unsafe-fp-math"="true"			// FAST-SAME: "unsafe-fp-math"="true"
	// NOFAST-NOT: "unsafe-fp-math"="true"			// NOFAST-NOT: "unsafe-fp-math"="true"

clang/test/Driver/cl-denorms-are-zero.cl

	// Slow FMAF and slow f32 denormals			// Slow FMAF and slow f32 denormals
	// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s			// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
	// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s			// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=pitcairn %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

	// Fast FMAF, but slow f32 denormals			// Fast FMAF, but slow f32 denormals
	// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s			// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
	// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s			// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=tahiti %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

	// Fast F32 denormals, but slow FMAF			// Fast F32 denormals, but slow FMAF
	// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s			// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s
	// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s			// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=fiji %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

	// Fast F32 denormals and fast FMAF			// Fast F32 denormals and fast FMAF
	// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-DENORM %s			// RUN: %clang -### -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-DENORM %s
	// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s			// RUN: %clang -### -cl-denorms-are-zero -o - -target amdgcn--amdhsa -c -mcpu=gfx900 %s 2>&1 \| FileCheck -check-prefixes=AMDGCN,AMDGCN-FLUSH %s

	// AMDGCN-FLUSH: "-fdenormal-fp-math-f32=preserve-sign"			// AMDGCN-FLUSH: "-fdenormal-fp-math-f32=preserve-sign,preserve-sign"

	// This should be omitted and default to ieee			// This should be omitted and default to ieee
	// AMDGCN-DENORM-NOT: "-fdenormal-fp-math-f32"			// AMDGCN-DENORM-NOT: "-fdenormal-fp-math-f32"

clang/test/Driver/cuda-flush-denormals-to-zero.cu

	// Checks that cuda compilation does the right thing when passed			// Checks that cuda compilation does the right thing when passed
	// -fcuda-flush-denormals-to-zero. This should be translated to			// -fcuda-flush-denormals-to-zero. This should be translated to
	// -fdenormal-fp-math-f32=preserve-sign			// -fdenormal-fp-math-f32=preserve-sign

	// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 -fcuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=FTZ %s			// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 -fcuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=FTZ %s
	// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 -fno-cuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=NOFTZ %s			// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 -fno-cuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=NOFTZ %s
	// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_10 -fcuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=FTZ %s			// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_10 -fcuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=FTZ %s
	// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_10 -fno-cuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=NOFTZ %s			// RUN: %clang -no-canonical-prefixes -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_10 -fno-cuda-flush-denormals-to-zero -nocudainc -nocudalib %s 2>&1 | FileCheck -check-prefix=NOFTZ %s

	// CPUFTZ-NOT: -fdenormal-fp-math			// CPUFTZ-NOT: -fdenormal-fp-math

	// FTZ: "-fdenormal-fp-math-f32=preserve-sign"			// FTZ: "-fdenormal-fp-math-f32=preserve-sign,preserve-sign"
	// NOFTZ: "-fdenormal-fp-math=ieee"			// NOFTZ: "-fdenormal-fp-math=ieee,ieee"

clang/test/Driver/denormal-fp-math.c

	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -v 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -v 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=preserve-sign -v 2>&1 | FileCheck -check-prefix=CHECK-PS %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=preserve-sign -v 2>&1 | FileCheck -check-prefix=CHECK-PS %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=positive-zero -v 2>&1 | FileCheck -check-prefix=CHECK-PZ %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=positive-zero -v 2>&1 | FileCheck -check-prefix=CHECK-PZ %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-fast-math -v 2>&1 | FileCheck -check-prefix=CHECK-NO-UNSAFE %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-fast-math -v 2>&1 | FileCheck -check-prefix=CHECK-NO-UNSAFE %s
	// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-unsafe-math-optimizations -v 2>&1 | FileCheck -check-prefix=CHECK-NO-UNSAFE %s			// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee -fno-unsafe-math-optimizations -v 2>&1 | FileCheck -check-prefix=CHECK-NO-UNSAFE %s
	// RUN: not %clang -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=foo -v 2>&1 | FileCheck -check-prefix=CHECK-INVALID %s			// RUN: not %clang -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=foo -v 2>&1 | FileCheck -check-prefix=CHECK-INVALID0 %s
				// RUN: not %clang -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=ieee,foo -v 2>&1 | FileCheck -check-prefix=CHECK-INVALID1 %s
				// RUN: not %clang -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=foo,ieee -v 2>&1 | FileCheck -check-prefix=CHECK-INVALID2 %s
				// RUN: not %clang -target arm-unknown-linux-gnu -c %s -fdenormal-fp-math=foo,foo -v 2>&1 | FileCheck -check-prefix=CHECK-INVALID3 %s

	// CHECK-IEEE: -fdenormal-fp-math=ieee			// CHECK-IEEE: -fdenormal-fp-math=ieee,ieee
	// CHECK-PS: "-fdenormal-fp-math=preserve-sign"			// CHECK-PS: "-fdenormal-fp-math=preserve-sign,preserve-sign"
	// CHECK-PZ: "-fdenormal-fp-math=positive-zero"			// CHECK-PZ: "-fdenormal-fp-math=positive-zero,positive-zero"
	// CHECK-NO-UNSAFE-NOT: "-fdenormal-fp-math=ieee"			// CHECK-NO-UNSAFE-NOT: "-fdenormal-fp-math=ieee"
	// CHECK-INVALID: error: invalid value 'foo' in '-fdenormal-fp-math=foo'			// CHECK-INVALID0: error: invalid value 'foo' in '-fdenormal-fp-math=foo'
				// CHECK-INVALID1: error: invalid value 'ieee,foo' in '-fdenormal-fp-math=ieee,foo'
				// CHECK-INVALID2: error: invalid value 'foo,ieee' in '-fdenormal-fp-math=foo,ieee'
				// CHECK-INVALID3: error: invalid value 'foo,foo' in '-fdenormal-fp-math=foo,foo'

llvm/docs/LangRef.rst


[... 1,812 lines not shown ...]	``sspstrong``
resulting function will have an ``sspstrong`` attribute.		resulting function will have an ``sspstrong`` attribute.
``strictfp``		``strictfp``
This attribute indicates that the function was called from a scope that		This attribute indicates that the function was called from a scope that
requires strict floating-point semantics. LLVM will not attempt any		requires strict floating-point semantics. LLVM will not attempt any
optimizations that require assumptions about the floating-point rounding		optimizations that require assumptions about the floating-point rounding
mode or that might alter the state of floating-point status flags that		mode or that might alter the state of floating-point status flags that
might otherwise be set or cleared by calling this function. LLVM will		might otherwise be set or cleared by calling this function. LLVM will
not introduce any new floating-point instructions that may trap.		not introduce any new floating-point instructions that may trap.

``"denormal-fp-math"``		``"denormal-fp-math"``
		andrew.w.kaylor: I don't like the definition of this attribute. It's not reader-friendly. The comma-separated pair format has no indication which value refers to inputs and which refers to outputs. Also, while this predates your changes, I think the meanings of the current choices are unclear. What would you think of a comma-separated list with the following possibilities?
		  allow-denormals (default)
		  inputs-are-zero (outputs not flushed)
		  inputs-are-zero, outputs-are-zero
		  inputs-are-zero, outputs-are-positive-zero
		  inputs-are-positivezero (outputs not flushed)
		  inputs-are-positivezero, outputs-are-zero
		  inputs-are-positivezero, outputs-are-positive-zero
		  denormal-outputs-are-zero (inputs are unchanged)
		  denormal-outputs-are-positive-zero (inputs are unchanged)
		I'd also be open to abbreviations. I don't know if "daz" and "ftz" are readable to everyone, but I'm more comfortable with them. That would make the options something like this:
		  allow-denormals
		  daz
		  daz, ftz
		  daz, ftz+
		  daz+
		  daz+, ftz
		  daz+, ftz+
		  ftz
		  ftz+
		arsenm (author): I'm trying to avoid needing to autoupgrade bitcode at this point, which leaving the names as-is accomplishes. I'm worried this could still end up not in the right place, and then we would need another level of auto upgrade to deal with it later. I think these are overly verbose (I'm also keeping in mind the fact that any use of these does a linear scan through all string attributes, and then needs to parse these). I'm also unclear on what this weird ARM positive-zero really means. Does it mean inputs and outputs ignored the sign? Is there value in representing positive-zero on both sides?
This indicates the denormal (subnormal) handling that may be assumed		This indicates the denormal (subnormal) handling that may be
for the default floating-point environment. This may be one of		assumed for the default floating-point environment. This is a comma
``"ieee"``, ``"preserve-sign"``, or ``"positive-zero"``. If this		separated pair. The elements may be one of ``"ieee"``,
attribute is not specified, the default is ``"ieee"``. If the		``"preserve-sign"``, or ``"positive-zero"``. The first entry
mode is ``"preserve-sign"``, or ``"positive-zero"``, denormal		indicates the flushing mode for the result of floating point
outputs may be flushed to zero by standard floating point		operations. The second indicates the handling of denormal inputs to
		floating point instructions. For compatibility with older bitcode,
		andrew.w.kaylor: Based on the changes below, if the second value is omitted the input mode will be assumed to be the same as the output mode. That should probably be documented. I guess you intend for that not to happen, but the documentation here leaves the result ambiguous if it does happen.
		arsenm (author): I've added a note for this.
		if the second value is omitted, both input and output modes will
		assume the same mode.

		If this attribute is not specified, the default is
		``"ieee,ieee"``.

		If the output mode is ``"preserve-sign"``, or ``"positive-zero"``,
		denormal outputs may be flushed to zero by standard floating-point
operations. It is not mandated that flushing to zero occurs, but if		operations. It is not mandated that flushing to zero occurs, but if
a denormal output is flushed to zero, it must respect the sign		a denormal output is flushed to zero, it must respect the sign
mode. Not all targets support all modes. While this indicates the		mode. Not all targets support all modes. While this indicates the
expected floating point mode the function will be executed with,		expected floating point mode the function will be executed with,
this does not make any attempt to ensure the mode is		this does not make any attempt to ensure the mode is
consistent. User or platform code is expected to set the floating		consistent. User or platform code is expected to set the floating
point mode appropriately before function entry.		point mode appropriately before function entry.

		If the input mode is ``"preserve-sign"``, or ``"positive-zero"``, a
		floating-point operation must treat any input denormal value as
		zero. In some situations, if an instruction does not respect this
		mode, the input may need to be converted to 0 as if by
		``@llvm.canonicalize`` during lowering for correctness.
		andrew.w.kaylor: Is this saying that if a backend generates an instruction that doesn't handle the hardware daz mode then it must insert instructions to check for denormals and convert them to zero? If so, do you intend this to apply to all such instructions or only instructions that aren't able to accept denormal inputs?
		arsenm (author): Only in cases where denormal inputs are invalid or unhandled. The case I'm thinking of is the one user in DAGCombiner, where if denormals are not flushed the result ends up incorrect (see https://bugs.llvm.org/show_bug.cgi?id=34994).
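
		A minimal sketch of the input-side rule discussed in this thread, written as standalone C++ rather than LLVM code. `DenormKind` and `applyInputMode` are hypothetical names that do not appear in the patch. The sketch also assumes that a ``"positive-zero"`` input mode maps every denormal input to +0.0, which the thread above leaves as an open question.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical mirror of the three mode names used by "denormal-fp-math".
enum class DenormKind { IEEE, PreserveSign, PositiveZero };

// Model the value an operation would observe for one float input under the
// given input mode. Assumption: PositiveZero turns any denormal into +0.0.
inline float applyInputMode(float X, DenormKind InputMode) {
  // Normal values, zeros, infinities and NaNs are never changed, and the
  // IEEE mode leaves denormals alone as well.
  if (InputMode == DenormKind::IEEE || std::fpclassify(X) != FP_SUBNORMAL)
    return X;
  if (InputMode == DenormKind::PreserveSign)
    return std::copysign(0.0f, X); // Zero that keeps the sign of X.
  return 0.0f;                     // PositiveZero: always +0.0.
}
```

		Under these assumptions, the smallest negative denormal becomes -0.0 in the preserve-sign mode and +0.0 in the positive-zero mode, while the IEEE mode returns it unchanged. This is the conversion a lowering would have to make explicit when an instruction does not honor the hardware mode itself.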

``"denormal-fp-math-f32"``		``"denormal-fp-math-f32"``
Same as ``"denormal-fp-math"``, but only controls the behavior of		Same as ``"denormal-fp-math"``, but only controls the behavior of
the 32-bit float type (or vectors of 32-bit floats). If both are		the 32-bit float type (or vectors of 32-bit floats). If both are
present, this overrides ``"denormal-fp-math"``. Not all targets		present, this overrides ``"denormal-fp-math"``. Not all targets
support separately setting the denormal mode per type, and no		support separately setting the denormal mode per type, and no
attempt is made to diagnose unsupported uses. Currently this		attempt is made to diagnose unsupported uses. Currently this
attribute is respected by the AMDGPU and NVPTX backends.		attribute is respected by the AMDGPU and NVPTX backends.

[... 13,743 lines not shown ...]
operations is done. The correct way to mix constrained and less constrained		operations is done. The correct way to mix constrained and less constrained
operations is to use the rounding mode and exception handling metadata to		operations is to use the rounding mode and exception handling metadata to
mark constrained intrinsics as having LLVM's default behavior.		mark constrained intrinsics as having LLVM's default behavior.

Each of these intrinsics corresponds to a normal floating-point operation. The		Each of these intrinsics corresponds to a normal floating-point operation. The
data arguments and the return value are the same as the corresponding FP		data arguments and the return value are the same as the corresponding FP
operation.		operation.

The rounding mode argument is a metadata string specifying what		The rounding mode argument is a metadata string specifying what
assumptions, if any, the optimizer can make when transforming constant		assumptions, if any, the optimizer can make when transforming constant
values. Some constrained FP intrinsics omit this argument. If required		values. Some constrained FP intrinsics omit this argument. If required
by the intrinsic, this argument must be one of the following strings:		by the intrinsic, this argument must be one of the following strings:

::		::

"round.dynamic"		"round.dynamic"
"round.tonearest"		"round.tonearest"
"round.downward"		"round.downward"
"round.upward"		"round.upward"
[... 299 lines not shown ...]	::

declare <ty2>		declare <ty2>
@llvm.experimental.constrained.fptoui(<type> <value>,		@llvm.experimental.constrained.fptoui(<type> <value>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.fptoui``' intrinsic converts a		The '``llvm.experimental.constrained.fptoui``' intrinsic converts a
floating-point ``value`` to its unsigned integer equivalent of type ``ty2``.		floating-point ``value`` to its unsigned integer equivalent of type ``ty2``.

Arguments:		Arguments:
""""""""""		""""""""""

The first argument to the '``llvm.experimental.constrained.fptoui``'		The first argument to the '``llvm.experimental.constrained.fptoui``'
intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector		intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
<t_vector>` of floating point values.		<t_vector>` of floating point values.
[... 16 lines not shown ...]	::

declare <ty2>		declare <ty2>
@llvm.experimental.constrained.fptosi(<type> <value>,		@llvm.experimental.constrained.fptosi(<type> <value>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.fptosi``' intrinsic converts		The '``llvm.experimental.constrained.fptosi``' intrinsic converts
:ref:`floating-point <t_floating>` ``value`` to type ``ty2``.		:ref:`floating-point <t_floating>` ``value`` to type ``ty2``.

Arguments:		Arguments:
""""""""""		""""""""""

The first argument to the '``llvm.experimental.constrained.fptosi``'		The first argument to the '``llvm.experimental.constrained.fptosi``'
intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector		intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
<t_vector>` of floating point values.		<t_vector>` of floating point values.

The second argument specifies the exception behavior as described above.		The second argument specifies the exception behavior as described above.

Semantics:		Semantics:
""""""""""		""""""""""

The result produced is a signed integer converted from the floating		The result produced is a signed integer converted from the floating
point operand. The value is truncated, so it is rounded towards zero.		point operand. The value is truncated, so it is rounded towards zero.
[... 92 lines not shown ...]
Arguments:		Arguments:
""""""""""		""""""""""

The first argument to the '``llvm.experimental.constrained.fptrunc``'		The first argument to the '``llvm.experimental.constrained.fptrunc``'
intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector		intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
<t_vector>` of floating point values. This argument must be larger in size		<t_vector>` of floating point values. This argument must be larger in size
than the result.		than the result.

The second and third arguments specify the rounding mode and exception		The second and third arguments specify the rounding mode and exception
behavior as described above.		behavior as described above.

Semantics:		Semantics:
""""""""""		""""""""""

The result produced is a floating point value truncated to be smaller in size		The result produced is a floating point value truncated to be smaller in size
than the operand.		than the operand.

'``llvm.experimental.constrained.fpext``' Intrinsic		'``llvm.experimental.constrained.fpext``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <ty2>		declare <ty2>
@llvm.experimental.constrained.fpext(<type> <value>,		@llvm.experimental.constrained.fpext(<type> <value>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.fpext``' intrinsic extends a		The '``llvm.experimental.constrained.fpext``' intrinsic extends a
floating-point ``value`` to a larger floating-point value.		floating-point ``value`` to a larger floating-point value.

Arguments:		Arguments:
""""""""""		""""""""""

The first argument to the '``llvm.experimental.constrained.fpext``'		The first argument to the '``llvm.experimental.constrained.fpext``'
intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector		intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
<t_vector>` of floating point values. This argument must be smaller in size		<t_vector>` of floating point values. This argument must be smaller in size
[... 1,000 lines not shown ...]
Syntax:		Syntax:
"""""""		"""""""

::		::

declare <inttype>		declare <inttype>
@llvm.experimental.constrained.llround(<fptype> <op1>,		@llvm.experimental.constrained.llround(<fptype> <op1>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.llround``' intrinsic returns the first		The '``llvm.experimental.constrained.llround``' intrinsic returns the first
operand rounded to the nearest integer with ties away from zero. It will		operand rounded to the nearest integer with ties away from zero. It will
raise an inexact floating-point exception if the operand is not an integer.		raise an inexact floating-point exception if the operand is not an integer.
An invalid exception is raised if the result is too large to fit into a		An invalid exception is raised if the result is too large to fit into a
supported integer type, and in this case the result is undefined.		supported integer type, and in this case the result is undefined.
[... 1,605 lines not shown ...]

llvm/include/llvm/ADT/FloatingPointMode.h

	//===- llvm/Support/FloatingPointMode.h -------------------------*- C++ -*-===//			//===- llvm/Support/FloatingPointMode.h -------------------------*- C++ -*-===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Utilities for dealing with flags related to floating point mode controls.			// Utilities for dealing with flags related to floating point mode controls.
	//			//
	//===----------------------------------------------------------------------===/			//===----------------------------------------------------------------------===/

	#ifndef LLVM_FLOATINGPOINTMODE_H			#ifndef LLVM_FLOATINGPOINTMODE_H
	#define LLVM_FLOATINGPOINTMODE_H			#define LLVM_FLOATINGPOINTMODE_H

	#include "llvm/ADT/StringSwitch.h"			#include "llvm/ADT/StringSwitch.h"
				#include "llvm/Support/raw_ostream.h"

	namespace llvm {			namespace llvm {

				/// Represent subnormal handling kind for floating point instruction inputs and
				/// outputs.
				struct DenormalMode {
	/// Represent handled modes for denormal (aka subnormal) modes in the floating			/// Represent handled modes for denormal (aka subnormal) modes in the floating
	/// point environment.			/// point environment.
	enum class DenormalMode {			enum DenormalModeKind : char {
	Invalid = -1,			Invalid = -1,

	/// IEEE-754 denormal numbers preserved.			/// IEEE-754 denormal numbers preserved.
	IEEE,			IEEE,

	/// The sign of a flushed-to-zero number is preserved in the sign of 0			/// The sign of a flushed-to-zero number is preserved in the sign of 0
	PreserveSign,			PreserveSign,

	/// Denormals are flushed to positive zero.			/// Denormals are flushed to positive zero.
	PositiveZero			PositiveZero
	};			};

				/// Denormal flushing mode for floating point instruction results in the
				/// default floating point environment.
				DenormalModeKind Output = DenormalModeKind::Invalid;

				/// Denormal treatment kind for floating point instruction inputs in the
				/// default floating-point environment. If this is not DenormalModeKind::IEEE,
				/// floating-point instructions implicitly treat the input value as 0.
				DenormalModeKind Input = DenormalModeKind::Invalid;

				DenormalMode() = default;
				DenormalMode(DenormalModeKind Out, DenormalModeKind In) :
				Output(Out), Input(In) {}


				static DenormalMode getInvalid() {
				return DenormalMode(DenormalModeKind::Invalid, DenormalModeKind::Invalid);
				}

				static DenormalMode getIEEE() {
				return DenormalMode(DenormalModeKind::IEEE, DenormalModeKind::IEEE);
				}

				static DenormalMode getPreserveSign() {
				return DenormalMode(DenormalModeKind::PreserveSign,
				DenormalModeKind::PreserveSign);
				}

				static DenormalMode getPositiveZero() {
				return DenormalMode(DenormalModeKind::PositiveZero,
				DenormalModeKind::PositiveZero);
				}

				bool operator==(DenormalMode Other) const {
				return Output == Other.Output && Input == Other.Input;
				}

				bool operator!=(DenormalMode Other) const {
				return !(*this == Other);
				}

				bool isSimple() const {
				return Input == Output;
				}

				bool isValid() const {
				return Output != DenormalModeKind::Invalid &&
				Input != DenormalModeKind::Invalid;
				}

				inline void print(raw_ostream &OS) const;

				inline std::string str() const {
				std::string storage;
				raw_string_ostream OS(storage);
				print(OS);
				return OS.str();
				}
				};

				inline raw_ostream& operator<<(raw_ostream &OS, DenormalMode Mode) {
				Mode.print(OS);
				return OS;
				}

	/// Parse the expected names from the denormal-fp-math attribute.			/// Parse the expected names from the denormal-fp-math attribute.
	inline DenormalMode parseDenormalFPAttribute(StringRef Str) {			inline DenormalMode::DenormalModeKind
				parseDenormalFPAttributeComponent(StringRef Str) {
	// Assume ieee on unspecified attribute.			// Assume ieee on unspecified attribute.
	return StringSwitch<DenormalMode>(Str)			return StringSwitch<DenormalMode::DenormalModeKind>(Str)
	.Cases("", "ieee", DenormalMode::IEEE)			.Cases("", "ieee", DenormalMode::IEEE)
	.Case("preserve-sign", DenormalMode::PreserveSign)			.Case("preserve-sign", DenormalMode::PreserveSign)
	.Case("positive-zero", DenormalMode::PositiveZero)			.Case("positive-zero", DenormalMode::PositiveZero)
	.Default(DenormalMode::Invalid);			.Default(DenormalMode::Invalid);
	}			}

	/// Return the name used for the denormal handling mode used by the			/// Return the name used for the denormal handling mode used by the
	/// expected names from the denormal-fp-math attribute.			/// expected names from the denormal-fp-math attribute.
	inline StringRef denormalModeName(DenormalMode Mode) {			inline StringRef denormalModeKindName(DenormalMode::DenormalModeKind Mode) {
	switch (Mode) {			switch (Mode) {
	case DenormalMode::IEEE:			case DenormalMode::IEEE:
	return "ieee";			return "ieee";
	case DenormalMode::PreserveSign:			case DenormalMode::PreserveSign:
	return "preserve-sign";			return "preserve-sign";
	case DenormalMode::PositiveZero:			case DenormalMode::PositiveZero:
	return "positive-zero";			return "positive-zero";
	default:			default:
	return "";			return "";
	}			}
	}			}

				/// Returns the denormal mode to use for inputs and outputs.
				inline DenormalMode parseDenormalFPAttribute(StringRef Str) {
				StringRef OutputStr, InputStr;
				std::tie(OutputStr, InputStr) = Str.split(',');

				DenormalMode Mode;
				Mode.Output = parseDenormalFPAttributeComponent(OutputStr);

				// Maintain compatibility with old form of the attribute which only specified
				// one component.
				Mode.Input = InputStr.empty() ? Mode.Output :
				parseDenormalFPAttributeComponent(InputStr);

				return Mode;
				}

				void DenormalMode::print(raw_ostream &OS) const {
				OS << denormalModeKindName(Output) << ',' << denormalModeKindName(Input);
				}

	}			}

	#endif // LLVM_FLOATINGPOINTMODE_H			#endif // LLVM_FLOATINGPOINTMODE_H
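
A standalone sketch of the parsing rule in the new side of this header, using only the C++ standard library so it can be compiled outside LLVM. The names `Kind`, `Mode`, `parseComponent`, `parseMode`, `kindName` and `toString` are hypothetical stand-ins for the patch's `DenormalModeKind`, `DenormalMode`, `parseDenormalFPAttributeComponent`, `parseDenormalFPAttribute`, `denormalModeKindName` and `print`. It assumes the same behavior as the code above: split on the first comma, and reuse the output mode when the input component is missing.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for DenormalMode::DenormalModeKind.
enum class Kind { Invalid, IEEE, PreserveSign, PositiveZero };

// Hypothetical stand-in for the DenormalMode struct (output first, then input).
struct Mode {
  Kind Output = Kind::Invalid;
  Kind Input = Kind::Invalid;
  bool isSimple() const { return Input == Output; }
  bool isValid() const {
    return Output != Kind::Invalid && Input != Kind::Invalid;
  }
};

// Parse one component; an empty string is treated as "ieee".
inline Kind parseComponent(const std::string &S) {
  if (S.empty() || S == "ieee")
    return Kind::IEEE;
  if (S == "preserve-sign")
    return Kind::PreserveSign;
  if (S == "positive-zero")
    return Kind::PositiveZero;
  return Kind::Invalid;
}

// Parse "output,input". A value without a comma is the old single-mode form,
// so the input mode falls back to the output mode.
inline Mode parseMode(const std::string &Str) {
  const std::string::size_type Comma = Str.find(',');
  const std::string OutputStr = Str.substr(0, Comma);
  const std::string InputStr =
      Comma == std::string::npos ? std::string() : Str.substr(Comma + 1);

  Mode M;
  M.Output = parseComponent(OutputStr);
  M.Input = InputStr.empty() ? M.Output : parseComponent(InputStr);
  return M;
}

// Spell a component the way the attribute expects it.
inline const char *kindName(Kind K) {
  switch (K) {
  case Kind::IEEE:
    return "ieee";
  case Kind::PreserveSign:
    return "preserve-sign";
  case Kind::PositiveZero:
    return "positive-zero";
  default:
    return "";
  }
}

// Always print the two-component form, output first.
inline std::string toString(Mode M) {
  return std::string(kindName(M.Output)) + "," + kindName(M.Input);
}
```

With this sketch, the old spelling `preserve-sign` round-trips to `preserve-sign,preserve-sign`, which is the form the updated driver tests in this revision check for, and a pair with an unrecognized component such as `ieee,foo` parses to a mode for which `isValid()` is false.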

llvm/lib/CodeGen/MachineFunction.cpp

[... 284 lines not shown ...]	DenormalMode MachineFunction::getDenormalMode(const fltSemantics &FPType) const {
// in the MachineFunction.		// in the MachineFunction.
Attribute Attr = F.getFnAttribute("denormal-fp-math");		Attribute Attr = F.getFnAttribute("denormal-fp-math");

// FIXME: This should assume IEEE behavior on an unspecified		// FIXME: This should assume IEEE behavior on an unspecified
// attribute. However, the one current user incorrectly assumes a non-IEEE		// attribute. However, the one current user incorrectly assumes a non-IEEE
// target by default.		// target by default.
StringRef Val = Attr.getValueAsString();		StringRef Val = Attr.getValueAsString();
if (Val.empty())		if (Val.empty())
return DenormalMode::Invalid;		return DenormalMode::getInvalid();

return parseDenormalFPAttribute(Val);		return parseDenormalFPAttribute(Val);
}		}

/// Should we be emitting segmented stack stuff for the function		/// Should we be emitting segmented stack stuff for the function
bool MachineFunction::shouldSplitStack() const {		bool MachineFunction::shouldSplitStack() const {
return getFunction().hasFnAttribute("split-stack");		return getFunction().hasFnAttribute("split-stack");
}		}
[... 861 lines not shown ...]

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[... 6,599 lines not shown ...]	SDValue DAGCombiner::MatchStoreCombine(StoreSDNode *N) {
EVT VT = EVT::getIntegerVT(		EVT VT = EVT::getIntegerVT(
*DAG.getContext(), Width * N->getMemoryVT().getSizeInBits());		*DAG.getContext(), Width * N->getMemoryVT().getSizeInBits());
if (VT != MVT::i16 && VT != MVT::i32 && VT != MVT::i64)		if (VT != MVT::i16 && VT != MVT::i32 && VT != MVT::i64)
return SDValue();		return SDValue();

if (LegalOperations && !TLI.isOperationLegal(ISD::STORE, VT))		if (LegalOperations && !TLI.isOperationLegal(ISD::STORE, VT))
return SDValue();		return SDValue();

// Check if all the bytes of the combined value we are looking at are stored		// Check if all the bytes of the combined value we are looking at are stored
// to the same base address. Collect bytes offsets from Base address into		// to the same base address. Collect bytes offsets from Base address into
// ByteOffsets.		// ByteOffsets.
SDValue CombinedValue;		SDValue CombinedValue;
SmallVector<int64_t, 8> ByteOffsets(Width, INT64_MAX);		SmallVector<int64_t, 8> ByteOffsets(Width, INT64_MAX);
int64_t FirstOffset = INT64_MAX;		int64_t FirstOffset = INT64_MAX;
StoreSDNode *FirstStore = nullptr;		StoreSDNode *FirstStore = nullptr;
Optional<BaseIndexOffset> Base;		Optional<BaseIndexOffset> Base;
for (auto Store : Stores) {		for (auto Store : Stores) {
// All the stores store different byte of the CombinedValue. A truncate is		// All the stores store different byte of the CombinedValue. A truncate is
// required to get that byte value.		// required to get that byte value.
SDValue Trunc = Store->getValue();		SDValue Trunc = Store->getValue();
if (Trunc.getOpcode() != ISD::TRUNCATE)		if (Trunc.getOpcode() != ISD::TRUNCATE)
return SDValue();		return SDValue();
// A shift operation is required to get the right byte offset, except the		// A shift operation is required to get the right byte offset, except the
// first byte.		// first byte.
int64_t Offset = 0;		int64_t Offset = 0;
SDValue Value = Trunc.getOperand(0);		SDValue Value = Trunc.getOperand(0);
if (Value.getOpcode() == ISD::SRL ||		if (Value.getOpcode() == ISD::SRL ||
Value.getOpcode() == ISD::SRA) {		Value.getOpcode() == ISD::SRA) {
ConstantSDNode *ShiftOffset =		ConstantSDNode *ShiftOffset =
dyn_cast<ConstantSDNode>(Value.getOperand(1));		dyn_cast<ConstantSDNode>(Value.getOperand(1));
// Trying to match the following pattern. The shift offset must be		// Trying to match the following pattern. The shift offset must be
// a constant and a multiple of 8. It is the byte offset in "y".		// a constant and a multiple of 8. It is the byte offset in "y".
//		//
// x = srl y, offset		// x = srl y, offset
// i8 z = trunc x		// i8 z = trunc x
// store z, ...		// store z, ...
if (!ShiftOffset || (ShiftOffset->getSExtValue() % 8))		if (!ShiftOffset || (ShiftOffset->getSExtValue() % 8))
return SDValue();		return SDValue();

Offset = ShiftOffset->getSExtValue()/8;		Offset = ShiftOffset->getSExtValue()/8;
Value = Value.getOperand(0);		Value = Value.getOperand(0);
}		}

// Stores must share the same combined value with different offsets.		// Stores must share the same combined value with different offsets.
if (!CombinedValue)		if (!CombinedValue)
CombinedValue = Value;		CombinedValue = Value;
else if (stripTruncAndExt(CombinedValue) != stripTruncAndExt(Value))		else if (stripTruncAndExt(CombinedValue) != stripTruncAndExt(Value))
Show All 28 Lines	for (auto Store : Stores) {
if (Offset < 0 || Offset >= Width || ByteOffsets[Offset] != INT64_MAX)		if (Offset < 0 || Offset >= Width || ByteOffsets[Offset] != INT64_MAX)
return SDValue();		return SDValue();
ByteOffsets[Offset] = ByteOffsetFromBase;		ByteOffsets[Offset] = ByteOffsetFromBase;
}		}

assert(FirstOffset != INT64_MAX && "First byte offset must be set");		assert(FirstOffset != INT64_MAX && "First byte offset must be set");
assert(FirstStore && "First store must be set");		assert(FirstStore && "First store must be set");

// Check if the bytes of the combined value we are looking at match with		// Check if the bytes of the combined value we are looking at match with
// either big or little endian value store.		// either big or little endian value store.
Optional<bool> IsBigEndian = isBigEndian(ByteOffsets, FirstOffset);		Optional<bool> IsBigEndian = isBigEndian(ByteOffsets, FirstOffset);
if (!IsBigEndian.hasValue())		if (!IsBigEndian.hasValue())
return SDValue();		return SDValue();

// The node we are looking at matches with the pattern, check if we can		// The node we are looking at matches with the pattern, check if we can
// replace it with a single bswap if needed and store.		// replace it with a single bswap if needed and store.

▲ Show 20 Lines • Show All 1,922 Lines • ▼ Show 20 Lines	if (VT0 == MVT::i1) {
if (N2->getOpcode() == ISD::SELECT && N2->hasOneUse()) {		if (N2->getOpcode() == ISD::SELECT && N2->hasOneUse()) {
SDValue N2_0 = N2->getOperand(0);		SDValue N2_0 = N2->getOperand(0);
SDValue N2_1 = N2->getOperand(1);		SDValue N2_1 = N2->getOperand(1);
SDValue N2_2 = N2->getOperand(2);		SDValue N2_2 = N2->getOperand(2);
if (N2_1 == N1 && N0.getValueType() == N2_0.getValueType()) {		if (N2_1 == N1 && N0.getValueType() == N2_0.getValueType()) {
// Create the actual or node if we can generate good code for it.		// Create the actual or node if we can generate good code for it.
if (!normalizeToSequence) {		if (!normalizeToSequence) {
SDValue Or = DAG.getNode(ISD::OR, DL, N0.getValueType(), N0, N2_0);		SDValue Or = DAG.getNode(ISD::OR, DL, N0.getValueType(), N0, N2_0);
return DAG.getNode(ISD::SELECT, DL, N1.getValueType(), Or, N1,		return DAG.getNode(ISD::SELECT, DL, N1.getValueType(), Or, N1,
N2_2, Flags);		N2_2, Flags);
}		}
// Otherwise see if we can optimize to a better pattern.		// Otherwise see if we can optimize to a better pattern.
if (SDValue Combined = visitORLike(N0, N2_0, N))		if (SDValue Combined = visitORLike(N0, N2_0, N))
return DAG.getNode(ISD::SELECT, DL, N1.getValueType(), Combined, N1,		return DAG.getNode(ISD::SELECT, DL, N1.getValueType(), Combined, N1,
N2_2, Flags);		N2_2, Flags);
}		}
}		}
▲ Show 20 Lines • Show All 1,854 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::ReduceLoadWidth(SDNode *N) {
}		}

// If we haven't found a load, we can't narrow it.		// If we haven't found a load, we can't narrow it.
if (!isa<LoadSDNode>(N0))		if (!isa<LoadSDNode>(N0))
return SDValue();		return SDValue();

LoadSDNode *LN0 = cast<LoadSDNode>(N0);		LoadSDNode *LN0 = cast<LoadSDNode>(N0);
// Reducing the width of a volatile load is illegal. For atomics, we may be		// Reducing the width of a volatile load is illegal. For atomics, we may be
// able to reduce the width provided we never widen again. (see D66309)		// able to reduce the width provided we never widen again. (see D66309)
if (!LN0->isSimple() ||		if (!LN0->isSimple() ||
!isLegalNarrowLdSt(LN0, ExtType, ExtVT, ShAmt))		!isLegalNarrowLdSt(LN0, ExtType, ExtVT, ShAmt))
return SDValue();		return SDValue();

auto AdjustBigEndianShift = [&](unsigned ShAmt) {		auto AdjustBigEndianShift = [&](unsigned ShAmt) {
unsigned LVTStoreBits = LN0->getMemoryVT().getStoreSizeInBits();		unsigned LVTStoreBits = LN0->getMemoryVT().getStoreSizeInBits();
unsigned EVTStoreBits = ExtVT.getStoreSizeInBits();		unsigned EVTStoreBits = ExtVT.getStoreSizeInBits();
return LVTStoreBits - EVTStoreBits - ShAmt;		return LVTStoreBits - EVTStoreBits - ShAmt;
▲ Show 20 Lines • Show All 10,313 Lines • ▼ Show 20 Lines	if (Iterations) {

if (!Reciprocal) {		if (!Reciprocal) {
// The estimate is now completely wrong if the input was exactly 0.0 or		// The estimate is now completely wrong if the input was exactly 0.0 or
// possibly a denormal. Force the answer to 0.0 for those cases.		// possibly a denormal. Force the answer to 0.0 for those cases.
SDLoc DL(Op);		SDLoc DL(Op);
EVT CCVT = getSetCCResultType(VT);		EVT CCVT = getSetCCResultType(VT);
ISD::NodeType SelOpcode = VT.isVector() ? ISD::VSELECT : ISD::SELECT;		ISD::NodeType SelOpcode = VT.isVector() ? ISD::VSELECT : ISD::SELECT;
DenormalMode DenormMode = DAG.getDenormalMode(VT);		DenormalMode DenormMode = DAG.getDenormalMode(VT);
if (DenormMode == DenormalMode::IEEE) {		if (DenormMode.Input == DenormalMode::IEEE) {
		// This is specifically a check for the handling of denormal inputs,
		// not the result.

// fabs(X) < SmallestNormal ? 0.0 : Est		// fabs(X) < SmallestNormal ? 0.0 : Est
const fltSemantics &FltSem = DAG.EVTToAPFloatSemantics(VT);		const fltSemantics &FltSem = DAG.EVTToAPFloatSemantics(VT);
APFloat SmallestNorm = APFloat::getSmallestNormalized(FltSem);		APFloat SmallestNorm = APFloat::getSmallestNormalized(FltSem);
SDValue NormC = DAG.getConstantFP(SmallestNorm, DL, VT);		SDValue NormC = DAG.getConstantFP(SmallestNorm, DL, VT);
SDValue FPZero = DAG.getConstantFP(0.0, DL, VT);		SDValue FPZero = DAG.getConstantFP(0.0, DL, VT);
SDValue Fabs = DAG.getNode(ISD::FABS, DL, VT, Op);		SDValue Fabs = DAG.getNode(ISD::FABS, DL, VT, Op);
SDValue IsDenorm = DAG.getSetCC(DL, CCVT, Fabs, NormC, ISD::SETLT);		SDValue IsDenorm = DAG.getSetCC(DL, CCVT, Fabs, NormC, ISD::SETLT);
Est = DAG.getNode(SelOpcode, DL, VT, IsDenorm, FPZero, Est);		Est = DAG.getNode(SelOpcode, DL, VT, IsDenorm, FPZero, Est);
▲ Show 20 Lines • Show All 450 Lines • Show Last 20 Lines
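
For readers less familiar with SelectionDAG, the select built in the hunk above can be read as a scalar computation. The following is a minimal sketch of that reading only, not the DAGCombiner code: `fixupSqrtEstimate` is a hypothetical helper name, and `float` stands in for whatever type `VT` describes. In the patch this fixup is emitted only when `DenormMode.Input` is `DenormalMode::IEEE`; what happens for other input modes is in the elided part of the diff and is not modeled here.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar model of "fabs(X) < SmallestNormal ? 0.0 : Est".
// X is the sqrt operand and Est is the refined estimate. When X is zero
// or denormal the estimate is not trustworthy, so the result is forced
// to 0.0.
inline float fixupSqrtEstimate(float X, float Est) {
  // numeric_limits<float>::min() is the smallest positive normal value,
  // playing the role of APFloat::getSmallestNormalized in the patch.
  const float SmallestNormal = std::numeric_limits<float>::min();
  return std::fabs(X) < SmallestNormal ? 0.0f : Est;
}
```

The comparison is false for NaN inputs, so in this sketch a NaN operand keeps the original estimate, which matches the ordered `ISD::SETLT` comparison only loosely; treat that detail as an assumption of the sketch.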

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

	Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines

	bool NVPTXTargetLowering::useF32FTZ(const MachineFunction &MF) const {			bool NVPTXTargetLowering::useF32FTZ(const MachineFunction &MF) const {
	// TODO: Get rid of this flag; there can be only one way to do this.			// TODO: Get rid of this flag; there can be only one way to do this.
	if (FtzEnabled.getNumOccurrences() > 0) {			if (FtzEnabled.getNumOccurrences() > 0) {
	// If nvptx-f32ftz is used on the command-line, always honor it			// If nvptx-f32ftz is used on the command-line, always honor it
	return FtzEnabled;			return FtzEnabled;
	}			}

	return MF.getDenormalMode(APFloat::IEEEsingle()) ==			return MF.getDenormalMode(APFloat::IEEEsingle()).Output ==
	DenormalMode::PreserveSign;			DenormalMode::PreserveSign;
	}			}

	static bool IsPTXVectorType(MVT VT) {			static bool IsPTXVectorType(MVT VT) {
	switch (VT.SimpleTy) {			switch (VT.SimpleTy) {
	default:			default:
	return false;			return false;
	case MVT::v2i1:			case MVT::v2i1:
	▲ Show 20 Lines • Show All 4,930 Lines • Show Last 20 Lines

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 1,708 Lines • ▼ Show 20 Lines	static Instruction *SimplifyNVVMIntrinsic(IntrinsicInst *II, InstCombiner &IC) {
// If Action.FtzRequirementTy is not satisfied by the module's ftz state, we		// If Action.FtzRequirementTy is not satisfied by the module's ftz state, we
// can bail out now. (Notice that in the case that IID is not an NVVM		// can bail out now. (Notice that in the case that IID is not an NVVM
// intrinsic, we don't have to look up any module metadata, as		// intrinsic, we don't have to look up any module metadata, as
// FtzRequirementTy will be FTZ_Any.)		// FtzRequirementTy will be FTZ_Any.)
if (Action.FtzRequirement != FTZ_Any) {		if (Action.FtzRequirement != FTZ_Any) {
StringRef Attr = II->getFunction()		StringRef Attr = II->getFunction()
->getFnAttribute("denormal-fp-math-f32")		->getFnAttribute("denormal-fp-math-f32")
.getValueAsString();		.getValueAsString();
bool FtzEnabled = parseDenormalFPAttribute(Attr) != DenormalMode::IEEE;		DenormalMode Mode = parseDenormalFPAttribute(Attr);
		bool FtzEnabled = Mode.Output != DenormalMode::IEEE;

if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))		if (FtzEnabled != (Action.FtzRequirement == FTZ_MustBeOn))
return nullptr;		return nullptr;
}		}

// Simplify to target-generic intrinsic.		// Simplify to target-generic intrinsic.
if (Action.IID) {		if (Action.IID) {
SmallVector<Value *, 4> Args(II->arg_operands());		SmallVector<Value *, 4> Args(II->arg_operands());
▲ Show 20 Lines • Show All 3,266 Lines • Show Last 20 Lines

llvm/unittests/ADT/FloatingPointMode.cpp

	//===- llvm/unittest/ADT/FloatingPointMode.cpp ----------------------------===//			//===- llvm/unittest/ADT/FloatingPointMode.cpp ----------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/ADT/FloatingPointMode.h"			#include "llvm/ADT/FloatingPointMode.h"
	#include "gtest/gtest.h"			#include "gtest/gtest.h"

	using namespace llvm;			using namespace llvm;

	namespace {			namespace {

	TEST(FloatingPointModeTest, ParseDenormalFPAttribute) {			TEST(FloatingPointModeTest, ParseDenormalFPAttributeComponent) {
	EXPECT_EQ(DenormalMode::IEEE, parseDenormalFPAttribute("ieee"));			EXPECT_EQ(DenormalMode::IEEE, parseDenormalFPAttributeComponent("ieee"));
	EXPECT_EQ(DenormalMode::IEEE, parseDenormalFPAttribute(""));			EXPECT_EQ(DenormalMode::IEEE, parseDenormalFPAttributeComponent(""));
	EXPECT_EQ(DenormalMode::PreserveSign,			EXPECT_EQ(DenormalMode::PreserveSign,
	parseDenormalFPAttribute("preserve-sign"));			parseDenormalFPAttributeComponent("preserve-sign"));
	EXPECT_EQ(DenormalMode::PositiveZero,			EXPECT_EQ(DenormalMode::PositiveZero,
	parseDenormalFPAttribute("positive-zero"));			parseDenormalFPAttributeComponent("positive-zero"));
	EXPECT_EQ(DenormalMode::Invalid, parseDenormalFPAttribute("foo"));			EXPECT_EQ(DenormalMode::Invalid, parseDenormalFPAttributeComponent("foo"));
	}			}

	TEST(FloatingPointModeTest, DenormalAttributeName) {			TEST(FloatingPointModeTest, DenormalAttributeName) {
	EXPECT_EQ("ieee", denormalModeName(DenormalMode::IEEE));			EXPECT_EQ("ieee", denormalModeKindName(DenormalMode::IEEE));
	EXPECT_EQ("preserve-sign", denormalModeName(DenormalMode::PreserveSign));			EXPECT_EQ("preserve-sign", denormalModeKindName(DenormalMode::PreserveSign));
	EXPECT_EQ("positive-zero", denormalModeName(DenormalMode::PositiveZero));			EXPECT_EQ("positive-zero", denormalModeKindName(DenormalMode::PositiveZero));
	EXPECT_EQ("", denormalModeName(DenormalMode::Invalid));			EXPECT_EQ("", denormalModeKindName(DenormalMode::Invalid));
				}

				TEST(FloatingPointModeTest, ParseDenormalFPAttribute) {
				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE),
				parseDenormalFPAttribute("ieee"));
				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE),
				parseDenormalFPAttribute("ieee,ieee"));
				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE),
				parseDenormalFPAttribute("ieee,"));
				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE),
				parseDenormalFPAttribute(""));
				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE),
				parseDenormalFPAttribute(","));

				EXPECT_EQ(DenormalMode(DenormalMode::PreserveSign, DenormalMode::PreserveSign),
				parseDenormalFPAttribute("preserve-sign"));
				EXPECT_EQ(DenormalMode(DenormalMode::PreserveSign, DenormalMode::PreserveSign),
				parseDenormalFPAttribute("preserve-sign,"));
				EXPECT_EQ(DenormalMode(DenormalMode::PreserveSign, DenormalMode::PreserveSign),
				parseDenormalFPAttribute("preserve-sign,preserve-sign"));

				EXPECT_EQ(DenormalMode(DenormalMode::PositiveZero, DenormalMode::PositiveZero),
				parseDenormalFPAttribute("positive-zero"));
				EXPECT_EQ(DenormalMode(DenormalMode::PositiveZero, DenormalMode::PositiveZero),
				parseDenormalFPAttribute("positive-zero,positive-zero"));


				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::PositiveZero),
				parseDenormalFPAttribute("ieee,positive-zero"));
				EXPECT_EQ(DenormalMode(DenormalMode::PositiveZero, DenormalMode::IEEE),
				parseDenormalFPAttribute("positive-zero,ieee"));

				EXPECT_EQ(DenormalMode(DenormalMode::PreserveSign, DenormalMode::IEEE),
				parseDenormalFPAttribute("preserve-sign,ieee"));
				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::PreserveSign),
				parseDenormalFPAttribute("ieee,preserve-sign"));


				EXPECT_EQ(DenormalMode(DenormalMode::Invalid, DenormalMode::Invalid),
				parseDenormalFPAttribute("foo"));
				EXPECT_EQ(DenormalMode(DenormalMode::Invalid, DenormalMode::Invalid),
				parseDenormalFPAttribute("foo,foo"));
				EXPECT_EQ(DenormalMode(DenormalMode::Invalid, DenormalMode::Invalid),
				parseDenormalFPAttribute("foo,bar"));
				}

				TEST(FloatingPointModeTest, RenderDenormalFPAttribute) {
				EXPECT_EQ(DenormalMode(DenormalMode::Invalid, DenormalMode::Invalid),
				parseDenormalFPAttribute("foo"));

				EXPECT_EQ("ieee,ieee",
				DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE).str());
				EXPECT_EQ(",",
				DenormalMode(DenormalMode::Invalid, DenormalMode::Invalid).str());

				EXPECT_EQ(
				"preserve-sign,preserve-sign",
				DenormalMode(DenormalMode::PreserveSign, DenormalMode::PreserveSign).str());

				EXPECT_EQ(
				"positive-zero,positive-zero",
				DenormalMode(DenormalMode::PositiveZero, DenormalMode::PositiveZero).str());

				EXPECT_EQ(
				"ieee,preserve-sign",
				DenormalMode(DenormalMode::IEEE, DenormalMode::PreserveSign).str());

				EXPECT_EQ(
				"preserve-sign,ieee",
				DenormalMode(DenormalMode::PreserveSign, DenormalMode::IEEE).str());

				EXPECT_EQ(
				"preserve-sign,positive-zero",
				DenormalMode(DenormalMode::PreserveSign, DenormalMode::PositiveZero).str());
				}

				TEST(FloatingPointModeTest, DenormalModeIsSimple) {
				EXPECT_TRUE(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE).isSimple());
				EXPECT_FALSE(DenormalMode(DenormalMode::IEEE,
				DenormalMode::Invalid).isSimple());
				EXPECT_FALSE(DenormalMode(DenormalMode::PreserveSign,
				DenormalMode::PositiveZero).isSimple());
				}

				TEST(FloatingPointModeTest, DenormalModeIsValid) {
				EXPECT_TRUE(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE).isValid());
				EXPECT_FALSE(DenormalMode(DenormalMode::IEEE, DenormalMode::Invalid).isValid());
				EXPECT_FALSE(DenormalMode(DenormalMode::Invalid, DenormalMode::IEEE).isValid());
				EXPECT_FALSE(DenormalMode(DenormalMode::Invalid,
				DenormalMode::Invalid).isValid());
				}

				TEST(FloatingPointModeTest, DenormalModeConstructor) {
				EXPECT_EQ(DenormalMode(DenormalMode::Invalid, DenormalMode::Invalid),
				DenormalMode::getInvalid());
				EXPECT_EQ(DenormalMode(DenormalMode::IEEE, DenormalMode::IEEE),
				DenormalMode::getIEEE());
				EXPECT_EQ(DenormalMode(DenormalMode::PreserveSign, DenormalMode::PreserveSign),
				DenormalMode::getPreserveSign());
				EXPECT_EQ(DenormalMode(DenormalMode::PositiveZero, DenormalMode::PositiveZero),
				DenormalMode::getPositiveZero());
	}			}

	}			}
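
The parsing and rendering behavior exercised by the unit tests above can be summarized with a small standalone model. This is a minimal sketch using only the standard library, not the code in `llvm/ADT/FloatingPointMode.h`: `DenormalModeSketch`, `parseComponent`, `kindName`, `parseAttribute`, and `render` are hypothetical names. It also assumes that the first component of the pair describes outputs and the second describes inputs, which the tests themselves do not spell out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::DenormalMode; illustrative only.
struct DenormalModeSketch {
  enum Kind { Invalid = -1, IEEE, PreserveSign, PositiveZero };

  Kind Output; // how denormal results are flushed
  Kind Input;  // how denormal operands are treated

  // Assumption: constructor order is (output, input).
  DenormalModeSketch(Kind Out, Kind In) : Output(Out), Input(In) {}

  bool isSimple() const { return Input == Output; }
  bool isValid() const { return Output != Invalid && Input != Invalid; }
};

// Parse one component; an empty string is treated as "ieee".
inline DenormalModeSketch::Kind parseComponent(const std::string &S) {
  if (S.empty() || S == "ieee")
    return DenormalModeSketch::IEEE;
  if (S == "preserve-sign")
    return DenormalModeSketch::PreserveSign;
  if (S == "positive-zero")
    return DenormalModeSketch::PositiveZero;
  return DenormalModeSketch::Invalid;
}

// Name of one component; Invalid renders as the empty string.
inline std::string kindName(DenormalModeSketch::Kind K) {
  switch (K) {
  case DenormalModeSketch::IEEE:
    return "ieee";
  case DenormalModeSketch::PreserveSign:
    return "preserve-sign";
  case DenormalModeSketch::PositiveZero:
    return "positive-zero";
  case DenormalModeSketch::Invalid:
    return "";
  }
  assert(false && "unhandled kind");
  return "";
}

// Split at the first comma. A missing or empty second component reuses
// the first, so the old single-value form keeps working.
inline DenormalModeSketch parseAttribute(const std::string &S) {
  const std::string::size_type Comma = S.find(',');
  const std::string First = S.substr(0, Comma);
  const std::string Second =
      Comma == std::string::npos ? std::string() : S.substr(Comma + 1);
  const DenormalModeSketch::Kind Out = parseComponent(First);
  const DenormalModeSketch::Kind In =
      Second.empty() ? Out : parseComponent(Second);
  return DenormalModeSketch(Out, In);
}

// Render as a comma-separated pair, mirroring the str() expectations.
inline std::string render(const DenormalModeSketch &M) {
  return kindName(M.Output) + "," + kindName(M.Input);
}
```

Under these assumptions, `parseAttribute("preserve-sign,")` yields the same mode as `parseAttribute("preserve-sign")`, and a mode with both components invalid renders as a lone comma, matching the `ParseDenormalFPAttribute` and `RenderDenormalFPAttribute` expectations.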

Revision Contents

Diff 241714

clang/include/clang/Basic/CodeGenOptions.h

clang/include/clang/Driver/ToolChain.h

clang/lib/Basic/Targets/AMDGPU.cpp

clang/lib/CodeGen/CGCall.cpp

clang/lib/CodeGen/CodeGenModule.cpp

clang/lib/Driver/ToolChains/AMDGPU.cpp

clang/lib/Driver/ToolChains/Clang.cpp

clang/lib/Driver/ToolChains/Cuda.cpp

clang/lib/Frontend/CompilerInvocation.cpp

clang/test/CodeGen/denormalfpmode.c

clang/test/CodeGenCUDA/flush-denormals.cu

clang/test/CodeGenCUDA/propagate-metadata.cu

clang/test/Driver/cl-denorms-are-zero.cl

clang/test/Driver/cuda-flush-denormals-to-zero.cu

clang/test/Driver/denormal-fp-math.c

llvm/docs/LangRef.rst

llvm/include/llvm/ADT/FloatingPointMode.h

llvm/lib/CodeGen/MachineFunction.cpp

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

llvm/unittests/ADT/FloatingPointMode.cpp
