This is an archive of the discontinued LLVM Phabricator instance.

clang: Guess at some platform FTZ/DAZ default settings
Closed · Public

Authored by arsenm on Nov 7 2019, 5:45 PM.

Details

Reviewers

spatel
craig.topper
RKSimon
hfinkel
probinson
andrew.w.kaylor
cameron.mcinally

Summary

This is to avoid performance regressions when the default attribute
behavior is fixed to assume IEEE.

I tested the default on x86_64 Ubuntu, which seems to default to
FTZ/DAZ, but am guessing for x86 and PS4.

Diff Detail

Event Timeline

arsenm created this revision. Nov 7 2019, 5:45 PM

Herald added a subscriber: wdng. Nov 7 2019, 5:45 PM

arsenm added a parent revision: D69978: Separately track input and output denormal mode. Nov 7 2019, 5:46 PM

craig.topper added a reviewer: andrew.w.kaylor. Nov 7 2019, 6:00 PM

I checked Redhat 7.4 that's on the server I'm using for work. And I had a coworker check his Ubuntu 18.04 system with this program. And both systems printed 1f80 as the value of MXCSR which shows FTZ and DAZ are both 0. Are you seeing something different?

#include <x86intrin.h>
#include <stdio.h>

int main() {
  int csr = _mm_getcsr();
  printf("%x\n", csr);
  return 0;
}

In D69979#1738099, @craig.topper wrote:

I checked Redhat 7.4 that's on the server I'm using for work. And I had a coworker check his Ubuntu 18.04 system with this program. And both systems printed 1f80 as the value of MXCSR which shows FTZ and DAZ are both 0. Are you seeing something different?

AFAIK, x86(-64) Linux is IEEE-compliant by default. It's only when compiling with -ffast-math that clang/gcc link in the startup routine to set FTZ/DAZ. So this patch should use that same mechanism to set the denorm mode. See:
https://reviews.llvm.org/rL165240
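
As a rough illustration of the startup mechanism described above, the following C sketch sets both flush bits in MXCSR before main() runs. It is not the source of crtfastmath.o. The names MXCSR_FTZ, MXCSR_DAZ, with_flush_bits, and enable_ftz_daz are hypothetical, and the constructor assumes an SSE-capable x86 target whose CPU supports the DAZ bit.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Bit positions as stated later in this thread: FTZ is bit 15, DAZ is bit 6. */
#define MXCSR_DAZ 0x0040u
#define MXCSR_FTZ 0x8000u

/* Pure helper (hypothetical name): an MXCSR value with both flush bits set. */
static unsigned with_flush_bits(unsigned csr) {
  return csr | MXCSR_FTZ | MXCSR_DAZ;
}

#if defined(__SSE__)
/* Runs before main(), similar in spirit to a startup object that is linked
   in only when fast-math flags are passed. Assumption: the CPU accepts DAZ. */
__attribute__((constructor)) static void enable_ftz_daz(void) {
  _mm_setcsr(with_flush_bits(_mm_getcsr()));
}
#endif

/* Sanity check against the default value reported in this thread. */
static void check_flush_bits(void) {
  assert(with_flush_bits(0x1f80u) == 0x9fc0u);
}
```

Because the bits are set at program startup rather than in generated code, the compiler only knows about them if the driver records which startup object it decided to link.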

@RKSimon - is it the same on PS4?

Also, I may have missed some discussions. Does this patch series replace the proposal to add instruction-level FMF for denorms?
http://lists.llvm.org/pipermail/llvm-dev/2019-September/135183.html

Ie, did we decide that a function-level attribute is good enough?

In D69979#1738723, @spatel wrote:

Also, I may have missed some discussions. Does this patch series replace the proposal to add instruction-level FMF for denorms?
http://lists.llvm.org/pipermail/llvm-dev/2019-September/135183.html

Ie, did we decide that a function-level attribute is good enough?

I think this is an orthogonal question. I would still find an ftz flag useful even in the presence of this attribute indicating flushing. For AMDGPU, it would be useful in a specific instruction context to allow flushing even when the default mode is set to not flush. For example, llvm.fmuladd could be emitted with an ftz flag, which would select an instruction that would ordinarily be illegal if denormals are enabled.

In D69979#1738099, @craig.topper wrote:
I checked Redhat 7.4 that's on the server I'm using for work. And I had a coworker check his Ubuntu 18.04 system with this program. And both systems printed 1f80 as the value of MXCSR which shows FTZ and DAZ are both 0. Are you seeing something different?
#include <x86intrin.h>
#include <stdio.h>

int main() {
  int csr = _mm_getcsr();
  printf("%x\n", csr);
  return 0;
}

I see the value as 1f80. However the test program I wrote suggests the default is to flush (and what the comments in bug 34994 suggest?):

In default FP mode
neg_subnormal + neg_subnormal: -0x0p+0
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x0p+0
sqrtf neg_subnormal: -0x0p+0
sqrtf neg_zero: -0x0p+0

With denormals disabled
neg_subnormal + neg_subnormal: -0x0p+0
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x0p+0
sqrtf neg_subnormal: -0x0p+0
sqrtf neg_zero: -0x0p+0

With denormals enabled
neg_subnormal + neg_subnormal: -0x1p-126
neg_subnormal + neg_zero: -0x1p-127
sqrtf subnormal: 0x1.6a09e6p-64
sqrtf neg_subnormal: -nan
sqrtf neg_zero: -0x0p+0

With daz only
neg_subnormal + neg_subnormal: -0x0p+0
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x0p+0
sqrtf neg_subnormal: -0x0p+0
sqrtf neg_zero: -0x0p+0

With ftz only
neg_subnormal + neg_subnormal: -0x1p-126
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x1.6a09e6p-64
sqrtf neg_subnormal: -nan
sqrtf neg_zero: -0x0p+0
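
The "daz only" and "ftz only" addition rows above can be imitated in software, which may help when reading the listing. This is a sketch under assumptions: it models DAZ as flushing subnormal inputs and FTZ as flushing a subnormal result, both to a zero of the same sign, and it must run in the default IEEE mode (no -ffast-math). The function names are hypothetical, and -0x1p-127f is the subnormal implied by the sums in the listing.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Replace a subnormal value with a zero of the same sign.
   Zeros, normal numbers, infinities, and NaNs pass through unchanged. */
static float flush_preserve_sign(float x) {
  if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
    return x < 0.0f ? -0.0f : 0.0f;
  return x;
}

/* DAZ-like: subnormal inputs are treated as zero before the add. */
static float add_daz_only(float a, float b) {
  return flush_preserve_sign(a) + flush_preserve_sign(b);
}

/* FTZ-like: a subnormal result is replaced by zero after the add. */
static float add_ftz_only(float a, float b) {
  return flush_preserve_sign(a + b);
}

/* Checks mirroring the addition rows of the listing. */
static void check_listing_rows(void) {
  const float neg_subnormal = -0x1p-127f;
  assert(signbit(add_daz_only(neg_subnormal, neg_subnormal)));
  assert(add_daz_only(neg_subnormal, neg_subnormal) == 0.0f);
  assert(add_ftz_only(neg_subnormal, neg_subnormal) == -0x1p-126f);
  assert(add_ftz_only(neg_subnormal, -0.0f) == 0.0f);
}
```

Under FTZ alone, the sum of two subnormals survives because the result is the smallest normal number, while adding a subnormal to negative zero yields a subnormal result that is flushed.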

In D69979#1740294, @arsenm wrote:
I see the value as 1f80. However the test program I wrote suggests the default is to flush (and what the comments in bug 34994 suggest?):

Is the test program attached somewhere?
Bug 34994 (https://bugs.llvm.org/show_bug.cgi?id=34994) was limited to changing cases where we are running in some kind of loose-FP environment (otherwise, we would not be generating a sqrt estimate sequence at all). In the default (IEEE-compliant) environment, x86 would use a full-precision sqrt instruction or make a call to libm sqrt.

In D69979#1746043, @spatel wrote:
Is the test program attached somewhere?
Bug 34994 (https://bugs.llvm.org/show_bug.cgi?id=34994) was limited to changing cases where we are running in some kind of loose-FP environment (otherwise, we would not be generating a sqrt estimate sequence at all). In the default (IEEE-compliant) environment, x86 would use a full-precision sqrt instruction or make a call to libm sqrt.

I just posted the test I wrote here: https://github.com/arsenm/subnormal_test

In D69979#1749198, @arsenm wrote:

I just posted the test I wrote here: https://github.com/arsenm/subnormal_test

Thanks. I tried compiling with gcc (can't trust clang since it doesn't honor #pragma STDC FENV_ACCESS ON?).
Running that on an Ubuntu 17.10 x86-64 system, it behaves as I would expect. If you compile without -ffast-math, it asserts:

With denormals disabled
a.out: subnormal_test.cpp:33: void fp32_denorm_test(): Assertion `std::fpclassify(subnormal) == FP_SUBNORMAL' failed.

And if you compile with -ffast-math, it asserts:

In default FP mode
a.out: subnormal_test.cpp:33: void fp32_denorm_test(): Assertion `std::fpclassify(subnormal) == FP_SUBNORMAL' failed.

This is what I see compiling Craig's csr tester:

$ cc -O2 csr.c && ./a.out
1f80
$ cc -O2 csr.c -ffast-math && ./a.out
9fc0

FZ is bit 15 (0x8000) and DAZ is bit 6 (0x0040), so they are clear in default (IEEE) mode and set with -ffast-math.
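
The bit arithmetic above can be checked with a small C sketch. The macro and function names are hypothetical; only the bit positions and the two MXCSR values come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MXCSR_DAZ (1u << 6)  /* 0x0040: denormals-are-zero */
#define MXCSR_FTZ (1u << 15) /* 0x8000: flush-to-zero */

static bool mxcsr_has_daz(unsigned csr) { return (csr & MXCSR_DAZ) != 0; }
static bool mxcsr_has_ftz(unsigned csr) { return (csr & MXCSR_FTZ) != 0; }

/* Print one MXCSR value together with the state of the two flush bits. */
static void describe_mxcsr(unsigned csr) {
  printf("%04x: FTZ=%d DAZ=%d\n", csr, mxcsr_has_ftz(csr), mxcsr_has_daz(csr));
}

/* Sanity check against the two values reported in this thread. */
static void check_reported_values(void) {
  assert(!mxcsr_has_ftz(0x1f80u) && !mxcsr_has_daz(0x1f80u));
  assert(mxcsr_has_ftz(0x9fc0u) && mxcsr_has_daz(0x9fc0u));
}
```

Calling describe_mxcsr on the result of _mm_getcsr() gives a more readable variant of the csr tester quoted earlier.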

spatel mentioned this in D69989: Assume ieee behavior without denormal-fp-math attribute. Nov 20 2019, 11:19 AM

DAZ/FTZ seem to be set in crtfastmath.o, so try to reproduce the logic for linking that

spatel added a subscriber: andreadb. Dec 2 2019, 11:29 AM

spatel added inline comments.

clang/include/clang/Driver/ToolChain.h
Line 580: Formatting nit - prefer to start with verb and lower-case: isFastMathRuntimeAvailable() or hasFastMathRuntime().
Line 587: Add -> add
clang/lib/Driver/ToolChains/PS4CPU.h
Lines 95-96: @probinson / @andreadb - is this correct for PS4? or is there some equivalent to the Linux startup file?

Rename functions

ping

spatel added inline comments. Dec 11 2019, 4:56 AM

clang/test/Driver/default-denormal-fp-math.c
Line 8: The prefix should be PRESERVE_SIGN to match the flag?

Rebase and fix check prefix name

LGTM - the PS4 behavior was confirmed off-list.

This revision is now accepted and ready to land. Feb 10 2020, 8:48 AM

fa7cd549d604bfd8f9dce5d649a19720cbc39cca

Revision Contents

Path                                            Size
clang/include/clang/Driver/ToolChain.h          11 lines
clang/lib/Driver/ToolChain.cpp                  27 lines
clang/lib/Driver/ToolChains/Linux.h             5 lines
clang/lib/Driver/ToolChains/Linux.cpp           19 lines
clang/lib/Driver/ToolChains/PS4CPU.h            8 lines
clang/test/Driver/default-denormal-fp-math.c    19 lines

Diff 231726

clang/include/clang/Driver/ToolChain.h

@@ public:
   void AddFilePathLibArgs(const llvm::opt::ArgList &Args,
                           llvm::opt::ArgStringList &CmdArgs) const;

   /// AddCCKextLibArgs - Add the system specific linker arguments to use
   /// for kernel extensions (Darwin-specific).
   virtual void AddCCKextLibArgs(const llvm::opt::ArgList &Args,
                                 llvm::opt::ArgStringList &CmdArgs) const;

+  /// If a runtime library exists that sets global flags for unsafe floating
+  /// point math, return true.
+  ///
+  /// This checks for presence of the -Ofast, -ffast-math or -funsafe-math flags.
+  virtual bool FastMathRuntimeIsAvailable(
+      const llvm::opt::ArgList &Args, std::string &Path) const;
+
   /// AddFastMathRuntimeIfAvailable - If a runtime library exists that sets
   /// global flags for unsafe floating point math, add it and return true.
   ///
   /// This checks for presence of the -Ofast, -ffast-math or -funsafe-math flags.
-  virtual bool AddFastMathRuntimeIfAvailable(
+  bool AddFastMathRuntimeIfAvailable(
       const llvm::opt::ArgList &Args, llvm::opt::ArgStringList &CmdArgs) const;

   /// addProfileRTLibs - When -fprofile-instr-profile is specified, try to pass
   /// a suitable profile runtime library to the linker.
   virtual void addProfileRTLibs(const llvm::opt::ArgList &Args,
                                 llvm::opt::ArgStringList &CmdArgs) const;

   /// Add arguments to use system-specific CUDA includes.
   virtual void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,

clang/lib/Driver/ToolChain.cpp

@@ if(LibPath.length() > 0)
     CmdArgs.push_back(Args.MakeArgString(StringRef("-L") + LibPath));
 }

 void ToolChain::AddCCKextLibArgs(const ArgList &Args,
                                  ArgStringList &CmdArgs) const {
   CmdArgs.push_back("-lcc_kext");
 }

-bool ToolChain::AddFastMathRuntimeIfAvailable(const ArgList &Args,
-                                              ArgStringList &CmdArgs) const {
+bool ToolChain::FastMathRuntimeIsAvailable(const ArgList &Args,
+                                           std::string &Path) const {
   // Do not check for -fno-fast-math or -fno-unsafe-math when -Ofast passed
   // (to keep the linker options consistent with gcc and clang itself).
   if (!isOptimizationLevelFast(Args)) {
     // Check if -ffast-math or -funsafe-math.
     Arg *A =
         Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math,
                         options::OPT_funsafe_math_optimizations,
                         options::OPT_fno_unsafe_math_optimizations);

     if (!A || A->getOption().getID() == options::OPT_fno_fast_math ||
         A->getOption().getID() == options::OPT_fno_unsafe_math_optimizations)
       return false;
   }
   // If crtfastmath.o exists add it to the arguments.
-  std::string Path = GetFilePath("crtfastmath.o");
-  if (Path == "crtfastmath.o") // Not found.
-    return false;
+  Path = GetFilePath("crtfastmath.o");
+  return (Path != "crtfastmath.o"); // Not found.
+}

-  CmdArgs.push_back(Args.MakeArgString(Path));
-  return true;
+bool ToolChain::AddFastMathRuntimeIfAvailable(const ArgList &Args,
+                                              ArgStringList &CmdArgs) const {
+  std::string Path;
+  if (FastMathRuntimeIsAvailable(Args, Path)) {
+    CmdArgs.push_back(Args.MakeArgString(Path));
+    return true;
+  }
+
+  return false;
 }

 SanitizerMask ToolChain::getSupportedSanitizers() const {
   // Return sanitizers which don't require runtime support and are not
   // platform dependent.

   SanitizerMask Res = (SanitizerKind::Undefined & ~SanitizerKind::Vptr &
                        ~SanitizerKind::Function) |
                       (SanitizerKind::CFI & ~SanitizerKind::CFIICall) |
                       SanitizerKind::CFICastStrict |

clang/lib/Driver/ToolChains/Linux.h

@@ public:
   void addProfileRTLibs(const llvm::opt::ArgList &Args,
                         llvm::opt::ArgStringList &CmdArgs) const override;
   virtual std::string computeSysRoot() const;

   virtual std::string getDynamicLinker(const llvm::opt::ArgList &Args) const;

   std::vector<std::string> ExtraOpts;

+  llvm::DenormalMode getDefaultDenormalModeForType(
+      const llvm::opt::ArgList &DriverArgs,
+      Action::OffloadKind DeviceOffloadKind,
+      const llvm::fltSemantics *FPType = nullptr) const override;
+
 protected:
   Tool *buildAssembler() const override;
   Tool *buildLinker() const override;
 };

 } // end namespace toolchains
 } // end namespace driver
 } // end namespace clang

 #endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_LINUX_H

clang/lib/Driver/ToolChains/Linux.cpp

@@ void Linux::addProfileRTLibs(const llvm::opt::ArgList &Args,
   // Add linker option -u__llvm_runtime_variable to cause runtime
   // initialization module to be linked in.
   if ((!Args.hasArg(options::OPT_coverage)) &&
       (!Args.hasArg(options::OPT_ftest_coverage)))
     CmdArgs.push_back(Args.MakeArgString(
         Twine("-u", llvm::getInstrProfRuntimeHookVarName())));
   ToolChain::addProfileRTLibs(Args, CmdArgs);
 }
+
+llvm::DenormalMode Linux::getDefaultDenormalModeForType(
+    const llvm::opt::ArgList &DriverArgs,
+    Action::OffloadKind DeviceOffloadKind,
+    const llvm::fltSemantics *FPType) const {
+  switch (getTriple().getArch()) {
+  case llvm::Triple::x86:
+  case llvm::Triple::x86_64: {
+    std::string Unused;
+    // DAZ and FTZ are turned on in crtfastmath.o
+    if (!DriverArgs.hasArg(options::OPT_nostdlib, options::OPT_nostartfiles) &&
+        FastMathRuntimeIsAvailable(DriverArgs, Unused))
+      return llvm::DenormalMode::getPreserveSign();
+    return llvm::DenormalMode::getIEEE();
+  }
+  default:
+    return llvm::DenormalMode::getIEEE();
+  }
+}
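
The decision made by Linux::getDefaultDenormalModeForType can be restated as a small standalone C sketch. This is a simplified model for illustration, not driver code: the struct, enum, and function names are hypothetical, and the four booleans stand in for the triple check, the fast-math flag check, the -nostdlib/-nostartfiles check, and the crtfastmath.o lookup.

```c
#include <assert.h>
#include <stdbool.h>

enum denormal_mode { MODE_IEEE, MODE_PRESERVE_SIGN };

/* Hypothetical summary of the inputs the driver consults. */
struct driver_inputs {
  bool is_x86;              /* x86 or x86_64 triple */
  bool fast_math_requested; /* -Ofast, -ffast-math, or -funsafe-math-optimizations */
  bool no_startup_files;    /* -nostdlib or -nostartfiles */
  bool crtfastmath_found;   /* crtfastmath.o exists in the toolchain paths */
};

static enum denormal_mode default_denormal_mode(const struct driver_inputs *in) {
  if (!in->is_x86)
    return MODE_IEEE;
  /* FTZ/DAZ are only assumed when the startup object will really be linked. */
  if (!in->no_startup_files && in->fast_math_requested && in->crtfastmath_found)
    return MODE_PRESERVE_SIGN;
  return MODE_IEEE;
}

/* Mirrors the x86_64 Linux cases exercised by the driver test in this revision. */
static void check_driver_cases(void) {
  assert(default_denormal_mode(&(struct driver_inputs){true, false, false, true}) == MODE_IEEE);
  assert(default_denormal_mode(&(struct driver_inputs){true, true, false, true}) == MODE_PRESERVE_SIGN);
  assert(default_denormal_mode(&(struct driver_inputs){true, true, true, true}) == MODE_IEEE);
  assert(default_denormal_mode(&(struct driver_inputs){true, true, false, false}) == MODE_IEEE);
}
```

The PS4 toolchain in this revision does not follow this logic; it returns preserve-sign unconditionally.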

clang/lib/Driver/ToolChains/PS4CPU.h

@@ public:
   }

   SanitizerMask getSupportedSanitizers() const override;

   // PS4 toolchain uses legacy thin LTO API, which is not
   // capable of unit splitting.
   bool canSplitThinLTOUnit() const override { return false; }

+  llvm::DenormalMode getDefaultDenormalModeForType(
+      const llvm::opt::ArgList &DriverArgs,
+      Action::OffloadKind DeviceOffloadKind,
+      const llvm::fltSemantics *FPType) const override {
+    // DAZ and FTZ are on by default.
+    return llvm::DenormalMode::getPreserveSign();
+  }
+
 protected:
   Tool *buildAssembler() const override;
   Tool *buildLinker() const override;
 };

 } // end namespace toolchains
 } // end namespace driver
 } // end namespace clang

 #endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_PS4CPU_H

clang/test/Driver/default-denormal-fp-math.c

This file was added.

// RUN: %clang -### -target arm-unknown-linux-gnu -c %s -v 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s
// RUN: %clang -### -target i386-unknown-linux-gnu -c %s -v 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s

// RUN: %clang -### -target x86_64-unknown-linux-gnu --sysroot=%S/Inputs/basic_linux_tree -c %s -v 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s

// crtfastmath enables ftz and daz
// RUN: %clang -### -target x86_64-unknown-linux-gnu -ffast-math --sysroot=%S/Inputs/basic_linux_tree -c %s -v 2>&1 | FileCheck -check-prefix=CHECK-ZEROSIGN %s

// crt not linked in with nostartfiles
// RUN: %clang -### -target x86_64-unknown-linux-gnu -ffast-math -nostartfiles --sysroot=%S/Inputs/basic_linux_tree -c %s -v 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s

// If there's no crtfastmath, don't assume ftz/daz
// RUN: %clang -### -target x86_64-unknown-linux-gnu -ffast-math --sysroot=/dev/null -c %s -v 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s

// RUN: %clang -### -target x86_64-scei-ps4 -c %s -v 2>&1 | FileCheck -check-prefix=CHECK-ZEROSIGN %s


// CHECK-IEEE: -fdenormal-fp-math=ieee,ieee
// CHECK-ZEROSIGN: -fdenormal-fp-math=preserve-sign,preserve-sign
