clang: Guess at some platform FTZ/DAZ default settings
Closed, Public

Authored by arsenm on Nov 7 2019, 5:45 PM.

Details

Summary

This is to avoid performance regressions when the default attribute
behavior is fixed to assume IEEE.

I tested the default on x86_64 Ubuntu, which seems to default to
FTZ/DAZ, but I am guessing for x86 and PS4.

Event Timeline

arsenm created this revision. Nov 7 2019, 5:45 PM

I checked the Red Hat 7.4 server I use for work, and had a coworker check his Ubuntu 18.04 system, with the program below. Both systems printed 1f80 as the value of MXCSR, which shows that FTZ and DAZ are both 0. Are you seeing something different?

#include <stdio.h>
#include <x86intrin.h>

int main(void) {
  /* Read the SSE control/status register; FTZ is bit 15, DAZ is bit 6. */
  unsigned int csr = _mm_getcsr();
  printf("%x\n", csr);
  return 0;
}

spatel added a comment. Nov 8 2019, 5:15 AM

AFAIK, x86(-64) Linux is IEEE-compliant by default. It's only when compiling with -ffast-math that clang/gcc link in the startup routine to set FTZ/DAZ. So this patch should use that same mechanism to set the denorm mode. See:
https://reviews.llvm.org/rL165240

@RKSimon - is it the same on PS4?

Also, I may have missed some discussions. Does this patch series replace the proposal to add instruction-level FMF for denorms?
http://lists.llvm.org/pipermail/llvm-dev/2019-September/135183.html

I.e., did we decide that a function-level attribute is good enough?

I think this is an orthogonal question. I would still find an ftz flag useful even in the presence of this attribute indicating flushing. For AMDGPU, it would be useful in a specific instruction context to allow flushing even when the default mode is set to not flush. For example, llvm.fmuladd could be emitted with an ftz flag, which would select to an instruction that would ordinarily be illegal if denormals are enabled.

I see the value as 1f80. However, the test program I wrote suggests the default is to flush (which is also what the comments in bug 34994 seem to suggest?):

In default FP mode
neg_subnormal + neg_subnormal: -0x0p+0
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x0p+0
sqrtf neg_subnormal: -0x0p+0
sqrtf neg_zero: -0x0p+0

With denormals disabled
neg_subnormal + neg_subnormal: -0x0p+0
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x0p+0
sqrtf neg_subnormal: -0x0p+0
sqrtf neg_zero: -0x0p+0

With denormals enabled
neg_subnormal + neg_subnormal: -0x1p-126
neg_subnormal + neg_zero: -0x1p-127
sqrtf subnormal: 0x1.6a09e6p-64
sqrtf neg_subnormal: -nan
sqrtf neg_zero: -0x0p+0

With daz only
neg_subnormal + neg_subnormal: -0x0p+0
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x0p+0
sqrtf neg_subnormal: -0x0p+0
sqrtf neg_zero: -0x0p+0

With ftz only
neg_subnormal + neg_subnormal: -0x1p-126
neg_subnormal + neg_zero: -0x0p+0
sqrtf subnormal: 0x1.6a09e6p-64
sqrtf neg_subnormal: -nan
sqrtf neg_zero: -0x0p+0
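
The difference between the "daz only" and "ftz only" modes can be illustrated with a small sketch. This is not the test program discussed in this review; the helper name add_with_mode and the macro names are hypothetical, and the code assumes an x86-64 target where float arithmetic goes through SSE and the CPU supports DAZ. DAZ treats subnormal inputs as zero, while FTZ replaces subnormal results with zero.

```c
#include <float.h>
#include <xmmintrin.h>

/* MXCSR bit masks (names chosen for this sketch). */
#define MXCSR_DAZ 0x0040u /* bit 6: subnormal inputs are read as zero */
#define MXCSR_FTZ 0x8000u /* bit 15: subnormal results are flushed to zero */

/* Computes a + b at run time with exactly the requested FTZ/DAZ bits set,
 * then restores the caller's MXCSR. */
static float add_with_mode(float a, float b, unsigned int mode_bits) {
    unsigned int saved = _mm_getcsr();
    /* volatile keeps the compiler from folding the addition at compile time */
    volatile float va = a;
    volatile float vb = b;
    volatile float out;

    _mm_setcsr((saved & ~(MXCSR_FTZ | MXCSR_DAZ)) | mode_bits);
    out = va + vb;
    _mm_setcsr(saved);
    return out;
}
```

Under these assumptions, adding the subnormal 0x1p-127f to itself gives FLT_MIN (a normal value) with FTZ only, because the inputs are still honored and the result is not subnormal, but gives zero with DAZ only. Conversely, 0x1.8p-126f + -0x1p-126f has normal inputs and a subnormal result, so it is flushed with FTZ only and preserved with DAZ only.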

Is the test program attached somewhere?
Bug 34994 (https://bugs.llvm.org/show_bug.cgi?id=34994) was limited to changing cases where we are running in some kind of loose-FP environment (otherwise, we would not be generating a sqrt estimate sequence at all). In the default (IEEE-compliant) environment, x86 would use a full-precision sqrt instruction or make a call to libm sqrt.

I just posted the test I wrote here: https://github.com/arsenm/subnormal_test

Thanks. I tried compiling with gcc (I can't trust clang, since it doesn't honor #pragma STDC FENV_ACCESS ON?).
Running that on an Ubuntu 17.10 x86-64 system, it behaves as I would expect. If you compile without -ffast-math, it asserts:

With denormals disabled
a.out: subnormal_test.cpp:33: void fp32_denorm_test(): Assertion `std::fpclassify(subnormal) == FP_SUBNORMAL' failed.

And if you compile with -ffast-math, it asserts:

In default FP mode
a.out: subnormal_test.cpp:33: void fp32_denorm_test(): Assertion `std::fpclassify(subnormal) == FP_SUBNORMAL' failed.

This is what I see compiling Craig's csr tester:

$ cc -O2 csr.c && ./a.out
1f80
$ cc -O2 csr.c -ffast-math && ./a.out
9fc0

FZ is bit 15 (0x8000) and DAZ is bit 6 (0x0040), so they are clear in default (IEEE) mode and set with -ffast-math.
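
The bit arithmetic above can be checked with a short, portable sketch. The helper and macro names here are hypothetical and only illustrate the decoding; nothing in it reads the real register.

```c
#include <stdio.h>

/* MXCSR bit masks (names chosen for this sketch). */
#define MXCSR_DAZ 0x0040u /* bit 6: denormals-are-zero */
#define MXCSR_FTZ 0x8000u /* bit 15: flush-to-zero ("FZ" in Intel's naming) */

/* Returns nonzero if every bit in mask is set in the MXCSR value. */
static int mxcsr_has(unsigned int csr, unsigned int mask) {
    return (csr & mask) == mask;
}

/* Prints the FTZ/DAZ state encoded in an MXCSR value. */
static void describe_mxcsr(unsigned int csr) {
    printf("%04x: FTZ=%d DAZ=%d\n", csr,
           mxcsr_has(csr, MXCSR_FTZ), mxcsr_has(csr, MXCSR_DAZ));
}
```

Calling describe_mxcsr(0x1f80) prints "1f80: FTZ=0 DAZ=0", and describe_mxcsr(0x9fc0) prints "9fc0: FTZ=1 DAZ=1", since 0x1f80 | 0x8000 | 0x0040 is 0x9fc0.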

arsenm updated this revision to Diff 231726. Dec 2 2019, 9:22 AM

DAZ/FTZ seem to be set in crtfastmath.o, so try to reproduce the logic for linking that.
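
For context, a startup object of this kind can be approximated by a constructor that sets the two MXCSR bits before main() runs. The following is a rough sketch only, not the actual crtfastmath source: the function and macro names are hypothetical, it relies on the GCC/Clang constructor attribute, and it assumes an x86 CPU that supports DAZ (a real implementation would need to check for DAZ support before setting that bit).

```c
#include <xmmintrin.h>

/* MXCSR bit masks (names chosen for this sketch). */
#define MXCSR_DAZ 0x0040u /* bit 6: denormals-are-zero */
#define MXCSR_FTZ 0x8000u /* bit 15: flush-to-zero */

/* Hypothetical helper: turn on FTZ and DAZ in the calling thread's MXCSR. */
static void enable_ftz_daz(void) {
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
}

/* Runs before main() when this object is linked into the program
 * (constructor is a GCC/Clang extension, not standard C). */
__attribute__((constructor))
static void set_fast_math_sketch(void) {
    enable_ftz_daz();
}
```

Linking an object like this into the csr tester would be expected to change its output from 1f80 to 9fc0, which matches what -ffast-math produces in the comparison above.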

spatel added inline comments.

clang/include/clang/Driver/ToolChain.h, line 580:
Formatting nit - prefer to start with verb and lower-case: isFastMathRuntimeAvailable() or hasFastMathRuntime().

clang/include/clang/Driver/ToolChain.h, lines 587–588:
Add -> add

clang/lib/Driver/ToolChains/PS4CPU.h, lines 95–96:
@probinson / @andreadb - is this correct for PS4? or is there some equivalent to the Linux startup file?

arsenm updated this revision to Diff 232088. Dec 4 2019, 4:39 AM

Rename functions

spatel added inline comments. Dec 11 2019, 4:56 AM

clang/test/Driver/default-denormal-fp-math.c, line 8:
The prefix should be PRESERVE_SIGN to match the flag?

arsenm updated this revision to Diff 243553. Feb 10 2020, 7:19 AM

Rebase and fix check prefix name

spatel accepted this revision. Feb 10 2020, 8:48 AM

LGTM - the PS4 behavior was confirmed off-list.

This revision is now accepted and ready to land. Feb 10 2020, 8:48 AM