Page MenuHomePhabricator

[libc] Add hardware implementations of fma and fmaf for x86_64 and aarch64.
ClosedPublic

Authored by sivachandra on Apr 19 2021, 9:03 PM.

Details

Summary

The current generic implementation of the fmaf function has been moved
to the FPUtil directory. This allows one use the fma operation from
implementations of other math functions like the trignometric functions
without depending on/requiring the fma/fmaf/fmal function targets. If
this pattern ends being convenient, we will switch all generic math
implementations to this pattern.

Diff Detail

Unit TestsFailed

TimeTest
70 msx64 debian > Clang.CodeGen::opt-record-MIR.c
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/13.0.0/include -nostdsysteminc -triple arm64-apple-ios -S -o /dev/null /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/opt-record-MIR.c -O2 -Rpass-missed=regalloc 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=REMARK /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/opt-record-MIR.c
110 msx64 windows > Clang.CodeGen::opt-record-MIR.c
Script: -- : 'RUN: at line 2'; c:\ws\w16c2-2\llvm-project\premerge-checks\build\bin\clang.exe -cc1 -internal-isystem c:\ws\w16c2-2\llvm-project\premerge-checks\build\lib\clang\13.0.0\include -nostdsysteminc -triple arm64-apple-ios -S -o /dev/null C:\ws\w16c2-2\llvm-project\premerge-checks\clang\test\CodeGen\opt-record-MIR.c -O2 -Rpass-missed=regalloc 2>&1 | c:\ws\w16c2-2\llvm-project\premerge-checks\build\bin\filecheck.exe -check-prefix=REMARK C:\ws\w16c2-2\llvm-project\premerge-checks\clang\test\CodeGen\opt-record-MIR.c

Event Timeline

sivachandra created this revision.Apr 19 2021, 9:03 PM
sivachandra requested review of this revision.Apr 19 2021, 9:03 PM

I have not yet fully tested this patch on x86_64. Will update soon if it requires any changes.

Add fma to the x86_64 entrypoint list.

lntue added inline comments.Apr 19 2021, 10:04 PM
libc/utils/FPUtil/FMA.h
14

Do we also need to check for _ _FMA__ flag?

Stupid question: why assembler instead of builtins:
https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/ia64/builtin-fma-1.c
I guess there is a tradeoff between control over the process and helping the register allocator.

Stupid question: why assembler instead of builtins:
https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/ia64/builtin-fma-1.c
I guess there is a tradeoff between control over the process and helping the register allocator.

Builtins can potentially silently call into the libc. I preferred inline assembly to avoid such a possibility. I think using target specific compiler intrinsics is a better option but they don't have convenient ways to convert from intrinsic types like __m128 to and from float/double.

libc/utils/FPUtil/FMA.h
14

I agree that not checking on __FMA__ is not ideal. But, I preferred not to condition on any other thing because it complicates the build system. For example, if __FMA__ was not defined, then we wouldn't get the hardware fma and fmaf but only get the integer fmaf implementation. But, other places will expect that fma and fmaf both are available. We can of course teach the build system to check for __FMA__ and propagate that info suitably to the rest of the libc build. That is a change we can take up separately if really required. I would prefer to see more examples of that pattern before building the appropriate machinery in to the build system.

lntue accepted this revision.Apr 20 2021, 2:43 PM
This revision is now accepted and ready to land.Apr 20 2021, 2:43 PM
This revision was landed with ongoing or failed builds.Apr 20 2021, 9:31 PM
This revision was automatically updated to reflect the committed changes.