Page MenuHomePhabricator

[clang] RFC Support new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens
Needs ReviewPublic

Authored by mibintc on Apr 8 2021, 8:45 AM.

Details

Reviewers
kpn
aaron.ballman
Summary

This is a proposal to add a new clang builtin, __arithmetic_fence. The purpose of the builtin is to provide the user fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. We are also proposing a new clang builtin that provides access to this intrinsic, as well as a new clang command line option -fprotect-parens that will be implemented using this intrinsic.

This is the clang patch corresponding to https://reviews.llvm.org/D99675 which proposes a new llvm intrinsic llvm.arith.fence

Note: Preliminary patch, not ready for code review, doesn't yet run end-to-end with the llvm patch, and the logic for the new option is not yet written.

Rationale

Some expression transformations that are mathematically correct, such as reassociation and distribution, may be incorrect when dealing with finite precision floating point. For example, these two expressions,

(a + b) + c
a + (b + c)

are equivalent mathematically in integer arithmetic, but not in floating point. In some floating point (FP) models, the compiler is allowed to make these value-unsafe transformations for performance reasons, even when the programmer uses parentheses explicitly. But the compiler must always honor the parentheses implied by __arithmetic_fence, regardless of the FP model settings.

Under –ffp-model=fast, __arithmetic_fence provides a way to partially enforce ordering in an FP expression.

Original expressionTransformed expressionPermitted?
(a + b) + ca + (b + c)Yes!
__arithmetic_fence(a + b) + ca + (b + c)No!
NOTE: The __arithmetic_fence serves no purpose in value-safe FP modes like –ffp-model=precise: FP expressions are already strictly ordered.

Motivation:

  • The new clang builtin provides clang compatibility with the Intel C++ compiler builtin __fence which has similar semantics, and likewise enables implementation of the option -fprotect-parens.
  • The new builtin provides the clang programmer control over floating point optimizations at the expression level.
  • The gnu fortran compiler, gfortran, likewise supports -fprotect-parens

Requirements for __arithmetic_fence:

  • There is one operand. The operand type must be scalar floating point, complex floating point or vector thereof.
  • The return type is the same as the operand type.
  • The return value is equivalent to the operand.
  • The invocation of __arithmetic_fence is not a C/C++ constant expression, even if the operands are constant.

New option -fprotect-parens

  • Determines whether the optimizer honors parentheses when floating-point expressions are evaluated
  • Option defaults to fno-protect-parens

Diff Detail

Unit TestsFailed

TimeTest
270 msx64 debian > LLVM.CodeGen/X86::arithmetic_fence.ll
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/X86/arithmetic_fence.ll -mtriple=i686-unknown-unknown -mattr=+fma | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/X86/arithmetic_fence.ll --check-prefix=X86
160 msx64 windows > LLVM.CodeGen/X86::arithmetic_fence.ll
Script: -- : 'RUN: at line 2'; c:\ws\w16c2-2\llvm-project\premerge-checks\build\bin\llc.exe < C:\ws\w16c2-2\llvm-project\premerge-checks\llvm\test\CodeGen\X86\arithmetic_fence.ll -mtriple=i686-unknown-unknown -mattr=+fma | c:\ws\w16c2-2\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w16c2-2\llvm-project\premerge-checks\llvm\test\CodeGen\X86\arithmetic_fence.ll --check-prefix=X86

Event Timeline

mibintc created this revision.Apr 8 2021, 8:45 AM
mibintc requested review of this revision.Apr 8 2021, 8:45 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 8 2021, 8:45 AM
mibintc retitled this revision from [clang] RFC Support new arithmetic __fence builtin to control floating point optiization to [clang] RFC Support new builtin __arithmetic_fence to control floating point optiization.
mibintc updated this revision to Diff 337881.Thu, Apr 15, 1:19 PM
mibintc retitled this revision from [clang] RFC Support new builtin __arithmetic_fence to control floating point optiization to [clang] RFC Support new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens.

This is a minor change with only formatting changes, this patch is not yet ready for review, only discussion.
Together with the llvm parent patch, this simple program can now run end-to-end

clang -c -ffast-math test.c

float addF(float x, float y) {
  return __arithmetic_fence(x + y);
}
kpn added a comment.Fri, Apr 16, 11:10 AM

I thought that adding a new fence required changing every optimization pass in LLVM. That's why the constrained intrinsics were implemented they way they are where no fence is needed.

Aren't you going to have miscompiles using this new fence until all that optimization work is done? Or am I wrong? @andrew.w.kaylor?

kpn added a comment.Fri, Apr 16, 11:21 AM

I thought that adding a new fence required changing every optimization pass in LLVM. That's why the constrained intrinsics were implemented they way they are where no fence is needed.

Aren't you going to have miscompiles using this new fence until all that optimization work is done? Or am I wrong? @andrew.w.kaylor?

Perhaps there is no problem. I'm looking at D99675 now.

kpn added inline comments.Tue, Apr 20, 11:14 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

Does this say that the fence will be silently dropped if any of the fast math flags are disabled? Silently dropping something used for correctness makes me nervous. Is there a case where all of these flags are required for correct behavior of the fence?

mibintc added inline comments.Tue, Apr 20, 11:19 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

Yes that is the idea, the clang builtin is only meaningful for float operations when -ffast-math is enabled. If fast math isn't enabled then the operations should already be performed strictly. I didn't have "isFast", should rewrite with isFast to make it easier to understand

mibintc added inline comments.Tue, Apr 20, 11:29 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

Oops I'm wrong. @kbsmith1 tells me only the reassoc bit is interesting. i'll need to fix this in the documentation etc. Thanks for your careful reading.

mibintc added inline comments.Wed, Apr 21, 11:25 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

BTW in the options parsing, any of the unsafe fp math options freciprocal-math, fsigned-zeros and fapprox-func have the effect of enabing LangOptions.AllowFPReassoc. This is in the file Options.td

mibintc updated this revision to Diff 340598.Mon, Apr 26, 11:35 AM
mibintc added a reviewer: aaron.ballman.

I think this patch is complete except it needs to wait until the parent patch is finished. Also some re-factoring may be desireable, I'll add some inline comments about that.

some inline comments for reviewers

clang/include/clang/Basic/TargetInfo.h
1162

There's no good place to diagnose when LangOptions and TargetOptions conflict, I see "conflict" diagnostics being emitted in clang/CodeGen when creating the func or module, which seems wrong to me. On the other hand, the "adjust" function makes a good place to do the reconciliation I think. This is the change that could be pulled out in a refactoring and committed prior to this patch. What do you think?

clang/include/clang/Driver/Options.td
4300

@jansvoboda11 This is a gfortran option, but I don't think there's a way to allow the same option spelling for gfortran and clang. Other clang options, like -short-enum, also have been pulled out of gfortran group, and the gfortran option test is an expected-fail test.