This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add support for new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens
ClosedPublic

Authored by mibintc on Apr 8 2021, 8:45 AM.

Details

Summary

This patch adds a new clang builtin, __arithmetic_fence. The purpose of the builtin is to provide the user fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. We are also proposing a new clang builtin that provides access to this intrinsic, as well as a new clang command line option -fprotect-parens that will be implemented using this intrinsic.

This is the clang patch corresponding to https://reviews.llvm.org/D99675 which proposes a new llvm intrinsic llvm.arith.fence

Rationale

Some expression transformations that are mathematically correct, such as reassociation and distribution, may be incorrect when dealing with finite precision floating point. For example, these two expressions,

(a + b) + c
a + (b + c)

are equivalent mathematically in integer arithmetic, but not in floating point. In some floating point (FP) models, the compiler is allowed to make these value-unsafe transformations for performance reasons, even when the programmer uses parentheses explicitly. But the compiler must always honor the parentheses implied by __arithmetic_fence, regardless of the FP model settings.

Under –ffp-model=fast, __arithmetic_fence provides a way to partially enforce ordering in an FP expression.

Original expressionTransformed expressionPermitted?
(a + b) + ca + (b + c)Yes!
__arithmetic_fence(a + b) + ca + (b + c)No!
NOTE: The __arithmetic_fence serves no purpose in value-safe FP modes like –ffp-model=precise: FP expressions are already strictly ordered.

Motivation:

  • The new clang builtin provides clang compatibility with the Intel C++ compiler builtin __fence which has similar semantics, and likewise enables implementation of the option -fprotect-parens.
  • The new builtin provides the clang programmer control over floating point optimizations at the expression level.
  • The gnu fortran compiler, gfortran, likewise supports -fprotect-parens

Requirements for __arithmetic_fence:

  • There is one operand. The operand type must be scalar floating point, complex floating point or vector thereof.
  • The return type is the same as the operand type.
  • The return value is equivalent to the operand.
  • The invocation of __arithmetic_fence is not a C/C++ constant expression, even if the operands are constant.

New option -fprotect-parens

  • Determines whether the optimizer honors parentheses when floating-point expressions are evaluated
  • Option defaults to fno-protect-parens

Diff Detail

Event Timeline

mibintc created this revision.Apr 8 2021, 8:45 AM
mibintc requested review of this revision.Apr 8 2021, 8:45 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 8 2021, 8:45 AM
mibintc retitled this revision from [clang] RFC Support new arithmetic __fence builtin to control floating point optiization to [clang] RFC Support new builtin __arithmetic_fence to control floating point optiization.
mibintc updated this revision to Diff 337881.Apr 15 2021, 1:19 PM
mibintc retitled this revision from [clang] RFC Support new builtin __arithmetic_fence to control floating point optiization to [clang] RFC Support new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens.

This is a minor change with only formatting changes, this patch is not yet ready for review, only discussion.
Together with the llvm parent patch, this simple program can now run end-to-end

clang -c -ffast-math test.c

float addF(float x, float y) {
  return __arithmetic_fence(x + y);
}
kpn added a comment.Apr 16 2021, 11:10 AM

I thought that adding a new fence required changing every optimization pass in LLVM. That's why the constrained intrinsics were implemented they way they are where no fence is needed.

Aren't you going to have miscompiles using this new fence until all that optimization work is done? Or am I wrong? @andrew.w.kaylor?

kpn added a comment.Apr 16 2021, 11:21 AM

I thought that adding a new fence required changing every optimization pass in LLVM. That's why the constrained intrinsics were implemented they way they are where no fence is needed.

Aren't you going to have miscompiles using this new fence until all that optimization work is done? Or am I wrong? @andrew.w.kaylor?

Perhaps there is no problem. I'm looking at D99675 now.

kpn added inline comments.Apr 20 2021, 11:14 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

Does this say that the fence will be silently dropped if any of the fast math flags are disabled? Silently dropping something used for correctness makes me nervous. Is there a case where all of these flags are required for correct behavior of the fence?

mibintc added inline comments.Apr 20 2021, 11:19 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

Yes that is the idea, the clang builtin is only meaningful for float operations when -ffast-math is enabled. If fast math isn't enabled then the operations should already be performed strictly. I didn't have "isFast", should rewrite with isFast to make it easier to understand

mibintc added inline comments.Apr 20 2021, 11:29 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

Oops I'm wrong. @kbsmith1 tells me only the reassoc bit is interesting. i'll need to fix this in the documentation etc. Thanks for your careful reading.

mibintc added inline comments.Apr 21 2021, 11:25 AM
clang/lib/CodeGen/CGBuiltin.cpp
2844

BTW in the options parsing, any of the unsafe fp math options freciprocal-math, fsigned-zeros and fapprox-func have the effect of enabing LangOptions.AllowFPReassoc. This is in the file Options.td

mibintc updated this revision to Diff 340598.Apr 26 2021, 11:35 AM

I think this patch is complete except it needs to wait until the parent patch is finished. Also some re-factoring may be desireable, I'll add some inline comments about that.

some inline comments for reviewers

clang/include/clang/Basic/TargetInfo.h
1162

There's no good place to diagnose when LangOptions and TargetOptions conflict, I see "conflict" diagnostics being emitted in clang/CodeGen when creating the func or module, which seems wrong to me. On the other hand, the "adjust" function makes a good place to do the reconciliation I think. This is the change that could be pulled out in a refactoring and committed prior to this patch. What do you think?

clang/include/clang/Driver/Options.td
4300

@jansvoboda11 This is a gfortran option, but I don't think there's a way to allow the same option spelling for gfortran and clang. Other clang options, like -short-enum, also have been pulled out of gfortran group, and the gfortran option test is an expected-fail test.

Matt added a subscriber: Matt.May 19 2021, 11:12 AM
foad removed a subscriber: foad.May 25 2021, 2:02 AM
mibintc updated this revision to Diff 348052.May 26 2021, 12:14 PM
mibintc retitled this revision from [clang] RFC Support new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens to [clang] Add support for new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens.
mibintc edited the summary of this revision. (Show Details)

Rebased the patch. The parent patch is updated and corrected as well, and tests can run end-to-end.

Ready for your code review, thanks!

mibintc added a reviewer: jyu2.Jun 3 2021, 8:32 AM
mibintc updated this revision to Diff 349644.Jun 3 2021, 12:16 PM

I made a change to ActOnParenExpr to check that the parenthesized expression is not an lvalue before inserting the call to builtin __arithmetic_fence

aaron.ballman added inline comments.
clang/docs/UsersManual.rst
1484

We may need to expound on what "honor parentheses" means. The summary for the patch mentions (a + b) + c, but does this also mean (x + y) * z could be interpreted as x + (y * z)? What about for non-arithmetic expressions involving parentheses, like (float1 == float2 && float1 == float3) || blah()?

clang/include/clang/Basic/DiagnosticSemaKinds.td
8470–8471
clang/include/clang/Basic/TargetInfo.h
1162

I think this makes sense, but should be done as a separate patch.

1415
clang/include/clang/Driver/Options.td
1703

Should this option also be exposed to clang-cl (should it be a CoreOption)?

clang/lib/CodeGen/CGBuiltin.cpp
2839

Please spell out the type here.

clang/lib/Driver/ToolChains/Clang.cpp
2695–2697

Is this code needed for some reason?

clang/lib/Sema/SemaChecking.cpp
6391

What if there is no argument given?

6394
6396–6399
6405–6406

This looks like it should move up a bit.

clang/lib/Sema/SemaCoroutine.cpp
387–388

I am fine ignoring this clang-format issue; I think your formatting is far more readable.

clang/lib/Sema/SemaExpr.cpp
4012

Do you have to worry about vector types here as well?

4013

One worry I have about this is that the AST will be a bit mysterious for people writing AST matchers. The source code will have parens in it, but the AST will have a ParenExpr node or not only depending on the language options. This may also cause problems for other parts of the code that are checking for properly-parenthesized expressions (like && and || within the same expression), so we should probably have a test case like:

bool func(float f1, float f2, float f3) {
  return (f1 == f2 && f1 == f3) || f2 == f3; // Should not warn here
}
6522

Because this cannot fail, is there a benefit to returning an ExprResult rather than an Expr * as before? Then you can remove a bunch of get() calls elsewhere that look suspicious because they're not checking for a failure.

clang/test/Sema/arithmetic-fence-builtin.c
46

You should add a test that we diagnose things like:

__arithmetic_fence();
__arithmetic_fence(1, 2);

I think it would also be useful to add an AST test (to clang\test\AST) that shows the call expression AST nodes being inserted for paren expressions.

question for @aaron.ballman

clang/docs/UsersManual.rst
1484

Thanks for your review. Just a quick first response, What do you mean "(x + y) * z could be reinterpreted as x + (y * z)" -- not following you here?

aaron.ballman added inline comments.Jun 4 2021, 7:09 AM
clang/docs/UsersManual.rst
1484

"whether the optimizer honors the parentheses" reads to me like (x + y) * z could either honor or not honor the parentheses, so that could either evaluate as (x + y) * z or x + (y * z) depending on whether the parens are honored or not.

joerg added a subscriber: joerg.Jun 4 2021, 4:44 PM
joerg added inline comments.
clang/docs/UsersManual.rst
1484

I would prefer to start with a description of the intended behavior from first principles. The logic here should not be tied to fast-mode as it also overlaps a lot with the formulation of fused multiply-add expressions and certainly should have specified behavior on the interaction. I'm also not sure about what you mean with value-safe FP modes, IIRC the C/C++ standard explicitly leave it open in which order floating point expressions of the same operand priority are evaluated.

I'm also a bit puzzled why __arithmethic_fence should not be considered a constant expression?

Reply to @joerg proposing new wording for the option description

clang/docs/UsersManual.rst
1484

I would prefer to start with a description of the intended behavior from first principles. The logic here should not be tied to fast-mode as it also overlaps a lot with the formulation of fused multiply-add expressions and certainly should have specified behavior on the interaction. I'm also not sure about what you mean with value-safe FP modes, IIRC the C/C++ standard explicitly leave it open in which order floating point expressions of the same operand priority are evaluated.

@joerg Thank you. What do you think about this modified description of the option:
option:: -f[no-]protect-parens:

This option pertains to floating-point types, complex types with floating-point components, and vectors of these types. Some arithmetic expression transformations that are mathematically correct and permissible according to the C and C++ language standards may be incorrect when dealing with floating-point types, such as reassociation and distribution. Further, the optimizer may ignore parentheses when computing arithmetic expressions in circumstances where the parenthesized and unparenthesized expression express the same mathematical value. For example (a+b)+c is the same mathematical value as a+(b+c), but the optimizer is free to evaluate the additions in any order regardless of the parentheses. When enabled, this option forces the optimizer to honor the order of operations with respect to parentheses in all circumstances.

Note that floating-point contraction (option -ffp-contract=) is disabled when -fprotect-parens is enabled. Also note that in safe floating-point modes, such as -ffp-model=precise or -ffp-model=strict, this option has no effect because the optimizer is prohibited from making unsafe transformations.

I'm also a bit puzzled why __arithmethic_fence should not be considered a constant expression?

I can see your point about that. I was leery of steering into constant folding waters but I will make that change.

joerg added a comment.Jun 21 2021, 5:04 AM

Thanks, that's better. The clang front-end option is not directly relevant for the semantic of the intrinsic, so it would be better to explicitly specify it was not being fuseable unless the operand degenerates into a single operand. Otherwise the specification sounds reasonable.

mibintc updated this revision to Diff 353719.Jun 22 2021, 11:07 AM

This patch addresses almost all the review comments, not yet sure about @aaron.ballman 's question about CoreOptions

This patch addresses almost all the review comments, not yet sure about @aaron.ballman 's question about CoreOptions

FWIW, I'm fine addressing that comment in a follow-up. Also, it looks like the Clang changes didn't make it into the latest updated patch.

mibintc updated this revision to Diff 354009.Jun 23 2021, 10:07 AM

The patch I uploaded for review yesterday wasn't correct, not sure what happened. This one looks better.

mibintc marked 9 inline comments as done.Jun 23 2021, 10:28 AM

A couple inline replies to go along with the updated patch

clang/include/clang/Driver/Options.td
1703

I don't know the answer to this, do you have a recommendation?

clang/lib/Sema/SemaChecking.cpp
6396–6399

I found there was a boolean function, so i got rid of the lambda.

clang/lib/Sema/SemaCoroutine.cpp
294

I moved this function to public because another file is also using the same logic

clang/lib/Sema/SemaExpr.cpp
4012

Oh yes, i didn't get this one.

4013

I added an AST test case

aaron.ballman added inline comments.Jun 23 2021, 11:00 AM
clang/include/clang/Driver/Options.td
1703

I think this would be useful for clang-cl users, so I'd recommend adding it to CoreOption unless there's a reason not to.

clang/lib/AST/ExprConstant.cpp
9309–9311

Under what circumstances could we get here? (Given that this is within PointerExprEvaluator, I would have assumed this would be unreachable code.)

clang/lib/CodeGen/CGBuiltin.cpp
2841

May as well fix this clang-format issue.

clang/lib/Sema/SemaExpr.cpp
6523

Might as well hit this clang-format warning too.

clang/test/AST/arithmetic-fence-builtin.c
34 ↗(On Diff #354009)

Does the (a + b) still have an AST node for the ParenExpr?

aaron.ballman added inline comments.Jun 23 2021, 11:09 AM
clang/test/AST/arithmetic-fence-builtin.c
34 ↗(On Diff #354009)

Nevermind, it won't, it'll have the CallExpr node.

I think this may be somewhat surprising to folks who use AST matchers or -ast-print, but I'm not certain that there's a better way to implement this that would keep the ParenExpr in the AST for full source fidelity, so I think this is reasonable enough.

mibintc updated this revision to Diff 354061.Jun 23 2021, 1:17 PM
mibintc marked 2 inline comments as done.

Respond to @aaron.ballman 's review

mibintc marked 11 inline comments as done.Jun 23 2021, 1:20 PM

some inline replies. I think this is all set now.

clang/lib/Sema/SemaExpr.cpp
4012

Yes thanks, i also added a little test case for this

aaron.ballman accepted this revision.Jun 24 2021, 11:54 AM

In general, I think this looks good (I did find one minor nit that removes an include, but that can be fixed as you land). There's an open question for @jansvoboda11 and it'd be nice to hear if @joerg agrees their concerns are fully addressed, so please wait a bit before landing to give them time to respond with any concerns or feedback.

clang/include/clang/Driver/Options.td
4300

@jansvoboda11 -- do you have any ideas here? Or is the current approach fine (and we can remove this commented-out line)?

clang/include/clang/Sema/Sema.h
5377

I think it might make sense to pass in an unsigned rather than a Builtin::ID so that we can avoid including Builtins.h (we do this sort of dance to avoid including headers elsewhere as well). WDYT?

This revision is now accepted and ready to land.Jun 24 2021, 11:54 AM
This revision was landed with ongoing or failed builds.Jun 28 2021, 9:27 AM
This revision was automatically updated to reflect the committed changes.
mibintc reopened this revision.Jun 28 2021, 12:06 PM

buildbot fails on lldb need to update this patch

This revision is now accepted and ready to land.Jun 28 2021, 12:06 PM