This is an archive of the discontinued LLVM Phabricator instance.

Use FPContractModeKind universally
ClosedPublic

Authored by anemet on Mar 20 2017, 8:11 PM.

Details

Summary

FPContractModeKind is the codegen option flag which is already ternary (off,
on, fast). This makes it universally the type for the contractable info
across the front-end:

  • In FPOptions (i.e. in the Sema + in the expression nodes).
  • In LangOpts::DefaultFPContractMode which is the option that initializes

FPOptions in the Sema.

Another way to look at this change is that before fp-contractable on/off were
the only states handled to the front-end:

  • For "on", FMA folding was performed by the front-end
  • For "fast", we simply forwarded the flag to TargetOptions to handle it in LLVM

Now off/on/fast are all exposed because for fast we will generate
fast-math-flags during CodeGen.

This is toward moving fp-contraction=fast from an LLVM TargetOption to a
FastMathFlag in order to fix PR25721.

Diff Detail

Repository
rL LLVM

Event Timeline

anemet created this revision.Mar 20 2017, 8:11 PM
aaron.ballman accepted this revision.Mar 29 2017, 1:14 PM

Aside from a minor comment nit, LGTM

include/clang/Basic/LangOptions.h
92 ↗(On Diff #92423)

I think you mean effected rather than affected.

This revision is now accepted and ready to land.Mar 29 2017, 1:14 PM
anemet added inline comments.Mar 29 2017, 1:31 PM
include/clang/Basic/LangOptions.h
92 ↗(On Diff #92423)

I think the verb is affect; effect is the noun. Quick grep confirms:

/org/llvm$ git grep affected | wc -l

109

/org/llvm$ git grep effected | wc -l

2
aaron.ballman added inline comments.Mar 29 2017, 1:44 PM
include/clang/Basic/LangOptions.h
92 ↗(On Diff #92423)

I think I was thrown by "form" being the verb in that sentence, but you are correct. :-)

This revision was automatically updated to reflect the committed changes.
rnk added a subscriber: rnk.Mar 29 2017, 5:40 PM
rnk added inline comments.
cfe/trunk/include/clang/Basic/LangOptions.h
217

Please do not use bitfields with enum types, it's a good way to break the build on Windows. This change triggered this clang-cl warning:

C:\src\llvm-project\clang\include\clang/Basic/LangOptions.h(208,17):  warning: implicit truncation from 'clang::LangOptions::FPContractModeKind' to bit-field changes value from 2 to -2 [-Wbitfield-constant-conversion]
    fp_contract = LangOptions::FPC_Fast;
                ^ ~~~~~~~~~~~~~~~~~~~~~
anemet added inline comments.Mar 29 2017, 8:17 PM
cfe/trunk/include/clang/Basic/LangOptions.h
217

Noted and thanks for the fix! Unfortunately the warning wasn't showing up on my host. I'll take a look why.

rnk added inline comments.Mar 30 2017, 11:42 AM
cfe/trunk/include/clang/Basic/LangOptions.h
217

Clang doesn't emit that warning on Posix because it wouldn't be true. The implicit underlying type of the enum on non-Windows is 'unsigned', not 'int'. We could issue a portability warning, but we wouldn't be able to turn it on by default because many users don't care about Windows portability.

Anyway, sorry about the bother. This is one of the reasons we just use 'unsigned' for all our bitfields. =/

yaxunl added a subscriber: yaxunl.Apr 7 2017, 11:41 AM
yaxunl added inline comments.
cfe/trunk/include/clang/Basic/LangOptions.def
220

This change seemed to cause some performance degradations in some OpenCL applications.

This option used to be on by default. Now it is off by default.

There are applications do separated compile/link/codegen stages. In the codegen stage, clang is invoked with .bc as input, therefore this option is off by default, whereas it was on by default before this change.

Is there any reason not to keep the original behavior?

Thanks.

anemet added inline comments.Apr 10 2017, 12:24 PM
cfe/trunk/include/clang/Basic/LangOptions.def
220

This change seemed to cause some performance degradations in some OpenCL applications.

This option used to be on by default. Now it is off by default.

It's always been off. It was set to fast for CUDA which should still be the case. See the changes further down on the patch.

There are applications do separated compile/link/codegen stages. In the codegen stage, clang is invoked with .bc as input, therefore this option is off by default, whereas it was on by default before this change.

Is there any reason not to keep the original behavior?

Sorry but I am not sure what changed, can you elaborate on what you're doing and how things used to work for you?

hfinkel added inline comments.Apr 10 2017, 12:38 PM
cfe/trunk/lib/Frontend/CompilerInvocation.cpp
1638

Looks like the intent is certainly to have kept the OpenCL default the same here.

yaxunl added inline comments.Apr 10 2017, 12:40 PM
cfe/trunk/include/clang/Basic/LangOptions.def
220

Before your change, -ffp-contract was a codegen option defined as

ENUM_CODEGENOPT(FPContractMode, FPContractModeKind, 2, FPC_On)

therefore the default value as on. After your change, it becomes off by default.

yaxunl added inline comments.Apr 10 2017, 12:45 PM
cfe/trunk/lib/Frontend/CompilerInvocation.cpp
1638

Well we also use clang to compile LLVM IR to ISA. Before this change, fp-contract was on so that backend did fp contract. However, now clang does not turn on fp-contract when compiling LLVM IRs.

hfinkel added inline comments.Apr 10 2017, 1:03 PM
cfe/trunk/lib/Frontend/CompilerInvocation.cpp
1638

That's the equivalent of 'Fast'. Maybe this should also be set to Fast then.

yaxunl added inline comments.Apr 10 2017, 1:20 PM
cfe/trunk/lib/Frontend/CompilerInvocation.cpp
1638

Agree. That should fix this issue.

anemet added inline comments.Apr 10 2017, 2:49 PM
cfe/trunk/include/clang/Basic/LangOptions.def
220

No. -ffp-contract=on was a FE option and -ffp-contract=fast was a backend option. I still don't understand your use-case. Can you include a small testcase how this used to work before?

yaxunl added inline comments.Apr 11 2017, 9:24 AM
cfe/trunk/include/clang/Basic/LangOptions.def
220

-ffp-contract=on used to be a codegen option before this change. In CodeGen/BackendUtil.cpp, before this change:

switch (CodeGenOpts.getFPContractMode()) {
  switch (LangOpts.getDefaultFPContractMode()) {
  case LangOptions::FPC_Off:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
    break;
  case LangOptions::FPC_On:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
    break;
  case LangOptions::FPC_Fast:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
    break;
  }

By default, -fp-contract=on, which results in llvm::FPOpFusion::Standard in the backend. This options allows backend to translate llvm.fmuladd to FMA machine instructions.

After this change, it becomes:

switch (LangOpts.getDefaultFPContractMode()) {
  case LangOptions::FPC_Off:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
    break;
  case LangOptions::FPC_On:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
    break;
  case LangOptions::FPC_Fast:
    Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
    break;
  }

now by default -ffp-contract=off, which results in llvm::FPOpFusion::Strict in the backend. This option disables backend translating llvm.fmuladd to FMA machine instructions in certain situations.

A simple lit test is as follows:

; RUN: %clang_cc1 -triple amdgcn -S -o - %s| FileCheck %s

define amdgpu_kernel void @f(double addrspace(1)* nocapture %out, double %a, double %b, double %c) local_unnamed_addr #0 {
entry:
  ; CHECK: v_fma_f64
  %0 = tail call double @llvm.fmuladd.f64(double %b, double %c, double %a)
  store double %0, double addrspace(1)* %out, align 8, !tbaa !6
  ret void
}

declare double @llvm.fmuladd.f64(double, double, double) #1

attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+fp64-fp16-denormals,-fp32-denormals" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readnone }

The test passes before this change and fails after this change.

cfe/trunk/lib/Frontend/CompilerInvocation.cpp
1638

Actually, the option on may be working too, which corresponds to llvm::FPOpFusion::Standard. At least it allows llvm.fmuladd to be translated to FMA machine instructions. However, off will disable such translation in certain cases.

Quoting TargetOptions.h:

/// Fast mode - allows formation of fused FP ops whenever they're
/// profitable.
/// Standard mode - allow fusion only for 'blessed' FP ops. At present the
/// only blessed op is the fmuladd intrinsic. In the future more blessed ops
/// may be added.
/// Strict mode - allow fusion only if/when it can be proven that the excess
/// precision won't effect the result.
hfinkel added inline comments.Apr 11 2017, 9:45 AM
cfe/trunk/include/clang/Basic/LangOptions.def
220

I'm missing something here. We're calling for OpenCL:

Opts.setDefaultFPContractMode(LangOptions::FPC_On);

which seems like it should change the result returned by getDefaultFPContractMode(). Why does it not?

yaxunl added inline comments.Apr 11 2017, 9:52 AM
cfe/trunk/include/clang/Basic/LangOptions.def
220

The input to this test is LLVM IR, so Opts.setDefaultFPContractMode(LangOptions::FPC_On) is not executed.

hfinkel added inline comments.Apr 11 2017, 10:12 AM
cfe/trunk/include/clang/Basic/LangOptions.def
220

Seems like this is another aspect of the LTO problem. All of these things need to be function attributes. This is also true to make LTO work.

anemet added inline comments.Apr 11 2017, 10:17 AM
cfe/trunk/include/clang/Basic/LangOptions.def
220

@hfinkel, what is the actual point of un-fusing the intrinsic in the BE with FPOpFusion::Strict?

I don't know if targets use the target-independent intrinsic to implement builtin support but if yes this would override the user choosing of FMA by directly specifying the builtin.

hfinkel added inline comments.Apr 12 2017, 11:06 AM
cfe/trunk/include/clang/Basic/LangOptions.def
220

@hfinkel, what is the actual point of un-fusing the intrinsic in the BE with FPOpFusion::Strict?

I think this was added based on a different conception of how frontends would use the intrinsic: that they'd always add it where the language-semantics allow, and then the backend would either never use FMAs (strict), use FMAs where profitable (standard), etc.

I don't know if targets use the target-independent intrinsic to implement builtin support but if yes this would override the user choosing of FMA by directly specifying the builtin.

No target should use fmuladd to implement the builtin support for a target-specific intrinsic. It should use the fma intrinsic would always maps to an fma.

anemet added inline comments.Apr 12 2017, 12:18 PM
cfe/trunk/include/clang/Basic/LangOptions.def
220

I think this was added based on a different conception of how frontends would use the intrinsic: that they'd always add it where the language-semantics allow, and then the backend would either never use FMAs (strict), use FMAs where profitable (standard), etc.

So would you support disabling un-fusion on Strict? That should solve Yaxun's problem.

Also my mid-term goal is to completely remove the FPOpFusion target option in favor of the explicit representation in the IR: either the FMF or the direct use of the intrinsic. At that point the behavior would have to change anyway.

No target should use fmuladd to implement the builtin support for a target-specific intrinsic. It should use the fma intrinsic would always maps to an fma.

Could you please rephrase this, I am not sure I understand?

mehdi_amini edited edge metadata.Apr 13 2017, 12:41 AM

I believe considering the goal of moving to using per-instruction FMF and kills the global backend option, this leads to a bitcode upgrade issue: OpenCL (or other) bitcode that were generated bitcode don't have the right FMF and will be upgraded conservatively.
Performance regression when upgrading bitcode are to be expected in general, so it is not a bug.
To recover, an option for an OpenCL backend would be to add a pass that set the expected FMF everywhere after bitcode upgrade.

I believe considering the goal of moving to using per-instruction FMF and kills the global backend option, this leads to a bitcode upgrade issue: OpenCL (or other) bitcode that were generated bitcode don't have the right FMF and will be upgraded conservatively.
Performance regression when upgrading bitcode are to be expected in general, so it is not a bug.
To recover, an option for an OpenCL backend would be to add a pass that set the expected FMF everywhere after bitcode upgrade.

Actually we are not concerned about old bitcodes. We are doing separate compile/link/codegen, but the bitcode we use in codegen is fresh from the same revision of llvm/clang. As long as the FMF carries the correct info from Clang we should be OK.

hfinkel added inline comments.Apr 14 2017, 5:54 AM
cfe/trunk/include/clang/Basic/LangOptions.def
220

So would you support disabling un-fusion on Strict? That should solve Yaxun's problem.

Why don't you just have the frontend never set 'Strict' and always set' Standard' (or 'Fast')? That seems like a reasonable option to me given how Clang generates IR.

Also my mid-term goal is to completely remove the FPOpFusion target option in favor of the explicit representation in the IR: either the FMF or the direct use of the intrinsic.

Sounds good.

Could you please rephrase this, I am not sure I understand?

Hrmm, maybe I misunderstood the question. Were you asking whether any target intrinsics, __builtin_<my target prefix>_<some fma thing>(a, b, c) would be lowered in terms of fmuladd instead of fma?