This is an archive of the discontinued LLVM Phabricator instance.

[FPEnv] Add pragma FP_CONTRACT support under strict FP.
ClosedPublic

Authored by pengfei on Jan 15 2020, 7:02 PM.

Event Timeline

pengfei created this revision.Jan 15 2020, 7:02 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 15 2020, 7:02 PM
pengfei updated this revision to Diff 238408.Jan 15 2020, 7:24 PM

Remove dead code.

The title of this review is misleading. It should at least mention FPEnv, constrained intrinsics, or strict fp or something. Right now it sounds like FP_CONTRACT isn't supported at all.

Can we split most of the X86 changes into a separate patch? Most of it can be tested with fneg and constrained.fma.

craig.topper added inline comments.Jan 15 2020, 7:46 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7090

Can you make the SDValue Result an argument of this and only capture 'this'. I don't like depending on reassigning Result.

pengfei updated this revision to Diff 238414.Jan 15 2020, 9:08 PM
pengfei marked an inline comment as done.

Address review comments.

craig.topper added inline comments.Jan 15 2020, 9:22 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7061

Why is Result a reference? It's not modified is it? Don't use auto for parameter types. llvm coding style prefers auto to only be used when the type is easily assumed by someone reading the code.

pengfei updated this revision to Diff 238417.Jan 15 2020, 10:00 PM
pengfei marked an inline comment as done.

Address review comment.

pengfei retitled this revision from Add pragma FP_CONTRACT support. to [FPEnv] Add pragma FP_CONTRACT support under strict FP..Jan 15 2020, 10:08 PM
pengfei edited the summary of this revision. (Show Details)
clang/lib/CodeGen/CGExprScalar.cpp
3381

You shouldn't just assume that MulOp is a constrained intrinsic. Cast to ConstrainedFPIntrinsic and use ConstrainedFPIntrinsic::getRoundingMode() and ConstrainedFPIntrinsic::getExceptionBehavior(). The cast will effectively assert that MulOp is a constrained intrisic. I think that should always be true.

3423

I don't think we should ever non-constrained create FMul instructions if Builder is in FP constrained mode, but you should assert that somewhere above. Maybe move this block above line 3409 and add:

assert(LHSBinOp->getOpcode() != llvm::Instruction::FMul && RHSBinOp->getOpcode() != llvm::Instruction::FMul);

clang/test/CodeGen/constrained-math-builtins.c
160

I'd like to see a test that verifies the calls generated in the function and specifically a test that verifies that the constrained fneg is generated if needed.

llvm/docs/LangRef.rst
16094

s/specifie/specify

s/the exception behavior/the rounding mode and exception behavior

16104

missing metadata arguments

llvm/include/llvm/CodeGen/BasicTTIImpl.h
1515

I don't think that matters. The cost calculation here is a conservative estimate based on the cost if we are unable to generate an FMA instruction. So a constrained fmuladd that can't be lowered to FMA will be lower the same way a contrained mul followed by a constrained add would be.

llvm/include/llvm/CodeGen/ISDOpcodes.h
355

Something is wrong with this comment. I'm not sure what it's trying to say but the grammar is wrong.

After looking through the rest of the code, I think I understand what's going on. I think we need a verbose comment to explain it. Here's my suggestion

FMULADD/STRICT_FMULADD -- These are intermediate opcodes used to handle the constrained.fmuladd intrinsic. The FMULADD opcode only exists because it is required for correct macro expansion and default handling (which is never reached). There should never be a node with ISD::FMULADD. The STRICT_FMULADD opcode is used to allow selectionDAGBuilder::visitConstrainedFPIntrinsic to determine (based on TargetOptions and target cost information) whether the constrained.fmuladd intrinsic should be lowered to FMA or separate FMUL and FADD operations.

Having thought through that, however, it strikes me as a lot of overhead. Can we just add special handling for the constrained.fmuladd intrinsic and make the decision then to create either a STRICT_FMA node or separate STRICT_FMUL and STRICT_FADD?

The idea that ISD::FMULADD is going to exist as a defined opcode but we never intend to add any support for handling it is particularly bad.

cameron.mcinally added inline comments.
clang/lib/CodeGen/CGExprScalar.cpp
3436

I don't think it's safe to fuse a FMUL and FADD if the intermediate rounding isn't exactly the same as those individual operations. FMULADD doesn't guarantee that, does it?

clang/lib/CodeGen/CGExprScalar.cpp
3436

To be clear, we could miss very-edge-case overflow/underflow exceptions.

clang/lib/CodeGen/CGExprScalar.cpp
3436

Ah, but I see C/C++ FP_CONTRACT allows the exceptions to be optimized away. Sorry for the noise.

clang/lib/CodeGen/CGExprScalar.cpp
3436

We've talked about this before but I don't think we ever documented a decision as to whether we want to allow constrained intrinsics and fast math flags to be combined. This patch moves that decision into clang's decision to generate this intrinsic or not.

I think it definitely makes sense in the case of fp contraction, because even if a user cares about value safety they might want FMA, which is theorectically more accurate than the separate values even though it produces a different value. This is consistent with gcc (which produces FMA under "-ffp-contract=fast -fno-fast-math") and icc (which produced FMA under "-fp-model strict -fma").

For the record, I also think it makes sense to use nnan, ninf, and nsz with constrained intrinsics.

pengfei updated this revision to Diff 238769.Jan 17 2020, 6:58 AM
pengfei marked 7 inline comments as done.

Address review comments.

pengfei marked an inline comment as done.Jan 17 2020, 7:01 AM
pengfei added inline comments.
clang/lib/CodeGen/CGExprScalar.cpp
3381

I prefer to reuse the operands from the fmul intrinsic here. 1). fmuladd always has the same exception/rounding mode with fmul. 2). the function getRoundingMode/getExceptionBehavior just return a enum value. We need more code to turn them into Value type.

3423

Add assertion in line 3380. We only need to check once there.

llvm/test/TableGen/GlobalISelEmitter-input-discard.td
18 ↗(On Diff #238769)

It's strange the number is affected. I haven't found any cause.

pengfei updated this revision to Diff 238770.Jan 17 2020, 7:07 AM

Remove unnecessary comment.

clang/lib/CodeGen/CGExprScalar.cpp
3436

You had me until:

For the record, I also think it makes sense to use nnan, ninf, and nsz with constrained intrinsics.

To be clear, we'd need them for the fast case, but I don't see a lot of value for the strict case.

We definitely want reassoc/recip/etc for the optimized but trap-safe case, so that's enough to require FMF flags on constrained intrinsics alone.

We should probably break this conversation out into an llvm-dev thread...

Remember that the design is that constrained intrinsics must be used whenever *any* code in the function is constrained. It is not unreasonable that part of the function might be constrained and the rest subject to fast-math; it'd be a shame if the intrinsics couldn't even express that.

clang/lib/CodeGen/CGExprScalar.cpp
3436

I agree about starting an llvm-dev thread. I'll send something out unless you've already done so by the time I finish typing it.

kpn added a subscriber: kpn.Jan 21 2020, 11:54 AM
craig.topper added inline comments.Jan 23 2020, 10:06 PM
clang/lib/CodeGen/CGExprScalar.cpp
3381

Doesn't this need to be CreateConstrainedFPCall so that the strictfp attribute is added? That will take care of adding the metadata operands too.

kpn added inline comments.Jan 24 2020, 4:56 AM
clang/lib/CodeGen/CGExprScalar.cpp
3381

Is this code tested? I ran into a bug yesterday where CreateCall was used with a constrained intrinsic and the Instruction class blew up because the function signature was wrong. I wasn't passing in the metadata arguments.

So, yes, it should be, and it would might make sense for the patch to have test coverage that catches any other cases of this.

craig.topper added inline comments.Jan 24 2020, 8:40 AM
clang/lib/CodeGen/CGExprScalar.cpp
3381

This code is copying the metadata arguments from the fmul intrinsic, MulOp. that’s the getOperand(2) and getOperand(3).

kpn added inline comments.Jan 24 2020, 9:14 AM
clang/lib/CodeGen/CGExprScalar.cpp
3381

Ah, yes, thanks. Your comment about the attribute is still valid, though. And, yes, using CreateConstrainedFPCall() is the easiest way to fix the attribute.

pengfei updated this revision to Diff 240474.Jan 26 2020, 9:17 PM

Address review comment.

pengfei marked an inline comment as done.Jan 26 2020, 9:19 PM
pengfei added inline comments.
clang/lib/CodeGen/CGExprScalar.cpp
3381

Yes, it's the best choice. Thanks!

This revision is now accepted and ready to land.Jan 27 2020, 10:31 PM
This revision was automatically updated to reflect the committed changes.
jhenderson added inline comments.
llvm/docs/LangRef.rst
16065

This underline isn't long enough and is breaking the sphinx build bot. Please fix.

pengfei marked an inline comment as done.Jan 28 2020, 6:03 AM
pengfei added inline comments.
llvm/docs/LangRef.rst
16065

Thanks! I'll fix it soon.

Allen added a subscriber: Allen.Feb 27 2023, 5:57 PM
Allen added inline comments.
clang/lib/CodeGen/CGExprScalar.cpp
3382

Sorry, I'm not familiar with the optimization of the clang front end.
I'd like to ask, is this optimization supposed to assume that all the backends have instructions like Fmuladd?

Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2023, 5:57 PM
pengfei added inline comments.Feb 28 2023, 1:31 AM
clang/lib/CodeGen/CGExprScalar.cpp
3382

No, it is a flexible intrinsic, which allows backends to choose their best approach. It can be either interpretered as mul + add or fma. It represents user doesn't care the differece between them.