Support pragma FP_CONTRACT under strict FP.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 44116 Build 45255: arc lint + arc unit
Event Timeline
The title of this review is misleading. It should at least mention FPEnv, constrained intrinsics, or strict fp or something. Right now it sounds like FP_CONTRACT isn't supported at all.
Can we split most of the X86 changes into a separate patch? Most of it can be tested with fneg and constrained.fma.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
7064 | Can you make the SDValue Result an argument of this and only capture 'this'. I don't like depending on reassigning Result. |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
7061 | Why is Result a reference? It's not modified is it? Don't use auto for parameter types. llvm coding style prefers auto to only be used when the type is easily assumed by someone reading the code. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3381 | You shouldn't just assume that MulOp is a constrained intrinsic. Cast to ConstrainedFPIntrinsic and use ConstrainedFPIntrinsic::getRoundingMode() and ConstrainedFPIntrinsic::getExceptionBehavior(). The cast will effectively assert that MulOp is a constrained intrisic. I think that should always be true. | |
3423 | I don't think we should ever non-constrained create FMul instructions if Builder is in FP constrained mode, but you should assert that somewhere above. Maybe move this block above line 3409 and add: assert(LHSBinOp->getOpcode() != llvm::Instruction::FMul && RHSBinOp->getOpcode() != llvm::Instruction::FMul); | |
clang/test/CodeGen/constrained-math-builtins.c | ||
160 | I'd like to see a test that verifies the calls generated in the function and specifically a test that verifies that the constrained fneg is generated if needed. | |
llvm/docs/LangRef.rst | ||
16094 | s/specifie/specify s/the exception behavior/the rounding mode and exception behavior | |
16104 | missing metadata arguments | |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
1515 | I don't think that matters. The cost calculation here is a conservative estimate based on the cost if we are unable to generate an FMA instruction. So a constrained fmuladd that can't be lowered to FMA will be lower the same way a contrained mul followed by a constrained add would be. | |
llvm/include/llvm/CodeGen/ISDOpcodes.h | ||
355 | Something is wrong with this comment. I'm not sure what it's trying to say but the grammar is wrong. After looking through the rest of the code, I think I understand what's going on. I think we need a verbose comment to explain it. Here's my suggestion FMULADD/STRICT_FMULADD -- These are intermediate opcodes used to handle the constrained.fmuladd intrinsic. The FMULADD opcode only exists because it is required for correct macro expansion and default handling (which is never reached). There should never be a node with ISD::FMULADD. The STRICT_FMULADD opcode is used to allow selectionDAGBuilder::visitConstrainedFPIntrinsic to determine (based on TargetOptions and target cost information) whether the constrained.fmuladd intrinsic should be lowered to FMA or separate FMUL and FADD operations. Having thought through that, however, it strikes me as a lot of overhead. Can we just add special handling for the constrained.fmuladd intrinsic and make the decision then to create either a STRICT_FMA node or separate STRICT_FMUL and STRICT_FADD? The idea that ISD::FMULADD is going to exist as a defined opcode but we never intend to add any support for handling it is particularly bad. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3436 | I don't think it's safe to fuse a FMUL and FADD if the intermediate rounding isn't exactly the same as those individual operations. FMULADD doesn't guarantee that, does it? |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3436 | To be clear, we could miss very-edge-case overflow/underflow exceptions. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3436 | Ah, but I see C/C++ FP_CONTRACT allows the exceptions to be optimized away. Sorry for the noise. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3436 | We've talked about this before but I don't think we ever documented a decision as to whether we want to allow constrained intrinsics and fast math flags to be combined. This patch moves that decision into clang's decision to generate this intrinsic or not. I think it definitely makes sense in the case of fp contraction, because even if a user cares about value safety they might want FMA, which is theorectically more accurate than the separate values even though it produces a different value. This is consistent with gcc (which produces FMA under "-ffp-contract=fast -fno-fast-math") and icc (which produced FMA under "-fp-model strict -fma"). For the record, I also think it makes sense to use nnan, ninf, and nsz with constrained intrinsics. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3381 | I prefer to reuse the operands from the fmul intrinsic here. 1). fmuladd always has the same exception/rounding mode with fmul. 2). the function getRoundingMode/getExceptionBehavior just return a enum value. We need more code to turn them into Value type. | |
3423 | Add assertion in line 3380. We only need to check once there. | |
llvm/test/TableGen/GlobalISelEmitter-input-discard.td | ||
18 ↗ | (On Diff #238769) | It's strange the number is affected. I haven't found any cause. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3436 | You had me until:
To be clear, we'd need them for the fast case, but I don't see a lot of value for the strict case. We definitely want reassoc/recip/etc for the optimized but trap-safe case, so that's enough to require FMF flags on constrained intrinsics alone. We should probably break this conversation out into an llvm-dev thread... |
Remember that the design is that constrained intrinsics must be used whenever *any* code in the function is constrained. It is not unreasonable that part of the function might be constrained and the rest subject to fast-math; it'd be a shame if the intrinsics couldn't even express that.
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3436 | I agree about starting an llvm-dev thread. I'll send something out unless you've already done so by the time I finish typing it. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3381 | Doesn't this need to be CreateConstrainedFPCall so that the strictfp attribute is added? That will take care of adding the metadata operands too. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3381 | Is this code tested? I ran into a bug yesterday where CreateCall was used with a constrained intrinsic and the Instruction class blew up because the function signature was wrong. I wasn't passing in the metadata arguments. So, yes, it should be, and it would might make sense for the patch to have test coverage that catches any other cases of this. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3381 | This code is copying the metadata arguments from the fmul intrinsic, MulOp. that’s the getOperand(2) and getOperand(3). |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3381 | Ah, yes, thanks. Your comment about the attribute is still valid, though. And, yes, using CreateConstrainedFPCall() is the easiest way to fix the attribute. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3381 | Yes, it's the best choice. Thanks! |
llvm/docs/LangRef.rst | ||
---|---|---|
16065 | This underline isn't long enough and is breaking the sphinx build bot. Please fix. |
llvm/docs/LangRef.rst | ||
---|---|---|
16065 | Thanks! I'll fix it soon. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3382 | Sorry, I'm not familiar with the optimization of the clang front end. |
clang/lib/CodeGen/CGExprScalar.cpp | ||
---|---|---|
3382 | No, it is a flexible intrinsic, which allows backends to choose their best approach. It can be either interpretered as mul + add or fma. It represents user doesn't care the differece between them. |
You shouldn't just assume that MulOp is a constrained intrinsic. Cast to ConstrainedFPIntrinsic and use ConstrainedFPIntrinsic::getRoundingMode() and ConstrainedFPIntrinsic::getExceptionBehavior(). The cast will effectively assert that MulOp is a constrained intrisic. I think that should always be true.