Index: llvm/docs/AMDGPUUsage.rst =================================================================== --- llvm/docs/AMDGPUUsage.rst +++ llvm/docs/AMDGPUUsage.rst @@ -948,14 +948,14 @@ .. table:: AMDGPU LLVM IR Intrinsics :name: amdgpu-llvm-ir-intrinsics-table - ========================================= ========================================================== - LLVM Intrinsic Description - ========================================= ========================================================== - llvm.amdgcn.log Provides direct access to v_log_f32 and v_log_f16 - (on targets with half support). Peforms log2 function. + ========================================== ========================================================== + LLVM Intrinsic Description + ========================================== ========================================================== + llvm.amdgcn.log Provides direct access to v_log_f32 and v_log_f16 + (on targets with half support). Performs log2 function. - llvm.amdgcn.exp2 Provides direct access to v_exp_f32 and v_exp_f16 - (on targets with half support). Performs exp2 function. + llvm.amdgcn.exp2 Provides direct access to v_exp_f32 and v_exp_f16 + (on targets with half support). Performs exp2 function. :ref:`llvm.frexp ` Implemented for half, float and double. @@ -964,13 +964,17 @@ 1ULP accuracy for float, and 0.51ULP for half. Float instruction does not natively support denormal inputs. Backend will optimize out denormal scaling if - marked with the :ref:`afn ` flag. - :ref:`llvm.log ` Implemented for float and half (and vectors). + :ref:`llvm.log10 ` Implemented for float and half (and vectors). - :ref:`llvm.exp ` Implemented for float and half (and vectors). + :ref:`llvm.exp2 ` Implemented for float and half (and vectors of float or + half). Not implemented for double. Hardware provides + 1ULP accuracy for float, and 0.51ULP for half. Float + instruction does not natively support denormal + inputs. 
Backend will optimize out denormal scaling if + marked with the :ref:`afn ` flag. - :ref:`llvm.log10 ` Implemented for float and half (and vectors). + :ref:`llvm.exp ` Implemented for float and half (and vectors). :ref:`llvm.exp2 ` Implemented for float and half (and vectors of float or half). Not implemented for double. Hardware provides @@ -979,7 +983,21 @@ inputs. Backend will optimize out denormal scaling if marked with the :ref:`afn ` flag. - ========================================= ========================================================== + :ref:`llvm.get.rounding<int_get_rounding>` AMDGPU supports two separately controllable rounding + modes depending on the floating-point type. One + controls float, and the other controls both double and + half operations. If both modes are the same, returns + one of the standard return values. If the modes are + different, returns one of :ref:`12 extended values + <amdgpu-rounding-mode-enumeration-values-table>` + describing the two modes. + + To nearest, ties away from zero is not a supported + mode. The raw rounding mode values in the MODE + register do not exactly match the FLT_ROUNDS values, + so a conversion is performed. + + ========================================== ========================================================== .. TODO:: @@ -4807,6 +4825,22 @@ FLOAT_ROUND_MODE_ZERO 3 Round Toward 0 ====================================== ===== ============================== + + .. 
table:: Extended FLT_ROUNDS Enumeration Values + :name: amdgpu-rounding-mode-enumeration-values-table + + +------------------------+---------------+-------------------+--------------------+----------+ + | | F32 NEAR_EVEN | F32 PLUS_INFINITY | F32 MINUS_INFINITY | F32 ZERO | + +------------------------+---------------+-------------------+--------------------+----------+ + | F64/F16 NEAR_EVEN | 1 | 11 | 14 | 17 | + +------------------------+---------------+-------------------+--------------------+----------+ + | F64/F16 PLUS_INFINITY | 8 | 2 | 15 | 18 | + +------------------------+---------------+-------------------+--------------------+----------+ + | F64/F16 MINUS_INFINITY | 9 | 12 | 3 | 19 | + +------------------------+---------------+-------------------+--------------------+----------+ + | F64/F16 ZERO | 10 | 13 | 16 | 0 | + +------------------------+---------------+-------------------+--------------------+----------+ + .. .. table:: Floating Point Denorm Mode Enumeration Values Index: llvm/docs/LangRef.rst =================================================================== --- llvm/docs/LangRef.rst +++ llvm/docs/LangRef.rst @@ -25399,6 +25399,8 @@ mode or state of floating point exceptions. Altering the floating point environment requires special care. See :ref:`Floating Point Environment `. +.. _int_get_rounding: + '``llvm.get.rounding``' Intrinsic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Index: llvm/docs/ReleaseNotes.rst =================================================================== --- llvm/docs/ReleaseNotes.rst +++ llvm/docs/ReleaseNotes.rst @@ -158,6 +158,8 @@ * llvm.exp2.f32 and llvm.exp.f32 are now lowered accurately. Use llvm.amdgcn.exp2.f32 to access the old behavior for llvm.exp2.f32. 
+* Implemented :ref:`llvm.get.rounding <int_get_rounding>` + Changes to the ARM Backend -------------------------- Index: llvm/include/llvm/CodeGen/ISDOpcodes.h =================================================================== --- llvm/include/llvm/CodeGen/ISDOpcodes.h +++ llvm/include/llvm/CodeGen/ISDOpcodes.h @@ -872,6 +872,7 @@ /// 2 Round to +inf /// 3 Round to -inf /// 4 Round to nearest, ties to zero + /// Other values are target dependent. /// Result is rounding mode and chain. Input is a chain. GET_ROUNDING, Index: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp +++ llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp @@ -99,7 +99,7 @@ Module *Mod = nullptr; const DataLayout *DL = nullptr; bool HasUnsafeFPMath = false; - bool HasFP32Denormals = false; + bool HasFP32DenormalFlush = false; bool FlowChanged = false; DenseMap BreakPhiNodesCache; @@ -793,8 +793,8 @@ // // NOTE: optimizeWithRcp should be tried first because rcp is the preference. static Value *optimizeWithFDivFast(Value *Num, Value *Den, float ReqdAccuracy, - bool HasDenormals, IRBuilder<> &Builder, - Module *Mod) { + bool HasFP32DenormalFlush, + IRBuilder<> &Builder, Module *Mod) { // fdiv.fast can achieve 2.5 ULP accuracy. if (ReqdAccuracy < 2.5f) return nullptr; @@ -811,7 +811,7 @@ } // fdiv does not support denormals. But 1.0/x is always fine to use it. - if (HasDenormals && !NumIsOne) + if (!HasFP32DenormalFlush && !NumIsOne) return nullptr; Function *Decl = Intrinsic::getDeclaration(Mod, Intrinsic::amdgcn_fdiv_fast); @@ -851,7 +851,7 @@ // rcp_f16 is accurate to 0.51 ulp. // rcp_f32 is accurate for !fpmath >= 1.0ulp and denormals are flushed. // rcp_f64 is never accurate. 
- const bool RcpIsAccurate = !HasFP32Denormals && ReqdAccuracy >= 1.0f; + const bool RcpIsAccurate = HasFP32DenormalFlush && ReqdAccuracy >= 1.0f; IRBuilder<> Builder(FDiv.getParent(), std::next(FDiv.getIterator())); Builder.setFastMathFlags(FMF); @@ -873,8 +873,8 @@ Value *NewElt = optimizeWithRcp(NumEltI, DenEltI, AllowInaccurateRcp, RcpIsAccurate, Builder, Mod); if (!NewElt) // Try fdiv.fast. - NewElt = optimizeWithFDivFast(NumEltI, DenEltI, ReqdAccuracy, - HasFP32Denormals, Builder, Mod); + NewElt = optimizeWithFDivFast(NumEltI, DenEltI, ReqdAccuracy, + HasFP32DenormalFlush, Builder, Mod); if (!NewElt) // Keep the original. NewElt = Builder.CreateFDiv(NumEltI, DenEltI); @@ -885,8 +885,8 @@ NewFDiv = optimizeWithRcp(Num, Den, AllowInaccurateRcp, RcpIsAccurate, Builder, Mod); if (!NewFDiv) { // Try fdiv.fast. - NewFDiv = optimizeWithFDivFast(Num, Den, ReqdAccuracy, HasFP32Denormals, - Builder, Mod); + NewFDiv = optimizeWithFDivFast(Num, Den, ReqdAccuracy, + HasFP32DenormalFlush, Builder, Mod); } } @@ -1832,7 +1832,8 @@ Impl.DT = DTWP ? 
&DTWP->getDomTree() : nullptr; Impl.HasUnsafeFPMath = hasUnsafeFPMath(F); SIModeRegisterDefaults Mode(F); - Impl.HasFP32Denormals = Mode.allFP32Denormals(); + Impl.HasFP32DenormalFlush = + Mode.FP32Denormals == DenormalMode::getPreserveSign(); return Impl.run(F); } @@ -1848,7 +1849,8 @@ Impl.DT = FAM.getCachedResult(F); Impl.HasUnsafeFPMath = hasUnsafeFPMath(F); SIModeRegisterDefaults Mode(F); - Impl.HasFP32Denormals = Mode.allFP32Denormals(); + Impl.HasFP32DenormalFlush = + Mode.FP32Denormals == DenormalMode::getPreserveSign(); PreservedAnalyses PA = PreservedAnalyses::none(); if (!Impl.FlowChanged) PA.preserveSet(); Index: llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp +++ llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp @@ -1883,7 +1883,8 @@ bool UseFmadFtz = false; if (Subtarget->isGCN()) { const SIMachineFunctionInfo *MFI = MF.getInfo(); - UseFmadFtz = MFI->getMode().allFP32Denormals(); + UseFmadFtz = + MFI->getMode().FP32Denormals != DenormalMode::getPreserveSign(); } // float fr = mad(fqneg, fb, fa); @@ -1975,11 +1976,11 @@ const SIMachineFunctionInfo *MFI = MF.getInfo(); // Compute denominator reciprocal. - unsigned FMAD = !Subtarget->hasMadMacF32Insts() ? - (unsigned)ISD::FMA : - !MFI->getMode().allFP32Denormals() ? - (unsigned)ISD::FMAD : - (unsigned)AMDGPUISD::FMAD_FTZ; + unsigned FMAD = + !Subtarget->hasMadMacF32Insts() ? (unsigned)ISD::FMA + : MFI->getMode().FP32Denormals == DenormalMode::getPreserveSign() + ? 
(unsigned)ISD::FMAD + : (unsigned)AMDGPUISD::FMAD_FTZ; SDValue Cvt_Lo = DAG.getNode(ISD::UINT_TO_FP, DL, MVT::f32, RHS_Lo); SDValue Cvt_Hi = DAG.getNode(ISD::UINT_TO_FP, DL, MVT::f32, RHS_Hi); Index: llvm/lib/Target/AMDGPU/AMDGPUInstructions.td =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUInstructions.td +++ llvm/lib/Target/AMDGPU/AMDGPUInstructions.td @@ -110,12 +110,12 @@ class AMDGPUPatIgnoreCopies : AMDGPUPat; let RecomputePerFunction = 1 in { -def FP16Denormals : Predicate<"MF->getInfo()->getMode().allFP64FP16Denormals()">; -def FP32Denormals : Predicate<"MF->getInfo()->getMode().allFP32Denormals()">; -def FP64Denormals : Predicate<"MF->getInfo()->getMode().allFP64FP16Denormals()">; -def NoFP16Denormals : Predicate<"!MF->getInfo()->getMode().allFP64FP16Denormals()">; -def NoFP32Denormals : Predicate<"!MF->getInfo()->getMode().allFP32Denormals()">; -def NoFP64Denormals : Predicate<"!MF->getInfo()->getMode().allFP64FP16Denormals()">; +def FP16Denormals : Predicate<"MF->getInfo()->getMode().FP64FP16Denormals != DenormalMode::getPreserveSign()">; +def FP32Denormals : Predicate<"MF->getInfo()->getMode().FP32Denormals != DenormalMode::getPreserveSign()">; +def FP64Denormals : Predicate<"MF->getInfo()->getMode().FP64FP16Denormals != DenormalMode::getPreserveSign()">; +def NoFP16Denormals : Predicate<"MF->getInfo()->getMode().FP64FP16Denormals == DenormalMode::getPreserveSign()">; +def NoFP32Denormals : Predicate<"MF->getInfo()->getMode().FP32Denormals == DenormalMode::getPreserveSign()">; +def NoFP64Denormals : Predicate<"MF->getInfo()->getMode().FP64FP16Denormals == DenormalMode::getPreserveSign()">; def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">; } Index: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp +++ llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp @@ -2965,9 +2965,11 @@ // TODO: 
Always legal with future ftz flag. // FIXME: Do we need just output? - if (Ty == LLT::scalar(32) && !MFI->getMode().allFP32Denormals()) + if (Ty == LLT::scalar(32) && + MFI->getMode().FP32Denormals == DenormalMode::getPreserveSign()) return true; - if (Ty == LLT::scalar(16) && !MFI->getMode().allFP64FP16Denormals()) + if (Ty == LLT::scalar(16) && + MFI->getMode().FP64FP16Denormals == DenormalMode::getPreserveSign()) return true; MachineIRBuilder HelperBuilder(MI); @@ -4642,7 +4644,7 @@ // FIXME: Doesn't correctly model the FP mode switch, and the FP operations // aren't modeled as reading it. - if (!Mode.allFP32Denormals()) + if (Mode.FP32Denormals != DenormalMode::getIEEE()) toggleSPDenormMode(true, B, ST, Mode); auto Fma0 = B.buildFMA(S32, NegDivScale0, ApproxRcp, One, Flags); @@ -4652,7 +4654,9 @@ auto Fma3 = B.buildFMA(S32, Fma2, Fma1, Mul, Flags); auto Fma4 = B.buildFMA(S32, NegDivScale0, Fma3, NumeratorScaled, Flags); - if (!Mode.allFP32Denormals()) + // FIXME: This mishandles dynamic denormal mode. We need to query the + // current mode and restore the original. 
+ if (Mode.FP32Denormals != DenormalMode::getIEEE()) toggleSPDenormMode(false, B, ST, Mode); auto Fmas = B.buildIntrinsic(Intrinsic::amdgcn_div_fmas, {S32}, false) Index: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp =================================================================== --- llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp +++ llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp @@ -297,8 +297,9 @@ TLI(ST->getTargetLowering()), CommonTTI(TM, F), IsGraphics(AMDGPU::isGraphics(F.getCallingConv())) { SIModeRegisterDefaults Mode(F); - HasFP32Denormals = Mode.allFP32Denormals(); - HasFP64FP16Denormals = Mode.allFP64FP16Denormals(); + HasFP32Denormals = Mode.FP32Denormals != DenormalMode::getPreserveSign(); + HasFP64FP16Denormals = + Mode.FP64FP16Denormals != DenormalMode::getPreserveSign(); } bool GCNTTIImpl::hasBranchDivergence(const Function *F) const { Index: llvm/lib/Target/AMDGPU/SIISelLowering.h =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.h +++ llvm/lib/Target/AMDGPU/SIISelLowering.h @@ -402,6 +402,7 @@ SDValue lowerDYNAMIC_STACKALLOCImpl(SDValue Op, SelectionDAG &DAG) const; SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const; + SDValue lowerGET_ROUNDING(SDValue Op, SelectionDAG &DAG) const; Register getRegisterByName(const char* RegName, LLT VT, const MachineFunction &MF) const override; Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp =================================================================== --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -751,6 +751,8 @@ MVT::i8, MVT::i128}, Custom); + setOperationAction(ISD::GET_ROUNDING, MVT::i32, Custom); + setTargetDAGCombine({ISD::ADD, ISD::UADDO_CARRY, ISD::SUB, @@ -3516,6 +3518,77 @@ return AMDGPUTargetLowering::LowerDYNAMIC_STACKALLOC(Op, DAG); } +SDValue SITargetLowering::lowerGET_ROUNDING(SDValue Op, + SelectionDAG &DAG) const { + SDLoc SL(Op); + 
assert(Op.getValueType() == MVT::i32); + + uint32_t BothRoundHwReg = + AMDGPU::Hwreg::encodeHwreg(AMDGPU::Hwreg::ID_MODE, 0, 4); + SDValue GetRoundBothImm = DAG.getTargetConstant(BothRoundHwReg, SL, MVT::i32); + + SDValue IntrinID = + DAG.getTargetConstant(Intrinsic::amdgcn_s_getreg, SL, MVT::i32); + SDValue GetReg = DAG.getNode(ISD::INTRINSIC_W_CHAIN, SL, Op->getVTList(), + Op.getOperand(0), IntrinID, GetRoundBothImm); + + // There are two rounding modes, one for f32 and one for f64/f16. We only + // report in the standard value range if both are the same. + // + // The raw values also differ from the expected FLT_ROUNDS values. Nearest + // ties away from zero is not supported, and the other values are rotated by + // 1. + // + // If the two rounding modes are not the same, report a target defined value. + + // Mode register rounding mode fields: + // + // [1:0] Single-precision round mode. + // [3:2] Double/Half-precision round mode. + // + // 0=nearest even; 1= +infinity; 2= -infinity, 3= toward zero. + // + // Hardware Spec + // Toward-0 3 0 + // Nearest Even 0 1 + // +Inf 1 2 + // -Inf 2 3 + // NearestAway0 N/A 4 + // + // We have to handle 16 permutations of a 4-bit value, so we create a 64-bit + // table we can index by the raw hardware mode. + // + // (trunc (FltRoundConversionTable >> MODE.fp_round)) & 0xf + + SDValue BitTable = + DAG.getConstant(AMDGPU::FltRoundConversionTable, SL, MVT::i64); + + SDValue Two = DAG.getConstant(2, SL, MVT::i32); + SDValue RoundModeTimesNumBits = + DAG.getNode(ISD::SHL, SL, MVT::i32, GetReg, Two); + + // TODO: We could possibly avoid a 64-bit shift and use a simpler table if we + // knew only one mode was demanded. 
+ SDValue TableValue = + DAG.getNode(ISD::SRL, SL, MVT::i64, BitTable, RoundModeTimesNumBits); + SDValue TruncTable = DAG.getNode(ISD::TRUNCATE, SL, MVT::i32, TableValue); + + SDValue EntryMask = DAG.getConstant(0xf, SL, MVT::i32); + SDValue TableEntry = + DAG.getNode(ISD::AND, SL, MVT::i32, TruncTable, EntryMask); + + // There's a gap in the 4-bit encoded table and actual enum values, so offset + // if it's an extended value. + SDValue Four = DAG.getConstant(4, SL, MVT::i32); + SDValue IsStandardValue = + DAG.getSetCC(SL, MVT::i1, TableEntry, Four, ISD::SETULT); + SDValue EnumOffset = DAG.getNode(ISD::ADD, SL, MVT::i32, TableEntry, Four); + SDValue Result = DAG.getNode(ISD::SELECT, SL, MVT::i32, IsStandardValue, + TableEntry, EnumOffset); + + return DAG.getMergeValues({Result, GetReg.getValue(1)}, SL); +} + Register SITargetLowering::getRegisterByName(const char* RegName, LLT VT, const MachineFunction &MF) const { Register Reg = StringSwitch(RegName) @@ -4902,6 +4975,8 @@ return lowerXMUL_LOHI(Op, DAG); case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG); + case ISD::GET_ROUNDING: + return lowerGET_ROUNDING(Op, DAG); } return SDValue(); } Index: llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h =================================================================== --- llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h +++ llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.h @@ -85,6 +85,65 @@ } }; +namespace AMDGPU { + +/// Return values used for llvm.get.rounding +/// +/// When both the F32 and F64/F16 modes are the same, returns the standard +/// values. If they differ, returns an extended mode starting at 8. 
+enum AMDGPUFltRounds : int8_t { + // Inherit everything from RoundingMode + TowardZero = static_cast<int8_t>(RoundingMode::TowardZero), + NearestTiesToEven = static_cast<int8_t>(RoundingMode::NearestTiesToEven), + TowardPositive = static_cast<int8_t>(RoundingMode::TowardPositive), + TowardNegative = static_cast<int8_t>(RoundingMode::TowardNegative), + NearestTiesToAwayUnsupported = + static_cast<int8_t>(RoundingMode::NearestTiesToAway), + + Dynamic = static_cast<int8_t>(RoundingMode::Dynamic), + + // Permute the mismatched rounding mode cases. If the modes are the same, use + // the standard values, otherwise, these values are sorted such that higher + // hardware encoded values have higher enum values. + NearestTiesToEvenF32_NearestTiesToEvenF64 = NearestTiesToEven, + NearestTiesToEvenF32_TowardPositiveF64 = 8, + NearestTiesToEvenF32_TowardNegativeF64 = 9, + NearestTiesToEvenF32_TowardZeroF64 = 10, + + TowardPositiveF32_NearestTiesToEvenF64 = 11, + TowardPositiveF32_TowardPositiveF64 = TowardPositive, + TowardPositiveF32_TowardNegativeF64 = 12, + TowardPositiveF32_TowardZeroF64 = 13, + + TowardNegativeF32_NearestTiesToEvenF64 = 14, + TowardNegativeF32_TowardPositiveF64 = 15, + TowardNegativeF32_TowardNegativeF64 = TowardNegative, + TowardNegativeF32_TowardZeroF64 = 16, + + TowardZeroF32_NearestTiesToEvenF64 = 17, + TowardZeroF32_TowardPositiveF64 = 18, + TowardZeroF32_TowardNegativeF64 = 19, + TowardZeroF32_TowardZeroF64 = TowardZero, + + Invalid = static_cast<int8_t>(RoundingMode::Invalid) +}; + +/// Offset of nonstandard values for llvm.get.rounding results from the largest +/// supported mode. +static constexpr uint32_t ExtendedFltRoundOffset = 4; + +/// Offset in mode register of f32 rounding mode. +static constexpr uint32_t F32FltRoundOffset = 0; + +/// Offset in mode register of f64/f16 rounding mode. +static constexpr uint32_t F64FltRoundOffset = 2; + +// Bit indexed table to convert from hardware rounding mode values to FLT_ROUNDS +// values. 
+extern const uint64_t FltRoundConversionTable; + +} // end namespace AMDGPU + } // end namespace llvm #endif // LLVM_LIB_TARGET_AMDGPU_SIMODEREGISTERDEFAULTS_H Index: llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp =================================================================== --- llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp +++ llvm/lib/Target/AMDGPU/SIModeRegisterDefaults.cpp @@ -36,3 +36,135 @@ FP64FP16Denormals = DenormMode; } } + +using namespace AMDGPU; + +/// Combine f32 and f64 rounding modes into a combined rounding mode value. +static constexpr uint32_t getModeRegisterRoundMode(uint32_t HWFP32Val, + uint32_t HWFP64Val) { + return HWFP32Val << F32FltRoundOffset | HWFP64Val << F64FltRoundOffset; +} + +static constexpr uint64_t encodeFltRoundsTable(uint32_t FltRoundsVal, + uint32_t HWF32Val, + uint32_t HWF64Val) { + uint32_t ModeVal = getModeRegisterRoundMode(HWF32Val, HWF64Val); + if (FltRoundsVal > TowardNegative) + FltRoundsVal -= ExtendedFltRoundOffset; + + uint32_t BitIndex = ModeVal << 2; + return static_cast<uint64_t>(FltRoundsVal) << BitIndex; +} + +// Encode FLT_ROUNDS value where the two rounding modes are the same and use a +// standard value +static constexpr uint64_t +encodeFltRoundsTableSame(AMDGPUFltRounds FltRoundsMode, uint32_t HWVal) { + return encodeFltRoundsTable(FltRoundsMode, HWVal, HWVal); +} + +// Convert mode register encoded rounding mode to AMDGPUFltRounds +static constexpr AMDGPUFltRounds +decodeIndexFltRoundConversionTable(uint32_t HWMode) { + uint32_t TableRead = (FltRoundConversionTable >> (HWMode << 2)) & 0xf; + if (TableRead > TowardNegative) + TableRead += ExtendedFltRoundOffset; + return static_cast<AMDGPUFltRounds>(TableRead); +} + +static constexpr uint32_t HWTowardZero = FP_ROUND_ROUND_TO_ZERO; +static constexpr uint32_t HWNearestTiesToEven = FP_ROUND_ROUND_TO_NEAREST; +static constexpr uint32_t HWTowardPositive = FP_ROUND_ROUND_TO_INF; +static constexpr uint32_t HWTowardNegative = FP_ROUND_ROUND_TO_NEGINF; + +constexpr uint64_t 
AMDGPU::FltRoundConversionTable = + encodeFltRoundsTableSame(TowardZeroF32_TowardZeroF64, HWTowardZero) | + encodeFltRoundsTableSame(NearestTiesToEvenF32_NearestTiesToEvenF64, + HWNearestTiesToEven) | + encodeFltRoundsTableSame(TowardPositiveF32_TowardPositiveF64, + HWTowardPositive) | + encodeFltRoundsTableSame(TowardNegativeF32_TowardNegativeF64, + HWTowardNegative) | + + encodeFltRoundsTable(TowardZeroF32_NearestTiesToEvenF64, HWTowardZero, + HWNearestTiesToEven) | + encodeFltRoundsTable(TowardZeroF32_TowardPositiveF64, HWTowardZero, + HWTowardPositive) | + encodeFltRoundsTable(TowardZeroF32_TowardNegativeF64, HWTowardZero, + HWTowardNegative) | + + encodeFltRoundsTable(NearestTiesToEvenF32_TowardZeroF64, + HWNearestTiesToEven, HWTowardZero) | + encodeFltRoundsTable(NearestTiesToEvenF32_TowardPositiveF64, + HWNearestTiesToEven, HWTowardPositive) | + encodeFltRoundsTable(NearestTiesToEvenF32_TowardNegativeF64, + HWNearestTiesToEven, HWTowardNegative) | + + encodeFltRoundsTable(TowardPositiveF32_TowardZeroF64, HWTowardPositive, + HWTowardZero) | + encodeFltRoundsTable(TowardPositiveF32_NearestTiesToEvenF64, + HWTowardPositive, HWNearestTiesToEven) | + encodeFltRoundsTable(TowardPositiveF32_TowardNegativeF64, HWTowardPositive, + HWTowardNegative) | + + encodeFltRoundsTable(TowardNegativeF32_TowardZeroF64, HWTowardNegative, + HWTowardZero) | + encodeFltRoundsTable(TowardNegativeF32_NearestTiesToEvenF64, + HWTowardNegative, HWNearestTiesToEven) | + encodeFltRoundsTable(TowardNegativeF32_TowardPositiveF64, HWTowardNegative, + HWTowardPositive); + +// Verify evaluation of FltRoundConversionTable + +// If both modes are the same, should return the standard values. 
+static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardZero, HWTowardZero)) == AMDGPUFltRounds::TowardZero); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWNearestTiesToEven, HWNearestTiesToEven)) == + AMDGPUFltRounds::NearestTiesToEven); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardPositive, HWTowardPositive)) == + AMDGPUFltRounds::TowardPositive); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardNegative, HWTowardNegative)) == + AMDGPUFltRounds::TowardNegative); + +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardZero, HWNearestTiesToEven)) == + TowardZeroF32_NearestTiesToEvenF64); +static_assert(decodeIndexFltRoundConversionTable( + getModeRegisterRoundMode(HWTowardZero, HWTowardPositive)) == + TowardZeroF32_TowardPositiveF64); +static_assert(decodeIndexFltRoundConversionTable( + getModeRegisterRoundMode(HWTowardZero, HWTowardNegative)) == + TowardZeroF32_TowardNegativeF64); + +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWNearestTiesToEven, HWTowardZero)) == + NearestTiesToEvenF32_TowardZeroF64); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWNearestTiesToEven, HWTowardPositive)) == + NearestTiesToEvenF32_TowardPositiveF64); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWNearestTiesToEven, HWTowardNegative)) == + NearestTiesToEvenF32_TowardNegativeF64); + +static_assert(decodeIndexFltRoundConversionTable( + getModeRegisterRoundMode(HWTowardPositive, HWTowardZero)) == + TowardPositiveF32_TowardZeroF64); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardPositive, HWNearestTiesToEven)) == + TowardPositiveF32_NearestTiesToEvenF64); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardPositive, HWTowardNegative)) == + 
TowardPositiveF32_TowardNegativeF64); + +static_assert(decodeIndexFltRoundConversionTable( + getModeRegisterRoundMode(HWTowardNegative, HWTowardZero)) == + TowardNegativeF32_TowardZeroF64); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardNegative, HWNearestTiesToEven)) == + TowardNegativeF32_NearestTiesToEvenF64); +static_assert(decodeIndexFltRoundConversionTable(getModeRegisterRoundMode( + HWTowardNegative, HWTowardPositive)) == + TowardNegativeF32_TowardPositiveF64); Index: llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-unmerge-values.mir =================================================================== --- llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-unmerge-values.mir +++ llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-unmerge-values.mir @@ -34,6 +34,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $vgpr0, $vgpr1, $vgpr2_vgpr3 @@ -63,6 +64,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $sgpr0, $sgpr1, $vgpr0_vgpr1 @@ -99,6 +101,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $sgpr0, $sgpr1, $vgpr0_vgpr1 @@ -199,6 +202,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $vgpr0, $vgpr1, $vgpr2_vgpr3, $vgpr4, $vgpr5 @@ -241,6 +245,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $vgpr0, $vgpr1, $vgpr2_vgpr3, $vgpr4, $vgpr5 @@ -290,6 +295,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $vgpr0_vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 @@ -332,6 +338,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $vgpr0_vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 @@ -381,6 +388,7 @@ 
machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $vgpr0, $vgpr1, $vgpr2_vgpr3 @@ -411,6 +419,7 @@ machineFunctionInfo: mode: fp32-input-denormals: false + fp32-output-denormals: false body: | bb.1: liveins: $vgpr0, $vgpr1, $vgpr2_vgpr3 Index: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll =================================================================== --- llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll +++ llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll @@ -4,7 +4,7 @@ ; NOOP-LABEL: @noop_fdiv_fpmath( ; NOOP: %md.25ulp = fdiv float %a, %b, !fpmath !0 -define amdgpu_kernel void @noop_fdiv_fpmath(ptr addrspace(1) %out, float %a, float %b) #3 { +define amdgpu_kernel void @noop_fdiv_fpmath(ptr addrspace(1) %out, float %a, float %b) { %md.25ulp = fdiv float %a, %b, !fpmath !0 store volatile float %md.25ulp, ptr addrspace(1) %out ret void @@ -337,9 +337,24 @@ ret void } +; CHECK-LABEL: @rcp_fpmath_dynamic_denorm( +; CHECK: %md.25ulp = fdiv float 1.000000e+00, %x, !fpmath !2 +define float @rcp_fpmath_dynamic_denorm(float %x) #3 { + %md.25ulp = fdiv float 1.0, %x, !fpmath !2 + ret float %md.25ulp +} + +; CHECK-LABEL: @rcp_dynamic_denorm( +; CHECK: %md.25ulp = fdiv float 1.000000e+00, %x +define float @rcp_dynamic_denorm(float %x) #3 { + %md.25ulp = fdiv float 1.0, %x + ret float %md.25ulp +} + attributes #0 = { nounwind optnone noinline } attributes #1 = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" } attributes #2 = { nounwind "denormal-fp-math-f32"="ieee,ieee" } +attributes #3 = { nounwind "denormal-fp-math-f32"="dynamic,dynamic" } !0 = !{float 2.500000e+00} !1 = !{float 5.000000e-01} Index: llvm/test/CodeGen/AMDGPU/fdiv.ll =================================================================== --- llvm/test/CodeGen/AMDGPU/fdiv.ll +++ llvm/test/CodeGen/AMDGPU/fdiv.ll @@ -370,8 +370,20 @@ ret void } +; FUNC-LABEL: {{^}}v_fdiv_f32_dynamic_denorm: +; PREGFX10: 
s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3 +; GFX10: s_denorm_mode 15 + +; PREGFX10: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 0 +; GFX10: s_denorm_mode 12 +define float @v_fdiv_f32_dynamic_denorm(float %a, float %b) #3 { + %fdiv = fdiv float %a, %b + ret float %fdiv +} + attributes #0 = { nounwind "enable-unsafe-fp-math"="false" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "target-features"="-flat-for-global" } attributes #1 = { nounwind "enable-unsafe-fp-math"="true" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "target-features"="-flat-for-global" } attributes #2 = { nounwind "enable-unsafe-fp-math"="false" "denormal-fp-math-f32"="ieee,ieee" "target-features"="-flat-for-global" } +attributes #3 = { nounwind "denormal-fp-math-f32"="dynamic,dynamic" "target-features"="-flat-for-global" } !0 = !{float 2.500000e+00} Index: llvm/test/CodeGen/AMDGPU/llvm.exp.ll =================================================================== --- llvm/test/CodeGen/AMDGPU/llvm.exp.ll +++ llvm/test/CodeGen/AMDGPU/llvm.exp.ll @@ -3989,14 +3989,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; 
VI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 @@ -4012,14 +4013,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 @@ -4035,20 +4037,20 @@ ; GFX900-SDAG-LABEL: v_exp_f32_afn_dynamic: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: 
v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; GFX900-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4062,14 +4064,14 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4081,20 +4083,20 @@ ; SI-SDAG-LABEL: v_exp_f32_afn_dynamic: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: 
s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; SI-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4108,14 +4110,14 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4306,14 +4308,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 
-; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 @@ -4329,14 +4332,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 @@ -4352,20 +4356,20 @@ ; GFX900-SDAG-LABEL: v_exp_f32_daz: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: 
v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; GFX900-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4379,14 +4383,14 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4398,20 +4402,20 @@ ; 
SI-SDAG-LABEL: v_exp_f32_daz: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; SI-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4425,14 +4429,14 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: 
v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4620,14 +4624,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 @@ -4643,14 +4648,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: 
v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 @@ -4666,20 +4672,20 @@ ; GFX900-SDAG-LABEL: v_exp_f32_nnan_daz: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; GFX900-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4693,14 +4699,14 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, 
v2 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4712,20 +4718,20 @@ ; SI-SDAG-LABEL: v_exp_f32_nnan_daz: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; SI-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4739,14 +4745,14 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: 
v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4776,14 +4782,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 @@ -4799,14 +4806,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: 
s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 @@ -4822,20 +4830,20 @@ ; GFX900-SDAG-LABEL: v_exp_f32_nnan_dynamic: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 
0, v1, vcc ; GFX900-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4849,14 +4857,14 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4868,20 +4876,20 @@ ; SI-SDAG-LABEL: v_exp_f32_nnan_dynamic: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: 
v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; SI-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -4895,14 +4903,14 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -4932,14 +4940,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: 
v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 @@ -4951,14 +4960,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 @@ -4970,19 +4980,19 @@ ; GFX900-SDAG-LABEL: v_exp_f32_ninf_daz: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 
v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -4993,13 +5003,13 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5008,19 +5018,19 @@ ; SI-SDAG-LABEL: v_exp_f32_ninf_daz: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 
v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -5031,13 +5041,13 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5064,14 +5074,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: 
v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 @@ -5083,14 +5094,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 @@ -5102,19 +5114,19 @@ ; GFX900-SDAG-LABEL: v_exp_f32_ninf_dynamic: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, 
v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -5125,13 +5137,13 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5140,19 +5152,19 @@ ; SI-SDAG-LABEL: v_exp_f32_ninf_dynamic: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; 
SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -5163,13 +5175,13 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5330,14 +5342,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; 
VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 @@ -5349,14 +5362,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 @@ -5368,19 +5382,19 @@ ; GFX900-SDAG-LABEL: v_exp_f32_nnan_ninf_daz: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 
0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -5391,13 +5405,13 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; 
GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5406,19 +5420,19 @@ ; SI-SDAG-LABEL: v_exp_f32_nnan_ninf_daz: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -5429,13 +5443,13 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 
v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5462,14 +5476,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 @@ -5481,14 +5496,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: 
v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 @@ -5500,19 +5516,19 @@ ; GFX900-SDAG-LABEL: v_exp_f32_nnan_ninf_dynamic: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -5523,13 +5539,13 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 
+; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5538,19 +5554,19 @@ ; SI-SDAG-LABEL: v_exp_f32_nnan_ninf_dynamic: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] ; @@ -5561,13 +5577,13 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, 
v1 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v1, 0, vcc @@ -5622,14 +5638,15 @@ ; VI-SDAG-NEXT: v_sub_f32_e32 v4, v0, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v5, 0x39a3b295, v4 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x3fb8a000, v4 -; VI-SDAG-NEXT: v_mad_f32 v3, v1, s4, -v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v5, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v3, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3fb8a000, v4 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v4, v4, v5 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v2, v2, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v2, v1 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 @@ -5645,14 +5662,15 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; 
VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 @@ -5668,20 +5686,20 @@ ; GFX900-SDAG-LABEL: v_exp_f32_dynamic_mode: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; GFX900-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; GFX900-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -5695,14 +5713,14 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; GFX900-GISEL-NEXT: 
v_rndne_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; GFX900-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v2, v1 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -5714,20 +5732,20 @@ ; SI-SDAG-LABEL: v_exp_f32_dynamic_mode: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b +; SI-SDAG-NEXT: v_rndne_f32_e32 v2, v1 +; SI-SDAG-NEXT: v_sub_f32_e32 v3, v1, v2 +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v3, v1 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; SI-SDAG-NEXT: 
v_mov_b32_e32 v2, 0x7f800000 ; SI-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -5741,14 +5759,14 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x32a5705f -; SI-GISEL-NEXT: v_rndne_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_exp_f32_e32 v2, v2 +; SI-GISEL-NEXT: v_rndne_f32_e32 v3, v1 +; SI-GISEL-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 +; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 @@ -6499,19 +6517,20 @@ ; VI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-SDAG-NEXT: v_add_f16_e32 v0, v0, v1 ; VI-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0 -; VI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8a000 +; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 -; VI-SDAG-NEXT: v_rndne_f32_e32 v2, v3 -; VI-SDAG-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-SDAG-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v4 -; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x39a3b295, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8a000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_rndne_f32_e32 v3, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v1, v1, v3 +; 
VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_exp_f32_e32 v1, v1 -; VI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 +; VI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; VI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 ; VI-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 @@ -6526,16 +6545,17 @@ ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_add_f16_e32 v0, v0, v1 ; VI-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0 -; VI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8a000 ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x39a3b295, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8a000, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8a000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x39a3b295, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-GISEL-NEXT: v_rndne_f32_e32 v2, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x39a3b295, v1 -; VI-GISEL-NEXT: v_mad_f32 v1, v1, s4, -v2 -; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v4 +; VI-GISEL-NEXT: v_sub_f32_e32 v3, v3, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; VI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 ; VI-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; VI-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 @@ -6552,50 +6572,50 @@ ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GFX900-SDAG-NEXT: v_add_f16_e32 v0, v0, v1 -; GFX900-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v0 +; GFX900-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b ; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f -; GFX900-SDAG-NEXT: v_mul_f32_e32 v2, 0x3fb8aa3b, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v3, v1, s4, -v2 -; GFX900-SDAG-NEXT: v_rndne_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_fma_f32 v3, v1, s5, v3 -; GFX900-SDAG-NEXT: v_mad_mix_f32 v0, v0, s4, -v2 op_sel_hi:[1,0,0] -; GFX900-SDAG-NEXT: v_add_f32_e32 
v0, v0, v3 -; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v2 -; GFX900-SDAG-NEXT: v_exp_f32_e32 v0, v0 +; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_rndne_f32_e32 v3, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; GFX900-SDAG-NEXT: v_sub_f32_e32 v1, v1, v3 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 +; GFX900-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 +; GFX900-SDAG-NEXT: v_exp_f32_e32 v1, v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 -; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v1 +; GFX900-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; GFX900-SDAG-NEXT: v_ldexp_f32 v0, v0, v2 -; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc +; GFX900-SDAG-NEXT: v_ldexp_f32 v1, v1, v2 +; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; GFX900-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; GFX900-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v1 -; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc +; GFX900-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 +; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v2, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-GISEL-LABEL: v_exp_f32_from_fpext_math_f16_daz: ; GFX900-GISEL: ; %bb.0: ; GFX900-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; GFX900-GISEL-NEXT: v_add_f16_e32 v0, v0, v1 -; GFX900-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v0 +; GFX900-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3fb8aa3b -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x32a5705f -; GFX900-GISEL-NEXT: v_mul_f32_e32 v3, 0x3fb8aa3b, v1 -; GFX900-GISEL-NEXT: v_fma_f32 v4, v1, s4, -v3 -; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v3 -; GFX900-GISEL-NEXT: v_fma_f32 v2, v1, v2, v4 -; GFX900-GISEL-NEXT: v_mad_mix_f32 v0, v0, s4, -v3 op_sel_hi:[1,0,0] -; GFX900-GISEL-NEXT: v_add_f32_e32 v0, v0, v2 +; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x32a5705f +; GFX900-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8aa3b, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v3, v0, 
s4, -v2 +; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 +; GFX900-GISEL-NEXT: v_rndne_f32_e32 v3, v2 +; GFX900-GISEL-NEXT: v_sub_f32_e32 v2, v2, v3 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v2, v1 ; GFX900-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v3 -; GFX900-GISEL-NEXT: v_exp_f32_e32 v0, v0 +; GFX900-GISEL-NEXT: v_exp_f32_e32 v1, v1 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x7f800000 -; GFX900-GISEL-NEXT: v_ldexp_f32 v0, v0, v2 +; GFX900-GISEL-NEXT: v_ldexp_f32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 -; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v1, v2 +; GFX900-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x42b17218 -; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, 0, vcc -; GFX900-GISEL-NEXT: v_cmp_gt_f32_e32 vcc, v1, v2 -; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v1, v1, 0, vcc +; GFX900-GISEL-NEXT: v_cmp_gt_f32_e32 vcc, v0, v2 +; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v1, v3, vcc ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; SI-SDAG-LABEL: v_exp_f32_from_fpext_math_f16_daz: @@ -6604,20 +6624,20 @@ ; SI-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0 ; SI-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3fb8aa3b -; SI-SDAG-NEXT: s_mov_b32 s5, 0x32a5705f ; SI-SDAG-NEXT: v_add_f32_e32 v0, v0, v1 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3fb8aa3b, v0 ; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 -; SI-SDAG-NEXT: v_rndne_f32_e32 v1, v1 -; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 -; SI-SDAG-NEXT: v_mad_f32 v3, v0, s4, -v1 -; SI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 -; SI-SDAG-NEXT: v_exp_f32_e32 v2, v2 -; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v1, v1 +; SI-SDAG-NEXT: s_mov_b32 s4, 0x32a5705f +; SI-SDAG-NEXT: v_rndne_f32_e32 v3, v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 +; SI-SDAG-NEXT: v_sub_f32_e32 v1, v1, v3 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 +; SI-SDAG-NEXT: v_exp_f32_e32 v1, v1 +; SI-SDAG-NEXT: v_cvt_i32_f32_e32 v2, v3 ; SI-SDAG-NEXT: s_mov_b32 s4, 0xc2ce8ed0 ; 
SI-SDAG-NEXT: v_cmp_ngt_f32_e32 vcc, s4, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x42b17218 -; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v2, v1 +; SI-SDAG-NEXT: v_ldexp_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; SI-SDAG-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-SDAG-NEXT: v_cmp_nlt_f32_e32 vcc, s4, v0 @@ -6630,7 +6650,6 @@ ; SI-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0 ; SI-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3fb8aa3b -; SI-GISEL-NEXT: v_mov_b32_e32 v4, 0x42b17218 ; SI-GISEL-NEXT: v_mov_b32_e32 v5, 0x7f800000 ; SI-GISEL-NEXT: v_add_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0 @@ -6638,15 +6657,16 @@ ; SI-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0 ; SI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3fb8aa3b, v0 ; SI-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v2 -; SI-GISEL-NEXT: v_rndne_f32_e32 v2, v2 +; SI-GISEL-NEXT: v_rndne_f32_e32 v4, v2 ; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; SI-GISEL-NEXT: v_mad_f32 v3, v0, s4, -v2 -; SI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 -; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v2, v2 +; SI-GISEL-NEXT: v_sub_f32_e32 v2, v2, v4 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v2, v1 +; SI-GISEL-NEXT: v_cvt_i32_f32_e32 v3, v4 ; SI-GISEL-NEXT: v_exp_f32_e32 v1, v1 -; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0xc2ce8ed0 -; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v3 -; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v2 +; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0xc2ce8ed0 +; SI-GISEL-NEXT: v_mov_b32_e32 v4, 0x42b17218 +; SI-GISEL-NEXT: v_cmp_lt_f32_e32 vcc, v0, v2 +; SI-GISEL-NEXT: v_ldexp_f32_e32 v1, v1, v3 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v1, v1, 0, vcc ; SI-GISEL-NEXT: v_cmp_gt_f32_e32 vcc, v0, v4 ; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, v1, v5, vcc Index: llvm/test/CodeGen/AMDGPU/llvm.get.rounding.ll =================================================================== --- /dev/null +++ llvm/test/CodeGen/AMDGPU/llvm.get.rounding.ll @@ -0,0 +1,79 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 +; 
RUN: llc -march=amdgcn -mcpu=tahiti < %s | FileCheck -check-prefixes=GCN,GFX678,GFX6 %s +; RUN: llc -march=amdgcn -mcpu=hawaii < %s | FileCheck -check-prefixes=GCN,GFX678,GFX7 %s +; RUN: llc -march=amdgcn -mcpu=fiji < %s | FileCheck -check-prefixes=GCN,GFX678,GFX8 %s +; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9 %s +; RUN: llc -march=amdgcn -mcpu=gfx1030 < %s | FileCheck -check-prefixes=GCN,GFX1011,GFX10 %s +; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck -check-prefixes=GCN,GFX1011,GFX11 %s + +declare i32 @llvm.get.rounding() + +define i32 @func_rounding() { +; GFX678-LABEL: func_rounding: +; GFX678: ; %bb.0: +; GFX678-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX678-NEXT: s_getreg_b32 s4, hwreg(HW_REG_MODE, 0, 4) +; GFX678-NEXT: s_lshl_b32 s6, s4, 2 +; GFX678-NEXT: s_mov_b32 s4, 0xeb24da71 +; GFX678-NEXT: s_mov_b32 s5, 0xc96f385 +; GFX678-NEXT: s_lshr_b64 s[4:5], s[4:5], s6 +; GFX678-NEXT: s_and_b32 s4, s4, 15 +; GFX678-NEXT: s_add_i32 s5, s4, 4 +; GFX678-NEXT: s_cmp_lt_u32 s4, 4 +; GFX678-NEXT: s_cselect_b32 s4, s4, s5 +; GFX678-NEXT: v_mov_b32_e32 v0, s4 +; GFX678-NEXT: s_setpc_b64 s[30:31] +; +; GFX9-LABEL: func_rounding: +; GFX9: ; %bb.0: +; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX9-NEXT: s_getreg_b32 s4, hwreg(HW_REG_MODE, 0, 4) +; GFX9-NEXT: s_lshl_b32 s6, s4, 2 +; GFX9-NEXT: s_mov_b32 s4, 0xeb24da71 +; GFX9-NEXT: s_mov_b32 s5, 0xc96f385 +; GFX9-NEXT: s_lshr_b64 s[4:5], s[4:5], s6 +; GFX9-NEXT: s_and_b32 s4, s4, 15 +; GFX9-NEXT: s_add_i32 s5, s4, 4 +; GFX9-NEXT: s_cmp_lt_u32 s4, 4 +; GFX9-NEXT: s_cselect_b32 s4, s4, s5 +; GFX9-NEXT: v_mov_b32_e32 v0, s4 +; GFX9-NEXT: s_setpc_b64 s[30:31] +; +; GFX10-LABEL: func_rounding: +; GFX10: ; %bb.0: +; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT: s_getreg_b32 s4, hwreg(HW_REG_MODE, 0, 4) +; GFX10-NEXT: s_lshl_b32 s6, s4, 2 +; GFX10-NEXT: s_mov_b32 s4, 0xeb24da71 +; GFX10-NEXT: s_mov_b32 s5, 0xc96f385 +; 
GFX10-NEXT: s_lshr_b64 s[4:5], s[4:5], s6 +; GFX10-NEXT: s_and_b32 s4, s4, 15 +; GFX10-NEXT: s_add_i32 s5, s4, 4 +; GFX10-NEXT: s_cmp_lt_u32 s4, 4 +; GFX10-NEXT: s_cselect_b32 s4, s4, s5 +; GFX10-NEXT: v_mov_b32_e32 v0, s4 +; GFX10-NEXT: s_setpc_b64 s[30:31] +; +; GFX11-LABEL: func_rounding: +; GFX11: ; %bb.0: +; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX11-NEXT: s_getreg_b32 s0, hwreg(HW_REG_MODE, 0, 4) +; GFX11-NEXT: s_lshl_b32 s2, s0, 2 +; GFX11-NEXT: s_mov_b32 s0, 0xeb24da71 +; GFX11-NEXT: s_mov_b32 s1, 0xc96f385 +; GFX11-NEXT: s_lshr_b64 s[0:1], s[0:1], s2 +; GFX11-NEXT: s_and_b32 s0, s0, 15 +; GFX11-NEXT: s_add_i32 s1, s0, 4 +; GFX11-NEXT: s_cmp_lt_u32 s0, 4 +; GFX11-NEXT: s_cselect_b32 s0, s0, s1 +; GFX11-NEXT: v_mov_b32_e32 v0, s0 +; GFX11-NEXT: s_setpc_b64 s[30:31] + %rounding = call i32 @llvm.get.rounding() + ret i32 %rounding +} +;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line: +; GCN: {{.*}} +; GFX1011: {{.*}} +; GFX6: {{.*}} +; GFX7: {{.*}} +; GFX8: {{.*}} Index: llvm/test/CodeGen/AMDGPU/llvm.log.ll =================================================================== --- llvm/test/CodeGen/AMDGPU/llvm.log.ll +++ llvm/test/CodeGen/AMDGPU/llvm.log.ll @@ -3330,9 +3330,9 @@ ; SI-SDAG-NEXT: s_mov_b32 s5, 0x3377d1cf ; SI-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3345,9 +3345,9 @@ ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x3377d1cf ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3f317217, v0 -; SI-GISEL-NEXT: v_fma_f32 v3, v0, s4, 
-v3 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3358,28 +3358,34 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v3, v0, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v3 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3f317000, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s4 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log_f32_daz: ; VI-GISEL: ; %bb.0: ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_log_f32_e32 v0, v0 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_and_b32_e32 v2, 0xfffff000, v0 -; VI-GISEL-NEXT: v_sub_f32_e32 v3, v0, v2 -; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3805fdf4, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3f317000, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3f317000, v2 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v1 -; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 +; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, 
v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3f317000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v3, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log_f32_daz: @@ -3390,9 +3396,9 @@ ; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x3377d1cf ; GFX900-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3405,9 +3411,9 @@ ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x3377d1cf ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-GISEL-NEXT: v_mul_f32_e32 v3, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3666,9 +3672,9 @@ ; SI-SDAG-NEXT: s_mov_b32 s5, 0x3377d1cf ; SI-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-SDAG-NEXT: v_fma_f32 
v1, v0, s5, v1 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3681,9 +3687,9 @@ ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x3377d1cf ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3f317217, v0 -; SI-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3694,28 +3700,34 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v3, v0, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v3 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3f317000, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s4 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log_f32_nnan_daz: ; VI-GISEL: ; %bb.0: ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_log_f32_e32 v0, v0 
-; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_and_b32_e32 v2, 0xfffff000, v0 -; VI-GISEL-NEXT: v_sub_f32_e32 v3, v0, v2 -; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3805fdf4, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3f317000, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3f317000, v2 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v1 -; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 +; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3f317000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v3, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log_f32_nnan_daz: @@ -3726,9 +3738,9 @@ ; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x3377d1cf ; GFX900-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3741,9 +3753,9 @@ ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x3377d1cf ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-GISEL-NEXT: v_mul_f32_e32 v3, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; GFX900-GISEL-NEXT: 
v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3803,11 +3815,11 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; SI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -3825,12 +3837,12 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -3850,13 +3862,16 @@ ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-SDAG-NEXT: 
v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3f317000, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v4, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 +; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 -; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3873,13 +3888,16 @@ ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v4, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3f317000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 -; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3895,11 +3913,11 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; 
GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; GFX900-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -3917,12 +3935,12 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -3996,9 +4014,9 @@ ; SI-SDAG-NEXT: s_mov_b32 s5, 0x3377d1cf ; SI-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4011,9 +4029,9 @@ ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x3377d1cf ; SI-GISEL-NEXT: v_mov_b32_e32 
v2, 0x7f800000 ; SI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3f317217, v0 -; SI-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4024,28 +4042,34 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v3, v0, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v3 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3f317000, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s4 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log_f32_ninf_daz: ; VI-GISEL: ; %bb.0: ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_log_f32_e32 v0, v0 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_and_b32_e32 v2, 0xfffff000, v0 -; VI-GISEL-NEXT: v_sub_f32_e32 v3, v0, v2 -; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3805fdf4, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3f317000, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3f317000, v2 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v1 -; VI-GISEL-NEXT: 
v_cndmask_b32_e32 v0, v0, v4, vcc +; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 +; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3f317000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v3, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log_f32_ninf_daz: @@ -4056,9 +4080,9 @@ ; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x3377d1cf ; GFX900-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4071,9 +4095,9 @@ ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x3377d1cf ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-GISEL-NEXT: v_mul_f32_e32 v3, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4133,11 +4157,11 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 
0x3f317217 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; SI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -4155,12 +4179,12 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -4180,13 +4204,16 @@ ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3f317000, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v4, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; 
VI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 +; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 -; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4203,13 +4230,16 @@ ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v4, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3f317000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 -; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4225,11 +4255,11 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], 
|v0|, s4 ; GFX900-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -4247,12 +4277,12 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -4499,25 +4529,25 @@ ; SI-SDAG-LABEL: v_log_f32_nnan_ninf_daz: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: v_log_f32_e32 v1, v0 +; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 -; SI-SDAG-NEXT: v_mul_f32_e32 v0, 0x3f317217, v1 -; SI-SDAG-NEXT: v_fma_f32 v0, v1, s4, -v0 +; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; SI-SDAG-NEXT: v_fma_f32 v0, v1, s4, v0 -; SI-SDAG-NEXT: v_mac_f32_e32 v0, 0x3f317217, v1 +; SI-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; SI-GISEL-LABEL: v_log_f32_nnan_ninf_daz: ; SI-GISEL: ; %bb.0: ; SI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-GISEL-NEXT: v_log_f32_e32 v1, v0 +; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf -; SI-GISEL-NEXT: v_mul_f32_e32 v0, 0x3f317217, v1 -; SI-GISEL-NEXT: 
v_fma_f32 v0, v1, s4, -v0 -; SI-GISEL-NEXT: v_fma_f32 v0, v1, v2, v0 -; SI-GISEL-NEXT: v_mac_f32_e32 v0, 0x3f317217, v1 +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf +; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; VI-LABEL: v_log_f32_nnan_ninf_daz: @@ -4525,35 +4555,38 @@ ; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-NEXT: v_log_f32_e32 v0, v0 ; VI-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-NEXT: v_mul_f32_e32 v0, 0x3805fdf4, v2 -; VI-NEXT: v_mac_f32_e32 v0, 0x3805fdf4, v1 -; VI-NEXT: v_mac_f32_e32 v0, 0x3f317000, v2 -; VI-NEXT: v_mac_f32_e32 v0, 0x3f317000, v1 +; VI-NEXT: v_sub_f32_e32 v0, v0, v1 +; VI-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v1 +; VI-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v0 +; VI-NEXT: v_mul_f32_e32 v0, 0x3f317000, v0 +; VI-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-NEXT: v_add_f32_e32 v0, v0, v2 +; VI-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-NEXT: v_add_f32_e32 v0, v1, v0 ; VI-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log_f32_nnan_ninf_daz: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: v_log_f32_e32 v1, v0 +; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 -; GFX900-SDAG-NEXT: v_mul_f32_e32 v0, 0x3f317217, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v0, v1, s4, -v0 +; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; GFX900-SDAG-NEXT: v_fma_f32 v0, v1, s4, v0 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v0, 0x3f317217, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-GISEL-LABEL: v_log_f32_nnan_ninf_daz: ; GFX900-GISEL: ; %bb.0: ; GFX900-GISEL-NEXT: s_waitcnt 
vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-GISEL-NEXT: v_log_f32_e32 v1, v0 +; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf -; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, 0x3f317217, v1 -; GFX900-GISEL-NEXT: v_fma_f32 v0, v1, s4, -v0 -; GFX900-GISEL-NEXT: v_fma_f32 v0, v1, v2, v0 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v0, 0x3f317217, v1 +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf +; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX1100-SDAG-LABEL: v_log_f32_nnan_ninf_daz: @@ -4607,13 +4640,13 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_mov_b32_e32 v0, 0x41b17218 -; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; SI-SDAG-NEXT: v_sub_f32_e32 v0, v1, v0 +; SI-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 +; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 +; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; SI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; SI-GISEL-LABEL: v_log_f32_nnan_ninf_dynamic: @@ -4626,14 +4659,14 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, 
v0 -; SI-GISEL-NEXT: v_mov_b32_e32 v0, 0x41b17218 -; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; SI-GISEL-NEXT: v_sub_f32_e32 v0, v1, v0 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 +; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 +; SI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; SI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; VI-SDAG-LABEL: v_log_f32_nnan_ninf_dynamic: @@ -4647,13 +4680,16 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v0 -; VI-SDAG-NEXT: v_mac_f32_e32 v2, 0x3805fdf4, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v2, 0x3f317000, v0 -; VI-SDAG-NEXT: v_mov_b32_e32 v0, 0x41b17218 -; VI-SDAG-NEXT: v_mac_f32_e32 v2, 0x3f317000, v1 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; VI-SDAG-NEXT: v_sub_f32_e32 v0, v2, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3f317000, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v0, 0x3805fdf4, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v0, v3, v0 +; VI-SDAG-NEXT: v_add_f32_e32 v0, v2, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 +; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 +; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log_f32_nnan_ninf_dynamic: @@ -4668,12 +4704,15 @@ ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v0 -; VI-GISEL-NEXT: v_mac_f32_e32 v2, 0x3805fdf4, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v2, 0x3f317000, v0 -; VI-GISEL-NEXT: v_mov_b32_e32 v0, 0x41b17218 -; VI-GISEL-NEXT: v_mac_f32_e32 v2, 0x3f317000, v1 -; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; VI-GISEL-NEXT: v_sub_f32_e32 v0, v2, v0 
+; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v0, 0x3f317000, v0 +; VI-GISEL-NEXT: v_add_f32_e32 v0, v0, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 +; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 +; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log_f32_nnan_ninf_dynamic: @@ -4687,13 +4726,13 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 -; GFX900-SDAG-NEXT: v_mov_b32_e32 v0, 0x41b17218 -; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; GFX900-SDAG-NEXT: v_sub_f32_e32 v0, v1, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 +; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 +; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; GFX900-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-GISEL-LABEL: v_log_f32_nnan_ninf_dynamic: @@ -4706,14 +4745,14 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_mov_b32_e32 v0, 0x41b17218 -; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; GFX900-GISEL-NEXT: 
v_sub_f32_e32 v0, v1, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 +; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 +; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; GFX900-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX1100-SDAG-LABEL: v_log_f32_nnan_ninf_dynamic: @@ -4808,11 +4847,11 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3f317217 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; SI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -4830,12 +4869,12 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -4855,13 +4894,16 @@ ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 
v3, 0x3805fdf4, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3f317000, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3805fdf4, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v4, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 +; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 -; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4878,13 +4920,16 @@ ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3805fdf4, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3805fdf4, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3f317000, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3f317000, v1 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x3805fdf4, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v4, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3f317000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 -; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4900,11 +4945,11 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: 
s_mov_b32 s4, 0x3f317217 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3377d1cf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; GFX900-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x41b17218 @@ -4922,12 +4967,12 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3f317217 -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3377d1cf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3377d1cf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3f317217, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3f317217, v0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x41b17218 Index: llvm/test/CodeGen/AMDGPU/llvm.log10.ll =================================================================== --- llvm/test/CodeGen/AMDGPU/llvm.log10.ll +++ llvm/test/CodeGen/AMDGPU/llvm.log10.ll @@ -3330,9 +3330,9 @@ ; SI-SDAG-NEXT: s_mov_b32 s5, 0x3284fbcf ; SI-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, 
s5, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3345,9 +3345,9 @@ ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x3284fbcf ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3358,28 +3358,34 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v3, v0, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v3 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3e9a2000, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s4 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log10_f32_daz: ; VI-GISEL: ; %bb.0: ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_log_f32_e32 v0, v0 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_and_b32_e32 v2, 0xfffff000, v0 -; VI-GISEL-NEXT: v_sub_f32_e32 v3, v0, v2 -; 
VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x369a84fb, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3e9a2000, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3e9a2000, v2 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v1 -; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 +; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3e9a2000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v3, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log10_f32_daz: @@ -3390,9 +3396,9 @@ ; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x3284fbcf ; GFX900-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3405,9 +3411,9 @@ ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x3284fbcf ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-GISEL-NEXT: v_mul_f32_e32 v3, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; GFX900-GISEL-NEXT: 
v_add_f32_e32 v1, v3, v1 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3666,9 +3672,9 @@ ; SI-SDAG-NEXT: s_mov_b32 s5, 0x3284fbcf ; SI-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3681,9 +3687,9 @@ ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x3284fbcf ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3694,28 +3700,34 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v3, v0, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v3 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3e9a2000, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 
v2, v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s4 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log10_f32_nnan_daz: ; VI-GISEL: ; %bb.0: ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_log_f32_e32 v0, v0 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_and_b32_e32 v2, 0xfffff000, v0 -; VI-GISEL-NEXT: v_sub_f32_e32 v3, v0, v2 -; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x369a84fb, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3e9a2000, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3e9a2000, v2 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v1 -; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 +; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3e9a2000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v3, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log10_f32_nnan_daz: @@ -3726,9 +3738,9 @@ ; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x3284fbcf ; GFX900-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, 
s6 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3741,9 +3753,9 @@ ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x3284fbcf ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-GISEL-NEXT: v_mul_f32_e32 v3, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3803,11 +3815,11 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; SI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -3825,12 +3837,12 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 
s[4:5], |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -3850,13 +3862,16 @@ ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3e9a2000, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v4, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 +; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b -; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -3873,13 +3888,16 @@ ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v4, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3e9a2000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-GISEL-NEXT: 
v_mov_b32_e32 v1, 0x411a209b -; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -3895,11 +3913,11 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; GFX900-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -3917,12 +3935,12 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -3996,9 +4014,9 @@ ; SI-SDAG-NEXT: s_mov_b32 s5, 0x3284fbcf ; SI-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; SI-SDAG-NEXT: 
v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4011,9 +4029,9 @@ ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x3284fbcf ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; SI-GISEL-NEXT: v_mul_f32_e32 v3, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; SI-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4024,28 +4042,34 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_sub_f32_e32 v3, v0, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v3 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3e9a2000, v3 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v2, v4 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s4 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v3, vcc +; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log10_f32_ninf_daz: ; VI-GISEL: ; %bb.0: ; VI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-GISEL-NEXT: v_log_f32_e32 v0, v0 -; VI-GISEL-NEXT: v_mov_b32_e32 
v1, 0x7f800000 -; VI-GISEL-NEXT: v_and_b32_e32 v2, 0xfffff000, v0 -; VI-GISEL-NEXT: v_sub_f32_e32 v3, v0, v2 -; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x369a84fb, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3e9a2000, v3 -; VI-GISEL-NEXT: v_mac_f32_e32 v4, 0x3e9a2000, v2 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v1 -; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc +; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 +; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3e9a2000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v3, v4 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log10_f32_ninf_daz: @@ -4056,9 +4080,9 @@ ; GFX900-SDAG-NEXT: s_mov_b32 s5, 0x3284fbcf ; GFX900-SDAG-NEXT: s_mov_b32 s6, 0x7f800000 ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s5, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s5, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, s6 ; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4071,9 +4095,9 @@ ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x3284fbcf ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 ; GFX900-GISEL-NEXT: v_mul_f32_e32 v3, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v3, v0, s4, -v3 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v3 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; 
GFX900-GISEL-NEXT: v_fma_f32 v4, v0, s4, -v3 +; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v1, v4 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v3, v1 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 vcc, |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4133,11 +4157,11 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; SI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -4155,12 +4179,12 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -4180,13 +4204,16 @@ ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; 
VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3e9a2000, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v4, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 +; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b -; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4203,13 +4230,16 @@ ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v4, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3e9a2000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b -; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4225,11 +4255,11 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 
+; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; GFX900-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -4247,12 +4277,12 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -4499,25 +4529,25 @@ ; SI-SDAG-LABEL: v_log10_f32_nnan_ninf_daz: ; SI-SDAG: ; %bb.0: ; SI-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-SDAG-NEXT: v_log_f32_e32 v1, v0 +; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a -; SI-SDAG-NEXT: v_mul_f32_e32 v0, 0x3e9a209a, v1 -; SI-SDAG-NEXT: v_fma_f32 v0, v1, s4, -v0 +; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; SI-SDAG-NEXT: v_fma_f32 v0, v1, s4, v0 -; SI-SDAG-NEXT: v_mac_f32_e32 v0, 0x3e9a209a, v1 +; SI-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 ; SI-SDAG-NEXT: s_setpc_b64 
s[30:31] ; ; SI-GISEL-LABEL: v_log10_f32_nnan_ninf_daz: ; SI-GISEL: ; %bb.0: ; SI-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; SI-GISEL-NEXT: v_log_f32_e32 v1, v0 +; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf -; SI-GISEL-NEXT: v_mul_f32_e32 v0, 0x3e9a209a, v1 -; SI-GISEL-NEXT: v_fma_f32 v0, v1, s4, -v0 -; SI-GISEL-NEXT: v_fma_f32 v0, v1, v2, v0 -; SI-GISEL-NEXT: v_mac_f32_e32 v0, 0x3e9a209a, v1 +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf +; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; VI-LABEL: v_log10_f32_nnan_ninf_daz: @@ -4525,35 +4555,38 @@ ; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; VI-NEXT: v_log_f32_e32 v0, v0 ; VI-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 -; VI-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-NEXT: v_mul_f32_e32 v0, 0x369a84fb, v2 -; VI-NEXT: v_mac_f32_e32 v0, 0x369a84fb, v1 -; VI-NEXT: v_mac_f32_e32 v0, 0x3e9a2000, v2 -; VI-NEXT: v_mac_f32_e32 v0, 0x3e9a2000, v1 +; VI-NEXT: v_sub_f32_e32 v0, v0, v1 +; VI-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v1 +; VI-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v0 +; VI-NEXT: v_mul_f32_e32 v0, 0x3e9a2000, v0 +; VI-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-NEXT: v_add_f32_e32 v0, v0, v2 +; VI-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-NEXT: v_add_f32_e32 v0, v1, v0 ; VI-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log10_f32_nnan_ninf_daz: ; GFX900-SDAG: ; %bb.0: ; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-SDAG-NEXT: v_log_f32_e32 v1, v0 +; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a -; GFX900-SDAG-NEXT: v_mul_f32_e32 v0, 0x3e9a209a, v1 -; GFX900-SDAG-NEXT: v_fma_f32 v0, v1, s4, -v0 +; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; 
GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; GFX900-SDAG-NEXT: v_fma_f32 v0, v1, s4, v0 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v0, 0x3e9a209a, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-GISEL-LABEL: v_log10_f32_nnan_ninf_daz: ; GFX900-GISEL: ; %bb.0: ; GFX900-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX900-GISEL-NEXT: v_log_f32_e32 v1, v0 +; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf -; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, 0x3e9a209a, v1 -; GFX900-GISEL-NEXT: v_fma_f32 v0, v1, s4, -v0 -; GFX900-GISEL-NEXT: v_fma_f32 v0, v1, v2, v0 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v0, 0x3e9a209a, v1 +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf +; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX1100-SDAG-LABEL: v_log10_f32_nnan_ninf_daz: @@ -4607,13 +4640,13 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_mov_b32_e32 v0, 0x411a209b -; SI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; SI-SDAG-NEXT: v_sub_f32_e32 v0, v1, v0 +; SI-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; SI-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 +; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b +; SI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; SI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; SI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; SI-GISEL-LABEL: v_log10_f32_nnan_ninf_dynamic: @@ -4626,14 +4659,14 @@ ; 
SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_mov_b32_e32 v0, 0x411a209b -; SI-GISEL-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; SI-GISEL-NEXT: v_sub_f32_e32 v0, v1, v0 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 +; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b +; SI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; SI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; VI-SDAG-LABEL: v_log10_f32_nnan_ninf_dynamic: @@ -4647,13 +4680,16 @@ ; VI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v0 -; VI-SDAG-NEXT: v_mac_f32_e32 v2, 0x369a84fb, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v2, 0x3e9a2000, v0 -; VI-SDAG-NEXT: v_mov_b32_e32 v0, 0x411a209b -; VI-SDAG-NEXT: v_mac_f32_e32 v2, 0x3e9a2000, v1 -; VI-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; VI-SDAG-NEXT: v_sub_f32_e32 v0, v2, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x3e9a2000, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v0, 0x369a84fb, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v0, v3, v0 +; VI-SDAG-NEXT: v_add_f32_e32 v0, v2, v0 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 +; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b +; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; VI-GISEL-LABEL: v_log10_f32_nnan_ninf_dynamic: @@ -4668,12 +4704,15 @@ ; VI-GISEL-NEXT: v_and_b32_e32 
v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v0 -; VI-GISEL-NEXT: v_mac_f32_e32 v2, 0x369a84fb, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v2, 0x3e9a2000, v0 -; VI-GISEL-NEXT: v_mov_b32_e32 v0, 0x411a209b -; VI-GISEL-NEXT: v_mac_f32_e32 v2, 0x3e9a2000, v1 -; VI-GISEL-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; VI-GISEL-NEXT: v_sub_f32_e32 v0, v2, v0 +; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v0, 0x3e9a2000, v0 +; VI-GISEL-NEXT: v_add_f32_e32 v0, v0, v2 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 +; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b +; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-SDAG-LABEL: v_log10_f32_nnan_ninf_dynamic: @@ -4687,13 +4726,13 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_mov_b32_e32 v0, 0x411a209b -; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; GFX900-SDAG-NEXT: v_sub_f32_e32 v0, v1, v0 +; GFX900-SDAG-NEXT: v_fma_f32 v0, v0, s4, v2 +; GFX900-SDAG-NEXT: v_add_f32_e32 v0, v1, v0 +; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b +; GFX900-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; GFX900-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31] ; ; GFX900-GISEL-LABEL: v_log10_f32_nnan_ninf_dynamic: @@ -4706,14 +4745,14 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; GFX900-GISEL-NEXT: 
v_mov_b32_e32 v2, 0x3284fbcf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_mov_b32_e32 v0, 0x411a209b -; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v0, 0, v0, vcc -; GFX900-GISEL-NEXT: v_sub_f32_e32 v0, v1, v0 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v0, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v0, v1, v0 +; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b +; GFX900-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc +; GFX900-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: s_setpc_b64 s[30:31] ; ; GFX1100-SDAG-LABEL: v_log10_f32_nnan_ninf_dynamic: @@ -4808,11 +4847,11 @@ ; SI-SDAG-NEXT: v_log_f32_e32 v0, v0 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; SI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; SI-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; SI-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; SI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; SI-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; SI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; SI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -4830,12 +4869,12 @@ ; SI-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; SI-GISEL-NEXT: v_log_f32_e32 v0, v0 ; SI-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; SI-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf +; SI-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; SI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; SI-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; SI-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; SI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; SI-GISEL-NEXT: 
v_mov_b32_e32 v2, 0x7f800000 -; SI-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 ; SI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; SI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; SI-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -4855,13 +4894,16 @@ ; VI-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 ; VI-SDAG-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-SDAG-NEXT: v_sub_f32_e32 v2, v0, v1 -; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-SDAG-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_mul_f32_e32 v3, 0x3e9a2000, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v2, 0x369a84fb, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v4, v2 +; VI-SDAG-NEXT: v_add_f32_e32 v2, v3, v2 +; VI-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; VI-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 +; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b -; VI-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-SDAG-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-SDAG-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-SDAG-NEXT: s_setpc_b64 s[30:31] @@ -4878,13 +4920,16 @@ ; VI-GISEL-NEXT: v_and_b32_e32 v1, 0xfffff000, v0 ; VI-GISEL-NEXT: v_sub_f32_e32 v2, v0, v1 ; VI-GISEL-NEXT: v_mul_f32_e32 v3, 0x369a84fb, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x369a84fb, v1 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v2 -; VI-GISEL-NEXT: v_mac_f32_e32 v3, 0x3e9a2000, v1 -; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x7f800000 -; VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v1 +; VI-GISEL-NEXT: v_mul_f32_e32 v4, 0x369a84fb, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v3, v4, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v2, 0x3e9a2000, v2 +; VI-GISEL-NEXT: v_add_f32_e32 v2, v2, v3 +; VI-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a2000, v1 +; VI-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 +; VI-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 +; 
VI-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 +; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; VI-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b -; VI-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v3, s[4:5] ; VI-GISEL-NEXT: v_cndmask_b32_e32 v1, 0, v1, vcc ; VI-GISEL-NEXT: v_sub_f32_e32 v0, v0, v1 ; VI-GISEL-NEXT: s_setpc_b64 s[30:31] @@ -4900,11 +4945,11 @@ ; GFX900-SDAG-NEXT: v_log_f32_e32 v0, v0 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3e9a209a ; GFX900-SDAG-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, -v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, -v1 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x3284fbcf -; GFX900-SDAG-NEXT: v_fma_f32 v1, v0, s4, v1 +; GFX900-SDAG-NEXT: v_fma_f32 v2, v0, s4, v2 ; GFX900-SDAG-NEXT: s_mov_b32 s4, 0x7f800000 -; GFX900-SDAG-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 +; GFX900-SDAG-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-SDAG-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, s4 ; GFX900-SDAG-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-SDAG-NEXT: v_mov_b32_e32 v1, 0x411a209b @@ -4922,12 +4967,12 @@ ; GFX900-GISEL-NEXT: v_mul_f32_e32 v0, v0, v1 ; GFX900-GISEL-NEXT: v_log_f32_e32 v0, v0 ; GFX900-GISEL-NEXT: s_mov_b32 s4, 0x3e9a209a -; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x3284fbcf +; GFX900-GISEL-NEXT: v_mov_b32_e32 v3, 0x3284fbcf ; GFX900-GISEL-NEXT: v_mul_f32_e32 v1, 0x3e9a209a, v0 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, s4, -v1 -; GFX900-GISEL-NEXT: v_fma_f32 v1, v0, v2, v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, s4, -v1 +; GFX900-GISEL-NEXT: v_fma_f32 v2, v0, v3, v2 +; GFX900-GISEL-NEXT: v_add_f32_e32 v1, v1, v2 ; GFX900-GISEL-NEXT: v_mov_b32_e32 v2, 0x7f800000 -; GFX900-GISEL-NEXT: v_mac_f32_e32 v1, 0x3e9a209a, v0 ; GFX900-GISEL-NEXT: v_cmp_lt_f32_e64 s[4:5], |v0|, v2 ; GFX900-GISEL-NEXT: v_cndmask_b32_e64 v0, v0, v1, s[4:5] ; GFX900-GISEL-NEXT: v_mov_b32_e32 v1, 0x411a209b Index: llvm/test/CodeGen/AMDGPU/v_mac.ll =================================================================== --- llvm/test/CodeGen/AMDGPU/v_mac.ll 
+++ llvm/test/CodeGen/AMDGPU/v_mac.ll
@@ -279,6 +279,34 @@
   ret void
 }
 
+; Need to assume denormal handling is needed for dynamic denormal mode
+; GCN-LABEL: {{^}}v_mac_f32_dynamic:
+; GCN: v_mul_f32
+; GCN: v_add_f32
+define float @v_mac_f32_dynamic(float %a, float %b, float %c) "denormal-fp-math-f32"="dynamic,dynamic" {
+  %mul = fmul float %a, %b
+  %mad = fadd float %mul, %c
+  ret float %mad
+}
+
+; GCN-LABEL: {{^}}v_mac_f32_dynamic_daz:
+; GCN: v_mul_f32
+; GCN: v_add_f32
+define float @v_mac_f32_dynamic_daz(float %a, float %b, float %c) "denormal-fp-math-f32"="preserve-sign,dynamic" {
+  %mul = fmul float %a, %b
+  %mad = fadd float %mul, %c
+  ret float %mad
+}
+
+; GCN-LABEL: {{^}}v_mac_f32_dynamic_ftz:
+; GCN: v_mul_f32
+; GCN: v_add_f32
+define float @v_mac_f32_dynamic_ftz(float %a, float %b, float %c) "denormal-fp-math-f32"="dynamic,preserve-sign" {
+  %mul = fmul float %a, %b
+  %mad = fadd float %mul, %c
+  ret float %mad
+}
+
 declare i32 @llvm.amdgcn.workitem.id.x() #2
 
 attributes #0 = { nounwind "no-signed-zeros-fp-math"="false" }