This is an archive of the discontinued LLVM Phabricator instance.

Differential D156538

[AArch64] Try to combine FMUL with FDIV
Closed Public

Authored by jaykang10 on Jul 28 2023, 7:03 AM.

Details

Reviewers

samtebbs
dmgreen
efriedma
t.p.northover

Commits

rG9f8dcb070655: [AArch64] Try to detect patterns with fdiv and fmul for [su]cvtf.

Summary

GCC generates fewer instructions than LLVM for the example below.

float foo(int state) {
    return (float)state / 2;
}

gcc output

foo:
  scvtf s0, w0, #1
  ret

llvm output

foo:
  scvtf s0, w0
  fmov s1, #0.50000000
  fmul s0, s0, s1
  ret

GCC converts the float division into a float multiplication, X / C --> X * (1 / C), and it has a pattern, aarch64_scvtfsisf2_mult, that matches a float multiplication for scvtf.
LLVM also converts fdiv to fmul in the InstCombine pass, X / C --> X * (1 / C), but it has no ISel patterns that match fmul for scvtf.
If fmul's constant operand is the reciprocal of a power of 2 (1/2^n) and the other operand is SINT_TO_FP, we can try X * (1 / C) --> X / C, because the result then matches the existing fixed-point scvtf patterns.
With this patch, LLVM's output is as below.

foo:
        scvtf   s0, w0, #1
        ret
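
The equivalence that the summary relies on can be checked numerically. The following is a minimal sketch in plain C++, not LLVM code. `scaled_convert` is a hypothetical helper that models what `scvtf s0, w0, #fbits` is described as computing here (the integer divided by 2^fbits), and the sample values are arbitrary.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model of `scvtf s0, w0, #fbits`: treat x as a fixed-point
// value with `fbits` fractional bits, i.e. compute x / 2^fbits as a float.
float scaled_convert(int32_t x, int fbits) {
  return std::ldexp(static_cast<float>(x), -fbits);
}

// What the source code asks for: (float)x / 2.
float via_fdiv(int32_t x) { return static_cast<float>(x) / 2.0f; }

// What InstCombine canonicalizes it to: (float)x * 0.5.
float via_fmul(int32_t x) { return static_cast<float>(x) * 0.5f; }

// Returns true when all three forms agree for every sample value.
bool all_forms_agree() {
  const int32_t samples[] = {0, 1, -1, 3, -7, 12345, INT32_MAX, INT32_MIN};
  for (int32_t x : samples) {
    float expected = scaled_convert(x, 1);
    if (via_fdiv(x) != expected || via_fmul(x) != expected)
      return false;
  }
  return true;
}
```

Because scaling by a power of two only changes the exponent, the division, the multiplication by the reciprocal, and the fixed-point conversion all give the same float for these inputs.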

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jaykang10 created this revision. Jul 28 2023, 7:03 AM

Herald added a project: Restricted Project. Jul 28 2023, 7:03 AM

Herald added subscribers: hiraditya, kristof.beyls.

jaykang10 requested review of this revision. Jul 28 2023, 7:03 AM

Herald added a project: Restricted Project. Jul 28 2023, 7:03 AM

Herald added a subscriber: llvm-commits.

Hi @jaykang10, perhaps I'm being over-cautious here, but I'm not sure of the licensing implications of posting GCC source code in the commit message for an LLVM patch? It feels like this might be problematic.

In D156538#4542438, @david-arm wrote:

Hi @jaykang10, perhaps I'm being over-cautious here, but I'm not sure of the licensing implications of posting GCC source code in the commit message for an LLVM patch? It feels like this might be problematic.

Ah, sorry. I did not know that. Let me remove the gcc source code.
Thanks for letting me know @david-arm

jaykang10 edited the summary of this revision. Jul 28 2023, 7:11 AM

Harbormaster completed remote builds in B248840: Diff 545138. Jul 28 2023, 8:42 AM

All of these examples seem to canonicalize to fmul in the midend: https://godbolt.org/z/hqPv3azjf
Is it worth keeping the current lowering for fdiv(sitofp)? Or should we just change that to work with fmul?

In D156538#4542937, @dmgreen wrote:

All of these examples seem to canonicalize to fmul in the midend: https://godbolt.org/z/hqPv3azjf
Is it worth keeping the current lowering for fdiv(sitofp)? Or should we just change that to work with fmul?

I guess you mean the performFDivCombine function. As you can see, that function converts fdiv(sitofp) into Intrinsic::aarch64_neon_vcvtfxs2fp, which is the vector version, and it is matched by the pattern below.

multiclass SIMDFPScalarRShift<bit U, bits<5> opc, string asm> {
  let Predicates = [HasNEON, HasFullFP16] in {
  def h : BaseSIMDScalarShift<U, opc, {0,0,1,?,?,?,?},
                              FPR16, FPR16, vecshiftR16, asm, []> { 
    let Inst{19-16} = imm{3-0};
  }
  } // Predicates = [HasNEON, HasFullFP16]
  def s : BaseSIMDScalarShift<U, opc, {0,1,?,?,?,?,?},
                              FPR32, FPR32, vecshiftR32, asm, []> { 
    let Inst{20-16} = imm{4-0};
  }
  def d : BaseSIMDScalarShift<U, opc, {1,?,?,?,?,?,?},
                              FPR64, FPR64, vecshiftR64, asm, []> { 
    let Inst{21-16} = imm{5-0};
  }
}
...
defm SCVTF  : SIMDFPScalarRShift<0, 0b11100, "scvtf">;
...
def : Pat<(int_aarch64_neon_vcvtfxs2fp FPR32:$Rn, vecshiftR32:$imm),
          (SCVTFs FPR32:$Rn, vecshiftR32:$imm)>;

As the multiclass SIMDFPScalarRShift shows, the instruction definition expects FPR register classes for both input and output. That causes a COPY in MIR, which becomes an fmov, between FPR and GPR.
For the scalar version, the AArch64 target has the patterns below.

multiclass IntegerToFP<bit isUnsigned, string asm, SDPatternOperator node> {
...
  def SWSri: BaseIntegerToFP<isUnsigned, GPR32, FPR32, fixedpoint_f32_i32, asm,
                             [(set FPR32:$Rd,
                                   (fdiv (node GPR32:$Rn),
                                         fixedpoint_f32_i32:$scale))]> {
    let Inst{31} = 0; // 32-bit GPR flag
    let Inst{23-22} = 0b00; // 32-bit FPR flag
    let scale{5} = 1;
  }
...
defm SCVTF : IntegerToFP<0, "scvtf", any_sint_to_fp>;

We need to keep the fdiv(sitofp) node to match the pattern above.
To reuse the current patterns for the scalar version, I convert fmul to fdiv with a DAG combine on fmul.
Do you want me to add code that handles fmul in performFDivCombine? I am not sure whether that would be better than this patch...

Oh I see, I hadn't spotted performFDivCombine. It was the scalar patterns I was thinking about, via IntegerToFP and fixedpoint_f32_i32 and SelectCVTFixedPosOperand.

Does it make sense to keep them selecting fdiv, or should they always just match fmul? It would seem we only need one, and fmul is more canonical.

Oh I see, I hadn't spotted performFDivCombine. It was the scalar patterns I was thinking about, via IntegerToFP and fixedpoint_f32_i32 and SelectCVTFixedPosOperand.

Ah, sorry, I thought you mentioned the performFDivCombine function because I was not able to find any custom lowering code for FDIV except for SVE.

Does it make sense to keep them selecting fdiv, or should they always just match fmul? It would seem we only need one, and fmul is more canonical.

If possible, I would like to keep the existing fdiv pattern in this patch.
I saw that @samtebbs changed the SelectCVTFixedPosOperand function, so I added him as a reviewer; if possible, I would like to get his opinion too.

Matt added a subscriber: Matt. Jul 31 2023, 12:10 PM

I think this is good as it is, although I'm not 100% sure on the fact that we need to get it converted to aarch64_neon_vcvtfxs2fp first, as if something goes wrong there then we'll miss out on this optimisation. If you can think of a way to circumvent the need for that and go directly to the scvtf that would be good, otherwise this looks good to me.

In D156538#4550702, @samtebbs wrote:

I think this is good as it is, although I'm not 100% sure on the fact that we need to get it converted to aarch64_neon_vcvtfxs2fp first, as if something goes wrong there then we'll miss out on this optimisation. If you can think of a way to circumvent the need for that and go directly to the scvtf that would be good, otherwise this looks good to me.

Thanks for the comment.
There is a comment in AArch64InstrInfo.td, shown below, and it looks like a TableGen issue prevented selecting the SIMD-type instruction directly.

defm FCVTZS : SIMDFPScalarRShift<0, 0b11111, "fcvtzs">;
defm FCVTZU : SIMDFPScalarRShift<1, 0b11111, "fcvtzu">;
defm SCVTF  : SIMDFPScalarRShift<0, 0b11100, "scvtf">;
defm UCVTF  : SIMDFPScalarRShift<1, 0b11100, "ucvtf">;
// Codegen patterns for the above. We don't put these directly on the
// instructions because TableGen's type inference can't handle the truth.
// Having the same base pattern for fp <--> int totally freaks it out.

Does it make sense to keep them selecting fdiv, or should they always just match fmul? It would seem we only need one, and fmul is more canonical.

Additionally, multiclass FPToIntegerScaled and multiclass IntegerToFP share the SelectCVTFixedPosOperand as below.

class fixedpoint_i32<ValueType FloatVT>
  : Operand<FloatVT>,
    ComplexPattern<FloatVT, 1, "SelectCVTFixedPosOperand<32>", [fpimm, ld]> {
  let EncoderMethod = "getFixedPointScaleOpValue";
  let DecoderMethod = "DecodeFixedPointScaleImm32";
  let ParserMatchClass = Imm1_32Operand;
}
...

def fixedpoint_f32_i64 : fixedpoint_i64<f32>;
...

multiclass FPToIntegerScaled<bits<2> rmode, bits<3> opcode, string asm,
                             SDPatternOperator OpN> {
...
  def SXSri : BaseFPToInteger<0b00, rmode, opcode, FPR32, GPR64,
                              fixedpoint_f32_i64, asm,
              [(set GPR64:$Rd, (OpN (fmul FPR32:$Rn,
                                          fixedpoint_f32_i64:$scale)))]> {
    let Inst{31} = 1; // 64-bit GPR flag
  }
...
multiclass IntegerToFP<bit isUnsigned, string asm, SDPatternOperator node> {
...
  def SXSri: BaseIntegerToFP<isUnsigned, GPR64, FPR32, fixedpoint_f32_i64, asm,
                             [(set FPR32:$Rd,
                                   (fdiv (node GPR64:$Rn),
                                         fixedpoint_f32_i64:$scale))]> {
    let Inst{31} = 1; // 64-bit GPR flag
    let Inst{23-22} = 0b00; // 32-bit FPR flag
  }

SelectCVTFixedPosOperand is for fcvt and checks that the constant is 2^fbits.
If we keep the fdiv for scvtf, we can use SelectCVTFixedPosOperand.
If we replace the fdiv with fmul for scvtf, we need another complex pattern that checks for the constant 1/2^fbits.
GCC uses fmul for both fcvt and scvtf, but it has two patterns, one for 2^n and one for 1/2^n.
In my opinion, it would also be good to keep a single complex pattern shared by fmul and fdiv, as in the current implementation.
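
To make the 2^fbits versus 1/2^fbits distinction concrete, here is a minimal sketch in plain C++. It is not the LLVM implementation: `fixed_point_fbits` is a hypothetical helper that works on a `double` instead of an `APFloat`, and the `is_reciprocal` flag is an assumption about how one check could serve both the fdiv and the fmul form.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: succeeds when `c` (or 1/c if `is_reciprocal` is set)
// is exactly 2^fbits with 1 <= fbits <= reg_width, and stores fbits.
bool fixed_point_fbits(double c, int reg_width, bool is_reciprocal, int &fbits) {
  // Rejects NaN, zero, negative values and infinities.
  if (!(c > 0.0) || !std::isfinite(c))
    return false;

  // frexp returns m in [0.5, 1) with c == m * 2^exp; for a power of two,
  // m is exactly 0.5.
  int exp = 0;
  if (std::frexp(c, &exp) != 0.5)
    return false;

  int log2c = exp - 1;  // c == 2^log2c
  int candidate = is_reciprocal ? -log2c : log2c;
  if (candidate < 1 || candidate > reg_width)
    return false;

  fbits = candidate;
  return true;
}
```

With `is_reciprocal` unset, the helper accepts the divisor of `fdiv X, 2.0`; with it set, it accepts the multiplier of `fmul X, 0.5`. Both cases yield the same `fbits` value of 1.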

I see. That makes sense, but we may need to take that route anyway. I worry that if we do it this way we will just end up in a loop, transforming fdiv to fmul and back again.

There is a generic DAG combine in this bit of code that does the inverse transform: https://github.com/llvm/llvm-project/blob/c2093b85044d87805c39267c65ac9032d5454e0e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L16543. It currently only triggers with UnsafeFPMath or AllowReciprocal, which is probably why it doesn't come up in the tests. According to the InstCombine version it should be fine for any constant that has an exact inverse (which seems the same as what you have here too), so should be more generally applicable.

I think my vote would still be for changing IntegerToFP to use fmul with a different ComplexPat, but if you do go this route it will need some way of preventing the infinite folding back and forth.
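
The folding loop described above can be illustrated with a toy model. This is a sketch in plain C++ under loud assumptions: `Node`, `generic_combine`, `target_combine` and `run_combines` are all hypothetical names, and the real DAGCombiner worklist behaves differently. The sketch only shows why two rewrites that are inverses of each other never reach a fixed point.

```cpp
#include <cassert>

enum class Op { FDiv, FMul };

// Toy node representing `X <op> constant`. Not a SelectionDAG node.
struct Node {
  Op op;
  double constant;
};

// Stand-in for the generic combine: fdiv X, C -> fmul X, 1/C.
bool generic_combine(Node &n) {
  if (n.op != Op::FDiv) return false;
  n = {Op::FMul, 1.0 / n.constant};
  return true;
}

// Stand-in for a target combine doing the inverse: fmul X, 1/C -> fdiv X, C.
bool target_combine(Node &n) {
  if (n.op != Op::FMul) return false;
  n = {Op::FDiv, 1.0 / n.constant};
  return true;
}

// Applies the combines until nothing changes or `max_iters` rounds have run.
// Returns the number of rewrites performed.
int run_combines(Node n, bool enable_target_combine, int max_iters) {
  int rewrites = 0;
  for (int i = 0; i < max_iters; ++i) {
    bool changed = generic_combine(n);
    if (changed) ++rewrites;
    if (enable_target_combine && target_combine(n)) {
      ++rewrites;
      changed = true;
    }
    if (!changed) break;
  }
  return rewrites;
}
```

With only the generic combine enabled, the toy loop settles after one rewrite. With both enabled, it keeps rewriting until the iteration cap stops it.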

Allen added a subscriber: Allen. Aug 2 2023, 2:35 AM

Sorry for the late reply, and thanks for the comment.
If possible, I would like to reuse the existing patterns.
Are there any more comments on, or objections to, this patch?

Unfortunately I believe this will get stuck in loops some of the time. Try https://godbolt.org/z/c6badjcb1 for example, when it runs under Ofast.

Ah, thanks for pointing out the Ofast option! I misunderstood one of your previous comments. I can see the loop between tryCombineFMULWithFDIV and DAGCombiner::visitFDIV.
I can also see the comment below about precision in DAGCombiner::visitFDIV. That is what I want to check for in this patch.

SDValue DAGCombiner::visitFDIV(SDNode *N) {
...
  if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()) {
    // fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.
    if (auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1))

Let me add a complex pattern with Options.UnsafeFPMath || Flags.hasAllowReciprocal().

Following @dmgreen's comment, added a complex pattern and updated patterns with it.

jaykang10 updated this revision to Diff 549012. Aug 10 2023, 6:17 AM

@dmgreen As you can see, DAGCombiner::visitFDIV does the conversion with Options.UnsafeFPMath || Flags.hasAllowReciprocal(), so the patterns with fdiv are not selected without the options.
To use only fmul, we would need to remove the Options.UnsafeFPMath || Flags.hasAllowReciprocal() check. Alternatively, we would need to support patterns with both fmul and fdiv, using different complex patterns. What do you think?

Harbormaster completed remote builds in B251671: Diff 549012. Aug 10 2023, 10:56 AM

In D156538#4576702, @jaykang10 wrote:

@dmgreen As you can see, DAGCombiner::visitFDIV does the conversion with Options.UnsafeFPMath || Flags.hasAllowReciprocal(), so the patterns with fdiv are not selected without the options.
To use only fmul, we would need to remove the Options.UnsafeFPMath || Flags.hasAllowReciprocal() check. Alternatively, we would need to support patterns with both fmul and fdiv, using different complex patterns. What do you think?

There is an instcombine version here: https://github.com/llvm/llvm-project/blob/c2093b85044d87805c39267c65ac9032d5454e0e/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp#L1546. It seems to do the transform whenever the value has an exact inverse: "// If the constant divisor has an exact inverse, this is always safe." Alive doesn't run to completion to prove it (https://alive2.llvm.org/ce/z/mxheqK), but doesn't come up with a failure in that time. Can we use the same logic to always do the transform? I don't know of any counterexamples where it wouldn't be true.
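
A small numeric experiment shows why the exact-inverse condition matters. The sketch below is plain C++ and only loosely modelled on the idea of an exact inverse. `has_exact_inverse` and `count_mismatches` are hypothetical helpers, the check operates on `float` and is not LLVM's own, and the test range is arbitrary.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical check: true when c is a normal power of two (of either sign)
// whose reciprocal is also a normal float.
bool has_exact_inverse(float c) {
  if (!std::isnormal(c)) return false;  // rejects 0, NaN, inf, subnormals
  int exp = 0;
  if (std::frexp(std::fabs(c), &exp) != 0.5f) return false;
  return std::isnormal(1.0f / c);
}

// Counts the integers x in [1, n] for which x / c and x * (1 / c)
// round to different floats.
int count_mismatches(float c, int n) {
  const float recip = 1.0f / c;
  int mismatches = 0;
  for (int i = 1; i <= n; ++i) {
    float x = static_cast<float>(i);
    if (x / c != x * recip) ++mismatches;
  }
  return mismatches;
}
```

For a divisor such as 8 the two forms agree on every value in the range, while for 3, whose reciprocal is not representable exactly, some values round differently. This assumes ordinary IEEE single-precision arithmetic without fast-math options.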

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3686	Could this be shared with SelectCVTFixedPosOperand? Maybe with a flag to specify whether the getExactInverse needs to be performed. If not, then maybe the ConstantFPSDNode/ConstantPoolSDNode stuff can be pulled out and shared?

In D156538#4579185, @dmgreen wrote:

In D156538#4576702, @jaykang10 wrote:

@dmgreen As you can see, DAGCombiner::visitFDIV does the conversion with Options.UnsafeFPMath || Flags.hasAllowReciprocal(), so the patterns with fdiv are not selected without the options.
To use only fmul, we would need to remove the Options.UnsafeFPMath || Flags.hasAllowReciprocal() check. Alternatively, we would need to support patterns with both fmul and fdiv, using different complex patterns. What do you think?

There is an instcombine version here: https://github.com/llvm/llvm-project/blob/c2093b85044d87805c39267c65ac9032d5454e0e/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp#L1546. It seems to do the transform whenever the value has an exact inverse: "// If the constant divisor has an exact inverse, this is always safe." Alive doesn't run to completion to prove it (https://alive2.llvm.org/ce/z/mxheqK), but doesn't come up with a failure in that time. Can we use the same logic to always do the transform? I don't know of any counterexamples where it wouldn't be true.

Ok, good.
Let me update this patch with the exact inverse.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3686	Yep, let me try to share the function.

Following @dmgreen's comment, updated patch.

Harbormaster completed remote builds in B251897: Diff 549312. Aug 11 2023, 2:55 AM

@dmgreen As you can see, there are some regressions.
After taking the exact inverse for fmul, convertToInteger sets IsExact to false in some cases.
Maybe we need to keep the fdiv patterns too...

In D156538#4579511, @jaykang10 wrote:

@dmgreen As you can see, there are some regressions.
After taking the exact inverse for fmul, convertToInteger sets IsExact to false in some cases.
Maybe we need to keep the fdiv patterns too...

I think that any tests with an fdiv that instcombine would convert to an fmul (like https://godbolt.org/z/P1bd73TK7) are OK to leave as regressions. We would not see the regression in practice, as instcombine has already performed the conversion to an fmul.

Do you know which cases that would not cover and would be left over if the fdiv was not handled?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16717	(On Diff #549312)	I think this can be dropped. If we want this transform for scalars it would be best to reuse the logic in DAGCombine.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
709–711	This can maybe be removed, as the class is only used as a ComplexPattern, not as an assembly Operand?
llvm/test/CodeGen/AArch64/svtcf-fmul-fdiv-combine.ll
5	Can you add some fp16 variants in here too?

Thanks for the comments.
If we add some patterns with fdiv, we can avoid the regressions. For example,

def : Pat<(f16 (fdiv (f16 (any_sint_to_fp (i32 GPR32:$Rn))), fixedpoint_f16_i32:$scale)),
          (SCVTFSWHri GPR32:$Rn, fixedpoint_f16_i32:$scale)>;
def : Pat<(f32 (fdiv (f32 (any_sint_to_fp (i32 GPR32:$Rn))), fixedpoint_f32_i32:$scale)),
          (SCVTFSWSri GPR32:$Rn, fixedpoint_f32_i32:$scale)>;
def : Pat<(f64 (fdiv (f64 (any_sint_to_fp (i32 GPR32:$Rn))), fixedpoint_f64_i32:$scale)),
          (SCVTFSWDri GPR32:$Rn, fixedpoint_f64_i32:$scale)>;

I will add these patterns in the next update. If you do not like them, please let me know and I will remove them.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16717	(On Diff #549312)	Let me remove it.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
709–711	Let me remove it.
llvm/test/CodeGen/AArch64/svtcf-fmul-fdiv-combine.ll
5	Let me add fp16 tests.

Following @dmgreen's comment, updated patch.

Harbormaster completed remote builds in B252339: Diff 549917. Aug 14 2023, 8:29 AM

I like it, the new patterns look good to me. LGTM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
5046	Should these be recip too? I'm not sure they need to be, but it might be better for them to be consistent.

This revision is now accepted and ready to land. Aug 15 2023, 1:35 AM

jaykang10 added inline comments. Aug 15 2023, 2:25 AM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
5046	Sorry for the mistake; I did not update it. It looks like TableGen does not complain about the inconsistency between the complex pattern operands in `InOperandList` and `Pattern`. As you mentioned, it would be good to use the same thing for consistency.

This revision was landed with ongoing or failed builds. Aug 15 2023, 2:59 AM

Closed by commit rG9f8dcb070655: [AArch64] Try to detect patterns with fdiv and fmul for [su]cvtf. (authored by jaykang10).

This revision was automatically updated to reflect the committed changes.

jaykang10 added a commit: rG9f8dcb070655: [AArch64] Try to detect patterns with fdiv and fmul for [su]cvtf.

Revision Contents

Path                                                    Size
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp         35 lines
llvm/lib/Target/AArch64/AArch64InstrFormats.td          58 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td             28 lines
llvm/test/CodeGen/AArch64/svtcf-fmul-fdiv-combine.ll    167 lines

Diff 550229

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

... (context: private:)
   template<unsigned RegWidth>
   bool SelectCVTFixedPosOperand(SDValue N, SDValue &FixedPos) {
     return SelectCVTFixedPosOperand(N, FixedPos, RegWidth);
   }
 
   bool SelectCVTFixedPosOperand(SDValue N, SDValue &FixedPos, unsigned Width);
 
+  template<unsigned RegWidth>
+  bool SelectCVTFixedPosRecipOperand(SDValue N, SDValue &FixedPos) {
+    return SelectCVTFixedPosRecipOperand(N, FixedPos, RegWidth);
+  }
+
+  bool SelectCVTFixedPosRecipOperand(SDValue N, SDValue &FixedPos,
+                                     unsigned Width);
+
   bool SelectCMP_SWAP(SDNode *N);
 
   bool SelectSVEAddSubImm(SDValue N, MVT VT, SDValue &Imm, SDValue &Shift);
   bool SelectSVECpyDupImm(SDValue N, MVT VT, SDValue &Imm, SDValue &Shift);
   bool SelectSVELogicalImm(SDValue N, MVT VT, SDValue &Imm, bool Invert);
 
   bool SelectSVESignedArithImm(SDValue N, SDValue &Imm);
   bool SelectSVEShiftImm(SDValue N, uint64_t Low, uint64_t High,
... (context: else if (VT == MVT::i64 && NewShiftAmt->getValueType(0) == MVT::i32) {)
     NewShiftAmt = SDValue(Ext, 0);
   }
 
   SDValue Ops[] = {N->getOperand(0), NewShiftAmt};
   CurDAG->SelectNodeTo(N, Opc, VT, Ops);
   return true;
 }
 
-bool
-AArch64DAGToDAGISel::SelectCVTFixedPosOperand(SDValue N, SDValue &FixedPos,
-                                              unsigned RegWidth) {
+static bool checkCVTFixedPointOperandWithFBits(SelectionDAG *CurDAG, SDValue N,
+                                               SDValue &FixedPos,
+                                               unsigned RegWidth,
+                                               bool isReciprocal) {
   APFloat FVal(0.0);
   if (ConstantFPSDNode *CN = dyn_cast<ConstantFPSDNode>(N))
     FVal = CN->getValueAPF();
   else if (LoadSDNode *LN = dyn_cast<LoadSDNode>(N)) {
     // Some otherwise illegal constants are allowed in this case.
     if (LN->getOperand(1).getOpcode() != AArch64ISD::ADDlow ||
         !isa<ConstantPoolSDNode>(LN->getOperand(1)->getOperand(1)))
       return false;
 
     ConstantPoolSDNode *CN =
         dyn_cast<ConstantPoolSDNode>(LN->getOperand(1)->getOperand(1));
     FVal = cast<ConstantFP>(CN->getConstVal())->getValueAPF();
   } else
     return false;
 
   // An FCVT[SU] instruction performs: convertToInt(Val * 2^fbits) where fbits
   // is between 1 and 32 for a destination w-register, or 1 and 64 for an
   // x-register.
   //
   // By this stage, we've detected (fp_to_[su]int (fmul Val, THIS_NODE)) so we
   // want THIS_NODE to be 2^fbits. This is much easier to deal with using
   // integers.
   bool IsExact;
 
+  if (isReciprocal)
+    if (!FVal.getExactInverse(&FVal))
+      return false;
+
   // fbits is between 1 and 64 in the worst-case, which means the fmul
   // could have 2^64 as an actual operand. Need 65 bits of precision.
   APSInt IntVal(65, true);
   FVal.convertToInteger(IntVal, APFloat::rmTowardZero, &IsExact);
 
   // N.b. isPowerOf2 also checks for > 0.
-  if (!IsExact || !IntVal.isPowerOf2()) return false;
+  if (!IsExact || !IntVal.isPowerOf2())
+    return false;
   unsigned FBits = IntVal.logBase2();
 
   // Checks above should have guaranteed that we haven't lost information in
   // finding FBits, but it must still be in range.
   if (FBits == 0 || FBits > RegWidth) return false;
 
   FixedPos = CurDAG->getTargetConstant(FBits, SDLoc(N), MVT::i32);
   return true;
 }
 
+bool AArch64DAGToDAGISel::SelectCVTFixedPosOperand(SDValue N, SDValue &FixedPos,
+                                                   unsigned RegWidth) {
+  return checkCVTFixedPointOperandWithFBits(CurDAG, N, FixedPos, RegWidth,
+                                            false);
+}
+
+bool AArch64DAGToDAGISel::SelectCVTFixedPosRecipOperand(SDValue N,
+                                                        SDValue &FixedPos,
+                                                        unsigned RegWidth) {
+  return checkCVTFixedPointOperandWithFBits(CurDAG, N, FixedPos, RegWidth,
+                                            true);
+}
+
 // Inspects a register string of the form o0:op1:CRn:CRm:op2 gets the fields
 // of the string and obtains the integer values from them and combines these
 // into a single value to be used in the MRS/MSR instruction.
 static int getIntOperandFromRegisterString(StringRef RegString) {
   SmallVector<StringRef, 5> Fields;
   RegString.split(Fields, ':');
 
   if (Fields.size() == 1)
...

llvm/lib/Target/AArch64/AArch64InstrFormats.td

...
 def fixedpoint_f16_i32 : fixedpoint_i32<f16>;
 def fixedpoint_f32_i32 : fixedpoint_i32<f32>;
 def fixedpoint_f64_i32 : fixedpoint_i32<f64>;
 
 def fixedpoint_f16_i64 : fixedpoint_i64<f16>;
 def fixedpoint_f32_i64 : fixedpoint_i64<f32>;
 def fixedpoint_f64_i64 : fixedpoint_i64<f64>;
 
+class fixedpoint_recip_i32<ValueType FloatVT>
+  : Operand<FloatVT>,
+    ComplexPattern<FloatVT, 1, "SelectCVTFixedPosRecipOperand<32>", [fpimm, ld]> {
+  let EncoderMethod = "getFixedPointScaleOpValue";
+  let DecoderMethod = "DecodeFixedPointScaleImm32";
+}
+
+class fixedpoint_recip_i64<ValueType FloatVT>
+  : Operand<FloatVT>,
+    ComplexPattern<FloatVT, 1, "SelectCVTFixedPosRecipOperand<64>", [fpimm, ld]> {
+  let EncoderMethod = "getFixedPointScaleOpValue";
+  let DecoderMethod = "DecodeFixedPointScaleImm64";
+}
+
+def fixedpoint_recip_f16_i32 : fixedpoint_recip_i32<f16>;
+def fixedpoint_recip_f32_i32 : fixedpoint_recip_i32<f32>;
+def fixedpoint_recip_f64_i32 : fixedpoint_recip_i32<f64>;
+
+def fixedpoint_recip_f16_i64 : fixedpoint_recip_i64<f16>;
+def fixedpoint_recip_f32_i64 : fixedpoint_recip_i64<f32>;
+def fixedpoint_recip_f64_i64 : fixedpoint_recip_i64<f64>;
+
 def vecshiftR8 : Operand<i32>, ImmLeaf<i32, [{
   return (((uint32_t)Imm) > 0) && (((uint32_t)Imm) < 9);
 }]> {
   let EncoderMethod = "getVecShiftR8OpValue";
   let DecoderMethod = "DecodeVecShiftR8Imm";
   let ParserMatchClass = Imm1_8Operand;
 }
 def vecshiftR16 : Operand<i32>, ImmLeaf<i32, [{
... (context: multiclass IntegerToFP<bit isUnsigned, string asm, SDPatternOperator node> {)
   }
 
   def UXDri: BaseIntegerToFPUnscaled<isUnsigned, GPR64, FPR64, f64, asm, node> {
     let Inst{31} = 1; // 64-bit GPR flag
     let Inst{23-22} = 0b01; // 64-bit FPR flag
   }
 
   // Scaled
-  def SWHri: BaseIntegerToFP<isUnsigned, GPR32, FPR16, fixedpoint_f16_i32, asm,
+  def SWHri: BaseIntegerToFP<isUnsigned, GPR32, FPR16, fixedpoint_recip_f16_i32, asm,
                              [(set (f16 FPR16:$Rd),
-                                   (fdiv (node GPR32:$Rn),
-                                         fixedpoint_f16_i32:$scale))]> {
+                                   (fmul (node GPR32:$Rn),
+                                         fixedpoint_recip_f16_i32:$scale))]> {
     let Inst{31} = 0; // 32-bit GPR flag
     let Inst{23-22} = 0b11; // 16-bit FPR flag
     let scale{5} = 1;
     let Predicates = [HasFullFP16];
   }
 
-  def SWSri: BaseIntegerToFP<isUnsigned, GPR32, FPR32, fixedpoint_f32_i32, asm,
+  def SWSri: BaseIntegerToFP<isUnsigned, GPR32, FPR32, fixedpoint_recip_f32_i32, asm,
                              [(set FPR32:$Rd,
-                                   (fdiv (node GPR32:$Rn),
-                                         fixedpoint_f32_i32:$scale))]> {
+                                   (fmul (node GPR32:$Rn),
+                                         fixedpoint_recip_f32_i32:$scale))]> {
     let Inst{31} = 0; // 32-bit GPR flag
     let Inst{23-22} = 0b00; // 32-bit FPR flag
     let scale{5} = 1;
   }
 
-  def SWDri: BaseIntegerToFP<isUnsigned, GPR32, FPR64, fixedpoint_f64_i32, asm,
+  def SWDri: BaseIntegerToFP<isUnsigned, GPR32, FPR64, fixedpoint_recip_f64_i32, asm,
                              [(set FPR64:$Rd,
-                                   (fdiv (node GPR32:$Rn),
-                                         fixedpoint_f64_i32:$scale))]> {
+                                   (fmul (node GPR32:$Rn),
+                                         fixedpoint_recip_f64_i32:$scale))]> {
     let Inst{31} = 0; // 32-bit GPR flag
     let Inst{23-22} = 0b01; // 64-bit FPR flag
     let scale{5} = 1;
   }
 
-  def SXHri: BaseIntegerToFP<isUnsigned, GPR64, FPR16, fixedpoint_f16_i64, asm,
+  def SXHri: BaseIntegerToFP<isUnsigned, GPR64, FPR16, fixedpoint_recip_f16_i64, asm,
                              [(set (f16 FPR16:$Rd),
-                                   (fdiv (node GPR64:$Rn),
-                                         fixedpoint_f16_i64:$scale))]> {
+                                   (fmul (node GPR64:$Rn),
+                                         fixedpoint_recip_f16_i64:$scale))]> {
     let Inst{31} = 1; // 64-bit GPR flag
     let Inst{23-22} = 0b11; // 16-bit FPR flag
     let Predicates = [HasFullFP16];
}		}

def SXSri: BaseIntegerToFP<isUnsigned, GPR64, FPR32, fixedpoint_f32_i64, asm,		def SXSri: BaseIntegerToFP<isUnsigned, GPR64, FPR32, fixedpoint_recip_f32_i64, asm,
		dmgreen: Should these be recip too? I'm not sure they need to be, but it might be better for them to be consistent.
		jaykang10 (Author): Sorry for the mistake; I did not update it. It looks like TableGen does not complain about the inconsistency between complex pattern operands in `InOperandList` and `Pattern`. As you mentioned, it would be good to use the same operand for consistency.
[(set FPR32:$Rd,		[(set FPR32:$Rd,
(fdiv (node GPR64:$Rn),		(fmul (node GPR64:$Rn),
fixedpoint_f32_i64:$scale))]> {		fixedpoint_recip_f32_i64:$scale))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Inst{23-22} = 0b00; // 32-bit FPR flag		let Inst{23-22} = 0b00; // 32-bit FPR flag
}		}

def SXDri: BaseIntegerToFP<isUnsigned, GPR64, FPR64, fixedpoint_f64_i64, asm,		def SXDri: BaseIntegerToFP<isUnsigned, GPR64, FPR64, fixedpoint_recip_f64_i64, asm,
[(set FPR64:$Rd,		[(set FPR64:$Rd,
(fdiv (node GPR64:$Rn),		(fmul (node GPR64:$Rn),
fixedpoint_f64_i64:$scale))]> {		fixedpoint_recip_f64_i64:$scale))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Inst{23-22} = 0b01; // 64-bit FPR flag		let Inst{23-22} = 0b01; // 64-bit FPR flag
}		}
}		}
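
The `let scale{5} = 1;` lines in the 32-bit GPR (`SW*ri`) variants above can be read against the fixed-point scale encoding. Assuming the A64 encoding in which the 6-bit `scale` field holds 64 minus the number of fractional bits, a scale of 1 to 32 fractional bits always has bit 5 set. The function names below are hypothetical and only illustrate that arithmetic.

```cpp
#include <cassert>

// Hypothetical helper: 6-bit scale field, assumed to be 64 - fbits.
unsigned encodeScale(unsigned fbits) {
  assert(fbits >= 1 && fbits <= 64);
  return 64 - fbits;
}

// True when bit 5 of the encoded scale is set, i.e. when fbits <= 32.
bool scaleBit5Set(unsigned fbits) {
  return ((encodeScale(fbits) >> 5) & 1u) != 0;
}
```

Under this assumption, `#1` encodes as 63 and `#32` as 32, both with bit 5 set, while `#33` and above clear it and so are only meaningful for the 64-bit GPR forms.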

//---		//---
// Unscaled integer <-> floating point conversion (i.e. FMOV)		// Unscaled integer <-> floating point conversion (i.e. FMOV)
//---		//---

llvm/lib/Target/AArch64/AArch64InstrInfo.td


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Scaled integer to floating point conversion instructions.			// Scaled integer to floating point conversion instructions.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	defm SCVTF : IntegerToFP<0, "scvtf", any_sint_to_fp>;			defm SCVTF : IntegerToFP<0, "scvtf", any_sint_to_fp>;
	defm UCVTF : IntegerToFP<1, "ucvtf", any_uint_to_fp>;			defm UCVTF : IntegerToFP<1, "ucvtf", any_uint_to_fp>;

				def : Pat<(f16 (fdiv (f16 (any_sint_to_fp (i32 GPR32:$Rn))), fixedpoint_f16_i32:$scale)),
				(SCVTFSWHri GPR32:$Rn, fixedpoint_f16_i32:$scale)>;
				def : Pat<(f32 (fdiv (f32 (any_sint_to_fp (i32 GPR32:$Rn))), fixedpoint_f32_i32:$scale)),
				(SCVTFSWSri GPR32:$Rn, fixedpoint_f32_i32:$scale)>;
				def : Pat<(f64 (fdiv (f64 (any_sint_to_fp (i32 GPR32:$Rn))), fixedpoint_f64_i32:$scale)),
				(SCVTFSWDri GPR32:$Rn, fixedpoint_f64_i32:$scale)>;

				def : Pat<(f16 (fdiv (f16 (any_sint_to_fp (i64 GPR64:$Rn))), fixedpoint_f16_i64:$scale)),
				(SCVTFSXHri GPR64:$Rn, fixedpoint_f16_i64:$scale)>;
				def : Pat<(f32 (fdiv (f32 (any_sint_to_fp (i64 GPR64:$Rn))), fixedpoint_f32_i64:$scale)),
				(SCVTFSXSri GPR64:$Rn, fixedpoint_f32_i64:$scale)>;
				def : Pat<(f64 (fdiv (f64 (any_sint_to_fp (i64 GPR64:$Rn))), fixedpoint_f64_i64:$scale)),
				(SCVTFSXDri GPR64:$Rn, fixedpoint_f64_i64:$scale)>;

				def : Pat<(f16 (fdiv (f16 (any_uint_to_fp (i64 GPR64:$Rn))), fixedpoint_f16_i64:$scale)),
				(UCVTFSXHri GPR64:$Rn, fixedpoint_f16_i64:$scale)>;
				def : Pat<(f32 (fdiv (f32 (any_uint_to_fp (i64 GPR64:$Rn))), fixedpoint_f32_i64:$scale)),
				(UCVTFSXSri GPR64:$Rn, fixedpoint_f32_i64:$scale)>;
				def : Pat<(f64 (fdiv (f64 (any_uint_to_fp (i64 GPR64:$Rn))), fixedpoint_f64_i64:$scale)),
				(UCVTFSXDri GPR64:$Rn, fixedpoint_f64_i64:$scale)>;

				def : Pat<(f16 (fdiv (f16 (any_uint_to_fp (i32 GPR32:$Rn))), fixedpoint_f16_i32:$scale)),
				(UCVTFSWHri GPR32:$Rn, fixedpoint_f16_i32:$scale)>;
				def : Pat<(f32 (fdiv (f32 (any_uint_to_fp (i32 GPR32:$Rn))), fixedpoint_f32_i32:$scale)),
				(UCVTFSWSri GPR32:$Rn, fixedpoint_f32_i32:$scale)>;
				def : Pat<(f64 (fdiv (f64 (any_uint_to_fp (i32 GPR32:$Rn))), fixedpoint_f64_i32:$scale)),
				(UCVTFSWDri GPR32:$Rn, fixedpoint_f64_i32:$scale)>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Unscaled integer to floating point conversion instruction.			// Unscaled integer to floating point conversion instruction.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	defm FMOV : UnscaledConversion<"fmov">;			defm FMOV : UnscaledConversion<"fmov">;

	// Add pseudo ops for FMOV 0 so we can mark them as isReMaterializable			// Add pseudo ops for FMOV 0 so we can mark them as isReMaterializable
	let isReMaterializable = 1, isCodeGenOnly = 1, isAsCheapAsAMove = 1 in {			let isReMaterializable = 1, isCodeGenOnly = 1, isAsCheapAsAMove = 1 in {

llvm/test/CodeGen/AArch64/svtcf-fmul-fdiv-combine.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: llc -mtriple aarch64-none-linux-gnu -enable-unsafe-fp-math -mattr=+fullfp16 < %s | FileCheck %s

				define half @scvtf_f16_2(i32 %state) {
				; CHECK-LABEL: scvtf_f16_2:
				dmgreen: Can you add some fp16 variants in here too?
				jaykang10 (Author): Let me add fp16 tests.
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf h0, w0, #1
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to half
				%div = fmul half %conv, 5.000000e-01
				ret half %div
				}

				define half @scvtf_f16_4(i32 %state) {
				; CHECK-LABEL: scvtf_f16_4:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf h0, w0, #2
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to half
				%div = fmul half %conv, 2.500000e-01
				ret half %div
				}

				define half @scvtf_f16_8(i32 %state) {
				; CHECK-LABEL: scvtf_f16_8:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf h0, w0, #3
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to half
				%div = fmul half %conv, 1.250000e-01
				ret half %div
				}

				define half @scvtf_f16_16(i32 %state) {
				; CHECK-LABEL: scvtf_f16_16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf h0, w0, #4
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to half
				%div = fmul half %conv, 6.250000e-02
				ret half %div
				}

				define half @scvtf_f16_32(i32 %state) {
				; CHECK-LABEL: scvtf_f16_32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf h0, w0, #5
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to half
				%div = fmul half %conv, 3.125000e-02
				ret half %div
				}

				define float @scvtf_f32_2(i32 %state) {
				; CHECK-LABEL: scvtf_f32_2:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf s0, w0, #1
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to float
				%div = fmul float %conv, 5.000000e-01
				ret float %div
				}

				define float @scvtf_f32_4(i32 %state) {
				; CHECK-LABEL: scvtf_f32_4:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf s0, w0, #2
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to float
				%div = fmul float %conv, 2.500000e-01
				ret float %div
				}

				define float @scvtf_f32_8(i32 %state) {
				; CHECK-LABEL: scvtf_f32_8:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf s0, w0, #3
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to float
				%div = fmul float %conv, 1.250000e-01
				ret float %div
				}

				define float @scvtf_f32_16(i32 %state) {
				; CHECK-LABEL: scvtf_f32_16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf s0, w0, #4
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to float
				%div = fmul float %conv, 6.250000e-02
				ret float %div
				}

				define float @scvtf_f32_32(i32 %state) {
				; CHECK-LABEL: scvtf_f32_32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf s0, w0, #5
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i32 %state to float
				%div = fmul float %conv, 3.125000e-02
				ret float %div
				}

				define double @scvtf_f64_2(i64 %state) {
				; CHECK-LABEL: scvtf_f64_2:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf d0, x0, #1
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i64 %state to double
				%div = fmul double %conv, 5.000000e-01
				ret double %div
				}

				define double @scvtf_f64_4(i64 %state) {
				; CHECK-LABEL: scvtf_f64_4:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf d0, x0, #2
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i64 %state to double
				%div = fmul double %conv, 2.500000e-01
				ret double %div
				}

				define double @scvtf_f64_8(i64 %state) {
				; CHECK-LABEL: scvtf_f64_8:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf d0, x0, #3
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i64 %state to double
				%div = fmul double %conv, 1.250000e-01
				ret double %div
				}

				define double @scvtf_f64_16(i64 %state) {
				; CHECK-LABEL: scvtf_f64_16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf d0, x0, #4
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i64 %state to double
				%div = fmul double %conv, 6.250000e-02
				ret double %div
				}

				define double @scvtf_f64_32(i64 %state) {
				; CHECK-LABEL: scvtf_f64_32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: scvtf d0, x0, #5
				; CHECK-NEXT: ret
				entry:
				%conv = sitofp i64 %state to double
				%div = fmul double %conv, 3.125000e-02
				ret double %div
				}
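
The tests above expect `sitofp` followed by `fmul` with 2^-n to select a single `scvtf` with `#n`. A minimal sketch of why the two forms agree for the i32-to-float cases, assuming that the fixed-point form of `scvtf` computes the integer divided by 2^n with a single rounding. The function names are hypothetical, and the code models the arithmetic only, not the instruction selector.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Two-step form seen in the IR: convert to float, then multiply by 2^-n.
float convertThenScale(std::int32_t x, int n) {
  assert(n >= 1 && n <= 32);
  return static_cast<float>(x) * std::ldexp(1.0f, -n);
}

// Assumed model of `scvtf s0, w0, #n`: x / 2^n, rounded once to float.
// double holds every int32_t exactly and scaling by a power of two is exact,
// so the only rounding happens in the final cast.
float fixedPointConvert(std::int32_t x, int n) {
  assert(n >= 1 && n <= 32);
  return static_cast<float>(std::ldexp(static_cast<double>(x), -n));
}
```

For example, both functions give 3.5f for x = 7 and n = 1. The half-precision cases are not modeled here.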

Diff 550229
