This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/lib/Target/AArch64/AArch64InstrFormats.td
- llvm/lib/Target/AArch64/AArch64InstrInfo.td
- llvm/test/CodeGen/AArch64/fcvt-fixed.ll
- llvm/test/CodeGen/AArch64/fp16_intrinsic_scalar_2op.ll

Differential D127158

[AArch64] Add intrinsic support for gpr<->fpr flavors of fixed-point converts
Needs Review · Public

Authored by rmcclure on Jun 6 2022, 2:58 PM.

Details

Reviewers

dmgreen
SjoerdMeijer

Summary

This patch adds patterns to generate (for example) "UCVTF Dd, Wn, #imm" from a call to "aarch64.neon.vcvtfxu2fp.f64.i32" (and similar for other fixed-point converts).
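For reference, with a #imm operand these converts treat the integer as a fixed-point value with imm fractional bits, so UCVTF Dd, Wn, #imm yields Wn / 2^imm. The C sketch below models only that arithmetic for the 32-bit-source, double-result forms. The function names are hypothetical helpers, not LLVM or ACLE APIs. For these operand sizes the result is exact, so rounding modes don't matter.

```c
#include <assert.h>
#include <stdint.h>

/* 2^n as a double; exact for the small n used here. */
static double pow2(unsigned n) {
    double p = 1.0;
    while (n--)
        p *= 2.0;
    return p;
}

/* Models UCVTF Dd, Wn, #fbits (1 <= fbits <= 32): Wn is an unsigned
   fixed-point value with fbits fractional bits. Every uint32_t fits exactly
   in a double and dividing by a power of two is exact here. */
static double ucvtf_d_w_fixed(uint32_t wn, unsigned fbits) {
    return (double)wn / pow2(fbits);
}

/* Models SCVTF Dd, Wn, #fbits: the same, but Wn is signed. */
static double scvtf_d_w_fixed(int32_t wn, unsigned fbits) {
    return (double)wn / pow2(fbits);
}
```

In IR terms, this corresponds to what a call such as `@llvm.aarch64.neon.vcvtfxu2fp.f64.i32(i32 %x, i32 10)` is expected to produce.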

Diff Detail

Unit Tests: Failed

	Time	Test
	60,140 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,100 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	40 ms	x64 debian > LLVM.CodeGen/AArch64::arm64-fixed-point-scalar-cvt-dagcombine.ll

Event Timeline

rmcclure created this revision. Jun 6 2022, 2:58 PM

Herald added a project: Restricted Project. Jun 6 2022, 2:58 PM

Herald added subscribers: hiraditya, kristof.beyls.

rmcclure requested review of this revision. Jun 6 2022, 2:58 PM

Herald added a project: Restricted Project. Jun 6 2022, 2:58 PM

Herald added a subscriber: llvm-commits.

One of the existing tests (do_stuff in CodeGen/AArch64/arm64-fixed-point-scalar-cvt-dagcombine.ll) fails with this change, because we now generate an fmov followed by a gpr->fpr ucvtf, instead of the expected fpr->fpr ucvtf. Of course, other tests show that we now avoid an unnecessary fmov after this change.
Is there a preferred method for making a better decision for when to use the gpr->fpr or fpr->fpr flavor of these instructions?
I see the AArch64AdvSIMDScalarPass pass which converts certain GPR ops into AdvSIMD scalar ops when it would save on copies, which sounds mildly similar to the problem I'd like to solve here. Would it be reasonable to leverage that pass for this purpose?
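As a strawman for choosing between the two flavors, a post-isel fixup could count how many cross-bank copies each form would need and pick the cheaper one. The count depends on where the operand lives and where its user wants the result. The C sketch below is a hypothetical toy cost model for that idea only. It is not how AArch64AdvSIMDScalarPass actually makes its decisions, and all types and names are invented for illustration.

```c
#include <assert.h>

/* Register banks an operand can live in. */
enum bank { BANK_GPR, BANK_FPR };

/* One encoding of a scalar fixed-point convert: which bank it reads and
   which bank it writes. */
struct cvt_form {
    const char *asm_text;
    enum bank src;
    enum bank dst;
};

/* Hypothetical candidate forms for an i32 -> f32 fixed-point convert. */
static const struct cvt_form scvtf_s_forms[] = {
    { "scvtf s0, w0, #n", BANK_GPR, BANK_FPR },
    { "scvtf s0, s0, #n", BANK_FPR, BANK_FPR },
};

/* Number of cross-bank copies (fmov) needed around the convert. */
static int copies_needed(const struct cvt_form *f, enum bank operand_in,
                         enum bank result_wanted) {
    return (f->src != operand_in) + (f->dst != result_wanted);
}

/* Picks the form needing the fewest copies; ties keep the earlier form. */
static const struct cvt_form *pick_form(const struct cvt_form *forms, int n,
                                        enum bank operand_in,
                                        enum bank result_wanted) {
    const struct cvt_form *best = &forms[0];
    for (int i = 1; i < n; ++i)
        if (copies_needed(&forms[i], operand_in, result_wanted) <
            copies_needed(best, operand_in, result_wanted))
            best = &forms[i];
    return best;
}
```

When the operand is already in an FPR, this model prefers the fpr->fpr form. That form needs no fmov, which matches what the existing dagcombine test expects.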

Harbormaster completed remote builds in B168163: Diff 434621. Jun 6 2022, 3:39 PM

Hello - Can you give more details about why we want these to be added? Are we missing ACLE intrinsics that clang doesn't have, or are some intrinsics broken? I don't see definitions for vcvtd_n_f64_s32 in the ACLE for example.

In D127158#3566948, @dmgreen wrote:

Hello - Can you give more details about why we want these to be added? Are we missing ACLE intrinsics that clang doesn't have, or are some intrinsics broken? I don't see definitions for vcvtd_n_f64_s32 in the ACLE for example.

The main goal is just to expose these instructions to anyone who might want them, since they expose functionality that isn't available with the existing intrinsic support.

I agree that vcvtd_n_f64_s32 doesn't exist, but vcvth_n_s64_f16 does, for example, which is where things get a bit more interesting.

In mainline, LLVM currently produces something like the following:

fcvtzs  h0, h0, #10
fmov    x0, d0

This technically matches what the ACLE lists as the generated instruction (FCVTZS Hd,Hn,#n), but that seems wrong to me: the fmov zero-extends the 16-bit result into x0, which drops the sign information (e.g. converting -1.0 produces a positive result).
For comparison, after this change, LLVM generates:

fcvtzs  x0, h0, #10

which preserves the sign information. It also supports results that exceed the range of a 16-bit integer, which seems sensible given the int64_t return type.

Two other points of reference:
- For the example above, GCC also generates fcvtzs x0, h0, #10.
- For vcvth_s64_f16 (the fp->integer flavor), mainline LLVM (and GCC) generates fcvtzs x0, h0. This technically violates the ACLE spec in the same way I described above (the spec expects FCVTZS Hd,Hn), so making the fixed-point<->fp converts behave similarly seems reasonable.
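To illustrate the sign problem, here is a rough C model of both sequences for vcvth_n_s64_f16(x, 10). It assumes the usual FCVTZS behavior: scale by 2^fbits, round toward zero, and saturate to the destination width. It also assumes that writing h0 zeroes the rest of the register, so the following fmov x0, d0 zero-extends the 16-bit result. The fp16 input is represented as a double, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2^n as a double; exact for the small n used here. */
static double pow2(unsigned n) {
    double p = 1.0;
    while (n--)
        p *= 2.0;
    return p;
}

/* Models FCVTZS <Rd>, Hn, #fbits: scale by 2^fbits, round toward zero and
   saturate to a signed `width`-bit integer (width 16, 32 or 64). A double
   holds every fp16 value exactly, and scaling by a power of two is exact. */
static int64_t fcvtzs_fixed(double x, unsigned fbits, unsigned width) {
    if (x != x)
        return 0;                          /* NaN converts to 0 */
    double scaled = x * pow2(fbits);
    double limit = pow2(width - 1);        /* 2^(width-1) */
    if (scaled >= limit)
        return width == 64 ? INT64_MAX : (int64_t)limit - 1;
    if (scaled <= -limit)
        return width == 64 ? INT64_MIN : -(int64_t)limit;
    return (int64_t)scaled;                /* the cast truncates toward zero */
}

/* Mainline: fcvtzs h0, h0, #10 ; fmov x0, d0. The 16-bit result lands in
   h0 with the rest of the register zeroed, so x0 sees it zero-extended. */
static int64_t vcvth_n_s64_f16_mainline(double h) {
    uint16_t h0 = (uint16_t)fcvtzs_fixed(h, 10, 16);
    return (int64_t)h0;
}

/* With the patch: fcvtzs x0, h0, #10 saturates straight to 64 bits. */
static int64_t vcvth_n_s64_f16_patched(double h) {
    return fcvtzs_fixed(h, 10, 64);
}
```

With these definitions, -1.0 comes out as 64512 from the mainline sequence and as -1024 from the patched one. Likewise, 100.0 saturates to 32767 in the mainline sequence but yields 102400 with the patch.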

I agree that the LLVM intrinsics should convert between the exact types listed in the signature of the intrinsic, but we need to make sure we still have an intrinsic that produces "scvtf h0, h0, #16" etc.

This patch also seems like it's changing a few too many things at once... can you split the patch into pieces that are easier to review?

For something like "llvm.aarch64.neon.vcvtfp2fxs.i32.f32", I guess there are two possible instructions with effectively equivalent semantics. Ideally, the compiler is just clever enough to figure out the best one from the context. Given the way SelectionDAG isel works, you probably have to do some sort of post-isel fixup like AArch64AdvSIMDScalarPass.

In terms of what clang generates, we have to follow the NEON spec; we can't use an instruction that produces a different result just because it's more useful. So if the spec says the conversion produces a 16-bit integer, we have to produce a 16-bit integer. That said, it's a little suspicious that the spec says vcvth_n_s16_f16, vcvth_n_s32_f16, and vcvth_n_s64_f16 all produce exactly the same instruction; maybe there's a spec bug? @dmgreen @fpetrogalli can you confirm whether the spec is right?

@dmgreen @fpetrogalli can you confirm whether the spec is right?

Thanks for the info. I've asked internally if anyone has a clear memory. It does look like GCC produces the x/w variants: https://godbolt.org/z/1GqeGv9xn

The main goal is just to expose these instructions to anyone who might want them, since they expose functionality that isn't available with the existing intrinsic support.

OK, I see. My main reason for asking was whether the fp_to_si(fmul(x, C)) form would be acceptable for your use-case. Compared to the @llvm.aarch64.neon... intrinsics, plain IR instructions, whilst not always perfectly identical, do have certain benefits. It depends on what the user is after, but they benefit from all the constant folding/range analysis/vectorization/etc. that can happen in the mid-end, whereas the NEON intrinsics often remain black boxes to optimizations. The intrinsics would probably be more accurately specified as fptosi_sat(fmul(x, C)), so long as the constants were precise, but I'm not sure whether there is lowering for that yet.

If the fptosi_sat(fmul(x, C)) form is precisely equivalent to the intrinsics, my opinion would be to remove the @llvm.aarch64.neon.vcvt.. intrinsics entirely and rely on pure codegen. The compiler can always produce a worse result by mis-optimizing, but the chances of improvement usually outweigh the downsides. (I'm not sure it works in all cases, though, since 2^16 can't be represented as an fp16.)
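One way to see where fptosi_sat(fmul(x, C)) and the instruction diverge for fp16 is to model both in C. This sketch makes three assumptions: x is an fp16 value (carried in a double), C = 2^fbits is materialized as an fp16 constant, and the fmul is performed in fp16. The helper names are made up for illustration. Besides the 2^16 case, it also shows a second divergence. An fp16 product above 65504 overflows to infinity even when C itself is representable, whereas the instruction scales without intermediate rounding.

```c
#include <assert.h>
#include <stdint.h>

#define FP16_MAX 65504.0   /* largest finite IEEE binary16 value */

static double pow2(unsigned n) {
    double p = 1.0;
    while (n--)
        p *= 2.0;
    return p;
}

/* fptosi.sat to i32: NaN -> 0, out-of-range values clamp, else truncate. */
static int32_t sat_i32(double v) {
    if (v != v)
        return 0;
    if (v >= 2147483648.0)
        return INT32_MAX;
    if (v <= -2147483648.0)
        return INT32_MIN;
    return (int32_t)v;
}

/* What FCVTZS Wd, Hn, #fbits computes: x * 2^fbits with no intermediate
   rounding to fp16, then a saturating truncation. */
static int32_t fcvtzs_w_h_fixed(double x, unsigned fbits) {
    return sat_i32(x * pow2(fbits));
}

/* fptosi_sat(fmul half x, C) with C = 2^fbits as an fp16 constant.
   x is an fp16 value and C a power of two, so the only rounding the fp16
   fmul can do is overflow to +/-inf; C itself is +inf once fbits > 15. */
static int32_t fptosi_sat_fmul_f16(double x, unsigned fbits) {
    if (x != x || x == 0.0)
        return 0;                /* NaN -> 0; 0 * C is 0 or NaN -> 0 */
    if (fbits > 15)
        return x > 0 ? INT32_MAX : INT32_MIN;   /* x * inf = +/-inf */
    double product = x * pow2(fbits);
    if (product > FP16_MAX)
        return INT32_MAX;        /* fp16 product overflowed to +inf */
    if (product < -FP16_MAX)
        return INT32_MIN;        /* fp16 product overflowed to -inf */
    return sat_i32(product);
}
```

For example, with x = 2^-14 and fbits = 16, the instruction model returns 4 while the fmul form saturates to INT32_MAX. With x = 60000 and fbits = 10, the instruction model gives 61440000 versus INT32_MAX.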

we need to make sure we still have an intrinsic that produces "scvtf h0, h0, #16" etc.

That particular example is easy, since there's only one instruction form that does the int16_t -> float16_t conversion.
For the cases where there are multiple choices, are you saying there should be a way to force a particular form, even if it is suboptimal? e.g. something to force generation of scvtf s0, w0, #10 instead of scvtf s0, s0, #10, even if the source is already in an FPR?

For something like "llvm.aarch64.neon.vcvtfp2fxs.i32.f32", I guess there are two possible instructions with effectively equivalent semantics. Ideally, the compiler is just clever enough to figure out the best one from the context. Given the way SelectionDAG isel works, you probably have to do some sort of post-isel fixup like AArch64AdvSIMDScalarPass.

Yes, this was my thought process, too. If AArch64AdvSIMDScalarPass is a reasonable place to add that "cleverness", I'm happy to add it there.

In terms of what clang generates, we have to follow the NEON spec; we can't use an instruction that produces a different result just because it's more useful. So if the spec says the conversion produces a 16-bit integer, we have to produce a 16-bit integer.

Agreed! My claim is that the spec is currently inconsistent here (e.g. vcvth_n_s64_f16 should return an int64_t, but the instruction used in the spec returns an int16_t). This change causes LLVM to match the source/return type specified in the spec, rather than the instruction specified in the spec (as mentioned, this causes us to match LLVM's behavior for the integer <-> fp converts, where the spec is similarly inconsistent).

This patch also seems like it's changing a few too many things at once... can you split the patch into pieces that are easier to review?

Hm... the patch effectively has 3 components:

1. adds patterns to map the intrinsics to the instructions
2. relocates the fp_to_si(fmul(x, C)) (and similar) patterns
3. adds tests for the new patterns

All 3 components seem mostly inseparable (the patterns in #2 are covered by existing tests, so I'm loath to remove them).
Having said that, I could separate the "fixed-point -> fp" and "fp -> fixed-point" changes, but given how similar the two directions are to each other, I thought it made more sense to keep them together than to separate them.

My main reason for asking was whether the fp_to_si(fmul(x, C)) form would be acceptable to your use-case.

Unfortunately, that form isn't sufficient for my use-case. As you mentioned, some fixed-point positions can't be expressed that way: 2^16 isn't representable as an fp16, and scvtf h0, x0, #n accepts an immediate of up to 64.
Also, in my experience some optimization passes tend to rewrite the sequence into a form that the backend no longer recognizes (this probably applies more to the fdiv(si_to_fp(x), C) patterns).

dkreitzer added a subscriber: dkreitzer. Jun 15 2022, 9:09 AM

Ping! Any further thoughts on this change? To summarize the open issues:

- arm64-fixed-point-scalar-cvt-dagcombine.ll fails because we now select a different flavor of the convert than the test expects. My plan here would be to temporarily disable the test, and then submit a follow-up change that lets LLVM select the (approximately) best flavor depending on which register file the inputs/outputs prefer to be in.
- This technically violates the ACLE spec for these intrinsics. I believe the ACLE spec should be changed, but since LLVM already violates the spec for the int <-> fp converts, I don't believe that this change should be gated on the ACLE spec change.

Hello. I think the first step needs to be to update the ACLE specification, to make sure the changes are the correct approach to take. The specification is at https://github.com/ARM-software/acle and accepts pull requests. With GCC implementing the other semantics for the fp16 scalar vector intrinsics, there should be a high chance such a change would be accepted.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64InstrFormats.td	87 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	66 lines
llvm/test/CodeGen/AArch64/fcvt-fixed.ll	173 lines
llvm/test/CodeGen/AArch64/fp16_intrinsic_scalar_2op.ll	44 lines

Diff 434621

llvm/lib/Target/AArch64/AArch64InstrFormats.td

Show First 20 Lines • Show All 634 Lines • ▼ Show 20 Lines
def fixedpoint_f16_i32 : fixedpoint_i32<f16>;		def fixedpoint_f16_i32 : fixedpoint_i32<f16>;
def fixedpoint_f32_i32 : fixedpoint_i32<f32>;		def fixedpoint_f32_i32 : fixedpoint_i32<f32>;
def fixedpoint_f64_i32 : fixedpoint_i32<f64>;		def fixedpoint_f64_i32 : fixedpoint_i32<f64>;

def fixedpoint_f16_i64 : fixedpoint_i64<f16>;		def fixedpoint_f16_i64 : fixedpoint_i64<f16>;
def fixedpoint_f32_i64 : fixedpoint_i64<f32>;		def fixedpoint_f32_i64 : fixedpoint_i64<f32>;
def fixedpoint_f64_i64 : fixedpoint_i64<f64>;		def fixedpoint_f64_i64 : fixedpoint_i64<f64>;

		def fixedpoint_i64_literal : Operand<Any>, ImmLeaf<i32, [{
		return (((uint32_t)Imm) > 0) && (((uint32_t)Imm) < 65);
		}]> {
		let EncoderMethod = "getFixedPointScaleOpValue";
		let DecoderMethod = "DecodeFixedPointScaleImm64";
		let ParserMatchClass = Imm1_64Operand;
		}
		def fixedpoint_i32_literal : Operand<Any>, ImmLeaf<i32, [{
		return (((uint32_t)Imm) > 0) && (((uint32_t)Imm) < 33);
		}]> {
		let EncoderMethod = "getFixedPointScaleOpValue";
		let DecoderMethod = "DecodeFixedPointScaleImm32";
		let ParserMatchClass = Imm1_32Operand;
		}

def vecshiftR8 : Operand<i32>, ImmLeaf<i32, [{		def vecshiftR8 : Operand<i32>, ImmLeaf<i32, [{
return (((uint32_t)Imm) > 0) && (((uint32_t)Imm) < 9);		return (((uint32_t)Imm) > 0) && (((uint32_t)Imm) < 9);
}]> {		}]> {
let EncoderMethod = "getVecShiftR8OpValue";		let EncoderMethod = "getVecShiftR8OpValue";
let DecoderMethod = "DecodeVecShiftR8Imm";		let DecoderMethod = "DecodeVecShiftR8Imm";
let ParserMatchClass = Imm1_8Operand;		let ParserMatchClass = Imm1_8Operand;
}		}
def vecshiftR16 : Operand<i32>, ImmLeaf<i32, [{		def vecshiftR16 : Operand<i32>, ImmLeaf<i32, [{
▲ Show 20 Lines • Show All 3,996 Lines • ▼ Show 20 Lines	def UXDr : BaseFPToIntegerUnscaled<0b01, rmode, opcode, FPR64, GPR64, asm,
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
}		}
}		}

multiclass FPToIntegerScaled<bits<2> rmode, bits<3> opcode, string asm,		multiclass FPToIntegerScaled<bits<2> rmode, bits<3> opcode, string asm,
SDPatternOperator OpN> {		SDPatternOperator OpN> {
// Scaled half-precision to 32-bit		// Scaled half-precision to 32-bit
def SWHri : BaseFPToInteger<0b11, rmode, opcode, FPR16, GPR32,		def SWHri : BaseFPToInteger<0b11, rmode, opcode, FPR16, GPR32,
fixedpoint_f16_i32, asm,		fixedpoint_i32_literal, asm,
[(set GPR32:$Rd, (OpN (fmul (f16 FPR16:$Rn),		[(set GPR32:$Rd, (OpN (f16 FPR16:$Rn), fixedpoint_i32_literal:$scale))]> {
fixedpoint_f16_i32:$scale)))]> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
let scale{5} = 1;		let scale{5} = 1;
let Predicates = [HasFullFP16];		let Predicates = [HasFullFP16];
}		}

// Scaled half-precision to 64-bit		// Scaled half-precision to 64-bit
def SXHri : BaseFPToInteger<0b11, rmode, opcode, FPR16, GPR64,		def SXHri : BaseFPToInteger<0b11, rmode, opcode, FPR16, GPR64,
fixedpoint_f16_i64, asm,		fixedpoint_i64_literal, asm,
[(set GPR64:$Rd, (OpN (fmul (f16 FPR16:$Rn),		[(set GPR64:$Rd, (OpN (f16 FPR16:$Rn), fixedpoint_i64_literal:$scale))]> {
fixedpoint_f16_i64:$scale)))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Predicates = [HasFullFP16];		let Predicates = [HasFullFP16];
}		}

// Scaled single-precision to 32-bit		// Scaled single-precision to 32-bit
def SWSri : BaseFPToInteger<0b00, rmode, opcode, FPR32, GPR32,		def SWSri : BaseFPToInteger<0b00, rmode, opcode, FPR32, GPR32,
fixedpoint_f32_i32, asm,		fixedpoint_i32_literal, asm,
[(set GPR32:$Rd, (OpN (fmul FPR32:$Rn,		[(set GPR32:$Rd, (OpN FPR32:$Rn, fixedpoint_i32_literal:$scale))]> {
fixedpoint_f32_i32:$scale)))]> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
let scale{5} = 1;		let scale{5} = 1;
}		}

// Scaled single-precision to 64-bit		// Scaled single-precision to 64-bit
def SXSri : BaseFPToInteger<0b00, rmode, opcode, FPR32, GPR64,		def SXSri : BaseFPToInteger<0b00, rmode, opcode, FPR32, GPR64,
fixedpoint_f32_i64, asm,		fixedpoint_i64_literal, asm,
[(set GPR64:$Rd, (OpN (fmul FPR32:$Rn,		[(set GPR64:$Rd, (OpN FPR32:$Rn, fixedpoint_i64_literal:$scale))]> {
fixedpoint_f32_i64:$scale)))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
}		}

// Scaled double-precision to 32-bit		// Scaled double-precision to 32-bit
def SWDri : BaseFPToInteger<0b01, rmode, opcode, FPR64, GPR32,		def SWDri : BaseFPToInteger<0b01, rmode, opcode, FPR64, GPR32,
fixedpoint_f64_i32, asm,		fixedpoint_i32_literal, asm,
[(set GPR32:$Rd, (OpN (fmul FPR64:$Rn,		[(set GPR32:$Rd, (OpN (f64 FPR64:$Rn), fixedpoint_i32_literal:$scale))]> {
fixedpoint_f64_i32:$scale)))]> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
let scale{5} = 1;		let scale{5} = 1;
}		}

// Scaled double-precision to 64-bit		// Scaled double-precision to 64-bit
def SXDri : BaseFPToInteger<0b01, rmode, opcode, FPR64, GPR64,		def SXDri : BaseFPToInteger<0b01, rmode, opcode, FPR64, GPR64,
fixedpoint_f64_i64, asm,		fixedpoint_i64_literal, asm,
[(set GPR64:$Rd, (OpN (fmul FPR64:$Rn,		[(set GPR64:$Rd, (OpN (f64 FPR64:$Rn), fixedpoint_i64_literal:$scale))]> {
fixedpoint_f64_i64:$scale)))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
}		}
}		}

//---		//---
// Integer to floating point conversion		// Integer to floating point conversion
//---		//---

let mayStore = 0, mayLoad = 0, hasSideEffects = 0, mayRaiseFPException = 1 in		let mayStore = 0, mayLoad = 0, hasSideEffects = 0, mayRaiseFPException = 1 in
class BaseIntegerToFP<bit isUnsigned,		class BaseIntegerToFPScaled<bit isUnsigned,
RegisterClass srcType, RegisterClass dstType,		RegisterClass srcType, RegisterClass dstType,
Operand immType, string asm, list<dag> pattern>		ValueType dvt, Operand immType, string asm, SDPatternOperator node>
: I<(outs dstType:$Rd), (ins srcType:$Rn, immType:$scale),		: I<(outs dstType:$Rd), (ins srcType:$Rn, immType:$scale),
asm, "\t$Rd, $Rn, $scale", "", pattern>,		asm, "\t$Rd, $Rn, $scale", "", [(set (dvt dstType:$Rd), (node srcType:$Rn, immType:$scale))]>,
Sched<[WriteFCvt]> {		Sched<[WriteFCvt]> {
bits<5> Rd;		bits<5> Rd;
bits<5> Rn;		bits<5> Rn;
bits<6> scale;		bits<6> scale;
let Inst{30-24} = 0b0011110;		let Inst{30-24} = 0b0011110;
let Inst{21-17} = 0b00001;		let Inst{21-17} = 0b00001;
let Inst{16} = isUnsigned;		let Inst{16} = isUnsigned;
let Inst{15-10} = scale;		let Inst{15-10} = scale;
Show All 14 Lines	class BaseIntegerToFPUnscaled<bit isUnsigned,
let Inst{30-24} = 0b0011110;		let Inst{30-24} = 0b0011110;
let Inst{21-17} = 0b10001;		let Inst{21-17} = 0b10001;
let Inst{16} = isUnsigned;		let Inst{16} = isUnsigned;
let Inst{15-10} = 0b000000;		let Inst{15-10} = 0b000000;
let Inst{9-5} = Rn;		let Inst{9-5} = Rn;
let Inst{4-0} = Rd;		let Inst{4-0} = Rd;
}		}

multiclass IntegerToFP<bit isUnsigned, string asm, SDPatternOperator node> {		multiclass IntegerToFPUnscaled<bit isUnsigned, string asm, SDPatternOperator node> {
// Unscaled
def UWHri: BaseIntegerToFPUnscaled<isUnsigned, GPR32, FPR16, f16, asm, node> {		def UWHri: BaseIntegerToFPUnscaled<isUnsigned, GPR32, FPR16, f16, asm, node> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
let Inst{23-22} = 0b11; // 16-bit FPR flag		let Inst{23-22} = 0b11; // 16-bit FPR flag
let Predicates = [HasFullFP16];		let Predicates = [HasFullFP16];
}		}

def UWSri: BaseIntegerToFPUnscaled<isUnsigned, GPR32, FPR32, f32, asm, node> {		def UWSri: BaseIntegerToFPUnscaled<isUnsigned, GPR32, FPR32, f32, asm, node> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
Show All 15 Lines	def UXSri: BaseIntegerToFPUnscaled<isUnsigned, GPR64, FPR32, f32, asm, node> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Inst{23-22} = 0b00; // 32-bit FPR flag		let Inst{23-22} = 0b00; // 32-bit FPR flag
}		}

def UXDri: BaseIntegerToFPUnscaled<isUnsigned, GPR64, FPR64, f64, asm, node> {		def UXDri: BaseIntegerToFPUnscaled<isUnsigned, GPR64, FPR64, f64, asm, node> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Inst{23-22} = 0b01; // 64-bit FPR flag		let Inst{23-22} = 0b01; // 64-bit FPR flag
}		}
		}

// Scaled		multiclass IntegerToFPScaled<bit isUnsigned, string asm, SDPatternOperator node> {
def SWHri: BaseIntegerToFP<isUnsigned, GPR32, FPR16, fixedpoint_f16_i32, asm,		def SWHri: BaseIntegerToFPScaled<isUnsigned, GPR32, FPR16, f16, fixedpoint_i32_literal, asm, node> {
[(set (f16 FPR16:$Rd),
(fdiv (node GPR32:$Rn),
fixedpoint_f16_i32:$scale))]> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
let Inst{23-22} = 0b11; // 16-bit FPR flag		let Inst{23-22} = 0b11; // 16-bit FPR flag
let scale{5} = 1;		let scale{5} = 1;
let Predicates = [HasFullFP16];		let Predicates = [HasFullFP16];
}		}

def SWSri: BaseIntegerToFP<isUnsigned, GPR32, FPR32, fixedpoint_f32_i32, asm,		def SWSri: BaseIntegerToFPScaled<isUnsigned, GPR32, FPR32, f32, fixedpoint_i32_literal, asm, node> {
[(set FPR32:$Rd,
(fdiv (node GPR32:$Rn),
fixedpoint_f32_i32:$scale))]> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
let Inst{23-22} = 0b00; // 32-bit FPR flag		let Inst{23-22} = 0b00; // 32-bit FPR flag
let scale{5} = 1;		let scale{5} = 1;
}		}

def SWDri: BaseIntegerToFP<isUnsigned, GPR32, FPR64, fixedpoint_f64_i32, asm,		def SWDri: BaseIntegerToFPScaled<isUnsigned, GPR32, FPR64, f64, fixedpoint_i32_literal, asm, node> {
[(set FPR64:$Rd,
(fdiv (node GPR32:$Rn),
fixedpoint_f64_i32:$scale))]> {
let Inst{31} = 0; // 32-bit GPR flag		let Inst{31} = 0; // 32-bit GPR flag
let Inst{23-22} = 0b01; // 64-bit FPR flag		let Inst{23-22} = 0b01; // 64-bit FPR flag
let scale{5} = 1;		let scale{5} = 1;
}		}

def SXHri: BaseIntegerToFP<isUnsigned, GPR64, FPR16, fixedpoint_f16_i64, asm,		def SXHri: BaseIntegerToFPScaled<isUnsigned, GPR64, FPR16, f16, fixedpoint_i64_literal, asm, node> {
[(set (f16 FPR16:$Rd),
(fdiv (node GPR64:$Rn),
fixedpoint_f16_i64:$scale))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Inst{23-22} = 0b11; // 16-bit FPR flag		let Inst{23-22} = 0b11; // 16-bit FPR flag
let Predicates = [HasFullFP16];		let Predicates = [HasFullFP16];
}		}

def SXSri: BaseIntegerToFP<isUnsigned, GPR64, FPR32, fixedpoint_f32_i64, asm,		def SXSri: BaseIntegerToFPScaled<isUnsigned, GPR64, FPR32, f32, fixedpoint_i64_literal, asm, node> {
[(set FPR32:$Rd,
(fdiv (node GPR64:$Rn),
fixedpoint_f32_i64:$scale))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Inst{23-22} = 0b00; // 32-bit FPR flag		let Inst{23-22} = 0b00; // 32-bit FPR flag
}		}

def SXDri: BaseIntegerToFP<isUnsigned, GPR64, FPR64, fixedpoint_f64_i64, asm,		def SXDri: BaseIntegerToFPScaled<isUnsigned, GPR64, FPR64, f64, fixedpoint_i64_literal, asm, node> {
[(set FPR64:$Rd,
(fdiv (node GPR64:$Rn),
fixedpoint_f64_i64:$scale))]> {
let Inst{31} = 1; // 64-bit GPR flag		let Inst{31} = 1; // 64-bit GPR flag
let Inst{23-22} = 0b01; // 64-bit FPR flag		let Inst{23-22} = 0b01; // 64-bit FPR flag
}		}
}		}

//---		//---
// Unscaled integer <-> floating point conversion (i.e. FMOV)		// Unscaled integer <-> floating point conversion (i.e. FMOV)
//---		//---
▲ Show 20 Lines • Show All 6,733 Lines • Show Last 20 Lines
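The new fixedpoint_i32_literal and fixedpoint_i64_literal operands above accept immediates from 1 to 32 and from 1 to 64 respectively. The 32-bit-GPR variants also pin scale{5} to 1. The C sketch below restates those predicates. It checks that they are consistent with encoding the scale field as 64 - fbits, which is how the Arm ARM defines fbits for these instructions. Treating that as what getFixedPointScaleOpValue emits is an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same check as the ImmLeaf of fixedpoint_i32_literal: 1 <= imm <= 32. */
static bool fixedpoint_i32_literal_ok(int32_t imm) {
    return ((uint32_t)imm) > 0 && ((uint32_t)imm) < 33;
}

/* Same check as the ImmLeaf of fixedpoint_i64_literal: 1 <= imm <= 64. */
static bool fixedpoint_i64_literal_ok(int32_t imm) {
    return ((uint32_t)imm) > 0 && ((uint32_t)imm) < 65;
}

/* Assumed encoding of the 6-bit scale field, Inst{15-10}: 64 - fbits. */
static uint32_t encode_scale(uint32_t fbits) {
    return 64u - fbits;
}

/* True if every legal 32-bit-GPR immediate encodes with scale{5} == 1,
   consistent with the `let scale{5} = 1;` lines in the W variants. */
static bool w_forms_have_scale_bit5(void) {
    for (uint32_t fbits = 1; fbits <= 32; ++fbits)
        if (!fixedpoint_i32_literal_ok((int32_t)fbits) ||
            (encode_scale(fbits) & 0x20u) == 0)
            return false;
    return true;
}
```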

llvm/lib/Target/AArch64/AArch64InstrInfo.td

Show First 20 Lines • Show All 3,858 Lines • ▼ Show 20 Lines
defm FCVTMS : FPToIntegerUnscaled<0b10, 0b000, "fcvtms", int_aarch64_neon_fcvtms>;		defm FCVTMS : FPToIntegerUnscaled<0b10, 0b000, "fcvtms", int_aarch64_neon_fcvtms>;
defm FCVTMU : FPToIntegerUnscaled<0b10, 0b001, "fcvtmu", int_aarch64_neon_fcvtmu>;		defm FCVTMU : FPToIntegerUnscaled<0b10, 0b001, "fcvtmu", int_aarch64_neon_fcvtmu>;
defm FCVTNS : FPToIntegerUnscaled<0b00, 0b000, "fcvtns", int_aarch64_neon_fcvtns>;		defm FCVTNS : FPToIntegerUnscaled<0b00, 0b000, "fcvtns", int_aarch64_neon_fcvtns>;
defm FCVTNU : FPToIntegerUnscaled<0b00, 0b001, "fcvtnu", int_aarch64_neon_fcvtnu>;		defm FCVTNU : FPToIntegerUnscaled<0b00, 0b001, "fcvtnu", int_aarch64_neon_fcvtnu>;
defm FCVTPS : FPToIntegerUnscaled<0b01, 0b000, "fcvtps", int_aarch64_neon_fcvtps>;		defm FCVTPS : FPToIntegerUnscaled<0b01, 0b000, "fcvtps", int_aarch64_neon_fcvtps>;
defm FCVTPU : FPToIntegerUnscaled<0b01, 0b001, "fcvtpu", int_aarch64_neon_fcvtpu>;		defm FCVTPU : FPToIntegerUnscaled<0b01, 0b001, "fcvtpu", int_aarch64_neon_fcvtpu>;
defm FCVTZS : FPToIntegerUnscaled<0b11, 0b000, "fcvtzs", any_fp_to_sint>;		defm FCVTZS : FPToIntegerUnscaled<0b11, 0b000, "fcvtzs", any_fp_to_sint>;
defm FCVTZU : FPToIntegerUnscaled<0b11, 0b001, "fcvtzu", any_fp_to_uint>;		defm FCVTZU : FPToIntegerUnscaled<0b11, 0b001, "fcvtzu", any_fp_to_uint>;
defm FCVTZS : FPToIntegerScaled<0b11, 0b000, "fcvtzs", any_fp_to_sint>;		defm FCVTZS : FPToIntegerScaled<0b11, 0b000, "fcvtzs", int_aarch64_neon_vcvtfp2fxs>;
defm FCVTZU : FPToIntegerScaled<0b11, 0b001, "fcvtzu", any_fp_to_uint>;		defm FCVTZU : FPToIntegerScaled<0b11, 0b001, "fcvtzu", int_aarch64_neon_vcvtfp2fxu>;

// AArch64's FCVT instructions saturate when out of range.		// AArch64's FCVT instructions saturate when out of range.
multiclass FPToIntegerSatPats<SDNode to_int_sat, string INST> {		multiclass FPToIntegerSatPats<SDNode to_int_sat, string INST> {
let Predicates = [HasFullFP16] in {		let Predicates = [HasFullFP16] in {
def : Pat<(i32 (to_int_sat f16:$Rn, i32)),		def : Pat<(i32 (to_int_sat f16:$Rn, i32)),
(!cast<Instruction>(INST # UWHr) f16:$Rn)>;		(!cast<Instruction>(INST # UWHr) f16:$Rn)>;
def : Pat<(i64 (to_int_sat f16:$Rn, i64)),		def : Pat<(i64 (to_int_sat f16:$Rn, i64)),
(!cast<Instruction>(INST # UXHr) f16:$Rn)>;		(!cast<Instruction>(INST # UXHr) f16:$Rn)>;
Show All 21 Lines	def : Pat<(i32 (to_int_sat (fmul f64:$Rn, fixedpoint_f64_i32:$scale), i32)),
(!cast<Instruction>(INST # SWDri) $Rn, $scale)>;		(!cast<Instruction>(INST # SWDri) $Rn, $scale)>;
def : Pat<(i64 (to_int_sat (fmul f64:$Rn, fixedpoint_f64_i64:$scale), i64)),		def : Pat<(i64 (to_int_sat (fmul f64:$Rn, fixedpoint_f64_i64:$scale), i64)),
(!cast<Instruction>(INST # SXDri) $Rn, $scale)>;		(!cast<Instruction>(INST # SXDri) $Rn, $scale)>;
}		}

defm : FPToIntegerSatPats<fp_to_sint_sat, "FCVTZS">;		defm : FPToIntegerSatPats<fp_to_sint_sat, "FCVTZS">;
defm : FPToIntegerSatPats<fp_to_uint_sat, "FCVTZU">;		defm : FPToIntegerSatPats<fp_to_uint_sat, "FCVTZU">;

		multiclass FPToIntegerScaledPats<SDPatternOperator to_int, string INST> {
		let Predicates = [HasFullFP16] in {
		def : Pat<(i32 (to_int (fmul f16:$Rn, fixedpoint_f16_i32:$scale))),
		(!cast<Instruction>(INST # SWHri) $Rn, $scale)>;
		def : Pat<(i64 (to_int (fmul f16:$Rn, fixedpoint_f16_i64:$scale))),
		(!cast<Instruction>(INST # SXHri) $Rn, $scale)>;
		}
		def : Pat<(i32 (to_int (fmul f32:$Rn, fixedpoint_f32_i32:$scale))),
		(!cast<Instruction>(INST # SWSri) $Rn, $scale)>;
		def : Pat<(i64 (to_int (fmul f32:$Rn, fixedpoint_f32_i64:$scale))),
		(!cast<Instruction>(INST # SXSri) $Rn, $scale)>;
		def : Pat<(i32 (to_int (fmul f64:$Rn, fixedpoint_f64_i32:$scale))),
		(!cast<Instruction>(INST # SWDri) $Rn, $scale)>;
		def : Pat<(i64 (to_int (fmul f64:$Rn, fixedpoint_f64_i64:$scale))),
		(!cast<Instruction>(INST # SXDri) $Rn, $scale)>;
		}

		defm : FPToIntegerScaledPats<any_fp_to_sint, "FCVTZS">;
		defm : FPToIntegerScaledPats<any_fp_to_uint, "FCVTZU">;

multiclass FPToIntegerIntPats<Intrinsic round, string INST> {		multiclass FPToIntegerIntPats<Intrinsic round, string INST> {
let Predicates = [HasFullFP16] in {		let Predicates = [HasFullFP16] in {
def : Pat<(i32 (round f16:$Rn)), (!cast<Instruction>(INST # UWHr) $Rn)>;		def : Pat<(i32 (round f16:$Rn)), (!cast<Instruction>(INST # UWHr) $Rn)>;
def : Pat<(i64 (round f16:$Rn)), (!cast<Instruction>(INST # UXHr) $Rn)>;		def : Pat<(i64 (round f16:$Rn)), (!cast<Instruction>(INST # UXHr) $Rn)>;
}		}
def : Pat<(i32 (round f32:$Rn)), (!cast<Instruction>(INST # UWSr) $Rn)>;		def : Pat<(i32 (round f32:$Rn)), (!cast<Instruction>(INST # UWSr) $Rn)>;
def : Pat<(i64 (round f32:$Rn)), (!cast<Instruction>(INST # UXSr) $Rn)>;		def : Pat<(i64 (round f32:$Rn)), (!cast<Instruction>(INST # UXSr) $Rn)>;
def : Pat<(i32 (round f64:$Rn)), (!cast<Instruction>(INST # UWDr) $Rn)>;		def : Pat<(i32 (round f64:$Rn)), (!cast<Instruction>(INST # UWDr) $Rn)>;
def : Pat<(i64 (round f64:$Rn)), (!cast<Instruction>(INST # UXDr) $Rn)>;		def : Pat<(i64 (round f64:$Rn)), (!cast<Instruction>(INST # UXDr) $Rn)>;

let Predicates = [HasFullFP16] in {		defm : FPToIntegerScaledPats<round, INST>;
def : Pat<(i32 (round (fmul f16:$Rn, fixedpoint_f16_i32:$scale))),
(!cast<Instruction>(INST # SWHri) $Rn, $scale)>;
def : Pat<(i64 (round (fmul f16:$Rn, fixedpoint_f16_i64:$scale))),
(!cast<Instruction>(INST # SXHri) $Rn, $scale)>;
}
def : Pat<(i32 (round (fmul f32:$Rn, fixedpoint_f32_i32:$scale))),
(!cast<Instruction>(INST # SWSri) $Rn, $scale)>;
def : Pat<(i64 (round (fmul f32:$Rn, fixedpoint_f32_i64:$scale))),
(!cast<Instruction>(INST # SXSri) $Rn, $scale)>;
def : Pat<(i32 (round (fmul f64:$Rn, fixedpoint_f64_i32:$scale))),
(!cast<Instruction>(INST # SWDri) $Rn, $scale)>;
def : Pat<(i64 (round (fmul f64:$Rn, fixedpoint_f64_i64:$scale))),
(!cast<Instruction>(INST # SXDri) $Rn, $scale)>;
}		}

defm : FPToIntegerIntPats<int_aarch64_neon_fcvtzs, "FCVTZS">;		defm : FPToIntegerIntPats<int_aarch64_neon_fcvtzs, "FCVTZS">;
defm : FPToIntegerIntPats<int_aarch64_neon_fcvtzu, "FCVTZU">;		defm : FPToIntegerIntPats<int_aarch64_neon_fcvtzu, "FCVTZU">;

multiclass FPToIntegerPats<SDNode to_int, SDNode to_int_sat, SDNode round, string INST> {		multiclass FPToIntegerPats<SDNode to_int, SDNode to_int_sat, SDNode round, string INST> {
def : Pat<(i32 (to_int (round f32:$Rn))),		def : Pat<(i32 (to_int (round f32:$Rn))),
(!cast<Instruction>(INST # UWSr) f32:$Rn)>;		(!cast<Instruction>(INST # UWSr) f32:$Rn)>;
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	def : Pat<(i64 (any_llround f32:$Rn)),
(!cast<Instruction>(FCVTASUXSr) f32:$Rn)>;		(!cast<Instruction>(FCVTASUXSr) f32:$Rn)>;
def : Pat<(i64 (any_llround f64:$Rn)),		def : Pat<(i64 (any_llround f64:$Rn)),
(!cast<Instruction>(FCVTASUXDr) f64:$Rn)>;		(!cast<Instruction>(FCVTASUXDr) f64:$Rn)>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Scaled integer to floating point conversion instructions.		// Scaled integer to floating point conversion instructions.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

defm SCVTF : IntegerToFP<0, "scvtf", any_sint_to_fp>;		defm SCVTF : IntegerToFPScaled<0, "scvtf", int_aarch64_neon_vcvtfxs2fp>;
defm UCVTF : IntegerToFP<1, "ucvtf", any_uint_to_fp>;		defm UCVTF : IntegerToFPScaled<1, "ucvtf", int_aarch64_neon_vcvtfxu2fp>;

		multiclass IntegerToFPScaledPats<SDPatternOperator to_fp, string INST> {
		let Predicates = [HasFullFP16] in {
		def : Pat<(f16 (fdiv (to_fp i32:$Rn), fixedpoint_f16_i32:$scale)),
		(!cast<Instruction>(INST # SWHri) $Rn, $scale)>;
		def : Pat<(f16 (fdiv (to_fp i64:$Rn), fixedpoint_f16_i64:$scale)),
		(!cast<Instruction>(INST # SXHri) $Rn, $scale)>;
		}
		def : Pat<(f32 (fdiv (to_fp i32:$Rn), fixedpoint_f32_i32:$scale)),
		(!cast<Instruction>(INST # SWSri) $Rn, $scale)>;
		def : Pat<(f64 (fdiv (to_fp i32:$Rn), fixedpoint_f64_i32:$scale)),
		(!cast<Instruction>(INST # SWDri) $Rn, $scale)>;
		def : Pat<(f32 (fdiv (to_fp i64:$Rn), fixedpoint_f32_i64:$scale)),
		(!cast<Instruction>(INST # SXSri) $Rn, $scale)>;
		def : Pat<(f64 (fdiv (to_fp i64:$Rn), fixedpoint_f64_i64:$scale)),
		(!cast<Instruction>(INST # SXDri) $Rn, $scale)>;
		}

		defm : IntegerToFPScaledPats<any_sint_to_fp, "SCVTF">;
		defm : IntegerToFPScaledPats<any_uint_to_fp, "UCVTF">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Unscaled integer to floating point conversion instruction.		// Unscaled integer to floating point conversion instruction.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		defm SCVTF : IntegerToFPUnscaled<0, "scvtf", any_sint_to_fp>;
		defm UCVTF : IntegerToFPUnscaled<1, "ucvtf", any_uint_to_fp>;

defm FMOV : UnscaledConversion<"fmov">;		defm FMOV : UnscaledConversion<"fmov">;

// Add pseudo ops for FMOV 0 so we can mark them as isReMaterializable		// Add pseudo ops for FMOV 0 so we can mark them as isReMaterializable
let isReMaterializable = 1, isCodeGenOnly = 1, isAsCheapAsAMove = 1 in {		let isReMaterializable = 1, isCodeGenOnly = 1, isAsCheapAsAMove = 1 in {
def FMOVH0 : Pseudo<(outs FPR16:$Rd), (ins), [(set f16:$Rd, (fpimm0))]>,		def FMOVH0 : Pseudo<(outs FPR16:$Rd), (ins), [(set f16:$Rd, (fpimm0))]>,
Sched<[WriteF]>, Requires<[HasFullFP16]>;		Sched<[WriteF]>, Requires<[HasFullFP16]>;
def FMOVS0 : Pseudo<(outs FPR32:$Rd), (ins), [(set f32:$Rd, (fpimm0))]>,		def FMOVS0 : Pseudo<(outs FPR32:$Rd), (ins), [(set f32:$Rd, (fpimm0))]>,
Sched<[WriteF]>;		Sched<[WriteF]>;
… (4,360 lines not shown) …
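The following is a rough illustration of what the `IntegerToFPScaledPats` patterns above compute: a minimal C reference model of a scaled integer-to-FP conversion. It assumes that the `fixedpoint_*` operands only match a floating-point constant equal to 2^fbits. That matcher is not shown in this excerpt. Under this assumption, `(fdiv (to_fp $Rn), $scale)` is the value `$Rn / 2^fbits`. The function names are hypothetical and not part of LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model (not LLVM code): a signed 32-bit
   fixed-point value with `fbits` fractional bits, converted to double.
   This mirrors the (fdiv (to_fp $Rn), $scale) shape, assuming $scale
   is the constant 2^fbits. */
double scvtf_fixed_f64_i32(int32_t x, unsigned fbits) {
    /* For a 32-bit source register the #fbits immediate is 1..32. */
    assert(fbits >= 1 && fbits <= 32);
    /* int32 -> double is exact, and dividing by a power of two only
       adjusts the exponent, so this quotient is exact. */
    return (double)x / (double)(UINT64_C(1) << fbits);
}

/* Unsigned variant, i.e. the "UCVTF Dd, Wn, #imm" case from the summary. */
double ucvtf_fixed_f64_u32(uint32_t x, unsigned fbits) {
    assert(fbits >= 1 && fbits <= 32);
    return (double)x / (double)(UINT64_C(1) << fbits);
}
```

For example, `scvtf_fixed_f64_i32(3, 1)` yields 1.5 and `ucvtf_fixed_f64_u32(0x80000000u, 32)` yields 0.5. For results in the normal range, dividing by a power of two adds no second rounding step on top of the conversion itself.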

llvm/test/CodeGen/AArch64/fcvt-fixed.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s --check-prefixes=CHECK,CHECK-NO16			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s --check-prefixes=CHECK,CHECK-NO16
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=+fullfp16 | FileCheck %s --check-prefixes=CHECK,CHECK-FP16			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=+fullfp16 | FileCheck %s --check-prefixes=CHECK,CHECK-FP16


				declare float @llvm.aarch64.neon.vcvtfxs2fp.f32.i32(i32, i32)
				declare float @llvm.aarch64.neon.vcvtfxs2fp.f32.i64(i64, i32)
				declare double @llvm.aarch64.neon.vcvtfxs2fp.f64.i32(i32, i32)
				declare double @llvm.aarch64.neon.vcvtfxs2fp.f64.i64(i64, i32)

				declare float @llvm.aarch64.neon.vcvtfxu2fp.f32.i32(i32, i32)
				declare float @llvm.aarch64.neon.vcvtfxu2fp.f32.i64(i64, i32)
				declare double @llvm.aarch64.neon.vcvtfxu2fp.f64.i32(i32, i32)
				declare double @llvm.aarch64.neon.vcvtfxu2fp.f64.i64(i64, i32)

				declare i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f32(float, i32)
				declare i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f32(float, i32)
				declare i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f64(double, i32)
				declare i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f64(double, i32)

				declare i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f32(float, i32)
				declare i64 @llvm.aarch64.neon.vcvtfp2fxu.i64.f32(float, i32)
				declare i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f64(double, i32)
				declare i64 @llvm.aarch64.neon.vcvtfp2fxu.i64.f64(double, i32)

				; fptosi

				define i32 @fcvtzs_f32_i32_int(float %flt) {
				; CHECK-LABEL: fcvtzs_f32_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzs w0, s0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f32(float %flt, i32 1)
				ret i32 %cvt
				}

				define i64 @fcvtzs_f32_i64_int(float %flt) {
				; CHECK-LABEL: fcvtzs_f32_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzs x0, s0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f32(float %flt, i32 1)
				ret i64 %cvt
				}

				define i32 @fcvtzs_f64_i32_int(double %dbl) {
				; CHECK-LABEL: fcvtzs_f64_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzs w0, d0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f64(double %dbl, i32 1)
				ret i32 %cvt
				}

				define i64 @fcvtzs_f64_i64_int(double %dbl) {
				; CHECK-LABEL: fcvtzs_f64_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzs x0, d0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f64(double %dbl, i32 1)
				ret i64 %cvt
				}

				; fptoui

				define i32 @fcvtzu_f32_i32_int(float %flt) {
				; CHECK-LABEL: fcvtzu_f32_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzu w0, s0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f32(float %flt, i32 1)
				ret i32 %cvt
				}

				define i64 @fcvtzu_f32_i64_int(float %flt) {
				; CHECK-LABEL: fcvtzu_f32_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzu x0, s0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i64 @llvm.aarch64.neon.vcvtfp2fxu.i64.f32(float %flt, i32 1)
				ret i64 %cvt
				}

				define i32 @fcvtzu_f64_i32_int(double %dbl) {
				; CHECK-LABEL: fcvtzu_f64_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzu w0, d0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f64(double %dbl, i32 1)
				ret i32 %cvt
				}

				define i64 @fcvtzu_f64_i64_int(double %dbl) {
				; CHECK-LABEL: fcvtzu_f64_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fcvtzu x0, d0, #1
				; CHECK-NEXT: ret
				%cvt = tail call i64 @llvm.aarch64.neon.vcvtfp2fxu.i64.f64(double %dbl, i32 1)
				ret i64 %cvt
				}

				; sitofp

				define float @scvtf_f32_i32_int(i32 %int) {
				; CHECK-LABEL: scvtf_f32_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: scvtf s0, w0, #1
				; CHECK-NEXT: ret
				%cvt = tail call float @llvm.aarch64.neon.vcvtfxs2fp.f32.i32(i32 %int, i32 1)
				ret float %cvt
				}

				define float @scvtf_f32_i64_int(i64 %long) {
				; CHECK-LABEL: scvtf_f32_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: scvtf s0, x0, #1
				; CHECK-NEXT: ret
				%cvt = tail call float @llvm.aarch64.neon.vcvtfxs2fp.f32.i64(i64 %long, i32 1)
				ret float %cvt
				}

				define double @scvtf_f64_i32_int(i32 %int) {
				; CHECK-LABEL: scvtf_f64_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: scvtf d0, w0, #1
				; CHECK-NEXT: ret
				%cvt = tail call double @llvm.aarch64.neon.vcvtfxs2fp.f64.i32(i32 %int, i32 1)
				ret double %cvt
				}

				define double @scvtf_f64_i64_int(i64 %long) {
				; CHECK-LABEL: scvtf_f64_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: scvtf d0, x0, #1
				; CHECK-NEXT: ret
				%cvt = tail call double @llvm.aarch64.neon.vcvtfxs2fp.f64.i64(i64 %long, i32 1)
				ret double %cvt
				}

				; uitofp

				define float @ucvtf_f32_i32_int(i32 %int) {
				; CHECK-LABEL: ucvtf_f32_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ucvtf s0, w0, #1
				; CHECK-NEXT: ret
				%cvt = tail call float @llvm.aarch64.neon.vcvtfxu2fp.f32.i32(i32 %int, i32 1)
				ret float %cvt
				}

				define float @ucvtf_f32_i64_int(i64 %long) {
				; CHECK-LABEL: ucvtf_f32_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ucvtf s0, x0, #1
				; CHECK-NEXT: ret
				%cvt = tail call float @llvm.aarch64.neon.vcvtfxu2fp.f32.i64(i64 %long, i32 1)
				ret float %cvt
				}

				define double @ucvtf_f64_i32_int(i32 %int) {
				; CHECK-LABEL: ucvtf_f64_i32_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ucvtf d0, w0, #1
				; CHECK-NEXT: ret
				%cvt = tail call double @llvm.aarch64.neon.vcvtfxu2fp.f64.i32(i32 %int, i32 1)
				ret double %cvt
				}

				define double @ucvtf_f64_i64_int(i64 %long) {
				; CHECK-LABEL: ucvtf_f64_i64_int:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ucvtf d0, x0, #1
				; CHECK-NEXT: ret
				%cvt = tail call double @llvm.aarch64.neon.vcvtfxu2fp.f64.i64(i64 %long, i32 1)
				ret double %cvt
				}

	; fptoui			; fptoui

	define i32 @fcvtzs_f32_i32_7(float %flt) {			define i32 @fcvtzs_f32_i32_7(float %flt) {
	; CHECK-LABEL: fcvtzs_f32_i32_7:			; CHECK-LABEL: fcvtzs_f32_i32_7:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fcvtzs w0, s0, #7			; CHECK-NEXT: fcvtzs w0, s0, #7
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%fix = fmul float %flt, 128.0			%fix = fmul float %flt, 128.0
	… (958 lines not shown) …
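The FP-to-fixed direction works the other way around. The `fcvtzs_f32_i32_7` test multiplies by 128.0 (2^7) and expects a single `fcvtzs w0, s0, #7`. The `*_int` tests expect the same instruction form for the `vcvtfp2fxs`/`vcvtfp2fxu` intrinsics. The sketch below is a hypothetical C model of scalar `FCVTZS Wd, Sn, #fbits`, ignoring floating-point exception flags. It scales by 2^fbits, truncates toward zero, saturates to the `int32_t` range, and maps NaN to 0. The function name is illustrative only.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical reference model (not LLVM code) of scalar
   "FCVTZS Wd, Sn, #fbits": multiply by 2^fbits, round toward zero,
   saturate to the int32 range, and map NaN to 0. */
int32_t fcvtzs_fixed_i32_f32(float f, unsigned fbits) {
    /* For a 32-bit destination register the #fbits immediate is 1..32. */
    assert(fbits >= 1 && fbits <= 32);
    /* float -> double is exact, and scaling by 2^fbits (at most 2^32)
       stays exact in double, so the only rounding is the truncation. */
    double scaled = (double)f * (double)(UINT64_C(1) << fbits);
    if (isnan(scaled))
        return 0;
    if (scaled >= 2147483648.0)   /* >= 2^31: saturate high */
        return INT32_MAX;
    if (scaled < -2147483648.0)   /* < -2^31: saturate low */
        return INT32_MIN;
    return (int32_t)scaled;       /* C conversion truncates toward zero */
}
```

For instance, `fcvtzs_fixed_i32_f32(0.5f, 7)` gives 64 and `fcvtzs_fixed_i32_f32(-0.75f, 1)` gives -1. The saturation and NaN handling describe the instruction. An IR-level `fptosi` of an out-of-range value is poison rather than saturating.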

llvm/test/CodeGen/AArch64/fp16_intrinsic_scalar_2op.ll

	… (154 lines not shown) …
	entry:			entry:
	%sext = sext i16 %a to i32			%sext = sext i16 %a to i32
	%fcvth_n = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i32(i32 %sext, i32 16)			%fcvth_n = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i32(i32 %sext, i32 16)
	ret half %fcvth_n			ret half %fcvth_n
	}			}

	define dso_local half @test_vcvth_n_f16_s32_1(i32 %a) {			define dso_local half @test_vcvth_n_f16_s32_1(i32 %a) {
	; CHECK-LABEL: test_vcvth_n_f16_s32_1:			; CHECK-LABEL: test_vcvth_n_f16_s32_1:
	; CHECK: fmov s0, w0			; CHECK: scvtf h0, w0, #1
	; CHECK-NEXT: scvtf h0, h0, #1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_f16_s32 = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i32(i32 %a, i32 1)			%vcvth_n_f16_s32 = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i32(i32 %a, i32 1)
	ret half %vcvth_n_f16_s32			ret half %vcvth_n_f16_s32
	}			}

	define dso_local half @test_vcvth_n_f16_s32_16(i32 %a) {			define dso_local half @test_vcvth_n_f16_s32_16(i32 %a) {
	; CHECK-LABEL: test_vcvth_n_f16_s32_16:			; CHECK-LABEL: test_vcvth_n_f16_s32_16:
	; CHECK: fmov s0, w0			; CHECK: scvtf h0, w0, #16
	; CHECK-NEXT: scvtf h0, h0, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_f16_s32 = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i32(i32 %a, i32 16)			%vcvth_n_f16_s32 = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i32(i32 %a, i32 16)
	ret half %vcvth_n_f16_s32			ret half %vcvth_n_f16_s32
	}			}

	define dso_local i16 @test_vcvth_n_s16_f16_1(half %a) {			define dso_local i16 @test_vcvth_n_s16_f16_1(half %a) {
	; CHECK-LABEL: test_vcvth_n_s16_f16_1:			; CHECK-LABEL: test_vcvth_n_s16_f16_1:
	; CHECK: fcvtzs h0, h0, #1			; CHECK: fcvtzs w0, h0, #1
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 1)			%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 1)
	%0 = trunc i32 %fcvth_n to i16			%0 = trunc i32 %fcvth_n to i16
	ret i16 %0			ret i16 %0
	}			}

	define dso_local i16 @test_vcvth_n_s16_f16_16(half %a) {			define dso_local i16 @test_vcvth_n_s16_f16_16(half %a) {
	; CHECK-LABEL: test_vcvth_n_s16_f16_16:			; CHECK-LABEL: test_vcvth_n_s16_f16_16:
	; CHECK: fcvtzs h0, h0, #16			; CHECK: fcvtzs w0, h0, #16
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 16)			%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 16)
	%0 = trunc i32 %fcvth_n to i16			%0 = trunc i32 %fcvth_n to i16
	ret i16 %0			ret i16 %0
	}			}

	define dso_local i32 @test_vcvth_n_s32_f16_1(half %a) {			define dso_local i32 @test_vcvth_n_s32_f16_1(half %a) {
	; CHECK-LABEL: test_vcvth_n_s32_f16_1:			; CHECK-LABEL: test_vcvth_n_s32_f16_1:
	; CHECK: fcvtzs h0, h0, #1			; CHECK: fcvtzs w0, h0, #1
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_s32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 1)			%vcvth_n_s32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 1)
	ret i32 %vcvth_n_s32_f16			ret i32 %vcvth_n_s32_f16
	}			}

	define dso_local i32 @test_vcvth_n_s32_f16_16(half %a) {			define dso_local i32 @test_vcvth_n_s32_f16_16(half %a) {
	; CHECK-LABEL: test_vcvth_n_s32_f16_16:			; CHECK-LABEL: test_vcvth_n_s32_f16_16:
	; CHECK: fcvtzs h0, h0, #16			; CHECK: fcvtzs w0, h0, #16
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_s32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 16)			%vcvth_n_s32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxs.i32.f16(half %a, i32 16)
	ret i32 %vcvth_n_s32_f16			ret i32 %vcvth_n_s32_f16
	}			}

	define dso_local i64 @test_vcvth_n_s64_f16_1(half %a) {			define dso_local i64 @test_vcvth_n_s64_f16_1(half %a) {
	; CHECK-LABEL: test_vcvth_n_s64_f16_1:			; CHECK-LABEL: test_vcvth_n_s64_f16_1:
	; CHECK: fcvtzs h0, h0, #1			; CHECK: fcvtzs x0, h0, #1
	; CHECK-NEXT: fmov x0, d0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_s64_f16 = tail call i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f16(half %a, i32 1)			%vcvth_n_s64_f16 = tail call i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f16(half %a, i32 1)
	ret i64 %vcvth_n_s64_f16			ret i64 %vcvth_n_s64_f16
	}			}

	define dso_local i64 @test_vcvth_n_s64_f16_32(half %a) {			define dso_local i64 @test_vcvth_n_s64_f16_32(half %a) {
	; CHECK-LABEL: test_vcvth_n_s64_f16_32:			; CHECK-LABEL: test_vcvth_n_s64_f16_32:
	; CHECK: fcvtzs h0, h0, #32			; CHECK: fcvtzs x0, h0, #32
	; CHECK-NEXT: fmov x0, d0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_s64_f16 = tail call i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f16(half %a, i32 32)			%vcvth_n_s64_f16 = tail call i64 @llvm.aarch64.neon.vcvtfp2fxs.i64.f16(half %a, i32 32)
	ret i64 %vcvth_n_s64_f16			ret i64 %vcvth_n_s64_f16
	}			}

	define dso_local half @test_vcvth_n_f16_u16_1(i16 %a) {			define dso_local half @test_vcvth_n_f16_u16_1(i16 %a) {
	; CHECK-LABEL: test_vcvth_n_f16_u16_1:			; CHECK-LABEL: test_vcvth_n_f16_u16_1:
	… (12 lines not shown) …
	entry:			entry:
	%0 = zext i16 %a to i32			%0 = zext i16 %a to i32
	%fcvth_n = tail call half @llvm.aarch64.neon.vcvtfxu2fp.f16.i32(i32 %0, i32 16)			%fcvth_n = tail call half @llvm.aarch64.neon.vcvtfxu2fp.f16.i32(i32 %0, i32 16)
	ret half %fcvth_n			ret half %fcvth_n
	}			}

	define dso_local half @test_vcvth_n_f16_u32_1(i32 %a) {			define dso_local half @test_vcvth_n_f16_u32_1(i32 %a) {
	; CHECK-LABEL: test_vcvth_n_f16_u32_1:			; CHECK-LABEL: test_vcvth_n_f16_u32_1:
	; CHECK: fmov s0, w0			; CHECK: ucvtf h0, w0, #1
	; CHECK-NEXT: ucvtf h0, h0, #1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_f16_u32 = tail call half @llvm.aarch64.neon.vcvtfxu2fp.f16.i32(i32 %a, i32 1)			%vcvth_n_f16_u32 = tail call half @llvm.aarch64.neon.vcvtfxu2fp.f16.i32(i32 %a, i32 1)
	ret half %vcvth_n_f16_u32			ret half %vcvth_n_f16_u32
	}			}

	define dso_local half @test_vcvth_n_f16_u32_16(i32 %a) {			define dso_local half @test_vcvth_n_f16_u32_16(i32 %a) {
	; CHECK-LABEL: test_vcvth_n_f16_u32_16:			; CHECK-LABEL: test_vcvth_n_f16_u32_16:
	; CHECK: ucvtf h0, h0, #16			; CHECK: ucvtf h0, w0, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_f16_u32 = tail call half @llvm.aarch64.neon.vcvtfxu2fp.f16.i32(i32 %a, i32 16)			%vcvth_n_f16_u32 = tail call half @llvm.aarch64.neon.vcvtfxu2fp.f16.i32(i32 %a, i32 16)
	ret half %vcvth_n_f16_u32			ret half %vcvth_n_f16_u32
	}			}

	define dso_local i16 @test_vcvth_n_u16_f16_1(half %a) {			define dso_local i16 @test_vcvth_n_u16_f16_1(half %a) {
	; CHECK-LABEL: test_vcvth_n_u16_f16_1:			; CHECK-LABEL: test_vcvth_n_u16_f16_1:
	; CHECK: fcvtzu h0, h0, #1			; CHECK: fcvtzu w0, h0, #1
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 1)			%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 1)
	%0 = trunc i32 %fcvth_n to i16			%0 = trunc i32 %fcvth_n to i16
	ret i16 %0			ret i16 %0
	}			}

	define dso_local i16 @test_vcvth_n_u16_f16_16(half %a) {			define dso_local i16 @test_vcvth_n_u16_f16_16(half %a) {
	; CHECK-LABEL: test_vcvth_n_u16_f16_16:			; CHECK-LABEL: test_vcvth_n_u16_f16_16:
	; CHECK: fcvtzu h0, h0, #16			; CHECK: fcvtzu w0, h0, #16
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 16)			%fcvth_n = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 16)
	%0 = trunc i32 %fcvth_n to i16			%0 = trunc i32 %fcvth_n to i16
	ret i16 %0			ret i16 %0
	}			}

	define dso_local i32 @test_vcvth_n_u32_f16_1(half %a) {			define dso_local i32 @test_vcvth_n_u32_f16_1(half %a) {
	; CHECK-LABEL: test_vcvth_n_u32_f16_1:			; CHECK-LABEL: test_vcvth_n_u32_f16_1:
	; CHECK: fcvtzu h0, h0, #1			; CHECK: fcvtzu w0, h0, #1
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_u32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 1)			%vcvth_n_u32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 1)
	ret i32 %vcvth_n_u32_f16			ret i32 %vcvth_n_u32_f16
	}			}

	define dso_local i32 @test_vcvth_n_u32_f16_16(half %a) {			define dso_local i32 @test_vcvth_n_u32_f16_16(half %a) {
	; CHECK-LABEL: test_vcvth_n_u32_f16_16:			; CHECK-LABEL: test_vcvth_n_u32_f16_16:
	; CHECK: fcvtzu h0, h0, #16			; CHECK: fcvtzu w0, h0, #16
	; CHECK-NEXT: fmov w0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_u32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 16)			%vcvth_n_u32_f16 = tail call i32 @llvm.aarch64.neon.vcvtfp2fxu.i32.f16(half %a, i32 16)
	ret i32 %vcvth_n_u32_f16			ret i32 %vcvth_n_u32_f16
	}			}

	define dso_local i16 @vcageh_f16_test(half %a, half %b) {			define dso_local i16 @vcageh_f16_test(half %a, half %b) {
	; CHECK-LABEL: vcageh_f16_test:			; CHECK-LABEL: vcageh_f16_test:
	… (14 lines not shown) …
	entry:			entry:
	%facg = tail call i32 @llvm.aarch64.neon.facgt.i32.f16(half %a, half %b)			%facg = tail call i32 @llvm.aarch64.neon.facgt.i32.f16(half %a, half %b)
	%0 = trunc i32 %facg to i16			%0 = trunc i32 %facg to i16
	ret i16 %0			ret i16 %0
	}			}

	define dso_local half @vcvth_n_f16_s64_test(i64 %a) {			define dso_local half @vcvth_n_f16_s64_test(i64 %a) {
	; CHECK-LABEL: vcvth_n_f16_s64_test:			; CHECK-LABEL: vcvth_n_f16_s64_test:
	; CHECK: fmov d0, x0			; CHECK: scvtf h0, x0, #16
	; CHECK-NEXT: scvtf h0, h0, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vcvth_n_f16_s64 = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i64(i64 %a, i32 16)			%vcvth_n_f16_s64 = tail call half @llvm.aarch64.neon.vcvtfxs2fp.f16.i64(i64 %a, i32 16)
	ret half %vcvth_n_f16_s64			ret half %vcvth_n_f16_s64
	}			}