This is an archive of the discontinued LLVM Phabricator instance.

Differential D129715

[LoongArch] Heuristically load FP immediates by movgr2fr from materialized integer
Accepted · Public

Authored by gonglingqin on Jul 13 2022, 6:32 PM.

Details

Reviewers

xry111
SixWeining
MaskRay
xen0n

Summary

Load FP immediates by movgr2fr from a materialized integer if the bitcast integer
can be materialized within 2 instructions.
For example, when loading double 1024.0, use

lu52i.d $a0, $zero, 1033
movgr2fr.d $fa0, $a0

instead of

pcalau12i $a0, .LCPI2_0
addi.d $a0, $a0, .LCPI2_0
fld.d $fa0, $a0, 0
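
To see why a single lu52i.d is enough for this example, the bit pattern of 1024.0 can be inspected directly. The following is a minimal C++ sketch, not part of the patch; the helper names `bitsOf`, `top12` and `low52` are made up for illustration. lu52i.d writes its 12-bit immediate into bits 63..52 of the destination, so a double whose low 52 bits are all zero needs only that one instruction before the movgr2fr.d.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of a double as a 64-bit integer.
// This is a bit-for-bit copy, not a numeric conversion.
inline uint64_t bitsOf(double d) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// Top 12 bits (sign bit plus 11-bit biased exponent): the lu52i.d immediate.
inline uint64_t top12(uint64_t bits) { return bits >> 52; }

// Low 52 bits (the fraction field); zero means lu52i.d alone is sufficient.
inline uint64_t low52(uint64_t bits) { return bits & ((1ULL << 52) - 1); }

// Checks the example from the summary: 1024.0 == 2^10, biased exponent 1033.
inline void checkSummaryExample() {
  const uint64_t bits = bitsOf(1024.0);
  assert(bits == 0x4090000000000000ULL);
  assert(top12(bits) == 1033);
  assert(low52(bits) == 0);
}
```

Here 1033 is the biased exponent 1023 + 10, which matches the immediate used in the lu52i.d line above.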

Testing this patch on a 3A5000 with llvm13 shows that the SPEC CPU2006 FP
score increases by 1.2% on average, and the 470.lbm score increases by 11.9%.

Thanks to @xry111 for the suggestion: https://reviews.llvm.org/D128898#3632140

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,050 ms	x64 debian > MLIR.Examples/standalone::test.toy
	60,050 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp
	60,050 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,040 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

gonglingqin created this revision. Jul 13 2022, 6:32 PM

Herald added a project: Restricted Project. Jul 13 2022, 6:32 PM

Herald added subscribers: StephenFan, hiraditya.

gonglingqin requested review of this revision. Jul 13 2022, 6:32 PM

Herald added a project: Restricted Project. Jul 13 2022, 6:32 PM

Herald added a subscriber: llvm-commits.

Thanks!

This revision is now accepted and ready to land. Jul 13 2022, 7:05 PM

This change optimizes out the constant pool for loading floating-point constants by using li+i2f, so I think the title could be: "[LoongArch] Optimize the loading of floating-point immediates by li+i2f", and it would be better to use a common case in the summary rather than 1.0.

To other reviewers: we tested this optimization with an internal LLVM version (llvm13) on a 3A5000, and it shows that the SPEC CPU2006 FP score increases by 1% on average, and the 470.lbm score increases by 8.9%. But we wonder why other architectures have not done this. Is there any potential issue?

In D129715#3650519, @SixWeining wrote:

This change optimizes out the constant pool for loading floating-point constants by using li+i2f, so I think the title could be: "[LoongArch] Optimize the loading of floating-point immediates by li+i2f", and it would be better to use a common case in the summary rather than 1.0.

To other reviewers: we tested this optimization with an internal LLVM version (llvm13) on a 3A5000, and it shows that the SPEC CPU2006 FP score increases by 1% on average, and the 470.lbm score increases by 8.9%. But we wonder why other architectures have not done this. Is there any potential issue?

Hmm yeah I overlooked the overly generic patch title (I was reviewing the code in the metro). But it seems the i2f isn't found anywhere in the repo nor the commit history, instead there's i2fp but that usage isn't common either. I assume the i2fp is short for the {s,u}itofp IR insn. Then the description is incorrect because the {s,u}itofp transfers the numeric value, not bit layout.

I think the title could be further simplified into something like "[LoongArch] Load FP immediates by movgr2fr from materialized integer", and the justification (such as the performance numbers you cited) could be put in the commit message body. What do you think?
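
The distinction drawn here, between transferring a numeric value (as {s,u}itofp does) and transferring a bit layout (as movgr2fr does), can be illustrated with a small C++ sketch. This is only an analogy in portable C++, not code from the patch, and the function names `convertValue` and `moveBits` are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Numeric conversion, analogous to sitofp: the integer's *value* is kept.
inline double convertValue(int64_t i) { return static_cast<double>(i); }

// Bit move, analogous to movgr2fr.d: the integer's *bit pattern* is kept.
inline double moveBits(uint64_t bits) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// The same 64-bit integer gives very different doubles under the two
// operations, which is why "i2f" would be a misleading name for this change.
inline void checkDifference() {
  const uint64_t pattern = 0x3FF0000000000000ULL; // IEEE-754 encoding of 1.0
  assert(moveBits(pattern) == 1.0);
  assert(convertValue(static_cast<int64_t>(pattern)) != 1.0);
}
```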

Harbormaster completed remote builds in B175275: Diff 444483. Jul 13 2022, 7:54 PM

In D129715#3650542, @xen0n wrote:

Hmm yeah I overlooked the overly generic patch title (I was reviewing the code in the metro). But it seems the i2f isn't found anywhere in the repo nor the commit history, instead there's i2fp but that usage isn't common either. I assume the i2fp is short for the {s,u}itofp IR insn. Then the description is incorrect because the {s,u}itofp transfers the numeric value, not bit layout.

I think the title could be further simplified into something like "[LoongArch] Load FP immediates by movgr2fr from materialized integer", and the justification (such as the performance numbers you cited) could be put in the commit message body. What do you think?

That sounds good! Thanks!

In D129715#3650542, @xen0n wrote:

Hmm yeah I overlooked the overly generic patch title (I was reviewing the code in the metro). But it seems the i2f isn't found anywhere in the repo nor the commit history, instead there's i2fp but that usage isn't common either. I assume the i2fp is short for the {s,u}itofp IR insn. Then the description is incorrect because the {s,u}itofp transfers the numeric value, not bit layout.

I think the title could be further simplified into something like "[LoongArch] Load FP immediates by movgr2fr from materialized integer", and the justification (such as the performance numbers you cited) could be put in the commit message body. What do you think?

Thanks. I will change that.

Address @xen0n and @SixWeining's comments.

gonglingqin retitled this revision from "[LoongArch] Optimize the loading of floating-point immediates" to "[LoongArch] Load FP immediates by movgr2fr from materialized integer". Jul 13 2022, 8:26 PM

gonglingqin edited the summary of this revision.

Harbormaster completed remote builds in B175293: Diff 444507. Jul 13 2022, 8:55 PM

To other reviewers: we tested this optimization with an internal LLVM version (llvm13) on a 3A5000, and it shows that the SPEC CPU2006 FP score increases by 1% on average, and the 470.lbm score increases by 8.9%. But we wonder why other architectures have not done this. Is there any potential issue?

I guess the reason is "for very simple test cases fld is really faster".

bench.S:

#define VALUE	0x4090000000000000

.text
.type	main, @function
.globl	main

main:
	li.w	$t1, 1048576
.loop:
	.rept 1024
#if LOAD_IMM
	li.d	$t0, VALUE
	movgr2fr.d	$ft0, $t0
#else
	la.local	$t0, .const0
	fld.d	$ft0, $t0, 0
#endif
	.endr
	addi.w	$t1, $t1, -1
	bnez	$t1, .loop
	li.w	$a0, 0
	jr	$ra


.data
.hidden	.const0
.const0:
	.dword	VALUE

On my 3A5000 (at 2.3 GHz) cc bench_imm.S && time ./a.out gives 0.35s, but cc bench_imm.S -DLOAD_IMM && time ./a.out gives 0.60s. But I think it's just because for the simple case the constant pool is always in the L1 cache...

In D129715#3650672, @xry111 wrote:

On my 3A5000 (at 2.3 GHz) cc bench_imm.S && time ./a.out gives 0.35s, but cc bench_imm.S -DLOAD_IMM && time ./a.out gives 0.60s. But I think it's just because for the simple case the constant pool is always in the L1 cache...

Ah, just make the fetch unit busier then the result will prefer immediate loading:

.loop:
	.rept 1024
#if LOAD_IMM
	li.d	$t0, VALUE
	movgr2fr.d	$ft0, $t0
#else
	la.local	$t0, .const0
	fld.d	$ft0, $t0, 0
#endif
	la.local	$t0, .const1
	ld.d	$t2, $t0, 0
	.endr
	addi.w	$t1, $t1, -1
	bnez	$t1, .loop
	li.w	$a0, 0
	jr	$ra

cc bench_imm.S && time ./a.out gives 0.70s, and cc bench_imm.S -DLOAD_IMM && time ./a.out gives 0.59s. But for a more complex bit pattern (like 0x400921FB54442D18 for PI) fld.d will win again.

Is it possible to limit the use of movgr2fr to patterns that can be loaded with only one or two instructions, and see how the SPEC scores change?

In D129715#3650728, @xry111 wrote:

Ah, just make the fetch unit busier then the result will prefer immediate loading:
.loop:
	.rept 1024
#if LOAD_IMM
	li.d	$t0, VALUE
	movgr2fr.d	$ft0, $t0
#else
	la.local	$t0, .const0
	fld.d	$ft0, $t0, 0
#endif
	la.local	$t0, .const1
	ld.d	$t2, $t0, 0
	.endr
	addi.w	$t1, $t1, -1
	bnez	$t1, .loop
	li.w	$a0, 0
	jr	$ra

cc bench_imm.S && time ./a.out gives 0.70s, and cc bench_imm.S -DLOAD_IMM && time ./a.out gives 0.59s. But for a more complex bit pattern (like 0x400921FB54442D18 for PI) fld.d will win again.

Is it possible to limit the use of movgr2fr to patterns that can be loaded with only one or two instructions, and see how the SPEC scores change?

Thanks for the suggestion. It may be possible, I will test it.

Address @xry111's comments. Load FP immediates by movgr2fr from a materialized integer if the bitcast integer can be materialized within 2 instructions.

gonglingqin retitled this revision from "[LoongArch] Load FP immediates by movgr2fr from materialized integer" to "[LoongArch] Heuristically load FP immediates by movgr2fr from materialized integer". Jul 14 2022, 7:42 PM

gonglingqin edited the summary of this revision.

I don't know if you did the experiments thoroughly and found out 2 is the optimal threshold (on SPEC2006), or if it was just an arbitrary choice made off the top of your head ("拍脑袋").

You could mention how the threshold was chosen, in case it is indeed arbitrary but others wrongly assume it's something related to micro-architecture details, or empirically verified.

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
Line 874: nit: `bitcast` -- the past participle of the verb "cast" is "cast" itself, and the same goes for the compound word "bitcast".

In D129715#3653892, @xen0n wrote:

I don't know if you did the experiments thoroughly and found out 2 is the optimal threshold (on SPEC2006), or if it was just an arbitrary choice made off the top of your head ("拍脑袋").

You could mention how the threshold was chosen, in case it is indeed arbitrary but others wrongly assume it's something related to micro-architecture details, or empirically verified.

I used a 3A5000 with llvm13 to test materializing the integer within 1, 2 and 4 instructions. The results show that performance is best when using no more than 2 instructions. Maybe we should also test materializing the integer within 3 instructions.

gonglingqin added inline comments. Jul 14 2022, 8:11 PM

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
Line 874: Thanks. I will change that.

In D129715#3653900, @gonglingqin wrote:

I used a 3A5000 with llvm13 to test materializing the integer within 1, 2 and 4 instructions. The results show that performance is best when using no more than 2 instructions. Maybe we should also test materializing the integer within 3 instructions.

Could be better to find some time to upgrade your benchmarking environment for testing the actual main branch. ;-)

Regarding the actual benchmarks, yes I think testing the 3-instruction case could be useful. But again, it may not make a significant difference: since the IEEE-754 biased exponent occupies the highest 12 bits (excluding the sign bit), all f64 values with the top 12 bits zeroed are either zero or denormals. And numbers whose binary representation has big "holes" of all-0s or all-1s in the two "middle" 20-bit segments or the lowest 12 bits are probably not commonly used in the wild, let alone as immediates. You could try benchmarking of course, but I doubt the result would be much different from the 2-insn case.

(The 4-insn case is useless and equivalent to unconditionally loading via integer immediates, because all 64-bit values can be loaded in 4 insns (lu12i.w + ori + lu32i.d + lu52i.d) in LA64, and in LA32 you need two pairs of materialization and GPR-FPR moves for the higher and lower 32 bits anyway.)
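
The reasoning above can be made concrete with a rough instruction-count model. The sketch below is a simplified approximation written for this discussion, assuming the usual LA64 semantics (lu12i.w and addi.w sign-extend their 32-bit result, ori zero-extends its immediate, lu32i.d sign-extends into bits 63..52). It is not the actual `LoongArchMatInt::generateInstSeq`, and the name `countMaterializeInsts` is hypothetical; the real sequence generator may differ in corner cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: how many integer instructions are needed to build the
// 64-bit value `val` from the four fields lo12 / hi20 / higher20 / highest12.
inline unsigned countMaterializeInsts(uint64_t val) {
  const uint64_t lo12 = val & 0xFFF;               // bits 11..0
  const uint64_t hi20 = (val >> 12) & 0xFFFFF;     // bits 31..12
  const uint64_t higher20 = (val >> 32) & 0xFFFFF; // bits 51..32
  const uint64_t highest12 = val >> 52;            // bits 63..52

  // Only the top 12 bits are set: a single lu52i.d from $zero.
  if (highest12 != 0 && (val << 12) == 0)
    return 1;

  unsigned n = 0;
  // Low 32 bits.
  if (hi20 == 0)
    n += 1; // ori
  else if (hi20 == 0xFFFFF && (lo12 & 0x800))
    n += 1; // addi.w with a negative 12-bit immediate
  else
    n += 1 + (lo12 != 0 ? 1 : 0); // lu12i.w, then ori if needed

  // Bits 51..32 already equal the sign extension of bit 31 unless fixed up.
  const uint64_t ext31 = (hi20 >> 19) ? 0xFFFFF : 0;
  if (higher20 != ext31)
    n += 1; // lu32i.d

  // Bits 63..52 already equal the sign extension of bit 51 unless fixed up.
  const uint64_t ext51 = (higher20 >> 19) ? 0xFFF : 0;
  if (highest12 != ext51)
    n += 1; // lu52i.d

  assert(n >= 1 && n <= 4);
  return n;
}
```

Under this model the bit pattern of 1024.0 (0x4090000000000000) costs one instruction, while the pattern for PI mentioned earlier in the thread (0x400921FB54442D18) costs all four, so a threshold of 4 accepts every value.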

In D129715#3653907, @xen0n wrote:

Could be better to find some time to upgrade your benchmarking environment for testing the actual main branch. ;-)

Ignore this; I forgot the main branch has no clang support yet.

Harbormaster completed remote builds in B175552: Diff 444860. Jul 14 2022, 9:16 PM

In D129715#3653907, @xen0n wrote:

Could be better to find some time to upgrade your benchmarking environment for testing the actual main branch. ;-)

Regarding the actual benchmarks, yes I think testing the 3-instruction case could be useful. But again, it may not make a significant difference: since the IEEE-754 biased exponent occupies the highest 12 bits (excluding the sign bit), all f64 values with the top 12 bits zeroed are either zero or denormals. And numbers whose binary representation has big "holes" of all-0s or all-1s in the two "middle" 20-bit segments or the lowest 12 bits are probably not commonly used in the wild, let alone as immediates. You could try benchmarking of course, but I doubt the result would be much different from the 2-insn case.

(The 4-insn case is useless and equivalent to unconditionally loading via integer immediates, because all 64-bit values can be loaded in 4 insns (lu12i.w + ori + lu32i.d + lu52i.d) in LA64, and in LA32 you need two pairs of materialization and GPR-FPR moves for the higher and lower 32 bits anyway.)

The test results show that materializing the integer within 3 instructions performs better than the 2-instruction case. The results are shown in the table below:

Benchmarks	Score of 2-instruction case	Score of 3-instruction case	Diff
433.milc	13.2	13.2	0
444.namd	15	15.1	0.1
447.dealII	26.6	26.7	0.1
450.soplex	23.6	24.2	0.6
453.povray	23.3	23.4	0.1
470.lbm	21.5	21.9	0.4
482.sphinx3	25.5	25.5	0

It seems that the 3-instruction case outperforms the other cases. @xen0n, do you have any suggestions?
(Since we do not support flang for the time being, I didn't test the Fortran-related benchmarks.)

In D129715#3654388, @gonglingqin wrote:

The test results show that materializing the integer within 3 instructions performs better than the 2-instruction case. The results are shown in the table below:

Benchmarks	Score of 2-instruction case	Score of 3-instruction case	Diff
433.milc	13.2	13.2	0
444.namd	15	15.1	0.1
447.dealII	26.6	26.7	0.1
450.soplex	23.6	24.2	0.6
453.povray	23.3	23.4	0.1
470.lbm	21.5	21.9	0.4
482.sphinx3	25.5	25.5	0

It seems that the 3-instruction case outperforms the other cases. @xen0n, do you have any suggestions?
(Since we do not support flang for the time being, I didn't test the Fortran-related benchmarks.)

This is interesting data. Are the SPEC2006 runs one-shot, or averaged over multiple runs as in the Phoronix Test Suite? That said, the 450.soplex difference seems statistically significant enough.

I think some assembly comparison could go a long way, but again, SPEC2006 is *horribly outdated* so actually IMO the argument for 3-instruction threshold would be a lot stronger if you could replicate this result on some more recent or comprehensive benchmark suites. (PTS or newer SPEC are all better than SPEC2006 in this regard.)

In D129715#3654612, @xen0n wrote:

This is interesting data. Are the SPEC2006 runs one-shot, or averaged over multiple runs as in the Phoronix Test Suite? That said, the 450.soplex difference seems statistically significant enough.

SPEC2006 was run twice and the average of the scores was taken.

I think some assembly comparison could go a long way, but again, SPEC2006 is *horribly outdated* so actually IMO the argument for 3-instruction threshold would be a lot stronger if you could replicate this result on some more recent or comprehensive benchmark suites. (PTS or newer SPEC are all better than SPEC2006 in this regard.)

Thanks, I will test other benchmark sets.

I'm now feeling guilty because I raised the suggestion without having done any benchmarking... When I get some spare time I'll try to implement this for GCC and benchmark it.

In D129715#3657355, @xry111 wrote:

I'm now feeling guilty because I raised the suggestion without having done any benchmarking... When I get some spare time I'll try to implement this for GCC and benchmark it.

You don't have to feel guilty. We must do this to make the best decision. :)

In D129715#3656776, @gonglingqin wrote:

Thanks, I will test other benchmark sets.

I used SPEC CPU2017 (Fortran excluded) to test the performance in 5 cases:

using the constant pool,
materialized integer with 1 instruction,
materialized integer within 2 instructions,
materialized integer within 3 instructions,
materialized integer within 4 instructions.

(Tests were run three times for each case, and the geometric mean of the scores was taken.)
The results showed no change in the scores across the 5 cases. @xen0n, @xry111, do you have any suggestions?
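
For readers unfamiliar with the averaging used here, the geometric mean of n positive scores is the n-th root of their product. The following is a minimal C++ sketch of that computation; the function name `geometricMean` is made up for illustration, it is not taken from any SPEC tooling, and it uses no scores from this review.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of positive scores, computed through logarithms so that the
// running product cannot overflow or underflow for long score lists.
inline double geometricMean(const std::vector<double> &scores) {
  assert(!scores.empty());
  double logSum = 0.0;
  for (double s : scores) {
    assert(s > 0.0); // the geometric mean is only defined for positive values
    logSum += std::log(s);
  }
  return std::exp(logSum / static_cast<double>(scores.size()));
}
```

Compared with the arithmetic mean, this is less sensitive to a single run with an unusually high score.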

In D129715#3681339, @gonglingqin wrote:

I used SPEC CPU2017 (Fortran excluded) to test the performance in 5 cases:

using the constant pool,
materialized integer with 1 instruction,
materialized integer within 2 instructions,
materialized integer within 3 instructions,
materialized integer within 4 instructions.

(Tests were run three times for each case, and the geometric mean of the scores was taken.)
The results showed no change in the scores across the 5 cases. @xen0n, @xry111, do you have any suggestions?

Make it a tunable (-loongarch-materialize-float-imm=0/1/2/3/4, or some better name), I guess. And set the default to 0 for -mtune=generic or -mtune=la464. Then we can set it to other values if a future uarch behaves differently.
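
The decision such a tunable would control might look roughly like the sketch below. This is purely illustrative: the enum, the function name `chooseLowering` and both parameters are hypothetical and do not come from the patch, and the option name suggested above is itself only a proposal. A threshold of 0 keeps the constant pool, and a threshold of N permits up to N integer instructions before the movgr2fr.

```cpp
#include <cassert>

// Hypothetical lowering strategies for a floating-point immediate.
enum class FPImmLowering { ConstantPool, IntegerMove };

// `matInsts`:  integer instructions needed to build the bit pattern.
// `threshold`: value of the proposed tunable; 0 disables the optimization.
constexpr FPImmLowering chooseLowering(unsigned matInsts, unsigned threshold) {
  return (threshold != 0 && matInsts <= threshold)
             ? FPImmLowering::IntegerMove
             : FPImmLowering::ConstantPool;
}

// With the suggested default of 0, every immediate stays in the constant pool.
static_assert(chooseLowering(1, 0) == FPImmLowering::ConstantPool,
              "threshold 0 must disable integer materialization");
```

Keeping the policy in one small function like this would make it easy to pick a different default per -mtune target later.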

In D129715#3681358, @xry111 wrote:

Make it a tunable (-loongarch-materialize-float-imm=0/1/2/3/4, or some better name), I guess. And set the default to 0 for -mtune=generic or -mtune=la464. Then we can set it to other values if a future uarch behaves differently.

Good suggestion! Thanks! If others agree with this opinion, I will implement it.

In D129715#3681385, @gonglingqin wrote:

Good suggestion! Thanks! If others agree with this opinion, I will implement it.

Hmm, so the workload characteristics of SPEC2017fp actually changed enough to make this optimization negligible. Interesting.

I think @xry111's suggestion to make this optimization tunable is reasonable, but it may need more work. Perhaps this patch could be put on the back burner; we can always come back and finish it later, after the more essential things. Lots of downstream projects are blocked on the availability of LLVM, so it may be very worthwhile to shift priorities for now.

I think @xry111's suggestion to make this optimization tunable is reasonable, but it may need more work. Perhaps this patch could be put on the back burner; we can always come back and finish it later, after the more essential things. Lots of downstream projects are blocked on the availability of LLVM, so it may be very worthwhile to shift priorities for now.

Yes, if there is too much extra cost we can just delay this one. My suggestion to make a tunable is based on "the logic is already written and let's not waste it".

I apologize again for raising some premature thoughts too early :(.

In D129715#3682174, @xry111 wrote:

I think @xry111's suggestion to make this optimization tunable is reasonable, but it may need more work. Perhaps this patch could be put on the back burner; we can always come back and finish it later, after the more essential things. Lots of downstream projects are blocked on the availability of LLVM, so it may be very worthwhile to shift priorities for now.

Yes, if there is too much extra cost we can just delay this one. My suggestion to make a tunable is based on "the logic is already written and let's not waste it".

Thank you for your suggestions. After discussion, we will continue to improve this patch once the more important functionality has been implemented.

I apologize again for raising some premature thoughts too early :(.

You don't have to feel guilty. This is an interesting optimization.

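Revision Contents

Path	Size
llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td	2 lines
llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td	4 lines
llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h	2 lines
llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp	68 lines
llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp	16 lines
llvm/lib/Target/LoongArch/LoongArchInstrInfo.td	2 lines
llvm/test/CodeGen/LoongArch/double-imm.ll	92 lines
llvm/test/CodeGen/LoongArch/float-imm.ll	32 lines
llvm/test/CodeGen/LoongArch/ir-instruction/double-convert.ll	35 lines
llvm/test/CodeGen/LoongArch/ir-instruction/float-convert.ll	33 lines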

Diff 444860

llvm/lib/Target/LoongArch/LoongArchFloat32InstrInfo.td

(196 lines not shown)
 /// Stores

 defm : StPat<store, FST_S, FPR32, f32>;

 /// Floating point constants

 def : Pat<(f32 fpimm0), (MOVGR2FR_W R0)>;
-def : Pat<(f32 fpimm0neg), (FNEG_S (MOVGR2FR_W R0))>;
-def : Pat<(f32 fpimm1), (FFINT_S_W (MOVGR2FR_W (ADDI_W R0, 1)))>;

 // FP Conversion
 def : Pat<(loongarch_ftint FPR32:$src), (FTINTRZ_W_S FPR32:$src)>;
 } // Predicates = [HasBasicF]

 let Predicates = [HasBasicF, IsLA64] in {
 // GPR -> FPR
 def : Pat<(loongarch_movgr2fr_w_la64 GPR:$src), (MOVGR2FR_W GPR:$src)>;
(15 lines not shown)

llvm/lib/Target/LoongArch/LoongArchFloat64InstrInfo.td

(209 lines not shown)
 // f32 -> f64
 def : Pat<(f64 (fpextend FPR32:$src)), (FCVT_D_S FPR32:$src)>;
 } // Predicates = [HasBasicD]

 /// Floating point constants

 let Predicates = [HasBasicD, IsLA64] in {
 def : Pat<(f64 fpimm0), (MOVGR2FR_D R0)>;
-def : Pat<(f64 fpimm0neg), (FNEG_D (MOVGR2FR_D R0))>;
-def : Pat<(f64 fpimm1), (FFINT_D_L (MOVGR2FR_D (ADDI_D R0, 1)))>;

 // Convert int to FP
 def : Pat<(f64 (sint_to_fp (i64 (sexti32 (i64 GPR:$src))))),
           (FFINT_D_W (MOVGR2FR_W GPR:$src))>;
 def : Pat<(f64 (sint_to_fp GPR:$src)), (FFINT_D_L (MOVGR2FR_D GPR:$src))>;

 def : Pat<(f64 (uint_to_fp (i64 (zexti32 (i64 GPR:$src))))),
           (FFINT_D_W (MOVGR2FR_W GPR:$src))>;

 def : Pat<(bitconvert GPR:$src), (MOVGR2FR_D GPR:$src)>;

 // Convert FP to int
 def : Pat<(bitconvert FPR64:$src), (MOVFR2GR_D FPR64:$src)>;
 } // Predicates = [HasBasicD, IsLA64]

 let Predicates = [HasBasicD, IsLA32] in {
 def : Pat<(f64 fpimm0), (MOVGR2FRH_W (MOVGR2FR_W_64 R0), R0)>;
-def : Pat<(f64 fpimm0neg), (FNEG_D (MOVGR2FRH_W (MOVGR2FR_W_64 R0), R0))>;
-def : Pat<(f64 fpimm1), (FCVT_D_S (FFINT_S_W (MOVGR2FR_W (ADDI_W R0, 1))))>;

 // Convert int to FP
 def : Pat<(f64 (sint_to_fp (i32 GPR:$src))), (FFINT_D_W (MOVGR2FR_W GPR:$src))>;
 } // Predicates = [HasBasicD, IsLA32]

llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.h

(45 lines not shown; context: public:)
   }
   bool selectShiftMask32(SDValue N, SDValue &ShAmt) {
     return selectShiftMask(N, 32, ShAmt);
   }

   bool selectSExti32(SDValue N, SDValue &Val);
   bool selectZExti32(SDValue N, SDValue &Val);

+  SDNode *getImmediate(int64_t Imm, MVT VT, const SDLoc &DL);
+
 // Include the pieces autogenerated from the target description.
 #include "LoongArchGenDAGISel.inc"
 };

 } // namespace llvm

 #endif // LLVM_LIB_TARGET_LOONGARCH_LOONGARCHISELDAGTODAG_H

llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp

(14 lines not shown)
 #include "MCTargetDesc/LoongArchMCTargetDesc.h"
 #include "MCTargetDesc/LoongArchMatInt.h"
 #include "llvm/Support/KnownBits.h"

 using namespace llvm;

 #define DEBUG_TYPE "loongarch-isel"

+SDNode *LoongArchDAGToDAGISel::getImmediate(int64_t Imm, MVT VT,
+                                            const SDLoc &DL) {
+  SDNode *Result = nullptr;
+  SDValue SrcReg = CurDAG->getRegister(LoongArch::R0, VT);
+  for (LoongArchMatInt::Inst &Inst : LoongArchMatInt::generateInstSeq(Imm)) {
+    SDValue SDImm = CurDAG->getTargetConstant(Inst.Imm, DL, VT);
+    if (Inst.Opc == LoongArch::LU12I_W)
+      Result = CurDAG->getMachineNode(LoongArch::LU12I_W, DL, VT, SDImm);
+    else
+      Result = CurDAG->getMachineNode(Inst.Opc, DL, VT, SrcReg, SDImm);
+    SrcReg = SDValue(Result, 0);
+  }
+
+  return Result;
+}
+
 void LoongArchDAGToDAGISel::Select(SDNode *Node) {
   // If we have a custom node, we have already selected.
   if (Node->isMachineOpcode()) {
     LLVM_DEBUG(dbgs() << "== "; Node->dump(CurDAG); dbgs() << "\n");
     Node->setNodeId(-1);
     return;
   }

(34 lines not shown; context: case ISD::FrameIndex: {)
     SDValue Imm = CurDAG->getTargetConstant(0, DL, GRLenVT);
     int FI = cast<FrameIndexSDNode>(Node)->getIndex();
     SDValue TFI = CurDAG->getTargetFrameIndex(FI, VT);
     unsigned ADDIOp =
         Subtarget->is64Bit() ? LoongArch::ADDI_D : LoongArch::ADDI_W;
     ReplaceNode(Node, CurDAG->getMachineNode(ADDIOp, DL, VT, TFI, Imm));
     return;
   }
+  case ISD::ConstantFP: {
+    ConstantFPSDNode *CN = dyn_cast<ConstantFPSDNode>(Node);
+    int64_t Imm = CN->getValueAPF().bitcastToAPInt().getSExtValue();
+    SDNode *Result = nullptr;
+
+    // When the floating-point immediate is +0.0, use pattern for matching.
+    if (Imm == 0)
+      break;
+
+    if (Node->getValueType(0) == MVT::f64) {
+      // Handle floating point immediates when is64Bit() is true.
+      if (Subtarget->is64Bit()) {
+        Result = getImmediate(Imm, MVT::i64, DL);
+        Result = CurDAG->getMachineNode(LoongArch::MOVGR2FR_D, DL, MVT::f64,
+                                        SDValue(Result, 0));
+        ReplaceNode(Node, Result);
+        return;
+      }
+
+      // Handle floating point immediates when is64Bit() is false.
+      int32_t ImmHi = Imm >> 32;
+      SDValue SrcReg = CurDAG->getRegister(LoongArch::R0, GRLenVT);
+      if (CN->getValueAPF().bitcastToAPInt().getLoBits(32).isZero()) {
+        Result = CurDAG->getMachineNode(LoongArch::MOVGR2FR_W_64, DL, MVT::f64,
+                                        SrcReg);
+        Result = CurDAG->getMachineNode(
+            LoongArch::MOVGR2FRH_W, DL, MVT::f64, SDValue(Result, 0),
+            SDValue(getImmediate(ImmHi, MVT::i32, DL), 0));
+        ReplaceNode(Node, Result);
+        return;
+      }
+
+      int32_t ImmLo =
+          CN->getValueAPF().bitcastToAPInt().getLoBits(32).getSExtValue();
+      Result =
+          CurDAG->getMachineNode(LoongArch::MOVGR2FR_W_64, DL, MVT::f64,
+                                 SDValue(getImmediate(ImmLo, MVT::i32, DL), 0));
+      Result = CurDAG->getMachineNode(
+          LoongArch::MOVGR2FRH_W, DL, MVT::f64, SDValue(Result, 0),
+          SDValue(getImmediate(ImmHi, MVT::i32, DL), 0));
+      ReplaceNode(Node, Result);
+      return;
+    }
+
+    int32_t Imm32 = CN->getValueAPF().bitcastToAPInt().getSExtValue();
+    Result = getImmediate(Imm32, GRLenVT, DL);
		Result = CurDAG->getMachineNode(LoongArch::MOVGR2FR_W, DL, MVT::f32,
		SDValue(Result, 0));
		ReplaceNode(Node, Result);
		return;
		}

// TODO: Add selection nodes needed later.		// TODO: Add selection nodes needed later.
}		}

// Select the default instruction.		// Select the default instruction.
SelectCode(Node);		SelectCode(Node);
}		}

bool LoongArchDAGToDAGISel::SelectBaseAddr(SDValue Addr, SDValue &Base) {		bool LoongArchDAGToDAGISel::SelectBaseAddr(SDValue Addr, SDValue &Base) {
[... 106 lines not shown ...]
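
The LA32 path of the `ISD::ConstantFP` case above builds an f64 from two 32-bit halves (`movgr2fr.w` for the low word, `movgr2frh.w` for the high word). A minimal standalone sketch of that split follows. It is not the patch's code: `DoubleHalves`, `splitDouble` and `lu12iImm` are hypothetical names introduced only for illustration, and the sketch assumes an IEEE-754 binary64 `double`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the two 32-bit halves of a double's encoding.
struct DoubleHalves {
  uint32_t Lo; // bits 31..0, the word moved with movgr2fr.w
  uint32_t Hi; // bits 63..32, the word moved with movgr2frh.w
};

// Split a double into its low and high 32-bit words.
static DoubleHalves splitDouble(double D) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits)); // well-defined bit reinterpretation
  return {static_cast<uint32_t>(Bits), static_cast<uint32_t>(Bits >> 32)};
}

// Hypothetical helper: the signed 20-bit immediate a single lu12i.w needs
// to produce Word. Only meaningful when the low 12 bits of Word are zero.
static int32_t lu12iImm(uint32_t Word) {
  assert((Word & 0xFFFu) == 0 && "low 12 bits must be clear");
  int32_t Hi20 = static_cast<int32_t>(Word >> 12);
  // Sign-extend from 20 bits.
  return Hi20 >= (1 << 19) ? Hi20 - (1 << 20) : Hi20;
}
```

For `1.0` the low word is zero, so `$zero` can be moved directly, and `lu12iImm(splitDouble(1.0).Hi)` gives 261888, which agrees with the `lu12i.w $a0, 261888` line in the LA32 checks of double-imm.ll.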

llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp

[... 12 lines not shown ...]

#include "LoongArchISelLowering.h"		#include "LoongArchISelLowering.h"
#include "LoongArch.h"		#include "LoongArch.h"
#include "LoongArchMachineFunctionInfo.h"		#include "LoongArchMachineFunctionInfo.h"
#include "LoongArchRegisterInfo.h"		#include "LoongArchRegisterInfo.h"
#include "LoongArchSubtarget.h"		#include "LoongArchSubtarget.h"
#include "LoongArchTargetMachine.h"		#include "LoongArchTargetMachine.h"
#include "MCTargetDesc/LoongArchMCTargetDesc.h"		#include "MCTargetDesc/LoongArchMCTargetDesc.h"
		#include "MCTargetDesc/LoongArchMatInt.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/ISDOpcodes.h"		#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loongarch-isel-lowering"		#define DEBUG_TYPE "loongarch-isel-lowering"

[... 834 lines not shown ...]	if (Glue.getNode())
RetOps.push_back(Glue);		RetOps.push_back(Glue);

return DAG.getNode(LoongArchISD::RET, DL, MVT::Other, RetOps);		return DAG.getNode(LoongArchISD::RET, DL, MVT::Other, RetOps);
}		}

bool LoongArchTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,		bool LoongArchTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
bool ForCodeSize) const {		bool ForCodeSize) const {
assert((VT == MVT::f32 || VT == MVT::f64) && "Unexpected VT");		assert((VT == MVT::f32 || VT == MVT::f64) && "Unexpected VT");
		if (VT == MVT::f32 && Subtarget.hasBasicF())
if (VT == MVT::f32 && !Subtarget.hasBasicF())		return true;
return false;		// f64 imm is legal if the bitcasted integer can be materialized within 2
		xen0n (inline comment): nit: `bitcast` -- the verb "cast"'s past participle is itself, so is the compound word "bitcast".
		gonglingqin (author, inline reply): Thanks. I will change that.
if (VT == MVT::f64 && !Subtarget.hasBasicD())		// instructions.
		if (VT == MVT::f64 && Subtarget.hasBasicD() &&
		LoongArchMatInt::generateInstSeq(Imm.bitcastToAPInt().getSExtValue())
		.size() < 3)
		return true;
return false;		return false;
return (Imm.isZero() || Imm.isExactlyValue(+1.0));
}		}
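
To see why constants such as 1024.0 pass the "within 2 instructions" test in `isFPImmLegal`, it can help to look at the bit pattern: when the low 52 bits of the f64 encoding are zero, only the sign and exponent field remains, and that fits the 12-bit immediate of one `lu52i.d`. The sketch below is a simplified illustration under that assumption, not a reimplementation of `LoongArchMatInt::generateInstSeq`; `bitsOfDouble`, `fitsSingleLu52i` and `lu52iImm` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a double as a 64-bit unsigned integer.
static uint64_t bitsOfDouble(double D) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// Hypothetical predicate: true when the encoding is nonzero and its low
// 52 bits (the mantissa field) are all zero.
static bool fitsSingleLu52i(double D) {
  uint64_t Bits = bitsOfDouble(D);
  return Bits != 0 && (Bits & ((UINT64_C(1) << 52) - 1)) == 0;
}

// Hypothetical helper: the signed 12-bit immediate for lu52i.d, taken from
// the sign bit and the 11-bit biased exponent.
static int32_t lu52iImm(double D) {
  assert(fitsSingleLu52i(D) && "mantissa bits must be clear");
  int32_t Top12 = static_cast<int32_t>(bitsOfDouble(D) >> 52);
  // Sign-extend from 12 bits.
  return Top12 >= 2048 ? Top12 - 4096 : Top12;
}
```

With these helpers, `lu52iImm(1024.0)` is 1033 and `lu52iImm(-1024.0)` is -1015, the same immediates that appear in the LA64 checks of double-imm.ll, while a constant like pi has mantissa bits set and does not qualify for this single-instruction form.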

llvm/lib/Target/LoongArch/LoongArchInstrInfo.td

[... 160 lines not shown ...]
	// Return the negation of an immediate value.			// Return the negation of an immediate value.
	def NegImm : SDNodeXForm<imm, [{			def NegImm : SDNodeXForm<imm, [{
	return CurDAG->getTargetConstant(-N->getSExtValue(), SDLoc(N),			return CurDAG->getTargetConstant(-N->getSExtValue(), SDLoc(N),
	N->getValueType(0));			N->getValueType(0));
	}]>;			}]>;

	// FP immediate patterns.			// FP immediate patterns.
	def fpimm0 : PatLeaf<(fpimm), [{return N->isExactlyValue(+0.0);}]>;			def fpimm0 : PatLeaf<(fpimm), [{return N->isExactlyValue(+0.0);}]>;
	def fpimm0neg : PatLeaf<(fpimm), [{return N->isExactlyValue(-0.0);}]>;
	def fpimm1 : PatLeaf<(fpimm), [{return N->isExactlyValue(+1.0);}]>;

	def CallSymbol: AsmOperandClass {			def CallSymbol: AsmOperandClass {
	let Name = "CallSymbol";			let Name = "CallSymbol";
	let RenderMethod = "addImmOperands";			let RenderMethod = "addImmOperands";
	let PredicateMethod = "isImm";			let PredicateMethod = "isImm";
	}			}

	// A bare symbol used in call only.			// A bare symbol used in call only.
[... 743 lines not shown ...]

llvm/test/CodeGen/LoongArch/double-imm.ll

[... 13 lines not shown ...]
	; LA64-NEXT: movgr2fr.d $fa0, $zero			; LA64-NEXT: movgr2fr.d $fa0, $zero
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret double 0.0			ret double 0.0
	}			}

	define double @f64_negative_zero() nounwind {			define double @f64_negative_zero() nounwind {
	; LA32-LABEL: f64_negative_zero:			; LA32-LABEL: f64_negative_zero:
	; LA32: # %bb.0:			; LA32: # %bb.0:
				; LA32-NEXT: lu12i.w $a0, -524288
	; LA32-NEXT: movgr2fr.w $fa0, $zero			; LA32-NEXT: movgr2fr.w $fa0, $zero
	; LA32-NEXT: movgr2frh.w $fa0, $zero			; LA32-NEXT: movgr2frh.w $fa0, $a0
	; LA32-NEXT: fneg.d $fa0, $fa0
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f64_negative_zero:			; LA64-LABEL: f64_negative_zero:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: movgr2fr.d $fa0, $zero			; LA64-NEXT: lu52i.d $a0, $zero, -2048
	; LA64-NEXT: fneg.d $fa0, $fa0			; LA64-NEXT: movgr2fr.d $fa0, $a0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret double -0.0			ret double -0.0
	}			}

	define double @f64_constant_pi() nounwind {			define double @f64_constant_pi() nounwind {
	; LA32-LABEL: f64_constant_pi:			; LA32-LABEL: f64_constant_pi:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: pcalau12i $a0, .LCPI2_0			; LA32-NEXT: pcalau12i $a0, .LCPI2_0
	; LA32-NEXT: addi.w $a0, $a0, .LCPI2_0			; LA32-NEXT: addi.w $a0, $a0, .LCPI2_0
	; LA32-NEXT: fld.d $fa0, $a0, 0			; LA32-NEXT: fld.d $fa0, $a0, 0
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f64_constant_pi:			; LA64-LABEL: f64_constant_pi:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: pcalau12i $a0, .LCPI2_0			; LA64-NEXT: pcalau12i $a0, .LCPI2_0
	; LA64-NEXT: addi.d $a0, $a0, .LCPI2_0			; LA64-NEXT: addi.d $a0, $a0, .LCPI2_0
	; LA64-NEXT: fld.d $fa0, $a0, 0			; LA64-NEXT: fld.d $fa0, $a0, 0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret double 3.1415926535897931159979634685441851615905761718750			ret double 3.1415926535897931159979634685441851615905761718750
	}			}

	define double @f64_add_fimm1(double %a) nounwind {			define double @f64_add_fimm1(double %a) nounwind {
	; LA32-LABEL: f64_add_fimm1:			; LA32-LABEL: f64_add_fimm1:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: addi.w $a0, $zero, 1			; LA32-NEXT: lu12i.w $a0, 261888
	; LA32-NEXT: movgr2fr.w $fa1, $a0			; LA32-NEXT: movgr2fr.w $fa1, $zero
	; LA32-NEXT: ffint.s.w $fa1, $fa1			; LA32-NEXT: movgr2frh.w $fa1, $a0
	; LA32-NEXT: fcvt.d.s $fa1, $fa1
	; LA32-NEXT: fadd.d $fa0, $fa0, $fa1			; LA32-NEXT: fadd.d $fa0, $fa0, $fa1
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f64_add_fimm1:			; LA64-LABEL: f64_add_fimm1:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: addi.d $a0, $zero, 1			; LA64-NEXT: lu52i.d $a0, $zero, 1023
	; LA64-NEXT: movgr2fr.d $fa1, $a0			; LA64-NEXT: movgr2fr.d $fa1, $a0
	; LA64-NEXT: ffint.d.l $fa1, $fa1
	; LA64-NEXT: fadd.d $fa0, $fa0, $fa1			; LA64-NEXT: fadd.d $fa0, $fa0, $fa1
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	%1 = fadd double %a, 1.0			%1 = fadd double %a, 1.0
	ret double %1			ret double %1
	}			}

	define double @f64_positive_fimm1() nounwind {			define double @f64_positive_fimm1() nounwind {
	; LA32-LABEL: f64_positive_fimm1:			; LA32-LABEL: f64_positive_fimm1:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: addi.w $a0, $zero, 1			; LA32-NEXT: lu12i.w $a0, 261888
	; LA32-NEXT: movgr2fr.w $fa0, $a0			; LA32-NEXT: movgr2fr.w $fa0, $zero
	; LA32-NEXT: ffint.s.w $fa0, $fa0			; LA32-NEXT: movgr2frh.w $fa0, $a0
	; LA32-NEXT: fcvt.d.s $fa0, $fa0
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f64_positive_fimm1:			; LA64-LABEL: f64_positive_fimm1:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: addi.d $a0, $zero, 1			; LA64-NEXT: lu52i.d $a0, $zero, 1023
	; LA64-NEXT: movgr2fr.d $fa0, $a0			; LA64-NEXT: movgr2fr.d $fa0, $a0
	; LA64-NEXT: ffint.d.l $fa0, $fa0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret double 1.0			ret double 1.0
	}			}

				define double @f64_positive_fimm64() nounwind {
				; LA32-LABEL: f64_positive_fimm64:
				; LA32: # %bb.0:
				; LA32-NEXT: lu12i.w $a0, 263424
				; LA32-NEXT: movgr2fr.w $fa0, $zero
				; LA32-NEXT: movgr2frh.w $fa0, $a0
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: f64_positive_fimm64:
				; LA64: # %bb.0:
				; LA64-NEXT: lu52i.d $a0, $zero, 1029
				; LA64-NEXT: movgr2fr.d $fa0, $a0
				; LA64-NEXT: jirl $zero, $ra, 0
				ret double 64.0
				}

				define double @f64_negative_fimm64() nounwind {
				; LA32-LABEL: f64_negative_fimm64:
				; LA32: # %bb.0:
				; LA32-NEXT: lu12i.w $a0, -260864
				; LA32-NEXT: movgr2fr.w $fa0, $zero
				; LA32-NEXT: movgr2frh.w $fa0, $a0
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: f64_negative_fimm64:
				; LA64: # %bb.0:
				; LA64-NEXT: lu52i.d $a0, $zero, -1019
				; LA64-NEXT: movgr2fr.d $fa0, $a0
				; LA64-NEXT: jirl $zero, $ra, 0
				ret double -64.0
				}

				define double @f64_positive_fimm1024() nounwind {
				; LA32-LABEL: f64_positive_fimm1024:
				; LA32: # %bb.0:
				; LA32-NEXT: lu12i.w $a0, 264448
				; LA32-NEXT: movgr2fr.w $fa0, $zero
				; LA32-NEXT: movgr2frh.w $fa0, $a0
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: f64_positive_fimm1024:
				; LA64: # %bb.0:
				; LA64-NEXT: lu52i.d $a0, $zero, 1033
				; LA64-NEXT: movgr2fr.d $fa0, $a0
				; LA64-NEXT: jirl $zero, $ra, 0
				ret double 1024.0
				}

				define double @f64_negative_fimm1024() nounwind {
				; LA32-LABEL: f64_negative_fimm1024:
				; LA32: # %bb.0:
				; LA32-NEXT: lu12i.w $a0, -259840
				; LA32-NEXT: movgr2fr.w $fa0, $zero
				; LA32-NEXT: movgr2frh.w $fa0, $a0
				; LA32-NEXT: jirl $zero, $ra, 0
				;
				; LA64-LABEL: f64_negative_fimm1024:
				; LA64: # %bb.0:
				; LA64-NEXT: lu52i.d $a0, $zero, -1015
				; LA64-NEXT: movgr2fr.d $fa0, $a0
				; LA64-NEXT: jirl $zero, $ra, 0
				ret double -1024.0
				}

llvm/test/CodeGen/LoongArch/float-imm.ll

[... 12 lines not shown ...]
	; LA64-NEXT: movgr2fr.w $fa0, $zero			; LA64-NEXT: movgr2fr.w $fa0, $zero
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret float 0.0			ret float 0.0
	}			}

	define float @f32_negative_zero() nounwind {			define float @f32_negative_zero() nounwind {
	; LA32-LABEL: f32_negative_zero:			; LA32-LABEL: f32_negative_zero:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: movgr2fr.w $fa0, $zero			; LA32-NEXT: lu12i.w $a0, -524288
	; LA32-NEXT: fneg.s $fa0, $fa0			; LA32-NEXT: movgr2fr.w $fa0, $a0
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f32_negative_zero:			; LA64-LABEL: f32_negative_zero:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: movgr2fr.w $fa0, $zero			; LA64-NEXT: lu12i.w $a0, -524288
	; LA64-NEXT: fneg.s $fa0, $fa0			; LA64-NEXT: movgr2fr.w $fa0, $a0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret float -0.0			ret float -0.0
	}			}

	define float @f32_constant_pi() nounwind {			define float @f32_constant_pi() nounwind {
	; LA32-LABEL: f32_constant_pi:			; LA32-LABEL: f32_constant_pi:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: pcalau12i $a0, .LCPI2_0			; LA32-NEXT: lu12i.w $a0, 263312
	; LA32-NEXT: addi.w $a0, $a0, .LCPI2_0			; LA32-NEXT: ori $a0, $a0, 4059
	; LA32-NEXT: fld.s $fa0, $a0, 0			; LA32-NEXT: movgr2fr.w $fa0, $a0
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f32_constant_pi:			; LA64-LABEL: f32_constant_pi:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: pcalau12i $a0, .LCPI2_0			; LA64-NEXT: lu12i.w $a0, 263312
	; LA64-NEXT: addi.d $a0, $a0, .LCPI2_0			; LA64-NEXT: ori $a0, $a0, 4059
	; LA64-NEXT: fld.s $fa0, $a0, 0			; LA64-NEXT: movgr2fr.w $fa0, $a0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret float 3.14159274101257324218750			ret float 3.14159274101257324218750
	}			}

	define float @f32_add_fimm1(float %a) nounwind {			define float @f32_add_fimm1(float %a) nounwind {
	; LA32-LABEL: f32_add_fimm1:			; LA32-LABEL: f32_add_fimm1:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: addi.w $a0, $zero, 1			; LA32-NEXT: lu12i.w $a0, 260096
	; LA32-NEXT: movgr2fr.w $fa1, $a0			; LA32-NEXT: movgr2fr.w $fa1, $a0
	; LA32-NEXT: ffint.s.w $fa1, $fa1
	; LA32-NEXT: fadd.s $fa0, $fa0, $fa1			; LA32-NEXT: fadd.s $fa0, $fa0, $fa1
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f32_add_fimm1:			; LA64-LABEL: f32_add_fimm1:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: addi.w $a0, $zero, 1			; LA64-NEXT: lu12i.w $a0, 260096
	; LA64-NEXT: movgr2fr.w $fa1, $a0			; LA64-NEXT: movgr2fr.w $fa1, $a0
	; LA64-NEXT: ffint.s.w $fa1, $fa1
	; LA64-NEXT: fadd.s $fa0, $fa0, $fa1			; LA64-NEXT: fadd.s $fa0, $fa0, $fa1
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	%1 = fadd float %a, 1.0			%1 = fadd float %a, 1.0
	ret float %1			ret float %1
	}			}

	define float @f32_positive_fimm1() nounwind {			define float @f32_positive_fimm1() nounwind {
	; LA32-LABEL: f32_positive_fimm1:			; LA32-LABEL: f32_positive_fimm1:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: addi.w $a0, $zero, 1			; LA32-NEXT: lu12i.w $a0, 260096
	; LA32-NEXT: movgr2fr.w $fa0, $a0			; LA32-NEXT: movgr2fr.w $fa0, $a0
	; LA32-NEXT: ffint.s.w $fa0, $fa0
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: f32_positive_fimm1:			; LA64-LABEL: f32_positive_fimm1:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: addi.w $a0, $zero, 1			; LA64-NEXT: lu12i.w $a0, 260096
	; LA64-NEXT: movgr2fr.w $fa0, $a0			; LA64-NEXT: movgr2fr.w $fa0, $a0
	; LA64-NEXT: ffint.s.w $fa0, $fa0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	ret float 1.0			ret float 1.0
	}			}
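
The f32 expectations above follow directly from the IEEE-754 binary32 encoding: the upper 20 bits become the `lu12i.w` immediate and the lower 12 bits, when nonzero, the `ori` immediate. The following is a small sketch for reproducing those numbers by hand; `F32Parts` and `splitFloat` are hypothetical names, and the code assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the two immediates for a lu12i.w + ori pair.
struct F32Parts {
  int32_t Hi20;  // signed 20-bit immediate for lu12i.w
  uint32_t Lo12; // unsigned 12-bit immediate for ori (0 means no ori needed)
};

// Split a float's bit pattern into the lu12i.w and ori immediates.
static F32Parts splitFloat(float F) {
  static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits)); // well-defined bit reinterpretation
  int32_t Hi20 = static_cast<int32_t>(Bits >> 12);
  if (Hi20 >= (1 << 19)) // sign-extend from 20 bits
    Hi20 -= (1 << 20);
  return {Hi20, Bits & 0xFFFu};
}
```

For the pi constant in `f32_constant_pi` this yields `Hi20 == 263312` and `Lo12 == 4059`, and for `1.0f` it yields `Hi20 == 260096` with `Lo12 == 0`, which lines up with the updated CHECK lines.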

llvm/test/CodeGen/LoongArch/ir-instruction/double-convert.ll

[... 110 lines not shown ...]
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	%1 = fptosi double %a to i32			%1 = fptosi double %a to i32
	ret i32 %1			ret i32 %1
	}			}

	define i32 @convert_double_to_u32(double %a) nounwind {			define i32 @convert_double_to_u32(double %a) nounwind {
	; LA32-LABEL: convert_double_to_u32:			; LA32-LABEL: convert_double_to_u32:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: pcalau12i $a0, .LCPI7_0			; LA32-NEXT: lu12i.w $a0, 269824
	; LA32-NEXT: addi.w $a0, $a0, .LCPI7_0			; LA32-NEXT: movgr2fr.w $fa1, $zero
	; LA32-NEXT: fld.d $fa1, $a0, 0			; LA32-NEXT: movgr2frh.w $fa1, $a0
	; LA32-NEXT: fsub.d $fa2, $fa0, $fa1			; LA32-NEXT: fsub.d $fa2, $fa0, $fa1
	; LA32-NEXT: ftintrz.w.d $fa2, $fa2			; LA32-NEXT: ftintrz.w.d $fa2, $fa2
	; LA32-NEXT: movfr2gr.s $a0, $fa2			; LA32-NEXT: movfr2gr.s $a0, $fa2
	; LA32-NEXT: lu12i.w $a1, -524288			; LA32-NEXT: lu12i.w $a1, -524288
	; LA32-NEXT: xor $a0, $a0, $a1			; LA32-NEXT: xor $a0, $a0, $a1
	; LA32-NEXT: fcmp.clt.d $fcc0, $fa0, $fa1			; LA32-NEXT: fcmp.clt.d $fcc0, $fa0, $fa1
	; LA32-NEXT: movcf2gr $a1, $fcc0			; LA32-NEXT: movcf2gr $a1, $fcc0
	; LA32-NEXT: masknez $a0, $a0, $a1			; LA32-NEXT: masknez $a0, $a0, $a1
[... 38 lines not shown ...]
	; LA32-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill			; LA32-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
	; LA32-NEXT: bl __fixunsdfdi			; LA32-NEXT: bl __fixunsdfdi
	; LA32-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload			; LA32-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
	; LA32-NEXT: addi.w $sp, $sp, 16			; LA32-NEXT: addi.w $sp, $sp, 16
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: convert_double_to_u64:			; LA64-LABEL: convert_double_to_u64:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: pcalau12i $a0, .LCPI9_0			; LA64-NEXT: lu52i.d $a0, $zero, 1086
	; LA64-NEXT: addi.d $a0, $a0, .LCPI9_0			; LA64-NEXT: movgr2fr.d $fa1, $a0
	; LA64-NEXT: fld.d $fa1, $a0, 0
	; LA64-NEXT: fsub.d $fa2, $fa0, $fa1			; LA64-NEXT: fsub.d $fa2, $fa0, $fa1
	; LA64-NEXT: ftintrz.l.d $fa2, $fa2			; LA64-NEXT: ftintrz.l.d $fa2, $fa2
	; LA64-NEXT: movfr2gr.d $a0, $fa2			; LA64-NEXT: movfr2gr.d $a0, $fa2
	; LA64-NEXT: lu52i.d $a1, $zero, -2048			; LA64-NEXT: lu52i.d $a1, $zero, -2048
	; LA64-NEXT: xor $a0, $a0, $a1			; LA64-NEXT: xor $a0, $a0, $a1
	; LA64-NEXT: fcmp.clt.d $fcc0, $fa0, $fa1			; LA64-NEXT: fcmp.clt.d $fcc0, $fa0, $fa1
	; LA64-NEXT: movcf2gr $a1, $fcc0			; LA64-NEXT: movcf2gr $a1, $fcc0
	; LA64-NEXT: masknez $a0, $a0, $a1			; LA64-NEXT: masknez $a0, $a0, $a1
[... 42 lines not shown ...]
	; LA32-LABEL: convert_u32_to_double:			; LA32-LABEL: convert_u32_to_double:
	; LA32: # %bb.0:			; LA32: # %bb.0:
	; LA32-NEXT: addi.w $sp, $sp, -16			; LA32-NEXT: addi.w $sp, $sp, -16
	; LA32-NEXT: addi.w $a1, $sp, 8			; LA32-NEXT: addi.w $a1, $sp, 8
	; LA32-NEXT: ori $a1, $a1, 4			; LA32-NEXT: ori $a1, $a1, 4
	; LA32-NEXT: lu12i.w $a2, 275200			; LA32-NEXT: lu12i.w $a2, 275200
	; LA32-NEXT: st.w $a2, $a1, 0			; LA32-NEXT: st.w $a2, $a1, 0
	; LA32-NEXT: st.w $a0, $sp, 8			; LA32-NEXT: st.w $a0, $sp, 8
	; LA32-NEXT: pcalau12i $a0, .LCPI12_0			; LA32-NEXT: lu12i.w $a0, -249088
	; LA32-NEXT: addi.w $a0, $a0, .LCPI12_0			; LA32-NEXT: movgr2fr.w $fa0, $zero
	; LA32-NEXT: fld.d $fa0, $a0, 0			; LA32-NEXT: movgr2frh.w $fa0, $a0
	; LA32-NEXT: fld.d $fa1, $sp, 8			; LA32-NEXT: fld.d $fa1, $sp, 8
	; LA32-NEXT: fsub.d $fa0, $fa1, $fa0			; LA32-NEXT: fadd.d $fa0, $fa1, $fa0
	; LA32-NEXT: addi.w $sp, $sp, 16			; LA32-NEXT: addi.w $sp, $sp, 16
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: convert_u32_to_double:			; LA64-LABEL: convert_u32_to_double:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: lu52i.d $a1, $zero, 1107			; LA64-NEXT: lu52i.d $a1, $zero, 1107
	; LA64-NEXT: movgr2fr.d $fa0, $a1			; LA64-NEXT: movgr2fr.d $fa0, $a1
	; LA64-NEXT: pcalau12i $a1, .LCPI12_0			; LA64-NEXT: lu12i.w $a1, 256
	; LA64-NEXT: addi.d $a1, $a1, .LCPI12_0			; LA64-NEXT: lu52i.d $a1, $a1, -941
	; LA64-NEXT: fld.d $fa1, $a1, 0			; LA64-NEXT: movgr2fr.d $fa1, $a1
	; LA64-NEXT: fsub.d $fa0, $fa0, $fa1			; LA64-NEXT: fadd.d $fa0, $fa0, $fa1
	; LA64-NEXT: bstrpick.d $a0, $a0, 31, 0			; LA64-NEXT: bstrpick.d $a0, $a0, 31, 0
	; LA64-NEXT: lu52i.d $a1, $zero, 1075			; LA64-NEXT: lu52i.d $a1, $zero, 1075
	; LA64-NEXT: or $a0, $a0, $a1			; LA64-NEXT: or $a0, $a0, $a1
	; LA64-NEXT: movgr2fr.d $fa1, $a0			; LA64-NEXT: movgr2fr.d $fa1, $a0
	; LA64-NEXT: fadd.d $fa0, $fa1, $fa0			; LA64-NEXT: fadd.d $fa0, $fa1, $fa0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	%1 = uitofp i32 %a to double			%1 = uitofp i32 %a to double
	ret double %1			ret double %1
[... 10 lines not shown ...]
	; LA32-NEXT: jirl $zero, $ra, 0			; LA32-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64-LABEL: convert_u64_to_double:			; LA64-LABEL: convert_u64_to_double:
	; LA64: # %bb.0:			; LA64: # %bb.0:
	; LA64-NEXT: srli.d $a1, $a0, 32			; LA64-NEXT: srli.d $a1, $a0, 32
	; LA64-NEXT: lu52i.d $a2, $zero, 1107			; LA64-NEXT: lu52i.d $a2, $zero, 1107
	; LA64-NEXT: or $a1, $a1, $a2			; LA64-NEXT: or $a1, $a1, $a2
	; LA64-NEXT: movgr2fr.d $fa0, $a1			; LA64-NEXT: movgr2fr.d $fa0, $a1
	; LA64-NEXT: pcalau12i $a1, .LCPI13_0			; LA64-NEXT: lu12i.w $a1, 256
	; LA64-NEXT: addi.d $a1, $a1, .LCPI13_0			; LA64-NEXT: lu52i.d $a1, $a1, -941
	; LA64-NEXT: fld.d $fa1, $a1, 0			; LA64-NEXT: movgr2fr.d $fa1, $a1
	; LA64-NEXT: fsub.d $fa0, $fa0, $fa1			; LA64-NEXT: fadd.d $fa0, $fa0, $fa1
	; LA64-NEXT: bstrpick.d $a0, $a0, 31, 0			; LA64-NEXT: bstrpick.d $a0, $a0, 31, 0
	; LA64-NEXT: lu52i.d $a1, $zero, 1075			; LA64-NEXT: lu52i.d $a1, $zero, 1075
	; LA64-NEXT: or $a0, $a0, $a1			; LA64-NEXT: or $a0, $a0, $a1
	; LA64-NEXT: movgr2fr.d $fa1, $a0			; LA64-NEXT: movgr2fr.d $fa1, $a0
	; LA64-NEXT: fadd.d $fa0, $fa1, $fa0			; LA64-NEXT: fadd.d $fa0, $fa1, $fa0
	; LA64-NEXT: jirl $zero, $ra, 0			; LA64-NEXT: jirl $zero, $ra, 0
	%1 = uitofp i64 %a to double			%1 = uitofp i64 %a to double
	ret double %1			ret double %1
[... 41 lines not shown ...]

llvm/test/CodeGen/LoongArch/ir-instruction/float-convert.ll

[... 175 lines not shown ...]
	; LA64D-NEXT: jirl $zero, $ra, 0			; LA64D-NEXT: jirl $zero, $ra, 0
	%1 = fptoui float %a to i16			%1 = fptoui float %a to i16
	ret i16 %1			ret i16 %1
	}			}

	define i32 @convert_float_to_u32(float %a) nounwind {			define i32 @convert_float_to_u32(float %a) nounwind {
	; LA32F-LABEL: convert_float_to_u32:			; LA32F-LABEL: convert_float_to_u32:
	; LA32F: # %bb.0:			; LA32F: # %bb.0:
	; LA32F-NEXT: pcalau12i $a0, .LCPI6_0			; LA32F-NEXT: lu12i.w $a0, 323584
	; LA32F-NEXT: addi.w $a0, $a0, .LCPI6_0			; LA32F-NEXT: movgr2fr.w $fa1, $a0
	; LA32F-NEXT: fld.s $fa1, $a0, 0
	; LA32F-NEXT: fsub.s $fa2, $fa0, $fa1			; LA32F-NEXT: fsub.s $fa2, $fa0, $fa1
	; LA32F-NEXT: ftintrz.w.s $fa2, $fa2			; LA32F-NEXT: ftintrz.w.s $fa2, $fa2
	; LA32F-NEXT: movfr2gr.s $a0, $fa2			; LA32F-NEXT: movfr2gr.s $a0, $fa2
	; LA32F-NEXT: lu12i.w $a1, -524288			; LA32F-NEXT: lu12i.w $a1, -524288
	; LA32F-NEXT: xor $a0, $a0, $a1			; LA32F-NEXT: xor $a0, $a0, $a1
	; LA32F-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1			; LA32F-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1
	; LA32F-NEXT: movcf2gr $a1, $fcc0			; LA32F-NEXT: movcf2gr $a1, $fcc0
	; LA32F-NEXT: masknez $a0, $a0, $a1			; LA32F-NEXT: masknez $a0, $a0, $a1
	; LA32F-NEXT: ftintrz.w.s $fa0, $fa0			; LA32F-NEXT: ftintrz.w.s $fa0, $fa0
	; LA32F-NEXT: movfr2gr.s $a2, $fa0			; LA32F-NEXT: movfr2gr.s $a2, $fa0
	; LA32F-NEXT: maskeqz $a1, $a2, $a1			; LA32F-NEXT: maskeqz $a1, $a2, $a1
	; LA32F-NEXT: or $a0, $a1, $a0			; LA32F-NEXT: or $a0, $a1, $a0
	; LA32F-NEXT: jirl $zero, $ra, 0			; LA32F-NEXT: jirl $zero, $ra, 0
	;			;
	; LA32D-LABEL: convert_float_to_u32:			; LA32D-LABEL: convert_float_to_u32:
	; LA32D: # %bb.0:			; LA32D: # %bb.0:
	; LA32D-NEXT: pcalau12i $a0, .LCPI6_0			; LA32D-NEXT: lu12i.w $a0, 323584
	; LA32D-NEXT: addi.w $a0, $a0, .LCPI6_0			; LA32D-NEXT: movgr2fr.w $fa1, $a0
	; LA32D-NEXT: fld.s $fa1, $a0, 0
	; LA32D-NEXT: fsub.s $fa2, $fa0, $fa1			; LA32D-NEXT: fsub.s $fa2, $fa0, $fa1
	; LA32D-NEXT: ftintrz.w.s $fa2, $fa2			; LA32D-NEXT: ftintrz.w.s $fa2, $fa2
	; LA32D-NEXT: movfr2gr.s $a0, $fa2			; LA32D-NEXT: movfr2gr.s $a0, $fa2
	; LA32D-NEXT: lu12i.w $a1, -524288			; LA32D-NEXT: lu12i.w $a1, -524288
	; LA32D-NEXT: xor $a0, $a0, $a1			; LA32D-NEXT: xor $a0, $a0, $a1
	; LA32D-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1			; LA32D-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1
	; LA32D-NEXT: movcf2gr $a1, $fcc0			; LA32D-NEXT: movcf2gr $a1, $fcc0
	; LA32D-NEXT: masknez $a0, $a0, $a1			; LA32D-NEXT: masknez $a0, $a0, $a1
	; LA32D-NEXT: ftintrz.w.s $fa0, $fa0			; LA32D-NEXT: ftintrz.w.s $fa0, $fa0
	; LA32D-NEXT: movfr2gr.s $a2, $fa0			; LA32D-NEXT: movfr2gr.s $a2, $fa0
	; LA32D-NEXT: maskeqz $a1, $a2, $a1			; LA32D-NEXT: maskeqz $a1, $a2, $a1
	; LA32D-NEXT: or $a0, $a1, $a0			; LA32D-NEXT: or $a0, $a1, $a0
	; LA32D-NEXT: jirl $zero, $ra, 0			; LA32D-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64F-LABEL: convert_float_to_u32:			; LA64F-LABEL: convert_float_to_u32:
	; LA64F: # %bb.0:			; LA64F: # %bb.0:
	; LA64F-NEXT: pcalau12i $a0, .LCPI6_0			; LA64F-NEXT: lu12i.w $a0, 323584
	; LA64F-NEXT: addi.d $a0, $a0, .LCPI6_0			; LA64F-NEXT: movgr2fr.w $fa1, $a0
	; LA64F-NEXT: fld.s $fa1, $a0, 0
	; LA64F-NEXT: fsub.s $fa2, $fa0, $fa1			; LA64F-NEXT: fsub.s $fa2, $fa0, $fa1
	; LA64F-NEXT: ftintrz.w.s $fa2, $fa2			; LA64F-NEXT: ftintrz.w.s $fa2, $fa2
	; LA64F-NEXT: movfr2gr.s $a0, $fa2			; LA64F-NEXT: movfr2gr.s $a0, $fa2
	; LA64F-NEXT: lu12i.w $a1, -524288			; LA64F-NEXT: lu12i.w $a1, -524288
	; LA64F-NEXT: xor $a0, $a0, $a1			; LA64F-NEXT: xor $a0, $a0, $a1
	; LA64F-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1			; LA64F-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1
	; LA64F-NEXT: movcf2gr $a1, $fcc0			; LA64F-NEXT: movcf2gr $a1, $fcc0
	; LA64F-NEXT: masknez $a0, $a0, $a1			; LA64F-NEXT: masknez $a0, $a0, $a1
[... 28 lines not shown ...]
	; LA32D-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill			; LA32D-NEXT: st.w $ra, $sp, 12 # 4-byte Folded Spill
	; LA32D-NEXT: bl __fixunssfdi			; LA32D-NEXT: bl __fixunssfdi
	; LA32D-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload			; LA32D-NEXT: ld.w $ra, $sp, 12 # 4-byte Folded Reload
	; LA32D-NEXT: addi.w $sp, $sp, 16			; LA32D-NEXT: addi.w $sp, $sp, 16
	; LA32D-NEXT: jirl $zero, $ra, 0			; LA32D-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64F-LABEL: convert_float_to_u64:			; LA64F-LABEL: convert_float_to_u64:
	; LA64F: # %bb.0:			; LA64F: # %bb.0:
	; LA64F-NEXT: pcalau12i $a0, .LCPI7_0			; LA64F-NEXT: lu12i.w $a0, 389120
	; LA64F-NEXT: addi.d $a0, $a0, .LCPI7_0			; LA64F-NEXT: movgr2fr.w $fa1, $a0
	; LA64F-NEXT: fld.s $fa1, $a0, 0
	; LA64F-NEXT: fsub.s $fa2, $fa0, $fa1			; LA64F-NEXT: fsub.s $fa2, $fa0, $fa1
	; LA64F-NEXT: ftintrz.w.s $fa2, $fa2			; LA64F-NEXT: ftintrz.w.s $fa2, $fa2
	; LA64F-NEXT: movfr2gr.s $a0, $fa2			; LA64F-NEXT: movfr2gr.s $a0, $fa2
	; LA64F-NEXT: lu52i.d $a1, $zero, -2048			; LA64F-NEXT: lu52i.d $a1, $zero, -2048
	; LA64F-NEXT: xor $a0, $a0, $a1			; LA64F-NEXT: xor $a0, $a0, $a1
	; LA64F-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1			; LA64F-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1
	; LA64F-NEXT: movcf2gr $a1, $fcc0			; LA64F-NEXT: movcf2gr $a1, $fcc0
	; LA64F-NEXT: masknez $a0, $a0, $a1			; LA64F-NEXT: masknez $a0, $a0, $a1
	; LA64F-NEXT: ftintrz.w.s $fa0, $fa0			; LA64F-NEXT: ftintrz.w.s $fa0, $fa0
	; LA64F-NEXT: movfr2gr.s $a2, $fa0			; LA64F-NEXT: movfr2gr.s $a2, $fa0
	; LA64F-NEXT: maskeqz $a1, $a2, $a1			; LA64F-NEXT: maskeqz $a1, $a2, $a1
	; LA64F-NEXT: or $a0, $a1, $a0			; LA64F-NEXT: or $a0, $a1, $a0
	; LA64F-NEXT: jirl $zero, $ra, 0			; LA64F-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64D-LABEL: convert_float_to_u64:			; LA64D-LABEL: convert_float_to_u64:
	; LA64D: # %bb.0:			; LA64D: # %bb.0:
	; LA64D-NEXT: pcalau12i $a0, .LCPI7_0			; LA64D-NEXT: lu12i.w $a0, 389120
	; LA64D-NEXT: addi.d $a0, $a0, .LCPI7_0			; LA64D-NEXT: movgr2fr.w $fa1, $a0
	; LA64D-NEXT: fld.s $fa1, $a0, 0
	; LA64D-NEXT: fsub.s $fa2, $fa0, $fa1			; LA64D-NEXT: fsub.s $fa2, $fa0, $fa1
	; LA64D-NEXT: ftintrz.l.s $fa2, $fa2			; LA64D-NEXT: ftintrz.l.s $fa2, $fa2
	; LA64D-NEXT: movfr2gr.d $a0, $fa2			; LA64D-NEXT: movfr2gr.d $a0, $fa2
	; LA64D-NEXT: lu52i.d $a1, $zero, -2048			; LA64D-NEXT: lu52i.d $a1, $zero, -2048
	; LA64D-NEXT: xor $a0, $a0, $a1			; LA64D-NEXT: xor $a0, $a0, $a1
	; LA64D-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1			; LA64D-NEXT: fcmp.clt.s $fcc0, $fa0, $fa1
	; LA64D-NEXT: movcf2gr $a1, $fcc0			; LA64D-NEXT: movcf2gr $a1, $fcc0
	; LA64D-NEXT: masknez $a0, $a0, $a1			; LA64D-NEXT: masknez $a0, $a0, $a1
[... 201 lines not shown ...]
	; LA32D-LABEL: convert_u32_to_float:			; LA32D-LABEL: convert_u32_to_float:
	; LA32D: # %bb.0:			; LA32D: # %bb.0:
	; LA32D-NEXT: addi.w $sp, $sp, -16			; LA32D-NEXT: addi.w $sp, $sp, -16
	; LA32D-NEXT: addi.w $a1, $sp, 8			; LA32D-NEXT: addi.w $a1, $sp, 8
	; LA32D-NEXT: ori $a1, $a1, 4			; LA32D-NEXT: ori $a1, $a1, 4
	; LA32D-NEXT: lu12i.w $a2, 275200			; LA32D-NEXT: lu12i.w $a2, 275200
	; LA32D-NEXT: st.w $a2, $a1, 0			; LA32D-NEXT: st.w $a2, $a1, 0
	; LA32D-NEXT: st.w $a0, $sp, 8			; LA32D-NEXT: st.w $a0, $sp, 8
	; LA32D-NEXT: pcalau12i $a0, .LCPI14_0			; LA32D-NEXT: lu12i.w $a0, -249088
	; LA32D-NEXT: addi.w $a0, $a0, .LCPI14_0			; LA32D-NEXT: movgr2fr.w $fa0, $zero
	; LA32D-NEXT: fld.d $fa0, $a0, 0			; LA32D-NEXT: movgr2frh.w $fa0, $a0
	; LA32D-NEXT: fld.d $fa1, $sp, 8			; LA32D-NEXT: fld.d $fa1, $sp, 8
	; LA32D-NEXT: fsub.d $fa0, $fa1, $fa0			; LA32D-NEXT: fadd.d $fa0, $fa1, $fa0
	; LA32D-NEXT: fcvt.s.d $fa0, $fa0			; LA32D-NEXT: fcvt.s.d $fa0, $fa0
	; LA32D-NEXT: addi.w $sp, $sp, 16			; LA32D-NEXT: addi.w $sp, $sp, 16
	; LA32D-NEXT: jirl $zero, $ra, 0			; LA32D-NEXT: jirl $zero, $ra, 0
	;			;
	; LA64F-LABEL: convert_u32_to_float:			; LA64F-LABEL: convert_u32_to_float:
	; LA64F: # %bb.0:			; LA64F: # %bb.0:
	; LA64F-NEXT: bstrpick.d $a1, $a0, 31, 1			; LA64F-NEXT: bstrpick.d $a1, $a0, 31, 1
	; LA64F-NEXT: andi $a2, $a0, 1			; LA64F-NEXT: andi $a2, $a0, 1
[... 130 lines not shown ...]

Revision Contents

Diff 444860
