This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Optimize floating point materialization
ClosedPublic

Authored by zatrazz on Feb 20 2019, 10:05 AM.

Details

Reviewers

javed.absar
huntergr
SjoerdMeijer
t.p.northover
echristo
evandro
rengolin
efriedma

Commits

rGa3cefa5d6492: [AArch64] Optimize floating point materialization
rL356390: [AArch64] Optimize floating point materialization

Summary

This patch follows some ideas from r352866 to optimize floating point materialization further. It changes isFPImmLegal to consider sequences of up to 2 mov instructions. The rationale is that the cost is the same for mov+fmov vs. adrp+ldr, but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same for movw+movk+fmov vs. adrp+ldr: the sequence is one instruction longer, but the movw+movk pair will be fused.
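
To make the instruction counts in the summary concrete, here is a minimal sketch that reinterprets a float or double as its raw bits and counts the non-zero 16-bit chunks, which is the length of a plain MOVZ+MOVK sequence. It is only an approximation for illustration: `bitsOf` and `countMovzMovk` are hypothetical helpers, not LLVM functions, and the real expansion in AArch64ExpandPseudo::expandMOVImm can also choose MOVN or ORR forms that this sketch ignores.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a float as its raw 32-bit pattern.
static uint32_t bitsOf(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Hypothetical helper: reinterpret a double as its raw 64-bit pattern.
static uint64_t bitsOf(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// Hypothetical helper: length of a plain MOVZ + MOVK* sequence, i.e. one
// instruction per non-zero 16-bit chunk (and one MOVZ for an all-zero value).
// Assumption: MOVN and ORR encodings are deliberately not modelled here.
static unsigned countMovzMovk(uint64_t Bits, unsigned SizeInBits) {
  unsigned Count = 0;
  for (unsigned Shift = 0; Shift < SizeInBits; Shift += 16)
    if ((Bits >> Shift) & 0xFFFF)
      ++Count;
  return Count ? Count : 1;
}
```

Under this model the float 3.14159274 (0x40490FDB) splits into the chunks 4059 and 16457, the same immediates that appear in the mov/movk checks of fpimm.ll, while the double 0x400921FB54442D18 has four non-zero chunks.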

Event Timeline

zatrazz created this revision. Feb 20 2019, 10:05 AM

Herald added subscribers: mstorsjo, kristof.beyls. Feb 20 2019, 10:05 AM

efriedma added inline comments. Feb 20 2019, 11:43 AM

lib/Target/AArch64/AArch64ISelLowering.cpp
5404	Please don't copy-paste code. And this is a very inaccurate approximation for the logic in AArch64ExpandPseudo::expandMOVImm.
5410	Not sure this comment should be here; should be next to the code that actually makes this decision.
5417	Does it matter whether we're optimizing for size?

New revision based on previous comments. I refactored the logic used in isFPImmLegal to decide whether to materialize the FP constant by adding a new function to the common AArch64 code, AArch64_AM::getExpandImmCost. To avoid code duplication, I moved some definitions out of AArch64ExpandPseudoInsts.cpp.

lib/Target/AArch64/AArch64ISelLowering.cpp
5404	Ack, I changed it in the next revision to use the same ideas as AArch64ExpandPseudo::expandMOVImm. It required consolidating some common code.
5410	Ack, I moved it next to the code that actually uses it.
5417	I intend to send another patch to add the optimization for size information on isFPImmLegal, since it requires some refactoring in various backends.

Not sure I like the duplicated logic in getExpandImmCost; it doesn't have good test coverage, and it could fall out of sync in the future. That said, I've been considering refactoring the code in expandMOVImm anyway, to split the actual instruction emission away from the logic that figures out the appropriate sequence. Basically, the idea would be that instead of returning a number from getExpandImmCost, you return an abstraction of the instruction sequence: an array that contains, for each instruction, the appropriate opcode and immediate. isFPImmLegal just uses the number of elements in the array, while expandMOVImm actually emits instructions based on the array. I think this would shrink the code overall because the logic for building instructions is currently duplicated multiple times. (I was considering it more in the context of adding more possible sequences, but it works here as well.)
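
The shape of this proposal can be sketched as follows: one function builds an abstract instruction sequence, a legality check looks only at its length, and an emitter walks the same array. All names here (`MovOpcode`, `ImmInsn`, `buildMovSequence`, `fitsInLimit`, `replay`) are hypothetical and chosen for illustration; the sketch models only MOVZ/MOVK, whereas the real logic also covers MOVN and ORR. The final diff uses a SmallVector of AArch64_IMM::ImmInsnModel filled by AArch64_IMM::expandMOVImm, whose exact fields are not shown on this page.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical opcode set; the real expansion has more alternatives.
enum class MovOpcode { MOVZ, MOVK };

// Hypothetical model of one instruction: opcode plus its immediate operands.
struct ImmInsn {
  MovOpcode Opcode;
  uint16_t Imm;   // 16-bit payload
  unsigned Shift; // left shift applied to the payload, in bits
};

// Build an abstract MOVZ + MOVK* sequence for the given bit pattern.
std::vector<ImmInsn> buildMovSequence(uint64_t Bits, unsigned SizeInBits) {
  std::vector<ImmInsn> Seq;
  for (unsigned Shift = 0; Shift < SizeInBits; Shift += 16) {
    uint16_t Chunk = static_cast<uint16_t>((Bits >> Shift) & 0xFFFF);
    if (Chunk == 0)
      continue;
    // The first instruction zeroes the register, later ones keep other bits.
    Seq.push_back({Seq.empty() ? MovOpcode::MOVZ : MovOpcode::MOVK, Chunk, Shift});
  }
  if (Seq.empty())
    Seq.push_back({MovOpcode::MOVZ, 0, 0});
  return Seq;
}

// Consumer 1: a legality check only needs the number of instructions.
bool fitsInLimit(const std::vector<ImmInsn> &Seq, unsigned Limit) {
  return Seq.size() <= Limit;
}

// Consumer 2: stand-in for instruction emission. It replays the sequence on
// a simulated register so the result can be compared with the input.
uint64_t replay(const std::vector<ImmInsn> &Seq) {
  uint64_t Value = 0;
  for (const ImmInsn &I : Seq) {
    uint64_t Field = static_cast<uint64_t>(I.Imm) << I.Shift;
    if (I.Opcode == MovOpcode::MOVZ)
      Value = Field;
    else
      Value = (Value & ~(0xFFFFULL << I.Shift)) | Field;
  }
  return Value;
}
```

Because both consumers read the same array, the cost estimate cannot drift away from what is actually emitted, which is the concern raised about getExpandImmCost.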

lib/Target/AArch64/AArch64ISelLowering.cpp
5422	Is this comment redundant?
5425	Where is forCodeSize defined?
lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
1054 ↗	(On Diff #188415)	"return 2" isn't quite right; it could be 1.

In D58460#1411040, @efriedma wrote:

Not sure I like the duplicated logic in getExpandImmCost; it doesn't have good test coverage, and it could fall out of sync in the future. That said, I've been considering refactoring the code in expandMOVImm anyway, to split the actual instruction emission away from the logic that figures out the appropriate sequence. Basically, the idea would be that instead of returning a number from getExpandImmCost, you return an abstraction of the instruction sequence: an array that contains, for each instruction, the appropriate opcode and immediate. isFPImmLegal just uses the number of elements in the array, while expandMOVImm actually emits instructions based on the array. I think this would shrink the code overall because the logic for building instructions is currently duplicated multiple times. (I was considering it more in the context of adding more possible sequences, but it works here as well.)

Right, I think I can follow this idea and rewrite my patch if you don't mind.

lib/Target/AArch64/AArch64ISelLowering.cpp
5422	Indeed, I will remove it.
5425	It is an artifact from a wrong rebase, I will fix it (it is meant for a different patch).
lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
1054 ↗	(On Diff #188415)	Indeed, I will fix it.

evandro added inline comments. Feb 27 2019, 10:38 AM

lib/Target/AArch64/AArch64ISelLowering.cpp
5418	Perhaps you could check for AArch64Subtarget::hasFuseLiterals() and emit up to 5 instructions, thus including f64, unless optimizing for size.
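
In the final diff (Diff 189176) this suggestion shows up as a single limit expression in isFPImmLegal. A standalone sketch of that policy is given here, with the subtarget query replaced by a plain bool; `maxMovInsns` and `isCheapEnough` are illustrative names, not LLVM API.

```cpp
// Hypothetical free function mirroring the limit expression in the diff:
// 1 instruction when optimizing for size, otherwise 5 if the subtarget
// fuses literal-building moves, else 2.
unsigned maxMovInsns(bool OptForSize, bool HasFuseLiterals) {
  if (OptForSize)
    return 1;
  return HasFuseLiterals ? 5 : 2;
}

// Hypothetical helper: a constant is materialized with moves only when its
// expansion is no longer than the limit for the current configuration.
bool isCheapEnough(unsigned SeqLen, bool OptForSize, bool HasFuseLiterals) {
  return SeqLen <= maxMovInsns(OptForSize, HasFuseLiterals);
}
```

For example, a four-instruction expansion of an f64 constant is accepted only when literal fusion is available and the function is not optimized for size.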

Right, I think I can follow this idea and rewrite my patch if you don't mind.

Okay, thanks.

Updated patch based on previous comments. It depends on https://reviews.llvm.org/D58915 and https://reviews.llvm.org/D58690

LGTM (but obviously can't be merged until the dependency is reviewed/merged)

This revision is now accepted and ready to land. Mar 4 2019, 1:30 PM

zatrazz closed this revision. Mar 18 2019, 11:44 AM

Revision Contents

Path	Size
lib/Target/AArch64/AArch64ISelLowering.cpp	16 lines
test/CodeGen/AArch64/arm64-fp-imm-size.ll	40 lines
test/CodeGen/AArch64/arm64-fp-imm.ll	7 lines
test/CodeGen/AArch64/fpimm.ll	12 lines
test/CodeGen/AArch64/literal_pools_float.ll	11 lines
test/CodeGen/AArch64/misched-fusion-lit.ll	15 lines
test/CodeGen/AArch64/win_cst_pool.ll	16 lines

Diff 189176

lib/Target/AArch64/AArch64ISelLowering.cpp

#include "AArch64ISelLowering.h"		#include "AArch64ISelLowering.h"
#include "AArch64CallingConvention.h"		#include "AArch64CallingConvention.h"
#include "AArch64MachineFunctionInfo.h"		#include "AArch64MachineFunctionInfo.h"
#include "AArch64PerfectShuffle.h"		#include "AArch64PerfectShuffle.h"
#include "AArch64RegisterInfo.h"		#include "AArch64RegisterInfo.h"
#include "AArch64Subtarget.h"		#include "AArch64Subtarget.h"
#include "MCTargetDesc/AArch64AddressingModes.h"		#include "MCTargetDesc/AArch64AddressingModes.h"
		#include "MCTargetDesc/AArch64ExpandImm.h"
#include "Utils/AArch64BaseInfo.h"		#include "Utils/AArch64BaseInfo.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
bool AArch64TargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,		bool AArch64TargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
bool OptForSize) const {		bool OptForSize) const {
bool IsLegal = false;		bool IsLegal = false;
// We can materialize #0.0 as fmov $Rd, XZR for 64-bit, 32-bit cases, and		// We can materialize #0.0 as fmov $Rd, XZR for 64-bit, 32-bit cases, and
// 16-bit case when target has full fp16 support.		// 16-bit case when target has full fp16 support.
// FIXME: We should be able to handle f128 as well with a clever lowering.		// FIXME: We should be able to handle f128 as well with a clever lowering.
const APInt ImmInt = Imm.bitcastToAPInt();		const APInt ImmInt = Imm.bitcastToAPInt();
if (VT == MVT::f64)		if (VT == MVT::f64)
IsLegal = AArch64_AM::getFP64Imm(ImmInt) != -1 \|\| Imm.isPosZero();		IsLegal = AArch64_AM::getFP64Imm(ImmInt) != -1 \|\| Imm.isPosZero();
else if (VT == MVT::f32)		else if (VT == MVT::f32)
IsLegal = AArch64_AM::getFP32Imm(ImmInt) != -1 \|\| Imm.isPosZero();		IsLegal = AArch64_AM::getFP32Imm(ImmInt) != -1 \|\| Imm.isPosZero();
else if (VT == MVT::f16 && Subtarget->hasFullFP16())		else if (VT == MVT::f16 && Subtarget->hasFullFP16())
IsLegal = AArch64_AM::getFP16Imm(ImmInt) != -1 \|\| Imm.isPosZero();		IsLegal = AArch64_AM::getFP16Imm(ImmInt) != -1 \|\| Imm.isPosZero();
// TODO: fmov h0, w0 is also legal, however we don't have an isel pattern to		// TODO: fmov h0, w0 is also legal, however we don't have an isel pattern to
// generate that fmov.		// generate that fmov.

// If we can not materialize in immediate field for fmov, check if the		// If we can not materialize in immediate field for fmov, check if the
// value can be encoded as the immediate operand of a logical instruction.		// value can be encoded as the immediate operand of a logical instruction.
// The immediate value will be created with either MOVZ, MOVN, or ORR.		// The immediate value will be created with either MOVZ, MOVN, or ORR.
if (!IsLegal && (VT == MVT::f64 \|\| VT == MVT::f32))		if (!IsLegal && (VT == MVT::f64 \|\| VT == MVT::f32)) {
IsLegal = AArch64_AM::isAnyMOVWMovAlias(ImmInt.getZExtValue(),		// The cost is actually exactly the same for mov+fmov vs. adrp+ldr;
VT.getSizeInBits());		// however the mov+fmov sequence is always better because of the reduced
		// cache pressure. The timings are still the same if you consider
		// movw+movk+fmov vs. adrp+ldr (it's one instruction longer, but the
		// movw+movk is fused). So we limit to at most 2 instructions.
		SmallVector<AArch64_IMM::ImmInsnModel, 4> Insn;
		AArch64_IMM::expandMOVImm(ImmInt.getZExtValue(), VT.getSizeInBits(),
		Insn);
		unsigned Limit = (OptForSize ? 1 : (Subtarget->hasFuseLiterals() ? 5 : 2));
		IsLegal = Insn.size() <= Limit;
		}

LLVM_DEBUG(dbgs() << (IsLegal ? "Legal " : "Illegal ") << VT.getEVTString()		LLVM_DEBUG(dbgs() << (IsLegal ? "Legal " : "Illegal ") << VT.getEVTString()
<< " imm value: "; Imm.dump(););		<< " imm value: "; Imm.dump(););
return IsLegal;		return IsLegal;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// AArch64 Optimization Hooks		// AArch64 Optimization Hooks

test/CodeGen/AArch64/arm64-fp-imm-size.ll

This file was added.

				; RUN: llc < %s -mtriple=arm64-apple-darwin \| FileCheck %s

				; CHECK: literal8
				; CHECK: .quad 4614256656552045848
				define double @foo() optsize {
				; CHECK: _foo:
				; CHECK: adrp x[[REG:[0-9]+]], lCPI0_0@PAGE
				; CHECK: ldr d0, [x[[REG]], lCPI0_0@PAGEOFF]
				; CHECK-NEXT: ret
				ret double 0x400921FB54442D18
				}

				; CHECK: literal8
				; CHECK: .quad 137438953409
				define double @foo2() optsize {
				; CHECK: _foo2:
				; CHECK: adrp x[[REG:[0-9]+]], lCPI1_0@PAGE
				; CHECK: ldr d0, [x[[REG]], lCPI1_0@PAGEOFF]
				; CHECK-NEXT: ret
				ret double 0x1FFFFFFFC1
				}

				define float @bar() optsize {
				; CHECK: _bar:
				; CHECK: adrp x[[REG:[0-9]+]], lCPI2_0@PAGE
				; CHECK: ldr s0, [x[[REG]], lCPI2_0@PAGEOFF]
				; CHECK-NEXT: ret
				ret float 0x400921FB60000000
				}

				; CHECK: literal16
				; CHECK: .quad 0
				; CHECK: .quad 0
				define fp128 @baz() optsize {
				; CHECK: _baz:
				; CHECK: adrp x[[REG:[0-9]+]], lCPI3_0@PAGE
				; CHECK: ldr q0, [x[[REG]], lCPI3_0@PAGEOFF]
				; CHECK-NEXT: ret
				ret fp128 0xL00000000000000000000000000000000
				}

test/CodeGen/AArch64/arm64-fp-imm.ll

	; RUN: llc < %s -mtriple=arm64-apple-darwin \| FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-darwin \| FileCheck %s

	; CHECK: literal8			; CHECK: literal8
	; CHECK: .quad 4614256656552045848			; CHECK: .quad 4614256656552045848
	define double @foo() {			define double @foo() {
	; CHECK: _foo:			; CHECK: _foo:
	; CHECK: adrp x[[REG:[0-9]+]], lCPI0_0@PAGE			; CHECK: adrp x[[REG:[0-9]+]], lCPI0_0@PAGE
	; CHECK: ldr d0, [x[[REG]], lCPI0_0@PAGEOFF]			; CHECK: ldr d0, [x[[REG]], lCPI0_0@PAGEOFF]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	ret double 0x400921FB54442D18			ret double 0x400921FB54442D18
	}			}

	; CHECK: literal4
	; CHECK: .long 1078530011
	define float @bar() {			define float @bar() {
	; CHECK: _bar:			; CHECK: _bar:
	; CHECK: adrp x[[REG:[0-9]+]], lCPI1_0@PAGE			; CHECK: mov [[REG:w[0-9]+]], #4059
	; CHECK: ldr s0, [x[[REG]], lCPI1_0@PAGEOFF]			; CHECK: movk [[REG]], #16457, lsl #16
				; CHECK: fmov s0, [[REG]]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	ret float 0x400921FB60000000			ret float 0x400921FB60000000
	}			}

	; CHECK: literal16			; CHECK: literal16
	; CHECK: .quad 0			; CHECK: .quad 0
	; CHECK: .quad 0			; CHECK: .quad 0
	define fp128 @baz() {			define fp128 @baz() {
	; CHECK: _baz:			; CHECK: _baz:
	; CHECK: adrp x[[REG:[0-9]+]], lCPI2_0@PAGE			; CHECK: adrp x[[REG:[0-9]+]], lCPI2_0@PAGE
	; CHECK: ldr q0, [x[[REG]], lCPI2_0@PAGEOFF]			; CHECK: ldr q0, [x[[REG]], lCPI2_0@PAGEOFF]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	ret fp128 0xL00000000000000000000000000000000			ret fp128 0xL00000000000000000000000000000000
	}			}

test/CodeGen/AArch64/fpimm.ll

; TINY-DAG: fmov {{d[0-9]+}}, #8.5		; TINY-DAG: fmov {{d[0-9]+}}, #8.5

%newval2 = fadd double %val, 128.0		%newval2 = fadd double %val, 128.0
store volatile double %newval2, double* @varf64		store volatile double %newval2, double* @varf64
; CHECK-DAG: mov [[X128:x[0-9]+]], #4638707616191610880		; CHECK-DAG: mov [[X128:x[0-9]+]], #4638707616191610880
; CHECK-DAG: fmov {{d[0-9]+}}, [[X128]]		; CHECK-DAG: fmov {{d[0-9]+}}, [[X128]]
; TINY-DAG: mov [[X128:x[0-9]+]], #4638707616191610880		; TINY-DAG: mov [[X128:x[0-9]+]], #4638707616191610880
; TINY-DAG: fmov {{d[0-9]+}}, [[X128]]		; TINY-DAG: fmov {{d[0-9]+}}, [[X128]]

		; 64-bit ORR followed by MOVK.
		; CHECK-DAG: mov [[XFP0:x[0-9]+]], #1082331758844
		; CHECK-DAG: movk [[XFP0]], #64764, lsl #16
		; CHECK-DAG: fmov {{d[0-9]+}}, [[XFP0]]
		%newval3 = fadd double %val, 0xFCFCFC00FC
		store volatile double %newval3, double* @varf64

; CHECK: ret		; CHECK: ret
; TINY: ret		; TINY: ret
ret void		ret void
}		}

; LARGE-LABEL: check_float2		; LARGE-LABEL: check_float2
; LARGE: mov [[REG:w[0-9]+]], #4059		; LARGE: mov [[REG:w[0-9]+]], #4059
; LARGE-NEXT: movk [[REG]], #16457, lsl #16		; LARGE-NEXT: movk [[REG]], #16457, lsl #16
; LARGE-NEXT: fmov s0, [[REG]]		; LARGE-NEXT: fmov s0, [[REG]]
; TINY-LABEL: check_float2		; TINY-LABEL: check_float2
; TINY: ldr s0, .LCPI2_0		; TINY: mov [[REG:w[0-9]+]], #4059
		; TINY-NEXT: movk [[REG]], #16457, lsl #16
define float @check_float2() {		define float @check_float2() {
ret float 3.14159274101257324218750		ret float 3.14159274101257324218750
}		}

; LARGE-LABEL: check_double2		; LARGE-LABEL: check_double2
; LARGE: mov [[REG:x[0-9]+]], #11544		; LARGE: mov [[REG:x[0-9]+]], #11544
; LARGE-NEXT: movk [[REG]], #21572, lsl #16		; LARGE-NEXT: movk [[REG]], #21572, lsl #16
; LARGE-NEXT: movk [[REG]], #8699, lsl #32		; LARGE-NEXT: movk [[REG]], #8699, lsl #32
; LARGE-NEXT: movk [[REG]], #16393, lsl #48		; LARGE-NEXT: movk [[REG]], #16393, lsl #48
; LARGE-NEXT: fmov d0, [[REG]]		; LARGE-NEXT: fmov d0, [[REG]]
; TINY-LABEL: check_double2		; TINY-LABEL: check_double2
; TINY: ldr d0, .LCPI3_0		; TINY: ldr d0, .LCPI3_0
define double @check_double2() {		define double @check_double2() {
ret double 3.1415926535897931159979634685441851615905761718750		ret double 3.1415926535897931159979634685441851615905761718750
}		}

test/CodeGen/AArch64/literal_pools_float.ll

	; CHECK-LARGE: fadd			; CHECK-LARGE: fadd
	; CHECK-NOFP-LARGE-NOT: ldr {{s[0-9]+}},			; CHECK-NOFP-LARGE-NOT: ldr {{s[0-9]+}},
	; CHECK-NOFP-LARGE-NOT: fadd			; CHECK-NOFP-LARGE-NOT: fadd

	store float %newfloat, float* @varfloat			store float %newfloat, float* @varfloat

	%doubleval = load double, double* @vardouble			%doubleval = load double, double* @vardouble
	%newdouble = fadd double %doubleval, 129.0			%newdouble = fadd double %doubleval, 129.0
	; CHECK: adrp x[[LITBASE:[0-9]+]], [[CURLIT:.LCPI[0-9]+_[0-9]+]]
	; CHECK: ldr [[LIT129:d[0-9]+]], [x[[LITBASE]], {{#?}}:lo12:[[CURLIT]]]
	; CHECK-NOFP-NOT: ldr {{d[0-9]+}},			; CHECK-NOFP-NOT: ldr {{d[0-9]+}},
				; CHECK: mov [[W129:x[0-9]+]], #35184372088832
				; CHECK: movk [[W129]], #16480, lsl #48
				; CHECK: fmov {{d[0-9]+}}, [[W129]]
	; CHECK-NOFP-NOT: fadd			; CHECK-NOFP-NOT: fadd

	; CHECK-TINY: ldr [[LIT129:d[0-9]+]], [[CURLIT:.LCPI[0-9]+_[0-9]+]]			; CHECK-TINY: mov [[W129:x[0-9]+]], #35184372088832
				; CHECK-TINY: movk [[W129]], #16480, lsl #48
				; CHECK-TINY: fmov {{d[0-9]+}}, [[W129]]
	; CHECK-NOFP-TINY-NOT: ldr {{d[0-9]+}},			; CHECK-NOFP-TINY-NOT: ldr {{d[0-9]+}},
	; CHECK-NOFP-TINY-NOT: fadd			; CHECK-NOFP-TINY-NOT: fadd

	; CHECK-LARGE: movz x[[LITADDR:[0-9]+]], #:abs_g0_nc:[[CURLIT:.LCPI[0-9]+_[0-9]+]]			; CHECK-LARGE: movz x[[LITADDR:[0-9]+]], #:abs_g0_nc:[[CURLIT:vardouble]]
	; CHECK-LARGE: movk x[[LITADDR]], #:abs_g1_nc:[[CURLIT]]			; CHECK-LARGE: movk x[[LITADDR]], #:abs_g1_nc:[[CURLIT]]
	; CHECK-LARGE: movk x[[LITADDR]], #:abs_g2_nc:[[CURLIT]]			; CHECK-LARGE: movk x[[LITADDR]], #:abs_g2_nc:[[CURLIT]]
	; CHECK-LARGE: movk x[[LITADDR]], #:abs_g3:[[CURLIT]]			; CHECK-LARGE: movk x[[LITADDR]], #:abs_g3:[[CURLIT]]
	; CHECK-LARGE: ldr {{d[0-9]+}}, [x[[LITADDR]]]			; CHECK-LARGE: ldr {{d[0-9]+}}, [x[[LITADDR]]]
	; CHECK-NOFP-LARGE-NOT: ldr {{d[0-9]+}},			; CHECK-NOFP-LARGE-NOT: ldr {{d[0-9]+}},

	store double %newdouble, double* @vardouble			store double %newdouble, double* @vardouble

	ret void			ret void
	}			}

test/CodeGen/AArch64/misched-fusion-lit.ll


	; CHECK-LABEL: litl:			; CHECK-LABEL: litl:
	; CHECK: mov [[R:x[0-9]+]], {{#[0-9]+}}			; CHECK: mov [[R:x[0-9]+]], {{#[0-9]+}}
	; CHECK-NEXT: movk [[R]], {{#[0-9]+}}, lsl #16			; CHECK-NEXT: movk [[R]], {{#[0-9]+}}, lsl #16
	; CHECK: movk [[R]], {{#[0-9]+}}, lsl #32			; CHECK: movk [[R]], {{#[0-9]+}}, lsl #32
	; CHECKDONT-NEXT: add {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}			; CHECKDONT-NEXT: add {{x[0-9]+}}, {{x[0-9]+}}, {{x[0-9]+}}
	; CHECKFUSE-NEXT: movk [[R]], {{#[0-9]+}}, lsl #48			; CHECKFUSE-NEXT: movk [[R]], {{#[0-9]+}}, lsl #48
	}			}

				; Function Attrs: norecurse nounwind readnone
				define double @litf() {
				entry:
				ret double 0x400921FB54442D18

				; CHECK-LABEL: litf:
				; CHECK-DONT: adrp [[ADDR:x[0-9]+]], [[CSTLABEL:.LCP.*]]
				; CHECK-DONT-NEXT: ldr {{d[0-9]+}}, {{[[]}}[[ADDR]], :lo12:[[CSTLABEL]]{{[]]}}
				; CHECK-FUSE: mov [[R:x[0-9]+]], #11544
				; CHECK-FUSE: movk [[R]], #21572, lsl #16
				; CHECK-FUSE: movk [[R]], #8699, lsl #32
				; CHECK-FUSE: movk [[R]], #16393, lsl #48
				; CHECK-FUSE: fmov {{d[0-9]+}}, [[R]]
				}

test/CodeGen/AArch64/win_cst_pool.ll

	; RUN: llc < %s -mtriple=aarch64-win32-msvc \| FileCheck %s			; RUN: llc < %s -mtriple=aarch64-win32-msvc \| FileCheck %s
	; RUN: llc < %s -mtriple=aarch64-win32-gnu \| FileCheck -check-prefix=MINGW %s			; RUN: llc < %s -mtriple=aarch64-win32-gnu \| FileCheck -check-prefix=MINGW %s

	define double @double() {			define double @double() {
	ret double 0x0000000000800001			ret double 0x2000000000800001
	}			}
	; CHECK: .globl __real@0000000000800001			; CHECK: .globl __real@2000000000800001
	; CHECK-NEXT: .section .rdata,"dr",discard,__real@0000000000800001			; CHECK-NEXT: .section .rdata,"dr",discard,__real@2000000000800001
	; CHECK-NEXT: .p2align 3			; CHECK-NEXT: .p2align 3
	; CHECK-NEXT: __real@0000000000800001:			; CHECK-NEXT: __real@2000000000800001:
	; CHECK-NEXT: .xword 8388609			; CHECK-NEXT: .xword 2305843009222082561
	; CHECK: double:			; CHECK: double:
	; CHECK: adrp x8, __real@0000000000800001			; CHECK: adrp x8, __real@2000000000800001
	; CHECK-NEXT: ldr d0, [x8, __real@0000000000800001]			; CHECK-NEXT: ldr d0, [x8, __real@2000000000800001]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	; MINGW: .section .rdata,"dr"			; MINGW: .section .rdata,"dr"
	; MINGW-NEXT: .p2align 3			; MINGW-NEXT: .p2align 3
	; MINGW-NEXT: [[LABEL:\.LC.*]]:			; MINGW-NEXT: [[LABEL:\.LC.*]]:
	; MINGW-NEXT: .xword 8388609			; MINGW-NEXT: .xword 2305843009222082561
	; MINGW: double:			; MINGW: double:
	; MINGW: adrp x8, [[LABEL]]			; MINGW: adrp x8, [[LABEL]]
	; MINGW-NEXT: ldr d0, [x8, [[LABEL]]]			; MINGW-NEXT: ldr d0, [x8, [[LABEL]]]
	; MINGW-NEXT: ret			; MINGW-NEXT: ret