This is an archive of the discontinued LLVM Phabricator instance.

[FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND
Closed (Public)

Authored by john.brawn on Jan 22 2020, 7:50 AM.

Details

Reviewers

pengfei
uweigand
craig.topper
dmgreen
t.p.northover

Commits

rGa97c77ad1750: [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND
rG258d8dd76afd: [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND

Summary

This patch selects STRICT_FP_ROUND to the appropriate fcvt instruction. Handling after instruction selection isn't fully correct yet, because we still need to model fcvt reading FPCR and writing FPSR.
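The rounding mode matters for a double-to-float truncation, the operation fptrunc / STRICT_FP_ROUND performs and `fcvt s0, d0` implements: its result depends on the dynamic rounding mode in effect. The sketch below is portable C++ using `<cfenv>`, not LLVM code. It assumes the compiler honours run-time rounding-mode changes (for example with `-frounding-math`), and it uses `volatile` so the conversion cannot be constant-folded. The helper names `truncate_with_mode` and `between_floats` are illustrative.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Truncate a double to float with the given dynamic rounding mode in effect.
// The volatile objects stop the compiler from folding the conversion at
// compile time, where round-to-nearest would always be assumed.
float truncate_with_mode(double value, int mode) {
    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0 && "rounding mode not supported on this target");
    (void)rc;
    volatile double in = value;
    volatile float out = static_cast<float>(in);  // the fptrunc / fcvt step
    std::fesetround(saved);
    return out;
}

// A double strictly between 1.0f and the next float above it (1 + 2^-23),
// so the truncated result depends on the rounding direction.
double between_floats() { return 1.0 + std::ldexp(1.0, -30); }
```

For the value from `between_floats()`, FE_TONEAREST and FE_DOWNWARD both give 1.0f, while FE_UPWARD gives the next float above 1.0f. If the conversion were moved across either `fesetround` call, the result would change without any warning. The strict node's chain exists to prevent that kind of reordering.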

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

john.brawn created this revision. Jan 22 2020, 7:50 AM

Herald added a project: Restricted Project. Jan 22 2020, 7:50 AM

Herald added subscribers: hiraditya, kristof.beyls.

Why is this problem unique to STRICT_FP_ROUND? What makes FP_ROUND work? I suspect this is really a missing setOperationAction line for STRICT_FP_ROUND in the AArch64ISelLowering.cpp

In D73201#1834055, @craig.topper wrote:

Why is this problem unique to STRICT_FP_ROUND? What makes FP_ROUND work? I suspect this is really a missing setOperationAction line for STRICT_FP_ROUND in the AArch64ISelLowering.cpp

It looks like FP_ROUND doesn't work either. If I make AArch64TargetLowering call setOperationAction(ISD::FP_ROUND, MVT::f32, Expand), I get the same assertion failure ("Invalid TRUNCATE!" at SelectionDAG.cpp:4661). It looks like no target currently sets FP_ROUND to Expand for scalar floating-point types, so I guess the FP_ROUND expansion code is never exercised.

In D73201#1834200, @john.brawn wrote:

It looks like FP_ROUND doesn't work either. If I make AArch64TargetLowering call setOperationAction(ISD::FP_ROUND, MVT::f32, Expand), I get the same assertion failure ("Invalid TRUNCATE!" at SelectionDAG.cpp:4661). It looks like no target currently sets FP_ROUND to Expand for scalar floating-point types, so I guess the FP_ROUND expansion code is never exercised.

So does AArch64 set FP_ROUND to Legal or Custom? STRICT_FP_ROUND should do the same.

In D73201#1834254, @craig.topper wrote:

So does AArch64 set FP_ROUND to Legal or Custom? STRICT_FP_ROUND should do the same.

FP_ROUND is Custom in AArch64. I could add something to the AArch64 target to handle this, but it's a bit odd that the default behaviour is for an invalid operation to be generated (and I see that -mtriple=sparc hits the same assertion failure, as it also has FP_ROUND set to Custom).

In D73201#1834300, @john.brawn wrote:

FP_ROUND is Custom in AArch64. I could add something to the AArch64 target to handle this, but it's a bit odd that the default behaviour is for an invalid operation to be generated (and I see that -mtriple=sparc hits the same assertion failure, as it also has FP_ROUND set to Custom).

The STRICT_ handling contains a lot of hacks that try to guess what to do by looking at what the target does for the non-strict nodes. That's why there's a call to getStrictFPOperationAction, which queries the non-strict behavior. But it doesn't handle Custom well, because we don't know what to do with it.

The change you've proposed here will end up causing the STRICT_FP_ROUND node to be considered Legal even though it's set to Expand. It will then be mutated to the non-strict node during isel, which is technically wrong. In this case it happens to work because there are isel patterns for most of the FP_ROUND combinations, since the Custom handler only does something special for fp128. So fp128 STRICT_FP_ROUND will end up failing isel, since it really needs Custom handling.

We want to remove all the mutation code once the in tree targets all support STRICT_ nodes properly. It's a hack to enable targets to limp along a little bit, but in hindsight may have been a bad idea since it creates the illusion of things working when they really don't.
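The fallback described above can be modelled in a few lines. This is a hypothetical, simplified sketch, not the code in LLVM's legalizer. The enum names, the assumed defaults for unset entries (Legal for the ordinary node, Expand for the strict one), and the `Resolution` outcomes are all illustrative. The model shows why a Custom action on the non-strict node leaves nothing safe to infer, and why an explicit Custom entry for STRICT_FP_ROUND avoids the guess entirely.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical model of choosing an action for a STRICT_ node by peeking at
// its non-strict twin. Not LLVM's actual code; all names are illustrative.
enum class Action { Legal, Custom, Expand };
enum class Opcode { FP_ROUND, STRICT_FP_ROUND };
enum class VT { f32, f64 };

enum class Resolution {
    LegalAsIs,          // target supports the strict node directly
    CustomLowering,     // target hook lowers the strict node itself
    MutateToNonStrict,  // fallback hack: select it as the non-strict node
    ExpandGenerically,  // use the generic expansion
    NoSafeChoice        // non-strict node is Custom: nothing to infer
};

class ActionTable {
public:
    void set(Opcode op, VT vt, Action a) { actions_[{op, vt}] = a; }

    // Assumed defaults for unset entries: Legal for the ordinary node,
    // Expand for the strict one.
    Action get(Opcode op, VT vt) const {
        auto it = actions_.find({op, vt});
        if (it != actions_.end()) return it->second;
        return op == Opcode::STRICT_FP_ROUND ? Action::Expand : Action::Legal;
    }

private:
    std::map<std::pair<Opcode, VT>, Action> actions_;
};

Resolution resolveStrictFpRound(const ActionTable& table, VT vt) {
    switch (table.get(Opcode::STRICT_FP_ROUND, vt)) {
    case Action::Legal:  return Resolution::LegalAsIs;
    case Action::Custom: return Resolution::CustomLowering;
    case Action::Expand: break;
    }
    // The strict node was left at Expand: guess from the non-strict action.
    switch (table.get(Opcode::FP_ROUND, vt)) {
    case Action::Legal:  return Resolution::MutateToNonStrict;
    case Action::Custom: return Resolution::NoSafeChoice;
    case Action::Expand: return Resolution::ExpandGenerically;
    }
    return Resolution::NoSafeChoice;  // unreachable with the enums above
}

// Usage: FP_ROUND is Custom (as on AArch64); only an explicit
// STRICT_FP_ROUND entry removes the guesswork.
void exampleStrictAction() {
    ActionTable before;
    before.set(Opcode::FP_ROUND, VT::f32, Action::Custom);
    assert(resolveStrictFpRound(before, VT::f32) == Resolution::NoSafeChoice);

    ActionTable after = before;
    after.set(Opcode::STRICT_FP_ROUND, VT::f32, Action::Custom);
    assert(resolveStrictFpRound(after, VT::f32) == Resolution::CustomLowering);
}
```

The `MutateToNonStrict` outcome is the hack the thread wants to remove. It "works" only because an instruction pattern for the non-strict node happens to exist, and it drops the ordering guarantee that the strict node carried.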

kpn added a subscriber: kpn. Jan 22 2020, 10:57 AM

In D73201#1834370, @craig.topper wrote:

We want to remove all the mutation code once the in tree targets all support STRICT_ nodes properly. It's a hack to enable targets to limp along a little bit, but in hindsight may have been a bad idea since it creates the illusion of things working when they really don't.

It may be worthwhile to remove the mutation code sooner. For ops that are Custom we can't even pretend that things work, and a loud failure up front is better than generating code that silently doesn't do what it was asked to do. Removing the mutation code also makes the code simpler to read and avoids confusing people who haven't been following it for years.

Changed to add handling in the aarch64 backend.
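The AArch64 handling added here follows the strict-node layout. The input chain is operand 0, so the source value moves to operand 1. When an f128 source forces a libcall, the call's output chain must be returned alongside the result. Below is a minimal sketch of that logic on a hypothetical miniature node type, not SelectionDAG's API. The libcall names `__trunctfdf2` and `__trunctfsf2` are the usual libgcc/compiler-rt routines for f128-to-double and f128-to-float truncation; they are assumed here, not taken from the diff.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical miniature of a DAG node (not SelectionDAG's API). A strict
// node carries its input chain as operand 0, so its value moves to operand 1.
enum class Ty { f32, f64, f128, Flag, Chain };

struct Value {
    std::string name;
    Ty type;
};

struct Node {
    bool isStrict;
    std::vector<Value> operands;
    Ty resultType;
};

struct Lowered {
    bool unchanged;              // left as-is for instruction selection (fcvt)
    std::string libcall;         // runtime routine chosen for f128 sources
    std::optional<Value> chain;  // output chain, produced only for strict nodes
};

// Assumed libgcc/compiler-rt names for f128 -> f64 / f128 -> f32 truncation.
std::string libcallFor(Ty src, Ty dst) {
    assert(src == Ty::f128 && "only f128 sources are lowered to a libcall");
    (void)src;
    if (dst == Ty::f64) return "__trunctfdf2";
    if (dst == Ty::f32) return "__trunctfsf2";
    return "";  // other destinations are outside this sketch
}

Lowered lowerFpRound(const Node& n) {
    const Value& src = n.operands.at(n.isStrict ? 1 : 0);
    if (src.type != Ty::f128)
        return {true, "", std::nullopt};  // legal unless f128 is involved

    Lowered out{false, libcallFor(src.type, n.resultType), std::nullopt};
    if (n.isStrict) {
        // Thread the incoming chain through the call and hand back its output
        // chain, mirroring getMergeValues({Result, Chain}) in the diff.
        out.chain = Value{n.operands.at(0).name + ".after_call", Ty::Chain};
    }
    return out;
}
```

Reading operand 0 unconditionally, as the pre-patch code did, would pick up the chain rather than the value for a strict node. That is why the operand index depends on `isStrict`.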

This looks good to me from a strict FP perspective, but maybe wait a bit for someone more familiar with AArch64 to take a look.

LGTM

This revision is now accepted and ready to land. Jan 30 2020, 3:57 AM

Closed by commit rG258d8dd76afd: [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND (authored by john.brawn). Jan 30 2020, 4:58 AM

This revision was automatically updated to reflect the committed changes.

A user reported running into asserts when compiling musl for aarch64 with -frounding-math.
It appears to be fixed (well, at least it doesn't assert...) by this patch. Should it be cherry-picked to the 10.x branch?

In D73201#1878706, @hans wrote:

A user reported running into asserts when compiling musl for aarch64 with -frounding-math.
It appears to be fixed (well, at least it doesn't assert...) by this patch. Should it be cherry-picked to the 10.x branch?

There are a bunch of strict-math-related commits that are on trunk but not on 10.x and that should be cherry-picked. The complete list is, I think:

594a89f7270d [FPEnv][ARM] Don't call mutateStrictFPToFP when lowering
8bc790f9e6a6 [AArch64][FPenv] Update chain of int to fp conversion
0ec57972967d [ARM] Fix infinite loop when lowering STRICT_FP_EXTEND
68cf574857c8 [FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC
b37d59353f69 [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS
0bb9a27c9895 [FPEnv][AArch64] Add lowering and instruction selection for strict conversions
258d8dd76afd [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND
2224407ef5ba Add lowering of STRICT_FSETCC and STRICT_FSETCCS

Thanks! I've pushed those to 10.x in f87a0929c6bd59750e424d06581507cdfd439a56..f636e9feb9f0969e3b563d3140db5a0faa1e30d8

Please let me know if you think of more commits that should be picked over.

Revision Contents

Path (size)
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (18 lines)
llvm/lib/Target/AArch64/AArch64InstrFormats.td (6 lines)
llvm/test/CodeGen/AArch64/fp-intrinsics.ll (21 lines)
Diff 241413

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[287 lines not shown] AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
 setOperationAction(ISD::SINT_TO_FP, MVT::i32, Custom);
 setOperationAction(ISD::SINT_TO_FP, MVT::i64, Custom);
 setOperationAction(ISD::SINT_TO_FP, MVT::i128, Custom);
 setOperationAction(ISD::UINT_TO_FP, MVT::i32, Custom);
 setOperationAction(ISD::UINT_TO_FP, MVT::i64, Custom);
 setOperationAction(ISD::UINT_TO_FP, MVT::i128, Custom);
 setOperationAction(ISD::FP_ROUND, MVT::f32, Custom);
 setOperationAction(ISD::FP_ROUND, MVT::f64, Custom);
+setOperationAction(ISD::STRICT_FP_ROUND, MVT::f32, Custom);
+setOperationAction(ISD::STRICT_FP_ROUND, MVT::f64, Custom);

 // Variable arguments.
 setOperationAction(ISD::VASTART, MVT::Other, Custom);
 setOperationAction(ISD::VAARG, MVT::Other, Custom);
 setOperationAction(ISD::VACOPY, MVT::Other, Custom);
 setOperationAction(ISD::VAEND, MVT::Other, Expand);

 // Variable-sized objects.
[2,212 lines not shown] SDValue AArch64TargetLowering::LowerFP_EXTEND(SDValue Op,
 RTLIB::Libcall LC;
 LC = RTLIB::getFPEXT(Op.getOperand(0).getValueType(), Op.getValueType());

 return LowerF128Call(Op, DAG, LC);
 }

 SDValue AArch64TargetLowering::LowerFP_ROUND(SDValue Op,
 SelectionDAG &DAG) const {
-if (Op.getOperand(0).getValueType() != MVT::f128) {
+bool IsStrict = Op->isStrictFPOpcode();
+SDValue SrcVal = Op.getOperand(IsStrict ? 1 : 0);
+if (SrcVal.getValueType() != MVT::f128) {
 // It's legal except when f128 is involved
 return Op;
 }

 RTLIB::Libcall LC;
-LC = RTLIB::getFPROUND(Op.getOperand(0).getValueType(), Op.getValueType());
+LC = RTLIB::getFPROUND(SrcVal.getValueType(), Op.getValueType());

 // FP_ROUND node has a second operand indicating whether it is known to be
 // precise. That doesn't take part in the LibCall so we can't directly use
 // LowerF128Call.
-SDValue SrcVal = Op.getOperand(0);
 MakeLibCallOptions CallOptions;
-return makeLibCall(DAG, LC, Op.getValueType(), SrcVal, CallOptions,
-SDLoc(Op)).first;
+SDValue Chain = IsStrict ? Op.getOperand(0) : SDValue();
+SDValue Result;
+SDLoc dl(Op);
+std::tie(Result, Chain) = makeLibCall(DAG, LC, Op.getValueType(), SrcVal,
+CallOptions, dl, Chain);
+return IsStrict ? DAG.getMergeValues({Result, Chain}, dl) : Result;
 }

 SDValue AArch64TargetLowering::LowerVectorFP_TO_INT(SDValue Op,
 SelectionDAG &DAG) const {
 // Warning: We maintain cost tables in AArch64TargetTransformInfo.cpp.
 // Any additional optimization in this function should be recorded
 // in the cost tables.
 EVT InVT = Op.getOperand(0).getValueType();
[658 lines not shown] case ISD::FADD:
 return LowerF128Call(Op, DAG, RTLIB::ADD_F128);
 case ISD::FSUB:
 return LowerF128Call(Op, DAG, RTLIB::SUB_F128);
 case ISD::FMUL:
 return LowerF128Call(Op, DAG, RTLIB::MUL_F128);
 case ISD::FDIV:
 return LowerF128Call(Op, DAG, RTLIB::DIV_F128);
 case ISD::FP_ROUND:
+case ISD::STRICT_FP_ROUND:
 return LowerFP_ROUND(Op, DAG);
 case ISD::FP_EXTEND:
 return LowerFP_EXTEND(Op, DAG);
 case ISD::FRAMEADDR:
 return LowerFRAMEADDR(Op, DAG);
 case ISD::SPONENTRY:
 return LowerSPONENTRY(Op, DAG);
 case ISD::RETURNADDR:
[10,287 lines not shown]

llvm/lib/Target/AArch64/AArch64InstrFormats.td

[4,730 lines not shown] class BaseFPConversion<bits<2> type, bits<2> opcode, RegisterClass dstType,
 let Inst{14-10} = 0b10000;
 let Inst{9-5} = Rn;
 let Inst{4-0} = Rd;
 }

 multiclass FPConversion<string asm> {
 // Double-precision to Half-precision
 def HDr : BaseFPConversion<0b01, 0b11, FPR16, FPR64, asm,
-[(set FPR16:$Rd, (fpround FPR64:$Rn))]>;
+[(set FPR16:$Rd, (any_fpround FPR64:$Rn))]>;

 // Double-precision to Single-precision
 def SDr : BaseFPConversion<0b01, 0b00, FPR32, FPR64, asm,
-[(set FPR32:$Rd, (fpround FPR64:$Rn))]>;
+[(set FPR32:$Rd, (any_fpround FPR64:$Rn))]>;

 // Half-precision to Double-precision
 def DHr : BaseFPConversion<0b11, 0b01, FPR64, FPR16, asm,
 [(set FPR64:$Rd, (fpextend FPR16:$Rn))]>;

 // Half-precision to Single-precision
 def SHr : BaseFPConversion<0b11, 0b00, FPR32, FPR16, asm,
 [(set FPR32:$Rd, (fpextend FPR16:$Rn))]>;

 // Single-precision to Double-precision
 def DSr : BaseFPConversion<0b00, 0b01, FPR64, FPR32, asm,
 [(set FPR64:$Rd, (fpextend FPR32:$Rn))]>;

 // Single-precision to Half-precision
 def HSr : BaseFPConversion<0b00, 0b11, FPR16, FPR32, asm,
-[(set FPR16:$Rd, (fpround FPR32:$Rn))]>;
+[(set FPR16:$Rd, (any_fpround FPR32:$Rn))]>;
 }

 //---
 // Single operand floating point data processing
 //---

 let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in
 class BaseSingleOperandFPData<bits<6> opcode, RegisterClass regtype,
[6,180 lines not shown]
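Switching the rounding patterns from `fpround` to `any_fpround` lets a single pattern match both FP_ROUND and STRICT_FP_ROUND, so a strict truncation selects the same fcvt encoding as an ordinary one. Below is a hypothetical C++ model of that matching, not TableGen output. Only the def suffixes (HDr, SDr, HSr, DHr, SHr, DSr) come from the FPConversion multiclass; the enums and function names are illustrative. The model also shows that f128 sources have no pattern, which is why the Custom lowering is still needed for them.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the pattern change; not TableGen output. Only the
// def suffixes come from the FPConversion multiclass.
enum class Op { FpRound, StrictFpRound, FpExtend };
enum class FP { f16, f32, f64, f128 };

// Stand-in for any_fpround: one fragment that accepts either rounding node.
bool matchesAnyFpRound(Op op) {
    return op == Op::FpRound || op == Op::StrictFpRound;
}

std::optional<std::string> selectConversion(Op op, FP src, FP dst) {
    if (matchesAnyFpRound(op)) {
        if (src == FP::f64 && dst == FP::f16) return "HDr";
        if (src == FP::f64 && dst == FP::f32) return "SDr";
        if (src == FP::f32 && dst == FP::f16) return "HSr";
        return std::nullopt;  // e.g. f128 sources: no pattern, lower first
    }
    if (op == Op::FpExtend) {  // these patterns still use plain fpextend
        if (src == FP::f16 && dst == FP::f64) return "DHr";
        if (src == FP::f16 && dst == FP::f32) return "SHr";
        if (src == FP::f32 && dst == FP::f64) return "DSr";
    }
    return std::nullopt;
}

// Usage: strict and non-strict truncation now pick the same encoding.
void exampleAnyFpRound() {
    assert(selectConversion(Op::StrictFpRound, FP::f64, FP::f32) ==
           selectConversion(Op::FpRound, FP::f64, FP::f32));
}
```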

llvm/test/CodeGen/AArch64/fp-intrinsics.ll

[71 lines not shown]

 ; CHECK-LABEL: fptoui_i64_f32:
 ; FIXME-CHECK: fcvtzu x0, s0
 define i64 @fptoui_i64_f32(float %x) #0 {
 %val = call i64 @llvm.experimental.constrained.fptoui.i64.f32(float %x, metadata !"fpexcept.strict") #0
 ret i64 %val
 }

-; TODO: sitofp_f32_i32 (missing STRICT_FP_ROUND handling)
+; CHECK-LABEL: sitofp_f32_i32:
+; FIXME-CHECK: scvtf s0, w0
+define float @sitofp_f32_i32(i32 %x) #0 {
+%val = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
+ret float %val
+}

-; TODO: uitofp_f32_i32 (missing STRICT_FP_ROUND handling)
+; CHECK-LABEL: uitofp_f32_i32:
+; FIXME-CHECK: ucvtf s0, w0
+define float @uitofp_f32_i32(i32 %x) #0 {
+%val = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
+ret float %val
+}

 ; TODO: sitofp_f32_i64 (missing STRICT_SINT_TO_FP handling)

 ; TODO: uitofp_f32_i64 (missing STRICT_SINT_TO_FP handling)

 ; CHECK-LABEL: sqrt_f32:
 ; CHECK: fsqrt s0, s0
 define float @sqrt_f32(float %x) #0 {
[784 lines not shown] define i32 @fcmps_une_f64(double %a, double %b) #0 {
 %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"une", metadata !"fpexcept.strict") #0
 %conv = zext i1 %cmp to i32
 ret i32 %conv
 }


 ; Single/Double conversion intrinsics

-; TODO: fptrunc_f32 (missing STRICT_FP_ROUND handling)
+; CHECK-LABEL: fptrunc_f32:
+; CHECK: fcvt s0, d0
+define float @fptrunc_f32(double %x) #0 {
+%val = call float @llvm.experimental.constrained.fptrunc.f32.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
+ret float %val
+}

 ; CHECK-LABEL: fpext_f32:
 ; CHECK: fcvt d0, s0
 define double @fpext_f32(float %x) #0 {
 %val = call double @llvm.experimental.constrained.fpext.f64.f32(float %x, metadata !"fpexcept.strict") #0
 ret double %val
 }

[83 lines not shown]