This is an archive of the discontinued LLVM Phabricator instance.

Differential D79187

[DAGCombiner] fold (fp_round (int_to_fp X)) -> (int_to_fp X)
AbandonedPublic

Authored by lenary on Apr 30 2020, 10:56 AM.

Details

Reviewers

asb
luismarques
efriedma
t.p.northover
spatel

Summary

This is a follow-up to D78906 which moves the required fold from a SelectionDAG
pattern into a (target-independent) DAGCombiner fold.

I believe the fold is valid if the result type of the round can still accurately
represent the original integer type (signed or unsigned), and the intermediate
fp type is wider than the resulting fp type.

An example from these tests is where the original integer type is i32, the
intermediate float type is double, and the resulting fp type is float. My
understanding in this case is that the fp_round removes as much information as
just doing the int_to_fp directly.

The advantage of doing it in the DAGCombiner in this case is that it allows the
fold to happen, even if the intermediate floating point value is not a legal
type. In that case, the legaliser will turn the round and the int_to_fp into two
separate libcalls, before patterns can be used. The DAGCombiner does not have
this issue, as it is run before legalisation.

This patch needs a thorough review with respect to FP semantics. I am not
particularly familiar with how rounding and the like will affect this fold, and
I did find it hard to get the fold conditions correct to avoid infinite loops in
the AArch64 backend (which affected fp16 types), but to have the optimisation
work on the examples I expected in the RISC-V backend.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lenary created this revision. Apr 30 2020, 10:56 AM

Herald added a project: Restricted Project. Apr 30 2020, 10:56 AM

Herald added subscribers: llvm-commits, apazos, sameer.abuasal and 20 others.

(Alive2 is happy with the transform; see http://volta.cs.utah.edu:8080/z/oR3wKj .)

I did find it hard to get the fold conditions correct to avoid infinite loops in the AArch64 backend (which affected fp16 types),

Can you give an example here?

We probably need some code to avoid creating illegal SINT_TO_FP after legalization; that could lead to an infinite loop. There are lots of examples in DAGCombine to follow that call isOperationLegal.

In D79187#2013528, @efriedma wrote:

(Alive2 is happy with the transform; see http://volta.cs.utah.edu:8080/z/oR3wKj .)

I did find it hard to get the fold conditions correct to avoid infinite loops in the AArch64 backend (which affected fp16 types),

Can you give an example here?

We probably need some code to avoid creating illegal SINT_TO_FP after legalization; that could lead to an infinite loop. There are lots of examples in DAGCombine to follow that call isOperationLegal.

This patch itself does not cause infinite loops. However, small changes to the validity checker of the transform seem to cause issues, in the worst case breaking all of the following:

LLVM :: CodeGen/AArch64/arm64-convert-v4f64.ll
LLVM :: CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
LLVM :: CodeGen/AArch64/complex-int-to-fp.ll
LLVM :: CodeGen/AArch64/f16-instructions.ll
LLVM :: CodeGen/AArch64/fdiv_combine.ll
LLVM :: CodeGen/AArch64/fp16-v16-instructions.ll
LLVM :: CodeGen/AArch64/fp16-v4-instructions.ll
LLVM :: CodeGen/AArch64/fp16-v8-instructions.ll

I did try to ensure that SINT_TO_FP was legal, but I'm not sure the query I had for that was ever correct, because I'm not sure the query hasOperation(ISD::SINT_TO_FP, IntTy) necessarily tells you anything. I think I'm looking for the other half of this query, which says "is it legal to go *to* this FP type with sint_to_fp?" Guidance on this would also be helpful :) I'm new to the DAGCombiner.

The worst offenders seemed to be the following. I looked into f16-instructions.ll with llc -debug-only=dagcombine and there was definitely an infinite loop somewhere there.

LLVM :: CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
LLVM :: CodeGen/AArch64/f16-instructions.ll

However, small changes to the validity checker of the transform seem to cause issues

You mean, we don't have enough test coverage to catch all the issues? :)

Actually, I guess you won't run into issues with the current version of the patch because the legal integer types aren't narrow enough. I assume if you added a call to GetNumSignBits or whatever, the issues would pop back up. The problem is essentially that the fp16 type is "legal", but doesn't have SINT_TO_FP; we instead convert to a 32-bit float, then truncate the float to fp16. Which is exactly the pattern you're matching here.

I think if the result type of the fp_round is a half, the transform is legal for any integer type. half is so tiny that all integer half values fit into the mantissa of a 32-bit float, so there can't be any rounding issues. Alive2 agrees.

I'm not sure the query hasOperation(ISD::SINT_TO_FP, IntTy) necessarily tells you anything

If that returns Legal, it means SINT_TO_FP is legal for all legal floating-point types. If it's not legal for some types, the target has to mark it Custom and add handling for the types in question.
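The claim above about half can be sanity-checked with a small integer-only model. This is a sketch under stated assumptions, not the Alive2 proof: it covers only unsigned 32-bit inputs, models round-to-nearest-even by hand, and every name in it (bitLength, roundToBits, toHalf, halfViaFloat, halfDirect, routesAgree, Inf) is hypothetical and not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Number of significant bits needed to write X in binary (0 for X == 0).
unsigned bitLength(std::uint64_t X) {
  unsigned N = 0;
  for (; X != 0; X >>= 1)
    ++N;
  return N;
}

// Round X to at most P significant bits, round-to-nearest, ties-to-even.
// Inputs come from 32-bit integers, so the result cannot overflow 64 bits.
std::uint64_t roundToBits(std::uint64_t X, unsigned P) {
  unsigned N = bitLength(X);
  if (N <= P)
    return X; // already exactly representable
  unsigned Shift = N - P;
  std::uint64_t Q = X >> Shift;
  std::uint64_t R = X & ((std::uint64_t{1} << Shift) - 1);
  std::uint64_t Half = std::uint64_t{1} << (Shift - 1);
  if (R > Half || (R == Half && (Q & 1)))
    ++Q;
  return Q << Shift;
}

// Sentinel standing in for +infinity in this model.
const std::uint64_t Inf = ~std::uint64_t{0};

// Model of an integer-valued input converted to half: 11 significand bits.
// Anything that rounds to 65536 or above exceeds the largest finite half
// (65504) and becomes infinity.
std::uint64_t toHalf(std::uint64_t X) {
  std::uint64_t V = roundToBits(X, 11);
  return V >= 65536 ? Inf : V;
}

// uitofp i32 -> float (24 significand bits), then fptrunc float -> half.
std::uint64_t halfViaFloat(std::uint32_t X) {
  return toHalf(roundToBits(X, 24));
}

// uitofp i32 -> half directly, a single rounding.
std::uint64_t halfDirect(std::uint32_t X) { return toHalf(X); }

// True if both routes agree for every X in [Lo, Hi].
bool routesAgree(std::uint32_t Lo, std::uint32_t Hi) {
  for (std::uint64_t X = Lo; X <= Hi; ++X) {
    std::uint32_t V = static_cast<std::uint32_t>(X);
    if (halfViaFloat(V) != halfDirect(V))
      return false;
  }
  return true;
}
```

In this model, every input below 2^24 is exact in the float step, and every input at or above 2^24 overflows to infinity on both routes, which is why no double-rounding case shows up.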

Thanks for all the help, I will add the hasOperation(...) check to the fold.

I'm not convinced I'm using the semanticsPrecision from the correct node - I think perhaps I should be using the precision of the final resulting type of the fp_round. Once I've made the hasOperation change, I'll check this potential change against the AArch64 tests again.

Address @efriedma's feedback and guidance:

Use hasOperation to ensure legality. This causes the optimisation not to happen for the testcase on RV64*F in this example, I think because i32 is not a legal type on that architecture. I'll follow up this patch if I can get that working.

I have also refactored the check and added more detailed commentary on why I
believe the transform to be correct, which I think means it applies in more
cases. This is based on the observation/assumption that fp_round always
removes precision from the final float, so that precision loss can always be
"moved" into the int_to_fp operation.

Harbormaster failed remote builds in B55424: Diff 261453! May 1 2020, 4:16 AM

From http://volta.cs.utah.edu:8080/z/d6wpw5:

----------------------------------------
define float @src(i64 %x) {
%0:
  %xx = sitofp i64 %x to double
  %xxx = fptrunc double %xx to float
  ret float %xxx
}
=>
define float @tgt(i64 %x) {
%0:
  %xx = sitofp i64 %x to float
  ret float %xx
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i64 %x = #x0810400800000010 (581034755034710032)

Source:
double %xx = #x43a0208010000000 (581034755034710016)
float %xxx = #x5d010400 (581034720674971648)

Target:
float %xx = #x5d010401 (581034789394448384)
Source value: #x5d010400 (581034720674971648)
Target value: #x5d010401 (581034789394448384)
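
The counterexample above can likely be reproduced outside Alive2 with plain C++ casts. This is a minimal sketch, assuming IEEE-754 float and double and the default round-to-nearest-even mode; the helper names floatBits, viaDouble and direct are made up for illustration. The volatile intermediate is there only to keep the two roundings separate.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit integer so results can be compared
// against the hex patterns printed in the Alive2 report.
std::uint32_t floatBits(float F) {
  static_assert(sizeof(std::uint32_t) == sizeof(float),
                "float is expected to be 32 bits");
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Source pattern: sitofp i64 -> double, then fptrunc double -> float.
float viaDouble(std::int64_t X) {
  volatile double Wide = static_cast<double>(X); // first rounding, 53 bits
  return static_cast<float>(Wide);               // second rounding, 24 bits
}

// Target pattern: sitofp i64 -> float, a single rounding.
float direct(std::int64_t X) { return static_cast<float>(X); }
```

For the input 0x0810400800000010, the first rounding drops the low set bit and leaves a value exactly halfway between two floats, so the second rounding ties to even (0x5d010400). The direct conversion still sees the low bit and rounds up (0x5d010401).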

Use hasOperation to ensure legality. This causes the optimisation not to happen for the testcase on RV64*F in this example, I think because i32 is not a legal type on that architecture. I'll follow up this patch if I can get that working.

You could relax the check a bit; if !LegalTypes, you don't need to worry whether the operation is legal.

Actually, it might make sense to do this in instcombine instead of DAGCombine.

In D79187#2015298, @efriedma wrote:
From http://volta.cs.utah.edu:8080/z/d6wpw5:
...
Source value: #x5d010400 (581034720674971648)
Target value: #x5d010401 (581034789394448384)

I *thought* something like this might pop up last time.

I'll defer to you over whether we move this check to instcombine. In the meantime, I'll reinstate the previous check (that the intermediate float can precisely represent the source int), and add !LegalTypes || to the hasOperation check.

efriedma added a reviewer: spatel. May 1 2020, 6:20 PM

Updates to use original precision check (against intermediate float type), and
relaxes the hasOperation() check pre-legaltypes.
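
The reinstated precision check can be illustrated in isolation. This is a standalone sketch, not the DAGCombiner code: intermediateIsExact and the two constants are hypothetical names, and std::numeric_limits stands in for APFloat::semanticsPrecision on the assumption that the host uses IEEE-754 float and double.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for the patch's condition: can every value of an
// integer with IntBits bits (IntBits >= 1) be represented exactly in a
// floating-point type with Precision significand bits? A signed integer
// spends one bit on the sign, so only IntBits - 1 bits of magnitude count.
constexpr bool intermediateIsExact(unsigned IntBits, bool IsSigned,
                                   unsigned Precision) {
  return Precision >= IntBits - (IsSigned ? 1u : 0u);
}

// Significand widths of the host types: 53 and 24 on IEEE-754 targets.
constexpr unsigned DoublePrecision = std::numeric_limits<double>::digits;
constexpr unsigned FloatPrecision = std::numeric_limits<float>::digits;
```

Under this check, i32 via double qualifies (31 or 32 bits fit in 53), while i64 via double does not (63 or 64 bits exceed 53), which rules out the double-rounding counterexample reported earlier in the thread.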

Harbormaster failed remote builds in B55545: Diff 261640! May 2 2020, 7:23 AM

In D79187#2015327, @efriedma wrote:

Actually, it might make sense to do this in instcombine instead of DAGCombine.

Agree - if we can fold this earlier and universally, that would be better. See D79116 for a similar transform.

RKSimon added a subscriber: RKSimon. May 4 2020, 7:54 AM

spatel mentioned this in rG130a2356aee7: [InstCombine] add tests for FP cast of cast; NFC. May 17 2020, 9:02 AM

rGc048a02b5b26 does this transform in IR. Do we still need a backend transform?

Herald added a subscriber: ecnelises. May 24 2020, 7:03 AM

In D79187#2052620, @spatel wrote:

rGc048a02b5b26 does this transform in IR. Do we still need a backend transform?

I don't think so.

Revision Contents

Path                                             Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    27 lines
llvm/lib/Target/RISCV/RISCVInstrInfoF.td         6 lines
llvm/test/CodeGen/RISCV/fp-convert-indirect.ll   41 lines

Diff 261640

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 13,456 lines not shown @@ SDValue DAGCombiner::visitFP_ROUND(SDNode *N) {
   // fold (fp_round c1fp) -> c1fp
   if (N0CFP)
     return DAG.getNode(ISD::FP_ROUND, SDLoc(N), VT, N0, N1);

   // fold (fp_round (fp_extend x)) -> x
   if (N0.getOpcode() == ISD::FP_EXTEND && VT == N0.getOperand(0).getValueType())
     return N0.getOperand(0);

+  // fold (fp_round ({u,s}int_to_fp x)) -> ({u,s}int_to_fp x)
+  // but only when the ({u,s}int_to_fp x) remains precise
+  if (N0.getOpcode() == ISD::SINT_TO_FP || N0.getOpcode() == ISD::UINT_TO_FP) {
+    SDValue IntNode = N0->getOperand(0);
+    EVT IntTy = IntNode.getValueType();
+
+    bool IsIntSigned = N0.getOpcode() == ISD::SINT_TO_FP;
+    unsigned IntSize = (int)IntTy.getScalarSizeInBits() - IsIntSigned;
+    const fltSemantics &IntrSem = DAG.EVTToAPFloatSemantics(N0.getValueType());
+
+    // The intuition behind this check is that the original DAG took an integer,
+    // and converted it to the resulting float, but via a more precise float.
+    // After the transform, the integer is converted directly to the resulting
+    // float, without the intermediate precise float.
+    //
+    // Because the intermediate float is rounded to the resulting float, we know
+    // the resulting float is less precise than the intermediate float.
+    // Therefore, the relative precision of the int to the resulting float does
+    // not matter as long as we can fully represent the int in the intermediate
+    // float value. This avoids double-rounding issues.
+    if (APFloat::semanticsPrecision(IntrSem) >= IntSize &&
+        (!LegalTypes || hasOperation(N0.getOpcode(), IntTy))) {
+      SDLoc DL(N);
+      return DAG.getNode(N0.getOpcode(), DL, VT, IntNode);
+    }
+  }
+
   // fold (fp_round (fp_round x)) -> (fp_round x)
   if (N0.getOpcode() == ISD::FP_ROUND) {
     const bool NIsTrunc = N->getConstantOperandVal(1) == 1;
     const bool N0IsTrunc = N0.getConstantOperandVal(1) == 1;

     // Skip this folding if it results in an fp_round from f80 to f16.
     //
     // f80 to f16 always generates an expensive (and as yet, unimplemented)
@@ 8,241 lines not shown @@

llvm/lib/Target/RISCV/RISCVInstrInfoF.td

@@ 386 lines not shown @@
 let Predicates = [HasStdExtF, IsRV32] in {
 // float->[u]int. Round-to-zero must be used.
 def : Pat<(fp_to_sint FPR32:$rs1), (FCVT_W_S $rs1, 0b001)>;
 def : Pat<(fp_to_uint FPR32:$rs1), (FCVT_WU_S $rs1, 0b001)>;

 // [u]int->float. Match GCC and default to using dynamic rounding mode.
 def : Pat<(sint_to_fp GPR:$rs1), (FCVT_S_W $rs1, 0b111)>;
 def : Pat<(uint_to_fp GPR:$rs1), (FCVT_S_WU $rs1, 0b111)>;
-
-// [u]int->double->float
-def : Pat<(fpround (f64 (sint_to_fp GPR:$rs1))),
-          (FCVT_S_W GPR:$rs1, 0b111)>;
-def : Pat<(fpround (f64 (uint_to_fp GPR:$rs1))),
-          (FCVT_S_WU GPR:$rs1, 0b111)>;
 } // Predicates = [HasStdExtF, IsRV32]

 let Predicates = [HasStdExtF, IsRV64] in {
 def : Pat<(riscv_fmv_w_x_rv64 GPR:$src), (FMV_W_X GPR:$src)>;
 def : Pat<(riscv_fmv_x_anyextw_rv64 FPR32:$src), (FMV_X_W FPR32:$src)>;
 def : Pat<(sexti32 (riscv_fmv_x_anyextw_rv64 FPR32:$src)),
           (FMV_X_W FPR32:$src)>;
@@ 24 lines not shown @@

llvm/test/CodeGen/RISCV/fp-convert-indirect.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=riscv32 -mattr=+f -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=RV32IF %s
 ; RUN: llc -mtriple=riscv32 -mattr=+f,+d -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=RV32IFD %s
 ; RUN: llc -mtriple=riscv64 -mattr=+f -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=RV64IF %s
 ; RUN: llc -mtriple=riscv64 -mattr=+f,+d -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=RV64IFD %s

 ;; These testcases check that we merge sequences of `fcvt.d.wu; fcvt.s.d` into
 ;; `fcvt.s.wu`.
 ;;
-;; TODO: Unfortunately, though this only uses 32-bit FP instructions, we cannot
-;; do this optimisation without the D extension as we need 64-bit FP values to
-;; be legal to get the right operands to match.
+;; These folds are actually implemented in the DAGCombiner, because otherwise
+;; without the D extension the intermediate `double` will be legalised away and
+;; the conversions will be turned into libcalls.

 define float @fcvt_s_w_via_d(i32 %a) nounwind {
 ; RV32IF-LABEL: fcvt_s_w_via_d:
 ; RV32IF: # %bb.0:
-; RV32IF-NEXT: addi sp, sp, -16
-; RV32IF-NEXT: sw ra, 12(sp)
-; RV32IF-NEXT: call __floatsidf
-; RV32IF-NEXT: call __truncdfsf2
-; RV32IF-NEXT: lw ra, 12(sp)
-; RV32IF-NEXT: addi sp, sp, 16
+; RV32IF-NEXT: fcvt.s.w ft0, a0
+; RV32IF-NEXT: fmv.x.w a0, ft0
 ; RV32IF-NEXT: ret
 ;
 ; RV32IFD-LABEL: fcvt_s_w_via_d:
 ; RV32IFD: # %bb.0:
 ; RV32IFD-NEXT: fcvt.s.w ft0, a0
 ; RV32IFD-NEXT: fmv.x.w a0, ft0
 ; RV32IFD-NEXT: ret
 ;
 ; RV64IF-LABEL: fcvt_s_w_via_d:
 ; RV64IF: # %bb.0:
-; RV64IF-NEXT: addi sp, sp, -16
-; RV64IF-NEXT: sd ra, 8(sp)
-; RV64IF-NEXT: sext.w a0, a0
-; RV64IF-NEXT: call __floatsidf
-; RV64IF-NEXT: call __truncdfsf2
-; RV64IF-NEXT: ld ra, 8(sp)
-; RV64IF-NEXT: addi sp, sp, 16
+; RV64IF-NEXT: fcvt.s.w ft0, a0
+; RV64IF-NEXT: fmv.x.w a0, ft0
 ; RV64IF-NEXT: ret
 ;
 ; RV64IFD-LABEL: fcvt_s_w_via_d:
 ; RV64IFD: # %bb.0:
 ; RV64IFD-NEXT: fcvt.s.w ft0, a0
 ; RV64IFD-NEXT: fmv.x.w a0, ft0
 ; RV64IFD-NEXT: ret
   %1 = sitofp i32 %a to double
   %2 = fptrunc double %1 to float
   ret float %2
 }

 define float @fcvt_s_wu_via_d(i32 %a) nounwind {
 ; RV32IF-LABEL: fcvt_s_wu_via_d:
 ; RV32IF: # %bb.0:
-; RV32IF-NEXT: addi sp, sp, -16
-; RV32IF-NEXT: sw ra, 12(sp)
-; RV32IF-NEXT: call __floatunsidf
-; RV32IF-NEXT: call __truncdfsf2
-; RV32IF-NEXT: lw ra, 12(sp)
-; RV32IF-NEXT: addi sp, sp, 16
+; RV32IF-NEXT: fcvt.s.wu ft0, a0
+; RV32IF-NEXT: fmv.x.w a0, ft0
 ; RV32IF-NEXT: ret
 ;
 ; RV32IFD-LABEL: fcvt_s_wu_via_d:
 ; RV32IFD: # %bb.0:
 ; RV32IFD-NEXT: fcvt.s.wu ft0, a0
 ; RV32IFD-NEXT: fmv.x.w a0, ft0
 ; RV32IFD-NEXT: ret
 ;
 ; RV64IF-LABEL: fcvt_s_wu_via_d:
 ; RV64IF: # %bb.0:
-; RV64IF-NEXT: addi sp, sp, -16
-; RV64IF-NEXT: sd ra, 8(sp)
-; RV64IF-NEXT: slli a0, a0, 32
-; RV64IF-NEXT: srli a0, a0, 32
-; RV64IF-NEXT: call __floatunsidf
-; RV64IF-NEXT: call __truncdfsf2
-; RV64IF-NEXT: ld ra, 8(sp)
-; RV64IF-NEXT: addi sp, sp, 16
+; RV64IF-NEXT: fcvt.s.wu ft0, a0
+; RV64IF-NEXT: fmv.x.w a0, ft0
 ; RV64IF-NEXT: ret
 ;
 ; RV64IFD-LABEL: fcvt_s_wu_via_d:
 ; RV64IFD: # %bb.0:
 ; RV64IFD-NEXT: fcvt.s.wu ft0, a0
 ; RV64IFD-NEXT: fmv.x.w a0, ft0
 ; RV64IFD-NEXT: ret
   %1 = uitofp i32 %a to double
   %2 = fptrunc double %1 to float
   ret float %2
 }