This is an archive of the discontinued LLVM Phabricator instance.

Don't combine fp_round (fp_round x) if f80 to f16 is generated
Closed, Public

Authored by pirama on Feb 12 2016, 3:52 PM.

Details

Reviewers

Commits

rG7476bc89e939: Don't combine fp_round (fp_round x) if f80 to f16 is generated
rL260769: Don't combine fp_round (fp_round x) if f80 to f16 is generated

Summary

This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.

fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2. This prevents selection of
native f16 conversion instructions from f32 or f64. Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.
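
The decision described in this summary can be sketched as a small standalone model. This is only an illustration: `FPType`, `FoldQuery`, and `shouldFoldDoubleRound` are hypothetical names invented here and are not part of LLVM. The real check lives in `DAGCombiner::visitFP_ROUND` and operates on `SDValue`/`MVT`.

```cpp
// Hypothetical stand-ins for the floating-point value types involved.
// These are NOT LLVM's MVT enumerators.
enum class FPType { F16, F32, F64, F80 };

// Describes a candidate fold of fp_round(fp_round x) into one fp_round x.
struct FoldQuery {
  FPType innerSrc;             // type of x, the operand of the inner fp_round
  FPType outerDst;             // result type of the outer fp_round
  bool innerIsValuePreserving; // inner fp_round is known to be exact
  bool unsafeFPMath;           // unsafe FP math is enabled
};

// Returns true when the two rounds may be merged into a single round.
constexpr bool shouldFoldDoubleRound(const FoldQuery &q) {
  // Guard modeled on this patch: never create a direct f80 -> f16 round,
  // since that would require the __truncxfhf2 libcall.
  if (q.innerSrc == FPType::F80 && q.outerDst == FPType::F16)
    return false;
  // Condition that existed before the patch: fold only if the inner round
  // is value-preserving or unsafe FP math is allowed.
  return q.unsafeFPMath || q.innerIsValuePreserving;
}
```

In this model, an f80 -> f32 -> f16 chain is never merged, while an f80 -> f64 -> f32 chain with a value-preserving first step still is.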

Diff Detail

Repository: rL LLVM

Event Timeline

pirama updated this revision to Diff 47869. Feb 12 2016, 3:52 PM

pirama retitled this revision to Don't combine fp_round (fp_round x) if f80 to f16 is generated.

pirama updated this object.

pirama added a reviewer: ab.

pirama added subscribers: llvm-commits, srhines.

ab added inline comments. Feb 12 2016, 3:58 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9030 ↗	(On Diff #47869)	How do we know that the round is value-preserving?

pirama added inline comments. Feb 12 2016, 4:03 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9030 ↗	(On Diff #47869)	The first round being value-preserving is a prerequisite for the folding (the if statement below). So this patch doesn't change behavior if the first fp_round is not value-preserving.

LGTM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9030 ↗	(On Diff #47869)	Ah, you're right; I misread that, sorry.
test/CodeGen/X86/half.ll
270 ↗	(On Diff #47869)	What do you think of explicitly testing f80->f16 instead?

This revision is now accepted and ready to land. Feb 12 2016, 4:11 PM

Thanks for the quick review :)

Closed by commit rL260769: Don't combine fp_round (fp_round x) if f80 to f16 is generated (authored by pirama). Feb 12 2016, 4:12 PM

This revision was automatically updated to reflect the committed changes.

srhines added inline comments. Feb 12 2016, 5:18 PM

test/CodeGen/X86/half.ll
270 ↗	(On Diff #47869)	Pirama, I think you missed this question. :)

pirama added inline comments. Feb 12 2016, 6:35 PM

test/CodeGen/X86/half.ll
270 ↗	(On Diff #47869)	Aah, sorry, I missed this. I think you mean testing fptrunc from f80 -> say f32, followed by f32 -> f16. This doesn't trigger the folding because the rounds are not value-preserving.

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	11 lines
llvm/trunk/test/CodeGen/X86/half.ll	15 lines

Diff 47877

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 9,013 lines not shown @@ SDValue DAGCombiner::visitFP_ROUND(SDNode *N) {
   // fold (fp_round (fp_extend x)) -> x
   if (N0.getOpcode() == ISD::FP_EXTEND && VT == N0.getOperand(0).getValueType())
     return N0.getOperand(0);
 
   // fold (fp_round (fp_round x)) -> (fp_round x)
   if (N0.getOpcode() == ISD::FP_ROUND) {
     const bool NIsTrunc = N->getConstantOperandVal(1) == 1;
     const bool N0IsTrunc = N0.getNode()->getConstantOperandVal(1) == 1;
 
+    // Skip this folding if it results in an fp_round from f80 to f16.
+    //
+    // f80 to f16 always generates an expensive (and as yet, unimplemented)
+    // libcall to __truncxfhf2 instead of selecting native f16 conversion
+    // instructions from f32 or f64. Moreover, the first (value-preserving)
+    // fp_round from f80 to either f32 or f64 may become a NOP in platforms like
+    // x86.
+    if (N0.getOperand(0).getValueType() == MVT::f80 && VT == MVT::f16)
+      return SDValue();
+
     // If the first fp_round isn't a value preserving truncation, it might
     // introduce a tie in the second fp_round, that wouldn't occur in the
     // single-step fp_round we want to fold to.
     // In other words, double rounding isn't the same as rounding.
     // Also, this is a value preserving truncation iff both fp_round's are.
     if (DAG.getTarget().Options.UnsafeFPMath || N0IsTrunc) {
       SDLoc DL(N);
       return DAG.getNode(ISD::FP_ROUND, DL, VT, N0.getOperand(0),
@@ 5,764 lines not shown @@
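
The code comment above notes that double rounding isn't the same as rounding. A minimal sketch of that effect follows, using integers rounded to multiples of a power of two instead of real floating-point types. The helper `roundToNearestEven` is a hypothetical name introduced for this illustration and has no counterpart in LLVM.

```cpp
#include <cstdint>

// Round v to the nearest multiple of 2^drop, with ties going to the even
// quotient. Requires 1 <= drop < 64.
constexpr std::uint64_t roundToNearestEven(std::uint64_t v, unsigned drop) {
  std::uint64_t q = v >> drop;                              // kept high bits
  std::uint64_t rem = v & ((std::uint64_t{1} << drop) - 1); // discarded bits
  std::uint64_t half = std::uint64_t{1} << (drop - 1);      // tie threshold
  if (rem > half || (rem == half && (q & 1)))
    ++q; // round up: above the midpoint, or a tie with an odd quotient
  return q << drop;
}

// One step: 0x17F is just below the midpoint between 0x100 and 0x200.
static_assert(roundToNearestEven(0x17F, 8) == 0x100, "single rounding");

// Two steps: the first round yields 0x180, an exact tie, which the second
// round then resolves upward to 0x200.
static_assert(roundToNearestEven(roundToNearestEven(0x17F, 4), 8) == 0x200,
              "double rounding differs");

// If the first step is exact (value-preserving), both routes agree.
static_assert(roundToNearestEven(roundToNearestEven(0x130, 4), 8) ==
                  roundToNearestEven(0x130, 8),
              "exact first step is safe");
```

The last assertion mirrors why the fold is only performed when the first fp_round is value-preserving (or unsafe FP math is enabled): an exact first step cannot introduce a new tie.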

llvm/trunk/test/CodeGen/X86/half.ll

 ; RUN: llc < %s -march=x86-64 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 -mattr=-f16c -asm-verbose=false \
 ; RUN: | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-LIBCALL
 ; RUN: llc < %s -march=x86-64 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 -mattr=+f16c -asm-verbose=false \
 ; RUN: | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-F16C
+; RUN: llc < %s -mtriple=i686-unknown-linux-gnu -mattr +sse2 -asm-verbose=false \
+; RUN: | FileCheck %s -check-prefix=CHECK-I686
 
 define void @test_load_store(half* %in, half* %out) {
 ; CHECK-LABEL: test_load_store:
 ; CHECK: movw (%rdi), [[TMP:%[a-z0-9]+]]
 ; CHECK: movw [[TMP]], (%rsi)
   %val = load half, half* %in
   store half %val, half* %out
   ret void
@@ 242 lines not shown @@
 ; CHECK: movw
 ; CHECK: movw
 ; CHECK: movw
   %v = fptrunc <4 x double> %a to <4 x half>
   store <4 x half> %v, <4 x half>* %p
   ret void
 }
 
+declare float @test_floatret();
+
+; On i686, if SSE2 is available, the return value from test_floatret is loaded
+; to f80 and then rounded to f32. The DAG combiner should not combine this
+; fp_round and the subsequent fptrunc from float to half.
+define half @test_f80trunc_nodagcombine() #0 {
+; CHECK-LABEL: test_f80trunc_nodagcombine:
+; CHECK-I686-NOT: calll __truncxfhf2
+  %1 = call float @test_floatret()
+  %2 = fptrunc float %1 to half
+  ret half %2
+}
+
 attributes #0 = { nounwind }