This is an archive of the discontinued LLVM Phabricator instance.

Differential D6744

Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Closed, Public

Authored by arsenm on Dec 19 2014, 3:29 PM.


arsenm updated this revision to Diff 17525. Dec 19 2014, 3:29 PM

arsenm retitled this revision to "Combine fcmp + select to fminnum / fmaxnum if no nans and legal".

arsenm updated this object.

arsenm edited the test plan for this revision.

arsenm set the repository for this revision to rL LLVM.

arsenm added a subscriber: Unknown Object (MLST).

ping

mehdi_amini added a subscriber: mehdi_amini. Jan 8 2015, 1:57 PM

arsenm added a reviewer: ab. Jan 12 2015, 1:30 PM

The NaN part LGTM, but what happens on 0?

Say:

select +0.0, -0.0, (fcmp lt +0.0, -0.0)

turns into:

fminnum +0.0, -0.0

My understanding is, the first returns -0.0 (because they compare equal, and not "lt").
But it's unspecified which the second returns, so if the implementation returns the first operand when both are zero, it would return a "different" result here, +0.0.
Does that make sense?

-Ahmed
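
To make the zero case above concrete, here is a minimal C++ sketch (not part of the patch). `selectLess` models `select (fcmp lt a, b), a, b`, and `minFirstOnTie` is a hypothetical min that returns its first operand when the two operands compare equal, which is one behaviour an fminnum implementation might have. Both helper names are assumptions made for illustration, and the sketch assumes the compiler is not run in a fast-math mode.

```cpp
#include <cassert>
#include <cmath>

// Models `select (fcmp lt a, b), a, b`: pick a only when a < b, else b.
inline double selectLess(double a, double b) { return (a < b) ? a : b; }

// Hypothetical min that keeps its FIRST operand on a tie. This is an
// assumption about one possible fminnum implementation, not LLVM's.
inline double minFirstOnTie(double a, double b) { return (b < a) ? b : a; }

// Usage: +0.0 and -0.0 compare equal, so the two forms pick different zeros.
inline void demoSignedZero() {
  assert(std::signbit(selectLess(+0.0, -0.0)));     // select yields -0.0
  assert(!std::signbit(minFirstOnTie(+0.0, -0.0))); // tie-keeping min yields +0.0
}
```

For operands that do not compare equal, the two helpers agree, so the sign of zero is the only place this particular sketch shows a difference.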

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4629	Nit: clang-format? (or at least the same formatting as the call, with e.g. LHS/RHS on the same line.)
4742	Optionally, how about checking isKnownNeverNaN for both operands? Also, the non-U/-O CondCodes don't care about NaNs, so it would be fine to bypass NoNaNsFPMath in that case as well, no? Though I'm not sure that combination ever happens anyway.
4749	Nit: same, format?

In D6744#107740, @ab wrote:
The NaN part LGTM, but what happens on 0?

Say:
select +0.0, -0.0, (fcmp lt +0.0, -0.0)
turns into:
fminnum +0.0, -0.0
My understanding is, the first returns -0.0 (because they compare equal, and not "lt").
But it's unspecified which the second returns, so if the implementation returns the first operand when both are zero, it would return a "different" result here, +0.0.
Does that make sense?

Yes, however I am unclear on what guarantees there are for signed zeros. The DAG TargetOptions are also missing the equivalent of No Signed Zeros. I can weaken this to unsafe FP Math as well.

-Ahmed

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4742	I didn't know about isKnownNeverNaN, I'll switch to using it. They do care about NaN. The ordered compares fail if either operand is a NaN, and the unordered succeed if either is.
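
The ordered/unordered distinction described above can be sketched in plain C++ (an illustration, not code from the patch). `cmpOLT` and `cmpULT` are hypothetical helpers that model `fcmp olt` and `fcmp ult`, and `std::fmin` is used as a stand-in for fminnum on the assumption that both return the non-NaN operand when exactly one input is NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

const double kNaN = std::numeric_limits<double>::quiet_NaN();

// Models fcmp olt: true only if neither operand is NaN and a < b.
inline bool cmpOLT(double a, double b) { return a < b; }

// Models fcmp ult: true if either operand is NaN, or a < b.
inline bool cmpULT(double a, double b) { return !(a >= b); }

// select cond, a, b for each predicate.
inline double selectOLT(double a, double b) { return cmpOLT(a, b) ? a : b; }
inline double selectULT(double a, double b) { return cmpULT(a, b) ? a : b; }

// Usage: with a NaN operand, each select form disagrees with fmin
// for one of the two operand orders.
inline void demoNaN() {
  assert(std::isnan(selectOLT(1.0, kNaN))); // olt fails, picks b (NaN)
  assert(std::isnan(selectULT(kNaN, 1.0))); // ult succeeds, picks a (NaN)
  assert(std::fmin(1.0, kNaN) == 1.0);      // fmin drops the NaN
  assert(std::fmin(kNaN, 1.0) == 1.0);
}
```

This is why the combine needs both operands known not to be NaN: neither the ordered nor the unordered select reproduces the NaN-dropping behaviour for every operand order.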

+Owen

In D6744#107756, @arsenm wrote:
In D6744#107740, @ab wrote:
The NaN part LGTM, but what happens on 0?

Say:
select +0.0, -0.0, (fcmp lt +0.0, -0.0)
turns into:
fminnum +0.0, -0.0
My understanding is, the first returns -0.0 (because they compare equal, and not "lt").
But it's unspecified which the second returns, so if the implementation returns the first operand when both are zero, it would return a "different" result here, +0.0.
Does that make sense?
Yes, however I am unclear on what guarantees there are for signed zeros. The DAG TargetOptions are also missing the equivalent of No Signed Zeros. I can weaken this to unsafe FP Math as well.

I believe the X86 equivalent of this combine swaps operands to make sure it gets the same result, but that's only possible because the operation is specified there.

Other knowledgeable FP people, an opinion?

-Ahmed
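
The operand swap mentioned above can be illustrated with a small model (a sketch, not the X86 backend's code). `minssModel` is a hypothetical helper that assumes the x86 scalar min instruction returns its second operand unless the first is strictly less, which is how that instruction is documented to treat equal zeros. Only the zero case is covered here; NaN handling is left out of the sketch.

```cpp
#include <cassert>
#include <cmath>

// Assumed model of x86 MINSS(first, second): the second operand is the
// result unless first < second.
inline float minssModel(float first, float second) {
  return (first < second) ? first : second;
}

// select (fcmp olt a, b), a, b and select (fcmp ole a, b), a, b.
inline float selectOLT(float a, float b) { return (a < b) ? a : b; }
inline float selectOLE(float a, float b) { return (a <= b) ? a : b; }

// Usage: olt maps directly; ole needs swapped operands to keep the same zero.
inline void demoSwap() {
  assert(std::signbit(selectOLT(0.0f, -0.0f)) ==
         std::signbit(minssModel(0.0f, -0.0f)));
  assert(std::signbit(selectOLE(0.0f, -0.0f)) ==
         std::signbit(minssModel(-0.0f, 0.0f)));
}
```

Because the tie behaviour of the instruction is fixed, choosing the operand order is enough to match the select. With an operation whose result for equal zeros is unspecified, no operand order gives that guarantee.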

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4742	I meant the compares that are neither ordered nor unordered, e.g., ISD::SETLT, ISD::SETGT. Those are undefined on NaNs, no?

Use isKnownNeverNaN, check UnsafeFPMath due to signed zeros

LGTM

-Ahmed

This revision is now accepted and ready to land. Jan 12 2015, 4:21 PM

r225744

mehdi_amini added inline comments. Jan 12 2015, 5:40 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4630	What about renaming True to TrueValue and False to FalseValue, or something like that? I confused them with the C++ true/false keywords when I first read this.
4659	Can't you move the code out of the switch and have the switch only initialize the Opcode?
4742	Isn't the transformation valid for SETGT, SETGE, SETLT, SETLE even without Options.NoNaNsFPMath?
4742	Why is the limit N0.hasOneUse()?
4749	Is this the correct way of indenting the argument list here?

arsenm added inline comments. Jan 12 2015, 5:47 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4659	Yes, but I don't like assigning variables like that and having to track them
4742	Oh, those. Yes, it would be OK for those. However, I don't think I've ever actually seen one of those produced for FP compares before.
4749	This is what clang-format produced
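
For reference, the restructuring suggested in the comment on line 4659 could look roughly like the following standalone sketch, where the switch only classifies the condition code and the opcode is chosen afterwards. The enums `CondCode` and `Opcode` and the function `pickMinMax` are hypothetical stand-ins, not the LLVM types, and the legality check from the patch is omitted.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ISD condition codes and opcodes.
enum class CondCode { OLT, OLE, LT, LE, ULT, ULE, OGT, OGE, GT, GE, UGT, UGE, EQ };
enum class Opcode { None, FMinNum, FMaxNum };

// lhsIsTrue: the compare's LHS is the select's true value (otherwise it is
// assumed to be the false value; the patch bails out if neither matches).
inline Opcode pickMinMax(CondCode cc, bool lhsIsTrue) {
  bool isLess;
  switch (cc) {
  case CondCode::OLT: case CondCode::OLE: case CondCode::LT:
  case CondCode::LE:  case CondCode::ULT: case CondCode::ULE:
    isLess = true;
    break;
  case CondCode::OGT: case CondCode::OGE: case CondCode::GT:
  case CondCode::GE:  case CondCode::UGT: case CondCode::UGE:
    isLess = false;
    break;
  default:
    return Opcode::None; // not a less-than or greater-than style compare
  }
  // "less" selecting the LHS is a min; "greater" selecting the RHS is too.
  return (isLess == lhsIsTrue) ? Opcode::FMinNum : Opcode::FMaxNum;
}
```

The trade-off raised in the reply still applies: this form shares the tail code but introduces a variable (`isLess`) that the reader has to track across the switch.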

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	59 lines
test/CodeGen/R600/fmax_legacy.ll	18 lines
test/CodeGen/R600/fmin_legacy.ll	20 lines

Diff 18059

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[... 4,611 lines not shown ...]
SDValue DAGCombiner::visitCTPOP(SDNode *N) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

// fold (ctpop c1) -> c2		// fold (ctpop c1) -> c2
if (isa<ConstantSDNode>(N0))		if (isa<ConstantSDNode>(N0))
return DAG.getNode(ISD::CTPOP, SDLoc(N), VT, N0);		return DAG.getNode(ISD::CTPOP, SDLoc(N), VT, N0);
return SDValue();		return SDValue();
}		}


		/// \brief Generate Min/Max node
		static SDValue combineMinNumMaxNum(SDLoc DL, EVT VT, SDValue LHS, SDValue RHS,
		SDValue True, SDValue False,
		ISD::CondCode CC, const TargetLowering &TLI,
		SelectionDAG &DAG) {
		if (!(LHS == True && RHS == False) && !(LHS == False && RHS == True))
		return SDValue();

		switch (CC) {
		case ISD::SETOLT:
		case ISD::SETOLE:
		case ISD::SETLT:
		case ISD::SETLE:
		case ISD::SETULT:
		case ISD::SETULE: {
		unsigned Opcode = (LHS == True) ? ISD::FMINNUM : ISD::FMAXNUM;
		if (TLI.isOperationLegal(Opcode, VT))
		return DAG.getNode(Opcode, DL, VT, LHS, RHS);
		return SDValue();
		}
		case ISD::SETOGT:
		case ISD::SETOGE:
		case ISD::SETGT:
		case ISD::SETGE:
		case ISD::SETUGT:
		case ISD::SETUGE: {
		unsigned Opcode = (LHS == True) ? ISD::FMAXNUM : ISD::FMINNUM;
		if (TLI.isOperationLegal(Opcode, VT))
		return DAG.getNode(Opcode, DL, VT, LHS, RHS);
		return SDValue();
		}
		default:
		return SDValue();
		}
		}

SDValue DAGCombiner::visitSELECT(SDNode *N) {		SDValue DAGCombiner::visitSELECT(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
SDValue N2 = N->getOperand(2);		SDValue N2 = N->getOperand(2);
ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0);		ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0);
ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1);		ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1);
ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N2);		ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N2);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT VT0 = N0.getValueType();		EVT VT0 = N0.getValueType();

// fold (select C, X, X) -> X		// fold (select C, X, X) -> X
[... 60 lines not shown ...]
if (VT == MVT::i1 && (N0 == N2 || (N2C && N2C->getAPIntValue() == 0)))
return DAG.getNode(ISD::AND, SDLoc(N), VT, N0, N1);		return DAG.getNode(ISD::AND, SDLoc(N), VT, N0, N1);

// If we can fold this based on the true/false value, do so.		// If we can fold this based on the true/false value, do so.
if (SimplifySelectOps(N, N1, N2))		if (SimplifySelectOps(N, N1, N2))
return SDValue(N, 0); // Don't revisit N.		return SDValue(N, 0); // Don't revisit N.

// fold selects based on a setcc into other things, such as min/max/abs		// fold selects based on a setcc into other things, such as min/max/abs
if (N0.getOpcode() == ISD::SETCC) {		if (N0.getOpcode() == ISD::SETCC) {
		// select x, y (fcmp lt x, y) -> fminnum x, y
		// select x, y (fcmp gt x, y) -> fmaxnum x, y
		//
		// This is OK if we don't care about what happens if either operand is a
		// NaN.
		//

		// FIXME: Instead of testing for UnsafeFPMath, this should be checking for
		// no signed zeros as well as no nans.
		const TargetOptions &Options = DAG.getTarget().Options;
		if (Options.UnsafeFPMath &&
		VT.isFloatingPoint() && N0.hasOneUse() &&
		DAG.isKnownNeverNaN(N1) && DAG.isKnownNeverNaN(N2)) {
		ISD::CondCode CC = cast<CondCodeSDNode>(N0.getOperand(2))->get();

		SDValue FMinMax =
		combineMinNumMaxNum(SDLoc(N), VT, N0.getOperand(0), N0.getOperand(1),
		N1, N2, CC, TLI, DAG);
		if (FMinMax)
		return FMinMax;
		}

if ((!LegalOperations &&		if ((!LegalOperations &&
TLI.isOperationLegalOrCustom(ISD::SELECT_CC, VT)) ||		TLI.isOperationLegalOrCustom(ISD::SELECT_CC, VT)) ||
TLI.isOperationLegal(ISD::SELECT_CC, VT))		TLI.isOperationLegal(ISD::SELECT_CC, VT))
return DAG.getNode(ISD::SELECT_CC, SDLoc(N), VT,		return DAG.getNode(ISD::SELECT_CC, SDLoc(N), VT,
N0.getOperand(0), N0.getOperand(1),		N0.getOperand(0), N0.getOperand(1),
N1, N2, N0.getOperand(2));		N1, N2, N0.getOperand(2));
return SimplifySelect(SDLoc(N), N0, N1, N2);		return SimplifySelect(SDLoc(N), N0, N1, N2);
}		}
[... 8,014 lines not shown ...]

test/CodeGen/R600/fmax_legacy.ll

	; RUN: llc -march=amdgcn -mcpu=SI < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=SI < %s | FileCheck -check-prefix=SI -check-prefix=SI-SAFE -check-prefix=FUNC %s
				; RUN: llc -enable-no-nans-fp-math -enable-unsafe-fp-math -march=amdgcn -mcpu=SI < %s | FileCheck -check-prefix=SI-NONAN -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

				; FIXME: Should replace unsafe-fp-math with no signed zeros.

	declare i32 @llvm.r600.read.tidig.x() #1			declare i32 @llvm.r600.read.tidig.x() #1

	; FUNC-LABEL: @test_fmax_legacy_uge_f32			; FUNC-LABEL: @test_fmax_legacy_uge_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_max_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]			; SI-SAFE: v_max_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
				; SI-NONAN: v_max_f32_e32 {{v[0-9]+}}, [[B]], [[A]]

	; EG: MAX			; EG: MAX
	define void @test_fmax_legacy_uge_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmax_legacy_uge_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4

	%cmp = fcmp uge float %a, %b			%cmp = fcmp uge float %a, %b
	%val = select i1 %cmp, float %a, float %b			%val = select i1 %cmp, float %a, float %b
	store float %val, float addrspace(1)* %out, align 4			store float %val, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: @test_fmax_legacy_oge_f32			; FUNC-LABEL: @test_fmax_legacy_oge_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_max_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]			; SI-SAFE: v_max_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]
				; SI-NONAN: v_max_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
	; EG: MAX			; EG: MAX
	define void @test_fmax_legacy_oge_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmax_legacy_oge_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4

	%cmp = fcmp oge float %a, %b			%cmp = fcmp oge float %a, %b
	%val = select i1 %cmp, float %a, float %b			%val = select i1 %cmp, float %a, float %b
	store float %val, float addrspace(1)* %out, align 4			store float %val, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: @test_fmax_legacy_ugt_f32			; FUNC-LABEL: @test_fmax_legacy_ugt_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_max_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]			; SI-SAFE: v_max_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
				; SI-NONAN: v_max_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
	; EG: MAX			; EG: MAX
	define void @test_fmax_legacy_ugt_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmax_legacy_ugt_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4

	%cmp = fcmp ugt float %a, %b			%cmp = fcmp ugt float %a, %b
	%val = select i1 %cmp, float %a, float %b			%val = select i1 %cmp, float %a, float %b
	store float %val, float addrspace(1)* %out, align 4			store float %val, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: @test_fmax_legacy_ogt_f32			; FUNC-LABEL: @test_fmax_legacy_ogt_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_max_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]			; SI-SAFE: v_max_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]
				; SI-NONAN: v_max_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
	; EG: MAX			; EG: MAX
	define void @test_fmax_legacy_ogt_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmax_legacy_ogt_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4
	[... 34 lines not shown ...]

test/CodeGen/R600/fmin_legacy.ll

	; RUN: llc -march=amdgcn -mcpu=SI < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=SI < %s | FileCheck -check-prefix=SI-SAFE -check-prefix=SI -check-prefix=FUNC %s
				; RUN: llc -enable-no-nans-fp-math -enable-unsafe-fp-math -march=amdgcn -mcpu=SI < %s | FileCheck -check-prefix=SI-NONAN -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

				; FIXME: Should replace unsafe-fp-math with no signed zeros.

	declare i32 @llvm.r600.read.tidig.x() #1			declare i32 @llvm.r600.read.tidig.x() #1

	; FUNC-LABEL: @test_fmin_legacy_f32			; FUNC-LABEL: @test_fmin_legacy_f32
	; EG: MIN *			; EG: MIN *
	; SI: v_min_legacy_f32_e32			; SI-SAFE: v_min_legacy_f32_e32
				; SI-NONAN: v_min_f32_e32
	define void @test_fmin_legacy_f32(<4 x float> addrspace(1)* %out, <4 x float> inreg %reg0) #0 {			define void @test_fmin_legacy_f32(<4 x float> addrspace(1)* %out, <4 x float> inreg %reg0) #0 {
	%r0 = extractelement <4 x float> %reg0, i32 0			%r0 = extractelement <4 x float> %reg0, i32 0
	%r1 = extractelement <4 x float> %reg0, i32 1			%r1 = extractelement <4 x float> %reg0, i32 1
	%r2 = fcmp uge float %r0, %r1			%r2 = fcmp uge float %r0, %r1
	%r3 = select i1 %r2, float %r1, float %r0			%r3 = select i1 %r2, float %r1, float %r0
	%vec = insertelement <4 x float> undef, float %r3, i32 0			%vec = insertelement <4 x float> undef, float %r3, i32 0
	store <4 x float> %vec, <4 x float> addrspace(1)* %out, align 16			store <4 x float> %vec, <4 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; FUNC-LABEL: @test_fmin_legacy_ule_f32			; FUNC-LABEL: @test_fmin_legacy_ule_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_min_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]			; SI-SAFE: v_min_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
				; SI-NONAN: v_min_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
	define void @test_fmin_legacy_ule_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmin_legacy_ule_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4

	%cmp = fcmp ule float %a, %b			%cmp = fcmp ule float %a, %b
	%val = select i1 %cmp, float %a, float %b			%val = select i1 %cmp, float %a, float %b
	store float %val, float addrspace(1)* %out, align 4			store float %val, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: @test_fmin_legacy_ole_f32			; FUNC-LABEL: @test_fmin_legacy_ole_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_min_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]			; SI-SAFE: v_min_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]
				; SI-NONAN: v_min_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
	define void @test_fmin_legacy_ole_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmin_legacy_ole_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4

	%cmp = fcmp ole float %a, %b			%cmp = fcmp ole float %a, %b
	%val = select i1 %cmp, float %a, float %b			%val = select i1 %cmp, float %a, float %b
	store float %val, float addrspace(1)* %out, align 4			store float %val, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: @test_fmin_legacy_olt_f32			; FUNC-LABEL: @test_fmin_legacy_olt_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_min_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]			; SI-SAFE: v_min_legacy_f32_e32 {{v[0-9]+}}, [[A]], [[B]]
				; SI-NONAN: v_min_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
	define void @test_fmin_legacy_olt_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmin_legacy_olt_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4

	%cmp = fcmp olt float %a, %b			%cmp = fcmp olt float %a, %b
	%val = select i1 %cmp, float %a, float %b			%val = select i1 %cmp, float %a, float %b
	store float %val, float addrspace(1)* %out, align 4			store float %val, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: @test_fmin_legacy_ult_f32			; FUNC-LABEL: @test_fmin_legacy_ult_f32
	; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; SI: buffer_load_dword [[A:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; SI: buffer_load_dword [[B:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4
	; SI: v_min_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]			; SI-SAFE: v_min_legacy_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
				; SI-NONAN: v_min_f32_e32 {{v[0-9]+}}, [[B]], [[A]]
	define void @test_fmin_legacy_ult_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {			define void @test_fmin_legacy_ult_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.r600.read.tidig.x() #1			%tid = call i32 @llvm.r600.read.tidig.x() #1
	%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid			%gep.0 = getelementptr float addrspace(1)* %in, i32 %tid
	%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1			%gep.1 = getelementptr float addrspace(1)* %gep.0, i32 1

	%a = load float addrspace(1)* %gep.0, align 4			%a = load float addrspace(1)* %gep.0, align 4
	%b = load float addrspace(1)* %gep.1, align 4			%b = load float addrspace(1)* %gep.1, align 4

	[... 31 lines not shown ...]
