Details

Reviewers

cfang
tstellarAMD
arsenm

Commits

rL272044: Differential Revision: http://reviews.llvm.org/D20557

Summary

Lowering floating point division for 32-bit using IEEE 754

Diff Detail

Repository: rL LLVM

Event Timeline

wdng updated this revision to Diff 58190. · May 23 2016, 9:19 PM

wdng retitled this revision from to Lowering floating point division for 32-bit using IEEE 754.

wdng updated this object.

wdng added reviewers: arsenm, tstellarAMD, cfang.

wdng set the repository for this revision to rL LLVM.

Herald added a subscriber: arsenm. · May 23 2016, 9:19 PM

arsenm added inline comments. · May 24 2016, 12:50 PM

lib/Target/AMDGPU/SIISelLowering.cpp
42 ↗	(On Diff #58190)	Description is inaccurate. Should say something about faster 2.5 ulp fdiv
2031 ↗	(On Diff #58190)	No space after (
2032 ↗	(On Diff #58190)	Break on same line
2034 ↗	(On Diff #58190)	Indentation looks wrong
2047 ↗	(On Diff #58190)	Don't all caps ON
2076 ↗	(On Diff #58190)	You don't need the else
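For readers unfamiliar with the "faster 2.5 ulp fdiv" named in the first inline comment, the following is a minimal host-side sketch of that path as it appears in the diff further down. It is not the backend code: `bitsToFloat` and `coarseFDiv` are hypothetical names, `1.0f / x` stands in for the hardware `v_rcp_f32` approximation, and the stated reason for the pre-scaling is an assumption drawn from the source comment that `v_rcp_f32` does not handle denormals.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (stand-in for LLVM's BitsToFloat).
static float bitsToFloat(std::uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Hypothetical model of the coarse path: select a scale, multiply the
// denominator by it, take the reciprocal, then multiply twice.
static float coarseFDiv(float LHS, float RHS) {
  const float K0 = bitsToFloat(0x6f800000u); // 2^96
  const float K1 = bitsToFloat(0x2f800000u); // 2^-32

  // Assumption: very large denominators are pre-scaled so that the
  // reciprocal stays out of the denormal range.
  const float Scale = (std::fabs(RHS) > K0) ? K1 : 1.0f;

  // 1.0f / x is only a placeholder for the v_rcp_f32 approximation.
  const float Rcp = 1.0f / (RHS * Scale);

  // Multiplying by Scale again undoes the pre-scaling of the denominator.
  return Scale * (LHS * Rcp);
}
```

Because the host reciprocal is correctly rounded, this sketch does not reproduce the 2.5 ulp error bound; it only illustrates the order of operations and the two constants.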

wdng updated this revision to Diff 58310. · May 24 2016, 1:50 PM

wdng updated this object.

wdng marked 6 inline comments as done.

wdng removed a subscriber: arsenm.

Needs tests

wdng updated this revision to Diff 58637. · May 26 2016, 10:07 AM

wdng updated this object.

wdng edited edge metadata.

You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #58637)	This should come before the r600 line
81–82 ↗	(On Diff #58637)	-DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence. I would reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup)

Hi Matt, you commented: "You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function". It looks like fast math is not an llc flag.
Could you explain the fast math flags in more detail? For example: what is the fast math flag, how does the compiler backend retrieve it, and what code should we generate for fast math? Thanks a lot!

The llc flag is -enable-unsafe-fp-math. There is also the per-function attribute "unsafe-fp-math"="true". We should probably use that, but I discovered a while ago that if one function uses it, the others must have "unsafe-fp-math"="false", because there is a bug where the setting is reset between functions.

The newly introduced flag "-amdgpu-fast-fdiv" selects between the old implementation and the newly added IEEE 754 division implementation. For -enable-unsafe-fp-math, could you please let me know which part of the code will be tested? Thanks!

The test should show which division implementation you get when unsafe math is enabled. There also need to be tests for the fast math flags on the individual instructions

Modify & add LIT tests based on Matt's comments

You still need to add tests that only use the fast math flags

lib/Target/AMDGPU/SIISelLowering.cpp
2080–2081 ↗	(On Diff #58855)	Variables should be capitalized and camel case

Any comments about my latest changes? Thanks!

Capitalize variables to follow the camel case.

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

In D20557#444692, @arsenm wrote:

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

I'm not quite clear about the individual fast math flags. Could you please describe them in detail?

In D20557#444720, @wdng wrote:

In D20557#444692, @arsenm wrote:

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

I'm not quite clear about the individual fast math flags. Could you please describe them in detail?

There should be tests with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags

Rather than the globally enabled fast math option.

In D20557#444721, @arsenm wrote:

In D20557#444720, @wdng wrote:

In D20557#444692, @arsenm wrote:

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

I'm not quite clear about the individual fast math flags. Could you please describe them in detail?

There should be tests with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags

Rather than the globally enabled fast math option.

It looks like the compiler chooses a different implementation for fpdiv when the "-enable-unsafe-fp-math" flag is passed to llc. The generated *.s does show different ISA output, but the breakpoint I put at SITargetLowering::LowerFDIV is never reached. Is this correct?

Based on Matt's comments: Added tests (Merge Changpeng's LIT tests) with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags, rather than the globally enabled fast math option.

arsenm added inline comments. · Jun 2 2016, 12:04 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	fast math + -amdgpu-fast-fdiv would be another combination to try
58 ↗	(On Diff #59137)	There should be one that only uses arcp as well

wdng added inline comments. · Jun 2 2016, 12:36 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	Do you mean fast-math flag + "-amdgpu-fast-fdiv" ?

arsenm added inline comments. · Jun 2 2016, 5:11 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	Yes

wdng added inline comments. · Jun 2 2016, 5:41 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	Actually, the combination of fast math + -amdgpu-fast-fdiv is already covered by these tests. When the fast math flags (fast, arcp, etc.) + -enable-unsafe-fp-math are enabled, "if (Flags->hasAllowReciprocal()) { ... }" will be executed. The original, less accurate fpdiv implementation will be executed when only the -amdgpu-fast-fdiv flag is enabled. If we enable fast math + -amdgpu-fast-fdiv, the less accurate fpdiv will be tested.
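The path selection described in this comment can be sketched as a small standalone model. It covers only the branches of LowerFDIV32 that are visible in the diff further down; the struct, enum, and function names are hypothetical, the collapsed part of LowerFastFDIV is not modeled, and other parts of the backend may rewrite the division before LowerFDIV32 runs (the thread reports different unsafe-math output even though a breakpoint in the lowering is never hit).

```cpp
// Hypothetical summary of the inputs LowerFDIV32 consults in this diff.
struct FDivConfig {
  bool AllowReciprocal; // 'arcp' (or 'fast') flag on the fdiv node
  bool UnsafeFPMath;    // TargetOptions::UnsafeFPMath (-enable-unsafe-fp-math)
  bool FastFDivOption;  // -amdgpu-fast-fdiv
  bool FP32Denormals;   // subtarget has f32 denormals enabled
};

enum class FDivPath {
  RcpMul,     // LowerFastFDIV: rcp followed by a single multiply
  Unselected, // lowering returns an empty SDValue
  Coarse,     // faster 2.5 ulp sequence
  IEEE        // div_scale / fma chain / div_fmas / div_fixup
};

static FDivPath chooseFDiv32Path(const FDivConfig &C) {
  // LowerFastFDIV is attempted only when the node allows reciprocals; the
  // part visible in the diff succeeds when unsafe math is enabled.
  if (C.AllowReciprocal && C.UnsafeFPMath)
    return FDivPath::RcpMul;

  // With f32 denormals enabled, the function bails out before either sequence.
  if (C.FP32Denormals)
    return FDivPath::Unselected;

  // The command line option picks the coarse sequence; otherwise the new
  // IEEE 754 sequence is generated.
  return C.FastFDivOption ? FDivPath::Coarse : FDivPath::IEEE;
}
```

Under this model, a per-instruction flag alone (for example `fdiv arcp` without unsafe math) still reaches the coarse or IEEE sequence, which matches the I754 check lines on the arcp and fast test functions in fdiv.ll.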

Added tests for fast-math arcp.

LGTM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59627)	Is it? This is what I'm unsure about and why I think there should be an additional test. Are the individual math node flags set if these are globally enabled?

This revision is now accepted and ready to land. · Jun 6 2016, 1:42 PM

artem.tamazov added a subscriber: artem.tamazov. · Jun 7 2016, 4:57 AM

artem.tamazov added inline comments.

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #59627)	AFAIK there should be: two v_div_scale prior to v_rcp; two (not one) v_fma prior to v_mul; three (not one) v_fma after v_mul; and v_div_fmas prior to v_div_fixup. Perhaps the test should check that.
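The instruction order listed in this comment corresponds to the FMA chain in the new LowerFDIV32 code. A minimal host-side sketch of that chain follows, assuming operands that need no rescaling: the v_div_scale and v_div_fixup steps are omitted, v_div_fmas is modeled as a plain fused multiply-add, and `refinedFDiv` and its `ApproxRcp` parameter (standing in for the v_rcp_f32 result) are hypothetical names.

```cpp
#include <cmath>

// Hypothetical model of the refinement chain: two FMAs before the multiply,
// three after it, then a final fused multiply-add in place of v_div_fmas.
static float refinedFDiv(float LHS, float RHS, float ApproxRcp) {
  const float NegDen = -RHS;

  // Refine the reciprocal: Fma0 is the error 1 - d*r, Fma1 is r + error*r.
  const float Fma0 = std::fma(NegDen, ApproxRcp, 1.0f);
  const float Fma1 = std::fma(Fma0, ApproxRcp, ApproxRcp);

  // Initial quotient estimate.
  const float Mul = LHS * Fma1;

  // Refine the quotient: residual, corrected quotient, final residual.
  const float Fma2 = std::fma(NegDen, Mul, LHS);
  const float Fma3 = std::fma(Fma2, Fma1, Mul);
  const float Fma4 = std::fma(NegDen, Fma3, LHS);

  // Stand-in for DIV_FMAS(Fma4, Fma1, Fma3, Scale) without the scale input.
  return std::fma(Fma4, Fma1, Fma3);
}
```

Calling `refinedFDiv(1.0f, 3.0f, 0.33f)` starts from a reciprocal that is only good to about two decimal digits and converges toward the float nearest to 1/3, which illustrates why the sequence needs more than one v_fma on each side of the multiply.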

wdng added inline comments. · Jun 7 2016, 8:40 AM

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #59627)	Based on Matt's comment: "-DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence". So, we reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup).

Closed by commit rL272044: Differential Revision: http://reviews.llvm.org/D20557 (authored by wdng). · Jun 7 2016, 12:11 PM

This revision was automatically updated to reflect the committed changes.

artem.tamazov added inline comments. · Jun 8 2016, 6:28 AM

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #59627)	OK. However, v_div_fmas_f32 is still missing. Perhaps this is not important.

Fixed LIT test failures in frem.ll and rsq.ll

arsenm added inline comments. · Jun 8 2016, 12:32 PM

test/CodeGen/AMDGPU/frem.ll
1–3 ↗	(On Diff #60081)	This should also test without -amdgpu-fast-fdiv. -mcpu=SI should also be removed

cfang added inline comments. · Jun 8 2016, 1:02 PM

test/CodeGen/AMDGPU/frem.ll
1–3 ↗	(On Diff #60081)	Actually I don't suggest we test with -amdgpu-fast-fdiv here at all. This option has already been tested in fdiv.ll, and testing here does not provide any additional value.

Modified LIT test based on Matt's and Changpeng's suggestion.

wdng marked an inline comment as done. · Jun 8 2016, 5:17 PM

wdng added inline comments.

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #60120)	Yes.

LGTM

Please go ahead to commit!

Diff 59927

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

[...]
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAG.h"		#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"

using namespace llvm;		using namespace llvm;

		// -amdgpu-fast-fdiv - Command line option to enable faster 2.5 ulp fdiv.
		static cl::opt<bool> EnableAMDGPUFastFDIV(
		"amdgpu-fast-fdiv",
		cl::desc("Enable faster 2.5 ulp fdiv"),
		cl::init(false));

static unsigned findFirstFreeSGPR(CCState &CCInfo) {		static unsigned findFirstFreeSGPR(CCState &CCInfo) {
unsigned NumSGPRs = AMDGPU::SGPR_32RegClass.getNumRegs();		unsigned NumSGPRs = AMDGPU::SGPR_32RegClass.getNumRegs();
for (unsigned Reg = 0; Reg < NumSGPRs; ++Reg) {		for (unsigned Reg = 0; Reg < NumSGPRs; ++Reg) {
if (!CCInfo.isAllocated(AMDGPU::SGPR0 + Reg)) {		if (!CCInfo.isAllocated(AMDGPU::SGPR0 + Reg)) {
return AMDGPU::SGPR0 + Reg;		return AMDGPU::SGPR0 + Reg;
}		}
}		}
llvm_unreachable("Cannot allocate sgpr");		llvm_unreachable("Cannot allocate sgpr");
[...]
if (Unsafe) {		if (Unsafe) {
SDValue Recip = DAG.getNode(AMDGPUISD::RCP, SL, VT, RHS);		SDValue Recip = DAG.getNode(AMDGPUISD::RCP, SL, VT, RHS);
return DAG.getNode(ISD::FMUL, SL, VT, LHS, Recip, &Flags);		return DAG.getNode(ISD::FMUL, SL, VT, LHS, Recip, &Flags);
}		}

return SDValue();		return SDValue();
}		}

SDValue SITargetLowering::LowerFDIV32(SDValue Op, SelectionDAG &DAG) const {		SDValue SITargetLowering::LowerFDIV32(SDValue Op, SelectionDAG &DAG) const {
		const SDNodeFlags *Flags = Op->getFlags();
		if (Flags->hasAllowReciprocal()) {
if (SDValue FastLowered = LowerFastFDIV(Op, DAG))		if (SDValue FastLowered = LowerFastFDIV(Op, DAG))
return FastLowered;		return FastLowered;
		}

// This uses v_rcp_f32 which does not handle denormals. Let this hit a		// This uses v_rcp_f32 which does not handle denormals. Let this hit a
// selection error for now rather than do something incorrect.		// selection error for now rather than do something incorrect.
if (Subtarget->hasFP32Denormals())		if (Subtarget->hasFP32Denormals())
return SDValue();		return SDValue();

SDLoc SL(Op);		SDLoc SL(Op);
SDValue LHS = Op.getOperand(0);		SDValue LHS = Op.getOperand(0);
SDValue RHS = Op.getOperand(1);		SDValue RHS = Op.getOperand(1);

		// faster 2.5 ulp fdiv when using -amdgpu-fast-fdiv flag
		if (EnableAMDGPUFastFDIV) {
SDValue r1 = DAG.getNode(ISD::FABS, SL, MVT::f32, RHS);		SDValue r1 = DAG.getNode(ISD::FABS, SL, MVT::f32, RHS);

const APFloat K0Val(BitsToFloat(0x6f800000));		const APFloat K0Val(BitsToFloat(0x6f800000));
const SDValue K0 = DAG.getConstantFP(K0Val, SL, MVT::f32);		const SDValue K0 = DAG.getConstantFP(K0Val, SL, MVT::f32);

const APFloat K1Val(BitsToFloat(0x2f800000));		const APFloat K1Val(BitsToFloat(0x2f800000));
const SDValue K1 = DAG.getConstantFP(K1Val, SL, MVT::f32);		const SDValue K1 = DAG.getConstantFP(K1Val, SL, MVT::f32);

const SDValue One = DAG.getConstantFP(1.0, SL, MVT::f32);		const SDValue One = DAG.getConstantFP(1.0, SL, MVT::f32);

EVT SetCCVT =		EVT SetCCVT =
getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), MVT::f32);		getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), MVT::f32);

SDValue r2 = DAG.getSetCC(SL, SetCCVT, r1, K0, ISD::SETOGT);		SDValue r2 = DAG.getSetCC(SL, SetCCVT, r1, K0, ISD::SETOGT);

SDValue r3 = DAG.getNode(ISD::SELECT, SL, MVT::f32, r2, K1, One);		SDValue r3 = DAG.getNode(ISD::SELECT, SL, MVT::f32, r2, K1, One);

// TODO: Should this propagate fast-math-flags?		// TODO: Should this propagate fast-math-flags?

r1 = DAG.getNode(ISD::FMUL, SL, MVT::f32, RHS, r3);		r1 = DAG.getNode(ISD::FMUL, SL, MVT::f32, RHS, r3);

SDValue r0 = DAG.getNode(AMDGPUISD::RCP, SL, MVT::f32, r1);		SDValue r0 = DAG.getNode(AMDGPUISD::RCP, SL, MVT::f32, r1);

SDValue Mul = DAG.getNode(ISD::FMUL, SL, MVT::f32, LHS, r0);		SDValue Mul = DAG.getNode(ISD::FMUL, SL, MVT::f32, LHS, r0);

return DAG.getNode(ISD::FMUL, SL, MVT::f32, r3, Mul);		return DAG.getNode(ISD::FMUL, SL, MVT::f32, r3, Mul);
}		}

		// Generates more precise fpdiv32.
		const SDValue One = DAG.getConstantFP(1.0, SL, MVT::f32);

		SDVTList ScaleVT = DAG.getVTList(MVT::f32, MVT::i1);

		SDValue DenominatorScaled = DAG.getNode(AMDGPUISD::DIV_SCALE, SL, ScaleVT, RHS, RHS, LHS);
		SDValue NumeratorScaled = DAG.getNode(AMDGPUISD::DIV_SCALE, SL, ScaleVT, LHS, RHS, LHS);

		SDValue ApproxRcp = DAG.getNode(AMDGPUISD::RCP, SL, MVT::f32, DenominatorScaled);

		SDValue NegDivScale0 = DAG.getNode(ISD::FNEG, SL, MVT::f32, DenominatorScaled);

		SDValue Fma0 = DAG.getNode(ISD::FMA, SL, MVT::f32, NegDivScale0, ApproxRcp, One);
		SDValue Fma1 = DAG.getNode(ISD::FMA, SL, MVT::f32, Fma0, ApproxRcp, ApproxRcp);

		SDValue Mul = DAG.getNode(ISD::FMUL, SL, MVT::f32, NumeratorScaled, Fma1);

		SDValue Fma2 = DAG.getNode(ISD::FMA, SL, MVT::f32, NegDivScale0, Mul, NumeratorScaled);
		SDValue Fma3 = DAG.getNode(ISD::FMA, SL, MVT::f32, Fma2, Fma1, Mul);
		SDValue Fma4 = DAG.getNode(ISD::FMA, SL, MVT::f32, NegDivScale0, Fma3, NumeratorScaled);

		SDValue Scale = NumeratorScaled.getValue(1);
		SDValue Fmas = DAG.getNode(AMDGPUISD::DIV_FMAS, SL, MVT::f32, Fma4, Fma1, Fma3, Scale);

		return DAG.getNode(AMDGPUISD::DIV_FIXUP, SL, MVT::f32, Fmas, RHS, LHS);
		}

SDValue SITargetLowering::LowerFDIV64(SDValue Op, SelectionDAG &DAG) const {		SDValue SITargetLowering::LowerFDIV64(SDValue Op, SelectionDAG &DAG) const {
if (DAG.getTarget().Options.UnsafeFPMath)		if (DAG.getTarget().Options.UnsafeFPMath)
return LowerFastFDIV(Op, DAG);		return LowerFastFDIV(Op, DAG);

SDLoc SL(Op);		SDLoc SL(Op);
SDValue X = Op.getOperand(0);		SDValue X = Op.getOperand(0);
SDValue Y = Op.getOperand(1);		SDValue Y = Op.getOperand(1);

[...]

llvm/trunk/test/CodeGen/AMDGPU/fdiv.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -amdgpu-fast-fdiv < %s | FileCheck -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=I754 %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -enable-unsafe-fp-math < %s | FileCheck -check-prefix=UNSAFE-FP %s
	; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=R600 %s			; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=R600 %s

	; These tests check that fdiv is expanded correctly and also test that the			; These tests check that fdiv is expanded correctly and also test that the
	; scheduler is scheduling the RECIP_IEEE and MUL_IEEE instructions in separate			; scheduler is scheduling the RECIP_IEEE and MUL_IEEE instructions in separate
	; instruction groups.			; instruction groups.

				; These test check that fdiv using unsafe_fp_math, coarse fp div, and IEEE754 fp div.

	; FUNC-LABEL: {{^}}fdiv_f32:			; FUNC-LABEL: {{^}}fdiv_f32:
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS

				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_mul_f32_e32

	; SI-DAG: v_rcp_f32			; SI-DAG: v_rcp_f32
	; SI-DAG: v_mul_f32			; SI-DAG: v_mul_f32

				; I754-DAG: v_div_scale_f32
				; I754-DAG: v_rcp_f32
				; I754-DAG: v_fma_f32
				; I754-DAG: v_mul_f32
				; I754-DAG: v_fma_f32
				; I754-DAG: v_div_fixup_f32
	define void @fdiv_f32(float addrspace(1)* %out, float %a, float %b) {			define void @fdiv_f32(float addrspace(1)* %out, float %a, float %b) {
	entry:			entry:
	%0 = fdiv float %a, %b			%0 = fdiv float %a, %b
	store float %0, float addrspace(1)* %out			store float %0, float addrspace(1)* %out
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}fdiv_f32_fast_math:
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS

				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_mul_f32_e32

				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32

				; I754-DAG: v_div_scale_f32
				; I754-DAG: v_rcp_f32
				; I754-DAG: v_fma_f32
				; I754-DAG: v_mul_f32
				; I754-DAG: v_fma_f32
				; I754-DAG: v_div_fixup_f32
				define void @fdiv_f32_fast_math(float addrspace(1)* %out, float %a, float %b) {
				entry:
				%0 = fdiv fast float %a, %b
				store float %0, float addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}fdiv_f32_arcp_math:
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS

				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_mul_f32_e32

				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32

				; I754-DAG: v_div_scale_f32
				; I754-DAG: v_rcp_f32
				; I754-DAG: v_fma_f32
				; I754-DAG: v_mul_f32
				; I754-DAG: v_fma_f32
				; I754-DAG: v_div_fixup_f32
				define void @fdiv_f32_arcp_math(float addrspace(1)* %out, float %a, float %b) {
				entry:
				%0 = fdiv arcp float %a, %b
				store float %0, float addrspace(1)* %out
				ret void
				}

	; FUNC-LABEL: {{^}}fdiv_v2f32:			; FUNC-LABEL: {{^}}fdiv_v2f32:
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS

				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32

	; SI-DAG: v_rcp_f32			; SI-DAG: v_rcp_f32
	; SI-DAG: v_mul_f32			; SI-DAG: v_mul_f32
	; SI-DAG: v_rcp_f32			; SI-DAG: v_rcp_f32
	; SI-DAG: v_mul_f32			; SI-DAG: v_mul_f32

				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
	define void @fdiv_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) {			define void @fdiv_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) {
	entry:			entry:
	%0 = fdiv <2 x float> %a, %b			%0 = fdiv <2 x float> %a, %b
	store <2 x float> %0, <2 x float> addrspace(1)* %out			store <2 x float> %0, <2 x float> addrspace(1)* %out
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}fdiv_v2f32_fast_math:
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS

				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32

				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32

				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				define void @fdiv_v2f32_fast_math(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) {
				entry:
				%0 = fdiv fast <2 x float> %a, %b
				store <2 x float> %0, <2 x float> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}fdiv_v2f32_arcp_math:
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Z
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[3].Y
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, PS

				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_rcp_f32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32

				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32

				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				define void @fdiv_v2f32_arcp_math(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) {
				entry:
				%0 = fdiv arcp <2 x float> %a, %b
				store <2 x float> %0, <2 x float> addrspace(1)* %out
				ret void
				}


	; FUNC-LABEL: {{^}}fdiv_v4f32:			; FUNC-LABEL: {{^}}fdiv_v4f32:
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS

				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32

	; SI-DAG: v_rcp_f32			; SI-DAG: v_rcp_f32
	; SI-DAG: v_mul_f32			; SI-DAG: v_mul_f32
	; SI-DAG: v_rcp_f32			; SI-DAG: v_rcp_f32
	; SI-DAG: v_mul_f32			; SI-DAG: v_mul_f32
	; SI-DAG: v_rcp_f32			; SI-DAG: v_rcp_f32
	; SI-DAG: v_mul_f32			; SI-DAG: v_mul_f32
	; SI-DAG: v_rcp_f32			; SI-DAG: v_rcp_f32
	; SI-DAG: v_mul_f32			; SI-DAG: v_mul_f32

				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
	define void @fdiv_v4f32(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* %in) {			define void @fdiv_v4f32(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* %in) {
	%b_ptr = getelementptr <4 x float>, <4 x float> addrspace(1)* %in, i32 1			%b_ptr = getelementptr <4 x float>, <4 x float> addrspace(1)* %in, i32 1
	%a = load <4 x float>, <4 x float> addrspace(1) * %in			%a = load <4 x float>, <4 x float> addrspace(1) * %in
	%b = load <4 x float>, <4 x float> addrspace(1) * %b_ptr			%b = load <4 x float>, <4 x float> addrspace(1) * %b_ptr
	%result = fdiv <4 x float> %a, %b			%result = fdiv <4 x float> %a, %b
	store <4 x float> %result, <4 x float> addrspace(1)* %out			store <4 x float> %result, <4 x float> addrspace(1)* %out
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}fdiv_v4f32_fast_math:
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS

				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32

				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32

				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				define void @fdiv_v4f32_fast_math(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* %in) {
				%b_ptr = getelementptr <4 x float>, <4 x float> addrspace(1)* %in, i32 1
				%a = load <4 x float>, <4 x float> addrspace(1) * %in
				%b = load <4 x float>, <4 x float> addrspace(1) * %b_ptr
				%result = fdiv fast <4 x float> %a, %b
				store <4 x float> %result, <4 x float> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}fdiv_v4f32_arcp_math:
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS
				; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}, PS

				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_rcp_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32
				; UNSAFE-FP: v_mul_f32_e32

				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32
				; SI-DAG: v_rcp_f32
				; SI-DAG: v_mul_f32

				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_scale_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				; I754: v_div_fixup_f32
				define void @fdiv_v4f32_arcp_math(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* %in) {
				%b_ptr = getelementptr <4 x float>, <4 x float> addrspace(1)* %in, i32 1
				%a = load <4 x float>, <4 x float> addrspace(1) * %in
				%b = load <4 x float>, <4 x float> addrspace(1) * %b_ptr
				%result = fdiv arcp <4 x float> %a, %b
				store <4 x float> %result, <4 x float> addrspace(1)* %out
				ret void
				}

This is an archive of the discontinued LLVM Phabricator instance.

Lowering floating point division for 32-bit using IEEE 754
ClosedPublic