This is an archive of the discontinued LLVM Phabricator instance.

Differential D22898

AMDGPU: Fix ffloor for SI
Needs Review · Public

Authored by arsenm on Jul 27 2016, 6:58 PM.

This revision needs review, but all specified reviewers are disabled or inactive.

Details

Reviewers

• tstellarAMD

Summary

OpenCL conformance failed with:
ERROR: floor: 0.500000 ulp error at -0x1.0000000000000p-143 (0xb700000000000000): *-0x1.0000000000000p+0 vs. -0x1.fffffffffffffp-1

Diff Detail

Event Timeline

arsenm updated this revision to Diff 65855. Jul 27 2016, 6:58 PM

arsenm retitled this revision from to AMDGPU: Fix ffloor for SI.

arsenm updated this object.

arsenm added a subscriber: llvm-commits.

Herald added a reviewer: tstellarAMD. Jul 27 2016, 6:58 PM

Herald added subscribers: kzhuravl, arsenm.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs". Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

lib/Target/AMDGPU/SIInstructions.td
3542	Please also change the comment above.
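
The point that a fract result of 1.0 "happens to be correct for the ffloor lowering" can be checked in plain double arithmetic against the failing input from the summary. The sketch below is a host-side model only, not the TableGen pattern: `model_fract` is a hypothetical stand-in for v_fract_f64 that computes x - floor(x), which rounds to exactly 1.0 for this input, and `model_floor` and `check_conformance_case` are illustrative names that do not appear in the patch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for v_fract_f64: x - floor(x), rounded to double.
// For a tiny negative x the exact result 1 - |x| is not representable
// and rounds up to 1.0.
double model_fract(double x) { return x - std::floor(x); }

// Model of the SI lowering: floor(x) = x - min(fract(x), clamp).
double model_floor(double x, double clamp) {
  return x - std::fmin(model_fract(x), clamp);
}

// Walks through the failing input from the conformance log.
void check_conformance_case() {
  const double x = std::ldexp(-1.0, -143);            // -0x1p-143
  const double below_one = std::nextafter(1.0, 0.0);  // 0x3fefffffffffffff

  assert(model_fract(x) == 1.0);
  // Clamping to 1.0 leaves the fract result alone, and floor(x) is -1.0.
  assert(model_floor(x, 1.0) == -1.0);
  // Clamping to the value just below 1.0 yields -0x1.fffffffffffffp-1,
  // the wrong result reported by the conformance test.
  assert(model_floor(x, below_one) == -below_one);
}
```

Under this model, the clamp to 0x3fefffffffffffff is what introduces the 0.5 ulp error for floor, while a clamp to 1.0 gives the expected -1.0.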

In D22898#499043, @nhaehnle wrote:

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs". Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

In D22898#501301, @arsenm wrote:

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

In D22898#501345, @arsenm wrote:

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

In D22898#501974, @mareko wrote:

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

Yes, by clamping to exactly 1 it skips the broken 1 value. 0.999999... needs to be passed through fract

In D22898#514040, @arsenm wrote:

Yes, by clamping to exactly 1 it skips the broken 1 value. 0.999999... needs to be passed through fract

For incorrect x=fract(y) -> x=1, min(x, 1) -> min(1, 1) -> 1. You need this instead: min(x, 0.9999999........). Or am I missing something?

In D22898#514109, @mareko wrote:

For incorrect x=fract(y) -> x=1, min(x, 1) -> min(1, 1) -> 1. You need this instead: min(x, 0.9999999........). Or am I missing something?

1 is the incorrect value. 0.99999999999999999 should be correctly handled. By clamping to 0.99999999999999999, you miss correctly handling that one value.

I don't understand.

min(x, 1) is a no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.

The bug information (edited for publishing):

SI: Precision issue for FRACT_F32/64 opcodes *

3.31.1.1 Synopsis
Range of outputs for FRACT opcode is [+0.0, 1.0). The hardware is outputting 1.0 for very small negative inputs (i.e. 0xb3000000).

3.31.1.2 Symptoms
Precision difference with OpenCL conformance test, SW already using workaround. (Could potentially cause precision difference with other APIs.)

3.31.1.3 Scope
Found in all SI family.

3.31.1.4 Suggested Driver Solution
Compiler Expansion for FRACT_F32:

out = FRACT_F32(in)
out = MIN_F32(out, 0x3f7fffff)
out = ISNAN_F32(in) ? in : out;

(Note: 1.0 == 0x3f800000, thus 1.0 is not correct)

Here's what the closed compiler does for F64:

If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.
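
The F32 expansion from the erratum can be sketched on the host as follows. This is an illustration under assumptions, not driver code: `hw_fract` is simply whatever value the FRACT instruction returned, and `bits_to_float`, `apply_fract_workaround` and `check_workaround` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing issues).
float bits_to_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Applies the suggested expansion to a raw FRACT_F32 result.
// `hw_fract` is the value the hardware instruction returned for `in`.
float apply_fract_workaround(float in, float hw_fract) {
  const float below_one = bits_to_float(0x3f7fffffu);  // largest float < 1.0
  const float out = std::fmin(hw_fract, below_one);    // MIN_F32 step
  return std::isnan(in) ? in : out;                    // keep NaN inputs
}

void check_workaround() {
  const float tiny_negative = bits_to_float(0xb3000000u);  // erratum input
  // A buggy result of 1.0 is pulled back into [0, 1).
  assert(apply_fract_workaround(tiny_negative, 1.0f) ==
         std::nextafter(1.0f, 0.0f));
  // Clamping against 1.0 instead would leave the buggy 1.0 unchanged.
  assert(std::fmin(1.0f, 1.0f) == 1.0f);
  // std::fmin drops a NaN operand, so the final select is what keeps
  // NaN inputs NaN in this model.
  assert(std::isnan(apply_fract_workaround(NAN, NAN)));
}
```

This models only the fract workaround itself. Whether the same clamp is right inside the ffloor lowering is the separate question debated in this thread.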

In D22898#517905, @mareko wrote:
I don't understand.

min(x, 1) is a no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.

This is what I get when I dump sc's output for __amdil_fraction_f64:

v_fract_f64   v[0:1], s[0:1]                              // 0000000C: 7E007C00
v_mov_b32     v2, -1                                      // 00000010: 7E0402C1
v_mov_b32     v3, 0x3fefffff                              // 00000014: 7E0602FF 3FEFFFFF
v_min_f64     v[2:3], v[2:3], v[0:1]                      // 0000001C: D2CC0002 00020102
v_cmp_class_f64  vcc, v[0:1], 3                           // 00000024: D150006A 00010700
v_cndmask_b32  v0, v2, v0, vcc                            // 0000002C: 00000102
v_cndmask_b32  v1, v3, v1, vcc

v[2:3] = 0x3fefffffffffffff = 1.0

In D22898#517905, @mareko wrote:
I don't understand.

min(x, 1) is a no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.

I'm confused now, because using 1.0 does pass conformance and using 0x3fefffffffffffff does not. This specifically is the workaround for v_fract_f64, but this is the implementation for ffloor. Is this really supposed to be a pure x - fract(x)? It also appears that there is a different bug in v_fract_f64 on CI, but it seems we don't do anything about that right now.

Yeah I know about the CI bug, but it's not important for OpenGL.

0x3ff0000000000000 is 1.0.
0x3fefffffffffffff isn't 1.0. It's the largest number smaller than 1.0, which is 0.9999999999999........... it can also be written as: bitcast(1.0, i32) - 1
If you print 0x3fefffffffffffff with 6 decimal places, it will be rounded to 1.0 for the purpose of printing. I guess that's where the confusion comes from.

I don't know why 1.0 passes OpenCL conformance and bitcast(1.0, i32) - 1 doesn't. I suggest you check what the closed compiler does in this case.
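
The bit patterns discussed here are easy to check on a host. The following is a small sketch with illustrative helper names (`bits_to_double`, `double_to_bits`, `format_fixed6`); since the values are f64, the integer view used below is 64 bits wide.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Reinterpret a 64-bit pattern as a double (memcpy avoids aliasing issues).
double bits_to_double(std::uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Reinterpret a double as its 64-bit pattern.
std::uint64_t double_to_bits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Format with six decimal places, as a typical "%f" print does.
std::string format_fixed6(double d) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%f", d);
  return std::string(buf);
}

void check_bit_patterns() {
  // 0x3ff0000000000000 is exactly 1.0.
  assert(bits_to_double(0x3ff0000000000000ULL) == 1.0);
  // Subtracting one from the bit pattern of 1.0 gives 0x3fefffffffffffff,
  // the largest double below 1.0.
  assert(double_to_bits(1.0) - 1 == 0x3fefffffffffffffULL);
  assert(bits_to_double(0x3fefffffffffffffULL) == std::nextafter(1.0, 0.0));
  // Printed with six decimals, it is indistinguishable from 1.0.
  assert(format_fixed6(bits_to_double(0x3fefffffffffffffULL)) == "1.000000");
}
```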

In D22898#518537, @mareko wrote:

Yeah I know about the CI bug, but it's not important for OpenGL.

0x3ff0000000000000 is 1.0.
0x3fefffffffffffff isn't 1.0. It's the largest number smaller than 1.0, which is 0.9999999999999........... it can also be written as: bitcast(1.0, i32) - 1
If you print 0x3fefffffffffffff with 6 decimal places, it will be rounded to 1.0 for the purpose of printing. I guess that's where the confusion comes from.

I don't know why 1.0 passes OpenCL conformance and bitcast(1.0, i32) - 1 doesn't. I suggest you check what the closed compiler does in this case.

Closed OpenCL on the AMDIL path uses a library expansion for floor, and doesn't try to use any of these instructions

In D22898#518537, @mareko wrote:

Yeah I know about the CI bug, but it's not important for OpenGL.

0x3ff0000000000000 is 1.0.
0x3fefffffffffffff isn't 1.0. It's the largest number smaller than 1.0, which is 0.9999999999999........... it can also be written as: bitcast(1.0, i32) - 1
If you print 0x3fefffffffffffff with 6 decimal places, it will be rounded to 1.0 for the purpose of printing. I guess that's where the confusion comes from.

I don't know why 1.0 passes OpenCL conformance and bitcast(1.0, i32) - 1 doesn't. I suggest you check what the closed compiler does in this case.

It implements floor with a library expansion using bit ops and doesn't attempt to use fract (though I don't think this is intentional). The custom floor lowering that I found was dead, and which this patch deletes, also passes conformance.

In D22898#501973, @mareko wrote:

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

In D22898#855778, @arsenm wrote:

It's not actually a no-op, it's a canonicalize.

What does that mean?

In D22898#855792, @mareko wrote:

What does that mean?

IEEE canonicalize. http://www.llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic

NaNs are quieted, denormals may be flushed

In D22898#855802, @arsenm wrote:

IEEE canonicalize. http://www.llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic

NaNs are quieted, denormals may be flushed

If this patch is what SC does, it's OK with me.

arsenm added inline comments. Aug 30 2017, 12:03 PM

lib/Target/AMDGPU/SIInstructions.td
3545	This might be the problem. This is using SRCMODS.NONE rather than preserving it like the other uses. It might be less error prone to do this as a custom expansion of floor rather than expanding the fract here

arsenm mentioned this in D73352: AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI. Jan 24 2020, 7:10 AM

arsenm mentioned this in rG5aa6e246a1e4: AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI. Feb 5 2020, 11:33 AM

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPUISelLowering.h	1 line
lib/Target/AMDGPU/AMDGPUISelLowering.cpp	26 lines
lib/Target/AMDGPU/SIISelLowering.h	1 line
lib/Target/AMDGPU/SIInstructions.td	2 lines
test/CodeGen/AMDGPU/ffloor.f64.ll	16 lines
test/CodeGen/AMDGPU/fract.f64.ll	12 lines

Diff 65855

lib/Target/AMDGPU/AMDGPUISelLowering.h

@@ protected:
   SDValue LowerFCEIL(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFTRUNC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFRINT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFNEARBYINT(SDValue Op, SelectionDAG &DAG) const;

   SDValue LowerFROUND32(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;
-  SDValue LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const;

   SDValue LowerCTLZ(SDValue Op, SelectionDAG &DAG) const;

   SDValue LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG, bool Signed) const;
   SDValue LowerINT_TO_FP64(SDValue Op, SelectionDAG &DAG, bool Signed) const;
   SDValue LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

@@ SDValue AMDGPUTargetLowering::LowerOperation(SDValue Op,
   case ISD::UDIVREM: return LowerUDIVREM(Op, DAG);
   case ISD::SDIVREM: return LowerSDIVREM(Op, DAG);
   case ISD::FREM: return LowerFREM(Op, DAG);
   case ISD::FCEIL: return LowerFCEIL(Op, DAG);
   case ISD::FTRUNC: return LowerFTRUNC(Op, DAG);
   case ISD::FRINT: return LowerFRINT(Op, DAG);
   case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);
   case ISD::FROUND: return LowerFROUND(Op, DAG);
-  case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);
   case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);
   case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);
   case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
   case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);
   case ISD::CTLZ:
   case ISD::CTLZ_ZERO_UNDEF:
     return LowerCTLZ(Op, DAG);
   case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);
@@ if (VT == MVT::f32)
     return LowerFROUND32(Op, DAG);

   if (VT == MVT::f64)
     return LowerFROUND64(Op, DAG);

   llvm_unreachable("unhandled type");
 }

-SDValue AMDGPUTargetLowering::LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const {
-  SDLoc SL(Op);
-  SDValue Src = Op.getOperand(0);
-
-  // result = trunc(src);
-  // if (src < 0.0 && src != result)
-  //   result += -1.0.
-
-  SDValue Trunc = DAG.getNode(ISD::FTRUNC, SL, MVT::f64, Src);
-
-  const SDValue Zero = DAG.getConstantFP(0.0, SL, MVT::f64);
-  const SDValue NegOne = DAG.getConstantFP(-1.0, SL, MVT::f64);
-
-  EVT SetCCVT =
-      getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), MVT::f64);
-
-  SDValue Lt0 = DAG.getSetCC(SL, SetCCVT, Src, Zero, ISD::SETOLT);
-  SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);
-  SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);
-
-  SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);
-  // TODO: Should this propagate fast-math-flags?
-  return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);
-}
-
 SDValue AMDGPUTargetLowering::LowerCTLZ(SDValue Op, SelectionDAG &DAG) const {
   SDLoc SL(Op);
   SDValue Src = Op.getOperand(0);
   bool ZeroUndef = Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;

   if (ZeroUndef && Src.getValueType() == MVT::i32)
     return DAG.getNode(AMDGPUISD::FFBH_U32, SL, MVT::i32, Src);

lib/Target/AMDGPU/SIISelLowering.h

@@ class SITargetLowering final : public AMDGPUTargetLowering {
   SDValue LowerFrameIndex(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerFastUnsafeFDIV(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerFDIV_FAST(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFDIV32(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFDIV64(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFDIV(SDValue Op, SelectionDAG &DAG) const;
-  SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG, bool Signed) const;
   SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerTrig(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;

   SDValue getSegmentAperture(unsigned AS, SelectionDAG &DAG) const;
   SDValue lowerADDRSPACECAST(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerTRAP(SDValue Op, SelectionDAG &DAG) const;

lib/Target/AMDGPU/SIInstructions.td

@@ (V_ADD_F64
     $mods,
     $x,
     SRCMODS.NEG,
     (V_CNDMASK_B64_PSEUDO
       (V_MIN_F64
         SRCMODS.NONE,
         (V_FRACT_F64_e64 $mods, $x, DSTCLAMP.NONE, DSTOMOD.NONE),
         SRCMODS.NONE,
-        (V_MOV_B64_PSEUDO 0x3fefffffffffffff),
+        CONST.FP64_ONE,
         DSTCLAMP.NONE, DSTOMOD.NONE),
       $x,
       (V_CMP_CLASS_F64_e64 SRCMODS.NONE, $x, 3/*NaN*/)),
     DSTCLAMP.NONE, DSTOMOD.NONE)
 >;

 } // End Predicates = [isSI]

 //============================================================================//
 // Miscellaneous Optimization Patterns
 //============================================================================//

Inline comment on the CONST.FP64_ONE line (nhaehnle): Please also change the comment above.
Inline comment on the V_CMP_CLASS_F64_e64 line (arsenm): This might be the problem. This is using SRCMODS.NONE rather than preserving it like the other uses. It might be less error prone to do this as a custom expansion of floor rather than expanding the fract here.

test/CodeGen/AMDGPU/ffloor.f64.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs -enable-unsafe-fp-math < %s \| FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs -enable-unsafe-fp-math < %s \| FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -enable-unsafe-fp-math < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -enable-unsafe-fp-math < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -enable-unsafe-fp-math < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -enable-unsafe-fp-math < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s

	declare double @llvm.fabs.f64(double %Val)			declare double @llvm.fabs.f64(double %Val)
	declare double @llvm.floor.f64(double) nounwind readnone			declare double @llvm.floor.f64(double) nounwind readnone
	declare <2 x double> @llvm.floor.v2f64(<2 x double>) nounwind readnone			declare <2 x double> @llvm.floor.v2f64(<2 x double>) nounwind readnone
	declare <3 x double> @llvm.floor.v3f64(<3 x double>) nounwind readnone			declare <3 x double> @llvm.floor.v3f64(<3 x double>) nounwind readnone
	declare <4 x double> @llvm.floor.v4f64(<4 x double>) nounwind readnone			declare <4 x double> @llvm.floor.v4f64(<4 x double>) nounwind readnone
	declare <8 x double> @llvm.floor.v8f64(<8 x double>) nounwind readnone			declare <8 x double> @llvm.floor.v8f64(<8 x double>) nounwind readnone
	declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone			declare <16 x double> @llvm.floor.v16f64(<16 x double>) nounwind readnone

	; FUNC-LABEL: {{^}}ffloor_f64:			; FUNC-LABEL: {{^}}ffloor_f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; SI: v_fract_f64_e32
	; SI-DAG: v_min_f64			; SI: v_fract_f64_e32 [[FRACT:v\[[0-9]+:[0-9]+\]]], [[X:s\[[0-9]+:[0-9]+\]]]
	; SI-DAG: v_cmp_class_f64_e64			; SI-DAG: v_min_f64 [[MIN:v\[[0-9]+:[0-9]+\]]], 1.0, [[FRACT]]
				; SI-DAG: v_cmp_class_f64_e64 {{s\[[0-9]+:[0-9]+\]}}, [[X]], 3
	; SI: v_cndmask_b32_e64			; SI: v_cndmask_b32_e64
	; SI: v_cndmask_b32_e64			; SI: v_cndmask_b32_e64
	; SI: v_add_f64			; SI: v_add_f64 {{v\[[0-9]+:[0-9]+\]}}, [[X]], -{{v\[[0-9]+:[0-9]+\]}}
	; SI: s_endpgm			; SI: s_endpgm
	define void @ffloor_f64(double addrspace(1)* %out, double %x) {			define void @ffloor_f64(double addrspace(1)* %out, double %x) {
	%y = call double @llvm.floor.f64(double %x) nounwind readnone			%y = call double @llvm.floor.f64(double %x) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_f64_neg:			; FUNC-LABEL: {{^}}ffloor_f64_neg:
	; CI: v_floor_f64_e64			; CI: v_floor_f64_e64

	; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT:s[[0-9]+:[0-9]+]]]			; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT:s[[0-9]+:[0-9]+]]]
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64			; SI-DAG: v_cmp_class_f64_e64
	; SI: v_cndmask_b32_e64			; SI: v_cndmask_b32_e64
	; SI: v_cndmask_b32_e64			; SI: v_cndmask_b32_e64
	; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT]]			; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -[[INPUT]], -v{{\[[0-9]+:[0-9]+\]}}
	; SI: s_endpgm			; SI: s_endpgm
	define void @ffloor_f64_neg(double addrspace(1)* %out, double %x) {			define void @ffloor_f64_neg(double addrspace(1)* %out, double %x) {
	%neg = fsub double 0.0, %x			%neg = fsub double -0.0, %x
	%y = call double @llvm.floor.f64(double %neg) nounwind readnone			%y = call double @llvm.floor.f64(double %neg) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_f64_neg_abs:			; FUNC-LABEL: {{^}}ffloor_f64_neg_abs:
	; CI: v_floor_f64_e64			; CI: v_floor_f64_e64
	; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -\|[[INPUT:s[[0-9]+:[0-9]+]]]\|			; SI: v_fract_f64_e64 {{v[[0-9]+:[0-9]+]}}, -\|[[INPUT:s[[0-9]+:[0-9]+]]]\|
	; SI-DAG: v_min_f64			; SI-DAG: v_min_f64
	; SI-DAG: v_cmp_class_f64_e64			; SI-DAG: v_cmp_class_f64_e64
	; SI: v_cndmask_b32_e64			; SI: v_cndmask_b32_e64
	; SI: v_cndmask_b32_e64			; SI: v_cndmask_b32_e64
	; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -\|[[INPUT]]\|			; SI: v_add_f64 {{v[[0-9]+:[0-9]+]}}, -\|[[INPUT]]\|
	; SI: s_endpgm			; SI: s_endpgm
	define void @ffloor_f64_neg_abs(double addrspace(1)* %out, double %x) {			define void @ffloor_f64_neg_abs(double addrspace(1)* %out, double %x) {
	%abs = call double @llvm.fabs.f64(double %x)			%abs = call double @llvm.fabs.f64(double %x)
	%neg = fsub double 0.0, %abs			%neg = fsub double -0.0, %abs
	%y = call double @llvm.floor.f64(double %neg) nounwind readnone			%y = call double @llvm.floor.f64(double %neg) nounwind readnone
	store double %y, double addrspace(1)* %out			store double %y, double addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}ffloor_v2f64:			; FUNC-LABEL: {{^}}ffloor_v2f64:
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32
	; CI: v_floor_f64_e32			; CI: v_floor_f64_e32

test/CodeGen/AMDGPU/fract.f64.ll

; RUN: llc -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=FUNC %s
; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=FUNC %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=FUNC %s

; RUN: llc -march=amdgcn -enable-unsafe-fp-math -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN-UNSAFE -check-prefix=SI-UNSAFE -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -enable-unsafe-fp-math -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN-UNSAFE -check-prefix=SI-UNSAFE -check-prefix=FUNC %s
; RUN: llc -march=amdgcn -mcpu=tonga -enable-unsafe-fp-math -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN-UNSAFE -check-prefix=VI-UNSAFE -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -mcpu=tonga -enable-unsafe-fp-math -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN-UNSAFE -check-prefix=VI-UNSAFE -check-prefix=FUNC %s

declare double @llvm.fabs.f64(double) #0		declare double @llvm.fabs.f64(double) #0
declare double @llvm.floor.f64(double) #0		declare double @llvm.floor.f64(double) #0

; FUNC-LABEL: {{^}}fract_f64:		; FUNC-LABEL: {{^}}fract_f64:
; SI-DAG: v_fract_f64_e32 [[FRC:v\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]]		; SI-DAG: v_fract_f64_e32 [[FRC:v\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]]
; SI-DAG: v_mov_b32_e32 v[[UPLO:[0-9]+]], -1		; SI-DAG: v_min_f64 v{{\[}}[[MINLO:[0-9]+]]:[[MINHI:[0-9]+]]], 1.0, [[FRC]]
; SI-DAG: v_mov_b32_e32 v[[UPHI:[0-9]+]], 0x3fefffff
; SI-DAG: v_min_f64 v{{\[}}[[MINLO:[0-9]+]]:[[MINHI:[0-9]+]]], v{{\[}}[[UPLO]]:[[UPHI]]], [[FRC]]
; SI-DAG: v_cmp_class_f64_e64 [[COND:s\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]], 3		; SI-DAG: v_cmp_class_f64_e64 [[COND:s\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]], 3
; SI: v_cndmask_b32_e64 v[[RESLO:[0-9]+]], v[[MINLO]], v[[LO]], [[COND]]		; SI: v_cndmask_b32_e64 v[[RESLO:[0-9]+]], v[[MINLO]], v[[LO]], [[COND]]
; SI: v_cndmask_b32_e64 v[[RESHI:[0-9]+]], v[[MINHI]], v[[HI]], [[COND]]		; SI: v_cndmask_b32_e64 v[[RESHI:[0-9]+]], v[[MINHI]], v[[HI]], [[COND]]
; SI: v_add_f64 [[SUB0:v\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]{{\]}}, -v{{\[}}[[RESLO]]:[[RESHI]]{{\]}}		; SI: v_add_f64 [[SUB0:v\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]{{\]}}, -v{{\[}}[[RESLO]]:[[RESHI]]{{\]}}
; SI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]{{\]}}, -[[SUB0]]		; SI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]{{\]}}, -[[SUB0]]

; CI: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]		; CI: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]
; CI: v_floor_f64_e32 [[FLOORX:v\[[0-9]+:[0-9]+\]]], [[X]]		; CI: v_floor_f64_e32 [[FLOORX:v\[[0-9]+:[0-9]+\]]], [[X]]
; CI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], [[X]], -[[FLOORX]]		; CI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], [[X]], -[[FLOORX]]

; GCN-UNSAFE: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]		; GCN-UNSAFE: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]
; GCN-UNSAFE: v_fract_f64_e32 [[FRACT:v\[[0-9]+:[0-9]+\]]], [[X]]		; GCN-UNSAFE: v_fract_f64_e32 [[FRACT:v\[[0-9]+:[0-9]+\]]], [[X]]

; GCN: buffer_store_dwordx2 [[FRACT]]		; GCN: buffer_store_dwordx2 [[FRACT]]
define void @fract_f64(double addrspace(1)* %out, double addrspace(1)* %src) #1 {		define void @fract_f64(double addrspace(1)* %out, double addrspace(1)* %src) #1 {
%x = load double, double addrspace(1)* %src		%x = load double, double addrspace(1)* %src
%floor.x = call double @llvm.floor.f64(double %x)		%floor.x = call double @llvm.floor.f64(double %x)
%fract = fsub double %x, %floor.x		%fract = fsub double %x, %floor.x
store double %fract, double addrspace(1)* %out		store double %fract, double addrspace(1)* %out
ret void		ret void
}		}
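
A minimal Python sketch of what the SI check lines above appear to describe, for readers following the numerics. It assumes the expansion is `x - min(fract(x), clamp)` with NaN inputs passed through, and it models `v_fract_f64` as `x - floor(x)` rounded to double. Both are assumptions read off the test and the review discussion, not a description of the hardware. The names `fract_model`, `floor_si`, `OLD_CLAMP` and `NEW_CLAMP` are hypothetical, and infinities are not modelled.

```python
import math

# Largest double below 1.0: the value the old check lines build from the
# two 32-bit halves -1 (0xffffffff) and 0x3fefffff.
OLD_CLAMP = float.fromhex("0x1.fffffffffffffp-1")
# The new check lines use the inline constant 1.0 instead.
NEW_CLAMP = 1.0


def fract_model(x: float) -> float:
    """Assumed model of v_fract_f64: x - floor(x), rounded to double."""
    return x - math.floor(x)


def floor_si(x: float, clamp: float) -> float:
    """Sketch of the SI floor expansion for finite or NaN inputs."""
    # v_cmp_class_f64 with mask 3 tests for NaN; the two v_cndmask_b32
    # then select x itself instead of the clamped fraction.
    sel = x if math.isnan(x) else min(fract_model(x), clamp)
    # v_add_f64 x, -sel
    return x - sel


if __name__ == "__main__":
    x = float.fromhex("-0x1p-143")  # input from the conformance failure
    print(floor_si(x, OLD_CLAMP).hex())  # -0x1.fffffffffffffp-1
    print(floor_si(x, NEW_CLAMP).hex())  # -0x1.0000000000000p+0
```

Under these assumptions the old clamp reproduces the value reported by the conformance test, while clamping at 1.0 lets the final subtraction round to -1.0.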

; FUNC-LABEL: {{^}}fract_f64_neg:		; FUNC-LABEL: {{^}}fract_f64_neg:
; SI-DAG: v_fract_f64_e64 [[FRC:v\[[0-9]+:[0-9]+\]]], -v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]]		; SI-DAG: v_fract_f64_e64 [[FRC:v\[[0-9]+:[0-9]+\]]], -v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]]
; SI-DAG: v_mov_b32_e32 v[[UPLO:[0-9]+]], -1		; SI-DAG: v_min_f64 v{{\[}}[[MINLO:[0-9]+]]:[[MINHI:[0-9]+]]], 1.0, [[FRC]]
; SI-DAG: v_mov_b32_e32 v[[UPHI:[0-9]+]], 0x3fefffff
; SI-DAG: v_min_f64 v{{\[}}[[MINLO:[0-9]+]]:[[MINHI:[0-9]+]]], v{{\[}}[[UPLO]]:[[UPHI]]], [[FRC]]
; SI-DAG: v_cmp_class_f64_e64 [[COND:s\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]], 3		; SI-DAG: v_cmp_class_f64_e64 [[COND:s\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]], 3
; SI: v_cndmask_b32_e64 v[[RESLO:[0-9]+]], v[[MINLO]], v[[LO]], [[COND]]		; SI: v_cndmask_b32_e64 v[[RESLO:[0-9]+]], v[[MINLO]], v[[LO]], [[COND]]
; SI: v_cndmask_b32_e64 v[[RESHI:[0-9]+]], v[[MINHI]], v[[HI]], [[COND]]		; SI: v_cndmask_b32_e64 v[[RESHI:[0-9]+]], v[[MINHI]], v[[HI]], [[COND]]
; SI: v_add_f64 [[SUB0:v\[[0-9]+:[0-9]+\]]], -v{{\[}}[[LO]]:[[HI]]{{\]}}, -v{{\[}}[[RESLO]]:[[RESHI]]{{\]}}		; SI: v_add_f64 [[SUB0:v\[[0-9]+:[0-9]+\]]], -v{{\[}}[[LO]]:[[HI]]{{\]}}, -v{{\[}}[[RESLO]]:[[RESHI]]{{\]}}
; SI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], -v{{\[}}[[LO]]:[[HI]]{{\]}}, -[[SUB0]]		; SI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], -v{{\[}}[[LO]]:[[HI]]{{\]}}, -[[SUB0]]

; CI: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]		; CI: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]
; CI: v_floor_f64_e64 [[FLOORX:v\[[0-9]+:[0-9]+\]]], -[[X]]		; CI: v_floor_f64_e64 [[FLOORX:v\[[0-9]+:[0-9]+\]]], -[[X]]
Show All 9 Lines	define void @fract_f64_neg(double addrspace(1)* %out, double addrspace(1)* %src) #1 {
%floor.neg.x = call double @llvm.floor.f64(double %neg.x)		%floor.neg.x = call double @llvm.floor.f64(double %neg.x)
%fract = fsub double %neg.x, %floor.neg.x		%fract = fsub double %neg.x, %floor.neg.x
store double %fract, double addrspace(1)* %out		store double %fract, double addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}fract_f64_neg_abs:		; FUNC-LABEL: {{^}}fract_f64_neg_abs:
; SI-DAG: v_fract_f64_e64 [[FRC:v\[[0-9]+:[0-9]+\]]], -\|v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]]\|		; SI-DAG: v_fract_f64_e64 [[FRC:v\[[0-9]+:[0-9]+\]]], -\|v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]]\|
; SI-DAG: v_mov_b32_e32 v[[UPLO:[0-9]+]], -1		; SI-DAG: v_min_f64 v{{\[}}[[MINLO:[0-9]+]]:[[MINHI:[0-9]+]]], 1.0, [[FRC]]
; SI-DAG: v_mov_b32_e32 v[[UPHI:[0-9]+]], 0x3fefffff
; SI-DAG: v_min_f64 v{{\[}}[[MINLO:[0-9]+]]:[[MINHI:[0-9]+]]], v{{\[}}[[UPLO]]:[[UPHI]]], [[FRC]]
; SI-DAG: v_cmp_class_f64_e64 [[COND:s\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]], 3		; SI-DAG: v_cmp_class_f64_e64 [[COND:s\[[0-9]+:[0-9]+\]]], v{{\[}}[[LO]]:[[HI]]], 3
; SI: v_cndmask_b32_e64 v[[RESLO:[0-9]+]], v[[MINLO]], v[[LO]], [[COND]]		; SI: v_cndmask_b32_e64 v[[RESLO:[0-9]+]], v[[MINLO]], v[[LO]], [[COND]]
; SI: v_cndmask_b32_e64 v[[RESHI:[0-9]+]], v[[MINHI]], v[[HI]], [[COND]]		; SI: v_cndmask_b32_e64 v[[RESHI:[0-9]+]], v[[MINHI]], v[[HI]], [[COND]]
; SI: v_add_f64 [[SUB0:v\[[0-9]+:[0-9]+\]]], -\|v{{\[}}[[LO]]:[[HI]]{{\]}}\|, -v{{\[}}[[RESLO]]:[[RESHI]]{{\]}}		; SI: v_add_f64 [[SUB0:v\[[0-9]+:[0-9]+\]]], -\|v{{\[}}[[LO]]:[[HI]]{{\]}}\|, -v{{\[}}[[RESLO]]:[[RESHI]]{{\]}}
; SI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], -\|v{{\[}}[[LO]]:[[HI]]{{\]}}\|, -[[SUB0]]		; SI: v_add_f64 [[FRACT:v\[[0-9]+:[0-9]+\]]], -\|v{{\[}}[[LO]]:[[HI]]{{\]}}\|, -[[SUB0]]

; CI: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]		; CI: buffer_load_dwordx2 [[X:v\[[0-9]+:[0-9]+\]]]
; CI: v_floor_f64_e64 [[FLOORX:v\[[0-9]+:[0-9]+\]]], -\|[[X]]\|		; CI: v_floor_f64_e64 [[FLOORX:v\[[0-9]+:[0-9]+\]]], -\|[[X]]\|