This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix ffloor for SI
Needs ReviewPublic

Authored by arsenm on Jul 27 2016, 6:58 PM.
This revision needs review, but all specified reviewers are disabled or inactive.

Details

Reviewers
tstellarAMD
Summary

OpenCL conformance failed with:
ERROR: floor: 0.500000 ulp error at -0x1.0000000000000p-143 0xb700000000000000): *-0x1.0000000000000p+0 vs. -0x1.fffffffffffffp-1

Diff Detail

Event Timeline

arsenm updated this revision to Diff 65855.Jul 27 2016, 6:58 PM
arsenm retitled this revision from to AMDGPU: Fix ffloor for SI.
arsenm updated this object.
arsenm added a subscriber: llvm-commits.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

lib/Target/AMDGPU/SIInstructions.td
3542

Please also change the comment above.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

mareko added a subscriber: mareko.Aug 1 2016, 3:21 AM

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

Yes, by clamping to exactly 1 it skips the broken 1 value. 0.999999... needs to be passed through fract

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

Yes, by clamping to exactly 1 it skips the broken 1 value. 0.999999... needs to be passed through fract

For incorrect x=fract(y) -> x=1, min(x, 1) -> min(1, 1) -> 1. You need this instead: min(x, 0.9999999........). Or am I missing something?

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

Yes, by clamping to exactly 1 it skips the broken 1 value. 0.999999... needs to be passed through fract

For incorrect x=fract(y) -> x=1, min(x, 1) -> min(1, 1) -> 1. You need this instead: min(x, 0.9999999........). Or am I missing something?

1 is the incorrect value. 0.99999999999999999 should be correctly handled. By clamping to 0.99999999999999999, you miss correctly handling that one value.

I don't understand.

min(x, 1) is an no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.

The bug information (edited for publishing):

  • SI: Precision issue for FRACT_F32/64 opcodes *

3.31.1.1 Synopsis
Range of outputs for FRACT opcode is [+0.0, 1.0). The hardware is outputting 1.0 for very small negative inputs (i.e. 0xb3000000).

3.31.1.2 Symptoms
Precision difference with OpenCL conformance test, SW already using workaround. (Could potentially cause precision difference with other APIs.)

3.31.1.3 Scope
Found in all SI family.

3.31.1.4 Suggested Driver Solution
Compiler Expansion for FRACT_F32:

out = FRACT_F32(in)
out = MIN_F32(out, 0x3f7fffff)
out = ISNAN_F32(in) ? in : out;

(Note: 1.0 == 0x3f800000, thus 1.0 is not correct)

Here's what the closed compiler does for F64:

  • If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
  • Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.

I don't understand.

min(x, 1) is an no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.

The bug information (edited for publishing):

  • SI: Precision issue for FRACT_F32/64 opcodes *

3.31.1.1 Synopsis
Range of outputs for FRACT opcode is [+0.0, 1.0). The hardware is outputting 1.0 for very small negative inputs (i.e. 0xb3000000).

3.31.1.2 Symptoms
Precision difference with OpenCL conformance test, SW already using workaround. (Could potentially cause precision difference with other APIs.)

3.31.1.3 Scope
Found in all SI family.

3.31.1.4 Suggested Driver Solution
Compiler Expansion for FRACT_F32:

out = FRACT_F32(in)
out = MIN_F32(out, 0x3f7fffff)
out = ISNAN_F32(in) ? in : out;

(Note: 1.0 == 0x3f800000, thus 1.0 is not correct)

Here's what the closed compiler does for F64:

  • If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
  • Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.

This is what I get when I dump sc's output for __amdil_fraction_f64:

v_fract_f64   v[0:1], s[0:1]                              // 0000000C: 7E007C00
v_mov_b32     v2, -1                                      // 00000010: 7E0402C1
v_mov_b32     v3, 0x3fefffff                              // 00000014: 7E0602FF 3FEFFFFF
v_min_f64     v[2:3], v[2:3], v[0:1]                      // 0000001C: D2CC0002 00020102
v_cmp_class_f64  vcc, v[0:1], 3                           // 00000024: D150006A 00010700
v_cndmask_b32  v0, v2, v0, vcc                            // 0000002C: 00000102
v_cndmask_b32  v1, v3, v1, vcc

v[2:3] = 0x3fefffffffffffff = 1.0

I don't understand.

min(x, 1) is an no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.

The bug information (edited for publishing):

  • SI: Precision issue for FRACT_F32/64 opcodes *

3.31.1.1 Synopsis
Range of outputs for FRACT opcode is [+0.0, 1.0). The hardware is outputting 1.0 for very small negative inputs (i.e. 0xb3000000).

3.31.1.2 Symptoms
Precision difference with OpenCL conformance test, SW already using workaround. (Could potentially cause precision difference with other APIs.)

3.31.1.3 Scope
Found in all SI family.

3.31.1.4 Suggested Driver Solution
Compiler Expansion for FRACT_F32:

out = FRACT_F32(in)
out = MIN_F32(out, 0x3f7fffff)
out = ISNAN_F32(in) ? in : out;

(Note: 1.0 == 0x3f800000, thus 1.0 is not correct)

Here's what the closed compiler does for F64:

  • If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
  • Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.

I'm confused now, because using 1.0 does pass conformance and using 0x3fefffffffffffff does not. This specifically is the workaround for v_fract_f64, but this is the implementation for ffloor. Is this really supposed to be a pure x - fract(x)? It also appears that there is a different bug in v_fract_f64 on CI, but it seems we don't do anything about that right now.

Yeah I know about the CI bug, but it's not important for OpenGL.

0x3ff0000000000000 is 1.0.
0x3fefffffffffffff isn't 1.0. It's the largest number smaller than 1.0, which is 0.9999999999999........... it can also be written as: bitcast(1.0, i32) - 1
If you print 0x3fefffffffffffff with 6 decimal places, it will be rounded to 1.0 for the purpose of printing. I guess that's where the confusion comes from.

I don't know why 1.0 passes OpenCL conformance and bitcast(1.0, i32) - 1 doesn't. I suggest you check what the closed compiler does in this case.

Yeah I know about the CI bug, but it's not important for OpenGL.

0x3ff0000000000000 is 1.0.
0x3fefffffffffffff isn't 1.0. It's the largest number smaller than 1.0, which is 0.9999999999999........... it can also be written as: bitcast(1.0, i32) - 1
If you print 0x3fefffffffffffff with 6 decimal places, it will be rounded to 1.0 for the purpose of printing. I guess that's where the confusion comes from.

I don't know why 1.0 passes OpenCL conformance and bitcast(1.0, i32) - 1 doesn't. I suggest you check what the closed compiler does in this case.

Closed OpenCL on the AMDIL path uses a library expansion for floor, and doesn't try to use any of these instructions

Yeah I know about the CI bug, but it's not important for OpenGL.

0x3ff0000000000000 is 1.0.
0x3fefffffffffffff isn't 1.0. It's the largest number smaller than 1.0, which is 0.9999999999999........... it can also be written as: bitcast(1.0, i32) - 1
If you print 0x3fefffffffffffff with 6 decimal places, it will be rounded to 1.0 for the purpose of printing. I guess that's where the confusion comes from.

I don't know why 1.0 passes OpenCL conformance and bitcast(1.0, i32) - 1 doesn't. I suggest you check what the closed compiler does in this case.

It implements floor with a library expansion with bitops and doesn't attempt to use fract (though I don't think this is intentional). The custom floor lowering that I found was dead deleted in this patch also passes conformance

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

What does that mean?

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

What does that mean?

IEEE canonicalize. http://www.llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic

NaNs are quieted, denormals may be flushed

Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).

I guess so? I don't know the details of the bug but this passes conformance now

Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used

How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.

It's not actually a no-op, it's a canonicalize.

What does that mean?

IEEE canonicalize. http://www.llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic

NaNs are quieted, denormals may be flushed

If this patch is what SC does, it's OK with me.

arsenm added inline comments.Aug 30 2017, 12:03 PM
lib/Target/AMDGPU/SIInstructions.td
3545

This might be the problem. This is using SRCMODS.NONE rather than preserving it like the other uses. It might be less error prone to do this as a custom expansion of floor rather than expanding the fract here