OpenCL conformance failed with:
ERROR: floor: 0.500000 ulp error at -0x1.0000000000000p-143 0xb700000000000000): *-0x1.0000000000000p+0 vs. -0x1.fffffffffffffp-1
Details
- Reviewers
• tstellarAMD
Diff Detail
Event Timeline
Is the MIN needed for correctness at all? Looking at the workaround docs, I see the explanation that "[FRACT] is outputting 1.0 for very small negative inputs). Sounds to me like v_fract is correctly in the range [0, 1.0), except for those very small negative inputs, where it returns 1.0 (which happens to be correct for the ffloor lowering).
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
3542 | Please also change the comment above. |
Thinking about it more this makes sense. 1.0 will skip the fract at exactly 1.0. up to 0.99... fract is used
How does it make any sense? fract should return values in [0, 1). SI has a bug that it returns 1 incorrectly in one case. Doing min(x, 1) will have no effect on the result of buggy fract. That min() is a no-op operation.
Yes, by clamping to exactly 1 it skips the broken 1 value. 0.999999... needs to be passed through fract
For incorrect x=fract(y) -> x=1, min(x, 1) -> min(1, 1) -> 1. You need this instead: min(x, 0.9999999........). Or am I missing something?
1 is the incorrect value. 0.99999999999999999 should be correctly handled. By clamping to 0.99999999999999999, you miss correctly handling that one value.
I don't understand.
min(x, 1) is an no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.
The bug information (edited for publishing):
- SI: Precision issue for FRACT_F32/64 opcodes *
3.31.1.1 Synopsis
Range of outputs for FRACT opcode is [+0.0, 1.0). The hardware is outputting 1.0 for very small negative inputs (i.e. 0xb3000000).
3.31.1.2 Symptoms
Precision difference with OpenCL conformance test, SW already using workaround. (Could potentially cause precision difference with other APIs.)
3.31.1.3 Scope
Found in all SI family.
3.31.1.4 Suggested Driver Solution
Compiler Expansion for FRACT_F32:
out = FRACT_F32(in) out = MIN_F32(out, 0x3f7fffff) out = ISNAN_F32(in) ? in : out;
(Note: 1.0 == 0x3f800000, thus 1.0 is not correct)
Here's what the closed compiler does for F64:
- If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
- Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.
This is what I get when I dump sc's output for __amdil_fraction_f64:
v_fract_f64 v[0:1], s[0:1] // 0000000C: 7E007C00 v_mov_b32 v2, -1 // 00000010: 7E0402C1 v_mov_b32 v3, 0x3fefffff // 00000014: 7E0602FF 3FEFFFFF v_min_f64 v[2:3], v[2:3], v[0:1] // 0000001C: D2CC0002 00020102 v_cmp_class_f64 vcc, v[0:1], 3 // 00000024: D150006A 00010700 v_cndmask_b32 v0, v2, v0, vcc // 0000002C: 00000102 v_cndmask_b32 v1, v3, v1, vcc
v[2:3] = 0x3fefffffffffffff = 1.0
I'm confused now, because using 1.0 does pass conformance and using 0x3fefffffffffffff does not. This specifically is the workaround for v_fract_f64, but this is the implementation for ffloor. Is this really supposed to be a pure x - fract(x)? It also appears that there is a different bug in v_fract_f64 on CI, but it seems we don't do anything about that right now.
Yeah I know about the CI bug, but it's not important for OpenGL.
0x3ff0000000000000 is 1.0.
0x3fefffffffffffff isn't 1.0. It's the largest number smaller than 1.0, which is 0.9999999999999........... it can also be written as: bitcast(1.0, i32) - 1
If you print 0x3fefffffffffffff with 6 decimal places, it will be rounded to 1.0 for the purpose of printing. I guess that's where the confusion comes from.
I don't know why 1.0 passes OpenCL conformance and bitcast(1.0, i32) - 1 doesn't. I suggest you check what the closed compiler does in this case.
Closed OpenCL on the AMDIL path uses a library expansion for floor, and doesn't try to use any of these instructions
It implements floor with a library expansion with bitops and doesn't attempt to use fract (though I don't think this is intentional). The custom floor lowering that I found was dead deleted in this patch also passes conformance
IEEE canonicalize. http://www.llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic
NaNs are quieted, denormals may be flushed
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
3545 | This might be the problem. This is using SRCMODS.NONE rather than preserving it like the other uses. It might be less error prone to do this as a custom expansion of floor rather than expanding the fract here |
Please also change the comment above.