These inherited the fast math checks from f32, but the manual suggests
these should be accurate enough for unconditional use. The definition
of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've
been a bit nervous about changing this as the OpenCL conformance test
does not cover half. Brute force produces identical values compared to
a reference host implementation for all values.
Details
Diff Detail
Event Timeline
There is 'half' subtest of the OCL conformance. Not sure it covers it, but I would run it. Otherwise LGTM.
It doesn't. Even more disturbing is the regular reciprocal test for f32/f64 is commented out
Brute force produces identical values compared to a reference host implementation for all values.
Have you tested v_sqrt_f16 or any other f16 trans instructions?
v_sqrt_f16 is identical.
v_log_f16 is is identical.
v_exp_f16 has a single value differ: ref=0x1p+0 inst=0x1.004p+0
I'm also comparing these by cast to float host implementations, maybe a proper f16 implementation would have rounded these differences differently?
I think I'm doing something wrong with the pre-scaling for sin/cos, those results just seem totally wrong
v_exp_f16 has a single value differ: ref=0x1p+0 inst=0x1.004p+0
For what input value?
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4152–4155 | Comments don't match the code. https://github.com/llvm/llvm-project/pull/68982 fixes the comments. | |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
9172–9175 | Same. |
Comments don't match the code. https://github.com/llvm/llvm-project/pull/68982 fixes the comments.