Assuming single-precision denormals and accurate sqrt/div are not reported, this passes the OpenCL conformance test.
The f64 version is still blocked on broken handling of the div_scale instructions. Operand legalization is too strict: it allows only one SGPR operand, whereas the actual restriction is that only one SGPR may be read, and that single SGPR may be used for multiple operands. When it does legalize div_scale, it inserts a copy for one of the SGPR operands, so the first operand is no longer the same as the second or third.
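
The difference between the two rules can be illustrated with a small model. This is only a sketch, not the backend's legalization code: the register names ("s4", "v1") and the helpers `legal_strict`, `legal_relaxed`, and `first_matches_later` are hypothetical, and operands are modeled as plain strings where an "s" prefix means SGPR and a "v" prefix means VGPR.

```python
def is_sgpr(op):
    # Hypothetical naming convention: "s<N>" is an SGPR, "v<N>" is a VGPR.
    return op.startswith("s")


def legal_strict(operands):
    # Overly strict rule: at most one operand may be an SGPR,
    # even if several operands name the same register.
    return sum(1 for op in operands if is_sgpr(op)) <= 1


def legal_relaxed(operands):
    # Rule described above: at most one distinct SGPR may be read,
    # but it may appear in any number of operand slots.
    return len({op for op in operands if is_sgpr(op)}) <= 1


def first_matches_later(operands):
    # div_scale property described above: the first operand must be
    # the same operand as the second or the third.
    return operands[0] in operands[1:]


# The same SGPR used twice: one register read, two operand slots.
ops = ["s4", "s4", "v1"]

# What the strict legalizer effectively does: copy one SGPR operand to a VGPR.
copied = ["v2", "s4", "v1"]
```

In this model, `ops` is rejected by `legal_strict` but accepted by `legal_relaxed`, and `copied` satisfies both rules while failing `first_matches_later`, which is the breakage described above.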