We have reasonable fast sqrt and accurate rsqrt for half type due to the
limited fractions. So neither do we need multi steps refinement for
rsqrt nor replace sqrt by rsqrt.
This fixes a correctness issue when RefinementSteps = 0.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
What's the correctness issue? It's fast math so the answer isn't required to be exact.
@kmclaughlin I'm meant to change in the target independent code (enabled in D110557), then I found the test in your D111657 was affected too. I'll change there if you think it is workable for aarch64 too.
See https://godbolt.org/z/P4Y89hsj4
I guess we need at least RefinementSteps = 1 for correntness when Reciprocal == 0.
That seems like a more general bug. The same issue happens if you force the estimate steps for floating point https://godbolt.org/z/vK8eab8zP
I think for f16 you should also be returning true from X86TargetLowering::isFsqrtCheap which would prevent getSqrtEstimate from being called for the non-reciprocal case.
I see. But I'm not sure if it is bug or by design. Maybe need an assertion?
Good point!
I think this is a weird interface quirk. The caller interprets returning RefinementSteps = 0 to mean that all needed code has been created and nothing should be done. Theoretically a target could have its own way of handling it without the final FMUL the target independent code inserts. So I think X86 should either insert the final FMUL itself or not do the reciprocal approximation for non-reciprocal if RefinementSteps is 0. NVPTXTargetLowering::getSqrtEstimate does the latter.
So I think X86 should either insert the final FMUL itself or ...
Insert a FMUL. Thanks Craig!
That looks like a long-standing bug for x86. I don't remember if that was a design flaw originally or if the code evolved to handle more patterns, and we just missed that problem.
Better to add the tests first, add the code to fix the bug, then add the code to bypass for f16?
llvm/test/CodeGen/X86/avx512fp16vl-intrinsics.ll | ||
---|---|---|
972 | Does it make sense to add a scalar test too? define half @sqrt_half_fast(half %a0, half %a1) { %1 = call fast half @llvm.sqrt.f16(half %a0) ret half %1 } |
Does it make sense to add a scalar test too?