This change uses fast-math-flag (FMF) sub-flags, in addition to the existing unsafe check, to guard optimizations. These changes originated from D46483.
This patch contains only the fsqrt context.
I think we want to allow this transform when the node allows "approximate functions" (afn). I don't think we should care about 'arcp' - the transform of 1.0/sqrt(x) is handled on a different path AFAICT.
Not quite true. I'm referring to AMDGPU, which maps 1.0/sqrt(x) to AMDGPUISD::RSQ (i.e. rsqrt), although only under a partial condition. See SITargetLowering::lowerFastUnsafeFDIV. Adding Matt, as this is an AMD topic.
It does seem to me that 'afn' is what we want here, rather than 'arcp'. I think the point about SITargetLowering::lowerFastUnsafeFDIV means that the code there ought to be checking both 'afn' and 'arcp'.
I agree. This is just a question of how we compute sqrt(x). For an approximation of 1/sqrt(x), I can see also needing arcp.
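For reference, the gating discussed above could be sketched roughly as follows. This is a standalone illustration, not the code in the patch: FastMathFlagsSketch, canEstimateSqrt, canEstimateRsqrt and the UnsafeFPMath parameter are hypothetical names standing in for the per-node flags and the global unsafe option.

```cpp
#include <cassert>

// Hypothetical stand-in for per-node fast-math flags (not the LLVM class).
struct FastMathFlagsSketch {
  bool ApproxFunc = false;      // 'afn': approximate functions allowed
  bool AllowReciprocal = false; // 'arcp': reciprocal transforms allowed
};

// sqrt(x) -> estimate sequence: only a question of how sqrt is computed,
// so 'afn' (or the global unsafe option) is assumed to be sufficient.
bool canEstimateSqrt(const FastMathFlagsSketch &F, bool UnsafeFPMath) {
  return UnsafeFPMath || F.ApproxFunc;
}

// 1.0/sqrt(x) -> rsqrt estimate: also rewrites a division as a reciprocal,
// so this sketch requires both 'afn' and 'arcp'.
bool canEstimateRsqrt(const FastMathFlagsSketch &F, bool UnsafeFPMath) {
  return UnsafeFPMath || (F.ApproxFunc && F.AllowReciprocal);
}
```

With only afn set, canEstimateSqrt returns true and canEstimateRsqrt returns false; with both afn and arcp set, both return true.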
Is there some reason for trimming the output here? I think it's important that we show the entire estimate sequence here to be consistent. Otherwise, it's misleading as it appears that we only need the raw estimate instruction.
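To illustrate why the raw estimate alone is misleading, here is a rough numeric sketch of an estimate followed by Newton-Raphson refinement. It is not the sequence the backend emits: rawRsqrtEstimate is a hypothetical stand-in for a low-precision hardware estimate instruction (simulated here with a deliberate ~1% error), and the step count is arbitrary.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal-sqrt estimate instruction:
// the exact value perturbed by about 1% to mimic low precision.
inline float rawRsqrtEstimate(float X) {
  return (1.0f / std::sqrt(X)) * 1.01f;
}

// One Newton-Raphson step for 1/sqrt(X): E' = E * (1.5 - 0.5 * X * E * E).
inline float refineRsqrt(float X, float E) {
  return E * (1.5f - 0.5f * X * E * E);
}

// sqrt(X) computed as X * rsqrt(X), with Steps refinement iterations.
// Assumes X > 0; special inputs such as 0 are not handled in this sketch.
inline float estimateSqrt(float X, int Steps) {
  float E = rawRsqrtEstimate(X);
  for (int I = 0; I < Steps; ++I)
    E = refineRsqrt(X, E);
  return X * E;
}
```

With zero steps the result carries the full error of the raw estimate; each refinement step roughly squares the relative error, which is why the multiply and subtract instructions after the estimate are part of the transform and worth showing in the test output.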