This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.
Details
Diff Detail
Event Timeline
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10897 | Here, we're entering a bit of uncharted FMF territory. cc'ing some more potential reviewers for opinions - @wristow @efriedma @andrew.w.kaylor I think we want to allow this transform when the node allows "approximate functions" (afn). I don't think we should care about 'arcp' - the transform of 1.0/sqrt(x) is handled on a different path AFAICT. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10897 | Not quite true, I reference AMDGPU which uses 1.0/sqrt to map to AMDGPUISD::RSQ, which is rsqrt, although a partial condition. See SITargetLowering::lowerFastUnsafeFDIV. Adding Matt as this is a AMD topic. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10897 | It does seem to me that 'afn' is what we want here, rather than 'arcp'. I think the point about SITargetLowering::lowerFastUnsafeFDIV means that the code there ought to be checking both 'afn' and 'arcp'. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10897 | The point of the code there is it doesn't need either. In some situations the instruction does the right thing without any need for a flag |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10897 | For the cases where it's not legal, I suppose checking both makes sense |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10897 | I can't see how arcp is relevant to this patch at all; visitFSQRT doesn't deal with division. The flags required for rsqrt were discussed here: https://reviews.llvm.org/D37686#877987 . |
My take on Eli's point from D37686 is afn or unsafe is what we need then. The code and tests are augmented for that change.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
10897 |
I agree. This is just a question of how we compute sqrt(x). For an approximation of 1/sqrt(x), then I can see also needing arcp. |
test/CodeGen/PowerPC/fmf-propagation.ll | ||
---|---|---|
368–370 | Is there some reason for trimming the output here? I think it's important that we show the entire estimate sequence here to be consistent. Otherwise, it's misleading as it appears that we only need the raw estimate instruction. |
So should we include arcp in the check so that buildSqrtEstimate fires for arcp opportunities(to produce rsqrt) as well or is it fine the way it is?
No - everyone agrees that this transform is predicated with 'afn' alone. The reciprocal sqrt transform is an independent problem/patch.
LGTM.
Here, we're entering a bit of uncharted FMF territory. cc'ing some more potential reviewers for opinions - @wristow @efriedma @andrew.w.kaylor
I think we want to allow this transform when the node allows "approximate functions" (afn). I don't think we should care about 'arcp' - the transform of 1.0/sqrt(x) is handled on a different path AFAICT.