This is purely refactoring. No functional changes intended.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW estimate instructions they provide. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
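To make the rewrite concrete, here is a scalar C++ sketch of what the transformed expressions compute. It assumes an estimate instruction accurate to a fixed number of significand bits, which the hypothetical helper `keepTopBits` fakes by truncating the exact result. The refinement formulas are the standard Newton-Raphson iterations for 1/x and 1/sqrt(x), not necessarily the exact node sequence DAGCombiner emits.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware estimate instruction: returns `v`
// with only the top `bits` bits of its significand kept.
double keepTopBits(double v, int bits) {
  int exp = 0;
  double m = std::frexp(v, &exp);  // v == m * 2^exp with 0.5 <= |m| < 1
  return std::ldexp(std::trunc(std::ldexp(m, bits)), exp - bits);
}

// One Newton-Raphson step for 1/x: e' = e * (2 - x*e).
double refineRecip(double x, double e) { return e * (2.0 - x * e); }

// One Newton-Raphson step for 1/sqrt(x): e' = e * (1.5 - 0.5*x*e*e).
double refineRSqrt(double x, double e) { return e * (1.5 - 0.5 * x * e * e); }

// z = y / x rewritten as z = y * rcpe(x), refined `steps` times.
double divViaRecipEst(double y, double x, int estBits, int steps) {
  double e = keepTopBits(1.0 / x, estBits);  // models rcpe(x)
  for (int i = 0; i < steps; ++i) e = refineRecip(x, e);
  return y * e;
}

// z = y / sqrt(x) rewritten as z = y * rsqrte(x), refined `steps` times.
double divSqrtViaRSqrtEst(double y, double x, int estBits, int steps) {
  assert(x > 0.0 && "this sketch only handles positive finite x");
  double e = keepTopBits(1.0 / std::sqrt(x), estBits);  // models rsqrte(x)
  for (int i = 0; i < steps; ++i) e = refineRSqrt(x, e);
  return y * e;
}
```

Each step roughly squares the relative error, so an 8-bit estimate refined three times lands close to full double precision. With zero steps, `divViaRecipEst(1.0, 3.0, 8, 0)` returns 0.33203125 instead of 1/3.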
In part 1 of this refactoring (http://reviews.llvm.org/D5425), I moved only the wrapper portion of the square root estimate out of the PPC backend and into DAGCombiner. In this patch, I've moved everything that I can out of PPCISelLowering and into DAGCombiner.
It turns out that we might as well grab the reciprocal estimate code too, because I think any hardware that provides an rsqrt estimate will also provide a recip estimate. And PPC even uses rcpe to generate sqrt from rsqrte! I added a visitFSQRT() to DAGCombiner to keep that functionality.
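The sqrt-from-rsqrte trick relies on the identity sqrt(x) = 1 / (1/sqrt(x)). Below is a scalar sketch of that composition, assuming hardware estimates good to a fixed number of significand bits (faked by the hypothetical `estimate` helper). It shows the math only and makes no claim about the node order visitFSQRT() actually builds.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of a hardware estimate: the exact value with only the
// top `bits` significand bits kept.
double estimate(double exact, int bits) {
  int exp = 0;
  double m = std::frexp(exact, &exp);  // exact == m * 2^exp, 0.5 <= |m| < 1
  return std::ldexp(std::trunc(std::ldexp(m, bits)), exp - bits);
}

// sqrt(x) == 1 / rsqrt(x): refine an rsqrte(x)-style estimate, then take a
// refined rcpe-style reciprocal of that result.
double sqrtViaEstimates(double x, int estBits, int steps) {
  assert(x > 0.0 && "sketch handles positive finite x only");
  double r = estimate(1.0 / std::sqrt(x), estBits);  // models rsqrte(x)
  for (int i = 0; i < steps; ++i) r = r * (1.5 - 0.5 * x * r * r);
  double s = estimate(1.0 / r, estBits);             // models rcpe(r)
  for (int i = 0; i < steps; ++i) s = s * (2.0 - r * s);
  return s;
}
```

The sketch rejects x == 0 because the rsqrt estimate is infinite there and the refinement step then computes 0 * inf, producing NaN; a real lowering has to special-case that input.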
There are small hooks in TargetLowering to create the target-specific node for each estimate instruction, and a function to tell DAGCombiner how many times it needs to run the Newton-Raphson refinement loop.
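One way a target could choose the value returned by getNRSteps(EVT VT) is from its estimate precision and the significand width of VT (24 bits for f32, 53 for f64). The sketch below assumes that simple model; `nrStepsNeeded` is a hypothetical helper, not part of this patch.

```cpp
#include <cassert>

// Hypothetical helper: Newton-Raphson converges quadratically, so each step
// roughly doubles the number of correct bits. Count the steps needed to go
// from an estimate good to `estBits` bits to at least `targetBits` bits.
unsigned nrStepsNeeded(unsigned estBits, unsigned targetBits) {
  assert(estBits > 0 && "an estimate must carry at least one correct bit");
  unsigned steps = 0;
  for (unsigned bits = estBits; bits < targetBits; bits *= 2) ++steps;
  return steps;
}
```

Under this model an 8-bit estimate needs 2 steps for f32 and 3 for f64, while a 14-bit estimate needs 1 and 2 respectively. It ignores constant factors (the reciprocal step's relative error is about ε², the rsqrt step's about 1.5ε²) and rounding in the refinement arithmetic.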
This will allow any target to generate the estimate code by implementing these methods:
virtual SDValue getRecipEst(SDValue Op, DAGCombinerInfo &DCI) const;
virtual SDValue getRSqrtEst(SDValue Op, DAGCombinerInfo &DCI) const;
virtual unsigned getNRSteps(EVT VT) const;
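The division of labor behind these hooks can be modeled without LLVM: the target supplies only a raw estimate and a step count, and target-independent code does the rewrite and refinement. In this sketch, `EstimateHooks`, `ToyTarget`, `combineDiv`, and `combineDivSqrt` are all hypothetical scalar stand-ins working on doubles rather than SDValues, and an empty `std::optional` plays the role of a null SDValue ("no estimate available").

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical scalar analogue of the proposed TargetLowering hooks.
struct EstimateHooks {
  virtual ~EstimateHooks() = default;
  virtual std::optional<double> getRecipEst(double) const { return std::nullopt; }
  virtual std::optional<double> getRSqrtEst(double) const { return std::nullopt; }
  virtual unsigned getNRSteps() const { return 0; }
};

// A made-up target whose estimates keep only 8 significand bits.
struct ToyTarget : EstimateHooks {
  static double crude(double v) {
    int exp = 0;
    double m = std::frexp(v, &exp);  // v == m * 2^exp, 0.5 <= |m| < 1
    return std::ldexp(std::trunc(std::ldexp(m, 8)), exp - 8);
  }
  std::optional<double> getRecipEst(double x) const override {
    return crude(1.0 / x);
  }
  std::optional<double> getRSqrtEst(double x) const override {
    return crude(1.0 / std::sqrt(x));
  }
  unsigned getNRSteps() const override { return 3; }
};

// Target-independent rewrite of y / x: refine the target's estimate, or keep
// the plain division if the target provides no estimate.
double combineDiv(const EstimateHooks &T, double y, double x) {
  std::optional<double> est = T.getRecipEst(x);
  if (!est) return y / x;
  double e = *est;
  for (unsigned i = 0; i < T.getNRSteps(); ++i) e = e * (2.0 - x * e);
  return y * e;
}

// Same idea for y / sqrt(x) using the reciprocal square root estimate.
double combineDivSqrt(const EstimateHooks &T, double y, double x) {
  assert(x > 0.0 && "sketch handles positive finite x only");
  std::optional<double> est = T.getRSqrtEst(x);
  if (!est) return y / std::sqrt(x);
  double e = *est;
  for (unsigned i = 0; i < T.getNRSteps(); ++i)
    e = e * (1.5 - 0.5 * x * e * e);
  return y * e;
}
```

Because the refinement loop lives in shared code, a target that overrides nothing keeps the exact division, and a new backend opts in by overriding just these three methods.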
The number of iterations necessary for the reciprocal estimate and for the reciprocal sqrt estimate might be different. Please provide a way to differentiate (and I'd want to make really sure the target actually overrides this). Maybe: