Page MenuHomePhabricator

[DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
ClosedPublic

Authored by spatel on Sep 4 2018, 7:21 AM.

Details

Summary

This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.

This is the basic case. I noted the potential enhancements that I imagined with TODO comments:

  1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
  2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
  3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).

Note that by default x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests).

A follow-on patch can extend this to handle the other pattern that we deferred: pow(x,1/3) --> cbrt. But that requires a bit more work because we don't currently have a FCBRT DAG node defined.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Sep 4 2018, 7:21 AM
spatel edited the summary of this revision. (Show Details)Sep 4 2018, 10:27 AM

Could you possibly duplicate the tests for ARM? It looks like the patch does the right thing to me, but it'd be good to have it confirmed and tested.

spatel added a comment.Sep 5 2018, 6:17 AM

Could you possibly duplicate the tests for ARM? It looks like the patch does the right thing to me, but it'd be good to have it confirmed and tested.

Sure. I'm generally not sure what the right target specs are for an ARM RUN line.
Is this good: -mtriple=arm-eabi -mattr=neon ?
Do we want more RUNs to check codegen with other features?

They'd probably work. We often just use useful triples, maybe thumbv8-linux-gnueabihf, and perhaps thumbv7m-linux-gnueabi as a soft-float target that ought to use libcalls for everything (I'd probably just check there are the appropriate number of calls there rather than tracking all the marshalling nonsense that goes on).

spatel updated this revision to Diff 164049.Sep 5 2018, 8:15 AM

Patch updated:
Added ARM (Thumb) tests. No changes otherwise.

t.p.northover accepted this revision.Sep 5 2018, 8:32 AM

Thanks! LGTM.

This revision is now accepted and ready to land.Sep 5 2018, 8:32 AM
This revision was automatically updated to reflect the committed changes.