Page MenuHomePhabricator

Add llvm.ldexp.* intrinsic, associated SDNode and library calls
Needs ReviewPublic

Authored by nhaehnle on Nov 4 2015, 2:54 AM.

Details

Summary

This allows the LibCallSimplifier to consistently produce only intrinsics
when optimizing an existing exp2 intrinsic.

For targets that do not support FLDEXP natively (i.e. most of them),
legalization can Expand it to the open coded SINT_TO_FP + FEXP2 + FMUL
or convert it to the corresponding LibCall. This has been turned on
conservatively.

This fixes a bug where instcombine incorrectly generated a library call
for the AMDGPU target.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 39178.Nov 4 2015, 2:54 AM
nhaehnle retitled this revision from to Add llvm.ldexp.* intrinsic, associated SDNode and library calls.
nhaehnle updated this object.
arsenm added a comment.Nov 4 2015, 9:34 AM

This mostly LGTM except for the question of error behavior. There should be a few additions to get more of the benefits of using an intrinsic over a libcall. ldexp should be added to isTriviallyVectorizable and isSafeToSpeculativelyExecute with appropriate tests, assuming we can assume it doesn't set errno. This could be a follow up patch.

docs/LangRef.rst
9989

I don't think this should be defined it to handling the same way as libm. I think we should say it does not set errno, and then to only do the libcall transformation if the call is marked readonly/readnone. This is an area that isn't handled particularly consistently by the existing math intrinsics.

test/CodeGen/AMDGPU/llvm.ldexp.ll
22

Should include vector versions for at least v2f32, v4f32 and v2f64

Also, can you merge the existing llvm.AMDGPU.ldexp.ll test into this one and rename them with a legacy_ prefix

arsenm added a subscriber: hfinkel.Nov 4 2015, 9:35 AM
nhaehnle updated this revision to Diff 39403.Nov 5 2015, 12:59 PM

Thank you for taking a look! I've made some changes based on your feedback:

  • AMDGPU: more llvm.ldexp.ll tests and assorted bugfixes
  • LangRef for llvm.ldexp.*: remove statement about handling error conditions
  • [VectorUtils] llvm.ldexp.* intrinsic is vectorizable
  • [ValueTracking] ldexp preserves the sign of its first argument

I agree that the error handling is a problem, and I have to admit that I don't
know what is best. At the time of the libcall transformation, we already have
an SDNode, so I do not know how to tell the attributes of the original call.

It's also some effort to provide an expansion that is guaranteed to never set
errno, because the most straightforward expansion uses exp2, which is in turn
likely to become a library call. I suppose one could write a custom implementation
in compiler-rt, but I don't think that that's the best use of my time.

For now, I have made changes that are in line with the other intrinsics like pow
and powi: those are marked as isTriviallyVectorizable, but *not* as
isSafeToSpeculativelyExecute.

I hope that this is good enough. There are quite a number of TODOs already in
the code regarding these error problems. In any case, I've left those changes
as separate commits locally, so it's easy enough for me to rearrange them.
(Though at least for some of them I believe they should definitely be squashed
before committing to SVN.)

Couldn't the original bug be fixed by marking ldexpf as unavailable for AMDGPU in lib/Analysis/TargetLibraryInfo.cpp ?

Couldn't the original bug be fixed by marking ldexpf as unavailable for AMDGPU in lib/Analysis/TargetLibraryInfo.cpp ?

I think so, yes. Though Matt said that we do want to use the ldexp instruction because it is a full-rate instruction.

Couldn't the original bug be fixed by marking ldexpf as unavailable for AMDGPU in lib/Analysis/TargetLibraryInfo.cpp ?

I think so, yes. Though Matt said that we do want to use the ldexp instruction because it is a full-rate instruction.

Ok, so for a temporary solution, rather than changing the intrinsic emitted by Mesa, I think we should mark this libcall as unavailable. This current patch could then be done as a follow up.

arsenm added inline comments.Jan 19 2016, 1:49 PM
test/CodeGen/X86/ldexp.ll
4

Vector tests here are probably a good idea as well

hfinkel added inline comments.Feb 2 2016, 5:03 PM
docs/LangRef.rst
9989

As I recall, we're very consistent about this, with one exception: @llvm.sqrt. And this causes a lot of confusion. That having been said, there is a precedent, and there are good reasons to do it. However, we do need to say what happens if the result is not representable. You really have two choices:

  1. "and handles error conditions in the same way" (i.e. perhaps sets errno)
    1. Has undefined behavior (it needs to be undefined because it might be implemented using libm, and we can't know whether libm will affect errno)
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3395

This seems like a great idea is FEXP2 is legal, but otherwise, seems likely slower than the original library function call to ldexp. Unless we really know better, we should keep the original call.

lib/CodeGen/TargetLoweringBase.cpp
873

You should add FLDEXP here too.

nhaehnle updated this revision to Diff 47547.Feb 10 2016, 3:16 PM
nhaehnle marked 4 inline comments as done.

Rebased on top of current trunk and addressed the various comments.

Since the TargetAction now defaults to Expand (which is actually LibCall
in disguise when available), I have removed several places where targets
redundantly set the action.

I've opted to go the "undefined range error" behaviour route in the revision since that seemed more useful to me given that he LibCallSimplifier is intrinsic->intrinsic and libcall->libcall now.

arsenm edited edge metadata.Mar 2 2016, 6:54 PM

Could also use updating some IR places to handle it (e.g. TTI, isSafeToSpeculativelyExecute), but that's probably a separate patch

docs/LangRef.rst
9989

The returned value on underflow is defined to be zero, and HUGE_VAL, which may be infinity, on overflow. I think saying undefined behavior for the case is too strong. Maybe saying just the state of errno is undefined?

hfinkel added inline comments.Apr 26 2016, 6:00 PM
docs/LangRef.rst
9989

We don't have a way to model errno. We need to "prevent" a situation where we're allowed to reorder a call to ldexp in between, for example, a call to open() and a call to perror(). To get the benefits you want, however, you need to mark the function as readnone. However, it might be implemented using the underlying library call, which might set errno. Unless you make that undefined behavior, then the readnone on the intrinsic is wrong. Both overflow and underflow need to be undefined behavior. I realize that this is unfortunate.

lib/Target/PowerPC/PPCISelLowering.cpp
465

Don't do this. Set it to Expand by default (in TargetLoweringBase::initActions). That's our current best practice for new rarely-legal nodes.

arsenm added inline comments.Apr 27 2016, 1:27 PM
docs/LangRef.rst
9989

The converse is we already don't 'correctly' lower the existing intrinsics which are assumed to write errno because errno does not exist on the platform.

I'm still generally confused about the inconsistency of errno handling. Why don't we have a separate set of math intrinsics for respecting errno, and not? Lowering the non-errno version with a library call would be an incorrect lowering for these. Alternatively, why doesn't the possibility of of writing errno always be a libcall, while the intrinsics are fine for -fno-math-errno? Currently -fno-math-errno adds readnone to the call site of the library call, and allows selecting to the corresponding DAG node. The inconsistency in behavior between the DAG nodes and intrinsics has always confused me. A readnone call to the library function will select to the corresponding chainless node, which could still be lowered to a call to an errno writing function. In the case of sqrt, this is further confused because < 0 inputs are no longer undefined. I would expect the intrinsics would be the for using a native instruction which ignores errno.

The current set of math intrinsics, including those that say handle errors the same way, are already IntrNoMem (e.g. llvm.exp) and say nothing about undefined behavior. The sqrt intrinsic has undefined behavior for < 0, but we are able to fold an isnan() check before it out in the DAG. I'm not sure what an underflow/overflow test for ldexp would look like, but it would be more complicated than the simple compare and select for sqrt.

jfb added a comment.Apr 27 2016, 3:18 PM

Regarding errno: it's totally valid to ignore if the implementation sets math_errhandling & MATH_ERRNO to zero. Of course, you need to know the C library to make that choice, but its value never changes at runtime. See C11 section 7.12, as well as the soon-to-be-published C++ paper p0108r1 which you can preview here.