Page MenuHomePhabricator

Emit more intrinsics for builtin functions
ClosedPublic

Authored by arsenm on Oct 21 2014, 4:33 PM.

Details

Reviewers
hfinkel
Summary

This is important for building libclc. Since r273039 tests are failing
due to now emitting calls to these functions instead of emitting the
DAG node. The libm function names are implemented for OpenCL, and should
call the locally defined versions, so -fno-builtin is used. The IR
Some functions use the __builtins and expect the intrinsics to be
emitted. Without this we end up with nobuiltin calls to intrinsics
or to unsupported library calls.

Diff Detail

Event Timeline

arsenm updated this revision to Diff 15222.Oct 21 2014, 4:33 PM
arsenm retitled this revision from to Emit minnum / maxnum for __builtin_fmin/fmax .
arsenm updated this object.
arsenm edited the test plan for this revision. (Show Details)
arsenm added a subscriber: Unknown Object (MLST).
arsenm updated this revision to Diff 15224.Oct 21 2014, 5:17 PM

I noticed most of the other FP intrinsics are also emitted as lib calls, so handle more (fabs, copying, floor, ceil, trunc, rint, nearbyint, round)

This might seem like a silly question, but what's the motivation for doing this? These builtin* functions are emitted as readnone which should enable the backend to optimize them using TargetLibraryInfo just as is can for the intrinsics. The intrinsics are important for vectorization, but I'd hope that for scalar optimization, they're essentially the same as the libcalls. Is this not true?

Well first, I would think the point of using __builtin is to specifically get the intrinsic, and not use the library function.

I don’t think they are treated the same as libcalls in most situations, but the current situation is not easy to follow. First, I mostly care about targets that do not have calls and TargetLibraryInfo isn’t particularly helpful or meaningful as is. I think if there is an equivalent intrinsic for the library function, it should always be used instead for consistency. The current state where a call is sometimes treated as an intrinsic and sometimes isn't is pretty confusing (It’s also something else I probably need to add support for for minnum / maxnum). Some places end up trying to consider both forms, or some forget one or the other. Maybe there should be a pass that translates all of the libcalls into equivalent intrinsics?

Intrinsics are considered many more places than query TargetLibraryInfo, e.g. isSafeToSpeculativelyExecute checks many intrinsics but no libcalls. I’m also not sure how the call form interacts with -fno-builtin. I don’t think a regular call can / is considered as a candidate for vectorization, but the intrinsics are.

Well first, I would think the point of using __builtin is to specifically get the intrinsic, and not use the library function.

The point, at the C level, is to get a certain set of semantics. The __builtin_* calls are marked as readnone in the IR, in theory giving them the desired semantics. As you point out below, this is likely not strong enough....

I don’t think they are treated the same as libcalls in most situations, but the current situation is not easy to follow.

For optimizations in SimplifyLibCalls, we try to treat both the same way.

First, I mostly care about targets that do not have calls and TargetLibraryInfo isn’t particularly helpful or meaningful as is. I think if there is an equivalent intrinsic for the library function, it should always be used instead for consistency.

Interestingly, the only adjustment that TargetLibraryInfo makes for your target is:

// There are no library implementations of mempcy and memset for r600 and
// these can be difficult to lower in the backend.
if (T.getArch() == Triple::r600) {
  TLI.setUnavailable(LibFunc::memcpy);
  TLI.setUnavailable(LibFunc::memset);
  TLI.setUnavailable(LibFunc::memset_pattern16);
  return;
}

and so this probably does "work" for you -- but I do see your point. A true freestanding implementation might really not have these calls, and TLI might actually reflect that fact, rendering this scheme of using the libc calls (even marked as readnone) as suboptimal.

The current state where a call is sometimes treated as an intrinsic and sometimes isn't is pretty confusing (It’s also something else I probably need to add support for for minnum / maxnum). Some places end up trying to consider both forms, or some forget one or the other.

Yes, in theory we try to locate all relevant optimizations inside of SimplifyLibCalls; perhaps in practice this does not quite capture everything.

Maybe there should be a pass that translates all of the libcalls into equivalent intrinsics?

I think this is a reasonable idea. It could be done inside of SimplifyLibCalls.

Intrinsics are considered many more places than query TargetLibraryInfo, e.g. isSafeToSpeculativelyExecute checks many intrinsics but no libcalls.

This is a good point. In theory we could do a better job at this for some known libcalls, but you're right, isSafeToSpeculativelyExecute does not return true for readnone functions, and noted in the function:

return false; // The called function could have undefined behavior or
              // side-effects, even if marked readnone nounwind.

as a side note, I've been thinking about adding some kind of "safe to speculate" metadata for loads/calls; this is another use case ;) Nevertheless, we could use TLI here too and specifically handle certain known libcalls.

I’m also not sure how the call form interacts with -fno-builtin.

We nicely mark the functions as 'nobuiltin'. This is a bug. :(

I don’t think a regular call can / is considered as a candidate for vectorization, but the intrinsics are.

They certainly can be (when marked readnone) and are. The vectorizers handle the translation.

In short, I understand what you're saying, and I support canonicalizing on the intrinsics for these builtin operations. However, this would be a change to an established engineering decision, and I'd not approve this here without some discussion on llvmdev. Can you please write an RFC there? I want to make sure that everyone is on the same page.

emaste added a subscriber: emaste.May 17 2015, 7:01 PM
arsenm added a comment.Mar 9 2016, 9:22 PM

ping since there seems to be a consensus now that intrinsics should be preferred

arsenm updated this revision to Diff 61432.Jun 21 2016, 1:03 PM
arsenm retitled this revision from Emit minnum / maxnum for __builtin_fmin/fmax to Emit more intrinsics for builtin functions.
arsenm updated this object.
arsenm added a subscriber: tstellarAMD.

Update for trunk

Why are you removing 'F' from all of the builtin definitions? And if you need to, why are you not removing them from copysign?

Why are you removing 'F' from all of the builtin definitions? And if you need to, why are you not removing them from copysign?

That marks them as emitting a libcall. It should also be removed from copysign

arsenm updated this revision to Diff 61468.Jun 21 2016, 5:25 PM

It appears removing the F isn't actually necessary, unremove it

arsenm updated this revision to Diff 61472.Jun 21 2016, 5:43 PM

Attach right diff

hfinkel accepted this revision.Jun 30 2016, 7:15 PM
hfinkel added a reviewer: hfinkel.

LGTM

This revision is now accepted and ready to land.Jun 30 2016, 7:15 PM
arsenm closed this revision.Jul 1 2016, 11:11 AM

r274370