This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] lower math intrinsics to finite version of libcalls when possible (PR35672)
ClosedPublic

Authored by spatel on Dec 17 2017, 4:53 PM.

Details

Summary

This is a partial implementation of a fix for PR35672:
https://bugs.llvm.org/show_bug.cgi?id=35672

If this is on the right track, then I can add similar code for other transcendentals (exp2, log, log10, log2, pow).

Some questions:

  1. Do the finite calls need the double-leading underscores? I saw an existing test with __sqrt_finite, so I assume we want those, but I'm not sure if/how the regular calls acquire the underscores.
  2. Does this make sense for ISD::STRICT_FEXP (the strict version of the node)?
  3. Does the mathlib actually support the long double variants?

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Dec 17 2017, 4:53 PM

Do the finite calls need the double-leading underscores? I saw an existing test with __sqrt_finite, so I assume we want those, but I'm not sure if/how the regular calls acquire the underscores.

That's just how they're named; the names come from the glibc headers.

Does this make sense for ISD::STRICT_FEXP (the strict version of the node)?

I would guess strict and fast math don't really mix...

Does the mathlib actually support the long double variants?

long double variants should exist, as far as I know... at least for the "long double" which is native for the platform.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4049 ↗(On Diff #127295)

"getLibInfo().has(LibFunc_exp_finite)" probably isn't the right check. There are three __exp*_finite variants; each of them may or may not be legal, and you need to check that the long double variant actually accepts the input float type.

Also, it looks like TargetLibraryInfo doesn't contain the appropriate checks to disable them (it assumes any non-Windows platform has __exp_finite, which is clearly false).

Do the finite calls need the double-leading underscores? I saw an existing test with __sqrt_finite, so I assume we want those, but I'm not sure if/how the regular calls acquire the underscores.

That's just how they're named; the names come from the glibc headers.

Does this make sense for ISD::STRICT_FEXP (the strict version of the node)?

I would guess strict and fast math don't really mix...

I agree.

Does the mathlib actually support the long double variants?

long double variants should exist, as far as I know... at least for the "long double" which is native for the platform.

Correct.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4049 ↗(On Diff #127295)

and you need to check that the long double variant actually accepts the input float type.

How would we check this?

(it assumes any non-Windows platform has __exp_finite, which is clearly false).

Hrmm. We could start with disabling them for any non-GNU triple.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4049 ↗(On Diff #127295)

For figuring out whether the long-double variant is the one we need, I guess TargetLibraryInfo should know? We need to avoid generating a call with undefined behavior somehow. And we probably need to eventually teach this code how to generate a call to __expf128_finite when it's available.

hfinkel added inline comments.Dec 20 2017, 12:37 PM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4049 ↗(On Diff #127295)

For figuring out whether the long-double variant is the one we need, I guess TargetLibraryInfo should know? We need to avoid generating a call with undefined behavior somehow.

I don't think that anything knows right now (except Clang). I think that we can generate <foo>l/__<foo>l_finite calls for ppcf128/f80 and <foo>f128/__<foo>f128_finite for f128 for x86 and ppc specifically (as we have special types, ppcf128 and f80, for long double on those platforms), and calls to <foo>l/__<foo>l_finite for f128 everywhere else. No?

And we probably need to eventually teach this code how to generate a call to __expf128_finite when it's available.

I agree. glibc added a whole bunch of these now.

and calls to <foo>l/__<foo>l_finite for f128 everywhere else. No?

There are four ways to lower C long double to LLVM IR: fp128, ppc_f128, x86_fp80, and double. If you assume the IR doesn't contain any intrinsic calls which can't be lowered, the logic becomes simpler, but I'm not sure that's a safe assumption.

and calls to <foo>l/__<foo>l_finite for f128 everywhere else. No?

There are four ways to lower C long double to LLVM IR: fp128, ppc_f128, x86_fp80, and double. If you assume the IR doesn't contain any intrinsic calls which can't be lowered, the logic becomes simpler, but I'm not sure that's a safe assumption.

Historically, I believe that we've made this assumption. That's, in part, why all of the intrinsics say "Not all targets support all types however." in the LangRef.

That's, in part, why all of the intrinsics say "Not all targets support all types however." in the LangRef.

I think most people would assume that means you'll get some sort of error from the backend, not a silent miscompile. But I guess it isn't any worse than what we do now for other math library calls.

Does this make sense for ISD::STRICT_FEXP (the strict version of the node)?

I would guess strict and fast math don't really mix...

I agree.

Sorry for the delay. Let me address this comment first. I would agree too, but I was informed that there's a clang customer in the gaming world that wants to compile with -ffast-math and enable div-by-zero FP exceptions as a way to sanitize their data (at least in development builds).

I assume that we're still a long way from realizing this dream (optimized FP + some subset of FP exceptions enabled) in LLVM, but if there's no correctness issue with allowing this transform, then I think we should treat these nodes the same in this patch.

spatel updated this revision to Diff 128236.Dec 27 2017, 8:22 AM

Patch updated:

  1. Fixed the availability checking for finite functions in TargetLibraryInfoImpl - these only exist on Linux.
  2. Added a darwin triple to the test file to show that is behaving as expected (no finite calls there).

IIUC, if we're going to improve the long double availability / lowering, then we'll want to do that for all of the math functions (not just the finite variants). So I deferred making any changes related to that here. Also, I haven't added {exp2, log, log10, log2, pow} yet just to keep the patch small until we're sure this is correct. If it is correct, then the other nodes should be simple edits of how we handle exp, and I'll add them.

Does this make sense for ISD::STRICT_FEXP (the strict version of the node)?

I would guess strict and fast math don't really mix...

I agree.

Sorry for the delay. Let me address this comment first. I would agree too, but I was informed that there's a clang customer in the gaming world that wants to compile with -ffast-math and enable div-by-zero FP exceptions as a way to sanitize their data (at least in development builds).

I assume that we're still a long way from realizing this dream (optimized FP + some subset of FP exceptions enabled) in LLVM, but if there's no correctness issue with allowing this transform, then I think we should treat these nodes the same in this patch.

I feel like there are still a lot of things that need to be figured out with regard to StrictFP and ISel. What's there now is just kind of a "get something working" solution, and I think it will need to be beefed up later.

Regarding the original question about mixing fast math and strict FP, I think what the gaming customer is asking for sounds reasonable. The strict FP intrinsics handle two things that are at least potentially separate: value safety and exception behavior. The fast math flags mostly affect value safety, but there are some transformation that could change exception behavior. That will have to be thought through when individual transformations are taught to recognize the constrained intrinsics. More generally, it's certainly reasonable to indicate via the intrinsics that you want to preserve exception behavior but that you don't care about rounding mode. How the user would get the front end to do that is another question, because I think it will have to be something other than using the FENV_ACCESS pragma and -ffast-math together (which I would hope would trigger a warning saying that one of them was going to be ignored).

After thinking through this particular situation a bit more with regard to the STRICT_EXP node, I think what you've chosen to do here is probably correct. I'm not entirely certain what the exp_finite implementation does, but I would expect that with regard to rounding it will produce the same result as the normal function as long as the input is finite. Similarly, I think that the exception behavior of exp_finite should be the same as the non-finite version as long as the input is finite. If the input is non-finite, then I would expect that the appropriate exception was raised or status flag set by whatever produced the value. I don't think either exp or exp_finite will produce an exception for non-finite values. We'll get the wrong answer with exp_finite of course, but the user signed up for that when using the fast math flags.

In general, as this implementation approach is propagated to other functions, I think this is the questions we need to ask. Will the finite version hide exceptions that would have been raised by the normal function call or produce exceptions that wouldn't have been produced otherwise? I'd need to look into each function implementation, but I can imagine cases where assuming finite inputs could lead to changes in exception behavior.

hfinkel accepted this revision.Jan 6 2018, 6:33 PM

LGTM

This revision is now accepted and ready to land.Jan 6 2018, 6:33 PM
spatel added a comment.Jan 8 2018, 9:48 AM

I noticed that we can expose the availability bug via constant propagation, so I split that part of the patch off here:
rL322010

This revision was automatically updated to reflect the committed changes.