Teach the IRBuilder about constrained fadd and friends
Needs ReviewPublic

Authored by kpn on Oct 11 2018, 11:59 AM.

Details

Summary

The IRBuilder has calls to create floating point instructions like fadd. It does not have calls to create constrained versions of them. This patch adds support for constrained creation of fadd, fsub, fmul, fdiv, and frem.

Diff Detail

kpn created this revision.Oct 11 2018, 11:59 AM

Perhaps this can be somehow tested in IRBuilderTest.cpp?

kpn updated this revision to Diff 172826.Tue, Nov 6, 12:14 PM

Update to have tests in this patch.

Minor changes to support that testing.

spatel added a comment.Wed, Nov 7, 5:42 AM

Should the builder's FastMathFlags setting apply to these calls? Is it worth adding an optional FMF parameter to these now, so we don't end up with duplicated API like we have for the regular FP binops?

kpn added a comment.Wed, Nov 7, 8:00 AM

Should the builder's FastMathFlags setting apply to these calls? Is it worth adding an optional FMF parameter to these now, so we don't end up with duplicated API like we have for the regular FP binops?

In the interest of having these new calls be like the non-strict ones, I think this does make some sense.

Would it be enough to model after the existing FP binops and take an Instruction argument to copy the flags from?

spatel added a comment.Wed, Nov 7, 8:12 AM
In D53157#1290220, @kpn wrote:

Should the builder's FastMathFlags setting apply to these calls? Is it worth adding an optional FMF parameter to these now, so we don't end up with duplicated API like we have for the regular FP binops?

In the interest of having these new calls be like the non-strict ones, I think this does make some sense.

Would it be enough to model after the existing FP binops and take an Instruction argument to copy the flags from?

Yes, that's the typical usage - copy the flags over from some existing instruction. I don't remember why the existing code has 2 versions of each. If the optional parameter isn't there, then we should just use the buillder's default FMF on the new call?

Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel

I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?

That said, the constrained intrinsics aren't a good long term solution for the FPEnv project. IMO, we'll hit a wall once we try to optimize them. Not sure if this patch helps with that problem though. We really need to toggle optimizations that can clobber side-effects on a case-by-case basis.

kpn added a comment.Wed, Nov 7, 8:28 AM

Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel

I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?

The IRBuilder is used in clang. Once the use of the pragma is wired down to clang's codegen we need a way to emit the constrained intrinsics. It makes for very readable code to change a call to Builder.CreateFAdd() so it conditionally calls Builder.CreateConstrainedFAdd() instead. And if CreateConstrainedFAdd() returns something other than a call to an intrinsic then clang doesn't care.

Seems fair. I don't know enough about the Clang FE to have an opinion on that.

In D53157#1290266, @kpn wrote:

Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel

I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?

The IRBuilder is used in clang. Once the use of the pragma is wired down to clang's codegen we need a way to emit the constrained intrinsics. It makes for very readable code to change a call to Builder.CreateFAdd() so it conditionally calls Builder.CreateConstrainedFAdd() instead. And if CreateConstrainedFAdd() returns something other than a call to an intrinsic then clang doesn't care.

Exactly. We'll want this for Clang CodeGen, etc.

Rather than having separate methods to create the constrained versions of the intrinsics, why not have a way to set the constrained state in the IR builder and have the regular CreateFAdd et. al. functions use that to automatically create the constrained intrinsic versions? This would correspond to how fast math flags are handled.

Rather than having separate methods to create the constrained versions of the intrinsics, why not have a way to set the constrained state in the IR builder and have the regular CreateFAdd et. al. functions use that to automatically create the constrained intrinsic versions? This would correspond to how fast math flags are handled.

Well, yes and no. There is the default fast math flags that get picked up by, eg, CreateFAdd(). There's also a default MDNode. But CreateFAdd() and the others let you specify a different MDNode, and the Create*FMF() functions let you specify different fast math flags. So the existing practice isn't strongly pointing in either direction.

Having CreateConstrained*() functions means front ends can key off its own flag and not have to notify the IRBuilder when that flag goes out of scope. Then again, having a IRBuilder-instance-global flag means probably fewer places in the front end have to be touched, but it would have to notify the IRBuilder when its own strictness flag goes out of scope. So I don't know what is better.

I suspect you are correct and we should just stick with the existing calls. I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity. But we need to do what is best for users of the IRBuilder, and I'm not 100% sold either way.

Can we get some input from someone who maintains front ends?

Craig Topper also raised some concerns with me (in person, his desk is right by mine) about the potential effect this might have on code size. He tells me that IRBuilder calls are intended to always be inlined and if we grow the implementation of these functions too much it could lead to noticeable bloat. It still seems to me like it might be worthwhile for the simplification it would allow in the front end, but I'm not really a front end guy so I definitely agree that we should get some input from front end people about what they want.

In D53157#1291964, @kpn wrote:

I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity.

The Standard allows for this (to some degree). E.g. FENV_ACCESS can be toggled between nested compound statements.

Also, some AVX512 hardware instructions have explicit SAE and RM operands.

kpn added a comment.Thu, Nov 8, 11:45 AM
In D53157#1291964, @kpn wrote:

I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity.

The Standard allows for this (to some degree). E.g. FENV_ACCESS can be toggled between nested compound statements.

Yes, but I'm not worried about what sort of code is input to a compiler. I'm worried about what is right for the compiler itself. Also, since we're in LLVM here, we have to think about all languages that use LLVM as a backend.