This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/trunk/
-
trunk/
-
include/llvm/IR/
-
llvm/
-
IR/
-
IRBuilder.h
-
IntrinsicInst.h
-
lib/IR/
-
IR/
-
IntrinsicInst.cpp
-
Verifier.cpp
-
unittests/IR/
-
IR/
-
IRBuilderTest.cpp

Differential D53157

Teach the IRBuilder about constrained fadd and friends
ClosedPublic

Authored by kpn on Oct 11 2018, 11:59 AM.

Download Raw Diff

Details

Reviewers

chandlerc
spatel
arsenm
rsmith
rjmccall
andrew.w.kaylor

Commits

rG472e5dda110f: Teach the IRBuilder about fadd and friends.
rL365339: Teach the IRBuilder about fadd and friends.

Summary

The IRBuilder has calls to create floating point instructions like fadd. It does not have calls to create constrained versions of them. This patch adds support for constrained creation of fadd, fsub, fmul, fdiv, and frem.

Diff Detail

Repository: rL LLVM

Event Timeline

kpn created this revision.Oct 11 2018, 11:59 AM

Herald added subscribers: llvm-commits, wdng. · View Herald TranscriptOct 11 2018, 11:59 AM

The test case for this is in D52839, now.

Perhaps this can be somehow tested in IRBuilderTest.cpp?

Update to have tests in this patch.

Minor changes to support that testing.

Should the builder's FastMathFlags setting apply to these calls? Is it worth adding an optional FMF parameter to these now, so we don't end up with duplicated API like we have for the regular FP binops?

In D53157#1290044, @spatel wrote:

Should the builder's FastMathFlags setting apply to these calls? Is it worth adding an optional FMF parameter to these now, so we don't end up with duplicated API like we have for the regular FP binops?

In the interest of having these new calls be like the non-strict ones, I think this does make some sense.

Would it be enough to model after the existing FP binops and take an Instruction argument to copy the flags from?

cameron.mcinally added a subscriber: cameron.mcinally.Nov 7 2018, 8:04 AM

In D53157#1290220, @kpn wrote:

In D53157#1290044, @spatel wrote:

Should the builder's FastMathFlags setting apply to these calls? Is it worth adding an optional FMF parameter to these now, so we don't end up with duplicated API like we have for the regular FP binops?

In the interest of having these new calls be like the non-strict ones, I think this does make some sense.

Would it be enough to model after the existing FP binops and take an Instruction argument to copy the flags from?

Yes, that's the typical usage - copy the flags over from some existing instruction. I don't remember why the existing code has 2 versions of each. If the optional parameter isn't there, then we should just use the buillder's default FMF on the new call?

Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel

I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?

That said, the constrained intrinsics aren't a good long term solution for the FPEnv project. IMO, we'll hit a wall once we try to optimize them. Not sure if this patch helps with that problem though. We really need to toggle optimizations that can clobber side-effects on a case-by-case basis.

In D53157#1290248, @cameron.mcinally wrote:

Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel

I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?

The IRBuilder is used in clang. Once the use of the pragma is wired down to clang's codegen we need a way to emit the constrained intrinsics. It makes for very readable code to change a call to Builder.CreateFAdd() so it conditionally calls Builder.CreateConstrainedFAdd() instead. And if CreateConstrainedFAdd() returns something other than a call to an intrinsic then clang doesn't care.

Seems fair. I don't know enough about the Clang FE to have an opinion on that.

In D53157#1290266, @kpn wrote:

In D53157#1290248, @cameron.mcinally wrote:

Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel

I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?

The IRBuilder is used in clang. Once the use of the pragma is wired down to clang's codegen we need a way to emit the constrained intrinsics. It makes for very readable code to change a call to Builder.CreateFAdd() so it conditionally calls Builder.CreateConstrainedFAdd() instead. And if CreateConstrainedFAdd() returns something other than a call to an intrinsic then clang doesn't care.

Exactly. We'll want this for Clang CodeGen, etc.

Rather than having separate methods to create the constrained versions of the intrinsics, why not have a way to set the constrained state in the IR builder and have the regular CreateFAdd et. al. functions use that to automatically create the constrained intrinsic versions? This would correspond to how fast math flags are handled.

In D53157#1290364, @andrew.w.kaylor wrote:

Rather than having separate methods to create the constrained versions of the intrinsics, why not have a way to set the constrained state in the IR builder and have the regular CreateFAdd et. al. functions use that to automatically create the constrained intrinsic versions? This would correspond to how fast math flags are handled.

Well, yes and no. There is the default fast math flags that get picked up by, eg, CreateFAdd(). There's also a default MDNode. But CreateFAdd() and the others let you specify a different MDNode, and the Create*FMF() functions let you specify different fast math flags. So the existing practice isn't strongly pointing in either direction.

Having CreateConstrained*() functions means front ends can key off its own flag and not have to notify the IRBuilder when that flag goes out of scope. Then again, having a IRBuilder-instance-global flag means probably fewer places in the front end have to be touched, but it would have to notify the IRBuilder when its own strictness flag goes out of scope. So I don't know what is better.

I suspect you are correct and we should just stick with the existing calls. I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity. But we need to do what is best for users of the IRBuilder, and I'm not 100% sold either way.

Can we get some input from someone who maintains front ends?

Craig Topper also raised some concerns with me (in person, his desk is right by mine) about the potential effect this might have on code size. He tells me that IRBuilder calls are intended to always be inlined and if we grow the implementation of these functions too much it could lead to noticeable bloat. It still seems to me like it might be worthwhile for the simplification it would allow in the front end, but I'm not really a front end guy so I definitely agree that we should get some input from front end people about what they want.

In D53157#1291964, @kpn wrote:

I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity.

The Standard allows for this (to some degree). E.g. FENV_ACCESS can be toggled between nested compound statements.

Also, some AVX512 hardware instructions have explicit SAE and RM operands.

In D53157#1291978, @cameron.mcinally wrote:

In D53157#1291964, @kpn wrote:

I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity.

The Standard allows for this (to some degree). E.g. FENV_ACCESS can be toggled between nested compound statements.

Yes, but I'm not worried about what sort of code is input to a compiler. I'm worried about what is right for the compiler itself. Also, since we're in LLVM here, we have to think about all languages that use LLVM as a backend.

In D53157#1291978, @cameron.mcinally wrote:

In D53157#1291964, @kpn wrote:

I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity.

The Standard allows for this (to some degree). E.g. FENV_ACCESS can be toggled between nested compound statements.

Also, some AVX512 hardware instructions have explicit SAE and RM operands.

Just because FENV_ACCESS can be toggled on that granularity doesn't mean we have to represent it that way. We've previously agreed (I think) that if FENV_ACCESS is enabled anywhere in a function we will want to use the constrained intrinsics for all FP operations in the function, not just the ones in the scope where it was specified. I think this works because FENV_ACCESS ON needs to be respected (i.e. we need to assume that the user may change the FP environment) but FENV_ACCESS OFF doesn't need to be respected (i.e. we are not required to assume that the user will not change the environment). For instance, the spec does give us liberty to assume the default rounding mode in FENV_ACCESS OFF regions, but if we make no assumptions about the rounding mode that will be conservatively correct.

As for instructions that take an explicit rounding mode argument, that's a separate issue (and one that is relevant for multiple architectures). The constrained intrinsics do not enforce a rounding mode (i.e. they are descriptive rather than prescriptive), but if we have concrete rounding mode arguments for any given instruction we could use that to encode the rounding mode in the instruction. I'm not sure how user code would express this apart from use of intrinsics.

Fair enough. Just for conversation's sake, I was envisioning the FE support a little differently...

Add a command line option, say -ffpe_safe=[true|false], that toggles FPEnv-safe code generation for a whole file. A -ffpe_safe=true would be equivalent to FENV_ACCESS=ON at the beginning of a translation unit and we would capture it in some FE variable. That command line option can be overridden by the FENV_ACCESS pragma.

If the FENV_ACCESS pragma is seen, carry a nop/metadata/something construct to toggle the FE variable during IR generation.

Then when generating IR, choose between constrained/unconstrained IR depending on the state of the FE variable.

#1 may seem superfluous, but there are certain benchmarks (e.g. SPEC CPU) that do not allow for code modifications. So, in order to run those with FPEnv-safe code, we'd need a non-invasive way to toggle FPEnv-safe code generation.

I'll also add that my interpretation of FENV_ACCESS in the C Standard [7.6.1] implies that we have to track the FENV_ACCESS pragma. That is, it's not okay to ignore FENV_ACCESS=OFF. When we transfer from FENV_ACCESS=OFF code to FENV_ACCESS=ON code, the rounding mode needs to be reset to the default setting. But, I'm open to hearing other interpretations. My copy of the Standard is old, so please correct me if this changed...

When execution passes from a part of the program translated with FENV_ACCESS ‘‘off’’ to a part translated with FENV_ACCESS ‘‘on’’, the state of the floating-point status flags is unspecified and the floating-point control modes have their default settings.

In D53157#1291977, @andrew.w.kaylor wrote:

Craig Topper also raised some concerns with me (in person, his desk is right by mine) about the potential effect this might have on code size. He tells me that IRBuilder calls are intended to always be inlined and if we grow the implementation of these functions too much it could lead to noticeable bloat. It still seems to me like it might be worthwhile for the simplification it would allow in the front end, but I'm not really a front end guy so I definitely agree that we should get some input from front end people about what they want.

Craig's right about not wanting to bloat the inlinable functions in IRBuilder, but this is something that we can measure. In addition, we might be able to move the "slow path" (which create the constrained intrinsics) to the .cpp file (by manually outlining to a different function).

Just because FENV_ACCESS can be toggled on that granularity doesn't mean we have to represent it that way. We've previously agreed (I think) that if FENV_ACCESS is enabled anywhere in a function we will want to use the constrained intrinsics for all FP operations in the function, not just the ones in the scope where it was specified.

Yes, this is also my understanding. We can't soundly mix the two in the same function because we can't prevent the code motion within the function.

In D53157#1301991, @hfinkel wrote:

In D53157#1291977, @andrew.w.kaylor wrote:

Craig Topper also raised some concerns with me (in person, his desk is right by mine) about the potential effect this might have on code size. He tells me that IRBuilder calls are intended to always be inlined and if we grow the implementation of these functions too much it could lead to noticeable bloat. It still seems to me like it might be worthwhile for the simplification it would allow in the front end, but I'm not really a front end guy so I definitely agree that we should get some input from front end people about what they want.

Craig's right about not wanting to bloat the inlinable functions in IRBuilder, but this is something that we can measure. In addition, we might be able to move the "slow path" (which create the constrained intrinsics) to the .cpp file (by manually outlining to a different function).

Also, to be clear, adding a mode that automatically adds these when using the existing IRBuilder functions seems worth investigating. It seems like that would greatly simply the FE code.

In D53157#1301782, @cameron.mcinally wrote:

Fair enough. Just for conversation's sake, I was envisioning the FE support a little differently...

Add a command line option, say -ffpe_safe=[true|false], that toggles FPEnv-safe code generation for a whole file. A -ffpe_safe=true would be equivalent to FENV_ACCESS=ON at the beginning of a translation unit and we would capture it in some FE variable. That command line option can be overridden by the FENV_ACCESS pragma.

To bikeshed, I'd prefer we don't have an underscore in the name of the command-line flag.

If the FENV_ACCESS pragma is seen, carry a nop/metadata/something construct to toggle the FE variable during IR generation.

Then when generating IR, choose between constrained/unconstrained IR depending on the state of the FE variable.

#1 may seem superfluous, but there are certain benchmarks (e.g. SPEC CPU) that do not allow for code modifications. So, in order to run those with FPEnv-safe code, we'd need a non-invasive way to toggle FPEnv-safe code generation.

I'll also add that my interpretation of FENV_ACCESS in the C Standard [7.6.1] implies that we have to track the FENV_ACCESS pragma. That is, it's not okay to ignore FENV_ACCESS=OFF. When we transfer from FENV_ACCESS=OFF code to FENV_ACCESS=ON code, the rounding mode needs to be reset to the default setting. But, I'm open to hearing other interpretations. My copy of the Standard is old, so please correct me if this changed...
When execution passes from a part of the program translated with FENV_ACCESS ‘‘off’’ to a part translated with FENV_ACCESS ‘‘on’’, the state of the floating-point status flags is unspecified and the floating-point control modes have their default settings.

The rounding mode does need to be reset to its default setting when passing from FENV_ACCESS "on" to FENV_ACCESS "off", but that seems to be the user's responsibility. Are you saying that the implementation should reset it on that transition?

In D53157#1301992, @hfinkel wrote:

Just because FENV_ACCESS can be toggled on that granularity doesn't mean we have to represent it that way. We've previously agreed (I think) that if FENV_ACCESS is enabled anywhere in a function we will want to use the constrained intrinsics for all FP operations in the function, not just the ones in the scope where it was specified.

Yes, this is also my understanding. We can't soundly mix the two in the same function because we can't prevent the code motion within the function.

Ugh, I don't know. The C Standard's language is so vague.

In D53157#1301994, @hfinkel wrote:

The rounding mode does need to be reset to its default setting when passing from FENV_ACCESS "on" to FENV_ACCESS "off", but that seems to be the user's responsibility. Are you saying that the implementation should reset it on that transition?

Yes, that's how I was interpreting the Standard (today). The implementation should reset the control modes. The verbiage is murky at best though.

We touched on this in D43142 and I do realize that my opinion has flip-flopped since then. I previously believed that reseting the control modes was up to the user, but now I'm not so sure. I suppose that either way, as long as a fesetround(default_mode_constant) is seen with a FENV_ACCESS=OFF, we could use that as a barrier to prevent the problematic code motion.

Thinking aloud, maybe we should be working on redefining FENV_ACCESS in the C Standard? It's pretty clear that this section could use some work.

All that said, my understanding of $7.6.1 in the Standard is cloudy at best. If I'm the only one that feels this way, I'll drop my objections...

In D53157#1302159, @cameron.mcinally wrote:

In D53157#1301992, @hfinkel wrote:

Just because FENV_ACCESS can be toggled on that granularity doesn't mean we have to represent it that way. We've previously agreed (I think) that if FENV_ACCESS is enabled anywhere in a function we will want to use the constrained intrinsics for all FP operations in the function, not just the ones in the scope where it was specified.

Yes, this is also my understanding. We can't soundly mix the two in the same function because we can't prevent the code motion within the function.

Ugh, I don't know. The C Standard's language is so vague.

To be clear, I mean here that *we* can't mix the two soundly in the same function (where we generate some operations using the constrained instrinsics and some not) because of constraints imposed by our design. The standard may indeed allow for more precision. We'll need to be conservatively correct.

In D53157#1301994, @hfinkel wrote:

The rounding mode does need to be reset to its default setting when passing from FENV_ACCESS "on" to FENV_ACCESS "off", but that seems to be the user's responsibility. Are you saying that the implementation should reset it on that transition?

Yes, that's how I was interpreting the Standard (today). The implementation should reset the control modes. The verbiage is murky at best though.

I'll also point out that whether or not we insert a rounding-mode reset when the pragma changes state is orthogonal to whether we stop emitting constrained intrinsics at that point.

We touched on this in D43142 and I do realize that my opinion has flip-flopped since then. I previously believed that reseting the control modes was up to the user, but now I'm not so sure. I suppose that either way, as long as a fesetround(default_mode_constant) is seen with a FENV_ACCESS=OFF, we could use that as a barrier to prevent the problematic code motion.

Thinking aloud, maybe we should be working on redefining FENV_ACCESS in the C Standard? It's pretty clear that this section could use some work.

All that said, my understanding of $7.6.1 in the Standard is cloudy at best. If I'm the only one that feels this way, I'll drop my objections...

I don't read the standard that way, but the standard could certainly be more clear.

A couple of comments on the previous discussion:

Instead of defining a new command line option, I'd prefer to use the existing options -frounding-math and -ftrapping-math to set the default behavior of math operations w.r.t. rounding modes and exception status. (For compatibility with GCC if nothing else.)
I also read the C standard to imply that it is a requirement of user code to reset the status flag to default before switching back to FENV_ACCESS OFF. The fundamental characterization of the pragma says "The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes." There is no mention anywhere that using the pragma, on its own, will ever change those control modes. The last sentence about "... the floating-point control modes have their default setting", while indeed a bit ambiguous, is still consistent with an interpretation that it is the responsibility of user code to ensure that state, there is no explicit statement that the implementation will do so.
I agree that we need to be careful about intermixing "normal" floating-point operations with strict ones. However, I'm still not convinced that the pragma itself must be the scheduling barrier. It seems to me that the compiler already knows where FP control flags are ever modified directly (this can only happen with intrinsics or the like), so the main issue is whether function calls need to be considered. This is where the pragma comes in: in my mind, the primary difference between FENV_ACCESS ON and FENV_ACCESS OFF regions is that where the pragma is ON, function calls need to be considered (unless otherwise known for sure) to access FP control flags, while where the pragma is OFF, function calls can be considered to never touch FP control flags. So the real scheduling barrier would be any function call within a FENV_ACCESS ON region. Those would have to be marked by the front-end in the IR, presumably using a function attribute. The common LLVM optimizers would then need to respect that scheduling barrier (here is where we likely still have an open issue, there doesn't appear to be any way to express that at the IR level for regular floating-point operations ...), and likewise the back-ends (but that looks straightforward: a back-end typically will model FP status as residing in a register or in a pseudo-memory slot, and those can simply be considered used/clobbered by function calls marked as within FENV_ACCESS ON regions).

In D53157#1302724, @uweigand wrote:

A couple of comments on the previous discussion:

Instead of defining a new command line option, I'd prefer to use the existing options -frounding-math and -ftrapping-math to set the default behavior of math operations w.r.t. rounding modes and exception status. (For compatibility with GCC if nothing else.)

I agree that it's preferable to re-use these existing options if possible. I have some concerns that -ftrapping-math has a partial implementation in place that doesn't seem to be well aligned with the way fast-math flags are handled, so it might require some work to have that working as expected without breaking existing users. In general though these seem like they should do what we need.

Regarding GCC compatibility, I notice that GCC defaults to trapping math being enabled and I don't think that's what we want with clang. It also seems to imply something more than I think we need for constrained handling. For example, the GCC documentation says that -fno-trapping-math "can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions" so it sounds like maybe it also implies (for GCC) something like LLVM's "afn" fast math flag.

So if we are going to use these options, I think we need to have a discussion about whether or not it's OK to diverge from GCC's interpretation of them.

In D53157#1302724, @uweigand wrote:

I also read the C standard to imply that it is a requirement of user code to reset the status flag to default before switching back to FENV_ACCESS OFF. The fundamental characterization of the pragma says "The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes." There is no mention anywhere that using the pragma, on its own, will ever change those control modes. The last sentence about "... the floating-point control modes have their default setting", while indeed a bit ambiguous, is still consistent with an interpretation that it is the responsibility of user code to ensure that state, there is no explicit statement that the implementation will do so.

I definitely agree with this interpretation of the standard. My understanding is that behavior is undefined if the user has not left the FP environment in the default state when transitioning to an FENV_ACCESS OFF region.

In D53157#1302724, @uweigand wrote:

I agree that we need to be careful about intermixing "normal" floating-point operations with strict ones. However, I'm still not convinced that the pragma itself must be the scheduling barrier. It seems to me that the compiler already knows where FP control flags are ever modified directly (this can only happen with intrinsics or the like), so the main issue is whether function calls need to be considered. This is where the pragma comes in: in my mind, the primary difference between FENV_ACCESS ON and FENV_ACCESS OFF regions is that where the pragma is ON, function calls need to be considered (unless otherwise known for sure) to access FP control flags, while where the pragma is OFF, function calls can be considered to never touch FP control flags. So the real scheduling barrier would be any function call within a FENV_ACCESS ON region. Those would have to be marked by the front-end in the IR, presumably using a function attribute. The common LLVM optimizers would then need to respect that scheduling barrier (here is where we likely still have an open issue, there doesn't appear to be any way to express that at the IR level for regular floating-point operations ...), and likewise the back-ends (but that looks straightforward: a back-end typically will model FP status as residing in a register or in a pseudo-memory slot, and those can simply be considered used/clobbered by function calls marked as within FENV_ACCESS ON regions).

I'm a bit confused by this. The constrained intrinsics will cause all calls to act as barriers to motion of the FP operations represented by the intrinsics (at least before instruction selection). So I'm not clear what you are saying is needed here.

In D53157#1302724, @uweigand wrote:

A couple of comments on the previous discussion:

Instead of defining a new command line option, I'd prefer to use the existing options -frounding-math and -ftrapping-math to set the default behavior of math operations w.r.t. rounding modes and exception status. (For compatibility with GCC if nothing else.)

I like this proposal.

I also read the C standard to imply that it is a requirement of user code to reset the status flag to default before switching back to FENV_ACCESS OFF. The fundamental characterization of the pragma says "The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes." There is no mention anywhere that using the pragma, on its own, will ever change those control modes. The last sentence about "... the floating-point control modes have their default setting", while indeed a bit ambiguous, is still consistent with an interpretation that it is the responsibility of user code to ensure that state, there is no explicit statement that the implementation will do so.

That's a fair interpretation. Andy mentioned:

In D53157#1303193, @andrew.w.kaylor wrote:

I definitely agree with this interpretation of the standard. My understanding is that behavior is undefined if the user has not left the FP environment in the default state when transitioning to an FENV_ACCESS OFF region.

If we all agree upon that, then we simply have to treat the functions that modify the FPEnv, e.g. fesetexcept(...), as barriers. That way it does not matter if a FENV_ACCESS=OFF function is translated with constrained intrinsics or not, since nothing can be scheduled around these barriers.

I agree that we need to be careful about intermixing "normal" floating-point operations with strict ones. However, I'm still not convinced that the pragma itself must be the scheduling barrier. It seems to me that the compiler already knows where FP control flags are ever modified directly (this can only happen with intrinsics or the like), so the main issue is whether function calls need to be considered. This is where the pragma comes in: in my mind, the primary difference between FENV_ACCESS ON and FENV_ACCESS OFF regions is that where the pragma is ON, function calls need to be considered (unless otherwise known for sure) to access FP control flags, while where the pragma is OFF, function calls can be considered to never touch FP control flags. So the real scheduling barrier would be any function call within a FENV_ACCESS ON region. Those would have to be marked by the front-end in the IR, presumably using a function attribute. The common LLVM optimizers would then need to respect that scheduling barrier (here is where we likely still have an open issue, there doesn't appear to be any way to express that at the IR level for regular floating-point operations ...), and likewise the back-ends (but that looks straightforward: a back-end typically will model FP status as residing in a register or in a pseudo-memory slot, and those can simply be considered used/clobbered by function calls marked as within FENV_ACCESS ON regions).

I'm not sure if I fully understand this, but it seems to be an acceptable solution to the problem.

As mentioned above, couldn't we make the helper functions that read/write the FPEnv the barriers? That seems like a simpler solution.

In D53157#1303398, @cameron.mcinally wrote:

If we all agree upon that, then we simply have to treat the functions that modify the FPEnv, e.g. fesetexcept(...), as barriers. That way it does not matter if a FENV_ACCESS=OFF function is translated with constrained intrinsics or not, since nothing can be scheduled around these barriers.

I thought we couldn't do barriers. No barriers means no way to prevent code motion and mixing of constrained with non-constrained FP. That was the reason for having all FP in a function be constrained if any of it was.

OK, let me try to expand on my point 3 above, which appears to have confused everybody :-)

First, let's distinguish two separate requirements: those on floating-point operations that explicitly run in regions with non-default control modes, and those on floating-point operations that run outside such regions. (Note that those regions are by definition a subset of the regions marked as FENV_ACCESS ON, but not necessarily coincide with them.)

For the first class of operations, there are various requirements to constrain optimization. All of those are handled in the current design by simply representing those operations via constrained intrinsics. This can be done simply by always using constrained intrinsics whenever we are inside any FENV_ACCESS ON region. (My comments above were not intended to refer to those at all.)

Now, for the second class of operations, most of these requirements do not apply. However, there is one critical requirement that still does apply: such operations may never be moved to inside a region with non-default control modes. (Where it makes a difference, the operation can in principle be even moved inside a FENV_ACCESS ON region, as long as the control modes actually have not yet changed from the default modes.)

As @andrew.w.kaylor mentioned above, to fulfil this requirement if suffices to prevent moving any such floating-point operation across any statement that modifies the control modes (or inspects the exception flags). There is only a small number of statements that do so directly (those would all be inline asm or platform-specific builtins), but the action can be hidden behind a function call as well. This applies in particular to the related C library calls with well-known names (which the compiler could detect), but in addition any random function call might indirectly contain such a statement (which the compiler will not know, in general). Since LLVM today will freely move normal floating-point operations across normal function calls, this is a problem that needs to be addressed.

Now, one way to do that would be to ensure that any floating-point operation outside FENV_ACCESS ON regions is also implemented via constrained intrinsics, as long as there is any FENV_ACCESS ON present in the function at all. As @cameron.mcinally mentioned above, that would prevent reordering (across any function call, even) and therefore solve the problem. However, I believe there were concerns whether we can reliably ensure that property in the presence of optimizations like inlining, in particular if LTO is also used. Even if this can be resolved, I believe there were additional concerns that this might overly constrain optimization and lead to suboptimal performance.

This finally gets to the intent of my point 3 above, where I was trying to see if there is any way we can do better, that is correctly handle floating-point operations outside FENV_ACCESS ON regions without having to turn them all into constrained intrinsics. One way might be to add some new code motion barrier in the IR optimizers that would prevent movement of floating-point operations across the location of any FENV_ACCESS pragma. But -- and this is I think what @kpn referered to above -- those locations don't really correspond to anything in the IR that could serve as such a barrier. My observation above was simply that if we want to go that way, it in fact isn't really necessary to put those barriers at exactly these places. Instead, we can put the barriers at those places that will actually modify (or inspect) the control words -- which are either special operations we can explicitly recognize, or general function calls -- but only those calls where the call site was originally inside a FENV_ACCESS ON region (and was marked by the front-end as such). This has the advantage that a function call site is a natural place to act as barrier -- in fact it already acts as such e.g. for global memory accesses. (It would also be easy to implement as barrier on the MI / back-end level.) In addition, this approach would not lead to any performance penalty outside of FENV_ACCESS ON regions at all.

But given that there is still infrastructure missing in the IR optimizers, I also think that at least in the first implementation, we probably should go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add some function attribute that prevent any cross-inlining of functions built with constrained intrinsics with functions built with regular floating-point operations.

In D53157#1303193, @andrew.w.kaylor wrote:

I agree that it's preferable to re-use these existing options if possible. I have some concerns that -ftrapping-math has a partial implementation in place that doesn't seem to be well aligned with the way fast-math flags are handled, so it might require some work to have that working as expected without breaking existing users. In general though these seem like they should do what we need.

So as far as I can see, the current implementation of -f(no-)trapping-math in LLVM is pretty much a stub. The driver passes the option on to CC1, and in addition it disables -menable-unsafe-fp-math and -massociative-math if -ftrapping-math is in effect (the latter may be a minor difference to GCC, but should be conservatively compatible). The compiler itself does nothing except mark all functions with the no-trapping-math attribute if -fno-trapping-math is in effect. LLVM itself completely ignores this attribute except on ARM, where it is used to set the ABI_FP_exceptions extended attribute. (But since LLVM doesn't really implement the flag in actual codegen, setting the attribute to indicate the property seems a bit odd ...)

Regarding GCC compatibility, I notice that GCC defaults to trapping math being enabled and I don't think that's what we want with clang. It also seems to imply something more than I think we need for constrained handling. For example, the GCC documentation says that -fno-trapping-math "can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions" so it sounds like maybe it also implies (for GCC) something like LLVM's "afn" fast math flag.

So if we are going to use these options, I think we need to have a discussion about whether or not it's OK to diverge from GCC's interpretation of them.

The GCC documentation says:

-fno-trapping-math

Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option requires that -fno-signaling-nans be in effect. Setting this option may allow faster code if one relies on ``non-stop'' IEEE arithmetic, for example.

This option should never be turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

The default is -ftrapping-math.

As far as I can tell, this does not imply any additional differences except that -fno-trapping-math may rearrange floating-point operations to eliminate or introduce traps or changes to exception flags. It is those changes themselves (not anything else) that can result in incorrect output for programs that depend on exact IEEE semantics, and therefore this option must not be enabled by default at any optimization level, as the second paragraph above states.

Note that in GCC, -fno-trapping-math is implied by -funsafe-math-optimizations, which in turn is implied by -ffast-math. Those options all are documented using the same "incorrect output for programs that depend on exact IEEE semantics" clause: -funsafe-math-optimization is the catch-all for all such effects, but it is separated into -fno-signed-zeros, -fno-trapping-math, -fassociative-math, and -freciprocal-math that cover the distinct aspects of just how IEEE semantics might get violated.

We can of course still have a separate discussion on what the default should be. But even if we choose a different default than GCC (as is already the case e.g. for the -ffp-contract option), I think it would be preferable for the two explicit options to behave in a compatible way.

In D53157#1304275, @kpn wrote:

In D53157#1303398, @cameron.mcinally wrote:

If we all agree upon that, then we simply have to treat the functions that modify the FPEnv, e.g. fesetexcept(...), as barriers. That way it does not matter if a FENV_ACCESS=OFF function is translated with constrained intrinsics or not, since nothing can be scheduled around these barriers.

I thought we couldn't do barriers. No barriers means no way to prevent code motion and mixing of constrained with non-constrained FP. That was the reason for having all FP in a function be constrained if any of it was.

I'd like to hear more about this. The fesetexcept(...) function and friends change the FPEnv state, which can change the semantics for the instructions that follow. They definitely have to act as a barrier.

In D53157#1304873, @cameron.mcinally wrote:

In D53157#1304275, @kpn wrote:

In D53157#1303398, @cameron.mcinally wrote:

If we all agree upon that, then we simply have to treat the functions that modify the FPEnv, e.g. fesetexcept(...), as barriers. That way it does not matter if a FENV_ACCESS=OFF function is translated with constrained intrinsics or not, since nothing can be scheduled around these barriers.

I thought we couldn't do barriers. No barriers means no way to prevent code motion and mixing of constrained with non-constrained FP. That was the reason for having all FP in a function be constrained if any of it was.

I'd like to hear more about this. The fesetexcept(...) function and friends change the FPEnv state, which can change the semantics for the instructions that follow. They definitely have to act as a barrier.

There's no sense in which we can have a code-motion barrier within a function that acts on the regular FP instructions. We can have a barrier for the constrained intrinsics. This is why we need, in this design, for any function that uses FENV_ACCESS=ON for any part of it, to always use the constrained instrinsics.

In D53157#1304347, @uweigand wrote:

OK, let me try to expand on my point 3 above, which appears to have confused everybody :-)

Ah, that makes more sense.

The problem I was missing is when a FENV_ACCESS=OFF operation, like a FDIV, is hoisted into a FENV_ACCESS=ON region. I see it now, but still think that forcing FENV_ACCESS=OFF operations into constrained intrinsics is a big hammer. If there is a way to add barriers around function calls in a FENV_ACCESS=ON region, that would be better.

Stepping way back, the *best* solution is to have the FPEnv implementation shut down unsafe optimizations on an individual basis. Perhaps we should be tagging functions that contain FENV_ACCESS=ON regions. That way unsafe optimization passes, e.g. hoisting, can query the tag to see if they should skip these functions.

But given that there is still infrastructure missing in the IR optimizers, I also think that at least in the first implementation, we probably should go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add some function attribute that prevent any cross-inlining of functions built with constrained intrinsics with functions built with regular floating-point operations.

Yes, I'll stop moaning about this. This is perfectly suitable for a first implementation.

In D53157#1304880, @hfinkel wrote:

In D53157#1304873, @cameron.mcinally wrote:

I'd like to hear more about this. The fesetexcept(...) function and friends change the FPEnv state, which can change the semantics for the instructions that follow. They definitely have to act as a barrier.

There's no sense in which we can have a code-motion barrier within a function that acts on the regular FP instructions. We can have a barrier for the constrained intrinsics. This is why we need, in this design, for any function that uses FENV_ACCESS=ON for any part of it, to always use the constrained instrinsics.

Thanks. Ulrich's description helped me see the problem.

In D53157#1304895, @cameron.mcinally wrote:

The problem I was missing is when a FENV_ACCESS=OFF operation, like a FDIV, is hoisted into a FENV_ACCESS=ON region. I see it now, but still think that forcing FENV_ACCESS=OFF operations into constrained intrinsics is a big hammer. If there is a way to add barriers around function calls in a FENV_ACCESS=ON region, that would be better.

It may be a big hammer, but I don't think we'll need to use it terribly often (that is, I don't expect the mixing of FENV_ACCESS regions within a function to be common -- I could be wrong). In any event, we've been clear that choosing to enable FP environment access will mean sacrificing performance in the near term. Once we've had time to teach all the relevant optimizations how to handle the constrained intrinsics the hammer will start to feel much smaller.

In D53157#1304895, @cameron.mcinally wrote:

Stepping way back, the *best* solution is to have the FPEnv implementation shut down unsafe optimizations on an individual basis. Perhaps we should be tagging functions that contain FENV_ACCESS=ON regions. That way unsafe optimization passes, e.g. hoisting, can query the tag to see if they should skip these functions.

One of the reasons we went with the approach we did is that it provides conservatively correct results by default. Optimizations don't need to be taught to leave the intrinsics alone. They do that as a default behavior when they don't recognize the intrinsic. If we had required optimizations to opt out as needed, we'd have problems chasing down all the places where that needed to happen. So we chose the opposite challenge of using the big hammer for initial implementation and then having to go looking for the places that needed to be taught how to interpret the intrinsics for cases when the optimization can still be performed.

I think at this point we're all on the same page in this regard. I just wanted to make sure we also understand why we're on that page. I still believe it was the correct choice.

In D53157#1304900, @andrew.w.kaylor wrote:

I think at this point we're all on the same page in this regard. I just wanted to make sure we also understand why we're on that page. I still believe it was the correct choice.

Yup, we're on the same page.

To explain where I'm coming from, our in-house compiler has tracked down *many* of the trap-unsafe transformations in LLVM and can toggle them as desired. The execution penalty for running trap-safe code is (much) less than 5% with that compiler. My goal for this project is to make Clang competitive with that compiler.

In D53157#1304347, @uweigand wrote:

But given that there is still infrastructure missing in the IR optimizers, I also think that at least in the first implementation, we probably should go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add some function attribute that prevent any cross-inlining of functions built with constrained intrinsics with functions built with regular floating-point operations.

Subtle. This last sentence seems to imply that cross-inlining should be allowed when there are no regular floating point operations in the function to be inlined. This makes sense due to, for example, the common use of tiny functions just to retrieve a value. Do I interpret your statement correctly?

In D53157#1305233, @kpn wrote:

In D53157#1304347, @uweigand wrote:

But given that there is still infrastructure missing in the IR optimizers, I also think that at least in the first implementation, we probably should go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add some function attribute that prevent any cross-inlining of functions built with constrained intrinsics with functions built with regular floating-point operations.

Subtle. This last sentence seems to imply that cross-inlining should be allowed when there are no regular floating point operations in the function to be inlined. This makes sense due to, for example, the common use of tiny functions just to retrieve a value. Do I interpret your statement correctly?

Sure, that would be an optimization. Another potential optimization would be to allow inlining, but have the inliner automatically replace regular floating-point operations with constrained intrinsics in the target routine ... But all of that is probably "stage2" after an initial implementation that just disables inlining.

Digressing a bit, but has anyone given thought to how this implementation will play out with libraries? When running with traps enabled, libraries must be compiled for trap-safety. E.g. optimizations on a lib's code could introduce a NaN or cause a trap that does not exist in the code itself.

I'm wondering if the speed penalty for trap-safe code will be a problem with the general Clang population that does not care about traps...

In D53157#1309743, @cameron.mcinally wrote:

Digressing a bit, but has anyone given thought to how this implementation will play out with libraries? When running with traps enabled, libraries must be compiled for trap-safety. E.g. optimizations on a lib's code could introduce a NaN or cause a trap that does not exist in the code itself.

In my reading of the standard text, there is no special provision for library code. This means that in general, calling any library function while running with nondefault trap settings is undefined behavior, unless the library was itself compiled with FENV_ACCESS ON. There does not even appear to be any requirement for the C standard library functions to have been compiled with FENV_ACCESS ON, as far as I can see ...

I've changed the patch so that calls to CreateFAdd() et al will give you constrained intrinsics if they are enabled. This required adding functions to enable/disable constrained-as-default plus calls to deal with the rounding mode and exception handling for contrained intrinsics.

I've left the independent create constrained functions to make it dead easy to move them out of the header and avoid inlining bloat.

I'm not dealing with any fast math flags or metadata currently. I'm not sure they make sense in a constrained context, or at least I haven't seen anyone argue they must be in place for an initial implementation.

In D53157#1316860, @uweigand wrote:

In my reading of the standard text, there is no special provision for library code. This means that in general, calling any library function while running with nondefault trap settings is undefined behavior, unless the library was itself compiled with FENV_ACCESS ON. There does not even appear to be any requirement for the C standard library functions to have been compiled with FENV_ACCESS ON, as far as I can see ...

I'll take this conversation to llvm-dev in the near future. I don't want to detract from Kevin's work.

Apologies for highjacking this Diff, Kevin.

cameron.mcinally added inline comments.Dec 3 2018, 12:02 PM

include/llvm/IR/IRBuilder.h
1244 ↗	(On Diff #176432)	I think you can forgo the `else {` in these functions since the if branch returns immediately.

Address review comment: Shrink the diff by eliminating unneeded use of else.

I emailed llvm-dev and cfe-dev on January 16th. The only responses I got back were of the don't care variety. For now it seems the constrained intrinsics will only be used by clang.

@rsmith, does the direction of this patch seem reasonable for clang?

Once Richard comments is there anything else needing us to delay here?

kpn added a reviewer: rsmith.May 10 2019, 10:18 AM

kpn added a subscriber: kbarton.May 10 2019, 12:33 PM

I think this looks straightforward, as long as people agree to have a separate CreateConstrained* version of these functions. I'm not qualified to weigh in on that as I don't know Clang at all and can't comment about the tradeoffs (although I think they have been well articulated in the discussion). From what I can see, that is the only thing blocking this from getting approved (unless there is something else I missed while reading the discussion).

The only other comment I have is there was some very good description about the intention here from @uweigand, @cameron.mcinally and @andrew.w.kaylor and @kpn. I think it would be good if that discussion is extracted from this review and put somewhere (perhaps the language ref) to indicate precisely what the semantics are we are trying to achieve with FENV_ACCESS ON/OFF.

include/llvm/IR/IRBuilder.h
228 ↗	(On Diff #176464)	This is a minor quibble, but the method is setIsConstrainedFP, while the member is IsFPConstrained. I'm not sure if that is intentionally different, or an oversight.

I'm waiting for a signoff from at least one front-end guy. I hope the mode setting that allows us to keep front ends from having to touch every use of the IRBuilder is something that works for clang. But I haven't heard anything from @rsmith or any other front-end person.

include/llvm/IR/IRBuilder.h
228 ↗	(On Diff #176464)	Yeah, that's an oversight. Fixed.

andrew.w.kaylor added inline comments.May 20 2019, 2:09 PM

include/llvm/IR/IRBuilder.h
234 ↗	(On Diff #176464)	I think it would be better to add some enumerated type to describe the exception semantic and rounding modes. The MDNode is an implementation detail and an awkward one for the front end to have to deal with.

mibintc added a subscriber: mibintc.May 21 2019, 7:40 AM

erichkeane added a subscriber: erichkeane.May 24 2019, 12:07 PM

erichkeane added inline comments.

include/llvm/IR/IRBuilder.h
110 ↗	(On Diff #176464)	Instead of adding these here, make these inline initializers on lines 100-102.
228 ↗	(On Diff #176464)	IS this fixed?
1249 ↗	(On Diff #176464)	Why set the last 2 to nullptr when you have defaults for these?
1259 ↗	(On Diff #176464)	The last 2 parameters are never actually used except in the test. Are these really important to have if they are never used in source?
1408 ↗	(On Diff #176464)	All these CreateConstrainedXXX should be distilled down to a single function that takes the intrinsic as a parameter.

mibintc mentioned this in D62730: [RFC] Alternate implementation of D53157 IRBuilder for Constrained FP using enumeration vs MDNode and add support for fp-model and fp-speculation language options.May 31 2019, 6:55 AM

mibintc mentioned this in D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior.May 31 2019, 7:01 AM

I've posted a small change to this patch here, https://reviews.llvm.org/D62730 and there's a patch to add clang fp options here, https://reviews.llvm.org/D62731

include/llvm/IR/IRBuilder.h
234 ↗	(On Diff #176464)	I posted a patch showing the rounding and exception information being passed as enumeration not MDNode. I've uploaded the patch here, https://reviews.llvm.org/D62730

rscottmanley added a subscriber: rscottmanley.May 31 2019, 7:31 AM

Address review comments.

I don't know which patch will proceed; but they weren't added already, this seems to be missing documentation changes.

I wanted to get the API straight before working on documentation.

include/llvm/IR/IRBuilder.h
228 ↗	(On Diff #176464)	In my working copy, yes.
234 ↗	(On Diff #176464)	It's a good idea. D62730 beat me to the punch.
1249 ↗	(On Diff #176464)	Eh, no really good reason.
1259 ↗	(On Diff #176464)	They're part of the intrinsics. It seems like there should be a way to emit them as part of the instruction when creating the instruction.
1408 ↗	(On Diff #176464)	That moves us further down the road towards having modes. I haven't seen a front-end guy sign off on it yet, but it seems like a good idea.

mibintc added inline comments.May 31 2019, 8:17 AM

include/llvm/IR/IRBuilder.h
234 ↗	(On Diff #176464)	BTW I created FPState.h because if I put the enumeration types in IRBuilder it dragged in too many include files in clang and required other random source changes for disambigutation. Anyway I don't need to comandeer your patch -- I don't want it to die from neglect. I can close mine?

Oh, this ticket is not going to die from neglect. It is true that D43515 is a higher priority, but I need this IRBuilder work done as well. My department head wanted it done by the end of _last_ year. It's not going to die.

How about I merge your changes into this ticket and we continue work over here? There is the issue of the documentation that lebedev.ri asked you to write. Can I talk you into putting that together and sending it to me <kevin.neal@sas.com>? I'll work on the documentation that I was asked to write. Between the two of us we should be in pretty good shape. Does that work for you?

I am still waiting for feedback from an actual consumer of the IRBuilder who will be using this new functionality. If someone clang-side could chime in on this ticket I'd very much appreciate it.

In D53157#1525311, @kpn wrote:

Oh, this ticket is not going to die from neglect. It is true that D43515 is a higher priority, but I need this IRBuilder work done as well. My department head wanted it done by the end of _last_ year. It's not going to die.

How about I merge your changes into this ticket and we continue work over here? There is the issue of the documentation that lebedev.ri asked you to write. Can I talk you into putting that together and sending it to me <kevin.neal@sas.com>? I'll work on the documentation that I was asked to write. Between the two of us we should be in pretty good shape. Does that work for you?

Yes I'll do that. Thanks.

I am still waiting for feedback from an actual consumer of the IRBuilder who will be using this new functionality. If someone clang-side could chime in on this ticket I'd very much appreciate it.

I wrote a clang patch that works with your IRBuilder modifications. I also checked with the Intel fortran team and they think this interface will be workable for them too. From Intel perspective it's +1.

Address the rest of the review comments, hopefully.

I've lifted from mibintc's D62730 the enums for the exception behavior and rounding modes. The rest of that set of changes is better left to a separate ticket after all.

Since I now have those requested enums I've changed the relevant public functions to take them instead of an MDNode*.

There is no standalone or comprehensive IRBuilder documentation that I could find. I hope the references in comments back to the LangRef are good enough. I used the exact language from it to make it easier to reference said doc.

simoll added a subscriber: simoll.Jun 17 2019, 5:29 AM

Ping.

This works for me, I redid my patches adding fp-model options to work with this newest changeset, (but I haven't yet uploaded them.)

kpn added a reviewer: rjmccall.Jun 17 2019, 8:14 AM

rjmccall added inline comments.Jun 17 2019, 9:50 AM

include/llvm/IR/IRBuilder.h
113 ↗	(On Diff #202987)	Should these have "FP" in the name somewhere? And are they really IRBuilder-specific concepts, as opposed to something that should be declared as part of the interface for working with these intrinsics? Also, I believe we can use explicit underlying types now in LLVM; it'd be nice if we didn't make `IRBuilder` unnecessarily large.
255 ↗	(On Diff #202987)	This seems unnecessary.
1138 ↗	(On Diff #202987)	Huh? You build an `MDNode` that wraps an `MDString` and then immediately extract the `MDString` from it and drop the `MDNode`? I think you should just have a function somewhere that returns the correct metadata string for a `ConstrainedRoundingKind`, and then the code here is much more obvious. Maybe this can be wherever you declare the enums. You can also have a function that goes the other way and returns an `Optional<ConstrainedRoundingKind>`. Function name should include `FP`. Same points apply to the next function.
228 ↗	(On Diff #176464)	Maybe this should be more explicit about what exactly it does? Specifically, it changes the behavior of the `CreateF<Op>` methods so that they use constrained intrinsics instead of the standard instructions.

kpn marked an inline comment as done.Jun 17 2019, 10:10 AM

kpn added inline comments.

include/llvm/IR/IRBuilder.h
113 ↗	(On Diff #202987)	Would it be better to use the RoundingMode and ExceptionBehavior enums in the ConstrainedFPIntrinsic class? These enums->strings here get turned back into those IntrinsicInst.h enums eventually anyway. But that means pulling in yet more headers in IRBuilder.h. I admit I'm not sure what you mean with your second paragraph. Is that a way of saying that, for example, the relevant IntrinsicInst.h enums should be used instead?

rjmccall added inline comments.Jun 17 2019, 11:00 AM

include/llvm/IR/IRBuilder.h
113 ↗	(On Diff #202987)	I think it's `IRBuilder.h`'s fate to pull in the majority of the headers in `IR`, but even if we want to avoid that, duplicating the enums seems like a step too far. If there's too much code in the header declaring that intrinsic, you should extract a smaller header that just declares the enums and their string conversions. The second paragraph is asking for these enums to be specifically constrained to `uint8_t` so that we aren't wasting a ton of memory everywhere we store them. `IRBuilder` doesn't have strong size constraints, but it'd still be nice if all these accumulated features didn't make it hundreds of bytes larger than it needs to be.

Address review comments.

Run the changes through clang-format.

include/llvm/IR/IRBuilder.h
1138 ↗	(On Diff #202987)	Well, when you put it that way, your proposed change does result in much simpler code. The resulting code isn't that much more than the switch. I hope it is now easy enough to read. These functions are private to the class and just exist to be helpers. I don't see a need to put the switch table into a different function, but I'm willing to be convinced.

rjmccall added inline comments.Jun 17 2019, 12:55 PM

include/llvm/IR/IRBuilder.h
1138 ↗	(On Diff #202987)	I feel like having the enum->string mapping available somewhere independent of `IRBuilder` will probably be useful to someone, and it nicely separates concerns in any case . You could just make it a static method on `ConstrainedFPIntrinsic` along with the reverse operation.

kpn marked an inline comment as done.Jun 17 2019, 1:06 PM

kpn added inline comments.

include/llvm/IR/IRBuilder.h
1138 ↗	(On Diff #202987)	Ok, will do.

Add static methods to convert between a StringRef and the enums for RoundingMode or ExceptionBehavior.

rjmccall added inline comments.Jun 18 2019, 10:28 AM

include/llvm/IR/IntrinsicInst.h
235 ↗	(On Diff #205371)	Is it okay that `ebUnspecified` and `ebInvalid` overlap here?
245 ↗	(On Diff #205371)	You should document the behavior of the `StrTo...` functions when the string is unexpected and the `...ToStr` functions when the enum is invalid/unspecified.

kpn marked an inline comment as done.Jun 18 2019, 10:54 AM

kpn added inline comments.

include/llvm/IR/IntrinsicInst.h
235 ↗	(On Diff #205371)	I can think of a couple of alternatives. If they don't overlap then we have to go back and sweep the source to make sure that ebUnspecified is always handled in all cases that currently handle ebInvalid. And in the future nobody is allowed to check in source that doesn't handle both. Or, we could drop ebUnspecified, but then ebInvalid would have a valid meaning to the IRBuilder interface. That looks like a bug even if it works properly. Generally, adding eb/rmUnspecified but having them overlap the invalid cases seems to me to be the best option.

rjmccall added inline comments.Jun 18 2019, 12:15 PM

include/llvm/IR/IntrinsicInst.h
235 ↗	(On Diff #205371)	What is the purpose of `ebInvalid`? Is this a valid argument to the intrinsic, or is it there so that `getExceptionBehavior()` has something to return if it doesn't recognize the argument? For the latter, maybe it would be better to just assert that the argument was one of the recognized possibilities; we don't generally expect LLVM to silently propagate ill-formed IR. Or if it's valid to not provide exception behavior to the intrinsic, maybe that's the same as `ebUnspecified`, or maybe the accessor should return `Optional<ExceptionBehavior>`.

Andy, can I get you to chime in?

I found out that some compilers, including SAS/C, will warn about enum overlaps like the one here under discussion. So I now believe John is correct and that eliminating that overlap is the right thing to do. Over the weekend I'll be thinking about whether or not to try and find an alternative, but I'll be out of cell phone range until late Sunday evening.

kpn added inline comments.Jun 26 2019, 11:20 AM

include/llvm/IR/IntrinsicInst.h
235 ↗	(On Diff #205371)	The current code is in ConstrainedFPIntrinsic::getExceptionBehavior(), which returns ebInvalid if the metadata is missing or somehow wrong (wrong type, invalid string, etc). This is then used by the IR verifier which fails with an Assert() if it gets ebInvalid. So we aren't propagating ill-formed IR. The intrinsic requires the metadata, and it requires the metadata be valid. But I don't want to clutter up IRBuilder's consumer's code. That's why I wanted the API to allow those arguments to not be given. One idea of mine is to have ebInvalid be valid when given to the IRBuilder, and the meaning would be "this value is unspecified". But that seems weird. Maybe there's another way.

rjmccall added inline comments.Jun 26 2019, 11:44 AM

include/llvm/IR/IntrinsicInst.h
235 ↗	(On Diff #205371)	The current code is in ConstrainedFPIntrinsic::getExceptionBehavior(), which returns ebInvalid if the metadata is missing or somehow wrong (wrong type, invalid string, etc). This is then used by the IR verifier which fails with an Assert() if it gets ebInvalid. So we aren't propagating ill-formed IR. Okay, in that case, I don't understand why this accessor shouldn't just assume that it's working on valid IR and not be able to return `ebInvalid`. The intrinsic requires the metadata, and it requires the metadata be valid. But I don't want to clutter up IRBuilder's consumer's code. That's why I wanted the API to allow those arguments to not be given. If you want the `IRBuilder` factories to be able to not take a value (if there's some defaulting scheme that's more complicated than you can express with a default argument), you can have it take an `Optional<Whatever>`, and then people can just pass in `None`; and then you don't need to muddy up the enum with a case that isn't really a valid alternative.

Address review comments. Eliminate ConstrainedFPIntrinsic's ebInvalid and rmInvalid enumeration values and replace them with use of the Optional<> class. Adjust the rest of the patch to take that into account. ConstrainedFPIntrinsic now supports string <--> enum conversions.

rjmccall added inline comments.Jul 1 2019, 11:24 AM

include/llvm/IR/IRBuilder.h
259 ↗	(On Diff #207378)	Okay, so what are the invariants here now? It looks like, in order to enable constrained floating point, I need to also have set the default exception behavior and rounding mode. That should at least be documented, but maybe a better approach would be to require these to be passed in when enabling constrained FP.
1324 ↗	(On Diff #207378)	`FPMD` is dropped in this case; I don't know if that's intentional.
1459 ↗	(On Diff #207378)	It looks like nothing in `IRBuilder` ever passes these arguments. Are you just anticipating that someone might want to call this directly?

kpn marked 3 inline comments as done.Jul 1 2019, 12:11 PM

kpn added inline comments.

include/llvm/IR/IRBuilder.h
259 ↗	(On Diff #207378)	The IRBuilder constructor sets the defaults to ebStrict and rmDynamic, but leaves strict mode off. So if you only turn on strict mode you get the strictest settings. This implies that it isn't possible to trigger this assertion. Which is true unless you have some form of memory overwrite or someone comes along later and puts in a bad IRBuilder change. The more compiler work I've done over the years the more sanity checks I've gotten into the habit of adding.
1324 ↗	(On Diff #207378)	You can see that I'm on the fence here. I'm not sure that mixing fast math with constrained math makes sense. So CreateConstrainedFPBinOp() can take an instruction for copying fast math flags, but it doesn't do anything with this instruction. And the FPMD is simply dropped in the non-*FMF() methods. If we do decide later to support mixing constrained and fast math then we won't have to change any APIs. Until then it takes the conservative route and drops the info.
1459 ↗	(On Diff #207378)	Correct. And I wrote tests for it in IRBuilderTest.cpp.

rjmccall added inline comments.Jul 1 2019, 1:45 PM

include/llvm/IR/IRBuilder.h
259 ↗	(On Diff #207378)	Okay, that scheme makes sense, but I think sanity-checking that there hasn't been an arbitrary memory smasher is a bit much. Using `Optional` when the value can't actually be missing just makes the code more confusing for readers and maintainers; please just leave these non-optional.
1324 ↗	(On Diff #207378)	Okay. That should probably be mentioned, at least in the documentation for `CreateConstrainedFPBinOp`. Should you make an overload of the latter which takes an `MDNode*` as the final argument, for parallelism/completeness?
1459 ↗	(On Diff #207378)	Okay. So people who want to pass these explicitly just can't use the convenience methods? That seems like a reasonable compromise.

kpn marked 2 inline comments as done.Jul 3 2019, 10:24 AM

kpn added inline comments.

include/llvm/IR/IRBuilder.h
1324 ↗	(On Diff #207378)	I made one function that takes both. Patch coming in a minute.

I was convinced in email that having the constrained intrinsics take fast math flags could be useful in some circumstances. The example that was given was that perhaps FMA contraction could be requested. So, when generating constrained intrinsics the fast math flags and the FP metadata is now handled exactly the same as in non-constrained mode. I've updated the documentation to warn about that when turning constrained mode on and off.

I was also a little worried about having CreateConstrainedFPBinOp() handle fast math flags differently from CreateBinOp(). Now they're the same.

Thanks, this looks good to me.

• wuzish added a subscriber: • wuzish.Jul 3 2019, 7:34 PM

This revision was not accepted when it landed; it landed in state Needs Review.Jul 8 2019, 9:20 AM

Closed by commit rL365339: Teach the IRBuilder about fadd and friends. (authored by kpn). · Explain Why

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2019, 9:20 AM

mibintc added a comment.Jul 25 2019, 9:13 AM

This comment was removed by mibintc.

pengfei added a subscriber: pengfei.Aug 15 2019, 10:23 PM

Revision Contents

Path

Size

llvm/

trunk/

include/

llvm/

IR/

IRBuilder.h

130 lines

IntrinsicInst.h

49 lines

lib/

IR/

IntrinsicInst.cpp

67 lines

Verifier.cpp

4 lines

unittests/

IR/

IRBuilderTest.cpp

64 lines

Diff 208440

llvm/trunk/include/llvm/IR/IRBuilder.h

Show All 25 Lines
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/AtomicOrdering.h"		#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/CBindingWrapping.h"		#include "llvm/Support/CBindingWrapping.h"
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
protected:		protected:
BasicBlock *BB;		BasicBlock *BB;
BasicBlock::iterator InsertPt;		BasicBlock::iterator InsertPt;
LLVMContext &Context;		LLVMContext &Context;

MDNode *DefaultFPMathTag;		MDNode *DefaultFPMathTag;
FastMathFlags FMF;		FastMathFlags FMF;

		bool IsFPConstrained;
		ConstrainedFPIntrinsic::ExceptionBehavior DefaultConstrainedExcept;
		ConstrainedFPIntrinsic::RoundingMode DefaultConstrainedRounding;

ArrayRef<OperandBundleDef> DefaultOperandBundles;		ArrayRef<OperandBundleDef> DefaultOperandBundles;

public:		public:
IRBuilderBase(LLVMContext &context, MDNode *FPMathTag = nullptr,		IRBuilderBase(LLVMContext &context, MDNode *FPMathTag = nullptr,
ArrayRef<OperandBundleDef> OpBundles = None)		ArrayRef<OperandBundleDef> OpBundles = None)
: Context(context), DefaultFPMathTag(FPMathTag),		: Context(context), DefaultFPMathTag(FPMathTag), IsFPConstrained(false),
		DefaultConstrainedExcept(ConstrainedFPIntrinsic::ebStrict),
		DefaultConstrainedRounding(ConstrainedFPIntrinsic::rmDynamic),
DefaultOperandBundles(OpBundles) {		DefaultOperandBundles(OpBundles) {
ClearInsertionPoint();		ClearInsertionPoint();
}		}

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Builder configuration methods		// Builder configuration methods
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	public:
void clearFastMathFlags() { FMF.clear(); }		void clearFastMathFlags() { FMF.clear(); }

/// Set the floating point math metadata to be used.		/// Set the floating point math metadata to be used.
void setDefaultFPMathTag(MDNode *FPMathTag) { DefaultFPMathTag = FPMathTag; }		void setDefaultFPMathTag(MDNode *FPMathTag) { DefaultFPMathTag = FPMathTag; }

/// Set the fast-math flags to be used with generated fp-math operators		/// Set the fast-math flags to be used with generated fp-math operators
void setFastMathFlags(FastMathFlags NewFMF) { FMF = NewFMF; }		void setFastMathFlags(FastMathFlags NewFMF) { FMF = NewFMF; }

		/// Enable/Disable use of constrained floating point math. When
		/// enabled the CreateF<op>() calls instead create constrained
		/// floating point intrinsic calls. Fast math flags are unaffected
		/// by this setting.
		void setIsFPConstrained(bool IsCon) { IsFPConstrained = IsCon; }

		/// Query for the use of constrained floating point math
		bool getIsFPConstrained() { return IsFPConstrained; }

		/// Set the exception handling to be used with constrained floating point
		void setDefaultConstrainedExcept(
		ConstrainedFPIntrinsic::ExceptionBehavior NewExcept) {
		DefaultConstrainedExcept = NewExcept;
		}

		/// Set the rounding mode handling to be used with constrained floating point
		void setDefaultConstrainedRounding(
		ConstrainedFPIntrinsic::RoundingMode NewRounding) {
		DefaultConstrainedRounding = NewRounding;
		}

		/// Get the exception handling used with constrained floating point
		ConstrainedFPIntrinsic::ExceptionBehavior getDefaultConstrainedExcept() {
		return DefaultConstrainedExcept;
		}

		/// Get the rounding mode handling used with constrained floating point
		ConstrainedFPIntrinsic::RoundingMode getDefaultConstrainedRounding() {
		return DefaultConstrainedRounding;
		}

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// RAII helpers.		// RAII helpers.
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

// RAII object that stores the current insertion point and restores it		// RAII object that stores the current insertion point and restores it
// when the object is destroyed. This includes the debug location.		// when the object is destroyed. This includes the debug location.
class InsertPointGuard {		class InsertPointGuard {
IRBuilderBase &Builder;		IRBuilderBase &Builder;
▲ Show 20 Lines • Show All 811 Lines • ▼ Show 20 Lines	private:

Value foldConstant(Instruction::BinaryOps Opc, Value L,		Value foldConstant(Instruction::BinaryOps Opc, Value L,
Value *R, const Twine &Name) const {		Value *R, const Twine &Name) const {
auto *LC = dyn_cast<Constant>(L);		auto *LC = dyn_cast<Constant>(L);
auto *RC = dyn_cast<Constant>(R);		auto *RC = dyn_cast<Constant>(R);
return (LC && RC) ? Insert(Folder.CreateBinOp(Opc, LC, RC), Name) : nullptr;		return (LC && RC) ? Insert(Folder.CreateBinOp(Opc, LC, RC), Name) : nullptr;
}		}

		Value *getConstrainedFPRounding(
		Optional<ConstrainedFPIntrinsic::RoundingMode> Rounding) {
		ConstrainedFPIntrinsic::RoundingMode UseRounding =
		DefaultConstrainedRounding;

		if (Rounding.hasValue())
		UseRounding = Rounding.getValue();

		Optional<StringRef> RoundingStr =
		ConstrainedFPIntrinsic::RoundingModeToStr(UseRounding);
		assert(RoundingStr.hasValue() && "Garbage strict rounding mode!");
		auto *RoundingMDS = MDString::get(Context, RoundingStr.getValue());

		return MetadataAsValue::get(Context, RoundingMDS);
		}

		Value *getConstrainedFPExcept(
		Optional<ConstrainedFPIntrinsic::ExceptionBehavior> Except) {
		ConstrainedFPIntrinsic::ExceptionBehavior UseExcept =
		DefaultConstrainedExcept;

		if (Except.hasValue())
		UseExcept = Except.getValue();

		Optional<StringRef> ExceptStr =
		ConstrainedFPIntrinsic::ExceptionBehaviorToStr(UseExcept);
		assert(ExceptStr.hasValue() && "Garbage strict exception behavior!");
		auto *ExceptMDS = MDString::get(Context, ExceptStr.getValue());

		return MetadataAsValue::get(Context, ExceptMDS);
		}

public:		public:
Value CreateAdd(Value LHS, Value *RHS, const Twine &Name = "",		Value CreateAdd(Value LHS, Value *RHS, const Twine &Name = "",
bool HasNUW = false, bool HasNSW = false) {		bool HasNUW = false, bool HasNSW = false) {
if (auto *LC = dyn_cast<Constant>(LHS))		if (auto *LC = dyn_cast<Constant>(LHS))
if (auto *RC = dyn_cast<Constant>(RHS))		if (auto *RC = dyn_cast<Constant>(RHS))
return Insert(Folder.CreateAdd(LC, RC, HasNUW, HasNSW), Name);		return Insert(Folder.CreateAdd(LC, RC, HasNUW, HasNSW), Name);
return CreateInsertNUWNSWBinOp(Instruction::Add, LHS, RHS, Name,		return CreateInsertNUWNSWBinOp(Instruction::Add, LHS, RHS, Name,
HasNUW, HasNSW);		HasNUW, HasNSW);
▲ Show 20 Lines • Show All 202 Lines • ▼ Show 20 Lines	public:
}		}

Value CreateXor(Value LHS, uint64_t RHS, const Twine &Name = "") {		Value CreateXor(Value LHS, uint64_t RHS, const Twine &Name = "") {
return CreateXor(LHS, ConstantInt::get(LHS->getType(), RHS), Name);		return CreateXor(LHS, ConstantInt::get(LHS->getType(), RHS), Name);
}		}

Value CreateFAdd(Value L, Value *R, const Twine &Name = "",		Value CreateFAdd(Value L, Value *R, const Twine &Name = "",
MDNode *FPMD = nullptr) {		MDNode *FPMD = nullptr) {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fadd,
		L, R, nullptr, Name, FPMD);

if (Value *V = foldConstant(Instruction::FAdd, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FAdd, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFAdd(L, R), FPMD, FMF);		Instruction *I = setFPAttrs(BinaryOperator::CreateFAdd(L, R), FPMD, FMF);
return Insert(I, Name);		return Insert(I, Name);
}		}

/// Copy fast-math-flags from an instruction rather than using the builder's		/// Copy fast-math-flags from an instruction rather than using the builder's
/// default FMF.		/// default FMF.
Value CreateFAddFMF(Value L, Value R, Instruction FMFSource,		Value CreateFAddFMF(Value L, Value R, Instruction FMFSource,
const Twine &Name = "") {		const Twine &Name = "") {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fadd,
		L, R, FMFSource, Name);

if (Value *V = foldConstant(Instruction::FAdd, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FAdd, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFAdd(L, R), nullptr,		Instruction *I = setFPAttrs(BinaryOperator::CreateFAdd(L, R), nullptr,
FMFSource->getFastMathFlags());		FMFSource->getFastMathFlags());
return Insert(I, Name);		return Insert(I, Name);
}		}

Value CreateFSub(Value L, Value *R, const Twine &Name = "",		Value CreateFSub(Value L, Value *R, const Twine &Name = "",
MDNode *FPMD = nullptr) {		MDNode *FPMD = nullptr) {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fsub,
		L, R, nullptr, Name, FPMD);

if (Value *V = foldConstant(Instruction::FSub, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FSub, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFSub(L, R), FPMD, FMF);		Instruction *I = setFPAttrs(BinaryOperator::CreateFSub(L, R), FPMD, FMF);
return Insert(I, Name);		return Insert(I, Name);
}		}

/// Copy fast-math-flags from an instruction rather than using the builder's		/// Copy fast-math-flags from an instruction rather than using the builder's
/// default FMF.		/// default FMF.
Value CreateFSubFMF(Value L, Value R, Instruction FMFSource,		Value CreateFSubFMF(Value L, Value R, Instruction FMFSource,
const Twine &Name = "") {		const Twine &Name = "") {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fsub,
		L, R, FMFSource, Name);

if (Value *V = foldConstant(Instruction::FSub, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FSub, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFSub(L, R), nullptr,		Instruction *I = setFPAttrs(BinaryOperator::CreateFSub(L, R), nullptr,
FMFSource->getFastMathFlags());		FMFSource->getFastMathFlags());
return Insert(I, Name);		return Insert(I, Name);
}		}

Value CreateFMul(Value L, Value *R, const Twine &Name = "",		Value CreateFMul(Value L, Value *R, const Twine &Name = "",
MDNode *FPMD = nullptr) {		MDNode *FPMD = nullptr) {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fmul,
		L, R, nullptr, Name, FPMD);

if (Value *V = foldConstant(Instruction::FMul, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FMul, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFMul(L, R), FPMD, FMF);		Instruction *I = setFPAttrs(BinaryOperator::CreateFMul(L, R), FPMD, FMF);
return Insert(I, Name);		return Insert(I, Name);
}		}

/// Copy fast-math-flags from an instruction rather than using the builder's		/// Copy fast-math-flags from an instruction rather than using the builder's
/// default FMF.		/// default FMF.
Value CreateFMulFMF(Value L, Value R, Instruction FMFSource,		Value CreateFMulFMF(Value L, Value R, Instruction FMFSource,
const Twine &Name = "") {		const Twine &Name = "") {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fmul,
		L, R, FMFSource, Name);

if (Value *V = foldConstant(Instruction::FMul, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FMul, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFMul(L, R), nullptr,		Instruction *I = setFPAttrs(BinaryOperator::CreateFMul(L, R), nullptr,
FMFSource->getFastMathFlags());		FMFSource->getFastMathFlags());
return Insert(I, Name);		return Insert(I, Name);
}		}

Value CreateFDiv(Value L, Value *R, const Twine &Name = "",		Value CreateFDiv(Value L, Value *R, const Twine &Name = "",
MDNode *FPMD = nullptr) {		MDNode *FPMD = nullptr) {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fdiv,
		L, R, nullptr, Name, FPMD);

if (Value *V = foldConstant(Instruction::FDiv, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FDiv, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFDiv(L, R), FPMD, FMF);		Instruction *I = setFPAttrs(BinaryOperator::CreateFDiv(L, R), FPMD, FMF);
return Insert(I, Name);		return Insert(I, Name);
}		}

/// Copy fast-math-flags from an instruction rather than using the builder's		/// Copy fast-math-flags from an instruction rather than using the builder's
/// default FMF.		/// default FMF.
Value CreateFDivFMF(Value L, Value R, Instruction FMFSource,		Value CreateFDivFMF(Value L, Value R, Instruction FMFSource,
const Twine &Name = "") {		const Twine &Name = "") {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fdiv,
		L, R, FMFSource, Name);

if (Value *V = foldConstant(Instruction::FDiv, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FDiv, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFDiv(L, R), nullptr,		Instruction *I = setFPAttrs(BinaryOperator::CreateFDiv(L, R), nullptr,
FMFSource->getFastMathFlags());		FMFSource->getFastMathFlags());
return Insert(I, Name);		return Insert(I, Name);
}		}

Value CreateFRem(Value L, Value *R, const Twine &Name = "",		Value CreateFRem(Value L, Value *R, const Twine &Name = "",
MDNode *FPMD = nullptr) {		MDNode *FPMD = nullptr) {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_frem,
		L, R, nullptr, Name, FPMD);

if (Value *V = foldConstant(Instruction::FRem, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FRem, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFRem(L, R), FPMD, FMF);		Instruction *I = setFPAttrs(BinaryOperator::CreateFRem(L, R), FPMD, FMF);
return Insert(I, Name);		return Insert(I, Name);
}		}

/// Copy fast-math-flags from an instruction rather than using the builder's		/// Copy fast-math-flags from an instruction rather than using the builder's
/// default FMF.		/// default FMF.
Value CreateFRemFMF(Value L, Value R, Instruction FMFSource,		Value CreateFRemFMF(Value L, Value R, Instruction FMFSource,
const Twine &Name = "") {		const Twine &Name = "") {
		if (IsFPConstrained)
		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_frem,
		L, R, FMFSource, Name);

if (Value *V = foldConstant(Instruction::FRem, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FRem, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFRem(L, R), nullptr,		Instruction *I = setFPAttrs(BinaryOperator::CreateFRem(L, R), nullptr,
FMFSource->getFastMathFlags());		FMFSource->getFastMathFlags());
return Insert(I, Name);		return Insert(I, Name);
}		}

Value *CreateBinOp(Instruction::BinaryOps Opc,		Value *CreateBinOp(Instruction::BinaryOps Opc,
Value LHS, Value RHS, const Twine &Name = "",		Value LHS, Value RHS, const Twine &Name = "",
MDNode *FPMathTag = nullptr) {		MDNode *FPMathTag = nullptr) {
if (Value *V = foldConstant(Opc, LHS, RHS, Name)) return V;		if (Value *V = foldConstant(Opc, LHS, RHS, Name)) return V;
Instruction *BinOp = BinaryOperator::Create(Opc, LHS, RHS);		Instruction *BinOp = BinaryOperator::Create(Opc, LHS, RHS);
if (isa<FPMathOperator>(BinOp))		if (isa<FPMathOperator>(BinOp))
BinOp = setFPAttrs(BinOp, FPMathTag, FMF);		BinOp = setFPAttrs(BinOp, FPMathTag, FMF);
return Insert(BinOp, Name);		return Insert(BinOp, Name);
}		}

		CallInst *CreateConstrainedFPBinOp(
		Intrinsic::ID ID, Value L, Value R, Instruction *FMFSource = nullptr,
		const Twine &Name = "", MDNode *FPMathTag = nullptr,
		Optional<ConstrainedFPIntrinsic::RoundingMode> Rounding = None,
		Optional<ConstrainedFPIntrinsic::ExceptionBehavior> Except = None) {
		Value *RoundingV = getConstrainedFPRounding(Rounding);
		Value *ExceptV = getConstrainedFPExcept(Except);

		FastMathFlags UseFMF = FMF;
		if (FMFSource)
		UseFMF = FMFSource->getFastMathFlags();

		CallInst *C = CreateIntrinsic(ID, {L->getType()},
		{L, R, RoundingV, ExceptV}, nullptr, Name);
		return cast<CallInst>(setFPAttrs(C, FPMathTag, UseFMF));
		}

Value CreateNeg(Value V, const Twine &Name = "",		Value CreateNeg(Value V, const Twine &Name = "",
bool HasNUW = false, bool HasNSW = false) {		bool HasNUW = false, bool HasNSW = false) {
if (auto *VC = dyn_cast<Constant>(V))		if (auto *VC = dyn_cast<Constant>(V))
return Insert(Folder.CreateNeg(VC, HasNUW, HasNSW), Name);		return Insert(Folder.CreateNeg(VC, HasNUW, HasNSW), Name);
BinaryOperator *BO = Insert(BinaryOperator::CreateNeg(V), Name);		BinaryOperator *BO = Insert(BinaryOperator::CreateNeg(V), Name);
if (HasNUW) BO->setHasNoUnsignedWrap();		if (HasNUW) BO->setHasNoUnsignedWrap();
if (HasNSW) BO->setHasNoSignedWrap();		if (HasNSW) BO->setHasNoSignedWrap();
return BO;		return BO;
▲ Show 20 Lines • Show All 1,059 Lines • Show Last 20 Lines

llvm/trunk/include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 202 Lines • ▼ Show 20 Lines	static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
/// @}		/// @}
};		};

/// This is the common base class for constrained floating point intrinsics.		/// This is the common base class for constrained floating point intrinsics.
class ConstrainedFPIntrinsic : public IntrinsicInst {		class ConstrainedFPIntrinsic : public IntrinsicInst {
public:		public:
enum RoundingMode {		/// Specifies the rounding mode to be assumed. This is only used when
rmInvalid,		/// when constrained floating point is enabled. See the LLVM Language
rmDynamic,		/// Reference Manual for details.
rmToNearest,		enum RoundingMode : uint8_t {
rmDownward,		rmDynamic, ///< This corresponds to "fpround.dynamic".
rmUpward,		rmToNearest, ///< This corresponds to "fpround.tonearest".
rmTowardZero		rmDownward, ///< This corresponds to "fpround.downward".
};		rmUpward, ///< This corresponds to "fpround.upward".
		rmTowardZero ///< This corresponds to "fpround.tozero".
enum ExceptionBehavior {		};
ebInvalid,
ebIgnore,		/// Specifies the required exception behavior. This is only used when
ebMayTrap,		/// when constrained floating point is used. See the LLVM Language
ebStrict		/// Reference Manual for details.
		enum ExceptionBehavior : uint8_t {
		ebIgnore, ///< This corresponds to "fpexcept.ignore".
		ebMayTrap, ///< This corresponds to "fpexcept.maytrap".
		ebStrict ///< This corresponds to "fpexcept.strict".
};		};

bool isUnaryOp() const;		bool isUnaryOp() const;
bool isTernaryOp() const;		bool isTernaryOp() const;
RoundingMode getRoundingMode() const;		Optional<RoundingMode> getRoundingMode() const;
ExceptionBehavior getExceptionBehavior() const;		Optional<ExceptionBehavior> getExceptionBehavior() const;

		/// Returns a valid RoundingMode enumerator when given a string
		/// that is valid as input in constrained intrinsic rounding mode
		/// metadata.
		static Optional<RoundingMode> StrToRoundingMode(StringRef);

		/// For any RoundingMode enumerator, returns a string valid as input in
		/// constrained intrinsic rounding mode metadata.
		static Optional<StringRef> RoundingModeToStr(RoundingMode);

		/// Returns a valid ExceptionBehavior enumerator when given a string
		/// valid as input in constrained intrinsic exception behavior metadata.
		static Optional<ExceptionBehavior> StrToExceptionBehavior(StringRef);

		/// For any ExceptionBehavior enumerator, returns a string valid as
		/// input in constrained intrinsic exception behavior metadata.
		static Optional<StringRef> ExceptionBehaviorToStr(ExceptionBehavior);

// Methods for support type inquiry through isa, cast, and dyn_cast:		// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
switch (I->getIntrinsicID()) {		switch (I->getIntrinsicID()) {
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
▲ Show 20 Lines • Show All 627 Lines • Show Last 20 Lines

llvm/trunk/lib/IR/IntrinsicInst.cpp

Show First 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	Value *InstrProfIncrementInst::getStep() const {
if (InstrProfIncrementInstStep::classof(this)) {		if (InstrProfIncrementInstStep::classof(this)) {
return const_cast<Value *>(getArgOperand(4));		return const_cast<Value *>(getArgOperand(4));
}		}
const Module *M = getModule();		const Module *M = getModule();
LLVMContext &Context = M->getContext();		LLVMContext &Context = M->getContext();
return ConstantInt::get(Type::getInt64Ty(Context), 1);		return ConstantInt::get(Type::getInt64Ty(Context), 1);
}		}

ConstrainedFPIntrinsic::RoundingMode		Optional<ConstrainedFPIntrinsic::RoundingMode>
ConstrainedFPIntrinsic::getRoundingMode() const {		ConstrainedFPIntrinsic::getRoundingMode() const {
unsigned NumOperands = getNumArgOperands();		unsigned NumOperands = getNumArgOperands();
Metadata *MD =		Metadata *MD =
dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 2))->getMetadata();		dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 2))->getMetadata();
if (!MD \|\| !isa<MDString>(MD))		if (!MD \|\| !isa<MDString>(MD))
return rmInvalid;		return None;
StringRef RoundingArg = cast<MDString>(MD)->getString();		return StrToRoundingMode(cast<MDString>(MD)->getString());
		}

		Optional<ConstrainedFPIntrinsic::RoundingMode>
		ConstrainedFPIntrinsic::StrToRoundingMode(StringRef RoundingArg) {
// For dynamic rounding mode, we use round to nearest but we will set the		// For dynamic rounding mode, we use round to nearest but we will set the
// 'exact' SDNodeFlag so that the value will not be rounded.		// 'exact' SDNodeFlag so that the value will not be rounded.
return StringSwitch<RoundingMode>(RoundingArg)		return StringSwitch<Optional<RoundingMode>>(RoundingArg)
.Case("round.dynamic", rmDynamic)		.Case("round.dynamic", rmDynamic)
.Case("round.tonearest", rmToNearest)		.Case("round.tonearest", rmToNearest)
.Case("round.downward", rmDownward)		.Case("round.downward", rmDownward)
.Case("round.upward", rmUpward)		.Case("round.upward", rmUpward)
.Case("round.towardzero", rmTowardZero)		.Case("round.towardzero", rmTowardZero)
.Default(rmInvalid);		.Default(None);
}		}

ConstrainedFPIntrinsic::ExceptionBehavior		Optional<StringRef>
		ConstrainedFPIntrinsic::RoundingModeToStr(RoundingMode UseRounding) {
		Optional<StringRef> RoundingStr = None;
		switch (UseRounding) {
		case ConstrainedFPIntrinsic::rmDynamic:
		RoundingStr = "round.dynamic";
		break;
		case ConstrainedFPIntrinsic::rmToNearest:
		RoundingStr = "round.tonearest";
		break;
		case ConstrainedFPIntrinsic::rmDownward:
		RoundingStr = "round.downward";
		break;
		case ConstrainedFPIntrinsic::rmUpward:
		RoundingStr = "round.upward";
		break;
		case ConstrainedFPIntrinsic::rmTowardZero:
		RoundingStr = "round.tozero";
		break;
		}
		return RoundingStr;
		}

		Optional<ConstrainedFPIntrinsic::ExceptionBehavior>
ConstrainedFPIntrinsic::getExceptionBehavior() const {		ConstrainedFPIntrinsic::getExceptionBehavior() const {
unsigned NumOperands = getNumArgOperands();		unsigned NumOperands = getNumArgOperands();
Metadata *MD =		Metadata *MD =
dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 1))->getMetadata();		dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 1))->getMetadata();
if (!MD \|\| !isa<MDString>(MD))		if (!MD \|\| !isa<MDString>(MD))
return ebInvalid;		return None;
StringRef ExceptionArg = cast<MDString>(MD)->getString();		return StrToExceptionBehavior(cast<MDString>(MD)->getString());
return StringSwitch<ExceptionBehavior>(ExceptionArg)		}

		Optional<ConstrainedFPIntrinsic::ExceptionBehavior>
		ConstrainedFPIntrinsic::StrToExceptionBehavior(StringRef ExceptionArg) {
		return StringSwitch<Optional<ExceptionBehavior>>(ExceptionArg)
.Case("fpexcept.ignore", ebIgnore)		.Case("fpexcept.ignore", ebIgnore)
.Case("fpexcept.maytrap", ebMayTrap)		.Case("fpexcept.maytrap", ebMayTrap)
.Case("fpexcept.strict", ebStrict)		.Case("fpexcept.strict", ebStrict)
.Default(ebInvalid);		.Default(None);
		}

		Optional<StringRef>
		ConstrainedFPIntrinsic::ExceptionBehaviorToStr(ExceptionBehavior UseExcept) {
		Optional<StringRef> ExceptStr = None;
		switch (UseExcept) {
		case ConstrainedFPIntrinsic::ebStrict:
		ExceptStr = "fpexcept.strict";
		break;
		case ConstrainedFPIntrinsic::ebIgnore:
		ExceptStr = "fpexcept.ignore";
		break;
		case ConstrainedFPIntrinsic::ebMayTrap:
		ExceptStr = "fpexcept.maytrap";
		break;
		}
		return ExceptStr;
}		}

bool ConstrainedFPIntrinsic::isUnaryOp() const {		bool ConstrainedFPIntrinsic::isUnaryOp() const {
switch (getIntrinsicID()) {		switch (getIntrinsicID()) {
default:		default:
return false;		return false;
case Intrinsic::experimental_constrained_fptrunc:		case Intrinsic::experimental_constrained_fptrunc:
case Intrinsic::experimental_constrained_fpext:		case Intrinsic::experimental_constrained_fpext:
▲ Show 20 Lines • Show All 66 Lines • Show Last 20 Lines

llvm/trunk/lib/IR/Verifier.cpp

Show First 20 Lines • Show All 4,770 Lines • ▼ Show 20 Lines	void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) {
}		}

// If a non-metadata argument is passed in a metadata slot then the		// If a non-metadata argument is passed in a metadata slot then the
// error will be caught earlier when the incorrect argument doesn't		// error will be caught earlier when the incorrect argument doesn't
// match the specification in the intrinsic call table. Thus, no		// match the specification in the intrinsic call table. Thus, no
// argument type check is needed here.		// argument type check is needed here.

if (HasExceptionMD) {		if (HasExceptionMD) {
Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid,		Assert(FPI.getExceptionBehavior().hasValue(),
"invalid exception behavior argument", &FPI);		"invalid exception behavior argument", &FPI);
}		}
if (HasRoundingMD) {		if (HasRoundingMD) {
Assert(FPI.getRoundingMode() != ConstrainedFPIntrinsic::rmInvalid,		Assert(FPI.getRoundingMode().hasValue(),
"invalid rounding mode argument", &FPI);		"invalid rounding mode argument", &FPI);
}		}
}		}

void Verifier::visitDbgIntrinsic(StringRef Kind, DbgVariableIntrinsic &DII) {		void Verifier::visitDbgIntrinsic(StringRef Kind, DbgVariableIntrinsic &DII) {
auto *MD = cast<MetadataAsValue>(DII.getArgOperand(0))->getMetadata();		auto *MD = cast<MetadataAsValue>(DII.getArgOperand(0))->getMetadata();
AssertDI(isa<ValueAsMetadata>(MD) \|\|		AssertDI(isa<ValueAsMetadata>(MD) \|\|
(isa<MDNode>(MD) && !cast<MDNode>(MD)->getNumOperands()),		(isa<MDNode>(MD) && !cast<MDNode>(MD)->getNumOperands()),
▲ Show 20 Lines • Show All 643 Lines • Show Last 20 Lines

llvm/trunk/unittests/IR/IRBuilderTest.cpp

Show First 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	TEST_F(IRBuilderTest, Intrinsics) {

Call = Builder.CreateIntrinsic(Intrinsic::fma, {V->getType()}, {V, V, V}, I);		Call = Builder.CreateIntrinsic(Intrinsic::fma, {V->getType()}, {V, V, V}, I);
II = cast<IntrinsicInst>(Call);		II = cast<IntrinsicInst>(Call);
EXPECT_EQ(II->getIntrinsicID(), Intrinsic::fma);		EXPECT_EQ(II->getIntrinsicID(), Intrinsic::fma);
EXPECT_TRUE(II->hasNoInfs());		EXPECT_TRUE(II->hasNoInfs());
EXPECT_FALSE(II->hasNoNaNs());		EXPECT_FALSE(II->hasNoNaNs());
}		}

		TEST_F(IRBuilderTest, ConstrainedFP) {
		IRBuilder<> Builder(BB);
		Value *V;
		CallInst *Call;
		IntrinsicInst *II;

		V = Builder.CreateLoad(GV);

		// See if we get constrained intrinsics instead of non-constrained
		// instructions.
		Builder.setIsFPConstrained(true);

		V = Builder.CreateFAdd(V, V);
		ASSERT_TRUE(isa<IntrinsicInst>(V));
		II = cast<IntrinsicInst>(V);
		EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fadd);

		V = Builder.CreateFSub(V, V);
		ASSERT_TRUE(isa<IntrinsicInst>(V));
		II = cast<IntrinsicInst>(V);
		EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fsub);

		V = Builder.CreateFMul(V, V);
		ASSERT_TRUE(isa<IntrinsicInst>(V));
		II = cast<IntrinsicInst>(V);
		EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fmul);

		V = Builder.CreateFDiv(V, V);
		ASSERT_TRUE(isa<IntrinsicInst>(V));
		II = cast<IntrinsicInst>(V);
		EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fdiv);

		V = Builder.CreateFRem(V, V);
		ASSERT_TRUE(isa<IntrinsicInst>(V));
		II = cast<IntrinsicInst>(V);
		EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_frem);

		// Verify the codepaths for setting and overriding the default metadata.
		V = Builder.CreateFAdd(V, V);
		ASSERT_TRUE(isa<ConstrainedFPIntrinsic>(V));
		auto *CII = cast<ConstrainedFPIntrinsic>(V);
		ASSERT_TRUE(CII->getExceptionBehavior() == ConstrainedFPIntrinsic::ebStrict);
		ASSERT_TRUE(CII->getRoundingMode() == ConstrainedFPIntrinsic::rmDynamic);

		Builder.setDefaultConstrainedExcept(ConstrainedFPIntrinsic::ebIgnore);
		Builder.setDefaultConstrainedRounding(ConstrainedFPIntrinsic::rmUpward);
		V = Builder.CreateFAdd(V, V);
		CII = cast<ConstrainedFPIntrinsic>(V);
		ASSERT_TRUE(CII->getExceptionBehavior() == ConstrainedFPIntrinsic::ebIgnore);
		ASSERT_TRUE(CII->getRoundingMode() == ConstrainedFPIntrinsic::rmUpward);

		// Now override the defaults.
		Call = Builder.CreateConstrainedFPBinOp(
		Intrinsic::experimental_constrained_fadd, V, V, nullptr, "", nullptr,
		ConstrainedFPIntrinsic::rmDownward, ConstrainedFPIntrinsic::ebMayTrap);
		CII = cast<ConstrainedFPIntrinsic>(Call);
		EXPECT_EQ(CII->getIntrinsicID(), Intrinsic::experimental_constrained_fadd);
		ASSERT_TRUE(CII->getExceptionBehavior() == ConstrainedFPIntrinsic::ebMayTrap);
		ASSERT_TRUE(CII->getRoundingMode() == ConstrainedFPIntrinsic::rmDownward);

		Builder.CreateRetVoid();
		EXPECT_FALSE(verifyModule(*M));
		}

TEST_F(IRBuilderTest, Lifetime) {		TEST_F(IRBuilderTest, Lifetime) {
IRBuilder<> Builder(BB);		IRBuilder<> Builder(BB);
AllocaInst *Var1 = Builder.CreateAlloca(Builder.getInt8Ty());		AllocaInst *Var1 = Builder.CreateAlloca(Builder.getInt8Ty());
AllocaInst *Var2 = Builder.CreateAlloca(Builder.getInt32Ty());		AllocaInst *Var2 = Builder.CreateAlloca(Builder.getInt32Ty());
AllocaInst *Var3 = Builder.CreateAlloca(Builder.getInt8Ty(),		AllocaInst *Var3 = Builder.CreateAlloca(Builder.getInt8Ty(),
Builder.getInt32(123));		Builder.getInt32(123));

CallInst *Start1 = Builder.CreateLifetimeStart(Var1);		CallInst *Start1 = Builder.CreateLifetimeStart(Var1);
▲ Show 20 Lines • Show All 615 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

Teach the IRBuilder about constrained fadd and friendsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 208440

llvm/trunk/include/llvm/IR/IRBuilder.h

llvm/trunk/include/llvm/IR/IntrinsicInst.h

llvm/trunk/lib/IR/IntrinsicInst.cpp

llvm/trunk/lib/IR/Verifier.cpp

llvm/trunk/unittests/IR/IRBuilderTest.cpp

Teach the IRBuilder about constrained fadd and friends
ClosedPublic