The IRBuilder has calls to create floating point instructions like fadd. It does not have calls to create constrained versions of them. This patch adds support for constrained creation of fadd, fsub, fmul, fdiv, and frem.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Should the builder's FastMathFlags setting apply to these calls? Is it worth adding an optional FMF parameter to these now, so we don't end up with duplicated API like we have for the regular FP binops?
In the interest of having these new calls be like the non-strict ones, I think this does make some sense.
Would it be enough to model after the existing FP binops and take an Instruction argument to copy the flags from?
Yes, that's the typical usage - copy the flags over from some existing instruction. I don't remember why the existing code has 2 versions of each. If the optional parameter isn't there, then we should just use the buillder's default FMF on the new call?
Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel
I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?
That said, the constrained intrinsics aren't a good long term solution for the FPEnv project. IMO, we'll hit a wall once we try to optimize them. Not sure if this patch helps with that problem though. We really need to toggle optimizations that can clobber side-effects on a case-by-case basis.
The IRBuilder is used in clang. Once the use of the pragma is wired down to clang's codegen we need a way to emit the constrained intrinsics. It makes for very readable code to change a call to Builder.CreateFAdd() so it conditionally calls Builder.CreateConstrainedFAdd() instead. And if CreateConstrainedFAdd() returns something other than a call to an intrinsic then clang doesn't care.
Rather than having separate methods to create the constrained versions of the intrinsics, why not have a way to set the constrained state in the IR builder and have the regular CreateFAdd et. al. functions use that to automatically create the constrained intrinsic versions? This would correspond to how fast math flags are handled.
Well, yes and no. There is the default fast math flags that get picked up by, eg, CreateFAdd(). There's also a default MDNode. But CreateFAdd() and the others let you specify a different MDNode, and the Create*FMF() functions let you specify different fast math flags. So the existing practice isn't strongly pointing in either direction.
Having CreateConstrained*() functions means front ends can key off its own flag and not have to notify the IRBuilder when that flag goes out of scope. Then again, having a IRBuilder-instance-global flag means probably fewer places in the front end have to be touched, but it would have to notify the IRBuilder when its own strictness flag goes out of scope. So I don't know what is better.
I suspect you are correct and we should just stick with the existing calls. I don't expect anyone would need to switch between constrained and regular floating point at an instruction level of granularity. But we need to do what is best for users of the IRBuilder, and I'm not 100% sold either way.
Can we get some input from someone who maintains front ends?
Craig Topper also raised some concerns with me (in person, his desk is right by mine) about the potential effect this might have on code size. He tells me that IRBuilder calls are intended to always be inlined and if we grow the implementation of these functions too much it could lead to noticeable bloat. It still seems to me like it might be worthwhile for the simplification it would allow in the front end, but I'm not really a front end guy so I definitely agree that we should get some input from front end people about what they want.
The Standard allows for this (to some degree). E.g. FENV_ACCESS can be toggled between nested compound statements.
Also, some AVX512 hardware instructions have explicit SAE and RM operands.
Yes, but I'm not worried about what sort of code is input to a compiler. I'm worried about what is right for the compiler itself. Also, since we're in LLVM here, we have to think about all languages that use LLVM as a backend.
Just because FENV_ACCESS can be toggled on that granularity doesn't mean we have to represent it that way. We've previously agreed (I think) that if FENV_ACCESS is enabled anywhere in a function we will want to use the constrained intrinsics for all FP operations in the function, not just the ones in the scope where it was specified. I think this works because FENV_ACCESS ON needs to be respected (i.e. we need to assume that the user may change the FP environment) but FENV_ACCESS OFF doesn't need to be respected (i.e. we are not required to assume that the user will not change the environment). For instance, the spec does give us liberty to assume the default rounding mode in FENV_ACCESS OFF regions, but if we make no assumptions about the rounding mode that will be conservatively correct.
As for instructions that take an explicit rounding mode argument, that's a separate issue (and one that is relevant for multiple architectures). The constrained intrinsics do not enforce a rounding mode (i.e. they are descriptive rather than prescriptive), but if we have concrete rounding mode arguments for any given instruction we could use that to encode the rounding mode in the instruction. I'm not sure how user code would express this apart from use of intrinsics.
Fair enough. Just for conversation's sake, I was envisioning the FE support a little differently...
- Add a command line option, say -ffpe_safe=[true|false], that toggles FPEnv-safe code generation for a whole file. A -ffpe_safe=true would be equivalent to FENV_ACCESS=ON at the beginning of a translation unit and we would capture it in some FE variable. That command line option can be overridden by the FENV_ACCESS pragma.
- If the FENV_ACCESS pragma is seen, carry a nop/metadata/something construct to toggle the FE variable during IR generation.
- Then when generating IR, choose between constrained/unconstrained IR depending on the state of the FE variable.
#1 may seem superfluous, but there are certain benchmarks (e.g. SPEC CPU) that do not allow for code modifications. So, in order to run those with FPEnv-safe code, we'd need a non-invasive way to toggle FPEnv-safe code generation.
I'll also add that my interpretation of FENV_ACCESS in the C Standard [7.6.1] implies that we have to track the FENV_ACCESS pragma. That is, it's not okay to ignore FENV_ACCESS=OFF. When we transfer from FENV_ACCESS=OFF code to FENV_ACCESS=ON code, the rounding mode needs to be reset to the default setting. But, I'm open to hearing other interpretations. My copy of the Standard is old, so please correct me if this changed...
When execution passes from a part of the program translated with FENV_ACCESS ‘‘off’’ to a part translated with FENV_ACCESS ‘‘on’’, the state of the floating-point status flags is unspecified and the floating-point control modes have their default settings.
Craig's right about not wanting to bloat the inlinable functions in IRBuilder, but this is something that we can measure. In addition, we might be able to move the "slow path" (which create the constrained intrinsics) to the .cpp file (by manually outlining to a different function).
Just because FENV_ACCESS can be toggled on that granularity doesn't mean we have to represent it that way. We've previously agreed (I think) that if FENV_ACCESS is enabled anywhere in a function we will want to use the constrained intrinsics for all FP operations in the function, not just the ones in the scope where it was specified.
Yes, this is also my understanding. We can't soundly mix the two in the same function because we can't prevent the code motion within the function.
Also, to be clear, adding a mode that automatically adds these when using the existing IRBuilder functions seems worth investigating. It seems like that would greatly simply the FE code.
To bikeshed, I'd prefer we don't have an underscore in the name of the command-line flag.
- If the FENV_ACCESS pragma is seen, carry a nop/metadata/something construct to toggle the FE variable during IR generation.
- Then when generating IR, choose between constrained/unconstrained IR depending on the state of the FE variable.
#1 may seem superfluous, but there are certain benchmarks (e.g. SPEC CPU) that do not allow for code modifications. So, in order to run those with FPEnv-safe code, we'd need a non-invasive way to toggle FPEnv-safe code generation.
I'll also add that my interpretation of FENV_ACCESS in the C Standard [7.6.1] implies that we have to track the FENV_ACCESS pragma. That is, it's not okay to ignore FENV_ACCESS=OFF. When we transfer from FENV_ACCESS=OFF code to FENV_ACCESS=ON code, the rounding mode needs to be reset to the default setting. But, I'm open to hearing other interpretations. My copy of the Standard is old, so please correct me if this changed...
When execution passes from a part of the program translated with FENV_ACCESS ‘‘off’’ to a part translated with FENV_ACCESS ‘‘on’’, the state of the floating-point status flags is unspecified and the floating-point control modes have their default settings.
The rounding mode does need to be reset to its default setting when passing from FENV_ACCESS "on" to FENV_ACCESS "off", but that seems to be the user's responsibility. Are you saying that the implementation should reset it on that transition?
Ugh, I don't know. The C Standard's language is so vague.
Yes, that's how I was interpreting the Standard (today). The implementation should reset the control modes. The verbiage is murky at best though.
We touched on this in D43142 and I do realize that my opinion has flip-flopped since then. I previously believed that reseting the control modes was up to the user, but now I'm not so sure. I suppose that either way, as long as a fesetround(default_mode_constant) is seen with a FENV_ACCESS=OFF, we could use that as a barrier to prevent the problematic code motion.
Thinking aloud, maybe we should be working on redefining FENV_ACCESS in the C Standard? It's pretty clear that this section could use some work.
All that said, my understanding of $7.6.1 in the Standard is cloudy at best. If I'm the only one that feels this way, I'll drop my objections...
To be clear, I mean here that *we* can't mix the two soundly in the same function (where we generate some operations using the constrained instrinsics and some not) because of constraints imposed by our design. The standard may indeed allow for more precision. We'll need to be conservatively correct.
Yes, that's how I was interpreting the Standard (today). The implementation should reset the control modes. The verbiage is murky at best though.
I'll also point out that whether or not we insert a rounding-mode reset when the pragma changes state is orthogonal to whether we stop emitting constrained intrinsics at that point.
We touched on this in D43142 and I do realize that my opinion has flip-flopped since then. I previously believed that reseting the control modes was up to the user, but now I'm not so sure. I suppose that either way, as long as a fesetround(default_mode_constant) is seen with a FENV_ACCESS=OFF, we could use that as a barrier to prevent the problematic code motion.
Thinking aloud, maybe we should be working on redefining FENV_ACCESS in the C Standard? It's pretty clear that this section could use some work.
All that said, my understanding of $7.6.1 in the Standard is cloudy at best. If I'm the only one that feels this way, I'll drop my objections...
I don't read the standard that way, but the standard could certainly be more clear.
A couple of comments on the previous discussion:
- Instead of defining a new command line option, I'd prefer to use the existing options -frounding-math and -ftrapping-math to set the default behavior of math operations w.r.t. rounding modes and exception status. (For compatibility with GCC if nothing else.)
- I also read the C standard to imply that it is a requirement of user code to reset the status flag to default before switching back to FENV_ACCESS OFF. The fundamental characterization of the pragma says "The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes." There is no mention anywhere that using the pragma, on its own, will ever change those control modes. The last sentence about "... the floating-point control modes have their default setting", while indeed a bit ambiguous, is still consistent with an interpretation that it is the responsibility of user code to ensure that state, there is no explicit statement that the implementation will do so.
- I agree that we need to be careful about intermixing "normal" floating-point operations with strict ones. However, I'm still not convinced that the pragma itself must be the scheduling barrier. It seems to me that the compiler already knows where FP control flags are ever modified directly (this can only happen with intrinsics or the like), so the main issue is whether function calls need to be considered. This is where the pragma comes in: in my mind, the primary difference between FENV_ACCESS ON and FENV_ACCESS OFF regions is that where the pragma is ON, function calls need to be considered (unless otherwise known for sure) to access FP control flags, while where the pragma is OFF, function calls can be considered to never touch FP control flags. So the real scheduling barrier would be any function call within a FENV_ACCESS ON region. Those would have to be marked by the front-end in the IR, presumably using a function attribute. The common LLVM optimizers would then need to respect that scheduling barrier (here is where we likely still have an open issue, there doesn't appear to be any way to express that at the IR level for regular floating-point operations ...), and likewise the back-ends (but that looks straightforward: a back-end typically will model FP status as residing in a register or in a pseudo-memory slot, and those can simply be considered used/clobbered by function calls marked as within FENV_ACCESS ON regions).
I agree that it's preferable to re-use these existing options if possible. I have some concerns that -ftrapping-math has a partial implementation in place that doesn't seem to be well aligned with the way fast-math flags are handled, so it might require some work to have that working as expected without breaking existing users. In general though these seem like they should do what we need.
Regarding GCC compatibility, I notice that GCC defaults to trapping math being enabled and I don't think that's what we want with clang. It also seems to imply something more than I think we need for constrained handling. For example, the GCC documentation says that -fno-trapping-math "can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions" so it sounds like maybe it also implies (for GCC) something like LLVM's "afn" fast math flag.
So if we are going to use these options, I think we need to have a discussion about whether or not it's OK to diverge from GCC's interpretation of them.
I definitely agree with this interpretation of the standard. My understanding is that behavior is undefined if the user has not left the FP environment in the default state when transitioning to an FENV_ACCESS OFF region.
I'm a bit confused by this. The constrained intrinsics will cause all calls to act as barriers to motion of the FP operations represented by the intrinsics (at least before instruction selection). So I'm not clear what you are saying is needed here.
I like this proposal.
- I also read the C standard to imply that it is a requirement of user code to reset the status flag to default before switching back to FENV_ACCESS OFF. The fundamental characterization of the pragma says "The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes." There is no mention anywhere that using the pragma, on its own, will ever change those control modes. The last sentence about "... the floating-point control modes have their default setting", while indeed a bit ambiguous, is still consistent with an interpretation that it is the responsibility of user code to ensure that state, there is no explicit statement that the implementation will do so.
That's a fair interpretation. Andy mentioned:
If we all agree upon that, then we simply have to treat the functions that modify the FPEnv, e.g. fesetexcept(...), as barriers. That way it does not matter if a FENV_ACCESS=OFF function is translated with constrained intrinsics or not, since nothing can be scheduled around these barriers.
- I agree that we need to be careful about intermixing "normal" floating-point operations with strict ones. However, I'm still not convinced that the pragma itself must be the scheduling barrier. It seems to me that the compiler already knows where FP control flags are ever modified directly (this can only happen with intrinsics or the like), so the main issue is whether function calls need to be considered. This is where the pragma comes in: in my mind, the primary difference between FENV_ACCESS ON and FENV_ACCESS OFF regions is that where the pragma is ON, function calls need to be considered (unless otherwise known for sure) to access FP control flags, while where the pragma is OFF, function calls can be considered to never touch FP control flags. So the real scheduling barrier would be any function call within a FENV_ACCESS ON region. Those would have to be marked by the front-end in the IR, presumably using a function attribute. The common LLVM optimizers would then need to respect that scheduling barrier (here is where we likely still have an open issue, there doesn't appear to be any way to express that at the IR level for regular floating-point operations ...), and likewise the back-ends (but that looks straightforward: a back-end typically will model FP status as residing in a register or in a pseudo-memory slot, and those can simply be considered used/clobbered by function calls marked as within FENV_ACCESS ON regions).
I'm not sure if I fully understand this, but it seems to be an acceptable solution to the problem.
As mentioned above, couldn't we make the helper functions that read/write the FPEnv the barriers? That seems like a simpler solution.
I thought we couldn't do barriers. No barriers means no way to prevent code motion and mixing of constrained with non-constrained FP. That was the reason for having all FP in a function be constrained if any of it was.
OK, let me try to expand on my point 3 above, which appears to have confused everybody :-)
First, let's distinguish two separate requirements: those on floating-point operations that explicitly run in regions with non-default control modes, and those on floating-point operations that run outside such regions. (Note that those regions are by definition a subset of the regions marked as FENV_ACCESS ON, but not necessarily coincide with them.)
For the first class of operations, there are various requirements to constrain optimization. All of those are handled in the current design by simply representing those operations via constrained intrinsics. This can be done simply by always using constrained intrinsics whenever we are inside any FENV_ACCESS ON region. (My comments above were not intended to refer to those at all.)
Now, for the second class of operations, most of these requirements do not apply. However, there is one critical requirement that still does apply: such operations may never be moved to inside a region with non-default control modes. (Where it makes a difference, the operation can in principle be even moved inside a FENV_ACCESS ON region, as long as the control modes actually have not yet changed from the default modes.)
As @andrew.w.kaylor mentioned above, to fulfil this requirement if suffices to prevent moving any such floating-point operation across any statement that modifies the control modes (or inspects the exception flags). There is only a small number of statements that do so directly (those would all be inline asm or platform-specific builtins), but the action can be hidden behind a function call as well. This applies in particular to the related C library calls with well-known names (which the compiler could detect), but in addition any random function call might indirectly contain such a statement (which the compiler will not know, in general). Since LLVM today will freely move normal floating-point operations across normal function calls, this is a problem that needs to be addressed.
Now, one way to do that would be to ensure that any floating-point operation outside FENV_ACCESS ON regions is also implemented via constrained intrinsics, as long as there is any FENV_ACCESS ON present in the function at all. As @cameron.mcinally mentioned above, that would prevent reordering (across any function call, even) and therefore solve the problem. However, I believe there were concerns whether we can reliably ensure that property in the presence of optimizations like inlining, in particular if LTO is also used. Even if this can be resolved, I believe there were additional concerns that this might overly constrain optimization and lead to suboptimal performance.
This finally gets to the intent of my point 3 above, where I was trying to see if there is any way we can do better, that is correctly handle floating-point operations outside FENV_ACCESS ON regions without having to turn them all into constrained intrinsics. One way might be to add some new code motion barrier in the IR optimizers that would prevent movement of floating-point operations across the location of any FENV_ACCESS pragma. But -- and this is I think what @kpn referered to above -- those locations don't really correspond to anything in the IR that could serve as such a barrier. My observation above was simply that if we want to go that way, it in fact isn't really necessary to put those barriers at exactly these places. Instead, we can put the barriers at those places that will actually modify (or inspect) the control words -- which are either special operations we can explicitly recognize, or general function calls -- but only those calls where the call site was originally inside a FENV_ACCESS ON region (and was marked by the front-end as such). This has the advantage that a function call site is a natural place to act as barrier -- in fact it already acts as such e.g. for global memory accesses. (It would also be easy to implement as barrier on the MI / back-end level.) In addition, this approach would not lead to any performance penalty outside of FENV_ACCESS ON regions at all.
But given that there is still infrastructure missing in the IR optimizers, I also think that at least in the first implementation, we probably should go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add some function attribute that prevent any cross-inlining of functions built with constrained intrinsics with functions built with regular floating-point operations.
So as far as I can see, the current implementation of -f(no-)trapping-math in LLVM is pretty much a stub. The driver passes the option on to CC1, and in addition it disables -menable-unsafe-fp-math and -massociative-math if -ftrapping-math is in effect (the latter may be a minor difference to GCC, but should be conservatively compatible). The compiler itself does nothing except mark all functions with the no-trapping-math attribute if -fno-trapping-math is in effect. LLVM itself completely ignores this attribute except on ARM, where it is used to set the ABI_FP_exceptions extended attribute. (But since LLVM doesn't really implement the flag in actual codegen, setting the attribute to indicate the property seems a bit odd ...)
Regarding GCC compatibility, I notice that GCC defaults to trapping math being enabled and I don't think that's what we want with clang. It also seems to imply something more than I think we need for constrained handling. For example, the GCC documentation says that -fno-trapping-math "can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions" so it sounds like maybe it also implies (for GCC) something like LLVM's "afn" fast math flag.
So if we are going to use these options, I think we need to have a discussion about whether or not it's OK to diverge from GCC's interpretation of them.
The GCC documentation says:
-fno-trapping-math
Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option requires that -fno-signaling-nans be in effect. Setting this option may allow faster code if one relies on ``non-stop'' IEEE arithmetic, for example.
This option should never be turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
The default is -ftrapping-math.
As far as I can tell, this does not imply any additional differences except that -fno-trapping-math may rearrange floating-point operations to eliminate or introduce traps or changes to exception flags. It is those changes themselves (not anything else) that can result in incorrect output for programs that depend on exact IEEE semantics, and therefore this option must not be enabled by default at any optimization level, as the second paragraph above states.
Note that in GCC, -fno-trapping-math is implied by -funsafe-math-optimizations, which in turn is implied by -ffast-math. Those options all are documented using the same "incorrect output for programs that depend on exact IEEE semantics" clause: -funsafe-math-optimization is the catch-all for all such effects, but it is separated into -fno-signed-zeros, -fno-trapping-math, -fassociative-math, and -freciprocal-math that cover the distinct aspects of just how IEEE semantics might get violated.
We can of course still have a separate discussion on what the default should be. But even if we choose a different default than GCC (as is already the case e.g. for the -ffp-contract option), I think it would be preferable for the two explicit options to behave in a compatible way.
I'd like to hear more about this. The fesetexcept(...) function and friends change the FPEnv state, which can change the semantics for the instructions that follow. They definitely have to act as a barrier.
There's no sense in which we can have a code-motion barrier within a function that acts on the regular FP instructions. We can have a barrier for the constrained intrinsics. This is why we need, in this design, for any function that uses FENV_ACCESS=ON for any part of it, to always use the constrained instrinsics.
Ah, that makes more sense.
The problem I was missing is when a FENV_ACCESS=OFF operation, like a FDIV, is hoisted into a FENV_ACCESS=ON region. I see it now, but still think that forcing FENV_ACCESS=OFF operations into constrained intrinsics is a big hammer. If there is a way to add barriers around function calls in a FENV_ACCESS=ON region, that would be better.
Stepping way back, the *best* solution is to have the FPEnv implementation shut down unsafe optimizations on an individual basis. Perhaps we should be tagging functions that contain FENV_ACCESS=ON regions. That way unsafe optimization passes, e.g. hoisting, can query the tag to see if they should skip these functions.
But given that there is still infrastructure missing in the IR optimizers, I also think that at least in the first implementation, we probably should go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add some function attribute that prevent any cross-inlining of functions built with constrained intrinsics with functions built with regular floating-point operations.
Yes, I'll stop moaning about this. This is perfectly suitable for a first implementation.
It may be a big hammer, but I don't think we'll need to use it terribly often (that is, I don't expect the mixing of FENV_ACCESS regions within a function to be common -- I could be wrong). In any event, we've been clear that choosing to enable FP environment access will mean sacrificing performance in the near term. Once we've had time to teach all the relevant optimizations how to handle the constrained intrinsics the hammer will start to feel much smaller.
One of the reasons we went with the approach we did is that it provides conservatively correct results by default. Optimizations don't need to be taught to leave the intrinsics alone. They do that as a default behavior when they don't recognize the intrinsic. If we had required optimizations to opt out as needed, we'd have problems chasing down all the places where that needed to happen. So we chose the opposite challenge of using the big hammer for initial implementation and then having to go looking for the places that needed to be taught how to interpret the intrinsics for cases when the optimization can still be performed.
I think at this point we're all on the same page in this regard. I just wanted to make sure we also understand why we're on that page. I still believe it was the correct choice.
Yup, we're on the same page.
To explain where I'm coming from, our in-house compiler has tracked down *many* of the trap-unsafe transformations in LLVM and can toggle them as desired. The execution penalty for running trap-safe code is (much) less than 5% with that compiler. My goal for this project is to make Clang competitive with that compiler.
Subtle. This last sentence seems to imply that cross-inlining should be allowed when there are no regular floating point operations in the function to be inlined. This makes sense due to, for example, the common use of tiny functions just to retrieve a value. Do I interpret your statement correctly?
Sure, that would be an optimization. Another potential optimization would be to allow inlining, but have the inliner automatically replace regular floating-point operations with constrained intrinsics in the target routine ... But all of that is probably "stage2" after an initial implementation that just disables inlining.
Digressing a bit, but has anyone given thought to how this implementation will play out with libraries? When running with traps enabled, libraries must be compiled for trap-safety. E.g. optimizations on a lib's code could introduce a NaN or cause a trap that does not exist in the code itself.
I'm wondering if the speed penalty for trap-safe code will be a problem with the general Clang population that does not care about traps...
In my reading of the standard text, there is no special provision for library code. This means that in general, calling any library function while running with nondefault trap settings is undefined behavior, unless the library was itself compiled with FENV_ACCESS ON. There does not even appear to be any requirement for the C standard library functions to have been compiled with FENV_ACCESS ON, as far as I can see ...
I've changed the patch so that calls to CreateFAdd() et al will give you constrained intrinsics if they are enabled. This required adding functions to enable/disable constrained-as-default plus calls to deal with the rounding mode and exception handling for contrained intrinsics.
I've left the independent create constrained functions to make it dead easy to move them out of the header and avoid inlining bloat.
I'm not dealing with any fast math flags or metadata currently. I'm not sure they make sense in a constrained context, or at least I haven't seen anyone argue they must be in place for an initial implementation.
I'll take this conversation to llvm-dev in the near future. I don't want to detract from Kevin's work.
Apologies for highjacking this Diff, Kevin.
include/llvm/IR/IRBuilder.h | ||
---|---|---|
1244 ↗ | (On Diff #176432) | I think you can forgo the else { in these functions since the if branch returns immediately. |
I emailed llvm-dev and cfe-dev on January 16th. The only responses I got back were of the don't care variety. For now it seems the constrained intrinsics will only be used by clang.
@rsmith, does the direction of this patch seem reasonable for clang?
Once Richard comments is there anything else needing us to delay here?
I think this looks straightforward, as long as people agree to have a separate CreateConstrained* version of these functions. I'm not qualified to weigh in on that as I don't know Clang at all and can't comment about the tradeoffs (although I think they have been well articulated in the discussion). From what I can see, that is the only thing blocking this from getting approved (unless there is something else I missed while reading the discussion).
The only other comment I have is there was some very good description about the intention here from @uweigand, @cameron.mcinally and @andrew.w.kaylor and @kpn. I think it would be good if that discussion is extracted from this review and put somewhere (perhaps the language ref) to indicate precisely what the semantics are we are trying to achieve with FENV_ACCESS ON/OFF.
include/llvm/IR/IRBuilder.h | ||
---|---|---|
228 ↗ | (On Diff #176464) | This is a minor quibble, but the method is setIsConstrainedFP, while the member is IsFPConstrained. |
I'm waiting for a signoff from at least one front-end guy. I hope the mode setting that allows us to keep front ends from having to touch every use of the IRBuilder is something that works for clang. But I haven't heard anything from @rsmith or any other front-end person.
include/llvm/IR/IRBuilder.h | ||
---|---|---|
228 ↗ | (On Diff #176464) | Yeah, that's an oversight. Fixed. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
234 ↗ | (On Diff #176464) | I think it would be better to add some enumerated type to describe the exception semantic and rounding modes. The MDNode is an implementation detail and an awkward one for the front end to have to deal with. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
110 ↗ | (On Diff #176464) | Instead of adding these here, make these inline initializers on lines 100-102. |
228 ↗ | (On Diff #176464) | IS this fixed? |
1249 ↗ | (On Diff #176464) | Why set the last 2 to nullptr when you have defaults for these? |
1259 ↗ | (On Diff #176464) | The last 2 parameters are never actually used except in the test. Are these really important to have if they are never used in source? |
1408 ↗ | (On Diff #176464) | All these CreateConstrainedXXX should be distilled down to a single function that takes the intrinsic as a parameter. |
I've posted a small change to this patch here, https://reviews.llvm.org/D62730 and there's a patch to add clang fp options here, https://reviews.llvm.org/D62731
include/llvm/IR/IRBuilder.h | ||
---|---|---|
234 ↗ | (On Diff #176464) | I posted a patch showing the rounding and exception information being passed as enumeration not MDNode. I've uploaded the patch here, https://reviews.llvm.org/D62730 |
I don't know which patch will proceed; but they weren't added already, this seems to be missing documentation changes.
I wanted to get the API straight before working on documentation.
include/llvm/IR/IRBuilder.h | ||
---|---|---|
228 ↗ | (On Diff #176464) | In my working copy, yes. |
234 ↗ | (On Diff #176464) | It's a good idea. D62730 beat me to the punch. |
1249 ↗ | (On Diff #176464) | Eh, no really good reason. |
1259 ↗ | (On Diff #176464) | They're part of the intrinsics. It seems like there should be a way to emit them as part of the instruction when creating the instruction. |
1408 ↗ | (On Diff #176464) | That moves us further down the road towards having modes. I haven't seen a front-end guy sign off on it yet, but it seems like a good idea. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
234 ↗ | (On Diff #176464) | BTW I created FPState.h because if I put the enumeration types in IRBuilder it dragged in too many include files in clang and required other random source changes for disambigutation. Anyway I don't need to comandeer your patch -- I don't want it to die from neglect. I can close mine? |
Oh, this ticket is not going to die from neglect. It is true that D43515 is a higher priority, but I need this IRBuilder work done as well. My department head wanted it done by the end of _last_ year. It's not going to die.
How about I merge your changes into this ticket and we continue work over here? There is the issue of the documentation that lebedev.ri asked you to write. Can I talk you into putting that together and sending it to me <kevin.neal@sas.com>? I'll work on the documentation that I was asked to write. Between the two of us we should be in pretty good shape. Does that work for you?
I am still waiting for feedback from an actual consumer of the IRBuilder who will be using this new functionality. If someone clang-side could chime in on this ticket I'd very much appreciate it.
Yes I'll do that. Thanks.
I am still waiting for feedback from an actual consumer of the IRBuilder who will be using this new functionality. If someone clang-side could chime in on this ticket I'd very much appreciate it.
I wrote a clang patch that works with your IRBuilder modifications. I also checked with the Intel fortran team and they think this interface will be workable for them too. From Intel perspective it's +1.
Address the rest of the review comments, hopefully.
I've lifted from mibintc's D62730 the enums for the exception behavior and rounding modes. The rest of that set of changes is better left to a separate ticket after all.
Since I now have those requested enums I've changed the relevant public functions to take them instead of an MDNode*.
There is no standalone or comprehensive IRBuilder documentation that I could find. I hope the references in comments back to the LangRef are good enough. I used the exact language from it to make it easier to reference said doc.
This works for me, I redid my patches adding fp-model options to work with this newest changeset, (but I haven't yet uploaded them.)
include/llvm/IR/IRBuilder.h | ||
---|---|---|
113 ↗ | (On Diff #202987) | Should these have "FP" in the name somewhere? And are they really IRBuilder-specific concepts, as opposed to something that should be declared as part of the interface for working with these intrinsics? Also, I believe we can use explicit underlying types now in LLVM; it'd be nice if we didn't make IRBuilder unnecessarily large. |
255 ↗ | (On Diff #202987) | This seems unnecessary. |
1138 ↗ | (On Diff #202987) | Huh? You build an MDNode that wraps an MDString and then immediately extract the MDString from it and drop the MDNode? I think you should just have a function somewhere that returns the correct metadata string for a ConstrainedRoundingKind, and then the code here is much more obvious. Maybe this can be wherever you declare the enums. You can also have a function that goes the other way and returns an Optional<ConstrainedRoundingKind>. Function name should include FP. Same points apply to the next function. |
228 ↗ | (On Diff #176464) | Maybe this should be more explicit about what exactly it does? Specifically, it changes the behavior of the CreateF<Op> methods so that they use constrained intrinsics instead of the standard instructions. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
113 ↗ | (On Diff #202987) | Would it be better to use the RoundingMode and ExceptionBehavior enums in the ConstrainedFPIntrinsic class? These enums->strings here get turned back into those IntrinsicInst.h enums eventually anyway. But that means pulling in yet more headers in IRBuilder.h. I admit I'm not sure what you mean with your second paragraph. Is that a way of saying that, for example, the relevant IntrinsicInst.h enums should be used instead? |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
113 ↗ | (On Diff #202987) | I think it's IRBuilder.h's fate to pull in the majority of the headers in IR, but even if we want to avoid that, duplicating the enums seems like a step too far. If there's too much code in the header declaring that intrinsic, you should extract a smaller header that just declares the enums and their string conversions. The second paragraph is asking for these enums to be specifically constrained to uint8_t so that we aren't wasting a ton of memory everywhere we store them. IRBuilder doesn't have strong size constraints, but it'd still be nice if all these accumulated features didn't make it hundreds of bytes larger than it needs to be. |
Address review comments.
Run the changes through clang-format.
include/llvm/IR/IRBuilder.h | ||
---|---|---|
1138 ↗ | (On Diff #202987) | Well, when you put it that way, your proposed change does result in much simpler code. The resulting code isn't that much more than the switch. I hope it is now easy enough to read. These functions are private to the class and just exist to be helpers. I don't see a need to put the switch table into a different function, but I'm willing to be convinced. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
1138 ↗ | (On Diff #202987) | I feel like having the enum->string mapping available somewhere independent of IRBuilder will probably be useful to someone, and it nicely separates concerns in any case . You could just make it a static method on ConstrainedFPIntrinsic along with the reverse operation. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
1138 ↗ | (On Diff #202987) | Ok, will do. |
Add static methods to convert between a StringRef and the enums for RoundingMode or ExceptionBehavior.
include/llvm/IR/IntrinsicInst.h | ||
---|---|---|
235 ↗ | (On Diff #205371) | I can think of a couple of alternatives. If they don't overlap then we have to go back and sweep the source to make sure that ebUnspecified is always handled in all cases that currently handle ebInvalid. And in the future nobody is allowed to check in source that doesn't handle both. Or, we could drop ebUnspecified, but then ebInvalid would have a valid meaning to the IRBuilder interface. That looks like a bug even if it works properly. Generally, adding eb/rmUnspecified but having them overlap the invalid cases seems to me to be the best option. |
include/llvm/IR/IntrinsicInst.h | ||
---|---|---|
235 ↗ | (On Diff #205371) | What is the purpose of ebInvalid? Is this a valid argument to the intrinsic, or is it there so that getExceptionBehavior() has something to return if it doesn't recognize the argument? For the latter, maybe it would be better to just assert that the argument was one of the recognized possibilities; we don't generally expect LLVM to silently propagate ill-formed IR. Or if it's valid to not provide exception behavior to the intrinsic, maybe that's the same as ebUnspecified, or maybe the accessor should return Optional<ExceptionBehavior>. |
I found out that some compilers, including SAS/C, will warn about enum overlaps like the one here under discussion. So I now believe John is correct and that eliminating that overlap is the right thing to do. Over the weekend I'll be thinking about whether or not to try and find an alternative, but I'll be out of cell phone range until late Sunday evening.
include/llvm/IR/IntrinsicInst.h | ||
---|---|---|
235 ↗ | (On Diff #205371) | The current code is in ConstrainedFPIntrinsic::getExceptionBehavior(), which returns ebInvalid if the metadata is missing or somehow wrong (wrong type, invalid string, etc). This is then used by the IR verifier which fails with an Assert() if it gets ebInvalid. So we aren't propagating ill-formed IR. The intrinsic requires the metadata, and it requires the metadata be valid. But I don't want to clutter up IRBuilder's consumer's code. That's why I wanted the API to allow those arguments to not be given. One idea of mine is to have ebInvalid be valid when given to the IRBuilder, and the meaning would be "this value is unspecified". But that seems weird. Maybe there's another way. |
include/llvm/IR/IntrinsicInst.h | ||
---|---|---|
235 ↗ | (On Diff #205371) |
Okay, in that case, I don't understand why this accessor shouldn't just assume that it's working on valid IR and not be able to return ebInvalid.
If you want the IRBuilder factories to be able to not take a value (if there's some defaulting scheme that's more complicated than you can express with a default argument), you can have it take an Optional<Whatever>, and then people can just pass in None; and then you don't need to muddy up the enum with a case that isn't really a valid alternative. |
Address review comments. Eliminate ConstrainedFPIntrinsic's ebInvalid and rmInvalid enumeration values and replace them with use of the Optional<> class. Adjust the rest of the patch to take that into account. ConstrainedFPIntrinsic now supports string <--> enum conversions.
include/llvm/IR/IRBuilder.h | ||
---|---|---|
259 ↗ | (On Diff #207378) | Okay, so what are the invariants here now? It looks like, in order to enable constrained floating point, I need to also have set the default exception behavior and rounding mode. That should at least be documented, but maybe a better approach would be to require these to be passed in when enabling constrained FP. |
1324 ↗ | (On Diff #207378) | FPMD is dropped in this case; I don't know if that's intentional. |
1459 ↗ | (On Diff #207378) | It looks like nothing in IRBuilder ever passes these arguments. Are you just anticipating that someone might want to call this directly? |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
259 ↗ | (On Diff #207378) | The IRBuilder constructor sets the defaults to ebStrict and rmDynamic, but leaves strict mode off. So if you only turn on strict mode you get the strictest settings. This implies that it isn't possible to trigger this assertion. Which is true unless you have some form of memory overwrite or someone comes along later and puts in a bad IRBuilder change. The more compiler work I've done over the years the more sanity checks I've gotten into the habit of adding. |
1324 ↗ | (On Diff #207378) | You can see that I'm on the fence here. I'm not sure that mixing fast math with constrained math makes sense. So CreateConstrainedFPBinOp() can take an instruction for copying fast math flags, but it doesn't do anything with this instruction. And the FPMD is simply dropped in the non-*FMF() methods. If we do decide later to support mixing constrained and fast math then we won't have to change any APIs. Until then it takes the conservative route and drops the info. |
1459 ↗ | (On Diff #207378) | Correct. And I wrote tests for it in IRBuilderTest.cpp. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
259 ↗ | (On Diff #207378) | Okay, that scheme makes sense, but I think sanity-checking that there hasn't been an arbitrary memory smasher is a bit much. Using Optional when the value can't actually be missing just makes the code more confusing for readers and maintainers; please just leave these non-optional. |
1324 ↗ | (On Diff #207378) | Okay. That should probably be mentioned, at least in the documentation for CreateConstrainedFPBinOp. Should you make an overload of the latter which takes an MDNode* as the final argument, for parallelism/completeness? |
1459 ↗ | (On Diff #207378) | Okay. So people who want to pass these explicitly just can't use the convenience methods? That seems like a reasonable compromise. |
include/llvm/IR/IRBuilder.h | ||
---|---|---|
1324 ↗ | (On Diff #207378) | I made one function that takes both. Patch coming in a minute. |
I was convinced in email that having the constrained intrinsics take fast math flags could be useful in some circumstances. The example that was given was that perhaps FMA contraction could be requested. So, when generating constrained intrinsics the fast math flags and the FP metadata is now handled exactly the same as in non-constrained mode. I've updated the documentation to warn about that when turning constrained mode on and off.
I was also a little worried about having CreateConstrainedFPBinOp() handle fast math flags differently from CreateBinOp(). Now they're the same.