This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Allow target to handle STRICT floating-point nodes
AbandonedPublic

Authored by uweigand on Apr 12 2018, 9:54 AM.

Details

Summary

The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping).

This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules.

To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we implement the following mechanism to represent both using the same MI definition:

Floating-point instruction dependencies are represented at the MI level in two ways:

  • All FP instructions (strict *and* non-strict) are modeled to use a platform-specifc floating-point control register. This models global state that FP operations depend on, e.g. current rounding mode and exception mask setting. (For non-strict instructions, the only difference this makes is that they cannot be scheduled across an instruction that changes the FPC register.)
  • Strict FP instructions are modeled to carry an extra memory operand refering to a new PseudoSourceValue of class FPStatus. This represents the changes to global state that may be *caused* by the instruction, e.g. setting exception status bits or triggering a trap.

In order to allow the same MI definition to be used both for strict operations *with* FPStatus memory operand and non-strict operations without such an operand, we introduce a new flag mayAccessMemory. This implies that:

  • If the instruction does not carry any memory operand, it is not actually considered to access memory.
  • If the instruction *does* carry a memory operand, it is considered mayLoad and/or mayStore depending on that operand.

Common code provides a number of convenience multi-alternative pattern fragments like any_fadd that the target can use to match both strict and non-strict versions of a FP operation with a single DAG pattern. When matching the strict version, instruction selection code will copy over the FPStatus memory operand; when matching the non-strict version, there will be no such operand.

This patch implements the common-code changes as described above, and also converts the SystemZ target to make full use of them. There are test cases to verify that all the operations still match and generate the same set of instructions, as well as a test case to verify that strict operations are no longer scheduled inappropriately.

Diff Detail

Event Timeline

uweigand created this revision.Apr 12 2018, 9:54 AM
kpn added a subscriber: kpn.Apr 26 2018, 7:54 AM

The reason that I originally implemented this the way I did, mutating the strict nodes to their non-constrained equivalents, was that I thought it would require too much duplication in the .td files to implement all the pattern matching for the strict nodes. The original plan was to find some other way to communicate the "strict" state to the target after instruction selection, but I never found a way to do that.

What you've done here seems reasonable to me. Obviously it still involves a lot of updates to the td files, but your approach to making that manageable going forward seems plausible. I really don't have the expertise in instruction selection to judge that completely, but this looks like a promising direction.

I particularly like that you've introduced it as a way for targets to opt-in. I believe some targets are already modeling the FP status and control registers and may not need this. X86 isn't one of them, so I think we'll want to consider something like what you've done here.

The reason that I originally implemented this the way I did, mutating the strict nodes to their non-constrained equivalents, was that I thought it would require too much duplication in the .td files to implement all the pattern matching for the strict nodes. The original plan was to find some other way to communicate the "strict" state to the target after instruction selection, but I never found a way to do that.

What you've done here seems reasonable to me. Obviously it still involves a lot of updates to the td files, but your approach to making that manageable going forward seems plausible. I really don't have the expertise in instruction selection to judge that completely, but this looks like a promising direction.

Thanks for looking at this!

Following the BoF at EuroLLVM, I've been talking to Chris Lattner about this, and he suggested a somewhat different approach: providing a way for an MI instruction to have the "unmodeled side effects" flag be provided by an MI *operand*. That is, right now the hasSideEffects flag is a constant: any particular MI instruction class has this either set or clear, but to model FP instructions, it might be better to have a setting where whether or not any particular instantiation of the instruction has side effects depends on the value of an operand.

This would avoid the duplication of all the FP instructions into two classes, and instead just give them an additional operand. I'm still looking into how to best implement this. (Some tablegen changes will likely be required as well.)

I particularly like that you've introduced it as a way for targets to opt-in. I believe some targets are already modeling the FP status and control registers and may not need this. X86 isn't one of them, so I think we'll want to consider something like what you've done here.

Can you elaborate which targets are already modeling the FP status flags? This would be surprising to me, since actually modeling those for the current non-strict semantics would cause significantly worse code to be generated ...

(Of course, this should still be opt-in per target, not only to allow for a step-by-step transition to the new scheme, but also since some targets may not support strict IEEE semantics anyway ...)

The reason that I originally implemented this the way I did, mutating the strict nodes to their non-constrained equivalents, was that I thought it would require too much duplication in the .td files to implement all the pattern matching for the strict nodes. The original plan was to find some other way to communicate the "strict" state to the target after instruction selection, but I never found a way to do that.

What you've done here seems reasonable to me. Obviously it still involves a lot of updates to the td files, but your approach to making that manageable going forward seems plausible. I really don't have the expertise in instruction selection to judge that completely, but this looks like a promising direction.

Thanks for looking at this!

Following the BoF at EuroLLVM, I've been talking to Chris Lattner about this, and he suggested a somewhat different approach: providing a way for an MI instruction to have the "unmodeled side effects" flag be provided by an MI *operand*. That is, right now the hasSideEffects flag is a constant: any particular MI instruction class has this either set or clear, but to model FP instructions, it might be better to have a setting where whether or not any particular instantiation of the instruction has side effects depends on the value of an operand.

But that's much stronger than necessary, and will prevent all kinds of optimizations. I don't think that we want to do that. Why not just add a dependence on the registers that matter?

This would avoid the duplication of all the FP instructions into two classes, and instead just give them an additional operand. I'm still looking into how to best implement this. (Some tablegen changes will likely be required as well.)

I particularly like that you've introduced it as a way for targets to opt-in. I believe some targets are already modeling the FP status and control registers and may not need this. X86 isn't one of them, so I think we'll want to consider something like what you've done here.

Can you elaborate which targets are already modeling the FP status flags? This would be surprising to me, since actually modeling those for the current non-strict semantics would cause significantly worse code to be generated ...

I suppose that it depends on what you mean. The PowerPC backend, for example, as an RM pseduo-register to model the state of the current rounding mode. Moreover, we do model use dependencies on RM on calls and FP instructions. I think that all we need to do is add defs of RM (and likely rename RM to represent some more-general status).

(Of course, this should still be opt-in per target, not only to allow for a step-by-step transition to the new scheme, but also since some targets may not support strict IEEE semantics anyway ...)

! In D45576#1108112, @hfinkel wrote:

...

Why not just add a dependence on the registers that matter?

That would be ideal, as long as there's a switch to ignore the dependency for users that do not care about traps.

On x86, the MXCSR register could be both used and defined by simple operations, such as a DIV. That could, of course, prevent a ton of optimizations... unless the individual bits of MXCSR are tracked.

Following the BoF at EuroLLVM, I've been talking to Chris Lattner about this, and he suggested a somewhat different approach: providing a way for an MI instruction to have the "unmodeled side effects" flag be provided by an MI *operand*. That is, right now the hasSideEffects flag is a constant: any particular MI instruction class has this either set or clear, but to model FP instructions, it might be better to have a setting where whether or not any particular instantiation of the instruction has side effects depends on the value of an operand.

But that's much stronger than necessary, and will prevent all kinds of optimizations. I don't think that we want to do that. Why not just add a dependence on the registers that matter?

So the above statement was assuming that you need hasSideEffects on the MI level to correctly model certain strict FP effects, specifically the mode where FP instructions may trap. I agree that hasSideEffects is unneeded if we only want to model e.g. rounding modes or non-trapping exception flags. But I don't really see how we can correctly model trapping instructions only by using register dependencies.

That's why my patch here uses multiclasses to provide two versions of all FP instructions, one with hasSideEffects set and one without. The comment about using an operand to model side effects was meant in that context, to provide an easier way to achive the same effect without this duplication of instructions.

Can you elaborate which targets are already modeling the FP status flags? This would be surprising to me, since actually modeling those for the current non-strict semantics would cause significantly worse code to be generated ...

I suppose that it depends on what you mean. The PowerPC backend, for example, as an RM pseduo-register to model the state of the current rounding mode. Moreover, we do model use dependencies on RM on calls and FP instructions. I think that all we need to do is add defs of RM (and likely rename RM to represent some more-general status).

I see. For just modeling the rounding mode, the current PowerPC implementation actually looks fine to me, since this will not introduce many unnecessary scheduling constraints. (The FP instructions are only users of the RM register, and the only defs are the intrinsics -- and in the future probably any call within a FENV_ACCESS region, but that can be handled like a variant calling convention.)

But extending that model to non-trapping exception status flags will introduce significant constraints: basically every FP instruction will have to both use and def the register, in effect creating a totally ordered chain of all FP instructions that prevents any reordering. I do not think you'll want that active by default ... And as to the trapping exception mode, as mentioned above, I don't believe just register dependencies are enough.

Following the BoF at EuroLLVM, I've been talking to Chris Lattner about this, and he suggested a somewhat different approach: providing a way for an MI instruction to have the "unmodeled side effects" flag be provided by an MI *operand*. That is, right now the hasSideEffects flag is a constant: any particular MI instruction class has this either set or clear, but to model FP instructions, it might be better to have a setting where whether or not any particular instantiation of the instruction has side effects depends on the value of an operand.

But that's much stronger than necessary, and will prevent all kinds of optimizations. I don't think that we want to do that. Why not just add a dependence on the registers that matter?

So the above statement was assuming that you need hasSideEffects on the MI level to correctly model certain strict FP effects, specifically the mode where FP instructions may trap. I agree that hasSideEffects is unneeded if we only want to model e.g. rounding modes or non-trapping exception flags. But I don't really see how we can correctly model trapping instructions only by using register dependencies.

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

I suspect that a better alternative is to mark the instructions so that they form scheduling barriers (which, right now, looks like marking them with isCall, although maybe there's a better option).

That's why my patch here uses multiclasses to provide two versions of all FP instructions, one with hasSideEffects set and one without. The comment about using an operand to model side effects was meant in that context, to provide an easier way to achive the same effect without this duplication of instructions.

Can you elaborate which targets are already modeling the FP status flags? This would be surprising to me, since actually modeling those for the current non-strict semantics would cause significantly worse code to be generated ...

I suppose that it depends on what you mean. The PowerPC backend, for example, as an RM pseduo-register to model the state of the current rounding mode. Moreover, we do model use dependencies on RM on calls and FP instructions. I think that all we need to do is add defs of RM (and likely rename RM to represent some more-general status).

I see. For just modeling the rounding mode, the current PowerPC implementation actually looks fine to me, since this will not introduce many unnecessary scheduling constraints. (The FP instructions are only users of the RM register, and the only defs are the intrinsics -- and in the future probably any call within a FENV_ACCESS region, but that can be handled like a variant calling convention.)

But extending that model to non-trapping exception status flags will introduce significant constraints: basically every FP instruction will have to both use and def the register, in effect creating a totally ordered chain of all FP instructions that prevents any reordering. I do not think you'll want that active by default ... And as to the trapping exception mode, as mentioned above, I don't believe just register dependencies are enough.

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

Well, we certainly have a use case where trapping exception mode needs to be supported. (At the very least, the compiler must generate code that runs correctly in an environment where trapping mode is enabled; in particular, FP operations must not be speculated, so that we don't get spurious traps.)

Also, can we delete the instructions when they're dead?

According to the C11 standard F.9.1, no:

Concern about side effects may inhibit code motion and removal of seemingly useless code. For example, in

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void f(double x)
{
/* ... */
for (i = 0; i < n; i++) x + 1;
/* ... */
}

x + 1 might raise floating-point exceptions, so cannot be removed. And since the loop body might not execute (maybe 0 ≥ n), x + 1 cannot be moved out of the loop. (Of course these optimizations are valid if the implementation can rule out the nettlesome cases.)

This specification does not require support for trap handlers that maintain information about the order or count of floating-point exceptions. Therefore, between function calls, floating-point exceptions need not be precise: the actual order and number of occurrences of floating-point exceptions (> 1) may vary from what the source code expresses. Thus, the preceding loop could be treated as

if (0 < n) x + 1;

Note that this applies even in default (non-trapping) exception handling mode, because removing dead operations may still change the exception status flags read out by subsequent calls to fegetexcept.

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

Well, we certainly have a use case where trapping exception mode needs to be supported. (At the very least, the compiler must generate code that runs correctly in an environment where trapping mode is enabled; in particular, FP operations must not be speculated, so that we don't get spurious traps.)

Also, can we delete the instructions when they're dead?

According to the C11 standard F.9.1, no:

Concern about side effects may inhibit code motion and removal of seemingly useless code. For example, in

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void f(double x)
{
/* ... */
for (i = 0; i < n; i++) x + 1;
/* ... */
}

x + 1 might raise floating-point exceptions, so cannot be removed. And since the loop body might not execute (maybe 0 ≥ n), x + 1 cannot be moved out of the loop. (Of course these optimizations are valid if the implementation can rule out the nettlesome cases.)

This specification does not require support for trap handlers that maintain information about the order or count of floating-point exceptions. Therefore, between function calls, floating-point exceptions need not be precise: the actual order and number of occurrences of floating-point exceptions (> 1) may vary from what the source code expresses. Thus, the preceding loop could be treated as

if (0 < n) x + 1;

Note that this applies even in default (non-trapping) exception handling mode, because removing dead operations may still change the exception status flags read out by subsequent calls to fegetexcept.

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo optimization implications. Do you agree?

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

Well, we certainly have a use case where trapping exception mode needs to be supported. (At the very least, the compiler must generate code that runs correctly in an environment where trapping mode is enabled; in particular, FP operations must not be speculated, so that we don't get spurious traps.)

Also, can we delete the instructions when they're dead?

According to the C11 standard F.9.1, no:

Concern about side effects may inhibit code motion and removal of seemingly useless code. For example, in

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void f(double x)
{
/* ... */
for (i = 0; i < n; i++) x + 1;
/* ... */
}

x + 1 might raise floating-point exceptions, so cannot be removed. And since the loop body might not execute (maybe 0 ≥ n), x + 1 cannot be moved out of the loop. (Of course these optimizations are valid if the implementation can rule out the nettlesome cases.)

This specification does not require support for trap handlers that maintain information about the order or count of floating-point exceptions. Therefore, between function calls, floating-point exceptions need not be precise: the actual order and number of occurrences of floating-point exceptions (> 1) may vary from what the source code expresses. Thus, the preceding loop could be treated as

if (0 < n) x + 1;

Note that this applies even in default (non-trapping) exception handling mode, because removing dead operations may still change the exception status flags read out by subsequent calls to fegetexcept.

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo

Correction: I mean writing to memory (the trap handler might write memory other than inaccessible memory (e.g., some global variable)). Anything that's escaped (trivially including globals) is fair game.

optimization implications. Do you agree?

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

Well, we certainly have a use case where trapping exception mode needs to be supported. (At the very least, the compiler must generate code that runs correctly in an environment where trapping mode is enabled; in particular, FP operations must not be speculated, so that we don't get spurious traps.)

Also, can we delete the instructions when they're dead?

According to the C11 standard F.9.1, no:

Concern about side effects may inhibit code motion and removal of seemingly useless code. For example, in

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void f(double x)
{
/* ... */
for (i = 0; i < n; i++) x + 1;
/* ... */
}

x + 1 might raise floating-point exceptions, so cannot be removed. And since the loop body might not execute (maybe 0 ≥ n), x + 1 cannot be moved out of the loop. (Of course these optimizations are valid if the implementation can rule out the nettlesome cases.)

This specification does not require support for trap handlers that maintain information about the order or count of floating-point exceptions. Therefore, between function calls, floating-point exceptions need not be precise: the actual order and number of occurrences of floating-point exceptions (> 1) may vary from what the source code expresses. Thus, the preceding loop could be treated as

if (0 < n) x + 1;

Note that this applies even in default (non-trapping) exception handling mode, because removing dead operations may still change the exception status flags read out by subsequent calls to fegetexcept.

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo

Correction: I mean writing to memory (the trap handler might write memory other than inaccessible memory (e.g., some global variable)). Anything that's escaped (trivially including globals) is fair game.

I suppose that they also get to read memory (so long as they're not using it to keep ordering/count information). So the trap handlers are modeled as reading/writing arbitrary escaped memory. That, plus register dependencies, seems like it should be sufficient.

optimization implications. Do you agree?

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo optimization implications.

Correction: I mean writing to memory (the trap handler might write memory other than inaccessible memory (e.g., some global variable)). Anything that's escaped (trivially including globals) is fair game.

I suppose that they also get to read memory (so long as they're not using it to keep ordering/count information). So the trap handlers are modeled as reading/writing arbitrary escaped memory. That, plus register dependencies, seems like it should be sufficient.

So if I understand you correctly, you're suggesting (at the MI level) to mark trapping FP operations as mayLoad and mayStore instead of hasSideEffects (as the patch does originally)?

That's probably OK, I think. But this would still restrict the optimization possibilities quite a bit compared to now, so you still wouldn't want to enable it by default in non-strict FP mode, right? In that case we're back to the question whether we need to duplicate all FP patterns or not.

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo optimization implications.

Correction: I mean writing to memory (the trap handler might write memory other than inaccessible memory (e.g., some global variable)). Anything that's escaped (trivially including globals) is fair game.

I suppose that they also get to read memory (so long as they're not using it to keep ordering/count information). So the trap handlers are modeled as reading/writing arbitrary escaped memory. That, plus register dependencies, seems like it should be sufficient.

So if I understand you correctly, you're suggesting (at the MI level) to mark trapping FP operations as mayLoad and mayStore instead of hasSideEffects (as the patch does originally)?

That's probably OK, I think. But this would still restrict the optimization possibilities quite a bit compared to now, so you still wouldn't want to enable it by default in non-strict FP mode, right? In that case we're back to the question whether we need to duplicate all FP patterns or not.

That's correct. And, yes, we only want to do this in modes where we must.

To take a hint from the suggestion you relayed from Chris, how about we just add MMOs to these instructions, and then let mayLoad/mayStore look at optional MMOs when returning their answer? Maybe this lets us do what we want without duplicating all of the patterns?

To take a hint from the suggestion you relayed from Chris, how about we just add MMOs to these instructions, and then let mayLoad/mayStore look at optional MMOs when returning their answer? Maybe this lets us do what we want without duplicating all of the patterns?

That sound like a good idea. I'll experiment with this approach a bit to see how far I can get. I guess we'll have to review the optimizers to catch places that simply look at the mayLoad/mayStore flag without detailed checking of MMOs -- we certainly want to avoid regressing on current behavior in the default case.

uweigand updated this revision to Diff 148944.May 29 2018, 10:34 AM

(Failed) attempt to handle strict and regular FP operations without duplicating patterns ...

To take a hint from the suggestion you relayed from Chris, how about we just add MMOs to these instructions, and then let mayLoad/mayStore look at optional MMOs when returning their answer? Maybe this lets us do what we want without duplicating all of the patterns?

So I was indeed able to change mayLoad and mayStore to allow optional MMOs as you suggested. (Not part of this patch, but I can post it separately if you're interested.)

Initially I thought that this would allow recognizing strict and non-strict operations without duplicating patterns, using an approach along the lines of the new patch attached here. Unfortunately this turned out to be incorrect. Calling mutateStrictFPToFP as I do here *removes the chain*, which causes multiple distinct copies of a floating-point operation with the same input (which need to be duplicated since FP status may change in between) to still be merged at this point.

To fix this, we need to pass strict FP operations through the selection process while keeping the chain intact (i.e. we have to preserve the chain on the resulting MachineSDNode). But for non-strict operations there is no chain. So the selection needs to recognize operations with chain and without chain. I don't see any way with the current selection pattern rules to implement this in a single pattern, so I guess we'll still need to have two patterns for each operation. On the other hand, I guess the MI instructions themselves don't need to be duplicated any more (only the DAG patterns).

Am I missing something here? Any other suggestions how to improve this would be appreciated!

uweigand updated this revision to Diff 149504.Jun 1 2018, 10:04 AM

Working version of the patch using memory operands.

This version now correctly implements target support for STRICT floating-point nodes using a MMO to represent the floating-point exception status. (Note that targets can and should additionally use a register use dependency on *all* floating-point instructions to represent floating-point control state, e.g. rounding mode and exception trap flags.)

This is now handled mostly by target-independent code: the STRICT_ DAG nodes now count as MemIntrinsic nodes and carry the fp-status MMO right from the beginning. This is now also using a new target-independent "fp-status" flavor of PseudoSourceValue. The STRICT_ DAG nodes then go through DAG selection as usual, keeping their chain until the end.

As noted earlier, this currently requires the target to provide two DAG patterns when matching floating-point instructions, e.g. one for fadd and one for strict_fadd. But those can now both resolve to the *same* MI instruction. The resulting MI will be distinguished by either carrying or not carrying the fp-status MMO.

I haven't found any way to avoid having two patterns; it would be nice to have some sort of "alternative", like "match either fadd or strict_fadd here", but this doesn't appear to exist.

Special-casing strict FP nodes SelectionDAGISel::SelectCodeCommon seems a bit awkward, but is currently necessary because otherwise the fp-status MMO would not be transferred to the final MI, since it is *not* marked as mayLoad or mayStore in general (that was the whole point of this approach ...).

Any thoughts?

Short of tracking the exception masking flags, this is a good solution.

Pretty much any FP operation can trap and there are vector variants, so a little bit of a pattern explosion is likely. Unfortunately, I do no have a better solution to offer though. :/

uweigand updated this revision to Diff 151936.Jun 19 2018, 10:05 AM

I've come up with a suggestion to avoid even the duplicated DAG patterns, by having them implicitly generated by TableGen. This uses a new "alternative" mechanism described in a separate RFC:
https://reviews.llvm.org/D48326

Assuming this solution (or something along those lines) gets approved, this patch shows how we could use it to implement strict FP in back ends. Note that the only back-end changes required for floating-point implicit are now:

  • add an implicit Use of a register representing floating-point *control* state
  • mark instructions as "mayAccessMemory" to indicate they may (or may not!) have a memory operand representing the floating-point *exception* state
  • have the DAG pattern match e.g. "any_fadd", which is a new "alternative" of fadd and strict_fadd.

No duplication of any kind is expected.

Note that this patch is now *complete* (except for test cases). As compared to the previous version, it additionally includes

  • support for the FPC register
  • support for all vector FP instructions
  • the infrastructure to support optional memory operands (using the mayAccessMemory marker)

I like this solution. The start-up cost of adding the any_OP patterns will be amortized, compared against adding unique patterns for each strict node.

That said, I'm not familiar enough with the target-independent framework changes in this patch, so someone with more experience will have to review those...

uweigand updated this revision to Diff 157675.Jul 27 2018, 6:05 AM
uweigand edited the summary of this revision. (Show Details)
uweigand set the repository for this revision to rL LLVM.

Updated patch to make use of the new multi-alternative pattern fragment features, and completed set of test cases.

Support is still not fully complete, but only because for certain operations there still are no strict variants defined (see https://reviews.llvm.org/D43515).

So I think at this point the patch may be ready for upstream, and therefore I would like to ask for review.

The patterns look great. I'm still not comfortable enough to review the target-independent changes, so will leave that to someone else.

The FPC register use on non-strict operations may be overly restrictive though. Here's a pathological test case to illustrate:

void foo(int a, int b) {

while(something) {
  call_that_may_modify_FPC();
  int x = a / b;
  use(x);  
}

}

Since a and b are local to the function, the *non-strict* DIV could be hoisted out of the loop. If we assume that calls implicitly define FPC, which I think we must to model global state, wouldn't that prevent the hoist of this non-strict DIV?

Since a and b are local to the function, the *non-strict* DIV could be hoisted out of the loop. If we assume that calls implicitly define FPC, which I think we must to model global state, wouldn't that prevent the hoist of this non-strict DIV?

Well, I was thinking that in non-strict code, calls do *not* implicitly define FPC. Only calls in strict code (marked by the front-end when FENV_ACCESS is on at the call site) would do so. (This is not yet implemented.)

Well, I was thinking that in non-strict code, calls do *not* implicitly define FPC. Only calls in strict code (marked by the front-end when FENV_ACCESS is on at the call site) would do so. (This is not yet implemented.)

That sounds reasonable to me. Thanks for the explanation, Ulrich.

nhaehnle removed a subscriber: nhaehnle.Jul 30 2018, 4:25 AM

Ping. Would this be easier to review if I split it up into multiple patches?

uweigand updated this revision to Diff 167967.Oct 2 2018, 9:59 AM

Yet another Ping after returning from vacation ...

I've now split off two logically self-contained subparts of the patch into separate revisions to hopefully simplify review:

[PseudoSourceValue] New category to represent floating-point status
https://reviews.llvm.org/D52785

[MI] New flag mayAccessMemory
https://reviews.llvm.org/D52786

This patch now depends on both of those and will only be committed together with them (if approved).

uweigand updated this revision to Diff 173405.Nov 9 2018, 12:03 PM

Updated to include support for recently added constrained intrinsics: floor, ceil, trunc, round, minnum, maxnum.

uweigand updated this revision to Diff 173407.Nov 9 2018, 12:07 PM

Re-added new tests accidentally lost in last diff.

wuzish added a subscriber: wuzish.EditedFeb 26 2019, 2:08 AM

Well, I have a question about why is it not enough to use use/def of physical register in MI level to prevent the reordering of instructions about set/get fp status register? For example, If the instruction which setting rounding mode, and then the instruction which read the rounding mode, the sequence would not be reordered because of the dependency of rounding mode bits (fp status register).

Well, I have a question about why is it not enough to use use/def of physical register in MI level to prevent the reordering of instructions about set/get fp status register? For example, If the instruction which setting rounding mode, and then the instruction which read the rounding mode, the sequence would not be reordered because of the dependency of rounding mode bits (fp status register).

This would be enough to handle rounding-mode aspects. But if FP exceptions are enabled, we have additional restrictions: all FP exceptions could then have additional side effects (trap) and therefore cannot be executed speculatively, even where that would be OK from a status register dataflow perspective.

Also, one goal of the design is that we do not have to duplicate all FP instruction patterns in the back-end; we want a single pattern that can handle both the strict and non-strict case. If we wanted to a phys reg def, that would then have to be optional in some form, and there's no good way I can see to do this in the current infrastructure.

Well, I have a question about why is it not enough to use use/def of physical register in MI level to prevent the reordering of instructions about set/get fp status register? For example, If the instruction which setting rounding mode, and then the instruction which read the rounding mode, the sequence would not be reordered because of the dependency of rounding mode bits (fp status register).

This would be enough to handle rounding-mode aspects. But if FP exceptions are enabled, we have additional restrictions: all FP exceptions could then have additional side effects (trap) and therefore cannot be executed speculatively, even where that would be OK from a status register dataflow perspective.

Also, one goal of the design is that we do not have to duplicate all FP instruction patterns in the back-end; we want a single pattern that can handle both the strict and non-strict case. If we wanted to a phys reg def, that would then have to be optional in some form, and there's no good way I can see to do this in the current infrastructure.

OK. If we use tablegen multiclass to define two kinds of instructions is not too much work because of multiclass. One for strict fp pattern and one for normal non-strict although they will be mapped to same instruction in some targets but with different flags(one has hasSideEffect flag, one hasn't) or phys reg def/use?

Well, I have a question about why is it not enough to use use/def of physical register in MI level to prevent the reordering of instructions about set/get fp status register? For example, If the instruction which setting rounding mode, and then the instruction which read the rounding mode, the sequence would not be reordered because of the dependency of rounding mode bits (fp status register).

This would be enough to handle rounding-mode aspects. But if FP exceptions are enabled, we have additional restrictions: all FP exceptions could then have additional side effects (trap) and therefore cannot be executed speculatively, even where that would be OK from a status register dataflow perspective.

Also, one goal of the design is that we do not have to duplicate all FP instruction patterns in the back-end; we want a single pattern that can handle both the strict and non-strict case. If we wanted to a phys reg def, that would then have to be optional in some form, and there's no good way I can see to do this in the current infrastructure.

OK. If we use tablegen multiclass to define two kinds of instructions is not too much work because of multiclass. One for strict fp pattern and one for normal non-strict although they will be mapped to same instruction in some targets but with different flags(one has hasSideEffect flag, one hasn't) or phys reg def/use?

If you'll go back to the beginning of this discussion, you'll note that this was in fact my very first approach to this problem. I would have been fine with that for SystemZ, but there was pushback from other target maintainers that this would be a lot of work for all targets, and therefore I started looking into other options.

uweigand abandoned this revision.Jun 5 2019, 3:34 PM