This is an archive of the discontinued LLVM Phabricator instance.

[FPEnv] Don't need copysign/fabs/fneg constrained intrinsics
AbandonedPublic

Authored by cameron.mcinally on Aug 17 2018, 11:17 AM.

Details

Summary

This isn't so much a patch as it is a tool for discussion. We shouldn't need constrained intrinsics for copysign/fabs/fneg since they are bitwise operations (assuming the intrinsics map to libm functions). They never trap and are not influenced by the rounding mode.

That said, there are cases where copysign/etc optimizations can result in trap-unsafe code. E.g.:

define double @safe(float %a, double %b) "unsafe-fp-math"="false" {

%r1 = call float @llvm.copysign.f32(float 1.000000e+00, float %a)
%r2 = fpext float %r1 to double
%r3 = call double @llvm.copysign.f64(double %b, double %r2)
ret double %r3

}

In this example, the 1st copysign is sanitizing a potential SNaN so that the fpext does not trap on it. LLVM will currently optimize this sequence to remove the 1st copysign, which may lead to a FE_INVALID trap on the fpext.

As long as we add constrained intrinsics for the fpext (and friends), we won't be able to optimize the copysign in an unsafe way. So, I think we're good with just that.

In conclusion, we don't need constrained intrinsics for copysign/fabs/fneg, but we should be wary of how they're optimized in the presence of constrained converts.

Does this sound correct to everyone? Did I miss anything?

Diff Detail

Event Timeline

It looks like I spoke too soon. We DO need a constrained fneg intrinsic. LLVM does not have a dedicated FNEG operator, but rather uses SUB. This is not okay since 0-x can trap, where -x cannot trap. I'll create a separate patch and Differential for FNEG.

Besides that, we still don't need constrained copysign/fabs intrinsics...

I believe we still don't need an intrinsic for constrained fneg. As you say, floating-point negation is implemented as -0-x, using a regular fsub (*not* the constrained intrinsic). The fsub semantics says that it cannot trap, so this would still be fine -- as long as we're sure the -0-x is always implemented via a dedicated negate instruction, and never via a subtraction. But this seems to be true, the idiom -0-x is recognized early during codegen and always treated specially.

The fsub semantics says that it cannot trap, so this would still be fine -- as long as we're sure the -0-x is always implemented via a dedicated negate instruction, and never via a subtraction.

Oh, that can't be correct. Can you point me to the documentation?

An FSUB can trap. Also, the user could explicitly write 0-x or -x, depending on what behavior they want. It's not safe for the compiler to canonicalize both of those into one operation. Unless I'm missing something?

I do see that you wrote -0-x. Is there something special about the -0?

cameron.mcinally added a comment.EditedAug 17 2018, 2:41 PM

Oh, ok. I think I understand now...

User writes 0-x: Frontend maps to constrained FSUB
User writes -x: Frontend maps to FSUB

Yeah, I suppose that would work if it's guaranteed that the -x would never produce an FSUB instruction, like you said. There seems to be a number of implicit agreements there though, so it may be fragile.

I do see that you wrote -0-x. Is there something special about the -0?

I understand this is to map -0 to +0 and vice versa.

User writes 0-x: Frontend maps to constrained FSUB
User writes -x: Frontend maps to FSUB

Yes, exactly.

arsenm added a subscriber: arsenm.Aug 19 2018, 7:39 AM

I think it would be better to just add fneg as a first class operation

kpn added a comment.Aug 20 2018, 6:53 AM

I do see that you wrote -0-x. Is there something special about the -0?

I understand this is to map -0 to +0 and vice versa.

User writes 0-x: Frontend maps to constrained FSUB
User writes -x: Frontend maps to FSUB

Yes, exactly.

That does seem fragile. I don't like fragile. I can imagine getting questions from programmers who are having performance issues and we tell them to replace "0-x" with "-x". Or are having correctness issues with +0 vs -0, fix them, and then have traps where they didn't before. We'd need to make sure the SelectionDAG knows about these implicit agreements (and keeps knowing about them) since it does have that FNEG node.

It seems to me that the SelectionDAG needs to be aware of FENV_ACCESS ON when building the SDAG so it doesn't use FNEG when trapping behavior is wanted. Wouldn't that solve the fragility issue in the FENV_ACCESS ON case at least?

I haven't looked into copysign yet. That's on today's menu.

I'm looking into the FNEG handling and found one bad xform in llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:

// Expand Y = FNEG(X) -> Y = SUB -0.0, X

That's not necessarily a show-stopper though. We could change this code to do a bitwise operation instead. That may even be faster than a FSUB, depending on the target of course.

kpn added a comment.Aug 20 2018, 7:43 AM

I'm looking into the FNEG handling and found one bad xform in llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:

// Expand Y = FNEG(X) -> Y = SUB -0.0, X

That's not necessarily a show-stopper though. We could change this code to do a bitwise operation instead. That may even be faster than a FSUB, depending on the target of course.

If SelectionDAG refuses to create an FNEG node when it see a constrained fsub, and the SDAG optimizations and legalizations are the same, then this shouldn't be an issue. If the fsub that led to the FNEG node was not constrained in the first place then this seems like a valid transform.

If the fsub that led to the FNEG node was not constrained in the first place then this seems like a valid transform.

Bah, sorry. My comment was ambiguous...

I meant that the FNEG->FSUB xform is "bad" in the context Ulrich proposed. For example:

The user writes a unary minus, -x, which is a bitwise operation and will never trap. The frontend would then convert -x to an *unconstrained* -0-x. InstCombine catches this pattern and canonicalizes the -0-x to a FNEG. But, if the target expands the FNEG, it will be converted back into an FSUB, which is arithmetic and could trap. We don't want to introduce traps where none were originally intended.

kpn added a comment.Aug 20 2018, 9:33 AM

If the fsub that led to the FNEG node was not constrained in the first place then this seems like a valid transform.

Bah, sorry. My comment was ambiguous...

I meant that the FNEG->FSUB xform is "bad" in the context Ulrich proposed. For example:

The user writes a unary minus, -x, which is a bitwise operation and will never trap.

Do we know this for certain? For example, on SystemZ there are instructions for FNEG and abs, non-vector and vector. Those instructions are floating point instructions. And, yes, those instructions are defined to not trap. But do we know for certain that no architecture traps, even the ones that use floating point instructions for these operations?

I'm sticking with my thinking that a constrained fsub instruction should never be turned into an FNEG SDNode. It's a very simple rule that won't result in confused or surprised programmers. This rule also means a constrained FNEG would not be needed.

The user writes a unary minus, -x, which is a bitwise operation and will never trap.

Do we know this for certain?

Yes, this is specified by IEEE-754 Section 5.5.1 (my copy is old, but I don't believe this has changed):

"negate(x) copies a floating-point operand x to a destination in the same format, reversing the sign bit. negate(x) is not the same as subtraction(0, x) (see 6.3)."

This section also states that "The operations treat floating-point numbers and NaNs alike, and signal no exception."

I'm sticking with my thinking that a constrained fsub instruction should never be turned into an FNEG SDNode. It's a very simple rule that won't result in confused or surprised programmers.

This is correct. We cannot optimize away a trap that existed in the original program.

This rule also means a constrained FNEG would not be needed.

I'm not so sure about this. We may be able to do without a constrained FNEG if the FEs and LLVM agree to an implicit contract on unconstrained FSUB. But, right now, I don't think we fully comply in all cases.

kpn added a comment.Aug 20 2018, 11:55 AM

I just got done talking to one of our floating point math guys here.

If the fsub that led to the FNEG node was not constrained in the first place then this seems like a valid transform.

Bah, sorry. My comment was ambiguous...

I meant that the FNEG->FSUB xform is "bad" in the context Ulrich proposed. For example:

The user writes a unary minus, -x, which is a bitwise operation and will never trap. The frontend would then convert -x to an *unconstrained* -0-x. InstCombine catches this pattern and canonicalizes the -0-x to a FNEG. But, if the target expands the FNEG, it will be converted back into an FSUB, which is arithmetic and could trap. We don't want to introduce traps where none were originally intended.

This sounds like the case for a constrained fneg intrinsic.

We can't convert -x into an fsub, constrained or not, because we can't risk introducing or removing traps. We need an unambiguous way of making a distinction between "-0-x" and "-x". The constrained fneg intrinsic would allow that distinction.

In D50913#1206266, @kpn wrote:

I just got done talking to one of our floating point math guys here.

If the fsub that led to the FNEG node was not constrained in the first place then this seems like a valid transform.

Bah, sorry. My comment was ambiguous...

I meant that the FNEG->FSUB xform is "bad" in the context Ulrich proposed. For example:

The user writes a unary minus, -x, which is a bitwise operation and will never trap. The frontend would then convert -x to an *unconstrained* -0-x. InstCombine catches this pattern and canonicalizes the -0-x to a FNEG. But, if the target expands the FNEG, it will be converted back into an FSUB, which is arithmetic and could trap. We don't want to introduce traps where none were originally intended.

This sounds like the case for a constrained fneg intrinsic.

We can't convert -x into an fsub, constrained or not, because we can't risk introducing or removing traps. We need an unambiguous way of making a distinction between "-0-x" and "-x". The constrained fneg intrinsic would allow that distinction.

This is a problem with the representation of fneg as an fsub. We should just fix that hack with a proper fneg instruction, and eliminate the transformations to fsubs where appropriat

I would really much rather have an fneg instruction than to deal with a constrained fneg On GPUs we have source modifiers, and it would be really helpful to preserve all of the side effect free properties of fneg so ti's easy to propagate all of the optimizations of fnegs through the constrained operations. Having the unnecessary complexity of the side effects to workaround this quirk of how the IR has been defined thus far would be unfortunate

So LangRef says that there is no explicit fneg:

"Note that the ‘fsub’ instruction is used to represent the ‘fneg’ instruction present in most other intermediate representations."

But this doesn't seem to be the case in IRBuilder. I.e.: CreateFNeg(...)

Perhaps separating the FNeg and FSub operators is a good solution. And also removing all the xforms between the two. That makes more sense since fneg really *shouldn't* be constrained (it's just a bitwise operation).

spatel added a subscriber: spatel.Aug 20 2018, 12:29 PM

So LangRef says that there is no explicit fneg:

"Note that the ‘fsub’ instruction is used to represent the ‘fneg’ instruction present in most other intermediate representations."

But this doesn't seem to be the case in IRBuilder. I.e.: CreateFNeg(...)

Perhaps separating the FNeg and FSub operators is a good solution. And also removing all the xforms between the two. That makes more sense since fneg really *shouldn't* be constrained (it's just a bitwise operation).

The IRBuilder is just a convenience wrapper. We also have the pattern matching or m_Fneg(), so I don't think it should be too difficult to introduce a proper FNeg instruction since most of the uses are hidden in these anyway

Cool, I like that idea. I'll let this sit for a day or so to see if there are further comments. If not, I'll start a discussion on llvm-dev for a wider audience.

Cool, I like that idea. I'll let this sit for a day or so to see if there are further comments. If not, I'll start a discussion on llvm-dev for a wider audience.

I think fneg as an IR instruction or intrinsic has come up for discussion before, but I don't know why it wasn't implemented at the time. An intrinsic is the lighter weight solution and would line up with the similar:
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
http://llvm.org/docs/LangRef.html#llvm-fabs-intrinsic

Any reason to prefer an actual instruction over an intrinsic?

I think fneg as an IR instruction or intrinsic has come up for discussion before, but I don't know why it wasn't implemented at the time. An intrinsic is the lighter weight solution and would line up with the similar:

http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
http://llvm.org/docs/LangRef.html#llvm-fabs-intrinsic

Any reason to prefer an actual instruction over an intrinsic?

An intrinsic would work. I'm trying to avoid creating a *constrained* intrinsic, since it doesn't fit well. All of the other constrained intrinsics are on arithmetic operations (can trap). Copysign/abs/neg are bitwise operations (cannot trap). It doesn't seem correct to constrain an operation that cannot trap.

Cool, I like that idea. I'll let this sit for a day or so to see if there are further comments. If not, I'll start a discussion on llvm-dev for a wider audience.

I think fneg as an IR instruction or intrinsic has come up for discussion before, but I don't know why it wasn't implemented at the time. An intrinsic is the lighter weight solution and would line up with the similar:
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
http://llvm.org/docs/LangRef.html#llvm-fabs-intrinsic

Any reason to prefer an actual instruction over an intrinsic?

I thjnk there’s unnecessary resistance to adding new instructions, but they are nicer to deal with. The existing set of basic operators are instructions, so fneg should be as well. They’re common enough to justify it. Isn’t the representation of intrinsic calls more expensive? I’d prefer if we had more of the IEEE ops as full instructions, like fma and fabs

Cool, I like that idea. I'll let this sit for a day or so to see if there are further comments. If not, I'll start a discussion on llvm-dev for a wider audience.

I think fneg as an IR instruction or intrinsic has come up for discussion before, but I don't know why it wasn't implemented at the time. An intrinsic is the lighter weight solution and would line up with the similar:
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
http://llvm.org/docs/LangRef.html#llvm-fabs-intrinsic

Any reason to prefer an actual instruction over an intrinsic?

I thjnk there’s unnecessary resistance to adding new instructions, but they are nicer to deal with. The existing set of basic operators are instructions, so fneg should be as well. They’re common enough to justify it. Isn’t the representation of intrinsic calls more expensive? I’d prefer if we had more of the IEEE ops as full instructions, like fma and fabs

Ok - those seem like good arguments. This patch/topic should be raised on llvm-dev, so others can comment.

This patch/topic should be raised on llvm-dev, so others can comment.

Will do. I'll give it a little more time for the usual suspects to chime in, if they wish.

llvm-dev discussion started under the subject:

[FPEnv] FNEG instruction

I feel like it might be a bad idea to have floating point instructions that don't have constrained forms.

For one thing, we need to not just consider the operation itself but also any possible ways it can be combined with other operations. We might be confident that fneg X can stand alone without fear of trapping, but by not making a constrained version we are implicitly giving the optimizer freedom to combine this with other operations. Can we be sure the optimizer won't find some clever way to combine a sequence of non-constrained operations that we thought didn't need constrained forms and end up with an unconstrained fadd or fsub somewhere?

Also, we previously agreed that if any FP operations in a function required constrained intrinsics then all FP operations in that function would get constrained intrinsics. This was to avoid operations migrating across boundaries after inlining and such. If we had a function that contained an fneg that was not marked as constrained and no other constrained operations and that function was inlined into a function that was not constrained, the fneg could easily end up being combined with unconstrained fsub and fadd operations.

If we don't introduce a first class fneg instruction, I don't think we can rely on the CodeGen always pattern matching -0-X to an fneg instruction. The fact that it does that is nice, but if there is nothing requiring it to happen we should assume that it won't always happen.

I am not against introducing a first class fneg instruction, but I do want to point out that even if we do so the lack of trapping behavior cannot be part of the definition of the instruction. Or rather that fact won't distinguish it from other FP instructions because the LLVM language reference explicitly states that all FP instructions are assumed to have no side effects. So there is no reason for the optimizer to preserve this non-trapping property in the abscence of constrained intrinsics.

For one thing, we need to not just consider the operation itself but also any possible ways it can be combined with other operations. We might be confident that fneg X can stand alone without fear of trapping, but by not making a constrained version we are implicitly giving the optimizer freedom to combine this with other operations. Can we be sure the optimizer won't find some clever way to combine a sequence of non-constrained operations that we thought didn't need constrained forms and end up with an unconstrained fadd or fsub somewhere?

I'm not sure the search space is that big for this. There are only 4 quiet-computational operations: copy, abs, neg, and copysign. Neg is the only one that we have to worry about being converted into a trapping operation. If there's a contract that an FNEG will never be converted into an FSUB, then it's fine to optimize them. In fact, we want the operations to be optimized.

Also, we previously agreed that if any FP operations in a function required constrained intrinsics then all FP operations in that function would get constrained intrinsics. This was to avoid operations migrating across boundaries after inlining and such. If we had a function that contained an fneg that was not marked as constrained and no other constrained operations and that function was inlined into a function that was not constrained, the fneg could easily end up being combined with unconstrained fsub and fadd operations.

This sounds okay to me. If an fneg is combined with an unconstrained operation, there was no intention of trap-safety on the unconstrained operation in the first place. If the user is issuing unconstrained operations with traps enabled, that's their mistake.

That said, a fneg cannot be combined with a constrained operation freely.

I am not against introducing a first class fneg instruction, but I do want to point out that even if we do so the lack of trapping behavior cannot be part of the definition of the instruction. Or rather that fact won't distinguish it from other FP instructions because the LLVM language reference explicitly states that all FP instructions are assumed to have no side effects. So there is no reason for the optimizer to preserve this non-trapping property in the abscence of constrained intrinsics.

Your argument here is compelling. If there's no guarantee that an FSUB and FNEG are disjoint, then this won't work. Although, an FSUB and FNEG aren't really the same operation. Perhaps we should only be doing the transformation under UnsafeMath conditions to begin with...

Your argument here is compelling. If there's no guarantee that an FSUB and FNEG are disjoint, then this won't work. Although, an FSUB and FNEG aren't really the same operation. Perhaps we should only be doing the transformation under UnsafeMath conditions to begin with...

I think that you are correct. If I'm reading the spec correctly, the relevant cases are fneg(NaN) and fneg(0), so we should only be performing the transformation if the nnan and nsz flags are set.

Regarding the constrained fneg, I guess my biggest argument in favor of having a constrained fneg intrinsic is that it would be confusing (as a human reader of the IR) to see a non-constrained floating point operation amongst constrained operations. On the other hand, I don't think that arguments against having a constrained fneg are terribly compelling either.

Did we decide that fast math flags can't be applied in the presence of constrained operations? Maybe that's my best reason to have a constrained fneg.

I am not against introducing a first class fneg instruction, but I do want to point out that even if we do so the lack of trapping behavior cannot be part of the definition of the instruction. Or rather that fact won't distinguish it from other FP instructions because the LLVM language reference explicitly states that all FP instructions are assumed to have no side effects. So there is no reason for the optimizer to preserve this non-trapping property in the abscence of constrained intrinsics.

Your argument here is compelling. If there's no guarantee that an FSUB and FNEG are disjoint, then this won't work. Although, an FSUB and FNEG aren't really the same operation. Perhaps we should only be doing the transformation under UnsafeMath conditions to begin with...

This is more a question of lowering and not an IR semantics question. The fneg will be defined as having no side effects or trapping. If through some architectural specific quirk the expected instructions need something more to avoid this, then that is their issue to deal with. Since it should always be impleentable with bitwise operations, this shouldn't be an issue

This is more a question of lowering and not an IR semantics question. The fneg will be defined as having no side effects or trapping. If through some architectural specific quirk the expected instructions need something more to avoid this, then that is their issue to deal with. Since it should always be impleentable with bitwise operations, this shouldn't be an issue

As I said, all LLVM IR FP instructions are assumed to have no side effects. I'm not sure we want an instruction that goes beyond this to be defined as having no side effects. It adds a complication to the language and introduces restrictions on the code generator that aren't needed in the ordinary case of non-constrained FP. The target code generators are free to do anything they want with the other FP instructions, including things that introduce new FP status flags being set that otherwise wouldn't be, and for the normal case the back ends should be free to do that with fneg as well.

Did we decide that fast math flags can't be applied in the presence of constrained operations?

I haven't thought about this before today, but some of the flags we would want. E.g. reassoc/contract/afn.

In a pure trap-safe mode, contract/afn should be turned off. But, generally speaking, our users would prefer to see optimized code that misses some right-at-the-edge cases.

kpn added a comment.Aug 24 2018, 10:38 AM

Did we decide that fast math flags can't be applied in the presence of constrained operations?

I haven't thought about this before today, but some of the flags we would want. E.g. reassoc/contract/afn.

In a pure trap-safe mode, contract/afn should be turned off. But, generally speaking, our users would prefer to see optimized code that misses some right-at-the-edge cases.

Over here we'd want contract and afn but not reassoc.

It looks like an explicit FNEG operation isn't going to fly. I propose we drop that pursuit and continue forward with a constrained FNEG intrinsic.

I suppose the best plan of action would be to add constrained COPYSIGN and FABS intrinsics too, so that the whole set is uniform. Does anyone feel strongly otherwise? If not, I'll close this Diff and work on a patch to put them in place.

@cameron.mcinally can you summarize why an explicit fneg isn't going to fly. I had been trying to follow the thread and maybe I missed a strong object somewhere?

There hasn't been any strong objects, but I haven't seen many strong accepts either besides the few main stakeholders. I'm under the assumption that silence is a passive reject in situations like this.

Should I keep pushing for it? It felt like I was beating a dead horse...

There hasn't been any strong objects, but I haven't seen many strong accepts either besides the few main stakeholders. I'm under the assumption that silence is a passive reject in situations like this.

Should I keep pushing for it? It felt like I was beating a dead horse...

It's hard to interpret silence sometimes, but my experience has been that this community is rarely passive about rejection. I think there are good reasons to make another attempt at moving forward with the fneg instruction.

There hasn't been any strong objects, but I haven't seen many strong accepts either besides the few main stakeholders. I'm under the assumption that silence is a passive reject in situations like this.

Should I keep pushing for it? It felt like I was beating a dead horse...

I don't know if I qualify as a main stakeholder, but since I've been futzing around with fast-math-flags and IR canonicalization for a few years now...
I thought we had consensus that adding fneg to IR was desirable, so I didn't bother to repeat any of the people that did reply to the dev thread.

I do have 1 concern though - let me reply to the dev thread with that since that's the wider audience in theory.

cameron.mcinally abandoned this revision.Nov 28 2018, 9:46 AM

This issue was handled with the addition of a dedicated FNeg IR Instruction in D53877.

At this time, I do not know of a need for constrained fabs/fneg/fcopysign intrinsics. We can reevaluate if unsafe transformations are found in the future.

Thanks for the contributions, everyone.