This is an archive of the discontinued LLVM Phabricator instance.

[Strict FP] Allow custom operation actions
ClosedPublic

Authored by uweigand on Jul 24 2019, 10:22 AM.

Details

Summary

This patch changes the DAG legalizer to respect the operation actions set by the target for strict floating-point operations. (Currently, the legalizer will usually fall back to mutate to the non-strict action (which is assumed to be legal), and only skip mutation if the strict operation is marked legal.)

With this patch, if whenever a strict operation is marked as Legal or Custom, it is passed to the target as usual. Only if it is marked as Expand will the legalizer attempt to mutate to the non-strict operation. Note that this will now fail if the non-strict operation is itself marked as Custom -- the target will have to provide a Custom definition for the strict operation then as well.

Diff Detail

Repository
rL LLVM

Event Timeline

uweigand created this revision.Jul 24 2019, 10:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2019, 10:23 AM
kpn added inline comments.Jul 24 2019, 11:39 AM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

In what way does it not honor the strict properties?

Also, wouldn't it be a bug if we were asked to expand a strict node when the non-strict is legal?

kpn added a comment.Jul 24 2019, 11:48 AM

It looks like we're unrolling some vectors where before we weren't. That seems unfortunate. Is that the reason for the generated code quality regressions?

In D65226#1599743, @kpn wrote:

It looks like we're unrolling some vectors where before we weren't. That seems unfortunate. Is that the reason for the generated code quality regressions?

+1. What's the reason behind the scalarization?

uweigand marked an inline comment as done.Jul 25 2019, 4:50 AM
In D65226#1599743, @kpn wrote:

It looks like we're unrolling some vectors where before we weren't. That seems unfortunate. Is that the reason for the generated code quality regressions?

+1. What's the reason behind the scalarization?

Those are cases where the non-strict operation actually is not legal: e.g. the X86 target sets the operation action for FADD and FSUB vector operations to Custom. The old code simply ignored that and just emitted FADD and FSUB anyway as if they were legal, and only by chance did they match an isel pattern anyway.

The target can get the old behavior back by simply handling STRICT_FADD and STRICT_FSUB directly (presumably also via Custom operations similar to ones it uses for FADD/FSUB), so this is only a "regression" for the current fallback implementation.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

In what way does it not honor the strict properties?

Well, the expansion is done by using a truncating store followed by a load. The truncating store is a non-strict operation which will not raise exceptions.

Also, wouldn't it be a bug if we were asked to expand a strict node when the non-strict is legal?

Why would that be a bug? That's the current default behavior on all targets that don't (yet) support strict operations. (All strict operations default to expand, which is taken to mean replace by the non-strict version.)

In D65226#1599743, @kpn wrote:

It looks like we're unrolling some vectors where before we weren't. That seems unfortunate. Is that the reason for the generated code quality regressions?

+1. What's the reason behind the scalarization?

Those are cases where the non-strict operation actually is not legal: e.g. the X86 target sets the operation action for FADD and FSUB vector operations to Custom. The old code simply ignored that and just emitted FADD and FSUB anyway as if they were legal, and only by chance did they match an isel pattern anyway.

The target can get the old behavior back by simply handling STRICT_FADD and STRICT_FSUB directly (presumably also via Custom operations similar to ones it uses for FADD/FSUB), so this is only a "regression" for the current fallback implementation.

Ah, ok. So I think the x86 case is a little more subtle than just *not legal*. The backend checks for *some* Custom lowerings (horizontal ops), but ultimately the non-strict vector FADD/FSUB are legal on x86. The Custom lowering code just returns the original op.

In D65226#1599743, @kpn wrote:

It looks like we're unrolling some vectors where before we weren't. That seems unfortunate. Is that the reason for the generated code quality regressions?

+1. What's the reason behind the scalarization?

Those are cases where the non-strict operation actually is not legal: e.g. the X86 target sets the operation action for FADD and FSUB vector operations to Custom. The old code simply ignored that and just emitted FADD and FSUB anyway as if they were legal, and only by chance did they match an isel pattern anyway.

The target can get the old behavior back by simply handling STRICT_FADD and STRICT_FSUB directly (presumably also via Custom operations similar to ones it uses for FADD/FSUB), so this is only a "regression" for the current fallback implementation.

Ah, ok. So I think the x86 case is a little more subtle than just *not legal*. The backend checks for *some* Custom lowerings (horizontal ops), but ultimately the non-strict vector FADD/FSUB are legal on x86. The Custom lowering code just returns the original op.

Yes, exactly. But there's no way common code can know that this is what the Custom lowering code does; in general, it is not OK to just pass an opcode to isel if the target classifies the op as Custom. But as I said, once the target actually handles the STRICT_ opcodes (either Custom or maybe even Legal, if that's what the target wants), then any regression will go away. (And as long as the target *doesn't* handle them, they don't really implement the strict semantics anyway and shouldn't be used for anything "real" anyway.)

I don't see a problem with scalarizing strict ops for Custom lowered nodes that would otherwise be legal. It's not ideal, I suppose, but if a target doesn't support the strict nodes in the backend, then it probably shouldn't be using the experimental intrinsics anyway.

I don't really have a strong opinion on this though, so will leave it to others that may have one...

kpn added a comment.Jul 25 2019, 11:55 AM

So the PowerPC code regressions will be fixed once the strict tickets make it into the tree it sounds like.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

In what way does it not honor the strict properties?

Well, the expansion is done by using a truncating store followed by a load. The truncating store is a non-strict operation which will not raise exceptions.

Is it documented anywhere that the truncating store will not raise exceptions? It seems weird to me since for IEEE targets a truncating store must change the format of the bits anyway. What if an exponent is out of range of the smaller type, for example?

How many non-IEEE targets are there? I'm well aware that hardware is still very current that supports S/370's radix 16 FP, plus the radix 10 FP, but are there any other current forms of not-at-all-IEEE floating point?

A different angle: if a target cannot use EmitStackConvert(), then isn't it that target's responsibility to make sure this code path is never used? In that case we wouldn't need this subtle code here.

Do we even have any tests for EmitStackConvert()? I remember having a very hard time finding one.

Also, wouldn't it be a bug if we were asked to expand a strict node when the non-strict is legal?

Why would that be a bug? That's the current default behavior on all targets that don't (yet) support strict operations. (All strict operations default to expand, which is taken to mean replace by the non-strict version.)

It sounds like we're changing what Expand means for strict operations. Previously it meant the same thing as it does for non-strict operations: use the fallback/default expansion. And there's plenty of code in place to do those expansions in ways that preserve the strictness. The strict and non-strict nodes have followed the same paths for the most part with the exception of strict { Custom, Promote } -> Expand.

Long term, on targets that support strict floating point, do we want Expand to have different meanings for strict and non-strict nodes? It worries me if they're different.

3707 ↗(On Diff #211537)

Say, we can't emit libcalls for any old random thing. If we simply return false here then it will try -- and fail -- to emit a libcall. And failing to emit a libcall is _not_ fatal I'm pretty sure. So no tricky returning true when we didn't actually expand anything is needed here.

uweigand marked an inline comment as done.Jul 25 2019, 12:35 PM
In D65226#1601520, @kpn wrote:

So the PowerPC code regressions will be fixed once the strict tickets make it into the tree it sounds like.

Yes, exactly.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

Is it documented anywhere that the truncating store will not raise exceptions?

Common code assumes that the truncating store will not raise FP exceptions, just like it assumes any DAG operation, except for the STRICT_... ones, will not raise FP exceptions. (Now, since it is a store, common code assumes that it may raise an exception because of memory access faults, but then again, in this specific use case, common code will recognize that the store targets a stack slot which can never fault.) Therefore, common code might e.g. speculate the truncating store outside of a condition -- which would be incorrect if indeed FP exceptions are turned on. Similarly, common code might schedule this truncating store across a function call that might change rounding modes.

I guess a way to correctly implement this transformation would be to expand a STRICT_FP_ROUND into a "strict" version of a truncating store (which currently does not exist). But I don't think it would be worthwhile to add this until and unless this is actually useful on some target.

It sounds like we're changing what Expand means for strict operations.

Well, I guess that's true in the sense that for now, Expand for strict operations allows for two implementations:

  • either, implement the precise semantics of the operation in terms of other operations (this is the "traditional" meaning of Expand, and is in some cases doable for strict operations too, but usually only if you can implement a strict operations in terms of other strict operations);
  • or, fall back to the non-strict semantics if strict semantics are not possible because the target doesn't (yet) implement any of those at all

At some point in the future, we want the second option to go away, but we're not there yet. But on targets that do implement strict semantics in the backend, common code should never use that second option.

And there's plenty of code in place to do those expansions in ways that preserve the strictness.

Not really? All the existing expansions end up in solely non-strict DAG opcodes, so how could they preserve the strictness?

Long term, on targets that support strict floating point, do we want Expand to have different meanings for strict and non-strict nodes? It worries me if they're different.

No, long term the meaning should be the same again. The fallback is the short-term thing.

kpn added inline comments.Jul 26 2019, 9:18 AM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

Common code assumes that the truncating store will not raise FP exceptions, just like it assumes any DAG operation, except for the STRICT_... ones, will not raise FP exceptions. (Now, since it is a store, common code assumes that it may raise an exception because of memory access faults, but then again, in this specific use case, common code will recognize that the store targets a stack slot which can never fault.) Therefore, common code might e.g. speculate the truncating store outside of a condition -- which would be incorrect if indeed FP exceptions are turned on. Similarly, common code might schedule this truncating store across a function call that might change rounding modes.

In the STRICT EmitStackConvert() case the store+load are chained together _and_ that chain is spliced into the chain where the STRICT node was formerly located. Does common code do transformations to the SDAG that reorder the chain? I thought the point of the chain was to enforce ordering.

I guess a way to correctly implement this transformation would be to expand a STRICT_FP_ROUND into a "strict" version of a truncating store (which currently does not exist). But I don't think it would be worthwhile to add this until and unless this is actually useful on some target.

Agreed.

uweigand marked an inline comment as done.Jul 26 2019, 10:35 AM
uweigand added inline comments.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

In the STRICT EmitStackConvert() case the store+load are chained together _and_ that chain is spliced into the chain where the STRICT node was formerly located. Does common code do transformations to the SDAG that reorder the chain? I thought the point of the chain was to enforce ordering.

That's true on the SDAG, yes. But once the store is translated to MI, it will not be marked as potentially raising an FP exception (like strict MIs are), and therefore it might reordered by the MI schedulers.

The difference to a (hypothetical) strict truncating store would be that the latter would get translated to MI with the mayRaiseFPException flag on.

kpn added inline comments.Jul 26 2019, 11:28 AM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

Should this expansion be there at all, then? Right now if this expansion is used we silently emit non-strict code. Shouldn't it just outright fail, or fail to expand at least? Same deal with STRICT_FP_EXTEND below.

uweigand marked an inline comment as done.Jul 26 2019, 11:49 AM
uweigand added inline comments.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2790 ↗(On Diff #211537)

I guess it is there to ensure that the (currently still default) fallback to expand strict to non-strict continues to work on targets where FP_TRUNC is not legal.

I agree this is somewhat questionable; I'd be fine with this expansion going away. Use of a constrained truncate/expand intrinsic on such target should then fail ...

Ping?

Should we move ahead with this? I believe this is still a pre-req for D63782 ...

kpn added a comment.Aug 5 2019, 10:48 AM

Ping?

Should we move ahead with this? I believe this is still a pre-req for D63782 ...

I have no objections. I'll defer to one of the other reviewers.

I'm still not thrilled with the tricky doing nothing but returning true code like I mentioned, but I don't think that should be enough to hold things up.

hfinkel accepted this revision.Aug 5 2019, 11:16 AM
In D65226#1615192, @kpn wrote:

Ping?

Should we move ahead with this? I believe this is still a pre-req for D63782 ...

I have no objections. I'll defer to one of the other reviewers.

I'm still not thrilled with the tricky doing nothing but returning true code like I mentioned, but I don't think that should be enough to hold things up.

LGTM. It does not look like there are any outstanding objections.

This revision is now accepted and ready to land.Aug 5 2019, 11:16 AM
This revision was automatically updated to reflect the committed changes.

Thanks, Hal. I agree we'll need to clean up the "fallback" Expand logic once targets have moved to actually supporting strict FP nodes.