This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improved lowering for saturating float to int.
Needs ReviewPublic

Authored by ebevhan on Aug 17 2020, 7:26 AM.

Details

Summary

Adapted from D54696 by @nikic.

This patch improves the default lowering of saturating
float to int conversions, FP_TO_[SU]INT_SAT, on AArch64.
Since the default FP_TO_INT operations on AArch64 are
already saturating, we can take advantage of this
behavior in the lowering for the saturating operations.

Diff Detail

Event Timeline

ebevhan created this revision.Aug 17 2020, 7:26 AM
ebevhan requested review of this revision.Aug 17 2020, 7:26 AM
ebevhan updated this revision to Diff 286506.Aug 19 2020, 1:48 AM

Amended comment.

Given you are making use of the fact that AArch64 fp_to_int already does the saturation part, I'm wondering if the different-width saturation can be moved from the input to the output? Specially I'm thinking this will result in nicer code as the min/max immediate values will be easier to produce in integer form.

Given you are making use of the fact that AArch64 fp_to_int already does the saturation part, I'm wondering if the different-width saturation can be moved from the input to the output? Specially I'm thinking this will result in nicer code as the min/max immediate values will be easier to produce in integer form.

Ah, so you mean to emit the select-comparisons as integer comparisons rather than floating point comparisons in the second expansion.

It feels like that would work. @nikic, was there any reason in particular you didn't implement it that way in the original patch or does it sound fine to you?

nikic added a comment.Aug 30 2020, 8:54 AM

Given you are making use of the fact that AArch64 fp_to_int already does the saturation part, I'm wondering if the different-width saturation can be moved from the input to the output? Specially I'm thinking this will result in nicer code as the min/max immediate values will be easier to produce in integer form.

Ah, so you mean to emit the select-comparisons as integer comparisons rather than floating point comparisons in the second expansion.

It feels like that would work. @nikic, was there any reason in particular you didn't implement it that way in the original patch or does it sound fine to you?

Sounds like a good idea to me. Don't think I had a particular reason here beyond "that's what the generic expansion does".

ebevhan updated this revision to Diff 289172.Sep 1 2020, 8:01 AM

Made natively saturating expansions use integer comparisons instead.

ebevhan updated this revision to Diff 290263.Sep 7 2020, 5:42 AM

Rebased.

ebevhan updated this revision to Diff 290297.Sep 7 2020, 8:49 AM

Removed superfluous NaN check and updated tests.

Sorry I had hoped to look at this patch properly today but that hasn't worked out. I've got a general comment below as I think it would be nicer to abstract the NativeSaturation into a TLI hook that other targets can implement when useful to them.

I'm also wondering if there's some code reuse opportunities with LowerVectorFP_TO_INT since understandably they're making similar lowering choices. I'll defer that last point until I take a proper look tomorrow.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3009

This seems a bit unnatural to the way lowering is typically done. Can this just return SDValue() and then add a target TLI hook to query the behaviour of the FP convert ISD nodes that expandFP_TO_INT_SAT can call.

I'm afraid some of my comments are probably conflicting. I would have experimented a little to understand better how things work, but I'm guessing the patch is based on other work because I couldn't some of the affected functions/nodes.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8070–8071

Perhaps I've misread something but with this change it looks like a NaN input to FP_TO_UINT_SAT can result in (unsigned int)MaxFloat rather than the expected zero? because the following ISD::FMINNUM will choose the non-NaN input rather than propagating MinFloat like it did before (because the FMAXNUM has been omitted).

This goes against my previous suggestion but perhaps it will be cleaner to perform all the lowering within LowerFP_TO_INT_SAT as the process will be a uniform:

// This handles native saturation and NaN -> 0 transformations.
A = getNode(FP_TO_UINT // Letting LowerFP_TO_INT handle input promotion, rather than duplicate that handling.
A = getNode(ISD::MIN, A, SAT_HI)
if (isSigned)
  A = getNode(ISD::MAX, A, SAT_LO)
return A
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3012–3015

The DstVT.getScalarSizeInBits().getFixedSize() != SatWidth case feels like something we should catch and transform as early as possible to enable other optimisation opportunities. Perhaps worth a DAGCombine using the same target TLI hook mentioned before.

llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
78–88

This test backs up the performance side of what I'm saying in expandFP_TO_INT_SAT. Using the knowledge that the saturation value is exactly representable as a float is not the only thing you care about for AArch64. You also care about "what's the most efficient way to perform the clamping operations".

Assuming (mainly because I haven't yet found documented proof) that a native FP_TO_INT operation on AArch64 implements the NaN->0 requirement, then I'm wondering if there's ever a good reason to perform the clamping at the input stage. Perhaps it's really a question of scalar vs vector with regards to the availability of MIN/MAX instructions.

I would have experimented a little to understand better how things work, but I'm guessing the patch is based on other work because I couldn't some of the affected functions/nodes.

I forgot to add the parent revision of this to Phab, sorry. It's added now.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8070–8071

I suspect you may be right. Your suggested pattern might be cleaner.

Should I just drop the NativeSaturation parameter altogether and only handle the expansion explicitly in Lowering?

Should I add more types to Custom lowering of these nodes on AArch64?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3012–3015

What sort of combines do you have in mind here?

llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
78–88

So perhaps we should just skip the standard expansion for these cases and only expand to the pattern you specified there. It's probably not safe to do it for all cases if we can't guarantee the saturating behavior of other FP_TO_INT operations.

Assuming (mainly because I haven't yet found documented proof) that a native FP_TO_INT operation on AArch64 implements the NaN->0 requirement,

That is interesting, I can't find it either. I simply adapted this from the previous patchset by nikic, andI presumed that that was correct.
Perhaps it's an undocumented behavior.

ebevhan updated this revision to Diff 290715.Sep 9 2020, 4:55 AM

Followup to removing vector expansion in the parent patch.

ebevhan updated this revision to Diff 290752.Sep 9 2020, 9:01 AM

Redid lowering.

efriedma added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8098

Please don't abuse the semantics of target-independent opcodes. If the replacement operation is saturating, we should use FP_TO_SINT_SAT. (I assume this doesn't form an infinite loop because the saturation width would be different.)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8839

Is this change related somhow?

llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
78–88

It's documented in the AArch64 reference manual, it's just sort of buried. The conversion is described in FPToFixed(), which calls FPUnpackBase().

efriedma added inline comments.Sep 9 2020, 4:48 PM
llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
170

This should probably just be "fcvtzs v0.2d, v0.2d; sqxtn v0.2s, v0.2d".

ebevhan added inline comments.Sep 10 2020, 3:02 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8098

I don't think saturation width is used to determine legality, so I don't think I can emit a _SAT here, unless I've misunderstood how legalization operates.

I think I would be more inclined to simply remove NativeSaturation and do everything in Lower.

If I can't use FP_TO_INT, should there be patterns in AArch64 for selecting FP_TO_INT_SAT nodes in that case?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8839

I got an issue here with some of the vector emissions since getNode with a BUILD_VECTOR can actually manage to optimize away the BUILD_VECTOR node and we end up getting something else. Then the rest of the code in this function breaks, because it assumes that Op is a BUILD_VECTOR.

Even if I change the patch after this so we don't end up with that pattern any more, I might keep this in because it's a pitfall waiting to happen.

ebevhan added inline comments.Sep 10 2020, 5:36 AM
llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
170

Is sqxtn ever selected without intrinsics? I would assume that a simple minmax and trunc pattern would catch it, but that doesn't happen.

ebevhan updated this revision to Diff 290950.Sep 10 2020, 5:51 AM

Removed NativeSaturation parameter and made all lowering happen in Lower.

We only use fminnum/fmaxnum on vector-same-width saturation, and otherwise we use integer comparisons. This improves some emitted code.

ebevhan added inline comments.Sep 10 2020, 5:54 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3050

Do you think it's still wrong to use FP_TO_INT in these lowerings?

ebevhan updated this revision to Diff 290957.Sep 10 2020, 6:12 AM

Use default expansion for f128 conversions; they aren't native so we can't be sure that they saturate.

efriedma added inline comments.Sep 10 2020, 11:26 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8098

If you've marked the legality of a node "Custom", the target determines whether it's legal. If the custom lowering hook just returns the input node, that will tell the legalizer that the node is legal.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3050

My concern is that you're making assumptions that doesn't match the specified semantics of FP_TO_SINT. This is confusing, and some code could see this operation and perform some illegal transform on it. (This sort of usage has caused miscompiles in the past.)

int_aarch64_neon_fcvtzs has the right semantics, or you could introduce a new target-specific node, or you could generate another FP_TO_SINT_SAT. I guess I'd slightly prefer a new node.

8839

Oh, hmm, strange that nobody else has hit it. I guess it's fine.

llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
170

Not sure off the top of my head, but I wouldn't be surprised if the answer is no. Nobody has spent much time on saturation patterns.

The current patch generates xtn? That isn't saturating, I think?

ebevhan added inline comments.Sep 11 2020, 2:19 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3050

Hm, I agree with what you're saying, so I opted for emitting FP_TO_INT_SAT nodes with DstWidth == SatWidth.

However, while implementing I'm finding that I need to reimplement all of the custom vector lowering for FP_TO_INT for these as well, which seems a bit superfluous.

Would it be wrong to construct an FP_TO_INT node and then just immediately manually call LowerFP_TO_INT?

llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
170

xtn doesn't saturate, no... It just seems to truncate.

But... Doesn't this actually mean that it's wrong? This is supposed to saturate to i32, but it's actually converting to i64 and then truncating. That can't be right.

You weren't wrong about assumptions on node behavior, lowering this to FP_TO_INT clearly didn't do the right thing. So this means that vector FP_TO_INT only saturates if the scalar types are the same bit width? I wonder if the same applies to f16.

efriedma added inline comments.Sep 11 2020, 3:28 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3050

Would it be wrong to construct an FP_TO_INT node and then just immediately manually call LowerFP_TO_INT?

I think I'd prefer to just make LowerVectorFP_TO_INT work on both FP_TO_INT and FP_TO_INT_SAT nodes.

llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
170

ISD::FP_TO_SINT always has the same semantics as the IR fptosi, regardless of what the actual underlying instructions do.

ebevhan updated this revision to Diff 291556.Sep 14 2020, 6:19 AM

Now only uses FP_TO_INT_SAT nodes for lowering. The nodes are selected in isel.

LowerVectorFP_TO_INT was extended to also handle the _SAT nodes.

ebevhan marked an inline comment as done.Sep 14 2020, 6:24 AM
ebevhan added inline comments.
llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
170

Ah, certainly. I really meant FCVTZ, not FP_TO_INT.

ebevhan updated this revision to Diff 291575.Sep 14 2020, 7:45 AM

Fix clang-format warnings.

efriedma added inline comments.Sep 14 2020, 11:22 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
3916 ↗(On Diff #291575)

(v8i16 (to_int v8f16:$Rn, (i32 64)))? The 64 seems weird.

ebevhan updated this revision to Diff 291823.Sep 15 2020, 1:16 AM

Fixed typos.

Ping. The build failures don't seem related to the patch.

uabelho added inline comments.Oct 14 2020, 12:39 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3037

We explicitly did

if (DstWidth == SatWidth)
  return Op;

at line 3105 above, so the

DstWidth != SatWidth

comparison should always be true here?

3059

Always false since we already did

if (DstWidth == SatWidth)
  return Op;

at 3105?