This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Create fptoui.sat from clamped fptoui
ClosedPublic

Authored by dmgreen on Dec 2 2021, 8:37 AM.

Details

Summary

This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes.

This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result.

Diff Detail

Event Timeline

dmgreen created this revision.Dec 2 2021, 8:37 AM
dmgreen requested review of this revision.Dec 2 2021, 8:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2021, 8:38 AM

This seems fine as an extension of the previous patch.
I haven't been following the progress in this area closely though. What prevents folding these patterns to the saturating intrinsics in IR?
https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic

This seems fine as an extension of the previous patch.
I haven't been following the progress in this area closely though. What prevents folding these patterns to the saturating intrinsics in IR?
https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic

An fptoui.sat is more defined than a fptoui + umin. For the fptoui any out-of-range value produces poison - for the fptoui.sat the out of range values are defined to saturate. So the transform isn't reversible and on some architectures produces worse code.

More concretely a iN fptoui.sat(X) can be expanded into fptoui(fmax(fmin(X, (float)(2^N)-1), 0)) (or something else with float compares and int selects if fmin/fmax are not available). Plus it needs to handle Nan for fptosi, making sure it becomes 0.

I would actually really like it to be done in IR. We would need to vectorize these if we can, and they are much simpler to vectorize if we already have the scalar instructions. But it needs to be a costed decision, not an inst-combine canonicalization decision. Any ideas of a good place to make that happen?

An fptoui.sat is more defined than a fptoui + umin. For the fptoui any out-of-range value produces poison - for the fptoui.sat the out of range values are defined to saturate. So the transform isn't reversible and on some architectures produces worse code.

Ah, right. This is similar to a problem we have with funnel-shift intrinsics. After several improvements in the default expansion, we are able to canonicalize most patterns, but there are still a few that could escape because the safe expansion has more instructions than the unsafe pattern on targets that don't have good shift/rotates.

I would actually really like it to be done in IR. We would need to vectorize these if we can, and they are much simpler to vectorize if we already have the scalar instructions. But it needs to be a costed decision, not an inst-combine canonicalization decision. Any ideas of a good place to make that happen?

Yes, trying to bend the cost models to recognize larger patterns like this is tough. VectorCombine does generic transforms using costs, but this doesn't quite fall into that category...

cc'ing @nikic @bjope based on the history:
https://groups.google.com/g/llvm-dev/c/cgDFaBmCnDQ/m/CZAIMj4IBAAJ
D54749

I'm not sure if there was a plan for these intrinsics to be used from C-like source. Maybe we need an fp-combine pass?

bjope added a comment.Dec 3 2021, 8:35 AM

An fptoui.sat is more defined than a fptoui + umin. For the fptoui any out-of-range value produces poison - for the fptoui.sat the out of range values are defined to saturate. So the transform isn't reversible and on some architectures produces worse code.

Ah, right. This is similar to a problem we have with funnel-shift intrinsics. After several improvements in the default expansion, we are able to canonicalize most patterns, but there are still a few that could escape because the safe expansion has more instructions than the unsafe pattern on targets that don't have good shift/rotates.

I would actually really like it to be done in IR. We would need to vectorize these if we can, and they are much simpler to vectorize if we already have the scalar instructions. But it needs to be a costed decision, not an inst-combine canonicalization decision. Any ideas of a good place to make that happen?

Yes, trying to bend the cost models to recognize larger patterns like this is tough. VectorCombine does generic transforms using costs, but this doesn't quite fall into that category...

cc'ing @nikic @bjope based on the history:
https://groups.google.com/g/llvm-dev/c/cgDFaBmCnDQ/m/CZAIMj4IBAAJ
D54749

I'm not sure if there was a plan for these intrinsics to be used from C-like source. Maybe we need an fp-combine pass?

My use case for these were to related to the Embedded-C support, to implement conversion between fixed point types and floating point types. This was added in https://reviews.llvm.org/D86632. In that patch the intrinsics are used directly when producing IR in the frontend.

Downstream we set FP_TO_SINT_SAT as "custom" as we can do optimized lowering in some situation (depending on involved types and saturation width). I realize that we probably want to override shouldConvertFpToSat to avoid any conversion to these saturated intrinsics when it isn't beneficial for our target.

My use case for these were to related to the Embedded-C support, to implement conversion between fixed point types and floating point types. This was added in https://reviews.llvm.org/D86632. In that patch the intrinsics are used directly when producing IR in the frontend.

Downstream we set FP_TO_SINT_SAT as "custom" as we can do optimized lowering in some situation (depending on involved types and saturation width). I realize that we probably want to override shouldConvertFpToSat to avoid any conversion to these saturated intrinsics when it isn't beneficial for our target.

I believe they may be used in rust too.

I know that scalable vectors are not supported for all cases (they can try to unroll) - that is on my list of things to look at.

Yeah, making an shouldConvertFpToSat override can be important - for example only converting them when the operations are available like we do on Arm. You may be able to copy one of the existing fpclamptosat.ll test cases as a starting point.

As an outsider, similar to VectorCombine maybe other FooCombine passes (special purpose) with costs. As an addition/extension to InstCombine.

RKSimon added inline comments.Jan 20 2022, 7:11 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4915

Isn't this (C1 + 1).exactLogBase2()?

dmgreen updated this revision to Diff 402037.Jan 21 2022, 10:28 AM

Update as per comments. Thanks!

RKSimon accepted this revision.Jan 25 2022, 8:53 AM

LGTM - cheers

This revision is now accepted and ready to land.Jan 25 2022, 8:53 AM
This revision was automatically updated to reflect the committed changes.