This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Create fptosi.sat from clamped fptosi
ClosedPublic

Authored by dmgreen on Oct 18 2021, 1:21 AM.

Details

Summary

This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result.

Diff Detail

Event Timeline

dmgreen created this revision.Oct 18 2021, 1:21 AM
dmgreen requested review of this revision.Oct 18 2021, 1:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2021, 1:21 AM

WebAssembly changes look great!

dmgreen updated this revision to Diff 380736.Oct 19 2021, 11:09 AM

Thanks!

Now uses isConstOrConstSplat as opposed to inventing another one.

RKSimon added inline comments.Oct 25 2021, 4:21 AM
llvm/test/CodeGen/X86/fpclamptosat.ll
1179

Should we be taking into account the maximum/minimum representable value of the half type as well here?

dmgreen added inline comments.Oct 25 2021, 8:51 AM
llvm/test/CodeGen/X86/fpclamptosat.ll
1179

Yeah I haven't looked into that at much yet. A fp16 only produces 17 (IIRC) bits of data, so anything larger than that doesn't really make much sense to try and saturate. I kept the test to keep the tests consistent and make sure nothing silly happened. I wasn't sure if that was best optimized at the llvm level (maybe with some range analysis) or in DAG, but it felt like a different issue that I didn't try and address here.

Do you think it's worth doing something special with that here? Or removing the tests that don't make a huge amount of sense? Or leaving it as-is?

efriedma added inline comments.Oct 25 2021, 11:25 AM
llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
1226

This isn't an improvement in the case where we don't have a native half fcvtzs.

llvm/test/CodeGen/X86/fpclamptosat.ll
1179

For actual fptosi.sat, we can't really optimize based on the set of legal finite values because we need to handle +-inf.

Here, the smin/smax are redundant, but if we care, we should probably handle that using ComputeNumSignBits or something like that. (But we probably don't care.)

craig.topper added inline comments.Oct 25 2021, 12:07 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4790

Function name should be lower case.

4803

Use const APInt &

4855

Use const APInt &

4857

Can we do -Max != (MinC + 1) and share the MinC + 1. One less computation for large APInts.

4863

Lower case function name

dmgreen added inline comments.Oct 28 2021, 2:12 AM
llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
1226

Yep. This unrolling is also blocking SVE, which I wanted to take a look into. I may just prevent the fold for half on non-half hardware in the meantime.

llvm/test/CodeGen/X86/fpclamptosat.ll
1179

Yeah, this would be an optimization for fptosi where the infinite values produce poison. NumSignBits might work but I was already looking into a patch based on range information, which seems to work fine too. I don't have a motivating case, but it seems simple enough and might help somewhere.

dmgreen updated this revision to Diff 382959.EditedOct 28 2021, 2:27 AM

Thanks - Addressed comments. I still have not done anything with the unrolled half without fp16 case quite yet.

dmgreen updated this revision to Diff 382960.Oct 28 2021, 2:33 AM

Now with the correct change.

dmgreen updated this revision to Diff 382962.Oct 28 2021, 2:39 AM

IsSignedMinMax -> isSignedMinMax

dmgreen updated this revision to Diff 386646.Nov 11 2021, 1:33 PM
dmgreen marked 5 inline comments as done.

Added a check for the AArch64 v8f16 without fp16 case, that does not legalize well.

Ping. Any other comments?

RKSimon accepted this revision.Nov 29 2021, 5:06 AM

LGTM - cheers

This revision is now accepted and ready to land.Nov 29 2021, 5:06 AM
This revision was landed with ongoing or failed builds.Nov 30 2021, 3:05 AM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Nov 30 2021, 6:42 AM

This caused our builds to break with this assertion:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

Reverted in a87782c34d667d1c1a18fe82a9a7abfc72fd345b.

hans added a comment.Nov 30 2021, 6:43 AM

This caused our builds to break with this assertion:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

Reverted in a87782c34d667d1c1a18fe82a9a7abfc72fd345b.

Forgot the important part: The reproducer is here: https://bugs.chromium.org/p/chromium/issues/detail?id=1275121#c2

Oh. Thanks for the report. Looks like two of the operands can be different types where it wasn't expected.

I don't think it's this change, but I will take a look. Thanks.