This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
ClosedPublic

Authored by kmitropoulou on Jul 14 2023, 12:24 AM.

Details

Summary

CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

If the operands are proven to be non NaN, then the optimization can be applied
for all predicates.

We can apply the optimization for the following predicates for FMINNUM/FMAXNUM
(for quiet and signaling NaNs) and for FMINNUM_IEEE/FMAXNUM_IEEE if we can prove
that the operands are not signaling NaNs.

  • ordered lt/le and ||
  • ordered gt/ge and ||
  • unordered lt/le and &&
  • unordered gt/ge and &&

Diff Detail

Event Timeline

kmitropoulou created this revision.Jul 14 2023, 12:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2023, 12:24 AM
kmitropoulou requested review of this revision.Jul 14 2023, 12:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2023, 12:24 AM

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

arsenm added inline comments.Jul 21 2023, 10:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6135–6138

Should try harder to short circuit these calls. You should also start by checking the fast math flags for nnan

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

kmitropoulou marked an inline comment as done.Jul 26 2023, 6:25 PM

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

arsenm added inline comments.Aug 2 2023, 3:16 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6116–6117

might as well use 0 for invalid opcode

6128–6178

Can you move this all to a helper function? It would be easier to read with early exits. Plus you can defer the isKnownNeverSNan calls depend on whether the legality more easily

6174

i think the optional makes this noisier, just use 0 or instruction_list_end as the sentinel

craig.topper added inline comments.Aug 2 2023, 3:18 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6116–6117

I think we often use ISD::DELETED_NODE which I think is 0.

6174

This is ISD so we shouln't use instruction_list_end, but ISD::DELETED_NODE would work.

kmitropoulou marked 3 inline comments as done.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

arsenm added inline comments.Aug 9 2023, 4:54 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6054

no else after return

6074–6079

Can this be reordered to try FMINNUM/FMAXNUM first?

kmitropoulou marked 2 inline comments as done.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

arsenm added inline comments.Aug 16 2023, 10:46 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6042

Check operand 2 first, it's canonically simpler

6074–6079

The point to f this was to try to avoid recursing 6 times to call isKnownNeverSNan, you can try to avoid that in the case where you don't need to worry about it

kmitropoulou marked 2 inline comments as done.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6074–6079

Thank you Matt for the explanation :)

arsenm added inline comments.Aug 23 2023, 2:48 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6074–6079

Beyond the operand check, I also meant to not eagerly call these first thing in the function. like you can perform the check down at the (isNotSNaN). e.g.

add a bothOperandsNotSNan() helper, and call it in place of the bool isNotSNaN variable

kmitropoulou marked an inline comment as done.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

arsenm accepted this revision.Aug 24 2023, 8:00 AM
This revision is now accepted and ready to land.Aug 24 2023, 8:00 AM

Updating D155267: [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.

This revision was landed with ongoing or failed builds.Aug 24 2023, 10:49 AM
This revision was automatically updated to reflect the committed changes.

As a heads-up, I'm seeing a crash in the AArch64 Instruction Selection pass when building x264's encoder/analyse.c with --target=aarch64-grtev4-linux-gnu -mcpu=neoverse-n1 that root-causes to this revision. It includes this error message:

fatal error: error in backend: Cannot select: 0x4927f6931c0: f32 = fmaxnum_ieee 0x4927f67b070, 0x4927c7197e0
  0x4927f67b070: f32,ch = CopyFromReg 0x4927ee820e0, Register:f32 %9
    0x4927f68bb60: f32 = Register %9
  0x4927c7197e0: f32,ch = CopyFromReg 0x4927ee820e0, Register:f32 %10
    0x4927f3f0230: f32 = Register %10
In function: x264_10_weights_analyse

We're working on getting a standalone reproducer.

Thank you for the heads up. I reverted the patch : 48fa79a503a7cf380f98b6335fbd349afae1bd86

Please let me know once you have the reproducer.

Thank you for the heads up. I reverted the patch : 48fa79a503a7cf380f98b6335fbd349afae1bd86

Please let me know once you have the reproducer.

I also ran into this bug. I've reduced the issue down to this reproducer:

float a, b;
int c;
void d() {
  for (;;)
    while (b < 0 && a < 0)
      c--;
}

Compiled like this: clang -target aarch64-linux-gnu -c -O2 -ffast-math analyse-reduced.c

kmitropoulou added a comment.EditedAug 28 2023, 4:14 PM

@brooksmoses : Were you able to extract a reproducer?

@brooksmoses : Were you able to extract a reproducer?

No, we were fairly sure that mstorsjo's reproducer covers our case as well, and stopped trying to reduce it once that was posted. Sorry for not saying so!

I would expect that any source code that triggers your optimization and is compiled with -target aarch64-linux-gnu would show the issue, but if that's not the case, I'm happy to pick this back up and see if I can reduce the x264 code.

@mstorsjo : I could not reproduce the failure with the provided test in my workspace.

After some testing, I found that the problem is due to the fact that FMINNUM_IEEE and FMAXNUM_IEEE are not supported in AArch64. I added new lit tests AArch64/combine_andor_with_cmps.ll in order to address the problem.

Please let me know if the fix (https://reviews.llvm.org/D159240) does not address the problems that you encountered.

@mstorsjo : I could not reproduce the failure with the provided test in my workspace.

That's weird - I just recompiled clang at 5ec13535235d07eafd64058551bc495f87c283b1, and reproduced it again.

After some testing, I found that the problem is due to the fact that FMINNUM_IEEE and FMAXNUM_IEEE are not supported in AArch64. I added new lit tests AArch64/combine_andor_with_cmps.ll in order to address the problem.

Please let me know if the fix (https://reviews.llvm.org/D159240) does not address the problems that you encountered.

I tested applying D159240 on the commit before 5ec13535235d07eafd64058551bc495f87c283b1, and the reduced testcase compiles fine now at least, so I presume the issue is fixed. Thanks!