Page MenuHomePhabricator

[PowerPC] Emit scalar min/max instructions with unsafe fp math
ClosedPublic

Authored by nemanjai on Jun 6 2019, 8:39 PM.

Details

Summary

This is something I meant to do a long time ago but never got around to it. These instructions should be an improvement over the compare/fsel sequence we currently emit.

The semantics of the instructions as specified in the ISA match the semantics specified in the description of the nodes.

Diff Detail

Event Timeline

nemanjai created this revision.Jun 6 2019, 8:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2019, 8:39 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
jsji added a comment.Jun 10 2019, 2:12 PM

It is a great idea to exploit xsmindp/xsmaxdp! But looks like we make it more general than restricted to UnsafeFPMath?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
551

Why we need TM.Options.UnsafeFPMath here?
If ISD::FMAXNUM_IEEE is generated, then the semantic is exact the same as `xsmaxdp,
we should be safe to use xsmaxdp.

I think we can also add actions for ISD::FMAXNUM/FMAXIMUM and ISD::FMINNUM/FMINIMUM,
then we do need TM.Options.UnsafeFPMath or TM.Options.NoNansFPMath/NoSignedZerosFPMath for them.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
999

We can add similar patterns for fmaxnum/fmaxinum and fminnum/fmininum with Predicates ?

llvm/test/CodeGen/PowerPC/scalar-min-max.ll
3

Maybe add RUN lines for --enable-no-nans-fp-math/-enable-no-signed-zeros-fp-math?

8

It would be great if we can pre-commit the testcase to show only difference .

49

Why we need these attributes? Looks like these should be in different RUN line ?

jsji requested changes to this revision.Aug 27 2019, 7:54 PM

Move out of review queue, need author's action.

This revision now requires changes to proceed.Aug 27 2019, 7:54 PM
amyk added a subscriber: amyk.Sep 12 2019, 10:57 PM
amyk added inline comments.
llvm/test/CodeGen/PowerPC/scalar-min-max.ll
3

Maybe also add -verify-machineinstrs to our tests?

nemanjai marked 2 inline comments as done.Oct 22 2019, 4:54 PM
nemanjai added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
551

When I originally did this, it would produce these nodes along with ISD::FCANONICALIZE when unsafe fp math isn't specified. However, the DAG combiner seems to have been modified to not do that any longer. The instructions themselves handle SNaNs correctly anyway so we can handle the inputs coming from ISD::FCANONICALIZE anyway.

However, I don't really see a point in legalizing FMAXNUM/FMINNUM since we will just get the _IEEE versions even with fast math.

llvm/test/CodeGen/PowerPC/scalar-min-max.ll
8

I will add a RUN line without the FMF flags which will show the difference in codegen in the test case itself.

nemanjai updated this revision to Diff 226093.Oct 22 2019, 5:59 PM

Remove the requirement for unsafe math for the FMAXNUM_IEEE and FMINNUM_IEEE. Add codegen for P9 xsmaxdp/xsmindp. Improve the test case.

steven.zhang requested changes to this revision.Oct 23 2019, 1:33 AM
steven.zhang added a subscriber: steven.zhang.

I see there is one sanity failed.

FAIL: LLVM :: CodeGen/PowerPC/ctr-minmaxnum.ll (29517 of 50289)
******************** TEST 'LLVM :: CodeGen/PowerPC/ctr-minmaxnum.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/qshanz/work/build/bin/llc -mtriple=powerpc64-unknown-linux-gnu -verify-machineinstrs -mcpu=pwr7 < /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll | /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll
: 'RUN: at line 2';   /home/qshanz/work/build/bin/llc -mtriple=powerpc64-unknown-linux-gnu -verify-machineinstrs -mcpu=a2q < /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll | /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll --check-prefix=QPX
--
Exit Code: 2

Command Output (stderr):
--
LLVM ERROR: Cannot select: t14: f32 = fcanonicalize t2
  t2: f32,ch = CopyFromReg t0, Register:f32 %2
    t1: f32 = Register %2
In function: test1
FileCheck error: '-' is empty.
FileCheck command line:  /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll
lib/Target/PowerPC/PPCISelLowering.cpp
555 ↗(On Diff #226093)

We will get the ISD::FMAXNUM/ISD::FMINNUM node if mark it as legal.

define dso_local float @testfmax_fast(float %a, float %b) {
entry:
  %cmp = fcmp fast ogt float %a, %b
  %cond = select i1 %cmp, float %a, float %b
  ret float %cond
}

llc test.ll -mattr=+vsx

And also for the intrinsic llvm.minnum/llvm.maxnum.

Initial selection DAG: %bb.0 'testfmax_fast:entry'
SelectionDAG has 11 nodes:
  t0: ch = EntryToken
  t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t4: f32,ch = CopyFromReg t0, Register:f32 %1
  t6: i1 = setcc nnan ninf nsz arcp contract afn reassoc t2, t4, setgt:ch
    t7: f32 = fmaxnum t2, t4
  t9: ch,glue = CopyToReg t0, Register:f32 $f1, t7
  t10: ch = PPCISD::RET_FLAG t9, Register:f32 $f1, t9:1

The node is built directly not combined by select_cc. And I think, we need to lower it if we know that, the operand is NaN or not(i.e. isKnownNeverNaN()).

7221 ↗(On Diff #226093)

Is it hasVSX() more clear ?

7225 ↗(On Diff #226093)

Do we need logic to handle the case that if the op is NaN ?

If src1 or src2 is a SNaN, an Invalid Operation
exception occurs.
If either src1 or src2 is a NaN, result is src2.
Otherwise, if src1 is less than src2, result is src1.
Otherwise, result is src2.

The ISA documentation is a bit confusing here. Isn't NaN including SNaN and QNaN ? The condition in the second if cover the first one.

lib/Target/PowerPC/PPCInstrInfo.td
120 ↗(On Diff #226093)

SDT_PPCFPMinMax ?

This revision now requires changes to proceed.Oct 23 2019, 1:33 AM
steven.zhang added inline comments.Oct 23 2019, 4:48 AM
lib/Target/PowerPC/PPCISelLowering.cpp
555 ↗(On Diff #226093)

Hmm, ignore the above comments. It is right to have DAG generated the IEEE node instead of the non-IEEE, as the hw has instruction semantics equal.

7225 ↗(On Diff #226093)

Please also ignore above comments as after double confirm, the XSMAXCDP/XSMINCDP perfectly match the semantics, no matter if the operand is NaN or not.

test/CodeGen/PowerPC/scalar-min-max.ll
12 ↗(On Diff #226093)

It would be great if we have some test to verify the behavior if operand is SNaN/QNaN for P9. However, this is NOT a must.

nemanjai marked 5 inline comments as done.Oct 23 2019, 5:37 AM

I see there is one sanity failed.
LLVM ERROR: Cannot select: t14: f32 = fcanonicalize t2

t2: f32,ch = CopyFromReg t0, Register:f32 %2
  t1: f32 = Register %2

In function: test1
FileCheck error: '-' is empty.
FileCheck command line: /home/qshanz/work/build/bin/FileCheck /home/qshanz/work/llvm/llvm/test/CodeGen/PowerPC/ctr-minmaxnum.ll

Ah, that's where I saw the issue I mentioned in https://reviews.llvm.org/D62993?id=203487#inline-622927
I'll fix it up in the next update.

lib/Target/PowerPC/PPCISelLowering.cpp
555 ↗(On Diff #226093)

I don't dispute that we will get the nodes if we mark them legal. However, I do not think that we will get these nodes in more situations than we get the _IEEE versions.

The way I see it, with ninf nsz nnan, the nodes are equivalent since the only difference between them is the handling of NaNs and (presumably) signed zeros.

7205 ↗(On Diff #226093)

This needs to change from isISA3_0() to hasP9Vector().

7221 ↗(On Diff #226093)

I dont' understand this comment. This is only available in ISA3.0, so hasVSX() is not adequate. And VSX is a requirement for P9Vector, so why would I need both?

7225 ↗(On Diff #226093)

Quiet NaNs are fine. Signaling NaNs cause exceptions. This is fine, signaling NaNs are supposed to cause exceptions as far as I can tell.

lib/Target/PowerPC/PPCInstrInfo.td
120 ↗(On Diff #226093)

Sounds good. Will do.

nemanjai updated this revision to Diff 226125.Oct 23 2019, 5:42 AM

Add handling for ISD::FCANONICALIZE.
Add a few tests.
Fix up some naming.

steven.zhang accepted this revision.Oct 23 2019, 7:56 PM

LGTM from my aspect.

steven.zhang added inline comments.Oct 23 2019, 8:01 PM
lib/Target/PowerPC/PPCInstrVSX.td
1267 ↗(On Diff #226125)

Just out of curious, why we set the complexity as 1 here instead of 400.

nemanjai marked an inline comment as done.Oct 23 2019, 9:46 PM
nemanjai added inline comments.
lib/Target/PowerPC/PPCInstrVSX.td
1267 ↗(On Diff #226125)

Ha ha... because it's a bug. I'll change it to 400 on the commit.
It doesn't change semantics because there isn't an equivalent FPU choice to be made here, but it should still be consistent.

@jsji Any further comments?

jsji added a comment.Oct 25 2019, 11:42 AM

Some nit, the biggest question is with line 7223

lib/Target/PowerPC/PPCISelLowering.cpp
1304 ↗(On Diff #226125)

Typo? XSMAXCDP not XSMAXCPD.

7210 ↗(On Diff #226125)

Although this won't have problem right now, because we always return before using Flags.
I think it would be better to move this after the new code, to avoid a potential trap for future programmers.

7223 ↗(On Diff #226125)

We will lose some opportunities here, eg: with -mcpu=pwr9 --enable-no-nans-fp-math --enable-no-signed-zeros-fp-math?
We will catch new opportunities for max/min, but will give up lowering all the other CC?

lib/Target/PowerPC/PPCInstrVSX.td
1290 ↗(On Diff #226125)

Nit: The pattern order here is not consistent with above: instead of having one min, one max, we start to put all min together, hen max. It won't have problem, but would be better to read/check if we make it consistent.

test/CodeGen/PowerPC/ctr-minmaxnum.ll
69 ↗(On Diff #226125)

QPX should NOT be affected, so shouldn't change here?

144 ↗(On Diff #226125)

QPX , shouldn't change.

test/CodeGen/PowerPC/scalar-min-max.ll
2 ↗(On Diff #226125)

I would be more interested to see -mcpu=pwr9 with --enable-no-nans-fp-math instead of -mcpu=pwr8. :)

nemanjai marked 6 inline comments as done.Oct 25 2019, 4:12 PM
nemanjai added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
1304 ↗(On Diff #226125)

Oops. Thanks, I'l fix it.

7210 ↗(On Diff #226125)

Fair enough, I'll move this down below the min/max switch.

7223 ↗(On Diff #226125)

That is a really good point, this will prevent us from generating the fsel on P9. I'll fix it up.

lib/Target/PowerPC/PPCInstrVSX.td
1290 ↗(On Diff #226125)

I agree 100% that it's nicer to keep it organized and consistent. Will update, thank you.

test/CodeGen/PowerPC/ctr-minmaxnum.ll
69 ↗(On Diff #226125)

Oops, overzealous search-and-replace...

test/CodeGen/PowerPC/scalar-min-max.ll
2 ↗(On Diff #226125)

I think it is good to show that we get the same instructions with fast math on both P8 and P9.

nemanjai updated this revision to Diff 226519.Oct 25 2019, 4:20 PM

Address the minor comments and fix the early exit on P9 with no NaNs/Infs.

jsji accepted this revision.Oct 27 2019, 9:11 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Oct 27 2019, 9:11 AM
jsji added a comment.Oct 28 2019, 2:27 PM

Oh, maybe the title should be updated to remove "unsafe fp math" to avoid confusion?

This revision was automatically updated to reflect the committed changes.