Page MenuHomePhabricator

[PPC] Generate positive FP zero using xor insn instead of loading from constant area
ClosedPublic

Authored by amehsan on Aug 17 2016, 10:35 AM.

Details

Summary

Legalizing positive floating point zero

Diff Detail

Event Timeline

amehsan updated this revision to Diff 68380.Aug 17 2016, 10:35 AM
amehsan retitled this revision from to [PPC] Generate positive FP zero using xor insn instead of loading from constant area.
amehsan updated this object.
amehsan added reviewers: hfinkel, kbarton, nemanjai.
amehsan added a subscriber: llvm-commits.
nemanjai added inline comments.Aug 17 2016, 12:11 PM
lib/Target/PowerPC/PPCISelLowering.h
746

Perhaps a comment about what the use of this function is since this patch has none.

test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll
5

I don't understand the changes to this test case. How is the VSX floating point comparison replacing the move of a field within the condition register and why? I don't really understand the intent of the original test either, but nonetheless checking for these instructions does not seem like the equivalent test.

test/CodeGen/PowerPC/pzero-fp-xored.ll
2

I think it might be a good idea to include --implicit-check-not on this line and specify which patterns (loading a zero from constant pool?) you want to not see.

3

I don't think it's a particularly good idea to use a builtin check string (CHECK-NOT) as a check prefix. In fact, I'm not sure what the semantics of this actually are. And in any case, using different check prefixes for the two run lines should probably include equivalent checking (including CHECK-WHATEVER-PREFIX-LABEL:).

amehsan added inline comments.Aug 17 2016, 12:50 PM
lib/Target/PowerPC/PPCISelLowering.h
746

The name is clear. Isn't it?

test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll
5

I was not able to understand the intention of this testcase. (That said, I didn't check git blame. There might be some info there).

The mcrf here, follows a fcmpu and copies BF 0 to BF7. the new VSX cmp insn, puts the result directly in BF 7. So that was the closest thing that I could have put here. I will check git blame, and if I found something useful to add, I will modify it. But with a testcase, whose intention is not known, I think this change should be fine.

test/CodeGen/PowerPC/pzero-fp-xored.ll
2

Yes, I can add that.

3

I had forgot that CHECK-NOT is built-in. Will change that.

nemanjai added inline comments.Aug 17 2016, 1:14 PM
lib/Target/PowerPC/PPCISelLowering.h
746

Oh, I see. This is used by the target-independent transformations.

test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll
5

Yeah ok. I agree. I think this is kind of a weird test case.

hfinkel edited edge metadata.Aug 18 2016, 1:48 AM

It does not need to be in the same commit as adds this scalar support, but we should also add a pattern to catch the vector case. For

define <2 x double> @foo() local_unnamed_addr #0 {
  ret <2 x double> zeroinitializer
}

we currently also load from memory. Actually, for this case on powerpc64le, there's another opportunity as well:

	addis 3, 2, .LCPI0_0@toc@ha
	addi 3, 3, .LCPI0_0@toc@l
	lxvd2x 0, 0, 3
	xxswapd	 34, 0
	blr

Even in cases where we do need to load from memory, we should be able to fold the swap into the constant in cases where it can't otherwise be eliminated.

It does not need to be in the same commit as adds this scalar support, but we should also add a pattern to catch the vector > case. For

Yes. I intentionally left it out because it required some extra work and I wanted to look at other issues that my reduced testcase for Eigen revealed. I will open a bugzilla item with your example.

It does not need to be in the same commit as adds this scalar support, but we should also add a pattern to catch the vector > case. For

Yes. I intentionally left it out because it required some extra work and I wanted to look at other issues that my reduced testcase for Eigen revealed. I will open a bugzilla item with your example.

It does not need to be in the same commit as adds this scalar support, but we should also add a pattern to catch the vector > case. For

Yes. I intentionally left it out because it required some extra work and I wanted to look at other issues that my reduced testcase for Eigen revealed. I will open a bugzilla item with your example.

I took a look. It seems to be a simple oversight for double and long long. For other cases we generate vxor. So changing it to generate vxor for the missing two cases is easy. I leave changing it to exploit xxlxor, for the future.

echristo added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
12121

Here's a good point for a comment on things that you expect to work :)

amehsan added inline comments.Aug 24 2016, 7:11 PM
lib/Target/PowerPC/PPCISelLowering.cpp
12121

VT!

amehsan added inline comments.Aug 24 2016, 7:21 PM
lib/Target/PowerPC/PPCISelLowering.cpp
12121

I don't really know what we do with f16, f80, etc. (need to check a test case). But I think it does not hurt to guard against them.

amehsan updated this revision to Diff 72089.Sep 21 2016, 11:16 AM
amehsan edited edge metadata.
kbarton requested changes to this revision.Sep 21 2016, 11:56 AM
kbarton edited edge metadata.

We need to handle fp16 and fp80, or at least document why they are omitted.

lib/Target/PowerPC/PPCISelLowering.cpp
12121

I agree with Eric. I think a general comment would be useful here. Something like: Single-precision and double precision FP immediates can be loaded when VSX instructions are available and the Immediate has value 0. Half-precision and 80-bit are excluded because...

test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll
5

Any insight from git blame?

This revision now requires changes to proceed.Sep 21 2016, 11:56 AM

We need to handle fp16 and fp80, or at least document why they are omitted.

Currently we have problems handling fp16 and fp80. Without knowing how generally we want to handle them, I prefer to exclude them here.

For this example

define half @t1(half %x) local_unnamed_addr #0 {
entry:
  %cmp = fadd half %x, 0.000000e+00
  ret half %cmp
}

I am getting this output

OUTPUT OF: llc  -mcpu=pwr8< test.ll

        .text
        .abiversion 2
        .file   "<stdin>"
LLVM ERROR: Cannot select: t20: f32 = fp16_to_fp t18
  t18: i32 = fp_to_fp16 t2
    t2: f32,ch = CopyFromReg t0, Register:f32 %vreg0
      t1: f32 = Register %vreg0
In function: t1
amehsan added inline comments.Oct 14 2016, 9:35 AM
test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll
5

Yes. It turns out that with this patch a downstream codegen bug is exposed. I have opened a PR for this

Since fixing that bug will impact the code generated for this testcase, I prefer to leave this test unchanged. I have explicitly mentioned in the 2nd comment of the PR that a fix for that bug, should make sure we generate good code for this testcases.

amehsan added inline comments.Oct 14 2016, 9:43 AM
lib/Target/PowerPC/PPCISelLowering.cpp
12121

I have explained in another comment why this does not support f16 and f80. If you don't mind to approve this, I will add a summary of that comment to the code, before committing the change.

amehsan requested a review of this revision.Oct 14 2016, 9:48 AM
amehsan edited edge metadata.
kbarton accepted this revision.Oct 20 2016, 11:58 AM
kbarton edited edge metadata.

LGTM

This revision is now accepted and ready to land.Oct 20 2016, 11:58 AM
amehsan added inline comments.Oct 24 2016, 10:35 AM
test/CodeGen/PowerPC/tail-dup-analyzable-fallthrough.ll
8

Apparently due to some other change, we now generate xxlxor here, which is better. So I'll change this line of the code before committing.

Could you move the testcase into the right directory?

Could you move the testcase into the right directory?

Sorry I made a mistake when committing. Will fix it right now.