Legalizing positive floating point zero
Diff Detail
Event Timeline
lib/Target/PowerPC/PPCISelLowering.h | ||
---|---|---|
746 | Perhaps a comment about what the use of this function is since this patch has none. | |
test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll | ||
5 | I don't understand the changes to this test case. How is the VSX floating point comparison replacing the move of a field within the condition register and why? I don't really understand the intent of the original test either, but nonetheless checking for these instructions does not seem like the equivalent test. | |
test/CodeGen/PowerPC/pzero-fp-xored.ll | ||
1 | I think it might be a good idea to include --implicit-check-not on this line and specify which patterns (loading a zero from constant pool?) you want to not see. | |
2 | I don't think it's a particularly good idea to use a builtin check string (CHECK-NOT) as a check prefix. In fact, I'm not sure what the semantics of this actually are. And in any case, using different check prefixes for the two run lines should probably include equivalent checking (including CHECK-WHATEVER-PREFIX-LABEL:). |
lib/Target/PowerPC/PPCISelLowering.h | ||
---|---|---|
746 | The name is clear. Isn't it? | |
test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll | ||
5 | I was not able to understand the intention of this testcase. (That said, I didn't check git blame. There might be some info there). The mcrf here, follows a fcmpu and copies BF 0 to BF7. the new VSX cmp insn, puts the result directly in BF 7. So that was the closest thing that I could have put here. I will check git blame, and if I found something useful to add, I will modify it. But with a testcase, whose intention is not known, I think this change should be fine. | |
test/CodeGen/PowerPC/pzero-fp-xored.ll | ||
1 | Yes, I can add that. | |
2 | I had forgot that CHECK-NOT is built-in. Will change that. |
It does not need to be in the same commit as adds this scalar support, but we should also add a pattern to catch the vector case. For
define <2 x double> @foo() local_unnamed_addr #0 { ret <2 x double> zeroinitializer }
we currently also load from memory. Actually, for this case on powerpc64le, there's another opportunity as well:
addis 3, 2, .LCPI0_0@toc@ha addi 3, 3, .LCPI0_0@toc@l lxvd2x 0, 0, 3 xxswapd 34, 0 blr
Even in cases where we do need to load from memory, we should be able to fold the swap into the constant in cases where it can't otherwise be eliminated.
Yes. I intentionally left it out because it required some extra work and I wanted to look at other issues that my reduced testcase for Eigen revealed. I will open a bugzilla item with your example.
Yes. I intentionally left it out because it required some extra work and I wanted to look at other issues that my reduced testcase for Eigen revealed. I will open a bugzilla item with your example.
I took a look. It seems to be a simple oversight for double and long long. For other cases we generate vxor. So changing it to generate vxor for the missing two cases is easy. I leave changing it to exploit xxlxor, for the future.
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
12117 | Here's a good point for a comment on things that you expect to work :) |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
12117 | VT! |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
12117 | I don't really know what we do with f16, f80, etc. (need to check a test case). But I think it does not hurt to guard against them. |
We need to handle fp16 and fp80, or at least document why they are omitted.
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
12117 | I agree with Eric. I think a general comment would be useful here. Something like: Single-precision and double precision FP immediates can be loaded when VSX instructions are available and the Immediate has value 0. Half-precision and 80-bit are excluded because... | |
test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll | ||
5 | Any insight from git blame? |
Currently we have problems handling fp16 and fp80. Without knowing how generally we want to handle them, I prefer to exclude them here.
For this example
define half @t1(half %x) local_unnamed_addr #0 { entry: %cmp = fadd half %x, 0.000000e+00 ret half %cmp }
I am getting this output
OUTPUT OF: llc -mcpu=pwr8< test.ll .text .abiversion 2 .file "<stdin>" LLVM ERROR: Cannot select: t20: f32 = fp16_to_fp t18 t18: i32 = fp_to_fp16 t2 t2: f32,ch = CopyFromReg t0, Register:f32 %vreg0 t1: f32 = Register %vreg0 In function: t1
test/CodeGen/PowerPC/fast-isel-fcmp-nan.ll | ||
---|---|---|
5 | Yes. It turns out that with this patch a downstream codegen bug is exposed. I have opened a PR for this Since fixing that bug will impact the code generated for this testcase, I prefer to leave this test unchanged. I have explicitly mentioned in the 2nd comment of the PR that a fix for that bug, should make sure we generate good code for this testcases. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
12117 | I have explained in another comment why this does not support f16 and f80. If you don't mind to approve this, I will add a summary of that comment to the code, before committing the change. |
test/CodeGen/PowerPC/tail-dup-analyzable-fallthrough.ll | ||
---|---|---|
8 ↗ | (On Diff #72089) | Apparently due to some other change, we now generate xxlxor here, which is better. So I'll change this line of the code before committing. |
Perhaps a comment about what the use of this function is since this patch has none.