Current implementation handles unordered comparison poorly in soft-float mode.
Consider (a ULE b) which is a <= b. It is lowered to (ledf2(a, b) <= 0 || unorddf2(a, b) != 0) (in general). We can do better job by lowering it to (__gtdf2(a, b) <= 0).
Such replacement is true for other CMP's (ult, ugt, uge). In general, we just call same function as for ordered case but negate comparison against zero.
Details
Diff Detail
Event Timeline
lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
118 | There was one newline too many, now there are two. Remove them both? | |
130 | Capital g, and name sounds like a function. What about something like "ShouldInvertCC" ? | |
184 | "Invert CC for unordered comparisons." ? | |
187–188 | Indentation seems off, clang-format? | |
216 | "true" -> "/*isInteger=*/true" ? | |
test/CodeGen/AArch64/arm64-fp128.ll | ||
153 | Is this just "cset w0, gt" ? | |
167 | Can this comment be removed now? I guess your change lets us undo some other transform (which you should probably look into by the way) | |
170–171 | CHECK-NEXT? | |
171 | You should check IFFALSE/RET29 somewhere. | |
178 | CHECK-NEXT? | |
182 | CHECK-NEXT? | |
test/CodeGen/Thumb2/float-cmp.ll | ||
84 | Hmm, these might benefit from stricter CHECK/NONE, since I guess the result gets inverted? | |
test/CodeGen/X86/fp-cmp-sf.ll | ||
1 ↗ | (On Diff #28773) | Can you chmod this file as non-executable? Also, what about "fpcmp-soft-float" or "fpcmp-soft-fp"? "sf" isn't descriptive enough IMO. |
127 ↗ | (On Diff #28773) | Why optsize/readnone? For that matter, if you're not doing CHECK-NEXT, why nounwind? |
ping
test/CodeGen/AArch64/arm64-fp128.ll | ||
---|---|---|
167 | This comment explains line 171: why we compare with GE, instead of LT (that should be in case of ordered LT) | |
178 | No, there is ldp instruction in between | |
182 | same here | |
test/CodeGen/Thumb2/float-cmp.ll | ||
84 | Sorry, I didn't understand that. Can you please explain? | |
test/CodeGen/X86/fp-cmp-sf.ll | ||
1 ↗ | (On Diff #28773) | Why chmod this as non-executable? Other tests don't do that. |
Thanks for the update! To me, the logic is sound. Nitpick replies below.
test/CodeGen/AArch64/arm64-fp128.ll | ||
---|---|---|
167 | Hmm, that's not how I understood the comment. Checking both uno/__unordtf2 and oge/__getf2 is indeed an 'unfortunate "optimization"'. We used to do something like: IR: %1 = fcmp olt ... br i1 %1, label %iftrue, label %iffalse SelectionDAG: %1 = fcmp olt ... %2 = xor i1 %1, 1 brcond i1 %2, label %iffalse br label %iftrue Optimized SelectionDAG: %1 = fcmp uge ... brcond i1 %2, label %iffalse br label %iftrue It's this optimization that the comment complains about, I think, because a softened uge used to need two libcalls (as opposed to 1 for olt). Whereas I read: bl __lttf2 cmp w0, #0 b.ge <iffalse> iftrue: as a straightforward lowering for: %1 = fcmp olt ... %2 = xor i1 %1, 1 brcond i1 %2, label %iffalse br label %iftrue via (given that __lttf2 returns < 0 / -1 on "olt" only): %1 = call @__lttf2 %2 = icmp lt i32 %1, 0 %3 = xor i1 %2, 1 brcond i1 %3, label %iffalse br label %iftrue itself simplified to: %1 = call @__lttf2 %2 = icmp ge i32 %1, 0 brcond i1 %2, label %iffalse br label %iftrue In any case, not a big deal, but I might have missed something, so: am I making sense? | |
test/CodeGen/Thumb2/float-cmp.ll | ||
84 | The IR does an UGT comparison, but this patch checks for OLE. Since UGT == !OLE, I imagine there's some code right after the call that inverts r0 somehow. I'm saying it might be a good idea to check (in this case, using "NONE:") that code as well. | |
test/CodeGen/X86/fp-cmp-sf.ll | ||
1 ↗ | (On Diff #28773) | I mean this test file (fp-cmp-sf.ll) is executable in your patch (see the svn property a few lines above on phabricator). Before committing, please change your local fp-cmp-sf.ll file to not be executable: chmod -x test/CodeGen/X86/fp-cmp-sf.ll You might also have to svn propdel svn:executable test/CodeGen/X86/fp-cmp-sf.ll if you svn add-ed it already. |
Thanks for comments, Ahmed
test/CodeGen/AArch64/arm64-fp128.ll | ||
---|---|---|
167 | Yes, exactly. But there's no XOR in given IR, so we must explain why GE. And I agree that <unfortunately "optimizes"> is now just "optimizes" |
There was one newline too many, now there are two. Remove them both?