Current implementation handles unordered comparison poorly in soft-float mode.
Consider (a ULE b) which is a <= b. It is lowered to (ledf2(a, b) <= 0 || unorddf2(a, b) != 0) (in general). We can do better job by lowering it to (__gtdf2(a, b) <= 0).
Such replacement is true for other CMP's (ult, ugt, uge). In general, we just call same function as for ordered case but negate comparison against zero.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/SelectionDAG/TargetLowering.cpp | ||
---|---|---|
117 | There was one newline too many, now there are two. Remove them both? | |
128 | Capital g, and name sounds like a function. What about something like "ShouldInvertCC" ? | |
182 | "Invert CC for unordered comparisons." ? | |
185–187 | Indentation seems off, clang-format? | |
214 | "true" -> "/*isInteger=*/true" ? | |
test/CodeGen/AArch64/arm64-fp128.ll | ||
152–153 | Is this just "cset w0, gt" ? | |
167 | Can this comment be removed now? I guess your change lets us undo some other transform (which you should probably look into by the way) | |
168–169 | CHECK-NEXT? | |
169–171 | You should check IFFALSE/RET29 somewhere. | |
178 | CHECK-NEXT? | |
182 | CHECK-NEXT? | |
test/CodeGen/Thumb2/float-cmp.ll | ||
84 | Hmm, these might benefit from stricter CHECK/NONE, since I guess the result gets inverted? | |
test/CodeGen/X86/fp-cmp-sf.ll | ||
1 ↗ | (On Diff #28773) | Can you chmod this file as non-executable? Also, what about "fpcmp-soft-float" or "fpcmp-soft-fp"? "sf" isn't descriptive enough IMO. |
127 ↗ | (On Diff #28773) | Why optsize/readnone? For that matter, if you're not doing CHECK-NEXT, why nounwind? |
ping
test/CodeGen/AArch64/arm64-fp128.ll | ||
---|---|---|
167 | This comment explains line 171: why we compare with GE, instead of LT (that should be in case of ordered LT) | |
178 | No, there is ldp instruction in between | |
182 | same here | |
test/CodeGen/Thumb2/float-cmp.ll | ||
84 | Sorry, I didn't understand that. Can you please explain? | |
test/CodeGen/X86/fp-cmp-sf.ll | ||
1 ↗ | (On Diff #28773) | Why chmod this as non-executable? Other tests don't do that. |
Thanks for the update! To me, the logic is sound. Nitpick replies below.
test/CodeGen/AArch64/arm64-fp128.ll | ||
---|---|---|
167 | Hmm, that's not how I understood the comment. Checking both uno/__unordtf2 and oge/__getf2 is indeed an 'unfortunate "optimization"'. We used to do something like: IR: %1 = fcmp olt ... br i1 %1, label %iftrue, label %iffalse SelectionDAG: %1 = fcmp olt ... %2 = xor i1 %1, 1 brcond i1 %2, label %iffalse br label %iftrue Optimized SelectionDAG: %1 = fcmp uge ... brcond i1 %2, label %iffalse br label %iftrue It's this optimization that the comment complains about, I think, because a softened uge used to need two libcalls (as opposed to 1 for olt). Whereas I read: bl __lttf2 cmp w0, #0 b.ge <iffalse> iftrue: as a straightforward lowering for: %1 = fcmp olt ... %2 = xor i1 %1, 1 brcond i1 %2, label %iffalse br label %iftrue via (given that __lttf2 returns < 0 / -1 on "olt" only): %1 = call @__lttf2 %2 = icmp lt i32 %1, 0 %3 = xor i1 %2, 1 brcond i1 %3, label %iffalse br label %iftrue itself simplified to: %1 = call @__lttf2 %2 = icmp ge i32 %1, 0 brcond i1 %2, label %iffalse br label %iftrue In any case, not a big deal, but I might have missed something, so: am I making sense? | |
test/CodeGen/Thumb2/float-cmp.ll | ||
84 | The IR does an UGT comparison, but this patch checks for OLE. Since UGT == !OLE, I imagine there's some code right after the call that inverts r0 somehow. I'm saying it might be a good idea to check (in this case, using "NONE:") that code as well. | |
test/CodeGen/X86/fp-cmp-sf.ll | ||
1 ↗ | (On Diff #28773) | I mean this test file (fp-cmp-sf.ll) is executable in your patch (see the svn property a few lines above on phabricator). Before committing, please change your local fp-cmp-sf.ll file to not be executable: chmod -x test/CodeGen/X86/fp-cmp-sf.ll You might also have to svn propdel svn:executable test/CodeGen/X86/fp-cmp-sf.ll if you svn add-ed it already. |
Thanks for comments, Ahmed
test/CodeGen/AArch64/arm64-fp128.ll | ||
---|---|---|
167 | Yes, exactly. But there's no XOR in given IR, so we must explain why GE. And I agree that <unfortunately "optimizes"> is now just "optimizes" |
There was one newline too many, now there are two. Remove them both?