This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] Optimize unordered comparison in soft-float mode
ClosedPublic

Authored by anadolskiy on Jun 29 2015, 7:54 AM.

Details

Summary

Current implementation handles unordered comparison poorly in soft-float mode.
Consider (a ULE b) which is a <= b. It is lowered to (ledf2(a, b) <= 0 || unorddf2(a, b) != 0) (in general). We can do better job by lowering it to (__gtdf2(a, b) <= 0).
Such replacement is true for other CMP's (ult, ugt, uge). In general, we just call same function as for ordered case but negate comparison against zero.

Diff Detail

Repository
rL LLVM

Event Timeline

anadolskiy updated this revision to Diff 28671.Jun 29 2015, 7:54 AM
anadolskiy retitled this revision from to [SDAG] Optimize unordered comparison in soft-float mode.
anadolskiy updated this object.
anadolskiy edited the test plan for this revision. (Show Details)
anadolskiy added a reviewer: resistor.
anadolskiy set the repository for this revision to rL LLVM.
anadolskiy added a subscriber: Unknown Object (MLST).
anadolskiy updated this object.Jun 30 2015, 1:11 AM
anadolskiy updated this revision to Diff 28773.Jun 30 2015, 5:57 AM

Added tests

ab added a subscriber: ab.Jun 30 2015, 7:24 PM
ab added inline comments.
lib/CodeGen/SelectionDAG/TargetLowering.cpp
118 ↗(On Diff #28773)

There was one newline too many, now there are two. Remove them both?

130 ↗(On Diff #28773)

Capital g, and name sounds like a function. What about something like "ShouldInvertCC" ?

184 ↗(On Diff #28773)

"Invert CC for unordered comparisons." ?

187 ↗(On Diff #28773)

Indentation seems off, clang-format?

216 ↗(On Diff #28773)

"true" -> "/*isInteger=*/true" ?

test/CodeGen/AArch64/arm64-fp128.ll
153 ↗(On Diff #28773)

Is this just "cset w0, gt" ?

167 ↗(On Diff #28773)

Can this comment be removed now? I guess your change lets us undo some other transform (which you should probably look into by the way)

170 ↗(On Diff #28773)

CHECK-NEXT?

171 ↗(On Diff #28773)

You should check IFFALSE/RET29 somewhere.

178 ↗(On Diff #28773)

CHECK-NEXT?

182 ↗(On Diff #28773)

CHECK-NEXT?

test/CodeGen/Thumb2/float-cmp.ll
84 ↗(On Diff #28773)

Hmm, these might benefit from stricter CHECK/NONE, since I guess the result gets inverted?

test/CodeGen/X86/fp-cmp-sf.ll
1 ↗(On Diff #28773)

Can you chmod this file as non-executable? Also, what about "fpcmp-soft-float" or "fpcmp-soft-fp"? "sf" isn't descriptive enough IMO.

127 ↗(On Diff #28773)

Why optsize/readnone? For that matter, if you're not doing CHECK-NEXT, why nounwind?

anadolskiy updated this revision to Diff 28864.Jul 1 2015, 7:20 AM

Thanks Ahmed for review! Fixed yours remarks

anadolskiy updated this revision to Diff 28866.Jul 1 2015, 7:36 AM

Fixed format

ping

test/CodeGen/AArch64/arm64-fp128.ll
167 ↗(On Diff #28773)

This comment explains line 171: why we compare with GE, instead of LT (that should be in case of ordered LT)

178 ↗(On Diff #28773)

No, there is ldp instruction in between

182 ↗(On Diff #28773)

same here

test/CodeGen/Thumb2/float-cmp.ll
84 ↗(On Diff #28773)

Sorry, I didn't understand that. Can you please explain?

test/CodeGen/X86/fp-cmp-sf.ll
1 ↗(On Diff #28773)

Why chmod this as non-executable? Other tests don't do that.

ab added a comment.Jul 6 2015, 12:02 PM

Thanks for the update! To me, the logic is sound. Nitpick replies below.

test/CodeGen/AArch64/arm64-fp128.ll
167 ↗(On Diff #28773)

Hmm, that's not how I understood the comment.

Checking both uno/__unordtf2 and oge/__getf2 is indeed an 'unfortunate "optimization"'. We used to do something like:

IR:

%1 = fcmp olt ...
br i1 %1, label %iftrue, label %iffalse

SelectionDAG:

%1 = fcmp olt ...
%2 = xor i1 %1, 1
brcond i1 %2, label %iffalse
br label %iftrue

Optimized SelectionDAG:

%1 = fcmp uge ...
brcond i1 %2, label %iffalse
br label %iftrue

It's this optimization that the comment complains about, I think, because a softened uge used to need two libcalls (as opposed to 1 for olt).

Whereas I read:

  bl __lttf2
  cmp w0, #0
  b.ge <iffalse>
iftrue:

as a straightforward lowering for:

%1 = fcmp olt ...
%2 = xor i1 %1, 1
brcond i1 %2, label %iffalse
br label %iftrue

via (given that __lttf2 returns < 0 / -1 on "olt" only):

%1 = call @__lttf2
%2 = icmp lt i32 %1, 0
%3 = xor i1 %2, 1
brcond i1 %3, label %iffalse
br label %iftrue

itself simplified to:

%1 = call @__lttf2
%2 = icmp ge i32 %1, 0
brcond i1 %2, label %iffalse
br label %iftrue

In any case, not a big deal, but I might have missed something, so: am I making sense?

test/CodeGen/Thumb2/float-cmp.ll
84 ↗(On Diff #28773)

The IR does an UGT comparison, but this patch checks for OLE. Since UGT == !OLE, I imagine there's some code right after the call that inverts r0 somehow.

I'm saying it might be a good idea to check (in this case, using "NONE:") that code as well.

test/CodeGen/X86/fp-cmp-sf.ll
1 ↗(On Diff #28773)

I mean this test file (fp-cmp-sf.ll) is executable in your patch (see the svn property a few lines above on phabricator). Before committing, please change your local fp-cmp-sf.ll file to not be executable:

chmod -x test/CodeGen/X86/fp-cmp-sf.ll

You might also have to

svn propdel svn:executable test/CodeGen/X86/fp-cmp-sf.ll

if you svn add-ed it already.

anadolskiy updated this revision to Diff 29352.Jul 9 2015, 10:23 AM

Made new tests non-executable

Thanks for comments, Ahmed

test/CodeGen/AArch64/arm64-fp128.ll
167 ↗(On Diff #28866)

Yes, exactly. But there's no XOR in given IR, so we must explain why GE. And I agree that <unfortunately "optimizes"> is now just "optimizes"

ab accepted this revision.Jul 13 2015, 9:40 AM
ab added a reviewer: ab.

OK, this LGTM, thanks! Do you need someone to commit it for you?

This revision is now accepted and ready to land.Jul 13 2015, 9:40 AM
This revision was automatically updated to reflect the committed changes.