Page MenuHomePhabricator

[x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475)
ClosedPublic

Authored by spatel on Dec 7 2015, 10:43 AM.

Details

Summary

This patch implements the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475

but only for the fmaxf() case to start, so we can sort out any bugs before extending to fmin, f64, and vectors.

The fmax / maxnum definitions provide us flexibility for signed zeros, so I hope the only thing we have to worry about in this replacement sequence is NaN handling.

Note 1: I initially implemented this as lowerFMAXNUM(), but that exposes a problem: SelectionDAGBuilder::visitSelect() transforms compare/select instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps this should be checking for NaN inputs or global unsafe-math before transforming? As it stands, this bypasses a big set of optimizations that the x86 backend already has in PerformSELECTCombine(). I don't know what the tradeoffs are for making a 'combine' rather than a 'lower'. If a 'lower' is preferred, we will need to fix that problem.

Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so we have completely unnecessary operations happening on undef elements of the vector.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 42078.Dec 7 2015, 10:43 AM
spatel retitled this revision from to [x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475).
spatel updated this object.
spatel added reviewers: scanon, qcolombet, jmolloy.
spatel added a subscriber: llvm-commits.
jmolloy resigned from this revision.Dec 12 2015, 5:30 AM
jmolloy removed a reviewer: jmolloy.

As this primarily involves the x86 backend, I don't think I'm an appropriate reviewer; Resigning.

spatel added a subscriber: jmolloy.

As this primarily involves the x86 backend, I don't think I'm an appropriate reviewer; Resigning.

Thanks, James. I thought you might have some feedback on the SelectionDAGBuilder question, although that is separate from the functionality of this patch as it stands.

Adding some more potential x86 reviewers.

zansari edited edge metadata.Dec 14 2015, 12:45 PM

Hi Sanjay,

Quick high-level question : wouldn't it be better to pull the intermediate value out of the fmax to reduce the dependence chain?

So, instead of :

cond = isnan(op0)
V = select (op1, op0, cond)
Result = FMAX(op1, V)

do this:

t = FMAX(op1, op0)
cond = isnan(op0)
Result = select (op1, t, cond)

This way, we go from isnan->select->fmax to fmax/isnan -> select.

Quick high-level question : wouldn't it be better to pull the intermediate value out of the fmax to reduce the dependence chain?

Yes, that would be better.

Because I'm SSE dyslexic, I altered the test program in https://llvm.org/bugs/show_bug.cgi?id=24475 to check, and this is what I came up with:

__m128 maxnum = _mm_max_ss(v2, v1);
__m128 isnan1 = _mm_cmpunord_ss(v1, v1);
maxnum = _mm_blendv_ps(maxnum, v2, isnan1);

Which compiles to (AT&T syntax - should invert the dyslexia, but I still can't get it right):

vmaxss        %xmm0, %xmm1, %xmm2         <--- if either input is NaN, xmm0 (v1) is returned
vcmpunordss   %xmm0, %xmm0, %xmm0
vblendvps     %xmm0, %xmm1, %xmm2, %xmm0  <--- if xmm0 (v1) is NaN, output xmm1 (v2); if not, output max or v1

I'll translate that to LLVM and update the patch. Thanks!

spatel updated this revision to Diff 42867.Dec 15 2015, 9:28 AM
spatel edited edge metadata.
spatel removed a subscriber: jmolloy.

Patch updated:

  1. Implement more efficient sequence suggested by Zia.
  2. Fix checks in the test to match.
  3. Add FIXME comments for other fmax tests that aren't handled yet.

I'd disable the expansion under minsize, at least.. Otherwise, lgtm.

kbsmith1 accepted this revision.Dec 15 2015, 12:40 PM
kbsmith1 edited edge metadata.

LGTM

This revision is now accepted and ready to land.Dec 15 2015, 12:40 PM

I'd disable the expansion under minsize, at least.. Otherwise, lgtm.

Thanks! I'll add that check and a test case and get this checked in.

This revision was automatically updated to reflect the committed changes.