This is an archive of the discontinued LLVM Phabricator instance.

[complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to the library function.
ClosedPublic

Authored by chandlerc on Oct 13 2014, 11:39 AM.

Details

Summary

This should automatically make fastmath (including just non-NaNs) able to avoid
the expensive libcalls and also open the door to more advanced folding in LLVM
based on the rules for complex math.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 14810.Oct 13 2014, 11:39 AM
chandlerc retitled this revision from to [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to the library function..
chandlerc updated this object.
chandlerc added reviewers: hfinkel, scanon, resistor.
chandlerc added a subscriber: Unknown Object (MLST).
hfinkel accepted this revision.Oct 14 2014, 9:36 AM
hfinkel edited edge metadata.

Aside from the missing _Complex long double functions (as mentioned below), LGTM.

lib/CodeGen/CGExprComplex.cpp
608 ↗(On Diff #14810)

What about PPC_FP128TyID, which gets multc3 (or FP128TyID, which I think also gets mulxc3)?

This revision is now accepted and ready to land.Oct 14 2014, 9:36 AM
chandlerc added inline comments.Oct 14 2014, 6:06 PM
lib/CodeGen/CGExprComplex.cpp
608 ↗(On Diff #14810)

That's not new in this patch. I've only really been able to test this effectively on x86, and am not super familiar with the PPC ABI here.

Want to add multc3 and PPC_FP128TyID and a powerpc triple run to the test? It should probably be a separate commit though.

hfinkel added inline comments.Oct 14 2014, 6:20 PM
lib/CodeGen/CGExprComplex.cpp
608 ↗(On Diff #14810)

Okay. What pronoun belongs in front of the 'want'?

scanon edited edge metadata.Oct 17 2014, 2:23 AM

Apologies for delay in looking at this, I'm on vacation this week.

I don't love this approach because (a) it doesn't get us fully to where we want to be in performance, and (b) it's going to trash the floating-point flag state. The performance issue is that we still have two comparisons and one or two branches for every complex op outside of no-nans, and the flags issue is as follows:

The intention of IEEE-754 is that anything that is conceptually a single "operation" should raise at most one of divide-by-zero, invalid, overflow, or underflow. A complex multiplication implemented with lazy checking may cause two of these to be raised:

(tiny, huge) * (tiny, huge) --> underflow + overflow
(0, huge) * (inf, huge) --> invalid + overflow, no flags

My preferred approach would be to implement limited-range semantics as an option (via either pragma or flag), and have it implied by fast-math.

Now, all that being said, I haven't checked if today's compiler-rt implementations are even correct w.r.t. flags in this sense, so it's not immediately obvious that this change makes anything worse today, and it will address some of the performance concerns of the earlier patch. It just seems contrary to the direction that we really want to be going in the longer-term w.r.t. numerical correctness.

chandlerc closed this revision.Oct 19 2014, 12:24 PM
chandlerc updated this revision to Diff 15130.

Closed by commit rL220167 (authored by @chandlerc).

  • Original Message -----

From: "Steve Canon" <scanon@apple.com>
To: chandlerc@gmail.com, resistor@mac.com, hfinkel@anl.gov, scanon@apple.com
Cc: cfe-commits@cs.uiuc.edu
Sent: Friday, October 17, 2014 4:23:28 AM
Subject: Re: [PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to
the library function.

Apologies for delay in looking at this, I'm on vacation this week.

I don't love this approach because (a) it doesn't get us fully to
where we want to be in performance, and (b) it's going to trash the
floating-point flag state. The performance issue is that we still
have two comparisons and one or two branches for every complex op
outside of no-nans, and the flags issue is as follows:

The intention of IEEE-754 is that anything that is conceptually a
single "operation" should raise at most one of divide-by-zero,
invalid, overflow, or underflow. A complex multiplication
implemented with lazy checking may cause two of these to be raised:

(tiny, huge) * (tiny, huge) --> underflow + overflow
(0, huge) * (inf, huge) --> invalid + overflow, no flags

Thinking about this, this can only matter if we actually permit access to the FP environment, which we currently don't. So, if we were to ever allow "#pragma STDC FENV_ACCESS on", then we'd want to disable this optimization. But for now this is irrelevant (at least from the C perspective). Is this right?

-Hal

My preferred approach would be to implement limited-range semantics
as an option (via either pragma or flag), and have it implied by
fast-math.

Now, all that being said, I haven't checked if today's compiler-rt
implementations are even correct w.r.t. flags in this sense, so it's
not immediately obvious that this change makes anything worse today,
and it will address some of the performance concerns of the
earlier patch. It just seems contrary to the direction that we
really want to be going in the longer-term w.r.t. numerical
correctness.

http://reviews.llvm.org/D5756

I realize this revision is closed, but...

Three of these test variants fail in the parser (unsupported floating point type) when compiled for powerpc64le-unknown-linux-gnu:

long double _Complex mul_long_double_cc(long double _Complex a, long double _Complex b)
long double _Complex div_long_double_rc(long double a, long double _Complex b)
long double _Complex div_long_double_cc(long double _Complex a, long double _Complex b)

  • Original Message -----

From: "Bill Schmidt" <wschmidt@linux.vnet.ibm.com>
To: chandlerc@gmail.com, resistor@mac.com, hfinkel@anl.gov, scanon@apple.com
Cc: wschmidt@linux.vnet.ibm.com, cfe-commits@cs.uiuc.edu
Sent: Wednesday, October 22, 2014 3:27:30 PM
Subject: Re: [PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to
the library function.

I realize this revision is closed, but...

Three of these test variants fail in the parser (unsupported floating
point type) when compiled for powerpc64le-unknown-linux-gnu:

long double _Complex mul_long_double_cc(long double _Complex a, long
double _Complex b)
long double _Complex div_long_double_rc(long double a, long double
_Complex b)
long double _Complex div_long_double_cc(long double _Complex a, long
double _Complex b)

Hrmm... that seems like a bug that we should fix.

-Hal

REPOSITORY

rL LLVM

http://reviews.llvm.org/D5756

Er, never mind. I missed a build issue and was testing an old version.