This should automatically make fastmath (including just non-NaNs) able to avoid
the expensive libcalls and also open the door to more advanced folding in LLVM
based on the rules for complex math.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Aside from the missing _Complex long double functions (as mentioned below), LGTM.
lib/CodeGen/CGExprComplex.cpp | ||
---|---|---|
608 ↗ | (On Diff #14810) | What about PPC_FP128TyID, which gets multc3 (or FP128TyID, which I think also gets mulxc3)? |
lib/CodeGen/CGExprComplex.cpp | ||
---|---|---|
608 ↗ | (On Diff #14810) | That's not new in this patch. I've only really been able to test this effectively on x86, and am not super familiar with the PPC ABI here. Want to add multc3 and PPC_FP128TyID and a powerpc triple run to the test? It should probably be a separate commit though. |
lib/CodeGen/CGExprComplex.cpp | ||
---|---|---|
608 ↗ | (On Diff #14810) | Okay. What pronoun belongs in front of the 'want'? |
Apologies for delay in looking at this, I'm on vacation this week.
I don't love this approach because (a) it doesn't get us fully to where we want to be in performance, and (b) it's going to trash the floating-point flag state. The performance issue is that we still have two comparisons and one or two branches for every complex op outside of no-nans, and the flags issue is as follows:
The intention of IEEE-754 is that anything that is conceptually a single "operation" should raise at most one of divide-by-zero, invalid, overflow, or underflow. A complex multiplication implemented with lazy checking may cause two of these to be raised:
(tiny, huge) * (tiny, huge) --> underflow + overflow (0, huge) * (inf, huge) --> invalid + overflow, no flags
My preferred approach would be to implement limited-range semantics as an option (via either pragma or flag), and have it implied by fast-math.
Now, all that being said, I haven't checked if today's compiler-rt implementations are even correct w.r.t. flags in this sense, so it's not immediately obvious that this change makes anything worse today, and it will address some of the performance concerns of the earlier patch. It just seems contrary to the direction that we really want to be going in the longer-term w.r.t. numerical correctness.
- Original Message -----
From: "Steve Canon" <scanon@apple.com>
To: chandlerc@gmail.com, resistor@mac.com, hfinkel@anl.gov, scanon@apple.com
Cc: cfe-commits@cs.uiuc.edu
Sent: Friday, October 17, 2014 4:23:28 AM
Subject: Re: [PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to
the library function.Apologies for delay in looking at this, I'm on vacation this week.
I don't love this approach because (a) it doesn't get us fully to
where we want to be in performance, and (b) it's going to trash the
floating-point flag state. The performance issue is that we still
have two comparisons and one or two branches for every complex op
outside of no-nans, and the flags issue is as follows:The intention of IEEE-754 is that anything that is conceptually a
single "operation" should raise at most one of divide-by-zero,
invalid, overflow, or underflow. A complex multiplication
implemented with lazy checking may cause two of these to be raised:(tiny, huge) * (tiny, huge) --> underflow + overflow (0, huge) * (inf, huge) --> invalid + overflow, no flags
Thinking about this, this can only matter if we actually permit access to the FP environment, which we currently don't. So, if we were to ever allow "#pragma STDC FENV_ACCESS on", then we'd want to disable this optimization. But for now this is irrelevant (at least from the C perspective). Is this right?
-Hal
My preferred approach would be to implement limited-range semantics
as an option (via either pragma or flag), and have it implied by
fast-math.Now, all that being said, I haven't checked if today's compiler-rt
implementations are even correct w.r.t. flags in this sense, so it's
not immediately obvious that this change makes anything worse today,
and it will address some of the performance concerns of the
earlier patch. It just seems contrary to the direction that we
really want to be going in the longer-term w.r.t. numerical
correctness.
I realize this revision is closed, but...
Three of these test variants fail in the parser (unsupported floating point type) when compiled for powerpc64le-unknown-linux-gnu:
long double _Complex mul_long_double_cc(long double _Complex a, long double _Complex b)
long double _Complex div_long_double_rc(long double a, long double _Complex b)
long double _Complex div_long_double_cc(long double _Complex a, long double _Complex b)
- Original Message -----
From: "Bill Schmidt" <wschmidt@linux.vnet.ibm.com>
To: chandlerc@gmail.com, resistor@mac.com, hfinkel@anl.gov, scanon@apple.com
Cc: wschmidt@linux.vnet.ibm.com, cfe-commits@cs.uiuc.edu
Sent: Wednesday, October 22, 2014 3:27:30 PM
Subject: Re: [PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to
the library function.I realize this revision is closed, but...
Three of these test variants fail in the parser (unsupported floating
point type) when compiled for powerpc64le-unknown-linux-gnu:long double _Complex mul_long_double_cc(long double _Complex a, long
double _Complex b)
long double _Complex div_long_double_rc(long double a, long double
_Complex b)
long double _Complex div_long_double_cc(long double _Complex a, long
double _Complex b)
Hrmm... that seems like a bug that we should fix.
-Hal
REPOSITORY
rL LLVM