This is an archive of the discontinued LLVM Phabricator instance.

[complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to the library function.
Closed (Public)

Authored by chandlerc on Oct 13 2014, 11:39 AM.


Details

Reviewers

scanon
resistor
hfinkel

Commits

rG0c4b230b32ba: [complex] Teach the complex math IR gen to emit direct math and a NaN-test…
rC220167: [complex] Teach the complex math IR gen to emit direct math and
rL220167: [complex] Teach the complex math IR gen to emit direct math and

Summary

This should automatically let fast-math (including the no-NaNs mode on its own) avoid
the expensive libcalls, and it also opens the door to more advanced folding in LLVM
based on the rules for complex math.
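A rough C-level sketch of the strategy described above, under stated assumptions: the four products are computed directly, and the careful slow path runs only when both the real and the imaginary result are NaN, which matches the two `fcmp uno` tests in the diff. The names `make_complex` and `mul_direct_then_check` are hypothetical, and the slow path simply reuses the compiler's own `double complex` multiply, assuming the file is built without fast-math so that this operator follows C11 Annex G (typically via a runtime routine such as `__muldc3`).

```c
#include <complex.h>
#include <math.h>

/* Hypothetical helper: builds re + i*im without arithmetic (which could turn
   an infinite part into NaN), using the double[2] layout of a complex. */
static double complex make_complex(double re, double im) {
  union {
    double complex z;
    double parts[2];
  } u;
  u.parts[0] = re; /* real part */
  u.parts[1] = im; /* imaginary part */
  return u.z;
}

/* Hypothetical sketch of the emitted pattern for (a + ib) * (c + id). */
static double complex mul_direct_then_check(double a, double b, double c,
                                            double d) {
  /* Fast path: the four products and the textbook combination. */
  double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
  double re = ac - bd;
  double im = ad + bc;

  /* Slow path, expected to be rare: a result that is NaN in both parts may
     hide a lost infinity, so redo the multiply with the compiler's operator
     (assumed Annex G-conforming, i.e. no fast-math). */
  if (isnan(re) && isnan(im))
    return make_complex(a, b) * make_complex(c, d);

  return make_complex(re, im);
}
```

Under a no-NaNs assumption the `isnan` test is known to be false, so only the four multiplies, the subtraction, and the addition remain, which is the fast-math benefit the summary describes.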

Diff Detail

Repository: rL LLVM

Event Timeline

chandlerc updated this revision to Diff 14810. Oct 13 2014, 11:39 AM

chandlerc retitled this revision to [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to the library function.

chandlerc updated this object.

chandlerc added reviewers: hfinkel, scanon, resistor.

chandlerc added a subscriber: Unknown Object (MLST).

Aside from the missing _Complex long double functions (as mentioned below), LGTM.

lib/CodeGen/CGExprComplex.cpp
Line 608 (On Diff #14810): What about PPC_FP128TyID, which gets multc3 (or FP128TyID, which I think also gets mulxc3)?

This revision is now accepted and ready to land. Oct 14 2014, 9:36 AM

chandlerc added inline comments. Oct 14 2014, 6:06 PM

lib/CodeGen/CGExprComplex.cpp
Line 608 (On Diff #14810): That's not new in this patch. I've only really been able to test this effectively on x86, and am not super familiar with the PPC ABI here. Want to add multc3 and PPC_FP128TyID and a powerpc triple run to the test? It should probably be a separate commit though.

hfinkel added inline comments. Oct 14 2014, 6:20 PM

lib/CodeGen/CGExprComplex.cpp
Line 608 (On Diff #14810): Okay. What pronoun belongs in front of the 'want'?

Apologies for the delay in looking at this; I'm on vacation this week.

I don't love this approach because (a) it doesn't get us fully to where we want to be in performance, and (b) it's going to trash the floating-point flag state. The performance issue is that we still have two comparisons and one or two branches for every complex op outside of no-nans, and the flags issue is as follows:

The intention of IEEE-754 is that anything that is conceptually a single "operation" should raise at most one of divide-by-zero, invalid, overflow, or underflow. A complex multiplication implemented with lazy checking may cause two of these to be raised:

(tiny, huge) * (tiny, huge) --> underflow + overflow
(0, huge) * (inf, huge) --> invalid + overflow, no flags

My preferred approach would be to implement limited-range semantics as an option (via either pragma or flag), and have it implied by fast-math.

Now, all that being said, I haven't checked if today's compiler-rt implementations are even correct w.r.t. flags in this sense, so it's not immediately obvious that this change makes anything worse today, and it will address some of the performance concerns of the earlier patch. It just seems contrary to the direction that we really want to be going in the longer-term w.r.t. numerical correctness.

Closed by commit rL220167 (authored by @chandlerc).

----- Original Message -----

From: "Steve Canon" <scanon@apple.com>
To: chandlerc@gmail.com, resistor@mac.com, hfinkel@anl.gov, scanon@apple.com
Cc: cfe-commits@cs.uiuc.edu
Sent: Friday, October 17, 2014 4:23:28 AM
Subject: Re: [PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to the library function.

Thinking about this, this can only matter if we actually permit access to the FP environment, which we currently don't. So, if we were to ever allow "#pragma STDC FENV_ACCESS on", then we'd want to disable this optimization. But for now this is irrelevant (at least from the C perspective). Is this right?

-Hal

http://reviews.llvm.org/D5756

I realize this revision is closed, but...

Three of these test variants fail in the parser (unsupported floating point type) when compiled for powerpc64le-unknown-linux-gnu:

long double _Complex mul_long_double_cc(long double _Complex a, long double _Complex b)
long double _Complex div_long_double_rc(long double a, long double _Complex b)
long double _Complex div_long_double_cc(long double _Complex a, long double _Complex b)

----- Original Message -----

From: "Bill Schmidt" <wschmidt@linux.vnet.ibm.com>
To: chandlerc@gmail.com, resistor@mac.com, hfinkel@anl.gov, scanon@apple.com
Cc: wschmidt@linux.vnet.ibm.com, cfe-commits@cs.uiuc.edu
Sent: Wednesday, October 22, 2014 3:27:30 PM
Subject: Re: [PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to the library function.

Hrmm... that seems like a bug that we should fix.

-Hal

Er, never mind. I missed a build issue and was testing an old version.

Revision Contents

Path                                      Size
cfe/trunk/lib/CodeGen/CGExprComplex.cpp   100 lines
cfe/trunk/test/CodeGen/complex-math.c     48 lines

Diff 15130

cfe/trunk/lib/CodeGen/CGExprComplex.cpp

@@ first 9 lines not shown @@
 // This contains code to emit Expr nodes with complex types as LLVM code.
 //
 //===----------------------------------------------------------------------===//
 
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/AST/StmtVisitor.h"
+#include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/Metadata.h"
 #include <algorithm>
 using namespace clang;
 using namespace CodeGen;
 
 //===----------------------------------------------------------------------===//
 // Complex Expression Emitter
 //===----------------------------------------------------------------------===//
 
@@ 553 unchanged lines not shown @@ ComplexPairTy ComplexExprEmitter::EmitComplexBinOpLibCall(StringRef LibCallName,
   const CGFunctionInfo &FuncInfo = CGF.CGM.getTypes().arrangeFreeFunctionCall(
       Op.Ty, Args, FunctionType::ExtInfo(), RequiredArgs::All);
   llvm::FunctionType *FTy = CGF.CGM.getTypes().GetFunctionType(FuncInfo);
   llvm::Constant *Func = CGF.CGM.CreateRuntimeFunction(FTy, LibCallName);
 
   return CGF.EmitCall(FuncInfo, Func, ReturnValueSlot(), Args).getComplexVal();
 }
 
+/// \brief Lookup the libcall name for a given floating point type complex
+/// multiply.
+static StringRef getComplexMultiplyLibCallName(llvm::Type *Ty) {
+  switch (Ty->getTypeID()) {
+  default:
+    llvm_unreachable("Unsupported floating point type!");
+  case llvm::Type::HalfTyID:
+    return "__mulhc3";
+  case llvm::Type::FloatTyID:
+    return "__mulsc3";
+  case llvm::Type::DoubleTyID:
+    return "__muldc3";
+  case llvm::Type::PPC_FP128TyID:
+    return "__multc3";
+  case llvm::Type::X86_FP80TyID:
+    return "__mulxc3";
+  }
+}
+
 // See C11 Annex G.5.1 for the semantics of multiplicative operators on complex
 // typed values.
 ComplexPairTy ComplexExprEmitter::EmitBinMul(const BinOpInfo &Op) {
   using llvm::Value;
   Value *ResR, *ResI;
+  llvm::MDBuilder MDHelper(CGF.getLLVMContext());
 
   if (Op.LHS.first->getType()->isFloatingPointTy()) {
     // The general formulation is:
     // (a + ib) * (c + id) = (a * c - b * d) + i(a * d + b * c)
     //
     // But we can fold away components which would be zero due to a real
     // operand according to C11 Annex G.5.1p2.
     // FIXME: C11 also provides for imaginary types which would allow folding
     // still more of this within the type system.
 
     if (Op.LHS.second && Op.RHS.second) {
-      // If both operands are complex, delegate to a libcall which works to
-      // prevent underflow and overflow.
-      StringRef LibCallName;
-      switch (Op.LHS.first->getType()->getTypeID()) {
-      default:
-        llvm_unreachable("Unsupported floating point type!");
-      case llvm::Type::HalfTyID:
-        return EmitComplexBinOpLibCall("__mulhc3", Op);
-      case llvm::Type::FloatTyID:
-        return EmitComplexBinOpLibCall("__mulsc3", Op);
-      case llvm::Type::DoubleTyID:
-        return EmitComplexBinOpLibCall("__muldc3", Op);
-      case llvm::Type::PPC_FP128TyID:
-        return EmitComplexBinOpLibCall("__multc3", Op);
-      case llvm::Type::X86_FP80TyID:
-        return EmitComplexBinOpLibCall("__mulxc3", Op);
-      }
+      // If both operands are complex, emit the core math directly, and then
+      // test for NaNs. If we find NaNs in the result, we delegate to a libcall
+      // to carefully re-compute the correct infinity representation if
+      // possible. The expectation is that the presence of NaNs here is
+      // extremely rare, and so the cost of the libcall is almost irrelevant.
+      // This is good, because the libcall re-computes the core multiplication
+      // exactly the same as we do here and re-tests for NaNs in order to be
+      // a generic complex*complex libcall.
+
+      // First compute the four products.
+      Value *AC = Builder.CreateFMul(Op.LHS.first, Op.RHS.first, "mul_ac");
+      Value *BD = Builder.CreateFMul(Op.LHS.second, Op.RHS.second, "mul_bd");
+      Value *AD = Builder.CreateFMul(Op.LHS.first, Op.RHS.second, "mul_ad");
+      Value *BC = Builder.CreateFMul(Op.LHS.second, Op.RHS.first, "mul_bc");
+
+      // The real part is the difference of the first two, the imaginary part is
+      // the sum of the second.
+      ResR = Builder.CreateFSub(AC, BD, "mul_r");
+      ResI = Builder.CreateFAdd(AD, BC, "mul_i");
+
+      // Emit the test for the real part becoming NaN and create a branch to
+      // handle it. We test for NaN by comparing the number to itself.
+      Value *IsRNaN = Builder.CreateFCmpUNO(ResR, ResR, "isnan_cmp");
+      llvm::BasicBlock *ContBB = CGF.createBasicBlock("complex_mul_cont");
+      llvm::BasicBlock *INaNBB = CGF.createBasicBlock("complex_mul_imag_nan");
+      llvm::Instruction *Branch = Builder.CreateCondBr(IsRNaN, INaNBB, ContBB);
+      llvm::BasicBlock *OrigBB = Branch->getParent();
+
+      // Give hint that we very much don't expect to see NaNs.
+      // Value chosen to match UR_NONTAKEN_WEIGHT, see BranchProbabilityInfo.cpp
+      llvm::MDNode *BrWeight = MDHelper.createBranchWeights(1, (1U << 20) - 1);
+      Branch->setMetadata(llvm::LLVMContext::MD_prof, BrWeight);
+
+      // Now test the imaginary part and create its branch.
+      CGF.EmitBlock(INaNBB);
+      Value *IsINaN = Builder.CreateFCmpUNO(ResI, ResI, "isnan_cmp");
+      llvm::BasicBlock *LibCallBB = CGF.createBasicBlock("complex_mul_libcall");
+      Branch = Builder.CreateCondBr(IsINaN, LibCallBB, ContBB);
+      Branch->setMetadata(llvm::LLVMContext::MD_prof, BrWeight);
+
+      // Now emit the libcall on this slowest of the slow paths.
+      CGF.EmitBlock(LibCallBB);
+      Value *LibCallR, *LibCallI;
+      std::tie(LibCallR, LibCallI) = EmitComplexBinOpLibCall(
+          getComplexMultiplyLibCallName(Op.LHS.first->getType()), Op);
+      Builder.CreateBr(ContBB);
+
+      // Finally continue execution by phi-ing together the different
+      // computation paths.
+      CGF.EmitBlock(ContBB);
+      llvm::PHINode *RealPHI = Builder.CreatePHI(ResR->getType(), 3, "real_mul_phi");
+      RealPHI->addIncoming(ResR, OrigBB);
+      RealPHI->addIncoming(ResR, INaNBB);
+      RealPHI->addIncoming(LibCallR, LibCallBB);
+      llvm::PHINode *ImagPHI = Builder.CreatePHI(ResI->getType(), 3, "imag_mul_phi");
+      ImagPHI->addIncoming(ResI, OrigBB);
+      ImagPHI->addIncoming(ResI, INaNBB);
+      ImagPHI->addIncoming(LibCallI, LibCallBB);
+      return ComplexPairTy(RealPHI, ImagPHI);
     }
     assert((Op.LHS.second || Op.RHS.second) &&
            "At least one operand must be complex!");
 
     // If either of the operands is a real rather than a complex, the
     // imaginary component is ignored when computing the real component of the
     // result.
     ResR = Builder.CreateFMul(Op.LHS.first, Op.RHS.first, "mul.rl");
@@ 387 unchanged lines not shown @@

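The comment in `EmitBinMul` says the library routine recomputes the products, retests for NaNs, and then tries to recover the correct infinity. For illustration only, the function below transcribes the multiplication example from C11 Annex G.5.1 into a standalone C function; `mul_annex_g` and `make_complex` are hypothetical names, and the real `__muldc3` in compiler-rt or libgcc may differ in detail.

```c
#include <complex.h>
#include <math.h>

/* Hypothetical helper: builds re + i*im from its parts via a union. */
static double complex make_complex(double re, double im) {
  union {
    double complex z;
    double parts[2];
  } u;
  u.parts[0] = re;
  u.parts[1] = im;
  return u.z;
}

/* (a + ib) * (c + id), following the C11 Annex G.5.1 example. */
static double complex mul_annex_g(double a, double b, double c, double d) {
  double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
  double x = ac - bd, y = ad + bc;

  if (isnan(x) && isnan(y)) {
    /* Recover infinities that the direct formula turned into NaN + iNaN. */
    int recalc = 0;
    if (isinf(a) || isinf(b)) {
      /* First factor is infinite: "box" the infinity, turn NaNs in the
         second factor into zeros. */
      a = copysign(isinf(a) ? 1.0 : 0.0, a);
      b = copysign(isinf(b) ? 1.0 : 0.0, b);
      if (isnan(c)) c = copysign(0.0, c);
      if (isnan(d)) d = copysign(0.0, d);
      recalc = 1;
    }
    if (isinf(c) || isinf(d)) {
      /* Second factor is infinite: same treatment with roles swapped. */
      c = copysign(isinf(c) ? 1.0 : 0.0, c);
      d = copysign(isinf(d) ? 1.0 : 0.0, d);
      if (isnan(a)) a = copysign(0.0, a);
      if (isnan(b)) b = copysign(0.0, b);
      recalc = 1;
    }
    if (!recalc && (isinf(ac) || isinf(bd) || isinf(ad) || isinf(bc))) {
      /* No infinite inputs, but a product overflowed: zero out NaN inputs. */
      if (isnan(a)) a = copysign(0.0, a);
      if (isnan(b)) b = copysign(0.0, b);
      if (isnan(c)) c = copysign(0.0, c);
      if (isnan(d)) d = copysign(0.0, d);
      recalc = 1;
    }
    if (recalc) {
      x = INFINITY * (a * c - b * d);
      y = INFINITY * (a * d + b * c);
    }
  }
  return make_complex(x, y);
}
```

In the patch this routine is reached only on the NaN + iNaN path, so the common case costs four multiplies, one subtraction, one addition, and the two NaN tests.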
cfe/trunk/test/CodeGen/complex-math.c

@@ 83 unchanged lines not shown @@ float _Complex mul_float_rc(float a, float _Complex b) {
   // X86: fmul
   // X86: fmul
   // X86-NOT: fmul
   // X86: ret
   return a * b;
 }
 float _Complex mul_float_cc(float _Complex a, float _Complex b) {
   // X86-LABEL: @mul_float_cc(
-  // X86-NOT: fmul
+  // X86: %[[AC:[^ ]+]] = fmul
+  // X86: %[[BD:[^ ]+]] = fmul
+  // X86: %[[AD:[^ ]+]] = fmul
+  // X86: %[[BC:[^ ]+]] = fmul
+  // X86: %[[RR:[^ ]+]] = fsub float %[[AC]], %[[BD]]
+  // X86: %[[RI:[^ ]+]] = fadd float
+  // X86-DAG: %[[AD]]
+  // X86-DAG: ,
+  // X86-DAG: %[[BC]]
+  // X86: fcmp uno float %[[RR]]
+  // X86: fcmp uno float %[[RI]]
   // X86: call {{.*}} @__mulsc3(
   // X86: ret
   return a * b;
 }
 
 float _Complex div_float_rr(float a, float b) {
   // X86-LABEL: @div_float_rr(
   // X86: fdiv
@@ 105 unchanged lines not shown @@ double _Complex mul_double_rc(double a, double _Complex b) {
   // X86: fmul
   // X86: fmul
   // X86-NOT: fmul
   // X86: ret
   return a * b;
 }
 double _Complex mul_double_cc(double _Complex a, double _Complex b) {
   // X86-LABEL: @mul_double_cc(
-  // X86-NOT: fmul
+  // X86: %[[AC:[^ ]+]] = fmul
+  // X86: %[[BD:[^ ]+]] = fmul
+  // X86: %[[AD:[^ ]+]] = fmul
+  // X86: %[[BC:[^ ]+]] = fmul
+  // X86: %[[RR:[^ ]+]] = fsub double %[[AC]], %[[BD]]
+  // X86: %[[RI:[^ ]+]] = fadd double
+  // X86-DAG: %[[AD]]
+  // X86-DAG: ,
+  // X86-DAG: %[[BC]]
+  // X86: fcmp uno double %[[RR]]
+  // X86: fcmp uno double %[[RI]]
   // X86: call {{.*}} @__muldc3(
   // X86: ret
   return a * b;
 }
 
 double _Complex div_double_rr(double a, double b) {
   // X86-LABEL: @div_double_rr(
   // X86: fdiv
@@ 105 unchanged lines not shown @@ long double _Complex mul_long_double_rc(long double a, long double _Complex b) {
   // X86: fmul
   // X86: fmul
   // X86-NOT: fmul
   // X86: ret
   return a * b;
 }
 long double _Complex mul_long_double_cc(long double _Complex a, long double _Complex b) {
   // X86-LABEL: @mul_long_double_cc(
-  // X86-NOT: fmul
+  // X86: %[[AC:[^ ]+]] = fmul
+  // X86: %[[BD:[^ ]+]] = fmul
+  // X86: %[[AD:[^ ]+]] = fmul
+  // X86: %[[BC:[^ ]+]] = fmul
+  // X86: %[[RR:[^ ]+]] = fsub x86_fp80 %[[AC]], %[[BD]]
+  // X86: %[[RI:[^ ]+]] = fadd x86_fp80
+  // X86-DAG: %[[AD]]
+  // X86-DAG: ,
+  // X86-DAG: %[[BC]]
+  // X86: fcmp uno x86_fp80 %[[RR]]
+  // X86: fcmp uno x86_fp80 %[[RI]]
   // X86: call {{.*}} @__mulxc3(
   // X86: ret
   // PPC-LABEL: @mul_long_double_cc(
-  // PPC-NOT: fmul
+  // PPC: %[[AC:[^ ]+]] = fmul
+  // PPC: %[[BD:[^ ]+]] = fmul
+  // PPC: %[[AD:[^ ]+]] = fmul
+  // PPC: %[[BC:[^ ]+]] = fmul
+  // PPC: %[[RR:[^ ]+]] = fsub ppc_fp128 %[[AC]], %[[BD]]
+  // PPC: %[[RI:[^ ]+]] = fadd ppc_fp128
+  // PPC-DAG: %[[AD]]
+  // PPC-DAG: ,
+  // PPC-DAG: %[[BC]]
+  // PPC: fcmp uno ppc_fp128 %[[RR]]
+  // PPC: fcmp uno ppc_fp128 %[[RI]]
   // PPC: call {{.*}} @__multc3(
   // PPC: ret
   return a * b;
 }
 
 long double _Complex div_long_double_rr(long double a, long double b) {
   // X86-LABEL: @div_long_double_rr(
   // X86: fdiv
@@ 85 unchanged lines not shown @@