This is an archive of the discontinued LLVM Phabricator instance.


Differential D56598

Add llvm.psub
Abandoned, Public

Authored by aqjune on Jan 11 2019, 7:44 AM.


Details

Reviewers

hfinkel
chandlerc

Summary

This patch adds the llvm.psub(p1, p2) intrinsic, which subtracts two pointers and returns the difference.

Its semantics are as follows.
If p1 and p2 point to different objects, and neither of them is based on a pointer cast from an integer, `llvm.psub(p1, p2)` returns poison. For example,

%p = alloca
%q = alloca
%i = llvm.psub(p, q) ; %i is poison

This allows aggressive escape analysis on pointers. Given i = llvm.psub(p1, p2), if neither p1 nor p2 is based on a pointer cast from an integer, the llvm.psub call does not make p1 or p2 escape.

If either p1 or p2 is based on a pointer cast from an integer, or p1 and p2 point to the same object, it returns the result of the subtraction (in bytes); for example,

%p = alloca
%q = inttoptr %x
%i = llvm.psub(p, q) ; %i is equivalent to (ptrtoint %p) - %x

`null` is regarded as a pointer cast from an integer because
it is equivalent to `inttoptr 0`.
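The rules above can be sketched as a small standalone C++ model. This is only an illustration of the summary's wording, not code from the patch: `ModelPtr`, its fields, and the `psub` function are hypothetical names, and `std::nullopt` stands in for poison.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a pointer value (not an LLVM type).
struct ModelPtr {
    int object;             // id of the allocation the pointer is based on
    std::uint64_t address;  // numeric address
    bool from_integer;      // true if based on inttoptr; null counts as inttoptr 0
};

// Model of llvm.psub as described in the summary.
// std::nullopt stands in for a poison result.
std::optional<std::int64_t> psub(const ModelPtr &p1, const ModelPtr &p2) {
    const bool integer_based = p1.from_integer || p2.from_integer;
    const bool same_object = p1.object == p2.object;
    if (!integer_based && !same_object)
        return std::nullopt;  // different objects, no integer provenance
    // Otherwise: plain address subtraction, in bytes.
    return static_cast<std::int64_t>(p1.address - p2.address);
}

// The two examples from the summary, with made-up addresses.
void check_summary_examples() {
    const ModelPtr p{1, 0x1000, false};  // %p = alloca
    const ModelPtr q{2, 0x2000, false};  // %q = alloca
    const ModelPtr r{0, 0x0ff0, true};   // %q = inttoptr %x
    assert(!psub(p, q).has_value());     // poison
    assert(psub(p, r).value() == 0x10);  // (ptrtoint %p) - %x
}
```

In this model, subtracting `null` (an integer-based pointer with address 0) from `p` yields `p`'s address, which matches the claim that `llvm.psub(p1, null)` behaves like `ptrtoint p1`.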

Adding llvm.psub allows LLVM to replace a significant portion of ptrtoints and to remove a portion of inttoptrs.
When SPECrate 2017 is compiled with -O0 (r348082, Dec 2 2018), approximately 23,200 ptrtoints are generated. Of these, about 22,000 (95%) come from pointer subtraction.
When SPECrate 2017 is compiled with -O3, 22,800 inttoptrs and 31,700 ptrtoints are generated. If psub is used instead, the number of inttoptrs drops to 13,500 (59% of the original count) and the number of ptrtoints drops to 14,300 (45% of the original count).
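The percentages quoted above are each new count divided by the corresponding baseline count. A minimal sketch of that arithmetic follows; `rounded_percent` is a hypothetical helper, not part of the patch.

```cpp
#include <cassert>
#include <cmath>

// Share of `part` in `whole`, rounded to the nearest whole percent.
long rounded_percent(double part, double whole) {
    assert(whole > 0.0);
    return std::lround(100.0 * part / whole);
}
```

For instance, `rounded_percent(13500, 22800)` corresponds to the inttoptr figure and `rounded_percent(14300, 31700)` to the ptrtoint figure.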

To see the performance change, I ran SPECrate 2017 (single-threaded) with three versions of LLVM: r313797 (Sep 21, 2017), LLVM 6.0 official, and r348082 (Dec 2, 2018). With r313797, 505.mcf_r shows a consistent 2.0% speedup across three different machines (i3-6100, i5-6600, and i7-7700). For LLVM 6.0 and r348082, there is neither a consistent speedup nor a consistent slowdown, and the average speedup is near 0. I believe there is still room for improvement.

Event Timeline

aqjune created this revision. Jan 11 2019, 7:44 AM

https://reviews.llvm.org/D56601 shows how CaptureTracker can be enhanced with llvm.psub.

aqjune edited the summary of this revision. Jan 11 2019, 4:16 PM

aqjune added reviewers: hfinkel, chandlerc.

Tests missing.

docs/LangRef.rst
Line 16077 (lebedev.ri): declare iN @llvm.psub.iN.pty.pty(pty p1, pty p2) nounwind readnone speculatable
Line 16095 (lebedev.ri): I realize that it is just a snippet, but this is not valid IR.

I like the introduction of an explicit pointer subtraction since ptrtoint/inttoptr can cause lots of subtle bugs in our CHERI backend. However, I wonder if this should be an instruction instead of an intrinsic?

docs/LangRef.rst
Line 16095 (arichardson): I'm not sure disallowing pointer diffs between objects with different provenance is a good idea. For CHERI we have an explicit instruction to get the difference between two capabilities (effectively pointers with bounds, permissions and integrity protection). If we were to take provenance into account and raise a trap when one of the values is derived from a different object, it would cause lots of C code to stop working. Our `CSub` instruction ignores bounds and object provenance and only subtracts the virtual address.

This patch doesn't have llvm-commits subscribed. The developer policy says all review discussion for LLVM patches must happen on llvm-commits.

Please "abandon" this revision, and post it again with llvm-commits correctly CC'ed.

Hello all, I sincerely thank you for your comments. I'll apply the comments & re-upload the patch.

@arichardson I think csub and psub should be used differently, sadly :( Returning poison on different provenances allows more analysis, and I believe this is a benefit of introducing psub.

aqjune abandoned this revision. Jan 23 2019, 10:17 AM

Revision Contents

Path                                                  Size
docs/LangRef.rst                                      37 lines
include/llvm/CodeGen/BasicTTIImpl.h                   8 lines
include/llvm/IR/IRBuilder.h                           11 lines
include/llvm/IR/Intrinsics.td                         3 lines
lib/Analysis/InstructionSimplify.cpp                  4 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp      10 lines
lib/Transforms/InstCombine/InstCombineCalls.cpp       25 lines
lib/Transforms/InstCombine/InstCombineCompares.cpp    19 lines

Diff 181265

docs/LangRef.rst

  obviously not constant. However, a call like
  ``llvm.is.constant.i32(i32 %param)`` can return true after the
  function is inlined, if the value passed to the function parameter was
  a constant.

  On the other hand, if constant folding is not run, it will never
  evaluate to true, even in simple cases.

+ '``llvm.psub``' Intrinsic
+ ^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Syntax:
+ """""""
+
+ ::
+
+       declare iN @llvm.psub.iN.pty.pty(pty, pty) nounwind readnone speculatable
[inline comment] lebedev.ri: declare iN @llvm.psub.iN.pty.pty(pty p1, pty p2) nounwind readnone speculatable
+
+ Overview:
+ """""""""
+
+ The '``llvm.psub(p1, p2)``' intrinsic returns the result of subtraction
+ of two pointers `p1 - p2` in bytes.
+
+ Arguments:
+ """"""""""
+ The two arguments are pointers of the same type to subtract.
+ The return value is the result of subtraction.
+
+ Semantics:
+ """"""""""
+ If p1 and p2 point to different objects(global variable, alloca, etc)
+ and neither of them is based on a pointer casted from an integer,
+ ``llvm.psub(p1, p2)`` returns poison. Otherwise, the result is equivalent to
+ ``sub (ptrtoint p1 to iN) (ptrtoint p2 to iN)``.
[inline comment] lebedev.ri: I realize that it is just a snippet, but this is not a valid IR.
[inline comment] arichardson: I'm not sure disallowing pointer diffs between objects with different provenance is a good idea. For CHERI we have an explicit instruction to get the difference between two capabilities (effectively pointers with bounds, permissions and integrity protection). If we were to take provenance into account and raise a trap when one of the values is derived from a different object, it would cause lots of C code to stop working. Our `CSub` instruction ignores bounds an object provenance and only subtracts the virtual address.
+ ``null`` is regarded as a pointer casted from an integer because
+ it is equivalent to ``inttoptr 0``.
+ Therefore, ``llvm.psub(p1, null)`` is equivalent to
+ ``ptrtoint p1``.
+
+ The goal of this intrinsics is to allow more aggressive escape analysis on
+ pointers. Given ``i = llvm.psub(p1, p2)``, if neither of p1 and p2
+ is based on a pointer casted from an integer, this ``llvm.psub``
+ does not make p1 or p2 escape.
+
  Stack Map Intrinsics
  --------------------

  LLVM provides experimental intrinsics to support runtime patching
  mechanisms commonly desired in dynamic language JITs. These intrinsics
  are described in :doc:`StackMaps`.

  Element Wise Atomic Memory Intrinsics

include/llvm/CodeGen/BasicTTIImpl.h

@@ case Intrinsic::experimental_vector_reduce_fmin:
      return static_cast<T *>(this)->getMinMaxReductionCost(
          Tys[0], CmpInst::makeCmpResultType(Tys[0]), /*IsPairwiseForm=*/false,
          /*IsSigned=*/true);
    case Intrinsic::experimental_vector_reduce_umax:
    case Intrinsic::experimental_vector_reduce_umin:
      return static_cast<T *>(this)->getMinMaxReductionCost(
          Tys[0], CmpInst::makeCmpResultType(Tys[0]), /*IsPairwiseForm=*/false,
          /*IsSigned=*/false);
+   case Intrinsic::psub: {
+     // Two ptrtoints followed by sub.
+     unsigned p2i_cost = static_cast<T *>(this)
+         ->getOperationCost(Instruction::PtrToInt, Tys[0], RetTy);
+     unsigned sub_cost = static_cast<T *>(this)
+         ->getOperationCost(Instruction::Sub, RetTy, RetTy);
+     return p2i_cost * 2 + sub_cost;
+   }
    case Intrinsic::ctpop:
      ISDs.push_back(ISD::CTPOP);
      // In case of legalization use TCC_Expensive. This is cheaper than a
      // library call but still not a cheap instruction.
      SingleCallCost = TargetTransformInfo::TCC_Expensive;
      break;
    // FIXME: ctlz, cttz, ...
    }
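The cost rule in this hunk treats psub as two ptrtoint casts followed by one sub. A minimal standalone sketch of that formula is below; `OpCosts` and `psub_cost` are hypothetical names for illustration and are not part of LLVM's TargetTransformInfo interface.

```cpp
#include <cassert>

// Hypothetical per-target operation costs (not an LLVM type).
struct OpCosts {
    unsigned ptr_to_int;  // cost of one ptrtoint
    unsigned sub;         // cost of one integer sub
};

// Cost of psub modeled as two ptrtoints followed by a sub.
unsigned psub_cost(const OpCosts &costs) {
    return costs.ptr_to_int * 2 + costs.sub;
}

// Example: if ptrtoint is free on a target, psub costs the same as a sub.
void check_free_cast_example() {
    assert(psub_cost(OpCosts{0, 1}) == 1);
}
```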

include/llvm/IR/IRBuilder.h

  /// the size of the pointed-to objects.
  ///
  /// This is intended to implement C-style pointer subtraction. As such, the
  /// pointers must be appropriately aligned for their element types and
  /// pointing into the same object.
  Value *CreatePtrDiff(Value *LHS, Value *RHS, const Twine &Name = "") {
    assert(LHS->getType() == RHS->getType() &&
           "Pointer subtraction operand types must match!");
-   auto *ArgType = cast<PointerType>(LHS->getType());
-   Value *LHS_int = CreatePtrToInt(LHS, Type::getInt64Ty(Context));
-   Value *RHS_int = CreatePtrToInt(RHS, Type::getInt64Ty(Context));
-   Value *Difference = CreateSub(LHS_int, RHS_int);
+   PointerType *ArgType = cast<PointerType>(LHS->getType());
+   Type *psubTys[] = { Type::getInt64Ty(Context), ArgType, ArgType };
+   Value *psubArgs[] = { LHS, RHS };
+   Module *M = BB->getParent()->getParent();
+   Value *Difference = CreateCall(Intrinsic::getDeclaration(M,
+       llvm::Intrinsic::psub, ArrayRef<llvm::Type *>(psubTys, 3)),
+       psubArgs);
    return CreateExactSDiv(Difference,
                           ConstantExpr::getSizeOf(ArgType->getElementType()),
                           Name);
  }

  /// Create a launder.invariant.group intrinsic call. If Ptr type is
  /// different from pointer to i8, it's casted to pointer to i8 in the same
  /// address space before call and casted back to Ptr type after call.
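In both the old and the new version, `CreatePtrDiff` divides the byte difference by the element size with an exact signed division. The sketch below models only that last step in plain C++. `ptr_diff_in_elements` is a hypothetical helper, and `std::nullopt` stands in for the poison that `sdiv exact` produces when the division would leave a remainder.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Converts a byte difference into an element count, modeling `sdiv exact`:
// a non-zero remainder yields poison, represented here by std::nullopt.
std::optional<std::int64_t> ptr_diff_in_elements(std::int64_t byte_difference,
                                                 std::int64_t element_size) {
    assert(element_size > 0);  // assumption of this sketch
    if (byte_difference % element_size != 0)
        return std::nullopt;   // the exact division would be rounded
    return byte_difference / element_size;
}
```

With 4-byte elements, a byte difference of 16 gives 4 elements, while a byte difference of 6 gives no defined result in this model.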

include/llvm/IR/Intrinsics.td

  // Clear cache intrinsic, default to ignore (ie. emit nothing)
  // maps to void __clear_cache() on supporting platforms
  def int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
                                  [], "llvm.clear_cache">;

  // Intrinsic to detect whether its argument is a constant.
  def int_is_constant : Intrinsic<[llvm_i1_ty], [llvm_any_ty], [IntrNoMem], "llvm.is.constant">;

+ let IntrProperties = [IntrNoMem, IntrSpeculatable] in {
+ def int_psub : Intrinsic<[llvm_anyint_ty], [llvm_anyptr_ty, llvm_anyptr_ty]>;
+ }

  //===-------------------------- Masked Intrinsics -------------------------===//
  //
  def int_masked_store : Intrinsic<[], [llvm_anyvector_ty,
                                        LLVMAnyPointerType<LLVMMatchType<0>>,
                                        llvm_i32_ty,
                                        LLVMVectorSameWidth<0, llvm_i1_ty>],
                                   [IntrArgMemOnly]>;

lib/Analysis/InstructionSimplify.cpp

@@ if ((match(Op0, m_APFloat(C)) && C->isInfinity() &&
        return ConstantFP::getInfinity(ReturnType, UseNegInf);

      // TODO: minnum(nnan x, inf) -> x
      // TODO: minnum(nnan ninf x, flt_max) -> x
      // TODO: maxnum(nnan x, -inf) -> x
      // TODO: maxnum(nnan ninf x, -flt_max) -> x
      break;
    }
+   case Intrinsic::psub:
+     if (Constant *Result = computePointerDifference(Q.DL, Op0, Op1))
+       return ConstantExpr::getIntegerCast(Result, F->getReturnType(), true);
+     break;
    default:
      break;
    }

    return nullptr;
  }

  template <typename IterTy>
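This hunk reuses the existing `computePointerDifference` helper to fold psub to a constant. As far as I understand that helper, it succeeds when both pointers are constant offsets from the same base. The standalone sketch below models that idea only; `BasePlusOffset` and `fold_pointer_difference` are hypothetical names, and the real helper's behavior should be checked against InstructionSimplify.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical decomposition of a pointer into a base and a constant offset.
struct BasePlusOffset {
    int base_id;          // identifies the underlying base pointer
    std::int64_t offset;  // constant byte offset from that base
};

// Folds the difference to a constant only when both pointers share a base.
// std::nullopt means "no simplification", not poison.
std::optional<std::int64_t> fold_pointer_difference(const BasePlusOffset &lhs,
                                                    const BasePlusOffset &rhs) {
    if (lhs.base_id != rhs.base_id)
        return std::nullopt;
    return lhs.offset - rhs.offset;
}

// Example: (base + 12) - (base + 4) folds to 8.
void check_same_base_example() {
    assert(fold_pointer_difference({7, 12}, {7, 4}).value() == 8);
}
```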

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ case Intrinsic::icall_branch_funnel: {
      return nullptr;
    }

    case Intrinsic::wasm_landingpad_index:
      // Information this intrinsic contained has been transferred to
      // MachineFunction in SelectionDAGISel::PrepareEHLandingPad. We can safely
      // delete it now.
      return nullptr;
+   case Intrinsic::psub:
+     // pointer to integer casting
+     EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(),
+                                                            I.getType());
+     SDValue Op1 = DAG.getZExtOrTrunc(getValue(I.getArgOperand(0)), getCurSDLoc(), DestVT);
+     SDValue Op2 = DAG.getZExtOrTrunc(getValue(I.getArgOperand(1)), getCurSDLoc(), DestVT);
+     SDValue BinNodeValue = DAG.getNode(ISD::SUB, getCurSDLoc(), Op1.getValueType(),
+                                        Op1, Op2);
+     setValue(&I, BinNodeValue);
+     return nullptr;
    }
  }

  void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
      const ConstrainedFPIntrinsic &FPI) {
    SDLoc sdl = getCurSDLoc();
    unsigned Opcode;
    switch (FPI.getIntrinsicID()) {
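The lowering in this hunk zero-extends or truncates both pointer operands to the result width and then emits a plain `ISD::SUB`. A rough standalone model of that arithmetic is sketched below, assuming addresses fit in 64 bits; `zext_or_trunc` and `lower_psub` are hypothetical helpers, not SelectionDAG APIs.

```cpp
#include <cassert>
#include <cstdint>

// Values are held zero-extended in a uint64_t, so widening is a no-op and
// narrowing masks off the high bits.
std::uint64_t zext_or_trunc(std::uint64_t value, unsigned dest_bits) {
    assert(dest_bits >= 1 && dest_bits <= 64);
    if (dest_bits == 64)
        return value;
    return value & ((std::uint64_t{1} << dest_bits) - 1);
}

// Model of the lowering: convert both operands, then subtract with
// wraparound in the destination width.
std::uint64_t lower_psub(std::uint64_t p1, std::uint64_t p2, unsigned dest_bits) {
    const std::uint64_t a = zext_or_trunc(p1, dest_bits);
    const std::uint64_t b = zext_or_trunc(p2, dest_bits);
    return zext_or_trunc(a - b, dest_bits);
}
```

For a 32-bit result, `lower_psub(0x1000, 0x1010, 32)` wraps to `0xFFFFFFF0`, the two's-complement encoding of -16.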

lib/Transforms/InstCombine/InstCombineCalls.cpp

@@ if (match(NextInst,
          MoveI = MoveI->getNextNode();
          Temp->moveBefore(II);
        }
        II->setArgOperand(0, Builder.CreateAnd(CurrCond, NextCond));
        return eraseInstFromFunction(*NextInst);
      }
      break;
    }
+
+   case Intrinsic::psub: {
+     Value *Op1 = II->getArgOperand(0);
+     Value *Op2 = II->getArgOperand(1);
+     if (Value *Res = OptimizePointerDifference(Op1, Op2, II->getType())) {
+       return replaceInstUsesWith(*II, Res);
+     }
+     // psub(inttoptr(i1), p2)
+     // =>
+     // sub (ptrtoint(inttoptr i1), ptrtoint(p2))
+     // Note that ptrtoint(inttoptr i1) can be optimized to i1.
+     if (Operator::getOpcode(Op1) == Instruction::IntToPtr ||
+         Operator::getOpcode(Op2) == Instruction::IntToPtr ||
+         isa<ConstantPointerNull>(Op1) || isa<ConstantPointerNull>(Op2)) {
+       Type *Op1Ty = Op1->getType();
+       IntegerType *ITy = IntegerType::get(Op1->getContext(),
+                                           DL.getPointerTypeSizeInBits(Op1Ty));
+       Value *I1 = Builder.CreatePtrToInt(Op1, ITy);
+       Value *I2 = Builder.CreatePtrToInt(Op2, ITy);
+       Value *Sub = Builder.CreateSub(I1, I2);
+       Value *Cast = Builder.CreateIntCast(Sub, II->getType(), true, II->getName());
+       return replaceInstUsesWith(*II, Cast);
+     }
+     break;
+   }
    }
    return visitCallSite(II);
  }

  // Fence instruction simplification
  Instruction *InstCombiner::visitFenceInst(FenceInst &FI) {
    // Remove identical consecutive fences.
    Instruction *Next = FI.getNextNonDebugInstruction();
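When either operand is an `inttoptr` or `null`, this hunk rewrites psub into a subtraction at pointer width followed by a signed integer cast to the result type. The sketch below models that for one assumed configuration, 32-bit pointers and an i64 result; `sub_then_sign_extend` is a hypothetical helper and the widths are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed setup: 32-bit pointers, 64-bit psub result.
// Subtract at pointer width (wrapping), then sign-extend to the result width,
// mirroring CreateSub followed by CreateIntCast(..., /*isSigned=*/true).
std::int64_t sub_then_sign_extend(std::uint32_t p1_address,
                                  std::uint32_t p2_address) {
    const std::uint32_t narrow_diff = p1_address - p2_address;
    return static_cast<std::int64_t>(static_cast<std::int32_t>(narrow_diff));
}

// Example: a lower address minus a higher one gives a negative difference.
void check_negative_difference_example() {
    assert(sub_then_sign_extend(0x1000, 0x1010) == -16);
}
```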

lib/Transforms/InstCombine/InstCombineCompares.cpp

@@ if (IsZero || C == BitWidth) {
        Cmp.setOperand(0, II->getArgOperand(0));
        auto *NewOp =
            IsZero ? Constant::getNullValue(Ty) : Constant::getAllOnesValue(Ty);
        Cmp.setOperand(1, NewOp);
        return &Cmp;
      }
      break;
    }
+   case Intrinsic::psub: {
+     // psub(a, b) == 0 -> a == b
+     if (C.isNullValue()) {
+       Value *Op0 = II->getArgOperand(0);
+       Value *Op1 = II->getArgOperand(1);
+       Value *NewCmp = nullptr;
+       if (Cmp.getPredicate() == CmpInst::ICMP_EQ) {
+         NewCmp = Builder.CreateICmpEQ(Op0, Op1);
+       } else if (Cmp.getPredicate() == CmpInst::ICMP_NE) {
+         NewCmp = Builder.CreateICmpNE(Op0, Op1);
+       } // There's no other case because this function starts with
+         // Cmp.isEquality().
+       NewCmp->takeName(&Cmp);
+       replaceInstUsesWith(Cmp, NewCmp);
+       Worklist.Add(II);
+       return &Cmp;
+     }
+     break;
+   }
    default:
      break;
    }

    return nullptr;
  }

  /// Handle icmp with constant (but not simple integer constant) RHS.
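The fold `psub(a, b) == 0 -> a == b` relies on a wrapping difference being zero exactly when the two addresses are equal. The sketch below checks that property exhaustively for 8-bit addresses as a sanity check. It covers only the case where psub returns a defined value, and `zero_difference_matches_equality` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Checks, for every pair of 8-bit addresses, that (a - b == 0) agrees
// with (a == b) under wraparound subtraction.
bool zero_difference_matches_equality() {
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            const auto diff = static_cast<std::uint8_t>(a - b);  // mod 2^8
            if ((diff == 0) != (a == b))
                return false;
        }
    }
    return true;
}

// Usage: the property is expected to hold for all pairs.
void check_fold_property() {
    assert(zero_difference_matches_equality());
}
```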