This is an archive of the discontinued LLVM Phabricator instance.

Differential D18367

Introduce llvm.load.relative intrinsic.
ClosedPublic

Authored by pcc on Mar 22 2016, 11:56 AM.

Details

Reviewers

rjmccall
majnemer
rafael
rnk

Commits

rG7dd8dbf48652: Introduce llvm.load.relative intrinsic.
rL267223: Introduce llvm.load.relative intrinsic.

Summary

This intrinsic takes two arguments, `%ptr` and `%offset`. It loads
a 32-bit value from the address `%ptr + %offset`, adds `%ptr` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form `i32 trunc(x - %ptr)`,
the intrinsic call is folded to `x`.

LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if `x` is an
`unnamed_addr` function. However, it does not provide this guarantee for a
constant initializer folded into a function body. This intrinsic can be used
to avoid the possibility of overflows when loading from such a constant.
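
As a rough illustration of the semantics described in this summary, here is a minimal C++ sketch. It is not code from the patch: the names `loadRelative` and `exampleRoundTrip` are hypothetical, a byte buffer stands in for the constant table, and the stored value is assumed to be a signed 32-bit displacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of llvm.load.relative: read the 32-bit value stored at
// ptr + offset, then return ptr plus that value.
inline const unsigned char *loadRelative(const unsigned char *ptr,
                                         std::ptrdiff_t offset) {
  std::int32_t rel;
  std::memcpy(&rel, ptr + offset, sizeof rel); // the 32-bit load
  return ptr + rel;                            // rebase onto ptr
}

// Stores a displacement of 40 in the slot at byte offset 8 of a buffer and
// checks that the model resolves it to buffer + 40.
inline bool exampleRoundTrip() {
  alignas(4) unsigned char buf[64] = {};
  const std::int32_t rel = 40;
  std::memcpy(buf + 8, &rel, sizeof rel);
  return loadRelative(buf, 8) == buf + 40;
}
```

Because the displacement is relative to `ptr` rather than to the slot, the same base pointer serves as both the load base and the addend.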

Depends on http://reviews.llvm.org/D17938

Diff Detail

Repository: rL LLVM

Event Timeline

pcc updated this revision to Diff 51315. Mar 22 2016, 11:56 AM

pcc retitled this revision from to Introduce llvm.load.relative intrinsic..

pcc updated this object.

pcc added reviewers: rafael, rjmccall.

pcc added a subscriber: llvm-commits.

pcc mentioned this in D18199: CodeGen: Implement IR generation for the relative vtable ABI (PR26723). Mar 22 2016, 12:57 PM

I would appreciate it if this intrinsic supported both i32 and i64 relative offsets. I understand that overflowing the relative adjustment is generally not a problem on 64-bit platforms, but it would be nice if frontends using a "far" relative address didn't need to use different code generation patterns.

I would appreciate it if this intrinsic supported both i32 and i64 relative offsets.

That seems like a reasonable extension. We can certainly add that later if it is needed by a frontend.

Actually, I don't understand why the overloading is done the way it is. Are you planning to store v-table entries relative to the v-table's address point rather than the address of the slot? Even if so, can't that sort of further adjustment be trivially done by the caller instead of baking it into the intrinsic?

I think it would be simpler to have a llvm.load.relative.iN that assumes an iN relative offset and doesn't take an explicit second operand.

Are you planning to store v-table entries relative to the v-table's address point rather than the address of the slot?

Yes, exactly. Sorry if that wasn't clear before. On most architectures the vtable load can be expressed as a reg+imm load; we can achieve a shorter call sequence by using that register as an addend rather than having an additional immediate.

Even if so, can't that sort of further adjustment be trivially done by the caller instead of baking it into the intrinsic?

We could certainly do it that way, but the problem with that is that I think it would make other parts of the compiler more complicated. For example, the whole-program virtual call optimizer would need to do more IR walking in order to discover the slot offset (which would now need to be present in two places in the IR and verified to be equal) and the call site. That was one of the things I wanted to address in the current design of the vcall optimizer (see also http://lists.llvm.org/pipermail/llvm-dev/2016-February/096146.html).

In D18367#381050, @pcc wrote:

Are you planning to store v-table entries relative to the v-table's address point rather than the address of the slot?

Yes, exactly. Sorry if that wasn't clear before. On most architectures the vtable load can be expressed as a reg+imm load; we can achieve a shorter call sequence by using that register as an addend rather than having an additional immediate.

On ARM and AArch64, adjusting the v-table address to the slot address can be done with a pre-indexed load, so it doesn't really matter there. I agree that it probably helps x86, though. Somewhat amusing to think about x86 requiring more instructions to do something, though. :)

Even if so, can't that sort of further adjustment be trivially done by the caller instead of baking it into the intrinsic?

We could certainly do it that way, but the problem with that is that I think it would make other parts of the compiler more complicated. For example, the whole-program virtual call optimizer would need to do more IR walking in order to discover the slot offset (which would now need to be present in two places in the IR and verified to be equal) and the call site.

Hmm. Fair enough, I think I see your point.
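
The two encodings debated in this thread can be contrasted with a small C++ sketch. This is only an illustration under stated assumptions: `readRel32`, `loadFromAddressPoint`, `loadFromSlot` and `encodingsAgree` are hypothetical names, and byte buffers stand in for v-tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the signed 32-bit value stored at p.
inline std::int32_t readRel32(const unsigned char *p) {
  std::int32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Design in this revision: entries are relative to the table's address
// point, so the slot offset appears exactly once, as the second operand.
inline const unsigned char *loadFromAddressPoint(const unsigned char *table,
                                                 std::ptrdiff_t slotOffset) {
  return table + readRel32(table + slotOffset);
}

// Alternative raised in review: entries are relative to their own slot, so
// the caller has to compute the slot address before the call.
inline const unsigned char *loadFromSlot(const unsigned char *slot) {
  return slot + readRel32(slot);
}

// Encodes the same target both ways and checks that both decode to it.
inline bool encodingsAgree() {
  alignas(4) unsigned char a[64] = {}, b[64] = {};
  const std::ptrdiff_t slot = 8, target = 48;
  const std::int32_t relToTable = static_cast<std::int32_t>(target);
  const std::int32_t relToSlot = static_cast<std::int32_t>(target - slot);
  std::memcpy(a + slot, &relToTable, sizeof relToTable);
  std::memcpy(b + slot, &relToSlot, sizeof relToSlot);
  return loadFromAddressPoint(a, slot) == a + target &&
         loadFromSlot(b + slot) == b + target;
}
```

In the slot-relative form the caller writes `table + slot` itself, which is the extra place an analysis would have to look to recover the slot offset.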

Set the inline cost for the intrinsic to the cost of the lowered IR instructions

Adding some more reviewers.

rnk added inline comments. Apr 19 2016, 10:47 AM

docs/LangRef.rst
12194–12198 ↗	(On Diff #51387)	Can the wording be improved here? This is a long sentence.
lib/Analysis/InstructionSimplify.cpp
3806 ↗	(On Diff #51387)	Now that the verifier uses an llvm_anyint_ty, you should probably check the bitwidth here to avoid crashing on crazy inputs.
lib/CodeGen/PreISelIntrinsicLowering.cpp
1 ↗	(On Diff #51387)	Should this lowering be done as part of CodeGenPrepare instead? We currently lower llvm.objectsize there.

Improve wording in documentation
Avoid crashing on bit widths >64

docs/LangRef.rst
12194–12198 ↗	(On Diff #51387)	Okay, what about this?
lib/CodeGen/PreISelIntrinsicLowering.cpp
1 ↗	(On Diff #51387)	I looked for an existing place to add this lowering, but I couldn't find anywhere suitable. CodeGenPrepare is apparently an optional optimization pass [1], but the backend is required to handle this intrinsic (objectsize can be implemented trivially by returning 0 or -1, which is what the backend currently does [2,3], but there's no trivial implementation of this intrinsic).
[1] http://llvm-cs.pcc.me.uk/lib/CodeGen/Passes.cpp#470
[2] http://llvm-cs.pcc.me.uk/lib/CodeGen/SelectionDAG/FastISel.cpp#1218
[3] http://llvm-cs.pcc.me.uk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#5348

lgtm

This revision is now accepted and ready to land. Apr 21 2016, 4:18 PM

Closed by commit rL267223: Introduce llvm.load.relative intrinsic. (authored by pcc). Apr 22 2016, 2:23 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                    Size
llvm/trunk/docs/LangRef.rst                                             25 lines
llvm/trunk/include/llvm/CodeGen/Passes.h                                5 lines
llvm/trunk/include/llvm/IR/Intrinsics.td                                3 lines
llvm/trunk/include/llvm/InitializePasses.h                              1 line
llvm/trunk/lib/Analysis/InlineCost.cpp                                  5 lines
llvm/trunk/lib/Analysis/InstructionSimplify.cpp                         61 lines
llvm/trunk/lib/CodeGen/CMakeLists.txt                                   1 line
llvm/trunk/lib/CodeGen/CodeGen.cpp                                      1 line
llvm/trunk/lib/CodeGen/LLVMTargetMachine.cpp                            2 lines
llvm/trunk/lib/CodeGen/PreISelIntrinsicLowering.cpp                     85 lines
llvm/trunk/test/CodeGen/Generic/stop-after.ll                           2 lines
llvm/trunk/test/Transforms/InstSimplify/load-relative-32.ll             19 lines
llvm/trunk/test/Transforms/InstSimplify/load-relative.ll                75 lines
llvm/trunk/test/Transforms/PreISelIntrinsicLowering/load-relative.ll    26 lines
llvm/trunk/tools/opt/opt.cpp                                            1 line

Diff 54724

llvm/trunk/docs/LangRef.rst

	is ``false``. Since the optimizer is allowed to replace the ``undef``
	with an arbitrary value, it can optimize guard to fail "spuriously",
	i.e. without the original condition being false (hence the "not only
	if"); and this allows for "check widening" type optimizations.

	``@llvm.experimental.guard`` cannot be invoked.


				'``llvm.load.relative``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""

				::

				declare i8* @llvm.load.relative.iN(i8* %ptr, iN %offset) argmemonly nounwind readonly

				Overview:
				"""""""""

				This intrinsic loads a 32-bit value from the address ``%ptr + %offset``,
				adds ``%ptr`` to that value and returns it. The constant folder specifically
				recognizes the form of this intrinsic and the constant initializers it may
				load from; if a loaded constant initializer is known to have the form
				``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``.

				LLVM provides that the calculation of such a constant initializer will
				not overflow at link time under the medium code model if ``x`` is an
				``unnamed_addr`` function. However, it does not provide this guarantee for
				a constant initializer folded into a function body. This intrinsic can be
				used to avoid the possibility of overflows when loading from such a constant.
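
The overflow concern can be made concrete with a short C++ sketch that models addresses as plain 64-bit integers. It is an illustration only: `encodeRel32`, `decodeRel32` and `roundTrips` are hypothetical names, and the model assumes the loaded 32-bit value is sign-extended before being added to `%ptr`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the initializer `i32 trunc(x - ptr)`: keep only the
// low 32 bits of the difference.
inline std::int32_t encodeRel32(std::uint64_t x, std::uint64_t ptr) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(x - ptr));
}

// Model of the value returned for that initializer: ptr plus the
// sign-extended 32-bit displacement (an assumption of this sketch).
inline std::uint64_t decodeRel32(std::uint64_t ptr, std::int32_t rel) {
  return ptr + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel));
}

// True when x survives the encode/decode round trip, i.e. when x - ptr
// fits in a signed 32-bit value.
inline bool roundTrips(std::uint64_t x, std::uint64_t ptr) {
  return decodeRel32(ptr, encodeRel32(x, ptr)) == x;
}
```

When `roundTrips` holds, replacing the call with `x` matches what the load would have produced; when the displacement does not fit in 32 bits, the truncated value no longer identifies `x`.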

	Stack Map Intrinsics
	--------------------

	LLVM provides experimental intrinsics to support runtime patching
	mechanisms commonly desired in dynamic language JITs. These intrinsics
	are described in :doc:`StackMaps`.

llvm/trunk/include/llvm/CodeGen/Passes.h

/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
...
///
FunctionPass *createInterleavedAccessPass(const TargetMachine *TM);

/// LowerEmuTLS - This pass generates __emutls_[vt].xyz variables for all
/// TLS variables for the emulated TLS model.
///
ModulePass *createLowerEmuTLSPass(const TargetMachine *TM);

		/// This pass lowers the @llvm.load.relative intrinsic to instructions.
		/// This is unsafe to do earlier because a pass may combine the constant
		/// initializer into the load, which may result in an overflowing evaluation.
		ModulePass *createPreISelIntrinsicLoweringPass();

/// GlobalMerge - This pass merges internal (by default) globals into structs
/// to enable reuse of a base pointer by indexed addressing modes.
/// It can also be configured to focus on size optimizations only.
///
Pass *createGlobalMergePass(const TargetMachine *TM, unsigned MaximalOffset,
                            bool OnlyOptimizeForSize = false,
                            bool MergeExternalByDefault = false);

llvm/trunk/include/llvm/IR/Intrinsics.td

def int_masked_scatter: Intrinsic<[],
...
LLVMVectorOfPointersToElt<0>, llvm_i32_ty,
LLVMVectorSameWidth<0, llvm_i1_ty>],
[IntrArgMemOnly]>;

// Intrinsics to support bit sets.
def int_bitset_test : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
[IntrNoMem]>;

		def int_load_relative: Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_anyint_ty],
		[IntrReadMem, IntrArgMemOnly]>;

//===----------------------------------------------------------------------===//
// Target-specific intrinsics
//===----------------------------------------------------------------------===//

include "llvm/IR/IntrinsicsPowerPC.td"
include "llvm/IR/IntrinsicsX86.td"
include "llvm/IR/IntrinsicsARM.td"
include "llvm/IR/IntrinsicsAArch64.td"
include "llvm/IR/IntrinsicsXCore.td"
include "llvm/IR/IntrinsicsHexagon.td"
include "llvm/IR/IntrinsicsNVVM.td"
include "llvm/IR/IntrinsicsMips.td"
include "llvm/IR/IntrinsicsAMDGPU.td"
include "llvm/IR/IntrinsicsBPF.td"
include "llvm/IR/IntrinsicsSystemZ.td"
include "llvm/IR/IntrinsicsWebAssembly.td"

llvm/trunk/include/llvm/InitializePasses.h

	void initializePostDomOnlyViewerPass(PassRegistry&);
	void initializePostDomPrinterPass(PassRegistry&);
	void initializePostDomViewerPass(PassRegistry&);
	void initializePostDominatorTreeWrapperPassPass(PassRegistry&);
	void initializePostOrderFunctionAttrsLegacyPassPass(PassRegistry&);
	void initializePostRAHazardRecognizerPass(PassRegistry&);
	void initializePostRASchedulerPass(PassRegistry&);
	void initializePostMachineSchedulerPass(PassRegistry&);
				void initializePreISelIntrinsicLoweringPass(PassRegistry&);
	void initializePrintFunctionPassWrapperPass(PassRegistry&);
	void initializePrintModulePassWrapperPass(PassRegistry&);
	void initializePrintBasicBlockPassPass(PassRegistry&);
	void initializeProcessImplicitDefsPass(PassRegistry&);
	void initializePromotePassPass(PassRegistry&);
	void initializePruneEHPass(PassRegistry&);
	void initializeReassociatePass(PassRegistry&);
	void initializeRegBankSelectPass(PassRegistry &);

llvm/trunk/lib/Analysis/InlineCost.cpp

if (Function *F = CS.getCalledFunction()) {
...

// Next check if it is an intrinsic we know about.
// FIXME: Lift this into part of the InstVisitor.
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CS.getInstruction())) {
switch (II->getIntrinsicID()) {
default:
return Base::visitCallSite(CS);

		case Intrinsic::load_relative:
		// This is normally lowered to 4 LLVM instructions.
		Cost += 3 * InlineConstants::InstrCost;
		return false;

case Intrinsic::memset:
case Intrinsic::memcpy:
case Intrinsic::memmove:
// SROA can usually chew through these intrinsics, but they aren't free.
return false;
case Intrinsic::localescape:
HasFrameEscape = true;
return false;

llvm/trunk/lib/Analysis/InstructionSimplify.cpp

static bool IsIdempotent(Intrinsic::ID ID) {
...
case Intrinsic::trunc:
case Intrinsic::rint:
case Intrinsic::nearbyint:
case Intrinsic::round:
return true;
}
}

		static Value *SimplifyRelativeLoad(Constant *Ptr, Constant *Offset,
		                                   const DataLayout &DL) {
		  GlobalValue *PtrSym;
		  APInt PtrOffset;
		  if (!IsConstantOffsetFromGlobal(Ptr, PtrSym, PtrOffset, DL))
		    return nullptr;

		  Type *Int8PtrTy = Type::getInt8PtrTy(Ptr->getContext());
		  Type *Int32Ty = Type::getInt32Ty(Ptr->getContext());
		  Type *Int32PtrTy = Int32Ty->getPointerTo();
		  Type *Int64Ty = Type::getInt64Ty(Ptr->getContext());

		  auto *OffsetConstInt = dyn_cast<ConstantInt>(Offset);
		  if (!OffsetConstInt || OffsetConstInt->getType()->getBitWidth() > 64)
		    return nullptr;

		  uint64_t OffsetInt = OffsetConstInt->getSExtValue();
		  if (OffsetInt % 4 != 0)
		    return nullptr;

		  Constant *C = ConstantExpr::getGetElementPtr(
		      Int32Ty, ConstantExpr::getBitCast(Ptr, Int32PtrTy),
		      ConstantInt::get(Int64Ty, OffsetInt / 4));
		  Constant *Loaded = ConstantFoldLoadFromConstPtr(C, Int32Ty, DL);
		  if (!Loaded)
		    return nullptr;

		  auto *LoadedCE = dyn_cast<ConstantExpr>(Loaded);
		  if (!LoadedCE)
		    return nullptr;

		  if (LoadedCE->getOpcode() == Instruction::Trunc) {
		    LoadedCE = dyn_cast<ConstantExpr>(LoadedCE->getOperand(0));
		    if (!LoadedCE)
		      return nullptr;
		  }

		  if (LoadedCE->getOpcode() != Instruction::Sub)
		    return nullptr;

		  auto *LoadedLHS = dyn_cast<ConstantExpr>(LoadedCE->getOperand(0));
		  if (!LoadedLHS || LoadedLHS->getOpcode() != Instruction::PtrToInt)
		    return nullptr;
		  auto *LoadedLHSPtr = LoadedLHS->getOperand(0);

		  Constant *LoadedRHS = LoadedCE->getOperand(1);
		  GlobalValue *LoadedRHSSym;
		  APInt LoadedRHSOffset;
		  if (!IsConstantOffsetFromGlobal(LoadedRHS, LoadedRHSSym, LoadedRHSOffset,
		                                  DL) ||
		      PtrSym != LoadedRHSSym || PtrOffset != LoadedRHSOffset)
		    return nullptr;

		  return ConstantExpr::getBitCast(LoadedLHSPtr, Int8PtrTy);
		}
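
The shape of the match performed by `SimplifyRelativeLoad` can be sketched without any LLVM types. The following is a toy model under stated assumptions, not the patch's logic verbatim: `SymOffset`, `RelInit` and `foldRelativeLoad` are hypothetical names, a "symbol plus byte offset" pair stands in for a constant address, and a null `slot` stands in for any initializer that is not a subtraction of the expected form.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-in for a constant address: a named global plus a byte offset.
struct SymOffset {
  std::string symbol;
  std::int64_t offset;
};

// Toy stand-in for an initializer of the form
// `trunc(ptrtoint(target) - base)`.
struct RelInit {
  std::string target;
  SymOffset base;
};

// Returns true and sets `folded` to the target symbol when the call can be
// folded. `slot` is the initializer found at ptr + offset, or null if that
// location does not hold a subtraction of the expected form.
inline bool foldRelativeLoad(const SymOffset &ptr, std::int64_t offset,
                             const RelInit *slot, std::string &folded) {
  if (offset % 4 != 0) // only whole 32-bit slots are considered
    return false;
  if (!slot)
    return false;
  // The subtracted base must be exactly ptr: same global, same offset.
  if (slot->base.symbol != ptr.symbol || slot->base.offset != ptr.offset)
    return false;
  folded = slot->target;
  return true;
}
```

With `ptr` modeling element 2 of `@c2` (byte offset 8) and an offset of 4, a slot holding `b - &c2[2]` folds to `b`, which mirrors the `@f3` case in the InstSimplify test in this revision.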

template <typename IterTy>
static Value *SimplifyIntrinsic(Function *F, IterTy ArgBegin, IterTy ArgEnd,
const Query &Q, unsigned MaxRecurse) {
Intrinsic::ID IID = F->getIntrinsicID();
unsigned NumOperands = std::distance(ArgBegin, ArgEnd);
Type *ReturnType = F->getReturnType();

// Binary Ops
...
if (IID == Intrinsic::umul_with_overflow ||
...
// X * 0 -> { 0, false }
if (match(RHS, m_Zero()))
return Constant::getNullValue(ReturnType);

// X * undef -> { 0, false }
if (match(RHS, m_Undef()))
return Constant::getNullValue(ReturnType);
}

		if (IID == Intrinsic::load_relative && isa<Constant>(LHS) &&
		isa<Constant>(RHS))
		return SimplifyRelativeLoad(cast<Constant>(LHS), cast<Constant>(RHS),
		Q.DL);
}

// Perform idempotent optimizations
if (!IsIdempotent(IID))
return nullptr;

// Unary Ops
if (NumOperands == 1)

llvm/trunk/lib/CodeGen/CMakeLists.txt

add_llvm_library(LLVMCodeGen
...
OptimizePHIs.cpp
ParallelCG.cpp
Passes.cpp
PeepholeOptimizer.cpp
PHIElimination.cpp
PHIEliminationUtils.cpp
PostRAHazardRecognizer.cpp
PostRASchedulerList.cpp
		PreISelIntrinsicLowering.cpp
ProcessImplicitDefs.cpp
PrologEpilogInserter.cpp
PseudoSourceValue.cpp
RegAllocBase.cpp
RegAllocBasic.cpp
RegAllocFast.cpp
RegAllocGreedy.cpp
RegAllocPBQP.cpp

llvm/trunk/lib/CodeGen/CodeGen.cpp

void llvm::initializeCodeGen(PassRegistry &Registry) {
...
initializePatchableFunctionPass(Registry);
initializeOptimizePHIsPass(Registry);
initializePEIPass(Registry);
initializePHIEliminationPass(Registry);
initializePeepholeOptimizerPass(Registry);
initializePostMachineSchedulerPass(Registry);
initializePostRAHazardRecognizerPass(Registry);
initializePostRASchedulerPass(Registry);
		initializePreISelIntrinsicLoweringPass(Registry);
initializeProcessImplicitDefsPass(Registry);
initializeRegisterCoalescerPass(Registry);
initializeShrinkWrapPass(Registry);
initializeSlotIndexesPass(Registry);
initializeStackColoringPass(Registry);
initializeStackMapLivenessPass(Registry);
initializeLiveDebugValuesPass(Registry);
initializeSafeStackPass(Registry);

llvm/trunk/lib/CodeGen/LLVMTargetMachine.cpp

addPassesToGenerateCode(LLVMTargetMachine *TM, PassManagerBase &PM,
...
bool DisableVerify, AnalysisID StartBefore,
AnalysisID StartAfter, AnalysisID StopAfter,
MachineFunctionInitializer *MFInitializer = nullptr) {

// When in emulated TLS mode, add the LowerEmuTLS pass.
if (TM->Options.EmulatedTLS)
PM.add(createLowerEmuTLSPass(TM));

		PM.add(createPreISelIntrinsicLoweringPass());

// Add internal analysis passes from the target machine.
PM.add(createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis()));

// Targets may override createPassConfig to provide a target-specific
// subclass.
TargetPassConfig *PassConfig = TM->createPassConfig(PM);
PassConfig->setStartStopPasses(StartBefore, StartAfter, StopAfter);


llvm/trunk/lib/CodeGen/PreISelIntrinsicLowering.cpp

				//===-- PreISelIntrinsicLowering.cpp - Pre-ISel intrinsic lowering pass ---===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass implements IR lowering for the llvm.load.relative intrinsic.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Pass.h"

				using namespace llvm;

				namespace {

				bool lowerLoadRelative(Function &F) {
				  if (F.use_empty())
				    return false;

				  bool Changed = false;
				  Type *Int32Ty = Type::getInt32Ty(F.getContext());
				  Type *Int32PtrTy = Int32Ty->getPointerTo();
				  Type *Int8Ty = Type::getInt8Ty(F.getContext());

				  for (auto I = F.use_begin(), E = F.use_end(); I != E;) {
				    auto CI = dyn_cast<CallInst>(I->getUser());
				    ++I;
				    if (!CI || CI->getCalledValue() != &F)
				      continue;

				    IRBuilder<> B(CI);
				    Value *OffsetPtr =
				        B.CreateGEP(Int8Ty, CI->getArgOperand(0), CI->getArgOperand(1));
				    Value *OffsetPtrI32 = B.CreateBitCast(OffsetPtr, Int32PtrTy);
				    Value *OffsetI32 = B.CreateAlignedLoad(OffsetPtrI32, 4);

				    Value *ResultPtr = B.CreateGEP(Int8Ty, CI->getArgOperand(0), OffsetI32);

				    CI->replaceAllUsesWith(ResultPtr);
				    CI->eraseFromParent();
				    Changed = true;
				  }

				  return Changed;
				}

				bool lowerIntrinsics(Module &M) {
				  bool Changed = false;
				  for (Function &F : M) {
				    if (F.getName().startswith("llvm.load.relative."))
				      Changed |= lowerLoadRelative(F);
				  }
				  return Changed;
				}

				class PreISelIntrinsicLowering : public ModulePass {
				public:
				  static char ID;
				  PreISelIntrinsicLowering() : ModulePass(ID) {}

				  bool runOnModule(Module &M) {
				    return lowerIntrinsics(M);
				  }
				};

				char PreISelIntrinsicLowering::ID;

				}

				INITIALIZE_PASS(PreISelIntrinsicLowering, "pre-isel-intrinsic-lowering",
				                "Pre-ISel Intrinsic Lowering", false, false)

				ModulePass *llvm::createPreISelIntrinsicLoweringPass() {
				  return new PreISelIntrinsicLowering;
				}
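
The loop in `lowerLoadRelative` advances the use iterator before it erases the call, so that erasing does not leave the loop holding a dangling iterator. The same idiom can be shown with a standard container; this is only an analogy using `std::list`, not LLVM's use list, and `eraseEvens` is a hypothetical name.

```cpp
#include <cassert>
#include <list>

// Removes every even value from the list and reports whether anything
// changed, advancing the iterator before the current node is erased.
inline bool eraseEvens(std::list<int> &values) {
  bool changed = false;
  for (auto it = values.begin(), end = values.end(); it != end;) {
    auto current = it;
    ++it; // step past the node first, as the pass does with ++I
    if (*current % 2 != 0)
      continue;
    values.erase(current); // only `current` is invalidated
    changed = true;
  }
  return changed;
}
```

Returning a `changed` flag follows the same convention as the pass, which reports whether it modified the module.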

llvm/trunk/test/CodeGen/Generic/stop-after.ll

	; RUN: llc < %s -debug-pass=Structure -stop-after=loop-reduce -o /dev/null 2>&1 | FileCheck %s -check-prefix=STOP
	; RUN: llc < %s -debug-pass=Structure -start-after=loop-reduce -o /dev/null 2>&1 | FileCheck %s -check-prefix=START

	; STOP: -loop-reduce
	; STOP: Loop Strength Reduction
	; STOP-NEXT: Machine Function Analysis
	; STOP-NEXT: MIR Printing Pass

	; START: -machine-branch-prob -gc-lowering			; START: -machine-branch-prob -pre-isel-intrinsic-lowering
	; START: FunctionPass Manager
	; START-NEXT: Lower Garbage Collection Instructions

llvm/trunk/test/Transforms/InstSimplify/load-relative-32.ll

				; RUN: opt < %s -instsimplify -S | FileCheck %s

				target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
				target triple = "i386-unknown-linux-gnu"

				@a = external global i8

				@c1 = constant [3 x i32] [i32 0, i32 0,
				i32 sub (i32 ptrtoint (i8* @a to i32), i32 ptrtoint (i32* getelementptr ([3 x i32], [3 x i32]* @c1, i32 0, i32 2) to i32))
				]

				; CHECK: @f1
				define i8* @f1() {
				; CHECK: ret i8* @a
				%l = call i8* @llvm.load.relative.i32(i8* bitcast (i32* getelementptr ([3 x i32], [3 x i32]* @c1, i32 0, i32 2) to i8*), i32 0)
				ret i8* %l
				}

				declare i8* @llvm.load.relative.i32(i8*, i32)

llvm/trunk/test/Transforms/InstSimplify/load-relative.ll

				; RUN: opt < %s -instsimplify -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@a = external global i8
				@b = external global i8

				@c1 = constant i32 trunc (i64 sub (i64 ptrtoint (i8* @a to i64), i64 ptrtoint (i32* @c1 to i64)) to i32)
				@c2 = constant [7 x i32] [i32 0, i32 0,
				i32 trunc (i64 sub (i64 ptrtoint (i8* @a to i64), i64 ptrtoint (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i32 0, i32 2) to i64)) to i32),
				i32 trunc (i64 sub (i64 ptrtoint (i8* @b to i64), i64 ptrtoint (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i32 0, i32 2) to i64)) to i32),
				i32 trunc (i64 add (i64 ptrtoint (i8* @b to i64), i64 ptrtoint (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i32 0, i32 2) to i64)) to i32),
				i32 trunc (i64 sub (i64 ptrtoint (i8* @b to i64), i64 1) to i32),
				i32 trunc (i64 sub (i64 0, i64 ptrtoint (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i32 0, i32 2) to i64)) to i32)
				]

				; CHECK: @f1
				define i8* @f1() {
				; CHECK: ret i8* @a
				%l = call i8* @llvm.load.relative.i32(i8* bitcast (i32* @c1 to i8*), i32 0)
				ret i8* %l
				}

				; CHECK: @f2
				define i8* @f2() {
				; CHECK: ret i8* @a
				%l = call i8* @llvm.load.relative.i32(i8* bitcast (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i64 0, i64 2) to i8*), i32 0)
				ret i8* %l
				}

				; CHECK: @f3
				define i8* @f3() {
				; CHECK: ret i8* @b
				%l = call i8* @llvm.load.relative.i64(i8* bitcast (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i64 0, i64 2) to i8*), i64 4)
				ret i8* %l
				}

				; CHECK: @f4
				define i8* @f4() {
				; CHECK: ret i8* %
				%l = call i8* @llvm.load.relative.i32(i8* bitcast (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i64 0, i64 2) to i8*), i32 1)
				ret i8* %l
				}

				; CHECK: @f5
				define i8* @f5() {
				; CHECK: ret i8* %
				%l = call i8* @llvm.load.relative.i32(i8* zeroinitializer, i32 0)
				ret i8* %l
				}

				; CHECK: @f6
				define i8* @f6() {
				; CHECK: ret i8* %
				%l = call i8* @llvm.load.relative.i32(i8* bitcast (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i64 0, i64 2) to i8*), i32 8)
				ret i8* %l
				}

				; CHECK: @f7
				define i8* @f7() {
				; CHECK: ret i8* %
				%l = call i8* @llvm.load.relative.i32(i8* bitcast (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i64 0, i64 2) to i8*), i32 12)
				ret i8* %l
				}

				; CHECK: @f8
				define i8* @f8() {
				; CHECK: ret i8* %
				%l = call i8* @llvm.load.relative.i32(i8* bitcast (i32* getelementptr ([7 x i32], [7 x i32]* @c2, i64 0, i64 2) to i8*), i32 16)
				ret i8* %l
				}

				declare i8* @llvm.load.relative.i32(i8*, i32)
				declare i8* @llvm.load.relative.i64(i8*, i64)

llvm/trunk/test/Transforms/PreISelIntrinsicLowering/load-relative.ll

				; RUN: opt -pre-isel-intrinsic-lowering -S -o - %s | FileCheck %s

				; CHECK: define i8* @foo32(i8* [[P:%.*]], i32 [[O:%.*]])
				define i8* @foo32(i8* %p, i32 %o) {
				; CHECK: [[OP:%.*]] = getelementptr i8, i8* [[P]], i32 [[O]]
				; CHECK: [[OPI32:%.*]] = bitcast i8* [[OP]] to i32*
				; CHECK: [[OI32:%.*]] = load i32, i32* [[OPI32]], align 4
				; CHECK: [[R:%.*]] = getelementptr i8, i8* [[P]], i32 [[OI32]]
				; CHECK: ret i8* [[R]]
				%l = call i8* @llvm.load.relative.i32(i8* %p, i32 %o)
				ret i8* %l
				}

				; CHECK: define i8* @foo64(i8* [[P:%.*]], i64 [[O:%.*]])
				define i8* @foo64(i8* %p, i64 %o) {
				; CHECK: [[OP:%.*]] = getelementptr i8, i8* [[P]], i64 [[O]]
				; CHECK: [[OPI32:%.*]] = bitcast i8* [[OP]] to i32*
				; CHECK: [[OI32:%.*]] = load i32, i32* [[OPI32]], align 4
				; CHECK: [[R:%.*]] = getelementptr i8, i8* [[P]], i32 [[OI32]]
				; CHECK: ret i8* [[R]]
				%l = call i8* @llvm.load.relative.i64(i8* %p, i64 %o)
				ret i8* %l
				}

				declare i8* @llvm.load.relative.i32(i8*, i32)
				declare i8* @llvm.load.relative.i64(i8*, i64)

llvm/trunk/tools/opt/opt.cpp

int main(int argc, char **argv) {
...
// supported.
initializeCodeGenPreparePass(Registry);
initializeAtomicExpandPass(Registry);
initializeRewriteSymbolsPass(Registry);
initializeWinEHPreparePass(Registry);
initializeDwarfEHPreparePass(Registry);
initializeSafeStackPass(Registry);
initializeSjLjEHPreparePass(Registry);
		initializePreISelIntrinsicLoweringPass(Registry);

#ifdef LINK_POLLY_INTO_TOOLS
polly::initializePollyPasses(Registry);
#endif

cl::ParseCommandLineOptions(argc, argv,
"llvm .bc -> .bc modular optimizer and analysis printer\n");

