
Add new indexed load/store intrinsics.
Abandoned · Public

Authored by HaoLiu on Apr 22 2015, 6:00 AM.

Details

Summary

Hi,

According to the comments in D8820 (Teach Loop Vectorizer about interleaved data accesses), I have split that patch; this is the first part, which adds support for the new intrinsics:

<4 x double> @llvm.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>)
void @llvm.indexed.store.v4f64 (<4 x double> <value>, double* <ptr>, <4 x i32> <index>, i32 <alignment>)

Such intrinsics can represent interleaved loads/stores, strided loads/stores, and so on.
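As a rough model of the intended semantics (a Python sketch, not the actual implementation; the function names are illustrative), an indexed load gathers one element per lane at base + index[i], and an indexed store scatters the lanes back:

```python
def indexed_load(memory, base, indices):
    """Gather one element per lane: result[i] = memory[base + indices[i]]."""
    return [memory[base + i] for i in indices]

def indexed_store(values, memory, base, indices):
    """Scatter one element per lane: memory[base + indices[i]] = values[i]."""
    for v, i in zip(values, indices):
        memory[base + i] = v

# A stride-2 access pulls one stream out of two interleaved streams
# {A0, B0, A1, B1, ...}:
mem = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0]
print(indexed_load(mem, 0, [0, 2, 4, 6]))  # the A-stream: [1.0, 2.0, 3.0, 4.0]
```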

I am a bit worried about the name "indexed". There is already an "indexed" load/store naming convention for loads/stores with indexed addressing modes (pre-increment, post-increment, pre-decrement, and so on), and there are masked loads/stores for predicated loads/stores. I could not find a better name for a load/store that takes a vector of indices; if you find this name confusing and have a better one, I can change it.

The implementation is similar to that of the masked load/store intrinsics. This patch mainly does the following:

(1) Add the two new intrinsics and update LangRef.rst.
(2) Add code generation for the new intrinsics, including AArch64 backend codegen for interleaved loads/stores, which are a subset of indexed loads/stores.
(3) Teach CodeGenPrepare to scalarize unsupported indexed loads/stores.

There is no code in the legalization phase, as the AArch64 backend supports no indexed loads/stores other than interleaved ones. Even if I added such code, I could not test it. In any case, CodeGenPrepare handles the unsupported cases.

There are TODOs in CodeGenPrepare: some indexed loads/stores can be transformed into "a vector load + a shufflevector" or "a shufflevector + a vector store".
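For example, an indexed load whose indices are a permutation of 0..n-1 reads exactly one contiguous vector, so it could become a plain vector load plus a shufflevector. A quick Python sketch of that check (illustrative only, not the CodeGenPrepare code):

```python
def can_use_load_plus_shuffle(indices):
    """An indexed load whose indices are a permutation of 0..n-1 touches
    exactly one contiguous n-element vector, so it can be lowered to a
    plain vector load followed by a shufflevector with those indices."""
    return sorted(indices) == list(range(len(indices)))

print(can_use_load_plus_shuffle([0, 2, 1, 3]))  # True: load + shuffle works
print(can_use_load_plus_shuffle([0, 2, 4, 6]))  # False: spans two vectors
```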

Please review.

Thanks,
-Hao

Diff Detail

Event Timeline

HaoLiu updated this revision to Diff 24217.Apr 22 2015, 6:00 AM
HaoLiu retitled this revision from to Add new indexed load/store intrinsics..
HaoLiu updated this object.
HaoLiu edited the test plan for this revision. (Show Details)
HaoLiu added reviewers: delena, rengolin, ab, qcolombet.
HaoLiu edited subscribers, added: Unknown Object (MLST); removed: aemerson.Apr 22 2015, 6:01 AM

If the names indexed.load and indexed.store are not reasonable, how about indexed.gather and indexed.scatter, like the names Elena used in D7665?

Thanks,
-Hao

HaoLiu planned changes to this revision.Apr 22 2015, 6:22 AM
ab edited edge metadata.Apr 22 2015, 8:10 AM

Thanks for splitting the patches, it makes for a nicer read!

I'm concerned about CodeGenPrepare legalization. Isn't it a non-mandatory optimization pass? A quick grep shows it's not even enabled at -O0, what happens then?

-Ahmed

rengolin edited edge metadata.Apr 22 2015, 8:29 AM

If the names indexed.load and indexed.store are not reasonable, how about indexed.gather and indexed.scatter, like the names Elena used in D7665?

Hao,

I was under the impression that you two would use the same intrinsics: even though the hardware implementations differ, the IR concept should be generic enough to represent scatter/gather, masked loads, and indexed loads in a uniform way. Please comment on Elena's patch if you think we should use her intrinsics, but with a more generic name.

I don't want us creating too many load/store intrinsics, as that means poor matching for optimisation passes and makes the IR more complex than it should be.

cheers,
--renato

ab added a comment.Apr 22 2015, 11:45 AM

If the names indexed.load and indexed.store are not reasonable, how about indexed.gather and indexed.scatter, like the names Elena used in D7665?

Hao,

I was under the impression that you two would use the same intrinsics: even though the hardware implementations differ, the IR concept should be generic enough to represent scatter/gather, masked loads, and indexed loads in a uniform way. Please comment on Elena's patch if you think we should use her intrinsics, but with a more generic name.

Again, I agree, and now I think just straight reusing Elena's intrinsic is probably enough.

This:

%i3 = ...
...
%indices = insertelement  ...,  i32 %i3, i32 3
<4 x double> @llvm.indexed.load.v4f64 (double* %base, <4 x i32> %indices, i32 <alignment>)

seems equivalent to:

%i3 = ...
...
%p3 = getelementptr double* %base, i32 %i3
...
%ptrs = insertelement ..., double* %p3, i32 3
<4 x double> @llvm.masked.gather.v4f64 (<4 x double*> %ptrs, i32 <alignment>, <4 x i1> <1, 1, 1, 1>, <4 x double> undef)

The only problem is that the GEPs might get combined with the base somehow. So the SelectionDAGBuilder (where it's useful to have both MLOAD and ILOAD) will have to look at the pointers, determine that they are all some constant distance apart, and use ILOAD instead of MLOAD.
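The stride check Ahmed describes can be sketched in Python (illustrative only, not the actual SelectionDAGBuilder code): given the per-lane offsets recovered from the GEPs, decide whether they are equally spaced:

```python
def constant_stride(offsets):
    """Return the common stride if the per-lane offsets are equally
    spaced, else None. This mirrors, informally, the check a DAG builder
    could do before lowering a masked gather over GEP'd pointers as an
    interleaved/indexed load instead of a generic gather."""
    if len(offsets) < 2:
        return None
    stride = offsets[1] - offsets[0]
    if all(b - a == stride for a, b in zip(offsets, offsets[1:])):
        return stride
    return None

print(constant_stride([0, 2, 4, 6]))  # 2: eligible for an interleaved load
print(constant_stride([0, 3, 4, 6]))  # None: fall back to a generic gather
```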

-Ahmed

Hi Renato and Ahmed,

I agree with your comments.

But I want to change the plan, because I think there may be no need for intrinsics at all. For an interleaved load of <4 x double>:

<4 x double> @llvm.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>)

I think we can use two plain IR instructions:

<value> = load <4 x double>, <4 x double>* <ptr>
shufflevector <4 x double> <value>, <4 x double> undef, <4 x i32> <0, 2, 1, 3>

Even though it is more complex for a backend to match two IR instructions, it is achievable. The disadvantage of intrinsics, I think, is that they are not easy to optimize.
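The equivalence can be sketched in Python (a small model of shufflevector semantics, not real IR): a full vector load followed by a shuffle with mask <0, 2, 1, 3> yields the same lanes as an indexed load with those indices:

```python
def shufflevector(v1, v2, mask):
    """Model of LLVM's shufflevector: pick lanes out of the
    concatenation of the two input vectors according to the mask."""
    concat = v1 + v2
    return [concat[i] for i in mask]

# Load <4 x double> from memory, then permute with <0, 2, 1, 3>;
# this matches an indexed load with indices <0, 2, 1, 3>.
mem = [1.0, 2.0, 3.0, 4.0]
wide = mem[0:4]              # the plain vector load
undef = [None] * 4           # the undef second operand
print(shufflevector(wide, undef, [0, 2, 1, 3]))  # [1.0, 3.0, 2.0, 4.0]
```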

I want to implement loop vectorization of interleaved memory accesses with vector load/store + shufflevector.

What do you think?

Thanks,
-Hao

HaoLiu abandoned this revision.Apr 22 2015, 7:38 PM
HaoLiu reclaimed this revision.
delena edited edge metadata.Apr 26 2015, 5:05 AM

We are interested in these intrinsics anyway:

<4 x double> @llvm.masked.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>, <4 x i1> %mask, <4 x double> %passthru)

The presence of indices means that a plain sequential load is not a solution.
If you are not going to implement this interface now, I'll do it later.

  • Elena
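A Python sketch of one reading of the proposed masked, indexed semantics (the behavior is an assumption inferred from the signature above, not a confirmed design): lanes whose mask bit is set gather from memory, while disabled lanes keep the passthru value:

```python
def masked_indexed_load(memory, base, indices, mask, passthru):
    """Assumed semantics of the proposed llvm.masked.indexed.load:
    enabled lanes gather memory[base + indices[i]]; disabled lanes
    return passthru[i] unchanged."""
    return [memory[base + idx] if m else p
            for idx, m, p in zip(indices, mask, passthru)]

mem = [1.0, 2.0, 3.0, 4.0]
print(masked_indexed_load(mem, 0, [0, 2, 1, 3],
                          [True, False, True, False],
                          [9.0, 9.0, 9.0, 9.0]))  # [1.0, 9.0, 2.0, 9.0]
```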

From: Renato Golin [mailto:renato.golin@linaro.org]
Sent: Thursday, April 23, 2015 10:29
To: reviews+D9194+public+a2f9c543cd08afc3@reviews.llvm.org
Cc: Ahmed Bougacha; Hao.Liu@arm.com; llvm-commits@cs.uiuc.edu; qcolombet@apple.com; Demikhovsky, Elena
Subject: Re: [PATCH] Add new indexed load/store intrinsics.


Yes, it's still useful in your situation. I just think the interleaved load/store doesn't need this intrinsic.

Thanks,
-Hao

HaoLiu abandoned this revision.Apr 27 2015, 1:50 AM