
Details

Reviewers

tstellarAMD
jlebar
llvm-commits

Commits

rG327955e057a7: Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer
rL275100: Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer

Summary

Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains.

Diff Detail

Repository: rL LLVM

Event Timeline

asbirlea updated this revision to Diff 62532. · Jul 1 2016, 1:42 PM

asbirlea retitled this revision to Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer.

asbirlea updated this object.

asbirlea added reviewers: llvm-commits, jlebar.

asbirlea added a subscriber: arsenm.

Herald added a subscriber: mzolotukhin. · Jul 1 2016, 1:42 PM

asbirlea added a parent revision: D21934: Address two correctness issues in LoadStoreVectorizer. · Jul 1 2016, 1:43 PM

arsenm added inline comments. · Jul 1 2016, 1:55 PM

include/llvm/Analysis/TargetTransformInfo.h
392 ↗	(On Diff #62532)	This is missing a lot of parameters compared to the TLI version. It at least needs an address space and the alignment value.
392 ↗	(On Diff #62532)	IsFast would be helpful too

Adding AddressSpace and Alignment.

I'm not clear if the address space argument is the right one.

Looking into where to get the isFast info from.

asbirlea added a child revision: D22071: Correct ordering of loads/stores. · Jul 6 2016, 2:35 PM

Added Fast parameter.
Updated check on allowsMisaligned.
Updated test that now vectorizes.

Herald added a reviewer: tstellarAMD. · Jul 6 2016, 3:16 PM

asbirlea mentioned this in D22071: Correct ordering of loads/stores. · Jul 6 2016, 3:30 PM

jlebar added inline comments. · Jul 6 2016, 3:49 PM

include/llvm/Analysis/TargetTransformInfo.h
391 ↗	(On Diff #62984)	Maybe a better comment would be "Determine whether we can read BitWidth bits from memory with the given alignment (in bytes)." That is, this isn't strictly for determining whether misaligned memory accesses are allowed -- if Alignment == BitWidth, the call is still allowed (right?).
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
703 ↗	(On Diff #62984)	Might as well set this to false, to guard against low-quality TTI overrides.
704 ↗	(On Diff #62984)	No `== true`.
705 ↗	(On Diff #62984)	I'm not sure why this is the right condition. "If the store is going to be misaligned, don't vectorize it." This is now, essentially, "if the store is going to be misaligned and misaligned stores of this width and alignment are not fast, don't vectorize it," right? So shouldn't the condition be `(Alignment % SzInBytes) != 0 && (!allowsMisaligned(...) || !Fast)`? In fact, maybe it makes sense to centralize this logic, rename `allowsMisaligned`, and just do `memoryAccessIsAllowedAndFast(SzInBytes, AS, Alignment)`.
test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
505 ↗	(On Diff #62984)	I am surprised we can vectorize this...are you sure it's not a bug?

asbirlea added a parent revision: D22107: Clang-format LoadStoreVectorizer. · Jul 7 2016, 1:16 PM

Update after formatting.

jlebar added inline comments. · Jul 7 2016, 1:41 PM

include/llvm/Analysis/TargetTransformInfo.h
391 ↗	(On Diff #63130)	(This suggestion may be wrong -- the callee may assume that the memory access is misaligned, even if Alignment would indicate it's not. I dunno.)

asbirlea added inline comments. · Jul 7 2016, 1:44 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
700 ↗	(On Diff #63130)	Updated this with new condition. Now there are other tests that vectorize, the ones with "natural" alignment that I had updated previously. I'm looking into whether this is correct behavior.
test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
505 ↗	(On Diff #63130)	This is no longer vectorized with the above changes. I will update it with the other tests.

arsenm added inline comments. · Jul 7 2016, 1:54 PM

test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
505 ↗	(On Diff #63130)	I recently enabled misaligned access depending on the triple, so this should probably be vectorized?

arsenm added inline comments. · Jul 7 2016, 1:55 PM

test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
505 ↗	(On Diff #63130)	Never mind, I misread which test this is

asbirlea added inline comments. · Jul 7 2016, 2:59 PM

test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
505 ↗	(On Diff #63130)	I believe due to what you enabled, the tests ending in "_natural_align" are now vectorizing. They don't vectorize without the triple argument. I'm still not clear why/if the vectorization is correct. Would you mind taking a look to confirm? I can update the (4) tests in that case.

arsenm added inline comments. · Jul 7 2016, 3:08 PM

test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
505 ↗	(On Diff #63130)	I don't see the changes in the diff, but that should be fine

Updating tests.

arsenm added inline comments. · Jul 7 2016, 3:16 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
830 ↗	(On Diff #63146)	! instead of == false

Replace ==false with !

clang-format.

asbirlea added inline comments. · Jul 8 2016, 10:16 AM

include/llvm/Analysis/TargetTransformInfo.h

391 ↗	(On Diff #63146)	The definition should be the same as the one in TLI. It reads:

/// \brief Determine if the target supports unaligned memory accesses.
///
/// This function returns true if the target allows unaligned memory accesses
/// of the specified type in the given address space. If true, it also returns
/// whether the unaligned memory access is "fast" in the last argument by
/// reference. This is used, for example, in situations where an array
/// copy/move/set is converted to a sequence of store operations. Its use
/// helps to ensure that such replacements don't generate code that causes an
/// alignment error (trap) on the target machine.

The difference in the API here is not having a type, but a size in bits. The brief description still applies. Perhaps use just that?
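The contract in the quoted comment (a boolean result plus a "fast" flag returned through the last argument) can be illustrated with a small standalone sketch. Everything here is hypothetical: `ToyTarget`, its made-up policy, and the `allowedAndFast` helper are not LLVM code; only the parameter list mirrors the declaration in this revision. It also shows why the reviewer asked for `Fast` to be initialized to `false` at the call site.

```cpp
#include <cassert>

// Hypothetical target used only for illustration; the policy is invented.
struct ToyTarget {
  // Same parameter shape as the hook added in this revision. The return value
  // says whether the access is permitted; *Fast is written only when the
  // access is permitted and the caller passed a non-null pointer.
  bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
                                      unsigned AddressSpace = 0,
                                      unsigned Alignment = 1,
                                      bool *Fast = nullptr) const {
    (void)AddressSpace; // unused by this toy policy
    assert(Alignment != 0 && "alignment must be non-zero");
    bool Allowed = BitWidth <= 64; // toy rule: nothing wider than 64 bits
    if (Allowed && Fast)
      *Fast = Alignment >= 2;      // toy rule: byte-aligned accesses are slow
    return Allowed;
  }
};

// Caller pattern: start with Fast = false so that an override which returns
// true without writing *Fast is conservatively treated as "not fast".
bool allowedAndFast(const ToyTarget &T, unsigned BitWidth,
                    unsigned AddressSpace, unsigned Alignment) {
  bool Fast = false;
  return T.allowsMisalignedMemoryAccesses(BitWidth, AddressSpace, Alignment,
                                          &Fast) &&
         Fast;
}
```

With this toy policy, a 32-bit access at alignment 2 is allowed and fast, the same access at alignment 1 is allowed but not fast, and a 128-bit access is rejected outright.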

jlebar accepted this revision. · Jul 8 2016, 11:01 AM

jlebar edited edge metadata.

jlebar added inline comments.

include/llvm/Analysis/TargetTransformInfo.h
391 ↗	(On Diff #63155)	sgtm
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
702 ↗	(On Diff #63155)	I still think it might make sense to move the (Alignment % SzInBytes) != 0 && (Alignment % TargetBaseAlign) != 0 checks into the helper function (currently called allowsMisalignedAndIsFast, would need a new name). But, up to you.

This revision is now accepted and ready to land. · Jul 8 2016, 11:01 AM

Address comments.

asbirlea added inline comments. · Jul 8 2016, 1:36 PM

include/llvm/Analysis/TargetTransformInfo.h
391 ↗	(On Diff #63155)	done.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
702 ↗	(On Diff #63155)	I debated a bit whether to do this. Ideally I would have liked a single point of usage (and future removal) for TargetBaseAlign. Still, the condition is cleaner and I can remove a nesting level.

Format.

Closed by commit rL275100: Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer (authored by asbirlea). · Jul 11 2016, 1:53 PM

This revision was automatically updated to reflect the committed changes.

Diff 63568

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

@@ 382 lines hidden @@ public:
   /// floating-point operations because the semantics of vector and scalar
   /// floating-point semantics may differ. For example, ARM NEON v7 SIMD math
   /// does not support IEEE-754 denormal numbers, while depending on the
   /// platform, scalar floating-point math does.
   /// This applies to floating-point math operations and calls, not memory
   /// operations, shuffles, or casts.
   bool isFPVectorizationPotentiallyUnsafe() const;

+  /// \brief Determine if the target supports unaligned memory accesses.
+  bool allowsMisalignedMemoryAccesses(unsigned BitWidth, unsigned AddressSpace = 0,
+                                      unsigned Alignment = 1,
+                                      bool *Fast = nullptr) const;
+
   /// \brief Return hardware support for population count.
   PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) const;

   /// \brief Return true if the hardware has a fast square-root instruction.
   bool haveFastSqrt(Type *Ty) const;

   /// \brief Return the expected cost of supporting the floating point operation
   /// of the specified type.
@@ 249 lines hidden @@ public:
   virtual bool isProfitableToHoist(Instruction *I) = 0;
   virtual bool isTypeLegal(Type *Ty) = 0;
   virtual unsigned getJumpBufAlignment() = 0;
   virtual unsigned getJumpBufSize() = 0;
   virtual bool shouldBuildLookupTables() = 0;
   virtual bool enableAggressiveInterleaving(bool LoopHasReductions) = 0;
   virtual bool enableInterleavedAccessVectorization() = 0;
   virtual bool isFPVectorizationPotentiallyUnsafe() = 0;
+  virtual bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
+                                              unsigned AddressSpace,
+                                              unsigned Alignment,
+                                              bool *Fast) = 0;
   virtual PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) = 0;
   virtual bool haveFastSqrt(Type *Ty) = 0;
   virtual int getFPOpCost(Type *Ty) = 0;
   virtual int getIntImmCost(const APInt &Imm, Type *Ty) = 0;
   virtual int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                             Type *Ty) = 0;
   virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                             Type *Ty) = 0;
@@ 151 lines hidden @@ bool enableAggressiveInterleaving(bool LoopHasReductions) override {
     return Impl.enableAggressiveInterleaving(LoopHasReductions);
   }
   bool enableInterleavedAccessVectorization() override {
     return Impl.enableInterleavedAccessVectorization();
   }
   bool isFPVectorizationPotentiallyUnsafe() override {
     return Impl.isFPVectorizationPotentiallyUnsafe();
   }
+  bool allowsMisalignedMemoryAccesses(unsigned BitWidth, unsigned AddressSpace,
+                                      unsigned Alignment, bool *Fast) override {
+    return Impl.allowsMisalignedMemoryAccesses(BitWidth, AddressSpace,
+                                               Alignment, Fast);
+  }
   PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) override {
     return Impl.getPopcntSupport(IntTyWidthInBit);
   }
   bool haveFastSqrt(Type *Ty) override { return Impl.haveFastSqrt(Ty); }

   int getFPOpCost(Type *Ty) override { return Impl.getFPOpCost(Ty); }

   int getIntImmCost(const APInt &Imm, Type *Ty) override {
@@ 222 lines hidden @@

llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ 238 lines hidden @@ public:
   bool shouldBuildLookupTables() { return true; }

   bool enableAggressiveInterleaving(bool LoopHasReductions) { return false; }

   bool enableInterleavedAccessVectorization() { return false; }

   bool isFPVectorizationPotentiallyUnsafe() { return false; }

+  bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
+                                      unsigned AddressSpace,
+                                      unsigned Alignment,
+                                      bool *Fast) { return false; }
+
   TTI::PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) {
     return TTI::PSK_Software;
   }

   bool haveFastSqrt(Type *Ty) { return false; }

   unsigned getFPOpCost(Type *Ty) { return TargetTransformInfo::TCC_Basic; }

@@ 277 lines hidden @@

llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h

@@ 99 lines hidden @@ public:
   // Provide value semantics. MSVC requires that we spell all of these out.
   BasicTTIImplBase(const BasicTTIImplBase &Arg)
       : BaseT(static_cast<const BaseT &>(Arg)) {}
   BasicTTIImplBase(BasicTTIImplBase &&Arg)
       : BaseT(std::move(static_cast<BaseT &>(Arg))) {}

   /// \name Scalar TTI Implementations
   /// @{
+  bool allowsMisalignedMemoryAccesses(unsigned BitWidth, unsigned AddressSpace,
+                                      unsigned Alignment, bool *Fast) const {
+    MVT M = MVT::getIntegerVT(BitWidth);
+    return getTLI()->allowsMisalignedMemoryAccesses(M, AddressSpace, Alignment, Fast);
+  }
+
   bool hasBranchDivergence() { return false; }

   bool isSourceOfDivergence(const Value *V) { return false; }

   bool isLegalAddImmediate(int64_t imm) {
     return getTLI()->isLegalAddImmediate(imm);
   }
@@ 845 lines hidden @@

llvm/trunk/lib/Analysis/TargetTransformInfo.cpp

@@ 180 lines hidden @@
 bool TargetTransformInfo::enableInterleavedAccessVectorization() const {
   return TTIImpl->enableInterleavedAccessVectorization();
 }

 bool TargetTransformInfo::isFPVectorizationPotentiallyUnsafe() const {
   return TTIImpl->isFPVectorizationPotentiallyUnsafe();
 }

+bool TargetTransformInfo::allowsMisalignedMemoryAccesses(unsigned BitWidth,
+                                                         unsigned AddressSpace,
+                                                         unsigned Alignment,
+                                                         bool *Fast) const {
+  return TTIImpl->allowsMisalignedMemoryAccesses(BitWidth, AddressSpace,
+                                                 Alignment, Fast);
+}
+
 TargetTransformInfo::PopcntSupportKind
 TargetTransformInfo::getPopcntSupport(unsigned IntTyWidthInBit) const {
   return TTIImpl->getPopcntSupport(IntTyWidthInBit);
 }

 bool TargetTransformInfo::haveFastSqrt(Type *Ty) const {
   return TTIImpl->haveFastSqrt(Ty);
 }
@@ 256 lines hidden @@
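The hunks above add the same method at several layers: a public entry point with default arguments, a pure-virtual declaration, an override that forwards to `Impl`, and a conservative default that returns `false`. A rough standalone sketch of that forwarding shape follows. It is a simplified model, not the LLVM classes: `ToyTTI`, `DefaultImpl`, and `PermissiveImpl` are hypothetical names, and the real `TargetTransformInfo` carries many more hooks.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Conservative default, mirroring the "return false" default in the diff.
struct DefaultImpl {
  bool allowsMisalignedMemoryAccesses(unsigned /*BitWidth*/,
                                      unsigned /*AddressSpace*/,
                                      unsigned /*Alignment*/,
                                      bool * /*Fast*/) {
    return false;
  }
};

// Hypothetical target that permits every access and reports it as fast.
struct PermissiveImpl {
  bool allowsMisalignedMemoryAccesses(unsigned, unsigned, unsigned,
                                      bool *Fast) {
    if (Fast)
      *Fast = true;
    return true;
  }
};

class ToyTTI {
  // Abstract interface: one pure-virtual method per hook.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
                                                unsigned AddressSpace,
                                                unsigned Alignment,
                                                bool *Fast) = 0;
  };

  // Wrapper that forwards each hook to a concrete implementation object.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
                                        unsigned AddressSpace,
                                        unsigned Alignment,
                                        bool *Fast) override {
      return Impl.allowsMisalignedMemoryAccesses(BitWidth, AddressSpace,
                                                 Alignment, Fast);
    }
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit ToyTTI(T Impl) : TTIImpl(new Model<T>(std::move(Impl))) {}

  // Public entry point; the default arguments live only at this layer.
  bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
                                      unsigned AddressSpace = 0,
                                      unsigned Alignment = 1,
                                      bool *Fast = nullptr) const {
    assert(TTIImpl && "no implementation installed");
    return TTIImpl->allowsMisalignedMemoryAccesses(BitWidth, AddressSpace,
                                                   Alignment, Fast);
  }
};
```

A pass holding a `ToyTTI` never needs to know which implementation is installed; a target that does not override the hook simply gets the conservative answer.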

llvm/trunk/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

@@ 121 lines hidden @@ private:
   /// Finds the load/stores to consecutive memory addresses and vectorizes them.
   bool vectorizeInstructions(ArrayRef<Value *> Instrs);

   /// Vectorizes the load instructions in Chain.
   bool vectorizeLoadChain(ArrayRef<Value *> Chain);

   /// Vectorizes the store instructions in Chain.
   bool vectorizeStoreChain(ArrayRef<Value *> Chain);
+
+  /// Check if this load/store access is misaligned accesses
+  bool accessIsMisaligned(unsigned SzInBytes, unsigned AddressSpace,
+                          unsigned Alignment);
 };

 class LoadStoreVectorizer : public FunctionPass {
 public:
   static char ID;

   LoadStoreVectorizer() : FunctionPass(ID) {
     initializeLoadStoreVectorizerPass(*PassRegistry::getPassRegistry());
@@ 549 lines hidden @@ DEBUG({
     for (Value *V : Chain)
       V->dump();
   });

   // Check alignment restrictions.
   unsigned Alignment = getAlignment(S0);

   // If the store is going to be misaligned, don't vectorize it.
-  // TODO: Check TLI.allowsMisalignedMemoryAccess
-  if ((Alignment % SzInBytes) != 0 && (Alignment % TargetBaseAlign) != 0) {
-    if (S0->getPointerAddressSpace() == 0) {
-      // If we're storing to an object on the stack, we control its alignment,
-      // so we can cheat and change it!
-      Value *V = GetUnderlyingObject(S0->getPointerOperand(), DL);
-      if (AllocaInst *AI = dyn_cast_or_null<AllocaInst>(V)) {
-        AI->setAlignment(TargetBaseAlign);
-        Alignment = TargetBaseAlign;
-      } else {
-        return false;
-      }
-    } else {
-      return false;
-    }
-  }
+  if (accessIsMisaligned(SzInBytes, AS, Alignment)) {
+    if (S0->getPointerAddressSpace() != 0)
+      return false;
+
+    // If we're storing to an object on the stack, we control its alignment,
+    // so we can cheat and change it!
+    Value *V = GetUnderlyingObject(S0->getPointerOperand(), DL);
+    if (AllocaInst *AI = dyn_cast_or_null<AllocaInst>(V)) {
+      AI->setAlignment(TargetBaseAlign);
+      Alignment = TargetBaseAlign;
+    } else {
+      return false;
+    }
+  }

   BasicBlock::iterator First, Last;
   std::tie(First, Last) = getBoundaryInstrs(Chain);

   if (!isVectorizable(Chain, First, Last))
     return false;

@@ 98 lines hidden @@ if (ChainSize > VF) {
     return vectorizeLoadChain(Chain.slice(0, VF)) |
            vectorizeLoadChain(Chain.slice(VF));
   }

   // Check alignment restrictions.
   unsigned Alignment = getAlignment(L0);

   // If the load is going to be misaligned, don't vectorize it.
-  // TODO: Check TLI.allowsMisalignedMemoryAccess and remove TargetBaseAlign.
-  if ((Alignment % SzInBytes) != 0 && (Alignment % TargetBaseAlign) != 0) {
-    if (L0->getPointerAddressSpace() == 0) {
-      // If we're loading from an object on the stack, we control its alignment,
-      // so we can cheat and change it!
-      Value *V = GetUnderlyingObject(L0->getPointerOperand(), DL);
-      if (AllocaInst *AI = dyn_cast_or_null<AllocaInst>(V)) {
-        AI->setAlignment(TargetBaseAlign);
-        Alignment = TargetBaseAlign;
-      } else {
-        return false;
-      }
-    } else {
-      return false;
-    }
-  }
+  if (accessIsMisaligned(SzInBytes, AS, Alignment)) {
+    if (L0->getPointerAddressSpace() != 0)
+      return false;
+
+    // If we're loading from an object on the stack, we control its alignment,
+    // so we can cheat and change it!
+    Value *V = GetUnderlyingObject(L0->getPointerOperand(), DL);
+    if (AllocaInst *AI = dyn_cast_or_null<AllocaInst>(V)) {
+      AI->setAlignment(TargetBaseAlign);
+      Alignment = TargetBaseAlign;
+    } else {
+      return false;
+    }
+  }

   DEBUG({
     dbgs() << "LSV: Loads to vectorize:\n";
     for (Value *V : Chain)
       V->dump();
   });

@@ 63 lines hidden @@ bool Vectorizer::vectorizeLoadChain(ArrayRef<Value *> Chain) {
   }

   eraseInstructions(Chain);

   ++NumVectorInstructions;
   NumScalarsVectorized += Chain.size();
   return true;
 }
+
+bool Vectorizer::accessIsMisaligned(unsigned SzInBytes, unsigned AddressSpace,
+                                    unsigned Alignment) {
+  bool Fast = false;
+  bool Allows = TTI.allowsMisalignedMemoryAccesses(SzInBytes * 8, AddressSpace,
+                                                   Alignment, &Fast);
+  // TODO: Remove TargetBaseAlign
+  return !(Allows && Fast) && (Alignment % SzInBytes) != 0 &&
+         (Alignment % TargetBaseAlign) != 0;
+}
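To see how the new `accessIsMisaligned` predicate behaves, here is a minimal standalone model of it. The boolean expression is copied from the hunk above; the rest is assumed for illustration: the TTI query is replaced by a hypothetical `MisalignedQuery` callback, and `TargetBaseAlign` is set to 4 because the diff does not show how it is initialized.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for TTI.allowsMisalignedMemoryAccesses.
using MisalignedQuery =
    std::function<bool(unsigned BitWidth, unsigned AddressSpace,
                       unsigned Alignment, bool *Fast)>;

// Assumed value for this sketch only; not taken from the diff.
constexpr unsigned TargetBaseAlign = 4;

// Same predicate as in the revision: the access counts as misaligned only if
// the target does not report it as both allowed and fast, AND the alignment is
// a multiple of neither the access size nor TargetBaseAlign.
bool accessIsMisaligned(const MisalignedQuery &Query, unsigned SzInBytes,
                        unsigned AddressSpace, unsigned Alignment) {
  assert(SzInBytes != 0 && "access size must be non-zero");
  bool Fast = false; // stays false unless the target explicitly sets it
  bool Allows = Query(SzInBytes * 8, AddressSpace, Alignment, &Fast);
  return !(Allows && Fast) && (Alignment % SzInBytes) != 0 &&
         (Alignment % TargetBaseAlign) != 0;
}
```

Take two adjacent `i8` stores with alignment 1, merged into one 2-byte access. With a query that always answers "no", the predicate is true and the chain is left alone. With a query that answers "allowed and fast", the predicate is false and the chain can be vectorized, which is the kind of change the updated `_natural_align` test expectations reflect.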

llvm/trunk/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll

@@ 13 lines hidden @@ define void @merge_global_store_2_constants_i8(i8 addrspace(1)* %out) #0 {
   %out.gep.1 = getelementptr i8, i8 addrspace(1)* %out, i32 1

   store i8 123, i8 addrspace(1)* %out.gep.1
   store i8 456, i8 addrspace(1)* %out, align 2
   ret void
 }

 ; CHECK-LABEL: @merge_global_store_2_constants_i8_natural_align
-; CHECK: store i8
-; CHECK: store i8
+; CHECK: store <2 x i8>
 define void @merge_global_store_2_constants_i8_natural_align(i8 addrspace(1)* %out) #0 {
   %out.gep.1 = getelementptr i8, i8 addrspace(1)* %out, i32 1

   store i8 123, i8 addrspace(1)* %out.gep.1
   store i8 456, i8 addrspace(1)* %out
   ret void
 }

@@ 13 lines hidden @@ define void @merge_global_store_2_constants_0_i16(i16 addrspace(1)* %out) #0 {
   %out.gep.1 = getelementptr i16, i16 addrspace(1)* %out, i32 1

   store i16 0, i16 addrspace(1)* %out.gep.1
   store i16 0, i16 addrspace(1)* %out, align 4
   ret void
 }

 ; CHECK-LABEL: @merge_global_store_2_constants_i16_natural_align
-; CHECK: store i16
-; CHECK: store i16
+; CHECK: store <2 x i16>
 define void @merge_global_store_2_constants_i16_natural_align(i16 addrspace(1)* %out) #0 {
   %out.gep.1 = getelementptr i16, i16 addrspace(1)* %out, i32 1

   store i16 123, i16 addrspace(1)* %out.gep.1
   store i16 456, i16 addrspace(1)* %out
   ret void
 }

 ; CHECK-LABEL: @merge_global_store_2_constants_half_natural_align
-; CHECK: store half
-; CHECK: store half
+; CHECK: store <2 x half>
 define void @merge_global_store_2_constants_half_natural_align(half addrspace(1)* %out) #0 {
   %out.gep.1 = getelementptr half, half addrspace(1)* %out, i32 1

   store half 2.0, half addrspace(1)* %out.gep.1
   store half 1.0, half addrspace(1)* %out
   ret void
 }

@@ 353 lines hidden @@ define void @merge_global_store_4_adjacent_loads_i8(i8 addrspace(1)* %out, i8 addrspace(1)* %in) #0 {
   store i8 %x, i8 addrspace(1)* %out, align 4
   store i8 %y, i8 addrspace(1)* %out.gep.1
   store i8 %z, i8 addrspace(1)* %out.gep.2
   store i8 %w, i8 addrspace(1)* %out.gep.3
   ret void
 }

 ; CHECK-LABEL: @merge_global_store_4_adjacent_loads_i8_natural_align
-; CHECK: load i8
-; CHECK: load i8
-; CHECK: load i8
-; CHECK: load i8
-; CHECK: store i8
-; CHECK: store i8
-; CHECK: store i8
-; CHECK: store i8
+; CHECK: load <4 x i8>
+; CHECK: store <4 x i8>
 define void @merge_global_store_4_adjacent_loads_i8_natural_align(i8 addrspace(1)* %out, i8 addrspace(1)* %in) #0 {
   %out.gep.1 = getelementptr i8, i8 addrspace(1)* %out, i8 1
   %out.gep.2 = getelementptr i8, i8 addrspace(1)* %out, i8 2
   %out.gep.3 = getelementptr i8, i8 addrspace(1)* %out, i8 3
   %in.gep.1 = getelementptr i8, i8 addrspace(1)* %in, i8 1
   %in.gep.2 = getelementptr i8, i8 addrspace(1)* %in, i8 2
   %in.gep.3 = getelementptr i8, i8 addrspace(1)* %in, i8 3

@@ 197 lines hidden @@

This is an archive of the discontinued LLVM Phabricator instance.

Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer
ClosedPublic
