This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Analysis/BasicAliasAnalysis.cpp
543 ↗	(On Diff #253187)	Can you check DL.getTypeAllocSize(GEPIdxedTy).Scalable instead, since you're calling getTypeAllocSize anyway? I don't think this is the right way to break out of this loop early; if the offset isn't recorded in "Decomposed", other code might assume it's zero?
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1880	I think TypeSize also has isZero now, assuming that got merged.
llvm/lib/Transforms/Utils/VNCoercion.cpp
24–25	Fix the comment here? Can you refactor the whole `Ty->isStructTy() || LoadTy->isArrayTy() || (Ty->isVectorTy() && Ty->getVectorIsScalable())` thing into a helper function, so the overall property has a name?
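
The comments above turn on one idea: a size that may be scalable has to be checked before it is treated as a plain integer. The following is a minimal standalone sketch of that ordering, not LLVM code. `SizeModel` and `sizesAllowCoercion` are hypothetical names invented for illustration, and the exception stands in for the assertion a real implementation would likely use.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a TypeSize-like value: a minimum size plus a
// flag saying whether the real size is a runtime multiple of that minimum.
struct SizeModel {
  uint64_t MinSize;
  bool Scalable;

  // A size of zero is zero regardless of the runtime multiplier.
  bool isZero() const { return MinSize == 0; }

  // Only meaningful for fixed sizes; this model throws for scalable ones.
  uint64_t getFixedSize() const {
    if (Scalable)
      throw std::logic_error("size is scalable, no fixed value available");
    return MinSize;
  }
};

// Mirrors the order of the size checks discussed in the review: reject
// scalable sizes first, then compare the fixed bit sizes.
bool sizesAllowCoercion(SizeModel StoredBits, SizeModel LoadBits) {
  if (StoredBits.Scalable || LoadBits.Scalable)
    return false;
  uint64_t StoreSize = StoredBits.getFixedSize();
  if (StoreSize % 8 != 0) // the stored size must be a whole number of bytes
    return false;
  return StoreSize >= LoadBits.getFixedSize(); // store must cover the load
}
```

With this model, `sizesAllowCoercion({32, false}, {8, false})` is accepted, while any pair involving a scalable size is rejected before `getFixedSize` is ever called.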

Harbormaster failed remote builds in B50720: Diff 253187! · Mar 27 2020, 12:36 PM

bjope added a subscriber: bjope. · Mar 28 2020, 10:29 AM

bjope added inline comments.

llvm/lib/Transforms/Utils/VNCoercion.cpp

24–25

I agree that refactoring this check into a helper could be wise.

Just out of curiosity, shouldn't we also check DL.typeSizeEqualsStoreSize in such a helper?

The background for that question is that I've been investigating some problems downstream (where we have some types that aren't byte-sized and thus include padding in memory).

Consider this example (using types available upstream):

define <4 x i1> @v4i8_to_v4i1() {
  %tmp = alloca <4 x i8>, align 1
  store <4 x i8> <i8 4, i8 -1, i8 -1, i8 -1>, <4 x i8>* %tmp, align 1
  %sroa_cast = bitcast <4 x i8>* %tmp to <4 x i1>*
  %dst = load <4 x i1>, <4 x i1>* %sroa_cast, align 1
  ret <4 x i1> %dst
}

And notice that the returned value differs depending on whether the data layout is big- or little-endian:

> opt -gvn -S -data-layout "e" gvn-test.ll
; ModuleID = 'gvn-test.ll'
source_filename = "gvn-test.ll"
target datalayout = "e"

define <4 x i1> @v4i8_to_v4i1() {
  %tmp = alloca <4 x i8>, align 1
  store <4 x i8> <i8 4, i8 -1, i8 -1, i8 -1>, <4 x i8>* %tmp, align 1
  %sroa_cast = bitcast <4 x i8>* %tmp to <4 x i1>*
  ret <4 x i1> <i1 false, i1 false, i1 true, i1 false>
}

> opt -gvn -S -data-layout "E" gvn-test.ll

; ModuleID = 'gvn-test.ll'
source_filename = "gvn-test.ll"
target datalayout = "E"

define <4 x i1> @v4i8_to_v4i1() {
  %tmp = alloca <4 x i8>, align 1
  store <4 x i8> <i8 4, i8 -1, i8 -1, i8 -1>, <4 x i8>* %tmp, align 1
  %sroa_cast = bitcast <4 x i8>* %tmp to <4 x i1>*
  ret <4 x i1> <i1 false, i1 true, i1 false, i1 false>
}

So we might want to avoid VNCoercion for types where DL.typeSizeEqualsStoreSize(Ty) is false, at least for vector types. For scalars, the alignTo check a few lines down offers some protection, but maybe checking DL.typeSizeEqualsStoreSize(Ty) makes sense for scalars as well. What do you think?
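
The two `opt` results above can be reproduced with a small standalone model of the coercion steps (pack the stored vector into an integer, shift down on big-endian, truncate, then split into `i1` lanes). This is only a sketch under stated assumptions, not LLVM code: `packBytes` and `coerceToV4I1` are hypothetical names, and the model assumes that vector element 0 occupies the least significant bits on little-endian targets and the most significant bits on big-endian targets.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of bitcasting <4 x i8> to i32: element 0 lands in the low byte on
// little-endian and in the high byte on big-endian (assumed ordering rule).
uint32_t packBytes(const std::array<uint8_t, 4> &Elts, bool BigEndian) {
  uint32_t Bits = 0;
  for (unsigned I = 0; I < 4; ++I) {
    unsigned Shift = BigEndian ? 8 * (3 - I) : 8 * I;
    Bits |= static_cast<uint32_t>(Elts[I]) << Shift;
  }
  return Bits;
}

// Model of forwarding a stored <4 x i8> to a <4 x i1> load at the same address.
std::array<bool, 4> coerceToV4I1(const std::array<uint8_t, 4> &Stored,
                                 bool BigEndian) {
  uint32_t Bits = packBytes(Stored, BigEndian);
  // On big-endian, shift the loaded bytes down to the low bits first. The
  // amount is the difference in store sizes: 32 bits for the stored vector,
  // 8 bits for <4 x i1> (its 4 bits are padded to one byte in memory).
  if (BigEndian)
    Bits >>= 32 - 8;
  uint32_t Nibble = Bits & 0xF; // truncate to the 4-bit type size
  std::array<bool, 4> Result{};
  for (unsigned I = 0; I < 4; ++I) {
    unsigned Bit = BigEndian ? 3 - I : I; // lane order follows endianness
    Result[I] = ((Nibble >> Bit) & 1) != 0;
  }
  return Result;
}
```

For the stored value `<i8 4, i8 -1, i8 -1, i8 -1>`, the model yields `<false, false, true, false>` for little-endian and `<false, true, false, false>` for big-endian, matching the two outputs quoted above. In both cases the result comes from the low four bits of the first byte; which bits of a padded byte a `<4 x i1>` store would actually occupy is the open question raised here.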

bjope added inline comments. · Mar 28 2020, 10:38 AM

llvm/lib/Transforms/Utils/VNCoercion.cpp
24–25	Well, it is not wrong that we get different results for big/little endian in my example above. It is just unclear to me where the bits go when storing <4 x i1> to memory. Is it well defined where the four bits go when storing <4 x i1> to memory?

efriedma added inline comments. · Mar 28 2020, 2:57 PM

llvm/lib/Transforms/Utils/VNCoercion.cpp
24–25	Yes, there's a consistent rule that should be used everywhere. Maybe the simplest description is just the code in foldConstVectorToAPInt in ConstantFolding.cpp.

huihuiz added inline comments. · Mar 30 2020, 9:56 AM

llvm/lib/Analysis/BasicAliasAnalysis.cpp
543 ↗	(On Diff #253187)	I spent some time looking at this again. BasicAAResult::aliasGEP uses the offset/varIndices recorded in DecomposedGEP, so for scalable types we might end up with an inaccurate alias result. I am working on fixes for this logic.

huihuiz mentioned this in D77828: [BasicAA] Fix aliasGEP/DecomposeGEPExpression for scalable type. · Apr 9 2020, 2:08 PM

huihuiz added a child revision: D77828: [BasicAA] Fix aliasGEP/DecomposeGEPExpression for scalable type.

Address review comments.

Herald added a reviewer: ctetreau. · Apr 9 2020, 4:42 PM

huihuiz retitled this revision from "[GVN] Fix VNCoercion/BasicAA for Scalable Vector." to "[GVN] Fix VNCoercion for Scalable Vector." · Apr 9 2020, 4:43 PM

huihuiz edited the summary of this revision.

huihuiz added inline comments.

llvm/lib/Analysis/BasicAliasAnalysis.cpp
543 ↗	(On Diff #253187)	Fixing BasicAA in D77828.

LGTM

This revision is now accepted and ready to land. · Apr 9 2020, 5:22 PM

Harbormaster failed remote builds in B52595: Diff 256446! · Apr 9 2020, 5:26 PM

huihuiz mentioned this in rG6c989d024862: [BasicAA] Fix aliasGEP/DecomposeGEPExpression for scalable type. · Apr 10 2020, 5:15 PM

Harbormaster completed remote builds in B52595: Diff 256446. · Apr 10 2020, 5:47 PM

Closed by commit rG6e7eeb44b305: [GVN] Fix VNCoercion for Scalable Vector. (authored by huihuiz). · Apr 10 2020, 5:49 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp	2 lines
llvm/lib/Transforms/Utils/VNCoercion.cpp	63 lines
llvm/test/Transforms/GVN/vscale.ll	344 lines

Diff 256724

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Show First 20 Lines • Show All 1,871 Lines • ▼ Show 20 Lines	if (auto *GEPVTy = dyn_cast<VectorType>(GEP.getType())) {
// TODO: 1) Scalarize splat operands, 2) scalarize entire instruction if		// TODO: 1) Scalarize splat operands, 2) scalarize entire instruction if
// possible (decide on canonical form for pointer broadcast), 3) exploit		// possible (decide on canonical form for pointer broadcast), 3) exploit
// undef elements to decrease demanded bits		// undef elements to decrease demanded bits
}		}

Value *PtrOp = GEP.getOperand(0);		Value *PtrOp = GEP.getOperand(0);

// Eliminate unneeded casts for indices, and replace indices which displace		// Eliminate unneeded casts for indices, and replace indices which displace
// by multiples of a zero size type with zero.		// by multiples of a zero size type with zero.
bool MadeChange = false;		bool MadeChange = false;

// Index width may not be the same width as pointer width.		// Index width may not be the same width as pointer width.
// Data layout chooses the right type based on supported integer types.		// Data layout chooses the right type based on supported integer types.
Type *NewScalarIndexTy =		Type *NewScalarIndexTy =
DL.getIndexType(GEP.getPointerOperandType()->getScalarType());		DL.getIndexType(GEP.getPointerOperandType()->getScalarType());

gep_type_iterator GTI = gep_type_begin(GEP);		gep_type_iterator GTI = gep_type_begin(GEP);
for (User::op_iterator I = GEP.op_begin() + 1, E = GEP.op_end(); I != E;		for (User::op_iterator I = GEP.op_begin() + 1, E = GEP.op_end(); I != E;
++I, ++GTI) {		++I, ++GTI) {
// Skip indices into struct types.		// Skip indices into struct types.
if (GTI.isStruct())		if (GTI.isStruct())
continue;		continue;

Type *IndexTy = (*I)->getType();		Type *IndexTy = (*I)->getType();
Type *NewIndexType =		Type *NewIndexType =
IndexTy->isVectorTy()		IndexTy->isVectorTy()
? VectorType::get(NewScalarIndexTy,		? VectorType::get(NewScalarIndexTy,
cast<VectorType>(IndexTy)->getNumElements())		cast<VectorType>(IndexTy)->getNumElements())
: NewScalarIndexTy;		: NewScalarIndexTy;

// If the element type has zero size then any index over it is equivalent		// If the element type has zero size then any index over it is equivalent
// to an index of zero, so replace it with zero if it is not zero already.		// to an index of zero, so replace it with zero if it is not zero already.
Type *EltTy = GTI.getIndexedType();		Type *EltTy = GTI.getIndexedType();
if (EltTy->isSized() && DL.getTypeAllocSize(EltTy) == 0)		if (EltTy->isSized() && DL.getTypeAllocSize(EltTy).isZero())
if (!isa<Constant>(*I) || !match(I->get(), m_Zero())) {		if (!isa<Constant>(*I) || !match(I->get(), m_Zero())) {
*I = Constant::getNullValue(NewIndexType);		*I = Constant::getNullValue(NewIndexType);
MadeChange = true;		MadeChange = true;
}		}

if (IndexTy != NewIndexType) {		if (IndexTy != NewIndexType) {
// If we are using a wider index than needed for this platform, shrink		// If we are using a wider index than needed for this platform, shrink
// it to what we need. If narrower, sign-extend it to what we need.		// it to what we need. If narrower, sign-extend it to what we need.
▲ Show 20 Lines • Show All 2,017 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/VNCoercion.cpp

#include "llvm/Transforms/Utils/VNCoercion.h"		#include "llvm/Transforms/Utils/VNCoercion.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "vncoerce"		#define DEBUG_TYPE "vncoerce"
namespace llvm {		namespace llvm {
namespace VNCoercion {		namespace VNCoercion {

		static bool isFirstClassAggregateOrScalableType(Type *Ty) {
		return Ty->isStructTy() || Ty->isArrayTy() ||
		(Ty->isVectorTy() && Ty->getVectorIsScalable());
		}

/// Return true if coerceAvailableValueToLoadType will succeed.		/// Return true if coerceAvailableValueToLoadType will succeed.
bool canCoerceMustAliasedValueToLoad(Value *StoredVal, Type *LoadTy,		bool canCoerceMustAliasedValueToLoad(Value *StoredVal, Type *LoadTy,
const DataLayout &DL) {		const DataLayout &DL) {
Type *StoredTy = StoredVal->getType();		Type *StoredTy = StoredVal->getType();
if (StoredTy == LoadTy)		if (StoredTy == LoadTy)
return true;		return true;

// If the loaded or stored value is an first class array or struct, don't try		// If the loaded/stored value is a first class array/struct, or scalable type,
// to transform them. We need to be able to bitcast to integer.		// don't try to transform them. We need to be able to bitcast to integer.
if (LoadTy->isStructTy() || LoadTy->isArrayTy() || StoredTy->isStructTy() ||		if (isFirstClassAggregateOrScalableType(LoadTy) ||
StoredTy->isArrayTy())		isFirstClassAggregateOrScalableType(StoredTy))
return false;		return false;

uint64_t StoreSize = DL.getTypeSizeInBits(StoredTy);		uint64_t StoreSize = DL.getTypeSizeInBits(StoredTy).getFixedSize();

// The store size must be byte-aligned to support future type casts.		// The store size must be byte-aligned to support future type casts.
if (llvm::alignTo(StoreSize, 8) != StoreSize)		if (llvm::alignTo(StoreSize, 8) != StoreSize)
return false;		return false;

// The store has to be at least as big as the load.		// The store has to be at least as big as the load.
if (StoreSize < DL.getTypeSizeInBits(LoadTy))		if (StoreSize < DL.getTypeSizeInBits(LoadTy).getFixedSize())
return false;		return false;

// Don't coerce non-integral pointers to integers or vice versa.		// Don't coerce non-integral pointers to integers or vice versa.
if (DL.isNonIntegralPointerType(StoredVal->getType()->getScalarType()) !=		if (DL.isNonIntegralPointerType(StoredVal->getType()->getScalarType()) !=
DL.isNonIntegralPointerType(LoadTy->getScalarType())) {		DL.isNonIntegralPointerType(LoadTy->getScalarType())) {
// As a special case, allow coercion of memset used to initialize		// As a special case, allow coercion of memset used to initialize
// an array w/null. Despite non-integral pointers not generally having a		// an array w/null. Despite non-integral pointers not generally having a
// specific bit pattern, we do assume null is zero.		// specific bit pattern, we do assume null is zero.
Show All 12 Lines	static T *coerceAvailableValueToLoadTypeHelper(T *StoredVal, Type *LoadedTy,
assert(canCoerceMustAliasedValueToLoad(StoredVal, LoadedTy, DL) &&		assert(canCoerceMustAliasedValueToLoad(StoredVal, LoadedTy, DL) &&
"precondition violation - materialization can't fail");		"precondition violation - materialization can't fail");
if (auto *C = dyn_cast<Constant>(StoredVal))		if (auto *C = dyn_cast<Constant>(StoredVal))
StoredVal = ConstantFoldConstant(C, DL);		StoredVal = ConstantFoldConstant(C, DL);

// If this is already the right type, just return it.		// If this is already the right type, just return it.
Type *StoredValTy = StoredVal->getType();		Type *StoredValTy = StoredVal->getType();

uint64_t StoredValSize = DL.getTypeSizeInBits(StoredValTy);		uint64_t StoredValSize = DL.getTypeSizeInBits(StoredValTy).getFixedSize();
uint64_t LoadedValSize = DL.getTypeSizeInBits(LoadedTy);		uint64_t LoadedValSize = DL.getTypeSizeInBits(LoadedTy).getFixedSize();

// If the store and reload are the same size, we can always reuse it.		// If the store and reload are the same size, we can always reuse it.
if (StoredValSize == LoadedValSize) {		if (StoredValSize == LoadedValSize) {
// Pointer to Pointer -> use bitcast.		// Pointer to Pointer -> use bitcast.
if (StoredValTy->isPtrOrPtrVectorTy() && LoadedTy->isPtrOrPtrVectorTy()) {		if (StoredValTy->isPtrOrPtrVectorTy() && LoadedTy->isPtrOrPtrVectorTy()) {
StoredVal = Helper.CreateBitCast(StoredVal, LoadedTy);		StoredVal = Helper.CreateBitCast(StoredVal, LoadedTy);
} else {		} else {
// Convert source pointers to integers, which can be bitcast.		// Convert source pointers to integers, which can be bitcast.
Show All 35 Lines	static T *coerceAvailableValueToLoadTypeHelper(T *StoredVal, Type *LoadedTy,
if (!StoredValTy->isIntegerTy()) {		if (!StoredValTy->isIntegerTy()) {
StoredValTy = IntegerType::get(StoredValTy->getContext(), StoredValSize);		StoredValTy = IntegerType::get(StoredValTy->getContext(), StoredValSize);
StoredVal = Helper.CreateBitCast(StoredVal, StoredValTy);		StoredVal = Helper.CreateBitCast(StoredVal, StoredValTy);
}		}

// If this is a big-endian system, we need to shift the value down to the low		// If this is a big-endian system, we need to shift the value down to the low
// bits so that a truncate will work.		// bits so that a truncate will work.
if (DL.isBigEndian()) {		if (DL.isBigEndian()) {
uint64_t ShiftAmt = DL.getTypeStoreSizeInBits(StoredValTy) -		uint64_t ShiftAmt = DL.getTypeStoreSizeInBits(StoredValTy).getFixedSize() -
DL.getTypeStoreSizeInBits(LoadedTy);		DL.getTypeStoreSizeInBits(LoadedTy).getFixedSize();
StoredVal = Helper.CreateLShr(		StoredVal = Helper.CreateLShr(
StoredVal, ConstantInt::get(StoredVal->getType(), ShiftAmt));		StoredVal, ConstantInt::get(StoredVal->getType(), ShiftAmt));
}		}

// Truncate the integer to the right size now.		// Truncate the integer to the right size now.
Type *NewIntTy = IntegerType::get(StoredValTy->getContext(), LoadedValSize);		Type *NewIntTy = IntegerType::get(StoredValTy->getContext(), LoadedValSize);
StoredVal = Helper.CreateTruncOrBitCast(StoredVal, NewIntTy);		StoredVal = Helper.CreateTruncOrBitCast(StoredVal, NewIntTy);

Show All 31 Lines
///		///
/// Check this case to see if there is anything more we can do before we give		/// Check this case to see if there is anything more we can do before we give
/// up. This returns -1 if we have to give up, or a byte number in the stored		/// up. This returns -1 if we have to give up, or a byte number in the stored
/// value of the piece that feeds the load.		/// value of the piece that feeds the load.
static int analyzeLoadFromClobberingWrite(Type *LoadTy, Value *LoadPtr,		static int analyzeLoadFromClobberingWrite(Type *LoadTy, Value *LoadPtr,
Value *WritePtr,		Value *WritePtr,
uint64_t WriteSizeInBits,		uint64_t WriteSizeInBits,
const DataLayout &DL) {		const DataLayout &DL) {
// If the loaded or stored value is a first class array or struct, don't try		// If the loaded/stored value is a first class array/struct, or scalable type,
// to transform them. We need to be able to bitcast to integer.		// don't try to transform them. We need to be able to bitcast to integer.
if (LoadTy->isStructTy() \|\| LoadTy->isArrayTy())		if (isFirstClassAggregateOrScalableType(LoadTy))
return -1;		return -1;

int64_t StoreOffset = 0, LoadOffset = 0;		int64_t StoreOffset = 0, LoadOffset = 0;
Value *StoreBase =		Value *StoreBase =
GetPointerBaseWithConstantOffset(WritePtr, StoreOffset, DL);		GetPointerBaseWithConstantOffset(WritePtr, StoreOffset, DL);
Value *LoadBase = GetPointerBaseWithConstantOffset(LoadPtr, LoadOffset, DL);		Value *LoadBase = GetPointerBaseWithConstantOffset(LoadPtr, LoadOffset, DL);
if (StoreBase != LoadBase)		if (StoreBase != LoadBase)
return -1;		return -1;

// If the load and store are to the exact same address, they should have been		// If the load and store are to the exact same address, they should have been
// a must alias. AA must have gotten confused.		// a must alias. AA must have gotten confused.
// FIXME: Study to see if/when this happens. One case is forwarding a memset		// FIXME: Study to see if/when this happens. One case is forwarding a memset
// to a load from the base of the memset.		// to a load from the base of the memset.

// If the load and store don't overlap at all, the store doesn't provide		// If the load and store don't overlap at all, the store doesn't provide
// anything to the load. In this case, they really don't alias at all, AA		// anything to the load. In this case, they really don't alias at all, AA
// must have gotten confused.		// must have gotten confused.
uint64_t LoadSize = DL.getTypeSizeInBits(LoadTy);		uint64_t LoadSize = DL.getTypeSizeInBits(LoadTy).getFixedSize();

if ((WriteSizeInBits & 7) | (LoadSize & 7))		if ((WriteSizeInBits & 7) | (LoadSize & 7))
return -1;		return -1;
uint64_t StoreSize = WriteSizeInBits / 8; // Convert to bytes.		uint64_t StoreSize = WriteSizeInBits / 8; // Convert to bytes.
LoadSize /= 8;		LoadSize /= 8;

bool isAAFailure = false;		bool isAAFailure = false;
if (StoreOffset < LoadOffset)		if (StoreOffset < LoadOffset)
Show All 17 Lines	static int analyzeLoadFromClobberingWrite(Type *LoadTy, Value *LoadPtr,
return LoadOffset - StoreOffset;		return LoadOffset - StoreOffset;
}		}

/// This function is called when we have a		/// This function is called when we have a
/// memdep query of a load that ends up being a clobbering store.		/// memdep query of a load that ends up being a clobbering store.
int analyzeLoadFromClobberingStore(Type *LoadTy, Value *LoadPtr,		int analyzeLoadFromClobberingStore(Type *LoadTy, Value *LoadPtr,
StoreInst *DepSI, const DataLayout &DL) {		StoreInst *DepSI, const DataLayout &DL) {
auto *StoredVal = DepSI->getValueOperand();		auto *StoredVal = DepSI->getValueOperand();

// Cannot handle reading from store of first-class aggregate yet.		// Cannot handle reading from store of first-class aggregate or scalable type.
if (StoredVal->getType()->isStructTy() ||		if (isFirstClassAggregateOrScalableType(StoredVal->getType()))
StoredVal->getType()->isArrayTy())
return -1;		return -1;

// Don't coerce non-integral pointers to integers or vice versa.		// Don't coerce non-integral pointers to integers or vice versa.
if (DL.isNonIntegralPointerType(StoredVal->getType()->getScalarType()) !=		if (DL.isNonIntegralPointerType(StoredVal->getType()->getScalarType()) !=
DL.isNonIntegralPointerType(LoadTy->getScalarType())) {		DL.isNonIntegralPointerType(LoadTy->getScalarType())) {
// Allow casts of zero values to null as a special case		// Allow casts of zero values to null as a special case
auto *CI = dyn_cast<Constant>(StoredVal);		auto *CI = dyn_cast<Constant>(StoredVal);
if (!CI || !CI->isNullValue())		if (!CI || !CI->isNullValue())
return -1;		return -1;
}		}

Value *StorePtr = DepSI->getPointerOperand();		Value *StorePtr = DepSI->getPointerOperand();
uint64_t StoreSize =		uint64_t StoreSize =
DL.getTypeSizeInBits(DepSI->getValueOperand()->getType());		DL.getTypeSizeInBits(DepSI->getValueOperand()->getType()).getFixedSize();
return analyzeLoadFromClobberingWrite(LoadTy, LoadPtr, StorePtr, StoreSize,		return analyzeLoadFromClobberingWrite(LoadTy, LoadPtr, StorePtr, StoreSize,
DL);		DL);
}		}

/// Looks at a memory location for a load (specified by MemLocBase, Offs, and		/// Looks at a memory location for a load (specified by MemLocBase, Offs, and
/// Size) and compares it against a load.		/// Size) and compares it against a load.
///		///
/// If the specified load could be safely widened to a larger integer load		/// If the specified load could be safely widened to a larger integer load
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	if (DepLI->getType()->isStructTy() || DepLI->getType()->isArrayTy())
return -1;		return -1;

// Don't coerce non-integral pointers to integers or vice versa.		// Don't coerce non-integral pointers to integers or vice versa.
if (DL.isNonIntegralPointerType(DepLI->getType()->getScalarType()) !=		if (DL.isNonIntegralPointerType(DepLI->getType()->getScalarType()) !=
DL.isNonIntegralPointerType(LoadTy->getScalarType()))		DL.isNonIntegralPointerType(LoadTy->getScalarType()))
return -1;		return -1;

Value *DepPtr = DepLI->getPointerOperand();		Value *DepPtr = DepLI->getPointerOperand();
uint64_t DepSize = DL.getTypeSizeInBits(DepLI->getType());		uint64_t DepSize = DL.getTypeSizeInBits(DepLI->getType()).getFixedSize();
int R = analyzeLoadFromClobberingWrite(LoadTy, LoadPtr, DepPtr, DepSize, DL);		int R = analyzeLoadFromClobberingWrite(LoadTy, LoadPtr, DepPtr, DepSize, DL);
if (R != -1)		if (R != -1)
return R;		return R;

// If we have a load/load clobber an DepLI can be widened to cover this load,		// If we have a load/load clobber an DepLI can be widened to cover this load,
// then we should widen it!		// then we should widen it!
int64_t LoadOffs = 0;		int64_t LoadOffs = 0;
const Value *LoadBase =		const Value *LoadBase =
GetPointerBaseWithConstantOffset(LoadPtr, LoadOffs, DL);		GetPointerBaseWithConstantOffset(LoadPtr, LoadOffs, DL);
unsigned LoadSize = DL.getTypeStoreSize(LoadTy);		unsigned LoadSize = DL.getTypeStoreSize(LoadTy).getFixedSize();

unsigned Size =		unsigned Size =
getLoadLoadClobberFullWidthSize(LoadBase, LoadOffs, LoadSize, DepLI);		getLoadLoadClobberFullWidthSize(LoadBase, LoadOffs, LoadSize, DepLI);
if (Size == 0)		if (Size == 0)
return -1;		return -1;

// Check non-obvious conditions enforced by MDA which we rely on for being		// Check non-obvious conditions enforced by MDA which we rely on for being
// able to materialize this potentially available value		// able to materialize this potentially available value
▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	static T *getStoreValueForLoadHelper(T *SrcVal, unsigned Offset, Type *LoadTy,
// so we don't need to do any truncation, etc. This avoids introducing		// so we don't need to do any truncation, etc. This avoids introducing
// ptrtoint instructions for pointers that may be non-integral.		// ptrtoint instructions for pointers that may be non-integral.
if (SrcVal->getType()->isPointerTy() && LoadTy->isPointerTy() &&		if (SrcVal->getType()->isPointerTy() && LoadTy->isPointerTy() &&
cast<PointerType>(SrcVal->getType())->getAddressSpace() ==		cast<PointerType>(SrcVal->getType())->getAddressSpace() ==
cast<PointerType>(LoadTy)->getAddressSpace()) {		cast<PointerType>(LoadTy)->getAddressSpace()) {
return SrcVal;		return SrcVal;
}		}

uint64_t StoreSize = (DL.getTypeSizeInBits(SrcVal->getType()) + 7) / 8;		uint64_t StoreSize =
uint64_t LoadSize = (DL.getTypeSizeInBits(LoadTy) + 7) / 8;		(DL.getTypeSizeInBits(SrcVal->getType()).getFixedSize() + 7) / 8;
		uint64_t LoadSize = (DL.getTypeSizeInBits(LoadTy).getFixedSize() + 7) / 8;
// Compute which bits of the stored value are being used by the load. Convert		// Compute which bits of the stored value are being used by the load. Convert
// to an integer type to start with.		// to an integer type to start with.
if (SrcVal->getType()->isPtrOrPtrVectorTy())		if (SrcVal->getType()->isPtrOrPtrVectorTy())
SrcVal = Helper.CreatePtrToInt(SrcVal, DL.getIntPtrType(SrcVal->getType()));		SrcVal = Helper.CreatePtrToInt(SrcVal, DL.getIntPtrType(SrcVal->getType()));
if (!SrcVal->getType()->isIntegerTy())		if (!SrcVal->getType()->isIntegerTy())
SrcVal = Helper.CreateBitCast(SrcVal, IntegerType::get(Ctx, StoreSize * 8));		SrcVal = Helper.CreateBitCast(SrcVal, IntegerType::get(Ctx, StoreSize * 8));

// Shift the bits to the least significant depending on endianness.		// Shift the bits to the least significant depending on endianness.
Show All 35 Lines
/// being a clobbering load. This means that the load may provide bits used		/// being a clobbering load. This means that the load may provide bits used
/// by the load but we can't be sure because the pointers don't must-alias.		/// by the load but we can't be sure because the pointers don't must-alias.
/// Check this case to see if there is anything more we can do before we give		/// Check this case to see if there is anything more we can do before we give
/// up.		/// up.
Value *getLoadValueForLoad(LoadInst *SrcVal, unsigned Offset, Type *LoadTy,		Value *getLoadValueForLoad(LoadInst *SrcVal, unsigned Offset, Type *LoadTy,
Instruction *InsertPt, const DataLayout &DL) {		Instruction *InsertPt, const DataLayout &DL) {
// If Offset+LoadTy exceeds the size of SrcVal, then we must be wanting to		// If Offset+LoadTy exceeds the size of SrcVal, then we must be wanting to
// widen SrcVal out to a larger load.		// widen SrcVal out to a larger load.
unsigned SrcValStoreSize = DL.getTypeStoreSize(SrcVal->getType());		unsigned SrcValStoreSize =
unsigned LoadSize = DL.getTypeStoreSize(LoadTy);		DL.getTypeStoreSize(SrcVal->getType()).getFixedSize();
		unsigned LoadSize = DL.getTypeStoreSize(LoadTy).getFixedSize();
if (Offset + LoadSize > SrcValStoreSize) {		if (Offset + LoadSize > SrcValStoreSize) {
assert(SrcVal->isSimple() && "Cannot widen volatile/atomic load!");		assert(SrcVal->isSimple() && "Cannot widen volatile/atomic load!");
assert(SrcVal->getType()->isIntegerTy() && "Can't widen non-integer load");		assert(SrcVal->getType()->isIntegerTy() && "Can't widen non-integer load");
// If we have a load/load clobber an DepLI can be widened to cover this		// If we have a load/load clobber an DepLI can be widened to cover this
// load, then we should widen it to the next power of 2 size big enough!		// load, then we should widen it to the next power of 2 size big enough!
unsigned NewLoadSize = Offset + LoadSize;		unsigned NewLoadSize = Offset + LoadSize;
if (!isPowerOf2_32(NewLoadSize))		if (!isPowerOf2_32(NewLoadSize))
NewLoadSize = NextPowerOf2(NewLoadSize);		NewLoadSize = NextPowerOf2(NewLoadSize);
Show All 26 Lines	if (Offset + LoadSize > SrcValStoreSize) {
SrcVal = NewLoad;		SrcVal = NewLoad;
}		}

return getStoreValueForLoad(SrcVal, Offset, LoadTy, InsertPt, DL);		return getStoreValueForLoad(SrcVal, Offset, LoadTy, InsertPt, DL);
}		}

Constant *getConstantLoadValueForLoad(Constant *SrcVal, unsigned Offset,		Constant *getConstantLoadValueForLoad(Constant *SrcVal, unsigned Offset,
Type *LoadTy, const DataLayout &DL) {		Type *LoadTy, const DataLayout &DL) {
unsigned SrcValStoreSize = DL.getTypeStoreSize(SrcVal->getType());		unsigned SrcValStoreSize =
unsigned LoadSize = DL.getTypeStoreSize(LoadTy);		DL.getTypeStoreSize(SrcVal->getType()).getFixedSize();
		unsigned LoadSize = DL.getTypeStoreSize(LoadTy).getFixedSize();
if (Offset + LoadSize > SrcValStoreSize)		if (Offset + LoadSize > SrcValStoreSize)
return nullptr;		return nullptr;
return getConstantStoreValueForLoad(SrcVal, Offset, LoadTy, DL);		return getConstantStoreValueForLoad(SrcVal, Offset, LoadTy, DL);
}		}

template <class T, class HelperClass>		template <class T, class HelperClass>
T *getMemInstValueForLoadHelper(MemIntrinsic *SrcInst, unsigned Offset,		T *getMemInstValueForLoadHelper(MemIntrinsic *SrcInst, unsigned Offset,
Type *LoadTy, HelperClass &Helper,		Type *LoadTy, HelperClass &Helper,
const DataLayout &DL) {		const DataLayout &DL) {
LLVMContext &Ctx = LoadTy->getContext();		LLVMContext &Ctx = LoadTy->getContext();
uint64_t LoadSize = DL.getTypeSizeInBits(LoadTy) / 8;		uint64_t LoadSize = DL.getTypeSizeInBits(LoadTy).getFixedSize() / 8;

// We know that this method is only called when the mem transfer fully		// We know that this method is only called when the mem transfer fully
// provides the bits for the load.		// provides the bits for the load.
if (MemSetInst *MSI = dyn_cast<MemSetInst>(SrcInst)) {		if (MemSetInst *MSI = dyn_cast<MemSetInst>(SrcInst)) {
// memset(P, 'x', 1234) -> splat('x'), even if x is a variable, and		// memset(P, 'x', 1234) -> splat('x'), even if x is a variable, and
// independently of what the offset is.		// independently of what the offset is.
T *Val = cast<T>(MSI->getValue());		T *Val = cast<T>(MSI->getValue());
if (LoadSize != 1)		if (LoadSize != 1)
▲ Show 20 Lines • Show All 64 Lines • Show Last 20 Lines

llvm/test/Transforms/GVN/vscale.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S < %s -basicaa -gvn -dce | FileCheck %s

				; Analyze Load from clobbering Load.

				define <vscale x 4 x i32> @load_store_clobber_load(<vscale x 4 x i32> *%p) {
				; CHECK-LABEL: @load_store_clobber_load(
				; CHECK-NEXT: [[LOAD1:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* undef
				; CHECK-NEXT: [[ADD:%.*]] = add <vscale x 4 x i32> [[LOAD1]], [[LOAD1]]
				; CHECK-NEXT: ret <vscale x 4 x i32> [[ADD]]
				;
				%load1 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* undef
				%load2 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p ; <- load to be eliminated
				%add = add <vscale x 4 x i32> %load1, %load2
				ret <vscale x 4 x i32> %add
				}

				define <vscale x 4 x i32> @load_store_clobber_load_mayalias(<vscale x 4 x i32>* %p, <vscale x 4 x i32>* %p2) {
				; CHECK-LABEL: @load_store_clobber_load_mayalias(
				; CHECK-NEXT: [[LOAD1:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* [[P2:%.*]]
				; CHECK-NEXT: [[LOAD2:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]]
				; CHECK-NEXT: [[SUB:%.*]] = sub <vscale x 4 x i32> [[LOAD1]], [[LOAD2]]
				; CHECK-NEXT: ret <vscale x 4 x i32> [[SUB]]
				;
				%load1 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* %p2
				%load2 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				%sub = sub <vscale x 4 x i32> %load1, %load2
				ret <vscale x 4 x i32> %sub
				}

				define <vscale x 4 x i32> @load_store_clobber_load_noalias(<vscale x 4 x i32>* noalias %p, <vscale x 4 x i32>* noalias %p2) {
				; CHECK-LABEL: @load_store_clobber_load_noalias(
				; CHECK-NEXT: [[LOAD1:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* [[P2:%.*]]
				; CHECK-NEXT: [[ADD:%.*]] = add <vscale x 4 x i32> [[LOAD1]], [[LOAD1]]
				; CHECK-NEXT: ret <vscale x 4 x i32> [[ADD]]
				;
				%load1 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* %p2
				%load2 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p ; <- load to be eliminated
				%add = add <vscale x 4 x i32> %load1, %load2
				ret <vscale x 4 x i32> %add
				}

				; TODO: BasicAA return MayAlias for %gep1,%gep2, could improve as MustAlias.
				define i32 @load_clobber_load_gep1(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @load_clobber_load_gep1(
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]], i64 0, i64 1
				; CHECK-NEXT: [[LOAD1:%.*]] = load i32, i32* [[GEP1]]
				; CHECK-NEXT: [[P2:%.*]] = bitcast <vscale x 4 x i32>* [[P]] to i32*
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i32, i32* [[P2]], i64 1
				; CHECK-NEXT: [[LOAD2:%.*]] = load i32, i32* [[GEP2]]
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[LOAD1]], [[LOAD2]]
				; CHECK-NEXT: ret i32 [[ADD]]
				;
				%gep1 = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 0, i64 1
				%load1 = load i32, i32* %gep1
				%p2 = bitcast <vscale x 4 x i32>* %p to i32*
				%gep2 = getelementptr i32, i32* %p2, i64 1
				%load2 = load i32, i32* %gep2 ; <- load could be eliminated
				%add = add i32 %load1, %load2
				ret i32 %add
				}

				define i32 @load_clobber_load_gep2(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @load_clobber_load_gep2(
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]], i64 1, i64 0
				; CHECK-NEXT: [[LOAD1:%.*]] = load i32, i32* [[GEP1]]
				; CHECK-NEXT: [[P2:%.*]] = bitcast <vscale x 4 x i32>* [[P]] to i32*
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i32, i32* [[P2]], i64 4
				; CHECK-NEXT: [[LOAD2:%.*]] = load i32, i32* [[GEP2]]
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[LOAD1]], [[LOAD2]]
				; CHECK-NEXT: ret i32 [[ADD]]
				;
				%gep1 = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 1, i64 0
				%load1 = load i32, i32* %gep1
				%p2 = bitcast <vscale x 4 x i32>* %p to i32*
				%gep2 = getelementptr i32, i32* %p2, i64 4
				%load2 = load i32, i32* %gep2 ; <- can not determine at compile-time if %load1 and %load2 are same addr
				%add = add i32 %load1, %load2
				ret i32 %add
				}

				; TODO: BasicAA return MayAlias for %gep1,%gep2, could improve as MustAlias.
				define i32 @load_clobber_load_gep3(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @load_clobber_load_gep3(
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]], i64 1, i64 0
				; CHECK-NEXT: [[LOAD1:%.*]] = load i32, i32* [[GEP1]]
				; CHECK-NEXT: [[P2:%.*]] = bitcast <vscale x 4 x i32>* [[P]] to <vscale x 4 x float>*
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr <vscale x 4 x float>, <vscale x 4 x float>* [[P2]], i64 1, i64 0
				; CHECK-NEXT: [[LOAD2:%.*]] = load float, float* [[GEP2]]
				; CHECK-NEXT: [[CAST:%.*]] = bitcast float [[LOAD2]] to i32
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[LOAD1]], [[CAST]]
				; CHECK-NEXT: ret i32 [[ADD]]
				;
				%gep1 = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 1, i64 0
				%load1 = load i32, i32* %gep1
				%p2 = bitcast <vscale x 4 x i32>* %p to <vscale x 4 x float>*
				%gep2 = getelementptr <vscale x 4 x float>, <vscale x 4 x float>* %p2, i64 1, i64 0
				%load2 = load float, float* %gep2 ; <- load could be eliminated
				%cast = bitcast float %load2 to i32
				%add = add i32 %load1, %cast
				ret i32 %add
				}

				define <vscale x 4 x i32> @load_clobber_load_fence(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @load_clobber_load_fence(
				; CHECK-NEXT: [[LOAD1:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: call void asm "", "~{memory}"()
				; CHECK-NEXT: [[LOAD2:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]]
				; CHECK-NEXT: [[SUB:%.*]] = sub <vscale x 4 x i32> [[LOAD1]], [[LOAD2]]
				; CHECK-NEXT: ret <vscale x 4 x i32> [[SUB]]
				;
				%load1 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				call void asm "", "~{memory}"()
				%load2 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				%sub = sub <vscale x 4 x i32> %load1, %load2
				ret <vscale x 4 x i32> %sub
				}

				define <vscale x 4 x i32> @load_clobber_load_sideeffect(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @load_clobber_load_sideeffect(
				; CHECK-NEXT: [[LOAD1:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: call void asm sideeffect "", ""()
				; CHECK-NEXT: [[LOAD2:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]]
				; CHECK-NEXT: [[ADD:%.*]] = add <vscale x 4 x i32> [[LOAD1]], [[LOAD2]]
				; CHECK-NEXT: ret <vscale x 4 x i32> [[ADD]]
				;
				%load1 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				call void asm sideeffect "", ""()
				%load2 = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				%add = add <vscale x 4 x i32> %load1, %load2
				ret <vscale x 4 x i32> %add
				}

				; Analyze Load from clobbering Store.

				define <vscale x 4 x i32> @store_forward_to_load(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @store_forward_to_load(
				; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: ret <vscale x 4 x i32> zeroinitializer
				;
				store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* %p
				%load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				ret <vscale x 4 x i32> %load
				}

				define <vscale x 4 x i32> @store_forward_to_load_sideeffect(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @store_forward_to_load_sideeffect(
				; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: call void asm sideeffect "", ""()
				; CHECK-NEXT: [[LOAD:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]]
				; CHECK-NEXT: ret <vscale x 4 x i32> [[LOAD]]
				;
				store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* %p
				call void asm sideeffect "", ""()
				%load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
				ret <vscale x 4 x i32> %load
				}

				define i32 @store_clobber_load() {
				; CHECK-LABEL: @store_clobber_load(
				; CHECK-NEXT: [[ALLOC:%.*]] = alloca <vscale x 4 x i32>
				; CHECK-NEXT: store <vscale x 4 x i32> undef, <vscale x 4 x i32>* [[ALLOC]]
				; CHECK-NEXT: [[PTR:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[ALLOC]], i32 0, i32 1
				; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[PTR]]
				; CHECK-NEXT: ret i32 [[LOAD]]
				;
				%alloc = alloca <vscale x 4 x i32>
				store <vscale x 4 x i32> undef, <vscale x 4 x i32>* %alloc
				%ptr = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %alloc, i32 0, i32 1
				%load = load i32, i32* %ptr
				ret i32 %load
				}

				; Analyze Load from clobbering MemInst.

				declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1)

				define i32 @memset_clobber_load(<vscale x 4 x i32> *%p) {
				; CHECK-LABEL: @memset_clobber_load(
				; CHECK-NEXT: [[CONV:%.*]] = bitcast <vscale x 4 x i32>* [[P:%.*]] to i8*
				; CHECK-NEXT: tail call void @llvm.memset.p0i8.i64(i8* [[CONV]], i8 1, i64 200, i1 false)
				; CHECK-NEXT: ret i32 16843009
				;
				%conv = bitcast <vscale x 4 x i32>* %p to i8*
				tail call void @llvm.memset.p0i8.i64(i8* %conv, i8 1, i64 200, i1 false)
				%gep = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 0, i64 5
				%load = load i32, i32* %gep
				ret i32 %load
				}

				define i32 @memset_clobber_load_vscaled_base(<vscale x 4 x i32> *%p) {
				; CHECK-LABEL: @memset_clobber_load_vscaled_base(
				; CHECK-NEXT: [[CONV:%.*]] = bitcast <vscale x 4 x i32>* [[P:%.*]] to i8*
				; CHECK-NEXT: tail call void @llvm.memset.p0i8.i64(i8* [[CONV]], i8 1, i64 200, i1 false)
				; CHECK-NEXT: [[GEP:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]], i64 1, i64 1
				; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[GEP]]
				; CHECK-NEXT: ret i32 [[LOAD]]
				;
				%conv = bitcast <vscale x 4 x i32>* %p to i8*
				tail call void @llvm.memset.p0i8.i64(i8* %conv, i8 1, i64 200, i1 false)
				%gep = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 1, i64 1
				%load = load i32, i32* %gep
				ret i32 %load
				}

				define i32 @memset_clobber_load_nonconst_index(<vscale x 4 x i32> *%p, i64 %idx1, i64 %idx2) {
				; CHECK-LABEL: @memset_clobber_load_nonconst_index(
				; CHECK-NEXT: [[CONV:%.*]] = bitcast <vscale x 4 x i32>* [[P:%.*]] to i8*
				; CHECK-NEXT: tail call void @llvm.memset.p0i8.i64(i8* [[CONV]], i8 1, i64 200, i1 false)
				; CHECK-NEXT: [[GEP:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]], i64 [[IDX1:%.*]], i64 [[IDX2:%.*]]
				; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[GEP]]
				; CHECK-NEXT: ret i32 [[LOAD]]
				;
				%conv = bitcast <vscale x 4 x i32>* %p to i8*
				tail call void @llvm.memset.p0i8.i64(i8* %conv, i8 1, i64 200, i1 false)
				%gep = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 %idx1, i64 %idx2
				%load = load i32, i32* %gep
				ret i32 %load
				}


				; Load elimination across BBs

				define <vscale x 4 x i32>* @load_from_alloc_replaced_with_undef() {
				; CHECK-LABEL: @load_from_alloc_replaced_with_undef(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A:%.*]] = alloca <vscale x 4 x i32>
				; CHECK-NEXT: br i1 undef, label [[IF_END:%.*]], label [[IF_THEN:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* [[A]]
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: ret <vscale x 4 x i32>* [[A]]
				;
				entry:
				%a = alloca <vscale x 4 x i32>
				%gep = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %a, i64 0, i64 1
				%load = load i32, i32* %gep ; <- load to be eliminated
				%tobool = icmp eq i32 %load, 0 ; <- icmp to be eliminated
				br i1 %tobool, label %if.end, label %if.then

				if.then:
				store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* %a
				br label %if.end

				if.end:
				ret <vscale x 4 x i32>* %a
				}

				define i32 @redundant_load_elimination_1(<vscale x 4 x i32>* %p) {
				; CHECK-LABEL: @redundant_load_elimination_1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[GEP:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]], i64 1, i64 1
				; CHECK-NEXT: [[LOAD1:%.*]] = load i32, i32* [[GEP]]
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[LOAD1]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: ret i32 [[LOAD1]]
				;
				entry:
				%gep = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 1, i64 1
				%load1 = load i32, i32* %gep
				%cmp = icmp eq i32 %load1, 0
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				%load2 = load i32, i32* %gep ; <- load to be eliminated
				%add = add i32 %load1, %load2
				br label %if.end

				if.end:
				%result = phi i32 [ %add, %if.then ], [ %load1, %entry ]
				ret i32 %result
				}

				; TODO: BasicAA return MayAlias for %gep1,%gep2, could improve as NoAlias.
				define void @redundant_load_elimination_2(i1 %c, <vscale x 4 x i32>* %p, i32* %q, <vscale x 4 x i32> %v) {
				; CHECK-LABEL: @redundant_load_elimination_2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P:%.*]], i64 1, i64 1
				; CHECK-NEXT: store i32 0, i32* [[GEP1]]
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]], i64 1, i64 0
				; CHECK-NEXT: store i32 1, i32* [[GEP2]]
				; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[T:%.*]] = load i32, i32* [[GEP1]]
				; CHECK-NEXT: store i32 [[T]], i32* [[Q:%.*]]
				; CHECK-NEXT: ret void
				; CHECK: if.else:
				; CHECK-NEXT: ret void
				;
				entry:
				%gep1 = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 1, i64 1
				store i32 0, i32* %gep1
				%gep2 = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 1, i64 0
				store i32 1, i32* %gep2
				br i1 %c, label %if.else, label %if.then

				if.then:
				%t = load i32, i32* %gep1 ; <- load could be eliminated
				store i32 %t, i32* %q
				ret void

				if.else:
				ret void
				}

				; TODO: load in if.then could have been eliminated
				define void @missing_load_elimination(i1 %c, <vscale x 4 x i32>* %p, <vscale x 4 x i32>* %q, <vscale x 4 x i32> %v) {
				; CHECK-LABEL: @missing_load_elimination(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* [[P:%.*]]
				; CHECK-NEXT: [[P1:%.*]] = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]], i64 1
				; CHECK-NEXT: store <vscale x 4 x i32> [[V:%.*]], <vscale x 4 x i32>* [[P1]]
				; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[T:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[P]]
				; CHECK-NEXT: store <vscale x 4 x i32> [[T]], <vscale x 4 x i32>* [[Q:%.*]]
				; CHECK-NEXT: ret void
				; CHECK: if.else:
				; CHECK-NEXT: ret void
				;
				entry:
				store <vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32>* %p
				%p1 = getelementptr <vscale x 4 x i32>, <vscale x 4 x i32>* %p, i64 1
				store <vscale x 4 x i32> %v, <vscale x 4 x i32>* %p1
				br i1 %c, label %if.else, label %if.then

				if.then:
				%t = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p ; load could be eliminated
				store <vscale x 4 x i32> %t, <vscale x 4 x i32>* %q
				ret void

				if.else:
				ret void
				}
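
A note on the constant in @memset_clobber_load: the test expects `ret i32 16843009` because an i32 load from memory filled with the byte 1 reads 0x01010101. The sketch below reproduces that arithmetic with a hypothetical splatByte helper (not an LLVM function; the pass itself builds the splat through its own helper, which is not shown in this diff). It assumes loads of at most 8 bytes.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM API): repeat one byte across NumBytes bytes,
// which is the value a NumBytes-wide load sees in memset-filled memory.
// Assumes NumBytes <= 8 so the result fits in a uint64_t.
constexpr uint64_t splatByte(uint8_t Byte, unsigned NumBytes) {
  uint64_t Result = 0;
  for (unsigned I = 0; I < NumBytes; ++I)
    Result = (Result << 8) | Byte; // append one more copy of the byte
  return Result;
}

// memset(p, 1, 200) followed by a 4-byte load inside those 200 bytes.
static_assert(splatByte(0x01, 4) == 0x01010101u, "byte 1 splatted over i32");
static_assert(splatByte(0x01, 4) == 16843009u, "same value in decimal");
```

The fold applies in @memset_clobber_load because the GEP indices `i64 0, i64 5` give a fixed byte offset of 20, which lies within the 200 bytes written. In @memset_clobber_load_vscaled_base the leading index `i64 1` makes the offset a multiple of vscale, so the load is kept, as the CHECK lines show.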

[GVN] Fix VNCoercion for Scalable Vector.
Closed, Public

Diff 256724

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

llvm/lib/Transforms/Utils/VNCoercion.cpp

llvm/test/Transforms/GVN/vscale.ll