This is an archive of the discontinued LLVM Phabricator instance.

Differential D76720

[Transforms][SROA] Promote allocas with mem2reg for scalable types
ClosedPublic

Authored by c-rhodes on Mar 24 2020, 11:42 AM.

Details

Reviewers

efriedma
cameron.mcinally
sdesmalen
ctetreau
chandlerc

Commits

rG84aa6cf1a9fe: [Transforms][SROA] Promote allocas with mem2reg for scalable types

Summary

Aggregate types containing scalable vectors aren't supported and as far
as I can tell this pass is mostly concerned with optimisations on
aggregate types, so the majority of this pass isn't very useful for
scalable vectors.

This patch modifies SROA such that mem2reg is run on allocas with
scalable types that are promotable, but nothing else such as slicing is
done.

The use of TypeSize in this pass has also been updated to be explicitly
fixed size. When invoking the following methods in DataLayout:

getTypeSizeInBits
getTypeStoreSize
getTypeStoreSizeInBits
getTypeAllocSize

we now call getFixedSize on the resultant TypeSize. This is quite an
extensive change with around 50 calls to these functions, and also the
first change of this kind (being explicit about fixed vs scalable
size) as far as I'm aware, so feedback welcome.
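
To illustrate why the explicit getFixedSize calls matter, here is a minimal standalone sketch. MiniTypeSize and storeFitsInAlloca are hypothetical names invented for this example and are not the llvm::TypeSize API; the only detail taken from this review is the assertion text "Request for a fixed size on a scalable object" quoted in the discussion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a size that is either fixed or a runtime
// multiple (vscale) of a known minimum. Not the real llvm::TypeSize.
class MiniTypeSize {
  uint64_t MinSize;
  bool Scalable;

  MiniTypeSize(uint64_t MinSize, bool Scalable)
      : MinSize(MinSize), Scalable(Scalable) {}

public:
  static MiniTypeSize fixed(uint64_t Size) { return MiniTypeSize(Size, false); }
  static MiniTypeSize scalable(uint64_t MinSize) {
    return MiniTypeSize(MinSize, true);
  }

  bool isScalable() const { return Scalable; }
  uint64_t getKnownMinSize() const { return MinSize; }

  // Only valid for fixed sizes. Calling this on a scalable size is a bug in
  // the caller, so it asserts instead of silently returning the minimum.
  uint64_t getFixedSize() const {
    assert(!Scalable && "Request for a fixed size on a scalable object");
    return MinSize;
  }
};

// Hypothetical helper: code that does byte arithmetic on sizes states up
// front that it only handles fixed sizes by calling getFixedSize().
inline bool storeFitsInAlloca(MiniTypeSize StoreSize, MiniTypeSize AllocSize) {
  return StoreSize.getFixedSize() <= AllocSize.getFixedSize();
}
```

Under these assumptions, storeFitsInAlloca(MiniTypeSize::fixed(4), MiniTypeSize::fixed(8)) is true, while passing a scalable size would hit the assertion in a debug build, which is the failure mode the patch avoids by never sending scalable allocas down the slicing path.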

A test is included containing IR with scalable vectors that this pass is
able to optimise.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

c-rhodes created this revision. Mar 24 2020, 11:42 AM

Herald added a project: Restricted Project. Mar 24 2020, 11:42 AM

Herald added a subscriber: hiraditya.

Harbormaster failed remote builds in B50289: Diff 252384! Mar 24 2020, 12:21 PM

We want SROA to at least run mem2reg on scalable vectors, since we don't run mem2reg separately. This is important for C code using SVE intrinsics.

I agree we don't need to care about slicing etc.; we probably can't slice an alloca of unknown size in most cases.

In D76720#1939896, @efriedma wrote:

We want SROA to at least run mem2reg on scalable vectors, since we don't run mem2reg separately. This is important for C code using SVE intrinsics.

mem2reg isn't run as a separate pass? We have a clang ACLE test for the brka intrinsic downstream that was hitting a lot of "Request for a fixed size on a scalable object" asserts in SROA; with this patch applied it gets optimised by mem2reg:

./build/bin/clang -cc1 -internal-isystem ./build/lib/clang/9.0.1/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o -  tools/clang/test/CodeGen/AArch64/acle/acle_sve_brka.c -mllvm -print-before-pass=mem2reg -mllvm -print-after-pass=mem2reg
*** IR Dump Before Promote Memory to Register ***
; Function Attrs: nounwind
define <n x 16 x i1> @test_svbrka_b_z(<n x 16 x i1> %pg, <n x 16 x i1> %op) local_unnamed_addr #0 {
entry:
  %pg.addr = alloca <n x 16 x i1>, align 1
  %op.addr = alloca <n x 16 x i1>, align 1
  store <n x 16 x i1> %pg, <n x 16 x i1>* %pg.addr, align 1, !tbaa !2
  store <n x 16 x i1> %op, <n x 16 x i1>* %op.addr, align 1, !tbaa !2
  %0 = load <n x 16 x i1>, <n x 16 x i1>* %pg.addr, align 1, !tbaa !2
  %1 = call <n x 16 x i1> @llvm.aarch64.sve.brka.z.nxv16i1(<n x 16 x i1> %0, <n x 16 x i1> %op)
  ret <n x 16 x i1> %1
}
*** IR Dump After Promote Memory to Register ***
; Function Attrs: nounwind
define <n x 16 x i1> @test_svbrka_b_z(<n x 16 x i1> %pg, <n x 16 x i1> %op) local_unnamed_addr #0 {
entry:
  %0 = call <n x 16 x i1> @llvm.aarch64.sve.brka.z.nxv16i1(<n x 16 x i1> %pg, <n x 16 x i1> %op)
  ret <n x 16 x i1> %0
}

I ran our downstream unit tests with SROA disabled and there are no asm differences.

In D76720#1939896, @efriedma wrote:

We want SROA to at least run mem2reg on scalable vectors, since we don't run mem2reg separately. This is important for C code using SVE intrinsics.

I've tested this on Sander's Clang patch (D76238) adding contiguous loads/stores upstream and mem2reg runs:

$ /home/culrho01/llvm-project/build/bin/clang -cc1 -internal-isystem /home/culrho01/llvm-project/build/lib/clang/11.0.0/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - /home/culrho01/llvm-project/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1.c -D__ARM_FEATURE_SVE -mllvm -print-before-all -mllvm -print-after-all

*** IR Dump Before Promote Memory to Register ***
; Function Attrs: nounwind
define <vscale x 16 x i8> @test_svld1_s8(<vscale x 16 x i1> %pg, i8* %base) local_unnamed_addr #0 {
entry:
  %pg.addr = alloca <vscale x 16 x i1>, align 2
  store <vscale x 16 x i1> %pg, <vscale x 16 x i1>* %pg.addr, align 2, !tbaa !2
  %0 = bitcast i8* %base to <vscale x 16 x i8>*
  %1 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* %0, i32 1, <vscale x 16 x i1> %pg, <vscale x 16 x i8> zeroinitializer)
  ret <vscale x 16 x i8> %1
}
*** IR Dump After Promote Memory to Register ***
; Function Attrs: nounwind
define <vscale x 16 x i8> @test_svld1_s8(<vscale x 16 x i1> %pg, i8* %base) local_unnamed_addr #0 {
entry:
  %0 = bitcast i8* %base to <vscale x 16 x i8>*
  %1 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* %0, i32 1, <vscale x 16 x i1> %pg, <vscale x 16 x i8> zeroinitializer)
  ret <vscale x 16 x i8> %1
}

Hmm, looking more closely, I guess there's technically one run of mem2reg in the standard pass pipeline. But really, I'm surprised that never got killed off; we should not be relying on it.

Run mem2reg for scalable types from SROA.
Fix the warning "inline function 'llvm::Type::getVectorIsScalable' is not defined" reported by Harbor by moving the implementation to Type.cpp.

sdesmalen added inline comments. Mar 27 2020, 5:22 AM

llvm/lib/IR/Type.cpp
161 ↗	(On Diff #253080)	Can this be inlined in llvm/include/llvm/IR/Type.h?
llvm/lib/Transforms/Scalar/SROA.cpp
4474	If you base your patch on D76748, you can use `DL.getTypeAllocSize(AI.getAllocatedType()).isZero()`.

c-rhodes added inline comments. Mar 27 2020, 7:31 AM

llvm/lib/IR/Type.cpp
161 ↗	(On Diff #253080)	Hm I'm not sure, is there a reason other methods like `isVectorTy` aren't defined with inline in Type.h?
llvm/lib/Transforms/Scalar/SROA.cpp
4474	Ah nice, thanks for pointing that out, I'll update this.

ctetreau added inline comments. Mar 27 2020, 9:33 AM

llvm/include/llvm/IR/Type.h
233 ↗	(On Diff #253080)	Personally, I think functions like this are a code smell. Why have a type hierarchy at all if we're just going to do everything through the base type? However, given the fact that the VectorType hierarchy is going to become more complicated soon, I'd really prefer that this function not be added. I'll propose alternatives inline below, but if you want this function, I'd prefer that it: (1) be a static function in SROA.cpp; (2) be implemented in terms of isa<VectorType> instead of isVectorTy.
llvm/lib/Transforms/Scalar/SROA.cpp
4473–4476	This can be rewritten:

{
  auto *AT = AI.getAllocatedType();
  if (AI.isArrayAllocation() || !AT->isSized() ||
      (isa<VectorType>(AT) && cast<VectorType>(AT)->isScalable()) ||
      DL.getTypeAllocSize(AT).getFixedSize() == 0)
    return false;
}

AI.getAllocatedType is used 3 times, might as well give it a name. And while `isa<VectorType>(AT) && cast<VectorType>(AT)->isScalable()` is a little longer than AT.isScalableVectorTy, it's not that bad. On the positive side, it's more explicit as to what it's doing, and it's also less misleading because there's no such thing as a ScalableVectorTy.
4597	Same as above:

if (isa<VectorType>(AI->getAllocatedType()) &&
    cast<VectorType>(AI->getAllocatedType())->isScalable() &&
    isAllocaPromotable(AI))

Removed isScalableVectorTy and replaced uses in SROA with isa<VectorType> and cast<VectorType>(VecTy)->isScalable().
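
The explicit check discussed above can be sketched outside LLVM as follows. MiniType, MiniVectorType and isScalableVector are hypothetical names made up for this illustration; the kind test followed by a downcast only approximates what isa<VectorType> and cast<VectorType> do, and is not how LLVM's casting machinery is implemented.

```cpp
#include <cassert>

// Hypothetical miniature of a type hierarchy with a kind tag on the base.
struct MiniType {
  enum Kind { IntegerKind, VectorKind };
  explicit MiniType(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

// Only vector types know whether they are scalable; the base type does not
// expose that property, which is the design the reviewers preferred.
struct MiniVectorType : MiniType {
  MiniVectorType(unsigned MinNumElts, bool Scalable)
      : MiniType(VectorKind), MinNumElts(MinNumElts), Scalable(Scalable) {}
  bool isScalable() const { return Scalable; }

  unsigned MinNumElts;
  bool Scalable;
};

// File-local style helper: first check the kind (the "isa" step), then
// downcast and query the vector-specific property (the "cast" step).
inline bool isScalableVector(const MiniType *Ty) {
  return Ty->getKind() == MiniType::VectorKind &&
         static_cast<const MiniVectorType *>(Ty)->isScalable();
}
```

Keeping the query in a local helper rather than on the base type means non-vector types never need to answer a question that only makes sense for vectors.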

efriedma added inline comments. Mar 27 2020, 10:54 AM

llvm/include/llvm/IR/Type.h
233 ↗	(On Diff #253080)	I think for code that specifically cares about the property "would getTypeAllocSize() return a scalable TypeSize", it's not unreasonable to have a function on Type to query that. The key here being that the code doesn't really care whether the type is a vector. It might make sense to leave it out for now, though, and try to refactor later when we have a better idea what code ends up doing in practice. I think most code that would want to do that calls getTypeAllocSize() somewhere nearby anyway.

Thanks for the comments @ctetreau! I've updated the patch.

llvm/include/llvm/IR/Type.h
233 ↗	(On Diff #253080)	Thanks for the suggestion, I've removed this. I don't think it's particularly useful as a function in SROA given it isn't (and is unlikely to be) used a great deal.

efriedma added inline comments. Mar 27 2020, 10:58 AM

llvm/lib/Transforms/Scalar/SROA.cpp

4599

This looks weird; did you mean to write something like this?

if (AllocaInst *AI = dyn_cast<AllocaInst>(I)) {
  if (isa<VectorType>(AI->getAllocatedType()) &&
      cast<VectorType>(AI->getAllocatedType())->isScalable()) {
    if (isAllocaPromotable(AI))
      PromotableAllocas.push_back(AI);
  } else {
    Worklist.insert(AI);
  }
}

c-rhodes added inline comments. Mar 27 2020, 1:17 PM

llvm/lib/Transforms/Scalar/SROA.cpp
4599	Oops, yes I did! Good spot, we don't want allocas with scalable types that aren't promotable added to the worklist as `runOnAlloca` will blow up. Cheers, I'll fix this.

Fix bug in branch where allocas with scalable types that aren’t promotable could still be added to the worklist and subsequently passed to runOnAlloca, where asserts would be triggered as this is now fixed-width. Thanks for spotting, @efriedma.
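
The corrected branch structure can be modelled with a small standalone sketch. MiniAlloca, AllocaBuckets and classifyAllocas are hypothetical names used only for illustration, and the two boolean fields stand in for the isScalable() and isAllocaPromotable() queries from the snippet suggested in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of an alloca: whether its allocated type is a
// scalable vector, and whether mem2reg could promote it.
struct MiniAlloca {
  bool ScalableType;
  bool Promotable;
};

// Indices into the input list, split the way the fixed branch splits them.
struct AllocaBuckets {
  std::vector<std::size_t> PromotableAllocas; // handed straight to mem2reg
  std::vector<std::size_t> Worklist;          // handled by the slicing path
};

inline AllocaBuckets classifyAllocas(const std::vector<MiniAlloca> &Allocas) {
  AllocaBuckets Buckets;
  for (std::size_t I = 0; I < Allocas.size(); ++I) {
    const MiniAlloca &AI = Allocas[I];
    if (AI.ScalableType) {
      // Scalable allocas are only ever promoted. An unpromotable one is
      // left alone and must not reach the worklist, whose code assumes
      // fixed sizes.
      if (AI.Promotable)
        Buckets.PromotableAllocas.push_back(I);
    } else {
      Buckets.Worklist.push_back(I);
    }
  }
  return Buckets;
}
```

With the nested form, a scalable but unpromotable alloca falls into neither bucket, whereas a single combined condition with an else branch would have sent it to the worklist.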

c-rhodes marked an inline comment as done. Mar 27 2020, 1:31 PM

ctetreau added inline comments. Mar 27 2020, 1:52 PM

llvm/include/llvm/IR/Type.h
233 ↗	(On Diff #253080)	@c-rhodes wrote:

Thanks for the suggestion, I've removed this. I don't think it's particularly useful as a function in SROA given it isn't (and is unlikely) to be used a great deal.

Appreciate it.

@efriedma wrote:

I think for code that specifically cares about the property "would getTypeAllocSize() return a scalable TypeSize", it's not unreasonable to have a function on Type to query that. The key here being that the code doesn't really care whether the type is a vector.

I'd feel better about this function, but I'd like to see it become a more common issue before we add it; Type is cluttered enough as it is.

I'd like to see better test coverage here. At least, a case that can't be promoted, and a case where a scalable alloca lands on the worklist despite the check in SROA::runImpl.

Add test for unpromotable scalable alloca.
Rename %pg -> %vec in <vscale x 16 x i8> test.

In D76720#1947128, @efriedma wrote:

I'd like to see better test coverage here. At least, a case that can't be promoted, and a case where a scalable alloca lands on the worklist despite the check in SROA::runImpl.

I've added a test for a scalable alloca that can't be promoted. I'm not really sure how to test the latter, unless you're thinking about the bug you spotted, in which case the test I've added should cover that?

We shouldn't hit it but we do skip scalable allocas in runOnAlloca.

LGTM

This revision is now accepted and ready to land. Mar 30 2020, 12:41 PM

Closed by commit rG84aa6cf1a9fe: [Transforms][SROA] Promote allocas with mem2reg for scalable types (authored by c-rhodes). Apr 1 2020, 3:51 AM

This revision was automatically updated to reflect the committed changes.

sdesmalen mentioned this in D82243: [SVE] Remove calls to VectorType::getNumElements from Scalar. Jul 2 2020, 9:47 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/SROA.cpp	155 lines
llvm/test/Transforms/SROA/scalable-vectors.ll	36 lines

Diff 254155

llvm/lib/Transforms/Scalar/SROA.cpp

[656 lines hidden]	class AllocaSlices::SliceBuilder : public PtrUseVisitor<SliceBuilder> {
SmallDenseMap<Instruction *, uint64_t> PHIOrSelectSizes;		SmallDenseMap<Instruction *, uint64_t> PHIOrSelectSizes;

/// Set to de-duplicate dead instructions found in the use walk.		/// Set to de-duplicate dead instructions found in the use walk.
SmallPtrSet<Instruction *, 4> VisitedDeadInsts;		SmallPtrSet<Instruction *, 4> VisitedDeadInsts;

public:		public:
SliceBuilder(const DataLayout &DL, AllocaInst &AI, AllocaSlices &AS)		SliceBuilder(const DataLayout &DL, AllocaInst &AI, AllocaSlices &AS)
: PtrUseVisitor<SliceBuilder>(DL),		: PtrUseVisitor<SliceBuilder>(DL),
AllocSize(DL.getTypeAllocSize(AI.getAllocatedType())), AS(AS) {}		AllocSize(DL.getTypeAllocSize(AI.getAllocatedType()).getFixedSize()),
		AS(AS) {}

private:		private:
void markAsDead(Instruction &I) {		void markAsDead(Instruction &I) {
if (VisitedDeadInsts.insert(&I).second)		if (VisitedDeadInsts.insert(&I).second)
AS.DeadUsers.push_back(&I);		AS.DeadUsers.push_back(&I);
}		}

void insertUse(Instruction &I, const APInt &Offset, uint64_t Size,		void insertUse(Instruction &I, const APInt &Offset, uint64_t Size,
[72 lines hidden]	if (SROAStrictInbounds && GEPI.isInBounds()) {
unsigned ElementIdx = OpC->getZExtValue();		unsigned ElementIdx = OpC->getZExtValue();
const StructLayout *SL = DL.getStructLayout(STy);		const StructLayout *SL = DL.getStructLayout(STy);
GEPOffset +=		GEPOffset +=
APInt(Offset.getBitWidth(), SL->getElementOffset(ElementIdx));		APInt(Offset.getBitWidth(), SL->getElementOffset(ElementIdx));
} else {		} else {
// For array or vector indices, scale the index by the size of the		// For array or vector indices, scale the index by the size of the
// type.		// type.
APInt Index = OpC->getValue().sextOrTrunc(Offset.getBitWidth());		APInt Index = OpC->getValue().sextOrTrunc(Offset.getBitWidth());
GEPOffset += Index * APInt(Offset.getBitWidth(),		GEPOffset +=
DL.getTypeAllocSize(GTI.getIndexedType()));		Index *
		APInt(Offset.getBitWidth(),
		DL.getTypeAllocSize(GTI.getIndexedType()).getFixedSize());
}		}

// If this index has computed an intermediate pointer which is not		// If this index has computed an intermediate pointer which is not
// inbounds, then the result of the GEP is a poison value and we can		// inbounds, then the result of the GEP is a poison value and we can
// delete it and all uses.		// delete it and all uses.
if (GEPOffset.ugt(AllocSize))		if (GEPOffset.ugt(AllocSize))
return markAsDead(GEPI);		return markAsDead(GEPI);
}		}
[18 lines hidden]	void visitLoadInst(LoadInst &LI) {

if (!IsOffsetKnown)		if (!IsOffsetKnown)
return PI.setAborted(&LI);		return PI.setAborted(&LI);

if (LI.isVolatile() &&		if (LI.isVolatile() &&
LI.getPointerAddressSpace() != DL.getAllocaAddrSpace())		LI.getPointerAddressSpace() != DL.getAllocaAddrSpace())
return PI.setAborted(&LI);		return PI.setAborted(&LI);

uint64_t Size = DL.getTypeStoreSize(LI.getType());		uint64_t Size = DL.getTypeStoreSize(LI.getType()).getFixedSize();
return handleLoadOrStore(LI.getType(), LI, Offset, Size, LI.isVolatile());		return handleLoadOrStore(LI.getType(), LI, Offset, Size, LI.isVolatile());
}		}

void visitStoreInst(StoreInst &SI) {		void visitStoreInst(StoreInst &SI) {
Value *ValOp = SI.getValueOperand();		Value *ValOp = SI.getValueOperand();
if (ValOp == *U)		if (ValOp == *U)
return PI.setEscapedAndAborted(&SI);		return PI.setEscapedAndAborted(&SI);
if (!IsOffsetKnown)		if (!IsOffsetKnown)
return PI.setAborted(&SI);		return PI.setAborted(&SI);

if (SI.isVolatile() &&		if (SI.isVolatile() &&
SI.getPointerAddressSpace() != DL.getAllocaAddrSpace())		SI.getPointerAddressSpace() != DL.getAllocaAddrSpace())
return PI.setAborted(&SI);		return PI.setAborted(&SI);

uint64_t Size = DL.getTypeStoreSize(ValOp->getType());		uint64_t Size = DL.getTypeStoreSize(ValOp->getType()).getFixedSize();

// If this memory access can be shown to statically extend outside the		// If this memory access can be shown to statically extend outside the
// bounds of the allocation, it's behavior is undefined, so simply		// bounds of the allocation, it's behavior is undefined, so simply
// ignore it. Note that this is more strict than the generic clamping		// ignore it. Note that this is more strict than the generic clamping
// behavior of insertUse. We also try to handle cases which might run the		// behavior of insertUse. We also try to handle cases which might run the
// risk of overflow.		// risk of overflow.
// FIXME: We should instead consider the pointer to have escaped if this		// FIXME: We should instead consider the pointer to have escaped if this
// function is being instrumented for addressing bugs or race conditions.		// function is being instrumented for addressing bugs or race conditions.
[401 lines hidden]	if (LI->getParent() != BB)
return false;		return false;

// Ensure that there are no instructions between the PHI and the load that		// Ensure that there are no instructions between the PHI and the load that
// could store.		// could store.
for (BasicBlock::iterator BBI(PN); &*BBI != LI; ++BBI)		for (BasicBlock::iterator BBI(PN); &*BBI != LI; ++BBI)
if (BBI->mayWriteToMemory())		if (BBI->mayWriteToMemory())
return false;		return false;

uint64_t Size = DL.getTypeStoreSize(LI->getType());		uint64_t Size = DL.getTypeStoreSize(LI->getType()).getFixedSize();
MaxAlign = std::max(MaxAlign, MaybeAlign(LI->getAlignment()));		MaxAlign = std::max(MaxAlign, MaybeAlign(LI->getAlignment()));
MaxSize = MaxSize.ult(Size) ? APInt(APWidth, Size) : MaxSize;		MaxSize = MaxSize.ult(Size) ? APInt(APWidth, Size) : MaxSize;
HaveLoad = true;		HaveLoad = true;
}		}

if (!HaveLoad)		if (!HaveLoad)
return false;		return false;

[241 lines hidden]	static Value *getNaturalGEPRecursively(IRBuilderTy &IRB, const DataLayout &DL,
// We can't recurse through pointer types.		// We can't recurse through pointer types.
if (Ty->isPointerTy())		if (Ty->isPointerTy())
return nullptr;		return nullptr;

// We try to analyze GEPs over vectors here, but note that these GEPs are		// We try to analyze GEPs over vectors here, but note that these GEPs are
// extremely poorly defined currently. The long-term goal is to remove GEPing		// extremely poorly defined currently. The long-term goal is to remove GEPing
// over a vector from the IR completely.		// over a vector from the IR completely.
if (VectorType *VecTy = dyn_cast<VectorType>(Ty)) {		if (VectorType *VecTy = dyn_cast<VectorType>(Ty)) {
unsigned ElementSizeInBits = DL.getTypeSizeInBits(VecTy->getScalarType());		unsigned ElementSizeInBits =
		DL.getTypeSizeInBits(VecTy->getScalarType()).getFixedSize();
if (ElementSizeInBits % 8 != 0) {		if (ElementSizeInBits % 8 != 0) {
// GEPs over non-multiple of 8 size vector elements are invalid.		// GEPs over non-multiple of 8 size vector elements are invalid.
return nullptr;		return nullptr;
}		}
APInt ElementSize(Offset.getBitWidth(), ElementSizeInBits / 8);		APInt ElementSize(Offset.getBitWidth(), ElementSizeInBits / 8);
APInt NumSkippedElements = Offset.sdiv(ElementSize);		APInt NumSkippedElements = Offset.sdiv(ElementSize);
if (NumSkippedElements.ugt(VecTy->getNumElements()))		if (NumSkippedElements.ugt(VecTy->getNumElements()))
return nullptr;		return nullptr;
Offset -= NumSkippedElements * ElementSize;		Offset -= NumSkippedElements * ElementSize;
Indices.push_back(IRB.getInt(NumSkippedElements));		Indices.push_back(IRB.getInt(NumSkippedElements));
return getNaturalGEPRecursively(IRB, DL, Ptr, VecTy->getElementType(),		return getNaturalGEPRecursively(IRB, DL, Ptr, VecTy->getElementType(),
Offset, TargetTy, Indices, NamePrefix);		Offset, TargetTy, Indices, NamePrefix);
}		}

if (ArrayType *ArrTy = dyn_cast<ArrayType>(Ty)) {		if (ArrayType *ArrTy = dyn_cast<ArrayType>(Ty)) {
Type *ElementTy = ArrTy->getElementType();		Type *ElementTy = ArrTy->getElementType();
APInt ElementSize(Offset.getBitWidth(), DL.getTypeAllocSize(ElementTy));		APInt ElementSize(Offset.getBitWidth(),
		DL.getTypeAllocSize(ElementTy).getFixedSize());
APInt NumSkippedElements = Offset.sdiv(ElementSize);		APInt NumSkippedElements = Offset.sdiv(ElementSize);
if (NumSkippedElements.ugt(ArrTy->getNumElements()))		if (NumSkippedElements.ugt(ArrTy->getNumElements()))
return nullptr;		return nullptr;

Offset -= NumSkippedElements * ElementSize;		Offset -= NumSkippedElements * ElementSize;
Indices.push_back(IRB.getInt(NumSkippedElements));		Indices.push_back(IRB.getInt(NumSkippedElements));
return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy,		return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy,
Indices, NamePrefix);		Indices, NamePrefix);
}		}

StructType *STy = dyn_cast<StructType>(Ty);		StructType *STy = dyn_cast<StructType>(Ty);
if (!STy)		if (!STy)
return nullptr;		return nullptr;

const StructLayout *SL = DL.getStructLayout(STy);		const StructLayout *SL = DL.getStructLayout(STy);
uint64_t StructOffset = Offset.getZExtValue();		uint64_t StructOffset = Offset.getZExtValue();
if (StructOffset >= SL->getSizeInBytes())		if (StructOffset >= SL->getSizeInBytes())
return nullptr;		return nullptr;
unsigned Index = SL->getElementContainingOffset(StructOffset);		unsigned Index = SL->getElementContainingOffset(StructOffset);
Offset -= APInt(Offset.getBitWidth(), SL->getElementOffset(Index));		Offset -= APInt(Offset.getBitWidth(), SL->getElementOffset(Index));
Type *ElementTy = STy->getElementType(Index);		Type *ElementTy = STy->getElementType(Index);
if (Offset.uge(DL.getTypeAllocSize(ElementTy)))		if (Offset.uge(DL.getTypeAllocSize(ElementTy).getFixedSize()))
return nullptr; // The offset points into alignment padding.		return nullptr; // The offset points into alignment padding.

Indices.push_back(IRB.getInt32(Index));		Indices.push_back(IRB.getInt32(Index));
return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy,		return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy,
Indices, NamePrefix);		Indices, NamePrefix);
}		}

/// Get a natural GEP from a base pointer to a particular offset and		/// Get a natural GEP from a base pointer to a particular offset and
[15 lines hidden]	static Value *getNaturalGEPWithOffset(IRBuilderTy &IRB, const DataLayout &DL,
// Don't consider any GEPs through an i8* as natural unless the TargetTy is		// Don't consider any GEPs through an i8* as natural unless the TargetTy is
// an i8.		// an i8.
if (Ty == IRB.getInt8PtrTy(Ty->getAddressSpace()) && TargetTy->isIntegerTy(8))		if (Ty == IRB.getInt8PtrTy(Ty->getAddressSpace()) && TargetTy->isIntegerTy(8))
return nullptr;		return nullptr;

Type *ElementTy = Ty->getElementType();		Type *ElementTy = Ty->getElementType();
if (!ElementTy->isSized())		if (!ElementTy->isSized())
return nullptr; // We can't GEP through an unsized element.		return nullptr; // We can't GEP through an unsized element.
APInt ElementSize(Offset.getBitWidth(), DL.getTypeAllocSize(ElementTy));		APInt ElementSize(Offset.getBitWidth(),
		DL.getTypeAllocSize(ElementTy).getFixedSize());
if (ElementSize == 0)		if (ElementSize == 0)
return nullptr; // Zero-length arrays can't help us build a natural GEP.		return nullptr; // Zero-length arrays can't help us build a natural GEP.
APInt NumSkippedElements = Offset.sdiv(ElementSize);		APInt NumSkippedElements = Offset.sdiv(ElementSize);

Offset -= NumSkippedElements * ElementSize;		Offset -= NumSkippedElements * ElementSize;
Indices.push_back(IRB.getInt(NumSkippedElements));		Indices.push_back(IRB.getInt(NumSkippedElements));
return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy,		return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy,
Indices, NamePrefix);		Indices, NamePrefix);
[150 lines hidden]	static bool canConvertValue(const DataLayout &DL, Type *OldTy, Type *NewTy) {
// issues when in conjunction with loads and stores.		// issues when in conjunction with loads and stores.
if (isa<IntegerType>(OldTy) && isa<IntegerType>(NewTy)) {		if (isa<IntegerType>(OldTy) && isa<IntegerType>(NewTy)) {
assert(cast<IntegerType>(OldTy)->getBitWidth() !=		assert(cast<IntegerType>(OldTy)->getBitWidth() !=
cast<IntegerType>(NewTy)->getBitWidth() &&		cast<IntegerType>(NewTy)->getBitWidth() &&
"We can't have the same bitwidth for different int types");		"We can't have the same bitwidth for different int types");
return false;		return false;
}		}

if (DL.getTypeSizeInBits(NewTy) != DL.getTypeSizeInBits(OldTy))		if (DL.getTypeSizeInBits(NewTy).getFixedSize() !=
		DL.getTypeSizeInBits(OldTy).getFixedSize())
return false;		return false;
if (!NewTy->isSingleValueType() || !OldTy->isSingleValueType())		if (!NewTy->isSingleValueType() || !OldTy->isSingleValueType())
return false;		return false;

// We can convert pointers to integers and vice-versa. Same for vectors		// We can convert pointers to integers and vice-versa. Same for vectors
// of pointers and integers.		// of pointers and integers.
OldTy = OldTy->getScalarType();		OldTy = OldTy->getScalarType();
NewTy = NewTy->getScalarType();		NewTy = NewTy->getScalarType();
[156 lines hidden]	static VectorType *isVectorPromotionViable(Partition &P, const DataLayout &DL) {
SmallVector<VectorType *, 4> CandidateTys;		SmallVector<VectorType *, 4> CandidateTys;
Type *CommonEltTy = nullptr;		Type *CommonEltTy = nullptr;
bool HaveCommonEltTy = true;		bool HaveCommonEltTy = true;
auto CheckCandidateType = [&](Type *Ty) {		auto CheckCandidateType = [&](Type *Ty) {
if (auto *VTy = dyn_cast<VectorType>(Ty)) {		if (auto *VTy = dyn_cast<VectorType>(Ty)) {
// Return if bitcast to vectors is different for total size in bits.		// Return if bitcast to vectors is different for total size in bits.
if (!CandidateTys.empty()) {		if (!CandidateTys.empty()) {
VectorType *V = CandidateTys[0];		VectorType *V = CandidateTys[0];
if (DL.getTypeSizeInBits(VTy) != DL.getTypeSizeInBits(V)) {		if (DL.getTypeSizeInBits(VTy).getFixedSize() !=
		DL.getTypeSizeInBits(V).getFixedSize()) {
CandidateTys.clear();		CandidateTys.clear();
return;		return;
}		}
}		}
CandidateTys.push_back(VTy);		CandidateTys.push_back(VTy);
if (!CommonEltTy)		if (!CommonEltTy)
CommonEltTy = VTy->getElementType();		CommonEltTy = VTy->getElementType();
else if (CommonEltTy != VTy->getElementType())		else if (CommonEltTy != VTy->getElementType())
[29 lines hidden]	if (!HaveCommonEltTy) {
// If there were no integer vector types, give up.		// If there were no integer vector types, give up.
if (CandidateTys.empty())		if (CandidateTys.empty())
return nullptr;		return nullptr;

// Rank the remaining candidate vector types. This is easy because we know		// Rank the remaining candidate vector types. This is easy because we know
// they're all integer vectors. We sort by ascending number of elements.		// they're all integer vectors. We sort by ascending number of elements.
auto RankVectorTypes = [&DL](VectorType *RHSTy, VectorType *LHSTy) {		auto RankVectorTypes = [&DL](VectorType *RHSTy, VectorType *LHSTy) {
(void)DL;		(void)DL;
assert(DL.getTypeSizeInBits(RHSTy) == DL.getTypeSizeInBits(LHSTy) &&		assert(DL.getTypeSizeInBits(RHSTy).getFixedSize() ==
		DL.getTypeSizeInBits(LHSTy).getFixedSize() &&
"Cannot have vector types of different sizes!");		"Cannot have vector types of different sizes!");
assert(RHSTy->getElementType()->isIntegerTy() &&		assert(RHSTy->getElementType()->isIntegerTy() &&
"All non-integer types eliminated!");		"All non-integer types eliminated!");
assert(LHSTy->getElementType()->isIntegerTy() &&		assert(LHSTy->getElementType()->isIntegerTy() &&
"All non-integer types eliminated!");		"All non-integer types eliminated!");
return RHSTy->getNumElements() < LHSTy->getNumElements();		return RHSTy->getNumElements() < LHSTy->getNumElements();
};		};
llvm::sort(CandidateTys, RankVectorTypes);		llvm::sort(CandidateTys, RankVectorTypes);
[11 lines hidden]	for (VectorType *VTy : CandidateTys) {
"Different vector types with the same element type!");		"Different vector types with the same element type!");
}		}
#endif		#endif
CandidateTys.resize(1);		CandidateTys.resize(1);
}		}

// Try each vector type, and return the one which works.		// Try each vector type, and return the one which works.
auto CheckVectorTypeForPromotion = [&](VectorType *VTy) {		auto CheckVectorTypeForPromotion = [&](VectorType *VTy) {
uint64_t ElementSize = DL.getTypeSizeInBits(VTy->getElementType());		uint64_t ElementSize =
		DL.getTypeSizeInBits(VTy->getElementType()).getFixedSize();

// While the definition of LLVM vectors is bitpacked, we don't support sizes		// While the definition of LLVM vectors is bitpacked, we don't support sizes
// that aren't byte sized.		// that aren't byte sized.
if (ElementSize % 8)		if (ElementSize % 8)
return false;		return false;
assert((DL.getTypeSizeInBits(VTy) % 8) == 0 &&		assert((DL.getTypeSizeInBits(VTy).getFixedSize() % 8) == 0 &&
"vector size not a multiple of element size?");		"vector size not a multiple of element size?");
ElementSize /= 8;		ElementSize /= 8;

for (const Slice &S : P)		for (const Slice &S : P)
if (!isVectorPromotionViableForSlice(P, S, VTy, ElementSize, DL))		if (!isVectorPromotionViableForSlice(P, S, VTy, ElementSize, DL))
return false;		return false;

for (const Slice *S : P.splitSliceTails())		for (const Slice *S : P.splitSliceTails())
[13 lines hidden]
///		///
/// This implements the necessary checking for the \c isIntegerWideningViable		/// This implements the necessary checking for the \c isIntegerWideningViable
/// test below on a single slice of the alloca.		/// test below on a single slice of the alloca.
static bool isIntegerWideningViableForSlice(const Slice &S,		static bool isIntegerWideningViableForSlice(const Slice &S,
uint64_t AllocBeginOffset,		uint64_t AllocBeginOffset,
Type *AllocaTy,		Type *AllocaTy,
const DataLayout &DL,		const DataLayout &DL,
bool &WholeAllocaOp) {		bool &WholeAllocaOp) {
uint64_t Size = DL.getTypeStoreSize(AllocaTy);		uint64_t Size = DL.getTypeStoreSize(AllocaTy).getFixedSize();

uint64_t RelBegin = S.beginOffset() - AllocBeginOffset;		uint64_t RelBegin = S.beginOffset() - AllocBeginOffset;
uint64_t RelEnd = S.endOffset() - AllocBeginOffset;		uint64_t RelEnd = S.endOffset() - AllocBeginOffset;

// We can't reasonably handle cases where the load or store extends past		// We can't reasonably handle cases where the load or store extends past
// the end of the alloca's type and into its padding.		// the end of the alloca's type and into its padding.
if (RelEnd > Size)		if (RelEnd > Size)
return false;		return false;

Use *U = S.getUse();		Use *U = S.getUse();

if (LoadInst *LI = dyn_cast<LoadInst>(U->getUser())) {		if (LoadInst *LI = dyn_cast<LoadInst>(U->getUser())) {
if (LI->isVolatile())		if (LI->isVolatile())
return false;		return false;
// We can't handle loads that extend past the allocated memory.		// We can't handle loads that extend past the allocated memory.
if (DL.getTypeStoreSize(LI->getType()) > Size)		if (DL.getTypeStoreSize(LI->getType()).getFixedSize() > Size)
return false;		return false;
// So far, AllocaSliceRewriter does not support widening split slice tails		// So far, AllocaSliceRewriter does not support widening split slice tails
// in rewriteIntegerLoad.		// in rewriteIntegerLoad.
if (S.beginOffset() < AllocBeginOffset)		if (S.beginOffset() < AllocBeginOffset)
return false;		return false;
// Note that we don't count vector loads or stores as whole-alloca		// Note that we don't count vector loads or stores as whole-alloca
// operations which enable integer widening because we would prefer to use		// operations which enable integer widening because we would prefer to use
// vector widening instead.		// vector widening instead.
if (!isa<VectorType>(LI->getType()) && RelBegin == 0 && RelEnd == Size)		if (!isa<VectorType>(LI->getType()) && RelBegin == 0 && RelEnd == Size)
WholeAllocaOp = true;		WholeAllocaOp = true;
if (IntegerType *ITy = dyn_cast<IntegerType>(LI->getType())) {		if (IntegerType *ITy = dyn_cast<IntegerType>(LI->getType())) {
if (ITy->getBitWidth() < DL.getTypeStoreSizeInBits(ITy))		if (ITy->getBitWidth() < DL.getTypeStoreSizeInBits(ITy).getFixedSize())
return false;		return false;
} else if (RelBegin != 0 || RelEnd != Size ||		} else if (RelBegin != 0 || RelEnd != Size ||
!canConvertValue(DL, AllocaTy, LI->getType())) {		!canConvertValue(DL, AllocaTy, LI->getType())) {
// Non-integer loads need to be convertible from the alloca type so that		// Non-integer loads need to be convertible from the alloca type so that
// they are promotable.		// they are promotable.
return false;		return false;
}		}
} else if (StoreInst *SI = dyn_cast<StoreInst>(U->getUser())) {		} else if (StoreInst *SI = dyn_cast<StoreInst>(U->getUser())) {
Type *ValueTy = SI->getValueOperand()->getType();		Type *ValueTy = SI->getValueOperand()->getType();
if (SI->isVolatile())		if (SI->isVolatile())
return false;		return false;
// We can't handle stores that extend past the allocated memory.		// We can't handle stores that extend past the allocated memory.
if (DL.getTypeStoreSize(ValueTy) > Size)		if (DL.getTypeStoreSize(ValueTy).getFixedSize() > Size)
return false;		return false;
// So far, AllocaSliceRewriter does not support widening split slice tails		// So far, AllocaSliceRewriter does not support widening split slice tails
// in rewriteIntegerStore.		// in rewriteIntegerStore.
if (S.beginOffset() < AllocBeginOffset)		if (S.beginOffset() < AllocBeginOffset)
return false;		return false;
// Note that we don't count vector loads or stores as whole-alloca		// Note that we don't count vector loads or stores as whole-alloca
// operations which enable integer widening because we would prefer to use		// operations which enable integer widening because we would prefer to use
// vector widening instead.		// vector widening instead.
if (!isa<VectorType>(ValueTy) && RelBegin == 0 && RelEnd == Size)		if (!isa<VectorType>(ValueTy) && RelBegin == 0 && RelEnd == Size)
WholeAllocaOp = true;		WholeAllocaOp = true;
if (IntegerType *ITy = dyn_cast<IntegerType>(ValueTy)) {		if (IntegerType *ITy = dyn_cast<IntegerType>(ValueTy)) {
if (ITy->getBitWidth() < DL.getTypeStoreSizeInBits(ITy))		if (ITy->getBitWidth() < DL.getTypeStoreSizeInBits(ITy).getFixedSize())
return false;		return false;
} else if (RelBegin != 0 || RelEnd != Size ||		} else if (RelBegin != 0 || RelEnd != Size ||
!canConvertValue(DL, ValueTy, AllocaTy)) {		!canConvertValue(DL, ValueTy, AllocaTy)) {
// Non-integer stores need to be convertible to the alloca type so that		// Non-integer stores need to be convertible to the alloca type so that
// they are promotable.		// they are promotable.
return false;		return false;
}		}
} else if (MemIntrinsic *MI = dyn_cast<MemIntrinsic>(U->getUser())) {		} else if (MemIntrinsic *MI = dyn_cast<MemIntrinsic>(U->getUser())) {
[... 14 unchanged lines collapsed ...]
/// Test whether the given alloca partition's integer operations can be		/// Test whether the given alloca partition's integer operations can be
/// widened to promotable ones.		/// widened to promotable ones.
///		///
/// This is a quick test to check whether we can rewrite the integer loads and		/// This is a quick test to check whether we can rewrite the integer loads and
/// stores to a particular alloca into wider loads and stores and be able to		/// stores to a particular alloca into wider loads and stores and be able to
/// promote the resulting alloca.		/// promote the resulting alloca.
static bool isIntegerWideningViable(Partition &P, Type *AllocaTy,		static bool isIntegerWideningViable(Partition &P, Type *AllocaTy,
const DataLayout &DL) {		const DataLayout &DL) {
uint64_t SizeInBits = DL.getTypeSizeInBits(AllocaTy);		uint64_t SizeInBits = DL.getTypeSizeInBits(AllocaTy).getFixedSize();
// Don't create integer types larger than the maximum bitwidth.		// Don't create integer types larger than the maximum bitwidth.
if (SizeInBits > IntegerType::MAX_INT_BITS)		if (SizeInBits > IntegerType::MAX_INT_BITS)
return false;		return false;

// Don't try to handle allocas with bit-padding.		// Don't try to handle allocas with bit-padding.
if (SizeInBits != DL.getTypeStoreSizeInBits(AllocaTy))		if (SizeInBits != DL.getTypeStoreSizeInBits(AllocaTy).getFixedSize())
return false;		return false;

// We need to ensure that an integer type with the appropriate bitwidth can		// We need to ensure that an integer type with the appropriate bitwidth can
// be converted to the alloca type, whatever that is. We don't want to force		// be converted to the alloca type, whatever that is. We don't want to force
// the alloca itself to have an integer type if there is a more suitable one.		// the alloca itself to have an integer type if there is a more suitable one.
Type *IntTy = Type::getIntNTy(AllocaTy->getContext(), SizeInBits);		Type *IntTy = Type::getIntNTy(AllocaTy->getContext(), SizeInBits);
if (!canConvertValue(DL, AllocaTy, IntTy) ||		if (!canConvertValue(DL, AllocaTy, IntTy) ||
!canConvertValue(DL, IntTy, AllocaTy))		!canConvertValue(DL, IntTy, AllocaTy))
[... 22 unchanged lines collapsed ...]	static bool isIntegerWideningViable(Partition &P, Type *AllocaTy,
return WholeAllocaOp;		return WholeAllocaOp;
}		}
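
The recurring edit in this diff appends `.getFixedSize()` to `DataLayout` size queries before the result is used as a plain `uint64_t`. The following is a minimal sketch of that idea using a toy type, not LLVM's actual `TypeSize` class: `ToyTypeSize`, its fields, the `fixed`/`scalable` helpers, and `hasNoBitPadding` are all hypothetical names introduced for illustration. It assumes the intent is that asking a scalable size for a fixed value is a programming error, so callers must filter out scalable types first.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a size that may be multiplied by an unknown runtime factor.
// Hypothetical type for illustration only; not LLVM's TypeSize.
struct ToyTypeSize {
  uint64_t MinSize; // known minimum size (exact size when not scalable)
  bool Scalable;    // true if the real size is MinSize * (runtime factor)

  static ToyTypeSize fixed(uint64_t N) { return ToyTypeSize{N, false}; }
  static ToyTypeSize scalable(uint64_t MinN) { return ToyTypeSize{MinN, true}; }

  // Only meaningful for fixed sizes; a scalable size has no single value.
  uint64_t getFixedSize() const {
    assert(!Scalable && "Requested a fixed size for a scalable type");
    return MinSize;
  }
};

// Models the "Don't try to handle allocas with bit-padding" check above:
// a type has no bit padding when its size equals its store size.
bool hasNoBitPadding(ToyTypeSize SizeInBits, ToyTypeSize StoreSizeInBits) {
  return SizeInBits.getFixedSize() == StoreSizeInBits.getFixedSize();
}
```

In this model a 1-bit integer with an 8-bit store size has bit padding, while a 32-bit integer does not; calling `getFixedSize()` on a value built with `ToyTypeSize::scalable` trips the assertion.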

static Value *extractInteger(const DataLayout &DL, IRBuilderTy &IRB, Value *V,		static Value *extractInteger(const DataLayout &DL, IRBuilderTy &IRB, Value *V,
IntegerType *Ty, uint64_t Offset,		IntegerType *Ty, uint64_t Offset,
const Twine &Name) {		const Twine &Name) {
LLVM_DEBUG(dbgs() << " start: " << *V << "\n");		LLVM_DEBUG(dbgs() << " start: " << *V << "\n");
IntegerType *IntTy = cast<IntegerType>(V->getType());		IntegerType *IntTy = cast<IntegerType>(V->getType());
assert(DL.getTypeStoreSize(Ty) + Offset <= DL.getTypeStoreSize(IntTy) &&		assert(DL.getTypeStoreSize(Ty).getFixedSize() + Offset <=
		DL.getTypeStoreSize(IntTy).getFixedSize() &&
"Element extends past full value");		"Element extends past full value");
uint64_t ShAmt = 8 * Offset;		uint64_t ShAmt = 8 * Offset;
if (DL.isBigEndian())		if (DL.isBigEndian())
ShAmt = 8 * (DL.getTypeStoreSize(IntTy) - DL.getTypeStoreSize(Ty) - Offset);		ShAmt = 8 * (DL.getTypeStoreSize(IntTy).getFixedSize() -
		DL.getTypeStoreSize(Ty).getFixedSize() - Offset);
if (ShAmt) {		if (ShAmt) {
V = IRB.CreateLShr(V, ShAmt, Name + ".shift");		V = IRB.CreateLShr(V, ShAmt, Name + ".shift");
LLVM_DEBUG(dbgs() << " shifted: " << *V << "\n");		LLVM_DEBUG(dbgs() << " shifted: " << *V << "\n");
}		}
assert(Ty->getBitWidth() <= IntTy->getBitWidth() &&		assert(Ty->getBitWidth() <= IntTy->getBitWidth() &&
"Cannot extract to a larger integer!");		"Cannot extract to a larger integer!");
if (Ty != IntTy) {		if (Ty != IntTy) {
V = IRB.CreateTrunc(V, Ty, Name + ".trunc");		V = IRB.CreateTrunc(V, Ty, Name + ".trunc");
LLVM_DEBUG(dbgs() << " trunced: " << *V << "\n");		LLVM_DEBUG(dbgs() << " trunced: " << *V << "\n");
}		}
return V;		return V;
}		}
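
The shift computation in `extractInteger` can be tried out without LLVM. This is a sketch on plain 64-bit integers, assuming values of at most 8 bytes; `extractBytes` and its parameters are hypothetical and only mirror the shift-then-truncate steps shown above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract a NarrowBytes-wide integer located at byte
// Offset inside a WideBytes-wide integer value V (WideBytes <= 8).
uint64_t extractBytes(uint64_t V, unsigned WideBytes, unsigned NarrowBytes,
                      uint64_t Offset, bool BigEndian) {
  assert(NarrowBytes >= 1 && WideBytes <= 8 && "Sketch handles 1..8 bytes");
  assert(NarrowBytes + Offset <= WideBytes && "Element extends past full value");

  // Little-endian: byte Offset in memory is bits [8*Offset, ...) of the value.
  uint64_t ShAmt = 8 * Offset;
  // Big-endian: the same memory bytes sit at the opposite end of the value.
  if (BigEndian)
    ShAmt = 8 * (WideBytes - NarrowBytes - Offset);

  V >>= ShAmt; // corresponds to CreateLShr; ShAmt is at most 56 here

  // Corresponds to CreateTrunc: keep only the low NarrowBytes bytes.
  if (NarrowBytes < 8)
    V &= (uint64_t(1) << (8 * NarrowBytes)) - 1;
  return V;
}
```

For a 4-byte value `0x11223344`, extracting one byte at offset 0 gives `0x44` on a little-endian layout and `0x11` on a big-endian one, which is why the big-endian branch recomputes the shift amount.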

static Value *insertInteger(const DataLayout &DL, IRBuilderTy &IRB, Value *Old,		static Value *insertInteger(const DataLayout &DL, IRBuilderTy &IRB, Value *Old,
Value *V, uint64_t Offset, const Twine &Name) {		Value *V, uint64_t Offset, const Twine &Name) {
IntegerType *IntTy = cast<IntegerType>(Old->getType());		IntegerType *IntTy = cast<IntegerType>(Old->getType());
IntegerType *Ty = cast<IntegerType>(V->getType());		IntegerType *Ty = cast<IntegerType>(V->getType());
assert(Ty->getBitWidth() <= IntTy->getBitWidth() &&		assert(Ty->getBitWidth() <= IntTy->getBitWidth() &&
"Cannot insert a larger integer!");		"Cannot insert a larger integer!");
LLVM_DEBUG(dbgs() << " start: " << *V << "\n");		LLVM_DEBUG(dbgs() << " start: " << *V << "\n");
if (Ty != IntTy) {		if (Ty != IntTy) {
V = IRB.CreateZExt(V, IntTy, Name + ".ext");		V = IRB.CreateZExt(V, IntTy, Name + ".ext");
LLVM_DEBUG(dbgs() << " extended: " << *V << "\n");		LLVM_DEBUG(dbgs() << " extended: " << *V << "\n");
}		}
assert(DL.getTypeStoreSize(Ty) + Offset <= DL.getTypeStoreSize(IntTy) &&		assert(DL.getTypeStoreSize(Ty).getFixedSize() + Offset <=
		DL.getTypeStoreSize(IntTy).getFixedSize() &&
"Element store outside of alloca store");		"Element store outside of alloca store");
uint64_t ShAmt = 8 * Offset;		uint64_t ShAmt = 8 * Offset;
if (DL.isBigEndian())		if (DL.isBigEndian())
ShAmt = 8 * (DL.getTypeStoreSize(IntTy) - DL.getTypeStoreSize(Ty) - Offset);		ShAmt = 8 * (DL.getTypeStoreSize(IntTy).getFixedSize() -
		DL.getTypeStoreSize(Ty).getFixedSize() - Offset);
if (ShAmt) {		if (ShAmt) {
V = IRB.CreateShl(V, ShAmt, Name + ".shift");		V = IRB.CreateShl(V, ShAmt, Name + ".shift");
LLVM_DEBUG(dbgs() << " shifted: " << *V << "\n");		LLVM_DEBUG(dbgs() << " shifted: " << *V << "\n");
}		}

if (ShAmt || Ty->getBitWidth() < IntTy->getBitWidth()) {		if (ShAmt || Ty->getBitWidth() < IntTy->getBitWidth()) {
APInt Mask = ~Ty->getMask().zext(IntTy->getBitWidth()).shl(ShAmt);		APInt Mask = ~Ty->getMask().zext(IntTy->getBitWidth()).shl(ShAmt);
Old = IRB.CreateAnd(Old, Mask, Name + ".mask");		Old = IRB.CreateAnd(Old, Mask, Name + ".mask");
[... 146 unchanged lines collapsed ...]	AllocaSliceRewriter(const DataLayout &DL, AllocaSlices &AS, SROA &Pass,
uint64_t NewAllocaEndOffset, bool IsIntegerPromotable,		uint64_t NewAllocaEndOffset, bool IsIntegerPromotable,
VectorType *PromotableVecTy,		VectorType *PromotableVecTy,
SmallSetVector<PHINode *, 8> &PHIUsers,		SmallSetVector<PHINode *, 8> &PHIUsers,
SmallSetVector<SelectInst *, 8> &SelectUsers)		SmallSetVector<SelectInst *, 8> &SelectUsers)
: DL(DL), AS(AS), Pass(Pass), OldAI(OldAI), NewAI(NewAI),		: DL(DL), AS(AS), Pass(Pass), OldAI(OldAI), NewAI(NewAI),
NewAllocaBeginOffset(NewAllocaBeginOffset),		NewAllocaBeginOffset(NewAllocaBeginOffset),
NewAllocaEndOffset(NewAllocaEndOffset),		NewAllocaEndOffset(NewAllocaEndOffset),
NewAllocaTy(NewAI.getAllocatedType()),		NewAllocaTy(NewAI.getAllocatedType()),
IntTy(IsIntegerPromotable		IntTy(
? Type::getIntNTy(		IsIntegerPromotable
NewAI.getContext(),		? Type::getIntNTy(NewAI.getContext(),
DL.getTypeSizeInBits(NewAI.getAllocatedType()))		DL.getTypeSizeInBits(NewAI.getAllocatedType())
		.getFixedSize())
: nullptr),		: nullptr),
VecTy(PromotableVecTy),		VecTy(PromotableVecTy),
ElementTy(VecTy ? VecTy->getElementType() : nullptr),		ElementTy(VecTy ? VecTy->getElementType() : nullptr),
ElementSize(VecTy ? DL.getTypeSizeInBits(ElementTy) / 8 : 0),		ElementSize(VecTy ? DL.getTypeSizeInBits(ElementTy).getFixedSize() / 8
		: 0),
PHIUsers(PHIUsers), SelectUsers(SelectUsers),		PHIUsers(PHIUsers), SelectUsers(SelectUsers),
IRB(NewAI.getContext(), ConstantFolder()) {		IRB(NewAI.getContext(), ConstantFolder()) {
if (VecTy) {		if (VecTy) {
assert((DL.getTypeSizeInBits(ElementTy) % 8) == 0 &&		assert((DL.getTypeSizeInBits(ElementTy).getFixedSize() % 8) == 0 &&
"Only multiple-of-8 sized vector elements are viable");		"Only multiple-of-8 sized vector elements are viable");
++NumVectorized;		++NumVectorized;
}		}
assert((!IntTy && !VecTy) || (IntTy && !VecTy) || (!IntTy && VecTy));		assert((!IntTy && !VecTy) || (IntTy && !VecTy) || (!IntTy && VecTy));
}		}

bool visit(AllocaSlices::const_iterator I) {		bool visit(AllocaSlices::const_iterator I) {
bool CanSROA = true;		bool CanSROA = true;
[... 148 unchanged lines collapsed ...]	bool visitLoadInst(LoadInst &LI) {

AAMDNodes AATags;		AAMDNodes AATags;
LI.getAAMetadata(AATags);		LI.getAAMetadata(AATags);

unsigned AS = LI.getPointerAddressSpace();		unsigned AS = LI.getPointerAddressSpace();

Type *TargetTy = IsSplit ? Type::getIntNTy(LI.getContext(), SliceSize * 8)		Type *TargetTy = IsSplit ? Type::getIntNTy(LI.getContext(), SliceSize * 8)
: LI.getType();		: LI.getType();
const bool IsLoadPastEnd = DL.getTypeStoreSize(TargetTy) > SliceSize;		const bool IsLoadPastEnd =
		DL.getTypeStoreSize(TargetTy).getFixedSize() > SliceSize;
bool IsPtrAdjusted = false;		bool IsPtrAdjusted = false;
Value *V;		Value *V;
if (VecTy) {		if (VecTy) {
V = rewriteVectorizedLoadInst();		V = rewriteVectorizedLoadInst();
} else if (IntTy && LI.getType()->isIntegerTy()) {		} else if (IntTy && LI.getType()->isIntegerTy()) {
V = rewriteIntegerLoad(LI);		V = rewriteIntegerLoad(LI);
} else if (NewBeginOffset == NewAllocaBeginOffset &&		} else if (NewBeginOffset == NewAllocaBeginOffset &&
NewEndOffset == NewAllocaEndOffset &&		NewEndOffset == NewAllocaEndOffset &&
[... 51 unchanged lines collapsed ...]	if (VecTy) {
IsPtrAdjusted = true;		IsPtrAdjusted = true;
}		}
V = convertValue(DL, IRB, V, TargetTy);		V = convertValue(DL, IRB, V, TargetTy);

if (IsSplit) {		if (IsSplit) {
assert(!LI.isVolatile());		assert(!LI.isVolatile());
assert(LI.getType()->isIntegerTy() &&		assert(LI.getType()->isIntegerTy() &&
"Only integer type loads and stores are split");		"Only integer type loads and stores are split");
assert(SliceSize < DL.getTypeStoreSize(LI.getType()) &&		assert(SliceSize < DL.getTypeStoreSize(LI.getType()).getFixedSize() &&
"Split load isn't smaller than original load");		"Split load isn't smaller than original load");
assert(DL.typeSizeEqualsStoreSize(LI.getType()) &&		assert(DL.typeSizeEqualsStoreSize(LI.getType()) &&
"Non-byte-multiple bit width");		"Non-byte-multiple bit width");
// Move the insertion point just past the load so that we can refer to it.		// Move the insertion point just past the load so that we can refer to it.
IRB.SetInsertPoint(&*std::next(BasicBlock::iterator(&LI)));		IRB.SetInsertPoint(&*std::next(BasicBlock::iterator(&LI)));
// Create a placeholder value with the same type as LI to use as the		// Create a placeholder value with the same type as LI to use as the
// basis for the new value. This allows us to replace the uses of LI with		// basis for the new value. This allows us to replace the uses of LI with
// the computed value, and then replace the placeholder with LI, leaving		// the computed value, and then replace the placeholder with LI, leaving
[... 41 unchanged lines collapsed ...]	bool rewriteVectorizedStoreInst(Value *V, StoreInst &SI, Value *OldOp,

LLVM_DEBUG(dbgs() << " to: " << *Store << "\n");		LLVM_DEBUG(dbgs() << " to: " << *Store << "\n");
return true;		return true;
}		}

bool rewriteIntegerStore(Value *V, StoreInst &SI, AAMDNodes AATags) {		bool rewriteIntegerStore(Value *V, StoreInst &SI, AAMDNodes AATags) {
assert(IntTy && "We cannot extract an integer from the alloca");		assert(IntTy && "We cannot extract an integer from the alloca");
assert(!SI.isVolatile());		assert(!SI.isVolatile());
if (DL.getTypeSizeInBits(V->getType()) != IntTy->getBitWidth()) {		if (DL.getTypeSizeInBits(V->getType()).getFixedSize() !=
		IntTy->getBitWidth()) {
Value *Old = IRB.CreateAlignedLoad(NewAI.getAllocatedType(), &NewAI,		Value *Old = IRB.CreateAlignedLoad(NewAI.getAllocatedType(), &NewAI,
NewAI.getAlign(), "oldload");		NewAI.getAlign(), "oldload");
Old = convertValue(DL, IRB, Old, IntTy);		Old = convertValue(DL, IRB, Old, IntTy);
assert(BeginOffset >= NewAllocaBeginOffset && "Out of bounds offset");		assert(BeginOffset >= NewAllocaBeginOffset && "Out of bounds offset");
uint64_t Offset = BeginOffset - NewAllocaBeginOffset;		uint64_t Offset = BeginOffset - NewAllocaBeginOffset;
V = insertInteger(DL, IRB, Old, SI.getValueOperand(), Offset, "insert");		V = insertInteger(DL, IRB, Old, SI.getValueOperand(), Offset, "insert");
}		}
V = convertValue(DL, IRB, V, NewAllocaTy);		V = convertValue(DL, IRB, V, NewAllocaTy);
[... 18 unchanged lines collapsed ...]	bool visitStoreInst(StoreInst &SI) {
Value *V = SI.getValueOperand();		Value *V = SI.getValueOperand();

// Strip all inbounds GEPs and pointer casts to try to dig out any root		// Strip all inbounds GEPs and pointer casts to try to dig out any root
// alloca that should be re-examined after promoting this alloca.		// alloca that should be re-examined after promoting this alloca.
if (V->getType()->isPointerTy())		if (V->getType()->isPointerTy())
if (AllocaInst *AI = dyn_cast<AllocaInst>(V->stripInBoundsOffsets()))		if (AllocaInst *AI = dyn_cast<AllocaInst>(V->stripInBoundsOffsets()))
Pass.PostPromotionWorklist.insert(AI);		Pass.PostPromotionWorklist.insert(AI);

if (SliceSize < DL.getTypeStoreSize(V->getType())) {		if (SliceSize < DL.getTypeStoreSize(V->getType()).getFixedSize()) {
assert(!SI.isVolatile());		assert(!SI.isVolatile());
assert(V->getType()->isIntegerTy() &&		assert(V->getType()->isIntegerTy() &&
"Only integer type loads and stores are split");		"Only integer type loads and stores are split");
assert(DL.typeSizeEqualsStoreSize(V->getType()) &&		assert(DL.typeSizeEqualsStoreSize(V->getType()) &&
"Non-byte-multiple bit width");		"Non-byte-multiple bit width");
IntegerType *NarrowTy = Type::getIntNTy(SI.getContext(), SliceSize * 8);		IntegerType *NarrowTy = Type::getIntNTy(SI.getContext(), SliceSize * 8);
V = extractInteger(DL, IRB, V, NarrowTy, NewBeginOffset - BeginOffset,		V = extractInteger(DL, IRB, V, NarrowTy, NewBeginOffset - BeginOffset,
"extract");		"extract");
}		}

if (VecTy)		if (VecTy)
return rewriteVectorizedStoreInst(V, SI, OldOp, AATags);		return rewriteVectorizedStoreInst(V, SI, OldOp, AATags);
if (IntTy && V->getType()->isIntegerTy())		if (IntTy && V->getType()->isIntegerTy())
return rewriteIntegerStore(V, SI, AATags);		return rewriteIntegerStore(V, SI, AATags);

const bool IsStorePastEnd = DL.getTypeStoreSize(V->getType()) > SliceSize;		const bool IsStorePastEnd =
		DL.getTypeStoreSize(V->getType()).getFixedSize() > SliceSize;
StoreInst *NewSI;		StoreInst *NewSI;
if (NewBeginOffset == NewAllocaBeginOffset &&		if (NewBeginOffset == NewAllocaBeginOffset &&
NewEndOffset == NewAllocaEndOffset &&		NewEndOffset == NewAllocaEndOffset &&
(canConvertValue(DL, V->getType(), NewAllocaTy) ||		(canConvertValue(DL, V->getType(), NewAllocaTy) ||
(IsStorePastEnd && NewAllocaTy->isIntegerTy() &&		(IsStorePastEnd && NewAllocaTy->isIntegerTy() &&
V->getType()->isIntegerTy()))) {		V->getType()->isIntegerTy()))) {
// If this is an integer store past the end of slice (and thus the bytes		// If this is an integer store past the end of slice (and thus the bytes
// past that point are irrelevant or this is unreachable), truncate the		// past that point are irrelevant or this is unreachable), truncate the
[... 98 unchanged lines collapsed ...]	const bool CanContinue = [&]() {
return false;		return false;
auto *C = cast<ConstantInt>(II.getLength());		auto *C = cast<ConstantInt>(II.getLength());
if (C->getBitWidth() > 64)		if (C->getBitWidth() > 64)
return false;		return false;
const auto Len = C->getZExtValue();		const auto Len = C->getZExtValue();
auto *Int8Ty = IntegerType::getInt8Ty(NewAI.getContext());		auto *Int8Ty = IntegerType::getInt8Ty(NewAI.getContext());
auto *SrcTy = VectorType::get(Int8Ty, Len);		auto *SrcTy = VectorType::get(Int8Ty, Len);
return canConvertValue(DL, SrcTy, AllocaTy) &&		return canConvertValue(DL, SrcTy, AllocaTy) &&
DL.isLegalInteger(DL.getTypeSizeInBits(ScalarTy));		DL.isLegalInteger(DL.getTypeSizeInBits(ScalarTy).getFixedSize());
}();		}();

// If this doesn't map cleanly onto the alloca type, and that type isn't		// If this doesn't map cleanly onto the alloca type, and that type isn't
// a single value type, just emit a memset.		// a single value type, just emit a memset.
if (!CanContinue) {		if (!CanContinue) {
Type *SizeTy = II.getLength()->getType();		Type *SizeTy = II.getLength()->getType();
Constant *Size = ConstantInt::get(SizeTy, NewEndOffset - NewBeginOffset);		Constant *Size = ConstantInt::get(SizeTy, NewEndOffset - NewBeginOffset);
CallInst *New = IRB.CreateMemSet(		CallInst *New = IRB.CreateMemSet(
[... 17 unchanged lines collapsed ...]	if (VecTy) {
assert(ElementTy == ScalarTy);		assert(ElementTy == ScalarTy);

unsigned BeginIndex = getIndex(NewBeginOffset);		unsigned BeginIndex = getIndex(NewBeginOffset);
unsigned EndIndex = getIndex(NewEndOffset);		unsigned EndIndex = getIndex(NewEndOffset);
assert(EndIndex > BeginIndex && "Empty vector!");		assert(EndIndex > BeginIndex && "Empty vector!");
unsigned NumElements = EndIndex - BeginIndex;		unsigned NumElements = EndIndex - BeginIndex;
assert(NumElements <= VecTy->getNumElements() && "Too many elements!");		assert(NumElements <= VecTy->getNumElements() && "Too many elements!");

Value *Splat =		Value *Splat = getIntegerSplat(
getIntegerSplat(II.getValue(), DL.getTypeSizeInBits(ElementTy) / 8);		II.getValue(), DL.getTypeSizeInBits(ElementTy).getFixedSize() / 8);
Splat = convertValue(DL, IRB, Splat, ElementTy);		Splat = convertValue(DL, IRB, Splat, ElementTy);
if (NumElements > 1)		if (NumElements > 1)
Splat = getVectorSplat(Splat, NumElements);		Splat = getVectorSplat(Splat, NumElements);

Value *Old = IRB.CreateAlignedLoad(NewAI.getAllocatedType(), &NewAI,		Value *Old = IRB.CreateAlignedLoad(NewAI.getAllocatedType(), &NewAI,
NewAI.getAlign(), "oldload");		NewAI.getAlign(), "oldload");
V = insertVector(IRB, Old, Splat, BeginIndex, "vec");		V = insertVector(IRB, Old, Splat, BeginIndex, "vec");
} else if (IntTy) {		} else if (IntTy) {
[... 16 unchanged lines collapsed ...]	if (VecTy) {
"Wrong type for an alloca wide integer!");		"Wrong type for an alloca wide integer!");
}		}
V = convertValue(DL, IRB, V, AllocaTy);		V = convertValue(DL, IRB, V, AllocaTy);
} else {		} else {
// Established these invariants above.		// Established these invariants above.
assert(NewBeginOffset == NewAllocaBeginOffset);		assert(NewBeginOffset == NewAllocaBeginOffset);
assert(NewEndOffset == NewAllocaEndOffset);		assert(NewEndOffset == NewAllocaEndOffset);

V = getIntegerSplat(II.getValue(), DL.getTypeSizeInBits(ScalarTy) / 8);		V = getIntegerSplat(II.getValue(),
		DL.getTypeSizeInBits(ScalarTy).getFixedSize() / 8);
if (VectorType *AllocaVecTy = dyn_cast<VectorType>(AllocaTy))		if (VectorType *AllocaVecTy = dyn_cast<VectorType>(AllocaTy))
V = getVectorSplat(V, AllocaVecTy->getNumElements());		V = getVectorSplat(V, AllocaVecTy->getNumElements());

V = convertValue(DL, IRB, V, AllocaTy);		V = convertValue(DL, IRB, V, AllocaTy);
}		}

StoreInst *New =		StoreInst *New =
IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlign(), II.isVolatile());		IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlign(), II.isVolatile());
[... 46 unchanged lines collapsed ...]	bool visitMemTransferInst(MemTransferInst &II) {
// memmove with memcpy, and we don't need to worry about all manner of		// memmove with memcpy, and we don't need to worry about all manner of
// downsides to splitting and transforming the operations.		// downsides to splitting and transforming the operations.

// If this doesn't map cleanly onto the alloca type, and that type isn't		// If this doesn't map cleanly onto the alloca type, and that type isn't
// a single value type, just emit a memcpy.		// a single value type, just emit a memcpy.
bool EmitMemCpy =		bool EmitMemCpy =
!VecTy && !IntTy &&		!VecTy && !IntTy &&
(BeginOffset > NewAllocaBeginOffset || EndOffset < NewAllocaEndOffset ||		(BeginOffset > NewAllocaBeginOffset || EndOffset < NewAllocaEndOffset ||
SliceSize != DL.getTypeStoreSize(NewAI.getAllocatedType()) ||		SliceSize !=
		DL.getTypeStoreSize(NewAI.getAllocatedType()).getFixedSize() ||
!NewAI.getAllocatedType()->isSingleValueType());		!NewAI.getAllocatedType()->isSingleValueType());

// If we're just going to emit a memcpy, the alloca hasn't changed, and the		// If we're just going to emit a memcpy, the alloca hasn't changed, and the
// size hasn't been shrunk based on analysis of the viable range, this is		// size hasn't been shrunk based on analysis of the viable range, this is
// a no-op.		// a no-op.
if (EmitMemCpy && &OldAI == &NewAI) {		if (EmitMemCpy && &OldAI == &NewAI) {
// Ensure the start lines up.		// Ensure the start lines up.
assert(NewBeginOffset == BeginOffset);		assert(NewBeginOffset == BeginOffset);
[... 529 unchanged lines collapsed ...]
///		///
/// This removes no-op aggregate types wrapping an underlying type. It will		/// This removes no-op aggregate types wrapping an underlying type. It will
/// strip as many layers of types as it can without changing either the type		/// strip as many layers of types as it can without changing either the type
/// size or the allocated size.		/// size or the allocated size.
static Type *stripAggregateTypeWrapping(const DataLayout &DL, Type *Ty) {		static Type *stripAggregateTypeWrapping(const DataLayout &DL, Type *Ty) {
if (Ty->isSingleValueType())		if (Ty->isSingleValueType())
return Ty;		return Ty;

uint64_t AllocSize = DL.getTypeAllocSize(Ty);		uint64_t AllocSize = DL.getTypeAllocSize(Ty).getFixedSize();
uint64_t TypeSize = DL.getTypeSizeInBits(Ty);		uint64_t TypeSize = DL.getTypeSizeInBits(Ty).getFixedSize();

Type *InnerTy;		Type *InnerTy;
if (ArrayType *ArrTy = dyn_cast<ArrayType>(Ty)) {		if (ArrayType *ArrTy = dyn_cast<ArrayType>(Ty)) {
InnerTy = ArrTy->getElementType();		InnerTy = ArrTy->getElementType();
} else if (StructType *STy = dyn_cast<StructType>(Ty)) {		} else if (StructType *STy = dyn_cast<StructType>(Ty)) {
const StructLayout *SL = DL.getStructLayout(STy);		const StructLayout *SL = DL.getStructLayout(STy);
unsigned Index = SL->getElementContainingOffset(0);		unsigned Index = SL->getElementContainingOffset(0);
InnerTy = STy->getElementType(Index);		InnerTy = STy->getElementType(Index);
} else {		} else {
return Ty;		return Ty;
}		}

if (AllocSize > DL.getTypeAllocSize(InnerTy) ||		if (AllocSize > DL.getTypeAllocSize(InnerTy).getFixedSize() ||
TypeSize > DL.getTypeSizeInBits(InnerTy))		TypeSize > DL.getTypeSizeInBits(InnerTy).getFixedSize())
return Ty;		return Ty;

return stripAggregateTypeWrapping(DL, InnerTy);		return stripAggregateTypeWrapping(DL, InnerTy);
}		}

/// Try to find a partition of the aggregate type passed in for a given		/// Try to find a partition of the aggregate type passed in for a given
/// offset and size.		/// offset and size.
///		///
/// This recurses through the aggregate type and tries to compute a subtype		/// This recurses through the aggregate type and tries to compute a subtype
/// based on the offset and size. When the offset and size span a sub-section		/// based on the offset and size. When the offset and size span a sub-section
/// of an array, it will even compute a new array type for that sub-section,		/// of an array, it will even compute a new array type for that sub-section,
/// and the same for structs.		/// and the same for structs.
///		///
/// Note that this routine is very strict and tries to find a partition of the		/// Note that this routine is very strict and tries to find a partition of the
/// type which produces the exact right offset and size. It is not forgiving		/// type which produces the exact right offset and size. It is not forgiving
/// when the size or offset cause either end of type-based partition to be off.		/// when the size or offset cause either end of type-based partition to be off.
/// Also, this is a best-effort routine. It is reasonable to give up and not		/// Also, this is a best-effort routine. It is reasonable to give up and not
/// return a type if necessary.		/// return a type if necessary.
static Type *getTypePartition(const DataLayout &DL, Type *Ty, uint64_t Offset,		static Type *getTypePartition(const DataLayout &DL, Type *Ty, uint64_t Offset,
uint64_t Size) {		uint64_t Size) {
if (Offset == 0 && DL.getTypeAllocSize(Ty) == Size)		if (Offset == 0 && DL.getTypeAllocSize(Ty).getFixedSize() == Size)
return stripAggregateTypeWrapping(DL, Ty);		return stripAggregateTypeWrapping(DL, Ty);
if (Offset > DL.getTypeAllocSize(Ty) ||		if (Offset > DL.getTypeAllocSize(Ty).getFixedSize() ||
(DL.getTypeAllocSize(Ty) - Offset) < Size)		(DL.getTypeAllocSize(Ty).getFixedSize() - Offset) < Size)
return nullptr;		return nullptr;

if (SequentialType *SeqTy = dyn_cast<SequentialType>(Ty)) {		if (SequentialType *SeqTy = dyn_cast<SequentialType>(Ty)) {
Type *ElementTy = SeqTy->getElementType();		Type *ElementTy = SeqTy->getElementType();
uint64_t ElementSize = DL.getTypeAllocSize(ElementTy);		uint64_t ElementSize = DL.getTypeAllocSize(ElementTy).getFixedSize();
uint64_t NumSkippedElements = Offset / ElementSize;		uint64_t NumSkippedElements = Offset / ElementSize;
if (NumSkippedElements >= SeqTy->getNumElements())		if (NumSkippedElements >= SeqTy->getNumElements())
return nullptr;		return nullptr;
Offset -= NumSkippedElements * ElementSize;		Offset -= NumSkippedElements * ElementSize;

// First check if we need to recurse.		// First check if we need to recurse.
if (Offset > 0 || Size < ElementSize) {		if (Offset > 0 || Size < ElementSize) {
// Bail if the partition ends in a different array element.		// Bail if the partition ends in a different array element.
[... 23 unchanged lines collapsed ...]	static Type *getTypePartition(const DataLayout &DL, Type *Ty, uint64_t Offset,
uint64_t EndOffset = Offset + Size;		uint64_t EndOffset = Offset + Size;
if (EndOffset > SL->getSizeInBytes())		if (EndOffset > SL->getSizeInBytes())
return nullptr;		return nullptr;

unsigned Index = SL->getElementContainingOffset(Offset);		unsigned Index = SL->getElementContainingOffset(Offset);
Offset -= SL->getElementOffset(Index);		Offset -= SL->getElementOffset(Index);

Type *ElementTy = STy->getElementType(Index);		Type *ElementTy = STy->getElementType(Index);
uint64_t ElementSize = DL.getTypeAllocSize(ElementTy);		uint64_t ElementSize = DL.getTypeAllocSize(ElementTy).getFixedSize();
if (Offset >= ElementSize)		if (Offset >= ElementSize)
return nullptr; // The offset points into alignment padding.		return nullptr; // The offset points into alignment padding.

// See if any partition must be contained by the element.		// See if any partition must be contained by the element.
if (Offset > 0 || Size < ElementSize) {		if (Offset > 0 || Size < ElementSize) {
if ((Offset + Size) > ElementSize)		if ((Offset + Size) > ElementSize)
return nullptr;		return nullptr;
return getTypePartition(DL, ElementTy, Offset, Size);		return getTypePartition(DL, ElementTy, Offset, Size);
[... 551 unchanged lines collapsed ...]
AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,		AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,
Partition &P) {		Partition &P) {
// Try to compute a friendly type for this partition of the alloca. This		// Try to compute a friendly type for this partition of the alloca. This
// won't always succeed, in which case we fall back to a legal integer type		// won't always succeed, in which case we fall back to a legal integer type
// or an i8 array of an appropriate size.		// or an i8 array of an appropriate size.
Type *SliceTy = nullptr;		Type *SliceTy = nullptr;
const DataLayout &DL = AI.getModule()->getDataLayout();		const DataLayout &DL = AI.getModule()->getDataLayout();
if (Type *CommonUseTy = findCommonType(P.begin(), P.end(), P.endOffset()))		if (Type *CommonUseTy = findCommonType(P.begin(), P.end(), P.endOffset()))
if (DL.getTypeAllocSize(CommonUseTy) >= P.size())		if (DL.getTypeAllocSize(CommonUseTy).getFixedSize() >= P.size())
SliceTy = CommonUseTy;		SliceTy = CommonUseTy;
if (!SliceTy)		if (!SliceTy)
if (Type *TypePartitionTy = getTypePartition(DL, AI.getAllocatedType(),		if (Type *TypePartitionTy = getTypePartition(DL, AI.getAllocatedType(),
P.beginOffset(), P.size()))		P.beginOffset(), P.size()))
SliceTy = TypePartitionTy;		SliceTy = TypePartitionTy;
if ((!SliceTy || (SliceTy->isArrayTy() &&		if ((!SliceTy || (SliceTy->isArrayTy() &&
SliceTy->getArrayElementType()->isIntegerTy())) &&		SliceTy->getArrayElementType()->isIntegerTy())) &&
DL.isLegalInteger(P.size() * 8))		DL.isLegalInteger(P.size() * 8))
SliceTy = Type::getIntNTy(*C, P.size() * 8);		SliceTy = Type::getIntNTy(*C, P.size() * 8);
if (!SliceTy)		if (!SliceTy)
SliceTy = ArrayType::get(Type::getInt8Ty(*C), P.size());		SliceTy = ArrayType::get(Type::getInt8Ty(*C), P.size());
assert(DL.getTypeAllocSize(SliceTy) >= P.size());		assert(DL.getTypeAllocSize(SliceTy).getFixedSize() >= P.size());

bool IsIntegerPromotable = isIntegerWideningViable(P, SliceTy, DL);		bool IsIntegerPromotable = isIntegerWideningViable(P, SliceTy, DL);

VectorType *VecTy =		VectorType *VecTy =
IsIntegerPromotable ? nullptr : isVectorPromotionViable(P, DL);		IsIntegerPromotable ? nullptr : isVectorPromotionViable(P, DL);
if (VecTy)		if (VecTy)
SliceTy = VecTy;		SliceTy = VecTy;

... 124 lines not shown ...	bool SROA::splitAlloca(AllocaInst &AI, AllocaSlices &AS) {
// Now that we have identified any pre-splitting opportunities,		// Now that we have identified any pre-splitting opportunities,
// mark loads and stores unsplittable except for the following case.		// mark loads and stores unsplittable except for the following case.
// We leave a slice splittable if all other slices are disjoint or fully		// We leave a slice splittable if all other slices are disjoint or fully
// included in the slice, such as whole-alloca loads and stores.		// included in the slice, such as whole-alloca loads and stores.
// If we fail to split these during pre-splitting, we want to force them		// If we fail to split these during pre-splitting, we want to force them
// to be rewritten into a partition.		// to be rewritten into a partition.
bool IsSorted = true;		bool IsSorted = true;

uint64_t AllocaSize = DL.getTypeAllocSize(AI.getAllocatedType());		uint64_t AllocaSize =
		DL.getTypeAllocSize(AI.getAllocatedType()).getFixedSize();
const uint64_t MaxBitVectorSize = 1024;		const uint64_t MaxBitVectorSize = 1024;
if (AllocaSize <= MaxBitVectorSize) {		if (AllocaSize <= MaxBitVectorSize) {
// If a byte boundary is included in any load or store, a slice starting or		// If a byte boundary is included in any load or store, a slice starting or
// ending at the boundary is not splittable.		// ending at the boundary is not splittable.
SmallBitVector SplittableOffset(AllocaSize + 1, true);		SmallBitVector SplittableOffset(AllocaSize + 1, true);
for (Slice &S : AS)		for (Slice &S : AS)
for (unsigned O = S.beginOffset() + 1;		for (unsigned O = S.beginOffset() + 1;
O < S.endOffset() && O < AllocaSize; O++)		O < S.endOffset() && O < AllocaSize; O++)
... 47 lines not shown ...	bool SROA::splitAlloca(AllocaInst &AI, AllocaSlices &AS) {
SmallVector<Fragment, 4> Fragments;		SmallVector<Fragment, 4> Fragments;

// Rewrite each partition.		// Rewrite each partition.
for (auto &P : AS.partitions()) {		for (auto &P : AS.partitions()) {
if (AllocaInst *NewAI = rewritePartition(AI, AS, P)) {		if (AllocaInst *NewAI = rewritePartition(AI, AS, P)) {
Changed = true;		Changed = true;
if (NewAI != &AI) {		if (NewAI != &AI) {
uint64_t SizeOfByte = 8;		uint64_t SizeOfByte = 8;
uint64_t AllocaSize = DL.getTypeSizeInBits(NewAI->getAllocatedType());		uint64_t AllocaSize =
		DL.getTypeSizeInBits(NewAI->getAllocatedType()).getFixedSize();
// Don't include any padding.		// Don't include any padding.
uint64_t Size = std::min(AllocaSize, P.size() * SizeOfByte);		uint64_t Size = std::min(AllocaSize, P.size() * SizeOfByte);
Fragments.push_back(Fragment(NewAI, P.beginOffset() * SizeOfByte, Size));		Fragments.push_back(Fragment(NewAI, P.beginOffset() * SizeOfByte, Size));
}		}
}		}
++NumPartitions;		++NumPartitions;
}		}

NumAllocaPartitions += NumPartitions;		NumAllocaPartitions += NumPartitions;
MaxPartitionsPerAlloca.updateMax(NumPartitions);		MaxPartitionsPerAlloca.updateMax(NumPartitions);

// Migrate debug information from the old alloca to the new alloca(s)		// Migrate debug information from the old alloca to the new alloca(s)
// and the individual partitions.		// and the individual partitions.
TinyPtrVector<DbgVariableIntrinsic *> DbgDeclares = FindDbgAddrUses(&AI);		TinyPtrVector<DbgVariableIntrinsic *> DbgDeclares = FindDbgAddrUses(&AI);
if (!DbgDeclares.empty()) {		if (!DbgDeclares.empty()) {
auto *Var = DbgDeclares.front()->getVariable();		auto *Var = DbgDeclares.front()->getVariable();
auto *Expr = DbgDeclares.front()->getExpression();		auto *Expr = DbgDeclares.front()->getExpression();
auto VarSize = Var->getSizeInBits();		auto VarSize = Var->getSizeInBits();
DIBuilder DIB(AI.getModule(), /*AllowUnresolved*/ false);		DIBuilder DIB(AI.getModule(), /*AllowUnresolved*/ false);
uint64_t AllocaSize = DL.getTypeSizeInBits(AI.getAllocatedType());		uint64_t AllocaSize =
		DL.getTypeSizeInBits(AI.getAllocatedType()).getFixedSize();
for (auto Fragment : Fragments) {		for (auto Fragment : Fragments) {
// Create a fragment expression describing the new partition or reuse AI's		// Create a fragment expression describing the new partition or reuse AI's
// expression if there is only one partition.		// expression if there is only one partition.
auto *FragmentExpr = Expr;		auto *FragmentExpr = Expr;
if (Fragment.Size < AllocaSize || Expr->isFragment()) {		if (Fragment.Size < AllocaSize || Expr->isFragment()) {
// If this alloca is already a scalar replacement of a larger aggregate,		// If this alloca is already a scalar replacement of a larger aggregate,
// Fragment.Offset describes the offset inside the scalar.		// Fragment.Offset describes the offset inside the scalar.
auto ExprFragment = Expr->getFragmentInfo();		auto ExprFragment = Expr->getFragmentInfo();
... 71 lines not shown ...	bool SROA::runOnAlloca(AllocaInst &AI) {
// Special case dead allocas, as they're trivial.		// Special case dead allocas, as they're trivial.
if (AI.use_empty()) {		if (AI.use_empty()) {
AI.eraseFromParent();		AI.eraseFromParent();
return true;		return true;
}		}
const DataLayout &DL = AI.getModule()->getDataLayout();		const DataLayout &DL = AI.getModule()->getDataLayout();

// Skip alloca forms that this analysis can't handle.		// Skip alloca forms that this analysis can't handle.
if (AI.isArrayAllocation() || !AI.getAllocatedType()->isSized() ||		auto *AT = AI.getAllocatedType();
DL.getTypeAllocSize(AI.getAllocatedType()) == 0)		if (AI.isArrayAllocation() || !AT->isSized() ||
		sdesmalen (inline comment, not done): If you base your patch on D76748, you can use `DL.getTypeAllocSize(AI.getAllocatedType()).isZero()`.
		c-rhodes (author, inline comment, not done): Ah nice, thanks for pointing that out, I'll update this.
		(isa<VectorType>(AT) && cast<VectorType>(AT)->isScalable()) ||
		DL.getTypeAllocSize(AT).getFixedSize() == 0)
		ctetreau (inline comment, done): This can be rewritten: `{ auto *AT = AI.getAllocatedType(); if (AI.isArrayAllocation() || !AT->isSized() || (isa<VectorType>(AT) && cast<VectorType>(AT)->isScalable()) || DL.getTypeAllocSize(AT).getFixedSize() == 0) return false; }` `AI.getAllocatedType` is used 3 times, might as well give it a name. And while `isa<VectorType>(AT) && cast<VectorType>(AT)->isScalable()` is a little longer than `AT.isScalableVectorTy`, it's not that bad. On the positive side, it's more explicit as to what it's doing, and it's also less misleading because there's no such thing as a ScalableVectorTy.
return false;		return false;

bool Changed = false;		bool Changed = false;

// First, split any FCA loads and stores touching this alloca to promote		// First, split any FCA loads and stores touching this alloca to promote
// better splitting and promotion opportunities.		// better splitting and promotion opportunities.
AggLoadStoreRewriter AggRewriter(DL);		AggLoadStoreRewriter AggRewriter(DL);
Changed |= AggRewriter.rewrite(AI);		Changed |= AggRewriter.rewrite(AI);
... 103 lines not shown ...	PreservedAnalyses SROA::runImpl(Function &F, DominatorTree &RunDT,
LLVM_DEBUG(dbgs() << "SROA function: " << F.getName() << "\n");		LLVM_DEBUG(dbgs() << "SROA function: " << F.getName() << "\n");
C = &F.getContext();		C = &F.getContext();
DT = &RunDT;		DT = &RunDT;
AC = &RunAC;		AC = &RunAC;

BasicBlock &EntryBB = F.getEntryBlock();		BasicBlock &EntryBB = F.getEntryBlock();
for (BasicBlock::iterator I = EntryBB.begin(), E = std::prev(EntryBB.end());		for (BasicBlock::iterator I = EntryBB.begin(), E = std::prev(EntryBB.end());
I != E; ++I) {		I != E; ++I) {
if (AllocaInst *AI = dyn_cast<AllocaInst>(I))		if (AllocaInst *AI = dyn_cast<AllocaInst>(I)) {
		if (isa<VectorType>(AI->getAllocatedType()) &&
		ctetreau (inline comment, done): `if (isa<VectorType>(AI->getAllocatedType()) && cast<VectorType>(AI->getAllocatedType())->isScalable() && isAllocaPromotable(AI))` Same as above.
		cast<VectorType>(AI->getAllocatedType())->isScalable()) {
		if (isAllocaPromotable(AI))
		efriedma (inline comment, done): This looks weird; did you mean to write something like this? `if (AllocaInst *AI = dyn_cast<AllocaInst>(I)) { if (isa<VectorType>(AI->getAllocatedType()) && cast<VectorType>(AI->getAllocatedType())->isScalable()) { if (isAllocaPromotable(AI)) PromotableAllocas.push_back(AI); } else { Worklist.insert(AI); } }`
		c-rhodes (author, inline comment, not done): Oops, yes I did! Good spot, we don't want allocas with scalable types that aren't promotable added to the worklist, as `runOnAlloca` will blow up. Cheers, I'll fix this.
		PromotableAllocas.push_back(AI);
		} else {
Worklist.insert(AI);		Worklist.insert(AI);
}		}
		}
		}

bool Changed = false;		bool Changed = false;
// A set of deleted alloca instruction pointers which should be removed from		// A set of deleted alloca instruction pointers which should be removed from
// the list of promotable allocas.		// the list of promotable allocas.
SmallPtrSet<AllocaInst *, 4> DeletedAllocas;		SmallPtrSet<AllocaInst *, 4> DeletedAllocas;

do {		do {
while (!Worklist.empty()) {		while (!Worklist.empty()) {
... 80 lines not shown ...
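For readers unfamiliar with the fixed-versus-scalable distinction that the `getFixedSize()` calls in this diff make explicit, here is a minimal standalone sketch. It does not use LLVM: `ModelTypeSize` and `coversPartition` are hypothetical names that only loosely mirror the behaviour of `TypeSize` as described in this review (a size that is either an exact value or a minimum to be multiplied by the runtime `vscale`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a type size; this is NOT llvm::TypeSize.
struct ModelTypeSize {
  uint64_t MinSize; // exact size if fixed, known minimum if scalable
  bool Scalable;    // true if the real size is MinSize * vscale

  // Only meaningful for fixed sizes, so asking a scalable size for a
  // fixed value is treated as a programming error.
  uint64_t getFixedSize() const {
    assert(!Scalable && "request for a fixed size on a scalable type");
    return MinSize;
  }

  // A zero check is valid for both kinds of size, which is why a
  // dedicated isZero() avoids the need for getFixedSize() == 0.
  bool isZero() const { return MinSize == 0; }
};

// Hypothetical helper modelled on the partition-size comparison in
// rewritePartition: the comparison is only well defined for fixed sizes.
inline bool coversPartition(ModelTypeSize AllocSize, uint64_t PartitionBytes) {
  return AllocSize.getFixedSize() >= PartitionBytes;
}
```

Under this model, calling `coversPartition` with a scalable size trips the assertion, which is the reason scalable allocas have to be filtered out before any of the slicing code runs.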

llvm/test/Transforms/SROA/scalable-vectors.ll

This file was added.

				; RUN: opt < %s -sroa -S | FileCheck %s
				; RUN: opt < %s -passes=sroa -S | FileCheck %s

				; This test checks that SROA runs mem2reg on scalable vectors.

				define <vscale x 16 x i1> @alloca_nxv16i1(<vscale x 16 x i1> %pg) {
				; CHECK-LABEL: alloca_nxv16i1
				; CHECK-NEXT: ret <vscale x 16 x i1> %pg
				%pg.addr = alloca <vscale x 16 x i1>
				store <vscale x 16 x i1> %pg, <vscale x 16 x i1>* %pg.addr
				%1 = load <vscale x 16 x i1>, <vscale x 16 x i1>* %pg.addr
				ret <vscale x 16 x i1> %1
				}

				define <vscale x 16 x i8> @alloca_nxv16i8(<vscale x 16 x i8> %vec) {
				; CHECK-LABEL: alloca_nxv16i8
				; CHECK-NEXT: ret <vscale x 16 x i8> %vec
				%vec.addr = alloca <vscale x 16 x i8>
				store <vscale x 16 x i8> %vec, <vscale x 16 x i8>* %vec.addr
				%1 = load <vscale x 16 x i8>, <vscale x 16 x i8>* %vec.addr
				ret <vscale x 16 x i8> %1
				}

				; Test scalable alloca that can't be promoted. Mem2Reg only considers
				; non-volatile loads and stores for promotion.
				define <vscale x 16 x i8> @unpromotable_alloca(<vscale x 16 x i8> %vec) {
				; CHECK-LABEL: unpromotable_alloca
				; CHECK-NEXT: %vec.addr = alloca <vscale x 16 x i8>
				; CHECK-NEXT: store volatile <vscale x 16 x i8> %vec, <vscale x 16 x i8>* %vec.addr
				; CHECK-NEXT: %1 = load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %vec.addr
				; CHECK-NEXT: ret <vscale x 16 x i8> %1
				%vec.addr = alloca <vscale x 16 x i8>
				store volatile <vscale x 16 x i8> %vec, <vscale x 16 x i8>* %vec.addr
				%1 = load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %vec.addr
				ret <vscale x 16 x i8> %1
				}
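The three test functions above exercise the dispatch that efriedma suggested in the `runImpl` inline comment. The sketch below models that dispatch without LLVM, as an illustration only: `ModelAlloca`, `ModelQueues` and `classify` are hypothetical names, and the two boolean fields stand in for the `isScalable()` and `isAllocaPromotable(AI)` queries.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an alloca; the flags replace real LLVM queries.
struct ModelAlloca {
  std::string Name;
  bool IsScalableVector; // stands in for cast<VectorType>(...)->isScalable()
  bool IsPromotable;     // stands in for isAllocaPromotable(AI)
};

struct ModelQueues {
  std::vector<std::string> PromotableAllocas; // promoted directly by mem2reg
  std::vector<std::string> Worklist;          // handled by the rest of SROA
};

// Scalable allocas go to mem2reg only if promotable and are otherwise
// skipped entirely; every other alloca goes to the worklist.
inline ModelQueues classify(const std::vector<ModelAlloca> &Allocas) {
  ModelQueues Q;
  for (const ModelAlloca &AI : Allocas) {
    if (AI.IsScalableVector) {
      if (AI.IsPromotable)
        Q.PromotableAllocas.push_back(AI.Name);
      // Unpromotable scalable allocas are deliberately left untouched.
    } else {
      Q.Worklist.push_back(AI.Name);
    }
  }
  return Q;
}
```

In this model the volatile case from `unpromotable_alloca` corresponds to `IsScalableVector = true, IsPromotable = false`, which lands in neither queue. That matches the test's expectation that the alloca, store and load are left in place.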


Revision Contents

Diff 254155

llvm/lib/Transforms/Scalar/SROA.cpp

llvm/test/Transforms/SROA/scalable-vectors.ll
