
Details

Reviewers

majnemer
eli.friedman
Bigcheese

Summary

Removed the alias set tracker and added a separate load aliasing check for each load chain.

Diff Detail

Event Timeline

rriddle updated this revision to Diff 61175. Jun 18 2016, 3:12 PM

rriddle retitled this revision from to LoadCombine BugFixes : Combine negative index GEPS and fix load aliasing.

rriddle updated this object.

rriddle added a reviewer: Bigcheese.

rriddle added a subscriber: llvm-commits.

Please add testcases to the tests directory. Also, please split out each independent change. Each change is easier to analyze in isolation.

This revision now requires changes to proceed. Jun 18 2016, 3:52 PM

Removed the other patch fixes and added a test to the test suite

eli.friedman added a subscriber: eli.friedman. Jun 18 2016, 6:24 PM

eli.friedman added inline comments.

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13	What transform is this testcase checking for? I don't think you can prove any useful aliasing property between %i, %i1, and %str?

rriddle added inline comments. Jun 18 2016, 6:29 PM

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13	In the current implementation, %1 is aliased by that store, which causes every load chain to try to combine; this prevents the load %0 from being combined with %2, %4, etc.

eli.friedman added inline comments. Jun 18 2016, 7:02 PM

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13	I'm not sure I follow... is the transform you're trying to perform something like this?

void f(int* a, int* b) {
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  // Combine to "memcpy(a, b, 16)"???
}

rriddle added inline comments. Jun 18 2016, 7:12 PM

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll

No, the point of the transform is to combine the individual loads into a single large one. In this specific test, there are multiple GEP + load pairs coming from %i. They combine down to a single i128 load. It may be easier to see with this output:

define i32 @Load_MultiChain(i32* %i) {
  %1 = getelementptr inbounds i32, i32* %i, i64 1
  %2 = bitcast i32* %i to i8*
  %3 = getelementptr i8, i8* %2, i64 0
  %4 = bitcast i8* %3 to i128*
  %.combined = load i128, i128* %4, align 4
  %combine.extract.shift = lshr i128 %.combined, 32
  %combine.extract.trunc1 = trunc i128 %combine.extract.shift to i32
  %5 = load i32, i32* %1, align 4
  %combine.extract.trunc = trunc i128 %.combined to i32
  %6 = load i32, i32* %i, align 4
  %7 = getelementptr inbounds i32, i32* %i, i64 2
  %combine.extract.shift2 = lshr i128 %.combined, 64
  %combine.extract.trunc3 = trunc i128 %combine.extract.shift2 to i32
  %8 = load i32, i32* %7, align 4
  %9 = getelementptr inbounds i32, i32* %i, i64 3
  %combine.extract.shift4 = lshr i128 %.combined, 96
  %combine.extract.trunc5 = trunc i128 %combine.extract.shift4 to i32
  %10 = load i32, i32* %9, align 4
  %11 = getelementptr inbounds i32, i32* %i, i64 4
  %12 = load i32, i32* %11, align 4
  %13 = getelementptr inbounds i32, i32* %i, i64 5
  %14 = load i32, i32* %13, align 4
  %15 = add nsw i32 %combine.extract.trunc, %14
  ret i32 %15
}

In the output above, the loads that originate from %i have been combined into a single i128 load. This test is not about producing the optimal result; it checks that a store which aliases one load chain does not affect the other load chains in the basic block. The current implementation is too conservative when a write aliases a load: when it reaches such an instruction, it tries to combine all of the load chains at that point. Flushing every chain at each such store reduces the number of loads that can be combined.
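The lshr + trunc pairs in the output above recover each narrow value from the wide load. A minimal C++ sketch of that pattern follows, using a 64-bit value built from two 32-bit elements rather than an i128. The names loadCombined and extractElement are hypothetical, and the sketch assumes a little-endian host, matching the "e-" datalayout used by the test.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models one combined load: read 8 bytes starting at Base in a single access.
// Assumption: little-endian host, so element 0 lands in the low 32 bits.
std::uint64_t loadCombined(const std::uint32_t *Base) {
  std::uint64_t Wide;
  std::memcpy(&Wide, Base, sizeof(Wide));
  return Wide;
}

// Models "lshr" followed by "trunc": recover the 32-bit element that lives
// ByteOffset bytes into the combined value.
std::uint32_t extractElement(std::uint64_t Wide, unsigned ByteOffset) {
  assert(ByteOffset + sizeof(std::uint32_t) <= sizeof(Wide) && "out of range");
  return static_cast<std::uint32_t>(Wide >> (ByteOffset * 8));
}
```

Calling extractElement(Wide, 0) and extractElement(Wide, 4) corresponds to the trunc of %.combined and to the lshr by 32 followed by trunc, respectively.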

rriddle added inline comments. Jun 18 2016, 7:17 PM

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
13	Line 12 makes more sense when you visualize the load chains at that point. I will write the load chains in the format { base_ptr : loads, by instruction name }. At line 12 the load chains are: { %i : %0 } { %str : %1 }. The store instruction at line 12 stores into %i1, which aliases the base pointer %str. This causes every load chain to try to combine, but at this point each chain has only one load, so no combine is performed. The problem is that the chain with base pointer %i was not referenced by that store, so there is no reason to combine that chain at this point.
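The difference between flushing every chain and flushing only the aliased chain can be sketched as a small C++ model. This is an illustration only, not the pass itself: Chain, ChainMap, ModSet, recordLoad and flushOnStore are hypothetical names, and the set of base pointers a store may modify stands in for a real alias query.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Chain = std::vector<std::string>;        // loads in program order
using ChainMap = std::map<std::string, Chain>; // base pointer -> load chain
using ModSet = std::set<std::string>;          // base pointers a store may modify

// Appends a load to the chain that belongs to its base pointer.
void recordLoad(ChainMap &Chains, const std::string &Base,
                const std::string &Load) {
  assert(!Base.empty() && !Load.empty() && "names must be non-empty");
  Chains[Base].push_back(Load);
}

// Closes chains when a store is reached and returns the closed chains.
// FlushAll = true models the conservative behaviour (every chain is closed);
// FlushAll = false closes only the chains the store may modify.
std::vector<Chain> flushOnStore(ChainMap &Chains, const ModSet &MayModify,
                                bool FlushAll) {
  std::vector<Chain> Closed;
  for (auto It = Chains.begin(); It != Chains.end();) {
    if (FlushAll || MayModify.count(It->first)) {
      Closed.push_back(It->second);
      It = Chains.erase(It); // erase returns the next valid iterator
    } else {
      ++It;
    }
  }
  return Closed;
}
```

With the chains { %i : %0 } and { %str : %1 } and a store that may modify only %str, the per-chain variant leaves the %i chain open, so a later load from %i can still join it.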

eli.friedman added inline comments. Jun 18 2016, 7:25 PM

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll

The problem isn't whether it's optimal... the problem is that the transformation doesn't preserve the semantics of the code. For my C code, -load-combine (plus instcombine to clean up the result) gives:

define void @f(i32* nocapture %a, i32* nocapture readonly %b) local_unnamed_addr #0 {
entry:
  %0 = bitcast i32* %b to i128*
  %.combined = load i128, i128* %0, align 4
  %combine.extract.trunc = trunc i128 %.combined to i32
  store i32 %combine.extract.trunc, i32* %a, align 4, !tbaa !1
  %combine.extract.shift = lshr i128 %.combined, 32
  %combine.extract.trunc1 = trunc i128 %combine.extract.shift to i32
  %arrayidx3 = getelementptr inbounds i32, i32* %a, i64 1
  store i32 %combine.extract.trunc1, i32* %arrayidx3, align 4, !tbaa !1
  %combine.extract.shift2 = lshr i128 %.combined, 64
  %combine.extract.trunc3 = trunc i128 %combine.extract.shift2 to i32
  %arrayidx5 = getelementptr inbounds i32, i32* %a, i64 2
  store i32 %combine.extract.trunc3, i32* %arrayidx5, align 4, !tbaa !1
  %combine.extract.shift4 = lshr i128 %.combined, 96
  %combine.extract.trunc5 = trunc i128 %combine.extract.shift4 to i32
  %arrayidx7 = getelementptr inbounds i32, i32* %a, i64 3
  store i32 %combine.extract.trunc5, i32* %arrayidx7, align 4, !tbaa !1
  ret void
}

This is completely, utterly wrong.
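A minimal C++ sketch of why this output does not preserve semantics when a and b may overlap: the combined form reads all of b before any store to a, while the original interleaves loads and stores. The function names copyInOrder and copyLoadsFirst are hypothetical, and the second function only models the effect of the combined load; it is not the pass's output.

```cpp
#include <cassert>
#include <cstdint>

// Original program order: each load happens after the preceding store.
void copyInOrder(std::int32_t *A, const std::int32_t *B) {
  assert(A && B && "pointers must be non-null");
  A[0] = B[0];
  A[1] = B[1];
  A[2] = B[2];
  A[3] = B[3];
}

// Models the combined form: all four loads are performed before any store.
void copyLoadsFirst(std::int32_t *A, const std::int32_t *B) {
  assert(A && B && "pointers must be non-null");
  const std::int32_t V0 = B[0], V1 = B[1], V2 = B[2], V3 = B[3];
  A[0] = V0;
  A[1] = V1;
  A[2] = V2;
  A[3] = V3;
}
```

For a five-element buffer {1, 2, 3, 4, 5} with A pointing one element past B, copyInOrder propagates the first element through the buffer, while copyLoadsFirst shifts the original values by one slot. The two agree only when the pointers are known not to overlap.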

rriddle abandoned this revision. Jun 19 2016, 9:24 AM

Fixed argument aliasing and updated test suite

Updated Memory location creation

eli.friedman added inline comments. Jun 19 2016, 1:39 PM

/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp
236	Useless comment.
249	Useless comment.
250	Convention is to write this as "if (RefInfo & MRI_Mod)".
252	Isn't this missing the step of sorting Loads->second?
262	What's the justification for the early exit here? Testcase?
266	Is updating the end iterator actually necessary?
270	Useless comment.
/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll
10	Maybe it would be better to use noalias argument pointers instead of allocas? This testcase is a little weird because you're loading undef.
34	The allocas seem unnecessary here.
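Two of the comments above concern the loop shape: testing the modification bit with a plain bitwise AND, and whether a cached end iterator is needed while erasing. A hedged sketch of the usual pattern with a standard container follows. It uses std::unordered_map rather than DenseMap, and ModRefBits, ChainMap and eraseModifiedChains are hypothetical stand-ins, not LLVM's actual enum or API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a mod/ref bitmask; not LLVM's actual enum.
enum ModRefBits : unsigned { NoModRef = 0, Ref = 1, Mod = 2, ModRef = Ref | Mod };

using ChainMap = std::unordered_map<std::string, std::vector<int>>;

// Erases every chain whose query result has the Mod bit set and returns the
// number of chains removed. The loop re-evaluates end() on each iteration and
// continues from the iterator returned by erase, so no end iterator is cached.
template <typename QueryFn>
unsigned eraseModifiedChains(ChainMap &Chains, QueryFn Query) {
  unsigned Removed = 0;
  for (auto It = Chains.begin(); It != Chains.end();) {
    if (Query(It->first) & Mod) {
      It = Chains.erase(It);
      ++Removed;
    } else {
      ++It;
    }
  }
  return Removed;
}
```

A result of Ref alone leaves the chain in place, since only the Mod bit is tested.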

Removed redundant checks and comments. Updated test arguments.

Fixed load sorting
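As a rough illustration of why the sort matters: adjacency between loads can only be detected by a linear scan once the loads are ordered by offset. The sketch below is a simplified model, not the patch's aggregateLoads; LoadRecord is a hypothetical stand-in for LoadPOPPair, and longestContiguousRun is an invented helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the patch's LoadPOPPair.
struct LoadRecord {
  std::uint64_t Offset; // byte offset from the shared base pointer
  std::uint64_t Size;   // size of the load in bytes
};

// Sorts the loads by offset, then returns the length of the longest run of
// loads where each one starts exactly where the previous one ends.
std::size_t longestContiguousRun(std::vector<LoadRecord> Loads) {
  std::sort(Loads.begin(), Loads.end(),
            [](const LoadRecord &A, const LoadRecord &B) {
              return A.Offset < B.Offset;
            });
  std::size_t Best = Loads.empty() ? 0 : 1;
  std::size_t Cur = Best;
  for (std::size_t I = 1; I < Loads.size(); ++I) {
    const bool Adjacent =
        Loads[I - 1].Offset + Loads[I - 1].Size == Loads[I].Offset;
    Cur = Adjacent ? Cur + 1 : 1;
    Best = std::max(Best, Cur);
  }
  return Best;
}
```

Loads recorded in program order at offsets 4, 0, 12, 8 form one run of four after sorting; scanned in their original order, no two neighbours would look adjacent.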

rriddle updated this revision to Diff 61226. Jun 19 2016, 2:12 PM

eli.friedman added inline comments. Jun 19 2016, 2:19 PM

/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp
239	Weird indentation.
257	This block of code is very similar to the body of LoadCombine::combineLoads; please refactor.

Refactor CombineLoads

LGTM. If you don't have commit access, I'll commit this for you.

eli.friedman accepted this revision. Jun 20 2016, 11:23 AM

eli.friedman edited edge metadata.

majnemer added inline comments. Jun 20 2016, 11:26 AM

/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp
134–142	Please appropriately format this.
250	Please format this appropriately.

I was going over this one more time before committing it... and I realized you don't actually have any test coverage for this change. I'm pretty sure the given testcase passes without your patch.

This revision now requires changes to proceed. Jun 21 2016, 7:20 PM

Updated tests

Sorry about that.

rriddle abandoned this revision. Dec 15 2016, 3:49 AM

davide mentioned this in D28922: [LoadCombine] Fix combining of loads which span an aliasing store. Jan 20 2017, 11:51 AM

Diff 62109

/Users/rriddle/Desktop/llvm/llvm/lib/Transforms/Scalar/LoadCombine.cpp


private:		private:
BuilderTy *Builder;		BuilderTy *Builder;

PointerOffsetPair getPointerOffsetPair(LoadInst &);		PointerOffsetPair getPointerOffsetPair(LoadInst &);
bool combineLoads(DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> &);		bool combineLoads(DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> &);
bool aggregateLoads(SmallVectorImpl<LoadPOPPair> &);		bool aggregateLoads(SmallVectorImpl<LoadPOPPair> &);
bool combineLoads(SmallVectorImpl<LoadPOPPair> &);		bool combineLoads(SmallVectorImpl<LoadPOPPair> &);
		bool
		checkGenericInstAlias(Instruction *,
		DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> &);
};		};
}		}

bool LoadCombine::doInitialization(Function &F) {		bool LoadCombine::doInitialization(Function &F) {
DEBUG(dbgs() << "LoadCombine function: " << F.getName() << "\n");		DEBUG(dbgs() << "LoadCombine function: " << F.getName() << "\n");
C = &F.getContext();		C = &F.getContext();
return true;		return true;
}		}
(18 lines not shown)	PointerOffsetPair LoadCombine::getPointerOffsetPair(LoadInst &LI) {
}		}
return POP;		return POP;
}		}

bool LoadCombine::combineLoads(		bool LoadCombine::combineLoads(
DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> &LoadMap) {		DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> &LoadMap) {
bool Combined = false;		bool Combined = false;
for (auto &Loads : LoadMap) {		for (auto &Loads : LoadMap) {
if (Loads.second.size() < 2)
continue;
std::sort(Loads.second.begin(), Loads.second.end(),
[](const LoadPOPPair &A, const LoadPOPPair &B) {
return A.POP.Offset < B.POP.Offset;
});
if (aggregateLoads(Loads.second))		if (aggregateLoads(Loads.second))
Combined = true;		Combined = true;
}		}
return Combined;		return Combined;
}		}

/// \brief Try to aggregate loads from a sorted list of loads to be combined.		/// \brief Try to aggregate loads from a sorted list of loads to be combined.
///		///
/// It is guaranteed that no writes occur between any of the loads. All loads		/// It is guaranteed that no writes occur between any of the loads. All loads
/// have the same base pointer. There are at least two loads.		/// have the same base pointer. There are at least two loads.
bool LoadCombine::aggregateLoads(SmallVectorImpl<LoadPOPPair> &Loads) {		bool LoadCombine::aggregateLoads(SmallVectorImpl<LoadPOPPair> &Loads) {
assert(Loads.size() >= 2 && "Insufficient loads!");		if(Loads.size() < 2)
		return false;

		std::sort(Loads.begin(), Loads.end(),
		[](const LoadPOPPair &A, const LoadPOPPair &B) {
		return A.POP.Offset < B.POP.Offset;
		});

LoadInst *BaseLoad = nullptr;		LoadInst *BaseLoad = nullptr;
SmallVector<LoadPOPPair, 8> AggregateLoads;		SmallVector<LoadPOPPair, 8> AggregateLoads;
bool Combined = false;		bool Combined = false;
uint64_t PrevOffset = -1ull;		uint64_t PrevOffset = -1ull;
uint64_t PrevSize = 0;		uint64_t PrevSize = 0;
for (auto &L : Loads) {		for (auto &L : Loads) {
if (PrevOffset == -1ull) {		if (PrevOffset == -1ull) {
BaseLoad = L.Load;		BaseLoad = L.Load;
PrevOffset = L.POP.Offset;		PrevOffset = L.POP.Offset;
(74 lines not shown)	Value *V = Builder->CreateExtractInteger(
L.POP.Offset - Loads[0].POP.Offset, "combine.extract");		L.POP.Offset - Loads[0].POP.Offset, "combine.extract");
L.Load->replaceAllUsesWith(V);		L.Load->replaceAllUsesWith(V);
}		}

NumLoadsCombined = NumLoadsCombined + Loads.size();		NumLoadsCombined = NumLoadsCombined + Loads.size();
return true;		return true;
}		}

		bool LoadCombine::checkGenericInstAlias(
		Instruction *V,
		DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> &LoadMap) {
		bool Combined = false;

		// Loop over each load and check to see if it is aliased by
		// this instruction.
		auto LoadMapEnd = LoadMap.end();
		for (auto Loads = LoadMap.begin(); Loads != LoadMapEnd;) {
		// Check the AA results for an alias.
		// FIXME:We could calculate the size for the memory location instead of
		// leaving it unknown.
		auto RefInfo =
		AA->getModRefInfo(V, MemoryLocation(Loads->first));

		if (RefInfo & ModRefInfo::MRI_Mod) {
		// Try to aggregate.
		if(aggregateLoads(Loads->second))
		Combined = true;

		// Get the current iterator.
		auto CurLoads = Loads++;

		// Remove the loads.
		LoadMap.erase(CurLoads);
		continue;
		}

		++Loads;
		}
		return Combined;
		}

bool LoadCombine::runOnBasicBlock(BasicBlock &BB) {		bool LoadCombine::runOnBasicBlock(BasicBlock &BB) {
if (skipBasicBlock(BB))		if (skipBasicBlock(BB))
return false;		return false;

AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();

IRBuilder<TargetFolder> TheBuilder(		IRBuilder<TargetFolder> TheBuilder(
BB.getContext(), TargetFolder(BB.getModule()->getDataLayout()));		BB.getContext(), TargetFolder(BB.getModule()->getDataLayout()));
Builder = &TheBuilder;		Builder = &TheBuilder;

DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> LoadMap;		DenseMap<const Value *, SmallVector<LoadPOPPair, 8>> LoadMap;
AliasSetTracker AST(*AA);

bool Combined = false;		bool Combined = false;
unsigned Index = 0;		unsigned Index = 0;
for (auto &I : BB) {		for (auto &I : BB) {
if (I.mayThrow() || (I.mayWriteToMemory() && AST.containsUnknown(&I))) {		// If this instruction may throw then we need to combine the loadmap now.
		if (I.mayThrow()) {
if (combineLoads(LoadMap))		if (combineLoads(LoadMap))
Combined = true;		Combined = true;
LoadMap.clear();		LoadMap.clear();
AST.clear();		continue;
		}
		// Check if the instruction might write to memory.
		if (I.mayWriteToMemory()) {
		// Check for an alias from generic analysis.
		if (checkGenericInstAlias(&I, LoadMap))
		Combined = true;
continue;		continue;
}		}
LoadInst *LI = dyn_cast<LoadInst>(&I);		LoadInst *LI = dyn_cast<LoadInst>(&I);
if (!LI)		if (!LI)
continue;		continue;
++NumLoadsAnalyzed;		++NumLoadsAnalyzed;
if (!LI->isSimple() || !LI->getType()->isIntegerTy())		if (!LI->isSimple() || !LI->getType()->isIntegerTy())
continue;		continue;
auto POP = getPointerOffsetPair(*LI);		auto POP = getPointerOffsetPair(*LI);
if (!POP.Pointer)		if (!POP.Pointer)
continue;		continue;
LoadMap[POP.Pointer].push_back(LoadPOPPair(LI, POP, Index++));		LoadMap[POP.Pointer].push_back(LoadPOPPair(LI, POP, Index++));
AST.add(LI);
}		}
if (combineLoads(LoadMap))		if (combineLoads(LoadMap))
Combined = true;		Combined = true;
return Combined;		return Combined;
}		}

char LoadCombine::ID = 0;		char LoadCombine::ID = 0;

BasicBlockPass *llvm::createLoadCombinePass() {		BasicBlockPass *llvm::createLoadCombinePass() {
return new LoadCombine();		return new LoadCombine();
}		}

INITIALIZE_PASS_BEGIN(LoadCombine, "load-combine", LDCOMBINE_NAME, false, false)		INITIALIZE_PASS_BEGIN(LoadCombine, "load-combine", LDCOMBINE_NAME, false, false)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_END(LoadCombine, "load-combine", LDCOMBINE_NAME, false, false)		INITIALIZE_PASS_END(LoadCombine, "load-combine", LDCOMBINE_NAME, false, false)

/Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll

				; RUN: opt -basicaa -load-combine -S < %s | FileCheck %s
				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.11.0"


				define void @load_combine(i32* noalias %a, i32* noalias %b, i32* noalias %c){
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %b,i64 0
				%0 = load i32, i32* %arrayidx, align 16
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 0
				store i32 %0, i32* %arrayidx1, align 16
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 1
				%1 = load i32, i32* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i32, i32* %a, i64 1
				%c.gep.1 = getelementptr inbounds i32, i32* %c, i64 0
				%c.load.1 = load i32, i32* %c.gep.1, align 8
				store i32 %1, i32* %c.gep.1, align 4
				store i32 %c.load.1, i32* %arrayidx3, align 4
				%arrayidx4 = getelementptr inbounds i32, i32* %b, i64 2
				%2 = load i32, i32* %arrayidx4, align 8
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 2
				store i32 %2, i32* %arrayidx5, align 8
				%arrayidx6 = getelementptr inbounds i32, i32* %b, i64 3
				%3 = load i32, i32* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds i32, i32* %a, i64 3
				store i32 %3, i32* %arrayidx7, align 4
				ret void
				; CHECK-LABEL:@load_combine(
				; CHECK-NOT: load i64
				}

				define void @load_nocombine(i32* %a, i32* %b) {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 0
				%0 = load i32, i32* %arrayidx, align 16
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 0
				store i32 %0, i32* %arrayidx1, align 16
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 1
				%1 = load i32, i32* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i32, i32* %a, i64 1
				store i32 %1, i32* %arrayidx3, align 4
				ret void
				; CHECK-LABEL:@load_nocombine(
				; CHECK-NOT: load i64, i64
				}

This is an archive of the discontinued LLVM Phabricator instance.

LoadCombine Load Aliasing Fix
AbandonedPublic
