This is an archive of the discontinued LLVM Phabricator instance.

[LoadStoreVectorizer] Fix infinite loop in reorder.
Needs Review · Public

Authored by ebevhan on Nov 13 2018, 2:07 AM.

Details

Reviewers
rtereshin
volkan
Summary

When performing LSV, SCEV could tell us that two addresses
were consecutive even though their computations depended on
each other in the IR. This was due to folding opportunities
in the SCEV expressions for the address calculations.

Solve this by doing basic simplification when reordering
instructions after merging.

This solves https://bugs.llvm.org/show_bug.cgi?id=38517.
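
For context, a minimal sketch of the kind of change the summary describes, assuming the simplification is done by calling llvm::SimplifyInstruction on the operand instructions visited while reordering (the helper name, hook point, and this era's API spelling are illustrative, not the actual diff):

// Hypothetical sketch, not the actual patch: while walking the operand
// chain during reordering, try to fold each operand instruction away so
// that "false" data dependencies (e.g. urem %t, 1) disappear before the
// use/def order of the merged accesses is recomputed.
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

static void simplifyOperandsInPlace(Instruction *I, const DataLayout &DL) {
  for (Use &U : I->operands()) {
    auto *OpInst = dyn_cast<Instruction>(U.get());
    if (!OpInst)
      continue;
    // If the operand folds to something simpler (often a constant), the
    // dependency of the address computation on a loaded value vanishes.
    if (Value *Simplified = SimplifyInstruction(OpInst, SimplifyQuery(DL))) {
      U.set(Simplified);
      if (OpInst->use_empty())
        OpInst->eraseFromParent();
    }
  }
}

Whether this needs to run on every instruction in the chain, or only on GEP operands, is exactly what the inline comments below ask.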

Diff Detail

Event Timeline

ebevhan created this revision. Nov 13 2018, 2:07 AM
Herald added a project: Restricted Project. Feb 11 2019, 11:42 PM
volkan added inline comments. Feb 12 2019, 10:52 AM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
494

Could you add a comment explaining why we are doing this? Also, do we really need to simplify the operands for each instruction in the list? Should we check if it's a gep before trying to simplify?

test/Transforms/LoadStoreVectorizer/AArch64/reorder-infinite-loop.ll
15

Nit: You can run -instnamer on the test in order to assign names to these instructions.

rtereshin added inline comments. Feb 12 2019, 7:20 PM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
497

Hi Bevin,

Thanks for looking into this; apologies it took me so long to notice there was a review waiting on me.

I don't think this is a reliable fix. It appears that it will only work if SimplifyInstruction is capable of simplifying away all the "false" data dependencies between memory operations that Vectorizer::isConsecutiveAccess is able to prove consecutive. I don't think anything guarantees such parity (nor should it).

To illustrate the problem, let me start with a minimized version of the test you're adding in this patch, for which the SimplifyInstruction-based fix does work:

target triple = "aarch64"

; Function Attrs: noinline nounwind
define i32 @test([2 x i32]* %array) #0 {
entry:
  %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 0, i32 1
  %t = load i32, i32* %arrayidx, align 4
  %rem = urem i32 %t, 1
  %arrayidx2 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %rem, i32 0
  %v = load i32, i32* %arrayidx2, align 4
  %r = add i32 %v, %t
  ret i32 %r
}

attributes #0 = { noinline nounwind }

The false dependency is created by %rem = urem i32 %t, 1, which is trivial enough for SimplifyInstruction to constant fold, eliminating the dependency.
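
As a standalone illustration of that point (not part of the patch; the setup code and names are illustrative), building the same urem against an opaque argument and handing it to SimplifyInstruction folds it to the constant 0 with no other context:

// Standalone demo: build "%rem = urem i32 %t, 1" against a function
// argument and ask InstructionSimplify to fold it. X urem 1 is always 0,
// which is why the dependency of %arrayidx2 on %t disappears here.
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("demo", Ctx);
  auto *I32 = Type::getInt32Ty(Ctx);
  auto *FTy = FunctionType::get(I32, {I32}, /*isVarArg=*/false);
  auto *F = Function::Create(FTy, Function::ExternalLinkage, "f", &M);
  auto *BB = BasicBlock::Create(Ctx, "entry", F);
  IRBuilder<> Builder(BB);

  Value *T = &*F->arg_begin();               // plays the role of %t
  Value *One = ConstantInt::get(I32, 1);
  auto *Rem = cast<Instruction>(Builder.CreateURem(T, One, "rem"));
  Builder.CreateRet(Rem);                    // keep the block well-formed

  if (Value *Folded = SimplifyInstruction(Rem, SimplifyQuery(M.getDataLayout())))
    Folded->print(errs());                   // prints "i32 0"
  return 0;
}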

Here is the same test with the expression made ever so slightly more complex:

target triple = "aarch64"

; Function Attrs: noinline nounwind
define i32 @test([2 x i32]* %array, i32 %idx) #0 {
entry:
  %ptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx, i32 1
  %t = load i32, i32* %ptr1, align 4
  %t2 = mul i32 %t, 2
  %idx.p.2t = add i32 %idx, %t2
  %idx.p.t = sub i32 %idx.p.2t, %t
  %idx.another = sub i32 %idx.p.t, %t
  %ptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx.another, i32 0
  %v = load i32, i32* %ptr0, align 4
  %r = add i32 %v, %t
  ret i32 %r
}

attributes #0 = { noinline nounwind }

The expression for i32 %idx.another is roughly (((2 * %t) + %idx) - %t) - %t and it's trivial enough for ScalarEvolution alone to see through:

%ptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx, i32 1
-->  (4 + (8 * (sext i32 %idx to i64))<nsw> + %array) U: full-set S: full-set
%t = load i32, i32* %ptr1, align 4
-->  %t U: full-set S: full-set
%t2 = mul i32 %t, 2
-->  (2 * %t) U: [0,-1) S: [-2147483648,2147483647)
%idx.p.2t = add i32 %idx, %t2
-->  ((2 * %t) + %idx) U: full-set S: full-set
%idx.p.t = sub i32 %idx.p.2t, %t
-->  (%idx + %t) U: full-set S: full-set
%idx.another = sub i32 %idx.p.t, %t
-->  %idx U: full-set S: full-set
%ptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx.another, i32 0
-->  ((8 * (sext i32 %idx to i64))<nsw> + %array)<nsw> U: full-set S: full-set

but it isn't trivial enough for SimplifyInstruction to eliminate the false dependency: LSV hangs on this example both with and without this patch.

If we pursue roughly the same approach, we could use the ScalarEvolution expander (SCEVExpander) instead: if we re-materialize the SCEV of the pointer being used when we issue the vectorized version of the load or store (and keep the reorder method as it currently is at ToT), we could probably avoid the problem more reliably, simply because Vectorizer::isConsecutiveAccess is largely based on ScalarEvolution, so there is more parity between what they can and cannot do. (Note the SCEV for %ptr0 above: the false dependency on %t is fully eliminated, so a pointer re-materialized from that SCEV would be just fine.)
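
Only as a rough sketch of that alternative (the helper name, the fallback handling, and the expander's header location, which differs between LLVM versions, are illustrative assumptions), it would look something like expanding the pointer's SCEV at the point where the vectorized access is emitted:

// Hypothetical sketch of the SCEVExpander-based alternative: instead of
// reusing the original pointer value, whose IR def-use chain may falsely
// depend on one of the loads being merged, re-materialize the address
// from its SCEV right before the vectorized access.
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/IR/DataLayout.h"

using namespace llvm;

static Value *rematerializePointer(Value *Ptr, ScalarEvolution &SE,
                                   const DataLayout &DL,
                                   Instruction *InsertBefore) {
  const SCEV *PtrSCEV = SE.getSCEV(Ptr);
  // A bare SCEVUnknown means SE could not analyze the pointer at all;
  // fall back to the original value in that case.
  if (isa<SCEVUnknown>(PtrSCEV))
    return Ptr;
  SCEVExpander Expander(SE, DL, "lsv");
  // Emits IR that computes the same address, without the (false)
  // dependencies that SCEV has already simplified away.
  return Expander.expandCodeFor(PtrSCEV, Ptr->getType(), InsertBefore);
}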

However, that won't entirely do either. Vectorizer::isConsecutiveAccess does more than just apply SE: most notably, it can see through selects, even nested ones. Here's a test case, mostly derived from the previous one:

target triple = "aarch64"

; Function Attrs: noinline nounwind
define i32 @test(i32* %array, i32 %idx, i1 %cond, i1 %cond2) #0 {
entry:
  %idx.p.7 = add nsw i32 %idx, 7
  %idx.p.7.sext = sext i32 %idx.p.7 to i64
  %ptr1.dummy = getelementptr inbounds i32, i32* %array, i64 21
  %ptr1.true = getelementptr inbounds i32, i32* %array, i64 1
  %ptr1.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.7.sext
  %ptr1.tmp = select i1 %cond, i32* %ptr1.true, i32* %ptr1.false
  %ptr1 = select i1 %cond2, i32* %ptr1.tmp, i32* %ptr1.dummy
  %t = load i32, i32* %ptr1, align 4
  %t2 = mul i32 %t, 2
  %idx.p.2t = add i32 %idx, %t2
  %idx.p.t = sub i32 %idx.p.2t, %t
  %idx.another = sub i32 %idx.p.t, %t
  %idx.p.6 = add i32 %idx.another, 6
  %idx.p.6.sext = sext i32 %idx.p.6 to i64
  %ptr0.dummy = getelementptr inbounds i32, i32* %array, i64 20
  %ptr0.true = getelementptr inbounds i32, i32* %array, i64 0
  %ptr0.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.6.sext
  %ptr0.tmp = select i1 %cond, i32* %ptr0.true, i32* %ptr0.false
  %ptr0 = select i1 %cond2, i32* %ptr0.tmp, i32* %ptr0.dummy
  %v = load i32, i32* %ptr0, align 4
  %r = add i32 %v, %t
  ret i32 %r
}

attributes #0 = { noinline nounwind }

As before, it hangs LSV both with and without this patch applied. What's more, in this case the top-level SCEV of %ptr0 is nowhere near free of the false dependency on %t:

%idx.p.7 = add nsw i32 %idx, 7
-->  (7 + %idx) U: full-set S: full-set
%idx.p.7.sext = sext i32 %idx.p.7 to i64
-->  (sext i32 (7 + %idx) to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
%ptr1.dummy = getelementptr inbounds i32, i32* %array, i64 21
-->  (84 + %array)<nsw> U: full-set S: full-set
%ptr1.true = getelementptr inbounds i32, i32* %array, i64 1
-->  (4 + %array)<nsw> U: full-set S: full-set
%ptr1.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.7.sext
-->  ((4 * (sext i32 (7 + %idx) to i64))<nsw> + %array)<nsw> U: full-set S: full-set
%ptr1.tmp = select i1 %cond, i32* %ptr1.true, i32* %ptr1.false
-->  %ptr1.tmp U: full-set S: full-set
%ptr1 = select i1 %cond2, i32* %ptr1.tmp, i32* %ptr1.dummy
-->  %ptr1 U: full-set S: full-set
%t = load i32, i32* %ptr1, align 4
-->  %t U: full-set S: full-set
%t2 = mul i32 %t, 2
-->  (2 * %t) U: [0,-1) S: [-2147483648,2147483647)
%idx.p.2t = add i32 %idx, %t2
-->  ((2 * %t) + %idx) U: full-set S: full-set
%idx.p.t = sub i32 %idx.p.2t, %t
-->  (%idx + %t) U: full-set S: full-set
%idx.another = sub i32 %idx.p.t, %t
-->  %idx U: full-set S: full-set
%idx.p.6 = add i32 %idx.another, 6
-->  (6 + %idx) U: full-set S: full-set
%idx.p.6.sext = sext i32 %idx.p.6 to i64
-->  (sext i32 (6 + %idx) to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
%ptr0.dummy = getelementptr inbounds i32, i32* %array, i64 20
-->  (80 + %array)<nsw> U: full-set S: full-set
%ptr0.true = getelementptr inbounds i32, i32* %array, i64 0
-->  %array U: full-set S: full-set
%ptr0.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.6.sext
-->  ((4 * (sext i32 (6 + %idx) to i64))<nsw> + %array)<nsw> U: full-set S: full-set
%ptr0.tmp = select i1 %cond, i32* %ptr0.true, i32* %ptr0.false
-->  %ptr0.tmp U: full-set S: full-set
%ptr0 = select i1 %cond2, i32* %ptr0.tmp, i32* %ptr0.dummy
-->  %ptr0 U: full-set S: full-set

So rematerializing just the top-level SCEV with the Expander won't help here either.