This is an archive of the discontinued LLVM Phabricator instance.

Differential D22071

Correct ordering of loads/stores.
ClosedPublic

Authored by asbirlea on Jul 6 2016, 2:34 PM.

Details

Reviewers

• tstellarAMD
jlebar
llvm-commits
arsenm

Commits

rGcbc6ac2afd7b: Correct ordering of loads/stores.
rL275117: Correct ordering of loads/stores.

Summary

This patch aims to correct the ordering of loads/stores. It changes the insert point for loads to the position of the first load, and it adds a new ordering method for loads that inserts before, rather than after, the load.
Updated testcases to reflect the changes.

Event Timeline

asbirlea updated this revision to Diff 62977. Jul 6 2016, 2:34 PM

asbirlea retitled this revision to Correct ordering of loads/stores.

asbirlea updated this object.

asbirlea added reviewers: llvm-commits, jlebar, arsenm.

Herald added a reviewer: • tstellarAMD. Jul 6 2016, 2:34 PM

Herald added a subscriber: mzolotukhin.

asbirlea added a parent revision: D21935: Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer. Jul 6 2016, 2:35 PM

Revert some test changes after correcting the condition testing for misaligned in D21935.

jlebar added inline comments. Jul 6 2016, 3:35 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
94	Where do we use reorderAfter? Should one of the calls to reorderBefore call reorderAfter? If so, why aren't some of the tests failing? :)
96	"&" should bind to the variable name. (It's helpful to me to run clang-format as part of "arc diff", so I don't ever forget this stuff. I have a wrapper script around arc that invokes git-clang-format before posting a change. https://github.com/jlebar/conf/blob/master/bin/arc In fact I call my wrapper "arc" so I don't even have to think about it. You'll need a file/symlink in the same directory as that script called arc-real, and you'll need git-clang-format on your path -- it's in the LLVM source tree at tools/clang/tools/clang-format/git-clang-format. I think you'll also need to add [clangformat] style = "file" to your .gitconfig. Easy, right? :)
337	Whitespace alignment here.
342	This is arbitrarily-deep recursion, which I think we tend to avoid, for fear of overflowing the stack given pessimal inputs. Can we write this with an explicit worklist instead?
348	Can we use llvm::SmallPtrSet? unordered_set is much slower and cache-inefficient.
351	Space around "=" (clang-format should take care of this, too). Here and elsewhere.
352	Do we know that I->getParent()->end() can't change when we insert new elements?
352	This only considers instructions in I's BB, but InstructionsToMove may contain other instructions, no? If so that may make this whole thing more complicated...
359	If you're going to do this, maybe we should assert that InstructionsToMove is empty after the loop?
359	Do we need the `&*`? I'd think it should work without that.
884	We use dyn_cast when the cast may fail and return null. But I think you don't want a null pointer inside InstrsToReorder, and we know that Bitcast is an Instruction, so I think you want plain cast<>.
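The worklist request at line 342 can be illustrated outside LLVM. The following is a minimal sketch, assuming a toy graph stored as adjacency lists of integer node ids; `Graph`, `reachableRecursive`, and `reachableWorklist` are hypothetical names, not code from the patch. Both functions collect the same set of nodes, but the second keeps its pending work in a heap-allocated vector, so a very long chain cannot overflow the call stack.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy graph (an assumption for illustration): edges[n] lists the nodes that
// node n points at, e.g. the users of an instruction.
using Graph = std::vector<std::vector<int>>;

// Recursive traversal: recursion depth grows with the longest path.
void reachableRecursive(const Graph &edges, int node, std::set<int> &seen) {
  for (int next : edges[node])
    if (seen.insert(next).second)
      reachableRecursive(edges, next, seen);
}

// The same traversal with an explicit worklist instead of the call stack.
std::set<int> reachableWorklist(const Graph &edges, int start) {
  assert(start >= 0 && static_cast<std::size_t>(start) < edges.size());
  std::set<int> seen;
  std::vector<int> worklist;
  worklist.push_back(start);
  while (!worklist.empty()) {
    int node = worklist.back();
    worklist.pop_back();
    for (int next : edges[node])
      if (seen.insert(next).second)
        worklist.push_back(next);
  }
  return seen;
}
```

The order in which nodes are visited differs between the two versions, which does not matter when the result is a set.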
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	I don't quite get what the original code is testing here. Like, the adds are completely independent of the loads, right? If so, can we fix this test so it's not sensitive to implementation details?
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
17	Do all these tests need to be inside loops?
18	Do we have a test which checks that we don't reorder through phi nodes?

Partially address comments.

Still looking into updating the tests and some of the comments I missed.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
94	Fair point. I need to figure out if there is a case when reorderAfter is needed for stores. In the meantime, removing it.
96	Yes. Postponing running clang-format until all other comments are addressed. I also have another patch on top of this which may lead to conflicts after formatting, in which case I will clang-format the next patch.
359	All this was removed, but I use the comments for reorderBefore.
884	Yes, updated.
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	I'm not sure what the original purpose was. It looks to me it is intentionally testing an implementation detail ("insert_load_point").
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
18	I don't think so. Feel free to add one :).

OK, I'll have another look once you've figured the rest out. Just lmk.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
95	> in which case I will clang-format the next patch.
	I can help you resolve the conflicts if it gets hairy (I have some ideas for rebase tricks), but it's a requirement each commit be properly clang-formatted. We can't commit improperly-formatted code.
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	> It looks to me it is intentionally testing an implementation detail ("insert_load_point").
	Looks like it to me, too. When we committed the original patches, we all agreed that we wouldn't act with a bias towards the existing code, since we committed with existing unresolved issues. I think this should count under that rubric. That is, can we fix the test so it no longer tests an implementation detail? I suppose you don't need to do that in this patch if you don't want.
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
19	You don't think it's worth adding a test as part of this patch? It seems relevant because we could otherwise get infinite recursion or something...

arsenm added inline comments. Jul 6 2016, 8:13 PM

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	The pass should still have an expectation for where the instructions will be inserted relative to the originals; I think a test ensuring this is useful.

jlebar added inline comments. Jul 6 2016, 8:41 PM

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	> The pass should still have an expectation for where the instructions will be inserted relative to the originals
	I guess I am OK with this if we can articulate in the test file or the cpp file exactly what is the rule that we expect applies to our output. If we cannot articulate a rule, and instead we're just checking that the pass does what it currently does, I do not think that is a good test. The reason is that, without an articulation of the rule, if the test fails, we have no way to tell whether there's a bug or if the test just needs to be changed. (And if we can articulate a rule, it should go without saying that, inasmuch as reasonable, the test should check only for adherence to the rule, ignoring other ancillary properties of the output.)

arsenm added inline comments. Jul 6 2016, 11:55 PM

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	Tests should generally have a comment explaining what they are testing anyway. This change was mostly why I added this test in the first place. If something is changing any behavior of the pass, a test should capture this. I don't understand the concern about wondering if it's a bug or the test needs an update; the point of having the test is you have to look at the test output changes to verify that it is still correct.

jlebar added inline comments. Jul 7 2016, 8:09 AM

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	> the point of having the test is you have to look at the test output changes to verify that it is still correct
	My thesis is that this process is bug-prone. My evidence for this is that Alina found multiple tests in this test suite that had bugs -- tests that checked that some sequence of operations was vectorized when in fact it was not safe to vectorize it.
	This motivates my suggestion, which is that, inasmuch as we can, we should avoid engaging in this process (by writing tests that are not fragile to uninteresting details), and, where we can't avoid the process entirely (maybe the details are interesting, and maybe that's the case here), we should write down explicitly what behavior we expect from the pass.
	I don't mean to suggest that the bugs in the tests were the result of you being careless -- I didn't catch them either when I reviewed the patches. My point is just that every time humans have to look at new output and decide if it's correct, there is a chance that we'll overlook a bug. And based on this history, that chance is not negligible. Even if you disagree with my application of the evidence and think it's unlikely that the three of us would make such a bug, surely other maintainers may not be as scrupulous.
	Thus my suggestion: If we can write down the behavior we expect from the pass -- "Vectorized loads should be inserted at the position of the first load, and instructions which were between the first and last load should be reordered preserving their relative order inasmuch as possible." (or whatever the actual rule is) -- then when the test fails, we can judge against that whether the test or the pass is broken. And if we have to update the test, we have some chance of creating the correct output ourselves, rather than just accepting the output created by the pass (which is more likely, in my judgement, to lead to us accepting buggy output, per above).
	I think we're pretty close in what we want, honestly. The main difference, I think, is that I am saying that we should try not to test behavior for which we cannot articulate a rule. If the behavior is so incidental that we can't even say what it's supposed to do, I don't see why we'd want to enstone it as a test.
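The rule proposed in the comment above (hedged there with "or whatever the actual rule is") can be written down as a small executable model. This is only a sketch under stated assumptions: a basic block is modelled as a list of instruction names, the chain is the set of scalar loads being merged, and `vectorizeLoads` and the name `vload` are hypothetical stand-ins rather than code from the pass. The model deliberately ignores the operand reordering the real pass also performs.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Replace the loads named in `chain` with a single "vload" placed at the
// position of the first load; all other instructions keep their order.
std::vector<std::string> vectorizeLoads(const std::vector<std::string> &block,
                                        const std::set<std::string> &chain) {
  assert(!chain.empty() && "chain must contain at least one load");
  std::vector<std::string> result;
  bool inserted = false;
  for (const std::string &inst : block) {
    if (chain.count(inst) == 0) {
      result.push_back(inst); // not part of the chain: order preserved
      continue;
    }
    if (!inserted) {
      result.push_back("vload"); // first load of the chain: insert here
      inserted = true;
    }
    // Later loads of the chain are dropped; the vector load covers them.
  }
  return result;
}
```

For a block `add0, load0, add1, load1, add2` with chain `{load0, load1}`, the model yields `add0, vload, add1, add2`. A test written against a stated rule of this kind can be judged against the rule rather than against whatever the pass currently emits.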

Address comments.

I think most comments are addressed now. Let me know if I missed anything.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
95	Formatted.
342	Used a work-list for the other method.
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12	I tried to resolve this for now by adding the comment you suggested in both this and the other 2 tests checking the order is preserved.
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
18	Removed loops in all tests.
20	I'm not sure how to properly create one right now. I added a test that makes an attempt at that, but it in fact ensures there is no vectorization beyond basic blocks (and implicitly through a phi node).

Adding changes from dependent patches.

asbirlea added a child revision: D22119: Extended LoadStoreVectorizer to vectorize subchains. Jul 7 2016, 4:21 PM

Can you elaborate a bit more in the commit message exactly what bug we're fixing here? It's not immediately clear to me.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
95	Maybe "reorderUsers" or "moveUsers" would be a better name, especially since we no longer have reorderAfter? Or even just "reorder", I guess.
341	push_back
342	`!Worklist.empty()` (I think?)
346	`I = Worklist.pop_back_elem();`
351	push_back
372	Since this is no longer recursive, we don't need the helper anymore? We can keep it if you think it's a useful way to break things down (I'm not sure it is), but then we should change the name so it's more descriptive and have it return the SmallPtrSet instead of take it by reference.
375	Could we call this iterator BBI, and then call the instruction to move IM? That would be consistent with the helper, and also would fix the problem of InstructionToMove and InstructionsToMove looking very similar.
382	Now that I think about this, could we just have a DEBUG() loop that checks that every element of InstructionsToMove is in the same BB as I? Then we don't need to erase from the set, which is overhead we don't need.
385	Nit, end sentences with periods. Also suggest swapping the order of the prepositional phrases; this order is awkward (although I can't articulate a rule :).
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
6	Please reflow. Also thanks for adding this; I now understand why the test output is what it is. That makes me happy.

Address comments.

Mark as done.

jlebar added inline comments. Jul 8 2016, 3:14 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367	One of the things I'm bad at is evaluating the correctness of code that contains nits to be fixed. So...now that you've fixed the nits, I have a correctness concern, which I'm sorry I didn't see earlier. My concern is that we stop reordering when IM dominates IW. But it seems to me that we should stop reordering when IM dominates I, no? Because after we move IW up before I, it may no longer dominate its operands.
359	Don't need DEBUG around the assert(). assert() is a macro and only evaluates its args in debug builds.

asbirlea added inline comments. Jul 8 2016, 5:03 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367	Let me try to reason this. The loop checks that all operands IM should dominate IW. If that's not the case, IW should be moved before I. If that happens, IW is added to the worklist, so all its operands are checked and possibly moved before as well in the next iterations. Yes, all IM should implicitly dominate I as well, but that should be transitive through IW. Unless I missed something?

asbirlea added inline comments. Jul 8 2016, 5:15 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367	What I missed is that instructions are not moved until after the loop. You're right.
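The point settled in this exchange can be reproduced with a toy model. The sketch below rests on assumptions that are not part of the patch: instructions in one basic block are identified by their position, "a dominates b" is simplified to "a comes before b", and `Operands`, `dominates`, and `collectToMove` are hypothetical names. Unlike the patch, the sketch also skips instructions already in the set. Because nothing is moved until the collection loop finishes, comparing an operand IM against its user IW uses positions that are about to become stale; comparing against the insertion point I does not.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy basic block: instruction k sits at position k, and operands[k] lists
// the positions of the instructions that define k's operands.
using Operands = std::vector<std::vector<int>>;

// Simplification: within one block, a dominates b when a comes first.
bool dominates(int a, int b) { return a < b; }

// Collect the instructions that must be moved before insertPoint.
// checkAgainstInsertPoint == false models the earlier revision (IM tested
// against IW); true models the agreed fix (IM tested against I).
std::set<int> collectToMove(const Operands &operands, int insertPoint,
                            bool checkAgainstInsertPoint) {
  assert(insertPoint >= 0 &&
         static_cast<std::size_t>(insertPoint) < operands.size());
  std::set<int> toMove;
  std::vector<int> worklist;
  worklist.push_back(insertPoint);
  while (!worklist.empty()) {
    int iw = worklist.back();
    worklist.pop_back();
    for (int im : operands[iw]) {
      int reference = checkAgainstInsertPoint ? insertPoint : iw;
      // Nothing has been moved yet, so positions are the original ones.
      if (!dominates(im, reference) && toMove.insert(im).second)
        worklist.push_back(im);
    }
  }
  return toMove;
}
```

With `operands = {{2}, {}, {1}}` and insertion point 0, instruction 0 uses 2 and 2 uses 1. Testing against IW collects only instruction 2, which would then be moved ahead of its own operand 1. Testing against I collects both 1 and 2.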

Address comments.

jlebar added inline comments. Jul 11 2016, 9:46 AM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367	Did you add a test for this? Phabricator isn't showing one, but I don't entirely trust it. If not, can you add one?

jlebar mentioned this in D22119: Extended LoadStoreVectorizer to vectorize subchains. Jul 11 2016, 10:45 AM

Add testcase.

asbirlea added inline comments. Jul 11 2016, 12:45 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–368	Added a testcase now. The test fails without the above change (replacing I with IW in the dominators check).

jlebar accepted this revision. Jul 11 2016, 12:53 PM

jlebar edited edge metadata.

This revision is now accepted and ready to land. Jul 11 2016, 12:53 PM

Update to latest.

Closed by commit rL275117: Correct ordering of loads/stores. (authored by asbirlea). Jul 11 2016, 3:41 PM

This revision was automatically updated to reflect the committed changes.

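Revision Contents

Path	Size
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp	48 lines
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll	6 lines
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll	26 lines
test/Transforms/LoadStoreVectorizer/X86/preserve-order32.ll	7 lines
test/Transforms/LoadStoreVectorizer/X86/preserve-order64.ll	56 lines
test/Transforms/LoadStoreVectorizer/X86/subchain-interleaved.ll	91 lines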

Diff 63569

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

Show First 20 Lines • Show All 82 Lines • ▼ Show 20 Lines	unsigned getAlignment(StoreInst *SI) const {
if (Align != 0)		if (Align != 0)
return Align;		return Align;

return DL.getABITypeAlignment(SI->getValueOperand()->getType());		return DL.getABITypeAlignment(SI->getValueOperand()->getType());
}		}

bool isConsecutiveAccess(Value *A, Value *B);		bool isConsecutiveAccess(Value *A, Value *B);

/// Reorders the users of I after vectorization to ensure that I dominates its		/// After vectorization, reorder the instructions that I depends on
/// users.		/// (the instructions defining its operands), to ensure they dominate I.
void reorder(Instruction *I);		void reorder(Instruction *I);

/// Returns the first and the last instructions in Chain.		/// Returns the first and the last instructions in Chain.
std::pair<BasicBlock::iterator, BasicBlock::iterator>		std::pair<BasicBlock::iterator, BasicBlock::iterator>
getBoundaryInstrs(ArrayRef<Value *> Chain);		getBoundaryInstrs(ArrayRef<Value *> Chain);

/// Erases the original instructions after vectorizing.		/// Erases the original instructions after vectorizing.
void eraseInstructions(ArrayRef<Value *> Chain);		void eraseInstructions(ArrayRef<Value *> Chain);

/// "Legalize" the vector type that would be produced by combining \p		/// "Legalize" the vector type that would be produced by combining \p
/// ElementSizeBits elements in \p Chain. Break into two pieces such that the		/// ElementSizeBits elements in \p Chain. Break into two pieces such that the
/// total size of each piece is 1, 2 or a multiple of 4 bytes. \p Chain is		/// total size of each piece is 1, 2 or a multiple of 4 bytes. \p Chain is
▲ Show 20 Lines • Show All 224 Lines • ▼ Show 20 Lines	bool Vectorizer::isConsecutiveAccess(Value *A, Value *B) {
const SCEV *OffsetSCEVA = SE.getSCEV(OpA);		const SCEV *OffsetSCEVA = SE.getSCEV(OpA);
const SCEV *OffsetSCEVB = SE.getSCEV(OpB);		const SCEV *OffsetSCEVB = SE.getSCEV(OpB);
const SCEV *One = SE.getConstant(APInt(BitWidth, 1));		const SCEV *One = SE.getConstant(APInt(BitWidth, 1));
const SCEV *X2 = SE.getAddExpr(OffsetSCEVA, One);		const SCEV *X2 = SE.getAddExpr(OffsetSCEVA, One);
return X2 == OffsetSCEVB;		return X2 == OffsetSCEVB;
}		}

void Vectorizer::reorder(Instruction *I) {		void Vectorizer::reorder(Instruction *I) {
Instruction *InsertAfter = I;		SmallPtrSet<Instruction *, 16> InstructionsToMove;
for (User *U : I->users()) {		SmallVector<Instruction *, 16> Worklist;
Instruction *User = dyn_cast<Instruction>(U);
if (!User || User->getOpcode() == Instruction::PHI)		Worklist.push_back(I);
		while (!Worklist.empty()) {
		Instruction *IW = Worklist.pop_back_val();
		int NumOperands = IW->getNumOperands();
		for (int i = 0; i < NumOperands; i++) {
		Instruction *IM = dyn_cast<Instruction>(IW->getOperand(i));
		if (!IM || IM->getOpcode() == Instruction::PHI)
continue;		continue;

if (!DT.dominates(I, User)) {		if (!DT.dominates(IM, I)) {
User->removeFromParent();		InstructionsToMove.insert(IM);
User->insertAfter(InsertAfter);		Worklist.push_back(IM);
InsertAfter = User;		assert(IM->getParent() == IW->getParent() &&
reorder(User);		"Instructions to move should be in the same basic block");
		}
		}
}		}

		// All instructions to move should follow I. Start from I, not from begin().
		for (auto BBI = I->getIterator(), E = I->getParent()->end(); BBI != E;
		++BBI) {
		if (!is_contained(InstructionsToMove, &*BBI))
		continue;
		Instruction *IM = &*BBI;
		--BBI;
		IM->removeFromParent();
		IM->insertBefore(I);
}		}
}		}

std::pair<BasicBlock::iterator, BasicBlock::iterator>		std::pair<BasicBlock::iterator, BasicBlock::iterator>
Vectorizer::getBoundaryInstrs(ArrayRef<Value *> Chain) {		Vectorizer::getBoundaryInstrs(ArrayRef<Value *> Chain) {
Instruction *C0 = cast<Instruction>(Chain[0]);		Instruction *C0 = cast<Instruction>(Chain[0]);
BasicBlock::iterator FirstInstr = C0->getIterator();		BasicBlock::iterator FirstInstr = C0->getIterator();
BasicBlock::iterator LastInstr = C0->getIterator();		BasicBlock::iterator LastInstr = C0->getIterator();

BasicBlock *BB = C0->getParent();		BasicBlock *BB = C0->getParent();
unsigned NumFound = 0;		unsigned NumFound = 0;
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
if (!is_contained(Chain, &I))		if (!is_contained(Chain, &I))
continue;		continue;

++NumFound;		++NumFound;
if (NumFound == 1) {		if (NumFound == 1) {
FirstInstr = I.getIterator();		FirstInstr = I.getIterator();
}		}
if (NumFound == Chain.size()) {		if (NumFound == Chain.size()) {
LastInstr = I.getIterator();		LastInstr = I.getIterator();
break;		break;
}		}
}		}

// Range is [first, last).		// Range is [first, last).
return std::make_pair(FirstInstr, ++LastInstr);		return std::make_pair(FirstInstr, ++LastInstr);
▲ Show 20 Lines • Show All 470 Lines • ▼ Show 20 Lines	bool Vectorizer::vectorizeLoadChain(ArrayRef<Value *> Chain) {

BasicBlock::iterator First, Last;		BasicBlock::iterator First, Last;
std::tie(First, Last) = getBoundaryInstrs(Chain);		std::tie(First, Last) = getBoundaryInstrs(Chain);

if (!isVectorizable(Chain, First, Last))		if (!isVectorizable(Chain, First, Last))
return false;		return false;

// Set insert point.		// Set insert point.
Builder.SetInsertPoint(&*Last);		Builder.SetInsertPoint(&*First);

Value *Bitcast =		Value *Bitcast =
Builder.CreateBitCast(L0->getPointerOperand(), VecTy->getPointerTo(AS));		Builder.CreateBitCast(L0->getPointerOperand(), VecTy->getPointerTo(AS));

LoadInst *LI = cast<LoadInst>(Builder.CreateLoad(Bitcast));		LoadInst *LI = cast<LoadInst>(Builder.CreateLoad(Bitcast));
propagateMetadata(LI, Chain);		propagateMetadata(LI, Chain);
LI->setAlignment(Alignment);		LI->setAlignment(Alignment);

if (VecLoadTy) {		if (VecLoadTy) {
SmallVector<Instruction *, 16> InstrsToErase;		SmallVector<Instruction *, 16> InstrsToErase;
SmallVector<Instruction *, 16> InstrsToReorder;		SmallVector<Instruction *, 16> InstrsToReorder;
		InstrsToReorder.push_back(cast<Instruction>(Bitcast));

unsigned VecWidth = VecLoadTy->getNumElements();		unsigned VecWidth = VecLoadTy->getNumElements();
for (unsigned I = 0, E = Chain.size(); I != E; ++I) {		for (unsigned I = 0, E = Chain.size(); I != E; ++I) {
for (auto Use : Chain[I]->users()) {		for (auto Use : Chain[I]->users()) {
Instruction *UI = cast<Instruction>(Use);		Instruction *UI = cast<Instruction>(Use);
unsigned Idx = cast<ConstantInt>(UI->getOperand(1))->getZExtValue();		unsigned Idx = cast<ConstantInt>(UI->getOperand(1))->getZExtValue();
unsigned NewIdx = Idx + I * VecWidth;		unsigned NewIdx = Idx + I * VecWidth;
Value *V = Builder.CreateExtractElement(LI, Builder.getInt32(NewIdx));		Value *V = Builder.CreateExtractElement(LI, Builder.getInt32(NewIdx));
Instruction *Extracted = cast<Instruction>(V);		Instruction *Extracted = cast<Instruction>(V);
if (Extracted->getType() != UI->getType())		if (Extracted->getType() != UI->getType())
Extracted = cast<Instruction>(		Extracted = cast<Instruction>(
Builder.CreateBitCast(Extracted, UI->getType()));		Builder.CreateBitCast(Extracted, UI->getType()));

// Replace the old instruction.		// Replace the old instruction.
UI->replaceAllUsesWith(Extracted);		UI->replaceAllUsesWith(Extracted);
InstrsToReorder.push_back(Extracted);
InstrsToErase.push_back(UI);		InstrsToErase.push_back(UI);
}		}
}		}

for (Instruction *ModUser : InstrsToReorder)		for (Instruction *ModUser : InstrsToReorder)
reorder(ModUser);		reorder(ModUser);

for (auto I : InstrsToErase)		for (auto I : InstrsToErase)
I->eraseFromParent();		I->eraseFromParent();
} else {		} else {
SmallVector<Instruction *, 16> InstrsToReorder;		SmallVector<Instruction *, 16> InstrsToReorder;
		InstrsToReorder.push_back(cast<Instruction>(Bitcast));

for (unsigned I = 0, E = Chain.size(); I != E; ++I) {		for (unsigned I = 0, E = Chain.size(); I != E; ++I) {
Value *V = Builder.CreateExtractElement(LI, Builder.getInt32(I));		Value *V = Builder.CreateExtractElement(LI, Builder.getInt32(I));
Instruction *Extracted = cast<Instruction>(V);		Instruction *Extracted = cast<Instruction>(V);
Instruction *UI = cast<Instruction>(Chain[I]);		Instruction *UI = cast<Instruction>(Chain[I]);
if (Extracted->getType() != UI->getType()) {		if (Extracted->getType() != UI->getType()) {
Extracted = cast<Instruction>(		Extracted = cast<Instruction>(
Builder.CreateBitOrPointerCast(Extracted, UI->getType()));		Builder.CreateBitOrPointerCast(Extracted, UI->getType()));
}		}

// Replace the old instruction.		// Replace the old instruction.
UI->replaceAllUsesWith(Extracted);		UI->replaceAllUsesWith(Extracted);
InstrsToReorder.push_back(Extracted);
}		}

for (Instruction *ModUser : InstrsToReorder)		for (Instruction *ModUser : InstrsToReorder)
reorder(ModUser);		reorder(ModUser);
}		}

eraseInstructions(Chain);		eraseInstructions(Chain);
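
To make the index arithmetic in the vector-typed branch above concrete, here is a small standalone sketch. It assumes each chain element is itself a vector of `VecWidth` lanes, so lane `Idx` of chain element `I` ends up in lane `Idx + I * VecWidth` of the widened load. `flatLaneIndex` is a hypothetical helper name, not something in the patch, and the sketch uses only the standard library rather than LLVM types.

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): maps lane Idx of the chain
// element at position ChainPos to its lane in the single widened vector
// load, mirroring `NewIdx = Idx + I * VecWidth` from the hunk above.
unsigned flatLaneIndex(unsigned ChainPos, unsigned Idx, unsigned VecWidth) {
  assert(Idx < VecWidth && "lane index must be inside the element vector");
  return Idx + ChainPos * VecWidth;
}
```

For a chain of two 2-lane loads merged into one 4-lane load, lane 1 of the second load is `flatLaneIndex(1, 1, 2)`, which evaluates to 3.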

[... 14 lines not shown ...]

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll

	; RUN: opt -mtriple=amdgcn-amd-amdhsa -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s			; RUN: opt -mtriple=amdgcn-amd-amdhsa -basicaa -load-store-vectorizer -S -o - %s | FileCheck %s

	target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-p24:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"			target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-p24:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"

	; Check relative position of the inserted vector load relative to the			; Check relative position of the inserted vector load relative to the existing
	; existing adds.			; adds. Vectorized loads should be inserted at the position of the first load.
				jlebar: Please reflow. Also thanks for adding this; I now understand why the test output is what it is. That makes me happy.

	; CHECK-LABEL: @insert_load_point(			; CHECK-LABEL: @insert_load_point(
	; CHECK: %z = add i32 %x, 4			; CHECK: %z = add i32 %x, 4
	; CHECK: %w = add i32 %y, 9
	; CHECK: load <2 x float>			; CHECK: load <2 x float>
				; CHECK: %w = add i32 %y, 9
	; CHECK: %foo = add i32 %z, %w			; CHECK: %foo = add i32 %z, %w
				jlebar: I don't quite get what the original code is testing here. Like, the adds are completely independent of the loads, right? If so, can we fix this test so it's not sensitive to implementation details?

				asbirlea (author): I'm not sure what the original purpose was. It looks to me it is intentionally testing an implementation detail ("insert_load_point").

				jlebar:
				> It looks to me it is intentionally testing an implementation detail ("insert_load_point").
				Looks like it to me, too. When we committed the original patches, we all agreed that we wouldn't act with a bias towards the existing code, since we committed with existing unresolved issues. I think this should count under that rubric. That is, can we fix the test so it no longer tests an implementation detail? I suppose you don't need to do that in this patch if you don't want.

				arsenm: The pass should still have an expectation for where the instructions will be inserted relative to the originals. I think a test ensuring this is useful.

				jlebar:
				> The pass should still have an expectation for where the instructions will be inserted relative to the originals
				I guess I am OK with this if we can articulate in the test file or the cpp file exactly what is the rule that we expect applies to our output. If we cannot articulate a rule, and instead we're just checking that the pass does what it currently does, I do not think that is a good test. The reason is that, without an articulation of the rule, if the test fails, we have no way to tell whether there's a bug or if the test just needs to be changed.
				(And if we can articulate a rule, it should go without saying that, inasmuch as reasonable, the test should check only for adherence to the rule, ignoring other ancillary properties of the output.)

				arsenm: Tests should generally have a comment explaining what they are testing anyway. This change was mostly why I added this test in the first place. If something is changing any behavior of the pass, a test should capture this. I don't understand the concern about wondering if it's a bug or the test needs an update; the point of having the test is you have to look at the test output changes to verify that it is still correct.

				jlebar:
				> the point of having the test is you have to look at the test output changes to verify that it is still correct
				My thesis is that this process is bug-prone. My evidence for this is that Alina found multiple tests in this test suite that had bugs -- tests that checked that some sequence of operations was vectorized when in fact it was not safe to vectorize it.
				This motivates my suggestion, which is that, inasmuch as we can, we should avoid engaging in this process (by writing tests that are not fragile to uninteresting details), and, where we can't avoid the process entirely (maybe the details are interesting, and maybe that's the case here), we should write down explicitly what behavior we expect from the pass.
				I don't mean to suggest that the bugs in the tests were the result of you being careless -- I didn't catch them either when I reviewed the patches. My point is just that every time humans have to look at new output and decide if it's correct, there is a chance that we'll overlook a bug. And based on this history, that chance is not negligible.
				Even if you disagree with my application of the evidence and think it's unlikely that the three of us would make such a bug, surely other maintainers may not be as scrupulous.
				Thus my suggestion: If we can write down the behavior we expect from the pass -- "Vectorized loads should be inserted at the position of the first load, and instructions which were between the first and last load should be reordered preserving their relative order inasmuch as possible." (or whatever the actual rule is) -- then when the test fails, we can judge against that whether the test or the pass is broken. And if we have to update the test, we have some chance of creating the correct output ourselves, rather than just accepting the output created by the pass (which is more likely, in my judgement, to lead to us accepting buggy output, per above).
				I think we're pretty close in what we want, honestly. The main difference, I think, is that I am saying that we should try not to test behavior for which we cannot articulate a rule. If the behavior is so incidental that we can't even say what it's supposed to do, I don't see why we'd want to enstone it as a test.

				asbirlea (author): I tried to resolve this for now by adding the comment you suggested in both this and the other 2 tests checking the order is preserved.
	define void @insert_load_point(float addrspace(1)* nocapture %a, float addrspace(1)* nocapture %b, float addrspace(1)* nocapture readonly %c, i64 %idx, i32 %x, i32 %y) #0 {			define void @insert_load_point(float addrspace(1)* nocapture %a, float addrspace(1)* nocapture %b, float addrspace(1)* nocapture readonly %c, i64 %idx, i32 %x, i32 %y) #0 {
	entry:			entry:
	%a.idx.x = getelementptr inbounds float, float addrspace(1)* %a, i64 %idx			%a.idx.x = getelementptr inbounds float, float addrspace(1)* %a, i64 %idx
	%c.idx.x = getelementptr inbounds float, float addrspace(1)* %c, i64 %idx			%c.idx.x = getelementptr inbounds float, float addrspace(1)* %c, i64 %idx
	%a.idx.x.1 = getelementptr inbounds float, float addrspace(1)* %a.idx.x, i64 1			%a.idx.x.1 = getelementptr inbounds float, float addrspace(1)* %a.idx.x, i64 1
	%c.idx.x.1 = getelementptr inbounds float, float addrspace(1)* %c.idx.x, i64 1			%c.idx.x.1 = getelementptr inbounds float, float addrspace(1)* %c.idx.x, i64 1

	%z = add i32 %x, 4			%z = add i32 %x, 4
	[... 42 lines not shown ...]

test/Transforms/LoadStoreVectorizer/X86/correct-order.ll

This file was added.

				; RUN: opt -mtriple=x86-linux -load-store-vectorizer -S -o - %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

				; CHECK-LABEL: @correct_order(
				; CHECK: bitcast i32*
				; CHECK: load <2 x i32>
				; CHECK: load i32
				; CHECK: bitcast i32*
				; CHECK: store <2 x i32>
				; CHECK: load i32
				define void @correct_order(i32* noalias %ptr) {
				%next.gep = getelementptr i32, i32* %ptr, i64 0
				%next.gep1 = getelementptr i32, i32* %ptr, i64 1
				%next.gep2 = getelementptr i32, i32* %ptr, i64 2

				%l1 = load i32, i32* %next.gep1, align 4
				jlebar: Do all these tests need to be inside loops?
				asbirlea (author): Removed loops in all tests.
				jlebar: Do we have a test which checks that we don't reorder through phi nodes?
				asbirlea (author): I don't think so. Feel free to add one :).
				jlebar: You don't think it's worth adding a test as part of this patch? It seems relevant because we could otherwise get infinite recursion or something...
				asbirlea (author): I'm not sure how to properly create one right now. I added a test that makes an attempt at that, but it in fact ensures there is no vectorization beyond basic blocks (and implicitly through a phi node).
				%l2 = load i32, i32* %next.gep, align 4
				store i32 0, i32* %next.gep1, align 4
				store i32 0, i32* %next.gep, align 4
				%l3 = load i32, i32* %next.gep1, align 4
				%l4 = load i32, i32* %next.gep2, align 4

				ret void
				}
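
The phi-node concern raised in the thread above can be illustrated with a toy model. This is only a sketch under stated assumptions: it supposes that reordering walks a moved instruction's operands transitively, and that skipping phi nodes and definitions in other basic blocks is what keeps the walk from chasing a loop back edge. The patch's own `reorder` implementation is not visible in this excerpt, so `Inst` and `collectToMove` are hypothetical names that use only the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy instruction: its index in the table is its position in program order.
struct Inst {
  std::string Name;
  int BlockId;
  bool IsPhi;
  std::vector<std::size_t> Operands; // indices into the instruction table
};

// Hypothetical helper: returns the instructions that currently sit after
// User but must be moved in front of it because User depends on them,
// directly or transitively. Phi nodes and definitions in other blocks are
// never followed.
std::set<std::size_t> collectToMove(const std::vector<Inst> &Insts,
                                    std::size_t User) {
  assert(User < Insts.size() && "User must index into the table");
  std::set<std::size_t> ToMove;
  std::vector<std::size_t> Worklist{User};
  while (!Worklist.empty()) {
    std::size_t Cur = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Op : Insts[Cur].Operands) {
      const Inst &Def = Insts[Op];
      if (Def.IsPhi || Def.BlockId != Insts[User].BlockId)
        continue;
      // Each index is inserted at most once, so the walk terminates.
      if (Op > User && ToMove.insert(Op).second)
        Worklist.push_back(Op);
    }
  }
  return ToMove;
}
```

In this model a vector load hoisted above the address computation it uses would pull that computation and its operands up with it, while a user of a phi node would pull nothing.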

test/Transforms/LoadStoreVectorizer/X86/preserve-order32.ll

	; RUN: opt -mtriple=x86-linux -load-store-vectorizer -S -o - %s | FileCheck %s			; RUN: opt -mtriple=x86-linux -load-store-vectorizer -S -o - %s | FileCheck %s

	target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-p24:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"			target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-p24:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"

	%struct.buffer_t = type { i32, i8* }			%struct.buffer_t = type { i32, i8* }

	; Check an i32 and i8* get vectorized, and that			; Check an i32 and i8* get vectorized, and that the two accesses
	; the two accesses (load into buff.val and store to buff.p) preserve their order.			; (load into buff.val and store to buff.p) preserve their order.
				; Vectorized loads should be inserted at the position of the first load,
				; and instructions which were between the first and last load should be
				; reordered preserving their relative order inasmuch as possible.

	; CHECK-LABEL: @preserve_order_32(			; CHECK-LABEL: @preserve_order_32(
	; CHECK: load <2 x i32>			; CHECK: load <2 x i32>
	; CHECK: %buff.val = load i8			; CHECK: %buff.val = load i8
	; CHECK: store i8 0			; CHECK: store i8 0
	define void @preserve_order_32(%struct.buffer_t* noalias %buff) #0 {			define void @preserve_order_32(%struct.buffer_t* noalias %buff) #0 {
	entry:			entry:
	%tmp1 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i32 0, i32 1			%tmp1 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i32 0, i32 1
	[... 9 lines not shown ...]

test/Transforms/LoadStoreVectorizer/X86/preserve-order64.ll

	; RUN: opt -mtriple=x86-linux -load-store-vectorizer -S -o - %s | FileCheck %s			; RUN: opt -mtriple=x86-linux -load-store-vectorizer -S -o - %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

	%struct.buffer_t = type { i64, i8* }			%struct.buffer_t = type { i64, i8* }
				%struct.nested.buffer = type { %struct.buffer_t, %struct.buffer_t }

	; Check an i64 and i8* get vectorized, and that			; Check an i64 and i8* get vectorized, and that the two accesses
	; the two accesses (load into buff.val and store to buff.p) preserve their order.			; (load into buff.val and store to buff.p) preserve their order.
				; Vectorized loads should be inserted at the position of the first load,
				; and instructions which were between the first and last load should be
				; reordered preserving their relative order inasmuch as possible.

	; CHECK-LABEL: @preserve_order_64(			; CHECK-LABEL: @preserve_order_64(
	; CHECK: load <2 x i64>			; CHECK: load <2 x i64>
	; CHECK: %buff.val = load i8			; CHECK: %buff.val = load i8
	; CHECK: store i8 0			; CHECK: store i8 0
	define void @preserve_order_64(%struct.buffer_t* noalias %buff) #0 {			define void @preserve_order_64(%struct.buffer_t* noalias %buff) #0 {
	entry:			entry:
	%tmp1 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i64 0, i32 1			%tmp1 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i64 0, i32 1
	%buff.p = load i8*, i8** %tmp1, align 8			%buff.p = load i8*, i8** %tmp1, align 8
	%buff.val = load i8, i8* %buff.p, align 8			%buff.val = load i8, i8* %buff.p, align 8
	store i8 0, i8* %buff.p, align 8			store i8 0, i8* %buff.p, align 8
	%tmp0 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i64 0, i32 0			%tmp0 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i64 0, i32 0
	%buff.int = load i64, i64* %tmp0, align 8			%buff.int = load i64, i64* %tmp0, align 8
	ret void			ret void
	}			}
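
The placement rule stated in the test comment above can be sketched on a toy basic block. This is a simplified model, not the pass itself: a block is just an ordered list of instruction names, `placeVectorLoad` is a hypothetical helper, and the sketch leaves out moving the operands the new load depends on.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model: a basic block is an ordered list of instruction names.
using Block = std::vector<std::string>;

// Hypothetical helper: replaces the scalar loads listed in Chain with one
// vector load placed where the first chain member stood. Every other
// instruction keeps its relative order.
Block placeVectorLoad(const Block &BB, const std::vector<std::string> &Chain,
                      const std::string &VecLoad) {
  assert(!Chain.empty() && "need at least one load to merge");
  Block Out;
  bool Inserted = false;
  for (const std::string &Inst : BB) {
    bool InChain = std::find(Chain.begin(), Chain.end(), Inst) != Chain.end();
    if (!InChain) {
      Out.push_back(Inst);
      continue;
    }
    // The first chain member marks the insertion point; later ones vanish.
    if (!Inserted) {
      Out.push_back(VecLoad);
      Inserted = true;
    }
  }
  return Out;
}
```

Applied to a block shaped like `insert_load_point` (an add, a load, another add, a second load, a final add), the vector load lands between the first two adds, which is the order the updated CHECK lines in insertion-point.ll expect.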

				; Check reordering recurses correctly.

				; CHECK-LABEL: @transitive_reorder(
				; CHECK: load <2 x i64>
				; CHECK: %buff.val = load i8
				; CHECK: store i8 0
				define void @transitive_reorder(%struct.buffer_t* noalias %buff, %struct.nested.buffer* noalias %nest) #0 {
				entry:
				%nest0_0 = getelementptr inbounds %struct.nested.buffer, %struct.nested.buffer* %nest, i64 0, i32 0
				%tmp1 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %nest0_0, i64 0, i32 1
				%buff.p = load i8*, i8** %tmp1, align 8
				%buff.val = load i8, i8* %buff.p, align 8
				store i8 0, i8* %buff.p, align 8
				%nest1_0 = getelementptr inbounds %struct.nested.buffer, %struct.nested.buffer* %nest, i64 0, i32 0
				%tmp0 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %nest1_0, i64 0, i32 0
				%buff.int = load i64, i64* %tmp0, align 8
				ret void
				}

				; Check for no vectorization over phi node

				; CHECK-LABEL: @no_vect_phi(
				; CHECK: load i8*
				; CHECK: load i8
				; CHECK: store i8 0
				; CHECK: load i64
				define void @no_vect_phi(i32* noalias %ptr, %struct.buffer_t* noalias %buff) {
				entry:
				%tmp1 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i64 0, i32 1
				%buff.p = load i8*, i8** %tmp1, align 8
				%buff.val = load i8, i8* %buff.p, align 8
				store i8 0, i8* %buff.p, align 8
				br label %"for something"

				"for something":
				%index = phi i64 [ 0, %entry ], [ %index.next, %"for something" ]

				%tmp0 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i64 0, i32 0
				%buff.int = load i64, i64* %tmp0, align 8

				%index.next = add i64 %index, 8
				%cmp_res = icmp eq i64 %index.next, 8
				br i1 %cmp_res, label %ending, label %"for something"

				ending:
				ret void
				}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

test/Transforms/LoadStoreVectorizer/X86/subchain-interleaved.ll

This file was added.

				; RUN: opt -mtriple=x86-linux -load-store-vectorizer -S -o - %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

				; Vectorized subsets of the load/store chains in the presence of
				; interleaved loads/stores

				; CHECK-LABEL: @interleave_2L_2S(
				; CHECK: load <2 x i32>
				; CHECK: load i32
				; CHECK: store <2 x i32>
				; CHECK: load i32
				define void @interleave_2L_2S(i32* noalias %ptr) {
				%next.gep = getelementptr i32, i32* %ptr, i64 0
				%next.gep1 = getelementptr i32, i32* %ptr, i64 1
				%next.gep2 = getelementptr i32, i32* %ptr, i64 2

				%l1 = load i32, i32* %next.gep1, align 4
				%l2 = load i32, i32* %next.gep, align 4
				store i32 0, i32* %next.gep1, align 4
				store i32 0, i32* %next.gep, align 4
				%l3 = load i32, i32* %next.gep1, align 4
				%l4 = load i32, i32* %next.gep2, align 4

				ret void
				}
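
These ordering tests rely on plain `CHECK:` lines being matched in sequence, each one after the previous match. The sketch below models only that property with substring search. It deliberately ignores the rest of FileCheck (regular expressions, variables, `CHECK-LABEL` partitioning), and `matchesInOrder` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of plain CHECK: lines: every pattern must occur in
// Output after the end of the previous match.
bool matchesInOrder(const std::string &Output,
                    const std::vector<std::string> &Patterns) {
  std::size_t Pos = 0;
  for (const std::string &P : Patterns) {
    assert(!P.empty() && "an empty pattern would match anywhere");
    std::size_t Found = Output.find(P, Pos);
    if (Found == std::string::npos)
      return false;
    Pos = Found + P.size();
  }
  return true;
}
```

Under this model, output containing the vector load, a scalar load, the vector store and a final scalar load in that order satisfies the four patterns of `interleave_2L_2S`, while output that places the vector store before the vector load does not.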

				; CHECK-LABEL: @interleave_3L_2S_1L(
				; CHECK: load <3 x i32>
				; CHECK: store <2 x i32>
				; CHECK: load i32

				define void @interleave_3L_2S_1L(i32* noalias %ptr) {
				%next.gep = getelementptr i32, i32* %ptr, i64 0
				%next.gep1 = getelementptr i32, i32* %ptr, i64 1
				%next.gep2 = getelementptr i32, i32* %ptr, i64 2

				%l2 = load i32, i32* %next.gep, align 4
				%l1 = load i32, i32* %next.gep1, align 4
				store i32 0, i32* %next.gep1, align 4
				store i32 0, i32* %next.gep, align 4
				%l3 = load i32, i32* %next.gep1, align 4
				%l4 = load i32, i32* %next.gep2, align 4

				ret void
				}

				; CHECK-LABEL: @chain_suffix(
				; CHECK: load i32
				; CHECK: store <2 x i32>
				; CHECK: load i32
				; CHECK: load i32
				define void @chain_suffix(i32* noalias %ptr) {
				%next.gep = getelementptr i32, i32* %ptr, i64 0
				%next.gep1 = getelementptr i32, i32* %ptr, i64 1
				%next.gep2 = getelementptr i32, i32* %ptr, i64 2

				%l2 = load i32, i32* %next.gep, align 4
				store i32 0, i32* %next.gep1, align 4
				store i32 0, i32* %next.gep, align 4
				%l3 = load i32, i32* %next.gep1, align 4
				%l4 = load i32, i32* %next.gep2, align 4

				ret void
				}


				; CHECK-LABEL: @chain_prefix_suffix(
				; CHECK: load i32
				; CHECK: load i32
				; CHECK: store <2 x i32>
				; CHECK: load i32
				; CHECK: load i32
				; CHECK: load i32
				define void @chain_prefix_suffix(i32* noalias %ptr) {
				%next.gep = getelementptr i32, i32* %ptr, i64 0
				%next.gep1 = getelementptr i32, i32* %ptr, i64 1
				%next.gep2 = getelementptr i32, i32* %ptr, i64 2
				%next.gep3 = getelementptr i32, i32* %ptr, i64 3

				%l1 = load i32, i32* %next.gep, align 4
				%l2 = load i32, i32* %next.gep1, align 4
				store i32 0, i32* %next.gep1, align 4
				store i32 0, i32* %next.gep2, align 4
				%l3 = load i32, i32* %next.gep1, align 4
				%l4 = load i32, i32* %next.gep2, align 4
				%l5 = load i32, i32* %next.gep3, align 4

				ret void
				}

Revision Contents

Diff 63569
