This is an archive of the discontinued LLVM Phabricator instance.

Correct ordering of loads/stores.
ClosedPublic

Authored by asbirlea on Jul 6 2016, 2:34 PM.

Details

Summary

Aiming to correct the ordering of loads/stores. This patch changes insert point for loads to the position of the first load; it adds a new ordering method for loads to insert before, rather than after the load.
Updated testcases to reflect the changes.

Diff Detail

Event Timeline

asbirlea updated this revision to Diff 62977.Jul 6 2016, 2:34 PM
asbirlea retitled this revision from to Correct ordering of loads/stores..
asbirlea updated this object.
asbirlea added reviewers: llvm-commits, jlebar, arsenm.
asbirlea updated this revision to Diff 62986.Jul 6 2016, 3:30 PM
asbirlea edited edge metadata.

Revert some test changes after correcting the condition testing for misaligned in D21935.

jlebar added inline comments.Jul 6 2016, 3:35 PM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
94

Where do we use reorderAfter? Should one of the calls to reorderBefore call reorderAfter? If so, why aren't some of the tests failing? :)

96

"&" should bind to the variable name.

(It's helpful to me to run clang-format as part of "arc diff", so I don't ever forget this stuff. I have a wrapper script around arc that invokes git-clang-format before posting a change. https://github.com/jlebar/conf/blob/master/bin/arc In fact I call my wrapper "arc" so I don't even have to think about it. You'll need a file/symlink in the same directory as that script called arc-real, and you'll need git-clang-format on your path -- it's in the LLVM source tree at tools/clang/tools/clang-format/git-clang-format. I think you'll also need to add

[clangformat]
        style = "file"

to your .gitconfig.

Easy, right? :)

337

Whitespace alignment here.

342

This is arbitrarily-deep recursion, which I think we tend to avoid, for fear of overflowing the stack given pessimal inputs. Can we write this with an explicit worklist instead?

348

Can we use llvm::SmallPtrSet? unordered_set is much slower and cache-inefficient.

351

Space around "=" (clang-format should take care off this, too). Here and elsewhere.

352

Do we know that I->getParent()->end() can't change when we insert new elements?

352

This only considers instructions in I's BB, but InstructionsToMove may contain other instructions, no? If so that may make this whole thing more complicated...

359

If you're going to do this, maybe we should assert that InstructionsToMove is empty after the loop?

359

Do we need the &*? I'd think it should work without that.

884

We use dyn_cast when the cast may fail and return null. But I think you don't want a null pointer inside InstrsToReorder, and we know that Bitcast is an Instruction, so I think you want plain cast<>.

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

I don't quite get what the original code is testing here. Like, the adds are completely independent of the loads, right? If so, can we fix this test so it's not sensitive to implementation details?

test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
17

Do all these tests need to be inside loops?

18

Do we have a test which checks that we don't reorder through phi nodes?

asbirlea updated this revision to Diff 62998.Jul 6 2016, 4:59 PM

Partially address comments.

Still looking into updating the tests and some of the comments I missed.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
94

Fair point. I need to figure out if there si a case when reorderAfter is needed for stores. In the mean time, removing it.

96

Yes. Postponing running clang-format until all other comments are addressed.
I also have another patch on top of this which may lead to conflicts after formatting, in which case I will clang-format the next patch.

359

All this was removed, but I use the comments for reorderBefore.

884

Yes, updated.

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

I'm not sure what the original purpose was. It looks to me it is intentionally testing an implementation detail ("insert_load_point").

test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
18

I don't think so. Feel free to add one :).

jlebar edited edge metadata.Jul 6 2016, 5:56 PM

OK, I'll have another look once you've figured the rest out. Just lmk.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
95

in which case I will clang-format the next patch.

I can help you resolve the conflicts if it gets hairy (I have some ideas for rebase tricks), but it's a requirement each commit be properly clang-formatted. We can't commit improperly-formatted code.

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

It looks to me it is intentionally testing an implementation detail ("insert_load_point").

Looks like it to me, too. When we committed the original patches, we all agreed that we wouldn't act with a bias towards the existing code, since we committed with existing unresolved issues. I think this should count under that rubric.

That is, can we fix the test so it no longer tests an implementation detail? I suppose you don't need to do that in this patch if you don't want.

test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
19

You don't think it's worth adding a test as part of this patch? It seems relevant because we could otherwise get infinite recursion or something...

arsenm added inline comments.Jul 6 2016, 8:13 PM
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

The pass should still have an expectation for where the instructions will be inserted relative to the originals, I think a test ensuring this is useful

jlebar added inline comments.Jul 6 2016, 8:41 PM
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

The pass should still have an expectation for where the instructions will be inserted relative to the originals

I guess I am OK with this if we can articulate in the test file or the cpp file exactly what is the rule that we expect applies to our output. If we cannot articulate a rule, and instead we're just checking that the pass does what it currently does, I do not think that is a good test. The reason is that, without an articulation of the rule, if the test fails, we have no way to tell whether there's a bug or if the test just needs to be changed.

(And if we can articulate a rule, it should go without saying that, inasmuch as reasonable, the test should check only for adherence to the rule, ignoring other ancillary properties of the output.)

arsenm added inline comments.Jul 6 2016, 11:55 PM
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

Test should generally have a comment explaining what they are testing anyway. This change was mostly why I added this test in the first place. If something is changing any behavior of the pass, a test should capture this. I don't understand the concern about wondering if it's a bug or the test needs update, the point of having the test is you have to look at the test output changes to verify that it is still correct

jlebar added inline comments.Jul 7 2016, 8:09 AM
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

the point of having the test is you have to look at the test output changes to verify that it is still correct

My thesis is that this process is bug-prone. My evidence for this is that Alina found multiple tests in this test suite that had bugs -- tests that checked that some sequence of operations was vectorized when in fact it was not safe to vectorize it. This motivates my suggestion, which is that, inasmuch as we can, we should avoid engaging in this process (by writing tests that are not fragile to uninteresting details), and, where we can't avoid the process entirely (maybe the details are interesting, and maybe that's the case here), we should write down explicitly what behavior we expect from the pass.

I don't mean to suggest that the bugs in the tests were the result of you being careless -- I didn't catch them either when I reviewed the patches. My point is just that every time humans have to look at new output and decide if it's correct, there is a chance that we'll overlook a bug. And based on this history, that chance is not negligible. Even if you disagree with my application of the evidence and think it's unlikely that the three of us would make such a bug, surely other maintainers may not be as scrupulous.

Thus my suggestion: If we can write down the behavior we expect from the pass -- "Vectorized loads should be inserted at the position of the first load, and instructions which were between the first and last load should be reordered preserving their relative order inasmuch as possible." (or whatever the actual rule is) -- then when the test fails, we can judge against *that* whether the test or the pass is broken. And if we have to update the test, we have some chance of creating the correct output ourselves, rather than just accepting the output created by the pass (which is more likely, in my judgement, to lead to us accepting buggy output, per above).

I think we're pretty close in what we want, honestly. The main difference, I think, is that I am saying that we should try not to test behavior for which we cannot articulate a rule. If the behavior is so incidental that we can't even say what it's supposed to do, I don't see why we'd want to enstone it as a test.

asbirlea updated this revision to Diff 63097.Jul 7 2016, 11:13 AM
asbirlea edited edge metadata.

Address comments.

I think most comments are addressed now. Let me know if I missed anything.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
95

Formatted.

342

Used a work-list for the other method.

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
12

I tried to resolve this for now by adding the comment you suggested in both this and the other 2 tests checking the order is preserved.

test/Transforms/LoadStoreVectorizer/X86/correct-order.ll
18

Removed loops in all tests.

20

I'm not sure how to properly create one right now. I added a test that makes an attempt at that, but it in fact ensures there is no vectorization beyond basic blocks (and implicitly through a phi node).

asbirlea updated this revision to Diff 63151.Jul 7 2016, 3:55 PM

Adding changes from dependent patches.

jlebar added a comment.Jul 7 2016, 6:09 PM

Can you elaborate a bit more in the commit message exactly what bug we're fixing here? It's not immediately clear to me.

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
95

Maybe "reorderUsers" or "moveUsers" would be a better name, especially since we no longer have reorderAfter? Or even just "reorder", I guess.

341

push_back

342

!Worklist.empty() (I think?)

346

I = Worklist.pop_back_elem();

351

push_back

372

Since this is no longer recursive, we don't need the helper anymore? We can keep it if you think it's a useful way to break things down (I'm not sure it is), but then we should change the name so it's more descriptive and have it return the SmallPtrSet instead of take it by reference.

375

Could we call this iterator BBI, and then call the instruction to move IM? That would be consistent with the helper, and also would fix the problem of InstructionToMove and InstructionsToMove looking very similar.

382

Now that I think about this, could we just have a DEBUG() loop that checks that every element of InstructionsToMove is in the same BB as I? Then we don't need to erase from the set, which is overhead we don't need.

385

Nit, end sentences with periods.

Also suggest swapping the order of the prepositional phrases; this order is awkward (although I can't articulate a rule :).

test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll
6

Please reflow. Also thanks for adding this; I now understand why the test output is what it is. That makes me happy.

asbirlea updated this revision to Diff 63300.Jul 8 2016, 1:03 PM

Address comments.

asbirlea marked 10 inline comments as done.Jul 8 2016, 1:57 PM

Mark as done.

jlebar added inline comments.Jul 8 2016, 3:14 PM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367

One of the things I'm bad at is evaluating the correctness of code that contains nits to be fixed. So...now that you've fixed the nits, I have a correctness concern, which I'm sorry I didn't see earlier.

My concern is that we stop reordering when IM dominates IW. But it seems to me that we should stop reordering when IM dominates I, no? Because after we move IW up before I, it may no longer dominate its operands.

359

Don't need DEBUG around the assert(). assert() is a macro and only evaluates its args in debug builds.

asbirlea added inline comments.Jul 8 2016, 5:03 PM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367

Let me try to reason this.
The loop checks that all operands IM should dominate IW. If that's not the case, IW should be moved before I. If that happens, IW is added to the worklist, so all its operands are checked and possibly moved before as well in the next iterations.
Yes, all IM should implicitly dominate I as well, but that should be transitive through IW. Unless I missed something?

asbirlea added inline comments.Jul 8 2016, 5:15 PM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367

What I missed is that instructions are not moved until after the loop. You're right.

asbirlea updated this revision to Diff 63509.Jul 11 2016, 7:28 AM

Address comments.

jlebar added inline comments.Jul 11 2016, 9:46 AM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–367

Did you add a test for this? Phabricator isn't showing one, but I don't entirely trust it. If not, can you add one?

asbirlea updated this revision to Diff 63555.Jul 11 2016, 12:43 PM

Add testcase.

asbirlea added inline comments.Jul 11 2016, 12:45 PM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
356–368

Added a tescase now. The test fails without the above change (replacing I with IW in the dominators check)

jlebar accepted this revision.Jul 11 2016, 12:53 PM
jlebar edited edge metadata.
This revision is now accepted and ready to land.Jul 11 2016, 12:53 PM
asbirlea updated this revision to Diff 63569.Jul 11 2016, 1:59 PM
asbirlea edited edge metadata.

Update to latest.

This revision was automatically updated to reflect the committed changes.