Aiming to correct the ordering of loads/stores. This patch changes insert point for loads to the position of the first load; it adds a new ordering method for loads to insert before, rather than after the load.
Updated testcases to reflect the changes.
Details
Diff Detail
Event Timeline
Revert some test changes after correcting the condition testing for misaligned in D21935.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
94 | Where do we use reorderAfter? Should one of the calls to reorderBefore call reorderAfter? If so, why aren't some of the tests failing? :) | |
96 | "&" should bind to the variable name. (It's helpful to me to run clang-format as part of "arc diff", so I don't ever forget this stuff. I have a wrapper script around arc that invokes git-clang-format before posting a change. https://github.com/jlebar/conf/blob/master/bin/arc In fact I call my wrapper "arc" so I don't even have to think about it. You'll need a file/symlink in the same directory as that script called arc-real, and you'll need git-clang-format on your path -- it's in the LLVM source tree at tools/clang/tools/clang-format/git-clang-format. I think you'll also need to add [clangformat] style = "file" to your .gitconfig. Easy, right? :) | |
337 | Whitespace alignment here. | |
342 | This is arbitrarily-deep recursion, which I think we tend to avoid, for fear of overflowing the stack given pessimal inputs. Can we write this with an explicit worklist instead? | |
348 | Can we use llvm::SmallPtrSet? unordered_set is much slower and cache-inefficient. | |
351 | Space around "=" (clang-format should take care off this, too). Here and elsewhere. | |
352 | Do we know that I->getParent()->end() can't change when we insert new elements? | |
352 | This only considers instructions in I's BB, but InstructionsToMove may contain other instructions, no? If so that may make this whole thing more complicated... | |
359 | If you're going to do this, maybe we should assert that InstructionsToMove is empty after the loop? | |
359 | Do we need the &*? I'd think it should work without that. | |
885 | We use dyn_cast when the cast may fail and return null. But I think you don't want a null pointer inside InstrsToReorder, and we know that Bitcast is an Instruction, so I think you want plain cast<>. | |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
12 | I don't quite get what the original code is testing here. Like, the adds are completely independent of the loads, right? If so, can we fix this test so it's not sensitive to implementation details? | |
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll | ||
17 | Do all these tests need to be inside loops? | |
18 | Do we have a test which checks that we don't reorder through phi nodes? |
Still looking into updating the tests and some of the comments I missed.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
94 | Fair point. I need to figure out if there si a case when reorderAfter is needed for stores. In the mean time, removing it. | |
96 | Yes. Postponing running clang-format until all other comments are addressed. | |
359 | All this was removed, but I use the comments for reorderBefore. | |
885 | Yes, updated. | |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
12 | I'm not sure what the original purpose was. It looks to me it is intentionally testing an implementation detail ("insert_load_point"). | |
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll | ||
18 | I don't think so. Feel free to add one :). |
OK, I'll have another look once you've figured the rest out. Just lmk.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
95 |
I can help you resolve the conflicts if it gets hairy (I have some ideas for rebase tricks), but it's a requirement each commit be properly clang-formatted. We can't commit improperly-formatted code. | |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
12 |
Looks like it to me, too. When we committed the original patches, we all agreed that we wouldn't act with a bias towards the existing code, since we committed with existing unresolved issues. I think this should count under that rubric. That is, can we fix the test so it no longer tests an implementation detail? I suppose you don't need to do that in this patch if you don't want. | |
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll | ||
19 | You don't think it's worth adding a test as part of this patch? It seems relevant because we could otherwise get infinite recursion or something... |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
---|---|---|
12 | The pass should still have an expectation for where the instructions will be inserted relative to the originals, I think a test ensuring this is useful |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
---|---|---|
12 |
I guess I am OK with this if we can articulate in the test file or the cpp file exactly what is the rule that we expect applies to our output. If we cannot articulate a rule, and instead we're just checking that the pass does what it currently does, I do not think that is a good test. The reason is that, without an articulation of the rule, if the test fails, we have no way to tell whether there's a bug or if the test just needs to be changed. (And if we can articulate a rule, it should go without saying that, inasmuch as reasonable, the test should check only for adherence to the rule, ignoring other ancillary properties of the output.) |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
---|---|---|
12 | Test should generally have a comment explaining what they are testing anyway. This change was mostly why I added this test in the first place. If something is changing any behavior of the pass, a test should capture this. I don't understand the concern about wondering if it's a bug or the test needs update, the point of having the test is you have to look at the test output changes to verify that it is still correct |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
---|---|---|
12 |
My thesis is that this process is bug-prone. My evidence for this is that Alina found multiple tests in this test suite that had bugs -- tests that checked that some sequence of operations was vectorized when in fact it was not safe to vectorize it. This motivates my suggestion, which is that, inasmuch as we can, we should avoid engaging in this process (by writing tests that are not fragile to uninteresting details), and, where we can't avoid the process entirely (maybe the details are interesting, and maybe that's the case here), we should write down explicitly what behavior we expect from the pass. I don't mean to suggest that the bugs in the tests were the result of you being careless -- I didn't catch them either when I reviewed the patches. My point is just that every time humans have to look at new output and decide if it's correct, there is a chance that we'll overlook a bug. And based on this history, that chance is not negligible. Even if you disagree with my application of the evidence and think it's unlikely that the three of us would make such a bug, surely other maintainers may not be as scrupulous. Thus my suggestion: If we can write down the behavior we expect from the pass -- "Vectorized loads should be inserted at the position of the first load, and instructions which were between the first and last load should be reordered preserving their relative order inasmuch as possible." (or whatever the actual rule is) -- then when the test fails, we can judge against *that* whether the test or the pass is broken. And if we have to update the test, we have some chance of creating the correct output ourselves, rather than just accepting the output created by the pass (which is more likely, in my judgement, to lead to us accepting buggy output, per above). I think we're pretty close in what we want, honestly. The main difference, I think, is that I am saying that we should try not to test behavior for which we cannot articulate a rule. If the behavior is so incidental that we can't even say what it's supposed to do, I don't see why we'd want to enstone it as a test. |
I think most comments are addressed now. Let me know if I missed anything.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
95 | Formatted. | |
342 | Used a work-list for the other method. | |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
12 | I tried to resolve this for now by adding the comment you suggested in both this and the other 2 tests checking the order is preserved. | |
test/Transforms/LoadStoreVectorizer/X86/correct-order.ll | ||
18 | Removed loops in all tests. | |
20 | I'm not sure how to properly create one right now. I added a test that makes an attempt at that, but it in fact ensures there is no vectorization beyond basic blocks (and implicitly through a phi node). |
Can you elaborate a bit more in the commit message exactly what bug we're fixing here? It's not immediately clear to me.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
95 | Maybe "reorderUsers" or "moveUsers" would be a better name, especially since we no longer have reorderAfter? Or even just "reorder", I guess. | |
341 | push_back | |
342 | !Worklist.empty() (I think?) | |
346 | I = Worklist.pop_back_elem(); | |
346 | push_back | |
373 | Since this is no longer recursive, we don't need the helper anymore? We can keep it if you think it's a useful way to break things down (I'm not sure it is), but then we should change the name so it's more descriptive and have it return the SmallPtrSet instead of take it by reference. | |
376 | Could we call this iterator BBI, and then call the instruction to move IM? That would be consistent with the helper, and also would fix the problem of InstructionToMove and InstructionsToMove looking very similar. | |
383 | Now that I think about this, could we just have a DEBUG() loop that checks that every element of InstructionsToMove is in the same BB as I? Then we don't need to erase from the set, which is overhead we don't need. | |
386 | Nit, end sentences with periods. Also suggest swapping the order of the prepositional phrases; this order is awkward (although I can't articulate a rule :). | |
test/Transforms/LoadStoreVectorizer/AMDGPU/insertion-point.ll | ||
6 | Please reflow. Also thanks for adding this; I now understand why the test output is what it is. That makes me happy. |
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
354 | Don't need DEBUG around the assert(). assert() is a macro and only evaluates its args in debug builds. | |
357–368 | One of the things I'm bad at is evaluating the correctness of code that contains nits to be fixed. So...now that you've fixed the nits, I have a correctness concern, which I'm sorry I didn't see earlier. My concern is that we stop reordering when IM dominates IW. But it seems to me that we should stop reordering when IM dominates I, no? Because after we move IW up before I, it may no longer dominate its operands. |
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
357–368 | Let me try to reason this. |
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
357–368 | What I missed is that instructions are not moved until after the loop. You're right. |
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
357–368 | Did you add a test for this? Phabricator isn't showing one, but I don't entirely trust it. If not, can you add one? |
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
357–369 | Added a tescase now. The test fails without the above change (replacing I with IW in the dominators check) |
Where do we use reorderAfter? Should one of the calls to reorderBefore call reorderAfter? If so, why aren't some of the tests failing? :)