When merging stores, add checks to ensure that no data dependences are introduced. Previously we checked only chain dependencies but ignored data dependencies.
There two checks capture the differences between the cases where we use Anti-aliasing (and can limit the search chain-parallel stores) vs. when we do not and consider direct chains of stores.
Details
Diff Detail
Event Timeline
Adding reviewers of:
http://reviews.llvm.org/D14834
Does that patch resolve this case?
Changing finding better chain algorithm may hide the problem fixed by this patch, but for our test case, changing the FindBetterChain algorithm doesn't help.
This is incomplete. It's missing the case for UseAA merges (which is also needed to resolve the underlying bug 18321/14834). I have a patch that captures this, though I haven't gotten a test which will exercises the useAA case.
Comandeering to upload patch.
Add check to ensuring no loops in data dependences are constructed from store merging
in the case we use Anti-aliasing analysis.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11137 | I'm not sure if it's valid to use setNodeId here. But that's irrelevant, because this function shouldn't exist, since the code should be using hasPredecessor. | |
11161 | There are formatting/spacing issues throughout this. Please clang-format. | |
11171 | This is obfuscated; please refactor the hasPredecessor family to have a static function which requires the worklist to be pre-populated, instead of passing in a dummy node here. | |
11184 | Should use hasPredecessor. | |
test/CodeGen/AArch64/vector_merge_dep_check.ll | ||
11 | This test can almost certainly be simplified. It would be nice to spend a bit of time to remove irrelevant stuff from it so it's easier to see what's actually going on is just: And then, merging the store to offsets -1 and 0 into a single store, which results in the %l2 load chain-depending on it, but yet the merged-store requires that load as a value. (I think) |
Hi Nirav, I think the problem need to be solved should be the same with and without AA. It is testing the node predecessors. For your new upload, you use different check with UseAA. For my algorithm, it only tests the latest node in the store array because it is the one that links with the successors. Do you have another test case that shows failure with UseAA? What could be the different with UseAA?
There are two cases we do. In the case of AA we look for stores which are
parallel in regards to the chain dependencies. Otherwise we walk up the
chain to merge stores. Either way we fail to check if there are data
dependencies that we contradict (i.e. make a cycle).
Your patch handles the later case only because it relies on the fact that
all stores are predecessors to the latest one and there are not multiple
chain descendent. Thus if we walk through uses of that we implicitly are
doing the walk for all of them but it wouldn't necessarily hold in the
parallel case.
You're right about us only needing one path. We could unify them modifying
the parallel case to searching up the non-chain operands and not stores
themselves to avoid chained stores from always failing. Once D14834 is
committed the nonparallel case will disappear and we can return to the
simpler case.
Implement hasPredecessorHelper cleanup and merge two checks as per discussion
cleanup hasPredecessorHelper fixes and merge checks back together as per discussions
include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
628–653 | Lots of typos. Visitedo. unino. "operand_dependences" sounds like it's referring to a variable or function but isn't actually. "Null" is strangely capitalized. Much of the old comment you deleted seems still relevant, too. | |
634 | Why removed the Visited check? It seems to me like that was still a more natural interface. Then you can just ask all your "is Node X a predecessor" questions with the same interface. | |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
9639 | Why search completely? Isn't it better to escape early if you happen to be able to? | |
11158–11160 | This comment isn't very clear. | |
11166 | That you're skipping the chain operand deserves a mention, here. | |
test/CodeGen/AArch64/vector_merge_dep_check.ll | ||
13 | Yes, I see that. It's just that there's a lot of (probably?) irrelevant metadata args which can be removed, variable names shortened/made clearer, possibly the loop could go away too. And a comment in the test about what it is actually intending to test would be good, too. |
test/CodeGen/AArch64/vector_merge_dep_check.ll | ||
---|---|---|
14 | Done |
Separate check into a separate function.
Fix test case to check both AA and non-AA cases.
Do loop dependences check only in AA cases
Thanks, lgtm.
Note for readers: we spent some time going over this, and noted that 1) The original test case was using UseAA mode (which we'd not noticed before) 2) The bug doesn't actually exist in !UseAA, because of the different way it traverses the chain.
So this final version reflects that.
Why removed the Visited check? It seems to me like that was still a more natural interface.
Then you can just ask all your "is Node X a predecessor" questions with the same interface.