This is an archive of the discontinued LLVM Phabricator instance.

DAGCombiner: fix use-after-free when merging consecutive stores
ClosedPublic

Authored by nhaehnle on Oct 14 2016, 2:59 AM.

Details

Summary

Have MergeConsecutiveStores explicitly return information about the stores
that were merged, so that we can safely determine whether the starting
node has been freed.

Event Timeline

nhaehnle updated this revision to Diff 74642.Oct 14 2016, 2:59 AM
nhaehnle retitled this revision from to DAGCombiner: fix use-after-free when merging consecutive stores.
nhaehnle updated this object.
nhaehnle added reviewers: chandlerc, bogner.
nhaehnle added a subscriber: llvm-commits.

Nirav, can you take a look at this?

niravd edited edge metadata.Oct 27 2016, 9:18 AM

Do you have a test case for this?

The moving of StoreNodes to the local context seems like it could be a real fix to a nasty problem. We never seem to clear it and that may cause incorrect merges (though i have no example).

Otherwise this seems like a small worthwhile cleanup. As I understand it, when we merge stores we look for candidate nodes that match St but do not require that St is one of the merged nodes which makes it possible for a merge to leave the original store without marking it to be checked again. This may leave some stores that were merge candidates but were not chosen untouched.
I think a comment explaining this would be helpful.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11703

Please kill the comment on the preceding line.

12327

Use StoreNodes.size() instead of NumChanged (or better yet an iterator). This lets you keep the return type above in MergeConsecutiveStores.

The moving of StoreNodes to the local context seems like it could be a real fix to a nasty problem. We never seem to clear it and that may cause incorrect merges (though i have no example).

Ignore this.

Looking at this further, it seems like this is just stopping earlier than otherwise and potentially masking an issue. Nicolai, if you have a testcase you can share I'd be very interested at looking at it as I think it may be related to a bug I've run into separately.

Sorry for the slow response, had some other bugs to deal with. I'll address the comments now.

There are lots of lit tests that hit this when you apply D25313, unfortunately I didn't take a note of which ones precisely because there were so many... I don't think they manifest as actual miscompiles because the deleted nodes land in the Recycler, and nothing allocates a new node in the meantime. But it's certainly fragile...

nhaehnle updated this revision to Diff 76719.Nov 2 2016, 9:13 AM
nhaehnle edited edge metadata.

Keep returning a boolean.

Since not all nodes are necessarily merged, the success-path now needs to
truncate the StoreNodes vector to just those nodes that were actually
removed.

To follow up, these are the tests with asan failures (when D25313 is applied) that are fixed by applying this patch (AMDGPU and X86 backends only):

LLVM :: CodeGen/AMDGPU/amdgpu.private-memory.ll
LLVM :: CodeGen/AMDGPU/invariant-load-no-alias-store.ll
LLVM :: CodeGen/AMDGPU/merge-stores.ll
LLVM :: CodeGen/AMDGPU/vector-alloca.ll
LLVM :: CodeGen/X86/2012-11-28-merge-store-alias.ll
LLVM :: CodeGen/X86/2012-12-1-merge-multiple.ll
LLVM :: CodeGen/X86/MergeConsecutiveStores.ll
LLVM :: CodeGen/X86/avx-vextractf128.ll
LLVM :: CodeGen/X86/merge_store.ll
LLVM :: CodeGen/X86/pr18023.ll
LLVM :: CodeGen/X86/stores-merging.ll
LLVM :: CodeGen/X86/vector-merge-store-fp-constants.ll
nhaehnle updated this revision to Diff 76759.Nov 2 2016, 11:49 AM

Simplify using any_of.

niravd requested changes to this revision.Nov 2 2016, 12:44 PM
niravd edited edge metadata.

I'm still not convinced this fixes the problem. It reduces the number of merges that we do but they should should have been safe as is. . Let's take some time to look at the failing test cases and see if we can find where the problem is coming from.

This revision now requires changes to proceed.Nov 2 2016, 12:44 PM
niravd requested changes to this revision.Nov 2 2016, 12:44 PM

I'm still not convinced this fixes the problem. It reduces the number of merges that we do but they should should have been safe as is. . Let's take some time to look at the failing test cases and see if we can find where the problem is coming from.

This doesn't reduce the number of merges. The patch has 0 changes in any of the test/CodeGen tests across all backends, and I don't have any out-of-tree tests with changes either. There is no real functional change.

Even before this patch, the code satisfied: The store nodes that are deleted for the merge are exactly a prefix of the StoreNodes vector. The patch (a) truncates the StoreNodes vector to that prefix and (b) passes the vector out so that we can safely decide whether ST was deleted without implicitly relying on the behaviour of the SDNode recycler.

Look, I'll do you one better and do a llvm-lit test/CodeGen run with the following change in a debug build:

if (any_of(StoreNodes,
           [ST](const MemOpLink &Link) { return Link.MemNode == ST; })) {
  assert(ST->getOpcode() == ISD::DELETED_NODE);
  break;
}

assert(ST->getOpcode() != ISD::DELETED_NODE);

Of course the assertions re-introduce the problematic reliance on the recycler behaviour, but the point is that neither assertion ever triggers on any of the in-tree CodeGen tests.

niravd accepted this revision.Nov 3 2016, 6:53 AM
niravd edited edge metadata.

LGTM modulo previous comment.

This revision is now accepted and ready to land.Nov 3 2016, 6:53 AM
This revision was automatically updated to reflect the committed changes.