This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Do not insert an instruction into worklist twice in movetovalu
ClosedPublic

Authored by alfred.j.huang on Jun 27 2017, 4:39 PM.

Details

Summary

In movetovalu, when we process an instruction in the worklist, we may delete/modify the instruction. So putting an
instruction in the worklist may result in handling a deleted/modified instruction, and thus cause trouble.

Diff Detail

Repository
rL LLVM

Event Timeline

cfang created this revision.Jun 27 2017, 4:39 PM
arsenm edited edge metadata.Jun 27 2017, 5:12 PM

Why is it possible for an instruction here to end up in the worklist multiple times? I'm surprised we haven't hit this before and I'm not sure exactly when this happens. Can you reduce the test any further? I would expect only 2-3 actual instructions in it.

lib/Target/AMDGPU/SIInstrInfo.cpp
3860 ↗(On Diff #104310)

Missing space

3947 ↗(On Diff #104310)

Missing space

3947 ↗(On Diff #104310)

I think the worklist can be pretty big. A brute force search through is probably bad.

cfang marked 2 inline comments as done.

This is the shortest test I can get! The instruction in question is the "and". It is first put
in the list because of tmp3, and the second time by tmp13.

define amdgpu_kernel void @in_worklist_once() #0 {
bb:

%tmp = load i64, i64* undef
br label %bb1

bb1: ; preds = %bb1, %bb

%tmp2 = phi i64 [ undef, %bb ], [ %tmp16, %bb1 ]
%tmp3 = phi i64 [ %tmp, %bb ], [ undef, %bb1 ]
%tmp11 = lshr i64 %tmp2, 14
%tmp13 = xor i64 %tmp11, %tmp2
%tmp15 = and i64 %tmp3, %tmp13
%tmp16 = xor i64 %tmp15, %tmp3
br label %bb1

}

lib/Target/AMDGPU/SIInstrInfo.cpp
3947 ↗(On Diff #104310)

What's your suggestion for the search in smallvector?

cfang updated this revision to Diff 104542.Jun 28 2017, 4:22 PM

Update based on Matt's comments.

TODO:

  1. improve the search in the worklist!
wdng added a comment.Jun 30 2017, 8:52 PM

Hi, is it possible to upload a full diff?

lib/Target/AMDGPU/SIInstrInfo.cpp
3947 ↗(On Diff #104310)

I would suggest to use hash for O(1) search.

I'm in the process of adding a SmallSet Workset in collaboration with the SmallVectorImpl for the Worklist to avoid searching through the Worklist every time before insertion. Will send out new diff when it's working.

alfred.j.huang commandeered this revision.Jul 7 2017, 1:57 PM
alfred.j.huang edited reviewers, added: cfang; removed: alfred.j.huang.

Changpeng Fang is on vacation and I'm trying to complete the code review for him.

alfred.j.huang retitled this revision from AMDGPU/SI: Don not insert an instruction into worklist twice in movetovalu to AMDGPU/SI: Do not insert an instruction into worklist twice in movetovalu .
  1. Changpeng's newest test case did show the "and" instr to be accessed twice through the 2 phi nodes. I tried an early version of CL 1419065 and it crashed also, so the problem did exist before. When a node was inserted into the Worklist twice and processed last in first out, the first pop may modify the instr and replaced it with new ones, the second pop may now have a junk node. Hence it is possible the problem did not reveal itself by luck earlier on, I guess. It is imperative to avoid pushing an instr into the Worklist twice.
  1. To avoid searching if an instr had already been entered into the Worklist, I am changing the Worklist type from SmallPtrVector to SmallPtrSet. SmallPtrSet's insert will ensure an item not to be entered twice. There is a catch for this, SmallPtrVector access is last in first out, SmallPtrSet is first in first out, this may lead to usage of register order to be different than before, so I added a linear search to get the last item entered. I don't believe this costs much performance degradation, but guarantee generated code to be same as before.
t-tye added inline comments.Jul 9 2017, 12:37 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
3415–3418 ↗(On Diff #105693)

@alfred.j.huang you mention "SmallPtrVector access is last in first out". Is that true when the SmallPtrVector exceeds the size of the small number? If not then if the worklist gets larger than 32 it will no longer be deterministic in the order it returns the work list items which IIRC is against the LLVM policy which wants compilation to be consistent from run to run.

Thanks for mentioning this. I have reservation for the order myself. I was thinking reverse might work due to https://reviews.llvm.org/D26718. There was also this "LLVM_ENABLE_REVERSE_ITERATION" define and -mllvm reverse-iterate that when used in a single compilation, would make the SmallPtrSet iteration in reverse order. But I think you are right, SmallPtrSet is actually an unordered containiner, anything larger than the small 32 entries are hashed, so there is really no order, talking of iteration order is a bit strange here.

So back to the original question, when the original SmallPtrVector was used, we were traversing the "USE" chain for a previous dest register, we inserted them into the Worklist in order of the USErs and processed them by popping, so they are in the reverse order. I'm not really sure if it matters at all which order they were pushed and popped, I realized the difference only when one llvm lit test shows a difference of " v_add_i32_e32 v0, vcc, v1, v0" versus " v_add_i32_e32 v0, vcc, v0, v1", which in theory are the same. If testing with ocl test, conformance and llvm lit show there is no difference in runtime, can we actually ignore the ordering?

As a last resort, I can reuse my original changes which it is the most secure, but probably not with the best performance in mind by using 2 containers; a SmallPtrVector for push and pop, plus a SmallPtrSet to avoid inserting duplicates. In this case the result will guarantee to be the same as before.

t-tye added a comment.Jul 9 2017, 6:49 PM

A concern would be that if using the set caused running the same test multiple times resulted in different results because the different values of pointers caused the iteration order to change. Just because the tests run on your machine does not mean they will run on another machine and get the same result if it is relying on the hashed values of pointers.

Ok. Opinion well taken, I will change back to use 2 containers. SmallPtrVector as before for insertion and to collaborate with a SmallPtrSet for checking of duplicates. A new patch will be submitted later. Thanks!

SetVector and SmallSetVector already exist

I do not know the existence of SetVector. It basically contains sub containers set_ and vector_, which is exactly what I need. Instead of me imitating it with an explicit SmallSet and SmallVector. This should work. Thanks!

Switch to SmallSetVector in place of SmallPtrVector so an item is inserted into the container once only. The order of pushing and popping are the same as before.

Test is missing from patch

Sorry. I thought I'm using the same test Changpeng created and posted before. Anyway, I posted it here again and changed the GCN checkings to make the test more meaningful.

arsenm accepted this revision.Jul 13 2017, 2:25 PM

LGTM

lib/Target/AMDGPU/SIInstrInfo.h
42 ↗(On Diff #106531)

A better name would indicate it's for the moveToVALU worklist. I've thought about moving all of this code into the actual pass instead of SIInstrInfo, so it doesn't really matter.

test/CodeGen/AMDGPU/move-to-valu-worklist.ll
2 ↗(On Diff #106531)

Can you add a comment describing the problem

This revision is now accepted and ready to land.Jul 13 2017, 2:25 PM
This revision was automatically updated to reflect the committed changes.