In movetovalu, when we process an instruction in the worklist, we may delete/modify the instruction. So putting an
instruction in the worklist may result in handling a deleted/modified instruction, and thus cause trouble.
Details
Diff Detail
Event Timeline
Why is it possible for an instruction here to end up in the worklist multiple times? I'm surprised we haven't hit this before and I'm not sure exactly when this happens. Can you reduce the test any further? I would expect only 2-3 actual instructions in it.
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
3860 | Missing space | |
3947 | Missing space | |
3947 | I think the worklist can be pretty big. A brute force search through is probably bad. |
This is the shortest test I can get! The instruction in question is the "and". It is first put
in the list because of tmp3, and the second time by tmp13.
define amdgpu_kernel void @in_worklist_once() #0 {
bb:
%tmp = load i64, i64* undef br label %bb1
bb1: ; preds = %bb1, %bb
%tmp2 = phi i64 [ undef, %bb ], [ %tmp16, %bb1 ] %tmp3 = phi i64 [ %tmp, %bb ], [ undef, %bb1 ] %tmp11 = lshr i64 %tmp2, 14 %tmp13 = xor i64 %tmp11, %tmp2 %tmp15 = and i64 %tmp3, %tmp13 %tmp16 = xor i64 %tmp15, %tmp3 br label %bb1
}
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
3947 | What's your suggestion for the search in smallvector? |
Hi, is it possible to upload a full diff?
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
3947 | I would suggest to use hash for O(1) search. |
I'm in the process of adding a SmallSet Workset in collaboration with the SmallVectorImpl for the Worklist to avoid searching through the Worklist every time before insertion. Will send out new diff when it's working.
- Changpeng's newest test case did show the "and" instr to be accessed twice through the 2 phi nodes. I tried an early version of CL 1419065 and it crashed also, so the problem did exist before. When a node was inserted into the Worklist twice and processed last in first out, the first pop may modify the instr and replaced it with new ones, the second pop may now have a junk node. Hence it is possible the problem did not reveal itself by luck earlier on, I guess. It is imperative to avoid pushing an instr into the Worklist twice.
- To avoid searching if an instr had already been entered into the Worklist, I am changing the Worklist type from SmallPtrVector to SmallPtrSet. SmallPtrSet's insert will ensure an item not to be entered twice. There is a catch for this, SmallPtrVector access is last in first out, SmallPtrSet is first in first out, this may lead to usage of register order to be different than before, so I added a linear search to get the last item entered. I don't believe this costs much performance degradation, but guarantee generated code to be same as before.
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
3415–3418 | @alfred.j.huang you mention "SmallPtrVector access is last in first out". Is that true when the SmallPtrVector exceeds the size of the small number? If not then if the worklist gets larger than 32 it will no longer be deterministic in the order it returns the work list items which IIRC is against the LLVM policy which wants compilation to be consistent from run to run. |
Thanks for mentioning this. I have reservation for the order myself. I was thinking reverse might work due to https://reviews.llvm.org/D26718. There was also this "LLVM_ENABLE_REVERSE_ITERATION" define and -mllvm reverse-iterate that when used in a single compilation, would make the SmallPtrSet iteration in reverse order. But I think you are right, SmallPtrSet is actually an unordered containiner, anything larger than the small 32 entries are hashed, so there is really no order, talking of iteration order is a bit strange here.
So back to the original question, when the original SmallPtrVector was used, we were traversing the "USE" chain for a previous dest register, we inserted them into the Worklist in order of the USErs and processed them by popping, so they are in the reverse order. I'm not really sure if it matters at all which order they were pushed and popped, I realized the difference only when one llvm lit test shows a difference of " v_add_i32_e32 v0, vcc, v1, v0" versus " v_add_i32_e32 v0, vcc, v0, v1", which in theory are the same. If testing with ocl test, conformance and llvm lit show there is no difference in runtime, can we actually ignore the ordering?
As a last resort, I can reuse my original changes which it is the most secure, but probably not with the best performance in mind by using 2 containers; a SmallPtrVector for push and pop, plus a SmallPtrSet to avoid inserting duplicates. In this case the result will guarantee to be the same as before.
A concern would be that if using the set caused running the same test multiple times resulted in different results because the different values of pointers caused the iteration order to change. Just because the tests run on your machine does not mean they will run on another machine and get the same result if it is relying on the hashed values of pointers.
Ok. Opinion well taken, I will change back to use 2 containers. SmallPtrVector as before for insertion and to collaborate with a SmallPtrSet for checking of duplicates. A new patch will be submitted later. Thanks!
I do not know the existence of SetVector. It basically contains sub containers set_ and vector_, which is exactly what I need. Instead of me imitating it with an explicit SmallSet and SmallVector. This should work. Thanks!
Switch to SmallSetVector in place of SmallPtrVector so an item is inserted into the container once only. The order of pushing and popping are the same as before.
Sorry. I thought I'm using the same test Changpeng created and posted before. Anyway, I posted it here again and changed the GCN checkings to make the test more meaningful.
LGTM
lib/Target/AMDGPU/SIInstrInfo.h | ||
---|---|---|
41 | A better name would indicate it's for the moveToVALU worklist. I've thought about moving all of this code into the actual pass instead of SIInstrInfo, so it doesn't really matter. | |
test/CodeGen/AMDGPU/move-to-valu-worklist.ll | ||
2 ↗ | (On Diff #106531) | Can you add a comment describing the problem |
A better name would indicate it's for the moveToVALU worklist. I've thought about moving all of this code into the actual pass instead of SIInstrInfo, so it doesn't really matter.