Although it looks like SmallVector is enough for dead instruction
collection as well as for load instruction searching and no Visited set
is actually needed, a Visited set has been added for dead instruction
collection to make it safe and consistent with load instruction
searching in the 'findArgParts'.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This doesn't make sense. The visited set must contain all visited instructions to prevent infinite loops. The worklist on the other hand only contains instructions yet to be visited. They are complement sets, you can't combine them into one, or at least not as implemented.
@nikic thank you very much for your comment. As I see in the removed code, in the AppendUsers lambda, every user of the value V is firstly inserted into the Visited set and if and only if the insertion takes place, the object is also appended to the Worklist vector. The SmallSetVector class implements exactly the same logic: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/SetVector.h#L141 (however, one difference is there, SmallSetVector uses SmallDenceSet, not SmallPtrSet). The problem I see, during the traversing, we remove elements from the end of the vector using it as a stack (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L573) and SmallSetVector will remove the elements from the set too, not only from the vector. So (in the new implementation), if a user was inserted into the worklist and then is taken for processing, the user is also removed from the set of visited elements and therefore can be added to the worklist again and again if there is a loop between the users. Is this your concern?
Colleagues, @nikic Excuse me if I spend your time. I've had a look at the loop of searching the load instructions (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L563-L597) again and I cannot figure out an example where a cycle between an instruction and its user is. There is a loop over a worklist to find the dead instructions in the same pass: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L392-L416 and no Visited set is used in this loop at all (the code is very similar but the code on lines 563-597, I proposed a patch for, adds some condition checks in the case where the GEP instruction is visited). I tried to replace the type of the Worklist variable in my patch with SmallVector instead of SmallSetVector and all the insert_range operations with append_range and no test in the llvm\test\Transforms\ArgumentPromotion folder fell. This can be a proof that the Visited set is not needed or we just have no test to get an infinite loop over the worklist in the case where a cycle between an instruction and its user is possible.
we just have no test to get an infinite loop over the worklist in the case where a cycle between an instruction and its user is possible.
Highly likely this case is our reality.
The cycle between an instruction and its users is possible for PHI nodes: only they may reference their own value but the loop in the patch handles no instructions but GEP, bitcast and load.
For example for the following input:
define internal i32* @callee(i32* noundef %0, i32* noundef %1) #0 { 2: br label %3 3: %4 = phi i32* [ %4, %3 ], [ %1, %2 ] br label %3 ret i32* %4 }
The pass just writes into the debug stream: ArgPromotion of i32* %1 failed: unknown user %4 = phi i32* [ %4, %3 ], [ %1, %2 ] and doesn't fell into a loop.
Also, there can be a situation when the graph looks like a diamond:
%0 = ... %1 = ... %0 ... %2 = ... %0 ... %3 = ... %1 ... %2 ... ; might be handled twice %4 = load ... %3 ...
In this case if there was no Visited set, instruction %3 would be handled twice, but because only the GEP, bitcast and load instructions are handled and no one of them can have more than a single pointer argument (I'm not sure about GEP but as I see even GEP has one pointer argument too), the promotion just fails one time on an unknown user:
%3 = bitcast i32* %0 to float* %4 = bitcast i32* %1 to float* %5 = icmp ugt float* %3, %4 %6 = select i1 %5, float* %3, float* %4 %7 = load float, float* %6, align 4, !tbaa !4 ...
output:
ArgPromotion of i32* %0 failed: unknown user %5 = icmp ugt float* %3, %4
So, the optimization (to remove the Visited set) looks like making sense here. What is your opinion?
The patch summary and the diff have been updated, the idea is just use a SmallVector instance for load instruction search because the def-use sub-graph where users are only GET, bitcast and load instructions looks like a tree not even a DAG.
@psamolysov The main thing we're concerned about when we use these "Visited" sets is unreachable code, which can include self-referencing instructions. You can have something like %ptr = load ptr, ptr %ptr and then loop infinitely.
Now, I think you are correct that this can't happen in this case, because we are only looking through instructions that have a single non-constant operand. However, if this code is ever changed to also handle other instruction kinds, one would have to be careful to reintroduce the visited set. For example, the "full restrict" patches add an additional operand to load instructions, which could be used to form a cycle in unreachable code.
So, I think it's safe to drop this Visited set in this case, but I would usually consider it good practice to have one, in the interest of defensiveness. Though I don't care particularly strongly about this point.
So, I think it's safe to drop this Visited set in this case, but I would usually consider it good practice to have one, in the interest of defensiveness.
I agree, the code may be changed in the future and the change can be unsafe if the developer won't take into account the idea why the Visited set is not used *now*. I see two ways: to add a comment why using a SmallVector to store the worklist and not using of the Visited set is safe in our case, or to add the Visited set into dead instruction collecting loop in the doPromotion function too, for consistency. What could be better from your point of view?
Better to stay safe IMHO.
To stay safe, I propose the following change: to add the Visited set to the dead instruction searching in the doPromotion function too. This makes the code consistent: two used graph traversing algorithms will look similar.