This is an archive of the discontinued LLVM Phabricator instance.

[DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead
ClosedPublic

Authored by guopeilin on Nov 15 2020, 7:22 PM.

Diff Detail

Event Timeline

guopeilin created this revision.Nov 15 2020, 7:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2020, 7:22 PM
guopeilin requested review of this revision.Nov 15 2020, 7:22 PM
hliao added a comment.Nov 16 2020, 6:47 AM

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From the iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse the all basic blocks, that is, we runs bottome-up, we will meet bb.5 first, for %6, we find that it is not dead cause %3 in bb.2 use it. So %6 surrive. Then we continue traverse other BBs, When we meet bb.2, we see that no one use %5, so we kill it. So as %4, %3. Right now, actually %6 becomes dead cause we kill %3 thus there is no longer any one uses %6.
However, cause we only traverse blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing change, then we can ensure that all dead mi is erased.

guopeilin retitled this revision from [DeadMachineInstrctionElim] Iteratively run DeadMachineInstrcutionElim pass untill nothing dead to [DeadMachineInstrctionElim] Iteratively run DeadMachineInstrcutionElim pass until nothing dead.Nov 16 2020, 5:26 PM
guopeilin retitled this revision from [DeadMachineInstrctionElim] Iteratively run DeadMachineInstrcutionElim pass until nothing dead to [DeadMachineInstrctionElim] Iteratively run DeadMachineInstructionElim pass until nothing dead.
hliao added a comment.Nov 16 2020, 8:51 PM

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From the iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse the all basic blocks, that is, we runs bottome-up, we will meet bb.5 first, for %6, we find that it is not dead cause %3 in bb.2 use it. So %6 surrive. Then we continue traverse other BBs, When we meet bb.2, we see that no one use %5, so we kill it. So as %4, %3. Right now, actually %6 becomes dead cause we kill %3 thus there is no longer any one uses %6.
However, cause we only traverse blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing change, then we can ensure that all dead mi is erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order instead of block reachability. Could we replace that order with the post-order? So that, the use is always traversed before the defiine.

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From the iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse the all basic blocks, that is, we runs bottome-up, we will meet bb.5 first, for %6, we find that it is not dead cause %3 in bb.2 use it. So %6 surrive. Then we continue traverse other BBs, When we meet bb.2, we see that no one use %5, so we kill it. So as %4, %3. Right now, actually %6 becomes dead cause we kill %3 thus there is no longer any one uses %6.
However, cause we only traverse blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing change, then we can ensure that all dead mi is erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order instead of block reachability. Could we replace that order with the post-order? So that, the use is always traversed before the defiine.

That probably will not help if we have a loop?

hliao added a comment.Nov 16 2020, 9:14 PM

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From the iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse the all basic blocks, that is, we runs bottome-up, we will meet bb.5 first, for %6, we find that it is not dead cause %3 in bb.2 use it. So %6 surrive. Then we continue traverse other BBs, When we meet bb.2, we see that no one use %5, so we kill it. So as %4, %3. Right now, actually %6 becomes dead cause we kill %3 thus there is no longer any one uses %6.
However, cause we only traverse blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing change, then we can ensure that all dead mi is erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order instead of block reachability. Could we replace that order with the post-order? So that, the use is always traversed before the defiine.

That probably will not help if we have a loop?

It still works unless that value has a cyclic dependency through phi-node.

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From the iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse the all basic blocks, that is, we runs bottome-up, we will meet bb.5 first, for %6, we find that it is not dead cause %3 in bb.2 use it. So %6 surrive. Then we continue traverse other BBs, When we meet bb.2, we see that no one use %5, so we kill it. So as %4, %3. Right now, actually %6 becomes dead cause we kill %3 thus there is no longer any one uses %6.
However, cause we only traverse blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing change, then we can ensure that all dead mi is erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order instead of block reachability. Could we replace that order with the post-order? So that, the use is always traversed before the defiine.

That probably will not help if we have a loop?

It still works unless that value has a cyclic dependency through phi-node.

That's exactly what I had in mind, a phi node as the only way to get a cyclic dependency in SSA.

I tend to say this is LGTM. Although I wish to see a test with a cyclic dependency.

hliao added a comment.Nov 16 2020, 9:21 PM

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From the iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse the all basic blocks, that is, we runs bottome-up, we will meet bb.5 first, for %6, we find that it is not dead cause %3 in bb.2 use it. So %6 surrive. Then we continue traverse other BBs, When we meet bb.2, we see that no one use %5, so we kill it. So as %4, %3. Right now, actually %6 becomes dead cause we kill %3 thus there is no longer any one uses %6.
However, cause we only traverse blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing change, then we can ensure that all dead mi is erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order instead of block reachability. Could we replace that order with the post-order? So that, the use is always traversed before the defiine.

That probably will not help if we have a loop?

It still works unless that value has a cyclic dependency through phi-node.

That's exactly what I had in mind, a phi node as the only way to get a cyclic dependency in SSA.

I tend to say this is LGTM. Although I wish to see a test with a cyclic dependency.

That value with cyclic dependence cannot be removed unless that loop is deemed as dead. It's beyond the scope this DCE. Using the post order removes the need for iterative runs.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we decide to use post order to visit all blocks of a function, I think we need to consider that what if CFG contains cycles?


From this picture, we can see that post order is not clearly defined cause there exits cycles, one of the possible orders is that [ m, g, d, e, c, b, t, x]
So m comes before g, if we define something in m and use it in g. Then even though both def and use are useless, cause we visit m first, we will still get a dead definition after we post-order visit all blocks.
So is it possible there still exist some cases theoretically that cannot be fixed by post-order visit? That is we may still need to iteratively run?

hliao accepted this revision.Nov 17 2020, 9:57 PM

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we decide to use post order to visit all blocks of a function, I think we need to consider that what if CFG contains cycles?


From this picture, we can see that post order is not clearly defined cause there exits cycles, one of the possible orders is that [ m, g, d, e, c, b, t, x]
So m comes before g, if we define something in m and use it in g. Then even though both def and use are useless, cause we visit m first, we will still get a dead definition after we post-order visit all blocks.
So is it possible there still exist some cases theoretically that cannot be fixed by post-order visit? That is we may still need to iteratively run?

You are right, that's possible. That case should be rare as that's a def in the back-edge with acyclic dep. Could you merge the post-order change together with the iterative runs? so that, in the regular case, we at most run twice. Please keep on eye on compile time.

This revision is now accepted and ready to land.Nov 17 2020, 9:57 PM

BTW, please add a test case with that def in back-edge with acyclic dep.

guopeilin updated this revision to Diff 306005.Nov 18 2020, 1:41 AM
guopeilin retitled this revision from [DeadMachineInstrctionElim] Iteratively run DeadMachineInstructionElim pass until nothing dead to [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we decide to use post order to visit all blocks of a function, I think we need to consider that what if CFG contains cycles?


From this picture, we can see that post order is not clearly defined cause there exits cycles, one of the possible orders is that [ m, g, d, e, c, b, t, x]
So m comes before g, if we define something in m and use it in g. Then even though both def and use are useless, cause we visit m first, we will still get a dead definition after we post-order visit all blocks.
So is it possible there still exist some cases theoretically that cannot be fixed by post-order visit? That is we may still need to iteratively run?

You are right, that's possible. That case should be rare as that's a def in the back-edge with acyclic dep. Could you merge the post-order change together with the iterative runs? so that, in the regular case, we at most run twice. Please keep on eye on compile time.

Post order visit and iteratively run are merged.

hliao added a comment.Nov 18 2020, 8:15 AM

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we decide to use post order to visit all blocks of a function, I think we need to consider that what if CFG contains cycles?


From this picture, we can see that post order is not clearly defined cause there exits cycles, one of the possible orders is that [ m, g, d, e, c, b, t, x]
So m comes before g, if we define something in m and use it in g. Then even though both def and use are useless, cause we visit m first, we will still get a dead definition after we post-order visit all blocks.
So is it possible there still exist some cases theoretically that cannot be fixed by post-order visit? That is we may still need to iteratively run?

You are right, that's possible. That case should be rare as that's a def in the back-edge with acyclic dep. Could you merge the post-order change together with the iterative runs? so that, in the regular case, we at most run twice. Please keep on eye on compile time.

Post order visit and iteratively run are merged.

Do you add a test case similar to the CFG you showed?

hliao added a comment.Nov 19 2020, 9:10 PM

Do you have permission to commit?

Do you have permission to commit?

No, but I will apply for permission later. And I am still trying to write the test case that demonstrates a cycle within cfg manually, Although I found it difficult and cost time.