This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix merging m0 inits
ClosedPublic

Authored by piotr on Sep 17 2020, 10:13 PM.

Details

Summary

Fix incorrect merges of m0 inits in loops.

It was assumed that if a clobbering instruction appears in
the same block as an init and the clobbering instruction
does not dominate the init then it does not interfere with
init.

This does not work in the presence of loops, where in this
scenario, the clobbering instruction does interfere with
the init in another iteration.

To fix this, do not check for block equality and defer the
decision to the predecessor check.

Diff Detail

Event Timeline

piotr created this revision.Sep 17 2020, 10:13 PM
piotr requested review of this revision.Sep 17 2020, 10:13 PM

Doesn't loop block self dominate?

About a year ago I was working on eliminating this code completely but I guess I won't be getting back to this any time soon

piotr added a comment.Sep 18 2020, 1:43 PM

Doesn't loop block self dominate?

You mean, why the condition "MDT.dominates(From, To)" returns false if From and To are in the same BB? Inside that function there is a bb dominance check if the instructions are in different blocks, but if the instructions are in the same block the code checks whether From comes before To.

The bug I am trying to fix is simplified in the test m0-in-loop-0 where before my patch SI_INIT from line 333 was incorrectly removed. Although "m0 = COPY" comes after SI_INIT in that bb, this is a loop and in the second iteration DS_WRITE would see a clobbered m0, so SI_INIT has to be preserved.

Doesn't loop block self dominate?

You mean, why the condition "MDT.dominates(From, To)" returns false if From and To are in the same BB? Inside that function there is a bb dominance check if the instructions are in different blocks, but if the instructions are in the same block the code checks whether From comes before To.

The bug I am trying to fix is simplified in the test m0-in-loop-0 where before my patch SI_INIT from line 333 was incorrectly removed. Although "m0 = COPY" comes after SI_INIT in that bb, this is a loop and in the second iteration DS_WRITE would see a clobbered m0, so SI_INIT has to be preserved.

That sounds like a bug in the dominate() to me.

piotr added a comment.EditedSep 20 2020, 11:43 PM

Doesn't loop block self dominate?

You mean, why the condition "MDT.dominates(From, To)" returns false if From and To are in the same BB? Inside that function there is a bb dominance check if the instructions are in different blocks, but if the instructions are in the same block the code checks whether From comes before To.

The bug I am trying to fix is simplified in the test m0-in-loop-0 where before my patch SI_INIT from line 333 was incorrectly removed. Although "m0 = COPY" comes after SI_INIT in that bb, this is a loop and in the second iteration DS_WRITE would see a clobbered m0, so SI_INIT has to be preserved.

That sounds like a bug in the dominate() to me.

MDT.dominates() gives correct answer. In the failing case "From" (m0 copy, line 335 in the test) is inside the same block as "To" (SI_INIT, line 333). "From" comes after "To", and "From" does not dominate "To" (so MDT.dominates() return false as expected), but "To" is reachable from "From" from previous iteration, so "To" cannot be removed.

Doesn't loop block self dominate?

You mean, why the condition "MDT.dominates(From, To)" returns false if From and To are in the same BB? Inside that function there is a bb dominance check if the instructions are in different blocks, but if the instructions are in the same block the code checks whether From comes before To.

The bug I am trying to fix is simplified in the test m0-in-loop-0 where before my patch SI_INIT from line 333 was incorrectly removed. Although "m0 = COPY" comes after SI_INIT in that bb, this is a loop and in the second iteration DS_WRITE would see a clobbered m0, so SI_INIT has to be preserved.

That sounds like a bug in the dominate() to me.

MDT.dominates() gives correct answer. In the failing case "From" (m0 copy, line 335 in the test) is inside the same block as "To" (SI_INIT, line 333). "From" comes after "To", and "From" does not dominate "To" (so MDT.dominates() return false as expected), but "To" is reachable from "From" from previous iteration, so "To" cannot be removed.

I believe it gives incorrect answer:

(gdb) p MDT.dominates(From, To)
$1 = false
(gdb) p MDT.dominates(From->getParent(), To->getParent())
$2 = true

It correctly tells that block dominates itself, but then it says instructions in that block do not dominate other instructions in that block.
I believe the correct change would be to replace this:

bool dominates(const MachineInstr *A, const MachineInstr *B) const {
  applySplitCriticalEdges();
  const MachineBasicBlock *BBA = A->getParent(), *BBB = B->getParent();
  if (BBA != BBB) return DT->dominates(BBA, BBB);

with this:

bool dominates(const MachineInstr *A, const MachineInstr *B) const {
  applySplitCriticalEdges();
  const MachineBasicBlock *BBA = A->getParent(), *BBB = B->getParent();
  if (DT->dominates(BBA, BBB))
    return true;
  if (BBA != BBB)
    return false;
piotr added a comment.Sep 22 2020, 6:47 AM
bb.1:
  ..
  TO_inst   (inits m0)
  ...       (uses m0)
  FROM_inst (clobbers m0)
  ...
  S_CBRANCH_VCCZ %bb.1, undef
  S_BRANCH %bb.2

I guess I am missing something here, the dominance usually refers to blocks so it can be confusing if applied for instructions.

In this case I expect MDT.dominates(FROM_inst, TO_inst) to return false, because it is possible for a program invocation to execute TO_inst, without executing FROM_inst first (e.g. single loop iteration).

Anyway, the spirit of the code I am editing was that before my patch it was assumed that if FROM_inst comes after TO_inst in a basic block (as in the case described above) then there was no path from FROM_inst to TO_inst, so TO_inst could be removed. This is not true for loops, where FROM_inst will clober m0 used in the next iteration.

bb.1:
  ..
  TO_inst   (inits m0)
  ...       (uses m0)
  FROM_inst (clobbers m0)
  ...
  S_CBRANCH_VCCZ %bb.1, undef
  S_BRANCH %bb.2

I guess I am missing something here, the dominance usually refers to blocks so it can be confusing if applied for instructions.

In this case I expect MDT.dominates(FROM_inst, TO_inst) to return false, because it is possible for a program invocation to execute TO_inst, without executing FROM_inst first (e.g. single loop iteration).

Anyway, the spirit of the code I am editing was that before my patch it was assumed that if FROM_inst comes after TO_inst in a basic block (as in the case described above) then there was no path from FROM_inst to TO_inst, so TO_inst could be removed. This is not true for loops, where FROM_inst will clober m0 used in the next iteration.

If block A dominates block B then all instructions in A dominate all instructions in B, isn't it? That is not changed if A == B. Imagine you have split that block in between of "From" and "To" and there is simple fall-through at that split. Nothing has changed in the program logic, right? But now MDT::dominates() suddenly start to return true instead of false. This just cannot be right.

Doesn't loop block self dominate?

By definition: in CFG block X dominates block Y iif every path from entry to Y goes through X. Every block dominates itself.
Also by definition: block X strictly dominates block Y iif X dominates Y and X != Y.
To check if MBB A strictly dominates MBB B you need to call MDT.properlyDominates(A, B)

All the above is about that there are 2 orthogonal problems - dominance and reachability.
All instructions in a loop (irrelative of the number of blocks in a loop) are reachable from each other, i.e form SCC in a graph.
This is not about dominance at all. Moreover, the notion of dominance applied to instructions is just an interface sugar because dominance assumes graph but we don't have a graph of instructions but only basic blocks.
So, my opinion that check for a loop (i.e.predecessor search) is correct here.

rampitec accepted this revision.Sep 22 2020, 12:45 PM

OK, thanks Alex.

This revision is now accepted and ready to land.Sep 22 2020, 12:45 PM
This revision was landed with ongoing or failed builds.Sep 23 2020, 12:15 AM
This revision was automatically updated to reflect the committed changes.