Page MenuHomePhabricator

[AMDGPU] Target hook to apply target specific split constraint
Needs ReviewPublic

Authored by alex-t on Sep 3 2020, 12:05 PM.

Details

Reviewers
rampitec
Summary

The change introduce new target hook that allow target to apply split constraint to MBB.

Diff Detail

Event Timeline

alex-t created this revision.Sep 3 2020, 12:05 PM
alex-t requested review of this revision.Sep 3 2020, 12:05 PM

I do not think it shall be at a block level. You can allow splitting after all exec modifications are done. For example:

bb.1:
  ; fall-thru
bb.2:
  s_or_b64 exec, exec, mask
  ; a valid split point

I.e. you may skip the iterator until you see a non-exec modifying and non-terminator instruction. You would also need to skip the block to the only successor. In that example you would disallow split in the bb.1, you would also disallow split in bb.2 before s_or_b64, but that is perfectly valid to split it after s_or_b64. The code as written would prohinit any splits in the bb.2 which is an overkill.

I would rather modify the interface of isBasicBlockPrologue() (and rename it) to account for a fall-thru case and do a search for a valid split point. In fact it is also needed for spill reloads, and that interface was created for spill reloads. In that case isBasicBlockPrologue() would need to return not just an iterator, but also a new block, and potentially it may prohibit any spilling/splitting at all if it cannot find a valid place to do it.

alex-t added a comment.Sep 4 2020, 9:15 AM

Oops. In fact the diff above is not that I was intended to upload. The SIInstrInfo::IsValidForLISplit is a complete nonsense. Probably deleted a part of the function by accident...
I'll get back and upload the working one.

alex-t updated this revision to Diff 289990.Sep 4 2020, 10:22 AM

Now the correct diff uploaded

alex-t added a comment.EditedSep 4 2020, 10:27 AM

So, since we now have sensible diff to discuss...
Why I decided to disallow split in any block that gets control with exec == 0 and has no restoring code in prologue?
I just did it by example of the code that already does same for the blocks with interference - a bit later below:

// Abort if the spill cannot be inserted at the MBB' start
MachineBasicBlock *MBB = MF->getBlockNumbered(Number);
if (!MBB->empty() &&
    SlotIndex::isEarlierInstr(LIS->getInstructionIndex(MBB->instr_front()),
                              SA->getFirstSplitPoint(Number)))
  return false;

Your idea of returnin the iterator to the nearest valid insert point is good but I have to find another place to invade. Maybe I should try to modify InsertPointAnalysis::getFirst/LastInsertPoint methods. The main problem here is that the getFirstInsertPoint (that in order query TargetInstrInfo::isBasicBlockPrologue) is called in both addSplitConstraints and addThroughConstraints only for those basic blocks which have interference for the give LI and run into "continue" for those which have no.
In case we have block that gets control from predecessor with exec == 0 but has no interference we have no chance to avoid placing spill into it if we rely on the isBasicBlockPrologue.

I still do not believe the problem is specific to EXEC = 0 case. In fact the problem could occur with any EXEC value, it is sufficient to have it different from what is expected. That is not valid to insert a split into any block before exec is restored, not just if previous value was zero.

Also I still think that disabling a whole "endif" block is an overkill.

Also I still think that disabling a whole "endif" block is an overkill.

It only is disabled if S_OR_B64 exec, ... is in the middle of the block that should never happen.

while (isBasicBlockPrologue(*J)) {
  if (IsExecRestore(&*J))
    return true;

assumes that if exec is restored in the block prologue it is valid

Also I still think that disabling a whole "endif" block is an overkill.

It only is disabled if S_OR_B64 exec, ... is in the middle of the block that should never happen.

while (isBasicBlockPrologue(*J)) {
  if (IsExecRestore(&*J))
    return true;

assumes that if exec is restored in the block prologue it is valid

So practically it never happens and split is effectively only disabled in an empty block? I said it already: it does not matter that exec is zero, what matters is that it does not match. It does not matter that a block is empty as well, it is enough to split before s_or to hit the bug.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6811

Not speaking of the wave32, you cannot rely on the S_OR instruction. It can be restored with a XOR as well.

6817

S_OR shall not be a terminator, it is used in the prologue.

alex-t added a comment.Sep 4 2020, 1:28 PM

Also I still think that disabling a whole "endif" block is an overkill.

It only is disabled if S_OR_B64 exec, ... is in the middle of the block that should never happen.

while (isBasicBlockPrologue(*J)) {
  if (IsExecRestore(&*J))
    return true;

assumes that if exec is restored in the block prologue it is valid

So practically it never happens and split is effectively only disabled in an empty block? I said it already: it does not matter that exec is zero, what matters is that it does not match. It does not matter that a block is empty as well, it is enough to split before s_or to hit the bug.

Let's forget about exec == 0...
The code snippet above looks for exec restoring instruction in the block prologue and return true if it exists. So, false positive can only happen if there is no exec restoring instruction in the beginning of the block but it is placed in the middle of the block. In this case I report the block is invalid for splitting despite it is valid to split after the exec restoring instruction in the middle of the block. Do we really have exec restored in the middle of the block?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6817

This is not about searching S_OR at all. This code checks if one of the predecessors has exec modified at block exit and a branch with exeec == 0 to current block or which we are querying now.

Also I still think that disabling a whole "endif" block is an overkill.

It only is disabled if S_OR_B64 exec, ... is in the middle of the block that should never happen.

while (isBasicBlockPrologue(*J)) {
  if (IsExecRestore(&*J))
    return true;

assumes that if exec is restored in the block prologue it is valid

So practically it never happens and split is effectively only disabled in an empty block? I said it already: it does not matter that exec is zero, what matters is that it does not match. It does not matter that a block is empty as well, it is enough to split before s_or to hit the bug.

Let's forget about exec == 0...
The code snippet above looks for exec restoring instruction in the block prologue and return true if it exists. So, false positive can only happen if there is no exec restoring instruction in the beginning of the block but it is placed in the middle of the block. In this case I report the block is invalid for splitting despite it is valid to split after the exec restoring instruction in the middle of the block. Do we really have exec restored in the middle of the block?

No, we do not change exec in the middle of the block.
It would help to understand what are you doing if you would have a test to show.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6817

It does at line 6811?

alex-t added a comment.EditedSep 7 2020, 7:14 AM

The idea is:

For the block that is queried

  1. Look for it's predecessors that can pass control through the S_EXECZ/EXECNZ
  2. If found one, look for exec restoring code starting the beginning of the block being queried.
  3. Since exec restoring code always belong to the block prologue, search the prologue and if not found return false.

Considering your comment that exec == 0 does not matter, we'd rather search upwards before the immediate dominator block in encountered to check what we met first - exec modify or exec restore. The problem here is that XOR can be both.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6817

At line 6811 is the lambda that looks for the exec restoring instructions. It is used at line 6834 looking for exec restore from the beginning of current block. As for the XOR - yes it also should be counted here.

The idea is:

For the block that is queried

  1. Look for it's predecessors that can pass control through the S_EXECZ/EXECNZ
  2. If found one, look for exec restoring code starting the beginning of the block being queried.
  3. Since exec restoring code always belong to the block prologue, search the prologue and if not found return false.

Considering your comment that exec == 0 does not matter, we'd rather search upwards before the immediate dominator block in encountered to check what we met first - exec modify or exec restore. The problem here is that XOR can be both.

Thanks. More or less you want to disable split in an empty block preceded by c_branch_exec[n]z. I can understand why this would be a problem. In reality such block can be either empty or contain another branch, because it does not make any sense to pass a control to a block with vector instructions and have no active lanes.

But this leaves couple problems:

  1. It does not disallow a split in a block prologue before exec is restored. This creates exactly the same problem.
  2. It does not disallow a split even in an empty block where EXEC is not zero, but just wrong. The problem is not zero EXEC, the problem is wrong EXEC, zero is just once case of this.

JBTW, even with all of this it is still OK to split an LI of SGPR. I'd say at the very least callback needs to take an LI in question as well.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6813

It already modifies EXEC, so checking DST here is not needed.

The idea is:

For the block that is queried

  1. Look for it's predecessors that can pass control through the S_EXECZ/EXECNZ
  2. If found one, look for exec restoring code starting the beginning of the block being queried.
  3. Since exec restoring code always belong to the block prologue, search the prologue and if not found return false.

Considering your comment that exec == 0 does not matter, we'd rather search upwards before the immediate dominator block in encountered to check what we met first - exec modify or exec restore. The problem here is that XOR can be both.

Thanks. More or less you want to disable split in an empty block preceded by c_branch_exec[n]z. I can understand why this would be a problem. In reality such block can be either empty or contain another branch, because it does not make any sense to pass a control to a block with vector instructions and have no active lanes.

But this leaves couple problems:

  1. It does not disallow a split in a block prologue before exec is restored. This creates exactly the same problem.
  2. It does not disallow a split even in an empty block where EXEC is not zero, but just wrong. The problem is not zero EXEC, the problem is wrong EXEC, zero is just once case of this.

JBTW, even with all of this it is still OK to split an LI of SGPR. I'd say at the very least callback needs to take an LI in question as well.

In fact, I clearly understand that the explicit and clean way to aceive the goal would be for the block queried check if the value of exec mask in it is exactly the same as in the block to which VNInfo that active in current block got defined.
Details:

  1. We have MBB that is queried if it is valid to split the give LI in it.
  2. We check which VNInfo (valno) is active in this block.
  3. We take MBB where the VNInfo::def SlotIndex lives
  4. We proove that along the all passes from MBB with def to current MBB exec mask operations keep consistency. In another words - whatever we did with exec mask it has same value in defMBB and in splitMBB.

It would be great but... it would require values analysis over all passes.

I think this is just hacking around a deeper IR question about how to model exec writes

Also I still think that disabling a whole "endif" block is an overkill.

It only is disabled if S_OR_B64 exec, ... is in the middle of the block that should never happen.

while (isBasicBlockPrologue(*J)) {
  if (IsExecRestore(&*J))
    return true;

assumes that if exec is restored in the block prologue it is valid

So practically it never happens and split is effectively only disabled in an empty block? I said it already: it does not matter that exec is zero, what matters is that it does not match. It does not matter that a block is empty as well, it is enough to split before s_or to hit the bug.

Let's forget about exec == 0...
The code snippet above looks for exec restoring instruction in the block prologue and return true if it exists. So, false positive can only happen if there is no exec restoring instruction in the beginning of the block but it is placed in the middle of the block. In this case I report the block is invalid for splitting despite it is valid to split after the exec restoring instruction in the middle of the block. Do we really have exec restored in the middle of the block?

No, we do not change exec in the middle of the block.
It would help to understand what are you doing if you would have a test to show.

This does happen, although perhaps it shouldn't (WQM emit these for example)

arsenm added inline comments.Oct 21 2020, 2:47 PM
llvm/lib/CodeGen/RegAllocGreedy.cpp
1262–1263

Why is this early exiting here and not in the same place as lower in the loop, after the hasInterference check?

// Abort if the spill cannot be inpserted at the MBB' start
if (!MBB->empty() &&
    SlotIndex::isEarlierInstr(LIS->getInstructionIndex(MBB->instr_front()),
                              SA->getFirstSplitPoint(Number)))
  return false;
// Interference for the
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6812

Doesn't handle wave32

6820

This doesn't consider fallthrough blocks