This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid null export insertion when unifying exit blocks
AbandonedPublic

Authored by critson on May 20 2021, 1:25 AM.

Details

Summary

Avoid adding a null export by unifying existing "done" exports
in the unified exit block of a shader with divergent exits.
If a null export is required then place this in a separate block
only visited by divergent exits that require it.

This assumes well-formed IR generated by existing frontends.
Specifically that there is a single "done" export for the shader
which occurs in uniform control flow.

Diff Detail

Event Timeline

critson created this revision.May 20 2021, 1:25 AM
critson requested review of this revision.May 20 2021, 1:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2021, 1:25 AM
foad added a comment.May 20 2021, 6:21 AM

Mostly minor comments inline but I'm concerned about the "multiple predecessors with different done exports" case.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
138

Could be "for (auto &I : reverse(BB)) ..."

141

Why compare with BoolTrue, rather than examine the value? Is this just optimisation to make the test as fast as possible?

220

Isn't it possible that different predecessors have different "done" exports? What is this function supposed to do in that case?

224

Don't need the ", 8" (unless you have special knowledge that 8 really is the optimal value).

228

What does "otherwise" refer back to?

229

Adding "else Phis[Idx] = Undef" here would simplify the creation of the new export intrinsic below.

243

I think return IsCompr ? (...) : (...) would be clearer.

249

Clearer to use .empty() than to use .size() as a boolean, here and 12 lines below.

328–329

Wasn't this supposed to say "exec mask"?

critson marked 9 inline comments as done.May 20 2021, 8:52 PM

Mostly minor comments inline but I'm concerned about the "multiple predecessors with different done exports" case.

I have gone into this a bit more in the comment.
However this expects input IR is well-formed w.r.t. exports, as it is produced by existing front-ends.
If we wanted to handle the generic case then I feel like we might as well write a pass that checks all exports and adjusts them so they are well formed and run it after this pass.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
141

I believe this is standard practice rather than retrieving the constant int values representing true and false repeatedly.

220

This does assumes the IR is well-formed to a certain degree.
I have added a comment to try to document this.

Consider the following IR:

if (condition) {
 export done 1
} else {
 export done 2
}
return

This will eventually be compiled to:

set exec mask for if-branch
export done 1
set exec mask for else-branch
export done 2
restore exec mask
return

Multiple export done instructions will be executed which is invalid.
The expected cases for this are:

  1. Divergent exits with their own exports, e.g.
if (condition) {
  export done 1
  return
} else {
  export done 2
  return
}
  1. Or divergent exits without exports or a uniformly reached export, e.g.
export done
if (condition) {
  return
} else {
  return
}
228

I don't know, let's remove it.

229

Sure, I had to change the type of Phis and add some casts.

critson updated this revision to Diff 346922.May 20 2021, 8:52 PM
critson marked 4 inline comments as done.
  • Address reviewer comments
foad added inline comments.May 21 2021, 2:04 AM
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
220

OK. Thanks for explaining. I find it hard to get my head round the possible cases that we do or don't have to handle. Do you cope with:

if (uniform condition)
  export done 1
else
  export done 2
if (divergent condition)
  return 1
else
  return 2

?

229

Sorry, I didn't realise it would need a cast.

I don't see why we need to handle such situation that would never happen. Can we simply assert each function has at most one exp_done?

critson added inline comments.May 21 2021, 3:16 AM
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
220

Good question, I think the answer is no -- although it is in principle valid IR that we do not generate.
I realize I was thinking primarily of divergent conditions, rather than uniform ones.
Although the comment I added is still accurate, <= 1 reachable export done.

In principle I can rework this to handle the case you raised, but I am not sure it is worth it?

foad added inline comments.May 21 2021, 3:25 AM
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
220

I have no idea what cases must be handled or are "worth" handling, so I think I'll leave that part of the review to someone who understands that.

critson updated this revision to Diff 356673.Jul 6 2021, 4:11 AM

Rework based on the assumption there is only one "done" export.
This is true for existing front-ends.

critson edited the summary of this revision. (Show Details)Jul 6 2021, 4:15 AM
foad added inline comments.Jul 6 2021, 4:32 AM
llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll
12

This comment seems obsolete now.

30

Same here.

critson updated this revision to Diff 356855.Jul 6 2021, 8:02 PM
  • Remove redundant comments
critson marked 5 inline comments as done.Jul 6 2021, 8:03 PM

Hi, I think we can further simplify the logic in our backend regarding export done and implement my idea as D105610, could you take a look if you have concern? the main idea is we just trust frontend will setup exp done correctly and only insert exp-null-done for kill intrinsic which is handled in SILateBranchLowering.cpp.

critson abandoned this revision.Jul 9 2021, 1:58 AM