This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixed a couple of SIFixWWMLiveness problems
AbandonedPublic

Authored by tpr on May 4 2018, 1:58 PM.

Details

Reviewers
nhaehnle
cwabbott
Summary
  1. Where SIFixWWMLiveness adds dummy operands to EXIT_WWM, remove kill

marks from other uses of the same register to avoid the MIR becoming
invalid.

  1. SIFixWWMLiveness's scheme for finding registers that might be live in

disabled lanes at the point of the WWM code involves finding defs that
reach the WWM code, and would be considered live if all defs are
considered partial defs.

That scheme could find false positives in defs that do not dominate the
WWM code, and thus do not really have a live value reaching it in any
lane. The two problems with adding a dummy operand for such a register
to EXIT_WWM are (a) it is over-conservative, and (b) it violates
MachineVerifier's check that all uses are fully defined, even if
sometimes by an IMPLICIT_DEF that arises from a undef phi input.

This commit fixes it by only considering defs that dominate the
EXIT_WWM.

Change-Id: I3d70f6138798c3aaa8b6bdf290d4eb85c7beb311

Diff Detail

Event Timeline

tpr created this revision.May 4 2018, 1:58 PM
tpr added a comment.May 4 2018, 2:02 PM

Sorry about the size of the test. It's the best that bugpoint could do.

cwabbott added inline comments.May 4 2018, 2:37 PM
lib/Target/AMDGPU/SIFixWWMLiveness.cpp
40

Unfortunately, this won't work in general. Consider something like:

while (non_uniform_condition) {
    BEGIN_WWM;
    bar = ...;
    EXIT_WWM;
    foo = ...;
}
... = foo;

The definition of foo doesn't dominate the WWM instructions earlier in the loop body, but the use of foo can potentially "see" the result of many different iterations of the loop, since the loop trip count is non-uniform, and any WWM instructions will clobber everything but the last iteration. Hence we need to add an artificial interference with foo here. Of course, if you removed the use of foo outside the loop, then we wouldn't need to do anything... it's the actual use that is crucial here.

We also ran into the issue of SIFixWWMLiveness being too conservative (as well as the liveness issue) when enabling AMD_shader_ballot for radv, but I haven't been able to come up with a good solution for it. It seems that we have to treat loops, and registers live out of a loop, specially somehow.

For the liveness issue, maybe a better way to solve it would be to add a new ENTER_WWM pseudoinstruction similar to EXIT_WWM, and add a matching implicit def to the matching ENTER_WWM whenever we insert an implicit use on EXIT_WWM, and mark both of them as kills. After all, any affected registers only need to interfere with the instructions run in WWM, so that should help with code quality too. I'm not sure why I didn't do that in the first place.

tpr added a comment.May 5 2018, 12:27 AM

Your suggestion of an ENTER_WWM with a def sounds like it would work.

You made a comment in this code that "this is a workaround anyways until LLVM gains the notion of predicated uses and definitions of variables". How do you see that working? Are you aware of anyone else thinking along those lines in the LLVM community? Something to think about for the future.

In D46470#1088984, @tpr wrote:

You made a comment in this code that "this is a workaround anyways until LLVM gains the notion of predicated uses and definitions of variables". How do you see that working? Are you aware of anyone else thinking along those lines in the LLVM community? Something to think about for the future.

I was thinking about some way to decorate instructions with abstract predicates with enough information to make them useful for register allocation. For example, you could compute the OR of two predicates, one predicate minus another, etc. We would compute these predicates per-block before lowering the control flow, and then make every instruction predicated and lower the control flow (preserving the predicates). Except that WWM instructions still wouldn't be predicated, of course. There's a lot of pre-existing literature on this sort of thing with classical (scalar) predicated architectures, for example "Strategies for Predicate-Aware Register Allocation" by Hoflehner is one overview. Actually implementing one of the schemes in that paper would require changes to core MC and would probably be quite invasive, though. There are some other potential users for it, like the new predicated AVX512 stuff, but I don't know if anyone else is thinking about it besides for some discussion with @nhaehnle.

tpr added a comment.May 7 2018, 4:11 AM

Hi Connor

Thanks for the pointers on possible future directions. I was vaguely thinking about some scheme where a value in a live interval is marked with its predication in some way so two overlapping segments are not considered interfering if they have disjoint predications. But that is pretty intrusive.

Anyway, back to my present problem:

I have been trying to come up with a better way of handling WWM liveness based on examining the particular cases (which are essentially a multi-def value that is a lowered phi, and a single-def value defined at the bottom of a do-while loop that the WWM is inside).

However, a fundamental problem has occurred to me: however we synthesize liveness in WWM code, surely the register allocator could decide to split the register, and the inserted copies will be predicated and thus useless for our live-in-inactive-lanes liveness. The inactive lanes will rely on keeping the value in the original register, but the liveness info will no longer say that register is live in the WWM code, and the same register could be allocated to a WWM value.

Any ideas on that one?

tpr abandoned this revision.May 11 2018, 7:15 AM

Abandoned in favor of
https://reviews.llvm.org/D46756