Page MenuHomePhabricator

[AMDGPU][GlobalISel] Improve BFI Pattern Matching
Changes PlannedPublic

Authored by Pierre-vh on Aug 23 2022, 9:58 AM.

Details

Summary
  • Introduce a simple "Copy Hoisting" post RegBankAssign combine to prevent stray COP¨Y instructions from interfering with pattern matching.
  • Prevent introduction of useless UNMERGE_VALUEs when splitting i64 values that also interfere with pattern matching.

Diff Detail

Event Timeline

Pierre-vh created this revision.Aug 23 2022, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 9:58 AM
Pierre-vh requested review of this revision.Aug 23 2022, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 9:58 AM
jsilvanus added inline comments.
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
401

This line seems to be wrong.

Pierre-vh updated this revision to Diff 455105.Aug 24 2022, 1:41 AM
Pierre-vh marked an inline comment as done.

Tweak codegen

The copy hoisting works for BFI, but it's not an ideal solution I think because for other cases (like the insertelement tests) it can worsen codegen. I tried to make it as local as possible but I couldn't get rid of all the cases where it's unprofitable to move the copy "out".

If copy hoisting is an acceptable solution and we want to move forward with it maybe it needs to be made smarter, e.g. look at the whole expression tree, try to compute how many copies could be inserted and where and choose the solution that inserts the least copies? Perhaps it could even move copies downwards, e.g. currently it can transform a tree of expressions from SGPR to VGPR, but maybe it could do the opposite as well if it introduces less copies?

Thoughts? Should I keep going with this approach and try to make it smarter and better, or give it up?
Ideally I'd really like to be able to just fix the tablegen but I haven't found a way to do it properly.

There's also another annoying case in one of the BFI tests where the RegBankSelect adds 2 identical copies.
It prevents BFI from being selected because %6/%7 aren't identical (despite referencing the same physical register).
This could be fixed by another combine (?) or we could maybe change GISe's GIM_CheckIsSameOperand l so it looks through copies of physregs to vregs?

%6:vgpr(s32) = COPY %2:sgpr(s32)
%3:vgpr(s32) = G_XOR %1:vgpr, %6:vgpr
%4:vgpr(s32) = G_AND %0:vgpr, %3:vgpr
%7:vgpr(s32) = COPY %2:sgpr(s32)
%5:vgpr(s32) = G_XOR %7:vgpr, %4:vgpr
foad added a comment.Aug 24 2022, 3:08 AM

My gut feeling is that this is the wrong approach, because it will convert SALU to VALU operations regardless of whether they can actually be matched by some larger VALU pattern. This is bad for all sorts of reasons especially on GFX10+: latency, occupancy, power consumption.

Instead I think the pattern matching for instructions like BFI needs to be able to look through cross-regbank copies. There has been some discussion in the past about this. Please read through https://discourse.llvm.org/t/globalisel-cross-bank-constant-propagation/57927 for a start.

My gut feeling is that this is the wrong approach, because it will convert SALU to VALU operations regardless of whether they can actually be matched by some larger VALU pattern. This is bad for all sorts of reasons especially on GFX10+: latency, occupancy, power consumption.

Instead I think the pattern matching for instructions like BFI needs to be able to look through cross-regbank copies. There has been some discussion in the past about this. Please read through https://discourse.llvm.org/t/globalisel-cross-bank-constant-propagation/57927 for a start.

I agree, this approach is very hacky and I started noticing its limits/drawbacks while I was trying to fix the unintended consequences of the new combine.

Do you think it'd be worthwhile to restart a new discussion on the Discourse? It seems like there's a lot of different opinions on how to solve this.
I'm personally not a fan of adding something like g_maybe_cross_reg_bank_copy - it feels like we'll need it everywhere sooner or later and it's making TableGen even more confusing.
Ignoring copies all the time also seems wrong because it'll just move the problem somewhere else. Later someone might want to match copies explicitly and then we'll end up with the same discussion.

Recently I've also had a lot of issues like this, where the DAG pattern is straightforward but the MIR one has extra additions making it harder to match.
I feel like there's an opportunity to improve pattern matching for those cases. On top of my head, maybe we could add a way to create special complex "PatFrags" for GISel that:

  • In DAGISel, Map to one or more nodes (like current PatFrags)
  • In GISel, run a C++ function to perform the matching

Then the C++ function could check the instruction & look through copies as needed. It'd have an advantage over ComplexPatterns as it could be used as a non-leaf node and won't require having an equivalent DAGISel function.

foad added a comment.Aug 24 2022, 4:22 AM

Do you think it'd be worthwhile to restart a new discussion on the Discourse?

Sure!

Pierre-vh planned changes to this revision.Sep 29 2022, 1:31 AM