- Introduce a simple "Copy Hoisting" post RegBankAssign combine to prevent stray COP¨Y instructions from interfering with pattern matching.
- Prevent introduction of useless UNMERGE_VALUEs when splitting i64 values that also interfere with pattern matching.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
401 | This line seems to be wrong. |
The copy hoisting works for BFI, but it's not an ideal solution I think because for other cases (like the insertelement tests) it can worsen codegen. I tried to make it as local as possible but I couldn't get rid of all the cases where it's unprofitable to move the copy "out".
If copy hoisting is an acceptable solution and we want to move forward with it maybe it needs to be made smarter, e.g. look at the whole expression tree, try to compute how many copies could be inserted and where and choose the solution that inserts the least copies? Perhaps it could even move copies downwards, e.g. currently it can transform a tree of expressions from SGPR to VGPR, but maybe it could do the opposite as well if it introduces less copies?
Thoughts? Should I keep going with this approach and try to make it smarter and better, or give it up?
Ideally I'd really like to be able to just fix the tablegen but I haven't found a way to do it properly.
There's also another annoying case in one of the BFI tests where the RegBankSelect adds 2 identical copies.
It prevents BFI from being selected because %6/%7 aren't identical (despite referencing the same physical register).
This could be fixed by another combine (?) or we could maybe change GISe's GIM_CheckIsSameOperand l so it looks through copies of physregs to vregs?
%6:vgpr(s32) = COPY %2:sgpr(s32) %3:vgpr(s32) = G_XOR %1:vgpr, %6:vgpr %4:vgpr(s32) = G_AND %0:vgpr, %3:vgpr %7:vgpr(s32) = COPY %2:sgpr(s32) %5:vgpr(s32) = G_XOR %7:vgpr, %4:vgpr
My gut feeling is that this is the wrong approach, because it will convert SALU to VALU operations regardless of whether they can actually be matched by some larger VALU pattern. This is bad for all sorts of reasons especially on GFX10+: latency, occupancy, power consumption.
Instead I think the pattern matching for instructions like BFI needs to be able to look through cross-regbank copies. There has been some discussion in the past about this. Please read through https://discourse.llvm.org/t/globalisel-cross-bank-constant-propagation/57927 for a start.
I agree, this approach is very hacky and I started noticing its limits/drawbacks while I was trying to fix the unintended consequences of the new combine.
Do you think it'd be worthwhile to restart a new discussion on the Discourse? It seems like there's a lot of different opinions on how to solve this.
I'm personally not a fan of adding something like g_maybe_cross_reg_bank_copy - it feels like we'll need it everywhere sooner or later and it's making TableGen even more confusing.
Ignoring copies all the time also seems wrong because it'll just move the problem somewhere else. Later someone might want to match copies explicitly and then we'll end up with the same discussion.
Recently I've also had a lot of issues like this, where the DAG pattern is straightforward but the MIR one has extra additions making it harder to match.
I feel like there's an opportunity to improve pattern matching for those cases. On top of my head, maybe we could add a way to create special complex "PatFrags" for GISel that:
- In DAGISel, Map to one or more nodes (like current PatFrags)
- In GISel, run a C++ function to perform the matching
Then the C++ function could check the instruction & look through copies as needed. It'd have an advantage over ComplexPatterns as it could be used as a non-leaf node and won't require having an equivalent DAGISel function.
This line seems to be wrong.