This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Alter arith-cbz-fusion to fuse between pairs register instructions
Needs ReviewPublic

Authored by dmgreen on Sep 27 2021, 8:45 AM.

Details

Summary

There is not a lot of documentation on arith-cbz-fusion in llvm, other than it fuses between arithmetic instructions and cbz branches. It is not explained anywhere that I can find if that is meant to be the arithmetic instruction that defines the register and the cbz that uses it, or any old arithmetic instruction and the branch. The tests seem to suggest it should be the former, but the code only checks for the later.

This updates the code to fuse between the arithmetic instructions setting the register and the cbz using it.

Diff Detail

Event Timeline

dmgreen created this revision.Sep 27 2021, 8:45 AM
dmgreen requested review of this revision.Sep 27 2021, 8:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2021, 8:45 AM

I haven't worked with this code for a while, but isn't this function only invoked when there is an edge in the schedule graph between the two nodes anyway so they must be writing/reading the same register and you wouldn't need the extra check you are adding here?

Matt added a subscriber: Matt.Sep 27 2021, 9:44 AM

It seems to be between an SUnit and its Preds. But there can be many dependencies between a SUnit and ExitSU (the CBZ), more than just where register dependencies exist.

It seems to be between an SUnit and its Preds. But there can be many dependencies between a SUnit and ExitSU (the CBZ), more than just where register dependencies exist.

I don't think most shouldScheduleAdjacent implementations make sense otherwise. This line looks like it rejects the other dependency types: https://github.com/llvm/llvm-project/blob/06e2a0684e52b6cd2e4af3e19964bb5953573465/llvm/lib/CodeGen/MacroFusion.cpp#L180

This code adds an SDep::Artificial dependency between the instruction and ExitSU:
https://github.com/llvm/llvm-project/blob/20c02807333a47000879e0f673cdf2d6b07148dd/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp#L876

Skipping Artificial edges in the same way we skip Weak edges will cause some of the other fusions to no longer correctly apply, as far as I can see. Like the arithmetic-bcc fusion and some of the X86 fusions.

This code adds an SDep::Artificial dependency between the instruction and ExitSU:

But ExitSU is an artificial node not an to mark the end of the schedule graph. I don't think it makes sense trying to macro-fuse ExitSU with anything.

The thing that is odd with this diff is that it adds extra checks to one particular rule (isArithmeticCbzPair) while it seems to me that if this is required, then all the other rules are problematic in the same sense; so we'd better understand why and find a general fix.

Have you tried what happens when we ignore Dep.isArtificial() in MacroFusion.cpp?

This code adds an SDep::Artificial dependency between the instruction and ExitSU:

But ExitSU is an artificial node not an to mark the end of the schedule graph. I don't think it makes sense trying to macro-fuse ExitSU with anything.

The thing that is odd with this diff is that it adds extra checks to one particular rule (isArithmeticCbzPair) while it seems to me that if this is required, then all the other rules are problematic in the same sense; so we'd better understand why and find a general fix.

ExitSU is the CBNZW or Bcc according to the debug logs I'm looking at.

Have you tried what happens when we ignore Dep.isArtificial() in MacroFusion.cpp?

Yeah, that is what I meant by "Skipping Artificial edges in the same way we skip Weak edges will cause some of the other fusions to no longer correctly apply, as far as I can see. Like the arithmetic-bcc fusion and some of the X86 fusions." There are a number of failing test cases I think because the dependencies between a $xzr = SUBSXri $x2, 0, 0, implicit-def $nzcv and a Bcc 1, %bb.1, implicit killed $nzcv is an artificial dependency, potentially through the physical nzcv register.

I don't have a strong need for this patch. I just noticed it started failing when changing the default schedule for -mcpu=generic. I think it can happen whenever another edge is added between nodes in the scheduling DAG, in this case between instructions with latency > 1 and ExitSU. It looks like the only other fusion that will be between an instruction and ExitSU will be isArithmeticBccPair, where the dependency goes through NSVZ.

I looked at this a bit more and I am now suspicious about https://github.com/llvm/llvm-project/blob/aec66f895bf516564346a4366d5e06139b8370ed/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp#L213 using -1 instead of the actual operand index resulting in an artificial dependency. So far that seems wrong to me and fixing that means we can also switch macrofusion to only look at Dep.getKind() == SDep::Data edges and the X86 fusion tests still work properly.

As usual changing things like that produces a bunch of noise in other tests the changes in X86 look benign but I didn't go through all the other backends...

I put up my experiment here: https://reviews.llvm.org/D111325 but I am not sure I have the bandwidth right now to address all the test churn :-(

This comment was removed by MatzeB.