This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Fix dependence breaking for tied operands
ClosedPublic

Authored by danilaml on Aug 5 2021, 9:54 AM.

Details

Summary

This is mainly to show the issue on a reduced synthetic lit test and discuss the possible solution. I'm not convinced that the one I've included in the initial version of this patch is correct one (and even if it is, it can likely be refactored better).

The issue I'm trying to fix is similar to the one fixed in https://reviews.llvm.org/D4351
Without the patch, compiler breaks the antidependece on $esi with $ecx and generates clearly incorrect code for XOR:

$esi = XOR32rr undef $ecx, undef $ecx, implicit-def dead $eflags, implicit-def $rsi

I've attempted to follow the code's logic but wasn't able to figure out the invariant it tries to uphold.
The problems I've found (and as such, possible places to attempt the fix):

  1. Fix for PR20020 included the following comment:
If this reg is tied and live (Classes[Reg] is set to -1), we can't change

Yet, according to the doc above Classes:

/// For live regs that are only used in one register class in a
/// live range, the register class. If the register is not live, the
/// corresponding value is null. If the register is live but used in
/// multiple register classes, the corresponding value is -1 casted to a
/// pointer.

So should it just check against null? Is it enough?

  1. On x64 32-bit instructions are often (although inconsistently) annotated with implicit-def of the 64-bit reg alias.

It doesn't look like this pass handles implicit defs at all (sans, possibly, near MI.getDesc().getNumOperands()).
I don't think it's safe to assume that one can freely change implicit-def of the instruction, so perhaps adding all such encountered regs to KeepRegs could be a solution? But wouldn't that be too pessimistic given the mentioned behavior of x64 backend?
In the provided test the implicit-def $rsi in

$esi = XOR32rr undef $esi, undef $esi, implicit-def dead $eflags, implicit-def $rsi

causes the following issue:
in ScanInstruction it erases all ecx entries from RegRefs, because implicit-def is also isDef.
Then, only the uses are inserted into RegRefs in the following loop (due to isUse check).
This seems to violate isNewRegClobberedByRefs assumption stated in the comments:

//... We guard against the case in which
// the two-address instruction also defines NewReg, as may happen with
// pre/postincrement loads. In this case, both the use and def operands are in
// RegRefs because the def is inserted by PrescanInstruction and not erased
// during ScanInstruction.

Which leads to the "partial" $ecx replacement in the mir output.

Diff Detail

Event Timeline

danilaml created this revision.Aug 5 2021, 9:54 AM
danilaml requested review of this revision.Aug 5 2021, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2021, 9:54 AM
danilaml edited the summary of this revision. (Show Details)Aug 5 2021, 10:16 AM
danilaml added reviewers: spatel, hfinkel, MatzeB.

Adding more potential reviewers - I can't remember any details about how this pass works.

Ping.
I'm tempted to just disable all dep breaking for implicit-def registers, but I fear that it might be too pessimistic for x86_64.

RKSimon resigned from this revision.Aug 17 2021, 8:49 AM

This isn't an area I know much about

@MatzeB Do you have any thoughts on this?

@MatzeB Do you have any thoughts on this?

Unfortunately this pass has a very unconventional way to analyze things and manage liveness. It's hard to understand what is going on, so it's hard to judge changes to this pass...

It seems the idea of this patch is that check for tied operands in line 225/226 (in the original) is performed in a scan separate from the other operands? I don't see why this improves things right now, but I also don't see the harm. @danilaml Can you explain why the change helps?

If this does indeed only make the pass more conservative in some cases with tied operands, then I'm fine landing it.

On x64 32-bit instructions are often (although inconsistently) annotated with implicit-def of the 64-bit reg alias.

The idea is that this depends on the user of the instruction. For example if there is a use reading the whole %rsi register and the defining instruction only writes to %esi then the instruction should be annotated with the implicit-def, there's no need for it if the uses only read %esi so you can see both variants in practice.

It doesn't look like this pass handles implicit defs at all (sans, possibly, near MI.getDesc().getNumOperands()).
I don't think it's safe to assume that one can freely change implicit-def of the instruction, so perhaps adding all such encountered regs to KeepRegs could be a solution?

Generally I think you can change registers for operands marked with the renamable attribute, but must not touch other operands. With that said, this pass clearly doesn't care; it was written before renamable was introduced and was never updated to use it :-(

llvm/lib/CodeGen/CriticalAntiDepBreaker.cpp
224

Would it make sense to compute a bool HasTiedOperand variable in the first loop and skip the 2nd one if we didn't see any tied operands?

224–228

@MatzeB Do you have any thoughts on this?

Unfortunately this pass has a very unconventional way to analyze things and manage liveness. It's hard to understand what is going on, so it's hard to judge changes to this pass...

I agree ant that's what I tried to express in my summary. I don't believe this patch solves all potential issues I've encountered, but as you said, the way the Pass tracks it's state is very convoluted so I was hesitant to make any more involved changes (I even had trouble reconstructing the context of the patch after combing back to it right now).

It seems the idea of this patch is that check for tied operands in line 225/226 (in the original) is performed in a scan separate from the other operands? I don't see why this improves things right now, but I also don't see the harm. @danilaml Can you explain why the change helps?

The overall issue here is that this pass doesn't seem to correctly handle instructions with implicit defs and tied reg operands. One case of it was fixed in https://reviews.llvm.org/D4351 , and this tries to fix similar issue (see the end of the summary and the test case), but basically it ends up with
$esi = XOR32rr undef $ecx, undef $ecx, implicit-def dead $eflags, implicit-def $rsi
in place of
$esi = XOR32rr undef $esi, undef $esi, implicit-def dead $eflags, implicit-def $rsi

which is illegal (no asserts but codegen randomly converges it to "legal" state) and can lead to the bad codegen.

If this does indeed only make the pass more conservative in some cases with tied operands, then I'm fine landing it.

I think what my change does is effectively prohibit dep breaking for regs in tied operands if they are also used in implicit def in the same instruction (the case for most 32-bit wide MIs on x64).
Previously, the code would encounter the check before it processed all MOs, that might've changed the Classes[Reg] to -1. I'm still not sure why the original fix equates -1 state with being live (which it is not).

On x64 32-bit instructions are often (although inconsistently) annotated with implicit-def of the 64-bit reg alias.

The idea is that this depends on the user of the instruction. For example if there is a use reading the whole %rsi register and the defining instruction only writes to %esi then the instruction should be annotated with the implicit-def, there's no need for it if the uses only read %esi so you can see both variants in practice.

I see, that makes sense.

It doesn't look like this pass handles implicit defs at all (sans, possibly, near MI.getDesc().getNumOperands()).
I don't think it's safe to assume that one can freely change implicit-def of the instruction, so perhaps adding all such encountered regs to KeepRegs could be a solution?

Generally I think you can change registers for operands marked with the renamable attribute, but must not touch other operands. With that said, this pass clearly doesn't care; it was written before renamable was introduced and was never updated to use it :-(

In the MIR I've encountered (and reduced the test case from) for x86 I didn't see the implicit-defs annotated that way. And as you said, this pass doesn't seem to care about it. When reading the code for this pass I had a lot of "how does it even work" moments, so I'm not prepared to attempt fixing these issues/pass (especially considering how hard it is to find someone to review this patch). All in all, I'm convinced that this pass would still be subtly broken and this patch would just make it slightly less so in at least one example (encountered in the wild).

fwiw, the CriticalAntiDepBreaker pass was *supposed* to be removed after everyone started using pre-RA scheduling...

The overall issue here is that this pass doesn't seem to correctly handle instructions with implicit defs and tied reg operands. One case of it was fixed in https://reviews.llvm.org/D4351 , and this tries to fix similar issue (see the end of the summary and the test case)

Correct me if I'm wrong, but my impression of D4351 / line 225/226 is that the pass intends to simply not deal with tied-operands at all (regardless of implicit defs/uses being present or not); that seems reasonable enough for me. Though line 226 somehow stopped that logic from kicking in when there are undef operands involved right? So it seems to me the bug happens when you have a tied + undef operand at the same time? I don't see yet how (or if at all) implicit defs/uses play into the picture.

fwiw, the CriticalAntiDepBreaker pass was *supposed* to be removed after everyone started using pre-RA scheduling...

This discussion centers around X86 still using the "old" PostRASchedulerList and not switching to the "new" PostMachineScheduler. After all CriticalAntiDepBreaker uses interfaces from PostRASchedulerList. Unfortunately there seems to be only small perf benefits or none at all with swings in both directions which makes it very hard to justify all the work necessary to update tests and investigate and fix local regressions, while at the same time this pass is an obvious example of code that should be deprecated, or rewritten in the new framework :-/

In order to move forward here, I am fine with the fix as-is in the meantime. I'll accept this after the nitpicks are addressed.

The overall issue here is that this pass doesn't seem to correctly handle instructions with implicit defs and tied reg operands. One case of it was fixed in https://reviews.llvm.org/D4351 , and this tries to fix similar issue (see the end of the summary and the test case)

Correct me if I'm wrong, but my impression of D4351 / line 225/226 is that the pass intends to simply not deal with tied-operands at all (regardless of implicit defs/uses being present or not); that seems reasonable enough for me. Though line 226 somehow stopped that logic from kicking in when there are undef operands involved right? So it seems to me the bug happens when you have a tied + undef operand at the same time? I don't see yet how (or if at all) implicit defs/uses play into the picture.

Well, the comment says that If this reg is tied and live (Classes[Reg] is set to -1), but 1) Reg might still be live with Classes[reg] != -1 (see my question on the D4351 review) 2) ...
This patch doesn't fix the underlying issue with implicit-def regs (as I'm uncertain on how to fix it besides bailing out as soo as an implicit reg is encountered), just prevents the code even reaching it for the test case, by moving the check after the implicit register is encountered and all its aliases are marked as -1 at the line 200.

danilaml updated this revision to Diff 380497.Oct 18 2021, 12:37 PM

Addressed comments

danilaml marked an inline comment as done.Oct 18 2021, 12:38 PM
danilaml added inline comments.
llvm/lib/CodeGen/CriticalAntiDepBreaker.cpp
224

Maybe. Not sure if we want to attempt to optimize it if it's supposed to be deprecated though.

@craig.topper Do you recall any previous attempts to transition x86 from PostRAScheduler to PostMachineScheduler ?

pengfei added a subscriber: wxiao3.Oct 19 2021, 1:59 AM
danilaml marked an inline comment as done.Oct 25 2021, 6:48 AM
MatzeB accepted this revision.Oct 25 2021, 8:36 AM

Thanks, LGTM

This revision is now accepted and ready to land.Oct 25 2021, 8:36 AM
This revision was landed with ongoing or failed builds.Oct 25 2021, 8:52 AM
This revision was automatically updated to reflect the committed changes.