This is an archive of the discontinued LLVM Phabricator instance.

Differential D107582

[CodeGen] Fix dependence breaking for tied operands
ClosedPublic

Authored by danilaml on Aug 5 2021, 9:54 AM.

Details

Reviewers

spatel
hfinkel
MatzeB
craig.topper
atrick
andreadb
timshen
qcolombet
RKSimon

Commits

rG7b102fcc9116: [CodeGen] Fix dependence breaking for tied operands

Summary

This is mainly to show the issue on a reduced synthetic lit test and discuss the possible solution. I'm not convinced that the one I've included in the initial version of this patch is the correct one (and even if it is, it can likely be refactored better).

The issue I'm trying to fix is similar to the one fixed in https://reviews.llvm.org/D4351
Without the patch, the compiler breaks the antidependence on $esi with $ecx and generates clearly incorrect code for XOR:

$esi = XOR32rr undef $ecx, undef $ecx, implicit-def dead $eflags, implicit-def $rsi

I've attempted to follow the code's logic but wasn't able to figure out the invariant it tries to uphold.
The problems I've found (and as such, possible places to attempt the fix):

Fix for PR20020 included the following comment:

If this reg is tied and live (Classes[Reg] is set to -1), we can't change

Yet, according to the doc above Classes:

/// For live regs that are only used in one register class in a
/// live range, the register class. If the register is not live, the
/// corresponding value is null. If the register is live but used in
/// multiple register classes, the corresponding value is -1 casted to a
/// pointer.

So should it just check against null? Is it enough?
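The three states in that doc comment can be sketched in isolation. This is a toy model under stated assumptions, not code from the pass: `RegClass`, `multiClass`, `State`, `classify` and `isLive` are hypothetical names, and `RegClass` merely stands in for `TargetRegisterClass`. It only illustrates that comparing against null and comparing against -1 answer different questions, since a register with a single known class is live yet compares unequal to -1.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for llvm::TargetRegisterClass (hypothetical).
struct RegClass { int Id; };

// Sentinel mirroring reinterpret_cast<TargetRegisterClass *>(-1) in the pass.
inline const RegClass *multiClass() {
  return reinterpret_cast<const RegClass *>(static_cast<std::intptr_t>(-1));
}

enum class State { Dead, LiveOneClass, LiveMultiClass };

// Decode one Classes[] entry according to the documented convention.
inline State classify(const RegClass *Entry) {
  if (Entry == nullptr)
    return State::Dead;
  if (Entry == multiClass())
    return State::LiveMultiClass;
  return State::LiveOneClass;
}

// "Live" in the sense of the doc comment is simply "not null".
inline bool isLive(const RegClass *Entry) { return Entry != nullptr; }
```

Under this reading, a check of the form `Classes[Reg] == -1` covers only the `LiveMultiClass` case, while a null check would cover both live states.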

On x64 32-bit instructions are often (although inconsistently) annotated with implicit-def of the 64-bit reg alias.

It doesn't look like this pass handles implicit defs at all (sans, possibly, near MI.getDesc().getNumOperands()).
I don't think it's safe to assume that one can freely change implicit-def of the instruction, so perhaps adding all such encountered regs to KeepRegs could be a solution? But wouldn't that be too pessimistic given the mentioned behavior of x64 backend?
In the provided test the implicit-def $rsi in

$esi = XOR32rr undef $esi, undef $esi, implicit-def dead $eflags, implicit-def $rsi

causes the following issue:
in ScanInstruction it erases all ecx entries from RegRefs, because implicit-def is also isDef.
Then, only the uses are inserted into RegRefs in the following loop (due to isUse check).
This seems to violate isNewRegClobberedByRefs assumption stated in the comments:

//... We guard against the case in which
// the two-address instruction also defines NewReg, as may happen with
// pre/postincrement loads. In this case, both the use and def operands are in
// RegRefs because the def is inserted by PrescanInstruction and not erased
// during ScanInstruction.

Which leads to the "partial" $ecx replacement in the mir output.
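The erase-then-reinsert sequence described above can be reproduced with a small standalone model. Everything here is an assumption made for illustration: the register numbers, the `Operand` struct and the helpers `prescan`, `scan` and `renameRefs` are hypothetical simplifications, and the real pass tracks much more state (liveness indices, Classes, KeepRegs). The sketch only shows how a def can drop out of the reference map while the uses stay in it.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy register numbers (hypothetical; real LLVM register enums differ).
enum Reg : unsigned { ESI = 1, RSI = 2, ECX = 3 };

struct Operand {
  unsigned R;
  bool IsDef;
  bool IsTiedDef; // def tied to a use operand (two-address form)
};

using RegRefMap = std::multimap<unsigned, Operand *>;

// Registers covered by a def of R, including R itself (toy: RSI covers ESI).
inline std::vector<unsigned> subRegsInclSelf(unsigned R) {
  if (R == RSI)
    return {RSI, ESI};
  return {R};
}

// Prescan: record every operand, defs included.
inline void prescan(std::vector<Operand> &MI, RegRefMap &Refs) {
  for (Operand &MO : MI)
    Refs.insert({MO.R, &MO});
}

// Scan: untied defs erase all references to the register and its
// sub-registers; afterwards only use operands are re-inserted.
inline void scan(std::vector<Operand> &MI, RegRefMap &Refs) {
  for (Operand &MO : MI)
    if (MO.IsDef && !MO.IsTiedDef)
      for (unsigned S : subRegsInclSelf(MO.R))
        Refs.erase(S);
  for (Operand &MO : MI)
    if (!MO.IsDef)
      Refs.insert({MO.R, &MO});
}

// Rewrite every recorded reference to From so that it names To.
inline void renameRefs(RegRefMap &Refs, unsigned From, unsigned To) {
  auto Range = Refs.equal_range(From);
  for (auto It = Range.first; It != Range.second; ++It)
    It->second->R = To;
}
```

Modelling the XOR as a tied def of ESI, two uses of ESI and an untied implicit def of RSI, then calling `prescan`, `scan` and `renameRefs(Refs, ESI, ECX)`, leaves the def on ESI and both uses on ECX, which is the shape of the partial replacement seen in the MIR output.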

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

danilaml created this revision. Aug 5 2021, 9:54 AM

Herald added subscribers: pengfei, hiraditya. Aug 5 2021, 9:54 AM

danilaml requested review of this revision. Aug 5 2021, 9:54 AM

Herald added a project: Restricted Project. Aug 5 2021, 9:54 AM

Herald added a subscriber: llvm-commits.

danilaml edited the summary of this revision. Aug 5 2021, 10:16 AM

danilaml added reviewers: spatel, hfinkel, MatzeB.

Harbormaster completed remote builds in B118197: Diff 364522. Aug 5 2021, 10:23 AM

spatel mentioned this in D4351: bug fix for PR20020: anti-dependency-breaker causes miscompilation. Aug 10 2021, 3:53 AM

Adding more potential reviewers - I can't remember any details about how this pass works.

Ping.
I'm tempted to just disable all dep breaking for implicit-def registers, but I fear that it might be too pessimistic for x86_64.

This isn't an area I know much about

danilaml added reviewers: timshen, qcolombet. Aug 26 2021, 5:24 AM

ping?

ping

@MatzeB Do you have any thoughts on this?

@MatzeB Do you have any thoughts on this?

Unfortunately this pass has a very unconventional way to analyze things and manage liveness. It's hard to understand what is going on, so it's hard to judge changes to this pass...

It seems the idea of this patch is that check for tied operands in line 225/226 (in the original) is performed in a scan separate from the other operands? I don't see why this improves things right now, but I also don't see the harm. @danilaml Can you explain why the change helps?

If this does indeed only make the pass more conservative in some cases with tied operands, then I'm fine landing it.

On x64 32-bit instructions are often (although inconsistently) annotated with implicit-def of the 64-bit reg alias.

The idea is that this depends on the user of the instruction. For example if there is a use reading the whole %rsi register and the defining instruction only writes to %esi then the instruction should be annotated with the implicit-def, there's no need for it if the uses only read %esi so you can see both variants in practice.

It doesn't look like this pass handles implicit defs at all (sans, possibly, near MI.getDesc().getNumOperands()).
I don't think it's safe to assume that one can freely change implicit-def of the instruction, so perhaps adding all such encountered regs to KeepRegs could be a solution?

Generally I think you can change registers for operands marked with the renamable attribute, but must not touch other operands. With that said, this pass clearly doesn't care; it was written before renamable was introduced and was never updated to use it :-(

llvm/lib/CodeGen/CriticalAntiDepBreaker.cpp
224	Would it make sense to compute a `bool HasTiedOperand` variable in the first loop and skip the 2nd one if we didn't see any tied operands?
224–228
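The `HasTiedOperand` suggestion amounts to an early-out between the two scans. A minimal sketch, assuming a toy `Op` struct and a hypothetical helper `secondLoopVisits` that just counts how many operands the second scan would touch (neither name comes from the patch):

```cpp
#include <cassert>
#include <vector>

// Toy operand: only the property relevant to the second loop.
struct Op { bool IsTied; };

// Returns how many operands the second (tied-operand) loop visits.
inline unsigned secondLoopVisits(const std::vector<Op> &Ops) {
  bool HasTiedOperand = false;
  for (const Op &MO : Ops)       // first loop: the usual per-operand work...
    HasTiedOperand |= MO.IsTied; // ...plus remembering whether any is tied

  if (!HasTiedOperand)
    return 0; // no tied operands, so the second scan is skipped

  unsigned Visits = 0;
  for (const Op &MO : Ops) { // second loop: the tied-operand check
    (void)MO;
    ++Visits;
  }
  return Visits;
}
```

The saving is one extra walk over the operand list for instructions without tied operands; whether that is worth the added flag is the trade-off raised in the reply.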

In D107582#3070678, @MatzeB wrote:

@MatzeB Do you have any thoughts on this?

Unfortunately this pass has a very unconventional way to analyze things and manage liveness. It's hard to understand what is going on, so it's hard to judge changes to this pass...

I agree, and that's what I tried to express in my summary. I don't believe this patch solves all the potential issues I've encountered, but as you said, the way the pass tracks its state is very convoluted, so I was hesitant to make any more involved changes (I even had trouble reconstructing the context of the patch after coming back to it just now).

It seems the idea of this patch is that check for tied operands in line 225/226 (in the original) is performed in a scan separate from the other operands? I don't see why this improves things right now, but I also don't see the harm. @danilaml Can you explain why the change helps?

The overall issue here is that this pass doesn't seem to correctly handle instructions with implicit defs and tied reg operands. One case of it was fixed in https://reviews.llvm.org/D4351 , and this tries to fix similar issue (see the end of the summary and the test case), but basically it ends up with
$esi = XOR32rr undef $ecx, undef $ecx, implicit-def dead $eflags, implicit-def $rsi
in place of
$esi = XOR32rr undef $esi, undef $esi, implicit-def dead $eflags, implicit-def $rsi

which is illegal (no asserts, but codegen randomly converges it to a "legal" state) and can lead to bad codegen.

If this does indeed only make the pass more conservative in some cases with tied operands, then I'm fine landing it.

I think what my change does is effectively prohibit dep breaking for regs in tied operands if they are also used in an implicit def in the same instruction (the case for most 32-bit wide MIs on x64).
Previously, the code would encounter the check before it processed all MOs, that might've changed the Classes[Reg] to -1. I'm still not sure why the original fix equates -1 state with being live (which it is not).
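The ordering problem can be shown with a toy model of the operand scan. The names below (`Cls`, `Op`, `State`, `updateClasses`, `tiedCheck`, `onePass`, `twoPass`) are hypothetical, and the class bookkeeping is heavily simplified relative to the pass: explicit operands are assumed to share one register class, and implicit operands are assumed to have none. It is a sketch of why running the tied check in a separate scan changes the outcome, not a copy of the fix.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy encoding of a Classes[] entry: null, one class, or the -1 sentinel.
enum class Cls { Dead, One, Multi };

struct Op {
  unsigned R;                    // toy register number
  bool TiedDef;                  // def operand tied to a use operand
  bool Implicit;                 // implicit operands carry no register class
  std::vector<unsigned> Aliases; // registers overlapping R
};

struct State {
  std::vector<Cls> Classes = std::vector<Cls>(4, Cls::Dead);
  std::set<unsigned> Keep; // stands in for KeepRegs
};

// Per-operand class bookkeeping, loosely following the first loop.
inline void updateClasses(const Op &MO, State &S) {
  if (MO.Implicit)
    S.Classes[MO.R] = Cls::Multi;
  else if (S.Classes[MO.R] == Cls::Dead)
    S.Classes[MO.R] = Cls::One;
  for (unsigned A : MO.Aliases)
    if (S.Classes[A] != Cls::Dead) {
      S.Classes[A] = Cls::Multi;
      S.Classes[MO.R] = Cls::Multi;
    }
}

// The tied-operand check: pin the register if it is tied and marked -1.
inline void tiedCheck(const Op &MO, State &S) {
  if (MO.TiedDef && S.Classes[MO.R] == Cls::Multi)
    S.Keep.insert(MO.R);
}

// Old shape: the check runs while later operands are still unprocessed.
inline State onePass(const std::vector<Op> &MI) {
  State S;
  for (const Op &MO : MI) {
    updateClasses(MO, S);
    tiedCheck(MO, S);
  }
  return S;
}

// New shape: the check runs in a second scan, after all updates.
inline State twoPass(const std::vector<Op> &MI) {
  State S;
  for (const Op &MO : MI)
    updateClasses(MO, S);
  for (const Op &MO : MI)
    tiedCheck(MO, S);
  return S;
}
```

With the XOR modelled as a tied def of ESI, two uses of ESI and a trailing implicit def of RSI, `onePass` never pins ESI because the implicit def flips its entry to the sentinel only after the tied def has been checked, whereas `twoPass` pins it.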

On x64 32-bit instructions are often (although inconsistently) annotated with implicit-def of the 64-bit reg alias.

The idea is that this depends on the user of the instruction. For example if there is a use reading the whole %rsi register and the defining instruction only writes to %esi then the instruction should be annotated with the implicit-def, there's no need for it if the uses only read %esi so you can see both variants in practice.

I see, that makes sense.

It doesn't look like this pass handles implicit defs at all (sans, possibly, near MI.getDesc().getNumOperands()).
I don't think it's safe to assume that one can freely change implicit-def of the instruction, so perhaps adding all such encountered regs to KeepRegs could be a solution?

Generally I think you can change registers for operands marked with the renamable attribute, but must not touch other operands. With that said, this pass clearly doesn't care; it was written before renamable was introduced and was never updated to use it :-(

In the MIR I've encountered (and reduced the test case from) for x86 I didn't see the implicit-defs annotated that way. And as you said, this pass doesn't seem to care about it. When reading the code for this pass I had a lot of "how does it even work" moments, so I'm not prepared to attempt fixing these issues/pass (especially considering how hard it is to find someone to review this patch). All in all, I'm convinced that this pass would still be subtly broken and this patch would just make it slightly less so in at least one example (encountered in the wild).

fwiw, the CriticalAntiDepBreaker pass was *supposed* to be removed after everyone started using pre-RA scheduling...

The overall issue here is that this pass doesn't seem to correctly handle instructions with implicit defs and tied reg operands. One case of it was fixed in https://reviews.llvm.org/D4351 , and this tries to fix similar issue (see the end of the summary and the test case)

Correct me if I'm wrong, but my impression of D4351 / line 225/226 is that the pass intends to simply not deal with tied-operands at all (regardless of implicit defs/uses being present or not); that seems reasonable enough for me. Though line 226 somehow stopped that logic from kicking in when there are undef operands involved right? So it seems to me the bug happens when you have a tied + undef operand at the same time? I don't see yet how (or if at all) implicit defs/uses play into the picture.

fwiw, the CriticalAntiDepBreaker pass was *supposed* to be removed after everyone started using pre-RA scheduling...

This discussion centers around X86 still using the "old" PostRASchedulerList and not switching to the "new" PostMachineScheduler. After all, CriticalAntiDepBreaker uses interfaces from PostRASchedulerList. Unfortunately there seem to be only small perf benefits or none at all, with swings in both directions, which makes it very hard to justify all the work necessary to update tests and investigate and fix local regressions. At the same time, this pass is an obvious example of code that should be deprecated or rewritten in the new framework :-/

In order to move forward here, I am fine with the fix as-is in the meantime. I'll accept this after the nitpicks are addressed.

In D107582#3071113, @MatzeB wrote:

The overall issue here is that this pass doesn't seem to correctly handle instructions with implicit defs and tied reg operands. One case of it was fixed in https://reviews.llvm.org/D4351 , and this tries to fix similar issue (see the end of the summary and the test case)

Correct me if I'm wrong, but my impression of D4351 / line 225/226 is that the pass intends to simply not deal with tied-operands at all (regardless of implicit defs/uses being present or not); that seems reasonable enough for me. Though line 226 somehow stopped that logic from kicking in when there are undef operands involved right? So it seems to me the bug happens when you have a tied + undef operand at the same time? I don't see yet how (or if at all) implicit defs/uses play into the picture.

Well, the comment says "If this reg is tied and live (Classes[Reg] is set to -1)", but 1) Reg might still be live with Classes[Reg] != -1 (see my question on the D4351 review) 2) ...
This patch doesn't fix the underlying issue with implicit-def regs (as I'm uncertain how to fix it besides bailing out as soon as an implicit reg is encountered); it just prevents the code from even reaching it for the test case, by moving the check after the implicit register is encountered and all its aliases are marked as -1 at line 200.

Addressed comments

danilaml marked an inline comment as done. Oct 18 2021, 12:38 PM

danilaml added inline comments.

llvm/lib/CodeGen/CriticalAntiDepBreaker.cpp
224	Maybe. Not sure if we want to attempt to optimize it if it's supposed to be deprecated though.

@craig.topper Do you recall any previous attempts to transition x86 from PostRAScheduler to PostMachineScheduler ?

Harbormaster completed remote builds in B129414: Diff 380497. Oct 18 2021, 1:21 PM

pengfei added a subscriber: wxiao3. Oct 19 2021, 1:59 AM

danilaml marked an inline comment as done. Oct 25 2021, 6:48 AM

Thanks, LGTM

This revision is now accepted and ready to land. Oct 25 2021, 8:36 AM

This revision was landed with ongoing or failed builds. Oct 25 2021, 8:52 AM

Closed by commit rG7b102fcc9116: [CodeGen] Fix dependence breaking for tied operands (authored by danilaml).

This revision was automatically updated to reflect the committed changes.

danilaml added a commit: rG7b102fcc9116: [CodeGen] Fix dependence breaking for tied operands.

Revision Contents

Path                                          Size
llvm/lib/CodeGen/CriticalAntiDepBreaker.cpp   25 lines
llvm/test/CodeGen/X86/tied-depbreak.mir       64 lines

Diff 382016

llvm/lib/CodeGen/CriticalAntiDepBreaker.cpp

@@ for (MCRegAliasIterator AI(Reg, TRI, false); AI.isValid(); ++AI) {
         Classes[Reg] = reinterpret_cast<TargetRegisterClass *>(-1);
       }
     }
 
     // If we're still willing to consider this register, note the reference.
     if (Classes[Reg] != reinterpret_cast<TargetRegisterClass *>(-1))
       RegRefs.insert(std::make_pair(Reg, &MO));
 
+    if (MO.isUse() && Special) {
+      if (!KeepRegs.test(Reg)) {
+        for (MCSubRegIterator SubRegs(Reg, TRI, /*IncludeSelf=*/true);
+             SubRegs.isValid(); ++SubRegs)
+          KeepRegs.set(*SubRegs);
+      }
+    }
+  }
+
+  for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {

Inline comment on line 224, MatzeB (Done): Would it make sense to compute a `bool HasTiedOperand` variable in the first loop and skip the 2nd one if we didn't see any tied operands?

Reply, danilaml (Done): Maybe. Not sure if we want to attempt to optimize it if it's supposed to be deprecated though.

+    const MachineOperand &MO = MI.getOperand(I);
+    if (!MO.isReg()) continue;
+    Register Reg = MO.getReg();
+    if (!Reg.isValid())
+      continue;

Inline comment on lines 224–228, MatzeB (Done), suggested edit:

-  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
-    MachineOperand &MO = MI.getOperand(i);
+  for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
+    const MachineOperand &MO = MI.getOperand(I);
     if (!MO.isReg()) continue;
     Register Reg = MO.getReg();
-    if (Reg == 0) continue;
+    if (!Reg.isValid())
+      continue;
     // If this reg is tied and live (Classes[Reg] is set to -1), we can't change

     // If this reg is tied and live (Classes[Reg] is set to -1), we can't change
     // it or any of its sub or super regs. We need to use KeepRegs to mark the
     // reg because not all uses of the same reg within an instruction are
     // necessarily tagged as tied.
     // Example: an x86 "xor %eax, %eax" will have one source operand tied to the
     // def register but not the second (see PR20020 for details).
     // FIXME: can this check be relaxed to account for undef uses
     // of a register? In the above 'xor' example, the uses of %eax are undef, so
     // earlier instructions could still replace %eax even though the 'xor'
     // itself can't be changed.
-    if (MI.isRegTiedToUseOperand(i) &&
+    if (MI.isRegTiedToUseOperand(I) &&
         Classes[Reg] == reinterpret_cast<TargetRegisterClass *>(-1)) {
       for (MCSubRegIterator SubRegs(Reg, TRI, /*IncludeSelf=*/true);
            SubRegs.isValid(); ++SubRegs) {
         KeepRegs.set(*SubRegs);
       }
       for (MCSuperRegIterator SuperRegs(Reg, TRI);
            SuperRegs.isValid(); ++SuperRegs) {
         KeepRegs.set(*SuperRegs);
       }
     }
-
-    if (MO.isUse() && Special) {
-      if (!KeepRegs.test(Reg)) {
-        for (MCSubRegIterator SubRegs(Reg, TRI, /*IncludeSelf=*/true);
-             SubRegs.isValid(); ++SubRegs)
-          KeepRegs.set(*SubRegs);
-      }
-    }
   }
 }
 
 void CriticalAntiDepBreaker::ScanInstruction(MachineInstr &MI, unsigned Count) {
   // Update liveness.
   // Proceeding upwards, registers that are defed but not used in this
   // instruction are now dead.
   assert(!MI.isKill() && "Attempting to scan a kill instruction");

llvm/test/CodeGen/X86/tied-depbreak.mir

This file was added.

# RUN: llc -mtriple=x86_64-unknown-linux-gnu -mcpu=slm -run-pass post-RA-sched -o - %s | FileCheck %s
#
# Verify that the critical antidependence breaker does not partially
# replace tied operands

--- |

  define void @main() { ret void }

...
---
# CHECK-LABEL: main
name: main
alignment: 16
exposesReturnsTwice: false
legalized: false
regBankSelected: false
selected: false
failedISel: false
tracksRegLiveness: true
hasWinCFI: false
registers: []
liveins:
  - { reg: '$edi', virtual-reg: '' }
  - { reg: '$esi', virtual-reg: '' }
frameInfo:
  isFrameAddressTaken: false
  isReturnAddressTaken: false
  hasStackMap: false
  hasPatchPoint: false
  stackSize: 0
  offsetAdjustment: 0
  maxAlignment: 1
  adjustsStack: false
  hasCalls: false
  stackProtector: ''
  maxCallFrameSize: 0
  cvBytesOfCalleeSavedRegisters: 0
  hasOpaqueSPAdjustment: false
  hasVAStart: false
  hasMustTailInVarArgFunc: false
  localFrameSize: 0
  savePoint: ''
  restorePoint: ''
fixedStack: []
stack: []
callSites: []
debugValueSubstitutions: []
constants: []
machineFunctionInfo: {}
body: |
  bb.0:
    liveins: $edi, $esi

    $eax = MOV32rr $esi
    $eax = LEA64_32r $rdi, 1, $rsi, 0, $noreg
    $edi = MOV32rr $esi
    $esi = MOV32ri 4
    ; Verify that XOR is untouched by the dependency breaker
    ; CHECK: $esi = XOR32rr undef $esi, undef $esi, implicit-def dead $eflags, implicit-def $rsi
    $esi = XOR32rr undef $esi, undef $esi, implicit-def dead $eflags, implicit-def $rsi
    RETQ killed $eax

...
