This is an archive of the discontinued LLVM Phabricator instance.

I don't know what the state of this is nowadays, but doesn't this risk running into the old problem of XMM and YMM not being differentiated within the register units? The Windows calling convention says XMM registers are preserved but YMM registers are clobbered. With this change we would likely register both XMM and YMM as clobbered. I haven't thought things through here; is this pass still safe with this in mind?

In D124168#3476315, @MatzeB wrote:

I don't know what the state of this is nowadays, but doesn't this risk running into the old problem of XMM and YMM not being differentiated within the register units? The Windows calling convention says XMM registers are preserved but YMM registers are clobbered. With this change we would likely register both XMM and YMM as clobbered. I haven't thought things through here; is this pass still safe with this in mind?

I don't know anything about that, but shouldn't that be expressed with the existing constraints on the calls/returns anyway?

In D124168#3477201, @arsenm wrote:

I don't know anything about that, but shouldn't that be expressed with the existing constraints on the calls/returns anyway?

I don't see why this would warrant special casing, but given that no existing tests failed I assume this is fine
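The XMM/YMM concern can be made concrete with a toy model. The sketch below is not LLVM code: `ToyReg`, `Liveness`, and `applyPreservedMask` are hypothetical names, and the single shared unit for XMM0 and YMM0 is an assumption taken from the comment above. It shows that per-register tracking can keep XMM0 live while dropping YMM0, whereas a single shared unit bit has to pick one answer. Which policy `LiveRegUnits` actually applies should be checked against its implementation.

```cpp
#include <bitset>

// Hypothetical two-register file; not X86's real register numbering.
enum ToyReg : unsigned { XMM0, YMM0, NumToyRegs };
using RegSet = std::bitset<NumToyRegs>;

struct Liveness {
  RegSet PerRegister; // one bit per register name
  bool SharedUnit;    // one bit for the unit assumed shared by XMM0 and YMM0
};

// Apply a call's preserved-register mask (bit set = preserved).
// ClearIfAnyClobbered selects the per-unit policy:
//   true  -> the unit dies if any register covering it is clobbered
//   false -> the unit dies only if its root register (XMM0 here) is clobbered
Liveness applyPreservedMask(Liveness Live, RegSet Preserved,
                            bool ClearIfAnyClobbered) {
  // Per-register tracking simply drops every register that is not preserved.
  Live.PerRegister &= Preserved;

  bool AnyClobbered = !Preserved.all();
  bool RootClobbered = !Preserved.test(XMM0);
  if (ClearIfAnyClobbered ? AnyClobbered : RootClobbered)
    Live.SharedUnit = false;
  return Live;
}
```

For a dead-code pass, leaving the shared unit live is the conservative choice: a definition that is believed live is never deleted.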

Fix missed test update that shows this fixes a bug where a mov was previously incorrectly deleted

Herald added subscribers: kosarev, kerbowa, jvesely. · Jul 24 2022, 10:45 AM

Harbormaster completed remote builds in B177237: Diff 447144. · Jul 24 2022, 12:29 PM

ping

qcolombet accepted this revision. · Aug 25 2022, 11:32 AM

This revision is now accepted and ready to land. · Aug 25 2022, 11:32 AM

b5041527c75de2f409aa9e2e6deba12b17834c59

Theoretically improves compile time for targets with many overlapping registers

In practice, this is a major compile time regression instead: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=b5041527c75de2f409aa9e2e6deba12b17834c59&stat=instructions

In D124168#3784108, @nikic wrote:

Theoretically improves compile time for targets with many overlapping registers

In practice, this is a major compile time regression instead: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=b5041527c75de2f409aa9e2e6deba12b17834c59&stat=instructions

This doesn't make much sense. My preliminary analysis says this is somehow the fault of phys_regs_and_masks (which doesn't really buy any code simplification over iterating the operands).

In D124168#3784598, @arsenm wrote:

This doesn't make much sense. My preliminary analysis says this is somehow the fault of phys_regs_and_masks (which doesn't really buy any code simplification over iterating the operands).

Hopefully d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba fixes this

In D124168#3784875, @arsenm wrote:

Hopefully d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba fixes this

It does recover most of the regression: http://llvm-compile-time-tracker.com/compare.php?from=ab56719acd98778fb2e48fa425ac7c8d27bdea86&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

There is still a bit of residual regression left though: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

In D124168#3784951, @nikic wrote:

It does recover most of the regression: http://llvm-compile-time-tracker.com/compare.php?from=ab56719acd98778fb2e48fa425ac7c8d27bdea86&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

There is still a bit of residual regression left though: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

I think this is explainable by LiveRegUnits::removeRegsNotPreserved being slower than clearBitsNotInMask. Ideally we would avoid this by switching from using regmasks to regunit masks
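A rough illustration of why the two updates differ in cost, as a self-contained toy rather than the LLVM implementation. `clearRegsNotInMask` mimics a word-wise mask update in the spirit of `clearBitsNotInMask`; `clearUnitsNotPreserved` mimics a per-unit walk that looks up each unit's root registers in a register-indexed mask. The function names, the `UnitRoots` table, and the roots-only check are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Word-wise update: one AND handles 32 registers at a time.
// A set bit in Preserved means the register survives the call.
void clearRegsNotInMask(std::vector<uint32_t> &LiveRegs,
                        const std::vector<uint32_t> &Preserved) {
  for (std::size_t I = 0; I < LiveRegs.size() && I < Preserved.size(); ++I)
    LiveRegs[I] &= Preserved[I];
}

// Per-unit update: the mask is indexed by register, not by unit, so every
// unit has to look up its root register(s) one bit at a time.
// UnitRoots[U] lists the (hypothetical) root register numbers of unit U.
void clearUnitsNotPreserved(std::vector<bool> &LiveUnits,
                            const std::vector<std::vector<unsigned>> &UnitRoots,
                            const std::vector<uint32_t> &Preserved) {
  for (std::size_t U = 0; U < LiveUnits.size(); ++U) {
    for (unsigned Root : UnitRoots[U]) {
      bool IsPreserved = (Preserved[Root / 32] >> (Root % 32)) & 1u;
      if (!IsPreserved) {
        LiveUnits[U] = false; // a clobbered root kills the unit
        break;
      }
    }
  }
}
```

If the mask were indexed by register unit instead, the second update could also be a word-wise AND, which is the idea behind the suggestion to switch from regmasks to regunit masks.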

Revision Contents

Path                                                         Size
llvm/lib/CodeGen/DeadMachineInstructionElim.cpp              52 lines
llvm/test/CodeGen/AMDGPU/fold-immediate-operand-shrink.mir   1 line

Diff 447144

llvm/lib/CodeGen/DeadMachineInstructionElim.cpp

 //===- DeadMachineInstructionElim.cpp - Remove dead machine instructions --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This is an extremely simple MachineInstr-level dead-code-elimination pass.
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/LiveRegUnits.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"

 using namespace llvm;

 #define DEBUG_TYPE "dead-mi-elimination"

 STATISTIC(NumDeletes, "Number of dead instructions deleted");

 namespace {
   class DeadMachineInstructionElim : public MachineFunctionPass {
     bool runOnMachineFunction(MachineFunction &MF) override;

-    const TargetRegisterInfo *TRI;
     const MachineRegisterInfo *MRI;
     const TargetInstrInfo *TII;
-    BitVector LivePhysRegs;
+    LiveRegUnits LivePhysRegs;

   public:
     static char ID; // Pass identification, replacement for typeid
     DeadMachineInstructionElim() : MachineFunctionPass(ID) {
      initializeDeadMachineInstructionElimPass(*PassRegistry::getPassRegistry());
     }

     void getAnalysisUsage(AnalysisUsage &AU) const override {
@@ 28 lines not shown @@ bool DeadMachineInstructionElim::isDead(const MachineInstr *MI) const {
   bool SawStore = false;
   if (!MI->isSafeToMove(nullptr, SawStore) && !MI->isPHI())
     return false;

   // Examine each operand.
   for (const MachineOperand &MO : MI->operands()) {
     if (MO.isReg() && MO.isDef()) {
       Register Reg = MO.getReg();
-      if (Register::isPhysicalRegister(Reg)) {
+      if (Reg.isPhysical()) {
         // Don't delete live physreg defs, or any reserved register defs.
-        if (LivePhysRegs.test(Reg) || MRI->isReserved(Reg))
+        if (!LivePhysRegs.available(Reg) || MRI->isReserved(Reg))
           return false;
       } else {
         if (MO.isDead()) {
 #ifndef NDEBUG
           // Baisc check on the register. All of them should be
           // 'undef'.
           for (auto &U : MRI->use_nodbg_operands(Reg))
             assert(U.isUndef() && "'Undef' use on a 'dead' register is found!");
@@ 20 lines not shown @@ bool DeadMachineInstructionElim::runOnMachineFunction(MachineFunction &MF) {
   while (AnyChanges && eliminateDeadMI(MF))
     ;
   return AnyChanges;
 }

 bool DeadMachineInstructionElim::eliminateDeadMI(MachineFunction &MF) {
   bool AnyChanges = false;
   MRI = &MF.getRegInfo();
-  TRI = MF.getSubtarget().getRegisterInfo();
   TII = MF.getSubtarget().getInstrInfo();

+  LivePhysRegs.init(*MF.getSubtarget().getRegisterInfo());
+
   // Loop over all instructions in all blocks, from bottom to top, so that it's
   // more likely that chains of dependent but ultimately dead instructions will
   // be cleaned up.
   for (MachineBasicBlock *MBB : post_order(&MF)) {
-    // Start out assuming that reserved registers are live out of this block.
-    LivePhysRegs = MRI->getReservedRegs();
-
-    // Add live-ins from successors to LivePhysRegs. Normally, physregs are not
-    // live across blocks, but some targets (x86) can have flags live out of a
-    // block.
-    for (const MachineBasicBlock *Succ : MBB->successors())
-      for (const auto &LI : Succ->liveins())
-        LivePhysRegs.set(LI.PhysReg);
+    LivePhysRegs.addLiveOuts(*MBB);

     // Now scan the instructions and delete dead ones, tracking physreg
     // liveness as we go.
-    for (MachineInstr &MI : llvm::make_early_inc_range(llvm::reverse(*MBB))) {
+    for (MachineInstr &MI : make_early_inc_range(reverse(*MBB))) {
       // If the instruction is dead, delete it!
       if (isDead(&MI)) {
         LLVM_DEBUG(dbgs() << "DeadMachineInstructionElim: DELETING: " << MI);
         // It is possible that some DBG_VALUE instructions refer to this
         // instruction. They will be deleted in the live debug variable
         // analysis.
         MI.eraseFromParent();
         AnyChanges = true;
         ++NumDeletes;
         continue;
       }

-      // Record the physreg defs.
-      for (const MachineOperand &MO : MI.operands()) {
-        if (MO.isReg() && MO.isDef()) {
-          Register Reg = MO.getReg();
-          if (Register::isPhysicalRegister(Reg)) {
-            // Check the subreg set, not the alias set, because a def
-            // of a super-register may still be partially live after
-            // this def.
-            for (MCSubRegIterator SR(Reg, TRI,/*IncludeSelf=*/true);
-                 SR.isValid(); ++SR)
-              LivePhysRegs.reset(*SR);
-          }
-        } else if (MO.isRegMask()) {
-          // Register mask of preserved registers. All clobbers are dead.
-          LivePhysRegs.clearBitsNotInMask(MO.getRegMask());
-        }
-      }
-      // Record the physreg uses, after the defs, in case a physreg is
-      // both defined and used in the same instruction.
-      for (const MachineOperand &MO : MI.operands()) {
-        if (MO.isReg() && MO.isUse()) {
-          Register Reg = MO.getReg();
-          if (Register::isPhysicalRegister(Reg)) {
-            for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI)
-              LivePhysRegs.set(*AI);
-          }
-        }
-      }
+      LivePhysRegs.stepBackward(MI);
     }
   }

   LivePhysRegs.clear();
   return AnyChanges;
 }
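The bottom-up scan in the diff above can be sketched as a toy, independent of LLVM. `ToyInstr` and `eliminateDead` are hypothetical names, registers are plain integers, and the deadness test is much simpler than the pass's `isDead` (which also consults `isSafeToMove` and reserved registers). The sketch only illustrates the pattern: start from the block's live-outs, delete an instruction whose definitions are all dead, and otherwise step liveness backward by removing its definitions and adding its uses.

```cpp
#include <set>
#include <vector>

// Minimal stand-in for a machine instruction (hypothetical type).
struct ToyInstr {
  std::vector<int> Defs; // registers written
  std::vector<int> Uses; // registers read
  bool HasSideEffects;   // e.g. a store or a call; never deleted
};

// Walk the block bottom-up and return the surviving instructions in their
// original order. Live starts out as the block's live-out set.
std::vector<ToyInstr> eliminateDead(const std::vector<ToyInstr> &Block,
                                    std::set<int> Live) {
  std::vector<ToyInstr> KeptReversed;
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    bool AnyDefLive = false;
    for (int D : It->Defs)
      AnyDefLive |= Live.count(D) != 0;

    // Dead: no side effects and nothing it defines is read later.
    // Liveness is deliberately not updated, so its operands can die too.
    if (!It->HasSideEffects && !AnyDefLive)
      continue;

    // Step backward: defs end liveness, uses start it.
    for (int D : It->Defs)
      Live.erase(D);
    for (int U : It->Uses)
      Live.insert(U);
    KeptReversed.push_back(*It);
  }
  return std::vector<ToyInstr>(KeptReversed.rbegin(), KeptReversed.rend());
}
```

Because a deleted instruction never adds its uses to the live set, a chain of instructions that only feed each other is removed in a single bottom-up pass, which is the reason the pass comment gives for scanning from bottom to top.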

llvm/test/CodeGen/AMDGPU/fold-immediate-operand-shrink.mir

@@ 154 lines not shown @@
 name: shrink_scalar_imm_vgpr_v_add_i32_e64_liveout_vcc_lo_use
 tracksRegLiveness: true

 body: |
   ; GCN-LABEL: name: shrink_scalar_imm_vgpr_v_add_i32_e64_liveout_vcc_lo_use
   ; GCN: bb.0:
   ; GCN-NEXT: successors: %bb.1(0x80000000)
   ; GCN-NEXT: {{ $}}
+  ; GCN-NEXT: $vcc = S_MOV_B64 -1
   ; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32_xm0 = S_MOV_B32 12345
   ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
   ; GCN-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_64 = V_ADD_CO_U32_e64 [[S_MOV_B32_]], [[DEF]], 0, implicit $exec
   ; GCN-NEXT: {{ $}}
   ; GCN-NEXT: bb.1:
   ; GCN-NEXT: liveins: $vcc_lo
   ; GCN-NEXT: {{ $}}
   ; GCN-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]], implicit $vcc_lo
@@ 499 lines not shown @@

DeadMachineInstructionElim: Switch to using LiveRegUnits
Closed · Public