This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add peephole to optimize MOV
Abandoned · Public

Authored by piotr on Jun 24 2019, 5:27 AM.


Details

Reviewers

rampitec
arsenm

Summary

Add a peephole optimization that removes a redundant MOV if:

- the MOV is identical to a MOV in the immediate predecessor of MBB, and
- no instruction between them modifies the destination register.
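The rule above can be illustrated with a small, self-contained model. It is a sketch under simplifying assumptions: a block is a flat list of toy instructions, registers are plain integers, and sub-registers, opcodes and EXEC are ignored. `Instr`, `Block` and this `isMovRedundant` are hypothetical stand-ins, not the LLVM types or the function in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy instruction: Dst is the register it writes (-1 if none); IsMovImm marks
// a "mov Dst, Imm". Sub-registers, opcodes and EXEC are deliberately ignored.
struct Instr {
  bool IsMovImm;
  int Dst;
  std::int64_t Imm;
};

using Block = std::vector<Instr>;

// Returns true if Body[Idx] repeats a MOV found in the sole predecessor Pred,
// with no write to the same register in between.
bool isMovRedundant(const Block &Body, std::size_t Idx, const Block &Pred) {
  const Instr &Mov = Body[Idx];
  assert(Mov.IsMovImm && "only immediate MOVs are candidates");

  // 1. Nothing earlier in Body may write the destination register.
  for (std::size_t I = 0; I < Idx; ++I)
    if (Body[I].Dst == Mov.Dst)
      return false;

  // 2. Scan the predecessor bottom-up for the identical MOV.
  for (auto It = Pred.rbegin(); It != Pred.rend(); ++It) {
    if (It->IsMovImm && It->Dst == Mov.Dst && It->Imm == Mov.Imm)
      return true;  // same value is still in the register on entry to Body
    if (It->Dst == Mov.Dst)
      return false; // the last write to the register in Pred is different
  }
  return false;     // no matching MOV in the predecessor
}
```

Read against the MIR tests in this diff: a clobber in the predecessor after the MOV (`..._necessary_move_not_removed_1`) or in the block before the MOV (`..._necessary_move_not_removed_2`) makes the function return false.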

Change-Id: Ibb187e3219ef641b7681e4123c7ca3fe8e1d14e6

Diff Detail

Repository

rL LLVM

Build Status

Buildable 33864
Build 33863: arc lint + arc unit

Event Timeline

piotr created this revision. Jun 24 2019, 5:27 AM

Herald added a project: Restricted Project. Jun 24 2019, 5:27 AM

Herald added subscribers: llvm-commits, t-tye, tpr and 9 others.

Harbormaster completed remote builds in B33781: Diff 206200. Jun 24 2019, 5:27 AM

piotr added reviewers: rampitec, arsenm. Jun 24 2019, 5:28 AM

This doesn't seem like the right place to handle this. This doesn't really fit with SIShrinkInstructions. I also wouldn't expect you need to do anything special to handle this case. Why doesn't MachineCSE catch this? Can you add an IR example testcase?

Patch does not take control flow divergence into account.

lib/Target/AMDGPU/SIShrinkInstructions.cpp
682	Even if the block has only one predecessor, that does not mean this is the reaching def of the physreg. The predecessor may (and likely will) have a different exec mask. It might work with scalar moves, but not with vector moves.

This revision now requires changes to proceed. Jun 24 2019, 8:39 AM

This opt opportunity presents itself after the register coalescer, where inline constants are moved to the same registers (attaching IR example testcase imminently). MachineCSE is run too early to spot this and the last occurrence of si-shrink-instructions is run at the right time. I think, for similar reasons some other peepholes were also placed in this pass, even though they are not about 32-bit encoding.

piotr marked an inline comment as done. Jun 24 2019, 8:58 AM

piotr added inline comments.

lib/Target/AMDGPU/SIShrinkInstructions.cpp
682	Perhaps I misunderstood your comment, but if the block has only one predecessor then it is executed for a subset of threads that already executed the predecessor, so for that subset we're sure that the previous mov happened?

Adding IR test case example.

Harbormaster completed remote builds in B33800: Diff 206233. Jun 24 2019, 9:03 AM

rampitec added inline comments. Jun 24 2019, 9:39 AM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
682	Consider this:

a:
  v_mov_b32 v0, 1
  s_and_saveexec_b64 s[0:1], vcc
  ; mask branch %c
b:
  v_mov_b32 v0, 2
c:
  s_or_b64 exec, exec, s[0:1]
  use v0

You will have a divergent v0 value at the use even though block c has only one predecessor.
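The example can be checked with a toy per-lane simulation. Assumptions: a 4-lane wave instead of 64, an arbitrary vcc, and the usual semantics of `s_and_saveexec_b64` (save EXEC to the SGPR pair, then AND EXEC with vcc); `simulate` is a hypothetical helper, not part of any LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4;  // toy wave size (a real wave64 has 64 lanes)

// Runs blocks a, b, c of the example for every lane and returns the per-lane
// value of v0 at "use v0". Vcc selects the lanes that enter block b.
std::array<int, kLanes> simulate(std::uint32_t Vcc) {
  assert(Vcc < (1u << kLanes) && "Vcc has bits outside the wave");
  std::array<int, kLanes> V0{};
  std::uint32_t Exec = (1u << kLanes) - 1;  // all lanes active on entry

  // v_mov_b32 v0, Value: only lanes enabled in EXEC are written.
  auto VMov = [&](int Value) {
    for (int L = 0; L < kLanes; ++L)
      if (Exec & (1u << L))
        V0[L] = Value;
  };

  // a:
  VMov(1);
  std::uint32_t Saved = Exec;  // s_and_saveexec_b64 s[0:1], vcc
  Exec &= Vcc;
  // b: skipped through the mask branch when no lane is active
  if (Exec != 0)
    VMov(2);
  // c:
  Exec |= Saved;               // s_or_b64 exec, exec, s[0:1]
  return V0;                   // use v0
}
```

With vcc = 0b0101, lanes 0 and 2 end with v0 = 2 and lanes 1 and 3 with v0 = 1, so no single MOV is the reaching definition of v0 for the whole wave at the use.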

In D63709#1555766, @piotr wrote:

This opt opportunity presents itself after the register coalescer, where inline constants are moved to the same registers (attaching IR example testcase imminently). MachineCSE is run too early to spot this and the last occurrence of si-shrink-instructions is run at the right time. I think, for similar reasons some other peepholes were also placed in this pass, even though they are not about 32-bit encoding.

I'd like to take a look at a more reduced testcase. My intuition is that something else is going wrong and preventing this from being eliminated.

test/CodeGen/AMDGPU/mov-opt.ll
15	You should be able to reduce this more

I do suspect the handling of the EXEC physreg use in MachineCSE. I don't see any of the typical sources of late visible constants in this testcase?

arsenm added inline comments. Jun 24 2019, 10:52 AM

test/CodeGen/AMDGPU/mov-opt.ll
15	I don't actually see any redundant constants?

piotr marked an inline comment as done. Jun 24 2019, 2:47 PM

piotr added inline comments.

test/CodeGen/AMDGPU/mov-opt.ll
15	The constant moves are by-products of different phi nodes, which is why they are not explicitly written in the IR. They are created during isel in HandlePHINodesInSuccessorBlocks. I will work on simplifying the IR test case even more.

Simplifying IR test case even more with bugpoint.

Harbormaster completed remote builds in B33864: Diff 206385. Jun 25 2019, 1:18 AM

piotr marked an inline comment as done. Jun 25 2019, 1:24 AM

piotr added inline comments.

lib/Target/AMDGPU/SIShrinkInstructions.cpp
682	Thanks Stas. Good point, I will need to check for the modification of the exec mask between the movs.
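One way to express the fix proposed above is to treat any write to EXEC between the two MOVs like a write to the destination register. This is a sketch in the same toy style, where `Instr`, `WritesExec` and `isVMovRedundant` are hypothetical; it is deliberately conservative, and the review did not establish whether an EXEC check alone is sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy instruction. WritesExec marks anything that changes the active-lane
// mask (for example s_and_saveexec_b64 or s_or_b64 exec, exec, ...).
struct Instr {
  bool IsVMovImm;   // "v_mov_b32 Dst, Imm"
  int Dst;          // register written, -1 if none
  std::int64_t Imm;
  bool WritesExec;
};

using Block = std::vector<Instr>;

// Hypothetical variant of the redundancy check for VGPR moves: besides
// rejecting intervening writes of Dst, reject any intervening EXEC write,
// because the lanes active at the later MOV need not be the lanes the
// earlier MOV wrote.
bool isVMovRedundant(const Block &Body, std::size_t Idx, const Block &Pred) {
  const Instr &Mov = Body[Idx];
  assert(Mov.IsVMovImm && "only immediate VGPR MOVs are candidates");
  for (std::size_t I = 0; I < Idx; ++I)
    if (Body[I].Dst == Mov.Dst || Body[I].WritesExec)
      return false;
  for (auto It = Pred.rbegin(); It != Pred.rend(); ++It) {
    if (It->IsVMovImm && It->Dst == Mov.Dst && It->Imm == Mov.Imm)
      return true;
    if (It->Dst == Mov.Dst || It->WritesExec)
      return false;
  }
  return false;
}
```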

Reduced further: https://paste.debian.net/1089194

In D63709#1557592, @arsenm wrote:

Reduced further: https://paste.debian.net/1089194

Other targets seem to not have this problem with a slightly generalized version, so I would look into how this is cleaned up there

In D63709#1557605, @arsenm wrote:

Other targets seem to not have this problem with a slightly generalized version, so I would look into how this is cleaned up there

It seems we're missing a SimplifyCFG run somewhere, so maybe we're thinking of this at the wrong level entirely. If I run SimplifyCFG on any of the testcase variants, this problem disappears.

In D63709#1557626, @arsenm wrote:

It seems we're missing a SimplifyCFG run somewhere, so maybe we're thinking of this at the wrong level entirely. If I run SimplifyCFG on any of the testcase variants, this problem disappears.

Other targets seem to run SimplifyCFG after AtomicExpand, which we are missing. Even with that disabled, and with the phi surviving to MachineInstrs, AArch64 and Hexagon both avoid this.

In D63709#1557626, @arsenm wrote:

It seems we're missing a SimplifyCFG run somewhere, so maybe we're thinking of this at the wrong level entirely. If I run SimplifyCFG on any of the testcase variants, this problem disappears.

Well spotted, I will try to look at the issue from the perspective of missing simplifycfg.

In D63709#1557646, @arsenm wrote:

Other targets seem to run SimplifyCFG after AtomicExpand, which we are missing. Even with that disabled, and with the phi surviving to MachineInstrs, AArch64 and Hexagon both avoid this.

MachineCSE should be taking care of this, but for some reason it concludes it isn't profitable:
Examining: %9:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec

Found a common subexpression: %7:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
Not profitable, avoid CSE!

I think things are going wrong because of this heuristic:

// Heuristics #3: If the common subexpression is used by PHIs, do not reuse
// it unless the defined value is already used in the BB of the new use.
bool HasPHI = false;
for (MachineInstr &UseMI : MRI->use_nodbg_instructions(CSReg)) {
  HasPHI |= UseMI.isPHI();
  if (UseMI.getParent() == MI->getParent())
    return true;
}

The second phi use is in a flow block containing only the phi, so the use is in the fall-through successor. Maybe the heuristic could be relaxed to look through trivial successors, at least for blocks with only phis?
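The suggested relaxation can be modelled outside LLVM. In this sketch `MIBlock` stands for `MI->getParent()`, uses are toy records, and `PhiOnlySuccs` is a hypothetical precomputed set of MIBlock's successors that contain only PHIs (such as a Flow block). The final `return !HasPHI;` is an assumption about how the heuristic ends in MachineCSE, and the code is not taken from any actual MachineCSE patch.

```cpp
#include <set>
#include <vector>

// Toy view of one use of the CSE candidate register.
struct Use {
  int Block;   // block that contains the user
  bool IsPHI;  // whether the user is a PHI
};

// Models heuristic #3 plus the suggested relaxation.
bool phiHeuristicAllowsCSE(const std::vector<Use> &Uses, int MIBlock,
                           const std::set<int> &PhiOnlySuccs) {
  bool HasPHI = false;
  for (const Use &U : Uses) {
    HasPHI |= U.IsPHI;
    if (U.Block == MIBlock)
      return true;
    // Relaxation: a PHI use in a PHI-only successor of MIBlock is treated
    // like a use in MIBlock itself.
    if (U.IsPHI && PhiOnlySuccs.count(U.Block) != 0)
      return true;
  }
  return !HasPHI;
}
```

With only a PHI use in Flow block 3, the unmodified heuristic (empty `PhiOnlySuccs`) rejects the CSE; marking block 3 as a PHI-only successor of MIBlock lets it through.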

In D63709#1557689, @arsenm wrote:
I think things are going wrong because of this heuristic:
// Heuristics #3: If the common subexpression is used by PHIs, do not reuse
// it unless the defined value is already used in the BB of the new use.
bool HasPHI = false;
for (MachineInstr &UseMI : MRI->use_nodbg_instructions(CSReg)) {
  HasPHI |= UseMI.isPHI();
  if (UseMI.getParent() == MI->getParent())
    return true;
}
The second phi use is in a flow block with only the phi, so the use is in the fall through successor. Maybe it could be relaxed to look through trivial successors, at least for blocks with only phis?

Nice job tracking this down. I will abandon this review and submit another one that extends the MachineCSE pass as you suggested.

piotr abandoned this revision. Jun 26 2019, 2:17 PM

piotr mentioned this in D63860: [AMDGPU] Add test. Jun 27 2019, 1:34 AM

Revision Contents

Path                                               Size
lib/Target/AMDGPU/SIShrinkInstructions.cpp         95 lines
test/CodeGen/AMDGPU/control-flow-fastregalloc.ll   1 line
test/CodeGen/AMDGPU/mov-opt.ll                     83 lines
test/CodeGen/AMDGPU/remove-redundant-mov.mir       284 lines

Diff 206385

lib/Target/AMDGPU/SIShrinkInstructions.cpp

// ...
    if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
      LaneBitmask LM = TRI.getSubRegIndexLaneMask(Sub);
      Sub = TRI.getSubRegFromChannel(I + countTrailingZeros(LM.getAsInteger()));
    }
  }
  return TargetInstrInfo::RegSubRegPair(Reg, Sub);
}

// Match:
// bb0:
//   ..
//   mov r, imm
//   ..
//   branch bb1
//
// bb1:
//   ; predecessors: %bb.0
//   ..
//   mov r, imm <== redundant mov
//
// Returns true if the mov can be removed.
static bool isMovRedundant(MachineInstr &Mov, MachineRegisterInfo &MRI,
                           const SIInstrInfo *TII) {
  assert(Mov.getOpcode() == AMDGPU::V_MOV_B32_e32 ||
         Mov.getOpcode() == AMDGPU::S_MOV_B32 ||
         Mov.getOpcode() == AMDGPU::S_MOV_B64);
  assert(Mov.getParent()->pred_size() == 1);

  auto DstOperand = Mov.getOpcode() == AMDGPU::V_MOV_B32_e32
                        ? TII->getNamedOperand(Mov, AMDGPU::OpName::vdst)
                        : TII->getNamedOperand(Mov, AMDGPU::OpName::sdst);

  unsigned R = DstOperand->getReg();
  unsigned Rsub = DstOperand->getSubReg();
  auto MBB = Mov.getParent();
  const SIRegisterInfo &TRI = TII->getRegisterInfo();

  // Make sure that 'R' is not modified between the MOVs in 'MBB'.
  auto I = std::next(Mov.getReverseIterator()), E = MBB->instr_rend();
  for (; I != E; ++I) {
    if (instModifiesReg(&*I, R, Rsub, TRI))
      return false;
  }

  unsigned Op = Mov.getOpcode();
  auto SrcOperand = TII->getNamedOperand(Mov, AMDGPU::OpName::src0);
  assert(SrcOperand->isImm());
  const int64_t Imm = SrcOperand->getImm();

  // Look for the same MOV in the predecessor.
  auto Pred = *MBB->pred_begin();
  I = Pred->instr_rbegin();
  E = Pred->instr_rend();
  for (; I != E; ++I) {
    MachineInstr *Instr = &*I;
    auto InstrSrcOp = TII->getNamedOperand(*Instr, AMDGPU::OpName::src0);
    auto InstrDstOp = Instr->getOpcode() == AMDGPU::V_MOV_B32_e32
                          ? TII->getNamedOperand(*Instr, AMDGPU::OpName::vdst)
                          : TII->getNamedOperand(*Instr, AMDGPU::OpName::sdst);

    // The opcode check comes first, so the operand pointers are only
    // dereferenced for MOVs of the same kind.
    if (Instr->getOpcode() == Op &&
        InstrDstOp->getReg() == R &&
        InstrDstOp->getSubReg() == Rsub &&
        InstrSrcOp->isImm() &&
        InstrSrcOp->getImm() == Imm) {
      break;
    }

    // Make sure that 'R' is not modified between the MOVs in 'Pred'.
    if (instModifiesReg(Instr, R, Rsub, TRI))
      return false;
  }

  // If the same MOV was not found in the predecessor, bail out.
  if (I == E)
    return false;

  if (MRI.tracksLiveness() && !MBB->isLiveIn(R))
    MBB->addLiveIn(R);

  return true;
}

// Match:
// mov t, x
// mov x, y
// mov y, t
//
// =>
//
// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y
// ...
  for (I = MBB.begin(); I != MBB.end(); I = Next) {
    // ...
    if (ST.hasSwap() && (MI.getOpcode() == AMDGPU::V_MOV_B32_e32 ||
                         MI.getOpcode() == AMDGPU::COPY)) {
      if (auto *NextMI = matchSwap(MI, MRI, TII)) {
        Next = NextMI->getIterator();
        continue;
      }
    }

    if (MI.getOpcode() == AMDGPU::V_MOV_B32_e32 ||
        MI.getOpcode() == AMDGPU::S_MOV_B32 ||
        MI.getOpcode() == AMDGPU::S_MOV_B64) {
      // If the MOV is identical to a MOV in the immediate predecessor
      // of MBB and also no instruction between them modifies the destination
      // register, then remove the MOV.
      MachineOperand &Src = MI.getOperand(1);
      if (Src.isImm() &&
          TargetRegisterInfo::isPhysicalRegister(MI.getOperand(0).getReg())) {

        if (MBB.pred_size() == 1 && isMovRedundant(MI, MRI, TII)) {
          MI.eraseFromParent();
          continue;
        }
      }
    }

    // Combine adjacent s_nops to use the immediate operand encoding how long
    // to wait.
    //
    // s_nop N
    // s_nop M
    // =>
    // s_nop (N + M)
    if (MI.getOpcode() == AMDGPU::S_NOP &&
// ...

test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

 ; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]
 ; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], s7 offset:24 ; 4-byte Folded Spill

 ; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}

 ; GCN: mask branch [[ENDIF:BB[0-9]+_[0-9]+]]

 ; GCN: {{^}}BB{{[0-9]+}}_1: ; %if
-; GCN: s_mov_b32 m0, -1
 ; GCN: ds_read_b32 [[LOAD1:v[0-9]+]]
 ; GCN: buffer_load_dword [[RELOAD_LOAD0:v[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
 ; GCN: s_waitcnt vmcnt(0) lgkmcnt(0)


 ; Spill val register
 ; GCN: v_add_i32_e32 [[VAL:v[0-9]+]], vcc, [[LOAD1]], [[RELOAD_LOAD0]]
 ; GCN: buffer_store_dword [[VAL]], off, s[0:3], s7 offset:[[VAL_OFFSET:[0-9]+]] ; 4-byte Folded Spill

test/CodeGen/AMDGPU/mov-opt.ll

This file was added.

; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs | FileCheck %s

; Check that the redundant immediate MOV instruction
; (by-product of handling phi nodes) is optimized away
; and not found in bb1.

; CHECK-LABEL: {{^}}mov_opt:
; CHECK: v_mov_b32_e32 {{v[0-9]+}}, 1.0
; CHECK: %bb.1:
; CHECK-NOT: v_mov_b32_e32 {{v[0-9]+}}, 1.0
; CHECK: %bb.2:

define void @mov_opt(i32, i32) local_unnamed_addr #0 {
.entry:
  %2 = add i32 %1, %0
  br i1 undef, label %3, label %.critedge

3: ; preds = %.entry
  br i1 undef, label %4, label %.critedge

4: ; preds = %3
  switch i32 undef, label %8 [
    i32 0, label %5
    i32 1, label %6
    i32 2, label %7
  ]

5: ; preds = %4
  br label %8

6: ; preds = %4
  br label %8

7: ; preds = %4
  br label %8

8: ; preds = %7, %6, %5, %4
  %9 = add i32 0, %2
  %10 = lshr i32 %9, 1
  %11 = getelementptr <{ [4294967295 x i32] }>, <{ [4294967295 x i32] }> addrspace(6)* null, i32 0, i32 0, i32 %10
  %12 = ptrtoint i32 addrspace(6)* %11 to i32
  %13 = call i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32> undef, i32 %12, i32 0)
  %14 = lshr i32 %13, 0
  %15 = lshr i32 %14, 3
  %16 = and i32 %15, 7
  switch i32 %16, label %17 [
    i32 0, label %17
    i32 1, label %18
    i32 2, label %20
    i32 3, label %21
  ]

17: ; preds = %8, %8
  br label %.critedge

18: ; preds = %8
  %19 = fsub reassoc nnan nsz arcp contract float 1.000000e+00, undef
  br label %.critedge

20: ; preds = %8
  br label %.critedge

21: ; preds = %8
  %22 = fsub reassoc nnan nsz arcp contract float 1.000000e+00, undef
  br label %.critedge

.critedge: ; preds = %21, %20, %18, %17, %3, %.entry
  %__llpc_output_proxy_.3.0 = phi float [ 1.000000e+00, %3 ], [ undef, %21 ], [ undef, %20 ], [ undef, %18 ], [ 0.000000e+00, %17 ], [ 1.000000e+00, %.entry ]
  %__llpc_output_proxy_.3.1 = phi float [ 0.000000e+00, %3 ], [ 0.000000e+00, %21 ], [ 0.000000e+00, %20 ], [ %19, %18 ], [ 0.000000e+00, %17 ], [ 0.000000e+00, %.entry ]
  %__llpc_output_proxy_.3.3 = phi float [ 0.000000e+00, %3 ], [ %22, %21 ], [ undef, %20 ], [ undef, %18 ], [ undef, %17 ], [ 0.000000e+00, %.entry ]
  call void @llvm.amdgcn.exp.f32(i32 immarg 40, i32 immarg 15, float %__llpc_output_proxy_.3.0, float %__llpc_output_proxy_.3.1, float undef, float %__llpc_output_proxy_.3.3, i1 immarg false, i1 immarg false) #2
  ret void
}

; Function Attrs: inaccessiblememonly nounwind
declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg) #0

; Function Attrs: nounwind readnone
declare i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32>, i32, i32 immarg) #1

attributes #0 = { inaccessiblememonly nounwind }
attributes #1 = { nounwind readnone }
attributes #2 = { nounwind }

test/CodeGen/AMDGPU/remove-redundant-mov.mir

This file was added.

# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass si-shrink-instructions -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

# GCN-LABEL: name: v_mov_redundant_move_single
# GCN: bb.1:
# GCN-NOT: $vgpr2 = V_MOV_B32_e32 1065353216
# GCN: $vgpr3 = V_MOV_B32_e32 $vgpr2
---
name: v_mov_redundant_move_single
body: |
  bb.0:
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    $vgpr3 = V_MOV_B32_e32 $vgpr2, implicit $exec, implicit $exec
...

# GCN-LABEL: name: v_mov_redundant_move_liveness
# GCN: bb.1:
# GCN: liveins: $vgpr2
# GCN-NOT: $vgpr2 = V_MOV_B32_e32 1065353216
# GCN: $vgpr3 = V_MOV_B32_e32 $vgpr2
---
name: v_mov_redundant_move_liveness
tracksRegLiveness: true
body: |
  bb.0:
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    $vgpr3 = V_MOV_B32_e32 $vgpr2, implicit $exec, implicit $exec
...

# GCN-LABEL: name: v_mov_redundant_move_multiple
# GCN: bb.1:
# GCN-NOT: $vgpr2 = V_MOV_B32_e32 0
# GCN-NOT: $vgpr1 = V_MOV_B32_e32 1065353216
# GCN-NOT: $vgpr4 = V_MOV_B32_e32 3204448256
# GCN-NOT: $vgpr3 = V_MOV_B32_e32 1056964608
# GCN: $vgpr3 = V_MOV_B32_e32 $vgpr2
---
name: v_mov_redundant_move_multiple
body: |
  bb.0:
    renamable $vgpr1 = V_MOV_B32_e32 1065353216, implicit $exec
    renamable $vgpr2 = V_MOV_B32_e32 0, implicit $exec
    renamable $vgpr3 = V_MOV_B32_e32 1056964608, implicit $exec
    renamable $vgpr4 = V_MOV_B32_e32 3204448256, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $vgpr2 = V_MOV_B32_e32 0, implicit $exec
    renamable $vgpr1 = V_MOV_B32_e32 1065353216, implicit $exec
    renamable $vgpr4 = V_MOV_B32_e32 3204448256, implicit $exec
    renamable $vgpr3 = V_MOV_B32_e32 1056964608, implicit $exec
    $vgpr3 = V_MOV_B32_e32 $vgpr2, implicit $exec, implicit $exec
...

# GCN-LABEL: name: v_mov_necessary_move_not_removed_1
# GCN: bb.1:
# GCN: $vgpr2 = V_MOV_B32_e32 1065353216
# GCN: $vgpr3 = V_MOV_B32_e32 $vgpr2
---
name: v_mov_necessary_move_not_removed_1
body: |
  bb.0:
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    renamable $vgpr2 = V_MOV_B32_e32 0, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    $vgpr3 = V_MOV_B32_e32 $vgpr2, implicit $exec, implicit $exec
...

# GCN-LABEL: name: v_mov_necessary_move_not_removed_2
# GCN: bb.1:
# GCN: $vgpr2 = V_MOV_B32_e32 1065353216
# GCN: $vgpr3 = V_MOV_B32_e32 $vgpr2
---
name: v_mov_necessary_move_not_removed_2
body: |
  bb.0:
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $vgpr2 = V_MOV_B32_e32 0, implicit $exec
    renamable $vgpr2 = V_MOV_B32_e32 1065353216, implicit $exec
    $vgpr3 = V_MOV_B32_e32 $vgpr2, implicit $exec, implicit $exec
...



# GCN-LABEL: name: s_mov_32_redundant_move_single
# GCN: bb.1:
# GCN-NOT: $sgpr11 = S_MOV_B32 1065353216
# GCN: $sgpr12 = S_MOV_B32 $sgpr11
---
name: s_mov_32_redundant_move_single
body: |
  bb.0:
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    $sgpr12 = S_MOV_B32 $sgpr11, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_32_redundant_move_liveness
# GCN: bb.1:
# GCN: liveins: $sgpr11
# GCN-NOT: $sgpr11 = S_MOV_B32 1065353216
# GCN: $sgpr12 = S_MOV_B32 $sgpr11
---
name: s_mov_32_redundant_move_liveness
tracksRegLiveness: true
body: |
  bb.0:
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    $sgpr12 = S_MOV_B32 $sgpr11, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_32_redundant_move_multiple
# GCN: bb.1:
# GCN-NOT: $sgpr11 = S_MOV_B32 0
# GCN-NOT: $sgpr10 = S_MOV_B32 1065353216
# GCN-NOT: $sgpr13 = S_MOV_B32 3204448256
# GCN-NOT: $sgpr12 = S_MOV_B32 1056964608
# GCN: $sgpr12 = S_MOV_B32 $sgpr11
---
name: s_mov_32_redundant_move_multiple
body: |
  bb.0:
    renamable $sgpr10 = S_MOV_B32 1065353216, implicit $exec
    renamable $sgpr11 = S_MOV_B32 0, implicit $exec
    renamable $sgpr12 = S_MOV_B32 1056964608, implicit $exec
    renamable $sgpr13 = S_MOV_B32 3204448256, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr11 = S_MOV_B32 0, implicit $exec
    renamable $sgpr10 = S_MOV_B32 1065353216, implicit $exec
    renamable $sgpr13 = S_MOV_B32 3204448256, implicit $exec
    renamable $sgpr12 = S_MOV_B32 1056964608, implicit $exec
    $sgpr12 = S_MOV_B32 $sgpr11, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_32_necessary_move_not_removed_1
# GCN: bb.1:
# GCN: $sgpr11 = S_MOV_B32 1065353216
# GCN: $sgpr12 = S_MOV_B32 $sgpr11
---
name: s_mov_32_necessary_move_not_removed_1
body: |
  bb.0:
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    renamable $sgpr11 = S_MOV_B32 0, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    $sgpr12 = S_MOV_B32 $sgpr11, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_32_necessary_move_not_removed_2
# GCN: bb.1:
# GCN: $sgpr11 = S_MOV_B32 1065353216
# GCN: $sgpr12 = S_MOV_B32 $sgpr11
---
name: s_mov_32_necessary_move_not_removed_2
body: |
  bb.0:
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr11 = S_MOV_B32 0, implicit $exec
    renamable $sgpr11 = S_MOV_B32 1065353216, implicit $exec
    $sgpr12 = S_MOV_B32 $sgpr11, implicit $exec, implicit $exec
...



# GCN-LABEL: name: s_mov_64_redundant_move_single
# GCN: bb.1:
# GCN-NOT: $sgpr8_sgpr9 = S_MOV_B64 1065353216
# GCN: $sgpr6_sgpr7 = S_MOV_B64 $sgpr8_sgpr9
---
name: s_mov_64_redundant_move_single
body: |
  bb.0:
    renamable $sgpr8_sgpr9 = S_MOV_B64 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr8_sgpr9 = S_MOV_B64 1065353216, implicit $exec
    $sgpr6_sgpr7 = S_MOV_B64 $sgpr8_sgpr9, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_64_redundant_move_liveness
# GCN: bb.1:
# GCN: liveins: $sgpr10_sgpr11
# GCN-NOT: $sgpr10_sgpr11 = S_MOV_B64 1065353216
# GCN: $sgpr6_sgpr7 = S_MOV_B64 $sgpr10_sgpr11
---
name: s_mov_64_redundant_move_liveness
tracksRegLiveness: true
body: |
  bb.0:
    renamable $sgpr10_sgpr11 = S_MOV_B64 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr10_sgpr11 = S_MOV_B64 1065353216, implicit $exec
    $sgpr6_sgpr7 = S_MOV_B64 $sgpr10_sgpr11, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_64_redundant_move_multiple
# GCN: bb.1:
# GCN-NOT: $sgpr10_sgpr11 = S_MOV_B64 0
# GCN-NOT: $sgpr12_sgpr13 = S_MOV_B64 1065353216
# GCN-NOT: $sgpr6_sgpr7 = S_MOV_B64 3204448256
# GCN-NOT: $sgpr8_sgpr9 = S_MOV_B64 1056964608
# GCN: $sgpr14_sgpr15 = S_MOV_B64 $sgpr8_sgpr9
---
name: s_mov_64_redundant_move_multiple
body: |
  bb.0:
    renamable $sgpr12_sgpr13 = S_MOV_B64 1065353216, implicit $exec
    renamable $sgpr10_sgpr11 = S_MOV_B64 0, implicit $exec
    renamable $sgpr8_sgpr9 = S_MOV_B64 1056964608, implicit $exec
    renamable $sgpr6_sgpr7 = S_MOV_B64 3204448256, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr10_sgpr11 = S_MOV_B64 0, implicit $exec
    renamable $sgpr12_sgpr13 = S_MOV_B64 1065353216, implicit $exec
    renamable $sgpr6_sgpr7 = S_MOV_B64 3204448256, implicit $exec
    renamable $sgpr8_sgpr9 = S_MOV_B64 1056964608, implicit $exec
    $sgpr14_sgpr15 = S_MOV_B64 $sgpr8_sgpr9, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_64_necessary_move_not_removed_1
# GCN: bb.1:
# GCN: $sgpr10_sgpr11 = S_MOV_B64 1065353216
# GCN: $sgpr6_sgpr7 = S_MOV_B64 $sgpr10_sgpr11
---
name: s_mov_64_necessary_move_not_removed_1
body: |
  bb.0:
    renamable $sgpr10_sgpr11 = S_MOV_B64 1065353216, implicit $exec
    renamable $sgpr10_sgpr11 = S_MOV_B64 0, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr10_sgpr11 = S_MOV_B64 1065353216, implicit $exec
    $sgpr6_sgpr7 = S_MOV_B64 $sgpr10_sgpr11, implicit $exec, implicit $exec
...

# GCN-LABEL: name: s_mov_64_necessary_move_not_removed_2
# GCN: bb.1:
# GCN: $sgpr10_sgpr11 = S_MOV_B64 1065353216
# GCN: $sgpr6_sgpr7 = S_MOV_B64 $sgpr10_sgpr11
---
name: s_mov_64_necessary_move_not_removed_2
body: |
  bb.0:
    renamable $sgpr10_sgpr11 = S_MOV_B64 1065353216, implicit $exec
    S_BRANCH %bb.1

  bb.1:
    renamable $sgpr10_sgpr11 = S_MOV_B64 0, implicit $exec
    renamable $sgpr10_sgpr11 = S_MOV_B64 1065353216, implicit $exec
    $sgpr6_sgpr7 = S_MOV_B64 $sgpr10_sgpr11, implicit $exec, implicit $exec
...
