This is an archive of the discontinued LLVM Phabricator instance.


Differential D131884

Fix subrange liveness checking at rematerialization
ClosedPublic

Authored by npmiller on Aug 15 2022, 5:01 AM.


Details

Reviewers

arsenm
rampitec

Commits

rGccfabfbb1f91: Fix subrange liveness checking at rematerialization

Summary

This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.

The code that checks whether an instruction can be rematerialized at a given point already checked subranges to ensure that subregisters are live, but only when the instruction being moved used a subregister. This patch changes that so the subranges are checked even when the moved instruction uses the full register.
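The difference between the old and new check can be illustrated with a toy model. This is only a sketch and not LLVM's API: `SubRange`, `FullMask`, `canRematOld` and `canRematNew` are hypothetical names, lane masks are plain integers rather than `LaneBitmask`, and the register is assumed to have two subregisters covering lanes 0x3 and 0xC.

```cpp
#include <cstdint>
#include <vector>

// Toy model (not LLVM's types): a subrange covers some lanes and is either
// live or dead at the point where the instruction would be rematerialized.
struct SubRange {
  uint32_t LaneMask;
  bool LiveAtUse;
};

// Hypothetical mask covering every lane of the register.
constexpr uint32_t FullMask = 0xF;

// Old behaviour: subranges are inspected only for subregister operands.
// SubRegMask == 0 stands for "the operand reads the whole register".
bool canRematOld(uint32_t SubRegMask, const std::vector<SubRange> &SRs) {
  if (SubRegMask == 0)
    return true; // whole-register read: no subrange was checked
  for (const SubRange &SR : SRs)
    if ((SR.LaneMask & SubRegMask) != 0 && !SR.LiveAtUse)
      return false;
  return true;
}

// New behaviour: a whole-register read is treated as a read of every lane.
bool canRematNew(uint32_t SubRegMask, const std::vector<SubRange> &SRs) {
  uint32_t LM = SubRegMask ? SubRegMask : FullMask;
  for (const SubRange &SR : SRs)
    if ((SR.LaneMask & LM) != 0 && !SR.LiveAtUse)
      return false;
  return true;
}
```

With one dead and one live subrange, `canRematOld(0, ...)` accepts the move while `canRematNew(0, ...)` rejects it; for operands that name a subregister the two functions agree.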

This patch also adds a case to the original subrange-checking test that triggers the issue described above.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

I encountered this issue on AMDGPU targets while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec

From @3440 into the body of a loop @16312, but %3078 has the following live ranges:

%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r  L0000000000000003 [2224r,3440r:0) 0@2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r

So at @16312e, %3078.sub1 is live but %3078.sub0 is dead. Moving the instruction there leads to invalid memory accesses: %3078.sub0 ends up being clobbered, and the result of this instruction is used as part of an address calculation for a load.
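The liveness reasoning above can be reproduced with a small standalone sketch. It is a simplification and not LLVM code: slot indexes are reduced to their numeric part (the `r`, `B` and `e` suffixes are ignored), segments are treated as half-open intervals, and `Segment`, `SubRange`, `deadLanesAt` and `liveRangesOf3078` are hypothetical names. The segment boundaries and lane masks are copied from the live range dump of %3078.

```cpp
#include <cstdint>
#include <vector>

// Half-open interval [Start, End) over simplified slot indexes.
struct Segment {
  unsigned Start;
  unsigned End;
};

// A subrange: the lanes it covers plus the segments where they are live.
struct SubRange {
  uint32_t LaneMask;
  std::vector<Segment> Segments;

  bool liveAt(unsigned Idx) const {
    for (const Segment &S : Segments)
      if (S.Start <= Idx && Idx < S.End)
        return true;
    return false;
  }
};

// Returns the subset of Lanes that is not live at Idx.
uint32_t deadLanesAt(const std::vector<SubRange> &SRs, uint32_t Lanes,
                     unsigned Idx) {
  uint32_t Dead = 0;
  for (const SubRange &SR : SRs) {
    uint32_t Overlap = SR.LaneMask & Lanes;
    if (Overlap != 0 && !SR.liveAt(Idx))
      Dead |= Overlap;
  }
  return Dead;
}

// Subranges of %3078 as printed in the dump: lanes 0x3 (sub0) and 0xC (sub1).
std::vector<SubRange> liveRangesOf3078() {
  SubRange Sub0{0x3, {Segment{2224, 3440}}};
  SubRange Sub1{0xC, {Segment{2240, 3488}, Segment{16192, 38336}}};
  return {Sub0, Sub1};
}
```

Querying all lanes (0xF) at index 16312 reports lanes 0x3 as dead, which matches the observation that only %3078.sub1 is live inside the loop.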

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908. This turned out to be because on gfx908, instead of moving the shift instruction into the loop, its value is spilled into an ACC register. gfx906 doesn't have ACC registers, and on gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of the instruction being moved into the loop.

Event Timeline

npmiller created this revision. Aug 15 2022, 5:01 AM

Herald added a project: Restricted Project. Aug 15 2022, 5:01 AM

Herald added subscribers: kosarev, kerbowa, hiraditya and 3 others.

npmiller requested review of this revision. Aug 15 2022, 5:01 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B181253: Diff 452611. Aug 15 2022, 6:02 AM

rolandschulz added a subscriber: rolandschulz. Aug 15 2022, 9:49 AM

Looks reasonable, thanks!

This revision is now accepted and ready to land. Aug 15 2022, 10:41 AM

Thanks! Please note that I don't have commit access, so someone else will have to land the patch.

In D131884#3725315, @npmiller wrote:

Thanks! Please note that I don't have commit access, so someone else will have to land the patch.

I can help.

Closed by commit rGccfabfbb1f91: Fix subrange liveness checking at rematerialization (authored by npmiller, committed by rampitec). Aug 16 2022, 10:50 AM

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rGccfabfbb1f91: Fix subrange liveness checking at rematerialization.

Revision Contents

Path                                              Size
llvm/lib/CodeGen/LiveRangeEdit.cpp                6 lines
llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir    25 lines

Diff 452611

llvm/lib/CodeGen/LiveRangeEdit.cpp

 [128 lines not shown]
   for (const MachineOperand &MO : OrigMI->operands()) {
 [20 lines not shown]
     // See PR14098.
     if (SlotIndex::isSameInstr(OrigIdx, UseIdx))
       return false;
 
     if (OVNI != li.getVNInfoAt(UseIdx))
       return false;
 
     // Check that subrange is live at UseIdx.
-    if (MO.getSubReg()) {
+    if (li.hasSubRanges()) {
       const TargetRegisterInfo *TRI = MRI.getTargetRegisterInfo();
-      LaneBitmask LM = TRI->getSubRegIndexLaneMask(MO.getSubReg());
+      unsigned SubReg = MO.getSubReg();
+      LaneBitmask LM = SubReg ? TRI->getSubRegIndexLaneMask(SubReg)
+                              : MRI.getMaxLaneMaskForVReg(MO.getReg());
       for (LiveInterval::SubRange &SR : li.subranges()) {
         if ((SR.LaneMask & LM).none())
           continue;
         if (!SR.liveAt(UseIdx))
           return false;
         // Early exit if all used lanes are checked. No need to continue.
         LM &= ~SR.LaneMask;
         if (LM.none())
 [350 lines not shown]

llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir

 [73 lines not shown]
   bb.0.entry:
 [20 lines not shown]
     %1:sgpr_128 = S_LOAD_DWORDX4_IMM %0, 1, 0
     %2:sreg_64 = S_MOV_B64 %1.sub2_sub3
     %3:sreg_64 = S_MOV_B64 2, implicit $m0
     %4:sreg_64 = S_MOV_B64 3, implicit $m0
     %5:vgpr_32 = V_MOV_B32_e32 %1.sub0, implicit $exec, implicit %3, implicit %4
     %6:vreg_64 = V_MOV_B64_PSEUDO %2, implicit $exec
     S_NOP 0, implicit %1.sub0, implicit %1.sub3
 ...
+---
+name: dead_subreg_whole_reg
+tracksRegLiveness: true
+body: |
+  bb.0.entry:
+    ; GCN-LABEL: name: dead_subreg_whole_reg
+    ; GCN: $m0 = IMPLICIT_DEF
+    ; GCN-NEXT: renamable $sgpr0_sgpr1 = S_MOV_B64 1, implicit $m0
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = S_MOV_B64 renamable $sgpr0_sgpr1
+    ; GCN-NEXT: SI_SPILL_S64_SAVE killed renamable $sgpr2_sgpr3, %stack.0, implicit $exec, implicit $sp_reg :: (store (s64) into %stack.0, align 4, addrspace 5)
+    ; GCN-NEXT: renamable $sgpr4_sgpr5 = S_MOV_B64 2, implicit $m0
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = S_MOV_B64 3, implicit $m0
+    ; GCN-NEXT: dead %4:vgpr_32 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit killed $sgpr4_sgpr5, implicit killed $sgpr2_sgpr3
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = SI_SPILL_S64_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s64) from %stack.0, align 4, addrspace 5)
+    ; GCN-NEXT: dead %5:vreg_64 = V_MOV_B64_PSEUDO killed $sgpr2_sgpr3, implicit $exec
+    ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
+    $m0 = IMPLICIT_DEF
+    %0:sreg_64_xexec = S_MOV_B64 1, implicit $m0
+    %1:sreg_64 = S_MOV_B64 %0:sreg_64_xexec
+    %2:sreg_64 = S_MOV_B64 2, implicit $m0
+    %3:sreg_64 = S_MOV_B64 3, implicit $m0
+    %4:vgpr_32 = V_MOV_B32_e32 %0.sub0:sreg_64_xexec, implicit $exec, implicit %2, implicit %3
+    %5:vreg_64 = V_MOV_B64_PSEUDO %1, implicit $exec
+    S_NOP 0, implicit %0.sub0
+...