This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Insert freeze when splitting vector G_SEXT_INREG
Closed, Public

Authored by arsenm on Nov 15 2022, 11:33 AM.


Details

Reviewers

foad
Pierre-vh
Petar.Avramovic

Group Reviewers

Restricted Project

Summary

This transform is broken for undef or poison inputs without a freeze.
This is also broken in lots of other places where shifts are split
into 32-bit pieces.

Amt < 32 case:
; Broken: https://alive2.llvm.org/ce/z/7bb4vc
; Freezing the low half of the bits makes it correct
; Fixed: https://alive2.llvm.org/ce/z/zJAZFr
define i64 @src(i64 %val) {
  %shl = shl i64 %val, 55
  %shr = ashr i64 %shl, 55
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %shr.half = lshr i64 %val, 32
  %hi32 = trunc i64 %shr.half to i32
  %inreg.0 = shl i32 %lo32, 23
  %new.lo = ashr i32 %inreg.0, 23
  %new.hi = ashr i32 %new.lo, 31
  %zext.lo = zext i32 %new.lo to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}

Amt == 32 case:
Broken: https://alive2.llvm.org/ce/z/5f4qwQ
Fixed: https://alive2.llvm.org/ce/z/A2hWWF
This one times out in Alive2; it verifies if the argument is made undef or
scaled down to a smaller bitwidth.

define i64 @src(i64 %val) {
  %shl = shl i64 %val, 32
  %shr = ashr i64 %shl, 32
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %shr.half = lshr i64 %val, 32
  %hi32 = trunc i64 %shr.half to i32
  %new.hi = ashr i32 %lo32, 31
  %zext.lo = zext i32 %lo32 to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}

Amt > 32 case:
; Correct: https://alive2.llvm.org/ce/z/tvrhPf
define i64 @src(i64 %val) {
  %shl = shl i64 %val, 9
  %shr = ashr i64 %shl, 9
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %lshr = lshr i64 %val, 32
  %hi32 = trunc i64 %lshr to i32
  %inreg.0 = shl i32 %hi32, 9
  %new.hi = ashr i32 %inreg.0, 9
  %zext.lo = zext i32 %lo32 to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}
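
For concrete, fully defined inputs, the three expansions above can be cross-checked against a 64-bit reference. The following is a minimal C++ sketch, not the code from this patch: the helper names (`sextInReg32`, `sextInReg64`, `splitSextInReg64`) are hypothetical, and the Amt > 32 branch follows the `@tgt` IR above (it extends the high half). It models only defined values, so it says nothing about undef or poison, which is the part Alive2 checks.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of x (1 <= bits <= 32).
constexpr uint32_t sextInReg32(uint32_t x, unsigned bits) {
  const uint32_t signBit = uint32_t{1} << (bits - 1);
  const uint32_t mask =
      (bits == 32) ? ~uint32_t{0} : ((uint32_t{1} << bits) - 1);
  return ((x & mask) ^ signBit) - signBit;
}

// Reference: sign-extend the low `bits` bits of x (1 <= bits <= 64).
constexpr uint64_t sextInReg64(uint64_t x, unsigned bits) {
  const uint64_t signBit = uint64_t{1} << (bits - 1);
  const uint64_t mask =
      (bits == 64) ? ~uint64_t{0} : ((uint64_t{1} << bits) - 1);
  return ((x & mask) ^ signBit) - signBit;
}

// Split expansion on two 32-bit halves, mirroring the three cases above.
constexpr uint64_t splitSextInReg64(uint64_t val, unsigned amt) {
  const uint32_t lo = static_cast<uint32_t>(val);
  const uint32_t hi = static_cast<uint32_t>(val >> 32);
  uint32_t newLo = 0;
  uint32_t newHi = 0;
  if (amt <= 32) {
    // Amt == 32 keeps the low half; Amt < 32 extends within it.
    newLo = (amt == 32) ? lo : sextInReg32(lo, amt);
    // Equivalent of `ashr i32 %new.lo, 31`: replicate the sign bit.
    newHi = static_cast<uint32_t>(0u - (newLo >> 31));
  } else {
    // Low half is unchanged; extend within the high half.
    newLo = lo;
    newHi = sextInReg32(hi, amt - 32);
  }
  return (static_cast<uint64_t>(newHi) << 32) | newLo;
}

// One spot check per case: Amt < 32, Amt == 32, Amt > 32.
static_assert(splitSextInReg64(0x1FFull, 9) == sextInReg64(0x1FFull, 9),
              "Amt < 32");
static_assert(splitSextInReg64(0x1234567880000000ull, 32) ==
                  sextInReg64(0x1234567880000000ull, 32),
              "Amt == 32");
static_assert(splitSextInReg64(0x0040000000000001ull, 55) ==
                  sextInReg64(0x0040000000000001ull, 55),
              "Amt > 32");
```

The `static_assert`s make the file self-checking at compile time. Agreement on defined inputs is the expected result here; the breakage described in the summary only appears once the input may be undef or poison.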

Event Timeline

arsenm created this revision. Nov 15 2022, 11:33 AM

Herald added a project: Restricted Project. Nov 15 2022, 11:33 AM

Herald added subscribers: kosarev, kerbowa, hiraditya and 5 others.

arsenm requested review of this revision. Nov 15 2022, 11:33 AM

Herald added a project: Restricted Project. Nov 15 2022, 11:33 AM

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B197811: Diff 475540. Nov 15 2022, 11:34 AM

arsenm added a parent revision: D138051: AMDGPU/GlobalISel: Fix broken expansion of 64-bit vector sext_inreg. Nov 15 2022, 11:36 AM

Rebase

Harbormaster completed remote builds in B197824: Diff 475558. Nov 15 2022, 12:58 PM

What makes this combine specifically require a freeze? Could there be more combines that need it too, or is it something about G_SEXT_INREG's semantics that makes it need the G_FREEZE?

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 2452: I would add some context to describe why freeze is needed here, since it's not an instruction that's often seen (at least for me).

In D138050#3930454, @Pierre-vh wrote:

What makes this combine specifically require a freeze? Could there be more combines that need it too, or is it something about G_SEXT_INREG's semantics that makes it need the G_FREEZE?

It's introducing an expectation for potentially poisonous bits in two distinct uses, since the low half is used in two different places. Both uses need to use the same value. There are plenty of places that are probably missing freezes.

There have been a number of talks on freeze, e.g. https://www.youtube.com/watch?v=ZMaZH3YYJqY
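
The two-uses problem described above can be illustrated with a small model. This is only a sketch under stated assumptions, not LLVM code: `Undef32` is a hypothetical stand-in for an undef-like value that may yield a different bit pattern at each use, and the function names are made up for illustration. It follows the Amt == 32 expansion, where the low half feeds both the low result and the sign bits of the high result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of an undef-like 32-bit value: every use may observe
// a different bit pattern. `picks` plays the role of an adversary that
// chooses the value seen by each successive use.
struct Undef32 {
  std::array<uint32_t, 2> picks;
  unsigned next = 0;
  uint32_t use() { return picks[next++ % picks.size()]; }
};

// Amt == 32 expansion without a freeze: the low half is read twice.
uint64_t expandNoFreeze(Undef32 lo) {
  const uint32_t newLo = lo.use();                               // use #1
  const uint32_t newHi = static_cast<uint32_t>(0u - (lo.use() >> 31)); // use #2
  return (static_cast<uint64_t>(newHi) << 32) | newLo;
}

// Same expansion with a freeze: one value is picked and shared by both uses.
uint64_t expandWithFreeze(Undef32 lo) {
  const uint32_t frozen = lo.use(); // models `freeze`: choose once
  const uint32_t newHi = static_cast<uint32_t>(0u - (frozen >> 31));
  return (static_cast<uint64_t>(newHi) << 32) | frozen;
}

// True if the high half is the sign fill of the low half, which is what
// the unsplit 64-bit sext_inreg by 32 always produces.
bool isSignExtendedFrom32(uint64_t v) {
  const uint32_t lo = static_cast<uint32_t>(v);
  const uint32_t hi = static_cast<uint32_t>(v >> 32);
  return hi == static_cast<uint32_t>(0u - (lo >> 31));
}
```

With the adversary choosing `1` for the first use and `0x80000000` for the second, `expandNoFreeze` yields `0xFFFFFFFF00000001`, which no 64-bit sign extension from 32 bits can produce, while `expandWithFreeze` yields `0x0000000000000001`.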

Add comment

Harbormaster completed remote builds in B198045: Diff 475886. Nov 16 2022, 12:03 PM

In D138050#3930981, @arsenm wrote:

In D138050#3930454, @Pierre-vh wrote:

What makes this combine specifically require a freeze? Could there be more combines that need it too, or is it something about G_SEXT_INREG's semantics that makes it need the G_FREEZE?

It's introducing an expectation for potentially poisonous bits in two distinct uses, since the low half is used in two different places. Both uses need to use the same value. There are plenty of places that are probably missing freezes.

There have been a number of talks on freeze, e.g. https://www.youtube.com/watch?v=ZMaZH3YYJqY

Interesting, I'll definitely be looking out for cases like that in the future. I'll also watch that talk as soon as possible.
Thanks :)

This revision is now accepted and ready to land. Nov 16 2022, 11:38 PM

b0847b0095e10e784dc241ebb19f39edd9c6a7f8

Revision Contents

Path    Size
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp    8 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sext-inreg.mir    12 lines

Diff 475886

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

    case AMDGPU::G_SEXT_INREG: {

      // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
      // we would need to further expand, and doesn't let us directly set the
      // result registers.
      SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));

      int Amt = MI.getOperand(2).getImm();
      if (Amt <= 32) {
+       // Downstream users have expectations for the high bit behavior, so freeze
+       // incoming poison.
        if (Amt == 32) {
          // The low bits are unchanged.
-         B.buildCopy(DstRegs[0], SrcRegs[0]);
+         B.buildFreeze(DstRegs[0], SrcRegs[0]);
        } else {
+         auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
          // Extend in the low bits and propagate the sign bit to the high half.
-         B.buildSExtInReg(DstRegs[0], SrcRegs[0], Amt);
+         B.buildSExtInReg(DstRegs[0], Freeze, Amt);
        }

        B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
      } else {
        // The low bits are unchanged, and extend in the high bits.
+       // No freeze required
        B.buildCopy(DstRegs[0], SrcRegs[0]);
        B.buildSExtInReg(DstRegs[1], DstRegs[0], Amt - 32);
      }

      Register DstReg = MI.getOperand(0).getReg();
      MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
      MI.eraseFromParent();
      return;

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sext-inreg.mir

  body: |
    bb.0:
      liveins: $vgpr0_vgpr1

      ; CHECK-LABEL: name: sext_inreg_v_s64_1
      ; CHECK: liveins: $vgpr0_vgpr1
      ; CHECK-NEXT: {{ $}}
      ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
      ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[COPY]](s64)
-     ; CHECK-NEXT: [[SEXT_INREG:%[0-9]+]]:vgpr(s32) = G_SEXT_INREG [[UV]], 1
+     ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:vgpr(s32) = G_FREEZE [[UV]]
+     ; CHECK-NEXT: [[SEXT_INREG:%[0-9]+]]:vgpr(s32) = G_SEXT_INREG [[FREEZE]], 1
      ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 31
      ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[SEXT_INREG]], [[C]](s32)
      ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SEXT_INREG]](s32), [[ASHR]](s32)
      ; CHECK-NEXT: S_ENDPGM 0, implicit [[MV]](s64)
      %0:_(s64) = COPY $vgpr0_vgpr1
      %1:_(s64) = G_SEXT_INREG %0, 1
      S_ENDPGM 0, implicit %1

  ...

  ---
  name: sext_inreg_v_s64_31
  legalized: true

  body: |
    bb.0:
      liveins: $vgpr0_vgpr1

      ; CHECK-LABEL: name: sext_inreg_v_s64_31
      ; CHECK: liveins: $vgpr0_vgpr1
      ; CHECK-NEXT: {{ $}}
      ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
      ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[COPY]](s64)
-     ; CHECK-NEXT: [[SEXT_INREG:%[0-9]+]]:vgpr(s32) = G_SEXT_INREG [[UV]], 31
+     ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:vgpr(s32) = G_FREEZE [[UV]]
+     ; CHECK-NEXT: [[SEXT_INREG:%[0-9]+]]:vgpr(s32) = G_SEXT_INREG [[FREEZE]], 31
      ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 31
      ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[SEXT_INREG]], [[C]](s32)
      ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SEXT_INREG]](s32), [[ASHR]](s32)
      ; CHECK-NEXT: S_ENDPGM 0, implicit [[MV]](s64)
      %0:_(s64) = COPY $vgpr0_vgpr1
      %1:_(s64) = G_SEXT_INREG %0, 31
      S_ENDPGM 0, implicit %1

  ...

  ---
  name: sext_inreg_v_s64_32
  legalized: true

  body: |
    bb.0:
      liveins: $vgpr0_vgpr1

      ; CHECK-LABEL: name: sext_inreg_v_s64_32
      ; CHECK: liveins: $vgpr0_vgpr1
      ; CHECK-NEXT: {{ $}}
      ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
      ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[COPY]](s64)
-     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[UV]](s32)
+     ; CHECK-NEXT: [[FREEZE:%[0-9]+]]:vgpr(s32) = G_FREEZE [[UV]]
      ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 31
-     ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY1]], [[C]](s32)
+     ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[FREEZE]], [[C]](s32)
-     ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[COPY1]](s32), [[ASHR]](s32)
+     ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[FREEZE]](s32), [[ASHR]](s32)
      ; CHECK-NEXT: S_ENDPGM 0, implicit [[MV]](s64)
      %0:_(s64) = COPY $vgpr0_vgpr1
      %1:_(s64) = G_SEXT_INREG %0, 32
      S_ENDPGM 0, implicit %1

  ...

  ---