This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Insert freeze when splitting vector G_SEXT_INREG
ClosedPublic

Authored by arsenm on Nov 15 2022, 11:33 AM.

Details

Reviewers
foad
Pierre-vh
Petar.Avramovic
Group Reviewers
Restricted Project
Summary

This transform is broken for undef or poison inputs without a freeze.
This is also broken in lots of other places where shifts are split
into 32-bit pieces.

Amt < 32 case:
; Broken: https://alive2.llvm.org/ce/z/7bb4vc
; Freezing the low half of the bits makes it correct
; Fixed: https://alive2.llvm.org/ce/z/zJAZFr
define i64 @src(i64 %val) {
  %shl = shl i64 %val, 55
  %shr = ashr i64 %shl, 55
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %shr.half = lshr i64 %val, 32
  %hi32 = trunc i64 %shr.half to i32
  %inreg.0 = shl i32 %lo32, 23
  %new.lo = ashr i32 %inreg.0, 23
  %new.hi = ashr i32 %new.lo, 31
  %zext.lo = zext i32 %new.lo to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}

Amt == 32 case:
Broken: https://alive2.llvm.org/ce/z/5f4qwQ
Fixed: https://alive2.llvm.org/ce/z/A2hWWF
This one times out alive; works if argument is made undef or scaled down
to a smaller bitwidth.

define i64 @src(i64 %val) {
  %shl = shl i64 %val, 32
  %shr = ashr i64 %shl, 32
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %shr.half = lshr i64 %val, 32
  %hi32 = trunc i64 %shr.half to i32
  %new.hi = ashr i32 %lo32, 31
  %zext.lo = zext i32 %lo32 to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}

Amt > 32 case:
; Correct: https://alive2.llvm.org/ce/z/tvrhPf
define i64 @src(i64 %val) {
  %shl = shl i64 %val, 9
  %shr = ashr i64 %shl, 9
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %lshr = lshr i64 %val, 32
  %hi32 = trunc i64 %lshr to i32
  %inreg.0 = shl i32 %hi32, 9
  %new.hi = ashr i32 %inreg.0, 9
  %zext.lo = zext i32 %lo32 to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}

Diff Detail

Event Timeline

arsenm created this revision.Nov 15 2022, 11:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2022, 11:33 AM
arsenm requested review of this revision.Nov 15 2022, 11:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2022, 11:33 AM
Herald added a subscriber: wdng. · View Herald Transcript

What makes this combine specifically require a freeze? Could we have more combine that need it to or is it something with G_SEXT_INREG's semantics that makes it need the G_FREEZE?

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
2452

I would add some context to describe why freeze is needed here, since it's not an instruction that's often seen (at least for me)

What makes this combine specifically require a freeze? Could we have more combine that need it to or is it something with G_SEXT_INREG's semantics that makes it need the G_FREEZE?

It's introducing an expectation for potentially poisonous bits in two distinct uses since the low half is used in two different places. Both uses need to use the same value. There are plenty of places that are probably missing freezes

There have been a number of talks on freeze, e.g. https://www.youtube.com/watch?v=ZMaZH3YYJqY

arsenm updated this revision to Diff 475886.Nov 16 2022, 12:02 PM

Add comment

Pierre-vh accepted this revision.Nov 16 2022, 11:38 PM

What makes this combine specifically require a freeze? Could we have more combine that need it to or is it something with G_SEXT_INREG's semantics that makes it need the G_FREEZE?

It's introducing an expectation for potentially poisonous bits in two distinct uses since the low half is used in two different places. Both uses need to use the same value. There are plenty of places that are probably missing freezes

There have been a number of talks on freeze, e.g. https://www.youtube.com/watch?v=ZMaZH3YYJqY

Interesting, I'll definitely be looking out for cases like that in the future. I'll also watch that talk as soon as possible
Thanks :)

This revision is now accepted and ready to land.Nov 16 2022, 11:38 PM