This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][DAG] Only apply trunc/shift combine to 16 bit types
AbandonedPublic

Authored by Pierre-vh on Oct 13 2022, 4:58 AM.

Details

Reviewers
arsenm
Summary

Before, we checked <32 - probably assuming anything below 32 would be 16 bits.
However, odd integer types like i26 exist and are legal. Don't combine in those cases.

Diff Detail

Event Timeline

Pierre-vh created this revision.Oct 13 2022, 4:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 4:58 AM
Pierre-vh requested review of this revision.Oct 13 2022, 4:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 4:58 AM

What are you trying to solve here? alignbit isn't preferable to 32-bit shifts, and the shift doesn't seem to be wrong

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3240

Legal 16-bit isn't the point here, it's to avoid the 64-bit shift. Even if we didn't have 16-bit types we would want the combine.

llvm/test/CodeGen/AMDGPU/partial-shift-shrink.ll
157

The alignbit and shift are equally fast, and the shift is easier to understand

This is trying to solve a miscompilation, but it might be the wrong fix. I know for sure this combine is responsible for a miscompilation in a sample, as disabling it/removing it/applying this fix resolves the issue.

This is the DAG being wrongly combined:

      t26: i64 = srl t23, Constant:i32<26>
      t28: i64 = mul nuw nsw t22, Constant:i64<12345678>
    t29: i64 = add nuw nsw t26, t28
  t32: i64 = srl t29, Constant:i32<25>
t33: i26 = truncate t32

Combining: t33: i26 = truncate t32
Creating new node: t82: i32 = truncate t29
Creating new node: t83: i32 = srl t82, Constant:i32<25>
Creating new node: t84: i26 = truncate t83
 ... into: t84: i26 = truncate t83

srl is a shift-right, so it doesn't seem right to me that we truncate the source to 32 bits as it changes the output.

Maybe the error is in the VT of the first trunc? Shouldn't we have at least 2*K for shift-right?
Maybe this is a better fix? This also works to fix the miscompilation I'm seeing.

EVT MidScalarTy = MVT::i32;

// For right shifts, ensure the VT of the shift source is wide
// enough that we don't lose bits in the result.
if(Src.getOpcode() == ISD::SRL || Src.getOpcode() == ISD::SRA) {

  // Don't risk losing info if we don't know the shift amount.
  if(!Known.isConstant())
    return SDValue();

  const uint64_t ScalarWidth = Known.getConstant().getZExtValue() * 2;
  if(ScalarWidth >= 64)
    return SDValue();

  MidScalarTy = EVT::getIntegerVT(*DAG.getContext(), ScalarWidth);
}

EVT MidVT = VT.isVector() ?
  EVT::getVectorVT(*DAG.getContext(), MidScalarTy,
                   VT.getVectorNumElements()) : MidScalarTy;

The checks are that the shift doesn't cross the 32-bit boundary. Here is the broken case: https://alive2.llvm.org/ce/z/P6iBze

The shift amount check seems to be wrong. I think the correct condition is ShiftAmt <= (32 - VT.getScalarSizeInBits()) https://alive2.llvm.org/ce/z/uYZ9tq

I'm not sure alive2 still has the abstract condition checks anymore like the old version

The shift amount check seems to be wrong. I think the correct condition is ShiftAmt <= (32 - VT.getScalarSizeInBits()) https://alive2.llvm.org/ce/z/uYZ9tq

For right shifts. For left shifts it's size.

GlobalISel also seems to have half ported this combine. For some reason it's only handling shl

Pierre-vh abandoned this revision.Oct 17 2022, 12:54 AM