Page MenuHomePhabricator

[AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG
ClosedPublic

Authored by anton-afanasyev on Aug 17 2021, 4:15 AM.

Details

Summary

Add lshr instruction to the DAG post-dominated by trunc, allowing
TruncInstCombine to reduce bitwidth of expressions containing
these instructions.

We should be shifting by less than the target bitwidth.
Also it is sufficient to require that all truncated bits
of the value-to-be-shifted are zeros: https://alive2.llvm.org/ce/z/_LytbB

Alive2 variable-length proof:
https://godbolt.org/z/1srE1aqzf => s/32/8/ => https://alive2.llvm.org/ce/z/StwPia

Part of https://reviews.llvm.org/D107766

Diff Detail

Event Timeline

anton-afanasyev requested review of this revision.Aug 17 2021, 4:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2021, 4:15 AM
lebedev.ri edited the summary of this revision. (Show Details)Aug 17 2021, 5:31 AM
lebedev.ri edited the summary of this revision. (Show Details)Aug 17 2021, 6:14 AM
lebedev.ri accepted this revision.Aug 17 2021, 7:34 AM

We don't actually need *all* the high bits to be zeros,
only the ones that would be potentially shifted-in,
iff we don't truncate them away first.
E.g.: 0b11100111, target bit width of 4, and shift amount of 0..1: https://alive2.llvm.org/ce/z/jJ85EE

But indeed, we only have the 'min' bitwidth, we can't really model that here,
especially because 'max' bitwidth would differ for different 'min' bitwidths...

So LG, i guess.
@spatel ?

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
293
This revision is now accepted and ready to land.Aug 17 2021, 7:34 AM

We don't actually need *all* the high bits to be zeros,
only the ones that would be potentially shifted-in,

Sure, that's why it's only sufficient condition, not necessary one. Hope it's the most common case in real life.

Rebase after shl bugfix

Rebase after shl bugfix

Add tests similar to 0988488ed461 (multiple widths) and 803270c0c691 (64-bit) for lshr patterns?

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
295–296

This wasn't updated with the suggested change to use getActiveBits().
IIUC, we can make this more efficient (avoid computeKnownBits in some cases) by hoisting the common check for MinBitWidth >= OrigBitWidth above the extra check for lshr:

unsigned MinBitWidth = KnownRHS.getMaxValue()
                           .uadd_sat(APInt(SrcBitWidth, 1))
                           .getLimitedValue(SrcBitWidth);
if (MinBitWidth >= OrigBitWidth)
  return nullptr;
if (I->getOpcode() == Instruction::LShr) {
  KnownBits KnownLHS = computeKnownBits(I->getOperand(0), DL);
  if (KnownLHS.getMaxValue().getActiveBits() >= OrigBitWidth)
    return nullptr;
}
Itr.second.MinBitWidth = MinBitWidth;
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
212

Similar to the previous shl example - this is another simplification to 0 that isn't handled by -instsimplify (but regular -instcombine gets it).

anton-afanasyev marked 3 inline comments as done.Aug 18 2021, 7:44 AM

Rebase after shl bugfix

Add tests similar to 0988488ed461 (multiple widths) and 803270c0c691 (64-bit) for lshr patterns?

Sure, added tests

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
293

Thanks, used getActiveBits()

295–296

Thanks, duplicated check for efficiency.

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
212

Ok, but here it is simple negative test, checking we do not fold to poisonous lshr i16 x, 16.

anton-afanasyev marked 3 inline comments as done.

Address comments

spatel added inline comments.Aug 18 2021, 8:10 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298

We already returned if MinBitWdith based on KnownRHS was too big, so this std::max is redundant?

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
212

Yes, not a problem - just pointing out that there's another opportunity to improve InstSimplify and run some kind of cleanup in this pass.

anton-afanasyev marked 2 inline comments as done.Aug 18 2021, 9:31 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298

No, it isn't: we still use this updated MinBitWidth by setting it to instruction Info:

Itr.second.MinBitWidth = MinBitWidth;

This value is then used while computing common MinBitWidth in getMinBitWidth() function.

spatel added inline comments.Aug 18 2021, 9:49 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298

Ah, ok. Do we have a test to exercise that path? I didn't see any test failures when I made the change locally.

anton-afanasyev marked 2 inline comments as done.

Add test to exercise uncovered path

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298

Ok, added test to cover that path. With your change we get incorrect transformation: https://alive2.llvm.org/ce/z/fZ6jFb

spatel accepted this revision.Aug 18 2021, 12:09 PM

LGTM

This revision was landed with ongoing or failed builds.Aug 18 2021, 12:22 PM
This revision was automatically updated to reflect the committed changes.

looks like this caused a crash in some chromium code

c++ repro,

$ cat t.cpp
struct a {
  typedef unsigned char b;
};
struct c {
  static unsigned char *d(unsigned char *g, unsigned *p, int q) {
    *p = *g;
    if (*p)
      *p = q;
    return g;
  }
};
template <typename h> void i(int *, int, int, unsigned *, typename h::b *) {
  char *e = 0;
  unsigned char j;
  unsigned f;
  c::d(&j, &f, 33);
  *e = f >> 2;
}
int k, l, m;
unsigned n;
void o() { i<a>(&k, l, m, &n, (unsigned char *)o); }

$ clang -cc1 -fno-delete-null-pointer-checks -O3 -fsanitize=fuzzer-no-link -emit-llvm t.cpp

Also made an IR repro,

$ cat repro.ll
; ModuleID = 't.cpp'
source_filename = "t.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

$_Z1iI1aEvPiiiPjPNT_1bE = comdat any

; Function Attrs: mustprogress nounwind null_pointer_is_valid optforfuzzing
define linkonce_odr void @_Z1iI1aEvPiiiPjPNT_1bE(i32* %0, i32 %1, i32 %2, i32* %3, i8* %4) local_unnamed_addr #0 comdat {
_ZN1c1dEPhPji.exit:
  %shr = lshr i32 33, 2
  %conv = trunc i32 %shr to i8
  store i8 %conv, i8* null, align 536870912, !tbaa !6
  ret void
}

attributes #0 = { mustprogress nounwind null_pointer_is_valid optforfuzzing "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 14.0.0"}
!2 = !{!3, !3, i64 0}
!3 = !{!"int", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{!4, !4, i64 0}

$ opt -aggressive-instcombine -S repro.ll

looks like this caused a crash in some chromium code

Hi @akhuang , I'm not able to reproduce this crash. Could you please decribe your environment/crash output/stack trace?

> opt -aggressive-instcombine -S repro.ll 
; ModuleID = 'repro.ll'
source_filename = "repro.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@k = local_unnamed_addr global i32 0, align 4
@l = local_unnamed_addr global i32 0, align 4
@m = local_unnamed_addr global i32 0, align 4
@n = local_unnamed_addr global i32 0, align 4

; Function Attrs: mustprogress nofree norecurse nosync nounwind null_pointer_is_valid optforfuzzing willreturn
define dso_local void @_Z1ov() local_unnamed_addr #0 {
entry:
  store i8 8, i8* null, align 536870912, !tbaa !2
  ret void
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind null_pointer_is_valid optforfuzzing willreturn "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 14.0.0 (git@github.com:llvm/llvm-project.git b3a45e286fdfa73dd758472363dfafe7543cc077)"}
!2 = !{!3, !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C++ TBAA"}

> clang -cc1 -fno-delete-null-pointer-checks -O3 -fsanitize=fuzzer-no-link -emit-llvm  t.cpp
> cat t.ll
(the same output)

looks like this caused a crash in some chromium code

Hi @akhuang , I'm not able to reproduce this crash. Could you please decribe your environment/crash output/stack trace?

This should repro in a debug build, but might not be visible in release+asserts; I see it when building with clang on macOS, but it doesn't seem to repro on godbolt.
I think it's a simple bug and fixed with:
dd19f342fa21

This should repro in a debug build, but might not be visible in release+asserts; I see it when building with clang on macOS, but it doesn't seem to repro on godbolt.
I think it's a simple bug and fixed with:
dd19f342fa21

Thank you!

This should repro in a debug build, but might not be visible in release+asserts; I see it when building with clang on macOS, but it doesn't seem to repro on godbolt.
I think it's a simple bug and fixed with:
dd19f342fa21

Thank you!

Easy fix, but it does point to a larger problem (for the 3rd time in this patch set...) - we're seeing unsimplified IR, but this pass is not really prepared to handle that.
Maybe we'll sort this out when we try to make this pass run at -O2 instead of only -O3 (by having it run directly after -instcombine for example).

For reference, the exact bug was also responsible for:
https://llvm.org/PR51553
(and again, I can't remember why we run AIC before regular IC in the -O3 pipeline...)