This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Consider post-inc form when creating extends/truncates.
ClosedPublic

Authored by fhahn on Jun 15 2023, 2:18 AM.

Details

Summary

GenerateTruncates at the moment creates extends/truncates for post-inc
uses of normalized expressions. For example, if an add rec of the form
{1,+,-1} is used outside the loop, the normalized form will use {1,+,-1}
instead of {0,+,-1}. When naively sign-extending the normalized
expression, it will get extended incorrectly to {1,+,-1} for the wider
type, if the backedge-taken count of the loop is 1.

To address this, the patch updates GenerateTruncates to check if the
LSRUse contains any fixups with PostIncLoops. If that's the case, first
de-normalize the expression, then perform the extend/truncate, then
normalize again.

There may be other places where similar checks are needed and the helper
can be generalized for those cases. I'd not be surprised if other subtle
mis-compiles are caused by this.

Fixes #38847.
Fixes #62852.

Diff Detail

Event Timeline

fhahn created this revision.Jun 15 2023, 2:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 2:18 AM
fhahn requested review of this revision.Jun 15 2023, 2:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 2:18 AM
nikic added a comment.Jun 15 2023, 2:24 AM

Did you see D151796? What you're doing here presumably still relies on denormalization and normalization round-tripping, which is not actually the case.

fhahn added a comment.Jun 15 2023, 2:37 AM

Did you see D151796? What you're doing here presumably still relies on denormalization and normalization round-tripping, which is not actually the case.

I saw this one a while ago, but it dropped off my radar!

The patch relies on round-tripping, but addresses a different issue I think. For the 2 linked issues, the problem is that we build new expressions based on normalized expressions and that is incorrect in some cases if they are used by something that needs the post-inc. IIUC to squash all potential mis-compiles, we would need to ensure that round-tripping will always succeed ( D151796 is a first step) & new expressions based on normalized expressions remain correct in for post-inc users (this patch is a first step).

fhahn updated this revision to Diff 531673.Jun 15 2023, 3:12 AM

Rebase after moving the test to the X86 subdir.

nikic accepted this revision.Jun 16 2023, 9:07 AM

LGTM

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
4143

to -> the

This revision is now accepted and ready to land.Jun 16 2023, 9:07 AM
This revision was landed with ongoing or failed builds.Jun 17 2023, 2:02 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.
fhahn added inline comments.Jun 17 2023, 2:02 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
4143

Thanks, adjusted in the committed version!

seeing this in CI builds of MIOpen. might be related to this patch. Reproducer might take a few days, so this is mostly FYI

clang++: /jenkins/workspace/compiler-psdb-amd-stg-open_2/repos/external/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:4152: const llvm::SCEV* getAnyExtendConsideringPostIncUses({anonymous}::LSRUse&, const llvm::SCEV*, llvm::Type*, llvm::ScalarEvolution&): Assertion `(!Loops || *Loops == LF.PostIncLoops) && "different post-inc loops used"' failed.4

fhahn added a comment.Jun 17 2023, 9:59 AM

seeing this in CI builds of MIOpen. might be related to this patch. Reproducer might take a few days, so this is mostly FYI

clang++: /jenkins/workspace/compiler-psdb-amd-stg-open_2/repos/external/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:4152: const llvm::SCEV* getAnyExtendConsideringPostIncUses({anonymous}::LSRUse&, const llvm::SCEV*, llvm::Type*, llvm::ScalarEvolution&): Assertion `(!Loops || *Loops == LF.PostIncLoops) && "different post-inc loops used"' failed.4

Thanks for the heads-up. It seems to also trigger during clang bootstrap on some platforms, I'll try to get a reproducer that way. For now I reverted the change

Recommitted as dae5cd73cb38 with an adjustment to perform the conversion of the PostIncLoopSets for all Fixups, and returning nullptr if they don't match

There may be one more case which this patch does not capture? Check the following input:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

@c = internal global i32 0, align 4
@g = dso_local local_unnamed_addr global ptr @c, align 8
@h = dso_local global i8 0, align 4
@j = dso_local local_unnamed_addr global ptr @h, align 8
@b = internal unnamed_addr global i32 0, align 4
@e = dso_local local_unnamed_addr global i16 0, align 4
@f = dso_local local_unnamed_addr global i64 0, align 8
@l = dso_local local_unnamed_addr global i64 0, align 8
@i = dso_local local_unnamed_addr global i64 0, align 8
@.str = private unnamed_addr constant [5 x i8] c"%lX\0A\00", align 1
@a = dso_local local_unnamed_addr global i8 0, align 1

; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn uwtable
define dso_local void @n(ptr nocapture noundef %0) {
  ret void
}

; Function Attrs: nofree nounwind uwtable
define dso_local i32 @main() {
  %1 = load ptr, ptr @g, align 8
  %2 = load i64, ptr @f, align 8
  br label %3

3:                                                ; preds = %0, %18
  %4 = phi i32 [ 0, %0 ], [ %19, %18 ]
  %5 = phi i64 [ %2, %0 ], [ %15, %18 ]
  br label %6

6:                                                ; preds = %3, %6
  %7 = phi i32 [ 1, %3 ], [ %10, %6 ]
  %8 = phi i64 [ %5, %3 ], [ %9, %6 ]
  %9 = add nsw i64 %8, 1
  %10 = add nsw i32 %7, -1
  %11 = icmp sgt i32 %7, 0
  br i1 %11, label %6, label %12

12:                                               ; preds = %6, %12
  %13 = phi i32 [ %16, %12 ], [ 1, %6 ]
  %14 = phi i64 [ %15, %12 ], [ %9, %6 ]
  %15 = add nsw i64 %14, 1
  %16 = add nsw i32 %13, -1
  %17 = icmp sgt i32 %13, 0
  br i1 %17, label %12, label %18

18:                                               ; preds = %12
  %19 = add nuw nsw i32 %4, 1
  %20 = icmp eq i32 %19, 8
  br i1 %20, label %21, label %3

21:                                               ; preds = %18
  store i32 %16, ptr @b, align 4
  store i32 0, ptr %1, align 4
  %22 = zext i32 %13 to i64
  store i16 -1, ptr @e, align 4
  store i64 %15, ptr @f, align 8
  store i64 %14, ptr @l, align 8
  store i64 %22, ptr @i, align 8
  %23 = urem i32 %16, 53
  %24 = trunc i32 %23 to i8
  %25 = load ptr, ptr @j, align 8
  store i8 %24, ptr %25, align 1
  %26 = load i8, ptr @h, align 4
  %27 = xor i8 %26, 5
  %28 = zext i8 %27 to i64
  %29 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull @.str, i64 noundef %28)
  ret i32 0
}

declare noundef i32 @printf(ptr nocapture noundef readonly, ...)
$ opt -loop-reduce reduced.ll -S -o out.ll && clang out.ll -o a.out && ./a.out
B
$ clang reduced.ll -o a.out && ./a.out
2C

There may be one more case which this patch does not capture? Check the following input:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

@c = internal global i32 0, align 4
@g = dso_local local_unnamed_addr global ptr @c, align 8
@h = dso_local global i8 0, align 4
@j = dso_local local_unnamed_addr global ptr @h, align 8
@b = internal unnamed_addr global i32 0, align 4
@e = dso_local local_unnamed_addr global i16 0, align 4
@f = dso_local local_unnamed_addr global i64 0, align 8
@l = dso_local local_unnamed_addr global i64 0, align 8
@i = dso_local local_unnamed_addr global i64 0, align 8
@.str = private unnamed_addr constant [5 x i8] c"%lX\0A\00", align 1
@a = dso_local local_unnamed_addr global i8 0, align 1

; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn uwtable
define dso_local void @n(ptr nocapture noundef %0) {
  ret void
}

; Function Attrs: nofree nounwind uwtable
define dso_local i32 @main() {
  %1 = load ptr, ptr @g, align 8
  %2 = load i64, ptr @f, align 8
  br label %3

3:                                                ; preds = %0, %18
  %4 = phi i32 [ 0, %0 ], [ %19, %18 ]
  %5 = phi i64 [ %2, %0 ], [ %15, %18 ]
  br label %6

6:                                                ; preds = %3, %6
  %7 = phi i32 [ 1, %3 ], [ %10, %6 ]
  %8 = phi i64 [ %5, %3 ], [ %9, %6 ]
  %9 = add nsw i64 %8, 1
  %10 = add nsw i32 %7, -1
  %11 = icmp sgt i32 %7, 0
  br i1 %11, label %6, label %12

12:                                               ; preds = %6, %12
  %13 = phi i32 [ %16, %12 ], [ 1, %6 ]
  %14 = phi i64 [ %15, %12 ], [ %9, %6 ]
  %15 = add nsw i64 %14, 1
  %16 = add nsw i32 %13, -1
  %17 = icmp sgt i32 %13, 0
  br i1 %17, label %12, label %18

18:                                               ; preds = %12
  %19 = add nuw nsw i32 %4, 1
  %20 = icmp eq i32 %19, 8
  br i1 %20, label %21, label %3

21:                                               ; preds = %18
  store i32 %16, ptr @b, align 4
  store i32 0, ptr %1, align 4
  %22 = zext i32 %13 to i64
  store i16 -1, ptr @e, align 4
  store i64 %15, ptr @f, align 8
  store i64 %14, ptr @l, align 8
  store i64 %22, ptr @i, align 8
  %23 = urem i32 %16, 53
  %24 = trunc i32 %23 to i8
  %25 = load ptr, ptr @j, align 8
  store i8 %24, ptr %25, align 1
  %26 = load i8, ptr @h, align 4
  %27 = xor i8 %26, 5
  %28 = zext i8 %27 to i64
  %29 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull @.str, i64 noundef %28)
  ret i32 0
}

declare noundef i32 @printf(ptr nocapture noundef readonly, ...)
$ opt -loop-reduce reduced.ll -S -o out.ll && clang out.ll -o a.out && ./a.out
B
$ clang reduced.ll -o a.out && ./a.out
2C

@fhahn Do you plan to fix this? If not, I will file one ticket for this.

fhahn added a comment.Jul 3 2023, 2:06 PM

@peixin thanks for the additional test case! I precommitted a slightly reduced version of your test and should have a fix ready shortly

@peixin thanks for the additional test case! I precommitted a slightly reduced version of your test and should have a fix ready shortly

@fhahn OK. Thanks.

Hi, I ran into
llvm-project/llvm/include/llvm/Support/Casting.h:109: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::SCEVAddRecExpr; From = llvm::SCEV]: Assertion `Val && "isa<> used on a null pointer"' failed.

llc < test.ll

source_filename = "mmulint8-4ce5aa.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = external dso_local unnamed_addr constant [35 x i8], align 1
@.str.1 = external dso_local unnamed_addr constant [36 x i8], align 1

define dso_local void @f90_matmul_int8_i8(ptr %dest_addr) local_unnamed_addr #0 {
entry:

br i1 poison, label %if.then87, label %if.then87.sink.split

if.then87.sink.split: ; preds = %entry

unreachable

if.then87: ; preds = %entry

br i1 poison, label %if.then112, label %if.end124

if.then112: ; preds = %if.then87

%0 = load i64, ptr poison, align 8
%1 = load i64, ptr poison, align 8
br label %if.then137

if.end124: ; preds = %if.then87

%2 = load i64, ptr poison, align 8
%3 = load i64, ptr poison, align 8
br label %if.then137

if.then137: ; preds = %if.end124, %if.then112

%4 = phi i64 [ %1, %if.then112 ], [ %3, %if.end124 ]
%5 = phi i64 [ %0, %if.then112 ], [ %2, %if.end124 ]
%6 = load i64, ptr poison, align 8
%add.ptr162 = getelementptr inbounds i64, ptr %dest_addr, i64 %6
%mul163 = mul nsw i64 %5, %4
%add.ptr164 = getelementptr inbounds i64, ptr %add.ptr162, i64 %mul163
%add.ptr166 = getelementptr inbounds i64, ptr %add.ptr164, i64 poison
%add.ptr167 = getelementptr inbounds i64, ptr %add.ptr166, i64 -1
%add.ptr249.us = getelementptr inbounds i64, ptr %add.ptr167, i64 0
br i1 poison, label %for.cond251.for.inc255_crit_edge.us.unr-lcssa, label %for.body253.us

for.body253.us: ; preds = %for.body253.us, %if.then137

%d_elem_p.0541.us = phi ptr [ %add.ptr254.us.7, %for.body253.us ], [ %add.ptr249.us, %if.then137 ]
store i64 0, ptr %d_elem_p.0541.us, align 8
%add.ptr254.us = getelementptr inbounds i64, ptr %d_elem_p.0541.us, i64 %5
%add.ptr254.us.1 = getelementptr inbounds i64, ptr %add.ptr254.us, i64 %5
store i64 0, ptr %add.ptr254.us.1, align 8
%add.ptr254.us.2 = getelementptr inbounds i64, ptr %add.ptr254.us.1, i64 %5
%add.ptr254.us.3 = getelementptr inbounds i64, ptr %add.ptr254.us.2, i64 %5
store i64 0, ptr %add.ptr254.us.3, align 8
%add.ptr254.us.4 = getelementptr inbounds i64, ptr %add.ptr254.us.3, i64 %5
%add.ptr254.us.5 = getelementptr inbounds i64, ptr %add.ptr254.us.4, i64 %5
store i64 0, ptr %add.ptr254.us.5, align 8
%add.ptr254.us.6 = getelementptr inbounds i64, ptr %add.ptr254.us.5, i64 %5
%add.ptr254.us.7 = getelementptr inbounds i64, ptr %add.ptr254.us.6, i64 %5
br i1 poison, label %for.cond251.for.inc255_crit_edge.us.unr-lcssa, label %for.body253.us

for.cond251.for.inc255_crit_edge.us.unr-lcssa: ; preds = %for.body253.us, %if.then137

%d_elem_p.0541.us.unr = phi ptr [ %add.ptr249.us, %if.then137 ], [ %add.ptr254.us.7, %for.body253.us ]
ret void

}

attributes #0 = { "tune-cpu"="generic" }

fhahn added a comment.Jul 5 2023, 10:13 AM

@ronlieb @mstorsjo thanks for the reports, should be fixed by 69ca5c9d62deef66a00ceb26706fab7c755d1f10.

@peixin your reproducer should be fixed by 7f5b15ad150e

peixin added a comment.Jul 5 2023, 6:08 PM

@peixin your reproducer should be fixed by 7f5b15ad150e

Thanks @fhahn . I confirmed that my original internal case is also fixed now :).