This is an archive of the discontinued LLVM Phabricator instance.

[AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP indices
ClosedPublic

Authored by khei4 on Mar 22 2023, 3:56 AM.

Details

Summary

Fold loadInst for constant aggregate types, those load results are equivalent for GEP indices.

This revision will be part 2 of the following stages.

  1. alignment-based https://reviews.llvm.org/D144445
  2. GEP-based (This revision will be)

Depends on: https://reviews.llvm.org/D145355, https://reviews.llvm.org/D144445
Fixes https://github.com/llvm/llvm-project/issues/61615

Diff Detail

Event Timeline

khei4 created this revision.Mar 22 2023, 3:56 AM
khei4 requested review of this revision.Mar 22 2023, 3:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2023, 3:56 AM
khei4 edited the summary of this revision. (Show Details)Mar 22 2023, 3:56 AM
khei4 edited the summary of this revision. (Show Details)Mar 22 2023, 3:57 AM
khei4 updated this revision to Diff 507317.Mar 22 2023, 5:01 AM
  • fix arrive-back checking on GEP analysis
  • remove obsolete comment
khei4 edited the summary of this revision. (Show Details)Mar 23 2023, 3:31 PM
khei4 planned changes to this revision.Mar 23 2023, 7:09 PM
khei4 updated this revision to Diff 508014.Mar 24 2023, 2:52 AM

implement non-inbounds scale handling, clean the diff

khei4 updated this revision to Diff 508017.Mar 24 2023, 3:15 AM
khei4 retitled this revision from (WIP)[AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP indices to [AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP indices.

rename test case name

khei4 added inline comments.Mar 24 2023, 3:17 AM
llvm/test/Transforms/AggressiveInstCombine/patterned-load.ll
51–62

Tweaked this case not to be folded by alignment.

khei4 updated this revision to Diff 509592.Mar 30 2023, 3:09 AM

add a comment

nikic added inline comments.May 7 2023, 4:09 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
834–847

I think these two ifs are redundant with the !isa<GlobalVariable>(PtrOpV) check that's done below the while loop. It should be possible to just drop them.

857

This looks confused. The second element in the pair is not the IndexTypeSize, but already the Scale (as an APInt).

881

divission -> division

910

I think we can keep this non-optional. Stride=1 is always valid as a fallback.

khei4 added a comment.May 7 2023, 10:49 PM

Thank you for the review!

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
834–847

Thanks for the good catch!

857

Sounds reasonable! (I thought this bit width casting is necessary to avoid mismatching on APInt ops, but it's not.)

910

yeah, but the GCD calculation process seems good to be undefined on the first.
I'll assign 1 on the following Stride's "nullopt" checking!

khei4 updated this revision to Diff 520257.May 7 2023, 10:50 PM

apply feedbacks

nikic added inline comments.May 8 2023, 3:59 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
910

You are right that inside getStrideAndModOffsetOfGEP() we should start with std::optional<APInt>. My thinking it that we can use plain APInt outside it. What do you think about making getStrideAndModOffsetOfGEP() return std::pair<APInt, APInt> and then calling it like auto [Stride, ConstOffset] = getStrideAndModOffsetOfGEP(PtrOp, DL)? In that case, getStrideAndModOffsetOfGEP would return Stride=1, ConstOffset=0 for the case where it can't determine anything.

khei4 added inline comments.May 8 2023, 6:25 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
910

Sounds good! I'll apply;)

khei4 added inline comments.May 8 2023, 6:30 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
838–839

I noticed this failure become uncapturable. I'm not sure the cases GEPOperator::collectOffset return false, but as we confirm, we can decide by Stride 1 whatever:)

nikic added inline comments.May 8 2023, 6:46 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
838–839

I think you can replace it with break; so it falls through to the !isa<GlobalVariable>(PtrOp) handling. collectOffset() only fails for scalable vectors, so it's a very rare case...

khei4 updated this revision to Diff 520362.May 8 2023, 7:05 AM
khei4 edited the summary of this revision. (Show Details)

make Stride non-optional on foldPatternedLoads(the caller function) body.

nikic added a comment.May 8 2023, 7:37 AM

Logic looks good, some style nits. Can you please check the compile-time impact for this patch?

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
828–829
832

nit: Unnecessary empty line.

837
838
851–852

I don't think there's a need to create a separate APInt here?

859
903

I think you can move BW into the if below now, it's not used otherwise.

920

I think the !Stride isn't needed anymore.

khei4 added inline comments.May 8 2023, 8:06 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
838–839

I noticed it now! it sounds shorter than I did!

khei4 added a comment.May 8 2023, 8:17 AM

Thank you for the quick good catches!

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
851–852

Good catch! I believed this is necessary to match bit-width, but that was another place...

903

The same goes for LoadTy! Thanks!

khei4 updated this revision to Diff 520387.May 8 2023, 8:17 AM

apply feedbacks

nikic accepted this revision.May 9 2023, 4:18 AM
This revision is now accepted and ready to land.May 9 2023, 4:18 AM

This is causing a compiler crash, which looks like a div by zero. I'll start working on a repro, but I'd like to revert to keep trunk green.

This is causing a compiler crash, which looks like a div by zero. I'll start working on a repro, but I'd like to revert to keep trunk green.

Reverted in e08c397a88077c50dc25e71b39b9d5efbfc85a9a

The following crashes under opt -passes=aggressive-instcombine:

; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.snork = type { i128 }

@global = internal constant %struct.snork { i128 1 }

define i1 @snork() {
bb:
  %load = load i64, ptr @global, align 8
  %icmp = icmp sgt i64 %load, 0
  call void @llvm.assume(i1 %icmp)
  ret i1 false
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite)
declare void @llvm.assume(i1 noundef) #0

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) }

Halide also found this too: https://buildbot.halide-lang.org/master/#/builders/76/builds/64/steps/9/logs/stdio

MaskRay added a subscriber: MaskRay.May 9 2023, 1:59 PM

@khei4 FYI, 0574a4be879e07b48ba9be8d63eebba49a04dfe8 somehow placed everything on one line:)

khei4 added a comment.EditedMay 9 2023, 7:20 PM

@rupprecht Thank you for handling regression! Sorry for being late. And repro seems a really common case...
only place using division is L859 on https://reviews.llvm.org/rG0574a4be879e07b48ba9be8d63eebba49a04dfe8. I'll handle the zero-size case!

Edit: FYI, This is caused by a lack of handling a 128bit integer's type size. no-GEPed load. -O2 or -O3 couldn't repro this, for the above case, normal ConstantFolding seems to eliminate this case. -O2 or -O3 could repro this. I'll add this case to the AggressiveInstCombine test! Thank you for the report!

@MaskRay Maybe that's why I wrote it just by git commit -m. I rewrite it also.

khei4 added a comment.EditedMay 9 2023, 8:46 PM

WIP notes:
This is clearly div by 0 error, but especially something wrong with i128. I'm now searching why handling is needed for i128. For the following test cases only the last one crash.

@g128 = internal constant i128 42
@g64 = internal constant i64 42
define i128 @no-gep-128-as128(i64 %idx){
 %1 = load i128, ptr @g128, align 4
  ret i128 %1
}
define i128 @no-gep-64-as128(i64 %idx){
 %1 = load i128, ptr @g64, align 4
  ret i128 %1
}
define i32 @no-gep-64-as32(i64 %idx){
 %1 = load i32, ptr @g64, align 4
  ret i32 %1
}
define i32 @no-gep-128-as32-crash(i64 %idx){
 %1 = load i32, ptr @g128, align 4
  ret i32 %1
}
khei4 updated this revision to Diff 520965.May 10 2023, 6:06 AM

although I'm now wondering why i128 couldn't be folded by EarlyCSEPass
https://llvm.godbolt.org/z/YbP4W3Y4x
I believe this will fix the div zero crashes.

khei4 updated this revision to Diff 520967.May 10 2023, 6:12 AM

add comment newlines

nikic added a comment.May 11 2023, 2:37 PM

The updated code looks good to me, but could you please also add the extra test case (that was previously crashing)?

And yes, the fact that the load from i128 constant does not fold is weird. I think it is due to this check: https://github.com/llvm/llvm-project/blob/0f800dfe036c12e1883586234bcae2be33d82920/llvm/lib/Analysis/ConstantFolding.cpp#L432 It would be nice to relax it so it can handle wider integers.

khei4 updated this revision to Diff 521511.May 11 2023, 6:21 PM

The updated code looks good to me, but could you please also add the extra test case (that was previously crashing)?

@nikic Thank you for reminding me! I added. (sorry I said in the former text.

I think it is due to this check: https://github.com/llvm/llvm-project/blob/0f800dfe036c12e1883586234bcae2be33d82920/llvm/lib/Analysis/ConstantFolding.cpp#L432

Thanks a lot for finding that part! I'll try to patch it!

nikic added inline comments.May 12 2023, 1:24 AM
llvm/test/Transforms/AggressiveInstCombine/patterned-load.ll
20

Could you please also rerun update_test_checks.py for the new test?

khei4 updated this revision to Diff 521582.May 12 2023, 1:38 AM

update test. (It's embarrassing. Sorry...

nikic accepted this revision.May 12 2023, 1:41 AM

LGTM

khei4 closed this revision.May 13 2023, 3:02 AM

Sorry, I failed to link this revision with the following commit...
https://reviews.llvm.org/rG39a0677784d1b53f2d6e33af2a53e915f3f62c86