This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/AggressiveInstCombine/
-
Transforms/
-
AggressiveInstCombine/
20/48
AggressiveInstCombine.cpp
-
test/Transforms/AggressiveInstCombine/
-
Transforms/
-
AggressiveInstCombine/
1/2
patterned-load.ll

Differential D144445

[AggressiveInstCombine] folding load for constant global patterened arrays and structs by alignment
ClosedPublic

Authored by khei4 on Feb 20 2023, 8:10 PM.

Download Raw Diff

Details

Reviewers

nikic
spatel

Commits

rG434b0badb5d5: [AggressiveInstCombine] folding load for constant global patterened arrays and…

Summary

Fold loadInst for constant aggregate types, those load results are equivalent for the given alignments.

This revision will be part 1 of the following stages.

alignment-based (This revision will be)
GEP-based https://reviews.llvm.org/D146622

add ConstantFoldLoadFromPatternedAggregate method to AggressiveInstCombine

alive proofs: https://alive2.llvm.org/ce/z/qBGl72
Depends on: https://reviews.llvm.org/D145355
Fixes: https://github.com/rust-lang/rust/issues/107208.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

update failed tests

Herald added subscribers: kosarev, kerbowa, jvesely. · View Herald TranscriptFeb 25 2023, 1:42 AM

Harbormaster completed remote builds in B215910: Diff 500393.Feb 25 2023, 1:43 AM

add inline comments for check and review.

llvm/lib/Analysis/ConstantFolding.cpp
778 ↗	(On Diff #500393)	although I'm not sure it's possible to fold pointer structs and arrays, note here.
813 ↗	(On Diff #500393)	Conversion from bytes to the load type seems a bit ugly. Are there any better ways?
llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll
22 ↗	(On Diff #500393)	update these tests `llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll` and `llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll` by following commands llvm/utils/update_test_checks.py --opt-binary ./build/bin/opt \ llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll llvm/utils/update_llc_test_checks.py --llc-binary ./build/bin/llc \ llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll and tweak head comments, as comments in these files say Is this ok?
llvm/test/Transforms/InstSimplify/load-patterned-aggregates.ll
52 ↗	(On Diff #500139)	these tests are dupplicate to the following tests, sorry for the noises

@JonChesterfield Hi. Sorry for the sudden review requests. I modified the test files you recently modified. I don't know how to check diffs in this revision are correct or not.
Could you take a look at whether these changes are no problem? If you have any idea about these diffs please teach me, and I'll fix these diffs. :)

Herald added a subscriber: wdng. · View Herald TranscriptFeb 25 2023, 2:16 AM

khei4 edited the summary of this revision. (Show Details)Feb 25 2023, 6:51 PM

add pre-increment and overflow check

Harbormaster completed remote builds in B215993: Diff 500492.Feb 25 2023, 9:27 PM

early return and format

Harbormaster completed remote builds in B215998: Diff 500497.Feb 25 2023, 9:48 PM

khei4 removed a parent revision: D144618: [ConstantFold][InstSimplify] Pre-commit tests for D144445 (NFC).Feb 25 2023, 11:51 PM

@JonChesterfield @arsenm sorry for being noisy. I noticed my transformation was broken for ptrtoinst applied pointer arrays. I'll fix them.

see ReadDataFromGlobal return value.
now ready for review

Harbormaster completed remote builds in B216048: Diff 500556.Feb 26 2023, 7:15 AM

khei4 edited the summary of this revision. (Show Details)Feb 26 2023, 9:46 PM

nikic added inline comments.Feb 27 2023, 8:05 AM

llvm/lib/Analysis/ConstantFolding.cpp
781 ↗	(On Diff #500556)	This looks like a leftover from a previous implementation? I don't think your current one has any limitations regarding the type of the initializer.
786 ↗	(On Diff #500556)	If we have a 64K global and are reading 4 byte elements (i32) from it, then we need to check that 16k elements are the same. This is too expensive, as the compile-time regression shows. We should limit this more aggressively to achieve reasonable compile-times, e.g. to 1K globals.
794 ↗	(On Diff #500556)	Rather than `ReadDataFromGlobal()`, we should use `ConstantFoldLoadFromConst()` here (or rather in the loop below). This has two advantages: It's going to handle all types (including pointer loads), and it allows an early bailout if not all elements are the same. Your current implementation will always read the entire global, even if it's clear after two elements that it's not uniform. It will also take care of endianness itself.
798 ↗	(On Diff #500556)	If the load alignment is used, you also need to make sure that the global alignment is `>=` the load alignment. Otherwise it doesn't really tell you anything. Though I'm not sure we really want to use alignment to determine access stride -- this is pretty unusual. Based this on the getelementptr stride would be more typical and handle cases that alignment can't (e.g. non-power-of-two stride).
llvm/lib/Analysis/InstructionSimplify.cpp
6640 ↗	(On Diff #500556)	hasUniqueInitializer() is only relevant if you want to modify an initializer, it should not be checked here.
6651 ↗	(On Diff #500556)	It might make sense to move this to the `else` of the below branch. If we know a constant offset, then we don't need to go through this code.
llvm/test/Transforms/InstSimplify/load-patterned-aggregates.ll
3 ↗	(On Diff #500556)	Rather than hardcoding the data layout, it's better to do something like this: ; RUN: opt < %s -passes=instsimplify -S -data-layout="e" \| FileCheck %s --check-prefixes=CHECK,LE ; RUN: opt < %s -passes=instsimplify -S -data-layout="E" \| FileCheck %s --check-prefixes=CHECK,BE This will check both little and big endian.

it might be better to move this to aggressive inst-combine or refactor

Based on those numbers, it does seem pretty likely that we'll have to move this into AggressiveInstCombine. That would also give a bit more leeway wrt globals sizes. And I don't think it would really reduce optimization potential in any meaningful way -- this doesn't seem like the kind of fold that benefits from being run 20 times in the optimization pipeline.

Thank you for the review!

llvm/lib/Analysis/ConstantFolding.cpp
781 ↗	(On Diff #500556)	I might have cared padding bit, but it's no longer needed if I use `ConstantFoldLoadFromConst`
786 ↗	(On Diff #500556)	Sounds reasonable. I should have estimated it.
794 ↗	(On Diff #500556)	Rather than `ReadDataFromGlobal()`, we should use `ConstantFoldLoadFromConst()` here (or rather in the loop below). This has two advantages: It's going to handle all types (including pointer loads), and it allows an early bailout if not all elements are the same. Your current implementation will always read the entire global, even if it's clear after two elements that it's not uniform. It will also take care of endianness itself. Thank you! I couldn't notice that function. That sounds much more efficient than the current whole global variable reading.
798 ↗	(On Diff #500556)	If the load alignment is used, you also need to make sure that the global alignment is >= the load alignment. Otherwise it doesn't really tell you anything. Right. I'll handle that case. Based this on the getelementptr stride would be more typical and handle cases that alignment can't (e.g. non-power-of-two stride) Ok, I am a little confused. Overall I thought I don't need to care about the case that alive decides any value could be returned. Although I didn't, `undef` value could be returned? if `Global Variable's alignment` > `load's alignment`, then whatever offsetted pointer is given to load, then any value could be returned. https://alive2.llvm.org/ce/z/5_EDLB even if `Global Variable's alignment` <= `load's alignment`, the offset is not the multiple of load's alignment, then any value could be returned. https://alive2.llvm.org/ce/z/Qnon4K based on these assumptions, I used alignment to calculate possible valid accessible offsets. Are these assumptions and alive results correct? I might have misunderstood or broadly interpret alive results. But I'm feeling it's also good to compute stride from GEP and use a bigger one. A possible minimum stride can be gained by calculating the greatest common divisor of GEP-type sizes, so it's simpler than I expected. (This might be out of scope, it might be better not to use GEP idx, I saw your proposal )
llvm/lib/Analysis/InstructionSimplify.cpp
6640 ↗	(On Diff #500556)	Yeah, you're also right here. I removed it. Thanks!
6651 ↗	(On Diff #500556)	Sounds reasonable. I'll fix.
llvm/test/Transforms/InstSimplify/load-patterned-aggregates.ll
3 ↗	(On Diff #500556)	Cool! Thanks!

Applied feedback

currently pushed and waiting to see the compile-time regressions expecting improvement ;)

https://llvm-compile-time-tracker.com/

leave some comments to consider.

llvm/lib/Analysis/ConstantFolding.cpp
773 ↗	(On Diff #501805)	Is it better to implement on `GEPOperator`?
798 ↗	(On Diff #500556)	Is the following case the example that only the seeing only GEP stride can't fold possible case? https://alive2.llvm.org/ce/z/9cVcee

In D144445#4155524, @nikic wrote:

it might be better to move this to aggressive inst-combine or refactor

Based on those numbers, it does seem pretty likely that we'll have to move this into AggressiveInstCombine. That would also give a bit more leeway wrt globals sizes. And I don't think it would really reduce optimization potential in any meaningful way -- this doesn't seem like the kind of fold that benefits from being run 20 times in the optimization pipeline.

https://llvm-compile-time-tracker.com/compare.php?from=6f3baf43820680841b0daebdde2c78b43175444b&to=c08adfbed285199cd839da40419f918ac7a863b1&stat=instructions:u

@nikic
Thanks to your feedback, regression relaxed, but yet seems so much to stay here. I'll move this to AggressiveInstCombine.

Harbormaster completed remote builds in B216932: Diff 501805.Mar 2 2023, 3:10 AM

khei4 retitled this revision from [ConstantFold][InstSimplify] folding load for constant global patterened arrays and structs to [AggressiveInstCombine] folding load for constant global patterened arrays and structs.Mar 5 2023, 10:44 PM

khei4 edited the summary of this revision. (Show Details)

khei4 mentioned this in D145355: [AggressiveInstCombine] Pre-Commit test for D144445 (NFC).Mar 5 2023, 11:03 PM

khei4 edited the summary of this revision. (Show Details)Mar 5 2023, 11:14 PM

khei4 added a parent revision: D145355: [AggressiveInstCombine] Pre-Commit test for D144445 (NFC).Mar 5 2023, 11:24 PM

move folding to AggressiveInstCombine

compile time regression for looking up to
64K objects: https://llvm-compile-time-tracker.com/compare.php?from=6f3baf43820680841b0daebdde2c78b43175444b&to=5024fc0c38730b0e58146a809b570d579d40fc0e&stat=instructions:u

1K objects: https://llvm-compile-time-tracker.com/compare.php?from=6f3baf43820680841b0daebdde2c78b43175444b&to=4a9358337b66c68e00462fac2c4902cb34398bd6&stat=instructions:u

other numbers (roughly) https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions%3Au&remote=khei4

4K or 16K would be fine.

Flightor added a subscriber: Flightor.Mar 6 2023, 1:17 AM

Harbormaster completed remote builds in B217506: Diff 502532.Mar 6 2023, 1:31 AM

change the size bounds to 4K and add comments

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
876	according to this alive results https://alive2.llvm.org/ce/z/5_EDLB when load alignment is bigger, seems to be fine to return poison value if (GV->getAlign().valueOrOne().value() < LoadAlign) { I.replaceAllUsesWith(PoisonValue::get(LoadTy)); return true; } but the following test failed. (on `check-llvm` only targeted for X86) `llvm/test/Transforms/PhaseOrdering/X86/nancvt.ll` is this alive result wrong?

Harbormaster completed remote builds in B217520: Diff 502556.Mar 6 2023, 1:37 AM

nikic added inline comments.Mar 8 2023, 1:00 PM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
848	This isn't the right way to compute the GEP stride: GEP works on alloc sizes rather than store sizes, and can have multiple indices. You can use the `collectOffset()` methods to handle all the cases, and then `GreatestCommonDivisor()` to calculate the GCD.
866	This needs to bail out for volatile loads.
873	This is still checking for unique initializers -- should only check definitive initializer.
881	Even if we can't use the alignment, we can still use the GEP stride.
893	Can just omit the Offset argument here (it's zero by default).
901	Do we need the `LoadSize` adjustment here? Can we instead iterate to `ByteOffset < GVSize`? The current `LoadSize` calculation is not right for pointers, but it seem like we can just drop it entirely.
905	It's okay to just use 64 as the APInt size in this context. You are currently using the size of the initializer, which will make for a very wide integer...

apply feedbacks.

update tests

Thank you for the review! I'm sorry about the messy changes!
also add ptr array and constant offsets tests. https://reviews.llvm.org/D145355

I'm now wondering to handle pointers. And mentioned behavior for cross-bounding load and bigger load variable alignment. I'll figure it out!

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
848	You can use the collectOffset() methods to handle all the cases, and then GreatestCommonDivisor() to calculate the GCD. These methods save me! thanks!
873	Thanks for saying again...
880–882	Current implementation cannot read ptr array
881	Done. (just for my understanding, I'll clarify the case for bigger load alignment in discord.
881–882	Currently
901	Do we need the LoadSize adjustment here? Can we instead iterate to ByteOffset < GVSize? I think bound-crossing load cannot be excluded, without LoadSize. like https://alive2.llvm.org/ce/z/jYsLBk Sorry, I'm not confident with the semantics of this. I'll try to ask in discord.
905	Thanks! It's embarrassing...

Harbormaster completed remote builds in B218344: Diff 503699.Mar 9 2023, 3:24 AM

refactoring

remove LoadSize
rename variables
fix intermediate APInt bit size to GVSize * 8

now ready for review again

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
870–872	are there any better way to bitcast APInt?
901	I removed loadSize in the end

fix awkward indent

Harbormaster completed remote builds in B219558: Diff 505377.Mar 14 2023, 11:52 PM

Sorry, I noticed test failed.

fix wrong argument

Harbormaster completed remote builds in B219810: Diff 505733.Mar 16 2023, 3:36 AM

Extracting the stride from GEPs turned out to be trickier than I expected...

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
833	Should add a negative test using volatile.
844	This comment looks out of place, should be further down?
848	64K -> 4K
850	`uint64_t`
856	This is again creating a very, very wide integer :) The correct size to use here is `DL.getIndexTypeSizeInBits(PtrOp->getType())` as you already do in the loop. In that case you also don't need the separate `COff` variable, you can let collectOffset() directly add onto this one.
858	This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call? I think it would be clearer to make `Stride` an `std::optional` and then set it the first time, and use GCD for later iterations. (After the loop, if `Stride` is not set, it's okay to just bail out: The constant-offset case is already handled by other code, so we needn't bother with it.)
881	indeces -> indices
883	Hm, what happens if ConstOffset itself is negative? Would urem produce correct results in that case? Let's say `ConstOffset = -3` and `Stride = 3`. Let's assume 8-bit address space to keep things simple, then `ConstOffset = 253` in unsigned interpretation and `253 % 3 == 1`. Also I just realized another problem here: Just using the GCD isn't right if the GEP is not `inbounds` (i.e. multiplication can overflow) and the stride is not a power of two. I believe the correct way to handle these cases is implemented in BasicAA, in particular https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1135-L1138 to only keep a power of two factor for non-inbounds and https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1168-L1170 for the final modulus.
884	A correctness check missing here is that after looking through all the GEPs, you must arrive back at `GV`. Otherwise there might be some kind of other operation sitting in between which preserves the underlying object, but where we can't determine the offset. As a test case, one could use a call to `llvm.ptrmask` for example.
893	This probably also needs to check that ConstOffset is zero? I don't think using load alignment as stride is right with a non-zero start offset.
897	It would be a bit simpler to keep using (and adding to) ConstOffset here, as we need an APInt anyway.

This revision now requires changes to proceed.Mar 19 2023, 7:54 AM

In D144445#4204759, @nikic wrote:

Extracting the stride from GEPs turned out to be trickier than I expected...

If you like, we can go back to just the alignment handling for this patch, and then add the GEP handling in a separate one on top. Either way works for me.

Thank you for a lot of good catches!

In D144445#4204789, @nikic wrote:

In D144445#4204759, @nikic wrote:

Extracting the stride from GEPs turned out to be trickier than I expected...

If you like, we can go back to just the alignment handling for this patch, and then add the GEP handling in a separate one on top. Either way works for me.

Sounds good to me!
Then I want to separate this into following

alignment-based (This revision will be)
(all) inbounds GEP index-based
non-inbounds included GEP index-based

TBH, I feel non-inbounds GEP handling is a little bit handful, but I'll read AA's implementation and try it first!

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
856	Oh... I was confused with offset and the GV byte sizes! You're right!
858	This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call? Yes! I think it would be clearer to make `Stride` a `std::optional` and then set it the first time, Sounds reasonable, I'll use that!
883	Ah, I mistakenly assumed `inbounds`! Without `inbounds`, ConstantOffset could be negative and could overflow!
884	Good catch! I haven't known underlying-object preserved operations! I'll add tests!
893	Good catch! Yeah, we should have considered GEP-stride-based and alignment-based separately also for ConstOffset.

update summary and address simple fixes.

Harbormaster completed remote builds in B220373: Diff 506490.Mar 20 2023, 2:07 AM

addresss

ptrmasked pointer failures
not inbounds GEP

and refactoring

use APInt in loop
fix BitWidth for APInt
fix typo

Harbormaster completed remote builds in B220391: Diff 506516.Mar 20 2023, 4:00 AM

Nearly there, I think...

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
831	Unnecessary/unusual newline.
838	getPointerOperand()` is a bit more elegant.
845	What you want to do is after the GEPOperator loop, check that the final PtrOpV is GV. This makes sure there were only GEPs.
854	"to avoid scanning too much memory" maybe -- we're not really allocating anything here.
864	This shouldn't be needed when using `std::optional`.
869	I don't really get what this comment is trying to say. The reason why non-inbounds GEP is tricky to handle is that there is an implicit modulo the address space size, which makes it harder to calculate the GCD.
871	You should probably extract the whole "get stride of GEP" logic into a separate function. That way, all these early bailouts don't apply for the alignment case (where we don't care about inbounds).

This revision now requires changes to proceed.Mar 21 2023, 7:33 AM

Thank you for the review! I may fuse the GEP-related plans I proposed! because I might have misunderstood non-inbounds GEP with correct alignment!

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
845	Thanks! I'll check that before and after the loop!
864	I might have missed some patterns, but I guess it's necessary because GCD (self-referencing) calculations need its initial value as the first type size. :) Stride = APIntOps::GreatestCommonDivisor( Stride, APInt(ConstOffset.getBitWidth(), IndexTypeSize.getZExtValue()));
871	(with above comment) Ah, OK I might misunderstand the behavior of the load with non-inbounds GEP indices, but correct alignment!

khei4 retitled this revision from (WIP) [AggressiveInstCombine] folding load for constant global patterened arrays and structs to (WIP) [AggressiveInstCombine] folding load for constant global patterened arrays and structs by alignment.Mar 22 2023, 1:05 AM

khei4 edited the summary of this revision. (Show Details)Mar 22 2023, 1:23 AM

rebase to be only alignment-based analysis. GEP-based revision will be created.

Harbormaster completed remote builds in B220977: Diff 507300.Mar 22 2023, 3:31 AM

khei4 mentioned this in D146622: [AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP indices.Mar 22 2023, 3:56 AM

make ConstOffset be not optional

khei4 added a child revision: D146622: [AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP indices.Mar 22 2023, 3:59 AM

khei4 edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B220984: Diff 507309.Mar 22 2023, 4:00 AM

nikic added inline comments.Mar 22 2023, 4:09 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
860	These isa checks aren't needed in this patch: For the alignment case, we don't care what the operand is.
llvm/test/Transforms/AggressiveInstCombine/patterned-load.ll
71	I don't get how this one folds with just this patch. If align 1 we have stride 1, but constarray1 needs a stride of 2, no?

Allen added a subscriber: Allen.Mar 22 2023, 4:38 AM

khei4 added inline comments.Mar 22 2023, 4:48 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
860	Seems like, I could not get the point of `ptrmask` cares! (Precisely, it seems like we need to check alignment-based and GEP-based separately for completeness) Thanks!
llvm/test/Transforms/AggressiveInstCombine/patterned-load.ll
71	Ah, sorry seems like I just forgot to commit test changes...

update tests, remove redundant isa check

Harbormaster completed remote builds in B220986: Diff 507312.Mar 22 2023, 5:44 AM

nikic added inline comments.Mar 22 2023, 7:02 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
855–880	Some simplifications. I think as far as foldPatternedLoads is concerned, we don't need the std::optional around Stride, because a Stride of 1 is always conservatively correct. It's only relevant internally while calculating GEP stride in the next patch.
856

apply suggestions

khei4 added inline comments.Mar 22 2023, 9:12 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
855–880	Thanks! Yeah, simplicity for this patch matters!

Harbormaster completed remote builds in B221041: Diff 507385.Mar 22 2023, 9:29 AM

LGTM

This revision is now accepted and ready to land.Mar 23 2023, 4:40 AM

fix wrong arrow operator

This revision was landed with ongoing or failed builds.Mar 23 2023, 7:31 AM

Closed by commit rG434b0badb5d5: [AggressiveInstCombine] folding load for constant global patterened arrays and… (authored by khei4). · Explain Why

This revision was automatically updated to reflect the committed changes.

khei4 added a commit: rG434b0badb5d5: [AggressiveInstCombine] folding load for constant global patterened arrays and….

khei4 mentioned this in rGc7a3284de305: [AggressiveInstCombine] Pre-Commit test for D144445 (NFC).

Harbormaster completed remote builds in B221311: Diff 507735.Mar 23 2023, 9:17 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

AggressiveInstCombine/

AggressiveInstCombine.cpp

60 lines

test/

Transforms/

AggressiveInstCombine/

patterned-load.ll

47 lines

Diff 507385

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

Show All 12 Lines

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h" #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"

#include "AggressiveInstCombineInternal.h" #include "AggressiveInstCombineInternal.h"

#include "llvm/ADT/Statistic.h" #include "llvm/ADT/Statistic.h"

#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/Analysis/AliasAnalysis.h"

#include "llvm/Analysis/AssumptionCache.h" #include "llvm/Analysis/AssumptionCache.h"

#include "llvm/Analysis/BasicAliasAnalysis.h" #include "llvm/Analysis/BasicAliasAnalysis.h"

#include "llvm/Analysis/ConstantFolding.h"

#include "llvm/Analysis/GlobalsModRef.h" #include "llvm/Analysis/GlobalsModRef.h"

#include "llvm/Analysis/TargetLibraryInfo.h" #include "llvm/Analysis/TargetLibraryInfo.h"

#include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Analysis/TargetTransformInfo.h"

#include "llvm/Analysis/ValueTracking.h" #include "llvm/Analysis/ValueTracking.h"

#include "llvm/IR/DataLayout.h" #include "llvm/IR/DataLayout.h"

#include "llvm/IR/Dominators.h" #include "llvm/IR/Dominators.h"

#include "llvm/IR/Function.h" #include "llvm/IR/Function.h"

#include "llvm/IR/IRBuilder.h" #include "llvm/IR/IRBuilder.h"

▲ Show 20 Lines • Show All 271 Lines • ▼ Show 20 Lines static bool tryToRecognizePopCount(Instruction &I) {

APInt Mask01 = APInt::getSplat(Len, APInt(8, 0x01)); APInt Mask01 = APInt::getSplat(Len, APInt(8, 0x01));

APInt MaskShift = APInt(Len, Len - 8); APInt MaskShift = APInt(Len, Len - 8);

Value *Op0 = I.getOperand(0); Value *Op0 = I.getOperand(0);

Value *Op1 = I.getOperand(1); Value *Op1 = I.getOperand(1);

Value *MulOp0; Value *MulOp0;

// Matching "(i * 0x01010101...) >> 24". // Matching "(i * 0x01010101...) >> 24".

if ((match(Op0, m_Mul(m_Value(MulOp0), m_SpecificInt(Mask01)))) && if ((match(Op0, m_Mul(m_Value(MulOp0), m_SpecificInt(Mask01)))) &&

match(Op1, m_SpecificInt(MaskShift))) { match(Op1, m_SpecificInt(MaskShift))) {

Value *ShiftOp0; Value *ShiftOp0;

// Matching "((i + (i >> 4)) & 0x0F0F0F0F...)". // Matching "((i + (i >> 4)) & 0x0F0F0F0F...)".

if (match(MulOp0, m_And(m_c_Add(m_LShr(m_Value(ShiftOp0), m_SpecificInt(4)), if (match(MulOp0, m_And(m_c_Add(m_LShr(m_Value(ShiftOp0), m_SpecificInt(4)),

m_Deferred(ShiftOp0)), m_Deferred(ShiftOp0)),

m_SpecificInt(Mask0F)))) { m_SpecificInt(Mask0F)))) {

Value *AndOp0; Value *AndOp0;

// Matching "(i & 0x33333333...) + ((i >> 2) & 0x33333333...)". // Matching "(i & 0x33333333...) + ((i >> 2) & 0x33333333...)".

if (match(ShiftOp0, if (match(ShiftOp0,

▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines static bool tryToFPToSat(Instruction &I, TargetTransformInfo &TTI) {

Value *Sat = Builder.CreateCall(Fn, In); Value *Sat = Builder.CreateCall(Fn, In);

I.replaceAllUsesWith(Builder.CreateSExt(Sat, IntTy)); I.replaceAllUsesWith(Builder.CreateSExt(Sat, IntTy));

return true; return true;

} }

/// Try to replace a mathlib call to sqrt with the LLVM intrinsic. This avoids /// Try to replace a mathlib call to sqrt with the LLVM intrinsic. This avoids

/// pessimistic codegen that has to account for setting errno and can enable /// pessimistic codegen that has to account for setting errno and can enable

/// vectorization. /// vectorization.

static bool static bool foldSqrt(Instruction &I, TargetTransformInfo &TTI,

foldSqrt(Instruction &I, TargetTransformInfo &TTI, TargetLibraryInfo &TLI) { TargetLibraryInfo &TLI) {

// Match a call to sqrt mathlib function. // Match a call to sqrt mathlib function.

auto *Call = dyn_cast<CallInst>(&I); auto *Call = dyn_cast<CallInst>(&I);

if (!Call) if (!Call)

return false; return false;

Module *M = Call->getModule(); Module *M = Call->getModule();

LibFunc Func; LibFunc Func;

if (!TLI.getLibFunc(*Call, Func) || !isLibFuncEmittable(M, &TLI, Func)) if (!TLI.getLibFunc(*Call, Func) || !isLibFuncEmittable(M, &TLI, Func))

▲ Show 20 Lines • Show All 405 Lines • ▼ Show 20 Lines static bool foldConsecutiveLoads(Instruction &I, const DataLayout &DL,

// shift if not zero. // shift if not zero.

if (LOps.Shift) if (LOps.Shift)

NewOp = Builder.CreateShl(NewOp, LOps.Shift); NewOp = Builder.CreateShl(NewOp, LOps.Shift);

I.replaceAllUsesWith(NewOp); I.replaceAllUsesWith(NewOp);

return true; return true;

} }

/// If C is a constant patterned array and all valid loaded results for given

/// alignment are same to a constant, return that constant.

static bool foldPatternedLoads(Instruction &I, const DataLayout &DL) {

auto *LI = dyn_cast<LoadInst>(&I);

nikicUnsubmitted

Not Done

Unnecessary/unusual newline.

nikic: Unnecessary/unusual newline.

if (!LI || LI->isVolatile())

return false;

nikicUnsubmitted

Not Done

Should add a negative test using volatile.

nikic: Should add a negative test using volatile.

// We can only fold the load if it is from a constant global with definitive

// initializer. Skip expensive logic if this is not the case.

auto *PtrOp = LI->getPointerOperand();

auto *GV = dyn_cast<GlobalVariable>(getUnderlyingObject(PtrOp));

nikicUnsubmitted

Not Done

getPointerOperand()` is a bit more elegant.

nikic: getPointerOperand()` is a bit more elegant.

if (!GV || !GV->isConstant() || !GV->hasDefinitiveInitializer())

return false;

Type *LoadTy = LI->getType();

Constant *C = GV->getInitializer();

nikicUnsubmitted

Not Done

This comment looks out of place, should be further down?

nikic: This comment looks out of place, should be further down?

// Bail for large initializers in excess of 4K to avoid too many scans.

nikicUnsubmitted

Not Done

What you want to do is after the GEPOperator loop, check that the final PtrOpV is GV. This makes sure there were only GEPs.

nikic: What you want to do is after the GEPOperator loop, check that the final PtrOpV is GV. This…

khei4AuthorUnsubmitted

Done

Thanks! I'll check that before and after the loop!

khei4: Thanks! I'll check that before and after the loop!

uint64_t GVSize = DL.getTypeAllocSize(C->getType());

if (!GVSize || 4096 < GVSize)

return false;

nikicUnsubmitted

Not Done

This isn't the right way to compute the GEP stride: GEP works on alloc sizes rather than store sizes, and can have multiple indices. You can use the collectOffset() methods to handle all the cases, and then GreatestCommonDivisor() to calculate the GCD.

nikic: This isn't the right way to compute the GEP stride: GEP works on alloc sizes rather than store…

khei4AuthorUnsubmitted

Done

You can use the collectOffset() methods to handle all the cases, and then GreatestCommonDivisor() to calculate the GCD.

These methods save me! thanks!

khei4: > You can use the collectOffset() methods to handle all the cases, and then…

nikicUnsubmitted

Not Done

64K -> 4K

nikic: 64K -> 4K

// Check whether pointer arrives back at Global Variable.

nikicUnsubmitted

Not Done

uint64_t

nikic: `uint64_t`

// If PtrOp is neither GlobalVariable nor GEP, it might not arrive back at

// GlobalVariable.

// TODO: implement GEP handling

unsigned BW = DL.getIndexTypeSizeInBits(PtrOp->getType());

nikicUnsubmitted

Not Done

"to avoid scanning too much memory" maybe -- we're not really allocating anything here.

nikic: "to avoid scanning too much memory" maybe -- we're not really allocating anything here.

// TODO: Determine stride based on GEPs.

APInt Stride(BW, 1);

nikicUnsubmitted

Not Done

This is again creating a very, very wide integer :) The correct size to use here is DL.getIndexTypeSizeInBits(PtrOp->getType()) as you already do in the loop. In that case you also don't need the separate COff variable, you can let collectOffset() directly add onto this one.

nikic: This is again creating a very, very wide integer :) The correct size to use here is `DL.

khei4AuthorUnsubmitted

Done

Oh... I was confused with offset and the GV byte sizes! You're right!

khei4: Oh... I was confused with offset and the GV byte sizes! You're right!

nikicUnsubmitted

Not Done

std::optional<APInt> Stride;

- APInt ConstOffset = APInt(BW, 0);

+ APInt ConstOffset(BW, 0);

if (isa<GlobalVariable>(PtrOp)) {

nikic:

APInt ConstOffset(BW, 0);

nikicUnsubmitted

Not Done

This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call?

I think it would be clearer to make Stride an std::optional and then set it the first time, and use GCD for later iterations. (After the loop, if Stride is not set, it's okay to just bail out: The constant-offset case is already handled by other code, so we needn't bother with it.)

nikic: This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for…

khei4AuthorUnsubmitted

Done

This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call?

Yes!

I think it would be clearer to make Stride a std::optional and then set it the first time,

Sounds reasonable, I'll use that!

khei4: > This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value…

// Any possible offset could be multiple of GEP stride. And any valid

// offset is multiple of load alignment, so checking only multiples of bigger

nikicUnsubmitted

Not Done

These isa checks aren't needed in this patch: For the alignment case, we don't care what the operand is.

nikic: These isa checks aren't needed in this patch: For the alignment case, we don't care what the…

khei4AuthorUnsubmitted

Done

Seems like, I could not get the point of ptrmask cares! (Precisely, it seems like we need to check alignment-based and GEP-based separately for completeness) Thanks!

khei4: Seems like, I could not get the point of `ptrmask` cares! (Precisely, it seems like we need to…

// one is sufficient to say results' equality.

if (auto LA = LI->getAlign();

LA <= GV->getAlign().valueOrOne() && Stride->getZExtValue() < LA.value())

Stride = APInt(BW, LA.value());

nikicUnsubmitted

Not Done

This shouldn't be needed when using std::optional.

nikic: This shouldn't be needed when using `std::optional`.

khei4AuthorUnsubmitted

Done

I might have missed some patterns, but I guess it's necessary because GCD (self-referencing) calculations need its initial value as the first type size. :)

Stride = APIntOps::GreatestCommonDivisor(
    Stride,
    APInt(ConstOffset.getBitWidth(), IndexTypeSize.getZExtValue()));

khei4: I might have missed some patterns, but I guess it's necessary because GCD (self-referencing)…

Constant *Ca = ConstantFoldLoadFromConst(C, LoadTy, ConstOffset, DL);

nikicUnsubmitted

Not Done

This needs to bail out for volatile loads.

nikic: This needs to bail out for volatile loads.

if (!Ca)

return false;

nikicUnsubmitted

Not Done

I don't really get what this comment is trying to say. The reason why non-inbounds GEP is tricky to handle is that there is an implicit modulo the address space size, which makes it harder to calculate the GCD.

nikic: I don't really get what this comment is trying to say. The reason why non-inbounds GEP is…

unsigned E = GVSize - DL.getTypeStoreSize(LoadTy);

for (; ConstOffset.getZExtValue() <= E; ConstOffset += Stride)

nikicUnsubmitted

Not Done

You should probably extract the whole "get stride of GEP" logic into a separate function. That way, all these early bailouts don't apply for the alignment case (where we don't care about inbounds).

nikic: You should probably extract the whole "get stride of GEP" logic into a separate function. That…

khei4AuthorUnsubmitted

Done

(with above comment)
Ah, OK I might misunderstand the behavior of the load with non-inbounds GEP indices, but correct alignment!

khei4: (with above comment) Ah, OK I might misunderstand the behavior of the load with non-inbounds…

if (Ca != ConstantFoldLoadFromConst(C, LoadTy, ConstOffset, DL))

khei4AuthorUnsubmitted

Done

are there any better way to bitcast APInt?

khei4: are there any better way to bitcast APInt?

return false;

nikicUnsubmitted

Not Done

This is still checking for unique initializers -- should only check definitive initializer.

nikic: This is still checking for unique initializers -- should only check definitive initializer.

khei4AuthorUnsubmitted

Done

Thanks for saying again...

khei4: Thanks for saying again...

I.replaceAllUsesWith(Ca);

khei4AuthorUnsubmitted

Done

according to this alive results https://alive2.llvm.org/ce/z/5_EDLB

when load alignment is bigger, seems to be fine to return poison value

if (GV->getAlign().valueOrOne().value() < LoadAlign) {
  I.replaceAllUsesWith(PoisonValue::get(LoadTy));
  return true;
}

but the following test failed. (on check-llvm only targeted for X86)
llvm/test/Transforms/PhaseOrdering/X86/nancvt.ll

is this alive result wrong?

khei4: according to this alive results https://alive2.llvm.org/ce/z/5_EDLB when load alignment is…

return true;

}

/// This is the entry point for folds that could be implemented in regular /// This is the entry point for folds that could be implemented in regular

nikicUnsubmitted

Not Done

unsigned BW = DL.getIndexTypeSizeInBits(PtrOp->getType());

- std::optional<APInt> Stride;

- APInt ConstOffset = APInt(BW, 0);

- if (isa<GlobalVariable>(PtrOp)) {

- Stride = APInt(BW, 1);

- }

+ // TODO: Determine stride based on GEPs.

+ APInt Stride(BW, 1);

+ APInt ConstOffset(BW, 0);

// Any possible offset could be multiple of GEP stride. And any valid

// offset is multiple of load alignment, so checking only multiples of bigger

// one is sufficient to say results' equality.

if (auto LA = LI->getAlign();

- LA.value() <= GV->getAlign().valueOrOne().value() &&

- (!Stride || Stride->getZExtValue() < LA.value())) {

+ LA <= GV->getAlign().valueOrOne() &&

+ Stride->getZExtValue() < LA.value()) {

Stride = APInt(BW, LA.value());

}

- if (!Stride)

- return false;

Constant *Ca = ConstantFoldLoadFromConst(C, LoadTy, ConstOffset, DL);

if (!Ca)

return false;

- unsigned E = GVSize - DL.getTypeAllocSize(LoadTy);

- for (; ConstOffset.getZExtValue() <= E; ConstOffset += *Stride)

+ unsigned E = GVSize - DL.getTypeStoreSize(LoadTy);

+ for (; ConstOffset.getZExtValue() <= E; ConstOffset += Stride)

if (Ca != ConstantFoldLoadFromConst(C, LoadTy, ConstOffset, DL))

return false;

I.replaceAllUsesWith(Ca);

Some simplifications.

I think as far as foldPatternedLoads is concerned, we don't need the std::optional around Stride, because a Stride of 1 is always conservatively correct. It's only relevant internally while calculating GEP stride in the next patch.

nikic: Some simplifications. I think as far as foldPatternedLoads is concerned, we don't need the std…

khei4AuthorUnsubmitted

Done

Thanks! Yeah, simplicity for this patch matters!

khei4: Thanks! Yeah, simplicity for this patch matters!

/// InstCombine, but they are separated because they are not expected to /// InstCombine, but they are separated because they are not expected to

nikicUnsubmitted

Not Done

Even if we can't use the alignment, we can still use the GEP stride.

nikic: Even if we can't use the alignment, we can still use the GEP stride.

khei4AuthorUnsubmitted

Done

Done.
(just for my understanding, I'll clarify the case for bigger load alignment in discord.

khei4: Done. (just for my understanding, I'll clarify the case for bigger load alignment in discord.

nikicUnsubmitted

Not Done

indeces -> indices

nikic: indeces -> indices

/// occur frequently and/or have more than a constant-length pattern match. /// occur frequently and/or have more than a constant-length pattern match.

khei4AuthorUnsubmitted

Done

Currently

khei4: Currently

khei4AuthorUnsubmitted

Done

Current implementation cannot read ptr array

khei4: Current implementation cannot read ptr array

static bool foldUnusualPatterns(Function &F, DominatorTree &DT, static bool foldUnusualPatterns(Function &F, DominatorTree &DT,

nikicUnsubmitted

Not Done

Hm, what happens if ConstOffset itself is negative? Would urem produce correct results in that case? Let's say ConstOffset = -3 and Stride = 3. Let's assume 8-bit address space to keep things simple, then ConstOffset = 253 in unsigned interpretation and 253 % 3 == 1.

Also I just realized another problem here: Just using the GCD isn't right if the GEP is not inbounds (i.e. multiplication can overflow) and the stride is not a power of two.

I believe the correct way to handle these cases is implemented in BasicAA, in particular https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1135-L1138 to only keep a power of two factor for non-inbounds and https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1168-L1170 for the final modulus.

nikic: Hm, what happens if ConstOffset itself is negative? Would urem produce correct results in that…

khei4AuthorUnsubmitted

Done

Ah, I mistakenly assumed inbounds! Without inbounds, ConstantOffset could be negative and could overflow!

khei4: Ah, I mistakenly assumed `inbounds`! Without `inbounds`, ConstantOffset could be negative and…

TargetTransformInfo &TTI, TargetTransformInfo &TTI,

nikicUnsubmitted

Not Done

A correctness check missing here is that after looking through all the GEPs, you must arrive back at GV. Otherwise there might be some kind of other operation sitting in between which preserves the underlying object, but where we can't determine the offset. As a test case, one could use a call to llvm.ptrmask for example.

nikic: A correctness check missing here is that after looking through all the GEPs, you must arrive…

khei4AuthorUnsubmitted

Done

Good catch! I haven't known underlying-object preserved operations! I'll add tests!

khei4: Good catch! I haven't known underlying-object preserved operations! I'll add tests!

TargetLibraryInfo &TLI, AliasAnalysis &AA) { TargetLibraryInfo &TLI, AliasAnalysis &AA) {

bool MadeChange = false; bool MadeChange = false;

for (BasicBlock &BB : F) { for (BasicBlock &BB : F) {

// Ignore unreachable basic blocks. // Ignore unreachable basic blocks.

if (!DT.isReachableFromEntry(&BB)) if (!DT.isReachableFromEntry(&BB))

continue; continue;

const DataLayout &DL = F.getParent()->getDataLayout(); const DataLayout &DL = F.getParent()->getDataLayout();

nikicUnsubmitted

Not Done

Can just omit the Offset argument here (it's zero by default).

nikic: Can just omit the Offset argument here (it's zero by default).

nikicUnsubmitted

Not Done

This probably also needs to check that ConstOffset is zero? I don't think using load alignment as stride is right with a non-zero start offset.

nikic: This probably also needs to check that ConstOffset is zero? I don't think using load alignment…

khei4AuthorUnsubmitted

Done

Good catch! Yeah, we should have considered GEP-stride-based and alignment-based separately also for ConstOffset.

khei4: Good catch! Yeah, we should have considered GEP-stride-based and alignment-based separately…

// Walk the block backwards for efficiency. We're matching a chain of // Walk the block backwards for efficiency. We're matching a chain of

// use->defs, so we're more likely to succeed by starting from the bottom. // use->defs, so we're more likely to succeed by starting from the bottom.

// Also, we want to avoid matching partial patterns. // Also, we want to avoid matching partial patterns.

// TODO: It would be more efficient if we removed dead instructions // TODO: It would be more efficient if we removed dead instructions

nikicUnsubmitted

Not Done

It would be a bit simpler to keep using (and adding to) ConstOffset here, as we need an APInt anyway.

nikic: It would be a bit simpler to keep using (and adding to) ConstOffset here, as we need an APInt…

// iteratively in this loop rather than waiting until the end. // iteratively in this loop rather than waiting until the end.

for (Instruction &I : make_early_inc_range(llvm::reverse(BB))) { for (Instruction &I : make_early_inc_range(llvm::reverse(BB))) {

MadeChange |= foldAnyOrAllBitsSet(I); MadeChange |= foldAnyOrAllBitsSet(I);

MadeChange |= foldGuardedFunnelShift(I, DT); MadeChange |= foldGuardedFunnelShift(I, DT);

nikicUnsubmitted

Not Done

Do we need the LoadSize adjustment here? Can we instead iterate to ByteOffset < GVSize?

The current LoadSize calculation is not right for pointers, but it seem like we can just drop it entirely.

nikic: Do we need the `LoadSize` adjustment here? Can we instead iterate to `ByteOffset < GVSize`?

khei4AuthorUnsubmitted

Done

Do we need the LoadSize adjustment here? Can we instead iterate to ByteOffset < GVSize?

I think bound-crossing load cannot be excluded, without LoadSize.
like https://alive2.llvm.org/ce/z/jYsLBk
Sorry, I'm not confident with the semantics of this. I'll try to ask in discord.

khei4: > Do we need the LoadSize adjustment here? Can we instead iterate to ByteOffset < GVSize? I…

khei4AuthorUnsubmitted

Done

I removed loadSize in the end

khei4: I removed loadSize in the end

MadeChange |= tryToRecognizePopCount(I); MadeChange |= tryToRecognizePopCount(I);

MadeChange |= tryToFPToSat(I, TTI); MadeChange |= tryToFPToSat(I, TTI);

MadeChange |= tryToRecognizeTableBasedCttz(I); MadeChange |= tryToRecognizeTableBasedCttz(I);

MadeChange |= foldConsecutiveLoads(I, DL, TTI, AA); MadeChange |= foldConsecutiveLoads(I, DL, TTI, AA);

nikicUnsubmitted

Not Done

It's okay to just use 64 as the APInt size in this context. You are currently using the size of the initializer, which will make for a *very* wide integer...

nikic: It's okay to just use 64 as the APInt size in this context. You are currently using the size of…

khei4AuthorUnsubmitted

Done

Thanks! It's embarrassing...

khei4: Thanks! It's embarrassing...

MadeChange |= foldPatternedLoads(I, DL);

// NOTE: This function introduces erasing of the instruction `I`, so it // NOTE: This function introduces erasing of the instruction `I`, so it

// needs to be called at the end of this sequence, otherwise we may make // needs to be called at the end of this sequence, otherwise we may make

// bugs. // bugs.

MadeChange |= foldSqrt(I, TTI, TLI); MadeChange |= foldSqrt(I, TTI, TLI);

} }

// We're done with transforms, so remove dead instructions. // We're done with transforms, so remove dead instructions.

Show All 36 Lines

llvm/test/Transforms/AggressiveInstCombine/patterned-load.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=aggressive-instcombine -S -data-layout="e" \| FileCheck %s --check-prefixes=CHECK,LE		; RUN: opt < %s -passes=aggressive-instcombine -S -data-layout="e" \| FileCheck %s --check-prefixes=CHECK,LE
; RUN: opt < %s -passes=aggressive-instcombine -S -data-layout="E" \| FileCheck %s --check-prefixes=CHECK,BE		; RUN: opt < %s -passes=aggressive-instcombine -S -data-layout="E" \| FileCheck %s --check-prefixes=CHECK,BE


@constarray1 = internal constant [8 x i8] c"\01\00\01\00\01\00\01\00", align 4		@constarray1 = internal constant [8 x i8] c"\01\00\01\00\01\00\01\00", align 4
@constarray2 = internal constant [8 x i8] c"\FF\FF\01\00\01\00\01\00", align 4		@constarray2 = internal constant [8 x i8] c"\FF\FF\01\00\01\00\01\00", align 4

@g = internal constant i32 42		@g = internal constant i32 42
@constptrarray = internal constant [4 x ptr] [ptr @g, ptr @g, ptr @g, ptr @g], align 4		@constptrarray = internal constant [4 x ptr] [ptr @g, ptr @g, ptr @g, ptr @g], align 4

@constpackedstruct = internal constant <{[8 x i8]}> <{[8 x i8] c"\01\00\01\00\01\00\01\00"}>, align 4		@constpackedstruct = internal constant <{[8 x i8]}> <{[8 x i8] c"\01\00\01\00\01\00\01\00"}>, align 4
@conststruct = internal constant {i16, [8 x i8]} {i16 1, [8 x i8] c"\01\00\01\00\01\00\01\00"}, align 4		@conststruct = internal constant {i16, [8 x i8]} {i16 1, [8 x i8] c"\01\00\01\00\01\00\01\00"}, align 4

; TODO: this will be ret i8 1
define i8 @inbounds_gep_load_i8_align2(i64 %idx){		define i8 @inbounds_gep_load_i8_align2(i64 %idx){
; CHECK-LABEL: @inbounds_gep_load_i8_align2(		; CHECK-LABEL: @inbounds_gep_load_i8_align2(
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i8, ptr @constarray1, i64 [[IDX:%.]]		; CHECK-NEXT: ret i8 1
; CHECK-NEXT: [[TMP2:%.*]] = load i8, ptr [[TMP1]], align 2
; CHECK-NEXT: ret i8 [[TMP2]]
;		;
%1 = getelementptr inbounds i8, ptr @constarray1, i64 %idx		%1 = getelementptr inbounds i8, ptr @constarray1, i64 %idx
%2 = load i8, ptr %1, align 2		%2 = load i8, ptr %1, align 2
ret i8 %2		ret i8 %2
}		}

; can't be folded because access with i8 strides is not patterned.		; can't be folded because access with i8 strides is not patterned.
define i8 @inbounds_gep_load_i8_align1(i64 %idx){		define i8 @inbounds_gep_load_i8_align1(i64 %idx){
Show All 19 Lines	;
ret i8 %2		ret i8 %2
}		}

declare ptr @llvm.ptrmask.p0.i64(ptr , i64)		declare ptr @llvm.ptrmask.p0.i64(ptr , i64)

; can't be folded because ptrmask can change ptr, while preserving provenance		; can't be folded because ptrmask can change ptr, while preserving provenance
define i8 @inbounds_gep_load_i8_align2_ptrmasked(i64 %idx, i64 %mask){		define i8 @inbounds_gep_load_i8_align2_ptrmasked(i64 %idx, i64 %mask){
; CHECK-LABEL: @inbounds_gep_load_i8_align2_ptrmasked(		; CHECK-LABEL: @inbounds_gep_load_i8_align2_ptrmasked(
; CHECK-NEXT: [[TMP1:%.]] = call ptr @llvm.ptrmask.p0.i64(ptr @constarray1, i64 [[MASK:%.]])		; CHECK-NEXT: ret i8 1
; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i8, ptr [[TMP1]], i64 [[IDX:%.]]
; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 2
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%1 = call ptr @llvm.ptrmask.p0.i64(ptr @constarray1, i64 %mask)		%1 = call ptr @llvm.ptrmask.p0.i64(ptr @constarray1, i64 %mask)
%2 = getelementptr inbounds i8, ptr %1, i64 %idx		%2 = getelementptr inbounds i8, ptr %1, i64 %idx
%3 = load i8, ptr %2, align 2		%3 = load i8, ptr %2, align 2
ret i8 %3		ret i8 %3
}		}

; TODO: this will be ret i32 65537(LE), 16777472(BE)		; TODO: this will be ret i32 65537(LE), 16777472(BE)
define i32 @inbounds_gep_i16_load_i32_align1(i64 %idx){		define i32 @inbounds_gep_i16_load_i32_align1(i64 %idx){
; CHECK-LABEL: @inbounds_gep_i16_load_i32_align1(		; CHECK-LABEL: @inbounds_gep_i16_load_i32_align1(
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i16, ptr @constarray1, i64 [[IDX:%.]]		; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i16, ptr @constarray1, i64 [[IDX:%.]]
; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1		; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1
; CHECK-NEXT: ret i32 [[TMP2]]		; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%1 = getelementptr inbounds i16, ptr @constarray1, i64 %idx		%1 = getelementptr inbounds i16, ptr @constarray1, i64 %idx
%2 = load i32, ptr %1, align 1		%2 = load i32, ptr %1, align 1
ret i32 %2		ret i32 %2
}		}
		nikicUnsubmitted Not Done Reply Inline Actions I don't get how this one folds with just this patch. If align 1 we have stride 1, but constarray1 needs a stride of 2, no? nikic: I don't get how this one folds with just this patch. If align 1 we have stride 1, but…
		khei4AuthorUnsubmitted Done Reply Inline Actions Ah, sorry seems like I just forgot to commit test changes... khei4: Ah, sorry seems like I just forgot to commit test changes...

; TODO: this will be ret i32 65537(LE), 16777472(BE)		; TODO: this will be ret i32 65537(LE), 16777472(BE)
define i32 @inbounds_gep_i32_load_i32_align8(i64 %idx){		define i32 @inbounds_gep_i32_load_i32_align8(i64 %idx){
; CHECK-LABEL: @inbounds_gep_i32_load_i32_align8(		; CHECK-LABEL: @inbounds_gep_i32_load_i32_align8(
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, ptr @constarray1, i64 [[IDX:%.]]		; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, ptr @constarray1, i64 [[IDX:%.]]
; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 8		; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 8
; CHECK-NEXT: ret i32 [[TMP2]]		; CHECK-NEXT: ret i32 [[TMP2]]
;		;
Show All 11 Lines
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%1 = getelementptr inbounds i16, ptr @constarray2, i64 1		%1 = getelementptr inbounds i16, ptr @constarray2, i64 1
%2 = getelementptr inbounds i32, ptr %1, i64 %idx		%2 = getelementptr inbounds i32, ptr %1, i64 %idx
%3 = load i32, ptr %2, align 4		%3 = load i32, ptr %2, align 4
ret i32 %3		ret i32 %3
}		}

; TODO: this coould be folded into 65537(LE), 16777472(BE)
define i32 @gep_load_i32_align2_const_offset(i64 %idx){		define i32 @gep_load_i32_align2_const_offset(i64 %idx){
; CHECK-LABEL: @gep_load_i32_align2_const_offset(		; LE-LABEL: @gep_load_i32_align2_const_offset(
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i16, ptr @constarray1, i64 -2		; LE-NEXT: ret i32 65537
; CHECK-NEXT: [[TMP2:%.]] = getelementptr [3 x i16], ptr [[TMP1]], i64 [[IDX:%.]]		;
; CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[TMP2]], align 2		; BE-LABEL: @gep_load_i32_align2_const_offset(
; CHECK-NEXT: ret i32 [[TMP3]]		; BE-NEXT: ret i32 16777472
;		;
%1 = getelementptr i16, ptr @constarray1, i64 -2		%1 = getelementptr i16, ptr @constarray1, i64 -2
%2 = getelementptr [3 x i16], ptr %1, i64 %idx		%2 = getelementptr [3 x i16], ptr %1, i64 %idx
%3 = load i32, ptr %2, align 2		%3 = load i32, ptr %2, align 2
ret i32 %3		ret i32 %3
}		}

; can't be folded because if gep is non-inbounds,		; can't be folded because if gep is non-inbounds,
Show All 21 Lines
; CHECK-NEXT: ret i32 [[TMP3]]		; CHECK-NEXT: ret i32 [[TMP3]]
;		;
%1 = getelementptr inbounds ptr, ptr @constptrarray, i64 %idx		%1 = getelementptr inbounds ptr, ptr @constptrarray, i64 %idx
%2 = load ptr, ptr %1, align 4		%2 = load ptr, ptr %1, align 4
%3 = load i32, ptr %2, align 4		%3 = load i32, ptr %2, align 4
ret i32 %3		ret i32 %3
}		}

; TODO: this coould be folded into 65537(LE), 16777472(BE)
define i32 @inbounds_gep_i32_load_i32_align4_packedstruct(i64 %idx){		define i32 @inbounds_gep_i32_load_i32_align4_packedstruct(i64 %idx){
; CHECK-LABEL: @inbounds_gep_i32_load_i32_align4_packedstruct(		; LE-LABEL: @inbounds_gep_i32_load_i32_align4_packedstruct(
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, ptr @constpackedstruct, i64 [[IDX:%.]]		; LE-NEXT: ret i32 65537
; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 4		;
; CHECK-NEXT: ret i32 [[TMP2]]		; BE-LABEL: @inbounds_gep_i32_load_i32_align4_packedstruct(
		; BE-NEXT: ret i32 16777472
;		;
%1 = getelementptr inbounds i32, ptr @constpackedstruct, i64 %idx		%1 = getelementptr inbounds i32, ptr @constpackedstruct, i64 %idx
%2 = load i32, ptr %1, align 4		%2 = load i32, ptr %1, align 4
ret i32 %2		ret i32 %2
}		}

; can't be folded because results are not equal		; can't be folded because results are not equal
define i32 @inbounds_gep_i8_load_i32_align1_packedstruct(i64 %idx){		define i32 @inbounds_gep_i8_load_i32_align1_packedstruct(i64 %idx){
; CHECK-LABEL: @inbounds_gep_i8_load_i32_align1_packedstruct(		; CHECK-LABEL: @inbounds_gep_i8_load_i32_align1_packedstruct(
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i8, ptr @constpackedstruct, i64 [[IDX:%.]]		; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i8, ptr @constpackedstruct, i64 [[IDX:%.]]
; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1		; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1
; CHECK-NEXT: ret i32 [[TMP2]]		; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%1 = getelementptr inbounds i8, ptr @constpackedstruct, i64 %idx		%1 = getelementptr inbounds i8, ptr @constpackedstruct, i64 %idx
%2 = load i32, ptr %1, align 1		%2 = load i32, ptr %1, align 1
ret i32 %2		ret i32 %2
}		}

; TODO: this coould be folded into 65537(LE), 16777472(BE)		; TODO: this coould be folded into 65537(LE), 16777472(BE)
define i32 @inbounds_gep_i32_load_i32_align4_struct_with_const_offset(i64 %idx){		define i32 @inbounds_gep_i32_load_i32_align4_struct_with_const_offset(i64 %idx){
; CHECK-LABEL: @inbounds_gep_i32_load_i32_align4_struct_with_const_offset(		; LE-LABEL: @inbounds_gep_i32_load_i32_align4_struct_with_const_offset(
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, ptr @conststruct, i64 1		; LE-NEXT: ret i32 65537
; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDX:%.]]		;
; CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4		; BE-LABEL: @inbounds_gep_i32_load_i32_align4_struct_with_const_offset(
; CHECK-NEXT: ret i32 [[TMP3]]		; BE-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, ptr @conststruct, i64 1
		; BE-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDX:%.]]
		; BE-NEXT: [[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4
		; BE-NEXT: ret i32 [[TMP3]]
;		;
%1 = getelementptr inbounds i16, ptr @conststruct, i64 1		%1 = getelementptr inbounds i16, ptr @conststruct, i64 1
%2 = getelementptr inbounds i32, ptr %1, i64 %idx		%2 = getelementptr inbounds i32, ptr %1, i64 %idx
%3 = load i32, ptr %2, align 4		%3 = load i32, ptr %2, align 4
ret i32 %3		ret i32 %3
}		}

;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; BE: {{.*}}
; LE: {{.*}}

This is an archive of the discontinued LLVM Phabricator instance.

[AggressiveInstCombine] folding load for constant global patterened arrays and structs by alignmentClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 507385

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

llvm/test/Transforms/AggressiveInstCombine/patterned-load.ll

[AggressiveInstCombine] folding load for constant global patterened arrays and structs by alignment
ClosedPublic