This is an archive of the discontinued LLVM Phabricator instance.

Differential D68414

[SROA] Enhance AggLoadStoreRewriter to rewrite integer load/store if it covers multi fields in original aggregate
Needs Review, Public

Authored by Carrot on Oct 3 2019, 12:07 PM.


Details

Reviewers

chandlerc
arsenm
t.p.northover
hfinkel
cherry
efriedma
fhahn
reames

Summary

The motivating example is:

enum ResultType {
  a, b, c, d, 
  Error,
};

struct Result {
  Result(ResultType type = Error, unsigned hash = 0)
    : type(type), hash(hash) {}
  ResultType type; 
  unsigned hash; 
};

template<typename Function>
inline Result foo(Function function) {
  bool done; 
  Result result;
  std::tie(done, result) = function();
  if (done) return result;
  return Result(Error);
}

int main(int argc, char** argv) {
  auto function = [] { return std::make_tuple(false, Result()); };
  Result result = foo(function);
  return int(result.type);
}

When compiled with libc++, LLVM generates:

movb    $0, -16(%rsp)
movq    $4, -12(%rsp)
movq    -16(%rsp), %rcx
movq    %rcx, -16(%rsp)
movl    $4, %eax
testb   %cl, %cl 
je      .LBB0_2

All of the memory accesses are redundant.

The problem is the underlying tuple structure looks like

{i8, {i32, i32}}

Its total size is 96 bits, small enough to be returned in registers, but as a function return value its type is changed to

{i64, i32}

So the temporary alloca that receives the result of the lambda function is written and read as different types. When alloca slices are built from these memory accesses, the slices overlap each other:

Slices of alloca:   %6 = alloca %"struct.std::__u::__tuple_impl", align 8
  [0,8) slice #0
    used by:   store i64 %20, i64* %22
  [0,1) slice #1
    used by:   %31 = load i8, i8* %30, align 8
  [0,12) slice #2 (splittable)
    used by:   call void @llvm.lifetime.end.p0i8(i64 12, i8* %40)
  [0,12) slice #3 (splittable)
    used by:   call void @llvm.lifetime.start.p0i8(i64 12, i8* %12)
  [4,12) slice #4
    used by:   %37 = load i64, i64* %36, align 4
  [8,12) slice #5
    used by:   store i32 %21, i32* %23, align 8

All of the slices are then grouped together into a single one, so no SROA occurs.
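The grouping effect can be illustrated with a small standalone sketch. It is not the partitioning code from SROA.cpp: `ByteRange`, `groupOverlapping` and `exampleSlices` are hypothetical names, and the model assumes that only unsplittable slices force their neighbours into the same group. Fed the slices from the dump above, it produces a single group covering [0,12).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an SROA slice: a half-open byte range.
struct ByteRange {
  uint64_t Begin;
  uint64_t End;
  bool Splittable;
};

// Merge overlapping unsplittable ranges into maximal groups.
// Assumption: splittable uses (the lifetime markers here) never force grouping.
std::vector<ByteRange> groupOverlapping(std::vector<ByteRange> Ranges) {
  Ranges.erase(std::remove_if(Ranges.begin(), Ranges.end(),
                              [](const ByteRange &R) { return R.Splittable; }),
               Ranges.end());
  std::sort(Ranges.begin(), Ranges.end(),
            [](const ByteRange &A, const ByteRange &B) {
              return A.Begin < B.Begin;
            });
  std::vector<ByteRange> Groups;
  for (const ByteRange &R : Ranges) {
    assert(R.Begin < R.End && "ranges must be non-empty");
    if (!Groups.empty() && R.Begin < Groups.back().End)
      Groups.back().End = std::max(Groups.back().End, R.End); // extend group
    else
      Groups.push_back(R); // start a new group
  }
  return Groups;
}

// The six slices listed in the dump above, in the same order.
std::vector<ByteRange> exampleSlices() {
  return {{0, 8, false}, {0, 1, false}, {0, 12, true},
          {0, 12, true}, {4, 12, false}, {8, 12, false}};
}
```

Because the i64 store at [0,8) and the i64 load at [4,12) overlap, every other unsplittable slice is pulled into the same group, which matches the behaviour described above.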

This patch solves the problem by splitting integer loads/stores that cover multiple fields of the alloca aggregate when those fields have different parent structures. In the following example

{i32, {i32, i32}}

%ptrval = ptrtoint %struct.ptr* %ptr to i64
%ptrval2 = add i64 %ptrval, 4
%ptr1 = inttoptr i64 %ptrval to i64*
%ptr2 = inttoptr i64 %ptrval2 to i64*
%val1 = load i64, i64* %ptr1, align 4
%val2 = load i64, i64* %ptr2, align 4

The first 64-bit load will be rewritten as two 32-bit loads because it actually accesses two fields of the original aggregate, and the two fields don't belong to the same inner structure.

The second load won't be rewritten because all fields accessed by the load belong to the same inner structure, which is a common case in LLVM IR.
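The rule described above can be sketched in standalone C++. This is a simplified model under stated assumptions, not the patch's implementation: `Field`, its `Parent` index, `coversFieldsOfDifferentParents` and `exampleFields` are hypothetical names, and the field table for `{i32, {i32, i32}}` is written by hand rather than derived from a type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flattened view of one scalar field of an aggregate.
struct Field {
  uint64_t Offset; // byte offset within the alloca
  uint64_t Size;   // size in bytes
  int Parent;      // index of the innermost enclosing struct (illustrative)
};

// Returns true if the access [Begin, End) touches fields whose innermost
// enclosing structs differ, i.e. the case the summary proposes to split.
bool coversFieldsOfDifferentParents(const std::vector<Field> &Fields,
                                    uint64_t Begin, uint64_t End) {
  assert(Begin < End && "access must be non-empty");
  int FirstParent = -1;
  for (const Field &F : Fields) {
    bool Overlaps = F.Offset < End && Begin < F.Offset + F.Size;
    if (!Overlaps)
      continue;
    if (FirstParent == -1)
      FirstParent = F.Parent; // remember the first parent we see
    else if (F.Parent != FirstParent)
      return true; // a second, different parent: candidate for splitting
  }
  return false;
}

// Hand-written field table for {i32, {i32, i32}}:
// parent 0 is the outer struct, parent 1 is the inner struct.
std::vector<Field> exampleFields() {
  return {{0, 4, 0}, {4, 4, 1}, {8, 4, 1}};
}
```

With this table, the load at bytes [0,8) is reported as a split candidate and the load at bytes [4,12) is not, mirroring the two loads in the example.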

Diff Detail

Event Timeline

Carrot created this revision. Oct 3 2019, 12:07 PM

Herald added a project: Restricted Project. Oct 3 2019, 12:07 PM

Herald added a subscriber: llvm-commits.

Carrot added reviewers: chandlerc, arsenm. Oct 8 2019, 3:53 PM

Herald added a subscriber: wdng. Oct 8 2019, 3:53 PM

ping

ping^2

lebedev.ri edited the summary of this revision. Oct 24 2019, 2:49 PM

Carrot added a reviewer: t.p.northover. Oct 29 2019, 9:57 AM

Carrot added a reviewer: hfinkel. Nov 5 2019, 9:26 AM

ping

lkail added a subscriber: lkail. Nov 15 2019, 3:12 PM

Carrot added a reviewer: reames. Dec 20 2019, 4:01 PM

Carrot added a reviewer: cherry. Dec 30 2019, 3:07 PM

Could anybody help to review this last decade's patch?
Thanks a lot!

arsenm added inline comments. Feb 13 2020, 4:49 PM

lib/Transforms/Scalar/SROA.cpp
3470	(On Diff #223063)	Unchecked dyn_cast; just use cast.
3550	(On Diff #223063)	Unchecked dyn_cast.
test/Transforms/SROA/split-integer.ll
15	(On Diff #223063)	Test should be smaller, check more, and use named values.

Carrot updated this revision to Diff 246850. Feb 26 2020, 3:42 PM

Carrot marked 4 inline comments as done.

Herald added a subscriber: hiraditya. Feb 26 2020, 3:42 PM

Carrot added inline comments. Feb 26 2020, 3:43 PM

test/Transforms/SROA/split-integer.ll
15	(On Diff #223063)	The test is simplified. The IR after SROA is quite redundant and will soon be cleaned up by the following passes. Future enhancements to SROA can also change it, so the IR after SROA is not very stable; other IR changes are not a regression as long as there is no alloca/store/load. So I prefer to check alloca/store/load only.

ping

reames resigned from this revision. Mar 25 2020, 11:10 AM

ping

xbolva00 added reviewers: efriedma, fhahn. Apr 1 2020, 5:12 PM

My biggest concern with this patch is that you're blindly rewriting based on the type of the alloca. That type isn't always reliable, particularly when you're dealing with unions. And even in cases where it is reliable, it might not reflect the actual operations the code ends up using in practice. I'm concerned you'll end up preemptively splitting operations that didn't need to be split, and hurt performance by doing that.

If we're going to do a transform like this, I'd like to see the following:

Base the transform on the SROA slices rather than the IR type. IR types of memory are not trustworthy; the true shape should be based on actual usage.
Specifically target memory operations where splitting them would unblock mem2reg. Otherwise it isn't obvious the transform is profitable. Partitions which can't be transformed to registers aren't that common, but various operations can trigger this: volatile operations, operations behind a select, etc.

llvm/lib/Transforms/Scalar/SROA.cpp
3471	Rearranging functions like this makes it harder to read the patch.

The most complex part of this patch is avoiding "blindly rewriting", and most of the new data structures are used for this purpose.

Suppose we have the following type definitions and pointers

%inner = type { i32, i32 }
%outer = type { %inner, %inner }

%local = alloca %outer, align 8
%inner_ptr1 = getelementptr inbounds %outer, %outer* %local, i64 0, i32 0, i32 0
%inner_ptr2 = getelementptr inbounds %outer, %outer* %local, i64 0, i32 0, i32 1
%inner_ptr3 = getelementptr inbounds %outer, %outer* %local, i64 0, i32 1, i32 0
%ptr1 = bitcast i32* %inner_ptr1 to i64*
%ptr2 = bitcast i32* %inner_ptr2 to i64*
%ptr3 = bitcast i32* %inner_ptr3 to i64*

We then use these pointers to load 64-bit values.

%load1 = load i64, i64* %ptr1
%load2 = load i64, i64* %ptr2
%load3 = load i64, i64* %ptr3

load1 covers the whole first inner struct and load3 covers the whole second inner struct. Such loads are very common in LLVM (there are many in the LLVM test cases), so they should not be split.
load2 covers the second part of the first inner struct and the first part of the second inner struct, which is very uncommon in LLVM. Later, when the SROA slices are constructed, load1, load2 and load3 all form a single slice, which causes SROA to fail to replace any field of the %local structure. If only load2 is split earlier in AggLoadStoreRewriter, the usual SROA slices can be formed later for load2 and load3 separately.

llvm/lib/Transforms/Scalar/SROA.cpp
3471	This is because the base class is changed from InstVisitor to PtrUseVisitor. These 2 classes have different return types for various visit functions. Fortunately they have similar semantics.

Your reply doesn't really address the most important point. The entire premise of your extractTypeFields is that the type of an alloca is a reliable guide to the way code will access that alloca. That simply is not true: clang cannot generate appropriate LLVM types in some cases. So you'll end up with weird behaviors in many cases. I understand that the patch tries to recognize certain cases, but it still doesn't address the fundamental problem.

Instead of extracting fields from the type, I expect the code to construct "fields" based on actual memory accesses.

llvm/lib/Transforms/Scalar/SROA.cpp
3471	I don't see the relationship between changing the base class and reordering the visit* functions. It should be possible to write the functions in any order for either base class.

In D68414#1978709, @efriedma wrote:

Your reply doesn't really address the most important point. The entire premise of your extractTypeFields is that the type of an alloca is a reliable guide to the way code will access that alloca. That simply is not true: clang cannot generate appropriate LLVM types in some cases. So you'll end up with weird behaviors in many cases.

Could you show me an example?

I understand that the patch tries to recognize certain cases, but it still doesn't address the fundamental problem.

Instead of extracting fields from the type, I expect the code to construct "fields" based on actual memory accesses.

For the following instruction in the attached test case the actual memory access is 64bit integer load. This is exactly the problematic instruction and I want to split it.

%split = load i64, i64* %altptr, align 8

llvm/lib/Transforms/Scalar/SROA.cpp
3471	I didn't reorder any visit* functions. I deleted some visit* functions because PtrUseVisitor has the same default implementations. Phabricator's diff algorithm matches the meaningless "}" and blank lines, which makes the result look strange.

Consider something like the following C testcase:

union U {
  struct A {
    long m1;
    int m2;
  } a;
  struct B {
    int m;
    struct C {
      int z[2];
    } c;
  } b;
};

void f(struct C);
void a(struct C c) {
  union U x = { .b = { 1, c } };
  f(x.b.c);
}

The alloca is generated with a type like "{i64, i32}", but all the accesses happen as if it were "{i32, i64}". If you assume "{i64, i32}" actually describes the memory accesses, you're paying the price for extra splitting with no profit. (In this particular case, we might eventually recover in instcombine, but more generally the optimization is going to be unpredictable.)

llvm/lib/Transforms/Scalar/SROA.cpp
3471	Oh, sorry, you're right, not your fault.

Thank you for the example.

The alloca type is declared as you described:

%union.U = type { %"struct.U::A" }
%"struct.U::A" = type { i64, i32 }

Some related instructions

%x = alloca %union.U, align 8
...
%b2 = bitcast %union.U* %x to %"struct.U::B"*
%agg.tmp.sroa.0.0..sroa_idx = getelementptr inbounds %"struct.U::B", %"struct.U::B"* %b2, i64 0, i32 1
%agg.tmp.sroa.0.0..sroa_cast = bitcast %struct.C* %agg.tmp.sroa.0.0..sroa_idx to i64*
%agg.tmp.sroa.0.0.copyload = load i64, i64* %agg.tmp.sroa.0.0..sroa_cast, align 4, !tbaa.struct !8

My patch doesn't split the last load instruction: the offset of the 64-bit load doesn't match a field in %union.U, so the function isSplittableType conservatively returns false.

But if I change the definition of struct A to

struct D {
  int d1;
  int d2;
};

struct A { 
  struct D m1; 
  int m2; 
} a;

Then it works as you described.

So for union types the alloca type is totally useless.

I will pursue the SROA-slice-based approach you suggested in https://reviews.llvm.org/D68414#1965055.

Thanks a lot!

This version splits overlapped AllocaSlices like

S1   ------
S2      ------

into

S11  ---
S12     ---
S2      ------

So later these slices can be rewritten with scalar operations.
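A minimal standalone sketch of the splitting step pictured above follows. It is an illustration only, not the code from this revision: `Span` and `splitAtOverlap` are hypothetical names, and the sketch assumes that a split happens only when S2 begins strictly inside S1 and ends after it, which is the case drawn in the diagram.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open byte range [Begin, End).
struct Span {
  uint64_t Begin;
  uint64_t End;
};

// If S2 starts strictly inside S1 and runs past its end, cut S1 at S2.Begin
// and return the two pieces (S11, S12). Otherwise return S1 unchanged.
std::vector<Span> splitAtOverlap(Span S1, Span S2) {
  assert(S1.Begin < S1.End && S2.Begin < S2.End && "spans must be non-empty");
  bool PartialOverlap =
      S1.Begin < S2.Begin && S2.Begin < S1.End && S1.End < S2.End;
  if (!PartialOverlap)
    return std::vector<Span>{S1};
  return std::vector<Span>{Span{S1.Begin, S2.Begin},  // S11
                           Span{S2.Begin, S1.End}};   // S12
}
```

For S1 = [0,8) and S2 = [4,12), the sketch yields S11 = [0,4) and S12 = [4,8); S12 then lies entirely inside S2, so the pieces no longer straddle a slice boundary.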

lebedev.ri added a subscriber: lebedev.ri. May 1 2020, 12:34 AM

lebedev.ri added inline comments.

llvm/lib/Transforms/Scalar/SROA.cpp
4142–4143	Ignorant question: is it guaranteed that there isn't a situation like this: /// S1 ------ /// S2 ------ /// S3 ------ I.e., there is no other slice S3 overlapping S2, with different overlap characteristic?

Carrot marked an inline comment as done. May 3 2020, 3:49 PM

Carrot added inline comments.

llvm/lib/Transforms/Scalar/SROA.cpp
4142–4143	Although I think it is extremely unlikely in real-world applications, one can easily write such code in a synthetic test case. Function @test3 in basictest.ll demonstrates this situation; in fact, its slices overlap in even more convoluted ways. With this patch, more alloca space and related load/store instructions are eliminated.

I'm happy with the general approach here; just some questions about the details.

Also, can you gather numbers about how frequently this triggers over some testsuite like the LLVM testsuite or SPEC? I'd like some idea about how large the impact of this is going to be.

llvm/lib/Transforms/Scalar/SROA.cpp
4162	I think this is worst-case O(n^3) over the number of operations in a partition? There's one loop outside the function, and two loops inside the function. I guess it's unlikely to bite in most cases, but it's the sort of thing that can easily blow up to infinite compile-time.
4164	Is it possible that we don't end up actually splitting a partition after we pre-split? Could we try harder to avoid that?
4306	Given this is endian-sensitive, I'd like to see a big-endian testcase.
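To illustrate why the split is endian-sensitive, here is a small standalone sketch. It is not taken from the patch: `Halves` and `splitI64` are hypothetical names, and the sketch only models which 32-bit half of a 64-bit integer ends up at the lower address when a single 8-byte access is replaced by two 4-byte accesses.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit pieces of a 64-bit value, named by their byte offset
// within the original 8-byte access.
struct Halves {
  uint32_t AtOffset0;
  uint32_t AtOffset4;
};

// Decide which half of Value lives at the lower address.
// Little-endian: the low 32 bits come first. Big-endian: the high 32 bits do.
Halves splitI64(uint64_t Value, bool BigEndian) {
  uint32_t Low = static_cast<uint32_t>(Value);
  uint32_t High = static_cast<uint32_t>(Value >> 32);
  // Sanity check: the two halves recombine to the original value.
  assert(((static_cast<uint64_t>(High) << 32) | Low) == Value);
  return BigEndian ? Halves{High, Low} : Halves{Low, High};
}
```

A transform that always treats the piece at offset 0 as the low bits would be correct on little-endian targets only, which is the kind of mistake a big-endian test is meant to catch.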

In D68414#2035503, @efriedma wrote:

I'm happy with the general approach here; just some questions about the details.

Also, can you gather numbers about how frequently this triggers over some testsuite like the LLVM testsuite or SPEC? I'd like some idea about how large the impact of this is going to be.

For the LLVM testsuite, the function presplitOverlappedSlices returns true 114 times; I think most of them are from basictest.ll.

llvm/lib/Transforms/Scalar/SROA.cpp
4162	I limit the number of calls to this function to some constant.
4164	This function only presplits slices, it doesn't split partitions. The partitions are only constructed on demand when we request an iterator from AllocaSlices like: for (auto &P : AS.partitions())

Carrot updated this revision to Diff 264140. May 14 2020, 7:07 PM

ping

For the LLVM testsuite, the function presplitOverlappedSlices returns true 114 times; I think most of them are from basictest.ll.

I mean the LLVM testsuite, not the regression tests. (http://llvm.org/docs/TestSuiteGuide.html). I'm trying to get an idea how wide the impact is on general C/C++ code. I'm not expecting this to trigger very frequently, but I want to make sure my intuition is correct.

llvm/lib/Transforms/Scalar/SROA.cpp
4141	`static const int MaxPresplitIterations = 128;`
4164	I wasn't trying to ask about the partitions data structure. I meant, are there cases where we do the transform, but then SROA can't do any further transforms? So instead of a single load, we have two loads and some bit manipulation that doesn't simplify? I'd like to avoid splitting the load/store instruction if it's not actually going to help.
4356	"been"

compile-time results: https://llvm-compile-time-tracker.com/compare.php?from=e616a4259889b55ed1bf5bf095f0e59658c6e311&to=0a2a92f815130c9ed2f0fed11850079bbd55038e&stat=instructions

As of vanilla llvm test-suite + RawSpeed, presplitOverlappedSlices() fires 629240 times, and succeeds 3 (three) times.
I would say that it should either succeed more, or cost less :)

|             statistic name             |  baseline |  proposed |    Δ   |    %   | \|%\| |
|:--------------------------------------:|:---------:|:---------:|:------:|:------:|:-----:|
| sroa.NumPresplitAttempted              |         0 |    629240 | 629240 |  0.00% | 0.00% |
| sroa.NumPresplitSuccess                |         0 |         3 |      3 |  0.00% | 0.00% |
| correlated-value-propagation.NumShlNUW |      4210 |      4209 |     -1 | -0.02% | 0.02% |
| correlated-value-propagation.NumShlNW  |      6274 |      6273 |     -1 | -0.02% | 0.02% |
| mem2reg.NumLocalPromoted               |      6245 |      6246 |      1 |  0.02% | 0.02% |
| correlated-value-propagation.NumNUW    |     15086 |     15085 |     -1 | -0.01% | 0.01% |
| sroa.MaxPartitionsPerAlloca            |     11933 |     11934 |      1 |  0.01% | 0.01% |
| stack-coloring.StackSlotMerged         |     12160 |     12159 |     -1 | -0.01% | 0.01% |
| SLP.NumVectorInstructions              |     34625 |     34626 |      1 |  0.00% | 0.00% |
| asm-printer.EmittedInsts               |   7936899 |   7936895 |     -4 |  0.00% | 0.00% |
| assembler.EmittedDataFragments         |   2500398 |   2500399 |      1 |  0.00% | 0.00% |
| assembler.EmittedFillFragments         |    423471 |    423472 |      1 |  0.00% | 0.00% |
| assembler.EmittedFragments             |   5157859 |   5157861 |      2 |  0.00% | 0.00% |
| assembler.FragmentLayouts              |  12143832 |  12143834 |      2 |  0.00% | 0.00% |
| assembler.ObjectBytes                  | 254675544 | 254675584 |     40 |  0.00% | 0.00% |
| assembler.evaluateFixup                |   7937674 |   7937675 |      1 |  0.00% | 0.00% |
| assume-queries.NumAssumeQueries        |   8436268 |   8436587 |    319 |  0.00% | 0.00% |
| basicaa.SearchTimes                    |  66366214 |  66366216 |      2 |  0.00% | 0.00% |
| bdce.NumRemoved                        |     43590 |     43589 |     -1 |  0.00% | 0.00% |
| codegenprepare.NumCastUses             |    375363 |    375361 |     -2 |  0.00% | 0.00% |
| codegenprepare.NumGEPsElim             |    106610 |    106609 |     -1 |  0.00% | 0.00% |
| correlated-value-propagation.NumNW     |     25516 |     25515 |     -1 |  0.00% | 0.00% |
| dagcombine.NodesCombined               |   3881288 |   3881285 |     -3 |  0.00% | 0.00% |
| dse.NumDomMemDefChecks                 |   3131956 |   3131955 |     -1 |  0.00% | 0.00% |
| dse.NumGetDomMemoryDefPassed           |   1084707 |   1084706 |     -1 |  0.00% | 0.00% |
| dse.NumRemainingStores                 |    846110 |    846108 |     -2 |  0.00% | 0.00% |
| early-cse.NumCSE                       |   2188895 |   2188891 |     -4 |  0.00% | 0.00% |
| early-cse.NumSimplify                  |    542909 |    542924 |     15 |  0.00% | 0.00% |
| gvn.NumGVNInstr                        |    325697 |    325693 |     -4 |  0.00% | 0.00% |
| gvn.NumGVNLoad                         |     76337 |     76336 |     -1 |  0.00% | 0.00% |
| gvn.NumGVNSimpl                        |     96093 |     96090 |     -3 |  0.00% | 0.00% |
| instcombine.NumCombined                |   3674269 |   3674267 |     -2 |  0.00% | 0.00% |
| instcombine.NumSunkInst                |     63820 |     63817 |     -3 |  0.00% | 0.00% |
| instcombine.NumWorklistIterations      |   2024510 |   2024511 |      1 |  0.00% | 0.00% |
| instcount.NumAllocaInst                |     45896 |     45895 |     -1 |  0.00% | 0.00% |
| instcount.NumBitCastInst               |    607901 |    607898 |     -3 |  0.00% | 0.00% |
| instcount.NumCallInst                  |   1760607 |   1760604 |     -3 |  0.00% | 0.00% |
| instcount.NumGetElementPtrInst         |   1177736 |   1177732 |     -4 |  0.00% | 0.00% |
| instcount.NumLoadInst                  |   1006106 |   1006105 |     -1 |  0.00% | 0.00% |
| instcount.NumStoreInst                 |    706082 |    706079 |     -3 |  0.00% | 0.00% |
| instcount.TotalInsts                   |   8826738 |   8826723 |    -15 |  0.00% | 0.00% |
| instcount.TotalIntegerInsts            |   2263812 |   2263811 |     -1 |  0.00% | 0.00% |
| instcount.TotalIntegerScalarInsts      |   2154781 |   2154780 |     -1 |  0.00% | 0.00% |
| instcount.TotalScalarInsts             |   8163514 |   8163499 |    -15 |  0.00% | 0.00% |
| isel.NumDAGIselRetries                 |  56963972 |  56963901 |    -71 |  0.00% | 0.00% |
| mcexpr.MCExprEvaluate                  |  39299357 |  39299360 |      3 |  0.00% | 0.00% |
| mem2reg.NumSingleStore                 |    556895 |    556897 |      2 |  0.00% | 0.00% |
| memory-builtins.ObjectVisitorArgument  |   1652515 |   1652519 |      4 |  0.00% | 0.00% |
| memory-builtins.ObjectVisitorLoad      |    560185 |    560201 |     16 |  0.00% | 0.00% |
| post-RA-sched.NumFixedAnti             |     52447 |     52446 |     -1 |  0.00% | 0.00% |
| post-RA-sched.NumStalls                |   3686107 |   3686108 |      1 |  0.00% | 0.00% |
| regalloc.NumAssigned                   |   4117619 |   4117618 |     -1 |  0.00% | 0.00% |
| simplifycfg.NumSimpl                   |    985893 |    985892 |     -1 |  0.00% | 0.00% |
| sroa.NumAllocaPartitionUses            |   3095830 |   3095827 |     -3 |  0.00% | 0.00% |
| sroa.NumAllocaPartitions               |    698974 |    698973 |     -1 |  0.00% | 0.00% |
| sroa.NumAllocasAnalyzed                |    796794 |    796791 |     -3 |  0.00% | 0.00% |
| sroa.NumDeleted                        |   3687318 |   3687311 |     -7 |  0.00% | 0.00% |
| sroa.NumPromoted                       |    689146 |    689149 |      3 |  0.00% | 0.00% |
| stack-coloring.NumMarkerSeen           |    105177 |    105174 |     -3 |  0.00% | 0.00% |
| stack-coloring.StackSpaceSaved         |    383217 |    383205 |    -12 |  0.00% | 0.00% |

llvm/lib/Transforms/Scalar/SROA.cpp
4535	As of vanilla llvm test-suite, `PresplitTimes` is at most ever `2`, so I think `MAX_PRESPLIT_ITERATIONS` could be much lower than `256`.

3? Really? I mean, I guess it would be odd for this pattern to show up outside of unions or ABI lowering, but I thought it would be a little more common than that.

Is this still relevant?

llvm/test/Transforms/SROA/split-integer.ll
17	Use opaque pointers

Herald added a project: Restricted Project. Aug 10 2023, 1:45 PM

Herald added subscribers: StephenFan, arichardson.

Revision Contents

| Path | Size |
|:-----|:-----|
| llvm/include/llvm/Transforms/Scalar/SROA.h | 1 line |
| llvm/lib/Transforms/Scalar/SROA.cpp | 254 lines |
| llvm/test/Transforms/SROA/basictest.ll | 390 lines |
| llvm/test/Transforms/SROA/split-integer-be.ll | 27 lines |
| llvm/test/Transforms/SROA/split-integer.ll | 45 lines |

Diff 264140

llvm/include/llvm/Transforms/Scalar/SROA.h

@@ 117 lines hidden @@ private:
  friend class sroa::AllocaSliceRewriter;
  friend class sroa::SROALegacyPass;

  /// Helper used by both the public run method and by the legacy pass.
  PreservedAnalyses runImpl(Function &F, DominatorTree &RunDT,
  AssumptionCache &RunAC);

  bool presplitLoadsAndStores(AllocaInst &AI, sroa::AllocaSlices &AS);
+ bool presplitOverlappedSlices(AllocaInst &AI, sroa::AllocaSlices &AS);
  AllocaInst *rewritePartition(AllocaInst &AI, sroa::AllocaSlices &AS,
  sroa::Partition &P);
  bool splitAlloca(AllocaInst &AI, sroa::AllocaSlices &AS);
  bool runOnAlloca(AllocaInst &AI);
  void clobberUse(Use &U);
  bool deleteDeadInstructions(SmallPtrSetImpl<AllocaInst *> &DeletedAllocas);
  bool promoteAllocas(Function &F);
  };

  } // end namespace llvm

  #endif // LLVM_TRANSFORMS_SCALAR_SROA_H

llvm/lib/Transforms/Scalar/SROA.cpp

@@ 3,462 lines hidden @@ bool visitAddrSpaceCastInst(AddrSpaceCastInst &ASC) {
  return false;
  }

  bool visitGetElementPtrInst(GetElementPtrInst &GEPI) {
  enqueueUsers(GEPI);
  return false;
  }

  bool visitPHINode(PHINode &PN) {
  enqueueUsers(PN);
  return false;
  }

  bool visitSelectInst(SelectInst &SI) {
  enqueueUsers(SI);
  return false;
  }
@@ 143 lines hidden @@ StructType *SubTy =
  StructType::get(STy->getContext(), makeArrayRef(EI, EE), STy->isPacked());
  const StructLayout *SubSL = DL.getStructLayout(SubTy);
  if (Size != SubSL->getSizeInBytes())
  return nullptr; // The sub-struct doesn't have quite the size needed.

  return SubTy;
  }

+ // For each load/store record the corresponding slice and split positions.
+ struct SplitOffsets {
+ Slice *S;
+ std::vector<uint64_t> Splits;
+ };
+
  /// Pre-split loads and stores to simplify rewriting.
  ///
  /// We want to break up the splittable load+store pairs as much as
  /// possible. This is important to do as a preprocessing step, as once we
  /// start rewriting the accesses to partitions of the alloca we lose the
  /// necessary information to correctly split apart paired loads and stores
  /// which both point into this alloca. The case to consider is something like
  /// the following:
@@ 28 lines hidden @@ bool SROA::presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS) {
  // actually split.
  SmallVector<LoadInst *, 4> Loads;
  SmallVector<StoreInst *, 4> Stores;

  // We need to accumulate the splits required of each load or store where we
  // can find them via a direct lookup. This is important to cross-check loads
  // and stores against each other. We also track the slice so that we can kill
  // all the slices that end up split.
- struct SplitOffsets {
- Slice *S;
- std::vector<uint64_t> Splits;
- };
  SmallDenseMap<Instruction *, SplitOffsets, 8> SplitOffsetsMap;

  // Track loads out of this alloca which cannot, for any reason, be pre-split.
  // This is important as we also cannot pre-split stores of those loads!
  // FIXME: This is all pretty gross. It means that we can be more aggressive
  // in pre-splitting when the load feeding the store happens to come from
  // a separate alloca. Put another way, the effectiveness of SROA would be
  // decreased by a frontend which just concatenated all of its local allocas
@@ 217 lines hidden @@ for (;;) {
  // Append this load onto the list of split loads so we can find it later
  // to rewrite the stores.
  SplitLoads.push_back(PLoad);

  // Now build a new slice for the alloca.
  NewSlices.push_back(
  Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,
  &PLoad->getOperandUse(PLoad->getPointerOperandIndex()),
- /*IsSplittable*/ false));
+ /*IsSplittable*/ true));
  LLVM_DEBUG(dbgs() << " new slice [" << NewSlices.back().beginOffset()
  << ", " << NewSlices.back().endOffset()
  << "): " << *PLoad << "\n");

  // See if we've handled all the splits.
  if (Idx >= Size)
  break;

@@ 132 lines hidden @@ for (;;) {
  StorePartPtrTy, StoreBasePtr->getName() + "."),
  getAdjustedAlignment(SI, PartOffset, DL),
  /*IsVolatile*/ false);

  // Now build a new slice for the alloca.
  NewSlices.push_back(
  Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,
  &PStore->getOperandUse(PStore->getPointerOperandIndex()),
- /*IsSplittable*/ false));
+ /*IsSplittable*/ true));
  LLVM_DEBUG(dbgs() << " new slice [" << NewSlices.back().beginOffset()
  << ", " << NewSlices.back().endOffset()
  << "): " << *PStore << "\n");
  if (!SplitLoads) {
  LLVM_DEBUG(dbgs() << " of split load: " << *PLoad << "\n");
  }

  // See if we've finished all the splits.
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	PromotableAllocas.erase(
llvm::remove_if(		llvm::remove_if(
PromotableAllocas,		PromotableAllocas,
[&](AllocaInst *AI) { return ResplitPromotableAllocas.count(AI); }),		[&](AllocaInst *AI) { return ResplitPromotableAllocas.count(AI); }),
PromotableAllocas.end());		PromotableAllocas.end());

return true;		return true;
}		}

		// Limit the number of times presplitOverlappedSlices is called.
		#define MAX_PRESPLIT_ITERATIONS 128
		efriedma (Not Done): `static const int MaxPresplitIterations = 128;`

		/// Pre-split overlapped AllocaSlices like following to simplify rewriting.
		lebedev.ri (Not Done): Ignorant question: is it guaranteed that there isn't a situation like this:
		/// S1 ------
		/// S2 ------
		/// S3 ------
		I.e., there is no other slice S3 overlapping S2, with different overlap characteristic?
		Carrot (Author, Done): Although I think it is extremely unlikely in real world applications, one can easily write such code in a synthetic test case. Function @test3 in basictest.ll demonstrates this situation, actually it is more crazily overlapped. With this patch more alloca space and related load/store instructions are eliminated.
		///
		/// S1 ------
		/// S2 ------
		///
		/// Here we want to split S1 at the begin offset of S2. So it changes to
		///
		/// S11 ---
		/// S12 ---
		/// S2 ------
		///
		/// \returns true if any changes are made.
		bool SROA::presplitOverlappedSlices(AllocaInst &AI, sroa::AllocaSlices &AS) {
		LLVM_DEBUG(dbgs() << "Pre-splitting overlapped slices\n");

		// Track the loads and stores which are candidates for splitting.
		SmallVector<LoadInst *, 4> Loads;
		SmallVector<StoreInst *, 4> Stores;
		SmallDenseMap<Instruction *, SplitOffsets, 8> SplitOffsetsMap;

		efriedma (Not Done): I think this is worst-case O(n^3) over the number of operations in a partition? There's one loop outside the function, and two loops inside the function. I guess it's unlikely to bite in most cases, but it's the sort of thing that can easily blow up to infinite compile-time.
		Carrot (Author, Done): I limit the number of calls to this function to some constant.
		for (auto &P : AS.partitions()) {
		bool Found = false;
		efriedma (Not Done): Is it possible that we don't end up actually splitting a partition after we pre-split? Could we try harder to avoid that?
		Carrot (Author, Done): This function only presplits slices, it doesn't split partitions. The partitions are only constructed on demand when we request an iterator from AllocaSlices like: `for (auto &P : AS.partitions())`
		efriedma (Not Done): I wasn't trying to ask about the partitions data structure. I meant, are there cases where we do the transform, but then SROA can't do any further transforms? So instead of a single load, we have two loads and some bit manipulation that doesn't simplify? I'd like to avoid splitting the load/store instruction if it's not actually going to help.
		for (Slice &S1 : P) {
		if (!S1.isSplittable())
		continue;
		for (Slice &S2 : P) {
		// We are interested in following case only:
		//
		// S1 ------
		// S2 ------
		if ((S1.beginOffset() >= S2.beginOffset()) ||
		(S1.endOffset() >= S2.endOffset()) ||
		(S1.endOffset() <= S2.beginOffset()))
		continue;

		// Found the overlapped case, record the instruction.
		Instruction *I = cast<Instruction>(S1.getUse()->getUser());
		if (auto *LI = dyn_cast<LoadInst>(I)) {
		assert(!LI->isVolatile() && "Cannot split volatile loads!");
		Loads.push_back(LI);
		} else if (auto *SI = dyn_cast<StoreInst>(I)) {
		if (S1.getUse() != &SI->getOperandUse(SI->getPointerOperandIndex()))
		// Skip stores of pointers.
		continue;
		assert(!SI->isVolatile() && "Cannot split volatile stores!");
		Stores.push_back(SI);
		} else {
		// Other uses cannot be pre-split.
		continue;
		}

		// We can split S1 at the position S2.beginOffset().
		LLVM_DEBUG(dbgs() << " Candidate: " << *I << "\n");
		auto &Offsets = SplitOffsetsMap[I];
		assert(Offsets.Splits.empty());
		Offsets.S = &S1;
		Offsets.Splits.push_back(S2.beginOffset() - S1.beginOffset());

		Found = true;
		break;
		}

		if (Found)
		break;
		}
		}

		// Collect the new slices which we will merge into the alloca slices.
		SmallVector<Slice, 4> NewSlices;
		std::vector<Value *> SplitInsts;
		IRBuilderTy IRB(&AI);
		const DataLayout &DL = AI.getModule()->getDataLayout();

		for (LoadInst *LI : Loads) {
		SplitInsts.clear();

		IntegerType *Ty = cast<IntegerType>(LI->getType());
		uint64_t LoadSize = Ty->getBitWidth() / 8;

		auto &Offsets = SplitOffsetsMap[LI];
		assert(LoadSize == Offsets.S->endOffset() - Offsets.S->beginOffset() &&
		"Slice size should always match load size exactly!");
		uint64_t BaseOffset = Offsets.S->beginOffset();
		Instruction *BasePtr = cast<Instruction>(LI->getPointerOperand());

		auto AS = LI->getPointerAddressSpace();
		IRB.SetInsertPoint(LI);

		assert(Offsets.Splits.size() == 1);
		uint64_t PartOffset = 0, PartSize = Offsets.Splits.front();
		for (int i=0; i<2; i++) {
		auto *PartTy = Type::getIntNTy(Ty->getContext(), PartSize * 8);
		auto *PartPtrTy = PartTy->getPointerTo(AS);
		LoadInst *PLoad = IRB.CreateAlignedLoad(
		PartTy,
		getAdjustedPtr(IRB, DL, BasePtr,
		APInt(DL.getIndexSizeInBits(AS), PartOffset),
		PartPtrTy, BasePtr->getName() + "."),
		getAdjustedAlignment(LI, PartOffset, DL),
		/*IsVolatile*/ false, LI->getName());
		PLoad->copyMetadata(*LI, {LLVMContext::MD_mem_parallel_loop_access,
		LLVMContext::MD_access_group});

		// Record the part load so later we can combine the loaded values into a
		// single integer.
		SplitInsts.push_back(PLoad);

		// Now build a new slice for the alloca.
		NewSlices.push_back(
		Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,
		&PLoad->getOperandUse(PLoad->getPointerOperandIndex()),
		/*IsSplittable*/ true));
		LLVM_DEBUG(dbgs() << " new slice [" << NewSlices.back().beginOffset()
		<< ", " << NewSlices.back().endOffset()
		<< "): " << *PLoad << "\n");

		// Setup the next partition.
		PartOffset = PartSize;
		PartSize = LoadSize - PartSize;
		}

		// Combine 2 loaded value into a single integer.
		Value *V1 = IRB.CreateZExt(SplitInsts[0], Ty, LI->getName() + ".ext.0");
		Value *V2 = IRB.CreateZExt(SplitInsts[1], Ty, LI->getName() + ".ext.1");

		PartSize = Offsets.Splits.front();
		if (DL.isBigEndian()) {
		uint64_t ShAmt = 8 * (LoadSize - PartSize);
		V1 = IRB.CreateShl(V1, ShAmt, LI->getName() + ".shift");
		} else {
		uint64_t ShAmt = 8 * PartSize;
		V2 = IRB.CreateShl(V2, ShAmt, LI->getName() + ".shift");
		}

		Value *V = IRB.CreateOr(V1, V2, LI->getName() + ".or");
		LI->replaceAllUsesWith(V);

		// Mark the original load as dead and kill the original slice.
		DeadInsts.insert(LI);
		Offsets.S->kill();
		}

		for (StoreInst *SI : Stores) {
		SplitInsts.clear();
		IRB.SetInsertPoint(SI);

		auto *V = SI->getValueOperand();
		IntegerType *Ty = cast<IntegerType>(V->getType());
		uint64_t StoreSize = Ty->getBitWidth() / 8;

		auto &Offsets = SplitOffsetsMap[SI];
		assert(StoreSize == Offsets.S->endOffset() - Offsets.S->beginOffset() &&
		"Slice size should always match load size exactly!");
		uint64_t BaseOffset = Offsets.S->beginOffset();
		Instruction *StoreBasePtr = cast<Instruction>(SI->getPointerOperand());

		assert(Offsets.Splits.size() == 1);
		uint64_t PartSize = Offsets.Splits.front();

		// Split the store value into 2 parts.
		auto *LowTy = Type::getIntNTy(Ty->getContext(), PartSize * 8);
		auto *HighTy = Type::getIntNTy(Ty->getContext(),
		(StoreSize - PartSize) * 8);

		efriedma (Done): Given this is endian-sensitive, I'd like to see a big-endian testcase.
		auto *V1 = V;
		auto *V2 = V;
		if (DL.isBigEndian()) {
		uint64_t ShAmt = 8 * (StoreSize - PartSize);
		V1 = IRB.CreateLShr(V1, ShAmt, SI->getName() + ".shift");
		} else {
		uint64_t ShAmt = 8 * PartSize;
		V2 = IRB.CreateLShr(V2, ShAmt, SI->getName() + ".shift");
		}

		V1 = IRB.CreateTrunc(V1, LowTy, SI->getName() + ".trunc.0");
		V2 = IRB.CreateTrunc(V2, HighTy, SI->getName() + ".trunc.1");
		SplitInsts.push_back(V1);
		SplitInsts.push_back(V2);

		// Now we can store the 2 parts.
		auto AS = SI->getPointerAddressSpace();
		uint64_t PartOffset = 0;
		for (int i=0; i<2; i++) {
		Value *SV = SplitInsts[i];
		auto *PartTy = SV->getType();
		auto *StorePartPtrTy = PartTy->getPointerTo(AS);

		StoreInst *PStore = IRB.CreateAlignedStore(SV,
		getAdjustedPtr(IRB, DL, StoreBasePtr,
		APInt(DL.getIndexSizeInBits(AS), PartOffset),
		StorePartPtrTy, StoreBasePtr->getName() + "."),
		getAdjustedAlignment(SI, PartOffset, DL),
		/*IsVolatile*/ false);

		// Build a new slice for the alloca.
		NewSlices.push_back(
		Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize,
		&PStore->getOperandUse(PStore->getPointerOperandIndex()),
		/*IsSplittable*/ true));
		LLVM_DEBUG(dbgs() << " new slice [" << NewSlices.back().beginOffset()
		<< ", " << NewSlices.back().endOffset()
		<< "): " << *PStore << "\n");

		// Setup the next part.
		PartOffset = PartSize;
		PartSize = StoreSize - PartSize;
		}

		// Mark the original store as dead and kill the original slice.
		DeadInsts.insert(SI);
		Offsets.S->kill();
		}

		// Remove the killed slices that have ben pre-split.
		efriedma (Not Done): "been"
		AS.erase(llvm::remove_if(AS, [](const Slice &S) { return S.isDead(); }),
		AS.end());

		// Insert our new slices. This will sort and merge them into the sorted
		// sequence.
		AS.insert(NewSlices);

		LLVM_DEBUG(dbgs() << " Pre-split slices:\n");
		#ifndef NDEBUG
		for (auto I = AS.begin(), E = AS.end(); I != E; ++I)
		LLVM_DEBUG(AS.print(dbgs(), I, " "));
		#endif

		return SplitOffsetsMap.size() > 0;
		}
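
To make the slice filter and the endian-dependent shifts in presplitOverlappedSlices easier to follow, here is a minimal standalone sketch that models them on plain integers. It is not part of the patch and uses no LLVM APIs: `ByteRange`, `partiallyOverlaps`, `splitValue`, and `joinValue` are hypothetical names, and the value is assumed to be exactly 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for an SROA slice: a half-open byte range [Begin, End).
struct ByteRange {
  uint64_t Begin;
  uint64_t End;
};

// Same condition as the filter in the patch, written positively: S1 starts
// before S2 and ends strictly inside S2, so the two ranges partially overlap.
bool partiallyOverlaps(const ByteRange &S1, const ByteRange &S2) {
  return S1.Begin < S2.Begin && S1.End < S2.End && S1.End > S2.Begin;
}

// Split a 64-bit value into the part that lives at the lower address (first)
// and the part that lives at the higher address (second). LowBytes is the
// byte size of the lower-address part, i.e. the split offset.
std::pair<uint64_t, uint64_t> splitValue(uint64_t V, unsigned LowBytes,
                                         bool BigEndian) {
  assert(LowBytes > 0 && LowBytes < 8 && "split offset must be inside the value");
  unsigned HighBytes = 8 - LowBytes;
  uint64_t LowMask = (uint64_t{1} << (8 * LowBytes)) - 1;
  uint64_t HighMask = (uint64_t{1} << (8 * HighBytes)) - 1;
  if (BigEndian) // The most significant bytes sit at the lower address.
    return {(V >> (8 * HighBytes)) & LowMask, V & HighMask};
  // Little endian: the least significant bytes sit at the lower address.
  return {V & LowMask, (V >> (8 * LowBytes)) & HighMask};
}

// Inverse of splitValue: zero-extend both parts, shift one, and OR them.
uint64_t joinValue(uint64_t First, uint64_t Second, unsigned LowBytes,
                   bool BigEndian) {
  assert(LowBytes > 0 && LowBytes < 8 && "split offset must be inside the value");
  if (BigEndian)
    return (First << (8 * (8 - LowBytes))) | Second;
  return First | (Second << (8 * LowBytes));
}
```

For example, splitting 0x1122334455667788 at byte offset 2 gives the parts 0x7788 and 0x112233445566 on a little-endian target, and 0x1122 and 0x334455667788 on a big-endian one. In both cases `joinValue` restores the original value.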

/// Rewrite an alloca partition's users.		/// Rewrite an alloca partition's users.
///		///
/// This routine drives both of the rewriting goals of the SROA pass. It tries		/// This routine drives both of the rewriting goals of the SROA pass. It tries
/// to rewrite uses of an alloca partition to be conducive for SSA value		/// to rewrite uses of an alloca partition to be conducive for SSA value
/// promotion. If the partition needs a new, more refined alloca, this will		/// promotion. If the partition needs a new, more refined alloca, this will
/// build that new alloca, preserving as much type information as possible, and		/// build that new alloca, preserving as much type information as possible, and
/// rewrite the uses of the old alloca to point at the new one and have the		/// rewrite the uses of the old alloca to point at the new one and have the
/// appropriate new offsets. It also evaluates how successful the rewrite was		/// appropriate new offsets. It also evaluates how successful the rewrite was
[... 146 lines not shown ...]		if (AS.begin() == AS.end())
return false;		return false;

unsigned NumPartitions = 0;		unsigned NumPartitions = 0;
bool Changed = false;		bool Changed = false;
const DataLayout &DL = AI.getModule()->getDataLayout();		const DataLayout &DL = AI.getModule()->getDataLayout();

// First try to pre-split loads and stores.		// First try to pre-split loads and stores.
Changed |= presplitLoadsAndStores(AI, AS);		Changed |= presplitLoadsAndStores(AI, AS);
		int PresplitTimes = 0;
		lebedev.ri (Not Done): As of vanilla llvm test-suite, `PresplitTimes` is at most ever `2`, so i think `MAX_PRESPLIT_ITERATIONS` could be much lower than `256`.
		bool LocalChanged = true;
		while (LocalChanged && PresplitTimes < MAX_PRESPLIT_ITERATIONS) {
		LocalChanged = presplitOverlappedSlices(AI, AS);
		Changed |= LocalChanged;
		PresplitTimes++;
		}

// Now that we have identified any pre-splitting opportunities,		// Now that we have identified any pre-splitting opportunities,
// mark loads and stores unsplittable except for the following case.		// mark loads and stores unsplittable except for the following case.
// We leave a slice splittable if all other slices are disjoint or fully		// We leave a slice splittable if all other slices are disjoint or fully
// included in the slice, such as whole-alloca loads and stores.		// included in the slice, such as whole-alloca loads and stores.
// If we fail to split these during pre-splitting, we want to force them		// If we fail to split these during pre-splitting, we want to force them
// to be rewritten into a partition.		// to be rewritten into a partition.
bool IsSorted = true;		bool IsSorted = true;
[... 395 lines not shown ...]
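
The bounded loop around presplitOverlappedSlices earlier in this function is a capped fixed-point iteration. The following standalone sketch illustrates that idea on plain intervals. It is a simplification and not the patch's code: `Interval`, `presplitOnce`, and `presplitToFixpoint` are hypothetical names, and each pass here performs at most one split overall, whereas the patch handles one slice per partition per call.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a slice: a half-open byte range [Begin, End).
struct Interval {
  uint64_t Begin;
  uint64_t End;
};

// One pass: find the first pair (S1, S2) where S1 starts before S2 and ends
// inside it, then split S1 at S2.Begin. Returns true if a split was made.
bool presplitOnce(std::vector<Interval> &Slices) {
  for (std::size_t I = 0; I < Slices.size(); ++I) {
    for (std::size_t J = 0; J < Slices.size(); ++J) {
      // Copy by value: push_back below may reallocate the vector.
      const Interval S1 = Slices[I];
      const Interval S2 = Slices[J];
      if (S1.Begin >= S2.Begin || S1.End >= S2.End || S1.End <= S2.Begin)
        continue;
      Slices[I] = {S1.Begin, S2.Begin};
      Slices.push_back({S2.Begin, S1.End});
      return true;
    }
  }
  return false;
}

// Rerun the pass until nothing changes or the cap is reached.
// Returns the number of calls made to presplitOnce.
int presplitToFixpoint(std::vector<Interval> &Slices, int MaxIterations) {
  int Calls = 0;
  bool Changed = true;
  while (Changed && Calls < MaxIterations) {
    Changed = presplitOnce(Slices);
    ++Calls;
  }
  return Calls;
}
```

With the slices [0,8) and [4,12), the first call splits [0,8) at offset 4 and the second call finds nothing left to split, so the loop stops after two calls. The cap only matters for heavily overlapped inputs, which is the compile-time concern raised in the review.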

llvm/test/Transforms/SROA/basictest.ll

[... 97 lines not shown ...]		L2:
%gepA.bc = bitcast i8* %gepA to i64*		%gepA.bc = bitcast i8* %gepA to i64*
%Z = load i64, i64* %gepA.bc, align 1		%Z = load i64, i64* %gepA.bc, align 1
ret i64 %Z		ret i64 %Z
}		}

; Avoid crashing when load/storing at at different offsets.		; Avoid crashing when load/storing at at different offsets.
define i64 @test2_addrspacecast_gep_offset(i64 %X) {		define i64 @test2_addrspacecast_gep_offset(i64 %X) {
; CHECK-LABEL: @test2_addrspacecast_gep_offset(		; CHECK-LABEL: @test2_addrspacecast_gep_offset(
; CHECK: %A.sroa.0 = alloca [10 x i8]		; CHECK: %A.sroa.1.32.extract.trunc = trunc i64 %X to i48
; CHECK: [[GEP0:%.*]] = getelementptr inbounds [10 x i8], [10 x i8]* %A.sroa.0, i16 0, i16 2		; CHECK-NEXT: %A.sroa.3.32.extract.shift = lshr i64 %X, 48
; CHECK-NEXT: [[GEP1:%.*]] = addrspacecast i8* [[GEP0]] to i64 addrspace(1)*		; CHECK-NEXT: %A.sroa.3.32.extract.trunc = trunc i64 %A.sroa.3.32.extract.shift to i16
; CHECK-NEXT: store i64 %X, i64 addrspace(1)* [[GEP1]], align 1
; CHECK: br		; CHECK: br

; CHECK: [[BITCAST:%.*]] = bitcast [10 x i8]* %A.sroa.0 to i64*		; CHECK: %Z.ext.0 = zext i16 undef to i64
; CHECK: %A.sroa.0.0.A.sroa.0.30.Z = load i64, i64* [[BITCAST]], align 1		; CHECK-NEXT: %Z.ext.1 = zext i48 %A.sroa.1.32.extract.trunc to i64
		; CHECK-NEXT: %Z.shift = shl i64 %Z.ext.1, 16
		; CHECK-NEXT: %Z.or = or i64 %Z.ext.0, %Z.shift
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%A = alloca [256 x i8]		%A = alloca [256 x i8]
%B = addrspacecast [256 x i8]* %A to i64 addrspace(1)*		%B = addrspacecast [256 x i8]* %A to i64 addrspace(1)*
%gepA = getelementptr [256 x i8], [256 x i8]* %A, i16 0, i16 30		%gepA = getelementptr [256 x i8], [256 x i8]* %A, i16 0, i16 30
%gepB = getelementptr i64, i64 addrspace(1)* %B, i16 4		%gepB = getelementptr i64, i64 addrspace(1)* %B, i16 4
store i64 %X, i64 addrspace(1)* %gepB, align 1		store i64 %X, i64 addrspace(1)* %gepB, align 1
br label %L2		br label %L2

L2:		L2:
%gepA.bc = bitcast i8* %gepA to i64*		%gepA.bc = bitcast i8* %gepA to i64*
%Z = load i64, i64* %gepA.bc, align 1		%Z = load i64, i64* %gepA.bc, align 1
ret i64 %Z		ret i64 %Z
}		}

define void @test3(i8* %dst, i8* align 8 %src) {		define void @test3(i8* %dst, i8* align 8 %src) {
; CHECK-LABEL: @test3(		; CHECK-LABEL: @test3(

entry:		entry:
%a = alloca [300 x i8]		%a = alloca [300 x i8]
; CHECK-NOT: alloca		; CHECK-NOT: alloca
; CHECK: %[[test3_a1:.*]] = alloca [42 x i8]		; CHECK: %[[test3_a1:.*]] = alloca [42 x i8]
; CHECK-NEXT: %[[test3_a2:.*]] = alloca [99 x i8]		; CHECK-NEXT: %[[test3_a2:.*]] = alloca [99 x i8]
; CHECK-NEXT: %[[test3_a3:.*]] = alloca [16 x i8]
; CHECK-NEXT: %[[test3_a4:.*]] = alloca [42 x i8]		; CHECK-NEXT: %[[test3_a4:.*]] = alloca [42 x i8]
; CHECK-NEXT: %[[test3_a5:.*]] = alloca [7 x i8]
; CHECK-NEXT: %[[test3_a6:.*]] = alloca [7 x i8]
; CHECK-NEXT: %[[test3_a7:.*]] = alloca [85 x i8]		; CHECK-NEXT: %[[test3_a7:.*]] = alloca [85 x i8]

%b = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 0		%b = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 0
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %b, i8* align 8 %src, i32 300, i1 false), !tbaa !0		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %b, i8* align 8 %src, i32 300, i1 false), !tbaa !0
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a1]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a1]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 8 %src, i32 42, {{.*}}), !tbaa [[TAG_0:!.*]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 8 %src, i32 42, {{.*}}), !tbaa [[TAG_0:!.*]]
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %src, i64 42		; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %src, i64 42
; CHECK-NEXT: %[[test3_r1:.*]] = load i8, i8* %[[gep]], {{.*}}, !tbaa [[TAG_0]]		; CHECK-NEXT: %[[test3_r1:.*]] = load i8, i8* %[[gep]], {{.*}}, !tbaa [[TAG_0]]
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 43		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 43
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [99 x i8], [99 x i8]* %[[test3_a2]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [99 x i8], [99 x i8]* %[[test3_a2]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 99, {{.*}}), !tbaa [[TAG_0:!.*]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 99, {{.*}}), !tbaa [[TAG_0:!.*]]
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 142		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 142
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 0		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 2, !tbaa [[TAG_0]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 2 %[[gep_src]], i32 16, {{.*}}), !tbaa [[TAG_0:!.*]]		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 143
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 144
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 8, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 145
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 146
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 2, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 147
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 148
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 4, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 149
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 150
		; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep_src]] to i64*
		; CHECK-NEXT: %[[src150:.*]] = load i64, i64* %[[bitcast]], align 2, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[src150_trunc:.*]] = trunc i64 %[[src150]] to i56
		; CHECK-NEXT: %[[src150_trunc_trunc:.*]] = trunc i56 %[[src150_trunc]] to i48
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc:.*]] = trunc i48 %[[src150_trunc_trunc]] to i40
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc_trunc:.*]] = trunc i40 %[[src150_trunc_trunc_trunc]] to i32
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc_trunc_trunc:.*]] = trunc i32 %[[src150_trunc_trunc_trunc_trunc]] to i24
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc_trunc_trunc_trunc:.*]] = trunc i24 %[[src150_trunc_trunc_trunc_trunc_trunc]] to i16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[src150_trunc_trunc_trunc_trunc_trunc_trunc]] to i8
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc_trunc_trunc_trunc_lshr:.*]] = lshr i16 %[[src150_trunc_trunc_trunc_trunc_trunc_trunc]], 8
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[src150_trunc_trunc_trunc_trunc_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc_trunc_trunc_lshr:.*]] = lshr i24 %[[src150_trunc_trunc_trunc_trunc_trunc]], 16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i24 %[[src150_trunc_trunc_trunc_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc_trunc_lshr:.*]] = lshr i32 %[[src150_trunc_trunc_trunc_trunc]], 24
		; CHECK-NEXT: %[[dummy:.*]] = trunc i32 %[[src150_trunc_trunc_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[src150_trunc_trunc_trunc_lshr:.*]] = lshr i40 %[[src150_trunc_trunc_trunc]], 32
		; CHECK-NEXT: %[[dummy:.*]] = trunc i40 %[[src150_trunc_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[src150_trunc_trunc_lshr:.*]] = lshr i48 %[[src150_trunc_trunc]], 40
		; CHECK-NEXT: %[[dummy:.*]] = trunc i48 %[[src150_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[src150_trunc_lshr:.*]] = lshr i56 %[[src150_trunc]], 48
		; CHECK-NEXT: %[[dummy:.*]] = trunc i56 %[[src150_trunc_lshr]] to i8
		; CHECK-NEXT: %[[src150_lshr:.*]] = lshr i64 %[[src150]], 56
		; CHECK-NEXT: %[[dummy:.*]] = trunc i64 %[[src150_lshr]] to i8
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 158		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 158
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a4]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a4]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 2 %[[gep_src]], i32 42, {{.*}}), !tbaa [[TAG_0:!.*]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 2 %[[gep_src]], i32 42, {{.*}}), !tbaa [[TAG_0:!.*]]
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 200		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 200
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 0		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 8, !tbaa [[TAG_0]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 8 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_0:!.*]]		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 201
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 202
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 2, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 203
		; CHECK-NEXT: %[[bc_src:.*]] = bitcast i8* %[[gep_src]] to i32*
		; CHECK-NEXT: %[[i32_203:.*]] = load i32, i32* %[[bc_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[i32_203_trunc:.*]] = trunc i32 %[[i32_203]] to i24
		; CHECK-NEXT: %[[i32_203_trunc_trunc:.*]] = trunc i24 %[[i32_203_trunc]] to i16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i32_203_trunc_trunc]] to i8
		; CHECK-NEXT: %[[i32_203_trunc_trunc_lshr:.*]] = lshr i16 %[[i32_203_trunc_trunc]], 8
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i32_203_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[i32_203_trunc_lshr:.*]] = lshr i24 %[[i32_203_trunc]], 16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i24 %[[i32_203_trunc_lshr]] to i8
		; CHECK-NEXT: %[[i32_203_lshr:.*]] = lshr i32 %[[i32_203]], 24
		; CHECK-NEXT: %[[dummy:.*]] = trunc i32 %[[i32_203_lshr]] to i8
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %src, i64 207		; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %src, i64 207
; CHECK-NEXT: %[[test3_r2:.*]] = load i8, i8* %[[gep]], {{.*}}, !tbaa [[TAG_0]]		; CHECK-NEXT: %[[test3_r2:.*]] = load i8, i8* %[[gep]], {{.*}}, !tbaa [[TAG_0]]
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 208		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 208
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 0		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 8, !tbaa [[TAG_0]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 8 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_0:!.*]]		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 209
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 210
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 2, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 211
		; CHECK-NEXT: %[[bc_src:.*]] = bitcast i8* %[[gep_src]] to i32*
		; CHECK-NEXT: %[[i32_211:.*]] = load i32, i32* %[[bc_src]], align 1, !tbaa [[TAG_0]]
		; CHECK-NEXT: %[[i32_211_trunc:.*]] = trunc i32 %[[i32_211]] to i24
		; CHECK-NEXT: %[[i32_211_trunc_trunc:.*]] = trunc i24 %[[i32_211_trunc]] to i16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i32_211_trunc_trunc]] to i8
		; CHECK-NEXT: %[[i32_211_trunc_trunc_lshr:.*]] = lshr i16 %[[i32_211_trunc_trunc]], 8
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i32_211_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[i32_211_trunc_lshr:.*]] = lshr i24 %[[i32_211_trunc]], 16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i24 %[[i32_211_trunc_lshr]] to i8
		; CHECK-NEXT: %[[i32_211_lshr:.*]] = lshr i32 %[[i32_211]], 24
		; CHECK-NEXT: %[[dummy:.*]] = trunc i32 %[[i32_211_lshr]] to i8
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 215		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 215
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [85 x i8], [85 x i8]* %[[test3_a7]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [85 x i8], [85 x i8]* %[[test3_a7]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 85, {{.*}}), !tbaa [[TAG_0:!.*]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 85, {{.*}}), !tbaa [[TAG_0:!.*]]

; Clobber a single element of the array, this should be promotable, and be deleted.		; Clobber a single element of the array, this should be promotable, and be deleted.
%c = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 42		%c = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 42
store i8 0, i8* %c		store i8 0, i8* %c

[... 15 lines not shown ...]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 85, {{.*}}), !tbaa [[TAG_0:!.*]]
%overlap.3.i64 = bitcast i8* %overlap.3.i8 to i64*		%overlap.3.i64 = bitcast i8* %overlap.3.i8 to i64*
%overlap.4.i64 = bitcast i8* %overlap.4.i8 to i64*		%overlap.4.i64 = bitcast i8* %overlap.4.i8 to i64*
%overlap.5.i64 = bitcast i8* %overlap.5.i8 to i64*		%overlap.5.i64 = bitcast i8* %overlap.5.i8 to i64*
%overlap.6.i64 = bitcast i8* %overlap.6.i8 to i64*		%overlap.6.i64 = bitcast i8* %overlap.6.i8 to i64*
%overlap.7.i64 = bitcast i8* %overlap.7.i8 to i64*		%overlap.7.i64 = bitcast i8* %overlap.7.i8 to i64*
%overlap.8.i64 = bitcast i8* %overlap.8.i8 to i64*		%overlap.8.i64 = bitcast i8* %overlap.8.i8 to i64*
%overlap.9.i64 = bitcast i8* %overlap.9.i8 to i64*		%overlap.9.i64 = bitcast i8* %overlap.9.i8 to i64*
store i8 1, i8* %overlap.1.i8, !tbaa !3		store i8 1, i8* %overlap.1.i8, !tbaa !3
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 0
; CHECK-NEXT: store i8 1, i8* %[[gep]], align 1, !tbaa [[TAG_3:!.*]]
store i16 1, i16* %overlap.1.i16, !tbaa !5		store i16 1, i16* %overlap.1.i16, !tbaa !5
; CHECK-NEXT: %[[bitcast:.*]] = bitcast [16 x i8]* %[[test3_a3]] to i16*
; CHECK-NEXT: store i16 1, i16* %[[bitcast]], {{.*}}, !tbaa [[TAG_5:!.*]]
store i32 1, i32* %overlap.1.i32, !tbaa !7		store i32 1, i32* %overlap.1.i32, !tbaa !7
; CHECK-NEXT: %[[bitcast:.*]] = bitcast [16 x i8]* %[[test3_a3]] to i32*
; CHECK-NEXT: store i32 1, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_7:!.*]]
store i64 1, i64* %overlap.1.i64, !tbaa !9		store i64 1, i64* %overlap.1.i64, !tbaa !9
; CHECK-NEXT: %[[bitcast:.*]] = bitcast [16 x i8]* %[[test3_a3]] to i64*
; CHECK-NEXT: store i64 1, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_9:!.*]]
store i64 2, i64* %overlap.2.i64, !tbaa !11		store i64 2, i64* %overlap.2.i64, !tbaa !11
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 1
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 2, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_11:!.*]]
store i64 3, i64* %overlap.3.i64, !tbaa !13		store i64 3, i64* %overlap.3.i64, !tbaa !13
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 2
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 3, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_13:!.*]]
store i64 4, i64* %overlap.4.i64, !tbaa !15		store i64 4, i64* %overlap.4.i64, !tbaa !15
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 3
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 4, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_15:!.*]]
store i64 5, i64* %overlap.5.i64, !tbaa !17		store i64 5, i64* %overlap.5.i64, !tbaa !17
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 4
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 5, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_17:!.*]]
store i64 6, i64* %overlap.6.i64, !tbaa !19		store i64 6, i64* %overlap.6.i64, !tbaa !19
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 5
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 6, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_19:!.*]]
store i64 7, i64* %overlap.7.i64, !tbaa !21		store i64 7, i64* %overlap.7.i64, !tbaa !21
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 6
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 7, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_21:!.*]]
store i64 8, i64* %overlap.8.i64, !tbaa !23		store i64 8, i64* %overlap.8.i64, !tbaa !23
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 7
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 8, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_23:!.*]]
store i64 9, i64* %overlap.9.i64, !tbaa !25		store i64 9, i64* %overlap.9.i64, !tbaa !25
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 8
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i64*
; CHECK-NEXT: store i64 9, i64* %[[bitcast]], {{.*}}, !tbaa [[TAG_25:!.*]]

; Make two sequences of overlapping stores with more gaps and irregularities.		; Make two sequences of overlapping stores with more gaps and irregularities.
%overlap2.1.0.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 200		%overlap2.1.0.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 200
%overlap2.1.1.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 201		%overlap2.1.1.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 201
%overlap2.1.2.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 202		%overlap2.1.2.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 202
%overlap2.1.3.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 203		%overlap2.1.3.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 203

%overlap2.2.0.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 208		%overlap2.2.0.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 208
%overlap2.2.1.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 209		%overlap2.2.1.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 209
%overlap2.2.2.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 210		%overlap2.2.2.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 210
%overlap2.2.3.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 211		%overlap2.2.3.i8 = getelementptr [300 x i8], [300 x i8]* %a, i64 0, i64 211

%overlap2.1.0.i16 = bitcast i8* %overlap2.1.0.i8 to i16*		%overlap2.1.0.i16 = bitcast i8* %overlap2.1.0.i8 to i16*
%overlap2.1.0.i32 = bitcast i8* %overlap2.1.0.i8 to i32*		%overlap2.1.0.i32 = bitcast i8* %overlap2.1.0.i8 to i32*
%overlap2.1.1.i32 = bitcast i8* %overlap2.1.1.i8 to i32*		%overlap2.1.1.i32 = bitcast i8* %overlap2.1.1.i8 to i32*
%overlap2.1.2.i32 = bitcast i8* %overlap2.1.2.i8 to i32*		%overlap2.1.2.i32 = bitcast i8* %overlap2.1.2.i8 to i32*
%overlap2.1.3.i32 = bitcast i8* %overlap2.1.3.i8 to i32*		%overlap2.1.3.i32 = bitcast i8* %overlap2.1.3.i8 to i32*
store i8 1, i8* %overlap2.1.0.i8, !tbaa !27		store i8 1, i8* %overlap2.1.0.i8, !tbaa !27
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 0
; CHECK-NEXT: store i8 1, i8* %[[gep]], align 1, !tbaa [[TAG_27:!.*]]
store i16 1, i16* %overlap2.1.0.i16, !tbaa !29		store i16 1, i16* %overlap2.1.0.i16, !tbaa !29
; CHECK-NEXT: %[[bitcast:.*]] = bitcast [7 x i8]* %[[test3_a5]] to i16*
; CHECK-NEXT: store i16 1, i16* %[[bitcast]], {{.*}}, !tbaa [[TAG_29:!.*]]
store i32 1, i32* %overlap2.1.0.i32, !tbaa !31		store i32 1, i32* %overlap2.1.0.i32, !tbaa !31
; CHECK-NEXT: %[[bitcast:.*]] = bitcast [7 x i8]* %[[test3_a5]] to i32*
; CHECK-NEXT: store i32 1, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_31:!.*]]
store i32 2, i32* %overlap2.1.1.i32, !tbaa !33		store i32 2, i32* %overlap2.1.1.i32, !tbaa !33
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 1
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i32*
; CHECK-NEXT: store i32 2, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_33:!.*]]
store i32 3, i32* %overlap2.1.2.i32, !tbaa !35		store i32 3, i32* %overlap2.1.2.i32, !tbaa !35
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 2
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i32*
; CHECK-NEXT: store i32 3, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_35:!.*]]
store i32 4, i32* %overlap2.1.3.i32, !tbaa !37		store i32 4, i32* %overlap2.1.3.i32, !tbaa !37
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 3
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i32*
; CHECK-NEXT: store i32 4, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_37:!.*]]

%overlap2.2.0.i32 = bitcast i8* %overlap2.2.0.i8 to i32*		%overlap2.2.0.i32 = bitcast i8* %overlap2.2.0.i8 to i32*
%overlap2.2.1.i16 = bitcast i8* %overlap2.2.1.i8 to i16*		%overlap2.2.1.i16 = bitcast i8* %overlap2.2.1.i8 to i16*
%overlap2.2.1.i32 = bitcast i8* %overlap2.2.1.i8 to i32*		%overlap2.2.1.i32 = bitcast i8* %overlap2.2.1.i8 to i32*
%overlap2.2.2.i32 = bitcast i8* %overlap2.2.2.i8 to i32*		%overlap2.2.2.i32 = bitcast i8* %overlap2.2.2.i8 to i32*
%overlap2.2.3.i32 = bitcast i8* %overlap2.2.3.i8 to i32*		%overlap2.2.3.i32 = bitcast i8* %overlap2.2.3.i8 to i32*
store i32 1, i32* %overlap2.2.0.i32, !tbaa !39		store i32 1, i32* %overlap2.2.0.i32, !tbaa !39
; CHECK-NEXT: %[[bitcast:.*]] = bitcast [7 x i8]* %[[test3_a6]] to i32*
; CHECK-NEXT: store i32 1, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_39:!.*]]
store i8 1, i8* %overlap2.2.1.i8, !tbaa !41		store i8 1, i8* %overlap2.2.1.i8, !tbaa !41
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 1
; CHECK-NEXT: store i8 1, i8* %[[gep]], align 1, !tbaa [[TAG_41:!.*]]
store i16 1, i16* %overlap2.2.1.i16, !tbaa !43		store i16 1, i16* %overlap2.2.1.i16, !tbaa !43
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 1
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i16*
; CHECK-NEXT: store i16 1, i16* %[[bitcast]], {{.*}}, !tbaa [[TAG_43:!.*]]
store i32 1, i32* %overlap2.2.1.i32, !tbaa !45		store i32 1, i32* %overlap2.2.1.i32, !tbaa !45
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 1
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i32*
; CHECK-NEXT: store i32 1, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_45:!.*]]
store i32 3, i32* %overlap2.2.2.i32, !tbaa !47		store i32 3, i32* %overlap2.2.2.i32, !tbaa !47
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 2
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i32*
; CHECK-NEXT: store i32 3, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_47:!.*]]
store i32 4, i32* %overlap2.2.3.i32, !tbaa !49		store i32 4, i32* %overlap2.2.3.i32, !tbaa !49
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 3
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i32*
; CHECK-NEXT: store i32 4, i32* %[[bitcast]], {{.*}}, !tbaa [[TAG_49:!.*]]

%overlap2.prefix = getelementptr i8, i8* %overlap2.1.1.i8, i64 -4		%overlap2.prefix = getelementptr i8, i8* %overlap2.1.1.i8, i64 -4
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %overlap2.prefix, i8* %src, i32 8, i1 false), !tbaa !51		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %overlap2.prefix, i8* %src, i32 8, i1 false), !tbaa !51
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a4]], i64 0, i64 39		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a4]], i64 0, i64 39
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %src, i32 3, {{.*}}), !tbaa [[TAG_51:!.*]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %src, i32 3, {{.*}}), !tbaa [[TAG_51:!.*]]
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 3		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 3
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 0		; CHECK-NEXT: %[[i8_3:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_51]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 5, {{.*}}), !tbaa [[TAG_51]]		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 4
		; CHECK-NEXT: %[[i8_4:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_51]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 5
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_51]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 6
		; CHECK-NEXT: %[[bc_src:.*]] = bitcast i8* %[[gep_src]] to i16*
		; CHECK-NEXT: %[[i16_6:.*]] = load i16, i16* %[[bc_src]], align 1, !tbaa [[TAG_51]]
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i16_6]] to i8
		; CHECK-NEXT: %[[i16_6_lshr:.*]] = lshr i16 %[[i16_6]], 8
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i16_6_lshr]] to i8

; Bridge between the overlapping areas		; Bridge between the overlapping areas
call void @llvm.memset.p0i8.i32(i8* %overlap2.1.2.i8, i8 42, i32 8, i1 false), !tbaa !53		call void @llvm.memset.p0i8.i32(i8* %overlap2.1.2.i8, i8 42, i32 8, i1 false), !tbaa !53
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 2
; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* align 1 %[[gep]], i8 42, i32 5, {{.*}}), !tbaa [[TAG_53:!.*]]
; ...promoted i8 store...
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* align 1 %[[gep]], i8 42, i32 2, {{.*}}), !tbaa [[TAG_53]]

; Entirely within the second overlap.		; Entirely within the second overlap.
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %overlap2.2.1.i8, i8* %src, i32 5, i1 false), !tbaa !55		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %overlap2.2.1.i8, i8* %src, i32 5, i1 false), !tbaa !55
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 1		; CHECK-NEXT: %[[i8_0:.*]] = load i8, i8* %src, align 1, !tbaa [[TAG_55:!.*]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep]], i8* align 1 %src, i32 5, {{.*}}), !tbaa [[TAG_55:!.*]]		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 1
		; CHECK-NEXT: %[[dummy:.*]] = load i8, i8* %[[gep_src]], align 1, !tbaa [[TAG_55:!.*]]
		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 2
		; CHECK-NEXT: %[[bc_src:.*]] = bitcast i8* %[[gep_src]] to i24*
		; CHECK-NEXT: %[[i24_2:.*]] = load i24, i24* %[[bc_src]], align 1, !tbaa [[TAG_55]]
		; CHECK-NEXT: %[[i24_2_trunc:.*]] = trunc i24 %[[i24_2]] to i16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i24_2_trunc]] to i8
		; CHECK-NEXT: %[[i24_2_trunc_lshr:.*]] = lshr i16 %[[i24_2_trunc]], 8
		; CHECK-NEXT: %[[dummy:.*]] = trunc i16 %[[i24_2_trunc_lshr]] to i8
		; CHECK-NEXT: %[[i24_2_lshr:.*]] = lshr i24 %[[i24_2]], 16
		; CHECK-NEXT: %[[dummy:.*]] = trunc i24 %[[i24_2_lshr]] to i8

; Trailing past the second overlap.		; Trailing past the second overlap.
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %overlap2.2.2.i8, i8* %src, i32 8, i1 false), !tbaa !57		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %overlap2.2.2.i8, i8* %src, i32 8, i1 false), !tbaa !57
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 2		; CHECK-NEXT: %[[i8_0_210:.*]] = load i8, i8* %src, align 1, !tbaa [[TAG_57:!.*]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep]], i8* align 1 %src, i32 5, {{.*}}), !tbaa [[TAG_57:!.*]]		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 1
		; CHECK-NEXT: %[[bc_src:.*]] = bitcast i8* %[[gep_src]] to i32*
		; CHECK-NEXT: %[[i32_1:.*]] = load i32, i32* %[[bc_src]], align 1, !tbaa [[TAG_57]]
		; CHECK-NEXT: %[[i32_1_trunc:.*]] = trunc i32 %[[i32_1]] to i24
		; CHECK-NEXT: %[[i32_1_trunc_trunc:.*]] = trunc i24 %[[i32_1_trunc]] to i16
		; CHECK-NEXT: %[[i32_1_trunc_trunc_trunc:.*]] = trunc i16 %[[i32_1_trunc_trunc]] to i8
		; CHECK-NEXT: %[[i32_1_trunc_trunc_lshr:.*]] = lshr i16 %[[i32_1_trunc_trunc]], 8
		; CHECK-NEXT: %[[i32_1_trunc_trunc_lshr_trunc:.*]] = trunc i16 %[[i32_1_trunc_trunc_lshr]] to i8
		; CHECK-NEXT: %[[i32_1_trunc_lshr:.*]] = lshr i24 %[[i32_1_trunc]], 16
		; CHECK-NEXT: %[[i32_1_trunc_lshr_trunc:.*]] = trunc i24 %[[i32_1_trunc_lshr]] to i8
		; CHECK-NEXT: %[[i32_1_lshr:.*]] = lshr i32 %[[i32_1]], 24
		; CHECK-NEXT: %[[i32_1_lshr_trunc:.*]] = trunc i32 %[[i32_1_lshr]] to i8
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 5		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds i8, i8* %src, i64 5
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [85 x i8], [85 x i8]* %[[test3_a7]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [85 x i8], [85 x i8]* %[[test3_a7]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 3, {{.*}}), !tbaa [[TAG_57]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 3, {{.*}}), !tbaa [[TAG_57]]

call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %b, i32 300, i1 false), !tbaa !59		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %b, i32 300, i1 false), !tbaa !59
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a1]], i64 0, i64 0		; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a1]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %dst, i8* align 1 %[[gep]], i32 42, {{.*}}), !tbaa [[TAG_59:!.*]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %dst, i8* align 1 %[[gep]], i32 42, {{.*}}), !tbaa [[TAG_59:!.*]]
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %dst, i64 42		; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %dst, i64 42
; CHECK-NEXT: store i8 0, i8* %[[gep]], {{.*}}, !tbaa [[TAG_59]]		; CHECK-NEXT: store i8 0, i8* %[[gep]], {{.*}}, !tbaa [[TAG_59]]
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 43		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 43
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [99 x i8], [99 x i8]* %[[test3_a2]], i64 0, i64 0		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [99 x i8], [99 x i8]* %[[test3_a2]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 99, {{.*}}), !tbaa [[TAG_59]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 99, {{.*}}), !tbaa [[TAG_59]]
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 142		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 142
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [16 x i8], [16 x i8]* %[[test3_a3]], i64 0, i64 0		; CHECK-NEXT: store i8 1, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 16, {{.*}}), !tbaa [[TAG_59]]		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 143
		; CHECK-NEXT: store i8 2, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 144
		; CHECK-NEXT: store i8 3, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 145
		; CHECK-NEXT: store i8 4, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 146
		; CHECK-NEXT: store i8 5, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 147
		; CHECK-NEXT: store i8 6, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 148
		; CHECK-NEXT: store i8 7, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 149
		; CHECK-NEXT: store i8 8, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 150
		; CHECK-NEXT: %[[bc_dst:.*]] = bitcast i8* %[[gep_dst]] to i64*
		; CHECK-NEXT: %[[zero:.*]] = zext i8 0 to i16
		; CHECK-NEXT: %[[zero_shl:.*]] = shl i16 %[[zero]], 8
		; CHECK-NEXT: %[[undef16:.*]] = and i16 undef, 255
		; CHECK-NEXT: %[[low16:.*]] = or i16 %[[undef16]], %[[zero_shl]]
		; CHECK-NEXT: %[[nine:.*]] = zext i8 9 to i16
		; CHECK-NEXT: %[[low16_and:.*]] = and i16 %[[low16]], -256
		; CHECK-NEXT: %[[low16:.*]] = or i16 %[[low16_and]], %[[nine]]
		; CHECK-NEXT: %[[zero:.*]] = zext i8 0 to i24
		; CHECK-NEXT: %[[zero_shl:.*]] = shl i24 %[[zero]], 16
		; CHECK-NEXT: %[[undef24:.*]] = and i24 undef, 65535
		; CHECK-NEXT: %[[high24:.*]] = or i24 %[[undef24]], %[[zero_shl]]
		; CHECK-NEXT: %[[low16_zext:.*]] = zext i16 %[[low16]] to i24
		; CHECK-NEXT: %[[high24_and:.*]] = and i24 %[[high24]], -65536
		; CHECK-NEXT: %[[value24:.*]] = or i24 %[[high24_and]], %[[low16_zext]]
		; CHECK-NEXT: %[[zero:.*]] = zext i8 0 to i32
		; CHECK-NEXT: %[[zero_shl:.*]] = shl i32 %[[zero]], 24
		; CHECK-NEXT: %[[undef32:.*]] = and i32 undef, 16777215
		; CHECK-NEXT: %[[value32:.*]] = or i32 %[[undef32]], %[[zero_shl]]
		; CHECK-NEXT: %[[value24_zext:.*]] = zext i24 %[[value24]] to i32
		; CHECK-NEXT: %[[value32_and:.*]] = and i32 %[[value32]], -16777216
		; CHECK-NEXT: %[[value32:.*]] = or i32 %[[value32_and]], %[[value24_zext]]
		; CHECK-NEXT: %[[zero:.*]] = zext i8 0 to i40
		; CHECK-NEXT: %[[zero_shl:.*]] = shl i40 %[[zero]], 32
		; CHECK-NEXT: %[[undef40:.*]] = and i40 undef, 4294967295
		; CHECK-NEXT: %[[value40:.*]] = or i40 %[[undef40]], %[[zero_shl]]
		; CHECK-NEXT: %[[value32_zext:.*]] = zext i32 %[[value32]] to i40
		; CHECK-NEXT: %[[value40_and:.*]] = and i40 %[[value40]], -4294967296
		; CHECK-NEXT: %[[value40:.*]] = or i40 %[[value40_and]], %[[value32_zext]]
		; CHECK-NEXT: %[[zero:.*]] = zext i8 0 to i48
		; CHECK-NEXT: %[[zero_shl:.*]] = shl i48 %[[zero]], 40
		; CHECK-NEXT: %[[undef48:.*]] = and i48 undef, 1099511627775
		; CHECK-NEXT: %[[value48:.*]] = or i48 %[[undef48]], %[[zero_shl]]
		; CHECK-NEXT: %[[value40_zext:.*]] = zext i40 %[[value40]] to i48
		; CHECK-NEXT: %[[value48_and:.*]] = and i48 %[[value48]], -1099511627776
		; CHECK-NEXT: %[[value48:.*]] = or i48 %[[value48_and]], %[[value40_zext]]
		; CHECK-NEXT: %[[zero:.*]] = zext i8 0 to i56
		; CHECK-NEXT: %[[zero_shl:.*]] = shl i56 %[[zero]], 48
		; CHECK-NEXT: %[[undef56:.*]] = and i56 undef, 281474976710655
		; CHECK-NEXT: %[[value56:.*]] = or i56 %[[undef56]], %[[zero_shl]]
		; CHECK-NEXT: %[[value48_zext:.*]] = zext i48 %[[value48]] to i56
		; CHECK-NEXT: %[[value56_and:.*]] = and i56 %[[value56]], -281474976710656
		; CHECK-NEXT: %[[value56:.*]] = or i56 %[[value56_and]], %[[value48_zext]]
		; CHECK-NEXT: %[[zero:.*]] = zext i8 0 to i64
		; CHECK-NEXT: %[[zero_shl:.*]] = shl i64 %[[zero]], 56
		; CHECK-NEXT: %[[undef64:.*]] = and i64 undef, 72057594037927935
		; CHECK-NEXT: %[[value64:.*]] = or i64 %[[undef64]], %[[zero_shl]]
		; CHECK-NEXT: %[[value56_zext:.*]] = zext i56 %[[value56]] to i64
		; CHECK-NEXT: %[[value64_and:.*]] = and i64 %[[value64]], -72057594037927936
		; CHECK-NEXT: %[[value64:.*]] = or i64 %[[value64_and]], %[[value56_zext]]
		; CHECK-NEXT: store i64 %[[value64]], i64* %[[bc_dst]], align 1, !tbaa [[TAG_59]]
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 158		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 158
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a4]], i64 0, i64 0		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [42 x i8], [42 x i8]* %[[test3_a4]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 42, {{.*}}), !tbaa [[TAG_59]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 42, {{.*}}), !tbaa [[TAG_59]]
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 200		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 200
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a5]], i64 0, i64 0		; CHECK-NEXT: store i8 %[[i8_3]], i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_59]]		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 201
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %dst, i64 207		; CHECK-NEXT: store i8 %[[i8_4]], i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
; CHECK-NEXT: store i8 42, i8* %[[gep]], {{.*}}, !tbaa [[TAG_59]]		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 202
		; CHECK-NEXT: store i8 42, i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 203
		; CHECK-NEXT: %[[bc_dst:.*]] = bitcast i8* %[[gep_dst]] to i32*
		; CHECK-NEXT: %[[byte:.*]] = zext i8 42 to i16
		; CHECK-NEXT: %[[byte_shl:.*]] = shl i16 %[[byte]], 8
		; CHECK-NEXT: %[[undef16:.*]] = and i16 undef, 255
		; CHECK-NEXT: %[[low16:.*]] = or i16 %[[undef16]], %[[byte_shl]]
		; CHECK-NEXT: %[[byte:.*]] = zext i8 42 to i16
		; CHECK-NEXT: %[[low16_and:.*]] = and i16 %[[low16]], -256
		; CHECK-NEXT: %[[low16:.*]] = or i16 %[[low16_and]], %[[byte]]
		; CHECK-NEXT: %[[byte:.*]] = zext i8 42 to i24
		; CHECK-NEXT: %[[byte_shl:.*]] = shl i24 %[[byte]], 16
		; CHECK-NEXT: %[[undef24:.*]] = and i24 undef, 65535
		; CHECK-NEXT: %[[high24:.*]] = or i24 %[[undef24]], %[[byte_shl]]
		; CHECK-NEXT: %[[low16_zext:.*]] = zext i16 %[[low16]] to i24
		; CHECK-NEXT: %[[high24_and:.*]] = and i24 %[[high24]], -65536
		; CHECK-NEXT: %[[value24:.*]] = or i24 %[[high24_and]], %[[low16_zext]]
		; CHECK-NEXT: %[[byte:.*]] = zext i8 42 to i32
		; CHECK-NEXT: %[[byte_shl:.*]] = shl i32 %[[byte]], 24
		; CHECK-NEXT: %[[undef32:.*]] = and i32 undef, 16777215
		; CHECK-NEXT: %[[value32:.*]] = or i32 %[[undef32]], %[[byte_shl]]
		; CHECK-NEXT: %[[value24_zext:.*]] = zext i24 %[[value24]] to i32
		; CHECK-NEXT: %[[value32_and:.*]] = and i32 %[[value32]], -16777216
		; CHECK-NEXT: %[[value32:.*]] = or i32 %[[value32_and]], %[[value24_zext]]
		; CHECK-NEXT: store i32 %[[value32]], i32* %[[bc_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 207
		; CHECK-NEXT: store i8 42, i8* %[[gep_dst]], {{.*}}, !tbaa [[TAG_59]]
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 208		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 208
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test3_a6]], i64 0, i64 0		; CHECK-NEXT: store i8 42, i8* %[[gep_dst]], {{.*}}, !tbaa [[TAG_59]]
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_59]]		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 209
		; CHECK-NEXT: store i8 %[[i8_0]], i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 210
		; CHECK-NEXT: store i8 %[[i8_0_210]], i8* %[[gep_dst]], align 1, !tbaa [[TAG_59]]
		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 211
		; CHECK-NEXT: %[[bc_dst:.*]] = bitcast i8* %[[gep_dst]] to i32*
		; CHECK-NEXT: %[[i32_1_trunc_trunc_lshr_trunc_zext:.*]] = zext i8 %[[i32_1_trunc_trunc_lshr_trunc]] to i16
		; CHECK-NEXT: %[[i32_1_trunc_trunc_lshr_trunc_zext_shl:.*]] = shl i16 %[[i32_1_trunc_trunc_lshr_trunc_zext]], 8
		; CHECK-NEXT: %[[undef16:.*]] = and i16 undef, 255
		; CHECK-NEXT: %[[second_byte:.*]] = or i16 %[[undef16]], %[[i32_1_trunc_trunc_lshr_trunc_zext_shl]]
		; CHECK-NEXT: %[[i32_1_trunc_trunc_trunc_zext:.*]] = zext i8 %[[i32_1_trunc_trunc_trunc]] to i16
		; CHECK-NEXT: %[[masked_second_byte:.*]] = and i16 %[[second_byte]], -256
		; CHECK-NEXT: %[[low16:.*]] = or i16 %[[masked_second_byte]], %[[i32_1_trunc_trunc_trunc_zext]]
		; CHECK-NEXT: %[[i32_1_trunc_lshr_trunc_zext:.*]] = zext i8 %[[i32_1_trunc_lshr_trunc]] to i24
		; CHECK-NEXT: %[[i32_1_trunc_lshr_trunc_zext_shl:.*]] = shl i24 %[[i32_1_trunc_lshr_trunc_zext]], 16
		; CHECK-NEXT: %[[undef24:.*]] = and i24 undef, 65535
		; CHECK-NEXT: %[[third_byte:.*]] = or i24 %[[undef24]], %[[i32_1_trunc_lshr_trunc_zext_shl]]
		; CHECK-NEXT: %[[low16_zext:.*]] = zext i16 %[[low16]] to i24
		; CHECK-NEXT: %[[masked_third_byte:.*]] = and i24 %[[third_byte]], -65536
		; CHECK-NEXT: %[[value24:.*]] = or i24 %[[masked_third_byte]], %[[low16_zext]]
		; CHECK-NEXT: %[[i32_1_lshr_trunc_zext:.*]] = zext i8 %[[i32_1_lshr_trunc]] to i32
		; CHECK-NEXT: %[[high_byte:.*]] = shl i32 %[[i32_1_lshr_trunc_zext]], 24
		; CHECK-NEXT: %[[undef32:.*]] = and i32 undef, 16777215
		; CHECK-NEXT: %[[value32:.*]] = or i32 %[[undef32]], %[[high_byte]]
		; CHECK-NEXT: %[[value24_zext:.*]] = zext i24 %[[value24]] to i32
		; CHECK-NEXT: %[[value32_and:.*]] = and i32 %[[value32]], -16777216
		; CHECK-NEXT: %[[value32:.*]] = or i32 %[[value32_and]], %[[value24_zext]]
		; CHECK-NEXT: store i32 %[[value32]], i32* %[[bc_dst]], align 1, !tbaa [[TAG_59]]
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 215		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 215
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [85 x i8], [85 x i8]* %[[test3_a7]], i64 0, i64 0		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [85 x i8], [85 x i8]* %[[test3_a7]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 85, {{.*}}), !tbaa [[TAG_59]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 85, {{.*}}), !tbaa [[TAG_59]]

ret void		ret void
}		}

define void @test4(i8* %dst, i8* %src) {		define void @test4(i8* %dst, i8* %src) {
; ... 44 lines not shown ...
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [40 x i8], [40 x i8]* %[[test4_a6]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [40 x i8], [40 x i8]* %[[test4_a6]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 40, {{.*}}), !tbaa [[TAG_0]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 40, {{.*}}), !tbaa [[TAG_0]]

%a.src.1 = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 20		%a.src.1 = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 20
%a.dst.1 = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 40		%a.dst.1 = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 40
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %a.dst.1, i8* %a.src.1, i32 10, i1 false), !tbaa !3		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %a.dst.1, i8* %a.src.1, i32 10, i1 false), !tbaa !3
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a4]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a4]], i64 0, i64 0
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a2]], i64 0, i64 0		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a2]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_3]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_3:!.*]]

; Clobber a single element of the array, this should be promotable, and be deleted.		; Clobber a single element of the array, this should be promotable, and be deleted.
%c = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 42		%c = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 42
store i8 0, i8* %c		store i8 0, i8* %c

%a.src.2 = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 50		%a.src.2 = getelementptr [100 x i8], [100 x i8]* %a, i64 0, i64 50
call void @llvm.memmove.p0i8.p0i8.i32(i8* %a.dst.1, i8* %a.src.2, i32 10, i1 false), !tbaa !5		call void @llvm.memmove.p0i8.p0i8.i32(i8* %a.dst.1, i8* %a.src.2, i32 10, i1 false), !tbaa !5
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a4]], i64 0, i64 0		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a4]], i64 0, i64 0
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a5]], i64 0, i64 0		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a5]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_5]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_5:!.*]]

call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %b, i32 100, i1 false), !tbaa !7		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %b, i32 100, i1 false), !tbaa !7
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [20 x i8], [20 x i8]* %[[test4_a1]], i64 0, i64 0		; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds [20 x i8], [20 x i8]* %[[test4_a1]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %dst, i8* align 1 %[[gep]], i32 20, {{.*}}), !tbaa [[TAG_7]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %dst, i8* align 1 %[[gep]], i32 20, {{.*}}), !tbaa [[TAG_7:!.*]]
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %dst, i64 20		; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %dst, i64 20
; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i16*		; CHECK-NEXT: %[[bitcast:.*]] = bitcast i8* %[[gep]] to i16*
; CHECK-NEXT: store i16 %[[test4_r1]], i16* %[[bitcast]], {{.*}}, !tbaa [[TAG_7]]		; CHECK-NEXT: store i16 %[[test4_r1]], i16* %[[bitcast]], {{.*}}, !tbaa [[TAG_7]]
; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %dst, i64 22		; CHECK-NEXT: %[[gep:.*]] = getelementptr inbounds i8, i8* %dst, i64 22
; CHECK-NEXT: store i8 %[[test4_r2]], i8* %[[gep]], {{.*}}, !tbaa [[TAG_7]]		; CHECK-NEXT: store i8 %[[test4_r2]], i8* %[[gep]], {{.*}}, !tbaa [[TAG_7]]
; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 23		; CHECK-NEXT: %[[gep_dst:.*]] = getelementptr inbounds i8, i8* %dst, i64 23
; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a2]], i64 0, i64 0		; CHECK-NEXT: %[[gep_src:.*]] = getelementptr inbounds [7 x i8], [7 x i8]* %[[test4_a2]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_7]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[gep_dst]], i8* align 1 %[[gep_src]], i32 7, {{.*}}), !tbaa [[TAG_7]]
; ... 416 lines not shown ...
; CHECK: %[[srcgep1:.*]] = getelementptr inbounds i8, i8* %src, i64 4		; CHECK: %[[srcgep1:.*]] = getelementptr inbounds i8, i8* %src, i64 4
; CHECK-NEXT: %[[srccast1:.*]] = bitcast i8* %[[srcgep1]] to i32*		; CHECK-NEXT: %[[srccast1:.*]] = bitcast i8* %[[srcgep1]] to i32*
; CHECK-NEXT: %[[srcload:.*]] = load i32, i32* %[[srccast1]], {{.*}}, !tbaa [[TAG_0]]		; CHECK-NEXT: %[[srcload:.*]] = load i32, i32* %[[srccast1]], {{.*}}, !tbaa [[TAG_0]]
; CHECK-NEXT: %[[agep1:.*]] = getelementptr inbounds [34 x i8], [34 x i8]* %[[a]], i64 0, i64 0		; CHECK-NEXT: %[[agep1:.*]] = getelementptr inbounds [34 x i8], [34 x i8]* %[[a]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[agep1]], i8* %src, i32 %size, {{.*}}), !tbaa [[TAG_3]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 1 %[[agep1]], i8* %src, i32 %size, {{.*}}), !tbaa [[TAG_3]]
; CHECK-NEXT: %[[agep2:.*]] = getelementptr inbounds [34 x i8], [34 x i8]* %[[a]], i64 0, i64 0		; CHECK-NEXT: %[[agep2:.*]] = getelementptr inbounds [34 x i8], [34 x i8]* %[[a]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* align 1 %[[agep2]], i8 42, i32 %size, {{.*}}), !tbaa [[TAG_5]]		; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* align 1 %[[agep2]], i8 42, i32 %size, {{.*}}), !tbaa [[TAG_5]]
; CHECK-NEXT: %[[dstcast1:.*]] = bitcast i8* %dst to i32*		; CHECK-NEXT: %[[dstcast1:.*]] = bitcast i8* %dst to i32*
; CHECK-NEXT: store i32 42, i32* %[[dstcast1]], {{.*}}, !tbaa [[TAG_9]]		; CHECK-NEXT: store i32 42, i32* %[[dstcast1]], {{.*}}, !tbaa [[TAG_9:!.*]]
; CHECK-NEXT: %[[dstgep1:.*]] = getelementptr inbounds i8, i8* %dst, i64 4		; CHECK-NEXT: %[[dstgep1:.*]] = getelementptr inbounds i8, i8* %dst, i64 4
; CHECK-NEXT: %[[dstcast2:.*]] = bitcast i8* %[[dstgep1]] to i32*		; CHECK-NEXT: %[[dstcast2:.*]] = bitcast i8* %[[dstgep1]] to i32*
; CHECK-NEXT: store i32 %[[srcload]], i32* %[[dstcast2]], {{.*}}, !tbaa [[TAG_9]]		; CHECK-NEXT: store i32 %[[srcload]], i32* %[[dstcast2]], {{.*}}, !tbaa [[TAG_9]]
; CHECK-NEXT: %[[agep3:.*]] = getelementptr inbounds [34 x i8], [34 x i8]* %[[a]], i64 0, i64 0		; CHECK-NEXT: %[[agep3:.*]] = getelementptr inbounds [34 x i8], [34 x i8]* %[[a]], i64 0, i64 0
; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* align 1 %[[agep3]], i32 %size, {{.*}}), !tbaa [[TAG_11]]		; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* align 1 %[[agep3]], i32 %size, {{.*}}), !tbaa [[TAG_11:!.*]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void

entry:		entry:
%a = alloca [42 x i8]		%a = alloca [42 x i8]
%ptr = getelementptr [42 x i8], [42 x i8]* %a, i32 0, i32 0		%ptr = getelementptr [42 x i8], [42 x i8]* %a, i32 0, i32 0
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %ptr, i8* %src, i32 8, i1 false), !tbaa !0		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %ptr, i8* %src, i32 8, i1 false), !tbaa !0
%ptr2 = getelementptr [42 x i8], [42 x i8]* %a, i32 0, i32 8		%ptr2 = getelementptr [42 x i8], [42 x i8]* %a, i32 0, i32 8
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %ptr2, i8* %src, i32 %size, i1 false), !tbaa !3		call void @llvm.memcpy.p0i8.p0i8.i32(i8* %ptr2, i8* %src, i32 %size, i1 false), !tbaa !3
[1,086 lines not shown]
; CHECK-DAG: [[TYPE_5:!.*]] = !{{{.*}}, !"type_5"}		; CHECK-DAG: [[TYPE_5:!.*]] = !{{{.*}}, !"type_5"}
; CHECK-DAG: [[TAG_5]] = !{[[TYPE_5]], [[TYPE_5]], i64 0, i64 1}		; CHECK-DAG: [[TAG_5]] = !{[[TYPE_5]], [[TYPE_5]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_7:!.*]] = !{{{.*}}, !"type_7"}		; CHECK-DAG: [[TYPE_7:!.*]] = !{{{.*}}, !"type_7"}
; CHECK-DAG: [[TAG_7]] = !{[[TYPE_7]], [[TYPE_7]], i64 0, i64 1}		; CHECK-DAG: [[TAG_7]] = !{[[TYPE_7]], [[TYPE_7]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_9:!.*]] = !{{{.*}}, !"type_9"}		; CHECK-DAG: [[TYPE_9:!.*]] = !{{{.*}}, !"type_9"}
; CHECK-DAG: [[TAG_9]] = !{[[TYPE_9]], [[TYPE_9]], i64 0, i64 1}		; CHECK-DAG: [[TAG_9]] = !{[[TYPE_9]], [[TYPE_9]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_11:!.*]] = !{{{.*}}, !"type_11"}		; CHECK-DAG: [[TYPE_11:!.*]] = !{{{.*}}, !"type_11"}
; CHECK-DAG: [[TAG_11]] = !{[[TYPE_11]], [[TYPE_11]], i64 0, i64 1}		; CHECK-DAG: [[TAG_11]] = !{[[TYPE_11]], [[TYPE_11]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_13:!.*]] = !{{{.*}}, !"type_13"}
; CHECK-DAG: [[TAG_13]] = !{[[TYPE_13]], [[TYPE_13]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_15:!.*]] = !{{{.*}}, !"type_15"}
; CHECK-DAG: [[TAG_15]] = !{[[TYPE_15]], [[TYPE_15]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_17:!.*]] = !{{{.*}}, !"type_17"}
; CHECK-DAG: [[TAG_17]] = !{[[TYPE_17]], [[TYPE_17]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_19:!.*]] = !{{{.*}}, !"type_19"}
; CHECK-DAG: [[TAG_19]] = !{[[TYPE_19]], [[TYPE_19]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_21:!.*]] = !{{{.*}}, !"type_21"}
; CHECK-DAG: [[TAG_21]] = !{[[TYPE_21]], [[TYPE_21]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_23:!.*]] = !{{{.*}}, !"type_23"}
; CHECK-DAG: [[TAG_23]] = !{[[TYPE_23]], [[TYPE_23]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_25:!.*]] = !{{{.*}}, !"type_25"}
; CHECK-DAG: [[TAG_25]] = !{[[TYPE_25]], [[TYPE_25]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_27:!.*]] = !{{{.*}}, !"type_27"}
; CHECK-DAG: [[TAG_27]] = !{[[TYPE_27]], [[TYPE_27]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_29:!.*]] = !{{{.*}}, !"type_29"}
; CHECK-DAG: [[TAG_29]] = !{[[TYPE_29]], [[TYPE_29]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_31:!.*]] = !{{{.*}}, !"type_31"}
; CHECK-DAG: [[TAG_31]] = !{[[TYPE_31]], [[TYPE_31]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_33:!.*]] = !{{{.*}}, !"type_33"}
; CHECK-DAG: [[TAG_33]] = !{[[TYPE_33]], [[TYPE_33]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_35:!.*]] = !{{{.*}}, !"type_35"}
; CHECK-DAG: [[TAG_35]] = !{[[TYPE_35]], [[TYPE_35]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_37:!.*]] = !{{{.*}}, !"type_37"}
; CHECK-DAG: [[TAG_37]] = !{[[TYPE_37]], [[TYPE_37]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_39:!.*]] = !{{{.*}}, !"type_39"}
; CHECK-DAG: [[TAG_39]] = !{[[TYPE_39]], [[TYPE_39]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_41:!.*]] = !{{{.*}}, !"type_41"}
; CHECK-DAG: [[TAG_41]] = !{[[TYPE_41]], [[TYPE_41]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_43:!.*]] = !{{{.*}}, !"type_43"}
; CHECK-DAG: [[TAG_43]] = !{[[TYPE_43]], [[TYPE_43]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_45:!.*]] = !{{{.*}}, !"type_45"}
; CHECK-DAG: [[TAG_45]] = !{[[TYPE_45]], [[TYPE_45]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_47:!.*]] = !{{{.*}}, !"type_47"}
; CHECK-DAG: [[TAG_47]] = !{[[TYPE_47]], [[TYPE_47]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_49:!.*]] = !{{{.*}}, !"type_49"}
; CHECK-DAG: [[TAG_49]] = !{[[TYPE_49]], [[TYPE_49]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_51:!.*]] = !{{{.*}}, !"type_51"}		; CHECK-DAG: [[TYPE_51:!.*]] = !{{{.*}}, !"type_51"}
; CHECK-DAG: [[TAG_51]] = !{[[TYPE_51]], [[TYPE_51]], i64 0, i64 1}		; CHECK-DAG: [[TAG_51]] = !{[[TYPE_51]], [[TYPE_51]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_53:!.*]] = !{{{.*}}, !"type_53"}
; CHECK-DAG: [[TAG_53]] = !{[[TYPE_53]], [[TYPE_53]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_55:!.*]] = !{{{.*}}, !"type_55"}		; CHECK-DAG: [[TYPE_55:!.*]] = !{{{.*}}, !"type_55"}
; CHECK-DAG: [[TAG_55]] = !{[[TYPE_55]], [[TYPE_55]], i64 0, i64 1}		; CHECK-DAG: [[TAG_55]] = !{[[TYPE_55]], [[TYPE_55]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_57:!.*]] = !{{{.*}}, !"type_57"}		; CHECK-DAG: [[TYPE_57:!.*]] = !{{{.*}}, !"type_57"}
; CHECK-DAG: [[TAG_57]] = !{[[TYPE_57]], [[TYPE_57]], i64 0, i64 1}		; CHECK-DAG: [[TAG_57]] = !{[[TYPE_57]], [[TYPE_57]], i64 0, i64 1}
; CHECK-DAG: [[TYPE_59:!.*]] = !{{{.*}}, !"type_59"}		; CHECK-DAG: [[TYPE_59:!.*]] = !{{{.*}}, !"type_59"}
; CHECK-DAG: [[TAG_59]] = !{[[TYPE_59]], [[TYPE_59]], i64 0, i64 1}		; CHECK-DAG: [[TAG_59]] = !{[[TYPE_59]], [[TYPE_59]], i64 0, i64 1}

llvm/test/Transforms/SROA/split-integer-be.ll

This file was added.

				; RUN: opt < %s -sroa -S | FileCheck %s

				target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:32:64"

				; CHECK-LABEL: @split_be
				; CHECK-NOT: alloca
				; CHECK: %[[x_lshr:.*]] = lshr i32 %X, 24
				; CHECK: %[[x_part:.*]] = trunc i32 %[[x_lshr]] to i8
				; CHECK: %[[y_lshr:.*]] = lshr i32 %Y, 8
				; CHECK: %[[y_part:.*]] = trunc i32 %[[y_lshr]] to i24
				; CHECK: %[[x_zext:.*]] = zext i8 %[[x_part]] to i32
				; CHECK-NEXT: %[[y_zext:.*]] = zext i24 %[[y_part]] to i32
				; CHECK-NEXT: %[[x_shl:.*]] = shl i32 %[[x_zext]], 24
				; CHECK-NEXT: %[[result:.*]] = or i32 %[[x_shl]], %[[y_zext]]
				; CHECK-NEXT: ret i32 %[[result]]

				define i32 @split_be(i8* %dst, i32 %X, i32 %Y) {
				%A = alloca [8 x i8]
				%gep1 = getelementptr [8 x i8], [8 x i8]* %A, i16 0, i16 0
				%ptr1 = bitcast i8* %gep1 to i32*
				%gep2 = getelementptr [8 x i8], [8 x i8]* %A, i16 0, i16 1
				%ptr2 = bitcast i8* %gep2 to i32*
				store i32 %X, i32* %ptr1, align 4
				store i32 %Y, i32* %ptr2, align 1
				%res = load i32, i32* %ptr1, align 4
				ret i32 %res
				}
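
The CHECK lines above follow from the big-endian layout requested by the `E` in the datalayout string. The sketch below is a byte-level model of the test, not SROA's algorithm: it replays the two overlapping stores and the load on an 8-byte buffer and compares the result with the shift/truncate/or sequence the test expects. The helper names (`store_be32`, `load_be32`, `split_be_memory`, `split_be_expected`) are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Buffer = std::array<std::uint8_t, 8>;  // models %A = alloca [8 x i8]

// Hypothetical helper: big-endian 32-bit store at byte offset `off`.
void store_be32(Buffer &mem, std::size_t off, std::uint32_t v) {
  for (std::size_t i = 0; i < 4; ++i)
    mem[off + i] = static_cast<std::uint8_t>(v >> (24 - 8 * i));
}

// Hypothetical helper: big-endian 32-bit load from byte offset `off`.
std::uint32_t load_be32(const Buffer &mem, std::size_t off) {
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i)
    v = (v << 8) | mem[off + i];
  return v;
}

// What @split_be does through memory: store X at offset 0,
// store Y at offset 1 (overwriting bytes 1..3 of X), load from offset 0.
std::uint32_t split_be_memory(std::uint32_t x, std::uint32_t y) {
  Buffer a{};
  store_be32(a, 0, x);
  store_be32(a, 1, y);
  return load_be32(a, 0);
}

// What the CHECK lines expect after the alloca is removed.
std::uint32_t split_be_expected(std::uint32_t x, std::uint32_t y) {
  std::uint32_t x_part = (x >> 24) & 0xFFu;      // lshr 24, trunc to i8
  std::uint32_t y_part = (y >> 8) & 0xFFFFFFu;   // lshr 8, trunc to i24
  return (x_part << 24) | y_part;                // zext, shl 24, or
}
```

Under this model, `split_be_memory(0x11223344, 0xAABBCCDD)` and `split_be_expected(0x11223344, 0xAABBCCDD)` both give `0x11AABBCC`: only the most significant byte of `X` survives, followed by the top three bytes of `Y`.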

llvm/test/Transforms/SROA/split-integer.ll

This file was added.

				; RUN: opt < %s -sroa -S | FileCheck %s


				%inner = type { i32, i32 }
				%outer = type { i8, %inner }

				; CHECK-LABEL: @foo
				; CHECK-NOT: alloca
				; CHECK-NOT: store
				; CHECK-NOT: load

				define i64 @foo() {
				entry:
				%tmpstruct1 = alloca %outer, align 8
				%tmpstruct2 = alloca %outer, align 8
				%ptr1 = getelementptr inbounds %outer, %outer* %tmpstruct2, i64 0, i32 0
				store i8 0, i8* %ptr1, align 8
				arsenm (inline comment, unsubmitted, not done): Use opaque pointers.
				%innerptr = getelementptr inbounds %outer, %outer* %tmpstruct2, i64 0, i32 1
				%ptr2 = bitcast %inner* %innerptr to i64*
				store i64 4, i64* %ptr2, align 4
				%altptr = bitcast %outer* %tmpstruct2 to i64*
				%split = load i64, i64* %altptr, align 8
				%construct1 = insertvalue { i64, i32 } undef, i64 %split, 0
				%construct2 = insertvalue { i64, i32 } %construct1, i32 0, 1
				%first64 = extractvalue { i64, i32 } %construct2, 0
				%last32 = extractvalue { i64, i32 } %construct2, 1
				%tmpptr = bitcast %outer* %tmpstruct1 to i64*
				store i64 %first64, i64* %tmpptr
				%lastptr = getelementptr inbounds %outer, %outer* %tmpstruct1, i64 0, i32 1, i32 1
				store i32 %last32, i32* %lastptr, align 8
				%flagptr = getelementptr inbounds %outer, %outer* %tmpstruct1, i64 0, i32 0
				%flag = load i8, i8* %flagptr, align 8
				%structptr = getelementptr inbounds %outer, %outer* %tmpstruct1, i64 0, i32 1
				%valptr = bitcast %inner* %structptr to i64*
				%value = load i64, i64* %valptr, align 4
				%cond = icmp eq i8 %flag, 0
				br i1 %cond, label %true, label %exit

				exit:
				%retv = phi i64 [ 4, %true ], [ %value, %entry ]
				ret i64 %retv

				true:
				br label %exit
				}

Diff 264140

llvm/include/llvm/Transforms/Scalar/SROA.h

llvm/lib/Transforms/Scalar/SROA.cpp

llvm/test/Transforms/SROA/basictest.ll

llvm/test/Transforms/SROA/split-integer-be.ll

llvm/test/Transforms/SROA/split-integer.ll

[SROA] Enhance AggLoadStoreRewriter to rewrite integer load/store if it covers multi fields in original aggregate
Needs ReviewPublic