This is an archive of the discontinued LLVM Phabricator instance.

[AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load.
ClosedPublic

Authored by bipmis on Jun 9 2022, 3:49 AM.

Details

Summary

The patch simplifies the following patterns:

1. (zExt(L1) << shift1) | (zExt(L2) << shift2) -> zExt(L3) << shift1
2. (? | (zExt(L1) << shift1)) | (zExt(L2) << shift2) -> ? | (zExt(L3) << shift1)

These patterns indicate that the loads are being merged into a wider value and that forming the wider value is the only use of the individual loads. In that case, for non-atomic, non-volatile loads, reduce the pattern to a single combined load, which improves the cost modelling for inlining, unrolling, vectorization, etc.
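For illustration, matching one link of such a chain with LLVM's PatternMatch helpers could look roughly like the sketch below (the helper name and the one-use constraints are illustrative, not the patch's actual code):

// Illustrative only: match "zext(load) << C" (or a plain "zext(load)" at the
// bottom of the chain) where every intermediate value has a single use, so the
// whole chain can be replaced by one wider load without leaving stray users.
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

static bool matchShiftedZExtLoad(Value *V, LoadInst *&L, uint64_t &ShAmt) {
  Value *LoadV = nullptr;
  ConstantInt *C = nullptr;
  if (match(V, m_OneUse(m_Shl(m_OneUse(m_ZExt(m_Value(LoadV))),
                              m_ConstantInt(C)))))
    ShAmt = C->getZExtValue();
  else if (match(V, m_OneUse(m_ZExt(m_Value(LoadV)))))
    ShAmt = 0;
  else
    return false;
  L = dyn_cast<LoadInst>(LoadV);
  return L && L->isSimple(); // reject atomic and volatile loads
}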

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden.
bipmis requested review of this revision.Jun 9 2022, 3:49 AM

I'd like to see some form of load combining in IR, but we need to be very careful about how that is implemented.

Note that LLVM had a dedicated pass for load combining, but it caused problems and was removed:
https://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html

So any proposal to do it again should be aware of and avoid those problems. Putting it in InstCombine is controversial because it could interfere with other analysis/passes. We recently added a cost-aware transform to AggressiveInstCombine, so that might be a better fit.

I didn't check all of the tests, but at least the simplest cases should already be handled in DAGCombiner. Can you show a larger, motivating example where that backend transform is too late to get to the ideal code?

@spatel Thanks for reviewing it. The common scenarios in llvm where we see this as an issue involve the cost modelling in the inliner, unroller and vectorizer, which results in sub-optimal code. One simple example is a dot product, where the vectorizer fails to generate vectorized code, as below:
https://godbolt.org/z/Ee4cbf1PG

A similar situation can be seen with the unroller, as in the example below:
https://godbolt.org/z/dWo8n3Yz8

https://godbolt.org/z/Ee4cbf1PG
A similar situation can be seen with the unroller, as in the example below:
https://godbolt.org/z/dWo8n3Yz8

Thanks - those look like good motivating examples. I recommend moving this patch over to AggressiveInstCombine or recreating a limited form of the old LoadCombine pass. Either way, we need to be careful to make sure that the transform does not combine loads illegally and that it doesn't harm other analysis. You might want to start an RFC thread on discourse for more feedback and tag people who commented on earlier proposals to do this kind of transform.

I am in favour of moving the discussion to discourse.

AggressiveInstCombine and target-dependent InstCombine are two independent concepts for me.

RKSimon added inline comments.Jun 16 2022, 8:26 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
17 ↗(On Diff #435486)

clang-format this?

nikic added a subscriber: nikic.Jun 24 2022, 1:58 AM

The main question I see here is whether this needs to be TTI based or not -- if yes, then it also shouldn't be in InstCombine. I think there are two reasons why we might want TTI:

  • Do we want to create unaligned loads? Creating them is legal, but if the target does not support fast unaligned loads, the backend will break them up again. Should we only perform this optimization if TTI.allowsMisalignedMemoryAccesses reports a fast unaligned access?
  • Do we want to create loads with illegal sizes? For example, if we have a 64-bit target, do we want to combine two i64 loads into an i128 load, even though it will later be broken up again? (At least for the current implementation, where both loads must have the same size, the question of whether to allow i24 loads or similar does not come up.)
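For illustration, a conservative gate answering both of these questions might look roughly like the following sketch (assuming TTI, DL, Ctx, LoadSizeInBits, AddrSpace and Alignment are already in scope; the exact allowsMisalignedMemoryAccesses signature has shifted between releases):

// Sketch: bail out unless the target reports the wider, possibly misaligned
// access as fast and the resulting integer width is legal.
bool Fast = false;
bool Allowed = TTI.allowsMisalignedMemoryAccesses(Ctx, LoadSizeInBits,
                                                  AddrSpace, Alignment, &Fast);
if (!Allowed || !Fast)
  return false; // the backend would just split the unaligned load again
if (!DL.isLegalInteger(LoadSizeInBits))
  return false; // e.g. avoid creating an i128 load on a 64-bit target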
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
3004 ↗(On Diff #435486)

nullptr

3017 ↗(On Diff #435486)

Move the dyn_cast into the initialization; having it in the if is hard to read.

3029 ↗(On Diff #435486)

GEPOperator

Also directly dyn_cast in the condition.

3035 ↗(On Diff #435486)

You are probably looking for GEP2->accumulateConstantOffsets() here -- your current code will treat GEPs with multiple indices incorrectly.

3077 ↗(On Diff #435486)

I don't get this check -- you're checking that there is an available value, but not that this value is the previous load. There might still be a clobber in between.

What you're probably looking for is canInstructionRangeModRef() between the two instructions.

3108 ↗(On Diff #435486)

Unnecessary cast

3114 ↗(On Diff #435486)

Why does this not use the builder to create the instruction, only to insert it?

3119 ↗(On Diff #435486)

Uh, what's the point of oring something with zero?

RKSimon added inline comments.Jun 24 2022, 3:26 AM
llvm/test/Transforms/InstCombine/or-load.ll
2 ↗(On Diff #435486)

add little and big endian test coverage

The main question I see here is whether this needs to be TTI based or not -- if yes, then it also shouldn't be in InstCombine. I think there are two reasons why we might want TTI:

  • Do we want to create unaligned loads? Creating them is legal, but if the target does not support fast unaligned loads, the backend will break them up again. Should we only perform this optimization if TTI.allowsMisalignedMemoryAccesses reports a fast unaligned access?
  • Do we want to create loads with illegal sizes? For example, if we have a 64-bit target, do we want to combine two i64 loads into an i128 load, even though it will later be broken up again? (At least for the current implementation, where both loads must have the same size, the question of whether to allow i24 loads or similar does not come up.)

Thanks for the review. I have made most of the other suggested changes and can post a patch for the same. It would also handle some more test cases.

Right, I think we need to be sure on this particular path. If we want a TTI-based approach, would AggressiveInstCombine be a better choice? We also see some use scenarios of TTI in InstCombine, but is ours a valid use case?
The next patch would be in InstCombine for this reason.
I will also test the patch with extended tests to see whether we hit any performance issues and whether the backend splits the loads up okay.

We also see some use scenarios of TTI in InstCombine

I believe there is a strong consensus to say no to TTI in InstCombine.

bipmis added inline comments.Jun 28 2022, 9:21 AM
llvm/test/Transforms/InstCombine/or-load.ll
2 ↗(On Diff #435486)

In IR I believe the checks would be the same for LE and BE. The differences should pop up in the assembly.
https://godbolt.org/z/xboKx47cc

bipmis updated this revision to Diff 441695.Jul 1 2022, 8:15 AM

Handle some of the review comments.
Added support for additional scenarios such as reverse-order loads and big-endian targets.

Moving the implementation to AggressiveInstCombine is still open for discussion, along with the TTI requirement.

spatel added a comment.Jul 1 2022, 9:04 AM

Handle some of the review comments.
Added support for additional scenarios such as reverse-order loads and big-endian targets.

Moving the implementation to AggressiveInstCombine is still open for discussion, along with the TTI requirement.

IMO, we should move this patch to AggressiveInstCombine with conservative TTI limitations to start.

Now that we've heard from several others, I'd summarize as:

  1. There's good motivation to have load combining in IR as a canonicalization.
  2. But it should be limited with TTI (avoid unaligned loads, illegal operations/types).
  3. Adding TTI to InstCombine is a non-starter; we don't want to have that level of target-dependence in IR canonicalization.
  4. AggressiveInstCombine is a lower-weight canonicalization pass that recently gained access to TTI.
  5. Potential follow-ups could relax the TTI constraints and/or allow bswap formation to deal with endian diffs seen in the current tests (we do that type of transform in the backend already).
bipmis added inline comments.Jul 1 2022, 9:07 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
3035 ↗(On Diff #435486)

Currently we are handling GEPs with a single index. With multiple indices we exit at the check "isa<GetElementPtrInst>(Op2)". I will keep an eye on this to see if it can be improved.

3077 ↗(On Diff #435486)

This is basically to check whether there are stores or other memory accesses between the two loads that we need to account for before merging them. The check starts at L2 and walks backwards, doing the memory check until L1.
The alias tests loadCombine_4consecutive_with_alias* should give you the idea.

3119 ↗(On Diff #435486)

Yeah, this is basically because visitOr() does an Insert of the returned or. This may not be needed if we move this to AggressiveInstCombine.

nikic added a comment.Jul 1 2022, 9:27 AM

When switching this to AggressiveInstCombine, I would strongly recommend to start with a much more minimal patch. Handle only a single simple case, without any of the possible variants. We can build on that base later.

When switching this to AggressiveInstCombine, I would strongly recommend to start with a much more minimal patch. Handle only a single simple case, without any of the possible variants. We can build on that base later.

Strongly agree - there's a lot of potential for this to go wrong both in correctness and perf regressions, so we need to build up in steps.
AFAIK, the load combine pass did not have correctness problems when it died, so that source code would be a good reference.

bipmis updated this revision to Diff 443893.Jul 12 2022, 3:02 AM
bipmis set the repository for this revision to rG LLVM Github Monorepo.

A simple implementation in AggressiveInstCombine which handles the forward consecutive load sequences as provided in the tests.
The implementation is limited to a specific consecutive-load pattern that reduces to a combined load only (the one and only use of the individual loads is to generate the wider load). This is not a generic LoadCombine, as combining loads with other uses can result in poison propagation.

When switching this to AggressiveInstCombine, I would strongly recommend to start with a much more minimal patch. Handle only a single simple case, without any of the possible variants. We can build on that base later.

Strongly agree - there's a lot of potential for this to go wrong both in correctness and perf regressions, so we need to build up in steps.
AFAIK, the load combine pass did not have correctness problems when it died, so that source code would be a good reference.

The load combine pass is different in that it implements load combining in a more generic way, not based on a specific pattern. So if it finds that two loads can be combined, it generates a wider load and derives the subsequent uses from this wider load using CreateExtractInteger. The current implementation is more of a pattern match, with offset and shift verification to confirm that the loads are consecutive and that their only use is to create a wider load.

spatel retitled this revision from [InstCombine] Combine consecutive loads which are being merged to form a wider load. to [AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load..Jul 14 2022, 7:14 AM

How does this code account for potential memory accesses between the loads that are getting combined?

define i16 @loadCombine_2consecutive_store_between(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i32 1
  %l1 = load i8, ptr %p, align 2
  store i8 42, ptr %p  ; this must happen after a combined load?
  %l2 = load i8, ptr %p1

  %e1 = zext i8 %l1 to i16
  %e2 = zext i8 %l2 to i16
  %s2 = shl i16 %e2, 8
  %o1 = or i16 %e1, %s2
  ret i16 %o1
}
llvm/test/Transforms/AggressiveInstCombine/or-load.ll
5 ↗(On Diff #443893)

We need to have this test duplicated on a target (x86?) where the fold actually happens. Target-specific tests will need to go inside target subdirectories, otherwise we'll break testing bots. We can pre-commit those once we have the right set of tests.

24 ↗(On Diff #443893)

Why do the tests have non-canonical code (shift-by-0)? I don't think we'd ever see this pattern given where AIC is in the opt pipeline.

How does this code account for potential memory accesses between the loads that are getting combined?

define i16 @loadCombine_2consecutive_store_between(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i32 1
  %l1 = load i8, ptr %p, align 2
  store i8 42, ptr %p  ; this must happen after a combined load?
  %l2 = load i8, ptr %p1

  %e1 = zext i8 %l1 to i16
  %e2 = zext i8 %l2 to i16
  %s2 = shl i16 %e2, 8
  %o1 = or i16 %e1, %s2
  ret i16 %o1
}

I should have mentioned: this being the base version, I have not enabled that yet; this patch just targets simple load scenarios. It was enabled in the InstCombine patch, and the same will be enabled in subsequent patches to handle memory accesses between loads.

I should have mentioned: this being the base version, I have not enabled that yet; this patch just targets simple load scenarios. It was enabled in the InstCombine patch, and the same will be enabled in subsequent patches to handle memory accesses between loads.

I don't understand the comment. Does the posted patch miscompile the example with a store? If so, then the patch can't be committed in this form.
If there are planned changes to the posted patch before it is ready for review, please note that in the patch status (or add "WIP" to the title).

bipmis updated this revision to Diff 445525.Jul 18 2022, 9:03 AM

Updating the patch with AliasAnalysis. Test split up for respective Targets and new tests updated.
Currently handling only simple forward load patterns.

I should have mentioned: this being the base version, I have not enabled that yet; this patch just targets simple load scenarios. It was enabled in the InstCombine patch, and the same will be enabled in subsequent patches to handle memory accesses between loads.

I don't understand the comment. Does the posted patch miscompile the example with a store? If so, then the patch can't be committed in this form.
If there are planned changes to the posted patch before it is ready for review, please note that in the patch status (or add "WIP" to the title).

Yes, it should have been enabled to handle the store scenarios. I have updated the tests and added a few more; I can add more as needed.
Also, to keep it simple, I have only handled forward load sequences.

dmgreen added inline comments.Jul 18 2022, 11:12 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
448–449

The comment could flow better:

// 2. (? | (zExt(L1) << shift1)) | (zExt(L2) << shift2)
//  -> ? | (zExt(L3) << shift1)
479

Drop extra brackets from (LI1 == LI2)

489

I think this can use getPointersDiff to check the two pointers are the right distance apart.
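For reference, a sketch of that suggestion; it assumes a ScalarEvolution instance SE were available (AggressiveInstCombine does not currently request one), and getPointersDiff comes from LoopAccessAnalysis:

// The distance is returned in units of the element type, so for two equal-sized
// integer loads a difference of exactly one element means they are adjacent.
auto Diff = getPointersDiff(LI1->getType(), LI1->getPointerOperand(),
                            LI2->getType(), LI2->getPointerOperand(), DL, SE,
                            /*StrictCheck=*/true);
bool Consecutive = Diff && *Diff == 1;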

531

Do we check anywhere that LI1 and LI2 are in the same block?

533–536

Does std::distance work?

583

This doesn't need to cast to a Value*

592

Allows -> Allowed

593

This could also be DL.isLegalInteger, which would avoid the need to create the Type.

601

Should this check the Fast flag too?

if (!Allowed || !Fast)
605–612

I think this is unneeded, and this can always just create:

NewLoad =
        new LoadInst(IntegerType::get(Load1Ptr->getContext(), LOps.LoadSize),
                     LI1->getPointerOperand(), "", LI1->isVolatile(), LI1->getAlign(),
                     LI1->getOrdering(), LI1->getSyncScopeID());

Or possibly use Builder.CreateLoad to avoid the separate Insert.
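For comparison, a builder-based sketch of the same creation (reusing the names from the snippet above); it creates and inserts the load in one step, and since only simple loads are combined the default ordering and sync scope are sufficient:

NewLoad = Builder.CreateAlignedLoad(
    IntegerType::get(Load1Ptr->getContext(), LOps.LoadSize),
    LI1->getPointerOperand(), LI1->getAlign(), LI1->isVolatile());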

625–628

This could avoid the NewOp variable and just do:

if (LOps.zext) {
  NewLoad = Builder.CreateZExt(NewLoad, LOps.zext);
}
llvm/test/Transforms/AggressiveInstCombine/AArch64/or-load.ll
25–26

Can you run these tests through opt -O1 (without this patch) and use the result as the tests (maybe with a little cleanup). LLVM-IR will almost never include shl 0 nodes, and we should make sure we are testing what will appear in reality.
https://godbolt.org/z/raxKnEE9a

bipmis updated this revision to Diff 446107.Jul 20 2022, 4:26 AM

Handle Review Comments from David.

bipmis updated this revision to Diff 448536.Jul 29 2022, 12:47 AM

Handle GEP in a more generic way, as requested earlier by @nikic.
Add check for loads belonging to same BB.

@spatel I have updated the patch and it should handle the forward load scenarios. Have incorporated most of the review comments received.
The patch has been tested and passing all tests. Do review and suggest if you have more comments. Thanks.

nikic added inline comments.Jul 29 2022, 2:33 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
512

This whole code can be replaced with stripAndAccumulateConstantOffsets().
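A sketch of what that replacement could look like (the variable names Ptr1/Ptr2 are illustrative):

// Strip constant offsets from both pointers; if the stripped bases match, the
// accumulated APInt offsets tell us whether the two loads are adjacent.
unsigned IdxWidth = DL.getIndexTypeSizeInBits(Ptr1->getType());
APInt Offset1(IdxWidth, 0), Offset2(IdxWidth, 0);
Value *Base1 =
    Ptr1->stripAndAccumulateConstantOffsets(DL, Offset1, /*AllowNonInbounds=*/true);
Value *Base2 =
    Ptr2->stripAndAccumulateConstantOffsets(DL, Offset2, /*AllowNonInbounds=*/true);
if (Base1 != Base2)
  return false; // different bases, adjacency cannot be proven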

542

FindAvailableLoadedValue() is not the correct way to check for clobbers. In particular, it will return an "available value" for a direct clobber (store to the same address).

What you want is to loop over the instructions and call getModRefInfo() on AliasAnalysis, together with a small limit (e.g. 16) at which you abort the walk and bail out of the transform.
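A sketch of that scan, assuming AA, the two loads LI1/LI2 in the same block (LI1 first), and a chosen MaxInstrsToScan limit:

// Give up if anything between the two loads may write the memory the second
// load reads, or if the walk gets too long.
MemoryLocation Loc2 = MemoryLocation::get(LI2);
unsigned NumScanned = 0;
for (Instruction &Inst :
     make_range(std::next(LI1->getIterator()), LI2->getIterator())) {
  if (Inst.mayWriteToMemory() && isModSet(AA.getModRefInfo(&Inst, Loc2)))
    return false; // potential clobber between the loads
  if (++NumScanned > MaxInstrsToScan)
    return false; // degenerate block, bail out of the transform
}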

llvm/test/Transforms/AggressiveInstCombine/AArch64/or-load.ll
203

Try a variant storing to %p3 rather than %pstr here. I believe your current implementation will incorrectly accept this.

bipmis added inline comments.Jul 29 2022, 3:07 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
512

Sure. Will look into this.

llvm/test/Transforms/AggressiveInstCombine/AArch64/or-load.ll
203

It does not return an "available value" for a direct clobber. For example a change to the test

%l1 = load i8, ptr %p
%l2 = load i8, ptr %p1
%l3 = load i8, ptr %p2
store i8 10, i8* %p3
%l4 = load i8, ptr %p3

still returns
; LE-NEXT: [[TMP1:%.*]] = load i16, ptr [[P]], align 1
; LE-NEXT: [[TMP2:%.*]] = zext i16 [[TMP1]] to i32
; LE-NEXT: [[L3:%.*]] = load i8, ptr [[P2]], align 1
; LE-NEXT: store i8 10, ptr [[P3]], align 1
; LE-NEXT: [[L4:%.*]] = load i8, ptr [[P3]], align 1

Can add more tests if you suggest.

nikic added inline comments.Jul 29 2022, 5:57 AM
llvm/test/Transforms/AggressiveInstCombine/AArch64/or-load.ll
203

Okay, I had to test this patch locally to find a case where it fails. Try this variant:

store i8 0, ptr %p3
store i8 1, ptr %p

We are looking for an available value of %p, so we find the store i8 1, ptr %p and are happy. But before that, there is a clobber of %p3 that makes the transform invalid (the load is moved before the store).

nikic added a reviewer: nikic.Jul 29 2022, 5:57 AM
bipmis updated this revision to Diff 448989.Aug 1 2022, 4:25 AM

Handle Review comments from nikic. Made changes to Alias Analysis.

bipmis added a comment.Aug 3 2022, 6:38 AM

Gentle ping for review on this! Thanks.

I feel like this being recursive makes it more difficult to reason about than it could be. I have added some mostly nitpicks/cleanups below.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
443

zext -> Zext (Or ZExt? Capitalized at least)

448

Are there any tests for case 2?

467–468

The variables can be defined where they are first used.

482

Should it check that the address space is the same?

502

Does it matter if it is a bitcast or a gep?

520

Capitalize variable names.

549

We checked that loadSize1 == loadSize2 above.

552

Demorgan this.

568

Maybe replace the name of foldLoadsIterative with foldConsecutiveLoads and vice-versa. foldLoadsIterative doesn't really explain what this function is folding, and it's not really iterative.

bipmis updated this revision to Diff 451989.Aug 11 2022, 2:37 PM

Thanks David for reviewing. Have handled most of the nits.

bipmis added inline comments.Aug 11 2022, 2:43 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
448

Case 2 is not required, as we are handling the pattern recursively. This also keeps the implementation simple.

502

I think this may be needed, so that we fall through and evaluate further if instructions are only of these types.

549

Right. I have added additional comments on why this is needed.

@dmgreen For pattern matching a chain of or(or, load), recursion seemed to be a good choice to reach the root node and evaluate whether the entire chain can be reduced. There are also other pattern matches in AggressiveInstCombine, like matchAndOrChain(), which are implemented similarly.
@nikic Do you have any further comments on the alias analysis used? As the limit, I have used the number of instructions between the two loads, within which we need to look out for a store.
@spatel Please do suggest if you have any other review comments. Thanks.

dmgreen added inline comments.Aug 25 2022, 3:47 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
502

What is it you mean by fall through? If stripAndAccumulateConstantOffsets could give us an Offset, it seems like we should just always call it and have it do what it can. It will return the original pointer if it couldn't do anything useful.
It may be better to keep Offset1/Offset2 as APInt too. It would help if the pointers were > 64 bits.

564

Drop these extra brackets

569

Should this have a limit on the number of instructions?

579

Is there a test for an i128 version of the fold?

646

Is all the metadata on the old instruction (like MD_range) always valid on the new load?

nikic added inline comments.Aug 30 2022, 6:29 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
523

Shouldn't this be !LI1->isSimple() || !LI2->isSimple()? We want to bail if either load isn't simple, not if both are.

Also, it looks like there are no (negative) tests for volatile/atomic loads.

Allen added a subscriber: Allen.Aug 30 2022, 6:45 AM
bipmis updated this revision to Diff 457929.Sep 5 2022, 4:05 AM

Handle review comments and add additional tests.

bipmis marked 5 inline comments as done.Sep 5 2022, 4:10 AM
bipmis added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
569

Some tests, like load64_farLoads(), which have a wider instruction gap between loads, may result in only a partial combine (when tried with a limit of 16). I can possibly go for a bigger limit, or keep the limit on the actual number of instructions between two loads.

dmgreen added inline comments.Sep 6 2022, 7:19 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
569

Using what limit?

646

What happens if new metadata gets added in the future, that isn't valid?

Is it better to just drop all the metadata? Or is that too likely to be worse for performance?

bipmis added inline comments.Sep 7 2022, 3:10 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
569

In the current implementation the worst-case limit could be all instructions in a BB. What is the issue with this?
For the test case load64_farLoads() it does fine with a limit of 35.

646

This being a specific scenario of the pattern match, looking for an or-load chain, I don't think performance should be a big concern. It depends on the end use of the merged load.

What I am seeing in most cases is that they try to retain at least the AATags.

if (AATags)
      NewVal->setAAMetadata(AATags);
nikic added inline comments.Sep 7 2022, 3:19 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
646

Note that you can't simply take the AATags from one load, they have to be merged appropriately. I believe for this specific case you need the AAMDNodes::concat() method, because you are merging loads from different non-overlapping locations.
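In code, that merge would look roughly like the following sketch (with NewLoad, LI1 and LI2 as in the surrounding discussion):

// concat() keeps only AA metadata that is valid for the union of the two
// non-overlapping locations; as noted further down, it leaves TBAA empty.
AAMDNodes MergedAA = LI1->getAAMetadata().concat(LI2->getAAMetadata());
NewLoad->setAAMetadata(MergedAA);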

dmgreen added inline comments.Sep 8 2022, 1:21 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
569

It protects us where there are thousands of instructions in the block, just to be safe for degenerate cases. If we expect the maximum pattern to be (load+zext+shift+or) * i8->i64, so 4*8, then a limit of 64 instructions sounds fine.

bipmis added inline comments.Sep 8 2022, 5:07 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
569

Sounds right. Will implement the same with the 64 instruction limit. Thanks.

646

Agreed. Thanks for this.
I think we are better off dropping the metadata at this point.

alex added a subscriber: alex.Sep 9 2022, 7:24 AM
bipmis updated this revision to Diff 459444.Sep 12 2022, 6:47 AM

Add a limit of 64 instructions for Alias Analysis.
Concat AATags Metadata for merged loads.

bipmis added inline comments.Sep 12 2022, 6:50 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
646

The concat method leaves the TBAA blank, so maybe we want to drop the metadata altogether?
Currently the ‘noalias’ and ‘alias.scope’ metadata will be concatenated from the AAMDNodes.

dmgreen accepted this revision.Sep 15 2022, 12:38 AM

Thanks for the changes - as far as I understand, this LGTM now. Thanks for working through the details.

Any other comments?

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
646

I'm a little surprised that if the two tbaa infos are the same, we can't use the same on the result node. I think using concat sounds sensible though. I suspect in practice we will often be combining char in any case.

This revision is now accepted and ready to land.Sep 15 2022, 12:38 AM

There are a number of test failures in pre-merge checks, probably pipeline tests.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
909–910

Just like all the other analyses, this should use a reference, not a pointer (the analysis is not optional).

nikic added inline comments.Sep 15 2022, 1:10 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
488

Zext -> ZextType, maybe?

495

Iterative -> Recursive

567

This check should come first, otherwise you don't count store instructions.

591

I think this handles non-byte-sized loads incorrectly. Let's say you do i4 loads with 1 byte offset. Then PrevSize will be 1 and match the offset, even though the loads are not actually consecutive.

Please add some tests for non-byte-sized loads.
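A sketch of a stricter check along these lines, assuming integer-typed loads and the byte offsets Offset1/Offset2 accumulated earlier:

// Work in whole bytes and reject sub-byte or odd widths, so an i4 load at a
// one-byte offset is no longer treated as consecutive.
unsigned LoadSize1 = LI1->getType()->getIntegerBitWidth();
if (LoadSize1 < 8 || !isPowerOf2_32(LoadSize1))
  return false;
uint64_t PrevSizeInBytes = LoadSize1 / 8;
bool Consecutive = (Offset2 - Offset1) == PrevSizeInBytes;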

624

Combine these declarations with initialization.

645

Why does this not use the IRBuilder?

867

The DataLayout fetch can be moved outside the loop.

spatel added inline comments.Sep 15 2022, 6:11 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
866–867

The transform should be placed ahead of foldSqrt(), or we may hit a use-after-free bug (because foldSqrt can delete I).
There was a code comment about this in:
https://github.com/llvm/llvm-project/commit/df868edee561eb973edd85ec9df41c67aa0bff6b
...but that patch got reverted. We should probably add that code comment independently (or fix the bug some other way).

bipmis updated this revision to Diff 460709.Sep 16 2022, 6:00 AM

Update the patch with review comments.
-> Support power-of-2 loads with a minimum load size of 8 bits.
-> IR builder for new load.
-> New test for a 4 bit load.
-> Nits.

bipmis marked 7 inline comments as done.Sep 16 2022, 6:05 AM
bipmis added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
567

So it would count stores which do not alias. In the case where it aliases, we terminate anyway, so the count update won't matter?

591

Right. The shift condition will prevent it from merging.
But we do not want to combine loads smaller than a byte. Updated checks.

bipmis marked an inline comment as done.Sep 22 2022, 1:37 AM

Ping.
@nikic Any further comments on the patch.

nikic added a comment.Sep 22 2022, 1:49 AM

From a cursory look it looks fine, I assume @dmgreen has reviewed in detail :)

I have one note (that can be addressed later): There's currently a check that the loads have the same size. Does that mean that if you have a partially folded pattern (e.g. one i16 load plus two i8 loads) that it will not get folded? If so, this seems like something we should relax (later).

From a cursory look it looks fine, I assume @dmgreen has reviewed in detail :)

I have one note (that can be addressed later): There's currently a check that the loads have the same size. Does that mean that if you have a partially folded pattern (e.g. one i16 load plus two i8 loads) that it will not get folded? If so, this seems like something we should relax (later).

That's correct. Currently we only fold loads of the same size within a chain. So for the above case, if the two i8 loads belong to a single lower chain, they can be folded.
Agreed. This is more about having a stricter basic implementation first. We can relax this later.

Please ensure you precommit the new tests to trunk and then rebase this patch on it - so we see the effect of the patch on the IR

This revision was landed with ongoing or failed builds.Sep 23 2022, 2:20 AM
This revision was automatically updated to reflect the committed changes.

When I do a 3-stage bootstrap at this commit, the second-stage compiler crashes. The issue does not appear at the previous commit, so I'm reverting this commit.

@gribozavr2 Do you have any useful build log that you can provide to speed up triage please?

bipmis added a comment.EditedSep 26 2022, 4:43 AM

@gribozavr2 Do let me know how to reproduce this issue. I am trying a 3-stage build with "clang>/cmake/caches/3-stage.cmake". Linking the executable in the second stage takes quite some time even for clean HEAD code.
It builds fine on AArch64 and x86 with the patch. The stage2 and stage3 binaries are bit-exact.

I'm looking into coming up with a reduced repro.
We're seeing a crash in re2. The weird thing is that the compiler I've repro'd is not bootstrapped, but the crash in re2 happens even when compiling it with -O0. My guess is that something else we compile with the just-built clang like libc++ is getting miscompiled, but more investigation required.

it ended up being a function that we always compile with -O3...

anyway, reduced test case

$ cat /tmp/a.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define i64 @eggs(ptr noundef readonly %arg) {
  %tmp3 = load i8, ptr %arg, align 1
  %tmp4 = getelementptr inbounds i8, ptr %arg, i64 1
  %tmp5 = load i8, ptr %tmp4, align 1
  %tmp6 = getelementptr inbounds i8, ptr %arg, i64 2
  %tmp7 = load i8, ptr %tmp6, align 1
  %tmp8 = getelementptr inbounds i8, ptr %arg, i64 3
  %tmp9 = load i8, ptr %tmp8, align 1
  %tmp10 = getelementptr inbounds i8, ptr %arg, i64 4
  %tmp11 = load i8, ptr %tmp10, align 1
  %tmp12 = getelementptr inbounds i8, ptr %arg, i64 5
  %tmp13 = load i8, ptr %tmp12, align 1
  %tmp14 = getelementptr inbounds i8, ptr %arg, i64 6
  %tmp15 = load i8, ptr %tmp14, align 1
  %tmp16 = getelementptr inbounds i8, ptr %arg, i64 7
  %tmp17 = load i8, ptr %tmp16, align 1
  %tmp18 = zext i8 %tmp17 to i64
  %tmp19 = shl nuw i64 %tmp18, 56
  %tmp20 = zext i8 %tmp15 to i64
  %tmp21 = shl nuw nsw i64 %tmp20, 48
  %tmp22 = or i64 %tmp19, %tmp21
  %tmp23 = zext i8 %tmp13 to i64
  %tmp24 = shl nuw nsw i64 %tmp23, 40
  %tmp25 = or i64 %tmp22, %tmp24
  %tmp26 = zext i8 %tmp11 to i64
  %tmp27 = shl nuw nsw i64 %tmp26, 32
  %tmp28 = or i64 %tmp25, %tmp27
  %tmp29 = zext i8 %tmp9 to i64
  %tmp30 = shl nuw nsw i64 %tmp29, 24
  %tmp31 = or i64 %tmp28, %tmp30
  %tmp32 = zext i8 %tmp7 to i64
  %tmp33 = shl nuw nsw i64 %tmp32, 16
  %tmp34 = zext i8 %tmp5 to i64
  %tmp35 = shl nuw nsw i64 %tmp34, 8
  %tmp36 = or i64 %tmp31, %tmp33
  %tmp37 = zext i8 %tmp3 to i64
  %tmp38 = or i64 %tmp36, %tmp35
  %tmp39 = or i64 %tmp38, %tmp37
  ret i64 %tmp39
}
$ ./build/rel/bin/opt -passes=aggressive-instcombine -S /tmp/a.ll
; ModuleID = '/tmp/b.ll'
source_filename = "/tmp/a.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define i64 @eggs(ptr noundef readonly %arg) {
  %tmp3 = load i8, ptr %arg, align 1
  %tmp4 = getelementptr inbounds i8, ptr %arg, i64 1
  %tmp5 = load i32, ptr %tmp4, align 1
  %1 = zext i32 %tmp5 to i64
  %2 = shl i64 %1, 8
  %tmp37 = zext i8 %tmp3 to i64
  %tmp39 = or i64 %2, %tmp37
  ret i64 %tmp39
}

that does look wrong, it looks like it should be optimized to load i64 rather than zext(load i32 (%a + 1)) | zext(load i8 %a)

aeubanks reopened this revision.Sep 26 2022, 11:08 AM
This revision is now accepted and ready to land.Sep 26 2022, 11:08 AM
bipmis added a comment.EditedSep 27 2022, 5:09 AM


@aeubanks Thanks for the test case.
In this patch we have implemented the forward load merge, as can be seen in the test example. However, I had to handle the child node "or(load, load)", because this is currently being reversed by InstCombine, as seen below. This needed a corrected check.

define i32 @Load32(ptr noundef %ptr) local_unnamed_addr #0 {
entry:
  %0 = load i8, ptr %ptr, align 1, !tbaa !5
  %conv = zext i8 %0 to i32
  %arrayidx1 = getelementptr inbounds i8, ptr %ptr, i64 1
  %1 = load i8, ptr %arrayidx1, align 1, !tbaa !5
  %conv2 = zext i8 %1 to i32
  %shl = shl i32 %conv2, 8
  %or = or i32 %conv, %shl        ; <-- note the operand order here
  %arrayidx3 = getelementptr inbounds i8, ptr %ptr, i64 2
  %2 = load i8, ptr %arrayidx3, align 1, !tbaa !5
  %conv4 = zext i8 %2 to i32
  %shl5 = shl i32 %conv4, 16
  %or6 = or i32 %or, %shl5
  %arrayidx7 = getelementptr inbounds i8, ptr %ptr, i64 3
  %3 = load i8, ptr %arrayidx7, align 1, !tbaa !5
  %conv8 = zext i8 %3 to i32
  %shl9 = shl i32 %conv8, 24
  %or10 = or i32 %or6, %shl9
  ret i32 %or10
}

*** IR Dump After InstCombinePass on Load32 ***
; Function Attrs: nounwind uwtable
define i32 @Load32(ptr noundef %ptr) local_unnamed_addr #0 {
entry:
  %0 = load i8, ptr %ptr, align 1, !tbaa !5
  %conv = zext i8 %0 to i32
  %arrayidx1 = getelementptr inbounds i8, ptr %ptr, i64 1
  %1 = load i8, ptr %arrayidx1, align 1, !tbaa !5
  %conv2 = zext i8 %1 to i32
  %shl = shl nuw nsw i32 %conv2, 8
  %or = or i32 %shl, %conv        ; <-- InstCombine has swapped the operands
  %arrayidx3 = getelementptr inbounds i8, ptr %ptr, i64 2
  %2 = load i8, ptr %arrayidx3, align 1, !tbaa !5
  %conv4 = zext i8 %2 to i32
  %shl5 = shl nuw nsw i32 %conv4, 16
  %or6 = or i32 %or, %shl5
  %arrayidx7 = getelementptr inbounds i8, ptr %ptr, i64 3
  %3 = load i8, ptr %arrayidx7, align 1, !tbaa !5
  %conv8 = zext i8 %3 to i32
  %shl9 = shl nuw i32 %conv8, 24
  %or10 = or i32 %or6, %shl9
  ret i32 %or10

The full implementation of the reverse load-merge pattern is planned subsequently. So for now you will see the lowest-node loads being merged, but not the other ones. I am updating the patch; it would be great if you could test it.

bipmis updated this revision to Diff 463194.Sep 27 2022, 5:14 AM

Handle the reverse load pattern checks correctly.
Currently we need to handle the leaf-node reverse loads, as the InstCombine pass folds the pattern from forward to reverse order.
Full reverse load patterns are planned to be implemented subsequently.

spatel added inline comments.Sep 27 2022, 7:41 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
694–697

I didn't notice this limitation before. "Forward" and "reverse" are referring to the order that the or instructions use the loaded values?

I agree that we don't want to complicate the initial implementation any more than necessary, but it might help to see how the backend handles that in DAGCombiner::MatchLoadCombine() (see D133584 for a proposed enhancement to that code).

bipmis added inline comments.Sep 27 2022, 9:22 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
694–697

Right, it is the order:
Forward - or(or(or(0,1),2),3)
Reverse - or(or(or(3,2),1),0)
Considering 0,1,... as the zext and shl of the loads at index 0, 1, etc.

For simplicity we wanted to implement the forward case first, and if this looks good we can do the reverse and mixed-size loads. There should be minimal changes on top of this, so that is planned next.

Matt added a subscriber: Matt.Sep 27 2022, 11:37 AM
bipmis added inline comments.Sep 28 2022, 6:29 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
694–697

I didn't notice this limitation before. "Forward" and "reverse" are referring to the order that the or instructions use the loaded values?

I agree that we don't want to complicate the initial implementation any more than necessary, but it might help to see how the backend handles that in DAGCombiner::MatchLoadCombine() (see D133584 for a proposed enhancement to that code).

@spatel Verified DAGCombiner::MatchLoadCombine() handles the reverse load pattern fine and a single load is generated.

Please pre-commit the new tests with the baseline CHECK lines.
Also, it would be good to add a test that matches the reported failure case (the @eggs test from the earlier comment).
After that, I think it's fine to re-commit.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
694–697

Thanks for checking. It doesn't directly impact the initial/immediate patch here, but it might provide a template for the planned enhancements.

Kai added a subscriber: Kai.Oct 4 2022, 12:54 PM
Kai added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
816

FYI: The missing bitcast for the Load1Ptr argument means that this change only works with opaque pointers.

dstuttard added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
816

We had the same issue - I've just uploaded a patch. See D135249

uabelho added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
738

Since the NumScanned counter includes debug instructions, this code makes the result depend on the presence of debug instructions.
I wrote a ticket about that: https://github.com/llvm/llvm-project/issues/69925