This is an archive of the discontinued LLVM Phabricator instance.

[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate
ClosedPublic

Authored by wsmoses on Feb 16 2022, 11:37 AM.

Details

Summary

LICM will speculatively hoist code outside of loops. This requires dropping information, such as alias analysis (https://github.com/llvm/llvm-project/issues/53794) and range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249, LICM was only run after LoopRotate. Running LoopRotate prior to LICM prevents an instruction hoist from being speculative if the instruction was conditionally executed within the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they would not be speculative after LoopRotate. This destroys information, and discarding this additional information results in performance losses.

This PR modifies LICM to accept a "speculative" parameter which controls whether LICM may perform information-losing speculative hoists. Phase ordering is then modified to not perform the information-losing speculative hoists until after LoopRotate is performed, preserving this additional information.

Diff Detail

Event Timeline

wsmoses created this revision.Feb 16 2022, 11:37 AM
wsmoses requested review of this revision.Feb 16 2022, 11:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2022, 11:37 AM
wsmoses added a subscriber: vchuravy.

I'm currently looking into a complete revert, hang on...

Ok, that doesn't work.

Please rebase this patch, and update affected tests (no new tests are needed here):

  • The subject is missing "in" before "LICM".
  • The parameter name should be AllowSpeculation, and it should be a bool.
llvm/lib/Passes/PassBuilderPipelines.cpp
296–299

Please update the comment

wsmoses updated this revision to Diff 409690.Feb 17 2022, 9:20 AM

Address comments and rebase

wsmoses retitled this revision from [LICM][PhaseOrder] Don't speculate LICM until after running loop rotate to [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate.Feb 17 2022, 9:21 AM
lebedev.ri added inline comments.Feb 17 2022, 9:32 AM
llvm/include/llvm/Transforms/Scalar/LICM.h
49

bool

69

bool

llvm/include/llvm/Transforms/Utils/LoopUtils.h
174–175

This sounds worse than it is.

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
910

Could we please be consistent with /*AllowSpeculation=*/ vs /*AllowSpeculation*/

llvm/lib/Transforms/Scalar/LICM.cpp
206

Elsewhere LicmAllowSpeculation doesn't have a default value.
Should we be consistent with that?

wsmoses marked 5 inline comments as done.Feb 17 2022, 9:44 AM
wsmoses added inline comments.
llvm/lib/Transforms/Scalar/LICM.cpp
206

This constructor in particular needs a default value since it is constructed without arguments (and all other arguments have default values).

wsmoses updated this revision to Diff 409697.Feb 17 2022, 9:44 AM

Fix bool type, and comments

lebedev.ri added inline comments.Feb 17 2022, 9:51 AM
llvm/include/llvm/Transforms/Utils/LoopUtils.h
211

Same comment update (and elsewhere)

wsmoses updated this revision to Diff 409702.Feb 17 2022, 9:56 AM

Clarify speculative comment

wsmoses marked an inline comment as done.Feb 17 2022, 9:57 AM
lebedev.ri accepted this revision.Feb 17 2022, 10:03 AM

LGTM, thanks.

This revision is now accepted and ready to land.Feb 17 2022, 10:03 AM
wsmoses updated this revision to Diff 409717.Feb 17 2022, 10:41 AM

Set default LICM to speculate on new PM

wsmoses updated this revision to Diff 409736.Feb 17 2022, 11:35 AM

Update AArch tests

wsmoses updated this revision to Diff 409802.Feb 17 2022, 3:18 PM

Fix tests

This revision was landed with ongoing or failed builds.Feb 17 2022, 5:13 PM
This revision was automatically updated to reflect the committed changes.
Carrot added a subscriber: Carrot.Mar 25 2022, 11:14 AM

@wsmoses, this patch caused an Eigen regression in our code; it seems some originally vectorized code is now scalarized.
Could you revert it?

Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2022, 11:14 AM

What precisely is the regression and the origin of it? This patch essentially aims to restore the information which was lost during a prior Phase Ordering change in https://reviews.llvm.org/D99249 which caused a large number of performance regressions. What is the behavior of your code prior to https://reviews.llvm.org/D99249 landing?

@wsmoses, this patch caused an Eigen regression in our code; it seems some originally vectorized code is now scalarized.
Could you revert it?

This patch fixed many regressions, so it would be a bad trade-off to rush things and revert it.

It would be a more productive approach if you could prepare a small repro, or at least some steps to reproduce your issue.

Before this patch I have a code snippet

     │ 80:   mov      -0x58(%rsp),%rdx                                                                                                                                                      
     │       mov      -0x60(%rsp),%r9                                                                                                                                                       
     │       mov      -0x40(%rsp),%r15                                                                                                                                                      
0.63 │ 8f:   mulps    %xmm8,%xmm9                                                                                                                                                           
0.44 │       movups   (%rdx,%r9,4),%xmm1                                                                                                                                                    
0.36 │       addps    %xmm9,%xmm1                                                                                                                                                           
0.25 │       movups   %xmm1,(%rdx,%r9,4)                                                                                                                                                    
0.75 │       mulps    %xmm8,%xmm3                                                                                                                                                           
0.41 │       movups   (%rdx,%r10,4),%xmm1                                                                                                                                                   
0.86 │       addps    %xmm3,%xmm1                                                                                                                                                           
0.70 │       movups   %xmm1,(%rdx,%r10,4)                                                                                                                                                   
0.43 │       add      $0x8,%r9                                                                                                                                                              
0.28 │       add      -0x48(%rsp),%rbx                                                                                                                                                      
0.07 │       cmp      %r15,%r9                                                                                                                                                              
0.14 │     ↓ jge      3e5                                             
             ...

After this patch, it is changed to

0.33 │ 80:   mov      %r12,%rdx
0.32 │       or       $0x1,%rdx
0.32 │       mov      %r12,%rdi
0.39 │       or       $0x2,%rdi
0.38 │       mov      %r12,%rcx
0.30 │       or       $0x3,%rcx
0.37 │       mov      %r12,%rbp
0.39 │       or       $0x4,%rbp
0.31 │       mov      %r12,%r10
0.27 │       or       $0x5,%r10
0.31 │       mov      %r12,%r9
0.37 │       or       $0x6,%r9
0.29 │       mov      %r12,%r8
0.35 │       or       $0x7,%r8
0.34 │ b1:   mulss    %xmm8,%xmm13
0.39 │       addss    (%r11,%r12,4),%xmm13
0.33 │       movss    %xmm13,(%r11,%r12,4)
0.33 │       mulss    %xmm8,%xmm12
0.41 │       addss    (%r11,%rdx,4),%xmm12
0.38 │       movss    %xmm12,(%r11,%rdx,4)
0.31 │       mulss    %xmm8,%xmm3
0.39 │       addss    (%r11,%rdi,4),%xmm3
0.35 │       movss    %xmm3,(%r11,%rdi,4)
0.41 │       mulss    %xmm8,%xmm4
0.31 │       addss    (%r11,%rcx,4),%xmm4
0.31 │       movss    %xmm4,(%r11,%rcx,4)
0.34 │       mulss    %xmm8,%xmm5
0.41 │       addss    (%r11,%rbp,4),%xmm5
0.34 │       movss    %xmm5,(%r11,%rbp,4)
0.32 │       mulss    %xmm8,%xmm6
0.34 │       addss    (%r11,%r10,4),%xmm6
0.38 │       movss    %xmm6,(%r11,%r10,4)
0.35 │       mulss    %xmm8,%xmm7
0.43 │       addss    (%r11,%r9,4),%xmm7
0.38 │       movss    %xmm7,(%r11,%r9,4)
0.41 │       mulss    %xmm8,%xmm1
0.36 │       addss    (%r11,%r8,4),%xmm1
0.32 │       movss    %xmm1,(%r11,%r8,4)
0.39 │       add      $0x8,%r12
0.39 │       add      -0x18(%rsp),%rbx
0.02 │       cmp      -0x60(%rsp),%r12
0.31 │     ↓ jge      510
      ...
fhahn added a comment.Mar 25 2022, 2:25 PM

Before this patch I have a code snippet

│ 80:   mov      -0x58(%rsp),%rdx                                                                                                                                                      
│       mov      -0x60(%rsp),%r9

Could you also share the input source file to reproduce the difference? I think this will be needed to investigate the difference.

Could you also share the input source file to reproduce the difference? I think this will be needed to investigate the difference.

The original build configuration is a combination of ThinLTO and FDO. I'm trying to get a separate reduced test case.

fhahn added a comment.Mar 28 2022, 8:54 AM

Could you also share the input source file to reproduce the difference? I think this will be needed to investigate the difference.

The original build configuration is a combination of ThinLTO and FDO. I'm trying to get a separate reduced test case.

That would be very helpful!

Carrot added a comment.Apr 1 2022, 4:22 PM

I still failed to reproduce it in plain mode.

But now I understand the problem more clearly. It looks like this patch triggered some inefficiency in the following optimizations.

After the LICM pass, the two versions of the IR differ significantly, but our interesting BB is the same.

%236 = fmul float %162, %6, !dbg !2160
%237 = mul nsw i64 %19, %5, !dbg !2161
%238 = getelementptr inbounds float, float* %4, i64 %237, !dbg !2162
%239 = load float, float* %238, align 4, !dbg !2163
%240 = fadd float %239, %236, !dbg !2163
store float %240, float* %238, align 4, !dbg !2163
%241 = fmul float %163, %6, !dbg !2164
%242 = or i64 %19, 1, !dbg !2165                                             // *
%243 = mul nsw i64 %242, %5, !dbg !2166
%244 = getelementptr inbounds float, float* %4, i64 %243, !dbg !2167
%245 = load float, float* %244, align 4, !dbg !2168
%246 = fadd float %245, %241, !dbg !2168
store float %246, float* %244, align 4, !dbg !2168
%247 = fmul float %164, %6, !dbg !2169
%248 = or i64 %19, 2, !dbg !2170                                             // *
%249 = mul nsw i64 %248, %5, !dbg !2171
%250 = getelementptr inbounds float, float* %4, i64 %249, !dbg !2172
%251 = load float, float* %250, align 4, !dbg !2173
%252 = fadd float %251, %247, !dbg !2173
store float %252, float* %250, align 4, !dbg !2173
%253 = fmul float %165, %6, !dbg !2174
%254 = or i64 %19, 3, !dbg !2175                                            // *
%255 = mul nsw i64 %254, %5, !dbg !2176
%256 = getelementptr inbounds float, float* %4, i64 %255, !dbg !2177
...

Notice those "or" instructions: they are used together with the following mul/GEP instructions to access consecutive array elements.

In the old version of the IR, the loop header contains the same group of "or" instructions. GVNPass finds this, deletes these "or" instructions in our interesting BB, and reuses the results of the "or" instructions in the loop header. Later, SLPVectorizer can still see that the GEP instructions compute consecutive memory addresses, and it vectorizes this BB.

In the new version of the IR, the loop header doesn't contain those "or" instructions; instead, one of the predecessors of this BB contains them. They look like

BB1:
   br %cond, label %BB2, label %BB3

BB2:
   ...
   br label BBX

BB3:
   ...
  %179 = or i64 %24, 1
  ...
   br label BBX

BBX:
   // our interesting bb
  ...
  %242 = or i64 %24, 1, !dbg !2165
  ...

Then GVN inserts "or" instructions into BB2 and replaces the "or" instructions in BBX with PHIs.

BB1:
   br %cond, label %BB2, label %BB3

BB2:
   ...
  %161 = or i64 %24, 1
  ...
   br label BBX

BB3:
   ...
  %179 = or i64 %24, 1
  ...
   br label BBX

BBX:
   // our interesting bb
  %245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

Although all of the PHI's operands have the same value, SLPVectorizer can't recognize this, so it can't figure out that the GEPs compute consecutive memory addresses, and it fails to vectorize this BB.

There are 3 potential solutions:

  • GVNPass: if a new PHI's operands all have the same value, we can move one of them to the dominator and delete the PHI and its operands.
  • InstCombinePass: do the same thing described above, but as cleanup work in a later pass.
  • SLPVectorizerPass: we can teach it to look into PHI operands; since a PHI's operands may have the same value, we can get more useful information from them.

Which method do you think is better?

If this is really as simple as all the incoming values being identical instructions with identical arguments, then it seems like a simple extension of the InstCombinerImpl::foldPHIArgOpIntoPHI()

Carrot added a comment.Apr 1 2022, 5:04 PM

If this is really as simple as all the incoming values being identical instructions with identical arguments, then it seems like a simple extension of the InstCombinerImpl::foldPHIArgOpIntoPHI()

foldPHIArgOpIntoPHI requires that all operands are used only by the PHI. In our case the "or" instruction in BB3 has other users.

If this is really as simple as all the incoming values being identical instructions with identical arguments, then it seems like a simple extension of the InstCombinerImpl::foldPHIArgOpIntoPHI()

foldPHIArgOpIntoPHI requires that all operands are used only by the PHI. In our case the "or" instruction in BB3 has other users.

There are two ways to view this:

  1. If all of the incoming values of the PHI are fully identical instructions with fully identical operands, then we don't need to PHI together the operands, and we can replace the PHI with said instruction.
  2. The one-user check there is there to ensure that the instruction count does not increase, so in principle, if we do need to PHI together the operands, we need as many of the instructions to be single-user as the number of PHIs we create.
Carrot added a comment.Apr 4 2022, 8:29 AM

There are two ways to view this:

  1. If all of the incoming values of the PHI are fully identical instructions with fully identical operands, then we don't need to PHI together the operands, and we can replace the PHI with said instruction.
  2. The one-user check there is there to ensure that the instruction count does not increase, so in principle, if we do need to PHI together the operands, we need as many of the instructions to be single-user as the number of PHIs we create.

I'm thinking of the following change, so we can remove the PHI instruction without introducing any extra instructions on any path.

BB1:
   br %cond, label %BB2, label %BB3

BB2:
   ...
  %161 = or i64 %24, 1
  ...
   br label BBX

BB3:
   ...
  %179 = or i64 %24, 1
  ...
   br label BBX

BBX:
   // our interesting bb
  %245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

==>

BB1:
   ...
   // New instruction is inserted here
   %245 = or i64 %24, 1
   br %cond, label %BB2, label %BB3

BB2:
   ...
  ...
   br label BBX

BB3:
   ...
  // Use of %245.
  ...
   br label BBX

BBX:
  // our interesting bb
  // PHI instruction is deleted.
  ...
  %296 = mul nsw i64 %245, %5, !dbg !2155
  %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
  ...

There are two ways to view this; [...]

I'm thinking of the following change, so we can remove the PHI instruction without introducing any extra instructions on any path. [...]

That kinda sounds like something for GVNHoist, which I think is currently still disabled due to some miscompilations?

That kinda sounds like something for GVNHoist, which I think is currently still disabled due to some miscompilations?

I believe that the known miscompilations are fixed, but major perf regressions were found, so it stays disabled.

Carrot added a comment.Apr 4 2022, 2:24 PM

I'm thinking of the following change, so we can remove the PHI instruction without introducing any extra instructions on any path. [...]
That kinda sounds like something for GVNHoist, which I think is currently still disabled due to some miscompilations?

I tried GVNHoist; it works as expected and moves the "or" instruction to BB1.

But BB1 is in a loop, and later LICM moves the "or" instruction to the loop preheader. Unfortunately another identical "or" instruction exists in one of its predecessors, so I ran into the same problem as before GVNHoist, except this time there is no GVNHoist run to hoist the "or" instruction, and the later GVNPass creates a PHI for it.

Can I add another GVNHoist pass after the loop optimizations?

Carrot added a comment.EditedApr 12 2022, 9:47 PM

Now I understand how this patch caused the missing vectorization in our code. In my previous comment I analyzed how the different GVN results caused different SLPVectorizer behavior. This time let's focus on how this patch produces different GVN results.

The following is simplified IR before the first LICM. The original code is much more complex, and multiple optimizations are involved, so only the related instructions and control flow are listed.

LoopHeader1:
  br %cond1, label %PreHeader2, label %LoopExit1

PreHeader2:
  br label %LoopHeader2

LoopHeader2:
  br %cond2, label %LoopBody2, label %LoopExit2

LoopBody2:
  %100 = or i64 %24, 1
  ... // uses of %100
  br label %LoopHeader2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br label %LoopHeader1

Without this patch:

  • First LICM of loop2: the definition of %100 is moved to PreHeader2, so now the definition of %100 dominates %200.
  • LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new loop header of loop1, so now it's even more obvious that %100 dominates %200.
  • GVN: because the definition of %100 dominates %200, %200 is deleted and all uses of %200 are replaced by %100.

With this patch, we have the following different behavior:

  • First LICM of loop2: speculation is disabled, the definition of %100 is not moved, so the code is not changed.
  • LoopRotate of loop2: LoopHeader2 is duplicated into its predecessors and deleted. Note that a new preheader for loop2 is created. Now we have the following code:
LoopHeader1:
  br %cond1, label %PreHeader2, label %LoopExit1

PreHeader2:
  br %cond2, label %NewPreHeader2, label %LoopExit2

NewPreHeader2:
  br label %LoopBody2

LoopBody2:
  %100 = or i64 %24, 1
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br label %LoopHeader1
  • Second LICM of loop2: this time the definition of %100 is moved to NewPreHeader2, but it doesn't dominate %200:
LoopHeader1:
  br %cond1, label %PreHeader2, label %LoopExit1

PreHeader2:
  br %cond2, label %NewPreHeader2, label %LoopExit2

NewPreHeader2:
  %100 = or i64 %24, 1
  br label %LoopBody2

LoopBody2:
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br label %LoopHeader1
  • LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new loop header of loop1. %100 still doesn't dominate %200:
NewPreHeader1:
  br label %PreHeader2

PreHeader2:                        // It's actually loop header of loop1
  br %cond2, label %NewPreHeader2, label %LoopExit2

NewPreHeader2:
  %100 = or i64 %24, 1
  br label %LoopBody2

LoopBody2:
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2:
  %200 = or i64 %24, 1
  ... // uses of %200
  br %cond1, label %PreHeader2, label %LoopExit1
  • GVN: the definition of %100 can reach %200 but doesn't dominate it, so GVN adds a new definition on the other path, PreHeader2 -> LoopExit2. That is a critical edge, so it is split; a new definition of the "or" instruction is inserted in the new BB, and a PHI instruction is inserted in LoopExit2:
NewPreHeader1:
  br label %PreHeader2

PreHeader2:                        // It's actually loop header of loop1
  br %cond2, label %NewPreHeader2, label %LoopExit2.crit_edge

NewPreHeader2:
  %100 = or i64 %24, 1
  br %LoopBody2

LoopBody2:
  ... // uses of %100
  br %cond2, label %LoopBody2, label %_crit_edge

_crit_edge:
  br label %LoopExit2

LoopExit2.crit_edge:
  %150 = or i64 %24, 1
  br label %LoopExit2
  
LoopExit2:
  %200 = phi i64 [%150, LoopExit2.crit_edge], [%100, _crit_edge]
  ... // uses of %200
  br %cond1, label %PreHeader2, label %LoopExit1

This is how we end up with different IR at the end of GVN, and later SLPVectorizer makes a different decision with this IR.

Any suggestions on how to fix it?

If I specify -rotation-max-header-size=2, loop rotation is disabled for Loop2, and the second LICM hoists the "or" instruction to PreHeader2, which dominates the following "or" instruction. Then GVN and SLPVectorizer work as before, and I get vectorized instructions for LoopBody2.

But the performance is not restored; I get more register spills. Loop rotation can change Loop2 into a single-block loop, which benefits many other optimizations, so it is also important for our Eigen code.

I added another GVNHoist pass after the loop optimizations; then the two "or" instructions are replaced by one "or" instruction hoisted into PreHeader2, and finally the instructions in LoopExit2 get vectorized by SLPVectorizer.

Any comments on this method?

Since I feel like this recent conversation is distinct from the PR, I think it deserves its own issue for visibility and tracking.

@Carrot how would you feel about opening a distinct issue on Github, providing all of the relevant context, and tagging the relevant folks?

xbolva00 added a comment.EditedApr 29 2022, 1:12 AM

Last I saw, GVNHoist was disabled due to perf regressions. Then a new effort started to reimplement it - https://lists.llvm.org/pipermail/llvm-dev/2021-September/152665.html - but it seems dead now.

So you may improve your specific loop with GVNHoist, but you will regress something else.

Carrot added a comment.May 2 2022, 3:26 PM

https://github.com/llvm/llvm-project/issues/55237 is filed.
We can move the discussion there.

Allen added a subscriber: Allen.May 3 2022, 12:32 AM