This is an archive of the discontinued LLVM Phabricator instance.

[MBP] Move a latch block with conditional exit and multi predecessors to top of loop
ClosedPublic

Authored by Carrot on Feb 13 2018, 2:15 PM.

Details

Summary

The current findBestLoopTop can find and move only one kind of block to the top of the loop: a latch block with a single successor. Another common case is:

  • a latch block
  • it has two successors: one is the loop header, the other is an exit
  • it has more than one predecessor

If such a latch is placed below one of its predecessors P, only P can fall through to it; every other predecessor needs a jump to it, followed by another conditional jump to the loop header. If the latch is instead moved before the loop header, all of its predecessors jump to it and it then falls through to the loop header, so every predecessor except P saves one taken branch.
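As an illustration (a hypothetical example, not one of the patch's test cases), a loop with this shape can come from a simple if/else diamond whose join block is also the latch:

int sum_abs(const int *a, int n) {
  int s = 0;
  int i = 0;
  do {
    if (a[i] > 0)    // loop header: branches to one of two blocks
      s += a[i];     // predecessor 1 of the latch
    else
      s -= a[i];     // predecessor 2 of the latch
    ++i;             // latch: multiple predecessors, conditional exit;
  } while (i < n);   // placing it before the header lets one more
  return s;          // predecessor reach it without a taken branch
}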

Diff Detail

Repository
rL LLVM

Event Timeline

Carrot created this revision.Feb 13 2018, 2:15 PM

Please add a few new test cases first.

Carrot updated this revision to Diff 134325.Feb 14 2018, 2:56 PM

Add a new test case.

davidxl added inline comments.Feb 15 2018, 9:52 AM
test/CodeGen/X86/move_latch_to_loop_top.ll
38 ↗(On Diff #134325)

This is not the optimal rotation for the loop. The optimal rotation should be

true
latch
header
false

in which case there is one more fall through from true to latch.

I think the enhancement should shoot for getting the optimal layout.

Carrot updated this revision to Diff 135512.Feb 22 2018, 1:53 PM

Added code to iteratively find new blocks that can be moved before the old loop top block to reduce taken branches. Now it can lay out the test case optimally, as David suggested.

Carrot marked an inline comment as done.Feb 22 2018, 1:54 PM
davidxl added inline comments.Feb 26 2018, 9:20 AM
lib/CodeGen/MachineBlockPlacement.cpp
1758 ↗(On Diff #135512)

BottomBlock --> bottom block BB

1759 ↗(On Diff #135512)

the following case.

1768 ↗(On Diff #135512)

Add some arrows to the graph and explain more in the text why it is not beneficial.

1792 ↗(On Diff #135512)

Add more comment about the intention of this method:

The method checks whether the reduced taken branches are fewer than the increased taken branch (to the exit block when rotation happens). If yes, it returns true.

1880 ↗(On Diff #135512)

It makes the assumption that there is an existing fall through to the exit BB. If not, it is always beneficial to rotate.

davidxl added inline comments.Feb 26 2018, 9:20 AM
lib/CodeGen/MachineBlockPlacement.cpp
1822 ↗(On Diff #135512)

--> and it has more than one predecessor.

1828 ↗(On Diff #135512)

Add a comment that the reduced taken branches will be compared with the increased taken branch to the loop exit block.

1864 ↗(On Diff #135512)

Why is this check needed?

test/CodeGen/X86/move_latch_to_loop_top.ll
74 ↗(On Diff #135512)

perhaps draw a diagram here.

Carrot updated this revision to Diff 136143.Feb 27 2018, 1:26 PM
Carrot marked 7 inline comments as done.
Carrot added inline comments.
lib/CodeGen/MachineBlockPlacement.cpp
1864 ↗(On Diff #135512)

Only two patterns are handled; neither case has more than 2 successors.

1880 ↗(On Diff #135512)

Yes.
But laying out the loop body and rotating the loop occur after this function, so it is difficult to guess which BB will be put at the bottom of the loop. To be conservative, assume the current candidate BB can be laid out at the bottom and fall through to the exit BB.
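For illustration, the comparison being discussed might look roughly like the following sketch (the names and signature are hypothetical, not the patch's actual code):

#include <cstdint>

// Illustrative sketch only; identifiers are hypothetical, not the patch's.
// Frequencies come from (real or statically estimated) profile data.
bool shouldMoveLatchToLoopTop(uint64_t ReducedTakenBranchFreq,
                              uint64_t IncreasedExitBranchFreq) {
  // ReducedTakenBranchFreq: total frequency of the predecessor edges that
  // become fall-throughs once the latch is placed at the top of the loop.
  // IncreasedExitBranchFreq: frequency of the exit edge that, conservatively,
  // is assumed to lose its fall-through when the loop is rotated.
  return ReducedTakenBranchFreq > IncreasedExitBranchFreq;
}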

Carrot updated this revision to Diff 201075.May 23 2019, 2:50 PM

Updated the patch to the current code base, and also made two improvements:

  • Do a more precise cost analysis based on the increased number of fall-throughs.
  • Also enable findBestLoopTop when profile information is available.
davidxl added inline comments.May 24 2019, 4:19 PM
lib/CodeGen/MachineBlockPlacement.cpp
1959 ↗(On Diff #201075)

Move this check into the caller function.

1991 ↗(On Diff #201075)

This check is not necessary. See example at https://reviews.llvm.org/F8921829

If B6 is selected as the new loop top, the fall through frequencies can be increased from 99 to 150.

Carrot updated this revision to Diff 202299.May 30 2019, 2:04 PM
Carrot marked 2 inline comments as done.
Carrot edited the summary of this revision. (Show Details)

For all the test case updates, please also validate whether they make sense, if possible.

lib/CodeGen/MachineBlockPlacement.cpp
1808 ↗(On Diff #202299)

This part looks complete. Are all the paths covered by some tests?

1918 ↗(On Diff #202299)

The logic looks correct. Are all cases covered by tests?

1977 ↗(On Diff #202299)

The comment should be fixed.

1983 ↗(On Diff #202299)

More generally, you want the largest pred edge frequency to be smaller than the new back edge frequency.

1985 ↗(On Diff #202299)

The 'else' here seems wrong. It needs to fall through to do the same check.

Carrot updated this revision to Diff 203015.Jun 4 2019, 1:54 PM
Carrot marked 6 inline comments as done.
Carrot added inline comments.
lib/CodeGen/MachineBlockPlacement.cpp
1991 ↗(On Diff #201075)

You are right. This code was used as a heuristic when I didn't quantitatively compute the number of fall-throughs. Now that we have a more precise cost model, it should be removed.

1808 ↗(On Diff #202299)

New tests added.

1918 ↗(On Diff #202299)

A lot of existing test cases are impacted by this patch, and this function is intensively tested by those test cases.

1983 ↗(On Diff #202299)

Again, it was a heuristic from before FallThroughGains was implemented. Now it can be removed.

Carrot added a comment.Jun 7 2019, 3:31 PM

Some analysis of the test case changes.

test/CodeGen/AArch64/neg-imm.ll
The control flow graph looks like:

    entry
      |
      V
-->for.body
|     |\
|     | \
|     |  \
|     | if.then3
|     |  /
|     | /
|     |/
---for.inc
      |
      V
for.cond.cleanup

The original layout is:

entry
for.body
if.then3
for.inc
for.cond.cleanup

For each loop iteration there are two taken branches.

The new layout is:

entry
for.inc
for.body
if.then3
for.cond.cleanup

For each loop iteration there is one taken branch.

test/CodeGen/AMDGPU/optimize-negated-cond.ll
The control flow of function @negated_cond_dominated_blocks is:

  bb
   |
   V
  bb4 <--
  /\    |
 /  \   |
bb5 bb6 |
 \  /   |
  \/    |
  bb7 ---
   |
   V
  bb3

The original layout is:

bb
bb4
bb6
bb5
bb7
bb3

For each loop iteration there are two taken branches.

New layout is

bb
bb6
bb7
bb4
bb5
bb3

For each loop iteration there is one taken branch.

test/CodeGen/Hexagon/redundant-branching2.ll
It is also a diamond-shaped loop:

   |
  b3 <--
  /\   |
 /  \  |
b4  b5 |
 \  /  |
  \/   |
  b6----
   |

Original layout is

b3
b4
b5
b6

New layout is

b5
b6
b3
b4

The new layout can reduce 1 taken branch per iteration.

test/CodeGen/X86/widen_arith-*.ll
The control flow graph is:

    entry
      |
      V
-->forcond ---
|     |      |
|     V      |
---forbody   |
             |
             V
         afterfor

Original layout is:

entry
forbody
forcond
afterfor

New layout is:

entry
forcond
forbody
afterfor

It shouldn't have a performance impact, but the new layout is more natural and more readable.

davidxl accepted this revision.Jun 10 2019, 11:05 AM

lgtm.

Warning: with static profiling, the layout strategy based on the 'precise' cost model may be off. If for some reason this causes issues later, the change should be guarded with a 'hasProfile' check.

This revision is now accepted and ready to land.Jun 10 2019, 11:05 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Jun 14 2019, 4:05 PM

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

This change causes a 35% regression on a very simple loop which is the hot part of our internal micro benchmark.
This loop takes 99% of the total execution time and has a reasonably large number of iterations.
All measurements were performed on an Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz.

Here is the loop in question before the change:

 25.41 | 0x30020e20:   c5fa10 vmovss      12(%rcx,%rsi,4), %xmm0
  5.12 | 0x30020e26:   c5f82e vucomiss    %xmm0, %xmm0
  0.61 | 0x30020e2a:   7a1d   jp          29                             ; 0x30020e49
  0.61 | 0x30020e2c:   c5fa10 vmovss      12(%rax,%rsi,4), %xmm2
 28.07 | 0x30020e32:   c5f82e vucomiss    %xmm1, %xmm0
  4.10 | 0x30020e36:   7506   jne         6                              ; 0x30020e3e
       | 0x30020e38:   0f8bd0 jnp         720                            ; 0x3002110e
  0.41 | 0x30020e3e:   c5fac2 vcmpless    %xmm2, %xmm0, %xmm3
  0.41 | 0x30020e43:   c4e369 vblendvps   %xmm3, %xmm0, %xmm2, %xmm0
 35.04 | 0x30020e49:   c5fa11 vmovss      %xmm0, 12(%rdx,%rsi,4)
  0.20 | 0x30020e4f:   48ffc6 incq        %rsi
       | 0x30020e52:   4839de cmpq        %rbx, %rsi
       | 0x30020e55:   72c9   jb          -55                            ; 0x30020e20

After the change:

       | 0x30020a40:   c5fa11 vmovss      %xmm0, 12(%rdx,%rsi,4)
       | 0x30020a46:   48ffc6 incq        %rsi
 27.25 | 0x30020a49:   4839de cmpq        %rbx, %rsi
       | 0x30020a4c:   0f836e jae         -146                           ; 0x300209c0
       | 0x30020a52:   c5fa10 vmovss      12(%rcx,%rsi,4), %xmm0
       | 0x30020a58:   c5f82e vucomiss    %xmm0, %xmm0
       | 0x30020a5c:   7ae2   jp          -30                            ; 0x30020a40
 27.46 | 0x30020a5e:   c5fa10 vmovss      12(%rax,%rsi,4), %xmm2
       | 0x30020a64:   c5f82e vucomiss    %xmm1, %xmm0
       | 0x30020a68:   7506   jne         6                              ; 0x30020a70
       | 0x30020a6a:   0f8bfd jnp         509                            ; 0x30020c6d
       | 0x30020a70:   c5fac2 vcmpless    %xmm2, %xmm0, %xmm3
 23.36 | 0x30020a75:   c4e369 vblendvps   %xmm3, %xmm0, %xmm2, %xmm0
 21.93 | 0x30020a7b:   ebc3   jmp         -61                            ; 0x30020a40

So far I don't have a full understanding of why this causes a 35% slowdown.
One note is that by minimizing the number of taken branches we actually increase the number of branch instructions in the loop, which increases code size.
Moreover, we increase the number of branch instructions executed at runtime for the old fall-through path and don't decrease it for any other path.
While these two facts may negatively affect performance, they don't explain a 35% slowdown. Something more complicated is happening behind the scenes.

This example shows that this optimization is not always beneficial and requires a more complicated profitability heuristic.

Any ideas?

xbolva00 added a subscriber: xbolva00.EditedJul 18 2019, 4:04 AM

Can you provide benchmark results from "public" suites like SPEC or the LLVM test-suite? I think this should be mandatory for landing performance-critical patches (always provide results from SPEC/the test-suite).

I think it is not enough to land critical patches just because "we see an improvement in the internal benchmark" - either open source your benchmark or, if not, please provide data from the test-suite.

xbolva00 added a subscriber: hans.Jul 18 2019, 4:05 AM

(should be probably reverted from 9.0 branch)

@hans

Does the test case slow down with PGO or not? Also, do you have branch misprediction perf data? (Large slowdowns like this are usually triggered by side effects like this.) Is there a trimmed-down version to demonstrate the issue?

Does the test case slow down with PGO or not?

No, it's not PGO.

Also, do you have branch misprediction perf data? (Large slowdowns like this are usually triggered by side effects like this.)

I do. There are no branch/data/instruction mispredictions. The LSD works 100% in both cases as well. Even after the regression, CPI is 0.3, which means we are execution bound. If I run the benchmark under perf, the scores change a little and there is a 19% difference instead of the original 35%. About half of the slowdown (8.3%) comes from the increased path length due to the extra jump instruction; the rest comes from a 10% CPI increase. I'm checking different hypotheses about what could cause that, but no success so far.

Is there a trimmed down version to demonstrate the issue?

It is a simple loop sequentially reading floating point values (no NaNs) from two arrays and writing the minimum value to a third array:

for (int i = 0; i < 320000; i++) {
   c[i] = min(a[i], b[i]);
}

Carrot, can you help look into this? The cost model should account for the extra direct jumps introduced as well.

Evgeniy, could you try building your code with FDO? The layout code is based on profile information; if that is not available, a statically estimated profile is used. Since the loaded values are never NaN, there should be no branch from the loop header to the latch, but the estimated profile gives that edge a non-trivial probability, so the latch is moved before the header and one taken branch is saved on the NaN path. I think the static profile estimation could be enhanced to treat a floating point number as not a NaN, or as a NaN with only a very small probability.

I tried to reproduce your result with

clang++ -O2 -c d43256.cc -save-temps

#include <algorithm>

#define N 320000

float a[N];
float b[N];
float c[N];

using namespace std;

void foo(int M) {
  for (int i = 0; i < M; i++) {
    c[i] = min(a[i], b[i]);
  }
}

But I got a totally different code sequence. Could you help by giving more complete reproduction steps?
Thanks a lot!

hans added a comment.Jul 22 2019, 1:12 PM

(should be probably reverted from 9.0 branch)

@hans

Thanks for the heads up; I'll keep an eye on this.

I'd prefer to see this either reverted or fixed on trunk, and then merge that to the 9.0 branch.

Here is a C++ equivalent of my original code (which is actually a Java application) for you to reproduce.

clang++ -c -O2 floatmin.cpp -march=skylake

extern float a[];
extern float b[];
extern float c[];

bool foo(int M, bool flag) {
  for (int i = 0; i < M; i++) {
    float x = a[i];
    float y = b[i];
    float min;
    if (x != x) {
      min = x;   // a is NaN
    }
    else if (y == 0.0f) {
      goto fail;
    }
    else {
      min = (x <= y) ? x : y;
    }
    c[i] = min;
  }

  return true;
fail:
  return false;
}

With the C++ reproducer I can measure only about a 9% slowdown. In this case CPI is identical (for the original test case I still don't know the root cause of the CPI difference) and all of the slowdown comes from the increased path length due to one extra jump.

With this reproducer in hand you can gather profile data if needed. But that's a separate story; I don't think we can afford such a regression when a profile is not available.
You could probably assume the worst case when a profile is not available, but I believe it won't help here; the root cause is that the heuristic just doesn't take the extra jump into account.

Thanks for the test case, I can reproduce it now.

As I suspected, the problem is in BranchProbabilityAnalysis. For a floating point unordered comparison, it estimates the probability as 12/32, while for the loop exit the probability is 3.1%. With these probability numbers, moving the latch before the header is beneficial. Unfortunately there is never a NaN in the input.

Even for a normal program, a NaN is extremely rare, so we should give a much smaller weight to the unordered comparison.

ebrevnov, with the following patch the test case can be laid out correctly. Can you try it with your actual code?

Index: BranchProbabilityInfo.cpp
--- BranchProbabilityInfo.cpp (revision 366821)
+++ BranchProbabilityInfo.cpp (working copy)
@@ -118,6 +118,12 @@
 static const uint32_t FPH_TAKEN_WEIGHT = 20;
 static const uint32_t FPH_NONTAKEN_WEIGHT = 12;
 
+/// This is the probability for an ordered floating point comparison.
+static const uint32_t FPH_ORD_WEIGHT = 1024 * 1024 - 1;
+
+/// This is the probability for an unordered floating point comparison, it means
+/// one or two of the operands are NaN, it should be extremely rare.
+static const uint32_t FPH_UNO_WEIGHT = 1;
+
 /// Invoke-terminating normal branch taken weight
 ///
 /// This is the weight for branching to the normal destination of an invoke
@@ -778,6 +784,8 @@
   if (!FCmp)
     return false;
 
+  uint32_t TakenWeight = FPH_TAKEN_WEIGHT;
+  uint32_t NontakenWeight = FPH_NONTAKEN_WEIGHT;
   bool isProb;
   if (FCmp->isEquality()) {
     // f1 == f2 -> Unlikely
@@ -786,9 +794,13 @@
   } else if (FCmp->getPredicate() == FCmpInst::FCMP_ORD) {
     // !isnan -> Likely
     isProb = true;
+    TakenWeight = FPH_ORD_WEIGHT;
+    NontakenWeight = FPH_UNO_WEIGHT;
   } else if (FCmp->getPredicate() == FCmpInst::FCMP_UNO) {
     // isnan -> Unlikely
     isProb = false;
+    TakenWeight = FPH_ORD_WEIGHT;
+    NontakenWeight = FPH_UNO_WEIGHT;
   } else {
     return false;
   }
@@ -798,8 +810,7 @@
   if (!isProb)
     std::swap(TakenIdx, NonTakenIdx);
-  BranchProbability TakenProb(FPH_TAKEN_WEIGHT,
-                              FPH_TAKEN_WEIGHT + FPH_NONTAKEN_WEIGHT);
+  BranchProbability TakenProb(TakenWeight, TakenWeight + NontakenWeight);
   setEdgeProbability(BB, TakenIdx, TakenProb);
   setEdgeProbability(BB, NonTakenIdx, TakenProb.getCompl());
   return true;
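(With these weights, the statically estimated probability of the unordered, i.e. NaN, path drops from 12/32 = 37.5% to 1/(1024*1024), essentially zero, so the estimated profile no longer favors laying out the loop for the NaN path.)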

Hi @Carrot,

I checked your patch on the original test case and it does resolve the issue. Unfortunately, we have 4 other test cases that regressed from 1% to 5%. The bad news is that we don't have sources for these cases. At a glance they don't involve operations on floats, so it is no surprise that they are not affected by this fix. I can continue the investigation and try to provide a short reproducer if needed. My fear is that the original change touches a very wide class of cases and we won't be able to fix all of them one by one (or it will take too much time). Can we think of a more realistic approach to deal with the regressions?

Thanks
Evgeniy

Is this reverted (at least from 9.0, @hans) yet?

hans added a comment.Jul 25 2019, 7:44 AM

Is this reverted (at least from 9.0, @hans) yet?

No. The preferred approach is to resolve on trunk first (reverting or fixing) and then merging that to the release branch.

It sounds like there is still discussion ongoing here. Perhaps it should be reverted on trunk in the meantime?

I suspect other affected cases are due to bad static profile data too.

For now, I think the best way forward is to enable this transformation only when real profile data is available.

@ebrevnov, it's better to provide a reproducer. Otherwise I can't analyze the problem that impacts your code. Four is not a big number.

hans added a comment.Jul 26 2019, 10:28 AM

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

How does this transformation impact the benchmark when not using profile data?

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

How does this transformation impact the benchmark when not using profile data?

Benchmark is running, will report the result once it is finished.

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

How does this transformation impact the benchmark when not using profile data?

In plain mode we also got a performance improvement; the speedup is a little smaller than in FDO mode.

@ebrevnov, it's better to provide a reproducer. Otherwise I can't analyze the problem that impacts your code. Four is not a big number.

Four is not big, but who knows how many other cases will arise. I was looking at another case with a ~5% regression. As I mentioned, sources are not available for this case and I was not able to identify anything special just by looking at the assembly. As in the previous case, we have one extra jump, which accounts for about 3.5% of the instructions in the loop. Here is the assembly for the loop after the optimization:

0x3001bf40:   4c8975 movq      %r14, -72(%rbp)
0x3001bf44:   4c894d movq      %r9, -48(%rbp)
0x3001bf48:   48894d movq      %rcx, -56(%rbp)
0x3001bf4c:   4c8945 movq      %r8, -64(%rbp)
0x3001bf50:   48895d movq      %rbx, -80(%rbp)
0x3001bf54:   4889cf movq      %rcx, %rdi
0x3001bf57:   90     nop
0x3001bf58:   e863f1 callq     -69277                       ; 0x3000b0c0
0x3001bf5d:   4c8b75 movq      -72(%rbp), %r14
0x3001bf61:   488b4d movq      -56(%rbp), %rcx
0x3001bf65:   4c8b4d movq      -48(%rbp), %r9
0x3001bf69:   4c8b45 movq      -64(%rbp), %r8
0x3001bf6d:   48ffc3 incq      %rbx
0x3001bf70:   410fb6 movzbl    148(%r9), %eax
0x3001bf78:   84c0   testb     %al, %al
0x3001bf7a:   7572   jne       114                            ; 0x3001bfee
0x3001bf7c:   498b04 movq      (%r12), %rax
0x3001bf80:   498b55 movq      (%r13), %rdx
0x3001bf84:   4885c2 testq     %rax, %rdx
0x3001bf87:   751b   jne       27                             ; 0x3001bfa4
0x3001bf89:   488b78 movq      16(%rax), %rdi
0x3001bf8d:   4885fa testq     %rdi, %rdx
0x3001bf90:   7526   jne       38                             ; 0x3001bfb8
0x3001bf92:   f64708 testb     $1, 8(%rdi)                    ; 0x3001bfd4
0x3001bf96:   be5f49 movl      $346463, %esi
0x3001bf9b:   74a3   je        -93                            ; 0x3001bf40
0x3001bf9d:   be8b10 movl      $4235, %esi
0x3001bfa2:   eb9c   jmp       -100                           ; 0x3001bf40

Thanks
Evgeniy

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

How does this transformation impact the benchmark when not using profile data?

In plain mode we also got a performance improvement; the speedup is a little smaller than in FDO mode.

I have a general question/comment. By now it's more or less evident that the benefit of this optimization heavily depends on the correctness of profile information. That means that in the general case there is no way to reason about its effectiveness. Thus I believe it should be turned off if there is no profile.

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

How does this transformation impact the benchmark when not using profile data?

In plain mode we also got a performance improvement; the speedup is a little smaller than in FDO mode.

I have a general question/comment. By now it's more or less evident that the benefit of this optimization heavily depends on the correctness of profile information. That means that in the general case there is no way to reason about its effectiveness. Thus I believe it should be turned off if there is no profile.

The optimization decision is based on profile information; if a real profile is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. So if an unreasonable probability is generated, as in your first case, or if a user program has more atypical runtime behavior than BPI expects, it may make a bad decision.
Since you have a strong concern about the optimization without real profile information, I will restore the old behavior when no real profile information is available.

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

How does this transformation impact the benchmark when not using profile data?

In plain mode we also got a performance improvement; the speedup is a little smaller than in FDO mode.

I have a general question/comment. By now it's more or less evident that the benefit of this optimization heavily depends on the correctness of profile information. That means that in the general case there is no way to reason about its effectiveness. Thus I believe it should be turned off if there is no profile.

The optimization decision is based on profile information; if a real profile is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. So if an unreasonable probability is generated, as in your first case, or if a user program has more atypical runtime behavior than BPI expects, it may make a bad decision.
Since you have a strong concern about the optimization without real profile information, I will restore the old behavior when no real profile information is available.

I would suggest putting the optimization under an option and disabling it by default for now. Once all problems are resolved we can change the default. What do you think?

Do you have any benchmark numbers to show that this is generally profitable? From our downstream testing, it is not clear that this change is beneficial.

We got performance improvement in our internal search benchmark.

How does this transformation impact the benchmark when not using profile data?

In plain mode we also got a performance improvement; the speedup is a little smaller than in FDO mode.

I have a general question/comment. By now it's more or less evident that the benefit of this optimization heavily depends on the correctness of profile information. That means that in the general case there is no way to reason about its effectiveness. Thus I believe it should be turned off if there is no profile.

The optimization decision is based on profile information; if a real profile is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. So if an unreasonable probability is generated, as in your first case, or if a user program has more atypical runtime behavior than BPI expects, it may make a bad decision.
Since you have a strong concern about the optimization without real profile information, I will restore the old behavior when no real profile information is available.

I took a deeper look at the second failing case and found something interesting. In fact, there is profile information for the loop, but it's incomplete. For example, there is profile data for the method entry, the loop back edge, and some other branches in the loop; only two branches don't have a profile. (You may wonder how that's possible. The original application is a Java program, where a profile is typically available but may be missing for inlined methods.) I think your heuristic makes the wrong decision due to the absence of a profile for these two branches, which were generated from one select instruction over a boolean variable. In most cases you really can't statically estimate whether a boolean is false or true.

I think we need to be conservative and not account for potential benefit from branches without a representative profile.

Thanks
Evgeniy

I would suggest putting the optimization under an option and disabling it by default for now. Once all problems are resolved we can change the default. What do you think?

After the original behavior is restored in plain mode, you can use -force-precise-rotation-cost=true to get this more aggressive loop layout.
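For example (assuming the option keeps this spelling in your build of LLVM), it can be passed through the clang driver with -mllvm, or directly to llc:

clang++ -O2 -mllvm -force-precise-rotation-cost=true foo.cpp
llc -force-precise-rotation-cost=true foo.ll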

hans added a comment.Aug 1 2019, 1:36 AM

I would suggest putting the optimization under an option and disabling it by default for now. Once all problems are resolved we can change the default. What do you think?

After the original behavior is restored in plain mode, you can use -force-precise-rotation-cost=true to get this more aggressive loop layout.

Is there a patch in progress for restoring to the original behaviour in non-profile mode? It would be nice if we could get this resolved soon.

Carrot added a comment.Aug 1 2019, 4:00 PM

I would suggest putting the optimization under an option and disabling it by default for now. Once all problems are resolved we can change the default. What do you think?

After the original behavior is restored in plain mode, you can use -force-precise-rotation-cost=true to get this more aggressive loop layout.

Is there a patch in progress for restoring to the original behaviour in non-profile mode? It would be nice if we could get this resolved soon.

Yes, some newly added or modified test cases from this patch need to be adjusted. The new patch can be ready tomorrow.

Carrot added a comment.Aug 2 2019, 1:28 PM

Patch https://reviews.llvm.org/D65673 for restoring the original layout in plain mode.

We are also seeing this patch slow down one of our internal benchmarks and speed up another one on the Qualcomm Hexagon target.
In both cases the statically estimated profile is used - and the static profile is representative. In both cases D43256 basically lays out the executed hot code closer together, improving cache utilization. However, in both cases we see the critical path length and the number of jumps in the critical path increase, so a precise cost model is a good idea. We spent some time analyzing why one benchmark got worse - we can see more mispredicts - but there may be more going on under the hood.
For the other benchmark, which speeds up, we see the new layout lowers pressure on an internal branch target hardware resource - the critical loop has a lot of calls that have already increased pressure on that resource.
We don't have sources for these benchmarks.

We have verified that D65673 restores the old behavior on the benchmark that got worse, and passing the flag -force-precise-rotation-cost=true lets us keep the improvement on the other one.

Carrot added a comment.Aug 8 2019, 2:03 PM

@hjagasia, thank you for the verification.

hans added a comment.Aug 9 2019, 2:05 AM

Patch https://reviews.llvm.org/D65673 for restoring the original layout in plain mode.

I see that this was committed in r368339. I will merge that to the release branch once it's baked in trunk for a little while.

hans added a comment.Aug 12 2019, 7:30 AM

Patch https://reviews.llvm.org/D65673 for restoring the original layout in plain mode.

I see that this was committed in r368339. I will merge that to the release branch once it's baked in trunk for a little while.

That was reverted in r368579 since it broke the Chromium build.

@Carrot: Can you please prioritize this so we can get it fixed in time for the LLVM 9 release?

@Carrot: Any update on this? We have slowdowns on our benchmarks because of this, as mentioned earlier.

@hans, this patch should be reverted from 9.0, I think, so rc3 is “fixed”.

hans added a comment.Aug 22 2019, 7:00 AM

@hans, this patch should be reverted from 9.0, I think, so rc3 is “fixed”.

I'd rather see the fix land on trunk first (reverting on the branch is also not trivial, there are merge conflicts in several test files). From the discussion at http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190819/686087.html, it should be ready to go in again.

xbolva00 added a comment.EditedAug 22 2019, 7:21 AM

OK, thanks! I didn't know there was a working fix.

@hans, this patch should be reverted from 9.0, I think, so rc3 is “fixed”.

I'd rather see the fix land on trunk first (reverting on the branch is also not trivial, there are merge conflicts in several test files). From the discussion at http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190819/686087.html, it should be ready to go in again.

It is committed as r369664.

hans added a comment.Aug 27 2019, 7:38 AM

@hans, this patch should be reverted from 9.0, I think, so rc3 is “fixed”.

I'd rather see the fix land on trunk first (reverting on the branch is also not trivial, there are merge conflicts in several test files). From the discussion at http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190819/686087.html, it should be ready to go in again.

It is committed as r369664.

I see there is a follow-up comment on the commit email saying it fails in the verifier.

I think at this point, we're too close to the llvm 9 release to take this. Hopefully it can get fixed and stabilized on trunk, and then be merged to llvm 9.0.1.

Maybe it's not worth reverting it even from 9.0.1.

The commit below says that disabling this new feature regressed many benchmarks. So it is probably okay to leave it as is, and possibly tune the heuristic a bit.

Anyway, with a static profile some code improves and some regresses - but if this patch improves a lot of code, we shouldn’t disable it just because a few people see regressions.

https://reviews.llvm.org/rGf9f81289e6864ca3f09df16bad0ffc3ca58c3162#684441

Hi,
It seems like this is supposed to increase the average number of fall-throughs in the case of simple nested loops; however, it also seems to increase the total number of branch instructions both inside and outside the outer loop.
On our target this seems to do more harm than good, as increasing the number of branches is significantly more harmful than having more conditional taken branches.
Instead of a conditional branch that is usually taken and jumps from the end to the start of the outer loop, we have an unconditional branch that jumps from the end of the inner loop to the end of the outer loop, which then conditionally jumps out of the outer loop or falls through to the next iteration of the outer loop.
This increases the total number of executed branch instructions by 1.

Did I understand this correctly?

Is there a way to tweak this optimization to avoid generating additional branches? Or would this mean completely disabling it for targets that don't benefit from additional fall-throughs when they come at the cost of additional branches?

Thanks

Carrot added a comment.Jan 5 2021, 2:59 PM

@GalZohar, the various layout algorithms in MachineBlockPlacement mainly consider the number of fall-throughs and the dynamic number of branch instructions (usually they are consistent), according to branch probabilities. So you can try building your application with profiling. Or you can compile this file with -Os, since the improvement in this patch is disabled with -Os.

@GalZohar, the various layout algorithms in MachineBlockPlacement mainly consider the number of fall-throughs and the dynamic number of branch instructions (usually they are consistent), according to branch probabilities. So you can try building your application with profiling. Or you can compile this file with -Os, since the improvement in this patch is disabled with -Os.

In the simple example I have, the number of branches executed along the most common control flow path is increased by 1 due to this transformation.
It seems that in nested loops, when the outer-loop latch is moved to fall through into the outer-loop header, the inner loop needs an additional exit branch, which increases the number of dynamic branches by 1 for every outer-loop iteration that also executes the inner loop. Is this intentional?
Disabling this completely degrades performance in some more complex examples where the total number of branches is not increased and therefore this optimization is beneficial. I would prefer to keep it when the number of branches isn't increased but skip it otherwise, as in the nested-loop example above.
I'm not sure how profiling would help my situation, as it seems the number of branches may be increased regardless of the block frequency values, since there is one path that executes an extra branch instruction.

@GalZohar, without a test case I can't say what the problem is.

It is not intentional to increase the number of branches on any particular path. All of the algorithms are driven by branch probabilities. If you didn't collect a profile, LLVM guesses a probability for each branch, which is reasonable for most code. But it's not rare that the guessed probability differs from the actual probability, and then it may not result in a good layout.

So I strongly suggest you try a profiling build. MBP is one of the passes that gets the most performance improvement from profiling.

@GalZohar, without a test case I can't say what the problem is.

It is not intentional to increase the number of branches on any particular path. All of the algorithms are driven by branch probabilities. If you didn't collect a profile, LLVM guesses a probability for each branch, which is reasonable for most code. But it's not rare that the guessed probability differs from the actual probability, and then it may not result in a good layout.

So I strongly suggest you try a profiling build. MBP is one of the passes that gets the most performance improvement from profiling.

While we hope to eventually support profiling builds, we must also optimize in the best way possible without profiling.
We have a very simple case where profiling shouldn't be needed. I think this would be the optimal layout:

BB1 (Entry) -> BB2, BB5
BB2 -> BB3, BB4
BB3 -> BB3, BB4
BB4 -> BB2, BB5
BB5 (Exit)

This way all blocks (except exit) have a single branch instruction.
With this optimization BB4 is moved before BB2, which results in an additional branch from BB3 to BB4 that wasn't needed before (BB3 had only a single branch and will now need 2). All other blocks still have a single branch afterwards. BB1 also gets an additional branch instruction, but that is acceptable as it's outside the loop.

Is this intentional, or is something broken with the frequencies? Assuming it is intended, I still don't understand how this transformation is good regardless of frequencies, given that the number of branches is always more important than the number of fall-throughs.
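For reference, a hypothetical nested loop with roughly this CFG (a sketch, not the reporter's actual code) could look like:

void foo(int *a, int n, int m) {
  for (int i = 0; i < n; ++i) {   // BB1 guards the loop; BB2 is the outer header
    if (m > 0) {                  // BB2 -> BB3 (inner loop) or BB4 (outer latch)
      int j = 0;
      do {                        // BB3: single-block inner loop, branches to
        a[j] += i;                //      itself or exits to BB4
      } while (++j < m);
    }
    a[i] ^= 1;                    // BB4: outer latch, back to BB2 or exit to BB5
  }
}                                 // BB5: exit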

@GalZohar, thanks for the example; now I understand your problem. The new layout does increase the number of executed branches for the path BB2->BB3->BB4->BB2. Unfortunately most of the current MBP algorithms don't consider this factor. When considering only the number of fall-throughs, the new layout has more fall-throughs and fewer taken branches.

Since the number of executed branches does have a performance impact on your target, you are welcome to enhance MBP to take this factor into account on the related targets.

@GalZohar, thanks for the example; now I understand your problem. The new layout does increase the number of executed branches for the path BB2->BB3->BB4->BB2. Unfortunately most of the current MBP algorithms don't consider this factor. When considering only the number of fall-throughs, the new layout has more fall-throughs and fewer taken branches.

Since the number of executed branches does have a performance impact on your target, you are welcome to enhance MBP to take this factor into account on the related targets.

Is this a general problem with MBP then? I hadn't noticed it before this optimization, but maybe that was just luck or not investigating the right examples. I'm trying to understand where the best place to fix it is, and whether it's even feasible.

Is this a general problem with MBP then? I hadn't noticed it before this optimization, but maybe that was just luck or not investigating the right examples. I'm trying to understand where the best place to fix it is, and whether it's even feasible.

Mostly. There is only one function that considers the extra jump instruction, rotateLoopWithProfile, but it is not called by default.

Herald added a project: Restricted Project. Oct 20 2022, 3:41 AM
Herald added a subscriber: kosarev.