This is an archive of the discontinued LLVM Phabricator instance.

[OMPIRBuilder][MLIR] Support ordered clause specified with parameter
AbandonedPublic

Authored by peixin on Dec 26 2021, 6:26 PM.

Details

Summary

When the ordered clause is specified with parameter n, the n outermost loops form a
doacross loop nest. Add applyDoacrossLoop to emit the doacross loop
"init" and "fini" runtime calls in the OpenMP IRBuilder. Add one virtual
clause to the WsLoop MLIR op to store the doacross loop bounds info.

In addition, move the barrier runtime call to the front of the "after" basic
block, and set the insertion point at the end of the "after" basic block.
With this change, lowering to LLVM IR is supported when a dynamic schedule
is specified and the collapse value is greater than 1. Also add a test
case.

Diff Detail

Unit Tests: Failed

Event Timeline

peixin created this revision. Dec 26 2021, 6:26 PM
peixin requested review of this revision. Dec 26 2021, 6:26 PM
peixin updated this revision to Diff 396260. Dec 26 2021, 8:09 PM

Fix failed clang test cases.

To make the review work easier, I would like to give a brief explanation of the design of the OMPIRBuilder support for the ordered clause with a parameter. First of all, the ordered clause cannot work alone to make the code region execute in order; the ordered clause and the ordered construct must cooperate to make the code region execute in order. For an ordered clause specified with a parameter, the outer n (the parameter) loops form the doacross loop nest, and the OpenMP runtime function kmpc_doacross_init is generated to initialize the loop bounds info of the doacross loop nest. For an ordered construct with a depend clause, it posts/waits on the corresponding iteration according to the index specified in the ordered depend directive.
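For reference, below is a rough, source-level sketch of the runtime call sequence this corresponds to for an ordered(1) worksharing-loop. The entry-point declarations follow the LLVM OpenMP runtime (kmp.h, simplified here); the surrounding loop is only illustrative and is not code from this patch.

#include <cstdint>

// Entry points and the per-dimension bounds descriptor as declared in the
// LLVM OpenMP runtime; ident_t (source location info) is left opaque here.
struct ident_t;
struct kmp_dim { int64_t lo, up, st; };
extern "C" void __kmpc_doacross_init(ident_t *, int32_t gtid, int32_t num_dims,
                                     const kmp_dim *dims);
extern "C" void __kmpc_doacross_wait(ident_t *, int32_t gtid, const int64_t *vec);
extern "C" void __kmpc_doacross_post(ident_t *, int32_t gtid, const int64_t *vec);
extern "C" void __kmpc_doacross_fini(ident_t *, int32_t gtid);

// Illustrative shape of an ordered(1) loop; in reality the iterations are
// handed out by the worksharing-loop scheduler.
void doacross_example(ident_t *loc, int32_t gtid, int64_t lb, int64_t ub,
                      int64_t st) {
  kmp_dim dims[1] = {{lb, ub, st}};
  __kmpc_doacross_init(loc, gtid, /*num_dims=*/1, dims);
  for (int64_t i = lb; i <= ub; i += st) {
    int64_t vec[1] = {i - 1};
    __kmpc_doacross_wait(loc, gtid, vec);  // !$omp ordered depend(sink: i-1)
    // ... loop body ...
    vec[0] = i;
    __kmpc_doacross_post(loc, gtid, vec);  // !$omp ordered depend(source)
  }
  __kmpc_doacross_fini(loc, gtid);
}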

Clang transforms the doacross loop nest into a new one with a lower bound of 0 and a step of 1. However, this is really not necessary. The OpenMP runtime can handle the doacross loop nest regardless of whether the step is positive or negative (https://github.com/llvm/llvm-project/blob/7c3cf4c2c0689be1a08b8a1326703ec5770de471/openmp/runtime/src/kmp_csupport.cpp#L4050-L4058). The doacross loop nest is independent of the worksharing-loop.
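For reference, the per-dimension trip-count handling at that link distinguishes the sign of the step roughly as follows (a paraphrase with simplified names, not a verbatim copy of the runtime source):

#include <cstdint>

uint64_t doacross_trip_count(int64_t lo, int64_t up, int64_t st) {
  if (st == 1)                // most common case
    return up - lo + 1;
  if (st > 0)                 // positive step, assumes up >= lo
    return (uint64_t)(up - lo) / (uint64_t)st + 1;
  return (uint64_t)(lo - up) / (uint64_t)(-st) + 1; // negative step, lo >= up
}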

When lowering the parse-tree to MLIR (https://github.com/flang-compiler/f18-llvm-project/pull/1370/commits/75a8db9c0f7f8c21c2720a794a46afc950ccd0ff), the loop bounds info (lower bounds, upper bounds, and steps) of the doacross loop nest is collected. The loop bounds info can be passed directly as the fourth argument of the kmpc_doacross_init call, and using the expression value of the ordered depend directive as the argument of kmpc_doacross_wait/post makes it work (https://github.com/flang-compiler/f18-llvm-project/pull/1368).

Meinersbur added a comment. Edited Jan 28 2022, 2:16 PM

Thank you for the summary, it was helpful.

With this change, lowering to LLVM IR is supported when dynamic schedule is specified and collapse value is greater than 1. Also add the test case.

Could you explain what goes bad when you do not do this?

Clang transforms the doacross loop nest into a new one with a lower bound of 0 and a step of 1. However, this is really not necessary. The OpenMP runtime can handle the doacross loop nest regardless of whether the step is positive or negative (https://github.com/llvm/llvm-project/blob/7c3cf4c2c0689be1a08b8a1326703ec5770de471/openmp/runtime/src/kmp_csupport.cpp#L4050-L4058). The doacross loop nest is independent of the worksharing-loop.

While it does contain code for it, it is also wrong in edge cases:

  1. If lo is larger than up (I assume there must be a check for this somewhere, but I don't know who is responsible for checking it; the compiler-emitted code?)
  2. If lo - up overflows, in particular if the loop counter variable itself is int64_t. The trip count itself doesn't even need to be large if the increment value is large as well.
  3. Potentially if the loop counter variable is uint64_t and lo/up larger than 2^63.

An integer loop variable must be introduced anyway for loops over iterators, we might just as well normalize everything to a simplified logical iteration space and not have to bother with overflows later.
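As an illustration of that normalization (names and structure are mine, not the OMPIRBuilder API): the loop runs over a logical iteration space 0 .. trip_count-1 and the user induction variable is recomputed from the logical one, so later arithmetic never has to reason about the original bounds or step sign.

#include <cstdint>

// Minimal sketch, assuming inclusive bounds, lb <= ub for step > 0,
// ub <= lb for step < 0, and step != INT64_MIN.
void normalized_loop(int64_t lb, int64_t ub, int64_t step,
                     void (*body)(int64_t)) {
  // Trip count computed once in an unsigned type so the subtraction cannot
  // trigger signed overflow.
  uint64_t trip_count =
      step > 0 ? ((uint64_t)ub - (uint64_t)lb) / (uint64_t)step + 1
               : ((uint64_t)lb - (uint64_t)ub) / (uint64_t)(-step) + 1;
  for (uint64_t logical_iv = 0; logical_iv < trip_count; ++logical_iv)
    body((int64_t)((uint64_t)lb + (uint64_t)step * logical_iv)); // user IV
}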

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
424

Did you consider making doacross part of an existing call like applyDynamicWorkshareLoop? What are the reasons against it? If it is a potential collapseLoops that loses the information about the dimensionality of the original loop nest, did you consider adding that information to CanonicalLoopInfo so that it can be preserved?

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1718–1720

If the body of the loop is just an assert, enclose the entire loop in an #ifndef NDEBUG.

1729–1732
1744
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
2025–2115

I don't think this kind of checking is useful. It does not make clear what properties are actually relevant, and it is very difficult to update even if, e.g., just the allocas are ordered differently.

I suggest having only some sanity checks, such as checking for the existence of a call to __kmpc_doacross_fini.

mlir/test/Dialect/OpenMP/invalid.mlir
123

Why this change?

Thanks @Meinersbur for the review and good comments.

With this change, lowering to LLVM IR is supported when dynamic schedule
is specified and collapse value is greater than 1. Also add the test
case.

Could you explain what goes bad when you do not do this?

For a dynamic schedule, it overrides the afterIP. But when the collapse value is greater than 1, it should use the afterIP stored before transforming the collapsed loops. You can check the changes of this patch in mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp.

Clang transforms the doacross loop nest into a new one with a lower bound of 0 and a step of 1. However, this is really not necessary. The OpenMP runtime can handle the doacross loop nest regardless of whether the step is positive or negative (https://github.com/llvm/llvm-project/blob/7c3cf4c2c0689be1a08b8a1326703ec5770de471/openmp/runtime/src/kmp_csupport.cpp#L4050-L4058). The doacross loop nest is independent of the worksharing-loop.

While it does contain code for it, it is also wrong in edge cases:

  1. If lo is larger than up (I assume there must be a check for this somewhere, but I don't know who is responsible for checking it; the compiler-emitted code?)

Do you mean lo is greater than up and the step is positive? Normalization doesn't check this case, either. I think that is a problem of the user code.

  2. If lo - up overflows, in particular if the loop counter variable itself is int64_t. The trip count itself doesn't even need to be large if the increment value is large as well.

If lo - up overflows, computing the trip count also overflows, so the worksharing-loop does not seem to support it either.
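A small illustration of that point (my own example, not from the patch, using the __int128 extension to show the out-of-range value): the subtraction can overflow int64_t even though the loop only has a handful of iterations, because the increment is huge.

#include <cstdint>
#include <iostream>

int main() {
  // Inclusive range -2^62 .. 2^62 with step 2^62 has just 3 iterations
  // (-2^62, 0, 2^62), yet up - lo == 2^63 does not fit in int64_t, so the
  // signed subtraction in "(up - lo) / st + 1" would overflow.
  int64_t lo = INT64_MIN / 2;        // -2^62
  int64_t up = INT64_MAX / 2 + 1;    //  2^62
  int64_t st = up;                   //  2^62
  __int128 diff = (__int128)up - lo; //  2^63 > INT64_MAX
  std::cout << (int64_t)(diff / st + 1) << " iterations\n"; // prints: 3
  return 0;
}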

  3. Potentially if the loop counter variable is uint64_t and lo/up larger than 2^63.

For flang, the type of the loop counter variable must be int32_t or int64_t, and there is no unsigned 64-bit integer in Fortran. For clang, the uint64_t is converted into int64_t, and no uint64_t is passed to __kmpc_doacross_init. For example, the lower bound is -1 if it is declared as unsigned long long lb = ULLONG_MAX;. In this case, for the statement for (unsigned long long i = ULLONG_MAX; i >= 1; i--), the computation here is trace_count = (uint64_t) (-1 - 1) / 1 + 1 = ULLONG_MAX, which is correct.
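The arithmetic above can be checked directly (an illustration only; the casts mirror how the bounds end up being interpreted after conversion to signed 64-bit):

#include <cassert>
#include <climits>
#include <cstdint>

int main() {
  // for (unsigned long long i = ULLONG_MAX; i >= 1; i--):
  // the bounds reinterpreted as signed 64-bit values.
  int64_t lo = (int64_t)ULLONG_MAX; // -1
  int64_t up = 1;
  int64_t st = -1;
  // Negative-step branch of the runtime's trip-count computation.
  uint64_t trip_count = (uint64_t)(lo - up) / (uint64_t)(-st) + 1;
  assert(trip_count == ULLONG_MAX); // matches the real iteration count
  return 0;
}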

An integer loop variable must be introduced anyway for loops over iterators, we might just as well normalize everything to a simplified logical iteration space and not have to bother with overflows later.

For clang, the loop variable can be normalized according to the operator in the for loop, such as <, >, <=, >=, !=. But for flang, it is not easy to know whether lb is greater than ub or not. Consider the following example:

!$omp do ordered(1)
do i = lb, up, step
  !$omp ordered depend(sink: i-1)
  call func(i-1)
  ...
  !$omp ordered depend(source)
enddo
!$omp end do

For the ordered depend directive, it is hard to know how to transform i-1 into the argument of __kmpc_doacross_wait. Actually, normalization in clang is not correct in all cases. I found one bug as follows:

#include <iostream>
using namespace std;

int main() {
  int i, i_lb = 1, i_ub = 10, i_step = 1;
  int a[10];

  for (i = 0; i < 10; i++)
    a[i] = 1;

  #pragma omp parallel num_threads(9)
  #pragma omp for ordered(1)
  for (i = i_lb; i != i_ub; i = i + i_step) {
    #pragma omp ordered depend(sink: i-1)
    a[i] = a[i-1] + 1;
    #pragma omp ordered depend(source)
  }

  for (i = 0; i < 10; i++)
    cout << a[i] << " ";
  cout << endl;
  return 0;
}
$ clang++ case.cpp && ./a.out
1 2 3 4 5 6 7 8 9 10 
$ clang++ case.cpp -fopenmp && ./a.out
1 1 1 1 1 1 1 1 1 1 

The problem is that clang thinks i_lb is greater than i_ub in this case, so the normalization of depend(sink: i-1) is wrong.
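To make the issue concrete (my own illustration, not code from clang or this patch): without normalization the sink vector for depend(sink: i-1) is simply i-1, while with a normalized loop the frontend has to map i-1 back to a logical iteration number, which requires knowing the direction of the loop. With a condition like i != i_ub the direction cannot be determined statically, so a wrong guess yields a wrong sink index.

#include <cstdint>

// Sink value passed to __kmpc_doacross_wait when the original iteration
// space is kept: just the user-level expression.
int64_t sink_unnormalized(int64_t i) { return i - 1; }

// Sink value after normalizing the loop to 0 .. n-1: only correct when lb
// and step really describe the original loop's start and direction.
int64_t sink_normalized(int64_t i, int64_t lb, int64_t step) {
  return ((i - 1) - lb) / step;
}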

For other comments, will fix them later.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
424

The doacross loop only inserts the init and fini calls into the current worksharing-loop. If making it part of a call like applyDynamicWorkshareLoop, three variants would be needed, i.e., applyDoacrossDynamicLoop, applyDoacrossStaticLoop, and applyDoacrossStaticChunkLoop, which is too redundant.

The worksharing loop is commonly used, but ordered(n) is not. In some workloads, there is even no ordered(n) clause. Adding the doacross loop info into CanonicalLoopInfo would have some cost, which is not necessary. What do you think?

mlir/test/Dialect/OpenMP/invalid.mlir
123

The doacross loop was not implemented before. If the ordered value is greater than 1, there is one virtual doacross clause attached by this patch. This check only checks whether there is an ordered clause.

In the last OpenMP Flang technical call, we got the information from the OpenMP community, via @Meinersbur, that the implementation of the ordered directive and clause is under discussion. Currently, in the LLVM OpenMP library and the clang frontend, the doacross loop is independent from the worksharing loop. The OpenMP community is discussing whether to fix the canonical loop instead of forming one new doacross loop, considering the performance issue and edge cases such as overflow. We plan to delay the work on lowering the ordered directive and clause, so this PR is closed for now and may be reopened in the future.

peixin abandoned this revision. Feb 22 2022, 10:33 PM