This is an archive of the discontinued LLVM Phabricator instance.


Differential D91556

[OpenMPIRBuilder} Add capturing of parameters to pass to omp::parallel
Needs Review · Public

Authored by llitchev on Nov 16 2020, 11:17 AM.


Details

Reviewers

jdoerfert
ftynse

Summary

The omp::ParallelOp is translated to a callback that is called for each thread. The callback uses varargs, but parameter passing does not work properly with SSE(UP) parameter types. Hence the need to capture the parameters into an alloca-ed struct and pass that struct to the callback.
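The capture approach described in the summary can be pictured with a small, plain C++ sketch. It is not the OpenMPIRBuilder code and does not call the OpenMP runtime: `CapturedParams`, `outlinedBody`, `forkCall`, and `runExample` are hypothetical names standing in for the alloca-ed struct, the outlined callback, and the fork call. The point it illustrates is that the callback receives one pointer regardless of how many values, or which types, the region uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capture struct: one field per value used inside the region.
struct CapturedParams {
  float Scale;        // floating-point value; travels via memory, not as a vararg
  std::int32_t Count; // integer value
  double *Out;        // per-thread output slots
};

// Hypothetical outlined body: thread ids plus a single pointer to the captures.
void outlinedBody(const std::int32_t *GlobalTid, const std::int32_t *BoundTid,
                  void *Capture) {
  (void)BoundTid; // unused in this sketch
  auto *C = static_cast<CapturedParams *>(Capture);
  C->Out[*GlobalTid] = static_cast<double>(C->Scale) * C->Count;
}

using BodyFn = void (*)(const std::int32_t *, const std::int32_t *, void *);

// Hypothetical stand-in for a fork call; runs the body sequentially, once per
// simulated thread.
void forkCall(std::int32_t NumThreads, BodyFn Body, void *Capture) {
  for (std::int32_t Tid = 0; Tid < NumThreads; ++Tid) {
    const std::int32_t GlobalTid = Tid;
    const std::int32_t BoundTid = 0;
    Body(&GlobalTid, &BoundTid, Capture);
  }
}

// Usage: pack the values once, then hand a single pointer to the fork call.
double runExample() {
  double Out[3] = {0.0, 0.0, 0.0};
  CapturedParams Capture{1.5f, 4, Out};
  forkCall(3, outlinedBody, &Capture);
  return Out[0] + Out[1] + Out[2];
}
```

Because the struct is read through a pointer, adding a field changes neither the callback signature nor the number of arguments at the call site.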

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	90 ms	x64 windows > LLVM.CodeGen/XCore::threads.ll

Event Timeline

llitchev created this revision. Nov 16 2020, 11:17 AM

Herald added a project: Restricted Project. Nov 16 2020, 11:17 AM

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 14 others.

llitchev requested review of this revision. Nov 16 2020, 11:17 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache. Nov 16 2020, 11:17 AM

rriddle added inline comments. Nov 16 2020, 11:20 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
837 ↗	(On Diff #305566)	I would not expect the translation to modify the input module.

Harbormaster completed remote builds in B78989: Diff 305566. Nov 16 2020, 11:34 AM

llitchev added inline comments. Nov 16 2020, 10:26 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
837 ↗	(On Diff #305566)	Thanks! That makes sense. I just never thought of a translator as only translating its input and not modifying it. Moving this code to the OpenMPToLLVM converter.

ftynse requested changes to this revision. Nov 17 2020, 3:43 PM

ftynse added inline comments.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
837 ↗	(On Diff #305566)	Translation is supposed to be as simple as possible. We have a dedicated pass preparing an MLIR module in the LLVM dialect for translation: LegalizeForExport. Something similar can be introduced for the OpenMP dialect, and the translator can just reject the inputs it cannot handle. (On a side note, we'd better refactor the translator in such a way that it no longer needs to know about OpenMP.)

This revision now requires changes to proceed. Nov 17 2020, 3:43 PM

Thanks @llitchev for this patch.

FYI @jdoerfert was suggesting to fix this issue in the OpenMPIRBuilder by setting /* AggregateArgs */ to true in the CodeExtractor in /llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp.
This hit a couple of issues:

When the ArtificialEntryBlock is removed in OpenMPIRBuilder::finalize() there are some uses that are left hanging.
Once the above was fixed it hit an assertion in the following place.

OpenMPIRBuilder::createParallel

assert(OutlinedFn.arg_size() >= 2 &&
       "Expected at least tid and bounded tid as arguments");

Created a small pass to capture omp::ParallelOp parameters.

Herald added a subscriber: sstefan1. Nov 17 2020, 8:18 PM

llitchev marked an inline comment as done. Nov 17 2020, 8:20 PM

llitchev added inline comments.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
837 ↗	(On Diff #305566)	I created a simple pass that goes after the ConvertOpenMPToLLVM one. It looks like a better place for this than the translator.

Harbormaster completed remote builds in B79219: Diff 305963. Nov 17 2020, 8:33 PM

Thanks, this looks better! I have a couple of further comments.

mlir/include/mlir/Conversion/OpenMPToLLVM/ConvertOpenMPToLLVM.h
30 ↗	(On Diff #305963)	This could be a function pass instead.
mlir/include/mlir/Conversion/Passes.td
227 ↗	(On Diff #305963)	Please try to fit 80 cols.
228 ↗	(On Diff #305963)	Triple backslashes? We could just use single quotes inside or the code-block syntax outside...
mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
27 ↗	(On Diff #305963)	Why is this necessary?
59 ↗	(On Diff #305963)	Nit: elide trivial braces
64 ↗	(On Diff #305963)	Nit: MLIR uses auto when it improves readability, e.g. if the type is already mentioned on the RHS (casts) or if it's too long to spell out. `LLVMType` looks just fine here.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
20 ↗	(On Diff #305963)	Please drop these
mlir/test/Conversion/OpenMPToLLVM/openmp_float-parallel_param.mlir
3	Could we please make this test minimal and only exercise the functionality that the patch is adding? I don't think we need anything about `main` or `_mlir_ciface` or the entire initialization block here. We can use the Test dialect that supports unregistered ops as opaque producers or users of values.
42	The check pattern doesn't look like valid MLIR, I am surprised pre-merge checks haven't complained.
47	Prefer CHECK over CHECK-NEXT unless the semantics of the IR changes when two operations are not adjacent. The need to change another, unrelated test in this patch is a good illustration why :)

rriddle added inline comments. Nov 18 2020, 4:13 AM

mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
24 ↗	(On Diff #305963)	Use the auto generated base class instead of PassWrapper.

Addressed some CR feedback on this diff.

mlir/include/mlir/Conversion/OpenMPToLLVM/ConvertOpenMPToLLVM.h
30 ↗	(On Diff #305963)	This pass operates on LLVM Dialect IR. I tried to have it as a function pass (from the very beginning), but the function pass doesn't take the LLVMFuncOp (maybe I am missing something, though). Thanks!
mlir/include/mlir/Conversion/Passes.td
227 ↗	(On Diff #305963)	I ran git clang-format origin/master. I thought it should have fixed it. Thanks!
mlir/test/Conversion/OpenMPToLLVM/openmp_float-parallel_param.mlir
42	I have no idea why it didn't get caught.

Harbormaster completed remote builds in B79355: Diff 306203. Nov 18 2020, 1:39 PM

It doesn't make sense to do this here. The OpenMPIRBuilder is used in other places that have the same problem, especially OpenMPOpt. The logic needs to be part of the OpenMPIRBuilder.

This revision now requires changes to proceed. Nov 19 2020, 7:58 AM

@jdoerfert That makes sense. I looked at the code and that is LLVMIR, so I'll start moving things over.

ftynse mentioned this in D92189: [OpenMPIRBuilder] forward arguments as pointers to outlined function. Nov 26 2020, 9:36 AM

Just finished significant testing of this with our AI codegen. The fix in https://reviews.llvm.org/D92189 shows some issues with the number of arguments passed as varargs. The number of varargs at which problems start to show differs between HW targets (it seems different for SPIR-V, Vulkan, MultiCore). I will update this PR.

Implemented the closure approach in the OMPIRBuilder only.

The implementation is fully encapsulated in the OMPIRBuilder only (no changes outside of this file).
It also addresses the issue/limitation of the current implementation with having more than 15 upward defined Values, passed to the parallel region as varargs.

Herald added a project: Restricted Project. Dec 11 2020, 12:53 PM

Herald added subscribers: llvm-commits, hiraditya.

I suspect there will be conflicts at this time. Posting the source here, so some of the previous commenters could look at it.

I think the overall approach is good to solve the problem with max 16 arguments. Is this based on current master? I left remarks wrt style and other things below, I will need to go over the logic again after those are addressed.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
666	Nit: use SmallVectorImpl w/o the size. No const on the pointers (*const); doesn't mean much anyway.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
446	If it has to be an instruction, use `cast`. If it might not be, simply get the current insertion point from the builder.
692	This should make some of the code introduced by D92189 obsolete, right?
838	Use range loops above whenever possible. Use LLVM naming style for variables please, so first letter capitalized. Also no `llvm::`. Prefer insertion point guards over manually saving and restoring (potentially adding an explicit scope `{ ... }`). I don't think we need to iterate over the entire set of instructions of the outer function, do we? Check how D92189 identifies communicated values.
mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
17 ↗	(On Diff #311301)	leftover.
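The insertion point guard suggested in the comments above is an RAII pattern: save the builder's position on construction and restore it when the scope ends. LLVM provides `IRBuilderBase::InsertPointGuard` for this. The sketch below only mimics the idea with hypothetical `ToyBuilder` and `ToyInsertPointGuard` types so that it compiles without LLVM, and should not be read as the code under review.

```cpp
#include <cassert>

// Hypothetical builder whose insertion point is just an integer position.
struct ToyBuilder {
  int InsertPos = 0;
};

// Hypothetical RAII guard: remembers the position and restores it on scope exit.
class ToyInsertPointGuard {
  ToyBuilder &Builder;
  int SavedPos;

public:
  explicit ToyInsertPointGuard(ToyBuilder &B)
      : Builder(B), SavedPos(B.InsertPos) {}
  ToyInsertPointGuard(const ToyInsertPointGuard &) = delete;
  ToyInsertPointGuard &operator=(const ToyInsertPointGuard &) = delete;
  ~ToyInsertPointGuard() { Builder.InsertPos = SavedPos; }
};

// Temporarily emit "at the entry" (position 0) without leaking the change to
// the caller. Returns the position that was used for emission.
int emitAtEntry(ToyBuilder &Builder) {
  ToyInsertPointGuard Guard(Builder); // explicit scope: the function body
  Builder.InsertPos = 0;
  return Builder.InsertPos;
}
```

Compared with saving and restoring by hand, the guard also restores the position on early returns, which is presumably why the reviewer prefers it.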

Harbormaster completed remote builds in B82095: Diff 311301. Dec 11 2020, 1:44 PM

Addressed CR feedback.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
666	Using the Impl now.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
446	Using the current position from the Builder (moving one forward, so it is not the end() ).
692	Yes! Most of it. Now the data is just wrapped into a capture struct before calling the synthetic ..._fork function.
838	You are right ... That was one of the changes I wanted to make to the Diff. Now iterate over the parallel region blocks only.

Needed merges with the latest master.

Harbormaster completed remote builds in B82206: Diff 311475. Dec 13 2020, 5:28 PM

llitchev retitled this revision from "Add capturing of parameters to pass to omp::parallel" to "[OpenMPIRBuilder} Add capturing of parameters to pass to omp::parallel". Dec 14 2020, 5:21 AM

Herald added subscribers: guansong, yaxunl. Dec 14 2020, 5:21 AM

Merged in master.

Removed unnecessary (now) code from D92189.

Herald added a project: Restricted Project. Dec 14 2020, 5:25 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B82256: Diff 311557. Dec 14 2020, 6:04 AM

Fixed a failing Windows test. The issue is that the order of the operands for the add operation has changed. I can't see how these changes could cause the issue, but it is a failing test that blocks the push of this Diff.

Harbormaster completed remote builds in B82312: Diff 311648. Dec 14 2020, 12:02 PM

Added separate tests for Windows for specific tests.

On Windows the registers for the add operation are swapped. This test is completely unrelated to this change, but it fails.

Harbormaster completed remote builds in B82332: Diff 311698. Dec 14 2020, 2:43 PM

Thanks for continuing to work on this.

Change the commit message as it is working for non integer types now, just not for "too many". Some more comments below.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
124	wasn't that a spelling error before? and one `\param` is enough ;)
126	I believe we need to keep the `Inner` parameter here. The reasons can be found in the discussion of D92476. Short story: The callback might have the original value in a map with information attached. If we pass in only the "reloaded" value the callback cannot determine what the original was.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
441	How do we know that `--` is valid here? Couldn't `Loc` point to the beginning of a function? If possible, let's just use `Loc.IP`.

Thanks! I have a couple of comments, but I will defer to @jdoerfert for approval in any case.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
678
680	Nit: it looks like this file uses IP rather than InsPoint for names related to insertion points
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
754	Nit: I think https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop applies to `.size()` the same way it applies to `.end()`
789	Nit: https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code if (CapturedValues.empty()) return;
790	Nit: trailing dot
793	Nit: please `reserve` before pushing back in a loop
801–802	Nit: `Builder.restoreIP(CaptureAllocaInsPoint)` looks shorter
804–809	I suppose you may want to have `alloca` inserted in a block (function entry) different from the one where you store into the memory. You need to store just before calling the fork function (or, at least, so that the store postdominates all stored values). Looking at the function API, I would have assumed `CaptureAllocaInsPoint` to be an insertion point at the function entry block specifically for `alloca`s, where these `insertvalue`s are invalid.
820–823	Can we rather take each captured value and enumerate its uses, replacing those within the parallel block set?
llvm/test/CodeGen/XCore/threads.ll
84–140 ↗	(On Diff #311698)	These look irrelevant to the patch, but seem to fix a breakage upstream. Would you mind committing this separately?
mlir/test/Conversion/OpenMPToLLVM/openmp_float-parallel_param.mlir
2	Changes to MLIR are no longer necessary
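Three of the nits above (evaluating `.size()` once instead of on every iteration, exiting early on empty input, and calling `reserve` before pushing back in a loop) can be shown together in a few lines. This is a generic sketch, not the code under review: `collectFieldOffsets` and its `int`-typed values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: byte offset of each captured value if the values were
// laid out back to back as same-sized fields.
std::vector<std::size_t>
collectFieldOffsets(const std::vector<int> &CapturedValues) {
  std::vector<std::size_t> Offsets;

  // Early exit keeps the main path flat.
  if (CapturedValues.empty())
    return Offsets;

  // Reserve once instead of growing on every push_back.
  Offsets.reserve(CapturedValues.size());

  // Evaluate size() once, in the loop initializer.
  for (std::size_t I = 0, E = CapturedValues.size(); I != E; ++I)
    Offsets.push_back(I * sizeof(CapturedValues[I]));
  return Offsets;
}
```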

Addressed CR.

llitchev added inline comments. Dec 21 2020, 1:56 AM

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
124	Fixed.
680	No need to store this value anymore. Used the InsertBB->getTerminator(), thus guaranteeing the alloca and stores are just before the fork call (they were before that call too, since the ThreadID was called last), so even if more codegen is introduced in the future the logic deals with it.
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
441	There is always an instruction before - the ThreadID was always generated, and that is what -- points to. Changed it to use InsertBB->getTerminator(). It is much more sturdy this way. Even if the codegen is changed, the alloca, insert and store will be generated always just before the forkCall.
754	Fixed.
801–802	Refactored. No need to store the InsertPoint.
804–809	Now it is guaranteed that the codegen of the alloca, insert, and stores is done just before the forkCall, even if the codegen changes in the future. It was the case before because the code was generated after the call that gets the ThreadID (which was just before the fork).
820–823	That was the first implementation I had. The issue was that uses() was not returning all the uses (particularly the ones introduced by the loop unroller; I spent a bunch of time debugging it). Iterating over all the instruction parameters of the parallel regions just works.
llvm/test/CodeGen/XCore/threads.ll
84–140 ↗	(On Diff #311698)	OK
mlir/test/Conversion/OpenMPToLLVM/openmp_float-parallel_param.mlir
2	Yes. This just exposes the original issue I had. I thought it is useful to have a test that verifies the underlying functionality works for MLIR.

Removed the changes from threads.ll.

I'll pull this into a new Diff.

Some minor optimizations related to CR.

Harbormaster completed remote builds in B83111: Diff 313039. Dec 21 2020, 2:47 AM

Harbormaster completed remote builds in B83112: Diff 313041. Dec 21 2020, 2:50 AM

Harbormaster completed remote builds in B83117: Diff 313048. Dec 21 2020, 3:08 AM

Fixed a casing issue with a local var.

Harbormaster completed remote builds in B83130: Diff 313077. Dec 21 2020, 4:29 AM

nigelp-xmos added a subscriber: nigelp-xmos. Jan 4 2021, 8:36 AM

nigelp-xmos mentioned this in D93625: [NFC] [TEST] Fix the threads.ll for Windows. Jan 5 2021, 6:15 AM

jdoerfert added inline comments. Jan 6 2021, 8:12 PM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
784	If we need this, remove the Counter stuff everywhere; if you want to iterate a container, use `for (const T& : Container)`. `BlockParents` seems to be a set with the blocks; we already have that, it's called `ParallelRegionBlockSet`, simply pass it in. Why don't we use the `Inputs` and `Outputs` sets computed by the `findInputsOutputs` call? Those are the live-in and live-out values of the parallel region.
795
806	The alloca needs to go in the `OuterAllocaIP` passed in by the caller of `CreateParallel`.
821	I'm not too happy with this insert/extract value scheme. Without further optimization (-O0) this might not be lowered properly. Why don't we create a GEP and load/store to the appropriate location instead?
831	Instead of doing this, unpack/load the location in the `PrivHelper` like we did before. Also, pass the loaded value as `Inner` to the `PrivCB` so that the callback has both the original value `V` and the reload `Inner`.
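The request to pass both `V` and the reloaded `Inner` to `PrivCB` matters when the callback keeps per-value information keyed by the original value. A minimal sketch of that situation follows, in plain C++ with a hypothetical `Value` struct, `ClauseMap` table, and `lookupClause` function rather than the real `PrivatizeCallbackTy`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an IR value; identity is the object's address.
struct Value {
  int Payload;
};

// Hypothetical frontend-side table: information attached to original values.
using ClauseMap = std::map<const Value *, std::string>;

// The callback sees both values: Original is the lookup key, Inner is the
// value that code generated inside the region would actually use.
std::string lookupClause(const ClauseMap &Clauses, const Value &Original,
                         const Value &Inner) {
  (void)Inner; // not needed for the lookup itself
  auto It = Clauses.find(&Original);
  return It == Clauses.end() ? std::string("unknown") : It->second;
}
```

If only the reloaded value were available, the lookup would have to use it as the key and would miss the entry, which is the situation the comment describes.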

Ping @llitchev. Would you have time to take this forward?

Herald added subscribers: dcaballe, cota. Mar 10 2021, 9:10 AM

In D91556#2617144, @kiranchandramohan wrote:

Ping @llitchev. Would you have time to take this forward?

I think @ggeorgakoudis is working on an alternative API solution, we might need to pick up the MLIR parts though.

In D91556#2620928, @jdoerfert wrote:

In D91556#2617144, @kiranchandramohan wrote:

Ping @llitchev. Would you have time to take this forward?

I think @ggeorgakoudis is working on an alternative API solution, we might need to pick up the MLIR parts though.

Yes, I have a solution for OMPIRBuilder. It hinges on https://reviews.llvm.org/D96854 to use the CodeExtractor for building the aggregate.

ftynse resigned from this revision. Aug 27 2021, 12:17 AM

Herald added subscribers: wrengr, Chia-hungDuan. Aug 27 2021, 12:17 AM

nigelp-xmos removed a subscriber: nigelp-xmos. Aug 31 2021, 12:20 AM

Revision Contents

	Path	Size
	clang/test/OpenMP/parallel_codegen.cpp	12 lines
	llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h	16 lines
	llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp	144 lines
	llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp	184 lines
	mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir	2 lines
	mlir/test/Conversion/OpenMPToLLVM/openmp_float-parallel_param.mlir	42 lines

Diff 313077

clang/test/OpenMP/parallel_codegen.cpp


  // CHECK: define internal {{.}}void [[OMP_OUTLINED21]](i32 noalias %{{.+}}, i32* noalias %{{.+}}, i{{[0-9]+}}{{.}} [[VLA_SIZE:%.+]], i32 {{.+}} [[VLA_ADDR:%[^)]+]])
  // CHECK: load i32, i32* @

  // ALL-DEBUG-DAG: declare !callback ![[cbid:[0-9]+]] void @__kmpc_fork_call(%struct.ident_t, i32, void (i32, i32, ...), ...)
  // CHECK-DEBUG-DAG: define internal void [[OMP_OUTLINED]](i32* noalias %.global_tid., i32* noalias %.bound_tid., i64 [[VLA_SIZE:%.+]], i32* {{.+}} [[VLA_ADDR:%[^)]+]])
  // CHECK-DEBUG-DAG: call void [[OMP_OUTLINED_DEBUG]]

  // Note that OpenMPIRBuilder puts the trailing arguments in a different order:
  // arguments that are wrapped into additional pointers precede the other
  // arguments. This is expected and not problematic because both the call and the
  // function are generated from the same place, and the function is internal.
  // ALL: define linkonce_odr {{[a-z\_\b][ ]?i32}} [[TMAIN]](i8* %argc)
  // ALL: store i8 %argc, i8* [[ARGC_ADDR:%.+]],
  // CHECK: call {{.}}void (%struct.ident_t, i32, void (i32, i32, ...), ...) @__kmpc_fork_call(%struct.ident_t [[DEF_LOC_2]], i32 2, void (i32, i32, ...)* bitcast (void (i32, i32, i8**, i{{64\|32}}) [[OMP_OUTLINED:@.+]] to void (i32, i32, ...)), i8** [[ARGC_ADDR]], i{{64\|32}} %{{.+}})
- // IRBUILDER: call {{.}}void (%struct.ident_t, i32, void (i32, i32, ...), ...) @__kmpc_fork_call(%struct.ident_t [[DEF_LOC_2]], i32 2, void (i32, i32, ...)* bitcast (void (i32, i32, i{{64\|32}}, i8*) [[OMP_OUTLINED:@.+]] to void (i32, i32, ...)), i{{64\|32}} %{{.+}}, i8*** [[ARGC_ADDR]])
+ // IRBUILDER: call {{.}}void (%struct.ident_t, i32, void (i32, i32, ...), ...) @__kmpc_fork_call(%struct.ident_t [[DEF_LOC_2]], i32 2, void (i32, i32, ...)* bitcast (void (i32, i32, %CapturedStructType, i8*) [[OMP_OUTLINED:@.+]] to void (i32, i32, ...)), %CapturedStructType %CaptureStructAlloca, i8*** [[ARGC_ADDR]])
  // ALL: ret i32 0
  // ALL-NEXT: }
  // ALL-DEBUG: define linkonce_odr i32 [[TMAIN]](i8** %argc)

  // CHECK-DEBUG: store i8 %argc, i8* [[ARGC_ADDR:%.+]],
  // CHECK-DEBUG: call void (%struct.ident_t, i32, void (i32, i32, ...), ...) @__kmpc_fork_call(%struct.ident_t* @{{.}}, i32 2, void (i32, i32, ...) bitcast (void (i32, i32, i8**, i64) [[OMP_OUTLINED:@.+]] to void (i32, i32, ...)), i8** [[ARGC_ADDR]], i64 %{{.+}})
- // IRBUILDER-DEBUG: call void (%struct.ident_t, i32, void (i32, i32, ...), ...) @__kmpc_fork_call(%struct.ident_t* @{{.}}, i32 2, void (i32, i32, ...) bitcast (void (i32, i32, i64, i8*) [[OMP_OUTLINED:@.+]] to void (i32, i32, ...)), i64 %{{.+}}, i8*** [[ARGC_ADDR]])
+ // IRBUILDER-DEBUG: call void (%struct.ident_t, i32, void (i32, i32, ...), ...) @__kmpc_fork_call(%struct.ident_t* @{{.}}, i32 2, void (i32, i32, ...) bitcast (void (i32, i32, %CapturedStructType, i8*) [[OMP_OUTLINED:@.+]] to void (i32, i32, ...)), %CapturedStructType %CaptureStructAlloca, i8*** [[ARGC_ADDR]])
  // ALL-DEBUG: ret i32 0
  // ALL-DEBUG-NEXT: }

  // CHECK: define internal {{.}}void [[OMP_OUTLINED]](i32 noalias %.global_tid., i32* noalias %.bound_tid., i8*** nonnull align {{[0-9]+}} dereferenceable({{4\|8}}) %argc, i{{64\|32}}{{.*}} %{{.+}})
- // IRBUILDER: define internal {{.}}void [[OMP_OUTLINED]](i32 noalias %{{.}}, i32 noalias %{{.}}, i{{64\|32}}{{.}} %{{.+}}, i8** [[ARGC_REF:%.*]])
+ // IRBUILDER: define internal {{.}}void [[OMP_OUTLINED]](i32 noalias %{{.}}, i32 noalias %{{.}}, %CapturedStructType %CaptureStructAlloca, i8*** [[ARGC_REF:%.*]])
  // CHECK: store i8* %argc, i8** [[ARGC_PTR_ADDR:%.+]],
  // CHECK: [[ARGC_REF:%.+]] = load i8*, i8** [[ARGC_PTR_ADDR]]
  // ALL: [[ARGC:%.+]] = load i8, i8* [[ARGC_REF]]
  // CHECK-NEXT: invoke {{.}}void [[FOO1:@.+foo.+]](i8* [[ARGC]])
  // IRBUILDER-NEXT: call {{.}}void [[FOO1:@.+foo.+]](i8* [[ARGC]])
  // CHECK: ret void
  // CHECK: call {{.}}void @{{.+terminate.\|abort}}(
  // CHECK-NEXT: unreachable
  // CHECK-NEXT: }
  // CHECK-DEBUG: define internal void [[OMP_OUTLINED_DEBUG:@.+]](i32* noalias %.global_tid., i32* noalias %.bound_tid., i8*** nonnull align {{[0-9]+}} dereferenceable({{4\|8}}) %argc, i64 %{{.+}})
- // IRBUILDER-DEBUG: define internal void [[OMP_OUTLINED_DEBUG:@.+]](i32* noalias %{{.}}, i32 noalias %{{.}}, i64 %{{.+}}, i8*** [[ARGC_REF:%.*]])
+ // IRBUILDER-DEBUG: define internal void [[OMP_OUTLINED_DEBUG:@.+]](i32* noalias %{{.}}, i32 noalias %{{.}}, %CapturedStructType %CaptureStructAlloca, i8*** [[ARGC_REF:%.*]])
  // CHECK-DEBUG: store i8* %argc, i8** [[ARGC_PTR_ADDR:%.+]],
  // CHECK-DEBUG: [[ARGC_REF:%.+]] = load i8*, i8** [[ARGC_PTR_ADDR]]
  // ALL-DEBUG: [[ARGC:%.+]] = load i8, i8* [[ARGC_REF]]
  // CHECK-DEBUG-NEXT: invoke void [[FOO1:@.+foo.+]](i8** [[ARGC]])
  // IRBUILDER-DEBUG-NEXT: call void [[FOO1:@.+foo.+]](i8** [[ARGC]])
  // CHECK-DEBUG: ret void
  // CHECK-DEBUG: call void @{{.+terminate.*\|abort}}(
  // CHECK-DEBUG-NEXT: unreachable

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h

public:

/// should be placed.
/// \param CodeGenIP is the insertion point at which the privatization code
/// should be placed.
/// \param Original The value being copied/created, should not be used in the
/// generated IR.
/// \param Inner The equivalent of \p Original that should be used in the
/// generated IR; this is equal to \p Original if the value is
/// a pointer and can thus be passed directly, otherwise it is
/// an equivalent but different value.

jdoerfert · Unsubmitted · Done

wasn't that a spelling error before? and one `\param` is enough ;)

llitchev · Author · Unsubmitted · Done

Fixed.

/// \param ReplVal The replacement value, thus a copy or new created version
/// of \p Inner.

jdoerfert · Unsubmitted · Not Done

I believe we need to keep the `Inner` parameter here. The reasons can be found in the discussion of D92476. Short story: The callback might have the original value in a map with information attached. If we pass in only the "reloaded" value the callback cannot determine what the original was.

///

/// \returns The new insertion point where code generation continues and

/// \p ReplVal the replacement value.

using PrivatizeCallbackTy = function_ref<InsertPointTy(
    InsertPointTy AllocaIP, InsertPointTy CodeGenIP, Value &Original,
    Value &Inner, Value *&ReplVal)>;

/// Description of a LLVM-IR insertion point (IP) and a debug/source location
/// (filename, line, column, ...).
struct LocationDescription {
  template <typename T, typename U>

private:

/// The emitted loop will be disconnected, i.e. no edge to the loop's
/// preheader and no terminator in the AfterBB. The OpenMPIRBuilder's
/// IRBuilder location is not preserved.
///
/// \param DL DebugLoc used for the instructions in the skeleton.
/// \param TripCount Value to be used for the trip count.
/// \param F Function in which to insert the BasicBlocks.
/// \param PreInsertBefore Where to insert BBs that execute before the body,
/// typically the body itself.

jdoerfert · Unsubmitted · Done

Nit: use SmallVectorImpl w/o the size. No const on the pointers (*const); doesn't mean much anyway.

llitchev · Author · Unsubmitted · Done

Using the Impl now.

/// \param PostInsertBefore Where to insert BBs that execute after the body.
/// \param Name Base name used to derive BB
/// and instruction names.
///
/// \returns The CanonicalLoopInfo that represents the emitted loop.
CanonicalLoopInfo *createLoopSkeleton(DebugLoc DL, Value *TripCount,
                                      Function *F,
                                      BasicBlock *PreInsertBefore,
                                      BasicBlock *PostInsertBefore,
                                      const Twine &Name = {});

/// Capture the above-defined paraneters for the parallel regions.

ftynse · Unsubmitted · Not Done

const Twine &Name = {});

- /// Capture the above-defined paraneters for the parallel regions.

+ /// Capture the above-defined parameters for the parallel regions.

///

/// \param CaptureAllocaInsPoint Insertion point for the alloca-ed struct.


///

/// \param InsertBeforeInst The instruction before which the capture

ftynse · Unsubmitted · Done

Nit: it looks like this file uses IP rather than InsPoint for names related to insertion points

llitchev · Author · Unsubmitted · Done

No need to store this value anymore. Used the InsertBB->getTerminator(), thus guaranteeing the alloca and stores are just before the fork call (they were before that call too, since the ThreadID was called last), so even if more codegen is introduced in the future the logic deals with it.

/// alloca, insert and store should be inserted.

/// \param OuterFn The function containing the omp::Parallel.

/// \param Blocks The parallel region blocks.

/// \param TIDAddr The address of the TID value.

/// \param ZeroAddr The address of the Zero value.

void captureParallelRegionParameters(

Instruction *InsertBeforeInst, Function *OuterFn,

const SmallVectorImpl<BasicBlock *> &Blocks, const Value *const TIDAddr,

const Value *const ZeroAddr);

}; };

/// Class to represented the control flow structure of an OpenMP canonical loop. /// Class to represented the control flow structure of an OpenMP canonical loop.

/// ///

/// The control-flow structure is standardized for easy consumption by /// The control-flow structure is standardized for easy consumption by

/// directives associated with loops. For instance, the worksharing-loop /// directives associated with loops. For instance, the worksharing-loop

/// construct may change this control flow such that each loop iteration is /// construct may change this control flow such that each loop iteration is

/// executed on only one thread. /// executed on only one thread.


llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

Show First 20 Lines • Show All 432 Lines • ▼ Show 20 Lines IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(

Constant *SrcLocStr = getOrCreateSrcLocStr(Loc); Constant *SrcLocStr = getOrCreateSrcLocStr(Loc);

Value *Ident = getOrCreateIdent(SrcLocStr); Value *Ident = getOrCreateIdent(SrcLocStr);

Value *ThreadID = getOrCreateThreadID(Ident); Value *ThreadID = getOrCreateThreadID(Ident);

if (NumThreads) { if (NumThreads) {

// Build call __kmpc_push_num_threads(&Ident, global_tid, num_threads) // Build call __kmpc_push_num_threads(&Ident, global_tid, num_threads)

Value *Args[] = { Value *Args[] = {

Ident, ThreadID, Ident, ThreadID,

Builder.CreateIntCast(NumThreads, Int32, /*isSigned*/ false)}; Builder.CreateIntCast(NumThreads, Int32, /*isSigned*/ false)};

jdoerfertUnsubmitted

Done

How do we know that -- is valid here? Couldn't Loc point to the begin of a function? If possible, let's just use Loc.IP.


llitchevAuthorUnsubmitted

Done

There is always a preceding instruction: the ThreadID call is always generated, and that is what -- points to. Changed it to use InsertBB->getTerminator(), which is much more robust. Even if the codegen changes, the alloca, insert and store will always be generated just before the forkCall.

Builder.CreateCall( Builder.CreateCall(

getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_push_num_threads), Args); getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_push_num_threads), Args);

} }

if (ProcBind != OMP_PROC_BIND_default) { if (ProcBind != OMP_PROC_BIND_default) {

jdoerfertUnsubmitted

Done

If it has to be an instruction, use cast. If it might not be, simply get the current insertion point from the builder.

llitchevAuthorUnsubmitted

Done

Using the current position from the Builder (moving one forward, so it is not the end() ).


// Build call __kmpc_push_proc_bind(&Ident, global_tid, proc_bind) // Build call __kmpc_push_proc_bind(&Ident, global_tid, proc_bind)

Value *Args[] = { Value *Args[] = {

Ident, ThreadID, Ident, ThreadID,

ConstantInt::get(Int32, unsigned(ProcBind), /*isSigned=*/true)}; ConstantInt::get(Int32, unsigned(ProcBind), /*isSigned=*/true)};

Builder.CreateCall( Builder.CreateCall(

getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_push_proc_bind), Args); getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_push_proc_bind), Args);

} }

BasicBlock *InsertBB = Builder.GetInsertBlock(); BasicBlock *InsertBB = Builder.GetInsertBlock();

Function *OuterFn = InsertBB->getParent(); Function *OuterFn = InsertBB->getParent();

// Save the outer alloca block because the insertion iterator may get

// invalidated and we still need this later.

BasicBlock *OuterAllocaBlock = OuterAllocaIP.getBlock();

// Vector to remember instructions we used only during the modeling but which // Vector to remember instructions we used only during the modeling but which

// we want to delete at the end. // we want to delete at the end.

SmallVector<Instruction *, 4> ToBeDeleted; SmallVector<Instruction *, 4> ToBeDeleted;

// Change the location to the outer alloca insertion point to create and // Change the location to the outer alloca insertion point to create and

// initialize the allocas we pass into the parallel region. // initialize the allocas we pass into the parallel region.

Builder.restoreIP(OuterAllocaIP); Builder.restoreIP(OuterAllocaIP);

AllocaInst *TIDAddr = Builder.CreateAlloca(Int32, nullptr, "tid.addr"); AllocaInst *TIDAddr = Builder.CreateAlloca(Int32, nullptr, "tid.addr");

▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(

InsertPointTy InnerAllocaIP = Builder.saveIP(); InsertPointTy InnerAllocaIP = Builder.saveIP();

AllocaInst *PrivTIDAddr = AllocaInst *PrivTIDAddr =

Builder.CreateAlloca(Int32, nullptr, "tid.addr.local"); Builder.CreateAlloca(Int32, nullptr, "tid.addr.local");

Instruction *PrivTID = Builder.CreateLoad(PrivTIDAddr, "tid"); Instruction *PrivTID = Builder.CreateLoad(PrivTIDAddr, "tid");

// Add some fake uses for OpenMP provided arguments. // Add some fake uses for OpenMP provided arguments.

ToBeDeleted.push_back(Builder.CreateLoad(TIDAddr, "tid.addr.use")); ToBeDeleted.push_back(Builder.CreateLoad(TIDAddr, "tid.addr.use"));

Instruction *ZeroAddrUse = Builder.CreateLoad(ZeroAddr, "zero.addr.use"); ToBeDeleted.push_back(Builder.CreateLoad(ZeroAddr, "zero.addr.use"));

ToBeDeleted.push_back(ZeroAddrUse);

// ThenBB // ThenBB

// | // |

// V // V

// PRegionEntryBB <- Privatization allocas are placed here. // PRegionEntryBB <- Privatization allocas are placed here.

// | // |

// V // V

// PRegionBodyBB <- BodeGen is invoked here. // PRegionBodyBB <- BodeGen is invoked here.

▲ Show 20 Lines • Show All 148 Lines • ▼ Show 20 Lines IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(

Extractor.findAllocas(CEAC, SinkingCands, HoistingCands, CommonExit); Extractor.findAllocas(CEAC, SinkingCands, HoistingCands, CommonExit);

Extractor.findInputsOutputs(Inputs, Outputs, SinkingCands); Extractor.findInputsOutputs(Inputs, Outputs, SinkingCands);

LLVM_DEBUG(dbgs() << "Before privatization: " << *OuterFn << "\n"); LLVM_DEBUG(dbgs() << "Before privatization: " << *OuterFn << "\n");

FunctionCallee TIDRTLFn = FunctionCallee TIDRTLFn =

getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_global_thread_num); getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_global_thread_num);

// Capture the outer parameters for the ParallelRegions.

captureParallelRegionParameters(InsertBB->getTerminator(), OuterFn, Blocks,

TIDAddr, ZeroAddr);

jdoerfertUnsubmitted

Done

This should make some of the code introduced by D92189 obsolete, right?


llitchevAuthorUnsubmitted

Done

Yes! Most of it. Now the data is just wrapped into a capture struct before calling the synthetic ..._fork function.
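
The capture-struct idea described in this reply can be illustrated in plain C++ rather than LLVM IR. This is only a minimal sketch under stated assumptions: `CapturedStruct`, `outlinedBody`, `forkCall`, and `runRegion` are hypothetical names, and `forkCall` only loosely stands in for the runtime entry point. The caller packs the values into a stack-allocated struct and hands the callee one pointer instead of passing each value as a separate variadic argument.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the struct the builder would synthesize:
// one field per non-pointer value defined outside the parallel region.
struct CapturedStruct {
  std::int32_t Count;
  float Scale;
};

// Hypothetical outlined-body signature: a thread id, one opaque pointer to
// the captured values, and an output buffer.
using OutlinedFnTy = void (*)(std::int32_t Tid, void *Captured, float *Out);

// The body unpacks the captured values itself.
void outlinedBody(std::int32_t Tid, void *Captured, float *Out) {
  assert(Captured && "expected a capture struct");
  auto *C = static_cast<CapturedStruct *>(Captured);
  Out[Tid] = C->Scale * static_cast<float>(C->Count + Tid);
}

// Toy "fork": calls the body once per simulated thread id, sequentially.
void forkCall(std::int32_t NumThreads, OutlinedFnTy Fn, void *Captured,
              float *Out) {
  for (std::int32_t Tid = 0; Tid < NumThreads; ++Tid)
    Fn(Tid, Captured, Out);
}

// Caller side: allocate the struct on the stack, fill it in, pass its address.
void runRegion(std::int32_t Count, float Scale, std::int32_t NumThreads,
               float *Out) {
  CapturedStruct C{Count, Scale};
  forkCall(NumThreads, outlinedBody, &C, Out);
}
```

Because the float travels inside the struct, it never crosses a variadic call boundary, which is the situation the patch summary identifies as problematic for SSE-class parameters.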


auto PrivHelper = [&](Value &V) { auto PrivHelper = [&](Value &V) {

if (&V == TIDAddr || &V == ZeroAddr) if (&V == TIDAddr || &V == ZeroAddr)

return; return;

SetVector<Use *> Uses; SmallVector<Use *, 8> Uses;

for (Use &U : V.uses()) for (Use &U : V.uses())

if (auto *UserI = dyn_cast<Instruction>(U.getUser())) if (auto *UserI = dyn_cast<Instruction>(U.getUser()))

if (ParallelRegionBlockSet.count(UserI->getParent())) if (ParallelRegionBlockSet.count(UserI->getParent()))

Uses.insert(&U); Uses.push_back(&U);

// __kmpc_fork_call expects extra arguments as pointers. If the input

// already has a pointer type, everything is fine. Otherwise, store the

// value onto stack and load it back inside the to-be-outlined region. This

// will ensure only the pointer will be passed to the function.

// FIXME: if there are more than 15 trailing arguments, they must be

// additionally packed in a struct.

Value *Inner = &V;

if (!V.getType()->isPointerTy()) {

IRBuilder<>::InsertPointGuard Guard(Builder);

LLVM_DEBUG(llvm::dbgs() << "Forwarding input as pointer: " << V << "\n");

Builder.restoreIP(OuterAllocaIP);

Value *Ptr =

Builder.CreateAlloca(V.getType(), nullptr, V.getName() + ".reloaded");

// Store to stack at end of the block that currently branches to the entry

// block of the to-be-outlined region.

Builder.SetInsertPoint(InsertBB,

InsertBB->getTerminator()->getIterator());

Builder.CreateStore(&V, Ptr);

// Load back next to allocations in the to-be-outlined region.

Builder.restoreIP(InnerAllocaIP);

Inner = Builder.CreateLoad(Ptr);

}

Value *ReplacementValue = nullptr; Value *ReplacementValue = nullptr;

CallInst *CI = dyn_cast<CallInst>(&V); CallInst *CI = dyn_cast<CallInst>(&V);

if (CI && CI->getCalledFunction() == TIDRTLFn.getCallee()) { if (CI && CI->getCalledFunction() == TIDRTLFn.getCallee()) {

ReplacementValue = PrivTID; ReplacementValue = PrivTID;

} else { } else {

Builder.restoreIP( Builder.restoreIP(

PrivCB(InnerAllocaIP, Builder.saveIP(), V, *Inner, ReplacementValue)); PrivCB(InnerAllocaIP, Builder.saveIP(), V, V, ReplacementValue));

assert(ReplacementValue && assert(ReplacementValue &&

"Expected copy/create callback to set replacement value!"); "Expected copy/create callback to set replacement value!");

if (ReplacementValue == &V) if (ReplacementValue == &V)

return; return;

} }

for (Use *UPtr : Uses) for (Use *UPtr : Uses)

UPtr->set(ReplacementValue); UPtr->set(ReplacementValue);

}; };

// Reset the inner alloca insertion as it will be used for loading the values

// wrapped into pointers before passing them into the to-be-outlined region.

// Configure it to insert immediately after the fake use of zero address so

// that they are available in the generated body and so that the

// OpenMP-related values (thread ID and zero address pointers) remain leading

// in the argument list.

InnerAllocaIP = IRBuilder<>::InsertPoint(

ZeroAddrUse->getParent(), ZeroAddrUse->getNextNode()->getIterator());

// Reset the outer alloca insertion point to the entry of the relevant block

// in case it was invalidated.

OuterAllocaIP = IRBuilder<>::InsertPoint(

OuterAllocaBlock, OuterAllocaBlock->getFirstInsertionPt());

for (Value *Input : Inputs) { for (Value *Input : Inputs) {

LLVM_DEBUG(dbgs() << "Captured input: " << *Input << "\n"); LLVM_DEBUG(dbgs() << "Captured input: " << *Input << "\n");

PrivHelper(*Input); PrivHelper(*Input);

} }

LLVM_DEBUG({ LLVM_DEBUG({

for (Value *Output : Outputs) for (Value *Output : Outputs)

LLVM_DEBUG(dbgs() << "Captured output: " << *Output << "\n"); LLVM_DEBUG(dbgs() << "Captured output: " << *Output << "\n");

}); });

Show All 10 Lines IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(

addOutlineInfo(std::move(OI)); addOutlineInfo(std::move(OI));

InsertPointTy AfterIP(UI->getParent(), UI->getParent()->end()); InsertPointTy AfterIP(UI->getParent(), UI->getParent()->end());

UI->eraseFromParent(); UI->eraseFromParent();

return AfterIP; return AfterIP;

} }

void OpenMPIRBuilder::captureParallelRegionParameters(

Instruction *InsertBeforeInst, Function *OuterFn,

const SmallVectorImpl<BasicBlock *> &Blocks, const Value *TIDAddr,

const Value *ZeroAddr) {

// Capture outside parameters.

SetVector<Value *> CapturedValues;

SetVector<BasicBlock *> BlockParents;

unsigned BlockSize = Blocks.size();

ftynseUnsubmitted

Done

Nit: I think https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop applies to .size() the same way it applies to .end()


llitchevAuthorUnsubmitted

Done

Fixed.


for (unsigned Counter = 0; Counter < BlockSize; Counter++) {

BasicBlock *ParallelRegionBlock = Blocks[Counter];

BlockParents.insert(ParallelRegionBlock);

}

for (unsigned Counter = 0; Counter < BlockSize; Counter++) {

BasicBlock *ParallelRegionBlock = Blocks[Counter];

for (auto I = ParallelRegionBlock->begin(), E = ParallelRegionBlock->end();

I != E; ++I) {

for (Use &U : I->operands()) {

Value *V = U.get();

if (V == TIDAddr || V == ZeroAddr)

continue;

// Skip pointers.

if (V->getType()->isPointerTy())

continue;

// One case for example, if propagated const, there is no instruction.

Instruction *DefInst = dyn_cast<Instruction>(V);

if (!DefInst || !DefInst->getParent())

continue;

// If the parent of the def instruction is not in the parallel

// region block set, the definition of the operand is in an

// upper block.

if (!BlockParents.contains(DefInst->getParent()))

CapturedValues.insert(V);

}

jdoerfertUnsubmitted

Not Done

1) If we would need this, remove the Counter stuff everywhere; if you want to iterate a container, use: for (const T& : Container)
2) BlockParents seems to be a set with the blocks. We already have that, it's called ParallelRegionBlockSet; simply pass it in.
3) Why don't we use the Inputs and Outputs sets computed by the findInputsOutputs call? Those are the live-in and live-out values of the parallel region.
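
To make the points above concrete, here is a toy model in plain C++ of finding the live-in values of a region with range-based loops and a block-local definition set. It is a sketch only and does not reproduce what findInputsOutputs does internally; `ToyBlock` and `findLiveIns` are hypothetical names, and values are modeled as integer ids.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model: each block lists the value ids it defines and the ids it uses.
struct ToyBlock {
  std::vector<int> Defs;
  std::vector<int> Uses;
};

// Returns the ids that are used inside the region but not defined by any
// block of the region, in first-use order and without duplicates.
std::vector<int> findLiveIns(const std::vector<ToyBlock> &Region) {
  // First pass: everything defined inside the region.
  std::set<int> DefinedInside;
  for (const ToyBlock &B : Region)
    for (int D : B.Defs)
      DefinedInside.insert(D);

  // Second pass: uses whose definition lives outside the region.
  std::vector<int> LiveIns;
  std::set<int> Seen;
  for (const ToyBlock &B : Region)
    for (int U : B.Uses)
      if (!DefinedInside.count(U) && Seen.insert(U).second)
        LiveIns.push_back(U);

  assert(LiveIns.size() == Seen.size() && "each live-in recorded once");
  return LiveIns;
}
```

No index counters are needed, and the membership test is a single set lookup per use.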


// If there are captured parameters to the parallel loop,

// allocate the captured struct on the stack, set the element values.

// Then, load the capture struct, extract the elements and replace the

// captured values with the extracted ones from the struct.

ftynseUnsubmitted

Done

Nit: https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code

if (CapturedValues.empty())
  return;


if (CapturedValues.empty())

ftynseUnsubmitted

Done

Nit: trailing dot


return;

// Create the StructTy.

ftynseUnsubmitted

Done

Nit: please reserve before pushing back in a loop
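
As a small standard-library illustration of this nit (not the patch's code): when the final element count is known up front, reserving once avoids repeated reallocation while pushing back. `ToyValue` and `collectFieldTypes` are hypothetical names, and type names are modeled as strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a captured value; only its type name matters here.
struct ToyValue {
  std::string TypeName;
};

// Collects one field type per captured value.
std::vector<std::string>
collectFieldTypes(const std::vector<ToyValue> &Captured) {
  std::vector<std::string> FieldTypes;
  // The final size is known, so allocate the storage once.
  FieldTypes.reserve(Captured.size());
  for (const ToyValue &V : Captured)
    FieldTypes.push_back(V.TypeName);
  assert(FieldTypes.size() == Captured.size());
  return FieldTypes;
}
```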


unsigned CapturedSize = CapturedValues.size();

std::vector<Type *> StructTypes;

jdoerfertUnsubmitted

Not Done

unsigned CapturedSize = CapturedValues.size();

- std::vector<Type *> StructTypes;

+ SmallVector<Type *, 16> StructFieldTypes;

StructTypes.reserve(CapturedSize);


StructTypes.reserve(CapturedSize);

for (unsigned Counter = 0; Counter < CapturedSize; Counter++)

StructTypes.push_back(CapturedValues[Counter]->getType());

Type *CaptureStructType =

StructType::create(StructTypes, "CapturedStructType");

ftynseUnsubmitted

Done

Nit: Builder.restoreIP(CaptureAllocaInsPoint) looks shorter


llitchevAuthorUnsubmitted

Done

Refactored. No need to store the InsertPoint.


AllocaInst *AllocaInst;

{

llvm::IRBuilder<>::InsertPointGuard Guard(Builder);

Builder.SetInsertPoint(InsertBeforeInst);

jdoerfertUnsubmitted

Not Done

The alloca needs to go in the OuterAllocaIP passed in by the caller of CreateParallel.


// Allocate and populate the capture struct.

AllocaInst =

ftynseUnsubmitted

Done

I suppose you may want to have alloca inserted in a block (function entry) different from the one where you store into the memory. You need to store just before calling the fork function (or, at least, so that the store postdominates all stored values). Looking at the function API, I would have assumed CaptureAllocaInsPoint to be an insertion point at the function entry block specifically for allocas, where these insertvalues are invalid.


llitchevAuthorUnsubmitted

Done

Now it is guaranteed that the codegen of the alloca, insert, and stores is done just before the forkCall, even if the codegen changes in the future. This was already the case before, because the code was generated after the call that gets the ThreadID (which was just before the fork).

Builder.CreateAlloca(CaptureStructType, nullptr, "CaptureStructAlloca");

Value *InsertValue = UndefValue::get(CaptureStructType);

for (auto SrcIdx : enumerate(CapturedValues))

InsertValue = Builder.CreateInsertValue(InsertValue, SrcIdx.value(),

SrcIdx.index());

Builder.CreateStore(InsertValue, AllocaInst);

}

Value *LoadedAlloca = Builder.CreateLoad(AllocaInst);

for (auto SrcIdx : enumerate(CapturedValues)) {

Value *LoadedValue =

Builder.CreateExtractValue(LoadedAlloca, SrcIdx.index());

jdoerfertUnsubmitted

Not Done

I'm not too happy with this insert/extract value scheme. Without further optimization (-O0) this might not be lowered properly. Why don't we create a GEP and load/store to the appropriate location instead?


// Find the usages of the captured values and replace them in the parallel

ftynseUnsubmitted

Not Done

Can we rather take each captured value and enumerate its uses, replacing those within the parallel block set?
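
The use-driven replacement suggested here can be sketched with a toy use list in plain C++. This is an assumption-laden model, not LLVM's Use/User machinery: `ToyUse` and `replaceUsesInRegion` are hypothetical names, values are integer ids, and each use records the block of its user plus a pointer to the operand slot.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy use record: which block the user lives in, and the operand slot
// that currently holds the value id.
struct ToyUse {
  int BlockId;
  int *Operand;
};

// Rewrites only the uses whose user sits in one of the region blocks.
// Returns how many operand slots were rewritten.
int replaceUsesInRegion(const std::vector<ToyUse> &Uses,
                        const std::set<int> &RegionBlocks, int NewValue) {
  int NumReplaced = 0;
  for (const ToyUse &U : Uses) {
    assert(U.Operand && "use must point at an operand slot");
    if (!RegionBlocks.count(U.BlockId))
      continue; // Use is outside the region; leave it alone.
    *U.Operand = NewValue;
    ++NumReplaced;
  }
  return NumReplaced;
}
```

The loop visits only the uses of one value, so its cost scales with the number of uses rather than with the number of instructions in the region.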


llitchevAuthorUnsubmitted

Done

That was the first implementation I had. The issue was that uses() was not returning all the uses (particularly the ones introduced by the loop unroller; I spent a bunch of time debugging it). Iterating over all the instruction operands in the parallel region blocks just works.

// region blocks.

for (unsigned Counter = 0; Counter < BlockSize; Counter++)

for (auto I = Blocks[Counter]->begin(), E = Blocks[Counter]->end();

I != E; ++I)

for (Use &U : I->operands())

if (SrcIdx.value() == U.get())

U.set(LoadedValue);

}

jdoerfertUnsubmitted

Not Done

Instead of doing this, unpack/load the location in the PrivHelper like we did before. Also, pass the loaded value as Inner to the PrivCB so that the callback has both the original value V and the reload Inner.


}

void OpenMPIRBuilder::emitFlush(const LocationDescription &Loc) { void OpenMPIRBuilder::emitFlush(const LocationDescription &Loc) {

// Build call void __kmpc_flush(ident_t *loc) // Build call void __kmpc_flush(ident_t *loc)

Constant *SrcLocStr = getOrCreateSrcLocStr(Loc); Constant *SrcLocStr = getOrCreateSrcLocStr(Loc);

Value *Args[] = {getOrCreateIdent(SrcLocStr)}; Value *Args[] = {getOrCreateIdent(SrcLocStr)};

jdoerfertUnsubmitted

Done

Use range loops above whenever possible. Use LLVM naming style for variables please, so first letter capitalized. Also no llvm::.
Prefer insertion point guards over manually saving and restoring (potentially adding an explicit scope { ... }).
I don't think we need to iterate over the entire set of instructions of the outer function, do we? Check how D92189 identifies communicated values.
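
The guard-plus-scope pattern recommended above can be shown with a toy builder in plain C++. This is a sketch of the RAII idea only, not of IRBuilder itself: `ToyBuilder`, `ToyInsertPointGuard`, and `emitAllocaThenStore` are hypothetical names, instructions are modeled as strings in a std::list, and the insertion point is a list iterator (new instructions go before it).

```cpp
#include <cassert>
#include <list>
#include <string>

// Toy builder: emits "instructions" before the current insertion point.
class ToyBuilder {
public:
  using InsertPoint = std::list<std::string>::iterator;

  ToyBuilder() : IP(Insts.end()) {}

  InsertPoint saveIP() const { return IP; }
  void restoreIP(InsertPoint P) { IP = P; }
  InsertPoint begin() { return Insts.begin(); }

  // std::list::insert does not invalidate other iterators, so saved
  // insertion points stay usable after this call.
  void emit(const std::string &Inst) { Insts.insert(IP, Inst); }

  const std::list<std::string> &instructions() const { return Insts; }

private:
  std::list<std::string> Insts;
  InsertPoint IP;
};

// RAII guard: remembers the insertion point and restores it on scope exit.
class ToyInsertPointGuard {
public:
  explicit ToyInsertPointGuard(ToyBuilder &B) : Builder(B), Saved(B.saveIP()) {}
  ~ToyInsertPointGuard() { Builder.restoreIP(Saved); }
  ToyInsertPointGuard(const ToyInsertPointGuard &) = delete;
  ToyInsertPointGuard &operator=(const ToyInsertPointGuard &) = delete;

private:
  ToyBuilder &Builder;
  ToyBuilder::InsertPoint Saved;
};

// Emits an "alloca" at the front, then a "store" at the original position.
void emitAllocaThenStore(ToyBuilder &B) {
  {
    ToyInsertPointGuard Guard(B);
    B.restoreIP(B.begin());
    B.emit("alloca");
  } // Guard restores the original insertion point here.
  B.emit("store");
  assert(!B.instructions().empty());
}
```

The explicit scope makes it visible where the temporary insertion point ends, and no code path can forget to restore it.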


llitchevAuthorUnsubmitted

Done

You are right. That was one of the changes I wanted to make to the Diff. It now iterates over the parallel region blocks only.

Builder.CreateCall(getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_flush), Args); Builder.CreateCall(getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_flush), Args);

} }

void OpenMPIRBuilder::createFlush(const LocationDescription &Loc) { void OpenMPIRBuilder::createFlush(const LocationDescription &Loc) {

if (!updateToLocation(Loc)) if (!updateToLocation(Loc))

return; return;

emitFlush(Loc); emitFlush(Loc);

} }


llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp

Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	protected:

LLVMContext Ctx;		LLVMContext Ctx;
std::unique_ptr<Module> M;		std::unique_ptr<Module> M;
Function *F;		Function *F;
BasicBlock *BB;		BasicBlock *BB;
DebugLoc DL;		DebugLoc DL;
};		};

// Returns the value stored in the given allocation. Returns null if the given
// value is not a result of an allocation, if no value is stored or if there is
// more than one store.
static Value *findStoredValue(Value *AllocaValue) {
Instruction *Alloca = dyn_cast<AllocaInst>(AllocaValue);
if (!Alloca)
return nullptr;
StoreInst *Store = nullptr;
for (Use &U : Alloca->uses()) {
if (auto *CandidateStore = dyn_cast<StoreInst>(U.getUser())) {
EXPECT_EQ(Store, nullptr);
Store = CandidateStore;
}
}
if (!Store)
return nullptr;
return Store->getValueOperand();
}

TEST_F(OpenMPIRBuilderTest, CreateBarrier) {		TEST_F(OpenMPIRBuilderTest, CreateBarrier) {
OpenMPIRBuilder OMPBuilder(*M);		OpenMPIRBuilder OMPBuilder(*M);
OMPBuilder.initialize();		OMPBuilder.initialize();

IRBuilder<> Builder(BB);		IRBuilder<> Builder(BB);

OMPBuilder.createBarrier({IRBuilder<>::InsertPoint()}, OMPD_for);		OMPBuilder.createBarrier({IRBuilder<>::InsertPoint()}, OMPD_for);
EXPECT_TRUE(M->global_empty());		EXPECT_TRUE(M->global_empty());
▲ Show 20 Lines • Show All 330 Lines • ▼ Show 20 Lines	TEST_F(OpenMPIRBuilderTest, ParallelSimple) {
ASSERT_NE(ForkCI, nullptr);		ASSERT_NE(ForkCI, nullptr);

EXPECT_EQ(ForkCI->getCalledFunction()->getName(), "__kmpc_fork_call");		EXPECT_EQ(ForkCI->getCalledFunction()->getName(), "__kmpc_fork_call");
EXPECT_EQ(ForkCI->getNumArgOperands(), 4U);		EXPECT_EQ(ForkCI->getNumArgOperands(), 4U);
EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));		EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));
EXPECT_EQ(ForkCI->getArgOperand(1),		EXPECT_EQ(ForkCI->getArgOperand(1),
ConstantInt::get(Type::getInt32Ty(Ctx), 1U));		ConstantInt::get(Type::getInt32Ty(Ctx), 1U));
EXPECT_EQ(ForkCI->getArgOperand(2), Usr);		EXPECT_EQ(ForkCI->getArgOperand(2), Usr);
EXPECT_EQ(findStoredValue(ForkCI->getArgOperand(3)), F->arg_begin());
}		}

TEST_F(OpenMPIRBuilderTest, ParallelNested) {		TEST_F(OpenMPIRBuilderTest, ParallelNested) {
using InsertPointTy = OpenMPIRBuilder::InsertPointTy;		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
OpenMPIRBuilder OMPBuilder(*M);		OpenMPIRBuilder OMPBuilder(*M);
OMPBuilder.initialize();		OMPBuilder.initialize();
F->setName("func");		F->setName("func");
IRBuilder<> Builder(BB);		IRBuilder<> Builder(BB);
▲ Show 20 Lines • Show All 297 Lines • ▼ Show 20 Lines	for (User *Usr : OutlinedFn->users()) {
}		}
}		}

EXPECT_EQ(ForkCI->getCalledFunction()->getName(), "__kmpc_fork_call");		EXPECT_EQ(ForkCI->getCalledFunction()->getName(), "__kmpc_fork_call");
EXPECT_EQ(ForkCI->getNumArgOperands(), 4U);		EXPECT_EQ(ForkCI->getNumArgOperands(), 4U);
EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));		EXPECT_TRUE(isa<GlobalVariable>(ForkCI->getArgOperand(0)));
EXPECT_EQ(ForkCI->getArgOperand(1),		EXPECT_EQ(ForkCI->getArgOperand(1),
ConstantInt::get(Type::getInt32Ty(Ctx), 1));		ConstantInt::get(Type::getInt32Ty(Ctx), 1));
Value *StoredForkArg = findStoredValue(ForkCI->getArgOperand(3));		EXPECT_EQ(ForkCI->getArgOperand(3), F->arg_begin());
EXPECT_EQ(StoredForkArg, F->arg_begin());

EXPECT_EQ(DirectCI->getCalledFunction(), OutlinedFn);		EXPECT_EQ(DirectCI->getCalledFunction(), OutlinedFn);
EXPECT_EQ(DirectCI->getNumArgOperands(), 3U);		EXPECT_EQ(DirectCI->getNumArgOperands(), 3U);
EXPECT_TRUE(isa<AllocaInst>(DirectCI->getArgOperand(0)));		EXPECT_TRUE(isa<AllocaInst>(DirectCI->getArgOperand(0)));
EXPECT_TRUE(isa<AllocaInst>(DirectCI->getArgOperand(1)));		EXPECT_TRUE(isa<AllocaInst>(DirectCI->getArgOperand(1)));
Value *StoredDirectArg = findStoredValue(DirectCI->getArgOperand(2));		EXPECT_EQ(DirectCI->getArgOperand(2), F->arg_begin());
EXPECT_EQ(StoredDirectArg, F->arg_begin());
}		}

TEST_F(OpenMPIRBuilderTest, ParallelCancelBarrier) {		TEST_F(OpenMPIRBuilderTest, ParallelCancelBarrier) {
using InsertPointTy = OpenMPIRBuilder::InsertPointTy;		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
OpenMPIRBuilder OMPBuilder(*M);		OpenMPIRBuilder OMPBuilder(*M);
OMPBuilder.initialize();		OMPBuilder.initialize();
F->setName("func");		F->setName("func");
IRBuilder<> Builder(BB);		IRBuilder<> Builder(BB);
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	if (!isa<ReturnInst>(ExitBB->front())) {
ASSERT_TRUE(isa<BranchInst>(ExitBB->front()));		ASSERT_TRUE(isa<BranchInst>(ExitBB->front()));
ASSERT_EQ(cast<BranchInst>(ExitBB->front()).getNumSuccessors(), 1U);		ASSERT_EQ(cast<BranchInst>(ExitBB->front()).getNumSuccessors(), 1U);
ASSERT_TRUE(isa<ReturnInst>(		ASSERT_TRUE(isa<ReturnInst>(
cast<BranchInst>(ExitBB->front()).getSuccessor(0)->front()));		cast<BranchInst>(ExitBB->front()).getSuccessor(0)->front()));
}		}
}		}
}		}

TEST_F(OpenMPIRBuilderTest, ParallelForwardAsPointers) {
OpenMPIRBuilder OMPBuilder(*M);
OMPBuilder.initialize();
F->setName("func");
IRBuilder<> Builder(BB);
OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});
using InsertPointTy = OpenMPIRBuilder::InsertPointTy;

Type *I32Ty = Type::getInt32Ty(M->getContext());
Type *I32PtrTy = Type::getInt32PtrTy(M->getContext());
Type *StructTy = StructType::get(I32Ty, I32PtrTy);
Type *StructPtrTy = StructTy->getPointerTo();
Type *VoidTy = Type::getVoidTy(M->getContext());
FunctionCallee RetI32Func = M->getOrInsertFunction("ret_i32", I32Ty);
FunctionCallee TakeI32Func =
M->getOrInsertFunction("take_i32", VoidTy, I32Ty);
FunctionCallee RetI32PtrFunc = M->getOrInsertFunction("ret_i32ptr", I32PtrTy);
FunctionCallee TakeI32PtrFunc =
M->getOrInsertFunction("take_i32ptr", VoidTy, I32PtrTy);
FunctionCallee RetStructFunc = M->getOrInsertFunction("ret_struct", StructTy);
FunctionCallee TakeStructFunc =
M->getOrInsertFunction("take_struct", VoidTy, StructTy);
FunctionCallee RetStructPtrFunc =
M->getOrInsertFunction("ret_structptr", StructPtrTy);
FunctionCallee TakeStructPtrFunc =
M->getOrInsertFunction("take_structPtr", VoidTy, StructPtrTy);
Value *I32Val = Builder.CreateCall(RetI32Func);
Value *I32PtrVal = Builder.CreateCall(RetI32PtrFunc);
Value *StructVal = Builder.CreateCall(RetStructFunc);
Value *StructPtrVal = Builder.CreateCall(RetStructPtrFunc);

Instruction *Internal;
auto BodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
BasicBlock &ContinuationBB) {
IRBuilder<>::InsertPointGuard Guard(Builder);
Builder.restoreIP(CodeGenIP);
Internal = Builder.CreateCall(TakeI32Func, I32Val);
Builder.CreateCall(TakeI32PtrFunc, I32PtrVal);
Builder.CreateCall(TakeStructFunc, StructVal);
Builder.CreateCall(TakeStructPtrFunc, StructPtrVal);
};
auto PrivCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP, Value &,
Value &Inner, Value *&ReplacementValue) {
ReplacementValue = &Inner;
return CodeGenIP;
};
auto FiniCB = [](InsertPointTy) {};

IRBuilder<>::InsertPoint AllocaIP(&F->getEntryBlock(),
F->getEntryBlock().getFirstInsertionPt());
IRBuilder<>::InsertPoint AfterIP =
OMPBuilder.createParallel(Loc, AllocaIP, BodyGenCB, PrivCB, FiniCB,
nullptr, nullptr, OMP_PROC_BIND_default, false);
Builder.restoreIP(AfterIP);
Builder.CreateRetVoid();

OMPBuilder.finalize();

EXPECT_FALSE(verifyModule(*M, &errs()));
Function *OutlinedFn = Internal->getFunction();

Type *Arg2Type = OutlinedFn->getArg(2)->getType();
EXPECT_TRUE(Arg2Type->isPointerTy());
EXPECT_EQ(Arg2Type->getPointerElementType(), I32Ty);

// Arguments that need to be passed through pointers and reloaded will get
// used earlier in the functions and therefore will appear first in the
// argument list after outlining.
Type *Arg3Type = OutlinedFn->getArg(3)->getType();
EXPECT_TRUE(Arg3Type->isPointerTy());
EXPECT_EQ(Arg3Type->getPointerElementType(), StructTy);

Type *Arg4Type = OutlinedFn->getArg(4)->getType();
EXPECT_EQ(Arg4Type, I32PtrTy);

Type *Arg5Type = OutlinedFn->getArg(5)->getType();
EXPECT_EQ(Arg5Type, StructPtrTy);
}

TEST_F(OpenMPIRBuilderTest, CanonicalLoopSimple) {		TEST_F(OpenMPIRBuilderTest, CanonicalLoopSimple) {
using InsertPointTy = OpenMPIRBuilder::InsertPointTy;		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
OpenMPIRBuilder OMPBuilder(*M);		OpenMPIRBuilder OMPBuilder(*M);
OMPBuilder.initialize();		OMPBuilder.initialize();
IRBuilder<> Builder(BB);		IRBuilder<> Builder(BB);
OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});		OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});
Value *TripCount = F->getArg(0);		Value *TripCount = F->getArg(0);

▲ Show 20 Lines • Show All 480 Lines • ▼ Show 20 Lines	for (auto &FI : *ThenBB) {
}		}
}		}
EXPECT_NE(SingleEndCI, nullptr);		EXPECT_NE(SingleEndCI, nullptr);
EXPECT_EQ(SingleEndCI->getNumArgOperands(), 2U);		EXPECT_EQ(SingleEndCI->getNumArgOperands(), 2U);
EXPECT_TRUE(isa<GlobalVariable>(SingleEndCI->getArgOperand(0)));		EXPECT_TRUE(isa<GlobalVariable>(SingleEndCI->getArgOperand(0)));
EXPECT_EQ(SingleEndCI->getArgOperand(1), SingleEntryCI->getArgOperand(1));		EXPECT_EQ(SingleEndCI->getArgOperand(1), SingleEntryCI->getArgOperand(1));
}		}

		TEST_F(OpenMPIRBuilderTest, ParallelCaptureUpperDefinedParameters) {
		OpenMPIRBuilder OMPBuilder(*M);
		OMPBuilder.initialize();
		F->setName("func");
		IRBuilder<> Builder(BB);
		OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});
		using InsertPointTy = OpenMPIRBuilder::InsertPointTy;

		Type *I32Ty = Type::getInt32Ty(M->getContext());
		Type *I32PtrTy = Type::getInt32PtrTy(M->getContext());
		Type *StructTy = StructType::get(I32Ty, I32PtrTy);
		Type *StructPtrTy = StructTy->getPointerTo();
		Type *VoidTy = Type::getVoidTy(M->getContext());
		FunctionCallee RetI32Func = M->getOrInsertFunction("ret_i32", I32Ty);
		FunctionCallee TakeI32Func =
		M->getOrInsertFunction("take_i32", VoidTy, I32Ty);
		FunctionCallee RetI32PtrFunc = M->getOrInsertFunction("ret_i32ptr", I32PtrTy);
		FunctionCallee TakeI32PtrFunc =
		M->getOrInsertFunction("take_i32ptr", VoidTy, I32PtrTy);
		FunctionCallee RetStructFunc = M->getOrInsertFunction("ret_struct", StructTy);
		FunctionCallee TakeStructFunc =
		M->getOrInsertFunction("take_struct", VoidTy, StructTy);
		FunctionCallee RetStructPtrFunc =
		M->getOrInsertFunction("ret_structptr", StructPtrTy);
		FunctionCallee TakeStructPtrFunc =
		M->getOrInsertFunction("take_structPtr", VoidTy, StructPtrTy);
		Value *I32Val = Builder.CreateCall(RetI32Func);
		Value *I32PtrVal = Builder.CreateCall(RetI32PtrFunc);
		Value *StructVal = Builder.CreateCall(RetStructFunc);
		Value *StructPtrVal = Builder.CreateCall(RetStructPtrFunc);

		Instruction *Internal;
		auto BodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP,
		BasicBlock &ContinuationBB) {
		IRBuilder<>::InsertPointGuard Guard(Builder);
		Builder.restoreIP(CodeGenIP);
		Internal = Builder.CreateCall(TakeI32Func, I32Val);
		Builder.CreateCall(TakeI32PtrFunc, I32PtrVal);
		Builder.CreateCall(TakeStructFunc, StructVal);
		Builder.CreateCall(TakeStructPtrFunc, StructPtrVal);
		};
		auto PrivCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP, Value &,
		Value &Inner, Value *&ReplacementValue) {
		ReplacementValue = &Inner;
		return CodeGenIP;
		};
		auto FiniCB = [](InsertPointTy) {};

		IRBuilder<>::InsertPoint AllocaIP(&F->getEntryBlock(),
		F->getEntryBlock().getFirstInsertionPt());
		IRBuilder<>::InsertPoint AfterIP =
		OMPBuilder.createParallel(Loc, AllocaIP, BodyGenCB, PrivCB, FiniCB,
		nullptr, nullptr, OMP_PROC_BIND_default, false);

		Builder.restoreIP(AfterIP);
		Builder.CreateRetVoid();

		OMPBuilder.finalize();

		EXPECT_FALSE(verifyModule(*M, &errs()));
		Function *OutlinedFn = Internal->getFunction();

		Type *Arg2Type = OutlinedFn->getArg(2)->getType();
		EXPECT_TRUE(Arg2Type->isPointerTy());
		Type *StructElemTy = Arg2Type->getPointerElementType();
		EXPECT_STREQ(StructElemTy->getStructName().data(), "CapturedStructType");
		EXPECT_TRUE(StructElemTy->isStructTy());
		EXPECT_EQ(StructElemTy->getStructNumElements(), static_cast<unsigned>(2));
		StructType *StructTypeTy = reinterpret_cast<StructType *>(StructElemTy);
		EXPECT_TRUE(StructTypeTy->getElementType(0)->isIntegerTy(32));
		EXPECT_TRUE(StructTypeTy->getElementType(1)->isStructTy());
		StructType *InnerStructType =
		reinterpret_cast<StructType *>(StructTypeTy->getElementType(1));
		EXPECT_TRUE(InnerStructType->getElementType(0)->isIntegerTy(32));
		EXPECT_TRUE(InnerStructType->getElementType(1)->isPointerTy());
		EXPECT_TRUE(
		InnerStructType->getElementType(1)->getPointerElementType()->isIntegerTy(
		32));
		}
} // namespace		} // namespace
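The assertions above check that the outlined function receives a pointer to a two-element `CapturedStructType` rather than a variable-length argument list. As a rough plain-C++ analogy only (this is not the OMPIRBuilder implementation; `Inner`, `Captured`, `outlinedBody`, `forkCall`, and `runExample` are hypothetical names, and the fork is simulated sequentially), the idea is to pack the live-in values into one stack-allocated struct and pass its address to the per-thread callback:

```cpp
#include <cstdint>

// Hypothetical stand-in for the nested struct value captured by the region.
struct Inner {
  int32_t Value;
  int32_t *Ptr;
};

// Analogue of the shape the test asserts: { i32, { i32, i32* } }.
struct Captured {
  int32_t I32Val;
  Inner StructVal;
};

// Analogue of the outlined function: one struct pointer instead of varargs.
void outlinedBody(int32_t ThreadId, Captured *Args) {
  Captured Local = *Args; // load the whole struct, then extract fields
  *Local.StructVal.Ptr += Local.I32Val + Local.StructVal.Value + ThreadId;
}

// Hypothetical sequential stand-in for the runtime's fork call.
void forkCall(int32_t NumThreads, void (*Fn)(int32_t, Captured *),
              Captured *Args) {
  for (int32_t T = 0; T < NumThreads; ++T)
    Fn(T, Args);
}

int32_t runExample() {
  int32_t Sum = 0;
  Captured Args{1, Inner{2, &Sum}}; // plays the role of the alloca and stores
  forkCall(4, outlinedBody, &Args);
  return Sum;
}
```

Because every value travels through memory behind a single pointer, the callback signature no longer depends on how the calling convention classifies each individual argument.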

mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir

	// RUN: mlir-opt -convert-openmp-to-llvm %s -split-input-file | FileCheck %s			// RUN: mlir-opt -convert-openmp-to-llvm %s -split-input-file | FileCheck %s

	// CHECK-LABEL: llvm.func @branch_loop			// CHECK-LABEL: llvm.func @branch_loop
	func @branch_loop() {			func @branch_loop() {
	%start = constant 0 : index			%start = constant 0 : index
	%end = constant 0 : index			%end = constant 0 : index
	// CHECK: omp.parallel			// CHECK: omp.parallel
	omp.parallel {			omp.parallel {
	// CHECK-NEXT: llvm.br ^[[BB1:.*]](%{{[0-9]+}}, %{{[0-9]+}} : !llvm.i64, !llvm.i64			// CHECK: llvm.br ^[[BB1:.*]](%{{[0-9]+}}, %{{[0-9]+}} : !llvm.i64, !llvm.i64
	br ^bb1(%start, %end : index, index)			br ^bb1(%start, %end : index, index)
	// CHECK-NEXT: ^[[BB1]](%[[ARG1:[0-9]+]]: !llvm.i64, %[[ARG2:[0-9]+]]: !llvm.i64):{{.*}}			// CHECK-NEXT: ^[[BB1]](%[[ARG1:[0-9]+]]: !llvm.i64, %[[ARG2:[0-9]+]]: !llvm.i64):{{.*}}
	^bb1(%0: index, %1: index):			^bb1(%0: index, %1: index):
	// CHECK-NEXT: %[[CMP:[0-9]+]] = llvm.icmp "slt" %[[ARG1]], %[[ARG2]] : !llvm.i64			// CHECK-NEXT: %[[CMP:[0-9]+]] = llvm.icmp "slt" %[[ARG1]], %[[ARG2]] : !llvm.i64
	%2 = cmpi "slt", %0, %1 : index			%2 = cmpi "slt", %0, %1 : index
	// CHECK-NEXT: llvm.cond_br %[[CMP]], ^[[BB2:.*]](%{{[0-9]+}}, %{{[0-9]+}} : !llvm.i64, !llvm.i64), ^[[BB3:.*]]			// CHECK-NEXT: llvm.cond_br %[[CMP]], ^[[BB2:.*]](%{{[0-9]+}}, %{{[0-9]+}} : !llvm.i64, !llvm.i64), ^[[BB3:.*]]
	cond_br %2, ^bb2(%end, %end : index, index), ^bb3			cond_br %2, ^bb2(%end, %end : index, index), ^bb3
	// CHECK-NEXT: ^[[BB2]](%[[ARG3:[0-9]+]]: !llvm.i64, %[[ARG4:[0-9]+]]: !llvm.i64):			// CHECK-NEXT: ^[[BB2]](%[[ARG3:[0-9]+]]: !llvm.i64, %[[ARG4:[0-9]+]]: !llvm.i64):

mlir/test/Conversion/OpenMPToLLVM/openmp_float-parallel_param.mlir

This file was added.

				// RUN: mlir-translate --mlir-to-llvmir %s | FileCheck %s

				ftynse: Changes to MLIR are no longer necessary.
				llitchev (author): Yes. This just exposes the original issue I had. I thought it would be useful to have a test that verifies the underlying functionality works for MLIR.
				module {
				ftynse: Could we please make this test minimal and only exercise the functionality that the patch is adding? I don't think we need anything about `main` or `_mlir_ciface` or the entire initialization block here. We can use the Test dialect that supports unregistered ops as opaque producers or users of values.
				llvm.func @malloc(!llvm.i64) -> !llvm.ptr<i8>
				llvm.func @main() {
				%0 = llvm.mlir.constant(4 : index) : !llvm.i64
				%1 = llvm.mlir.constant(4 : index) : !llvm.i64
				%2 = llvm.mlir.null : !llvm.ptr<float>
				%3 = llvm.mlir.constant(1 : index) : !llvm.i64
				%4 = llvm.getelementptr %2[%3] : (!llvm.ptr<float>, !llvm.i64) -> !llvm.ptr<float>
				%5 = llvm.ptrtoint %4 : !llvm.ptr<float> to !llvm.i64
				%6 = llvm.mul %1, %5 : !llvm.i64
				%7 = llvm.call @malloc(%6) : (!llvm.i64) -> !llvm.ptr<i8>
				%8 = llvm.bitcast %7 : !llvm.ptr<i8> to !llvm.ptr<float>
				%9 = llvm.mlir.undef : !llvm.struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)>
				%10 = llvm.insertvalue %8, %9[0] : !llvm.struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)>
				%11 = llvm.insertvalue %8, %10[1] : !llvm.struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)>
				%12 = llvm.mlir.constant(0 : index) : !llvm.i64
				%13 = llvm.insertvalue %12, %11[2] : !llvm.struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)>
				%14 = llvm.mlir.constant(1 : index) : !llvm.i64
				%15 = llvm.insertvalue %1, %13[3, 0] : !llvm.struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)>
				%16 = llvm.insertvalue %14, %15[4, 0] : !llvm.struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)>
				%17 = llvm.mlir.constant(4.200000e+01 : f32) : !llvm.float
				// CHECK: %CaptureStructAlloca = alloca %CapturedStructType
				// CHECK: %{{.*}} = insertvalue %CapturedStructType undef, {{.*}}, 0
				// CHECK: store %CapturedStructType %{{.*}}, %CapturedStructType* %CaptureStructAlloca
				omp.parallel num_threads(%0 : !llvm.i64) {
				// CHECK: %{{.*}} = load %CapturedStructType, %CapturedStructType* %CaptureStructAlloca
				// CHECK: %{{.*}} = extractvalue %CapturedStructType %{{.*}}, 0
				%27 = llvm.mlir.constant(1 : i64) : !llvm.i64
				%28 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<float>, ptr<float>, i64, array<1 x i64>, array<1 x i64>)>
				%29 = llvm.mlir.constant(0 : index) : !llvm.i64
				%30 = llvm.mlir.constant(1 : index) : !llvm.i64
				%31 = llvm.mul %27, %30 : !llvm.i64
				%32 = llvm.add %29, %31 : !llvm.i64
				%33 = llvm.getelementptr %28[%32] : (!llvm.ptr<float>, !llvm.i64) -> !llvm.ptr<float>
				llvm.store %17, %33 : !llvm.ptr<float>
				omp.terminator
				}
				llvm.return
				}
				}
				ftynse: The check pattern doesn't look like valid MLIR; I am surprised pre-merge checks haven't complained.
				llitchev (author): I have no idea why it didn't get caught.
				ftynse: Prefer CHECK over CHECK-NEXT unless the semantics of the IR changes when two operations are not adjacent. The need to change another, unrelated test in this patch is a good illustration of why :)
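
To illustrate the last comment, here is a deliberately simplified, line-based model of the two directive kinds. It is a sketch, not FileCheck itself: `Directive` and `matches` are hypothetical names, patterns are plain substrings rather than regexes, and directives such as CHECK-LABEL are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of a FileCheck directive (assumption: substring match only).
struct Directive {
  bool MustBeNextLine; // true models CHECK-NEXT, false models CHECK
  std::string Needle;  // plain substring standing in for a pattern
};

// Returns true if every directive matches, in order, against Output.
bool matches(const std::vector<std::string> &Output,
             const std::vector<Directive> &Checks) {
  std::size_t Next = 0; // index of the first line not yet consumed
  for (const Directive &D : Checks) {
    if (D.MustBeNextLine) {
      // CHECK-NEXT: only the line right after the previous match may match.
      if (Next >= Output.size() ||
          Output[Next].find(D.Needle) == std::string::npos)
        return false;
      ++Next;
    } else {
      // CHECK: skip forward until some later line matches.
      while (Next < Output.size() &&
             Output[Next].find(D.Needle) == std::string::npos)
        ++Next;
      if (Next == Output.size())
        return false;
      ++Next;
    }
  }
  return true;
}
```

In this model, a check list of `omp.parallel` followed by a CHECK-NEXT on `llvm.br` stops matching as soon as any extra operation is emitted between the two lines, while the same list written with plain CHECK keeps matching. That is the brittleness the reviewer is pointing at.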

Revision Contents

Diff 313077