This is an archive of the discontinued LLVM Phabricator instance.


Differential D71989

[OpenMP][IRBuilder] `omp task` support
ClosedPublic

Authored by shraiysh on Dec 29 2019, 11:20 PM.


Details

Reviewers

rogfer01
ABataev
JonChesterfield
kiranchandramohan
fghanim
jdoerfert
kiranktp
ftynse
Meinersbur

Commits

rG7604c59bd233: [OpenMP][IRBuilder] `omp task` support

Summary

This patch adds basic support for omp task to the OpenMPIRBuilder.

The outlined function after code extraction is called from a wrapper function with appropriate arguments. This wrapper function is passed to the runtime calls for task allocation.

This approach differs from Clang's, which passes the outlined function to the runtime call directly. The outlining utility (OutlineInfo) simply outlines the code and generates a call to the outlined function. Once the outlining utility has generated that function, there is no easy way to alter its arguments without meddling with the outlining itself. Hence the wrapper-function approach.
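A minimal C++ sketch of the wrapper idea, under stated assumptions: `TaskData`, `outlined_fn`, and `outlined_fn_wrapper` are hypothetical names, and the routine type mirrors the runtime's `kmp_routine_entry_t` (an `i32` plus a pointer, returning `i32`). The wrapper ignores the `i32` argument and forwards the pointer. In the real runtime the pointer refers to the allocated task rather than directly to the captured data, so treat this only as an illustration of the signature adaptation.

```cpp
#include <cassert>
#include <cstdint>

using kmp_int32 = std::int32_t;
// Mirrors the runtime's task entry type: kmp_int32 (*)(kmp_int32, void *).
using kmp_routine_entry_t = kmp_int32 (*)(kmp_int32, void *);

// Hypothetical data captured by one task.
struct TaskData {
  int *counter;
  int increment;
};

// Stand-in for the outlined body: its parameter list is whatever the
// extracted region needed, not what the runtime passes.
void outlined_fn(TaskData *data) { *data->counter += data->increment; }

// Wrapper with the runtime's signature; it adapts the arguments and forwards
// to the outlined body. The i32 argument is not used here.
kmp_int32 outlined_fn_wrapper(kmp_int32 /*unused*/, void *task_data) {
  outlined_fn(static_cast<TaskData *>(task_data));
  return 0;
}

// Only the wrapper's address is handed to the task-allocation call.
constexpr kmp_routine_entry_t kTaskEntry = &outlined_fn_wrapper;
```

Because such a wrapper is a private function with a single call site, an optimizing build would normally inline it away; at -O0 an always_inline attribute would be needed for the same effect.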

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jdoerfert created this revision. Dec 29 2019, 11:20 PM

Herald added projects: Restricted Project, Restricted Project. Dec 29 2019, 11:20 PM

Herald added subscribers: guansong, bollu, hiraditya.

jdoerfert added a parent revision: D71988: [OpenMP][WIP] Make the kmp_depend_info type fit in 128 bits. Dec 29 2019, 11:21 PM

jdoerfert added a parent revision: D70290: [OpenMP] Use the OpenMPIRBuilder for "omp parallel".

lebedev.ri added a subscriber: lebedev.ri. Dec 29 2019, 11:29 PM

lebedev.ri added inline comments.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
141–142	This doesn't seem correctt.

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Harbormaster failed remote builds in B43030: Diff 235555! Dec 29 2019, 11:30 PM

JonChesterfield added inline comments. Dec 30 2019, 3:21 AM

openmp/runtime/src/kmp.h
2242 ↗	(On Diff #235555)	Existing code, but the comments about the number of bits here should probably be executable. E.g., give the compiler and the library a uint16_t field each, and assert that sizeof(kmp_tasking_flags) == sizeof(uint32_t). Optionally, four-byte-align the first short.
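One way to make such a bit-count comment executable, sketched with a hypothetical `TaskingFlagsSketch` type rather than the real `kmp_tasking_flags` definition: each side owns a 16-bit word, the struct is four-byte aligned, and a `static_assert` enforces the 32-bit total at compile time. The `kTiedBit` position is also an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: compiler-owned and library-owned halves, 16 bits each.
struct alignas(4) TaskingFlagsSketch {
  std::uint16_t compiler; // bits set by the compiler (tied, final, ...)
  std::uint16_t library;  // bits owned by the runtime library
};

// The "number of bits" comment turned into a compile-time check.
static_assert(sizeof(TaskingFlagsSketch) == sizeof(std::uint32_t),
              "tasking flags must stay exactly 32 bits wide");

// Hypothetical bit position for the "tied" flag in the compiler half.
constexpr std::uint16_t kTiedBit = 1u << 0;

constexpr bool isTied(const TaskingFlagsSketch &flags) {
  return (flags.compiler & kTiedBit) != 0;
}
```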

jdoerfert marked 2 inline comments as done. Dec 30 2019, 9:55 AM

jdoerfert added inline comments.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
141–142	It is not, thx, 62.
openmp/runtime/src/kmp.h
2242 ↗	(On Diff #235555)	I'd like to make a bunch of changes like that now that I have started to look into the library more closely. In addition to the TODOs I added below (in another patch), I found seemingly unused functions, arguments, ... Versioning the runtime seems interesting to me now.

sroyuela added a subscriber: sroyuela. Dec 31 2019, 8:35 AM

rogfer01 added inline comments. Jan 13 2020, 11:43 PM

openmp/runtime/src/kmp_tasking.cpp
1764 ↗	(On Diff #235555)	Just double-checking that I understand what would happen here for C++ firstprivatized objects (sorry if I'm asking the obvious): we would capture the copy into `alloca`ed memory and then copy that memory to the task struct? So given something like

struct A { A(int x = 3); void bar(); };
void foo() {
  A a;
  #pragma omp task
  { a.bar(); }
}

we'd do (in pseudo-C)

void foo() {
  struct task_env_0 {
    char A_storage[sizeof(A)]; // properly aligned and all that
  } tenv_0;
  A::A(&tenv_0.A_storage, 3); // capture
  // This would happen (dynamically) inside __kmpc_task
  kmp_task_t *new_task = __kmp_omp_task_alloc(..., sizeof(tenv_0), ...);
  memcpy(new_task->shareds, &tenv_0, sizeof(tenv_0));
  kmpc_omp_task(..., new_task);
}

So we go from

1. create task
2. capture environment in the task context
3. queue task / run immediately for if0

to

1. capture environment in a local storage
2. allocate task + copy local environment to task environment if needed (for tasks that are not if0) + queue task / run immediately for if0

Did I get that right? Thanks!

the task create and task issue step are
conceptually not separated anymore as it is

I don't think this can work reliably, because not all C++ objects can be memcpy'd. For example, an object can store its own address or a reference to itself, and a memcpy will break it.
This could be fixed by generating an (optional) thunk routine that creates all needed objects in the library-allocated space, and a similar routine that destroys all the created objects.
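A small illustration of the failure mode described above, using a hypothetical `SelfRef` type whose `self` member is meant to point at its own `value`. The type is kept trivially copyable so that the `memcpy` is well-defined: the bytes copy fine, but the copy's pointer still targets the source object. The `constructInTaskStorage` / `destroyInTaskStorage` pair stands in for the suggested thunks; their names and shape are assumptions, not a runtime interface.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical object whose member points back into the object itself.
struct SelfRef {
  int value;
  int *self; // intended invariant: self == &value
};

// What a plain memcpy-based capture does: the bytes are copied,
// but `self` still points into the source object.
SelfRef copyByMemcpy(const SelfRef &src) {
  SelfRef dst;
  std::memcpy(&dst, &src, sizeof(SelfRef));
  return dst;
}

// Thunk-style capture: construct the object directly in the storage the
// library allocated for the task, re-establishing the invariant.
SelfRef *constructInTaskStorage(const SelfRef &src, void *storage) {
  SelfRef *dst = ::new (storage) SelfRef{src.value, nullptr};
  dst->self = &dst->value;
  return dst;
}

// Matching destroy thunk, run when the task completes.
void destroyInTaskStorage(SelfRef *obj) { obj->~SelfRef(); }
```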

In D71989#1819219, @AndreyChurbanov wrote:

the task create and task issue step are
conceptually not separated anymore as it is

I don't think this can work reliably, because not all C++ objects can be memcpy'd. For example, an object can store its own address or a reference to itself, and a memcpy will break it.
This could be fixed by generating an (optional) thunk routine that creates all needed objects in the library-allocated space, and a similar routine that destroys all the created objects.

I agree. memcpy will only work for certain types (including all POD types). We need to extend this later to pass the copy-constructor function pointers and the locations of the original objects.
We should keep kmp_uint32 sizeof_shared_and_private_vars, void *shared_and_private_vars for the simple cases and add something like:

/// Copy wrapper takes the address of an object and the address the copy is going to be initialized in, and returns the address right after the new object.
typedef void *(*copy_wrapper)(void * /* src_addr */, void * /* trg_addr */);

__kmpc_task(..., kmp_uint32 sizeof_copy_infos, copy_wrapper *copy_wrapper_list, void **copied_obj_list, ...)

and we then call the copy constructors like this:

void *local_addr = ...;
for (kmp_uint32 u = 0; u < sizeof_copy_infos; ++u)
  local_addr = copy_wrapper_list[u](copied_obj_list[u], local_addr);

@rogfer01 @AndreyChurbanov What do you think? (I can do this in this patch or later)
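A runnable sketch of that loop, assuming `copied_obj_list` is an array of source pointers (`void **`, since it is indexed per object) and that each wrapper aligns the target address for its type before copy-constructing. Alignment handling was not part of the proposal and is an assumption here; `copyWrapperFor` and `copyAll` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <string>

using kmp_uint32 = std::uint32_t;
// Proposed signature: copy-construct *src_addr at trg_addr and return the
// address right after the new object.
using copy_wrapper = void *(*)(void * /*src_addr*/, void * /*trg_addr*/);

// Hypothetical per-type wrapper; aligns trg_addr for T first (assumption).
template <typename T> void *copyWrapperFor(void *src_addr, void *trg_addr) {
  std::size_t space = sizeof(T) + alignof(T); // room for any alignment shift
  void *aligned = std::align(alignof(T), sizeof(T), trg_addr, space);
  T *obj = ::new (aligned) T(*static_cast<T *>(src_addr));
  return obj + 1;
}

// The library-side loop from the comment above.
void *copyAll(kmp_uint32 sizeof_copy_infos, copy_wrapper *copy_wrapper_list,
              void **copied_obj_list, void *local_addr) {
  for (kmp_uint32 u = 0; u < sizeof_copy_infos; ++u)
    local_addr = copy_wrapper_list[u](copied_obj_list[u], local_addr);
  return local_addr;
}
```

The caller owns the storage and is responsible for eventually destroying the copies, which is exactly the gap the destructor discussion further down addresses.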

and we then call the copy constructors like this:

void *local_addr = ...;
for (kmp_uint32 u = 0; u < sizeof_copy_infos; ++u)
  local_addr = copy_wrapper_list[u](copied_obj_list[u], local_addr);

Just to confirm: that local_addr would be somehow linked to the task, I imagine it'd be initialized to something like task->shareds + offset_to_firstprivates, wouldn't it?

Also, perhaps you already considered whether it makes sense to just have a copy function generated by the front-end rather than one for each "firstprivatized variable that can't just be memcpy'ed"?

Also, I'm curious why you are moving away (perhaps I got this wrong!) from the current model of

1. kmpc_omp_task_alloc
2. capture environment
3. queue the task (kmpc_omp_task), or (if the task is if(0)) do an "immediate" execution but use the environment captured in 2

to something that looks like

1. (partially?) capture the environment. I think I didn't understand what void *shared_and_private_vars will do here. Is this something that the front-end precaptured for us?
2. copy the environment you obtained from shared_and_private_vars to a task-local storage and then queue the task or (if the task is if(0)) do an immediate execution using the environment you got from the argument to kmpc_task

I guess there is a benefit in the new approach. So far I read this as improving the if(0) case (but I may be missing something here).

Thanks!

In D71989#1820289, @rogfer01 wrote:
and we then call the copy constructors like this:
void *local_addr = ...;
for (kmp_uint32 u = 0; u < sizeof_copy_infos; ++u)
  local_addr = copy_wrapper_list[u](copied_obj_list[u], local_addr);
Just to confirm: that local_addr would be somehow linked to the task, I imagine it'd be initialized to something like task->shareds + offset_to_firstprivates, wouldn't it?

Yes. It is some location in which the task local variables live.

Also, perhaps you already considered whether it makes sense to just have a copy function generated by the front-end rather than one for each "firstprivatized variable that can't just be memcpy'ed"?

Having a copy function per type allows us to reuse it, otherwise we have one copy function per static task location (at worst). Either works for me I think.

Also, I'm curious why you are moving away (perhaps I got this wrong!) from the current model of

1. kmpc_omp_task_alloc
2. capture environment
3. queue the task (kmpc_omp_task), or (if the task is if(0)) do an "immediate" execution but use the environment captured in 2

to something that looks like

(partially?) capture the environment. I think I didn't understand what void *shared_and_private_vars will do here. Is this something that the front-end precaptured for us?

The idea is that we can allocate variables (for which we do not invoke a copy constructor) in a smart way, so that we can easily copy them if we need to, or not copy them at all if we don't. This interface is a first version, roughly designed like the tregion interface. Any suggestions are welcome!

copy the environment you obtained from shared_and_private_vars to a task-local storage and then queue the task or (if the task is if(0)) do an immediate execution using the environment you got from the argument to kmpc_task

Optimally, you would only copy if you need to.

I guess there is a benefit in the new approach. So far I read this as improving the if(0) case (but I may be missing something here).

There are multiple reasons, but the main one for me is the ability to use callback metadata to link the values passed at the call site of kmpc_task with the values received in the outlined body function. This is really complicated in the old approach, because no direct link is possible between the captured values and the local copies in the thread (the stores go to a location that is really hard to describe, and the body function is only referenced at some later point).
In the new approach you link shared_and_private_vars to the argument the task function receives. Now you encapsulate the call site in an additional, modifiable level of indirection (see D71505), and the Attributor will unpack the struct (see D68852), propagate constants, alias information, ... between the task invocation and the task function.

Having a copy function per type allows us to reuse it, otherwise we have one copy function per static task location (at worst). Either works for me I think.

I also would be OK with either option.
But note that using per-type functions will require more additions to the interface, like:

(..., num_objects, array_of_copy_wrappers, array_of_destructor_wrappers, array_of_obj_offsets).

Then the library can iterate over the objects to copy-construct them, and iterate again to destroy them after the task is complete, without any possibility of inlining the wrappers.

A per-task function only needs two additions, copy_wrapper and destructor_wrapper; all other details can live inside them, including possible inlining of constructors and destructors.

In D71989#1822326, @AndreyChurbanov wrote:
Having a copy function per type allows us to reuse it, otherwise we have one copy function per static task location (at worst). Either works for me I think.

I also would be OK with either option.
But note that using per-type functions will require more additions to the interface, like:
(..., num_objects, array_of_copy_wrappers, array_of_destructor_wrappers, array_of_obj_offsets).
Then the library can iterate over the objects to copy-construct them, and iterate again to destroy them after the task is complete, without any possibility of inlining the wrappers.

A per-task function only needs two additions, copy_wrapper and destructor_wrapper; all other details can live inside them, including possible inlining of constructors and destructors.

Agreed. Interestingly, the "per-type" solution should allow you to implement the "per-task" version on top. You pretend all objects are part of a (task) meta-object which will copy them one by one.
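To illustrate the layering point, here is a sketch with hypothetical names: the per-type interface is modeled as parallel arrays of copy functions, destructor functions, and offsets, and a per-task "meta-object" (`TaskPrivatesSketch`) goes through the same interface as a single entry, so its wrapper can inline all member constructors and destructors. None of these signatures are an existing runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Per-type entry points (hypothetical): copy-construct at dst / destroy in place.
using copy_fn = void (*)(const void *src, void *dst);
using dtor_fn = void (*)(void *obj);

template <typename T> void copyAt(const void *src, void *dst) {
  ::new (dst) T(*static_cast<const T *>(src));
}
template <typename T> void destroyAt(void *obj) { static_cast<T *>(obj)->~T(); }

// Library side: iterate over the objects to construct them in task storage...
void constructAll(std::size_t num_objects, const copy_fn *copies,
                  const void *const *srcs, const std::size_t *offsets,
                  unsigned char *base) {
  for (std::size_t i = 0; i < num_objects; ++i)
    copies[i](srcs[i], base + offsets[i]);
}

// ...and, after the task completes, destroy them in reverse order.
void destroyAll(std::size_t num_objects, const dtor_fn *dtors,
                const std::size_t *offsets, unsigned char *base) {
  for (std::size_t i = num_objects; i-- > 0;)
    dtors[i](base + offsets[i]);
}

// Per-task version on top: all firstprivates of one task form a single
// meta-object, described by one entry at offset 0.
struct TaskPrivatesSketch {
  int counter;
  std::string name;
};
```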

@fghanim @kiranchandramohan @kiranktp @AMDChirag @anchu-rajendran @SouraVX @Meinersbur, Anyone interested in taking this over? The required changes in addition to the diff have been discussed with @AndreyChurbanov in the comments above.

openmp/runtime/src/kmp_tasking.cpp
1764 ↗	(On Diff #235555)	Did I get that right? Yes, that was the idea. (Apologies for the delayed response)

Herald added subscribers: sstefan1, yaxunl. Dec 15 2020, 10:15 AM

In D71989#2455515, @jdoerfert wrote:

The required changes in addition to the diff have been discussed with @AndreyChurbanov in the comments above.

Actually the __kmpc_task function is going to be a bit more complicated in reality.

Because besides parameters for copy-construction + destruction of firstprivate objects, one will also need to pass to the __kmpc_task:

priority value if any;
affinity value(s) if any, probably single value is enough (this may be not supported yet, but apparently will be supported);
address of event handle to be returned for detachable task, offset of the event handle in shared_and_private_vars so that the library can put corresponding address there if it is used inside the task code, or some indicator that library should not put the address there if it is not used inside the task code (e.g. negative offset?);
offset of address of task structure in shared_and_private_vars for untied task, so that the compiler can generate __kmpc_omp_task calls to re-schedule partially executed task.
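A sketch of how the "negative offset means do not store" idea could look on the library side. `TaskExtraArgsSketch` and `publishEventHandle` are hypothetical, only a subset of the listed inputs is modeled, and the byte buffer stands in for `shared_and_private_vars`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical bundle of some of the extra inputs listed above.
struct TaskExtraArgsSketch {
  bool has_priority;
  std::int32_t priority;
  // Offset of the event-handle slot inside shared_and_private_vars;
  // a negative value means the task body never reads the handle.
  std::int64_t detach_event_offset;
};

// Library side: store the event handle only if the task body uses it.
void publishEventHandle(const TaskExtraArgsSketch &args,
                        unsigned char *shared_and_private_vars, void *event) {
  if (args.detach_event_offset < 0)
    return; // nothing to store
  std::memcpy(shared_and_private_vars + args.detach_event_offset, &event,
              sizeof(event));
}
```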

Also, __kmpc_task should preferably not call any __kmpc_* functions, given the recent work by Joachim Protze (D92197) to eliminate all __kmpc_* calls from inside the library and replace them with the corresponding internal calls. Otherwise OMPT cannot reliably determine entry to / exit from the runtime library, IINM. But this might be a separate patch.

Please consider the special cases of task_if0, which preferably should use __kmpc_omp_task_with_deps:

#pragma omp task if(0) depend(out:A) detach(event) : the detached undeferred task needs the out dependency to release the dependency only when detach is fulfilled
#pragma omp task if(0) depend(inoutmutexset:A) : the undeferred task can execute mutually exclusive with any other inoutmutexset:A task.

I'm not sure that these are all the special cases.

See the bugs for full OpenMP code examples:

https://bugs.llvm.org/show_bug.cgi?id=46185
https://bugs.llvm.org/show_bug.cgi?id=46193

Hi @jdoerfert, we (BSC) may be able to work on this, but we don't want to step on anyone's toes. Are there plans to push this forward (by you or someone else)?

How essential is changing the task interface in the context of your vision for the OpenMP-wise optimisations? Reusing the existing interface may not be ideal in that sense but could allow us to have a baseline already working for flang.

Thoughts? Thanks!

In D71989#2987095, @rogfer01 wrote:

Hi @jdoerfert, we (BSC) may be able to work on this, but we don't want to step on anyone's toes. Are there plans to push this forward (by you or someone else)?

How essential is changing the task interface in the context of your vision for the OpenMP-wise optimisations? Reusing the existing interface may not be ideal in that sense but could allow us to have a baseline already working for flang.

Thoughts? Thanks!

I am not actively working on this, @ggeorgakoudis was interested though. We can communicate via email what is a good way forward.
I am fine with keeping the original interface first and just moving the code, if you think that is easier.
(I was hoping that API simplifications will make the move easier but I might have been wrong.)

Hi all, is this patch being worked on? I wanted to use this for adding support for task construct in flang.

Herald added a project: Restricted Project. Mar 28 2022, 10:19 PM

In D71989#3413237, @shraiysh wrote:

Hi all, is this patch being worked on? I wanted to use this for adding support for task construct in flang.

I don't think it is. Read the discussion to see why this got stalled. Best way forward is to take this but *not* do the API changes I included.

In D71989#3413239, @jdoerfert wrote:

In D71989#3413237, @shraiysh wrote:

Hi all, is this patch being worked on? I wanted to use this for adding support for task construct in flang.

I don't think it is. Read the discussion to see why this got stalled. Best way forward is to take this but *not* do the API changes I included.

Okay, I am assuming a lot of the OpenMPIRBuilder.cpp code can be reused from here, but instead of creating the new proposed function, I should create the old functions with those arguments. I hope this is the correct approach, please correct me if I am wrong.

I wanted to know how to proceed about this? Do I borrow stuff from this patch and submit a new differential with you as a co-author? Do I push to this same differential? (Can we do that?) Please advise me on this. Thanks!

I wanted to know how to proceed about this? Do I borrow stuff from this patch and submit a new differential with you as a co-author? Do I push to this same differential? (Can we do that?) Please advise me on this. Thanks!

There is a "Commandeer Revision" option in the Add Action menu. You can use that to take over this patch from @jdoerfert. Continuing here might have some advantages in terms of context, existing reviews, reviewers, etc.

Commandeering this revision now.

rpenacob added a subscriber: rpenacob. Apr 5 2022, 6:57 AM

Update with basic support for createTask without the clauses.

Harbormaster completed remote builds in B158550: Diff 421309. Apr 7 2022, 12:14 PM

shraiysh retitled this revision from [OpenMP][IRBuilder][WIP] Prototype `omp task` support to [OpenMP][IRBuilder] `omp task` support. Apr 7 2022, 12:16 PM

shraiysh edited the summary of this revision.

shraiysh removed parent revisions: D70290: [OpenMP] Use the OpenMPIRBuilder for "omp parallel", D71988: [OpenMP][WIP] Make the kmp_depend_info type fit in 128 bits.

Rerun builds without dependency

shraiysh added a reviewer: kiranktp. Apr 7 2022, 12:19 PM

Harbormaster completed remote builds in B158552: Diff 421311. Apr 7 2022, 2:22 PM

shraiysh added subscribers: NimishMishra, raghavendhra, dpalermo. Apr 8 2022, 7:05 AM

Use the correct method: getTypeStoreSize instead of getTypeSizeInBits.

Also, ping for review.
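For context on why the method matters: `getTypeSizeInBits` reports bits, while a task-size argument is in bytes. As far as I know, `DataLayout::getTypeStoreSize` is the bit size rounded up to whole bytes (ignoring scalable vectors), which the arithmetic-only sketch below reproduces; `storeSizeInBytes` is a hypothetical helper, not the LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Round a bit width up to whole bytes, as a store-size computation does.
constexpr std::uint64_t storeSizeInBytes(std::uint64_t sizeInBits) {
  return (sizeInBits + 7) / 8;
}

static_assert(storeSizeInBytes(1) == 1, "an i1 still occupies one byte");
static_assert(storeSizeInBytes(32) == 4, "an i32 is 4 bytes, not 32");
static_assert(storeSizeInBytes(36) == 5, "an i36 rounds up to 5 bytes");
```

Passing the bit count instead would over-allocate by roughly a factor of eight.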

Harbormaster completed remote builds in B159378: Diff 422405. Apr 13 2022, 12:26 AM

Handle the case with no arguments to task function.

Herald added a reviewer: ftynse. Apr 16 2022, 9:00 PM

Herald added a project: Restricted Project.

Herald added subscribers: awarzynski, sdasgup3, wenzhicui and 22 others.

Remove MLIR related code from this patch. (Added by mistake)

Harbormaster completed remote builds in B159960: Diff 423274. Apr 16 2022, 10:04 PM

shraiysh added a child revision: D123919: [mlir][OpenMP] omp.task translation to LLVM IR. Apr 17 2022, 9:36 PM

kiranchandramohan added a reviewer: Meinersbur. Apr 20 2022, 3:30 AM

Thanks @shraiysh for taking up the work for task. This is the most important pending work for non-target OpenMP.

Could you expand the Summary a bit more with the approach taken and how it differs from current Clang task codegen?

I have added a few questions and comments.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1290	Do you know whether attributes are added to the outlined/wrapper function? Is it added in the finalize function? `CreateParallel` seems to have it in the `PostOutlineCB`.
1302–1307	Is this a change from clang codegen? Why not runtime_call(..., outlined_fn, ...)? Can you say that explicitly in the summary that this is different from Clang if it was changed for a particular reason?
1335	Nit: corresponding
1365	Is the size of shared a TODO? And would further changes impact the task size as well? It seems by default variables are passed as private copies, but if the enclosing region has a data-sharing of shared then that should be honoured.
1391	Nit: A debug dump of the IR here might be useful.
1393–1394	Nit: The size can be omitted.
1395	Is this step required? `TaskRegionBlockSet` and `Blocks` do not seem to be used here, also `collectBlocks` function seems to be called from finalize as well.

FYI, I am revising how outlining works, still waiting on D115216 getting reviewed.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1313	Could you add an assert checking that there is just a single user, so we do not accidentally use the wrong one.
1391	Please don't dump complete IR. Too much output makes `-debug` useless.
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
1196–1198	[nit] only whitespace change
4452	[style] LLVM coding style does not use Almost-Always-Auto.
4516–4521	Consider `llvm::any_of`

shraiysh edited the summary of this revision. Apr 26 2022, 4:37 AM

Addressed comments

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1290	I have not added the attributes because I don't know about them and if they are required or not. I will check clang codegen once again for comments about any required attributes, but if you or anyone else knows what attributes are required and why, please let me know, I will add them.
1302–1307	The main reason for a difference is that Outlining generates a function call as shown in the input IR in the comment above. I could not find an easy way to change the arguments of a function after it has been constructed. Parallel does this by tweaking the CodeExtractor used inside outlining - and I did not want to touch outlining internally so that I can confidently rely on outlining to do its job. A nice way to have similar (but not same) behavior as clang would be to add the inline attribute to the outlined function. After an Inline pass, the wrapper function will become the outlined function and it will be almost identical to clang's codegen. I will update the summary to highlight this difference.
1365	Yes, that is a todo, I have added it now. I think the task size will be reduced in that case. I have not looked into that yet and am doing private copies for everything at the moment.
1391	Please let me know if something less than IR should be dumped here.
1395	No it is not, I thought `collectBlocks` was required. I have removed it now.
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
1196–1198	Removed. Thanks!

Harbormaster completed remote builds in B161378: Diff 425189. Apr 26 2022, 6:52 AM

Meinersbur added inline comments. Apr 26 2022, 9:05 AM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1265–1275	I don't understand what this means.
1276	[style] Please use LLVM's naming standard.
1288	`BodyGenCB` may have invalidated `AllocaIP` and cannot be used anymore.
1302–1307	Could you add the function signature differences between `wrapper_fn` and `outlined_fn`? I.e. what arguments does `wrapper_fn` throw away? Btw, current Clang emits such a wrapper function with `-g` as well, so I don't see it as an issue. LLVM will always inline a private function with just a single call site, except in `-O0` where `always_inline` would be needed.
1333	Please document what `HasTaskData`, `TaskSize`, `NewTaskData`, etc. are.
1353	Use a `llvm::Twine`
1368	What is flag `1`?
1387–1388
1399	Please avoid temporary instructions. Use `splitBB` instead.

Addressed comments. Rebase with main.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1265–1275	There are three basic block splits in the next few lines and this comment is meant to justify why three splits are needed - it basically tells which basic block will go where. Should I add something more to the comment?
1302–1307	The first argument is an i32 value but I do not know its usage. There is minimal documentation for task related stuff in OpenMP Runtime Library Reference in the OpenMP subproject. Here is the function signature for the wrapper function, and the first argument is discarded at the moment. If anyone has any reference to some documentation about this, or has any idea about what that argument represents, please let me know and I will try to handle it according to its purpose. Btw, current Clang emits such a wrapper function with `-g` as well, so I don't see it as an issue. LLVM will always inline a private function with just a single call site, except in `-O0` where `always_inline` would be needed. Oh that is great then. Thank you!
1353	I have used it, but I am not sure if it is accurate usage. Please let me know if it seems inefficient.
1368	This is from flags in kmp.h. Value 1 means that it is tied (which is default unless `untied` clause is specified). Untied clause is not handled in this patch.
1399	AFAIK, `splitBB` requires an instruction pointer. I have updated this to erase the temporary instruction immediately after I have split the basic blocks, but I cannot figure out a way to completely eliminate temporary instructions. Is this okay?

Harbormaster completed remote builds in B162105: Diff 426227. Apr 30 2022, 7:11 AM

Meinersbur added inline comments. May 2 2022, 4:51 PM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1352	Don't store a Twine in a variable. See comment from `Twine.h`: /// A Twine is not intended for use directly and should not be stored, its /// implementation relies on the ability to store pointers to temporary stack /// objects which may be deallocated at the end of a statement. Twines should /// only be used accepted as const references in arguments, when an API wishes /// to accept possibly-concatenated strings.
1355	No `SmallString<128>` needed. `str()` creates a `std::string` that implicitly converts to an `llvm::StringRef` that is valid until the end of the statement (the `;`). Compared to using `std::string` only, this saves the creation of one temporary `std::string` (for `OutlinedFn.getName()`). Your version saves another one (if fewer than 128 chars), but it is also more complicated. Before `StringRef` was made more compatible with `std::string_view`, the `.str()` wasn't even needed.
1399	BasicBlock *TaskExitBB = splitBB(Builder, "task.exit");
BasicBlock *TaskBodyBB = splitBB(Builder, "task.body");
BasicBlock *TaskAllocaBB = splitBB(Builder, "task.alloca");
Note the reverse order. After this, `Builder` will insert instructions before the terminator of `Currbb` (where you probably don't want to insert instructions here).
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
4501	If `WrapperFunc` is NULL, the next line will be a segfault. `ASSERT_NE` will stop execution if it fails.
4504	See above (and other occurrences checking for nullptr).
4511–4514	Nice!
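The lifetime rule behind the `Twine`/`StringRef` comments can be shown with standard C++ as an analogy (this is not the LLVM API): `std::string_view`, like `StringRef`, does not own its characters, so a view may be used within the statement that created its temporary, but anything stored must be an owning `std::string`. `makeWrapperName` and the `.wrapper` suffix are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Accepting a non-owning view as a parameter is fine: any temporary the
// caller builds lives until the end of the full expression (the `;`).
std::string makeWrapperName(std::string_view base) {
  // Materialize an owning string for anything that must outlive the call.
  std::string name(base);
  name += ".wrapper";
  return name;
}

// Anti-pattern (shown only as a comment): the view would dangle as soon as
// the temporary std::string is destroyed at the end of this declaration.
//   std::string_view bad = std::string("outlined_fn") + ".wrapper";
```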

Addressed comments

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1355	Alright, thanks for the explanation, I understand it better now.

Harbormaster completed remote builds in B162395: Diff 426608. May 3 2022, 3:03 AM

Meinersbur added inline comments. May 3 2022, 8:31 PM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1265–1275	Consider something that resembles LLVM-IR syntax, e.g.:
// def outlined_fn() {
//   task.alloca:
//     br label %task.body
//
//   task.body:
//     ret void
// }
`outlined_fn` is a view after the finalize call, but here we are constructing the CFG before outlining, i.e. the description does not match.
1324–1326	[serious] `SrcLocStrSize` cannot be passed by value and by reference in the same statement (unsequenced).
1333	Still don't see a description of `TaskSize`. For `HasTaskData`, could you explain why and how the function signature changes?
1351–1352	`WrapperFuncName` and `WrapperFuncNameStorage` are now dead. Could you remove them?
1368	Please document that.
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
4538	[style]
4539	[style]
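A standalone sketch of the `SrcLocStrSize` problem and its fix, with hypothetical stand-ins `makeSrcLocStr` and `makeIdent` for the real builder methods. Since C++17, the initializations of function parameters are indeterminately sequenced (before C++17 they were unsequenced), so a call that writes `size` through one argument and reads it as another may see the old value; splitting the call into two statements fixes the order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in: builds a location string and reports its size
// through an out-parameter.
std::string makeSrcLocStr(const std::string &loc, std::uint32_t &sizeOut) {
  std::string str = ";" + loc + ";;";
  sizeOut = static_cast<std::uint32_t>(str.size());
  return str;
}

struct Ident {
  std::string str;
  std::uint32_t size;
};

// Hypothetical stand-in that consumes the string and its size.
Ident makeIdent(const std::string &str, std::uint32_t size) { return {str, size}; }

// Problematic shape: `size` may be read before makeSrcLocStr writes it.
//   Ident id = makeIdent(makeSrcLocStr(loc, size), size);

// Fixed shape: the write is sequenced before the read.
Ident makeIdentFixed(const std::string &loc) {
  std::uint32_t size = 0;
  std::string str = makeSrcLocStr(loc, size);
  return makeIdent(str, size);
}
```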

Addressed comments.

Added description for Tied Argument

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1265–1275	`outlined_fn` is a view after the finalize call, but here we are constructing the CFG before outlining, i.e. the description does not match. Yes, that was supposed to justify why we need four splits here. Without this comment it wasn't really clear to me why three blocks won't be enough (currBB, task.body and task.exit). It also isn't directly intuitive that task.exit is not going to be a part of the outlined function. I have added that this is going to be the basic block mapping after outlining. If it seems unnecessary, please let me know and I will remove this comment.
1333	I have added that TaskSize refers to the argument `sizeof_kmp_task_t` in the runtime call. I did not want to add further information because the argument's exact meaning is not documented anywhere (the reference doesn't have it). My interpretation of the argument is the size of arguments in bytes, but that could be incorrect and I thought it was better to redirect anyone reading this/working on this to the argument of the runtime call instead of writing a possibly misleading description. Please let me know if it would be better to add the current interpretation of the argument. For HasTaskData, could you explain why and how the function signature changes? Documented this near the wrapper function. Please let me know if it requires some alteration.

Harbormaster completed remote builds in B163630: Diff 428286. May 9 2022, 11:07 PM

Ping for review.

LGTM. Consider adding some description of TaskSize before committing.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1333	https://github.com/llvm/llvm-project/blob/a4190037fac06c2b0cc71b8bb90de9e6b570ebb5/openmp/runtime/src/kmp_tasking.cpp#L1345 // sizeof_kmp_task_t: Size in bytes of kmp_task_t data structure including // private vars accessed in task.
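Read that way, the value can be computed as the size of a task header followed by the task's privates. The structs below are simplified, hypothetical stand-ins (not copied from `kmp.h`); they only show that `sizeof` of the combined layout, padding included, is what such an argument would carry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using kmp_int32 = std::int32_t;
using kmp_routine_entry_t = kmp_int32 (*)(kmp_int32, void *);

// Simplified stand-in for the runtime's task header.
struct TaskHeaderSketch {
  void *shareds;
  kmp_routine_entry_t routine;
  kmp_int32 part_id;
};

// Hypothetical private copies read by one task body.
struct TaskPrivatesSketch {
  double x;
  int n;
};

// Header followed by the privates, as one allocation.
struct TaskWithPrivatesSketch {
  TaskHeaderSketch header;
  TaskPrivatesSketch privates;
};

// Under these assumptions, this is the value to pass as sizeof_kmp_task_t.
constexpr std::size_t kSizeofKmpTaskT = sizeof(TaskWithPrivatesSketch);
```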

This revision is now accepted and ready to land. May 19 2022, 8:21 AM

Thank you @Meinersbur for pointing out the description of the argument, I had not checked the cpp file. I have added the description now.

Addressed comments. I will wait for two more days, and if there are no further comments on this patch, then I will land it.

Harbormaster completed remote builds in B165740: Diff 431241. May 22 2022, 9:51 AM

Meinersbur accepted this revision. May 23 2022, 1:48 PM

This revision was landed with ongoing or failed builds. May 23 2022, 9:52 PM

Closed by commit rG7604c59bd233: [OpenMP][IRBuilder] `omp task` support (authored by shraiysh).

This revision was automatically updated to reflect the committed changes.

shraiysh added a commit: rG7604c59bd233: [OpenMP][IRBuilder] `omp task` support.

mikerice added a subscriber: mikerice. Oct 13 2022, 10:36 AM

mikerice added inline comments.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1369	The return value of this dyn_cast is not checked. Should this use cast instead (to satisfy static verifiers)? Or do you expect a nullptr return here?

shraiysh added inline comments. Oct 13 2022, 11:29 AM

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1369	I do not expect it to return a nullptr in most cases, though there could be corner cases I am not aware of where it might. I think we can change it to cast instead of dyn_cast, or, if that doesn't work, error out when we get a nullptr, until we find a valid test case where nullptr is expected. I am busy for the next couple of days, so I will work on this afterwards; meanwhile, if this change turns any buildbots green and is needed sooner, I can review it. Thank you for pointing this issue out.
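The difference being discussed, shown with standard C++ as an analogy (LLVM uses its own `isa`/`cast`/`dyn_cast` machinery, not `dynamic_cast`): a checked cast states that a mismatch is a bug and asserts, while a "maybe" cast returns `nullptr` that every caller must test. `checkedCast`, `maybeCast`, and the tiny class hierarchy are hypothetical.

```cpp
#include <cassert>

// Hypothetical mini-hierarchy standing in for IR classes.
struct Value { virtual ~Value() = default; };
struct Instruction : Value {};
struct Constant : Value {};

// Like llvm::cast: the caller guarantees the type; a mismatch trips the assert.
template <typename To, typename From> To *checkedCast(From *v) {
  To *result = dynamic_cast<To *>(v);
  assert(result && "checkedCast used on an object of the wrong type");
  return result;
}

// Like llvm::dyn_cast: a mismatch yields nullptr, which callers must check.
template <typename To, typename From> To *maybeCast(From *v) {
  return dynamic_cast<To *>(v);
}
```

Static verifiers flag an unchecked "maybe" cast because its result can be null; switching to the checked form documents the expectation and satisfies them.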

psoni2628 mentioned this in D146766: [Flang][OpenMP] Support depend clause for task construct, excluding array sections. Jun 3 2023, 3:35 PM

Revision Contents

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h: 10 lines
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp: 166 lines
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp: 165 lines

Diff 431588

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h

[132 lines not shown] public:
  /// Callback type for body (=inner region) code generation
  ///
  /// The callback takes code locations as arguments, each describing a
  /// location where additional instructions can be inserted.
  ///
  /// The CodeGenIP may be in the middle of a basic block or point to the end of
  /// it. The basic block may have a terminator or be degenerate. The callback
  /// function may just insert instructions at that position, but also split the
  /// block (without the Before argument of BasicBlock::splitBasicBlock such
  /// that the identify of the split predecessor block is preserved) and insert

lebedev.ri: This doesn't seem correct.

jdoerfert: It is not, thx, 62.

  /// additional control flow, including branches that do not lead back to what
  /// follows the CodeGenIP. Note that since the callback is allowed to split
  /// the block, callers must assume that InsertPoints to positions in the
  /// BasicBlock after CodeGenIP including CodeGenIP itself are invalidated. If
  /// such InsertPoints need to be preserved, it can split the block itself
  /// before calling the callback.
  ///
  /// AllocaIP and CodeGenIP must not point to the same position.
[462 lines not shown] public:
  /// \param Loc The location where the taskwait directive was encountered.
  void createTaskwait(const LocationDescription &Loc);

  /// Generator for '#omp taskyield'
  ///
  /// \param Loc The location where the taskyield directive was encountered.
  void createTaskyield(const LocationDescription &Loc);

  /// Generator for `#omp task`
  ///
  /// \param Loc The location where the task construct was encountered.
  /// \param AllocaIP The insertion point to be used for alloca instructions.
  /// \param BodyGenCB Callback that will generate the region code.
  /// \param Tied True if the task is tied, false if the task is untied.
  InsertPointTy createTask(const LocationDescription &Loc,
                           InsertPointTy AllocaIP, BodyGenCallbackTy BodyGenCB,
                           bool Tied = true);

  /// Functions used to generate reductions. Such functions take two Values
  /// representing LHS and RHS of the reduction, respectively, and a reference
  /// to the value that is updated to refer to the reduction result.
  using ReductionGenTy =
      function_ref<InsertPointTy(InsertPointTy, Value *, Value *, Value *&)>;

  /// Functions used to generate atomic reductions. Such functions take two
  /// Values representing pointers to LHS and RHS of the reduction, as well as
[1,139 lines not shown]

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

[1,247 lines not shown]

}

void OpenMPIRBuilder::createTaskyield(const LocationDescription &Loc) {
  if (!updateToLocation(Loc))
    return;
  emitTaskyieldImpl(Loc);
}

OpenMPIRBuilder::InsertPointTy
OpenMPIRBuilder::createTask(const LocationDescription &Loc,
                            InsertPointTy AllocaIP, BodyGenCallbackTy BodyGenCB,
                            bool Tied) {
  if (!updateToLocation(Loc))
    return InsertPointTy();

  // The current basic block is split into four basic blocks. After outlining,
  // they will be mapped as follows:
  // ```
  // def current_fn() {
  //   current_basic_block:
  //     br label %task.exit
  //   task.exit:
  //     ; instructions after task
  // }
  // def outlined_fn() {
  //   task.alloca:
  //     br label %task.body
  //   task.body:

Meinersbur: I don't understand what this means.

shraiysh: There are three basic block splits in the next few lines, and this comment is meant to justify why three splits are needed: it tells which basic block will go where. Should I add something more to the comment?

Meinersbur: Consider something that resembles LLVM-IR syntax, e.g.:

// def outlined_fn() {
//   task.alloca:
//     br label %task.body
//
//   task.body:
//     ret void
// }

`outlined_fn` is a view after the `finalize` call, but here we are constructing the CFG before outlining, i.e. the description does not match.

shraiysh:

> `outlined_fn` is a view after the `finalize` call, but here we are constructing the CFG before outlining, i.e. the description does not match.

Yes, that was supposed to justify why we need four blocks here. Without this comment, it wasn't really clear to me why three blocks wouldn't be enough (currBB, task.body and task.exit). It is also not immediately obvious that task.exit is not going to be part of the outlined function. I have noted that this is the basic block mapping after outlining. If it seems unnecessary, please let me know and I will remove this comment.

  //     ret void

Meinersbur: [style] Please use LLVM's naming standard.

  // }
  // ```
  BasicBlock *TaskExitBB = splitBB(Builder, /*CreateBranch=*/true, "task.exit");
  BasicBlock *TaskBodyBB = splitBB(Builder, /*CreateBranch=*/true, "task.body");
  BasicBlock *TaskAllocaBB =
      splitBB(Builder, /*CreateBranch=*/true, "task.alloca");

  OutlineInfo OI;
  OI.EntryBB = TaskAllocaBB;
  OI.OuterAllocaBB = AllocaIP.getBlock();
  OI.ExitBB = TaskExitBB;
  OI.PostOutlineCB = [this, &Loc, Tied](Function &OutlinedFn) {

Meinersbur: `BodyGenCB` may have invalidated `AllocaIP` and cannot be used anymore.

    // The input IR here looks like the following-
    // ```

kiranchandramohan: Do you know whether attributes are added to the outlined/wrapper function? Are they added in the finalize function? CreateParallel seems to have them in the PostOutlineCB.

shraiysh: I have not added the attributes because I don't know what they are or whether they are required. I will check Clang codegen once again for comments about any required attributes, but if you or anyone else knows which attributes are required and why, please let me know and I will add them.

    // func @current_fn() {
    //   outlined_fn(%args)
    // }
    // func @outlined_fn(%args) { ... }
    // ```
    // This is changed to the following-
    // ```
    // func @current_fn() {
    //   runtime_call(..., wrapper_fn, ...)
    // }
    // func @wrapper_fn(..., %args) {
    //   outlined_fn(%args)
    // }
    // func @outlined_fn(%args) { ... }
    // ```

kiranchandramohan: Is this a change from Clang codegen? Why not `runtime_call(..., outlined_fn, ...)`? If it was changed for a particular reason, can you state explicitly in the summary that this differs from Clang?

shraiysh: The main reason for the difference is that outlining generates a function call, as shown in the input IR in the comment above. I could not find an easy way to change the arguments of a function after it has been constructed. Parallel does this by tweaking the CodeExtractor used inside outlining, and I did not want to touch the outlining internals so that I can confidently rely on outlining to do its job.

A nice way to get similar (but not identical) behavior to Clang would be to add the inline attribute to the outlined function. After an inline pass, the wrapper function will become the outlined function, and the result will be almost identical to Clang's codegen. I will update the summary to highlight this difference.

Meinersbur: Could you add the function signature differences between `wrapper_fn` and `outlined_fn`? I.e. what arguments does `wrapper_fn` throw away?

Btw, current Clang emits such a wrapper function with -g as well, so I don't see it as an issue. LLVM will always inline a private function with just a single call site, except in -O0 where always_inline would be needed.

shraiysh: The first argument is an i32 value, but I do not know its usage. There is minimal documentation for task-related functionality in the OpenMP Runtime Library Reference in the OpenMP subproject. Here is the function signature for the wrapper function; the first argument is discarded at the moment. If anyone has a reference to documentation about this, or has any idea what that argument represents, please let me know and I will try to handle it according to its purpose.

> Btw, current Clang emits such a wrapper function with -g as well, so I don't see it as an issue. LLVM will always inline a private function with just a single call site, except in -O0 where always_inline would be needed.

Oh, that is great then. Thank you!

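The wrapper idea discussed above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not libomp or OpenMPIRBuilder code. The simplified "runtime" knows only one entry signature, `int32_t(int32_t, void *)`, which mirrors the `i32(i32, i8*)` type the patch bitcasts the wrapper to. The outlined body takes only the captured-variable struct. `Captured`, `outlinedFn`, `outlinedFnWrapper`, and `fakeRuntimeRunTask` are hypothetical names. Like the patch, the wrapper ignores the leading `int32_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct of captured variables, standing in for the struct the
// outliner packs the task's inputs into.
struct Captured {
  int32_t *Out;
  int32_t Value;
};

// Hypothetical outlined task body: it only needs the captured variables.
static void outlinedFn(Captured *Args) { *Args->Out = Args->Value * 2; }

// The single entry signature the simplified runtime below can call; it
// mirrors the i32(i32, i8*) type the patch bitcasts the wrapper to.
using TaskEntryTy = int32_t (*)(int32_t, void *);

// Wrapper with the runtime's signature: the leading int32_t is ignored and
// the opaque pointer is forwarded to the outlined body.
static int32_t outlinedFnWrapper(int32_t /*Ignored*/, void *Data) {
  outlinedFn(static_cast<Captured *>(Data));
  return 0;
}

// Hypothetical runtime entry point: it sees only an opaque entry function
// and an opaque data pointer, and it invokes the entry once.
static int32_t fakeRuntimeRunTask(TaskEntryTy Entry, int32_t ThreadId,
                                  void *Data) {
  return Entry(ThreadId, Data);
}
```

With `Captured C{&Result, 21}`, calling `fakeRuntimeRunTask(outlinedFnWrapper, 7, &C)` returns 0 and leaves `Result == 42`. Per the review discussion, a private function with a single call site is normally inlined at optimization levels above -O0, so the extra indirection should usually be cheap.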

    // The stale call instruction will be replaced with a new call instruction
    // for runtime call with a wrapper function.
    assert(OutlinedFn.getNumUses() == 1 &&
           "there must be a single user for the outlined function");
    CallInst *StaleCI = cast<CallInst>(OutlinedFn.user_back());

Meinersbur: Could you add an assert checking that there is just a single user, so we do not accidentally use a wrong one?

    // HasTaskData is true if any variables are captured in the outlined region,
    // false otherwise.
    bool HasTaskData = StaleCI->arg_size() > 0;
    Builder.SetInsertPoint(StaleCI);

    // Gather the arguments for emitting the runtime call for
    // @__kmpc_omp_task_alloc
    Function *TaskAllocFn =
        getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_omp_task_alloc);

    // Arguments - `loc_ref` (Ident) and `gtid` (ThreadID)
    // call.

Meinersbur:

uint32_t SrcLocStrSize;
- Value *Ident = getOrCreateIdent(
-     getOrCreateSrcLocStr(LocationDescription(Builder), SrcLocStrSize),
-     SrcLocStrSize);
+ Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize);
+ Value *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
Value *ThreadID = getOrCreateThreadID(Ident);

[serious] `SrcLocStrSize` cannot be passed by value and passed by reference in the same statement (unsequenced).
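The sequencing problem can be reproduced without LLVM. In this hedged sketch, `makeSrcLocStr`, `IdentRecord`, `makeIdent`, and `buildIdent` are hypothetical stand-ins for `getOrCreateSrcLocStr` and `getOrCreateIdent`. The `";file;;"` string format is an arbitrary choice for illustration. C++ leaves the evaluation order of function arguments unspecified, so a by-value argument may be read before a sibling argument's call writes it through a reference. Splitting the call into two statements, as the revised patch does, removes the ambiguity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for getOrCreateSrcLocStr: builds a location string
// and reports its length through an out-parameter.
static std::string makeSrcLocStr(const std::string &File, uint32_t &Size) {
  std::string Str = ";" + File + ";;";
  Size = static_cast<uint32_t>(Str.size());
  return Str;
}

// Hypothetical stand-in for the record produced by getOrCreateIdent.
struct IdentRecord {
  std::string Str;
  uint32_t Size;
};

static IdentRecord makeIdent(std::string Str, uint32_t Size) {
  return IdentRecord{std::move(Str), Size};
}

// Sequenced form, as in the revised patch: the out-parameter is written by
// one statement and read only by the next one.
static IdentRecord buildIdent(const std::string &File) {
  uint32_t SrcLocStrSize = 0;
  std::string SrcLocStr = makeSrcLocStr(File, SrcLocStrSize);
  return makeIdent(SrcLocStr, SrcLocStrSize);
}

// Problematic form, shown only as a comment:
//   return makeIdent(makeSrcLocStr(File, SrcLocStrSize), SrcLocStrSize);
// The evaluation order of function arguments is unspecified, so the
// by-value SrcLocStrSize may be read before makeSrcLocStr assigns it.
```

Here `buildIdent("file.c")` always yields the string `;file.c;;` with size 9. The one-statement form may instead pass the stale initial value.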

    uint32_t SrcLocStrSize;
    Constant *SrcLocStr = getOrCreateSrcLocStr(Loc, SrcLocStrSize);
    Value *Ident = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
    Value *ThreadID = getOrCreateThreadID(Ident);

    // Argument - `flags`
    // If task is tied, then (Flags & 1) == 1.

Meinersbur: Please document what `HasTaskData`, `TaskSize`, `NewTaskData`, etc. are.

Meinersbur: Still don't see a description of `TaskSize`.

For `HasTaskData`, could you explain why and how the function signature changes?

shraiysh: I have added that `TaskSize` refers to the argument `sizeof_kmp_task_t` in the runtime call. I did not want to add further information because the argument's exact meaning is not documented anywhere (the reference doesn't have it). My interpretation of the argument is the size of the arguments in bytes, but that could be incorrect, and I thought it was better to redirect anyone reading or working on this to the argument of the runtime call instead of writing a possibly misleading description. Please let me know if it would be better to add the current interpretation of the argument.

> For `HasTaskData`, could you explain why and how the function signature changes?

Documented this near the wrapper function. Please let me know if it requires some alteration.

Meinersbur: https://github.com/llvm/llvm-project/blob/a4190037fac06c2b0cc71b8bb90de9e6b570ebb5/openmp/runtime/src/kmp_tasking.cpp#L1345

// sizeof_kmp_task_t:  Size in bytes of kmp_task_t data structure including
// private vars accessed in task.
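The patch obtains `TaskSize` from `DataLayout::getTypeStoreSize` of the captured-argument struct. The unit test further down expects 24 bytes for a struct holding a 64-bit pointer and a 128-bit integer. The simplified C++ sketch below applies the usual non-packed struct-layout rule to show where such a number comes from. Each field offset is rounded up to that field's alignment, and the total is rounded up to the largest alignment. This is not LLVM's `StructLayout` code, and the alignments are assumptions. A result of 24 implies the 128-bit integer is 8-byte aligned in that module's data layout; a 16-byte alignment would give 32.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A field described only by its size and alignment, in bytes.
struct FieldInfo {
  uint64_t Size;
  uint64_t Align; // assumed to be a power of two
};

// Round Value up to the next multiple of Align (Align is a power of two).
static uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Size of a non-packed struct: each field starts at an offset aligned to its
// own alignment, and the total is padded to the largest field alignment.
static uint64_t structSize(const std::vector<FieldInfo> &Fields) {
  uint64_t Offset = 0;
  uint64_t MaxAlign = 1;
  for (const FieldInfo &F : Fields) {
    Offset = alignTo(Offset, F.Align) + F.Size;
    MaxAlign = std::max(MaxAlign, F.Align);
  }
  return alignTo(Offset, MaxAlign);
}
```

For example, `structSize({{8, 8}, {16, 8}})` is 24 and `structSize({{8, 8}, {16, 16}})` is 32.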

    // If task is untied, then (Flags & 1) == 0.
    // TODO: Handle the other flags.

kiranchandramohan: Nit: corresponding

    Value *Flags = Builder.getInt32(Tied);

    // Argument - `sizeof_kmp_task_t` (TaskSize)
    // Tasksize refers to the size in bytes of kmp_task_t data structure
    // including private vars accessed in task.
    Value *TaskSize = Builder.getInt64(0);
    if (HasTaskData) {
      AllocaInst *ArgStructAlloca =
          dyn_cast<AllocaInst>(StaleCI->getArgOperand(0));
      assert(ArgStructAlloca &&
             "Unable to find the alloca instruction corresponding to arguments "
             "for extracted function");
      StructType *ArgStructType =
          dyn_cast<StructType>(ArgStructAlloca->getAllocatedType());
      assert(ArgStructType && "Unable to find struct type corresponding to "
                              "arguments for extracted function");
      TaskSize =

Meinersbur: Don't store a Twine in a variable. See comment from `Twine.h`:

/// A Twine is not intended for use directly and should not be stored, its
/// implementation relies on the ability to store pointers to temporary stack
/// objects which may be deallocated at the end of a statement. Twines should
/// only be used accepted as const references in arguments, when an API wishes
/// to accept possibly-concatenated strings.

Meinersbur: `WrapperFuncName` and `WrapperFuncNameStorage` are now dead. Could you remove them?

          Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType));

Meinersbur: Use a `llvm::Twine`.

shraiysh: I have used it, but I am not sure the usage is accurate. Please let me know if it seems inefficient.

    }

Meinersbur:

FunctionCallee WrapperFuncVal = M.getOrInsertFunction(
-     WrapperFuncName.toStringRef(WrapperFuncNameStorage),
+     (Twine(OutlinedFn.getName()) + ".wrapper").str(),
      FunctionType::get(Builder.getInt32Ty(), WrapperArgTys, false));

No `SmallString<128>` needed. `str()` creates a `std::string` that implicitly converts to an `llvm::StringRef` that is valid until the end of the statement (`;`).

Compared to using `std::string` only, this saves the creation of one temporary `std::string` (for `OutlinedFn.getName()`). Your version saves another one (if fewer than 128 chars), but it is also more complicated. Before `StringRef` was made more compatible with `std::string_view`, the `.str()` wasn't even needed.

shraiysh: Alright, thanks for the explanation, I understand it better now.
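Meinersbur's point about temporaries can be shown with standard C++ alone. In this hedged sketch, `std::string_view` plays the role of `llvm::StringRef`. `getOrInsertByName` is a hypothetical stand-in for the name handling in `Module::getOrInsertFunction`, and it copies the name it is given. A temporary `std::string` built in the argument list lives until the end of the full expression, so the view passed to the callee stays valid for the whole call. Keeping such a view (or a `Twine`) in a variable past that statement would not be safe.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <string_view>

// Hypothetical symbol table standing in for a module: it stores its own
// copies of names, so callers only need the view to be valid during a call.
static std::set<std::string> &symbolTable() {
  static std::set<std::string> Table;
  return Table;
}

// Hypothetical stand-in for getOrInsertFunction's name handling: it copies
// Name into the table (if absent) and returns the stored string.
static const std::string &getOrInsertByName(std::string_view Name) {
  return *symbolTable().emplace(Name).first;
}

// The temporary std::string produced by operator+ lives until the end of
// this full expression, so the string_view parameter never dangles.
static const std::string &getOrInsertWrapper(std::string_view OutlinedName) {
  return getOrInsertByName(std::string(OutlinedName) + ".wrapper");
}
```

Calling `getOrInsertWrapper("outlined")` twice returns the same stored `"outlined.wrapper"` entry.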

    // TODO: Argument - sizeof_shareds

    // Argument - task_entry (the wrapper function)
    // If the outlined function has some captured variables (i.e. HasTaskData is
    // true), then the wrapper function will have an additional argument (the
    // struct containing captured variables). Otherwise, no such argument will
    // be present.
    SmallVector<Type *> WrapperArgTys{Builder.getInt32Ty()};
    if (HasTaskData)
      WrapperArgTys.push_back(OutlinedFn.getArg(0)->getType());

kiranchandramohan: Is the size of shareds a TODO? And would further changes impact the task size as well?
It seems that, by default, variables are passed as private copies, but if the enclosing region has a data-sharing attribute of shared, then that should be honoured.

shraiysh: Yes, that is a TODO; I have added it now. I think the task size will be reduced in that case. I have not looked into that yet and am making private copies of everything at the moment.

    FunctionCallee WrapperFuncVal = M.getOrInsertFunction(
        (Twine(OutlinedFn.getName()) + ".wrapper").str(),
        FunctionType::get(Builder.getInt32Ty(), WrapperArgTys, false));

Meinersbur:

TaskAllocFn,
- {Ident, ThreadID, /*flags=*/Builder.getInt32(1),
+ {/*loc_ref=*/Ident, /*dif=*/ThreadID, /*flags=*/Builder.getInt32(1),
/*sizeof_task=*/TaskSize, /*sizeof_shared=*/Builder.getInt64(0),

What is flag `1`?

shraiysh: This is from the flags in kmp.h. Value 1 means that the task is tied (the default unless the untied clause is specified). The untied clause is not handled in this patch.

Meinersbur: Please document that.
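The thread above and the final `Builder.getInt32(Tied)` treat bit 0 of `flags` as the tied bit. A hedged sketch of building that value as a bitmask follows. `TaskFlagTied` and `makeTaskAllocFlags` are hypothetical names, not the kmp.h identifiers. Only the tied bit is modeled, and every other bit stays zero, in line with the patch's TODO for the remaining flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name for bit 0 of the task-allocation flags (tied task).
constexpr uint32_t TaskFlagTied = 1u << 0;

// Build the flags word; every bit other than the tied bit stays zero.
constexpr uint32_t makeTaskAllocFlags(bool Tied) {
  return Tied ? TaskFlagTied : 0u;
}

static_assert((makeTaskAllocFlags(true) & TaskFlagTied) == 1u,
              "tied tasks have (Flags & 1) == 1");
static_assert((makeTaskAllocFlags(false) & TaskFlagTied) == 0u,
              "untied tasks have (Flags & 1) == 0");
```

Using a named mask keeps the `(Flags & 1)` checks in the code comments self-explanatory once more flag bits are added.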

    Function *WrapperFunc = dyn_cast<Function>(WrapperFuncVal.getCallee());

mikerice: The return value of this dyn_cast is not checked. Should this use cast instead (to satisfy static verifiers)? Or do you expect a nullptr return here?

shraiysh: I do not expect it to return a nullptr in most cases, although there could be corner cases that I am not aware of where it might.

I think we can change it to cast instead of dyn_cast, or, if that doesn't work, we should error out if we get a nullptr until we find a valid test case where nullptr is expected. I am busy for the next couple of days, so I will work on this after that; meanwhile, if this change is needed sooner (for example, to turn any buildbots green), I can review it. Thank you for pointing this issue out.
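The `cast` versus `dyn_cast` question is about checked versus conditional downcasts. Below is a minimal, self-contained sketch of kind-tag casting in the style LLVM uses. The `Node`, `FunctionNode`, `ConstantNode`, `tryCast`, and `checkedCast` names are hypothetical, and this is not LLVM's `Casting.h`. The checked form asserts that the kind matches, so an unexpected type fails loudly in assertion-enabled builds. The conditional form returns `nullptr`, which the caller must then check.

```cpp
#include <cassert>

// Hypothetical class hierarchy with an explicit kind tag, in the style of
// LLVM's hand-rolled RTTI.
struct Node {
  enum class Kind { Function, Constant };
  explicit Node(Kind K) : K(K) {}
  virtual ~Node() = default;
  Kind getKind() const { return K; }

private:
  Kind K;
};

struct FunctionNode : Node {
  FunctionNode() : Node(Kind::Function) {}
  static bool classof(const Node *N) { return N->getKind() == Kind::Function; }
};

struct ConstantNode : Node {
  ConstantNode() : Node(Kind::Constant) {}
  static bool classof(const Node *N) { return N->getKind() == Kind::Constant; }
};

// Conditional cast: returns nullptr when N is not a To; callers must check.
template <typename To> To *tryCast(Node *N) {
  return To::classof(N) ? static_cast<To *>(N) : nullptr;
}

// Checked cast: asserts the kind instead of returning nullptr.
template <typename To> To *checkedCast(Node *N) {
  assert(To::classof(N) && "checkedCast to incompatible type");
  return static_cast<To *>(N);
}
```

When a null result is never expected, the checked form states that expectation in code. The conditional form only makes sense together with an explicit null check at the call site.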

    PointerType *WrapperFuncBitcastType =
        FunctionType::get(Builder.getInt32Ty(),
                          {Builder.getInt32Ty(), Builder.getInt8PtrTy()}, false)
            ->getPointerTo();
    Value *WrapperFuncBitcast =
        ConstantExpr::getBitCast(WrapperFunc, WrapperFuncBitcastType);

    // Emit the @__kmpc_omp_task_alloc runtime call
    // The runtime call returns a pointer to an area where the task captured
    // variables must be copied before the task is run (NewTaskData)
    CallInst *NewTaskData = Builder.CreateCall(
        TaskAllocFn,
        {/*loc_ref=*/Ident, /*gtid=*/ThreadID, /*flags=*/Flags,
         /*sizeof_task=*/TaskSize, /*sizeof_shared=*/Builder.getInt64(0),
         /*task_func=*/WrapperFuncBitcast});

    // Copy the arguments for outlined function
    if (HasTaskData) {
      Value *TaskData = StaleCI->getArgOperand(0);

Meinersbur:

// Emit the body for wrapper function
- BasicBlock *WrapperEntryBB = BasicBlock::Create(M.getContext());
- WrapperFunc->getBasicBlockList().push_back(WrapperEntryBB);
+ BasicBlock *WrapperEntryBB = BasicBlock::Create(M.getContext(), "", WrapperFunc);
Builder.SetInsertPoint(WrapperEntryBB);

      Align Alignment = TaskData->getPointerAlignment(M.getDataLayout());
      Builder.CreateMemCpy(NewTaskData, Alignment, TaskData, Alignment,
                           TaskSize);

kiranchandramohan: Nit: A debug dump of the IR here might be useful.

Meinersbur: Please don't dump complete IR. Too much output makes `-debug` useless.

shraiysh: Please let me know if something smaller than the complete IR should be dumped here.

    }

    // Emit the @__kmpc_omp_task runtime call to spawn the task

kiranchandramohan: Nit: The size can be omitted.

    Function *TaskFn = getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_omp_task);

kiranchandramohan: Is this step required? `TaskRegionBlockSet` and `Blocks` do not seem to be used here, and the `collectBlocks` function also seems to be called from `finalize`.

shraiysh: No, it is not; I thought `collectBlocks` was required. I have removed it now.

    Builder.CreateCall(TaskFn, {Ident, ThreadID, NewTaskData});

    StaleCI->eraseFromParent();

Meinersbur: Please avoid temporary instructions. Use `splitBB` instead.

shraiysh: AFAIK, `splitBB` requires an instruction pointer. I have updated this to erase the temporary instruction immediately after splitting the basic blocks, but I cannot figure out a way to eliminate temporary instructions completely. Is this okay?

Meinersbur:

BasicBlock *TaskExitBB = splitBB(Builder, "task.exit");
BasicBlock *TaskBodyBB = splitBB(Builder, "task.body");
BasicBlock *TaskAllocaBB = splitBB(Builder, "task.alloca");

Note the reverse order. After this, `Builder` will insert instructions before the terminator of `CurrBB` (where you probably don't want to insert instructions).
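The effect of the reverse split order can be seen in a toy model. The sketch rests on simplifying assumptions: a block is a list of instruction strings, the builder always sits just before the current block's terminator, and `toySplit` is a hypothetical helper, not LLVM's `splitBB`. Each split moves the current terminator into a new block and branches to it. Blocks created later therefore end up earlier in the chain. Splitting in the order exit, body, alloca yields current → task.alloca → task.body → task.exit, the layout described in the `createTask` comment.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy basic block: a list of instruction strings whose last entry is the
// terminator. Blocks are looked up by name.
using Block = std::vector<std::string>;
using BlockMap = std::map<std::string, Block>;

// Hypothetical model of a split at the builder position, assumed to be just
// before Cur's terminator: the terminator moves into a new block NewName,
// and Cur now ends with an unconditional branch to that block.
static void toySplit(BlockMap &F, const std::string &Cur,
                     const std::string &NewName) {
  Block &B = F[Cur];
  std::string Term = B.back();
  B.back() = "br " + NewName;
  F[NewName] = Block{Term};
}

// Follow unconditional branches from Entry and record the block order.
static std::vector<std::string> chain(const BlockMap &F,
                                      const std::string &Entry) {
  std::vector<std::string> Order;
  std::string Name = Entry;
  while (true) {
    Order.push_back(Name);
    const std::string &Term = F.at(Name).back();
    if (Term.rfind("br ", 0) != 0)
      break;
    Name = Term.substr(3);
  }
  return Order;
}
```

Starting from an `entry` block ending in `ret void`, the three splits leave the original terminator in `task.exit`. After each split, the builder position is again just before `entry`'s new branch.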

    // Emit the body for wrapper function
    BasicBlock *WrapperEntryBB =
        BasicBlock::Create(M.getContext(), "", WrapperFunc);
    Builder.SetInsertPoint(WrapperEntryBB);
    if (HasTaskData)
      Builder.CreateCall(&OutlinedFn, {WrapperFunc->getArg(1)});
    else
      Builder.CreateCall(&OutlinedFn);
    Builder.CreateRet(Builder.getInt32(0));
  };

  addOutlineInfo(std::move(OI));

  InsertPointTy TaskAllocaIP =
      InsertPointTy(TaskAllocaBB, TaskAllocaBB->begin());
  InsertPointTy TaskBodyIP = InsertPointTy(TaskBodyBB, TaskBodyBB->begin());
  BodyGenCB(TaskAllocaIP, TaskBodyIP);
  Builder.SetInsertPoint(TaskExitBB);

  return Builder.saveIP();
}

OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::createSections(
    const LocationDescription &Loc, InsertPointTy AllocaIP,
    ArrayRef<StorableBodyGenCallbackTy> SectionCBs, PrivatizeCallbackTy PrivCB,
    FinalizeCallbackTy FiniCB, bool IsCancellable, bool IsNowait) {
  assert(!isConflictIP(AllocaIP, Loc.IP) && "Dedicated IP allocas required");
  if (!updateToLocation(Loc))
    return Loc.IP;

[2,948 lines not shown]

llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp

[1,187 lines not shown] TEST_F(OpenMPIRBuilderTest, ParallelForwardAsPointers) {

  OMPBuilder.finalize();
  EXPECT_FALSE(verifyModule(*M, &errs()));
  Function *OutlinedFn = Internal->getFunction();
  Type *Arg2Type = OutlinedFn->getArg(2)->getType();
  EXPECT_TRUE(Arg2Type->isPointerTy());
  EXPECT_TRUE(
      cast<PointerType>(Arg2Type)->isOpaqueOrPointeeTypeMatches(ArgStructTy));
}

Meinersbur: [nit] only whitespace change

shraiysh: Removed. Thanks!

TEST_F(OpenMPIRBuilderTest, CanonicalLoopSimple) {
  using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
  OpenMPIRBuilder OMPBuilder(*M);
  OMPBuilder.initialize();
  IRBuilder<> Builder(BB);
  OpenMPIRBuilder::LocationDescription Loc({Builder.saveIP(), DL});
  Value *TripCount = F->getArg(0);

[3,200 lines not shown] TEST_F(OpenMPIRBuilderTest, EmitMapperCall) {

  EXPECT_TRUE(MapperCall->getOperand(1)->getType()->isIntegerTy(64));
  EXPECT_TRUE(MapperCall->getOperand(2)->getType()->isIntegerTy(32));
  EXPECT_EQ(MapperCall->getOperand(6), MaptypesArg);
  EXPECT_EQ(MapperCall->getOperand(7), MapnamesArg);
  EXPECT_TRUE(MapperCall->getOperand(8)->getType()->isPointerTy());
}

TEST_F(OpenMPIRBuilderTest, CreateTask) {
  using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
  OpenMPIRBuilder OMPBuilder(*M);
  OMPBuilder.initialize();
  F->setName("func");
  IRBuilder<> Builder(BB);

  AllocaInst *ValPtr32 = Builder.CreateAlloca(Builder.getInt32Ty());
  AllocaInst *ValPtr128 = Builder.CreateAlloca(Builder.getInt128Ty());
  Value *Val128 =
      Builder.CreateLoad(Builder.getInt128Ty(), ValPtr128, "bodygen.load");

  auto BodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP) {
    Builder.restoreIP(AllocaIP);
    AllocaInst *Local128 = Builder.CreateAlloca(Builder.getInt128Ty(), nullptr,
                                                "bodygen.alloca128");

    Builder.restoreIP(CodeGenIP);
    // Loading and storing captured pointer and values
    Builder.CreateStore(Val128, Local128);
    Value *Val32 = Builder.CreateLoad(ValPtr32->getAllocatedType(), ValPtr32,
                                      "bodygen.load32");

    LoadInst *PrivLoad128 = Builder.CreateLoad(
        Local128->getAllocatedType(), Local128, "bodygen.local.load128");
    Value *Cmp = Builder.CreateICmpNE(
        Val32, Builder.CreateTrunc(PrivLoad128, Val32->getType()));
    Instruction *ThenTerm, *ElseTerm;
    SplitBlockAndInsertIfThenElse(Cmp, CodeGenIP.getBlock()->getTerminator(),
                                  &ThenTerm, &ElseTerm);
  };

  BasicBlock *AllocaBB = Builder.GetInsertBlock();
  BasicBlock *BodyBB = splitBB(Builder, /*CreateBranch=*/true, "alloca.split");
  OpenMPIRBuilder::LocationDescription Loc(
      InsertPointTy(BodyBB, BodyBB->getFirstInsertionPt()), DL);
  Builder.restoreIP(OMPBuilder.createTask(
      Loc, InsertPointTy(AllocaBB, AllocaBB->getFirstInsertionPt()),

Meinersbur: [style] LLVM coding style does not use Almost-Always-Auto.

BodyGenCB));

OMPBuilder.finalize();

Builder.CreateRetVoid();

EXPECT_FALSE(verifyModule(*M, &errs()));

CallInst *TaskAllocCall = dyn_cast<CallInst>(

OMPBuilder.getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_omp_task_alloc)

->user_back());

// Verify the Ident argument

GlobalVariable *Ident = cast<GlobalVariable>(TaskAllocCall->getArgOperand(0));

ASSERT_NE(Ident, nullptr);

EXPECT_TRUE(Ident->hasInitializer());

Constant *Initializer = Ident->getInitializer();

GlobalVariable *SrcStrGlob =

cast<GlobalVariable>(Initializer->getOperand(4)->stripPointerCasts());

ASSERT_NE(SrcStrGlob, nullptr);

ConstantDataArray *SrcSrc =

dyn_cast<ConstantDataArray>(SrcStrGlob->getInitializer());

ASSERT_NE(SrcSrc, nullptr);

// Verify the num_threads argument.

CallInst *GTID = dyn_cast<CallInst>(TaskAllocCall->getArgOperand(1));

ASSERT_NE(GTID, nullptr);

EXPECT_EQ(GTID->arg_size(), 1U);

EXPECT_EQ(GTID->getCalledFunction()->getName(), "__kmpc_global_thread_num");

  // Verify the flags.
  // TODO: Check for other flags. Currently testing only for tiedness.
  ConstantInt *Flags = dyn_cast<ConstantInt>(TaskAllocCall->getArgOperand(2));
  ASSERT_NE(Flags, nullptr);
  EXPECT_EQ(Flags->getSExtValue(), 1);

  // Verify the data size.
  ConstantInt *DataSize =
      dyn_cast<ConstantInt>(TaskAllocCall->getArgOperand(3));
  ASSERT_NE(DataSize, nullptr);
  EXPECT_EQ(DataSize->getSExtValue(), 24); // 64-bit pointer + 128-bit integer
  // TODO: Verify size of shared clause variables

  // Verify the wrapper function.
  Function *WrapperFunc =
      dyn_cast<Function>(TaskAllocCall->getArgOperand(5)->stripPointerCasts());
  ASSERT_NE(WrapperFunc, nullptr);
  EXPECT_FALSE(WrapperFunc->isDeclaration());
  CallInst *OutlinedFnCall = dyn_cast<CallInst>(WrapperFunc->begin()->begin());
  ASSERT_NE(OutlinedFnCall, nullptr);

Meinersbur (Done):

  dyn_cast<Function>(TaskAllocCall->getArgOperand(5)->stripPointerCasts());
- EXPECT_NE(WrapperFunc, nullptr);
+ ASSERT_NE(WrapperFunc, nullptr);
  EXPECT_FALSE(WrapperFunc->isDeclaration());

If `WrapperFunc` is NULL, the next line will segfault. `ASSERT_NE` will stop execution if it fails.
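A minimal sketch of why the fatal check matters, using a hypothetical stand-in macro rather than gtest itself (`SKETCH_ASSERT_NOT_NULL`, `FakeFunction`, and `checkWrapper` are invented for illustration). Like `ASSERT_NE`, the macro returns from the enclosing function on failure, so the dereference that follows is never reached with a null pointer; an `EXPECT_NE`-like variant would record the failure and fall through to the dereference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Failures recorded by the hypothetical check macro below.
static std::vector<std::string> Failures;

// Hypothetical ASSERT_NE-like check: on failure, record a message and return
// from the enclosing function, which must therefore return void (gtest's
// fatal assertions have the same restriction). An EXPECT_NE-like variant
// would omit the `return` and fall through.
#define SKETCH_ASSERT_NOT_NULL(P)                                              \
  do {                                                                         \
    if ((P) == nullptr) {                                                      \
      Failures.push_back("expected non-null: " #P);                            \
      return;                                                                  \
    }                                                                          \
  } while (0)

// Hypothetical stand-in for llvm::Function.
struct FakeFunction {
  bool Decl = false;
  bool isDeclaration() const { return Decl; }
};

// Mirrors the test: null check first, then an unconditional dereference.
void checkWrapper(FakeFunction *WrapperFunc, bool &ReachedDeref) {
  ReachedDeref = false;
  SKETCH_ASSERT_NOT_NULL(WrapperFunc);
  ReachedDeref = true;
  // Safe: only reached when WrapperFunc is non-null.
  (void)WrapperFunc->isDeclaration();
}
```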

  EXPECT_EQ(WrapperFunc->getArg(0)->getType(), Builder.getInt32Ty());
  EXPECT_EQ(OutlinedFnCall->getArgOperand(0), WrapperFunc->getArg(1));

Meinersbur (Done):

  CallInst *OutlinedFnCall = dyn_cast<CallInst>(WrapperFunc->begin()->begin());
- EXPECT_NE(OutlinedFnCall, nullptr);
+ ASSERT_NE(OutlinedFnCall, nullptr);
  EXPECT_EQ(WrapperFunc->getArg(0)->getType(), Builder.getInt32Ty());

See above (and other occurrences checking for nullptr).
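The two wrapper checks above (argument 0 of the wrapper is `i32`, and the wrapper's argument 1 is forwarded as the outlined function's argument 0) pin down the calling shape behind the wrapper approach described in the summary: the runtime invokes the wrapper, and the wrapper calls the outlined body. The following is a plain C++ sketch of that shape, not the IR the builder emits. All names are hypothetical, and the `int32_t(int32_t, void *)` entry type is an assumption modeled on the runtime's `kmp_routine_entry_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical entry-point type, assumed to match the runtime's
// kmp_routine_entry_t: kmp_int32 (*)(kmp_int32, void *).
using TaskEntryTy = std::int32_t (*)(std::int32_t, void *);

// Records what the outlined body saw, so the forwarding can be checked.
static int Observed = 0;

// Hypothetical outlined task body: its first argument is the task data.
void outlinedTaskBody(void *TaskData) {
  Observed = *static_cast<int *>(TaskData);
}

// Wrapper with the shape the test checks: argument 0 is the i32 thread id
// (unused here), argument 1 is forwarded as the outlined body's argument 0.
std::int32_t taskWrapper(std::int32_t /*GTID*/, void *TaskData) {
  outlinedTaskBody(TaskData);
  return 0;
}

// Hypothetical stand-ins for the runtime: allocation records the entry
// point (argument 5 of __kmpc_omp_task_alloc in the test), and the returned
// handle is what gets passed on to the task-execution call.
struct FakeTask {
  TaskEntryTy Entry;
  void *Data;
};

FakeTask fakeTaskAlloc(TaskEntryTy Entry, void *Data) { return {Entry, Data}; }

void fakeTaskRun(std::int32_t GTID, const FakeTask &Task) {
  Task.Entry(GTID, Task.Data);
}
```

Keeping the outlined body's signature independent of the runtime's entry type is the point of the indirection: only the thin wrapper has to match what the runtime expects.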

  // Verify the presence of `trunc` and `icmp` instructions in the outlined
  // function.
  Function *OutlinedFn = OutlinedFnCall->getCalledFunction();
  ASSERT_NE(OutlinedFn, nullptr);
  EXPECT_TRUE(any_of(instructions(OutlinedFn),
                     [](Instruction &I) { return isa<TruncInst>(&I); }));
  EXPECT_TRUE(any_of(instructions(OutlinedFn),
                     [](Instruction &I) { return isa<ICmpInst>(&I); }));

  // Verify the execution of the task.
  CallInst *TaskCall = dyn_cast<CallInst>(

Meinersbur (Done): Nice!

      OMPBuilder.getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_omp_task)
          ->user_back());
  ASSERT_NE(TaskCall, nullptr);
  EXPECT_EQ(TaskCall->getArgOperand(0), Ident);
  EXPECT_EQ(TaskCall->getArgOperand(1), GTID);
  EXPECT_EQ(TaskCall->getArgOperand(2), TaskAllocCall);

Meinersbur (Done): Consider `llvm::any_of`.
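The suggestion refers to `llvm::any_of` from `llvm/ADT/STLExtras.h`, a range-based wrapper that forwards to `std::any_of`. The sketch below contrasts a hand-written search loop with the algorithm form; it uses `std::any_of` and a hypothetical `Opcode` enum in place of IR instructions so that it builds without LLVM. The `trunc`/`icmp` checks earlier in the test follow the algorithm form.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR instruction; only the opcode matters here.
enum class Opcode { Add, Trunc, ICmp, Ret };

// Hand-written search loop: the pattern that llvm::any_of replaces.
bool hasTruncLoop(const std::vector<Opcode> &Insts) {
  bool Found = false;
  for (Opcode Op : Insts) {
    if (Op == Opcode::Trunc) {
      Found = true;
      break;
    }
  }
  return Found;
}

// Same check as a single algorithm call. llvm::any_of(Range, Pred) would
// take the container directly instead of an iterator pair.
bool hasTruncAnyOf(const std::vector<Opcode> &Insts) {
  return std::any_of(Insts.begin(), Insts.end(),
                     [](Opcode Op) { return Op == Opcode::Trunc; });
}
```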

  // Verify that the argument data has been copied.
  for (User *U : TaskAllocCall->users()) {
    if (MemCpyInst *MemCpy = dyn_cast<MemCpyInst>(U))
      EXPECT_EQ(MemCpy->getDest(), TaskAllocCall);
  }
}

TEST_F(OpenMPIRBuilderTest, CreateTaskNoArgs) {
  using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
  OpenMPIRBuilder OMPBuilder(*M);
  OMPBuilder.initialize();
  F->setName("func");
  IRBuilder<> Builder(BB);

  auto BodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP) {};
  BasicBlock *AllocaBB = Builder.GetInsertBlock();

Meinersbur (Done):

  auto BodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP) {};
- auto *AllocaBB = Builder.GetInsertBlock();
+ BasicBlock *AllocaBB = Builder.GetInsertBlock();
  auto *BodyBB = splitBB(Builder, /*CreateBranch=*/true, "alloca.split");

[style]

  BasicBlock *BodyBB = splitBB(Builder, /*CreateBranch=*/true, "alloca.split");

Meinersbur (Done):

  auto *AllocaBB = Builder.GetInsertBlock();
- auto *BodyBB = splitBB(Builder, /*CreateBranch=*/true, "alloca.split");
+ BasicBlock *BodyBB = splitBB(Builder, /*CreateBranch=*/true, "alloca.split");
  OpenMPIRBuilder::LocationDescription Loc(

[style]

  OpenMPIRBuilder::LocationDescription Loc(
      InsertPointTy(BodyBB, BodyBB->getFirstInsertionPt()), DL);
  Builder.restoreIP(OMPBuilder.createTask(
      Loc, InsertPointTy(AllocaBB, AllocaBB->getFirstInsertionPt()),
      BodyGenCB));
  OMPBuilder.finalize();
  Builder.CreateRetVoid();

  EXPECT_FALSE(verifyModule(*M, &errs()));
}

TEST_F(OpenMPIRBuilderTest, CreateTaskUntied) {
  using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
  OpenMPIRBuilder OMPBuilder(*M);
  OMPBuilder.initialize();
  F->setName("func");
  IRBuilder<> Builder(BB);

  auto BodyGenCB = [&](InsertPointTy AllocaIP, InsertPointTy CodeGenIP) {};
  BasicBlock *AllocaBB = Builder.GetInsertBlock();
  BasicBlock *BodyBB = splitBB(Builder, /*CreateBranch=*/true, "alloca.split");
  OpenMPIRBuilder::LocationDescription Loc(
      InsertPointTy(BodyBB, BodyBB->getFirstInsertionPt()), DL);
  Builder.restoreIP(OMPBuilder.createTask(
      Loc, InsertPointTy(AllocaBB, AllocaBB->getFirstInsertionPt()), BodyGenCB,
      /*Tied=*/false));
  OMPBuilder.finalize();
  Builder.CreateRetVoid();

  // Check for the `Tied` argument.
  CallInst *TaskAllocCall = dyn_cast<CallInst>(
      OMPBuilder.getOrCreateRuntimeFunctionPtr(OMPRTL___kmpc_omp_task_alloc)
          ->user_back());
  ASSERT_NE(TaskAllocCall, nullptr);
  ConstantInt *Flags = dyn_cast<ConstantInt>(TaskAllocCall->getArgOperand(2));
  ASSERT_NE(Flags, nullptr);
  EXPECT_EQ(Flags->getZExtValue() & 1U, 0U);

  EXPECT_FALSE(verifyModule(*M, &errs()));
}

} // namespace
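Both task tests inspect argument 2 of the `__kmpc_omp_task_alloc` call: the tied task expects the value 1, and the untied test expects bit 0 to be clear. A minimal sketch of that encoding, assuming (as the tests do) that bit 0 carries tiedness; `TiedFlag`, `computeTaskFlags`, and `isTied` are hypothetical names, and, like the TODO in the first test, the sketch ignores every other flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: bit 0 marks a tied task, matching what the tests
// check (tied => flags == 1, untied => bit 0 clear).
constexpr std::int32_t TiedFlag = 0x1;

// Sketch of the flags value for the task-allocation call, covering tiedness
// only; all other flag bits are left zero.
std::int32_t computeTaskFlags(bool Tied) {
  std::int32_t Flags = 0;
  if (Tied)
    Flags |= TiedFlag;
  return Flags;
}

// Same predicate the untied test applies via `getZExtValue() & 1U`.
bool isTied(std::int32_t Flags) { return (Flags & TiedFlag) != 0; }
```

Testing the bit with a mask, as the untied test does, rather than comparing the whole value keeps the check valid once other flag bits are set.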


Revision Contents

Diff 431588

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
