This is an archive of the discontinued LLVM Phabricator instance.

[pstl] Implement OpenMP backend
ClosedPublic

Authored by nadiasvertex on Apr 3 2021, 5:07 AM.

Details

Reviewers
ldionne
jdoerfert
MikeDvorskiy
Group Reviewers
Restricted Project
Summary

Implement the PSTL backend using OpenMP.

Diff Detail

Event Timeline

nadiasvertex created this revision.Apr 3 2021, 5:07 AM
nadiasvertex requested review of this revision.Apr 3 2021, 5:07 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
MikeDvorskiy added inline comments.
pstl/include/pstl/internal/parallel_backend_omp.h
343

ISO C++ forbids variable length array.

553
  1. What's a goal of the structure?
  2. I think better to don't introduce new stuff (struct _MinKItems), we can use std::vector instead (if we really need).
733

I think a container, for example std::vector, is better causing RAII reason.

This corrects all known problems in the parallel partial sort implementation. The parallel merge algorithm still needs implementing.

nadiasvertex marked 3 inline comments as done.Jun 14 2021, 2:45 PM

Resolved all review comments.

nadiasvertex edited the summary of this revision. (Show Details)Jun 15 2021, 5:21 AM
nadiasvertex edited the summary of this revision. (Show Details)

This patch brings in fixes made in the oneDPL project. All unit tests in oneDPL now pass.

Adds the "omp" entry into the PSTL CMakeLists.txt

Update the linker script for parallel STL.

The previous patch was somehow broken. I have regenerated it.

Another attempt at fixing the broken patch.

Perform clang-format

nadiasvertex edited the summary of this revision. (Show Details)

Disable lint checks

[pstl][omp] Rework patch to pacify clang-tidy

[pstl][omp] Add 'omp' option to list

[pstl][omp] Fix clang-tidy header guard warnings

[pstl][omp] Fix header file movement

MikeDvorskiy requested changes to this revision.Sep 17 2021, 9:14 AM
MikeDvorskiy added inline comments.
pstl/include/pstl/internal/omp/parallel_for_each.h
29

On this line we call increment for an iterator value (last-first) time in one thread, because it is a forward iterator..

Probably, there is an approach to avoid such overhead.. At least, I would add //TODO: like "to think how to avoid the overhead"

34

Probably, here we have overhead, because an each loop iteration increments __first from "zero".
Yes, some it is done in parallel but... In any case make sense to think whether we avoid it, if it possible. At least, I would add //TODO: like "to think how to avoid the overhead"

pstl/include/pstl/internal/omp/parallel_invoke.h
40

Nit: redundant braces.

pstl/include/pstl/internal/omp/parallel_merge.h
43

We have redundant parentheses, according to the C++ operation priority list.

48

We have redundant parentheses, according to the C++ operation priority list.

75

The "parallel merge" pattern accepts just random access iterator types. So, we can write here just difference operator, everywhere in the patch.

Probably, it makes sense to add an assert (in the begin of the function) on random "access iterator type".

91

Nit: redundant braces. (here and everywhere in the same places)

pstl/include/pstl/internal/omp/parallel_reduce.h
29

Talking about fallback to serial call (here and everywhere in the patch) - in the TBB backend (and others) we have not such logic (excepting sort and merge algorithms). A point here - a user functor/lambda may be so complicated and, probably, parallel execution is reasonable even for "short" sequences... PSTL relies that is user's responsibility - to run execution in parallel or not. Probably, in the future, if the users will complain, we will change approach, but in any case the behavior should be consistent in the all backends.

34

We have redundant parentheses (here and everywhere in the similar places), according to the C++ operation priority list.

54

According to PSTL design, the trivial pre-checks are processed on the "middle" PSTL layer - in algorithm_iml.h, numeric_impl.h, etc.

If in some places on the "middle layer" it is not true - it should be added. And remove such trivial pre-checks on a threading backend level, m.b. just add an assert instead of.

pstl/include/pstl/internal/omp/parallel_scan.h
55

Regarding the following functions: downsweep, upsweep, __split.
The implementation if ones can be shared between TBB and OpenMP backends...
For now I think is OK. It may be done with the next patch.

pstl/include/pstl/internal/omp/parallel_stable_sort.h
122

Please, see my comment above (about trivial pre-checks).

pstl/include/pstl/internal/omp/parallel_transform_reduce.h
51

Actually, we cannot use OpenMP UDR because it requires an identical value to initialize each accumulator. "__init" parameter is initial value of reduction, not an "identity".
So, the comment should be corrected.

92

Please, see my comment above (about trivial pre-checks).

pstl/include/pstl/internal/omp/parallel_transform_scan.h
28

I believe, "//TODO" should added here- about that parallel implementation will be added later, while we have fallback to a serial call.

pstl/include/pstl/internal/omp/util.h
61

I believe the call class __buffer can be shared between the all host backend.
But, I think, for now is OK. It may be done later, in the one of next patch.

This revision now requires changes to proceed.Sep 17 2021, 9:14 AM
nadiasvertex marked 14 inline comments as done.

This update applies changes suggested by MikeDvorskiy's review.

nadiasvertex marked 2 inline comments as done.Sep 26 2021, 10:10 AM

All of the serious comments were addressed. The nits were style differences I disagree with. :-) If these nits are LLVM required style I will adjust them.

pstl/include/pstl/internal/omp/parallel_invoke.h
40

I prefer the braces to make it clear what is in the block and for maintenance reasons.

pstl/include/pstl/internal/omp/parallel_merge.h
43

I prefer them.

48

I prefer them.

rarutyun added inline comments.
pstl/include/pstl/internal/omp/parallel_for.h
65

Please modify to omp single nowait here and in all other places where omp single is used within omp parallel. Currently we have two barriers from parallel and single. The inner barrier may be eliminated.

pstl/include/pstl/internal/omp/parallel_merge.h
75

I would even say we should write __xe - __xs because std::distance works with any iterator type. Since we expect random access iterator only to be there we might hide the error with the code as it is. In case of using operator- explicitly we get the compile-time error immediately if this very operator is not defined. And one more thing. Using additional abstraction layers in places where we don't need them (in this case slightly, but) increases compilation time.

pstl/include/pstl/internal/omp/parallel_stable_sort.h
68

Why do you need such a proxy object? Could you call __parallel_move_range directly?

71

If the Callable is required please mark method as const

102

__move_value is named as the type (not the alias) that decreases readability. Wouldn't it be better to rename the variable (probably __move_value_func)? The same is true for `__move_range`

119

I don't need __exec as a name, do you? I personally like the style with comments (/*__exec */) but it's up to you

pstl/include/pstl/internal/omp/parallel_transform_reduce.h
38

Don't see the reason why we need to add such an alias. In some places you use that in others - you don't

rarutyun added inline comments.Sep 26 2021, 5:05 PM
pstl/include/pstl/internal/omp/parallel_stable_sort.h
26

Why did you prefer to make the whole class template rather than making operator() with template parameters?

30

const at the end please

38

__first2 supposed to be OutputIterator semantically. Let's rename the second template parameter to OutputIterator and the last function parameter to __d_first or something. Otherwise, readability suffers.

67

This comment is applicable if the callable is required. Why do you need to be templatized? Could those template parameters be tied to operator()

pstl/include/pstl/internal/omp/util.h
61

I think it should not be a big deal to create the header with the __buffer class within and include it to both TBB backend and OpenMP backend. Otherwise we will commit relatively big code duplication

84

I am not sure if we need this variable basing on Mikhail's comments (probably, it's still useful for some algorithms) but if it remains in the code please mark it inline.

89

Is there any reason to use trailing return types? As I remember correctly you use those only in this file.

100

Why do you use std::intptr_t? it's a type to represent the pointer as the signed value. If you already check (with enable_if) that your type is integral and you have two same types (by function signature) you may case directly to Index, may not you?

105

chuck_policy (a least to me) is a very confusing name in the context of Parallel STL. I had an impression that use have some special execution policy or you wrap ExecutionPolicy object with this function or something like that. Cannot propose better from the top of my head. Need to think a little bit about that.

107

I would recommend to split those variable to make each in its line for the sake of readability.

107

If I don't miss something you always construct chunk_policy object with 3 arguments. Why do you need field initialization? I would prefer to leave this struct trivially_default_constructible unless we have strong reasons not to do that.

MikeDvorskiy added inline comments.Sep 30 2021, 7:17 AM
pstl/include/pstl/internal/omp/parallel_for.h
5

I think the file header (here and in the all headers) should be arranged with "pstl LLVM" file header, like this: (w/o copyright - the contributors listed in https://github.com/llvm/llvm-project/blob/main/pstl/CREDITS.txt)

===----------------------------------------------------------------------===

Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
See https://llvm.org/LICENSE.txt for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

===----------------------------------------------------------------------===//

nadiasvertex marked an inline comment as done.Oct 1 2021, 8:27 AM
nadiasvertex added inline comments.
pstl/include/pstl/internal/omp/parallel_merge.h
75

Fixed.

pstl/include/pstl/internal/omp/parallel_stable_sort.h
26

No reason. Happy to change it.

102

I'm not sure why you think adding a _func suffix makes this clearer. I'm not opposed to it, but most functors in the STL aren't suffixed with "_func" and that's not confusing.

119

I agree. This was an accident.

pstl/include/pstl/internal/omp/parallel_transform_reduce.h
38

Isn't calling omp_get_num_threads() slower than reading a variable?

51

This is fixed.

pstl/include/pstl/internal/omp/util.h
61

I made this change for the oneDPL commit, but it was deferred. I can make this change here if that is the consensus.

84

Many algorithms chunk up the work. If we don't have a variable we don't know how big to make the chunks. I'm happy to make it inline. Also, my testing shows that it should probably be 2048.

89

I prefer them. If the preference in PSTL is to not, I can change it.

100

Good point! I will fix that!

105

chunk_policy is intended to refer to the information needed to apply the chunking algorithm. We could call it chunk_boundaries or __chunk_partition_details.

107

I think that the compiler complains if I don't initialize the values. I can remove it and see. It might have been an artifact from other refactoring.

nadiasvertex marked 8 inline comments as done.

Add changes from review by rarutyan.

Commented on, and marked many reviews as fixed.

pstl/include/pstl/internal/omp/parallel_stable_sort.h
68

A callable is required.

Fix pstl for removal of std::result_of (use std::invoke_result)

Minor compilation fixes from typos.

Remove copyright.

nadiasvertex marked an inline comment as done.Oct 4 2021, 12:13 PM

Fix additional uses of std::distance() and remove serial fallbacks.

Apply the clang-format changes from CI.

Add come comments, minor formatting changes.

MikeDvorskiy requested changes to this revision.Oct 5 2021, 11:38 AM

Christopher, one more comment - we discussed and agreed that a better name for macro _PSTL_PAR_BACKEND_OMP is

_PSTL_PAR_BACKEND_OPENMP

(I've counted 3 matches in the patch.)

pstl/include/pstl/internal/omp/parallel_merge.h
54

Talking about the redundant parentheses on line 45 - yes, perhaps, I can agree that it improves readability... But, here, on the line 53, I guess, it worsens readability, IMHO.

BTW,
On the line 103 the similar code written without redundant parentheses -

__parallel_merge_body(
            __mid - __xs, __xe - __mid,...
pstl/include/pstl/internal/omp/parallel_stable_sort.h
38

According to my and Ruslan's comments about usage of "std::distance" in case of random access iterators - yes, as we agreed, you have changed it by difference.... but why here (and on line 55) it was kept?

pstl/include/pstl/internal/omp/parallel_transform_reduce.h
84

The lines (83-87) look a kind of "merge artifacts"....

This revision now requires changes to proceed.Oct 5 2021, 11:38 AM

Rename the config key to _PSTL_PAR_BACKEND_OPENMP.
Remove redundant parentheses.
Fix merge artifacts.

nadiasvertex marked 3 inline comments as done.

Remove this _Size alias in parallel_transform_reduce

nadiasvertex marked 10 inline comments as done.Oct 6 2021, 5:44 AM

Addressed all known concerns with the last patch.

pstl/include/pstl/internal/omp/parallel_merge.h
54

I did a search and replace. Those parentheses were a side-effect of a mistake I made in the regular expression. Sorry. This is fixed now.

pstl/include/pstl/internal/omp/parallel_stable_sort.h
38

Yup. It wasn't clear to me that this was also required to be a random access iterator. I have made the change here too.

pstl/include/pstl/internal/omp/parallel_transform_reduce.h
38

I misunderstood what you were talking about. I have fixed this.

Fix libc++ compile issues.

Undo clang-format of pstl_config.h.

MikeDvorskiy accepted this revision.Oct 12 2021, 6:43 AM

I've put some small comments but they are definitely not a showstopper for this patch to be merged. Whoever merges it please just make sure to apply those.

Other than that, looks good to me. Remaining improvements can be done later.

pstl/CREDITS.txt
22

Please add new line when merge this review

pstl/include/pstl/internal/omp/parallel_for.h
12

In all (or almost all) files the guard macro is not started from underscore. It's inconsistent with the rest of the code and makes somebody think that this macro is public.

It probably doesn't deserve one more patch but I would like to make sure it's properly merged to main branch.

ldionne requested changes to this revision.Oct 14 2021, 6:13 AM

Thanks for the patch! Please ping me once you've applied those comments and we should be close to ready to ship this.

pstl/include/pstl/internal/omp/parallel_for.h
12

Please fix. This should be _PSTL_INTERNAL_OMP_PARALLEL_FOR_H (and similarly throughout).

15

Missing #include <cstddef>

pstl/include/pstl/internal/omp/parallel_for_each.h
28

It should be sufficient to use std::distance, not ::std::distance. Any reason why you're doing the latter? Applies in a bunch of places.

pstl/include/pstl/internal/omp/parallel_merge.h
93–95

Apply throughout.

pstl/include/pstl/internal/omp/util.h
29

This error message isn't too helpful. Something like this would probably be better:

#if !defined(_OPENMP)
#    error The OpenMP PSTL backend requires OpenMP, but it looks like your platform doesn't implement it.
#endif

Is the above message accurate and what you're trying to convey to the user? Also, is there any way that you'd be able to include <omp.h> yet OpenMP wouldn't be available? If not, then the #if !defined(_OPENMP) block is useless altogether.

This revision now requires changes to proceed.Oct 14 2021, 6:13 AM

Fix the header guards naming standard. Also remove ::std uses.

nadiasvertex marked 6 inline comments as done.Oct 14 2021, 7:58 AM

Resolved all outstanding comments.

pstl/include/pstl/internal/omp/parallel_for_each.h
28

Someone mentioned for ADL it is better to root it in the global space. I am fine with taking it out.

pstl/include/pstl/internal/omp/util.h
29

Probably true. This was reduced from a larger block that checked various openmp versions. I will remove it.

nadiasvertex marked 2 inline comments as done.Oct 14 2021, 8:01 AM
ldionne accepted this revision.Oct 14 2021, 8:27 AM

Leaving the correctness of the backend to @MikeDvorskiy (thanks!), but apart from that this looks good to me. Let's work on enabling the tests in the CI?

This revision is now accepted and ready to land.Oct 14 2021, 8:27 AM

Leaving the correctness of the backend to @MikeDvorskiy (thanks!), but apart from that this looks good to me.

Together with Mikhail we reviewed that patch and it looks good. The tests were launched on the (almost) latest code base in oneDPL CI (just several configurations, to be honest). Anyway, no issues observed.

Do you want us to commit this patch?

Let's work on enabling the tests in the CI?

I believe it can be done separately if you don't have opposite opinion

MikeDvorskiy closed this revision.Oct 21 2021, 7:04 AM

Commit 6069a6a5
by Mikhail Dvorskiy, 10/15/2021 03:36 PM

[pstl] Initial implementation of OpenMP backend, on behalf of Christopher Nelson nadiasvertex@gmail.com

A couple of parallel patterns still remains serial - "Parallel partial sort", and "Parallel transform scan" - there are //TODOs in the code.

Via Conduit