Implement the PSTL backend using OpenMP.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
pstl/include/pstl/internal/parallel_backend_omp.h | ||
---|---|---|
343 | ISO C++ forbids variable length array. | |
553 |
| |
733 | I think a container, for example std::vector, is better causing RAII reason. |
This corrects all known problems in the parallel partial sort implementation. The parallel merge algorithm still needs implementing.
This patch brings in fixes made in the oneDPL project. All unit tests in oneDPL now pass.
pstl/include/pstl/internal/omp/parallel_for_each.h | ||
---|---|---|
29 | On this line we call increment for an iterator value (last-first) time in one thread, because it is a forward iterator.. Probably, there is an approach to avoid such overhead.. At least, I would add //TODO: like "to think how to avoid the overhead" | |
34 | Probably, here we have overhead, because an each loop iteration increments __first from "zero". | |
pstl/include/pstl/internal/omp/parallel_invoke.h | ||
40 | Nit: redundant braces. | |
pstl/include/pstl/internal/omp/parallel_merge.h | ||
43 | We have redundant parentheses, according to the C++ operation priority list. | |
48 | We have redundant parentheses, according to the C++ operation priority list. | |
75 | The "parallel merge" pattern accepts just random access iterator types. So, we can write here just difference operator, everywhere in the patch. Probably, it makes sense to add an assert (in the begin of the function) on random "access iterator type". | |
91 | Nit: redundant braces. (here and everywhere in the same places) | |
pstl/include/pstl/internal/omp/parallel_reduce.h | ||
29 | Talking about fallback to serial call (here and everywhere in the patch) - in the TBB backend (and others) we have not such logic (excepting sort and merge algorithms). A point here - a user functor/lambda may be so complicated and, probably, parallel execution is reasonable even for "short" sequences... PSTL relies that is user's responsibility - to run execution in parallel or not. Probably, in the future, if the users will complain, we will change approach, but in any case the behavior should be consistent in the all backends. | |
34 | We have redundant parentheses (here and everywhere in the similar places), according to the C++ operation priority list. | |
54 | According to PSTL design, the trivial pre-checks are processed on the "middle" PSTL layer - in algorithm_iml.h, numeric_impl.h, etc. If in some places on the "middle layer" it is not true - it should be added. And remove such trivial pre-checks on a threading backend level, m.b. just add an assert instead of. | |
pstl/include/pstl/internal/omp/parallel_scan.h | ||
55 | Regarding the following functions: downsweep, upsweep, __split. | |
pstl/include/pstl/internal/omp/parallel_stable_sort.h | ||
122 | Please, see my comment above (about trivial pre-checks). | |
pstl/include/pstl/internal/omp/parallel_transform_reduce.h | ||
51 | Actually, we cannot use OpenMP UDR because it requires an identical value to initialize each accumulator. "__init" parameter is initial value of reduction, not an "identity". | |
92 | Please, see my comment above (about trivial pre-checks). | |
pstl/include/pstl/internal/omp/parallel_transform_scan.h | ||
28 | I believe, "//TODO" should added here- about that parallel implementation will be added later, while we have fallback to a serial call. | |
pstl/include/pstl/internal/omp/util.h | ||
60 ↗ | (On Diff #371932) | I believe the call class __buffer can be shared between the all host backend. |
All of the serious comments were addressed. The nits were style differences I disagree with. :-) If these nits are LLVM required style I will adjust them.
pstl/include/pstl/internal/omp/parallel_invoke.h | ||
---|---|---|
40 | I prefer the braces to make it clear what is in the block and for maintenance reasons. | |
pstl/include/pstl/internal/omp/parallel_merge.h | ||
43 | I prefer them. | |
48 | I prefer them. |
pstl/include/pstl/internal/omp/parallel_for.h | ||
---|---|---|
65 | Please modify to omp single nowait here and in all other places where omp single is used within omp parallel. Currently we have two barriers from parallel and single. The inner barrier may be eliminated. | |
pstl/include/pstl/internal/omp/parallel_merge.h | ||
75 | I would even say we should write __xe - __xs because std::distance works with any iterator type. Since we expect random access iterator only to be there we might hide the error with the code as it is. In case of using operator- explicitly we get the compile-time error immediately if this very operator is not defined. And one more thing. Using additional abstraction layers in places where we don't need them (in this case slightly, but) increases compilation time. | |
pstl/include/pstl/internal/omp/parallel_stable_sort.h | ||
68 | Why do you need such a proxy object? Could you call __parallel_move_range directly? | |
71 | If the Callable is required please mark method as const | |
102 | __move_value is named as the type (not the alias) that decreases readability. Wouldn't it be better to rename the variable (probably __move_value_func)? The same is true for `__move_range` | |
119 | I don't need __exec as a name, do you? I personally like the style with comments (/*__exec */) but it's up to you | |
pstl/include/pstl/internal/omp/parallel_transform_reduce.h | ||
38 | Don't see the reason why we need to add such an alias. In some places you use that in others - you don't |
pstl/include/pstl/internal/omp/parallel_stable_sort.h | ||
---|---|---|
26 | Why did you prefer to make the whole class template rather than making operator() with template parameters? | |
30 | const at the end please | |
38 | __first2 supposed to be OutputIterator semantically. Let's rename the second template parameter to OutputIterator and the last function parameter to __d_first or something. Otherwise, readability suffers. | |
67 | This comment is applicable if the callable is required. Why do you need to be templatized? Could those template parameters be tied to operator() | |
pstl/include/pstl/internal/omp/util.h | ||
60 ↗ | (On Diff #371932) | I think it should not be a big deal to create the header with the __buffer class within and include it to both TBB backend and OpenMP backend. Otherwise we will commit relatively big code duplication |
83 ↗ | (On Diff #371932) | I am not sure if we need this variable basing on Mikhail's comments (probably, it's still useful for some algorithms) but if it remains in the code please mark it inline. |
88 ↗ | (On Diff #371932) | Is there any reason to use trailing return types? As I remember correctly you use those only in this file. |
99 ↗ | (On Diff #371932) | Why do you use std::intptr_t? it's a type to represent the pointer as the signed value. If you already check (with enable_if) that your type is integral and you have two same types (by function signature) you may case directly to Index, may not you? |
104 ↗ | (On Diff #371932) | chuck_policy (a least to me) is a very confusing name in the context of Parallel STL. I had an impression that use have some special execution policy or you wrap ExecutionPolicy object with this function or something like that. Cannot propose better from the top of my head. Need to think a little bit about that. |
106 ↗ | (On Diff #371932) | I would recommend to split those variable to make each in its line for the sake of readability. |
106 ↗ | (On Diff #371932) | If I don't miss something you always construct chunk_policy object with 3 arguments. Why do you need field initialization? I would prefer to leave this struct trivially_default_constructible unless we have strong reasons not to do that. |
pstl/include/pstl/internal/omp/parallel_for.h | ||
---|---|---|
5 | I think the file header (here and in the all headers) should be arranged with "pstl LLVM" file header, like this: (w/o copyright - the contributors listed in https://github.com/llvm/llvm-project/blob/main/pstl/CREDITS.txt) ===----------------------------------------------------------------------=== |
pstl/include/pstl/internal/omp/parallel_merge.h | ||
---|---|---|
75 | Fixed. | |
pstl/include/pstl/internal/omp/parallel_stable_sort.h | ||
26 | No reason. Happy to change it. | |
102 | I'm not sure why you think adding a _func suffix makes this clearer. I'm not opposed to it, but most functors in the STL aren't suffixed with "_func" and that's not confusing. | |
119 | I agree. This was an accident. | |
pstl/include/pstl/internal/omp/parallel_transform_reduce.h | ||
38 | Isn't calling omp_get_num_threads() slower than reading a variable? | |
51 | This is fixed. | |
pstl/include/pstl/internal/omp/util.h | ||
60 ↗ | (On Diff #371932) | I made this change for the oneDPL commit, but it was deferred. I can make this change here if that is the consensus. |
83 ↗ | (On Diff #371932) | Many algorithms chunk up the work. If we don't have a variable we don't know how big to make the chunks. I'm happy to make it inline. Also, my testing shows that it should probably be 2048. |
88 ↗ | (On Diff #371932) | I prefer them. If the preference in PSTL is to not, I can change it. |
99 ↗ | (On Diff #371932) | Good point! I will fix that! |
104 ↗ | (On Diff #371932) | chunk_policy is intended to refer to the information needed to apply the chunking algorithm. We could call it chunk_boundaries or __chunk_partition_details. |
106 ↗ | (On Diff #371932) | I think that the compiler complains if I don't initialize the values. I can remove it and see. It might have been an artifact from other refactoring. |
Commented on, and marked many reviews as fixed.
pstl/include/pstl/internal/omp/parallel_stable_sort.h | ||
---|---|---|
68 | A callable is required. |
Christopher, one more comment - we discussed and agreed that a better name for macro _PSTL_PAR_BACKEND_OMP is
_PSTL_PAR_BACKEND_OPENMP
(I've counted 3 matches in the patch.)
pstl/include/pstl/internal/omp/parallel_merge.h | ||
---|---|---|
54 | Talking about the redundant parentheses on line 45 - yes, perhaps, I can agree that it improves readability... But, here, on the line 53, I guess, it worsens readability, IMHO. BTW, __parallel_merge_body( __mid - __xs, __xe - __mid,... | |
pstl/include/pstl/internal/omp/parallel_stable_sort.h | ||
38 | According to my and Ruslan's comments about usage of "std::distance" in case of random access iterators - yes, as we agreed, you have changed it by difference.... but why here (and on line 55) it was kept? | |
pstl/include/pstl/internal/omp/parallel_transform_reduce.h | ||
84 | The lines (83-87) look a kind of "merge artifacts".... |
Rename the config key to _PSTL_PAR_BACKEND_OPENMP.
Remove redundant parentheses.
Fix merge artifacts.
Addressed all known concerns with the last patch.
pstl/include/pstl/internal/omp/parallel_merge.h | ||
---|---|---|
54 | I did a search and replace. Those parentheses were a side-effect of a mistake I made in the regular expression. Sorry. This is fixed now. | |
pstl/include/pstl/internal/omp/parallel_stable_sort.h | ||
38 | Yup. It wasn't clear to me that this was also required to be a random access iterator. I have made the change here too. | |
pstl/include/pstl/internal/omp/parallel_transform_reduce.h | ||
38 | I misunderstood what you were talking about. I have fixed this. |
I've put some small comments but they are definitely not a showstopper for this patch to be merged. Whoever merges it please just make sure to apply those.
Other than that, looks good to me. Remaining improvements can be done later.
pstl/CREDITS.txt | ||
---|---|---|
22 | Please add new line when merge this review | |
pstl/include/pstl/internal/omp/parallel_for.h | ||
12 | In all (or almost all) files the guard macro is not started from underscore. It's inconsistent with the rest of the code and makes somebody think that this macro is public. It probably doesn't deserve one more patch but I would like to make sure it's properly merged to main branch. |
Thanks for the patch! Please ping me once you've applied those comments and we should be close to ready to ship this.
pstl/include/pstl/internal/omp/parallel_for.h | ||
---|---|---|
12 | Please fix. This should be _PSTL_INTERNAL_OMP_PARALLEL_FOR_H (and similarly throughout). | |
15 | Missing #include <cstddef> | |
pstl/include/pstl/internal/omp/parallel_for_each.h | ||
28 | It should be sufficient to use std::distance, not ::std::distance. Any reason why you're doing the latter? Applies in a bunch of places. | |
pstl/include/pstl/internal/omp/parallel_merge.h | ||
93–95 | Apply throughout. | |
pstl/include/pstl/internal/omp/util.h | ||
28 ↗ | (On Diff #378448) | This error message isn't too helpful. Something like this would probably be better: #if !defined(_OPENMP) # error The OpenMP PSTL backend requires OpenMP, but it looks like your platform doesn't implement it. #endif Is the above message accurate and what you're trying to convey to the user? Also, is there any way that you'd be able to include <omp.h> yet OpenMP wouldn't be available? If not, then the #if !defined(_OPENMP) block is useless altogether. |
Resolved all outstanding comments.
pstl/include/pstl/internal/omp/parallel_for_each.h | ||
---|---|---|
28 | Someone mentioned for ADL it is better to root it in the global space. I am fine with taking it out. | |
pstl/include/pstl/internal/omp/util.h | ||
28 ↗ | (On Diff #378448) | Probably true. This was reduced from a larger block that checked various openmp versions. I will remove it. |
Leaving the correctness of the backend to @MikeDvorskiy (thanks!), but apart from that this looks good to me. Let's work on enabling the tests in the CI?
Together with Mikhail we reviewed that patch and it looks good. The tests were launched on the (almost) latest code base in oneDPL CI (just several configurations, to be honest). Anyway, no issues observed.
Do you want us to commit this patch?
Let's work on enabling the tests in the CI?
I believe it can be done separately if you don't have opposite opinion
Commit 6069a6a5
by Mikhail Dvorskiy, 10/15/2021 03:36 PM
[pstl] Initial implementation of OpenMP backend, on behalf of Christopher Nelson nadiasvertex@gmail.com
A couple of parallel patterns still remains serial - "Parallel partial sort", and "Parallel transform scan" - there are //TODOs in the code.
Please add new line when merge this review