Page MenuHomePhabricator

[pstl] Fix a number of bugs with parallel partial sort

Authored by nadiasvertex on Apr 3 2021, 2:37 PM.


Group Reviewers
Restricted Project

Diff Detail

Event Timeline

nadiasvertex created this revision.Apr 3 2021, 2:37 PM
nadiasvertex requested review of this revision.Apr 3 2021, 2:37 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone added inline comments.

As long as you're changing these anyway, might as well simplify.


New lines 577, 570, 573: seems like you should undo these whitespace diffs (manually if necessary).

I don't suppose it would make sense to use a heap instead of linearly scanning the vector in __update_largest_item every time one is inserted? I mean, that's the whole point of non-parallel partial_sort... but I guess the idea here is that rearranging the elements once would be less cache-friendly than linearly scanning them multiple times?


if you wanted to keep this simple


That this ever worked indicates that we could use some more "evil" test cases with overloaded/deleted operator, and so on. The iterators in libcxx/test/support/test_iterators.h are already "evil" in this way; we'd just need some new tests that use those iterators with parallel algorithms.

My ADL senses are tingling with those ADL calls to begin and end.

(Are there tests for parallel partial_sort? If it's this buggy, why aren't those tests failing now?)


Gratuitous whitespace diff here lost the indentation of the trailing return type.


I think this was meant to call std::iter_swap(__current_item, __swap_item), actually. Calling std::swap qualified seems like a bug, but calling std::iter_swap qualified would be normal practice.


Technically, all of these ADL calls are unsafe and should be qualified, like __omp_backend::__parallel_stable_sort_body(...). You're changing so much in the PR currently that I feel like you should go ahead and ADL-proof it too; but I wouldn't make ADL the main point of the PR. ;)

nadiasvertex added inline comments.Apr 3 2021, 4:49 PM

Yes, exactly. I didn't do timed tests, but when I look at managing a heap, operationally it's hard to see how it could be faster than the linear scan, when K is small. If K starts to get large, one wonders why a partial sort is even being done. If I maintain a heap there's a lot of copying and moving going on. This approach lets me overwrite existing items in place and enjoy the benefits of the cache for the search.

Also, during the merge operation used to reduce the chunks I have to update the largest item, so I would be doing a lot of heap maintenance. But if you think it would be better, I can look into doing that instead.


In general I thought it better to use std:distance, but if the preference is to use explicit + and - operators, I can do that.


Actually, this is all "new" code. I have another parent revision where most of this code is introduced. However, the parallel partial sort is buggy, so I am working on fixing it here. I will make the ADL fixes you suggest.

The partial sort now succeeds for most of the collections I try, but it fails to produce correct results in some cases, I think due to some asymmetry in the reduction. I thought I had fixed it (which is why I posted this) until I did more testing.

nadiasvertex abandoned this revision.Aug 31 2021, 7:35 AM

This work will continue on