This is an archive of the discontinued LLVM Phabricator instance.

[libc++] fix std::sort(T**, T**)
ClosedPublic

Authored by bfg on Nov 26 2020, 9:35 AM.

Details

Reviewers
arichardson
ldionne
Group Reviewers
Restricted Project
Commits
rG297c839e2d22: [libc++] fix std::sort(T**, T**)
Summary

previously, invocations of std::sort(T, T) casted the arguments to
(size_t *). this breaks sorting on systems for which pointers don't fit
in a size_t. change the cast to (uintptr_t *) and add a test.

Diff Detail

Event Timeline

bfg created this revision.Nov 26 2020, 9:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2020, 9:35 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
bfg requested review of this revision.Nov 26 2020, 9:35 AM
bfg added a comment.Nov 26 2020, 9:36 AM

There were linter-suggested indentation changes but I decided to match the file's existing style.

arichardson added a comment.EditedNov 26 2020, 9:57 AM

This is required for CHERI, where we use 128-bit pointers (well actually 129 bits, since we also have a hidden validity bit).
For us size_t/ptrdiff_t are still 64 bits, but uintptr_t is 128-bits, so we need to do this sort using uintptr_t.

I'm not sure if any other architectures are affected by this (32-bit ABIs on 64-bit CPU such as x32 and MIPS n32 seem to use 32 bits for everything), but it's required for us and getting this change upstreamed would be rather helpful.

libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp
138–144

It would be nice if the test had a reference to the bug report.

ldionne requested changes to this revision.Nov 26 2020, 10:02 AM

LGTM, but please take @arichardson 's suggestions. Requesting changes until those suggestions are applied.

libcxx/include/algorithm
4165

It looks like the correct thing to do regardless of the architecture. Semantically, what we want to do is sort the elements in [first, last) as if they were integers, but size_t is not guaranteed to be able to hold a pointer. uintptr_t (or intptr_t) are the types for that.

So I agree with this change.

This revision now requires changes to proceed.Nov 26 2020, 10:02 AM

Oh, and I agree with Marshall -- if there's a bug report for this, please link it from the test and mention it in the commit message.

Oh, and I agree with Marshall -- if there's a bug report for this, please link it from the test and mention it in the commit message.

I just looked at our bugtrackers and I don't think we ever filed this as an issue. The problem was noticed when running jsc (the command line tool from WebKit to run JavaScript code).
In memory our pointers (CHERI capabilities) looks like pairs of "64-bit address+64 bits of metadata (permissions+bounds)", so after calling std::sort we ended up with an array containing just the addresses of the pointers, followed by the sorted virtual addresses.

bfg updated this revision to Diff 308002.Nov 27 2020, 3:04 AM

incorporate test suggsetions

ldionne requested changes to this revision.Nov 27 2020, 7:09 AM

CI isn't passing in C++03 mode.

libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp
138–143
This revision now requires changes to proceed.Nov 27 2020, 7:09 AM
bfg updated this revision to Diff 308070.Nov 27 2020, 8:46 AM

make changes c++03 friendly

curdeius added inline comments.
libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp
138

It would be standard C++ this way not requiring any extensions (dynamic array), nope?

ldionne requested changes to this revision.Nov 27 2020, 9:59 AM
ldionne added inline comments.
libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp
138

You're right. @bfg please do what Marek suggests or what I suggested. It doesn't matter to me, as long as we're standard c++03.

This revision now requires changes to proceed.Nov 27 2020, 9:59 AM
bfg updated this revision to Diff 308145.Nov 28 2020, 2:41 AM

Make the constant array size static for actual C++03 conformance. Thanks all for feedback.

ldionne accepted this revision.Dec 1 2020, 2:12 PM

LGTM. If you need someone to commit this for you, please provide Author Name <email@domain>. Thanks!

This revision is now accepted and ready to land.Dec 1 2020, 2:12 PM
bfg added a comment.Dec 3 2020, 12:43 PM

Yes please: Brett Gutstein <brett.gutstein@cst.cam.ac.uk>.

This revision was automatically updated to reflect the committed changes.

Aw, I wish I had seen this to bikeshed over it — it's relevant to my interests! (Finishing P0879 constexpr algorithms which includes sort; the nightmare minefield that is comparing pointers https://quuxplusone.github.io/blog/2019/01/20/std-less-nightmare/)

  • Casting T** to any-kind-of-int* isn't something you can do constexprly, is it?
  • Does this sort overload exist purely to avoid instantiating std::less<T*>? why is that so important? (git blame offers no help; this codepath dates back to Howard's initial import.)
  • Could this sort overload simply delegate to _VSTD::sort(first, last, __less<void*>()), to avoid the constexpr-unfriendly part? (That is, don't type-pun the actual elements; force an implicit conversion to void* before each comparison.)
  • If we did that, of course we'd want to add void* to the list of types for which sort's helper functions are explicitly instantiated in "algorithm.cpp". Hmm, I wonder if char8_t, char16_t, etc. should already be there...

Anyway, no action required AFAIconcerned; I'm just thinking out loud and trying not to go too far down the rabbit-hole yet. This will clearly be revisited for real at some point, as part of P0879 constexpr sort.

Aw, I wish I had seen this to bikeshed over it — it's relevant to my interests! (Finishing P0879 constexpr algorithms which includes sort; the nightmare minefield that is comparing pointers https://quuxplusone.github.io/blog/2019/01/20/std-less-nightmare/)

  • Casting T** to any-kind-of-int* isn't something you can do constexprly, is it?

No, since it's a reinterpret_cast in essence.

  • Does this sort overload exist purely to avoid instantiating std::less<T*>? why is that so important? (git blame offers no help; this codepath dates back to Howard's initial import.)

My hunch is that it's a manual attempt to perform outlining, i.e. to avoid instantiating the same std::sort code over and over again for different pointer types when they are performing essentially the same task on objects that have the same representation. I could be wrong, I'm just guessing.

  • Could this sort overload simply delegate to _VSTD::sort(first, last, __less<void*>()), to avoid the constexpr-unfriendly part? (That is, don't type-pun the actual elements; force an implicit conversion to void* before each comparison.)

If that results in worse generated code, then we'll have a difficult tradeoff to make. But if not, then I agree we could do that. Or we could drop the outlining attempt altogether. It all depends on whether writing sort that way actually provides a benefit with modern compilers or not, I think. We'd have to check.

Pinging @howard.hinnant in case he remembers what was the reason for writing sort that way.

I agree that the change from size_t to uintptr_t is the right thing to do here. I don't recall exactly why I didn't use uintptr_t in the first place. It may have simply been an oversight. Or it may have been that uintptr_t wasn't in my toolbox at the time.

ldionne added inline comments.Dec 8 2020, 10:29 AM
libcxx/include/algorithm
4163

@howard.hinnant

Sorry I wasn't clear when I pinged you earlier -- but do you recall why you wrote this specialization of std::sort for ranges containing pointers? Was it to avoid instantiating the actual sort algorithm on various T* types and save on code size?

Oh, yes, I believe that was it. Just a code size optimization.

Oh, yes, I believe that was it. Just a code size optimization.

Thanks a lot, that confirms my hunch.

So @Quuxplusone feel free to investigate ways to remove or modify this, now that you know the original intent. We shouldn't take a code size regression for constexpr, though (I'm sure there's a bunch of ways we can make that work).