previously, invocations of std::sort(T, T) casted the arguments to
(size_t *). this breaks sorting on systems for which pointers don't fit
in a size_t. change the cast to (uintptr_t *) and add a test.
Details
- Reviewers
arichardson ldionne - Group Reviewers
Restricted Project - Commits
- rG297c839e2d22: [libc++] fix std::sort(T**, T**)
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
There were linter-suggested indentation changes but I decided to match the file's existing style.
This is required for CHERI, where we use 128-bit pointers (well actually 129 bits, since we also have a hidden validity bit).
For us size_t/ptrdiff_t are still 64 bits, but uintptr_t is 128-bits, so we need to do this sort using uintptr_t.
I'm not sure if any other architectures are affected by this (32-bit ABIs on 64-bit CPU such as x32 and MIPS n32 seem to use 32 bits for everything), but it's required for us and getting this change upstreamed would be rather helpful.
libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp | ||
---|---|---|
138–144 |
LGTM, but please take @arichardson 's suggestions. Requesting changes until those suggestions are applied.
libcxx/include/algorithm | ||
---|---|---|
4165 | It looks like the correct thing to do regardless of the architecture. Semantically, what we want to do is sort the elements in [first, last) as if they were integers, but size_t is not guaranteed to be able to hold a pointer. uintptr_t (or intptr_t) are the types for that. So I agree with this change. |
Oh, and I agree with Marshall -- if there's a bug report for this, please link it from the test and mention it in the commit message.
I just looked at our bugtrackers and I don't think we ever filed this as an issue. The problem was noticed when running jsc (the command line tool from WebKit to run JavaScript code).
In memory our pointers (CHERI capabilities) looks like pairs of "64-bit address+64 bits of metadata (permissions+bounds)", so after calling std::sort we ended up with an array containing just the addresses of the pointers, followed by the sorted virtual addresses.
CI isn't passing in C++03 mode.
libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp | ||
---|---|---|
138–143 |
libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp | ||
---|---|---|
138 | It would be standard C++ this way not requiring any extensions (dynamic array), nope? |
Make the constant array size static for actual C++03 conformance. Thanks all for feedback.
LGTM. If you need someone to commit this for you, please provide Author Name <email@domain>. Thanks!
Aw, I wish I had seen this to bikeshed over it — it's relevant to my interests! (Finishing P0879 constexpr algorithms which includes sort; the nightmare minefield that is comparing pointers https://quuxplusone.github.io/blog/2019/01/20/std-less-nightmare/)
- Casting T** to any-kind-of-int* isn't something you can do constexprly, is it?
- Does this sort overload exist purely to avoid instantiating std::less<T*>? why is that so important? (git blame offers no help; this codepath dates back to Howard's initial import.)
- Could this sort overload simply delegate to _VSTD::sort(first, last, __less<void*>()), to avoid the constexpr-unfriendly part? (That is, don't type-pun the actual elements; force an implicit conversion to void* before each comparison.)
- If we did that, of course we'd want to add void* to the list of types for which sort's helper functions are explicitly instantiated in "algorithm.cpp". Hmm, I wonder if char8_t, char16_t, etc. should already be there...
Anyway, no action required AFAIconcerned; I'm just thinking out loud and trying not to go too far down the rabbit-hole yet. This will clearly be revisited for real at some point, as part of P0879 constexpr sort.
No, since it's a reinterpret_cast in essence.
- Does this sort overload exist purely to avoid instantiating std::less<T*>? why is that so important? (git blame offers no help; this codepath dates back to Howard's initial import.)
My hunch is that it's a manual attempt to perform outlining, i.e. to avoid instantiating the same std::sort code over and over again for different pointer types when they are performing essentially the same task on objects that have the same representation. I could be wrong, I'm just guessing.
- Could this sort overload simply delegate to _VSTD::sort(first, last, __less<void*>()), to avoid the constexpr-unfriendly part? (That is, don't type-pun the actual elements; force an implicit conversion to void* before each comparison.)
If that results in worse generated code, then we'll have a difficult tradeoff to make. But if not, then I agree we could do that. Or we could drop the outlining attempt altogether. It all depends on whether writing sort that way actually provides a benefit with modern compilers or not, I think. We'd have to check.
Pinging @howard.hinnant in case he remembers what was the reason for writing sort that way.
I agree that the change from size_t to uintptr_t is the right thing to do here. I don't recall exactly why I didn't use uintptr_t in the first place. It may have simply been an oversight. Or it may have been that uintptr_t wasn't in my toolbox at the time.
libcxx/include/algorithm | ||
---|---|---|
4163 | Sorry I wasn't clear when I pinged you earlier -- but do you recall why you wrote this specialization of std::sort for ranges containing pointers? Was it to avoid instantiating the actual sort algorithm on various T* types and save on code size? |
Thanks a lot, that confirms my hunch.
So @Quuxplusone feel free to investigate ways to remove or modify this, now that you know the original intent. We shouldn't take a code size regression for constexpr, though (I'm sure there's a bunch of ways we can make that work).
@howard.hinnant
Sorry I wasn't clear when I pinged you earlier -- but do you recall why you wrote this specialization of std::sort for ranges containing pointers? Was it to avoid instantiating the actual sort algorithm on various T* types and save on code size?