This is an archive of the discontinued LLVM Phabricator instance.

[libc++, std::vector] call the optimized version of __uninitialized_allocator_copy for trivial types
ClosedPublic

Authored by hiraditya on Apr 6 2023, 1:56 PM.

Details

Summary

See: https://github.com/llvm/llvm-project/issues/61987

Fix suggested by: @philnik and @var-const

Testing:
ninja check-cxx check-clang check-llvm

Benchmark Testcases (BM_CopyConstruct, and BM_Assignment) added.

performance improvement:

Run on (8 X 4800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1280 KiB (x4)
  L3 Unified 12288 KiB (x1)
Load Average: 1.66, 3.02, 2.43

Comparing build-runtimes-base/libcxx/benchmarks/vector_operations.libcxx.out to build-runtimes/libcxx/benchmarks/vector_operations.libcxx.out
Benchmark                                                   Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------
BM_ConstructSize/vector_byte/5140480                     +0.0362         +0.0362        116906        121132        116902        121131
BM_CopyConstruct/vector_int/5140480                      -0.4563         -0.4577       1755224        954241       1755330        951987
BM_Assignment/vector_int/5140480                         -0.0222         -0.0220        990045        968095        989917        968125
BM_ConstructSizeValue/vector_byte/5140480                +0.0308         +0.0307        116970        120567        116977        120573
BM_ConstructIterIter/vector_char/1024                    -0.0831         -0.0831            19            17            19            17
BM_ConstructIterIter/vector_size_t/1024                  +0.0129         +0.0131            88            89            88            89
BM_ConstructIterIter/vector_string/1024                  -0.0064         -0.0018         54455         54109         54208         54112
OVERALL_GEOMEAN                                          -0.0845         -0.0842             0             0             0             0

FYI, the perf improvements for BM_CopyConstruct due to this patch is mostly subsumed by the https://reviews.llvm.org/D149826. However this patch still adds value by converting copy to memmove (the second testcase).

Before the patch:

; Function Attrs: nounwind uwtable
define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 {
  %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1
  %6 = load ptr, ptr %5, align 8, !tbaa !12
  %7 = icmp eq ptr %1, %2 
  br i1 %7, label %16, label %8

8:                                                ; preds = %4, %8 
  %9 = phi ptr [ %13, %8 ], [ %1, %4 ]
  %10 = phi ptr [ %14, %8 ], [ %6, %4 ]
  %11 = icmp ne ptr %10, null
  tail call void @llvm.assume(i1 %11)
  %12 = load i32, ptr %9, align 4, !tbaa !14
  store i32 %12, ptr %10, align 4, !tbaa !14
  %13 = getelementptr inbounds i32, ptr %9, i64 1
  %14 = getelementptr inbounds i32, ptr %10, i64 1
  %15 = icmp eq ptr %13, %2
  br i1 %15, label %16, label %8, !llvm.loop !16

16:                                               ; preds = %8, %4 
  %17 = phi ptr [ %6, %4 ], [ %14, %8 ]
  store ptr %17, ptr %5, align 8, !tbaa !12
  ret void
}

After the patch:

; Function Attrs: nounwind uwtable
define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 {
  %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1
  %6 = load ptr, ptr %5, align 8, !tbaa !12
  %7 = ptrtoint ptr %2 to i64
  %8 = ptrtoint ptr %1 to i64
  %9 = sub i64 %7, %8
  %10 = ashr exact i64 %9, 2
  tail call void @llvm.memmove.p0.p0.i64(ptr align 4 %6, ptr align 4 %1, i64 %9, i1 false)
  %11 = getelementptr inbounds i32, ptr %6, i64 %10
  store ptr %11, ptr %5, align 8, !tbaa !12
  ret void
}

This is due to the optimized version of uninitialized_allocator_copy function.

Diff Detail

Event Timeline

hiraditya created this revision.Apr 6 2023, 1:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2023, 1:56 PM
hiraditya requested review of this revision.Apr 6 2023, 1:56 PM
hiraditya updated this revision to Diff 511559.Apr 6 2023, 4:36 PM
hiraditya edited the summary of this revision. (Show Details)
hiraditya retitled this revision from unwrap iterator parameters to __uninitialized_allocator_copy before calling std::copy to [libc++] unwrap iterator parameters to __uninitialized_allocator_copy before calling std::copy.Apr 6 2023, 5:43 PM
philnik set the repository for this revision to rG LLVM Github Monorepo.Apr 7 2023, 3:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2023, 3:22 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
philnik requested changes to this revision.Apr 10 2023, 12:03 PM

This doesn't fix the problem. __wrap_iters will still not get unwrapped.

This revision now requires changes to proceed.Apr 10 2023, 12:03 PM

This doesn't fix the problem. __wrap_iters will still not get unwrapped.

what else do we need to unwrap?

This doesn't fix the problem. __wrap_iters will still not get unwrapped.

what else do we need to unwrap?

You can look at __algorithm/equal.h to see how we unwrap iterators elsewhere.

hiraditya updated this revision to Diff 512815.Apr 12 2023, 6:50 AM

You can look at __algorithm/equal.h to see how we unwrap iterators elsewhere.

I saw the changes in: https://reviews.llvm.org/D139554 and tried to adapt from there. For some reason the _uninitialized_allocator_copy_impl(_Alloc& __alloc, _Iter1 __first1, _Sent1 __last1, _Iter2 __first2) version still gets called unless I add the enable_if_t (see comments). But that is too restrictive and fails many of the tests.

philnik added inline comments.Apr 12 2023, 8:53 AM
libcxx/include/__memory/uninitialized_algorithms.h
558–563

I think this can be fixed by using different types for the __first1 and __first2 pointers and adding the condition is_same<__remove_cv_t<_Type1>, __remove_cv_t<_Type2>>::value to the enable_if in the raw pointer overload. Then that overload will always have priority over the more generic one. Then this overload (the generic one) doesn't need any enable_ifs.

hiraditya added inline comments.
libcxx/include/__memory/uninitialized_algorithms.h
558–563

The Iter version is still selected with this change.

hiraditya updated this revision to Diff 512965.Apr 12 2023, 1:56 PM
hiraditya updated this revision to Diff 515506.Apr 20 2023, 3:27 PM
hiraditya retitled this revision from [libc++] unwrap iterator parameters to __uninitialized_allocator_copy before calling std::copy to [libc++, std::vector] call the optimized version of __uninitialized_allocator_copy for trivial types.
hiraditya edited the summary of this revision. (Show Details)
hiraditya edited the summary of this revision. (Show Details)Apr 20 2023, 3:30 PM
hiraditya edited the summary of this revision. (Show Details)Apr 20 2023, 3:33 PM
hiraditya edited the summary of this revision. (Show Details)Apr 21 2023, 2:05 PM
hiraditya edited the summary of this revision. (Show Details)Apr 21 2023, 2:08 PM

Looking at the benchmark this patch has a negative effect on the average times. Can you explain why that happens? Did you look at the generated assembly after your changes?

libcxx/include/__memory/uninitialized_algorithms.h
611

LLVM style is not else after a return, here and other places in this patch. Note the if constexpr does get the else to avoid generating unneeded code by the compiler.

libcxx/include/vector
1036 ↗(On Diff #515506)

Please look for other ::values to. Since C++17 they all support the _v suffix.

hiraditya added a comment.EditedApr 27 2023, 11:54 AM

Looking at the benchmark this patch has a negative effect on the average times. Can you explain why that happens?

Oh it seems i reversed the base vs. diff. I've fix the numbers after running again.

Did you look at the generated assembly after your changes?

Yes, the vector::copy operation (BM_CopyConstruct/vector_int) is just gone for after this patch.

hiraditya edited the summary of this revision. (Show Details)Apr 27 2023, 12:07 PM
hiraditya added inline comments.Apr 27 2023, 12:10 PM
libcxx/include/__memory/uninitialized_algorithms.h
611

This code is directly copied from the function above. Not sure what is the guidance for libcxx is, cc: @ldionne

philnik requested changes to this revision.Apr 27 2023, 1:55 PM

I would have expected a lot higher improvements from this. Does clang generate better vectorized code for copies now?

libcxx/include/__memory/uninitialized_algorithms.h
558–563
libcxx/include/vector
1034–1044 ↗(On Diff #515506)

This should still not be in vector. Otherwise we have to duplicate this.

This revision now requires changes to proceed.Apr 27 2023, 1:55 PM
hiraditya updated this revision to Diff 517744.Apr 27 2023, 4:11 PM
hiraditya edited the summary of this revision. (Show Details)

Moved the check to uninitialized_algorithms.h

hiraditya added a comment.EditedApr 27 2023, 4:22 PM

I would have expected a lot higher improvements from this. Does clang generate better vectorized code for copies now?

Updated the results after rerunning. With this change clang would generate memcpy for copying vector<int>.

For the following testcase, the copy is deleted because the compiler would figure out that it is a dead code.

#include<vector>
using namespace std;
using T = int;
T vev_copy(std::vector<T> v1, std::vector<T> &v2) {
    v2 = v1;
    return 10;
}

With the above patch and the following command:

$ /usr/bin/clang++ -I/usr/local/home/llvm-project/build/include/c++/v1 -I/usr/local/home/llvm-project/build/libcxx/benchmarks/benchmark-libcxx/include -I/usr/local/home/llvm-project/libcxx/test/support -I/usr/local/home/llvm-project/libcxxabi/include -O3 -nostdinc++ -std=c++20 -S ../build/test.cpp -w -fno-exceptions -o -

_Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE: # @_Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE
        .cfi_startproc
# %bb.0:
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        xorps   %xmm0, %xmm0
        movaps  %xmm0, (%rsp)
        movq    $0, 16(%rsp)
        movq    8(%rdi), %rax
        cmpq    (%rdi), %rax
        js      .LBB0_2
# %bb.1:
        movl    $10, %eax
        addq    $24, %rsp
        .cfi_def_cfa_offset 8
        retq
.LBB0_2:
        .cfi_def_cfa_offset 32
        movq    %rsp, %rdi
        callq   _ZNKSt3__16vectorIiNS_9allocatorIiEEE20__throw_length_errorB7v170000Ev
.Lfunc_end0:
        .size   _Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE, .Lfunc_end0-_Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE
        .cfi_endproc
hiraditya added inline comments.Apr 27 2023, 5:06 PM
libcxx/include/__memory/uninitialized_algorithms.h
558–563

Ah i see, removing the const from args made it work i think. I'll re-use your version then. Thanks so much!

hiraditya updated this revision to Diff 517757.Apr 27 2023, 5:12 PM

Using @philnik's suggestion again as it works once a non-const version of the __uninitialized_allocator_copy is added.

hiraditya added inline comments.Apr 27 2023, 5:21 PM
libcxx/include/__memory/uninitialized_algorithms.h
603

this version gets called for test cases like vector<const int>

hiraditya edited the summary of this revision. (Show Details)Apr 27 2023, 5:30 PM
Mordante added inline comments.Apr 27 2023, 11:04 PM
libcxx/include/__memory/uninitialized_algorithms.h
611

We have more code that does not conform to our policy, but for new code we use our policy.

philnik added inline comments.Apr 28 2023, 7:39 AM
libcxx/include/__memory/uninitialized_algorithms.h
603

But the other overload works for that just as well, no?

611

I'm not aware that we use this particular policy in our new code. Anyways, this code is redundant anyways, and shouldn't have to be added (unless I'm missing something).

hiraditya marked an inline comment as done.Apr 28 2023, 11:49 AM
hiraditya added inline comments.
libcxx/include/__memory/uninitialized_algorithms.h
611

it was indeed redundant. removed it.

hiraditya updated this revision to Diff 518486.May 1 2023, 10:23 AM
hiraditya marked an inline comment as done.
hiraditya marked an inline comment as done.May 1 2023, 10:23 AM
hiraditya updated this revision to Diff 518584.May 1 2023, 3:38 PM

Added benchmark and fix in the same patch.

hiraditya edited the summary of this revision. (Show Details)May 1 2023, 3:38 PM

The implementation itself looks good, but the benchmarks need some work.

libcxx/benchmarks/ContainerBenchmarks.h
30–34

It looks like this doesn't do what it's supposed to: https://godbolt.org/z/vdqGrcM3P. At least with your new version, the container never get copied. You have to add DoNotOptimize(v) or something similar, so the optimizer doesn't just remove the code. Then you can also drop the __attribute__((noinline)). Same applies to CopyContainerInto. I would just remove the functions and make the copy construction and assignment inline.

libcxx/include/__memory/uninitialized_algorithms.h
594

You're missing a _LIBCPP_HIDE_FROM_ABI.

hiraditya updated this revision to Diff 519179.May 3 2023, 11:23 AM
hiraditya edited the summary of this revision. (Show Details)
hiraditya marked an inline comment as done.May 3 2023, 11:27 AM
hiraditya added inline comments.
libcxx/benchmarks/ContainerBenchmarks.h
30–34

the intent is to show that redundant copy of vector is removed because of the new changes. this was the motivation behind creating the bug. inlining this in the caller should have the same effect. if we try to prevent the deletion, that could be a separate test case? let me know your preference, i'm fine either way.

hiraditya updated this revision to Diff 520473.EditedMay 8 2023, 1:32 PM
hiraditya edited the summary of this revision. (Show Details)

Updated the testcase to avoid deletion of vector, also inlined the tests.

hiraditya edited the summary of this revision. (Show Details)May 8 2023, 1:34 PM
hiraditya edited the summary of this revision. (Show Details)
hiraditya marked an inline comment as done.May 10 2023, 9:48 AM
philnik accepted this revision.May 11 2023, 2:34 PM

LGTM % nit.

libcxx/benchmarks/ContainerBenchmarks.h
30–34

I don't think it makes sense to test the removal of redundant code. IMO the interesting part is forwarding real calls to memmove, since that is the way more likely situation in the wild. Removal of redundant code is just a nice side effect from my point of view.

45

For good measure I'd add a DoNotOptimize after the assignment.

This revision is now accepted and ready to land.May 11 2023, 2:34 PM
hiraditya added inline comments.May 12 2023, 11:50 AM
libcxx/benchmarks/ContainerBenchmarks.h
30–34

sgtm. the modified test (with DoNotOptimizeData) keeps the vector alive.

hiraditya marked 2 inline comments as done.
hiraditya updated this revision to Diff 522374.May 15 2023, 3:51 PM
hiraditya added a subscriber: var-const.

Addressed issue with pair of contiguous_iterator<int*>, sentinel_wrapper<contiguous_iterator<int*>> . That caused failures with tests added in https://reviews.llvm.org/D149826

As per @var-const

unwrap_iter can't deal with a situation where the iterator and the sentinel are different types, and unwrap_range was written for that case

philnik added inline comments.May 15 2023, 3:55 PM
libcxx/include/__memory/uninitialized_algorithms.h
599

You should be able to use __unwrap_range regardless of C++ version.

hiraditya updated this revision to Diff 522389.May 15 2023, 5:04 PM

Remove std=c++20 check for using unwrap_range.

hiraditya marked an inline comment as done.May 15 2023, 5:05 PM
hiraditya added inline comments.
libcxx/include/__memory/uninitialized_algorithms.h
599

i see! fixed it.

var-const requested changes to this revision.May 15 2023, 8:33 PM

@hiraditya The benchmark shows a significant speedup for copy construction but almost no speedup for copy assignment. Is the latter expected? If not, it might indicate an issue with the optimization or, more likely, with the benchmark.

libcxx/benchmarks/ContainerBenchmarks.h
42

Is it deliberate that c1 and c2 are the same size (even though c1 gets overwritten)? If yes, please add a comment.

libcxx/include/__memory/uninitialized_algorithms.h
575

Nit: this name looks inconsistent with _Type. One option is to rename _Type to _Tp, but I'd suggest renaming _Up to _Out or similar (and perhaps _Type to _In for symmetry).

596

This line is too long, but more importantly, I think this expression is a little hard to read because of so many nested subexpressions. How about splitting it into something like:

auto __unwrapped_out = std::__unwrap_iter(__first2);
auto __result = std::__uninitialized_allocator_copy_impl(__alloc, __unwrapped_range.first, __unwrapped_range.second, __unwrapped_out);
return std::__rewrap_iter(__first2, __result);

?

This revision now requires changes to proceed.May 15 2023, 8:33 PM
hiraditya marked an inline comment as done.May 16 2023, 7:33 AM
hiraditya updated this revision to Diff 522615.May 16 2023, 7:43 AM

Addressed comments.

hiraditya marked 2 inline comments as done.May 16 2023, 7:44 AM
hiraditya added inline comments.
libcxx/benchmarks/ContainerBenchmarks.h
42

Yeah, there was no reason to have c2 as the same size. Updated the test.

@hiraditya The benchmark shows a significant speedup for copy construction but almost no speedup for copy assignment. Is the latter expected? If not, it might indicate an issue with the optimization or, more likely, with the benchmark.

Initially i thought there should be an impact but there isn't because compiler does the right thing even as is. I can remove the test for copy assignment if that makes more sense?

var-const accepted this revision as: var-const.May 16 2023, 1:15 PM

@hiraditya The benchmark shows a significant speedup for copy construction but almost no speedup for copy assignment. Is the latter expected? If not, it might indicate an issue with the optimization or, more likely, with the benchmark.

Initially i thought there should be an impact but there isn't because compiler does the right thing even as is. I can remove the test for copy assignment if that makes more sense?

I wouldn't remove the test -- just wanted to make sure we understand why there's a difference. I'm a little surprised the compiler could optimize the assignment case but not the construction case, but if that's what the assembly shows, then I don't think we need to dig any further.

libcxx/benchmarks/ContainerBenchmarks.h
42

Hmm, I expected that c1 will be empty, not c2 (since c2 is the one that provides the final value). Or does the optimization only apply if the existing capacity can be reused? If so, then I think the previous state with both c1 and c2 being large was better.

hiraditya added inline comments.
libcxx/benchmarks/ContainerBenchmarks.h
42

oops, good catch!

I wouldn't remove the test -- just wanted to make sure we understand why there's a difference. I'm a little surprised the compiler could optimize the assignment case but not the construction case, but if that's what the assembly shows, then I don't think we need to dig any further.

The assembly does show the difference. only that performance numbers didn't show any measurable difference on my machine. For example the following test case:

#include<vector>
using namespace std;
using T = int;
T vec_copy(std::vector<T> v1, std::vector<T> &v2) {
    v2 = v1; 
    return 10; 
}

And with the patch, clang does generate an extra memmove for a for loop.

$ clang++ -I ~/g/llvm-project/build-runtimes/include/c++/v1 -I ~/g/llvm-project/build-runtimes/libcxx/benchmarks/benchmark-libcxx/include -I ~/g/llvm-project/libcxx/test/support -I ~/g/llvm-project/libcxxabi/include -O3 -nostdinc++ -std=c++20 -S test1.cpp -w -fno-exceptions

$ grep memmove test1.base.s | wc -l

2

$ grep memmove test1.s | wc -l

3
philnik accepted this revision.May 17 2023, 7:04 PM
This revision is now accepted and ready to land.May 17 2023, 7:04 PM
hiraditya updated this revision to Diff 525316.May 24 2023, 1:35 PM
hiraditya edited the summary of this revision. (Show Details)

We've noticed during our integrate that the following code (godbolt) started producing a compiler error after this change:

#include <vector>

struct Value {};
template <class Iter>
struct WrappedIter : Iter {
    using pointer = Value*;
    using reference = Value&;

    pointer operator-> () { return &v; }
    reference operator* () { return v;  }

private:
  Value v;
};


using WI = WrappedIter<std::vector<int>::iterator>;
void foo(WI b, WI e) {
    std::vector<Value> v(b, e);
}

Concretely, the call to vector constructor eventually gets to std::__to_address via __unwrap_iter and results in a compiler error:

/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__algorithm/unwrap_iter.h:46:32: error: no matching function for call to '__to_address'
...
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:172:6: note: candidate template ignored: could not match '_Tp *' against 'WrappedIter<std::__wrap_iter<int *>>'
  172 | _Tp* __to_address(_Tp* __p) _NOEXCEPT {
      |      ^
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:204:1: note: candidate template ignored: requirement 'integral_constant<bool, false>::value' was not satisfied [with _Pointer = WrappedIter<std::__wrap_iter<int *>>]
  204 | __to_address(const _Pointer& __p) _NOEXCEPT {
      | ^

The code itself is wrong as the wrapper (unintentionally) pretends to be a contiguous iterator, but fails to provide a proper std::to_address implementation (as the arrow function is not const).
Nevertheless, I wanted to sanity check that folks feel this new compiler error is standard-compliant. @hiraditya what are your thoughts on this?

We've noticed during our integrate that the following code (godbolt) started producing a compiler error after this change:

#include <vector>

struct Value {};
template <class Iter>
struct WrappedIter : Iter {
    using pointer = Value*;
    using reference = Value&;

    pointer operator-> () { return &v; }
    reference operator* () { return v;  }

private:
  Value v;
};


using WI = WrappedIter<std::vector<int>::iterator>;
void foo(WI b, WI e) {
    std::vector<Value> v(b, e);
}

Concretely, the call to vector constructor eventually gets to std::__to_address via __unwrap_iter and results in a compiler error:

/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__algorithm/unwrap_iter.h:46:32: error: no matching function for call to '__to_address'
...
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:172:6: note: candidate template ignored: could not match '_Tp *' against 'WrappedIter<std::__wrap_iter<int *>>'
  172 | _Tp* __to_address(_Tp* __p) _NOEXCEPT {
      |      ^
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:204:1: note: candidate template ignored: requirement 'integral_constant<bool, false>::value' was not satisfied [with _Pointer = WrappedIter<std::__wrap_iter<int *>>]
  204 | __to_address(const _Pointer& __p) _NOEXCEPT {
      | ^

The code itself is wrong as the wrapper (unintentionally) pretends to be a contiguous iterator, but fails to provide a proper std::to_address implementation (as the arrow function is not const).
Nevertheless, I wanted to sanity check that folks feel this new compiler error is standard-compliant. @hiraditya what are your thoughts on this?

Yes, this is compliant, since you claim to be a contiguous_iterator, but don't provide the full interface. Your code would fail in the exact same way if you called std::sort, std::copy, or other algorithms that also unwrap the iterator.

@philnik makes sense, thanks for clarifying.
I wasn't sure if the standard requires to fall back to implementation for a weaker iterator since the type does not implement the corresponding concept because std::to_address can't be called.
But I didn't want to spend too much time digging through the standard.

hans added a subscriber: hans.Jun 9 2023, 1:51 AM

We're hitting build failures after this change. Here's a reduced version:

$ cat /tmp/a.cc
#include <vector>

void f(volatile int *p, int n) {
  std::vector<int> v(p, p + n);
}

$ build2/bin/clang -c -std=c++20 -stdlib=libc++ /tmp/a.cc
In file included from /tmp/a.cc:1:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/vector:317:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__format/formatter_bool.h:17:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__format/concepts.h:17:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__format/format_parse_context.h:16:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/string_view:1048:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/algorithm:1946:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/memory:896:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__memory/ranges_uninitialized_algorithms.h:22:
/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:589:12: error: cannot initialize return object of type 'int *' with an rvalue of type 'volatile int *'
    return std::copy(__first1, __last1, const_cast<_RawTypeIn*>(__first2));
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:596:26: note: in instantiation of function template specialization 'std::__uninitialized_allocator_copy_impl<std::allocator<int>, volatile int, volatile int, int, nullptr>' requested here
    auto __result = std::__uninitialized_allocator_copy_impl(__alloc, __unwrapped_range.first, __unwrapped_range.second, std::__unwrap_iter(__first2));
                         ^
/work/llvm-project/build2/bin/../include/c++/v1/vector:1161:22: note: in instantiation of function template specialization 'std::__uninitialized_allocator_copy<std::allocator<int>, volatile int *, volatile int *, int *>' requested here
  __tx.__pos_ = std::__uninitialized_allocator_copy(__alloc(), __first, __last, __tx.__pos_);
                     ^
/work/llvm-project/build2/bin/../include/c++/v1/vector:790:9: note: in instantiation of function template specialization 'std::vector<int>::__construct_at_end<volatile int *, volatile int *>' requested here
        __construct_at_end(__first, __last, __n);
        ^
/work/llvm-project/build2/bin/../include/c++/v1/vector:1278:3: note: in instantiation of function template specialization 'std::vector<int>::__init_with_size<volatile int *, volatile int *>' requested here
  __init_with_size(__first, __last, __n);
  ^
/tmp/a.cc:4:20: note: in instantiation of function template specialization 'std::vector<int>::vector<volatile int *, 0>' requested here
  std::vector<int> v(p, p + n);
                   ^
1 error generated.

We're hitting build failures after this change. Here's a reduced version:

$ cat /tmp/a.cc
#include <vector>

void f(volatile int *p, int n) {
  std::vector<int> v(p, p + n);
}

/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:589:12: error: cannot initialize return object of type 'int *' with an rvalue of type 'volatile int *'
    return std::copy(__first1, __last1, const_cast<_RawTypeIn*>(__first2));

This looks like a legit bug, calling std::copy (which gets optimized to memmove) on volatile source type seems incorrect. cc: @philnik for clarification.

We're hitting build failures after this change. Here's a reduced version:

$ cat /tmp/a.cc
#include <vector>

void f(volatile int *p, int n) {
  std::vector<int> v(p, p + n);
}

/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:589:12: error: cannot initialize return object of type 'int *' with an rvalue of type 'volatile int *'
    return std::copy(__first1, __last1, const_cast<_RawTypeIn*>(__first2));

This looks like a legit bug, calling std::copy (which gets optimized to memmove) on volatile source type seems incorrect. cc: @philnik for clarification.

I don't think std::copy gets optimized to a memmove when it gets a volatile pointer. The problem is that our return type is OutT*, but std::copy gets remove_const_t<InT>*. I think the simplest solution would be to check for is_same<__remove_const_t<_In>, __remove_const_t<_Out> instead of is_same<__remove_cv_t<_In>, __remove_cv_t<_Out>. @hiraditya would you fix this in a quick follow-up patch?

We're hitting build failures after this change. Here's a reduced version:

$ cat /tmp/a.cc
#include <vector>

void f(volatile int *p, int n) {
  std::vector<int> v(p, p + n);
}

/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:589:12: error: cannot initialize return object of type 'int *' with an rvalue of type 'volatile int *'
    return std::copy(__first1, __last1, const_cast<_RawTypeIn*>(__first2));

This looks like a legit bug, calling std::copy (which gets optimized to memmove) on volatile source type seems incorrect. cc: @philnik for clarification.

I don't think std::copy gets optimized to a memmove when it gets a volatile pointer. The problem is that our return type is OutT*, but std::copy gets remove_const_t<InT>*. I think the simplest solution would be to check for is_same<__remove_const_t<_In>, __remove_const_t<_Out> instead of is_same<__remove_cv_t<_In>, __remove_cv_t<_Out>. @hiraditya would you fix this in a quick follow-up patch?

Yeah, just did that in https://reviews.llvm.org/D152571 i'll also add a test case soon.