This is an archive of the discontinued LLVM Phabricator instance.


Differential D147741

[libc++, std::vector] call the optimized version of __uninitialized_allocator_copy for trivial types
Closed · Public

Authored by hiraditya on Apr 6 2023, 1:56 PM.


Details

Reviewers

philnik
ldionne
EricWF
var-const

Group Reviewers

Restricted Project

Commits

rG63a2b206fa4e: [libc++, std::vector] call the optimized version of…

Summary

See: https://github.com/llvm/llvm-project/issues/61987

Fix suggested by: @philnik and @var-const

Testing:
ninja check-cxx check-clang check-llvm

Benchmark test cases (BM_CopyConstruct and BM_Assignment) added.

Performance improvement:

Run on (8 X 4800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1280 KiB (x4)
  L3 Unified 12288 KiB (x1)
Load Average: 1.66, 3.02, 2.43

Comparing build-runtimes-base/libcxx/benchmarks/vector_operations.libcxx.out to build-runtimes/libcxx/benchmarks/vector_operations.libcxx.out
Benchmark                                                   Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------
BM_ConstructSize/vector_byte/5140480                     +0.0362         +0.0362        116906        121132        116902        121131
BM_CopyConstruct/vector_int/5140480                      -0.4563         -0.4577       1755224        954241       1755330        951987
BM_Assignment/vector_int/5140480                         -0.0222         -0.0220        990045        968095        989917        968125
BM_ConstructSizeValue/vector_byte/5140480                +0.0308         +0.0307        116970        120567        116977        120573
BM_ConstructIterIter/vector_char/1024                    -0.0831         -0.0831            19            17            19            17
BM_ConstructIterIter/vector_size_t/1024                  +0.0129         +0.0131            88            89            88            89
BM_ConstructIterIter/vector_string/1024                  -0.0064         -0.0018         54455         54109         54208         54112
OVERALL_GEOMEAN                                          -0.0845         -0.0842             0             0             0             0

FYI, the perf improvement for BM_CopyConstruct from this patch is mostly subsumed by https://reviews.llvm.org/D149826. However, this patch still adds value by converting the copy to a memmove (the second testcase).

Before the patch:

; Function Attrs: nounwind uwtable
define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 {
  %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1
  %6 = load ptr, ptr %5, align 8, !tbaa !12
  %7 = icmp eq ptr %1, %2 
  br i1 %7, label %16, label %8

8:                                                ; preds = %4, %8 
  %9 = phi ptr [ %13, %8 ], [ %1, %4 ]
  %10 = phi ptr [ %14, %8 ], [ %6, %4 ]
  %11 = icmp ne ptr %10, null
  tail call void @llvm.assume(i1 %11)
  %12 = load i32, ptr %9, align 4, !tbaa !14
  store i32 %12, ptr %10, align 4, !tbaa !14
  %13 = getelementptr inbounds i32, ptr %9, i64 1
  %14 = getelementptr inbounds i32, ptr %10, i64 1
  %15 = icmp eq ptr %13, %2
  br i1 %15, label %16, label %8, !llvm.loop !16

16:                                               ; preds = %8, %4 
  %17 = phi ptr [ %6, %4 ], [ %14, %8 ]
  store ptr %17, ptr %5, align 8, !tbaa !12
  ret void
}

After the patch:

; Function Attrs: nounwind uwtable
define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 {
  %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1
  %6 = load ptr, ptr %5, align 8, !tbaa !12
  %7 = ptrtoint ptr %2 to i64
  %8 = ptrtoint ptr %1 to i64
  %9 = sub i64 %7, %8
  %10 = ashr exact i64 %9, 2
  tail call void @llvm.memmove.p0.p0.i64(ptr align 4 %6, ptr align 4 %1, i64 %9, i1 false)
  %11 = getelementptr inbounds i32, ptr %6, i64 %10
  store ptr %11, ptr %5, align 8, !tbaa !12
  ret void
}

This is due to the optimized version of the __uninitialized_allocator_copy function.
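To make the before/after IR concrete, here is a minimal sketch of the two copy strategies it corresponds to. The helper names (copy_elementwise, copy_bulk, strategies_agree) are hypothetical and are not the libc++ implementation; the sketch only assumes that the element type is trivially copyable, which is the case the patch targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper, not libc++ code: copies [first, last) into the storage
// starting at out, one element per loop iteration (the "before" shape).
template <class T>
T* copy_elementwise(const T* first, const T* last, T* out) {
  for (; first != last; ++first, ++out)
    *out = *first;
  return out;
}

// Hypothetical helper, not libc++ code: for a trivially copyable T the same
// copy can be done as one memmove over the whole byte range (the "after" shape).
template <class T>
T* copy_bulk(const T* first, const T* last, T* out) {
  static_assert(std::is_trivially_copyable<T>::value,
                "bulk copy needs a trivially copyable type");
  const std::size_t n = static_cast<std::size_t>(last - first);
  if (n != 0)
    std::memmove(out, first, n * sizeof(T));
  return out + n;
}

// Returns true when both strategies leave identical destination contents.
inline bool strategies_agree(const std::vector<int>& src) {
  std::vector<int> a(src.size()), b(src.size());
  copy_elementwise(src.data(), src.data() + src.size(), a.data());
  copy_bulk(src.data(), src.data() + src.size(), b.data());
  return a == b;
}
```

Both helpers return the one-past-the-end output pointer, which mirrors how the IR above stores the updated end pointer back into the vector.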

Diff Detail

Unit Tests: Failed

	Time	Test
	2,330 ms	libcxx CI C++03 > llvm-libc++-shared-cfg-in.libcxx::clang_tidy.sh.cpp
		View Full Test Results (17,925 Failed)

Event Timeline

hiraditya created this revision. · Apr 6 2023, 1:56 PM

Herald added a project: Restricted Project. · Apr 6 2023, 1:56 PM

hiraditya requested review of this revision. · Apr 6 2023, 1:56 PM

Harbormaster completed remote builds in B224099: Diff 511521. · Apr 6 2023, 2:37 PM

hiraditya updated this revision to Diff 511559. · Apr 6 2023, 4:36 PM

hiraditya edited the summary of this revision.

hiraditya added a reviewer: EricWF. · Apr 6 2023, 5:13 PM

hiraditya retitled this revision from "unwrap iterator parameters to __uninitialized_allocator_copy before calling std::copy" to "[libc++] unwrap iterator parameters to __uninitialized_allocator_copy before calling std::copy". · Apr 6 2023, 5:43 PM

Harbormaster completed remote builds in B224132: Diff 511559. · Apr 6 2023, 5:57 PM

philnik set the repository for this revision to rG LLVM Github Monorepo. · Apr 7 2023, 3:22 AM

Herald added a project: Restricted Project. · Apr 7 2023, 3:22 AM

Herald added a reviewer: Restricted Project.

Herald added a subscriber: libcxx-commits.

This doesn't fix the problem. __wrap_iters will still not get unwrapped.

This revision now requires changes to proceed. · Apr 10 2023, 12:03 PM

In D147741#4256058, @philnik wrote:

This doesn't fix the problem. __wrap_iters will still not get unwrapped.

what else do we need to unwrap?

In D147741#4256801, @hiraditya wrote:

In D147741#4256058, @philnik wrote:

This doesn't fix the problem. __wrap_iters will still not get unwrapped.

what else do we need to unwrap?

You can look at __algorithm/equal.h to see how we unwrap iterators elsewhere.

hiraditya updated this revision to Diff 512815. · Apr 12 2023, 6:50 AM

You can look at __algorithm/equal.h to see how we unwrap iterators elsewhere.

I saw the changes in https://reviews.llvm.org/D139554 and tried to adapt from there. For some reason the __uninitialized_allocator_copy_impl(_Alloc& __alloc, _Iter1 __first1, _Sent1 __last1, _Iter2 __first2) version still gets called unless I add the enable_if_t (see comments). But that is too restrictive and fails many of the tests.

philnik added inline comments. · Apr 12 2023, 8:53 AM

libcxx/include/__memory/uninitialized_algorithms.h
557–562	I think this can be fixed by using different types for the `__first1` and `__first2` pointers and adding the condition `is_same<__remove_cv_t<_Type1>, __remove_cv_t<_Type2>>::value` to the `enable_if` in the raw pointer overload. Then that overload will always have priority over the more generic one. Then this overload (the generic one) doesn't need any `enable_if`s.
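The overload arrangement suggested in this comment can be sketched in isolation. The names below (Path, pick_copy_path) are hypothetical stand-ins, not the libc++ functions; the sketch only assumes standard partial ordering of function templates, under which the pointer overload is more specialized than the fully generic one and therefore wins whenever its enable_if condition holds.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag that reports which overload was selected.
enum class Path { generic, raw_pointer };

// Generic overload: accepts any iterator, sentinel, and output iterator.
// It needs no enable_if because it is the least specialized candidate.
template <class Iter, class Sent, class OutIter>
Path pick_copy_path(Iter, Sent, OutIter) {
  return Path::generic;
}

// Raw-pointer overload: source and destination element types are separate
// template parameters, constrained to match once cv-qualifiers are removed.
template <class In, class Out,
          typename std::enable_if<
              std::is_same<typename std::remove_cv<In>::type,
                           typename std::remove_cv<Out>::type>::value,
              int>::type = 0>
Path pick_copy_path(In*, In*, Out*) {
  return Path::raw_pointer;
}
```

With this shape, an int* or const int* source copied into an int* destination takes the pointer path, while a long* source falls back to the generic overload because the enable_if condition removes the pointer overload from consideration.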

Harbormaster completed remote builds in B225060: Diff 512815. · Apr 12 2023, 9:29 AM

hiraditya updated this revision to Diff 512931. · Apr 12 2023, 11:58 AM

hiraditya added inline comments.

libcxx/include/__memory/uninitialized_algorithms.h
557–562	The Iter version is still selected with this change.

Harbormaster completed remote builds in B225147: Diff 512931. · Apr 12 2023, 12:32 PM

hiraditya updated this revision to Diff 512965. · Apr 12 2023, 1:56 PM

Harbormaster completed remote builds in B225176: Diff 512965. · Apr 12 2023, 4:12 PM

hiraditya updated this revision to Diff 515506. · Apr 20 2023, 3:27 PM

hiraditya retitled this revision from "[libc++] unwrap iterator parameters to __uninitialized_allocator_copy before calling std::copy" to "[libc++, std::vector] call the optimized version of __uninitialized_allocator_copy for trivial types".

hiraditya edited the summary of this revision.

hiraditya mentioned this in D148850: [libc++] Add test case for benchmarking vector copy.

hiraditya edited the summary of this revision. · Apr 20 2023, 3:30 PM

hiraditya edited the summary of this revision. · Apr 20 2023, 3:33 PM

Harbormaster completed remote builds in B226998: Diff 515506. · Apr 20 2023, 8:29 PM

hiraditya edited the summary of this revision. · Apr 21 2023, 2:05 PM

hiraditya edited the summary of this revision. · Apr 21 2023, 2:08 PM

Looking at the benchmark this patch has a negative effect on the average times. Can you explain why that happens? Did you look at the generated assembly after your changes?

libcxx/include/__memory/uninitialized_algorithms.h
610	LLVM style is no `else` after a `return`, here and in other places in this patch. Note that the `if constexpr` does keep its `else`, to avoid having the compiler generate unneeded code.
libcxx/include/vector
1036 ↗	(On Diff #515506)	Please look for other `::value`s too. Since C++17 they all support the `_v` suffix.

Looking at the benchmark this patch has a negative effect on the average times. Can you explain why that happens?

Oh, it seems I reversed the base vs. diff. I've fixed the numbers after running again.

Did you look at the generated assembly after your changes?

Yes, the vector::copy operation (BM_CopyConstruct/vector_int) is just gone after this patch.

hiraditya edited the summary of this revision. · Apr 27 2023, 12:07 PM

hiraditya added inline comments. · Apr 27 2023, 12:10 PM

libcxx/include/__memory/uninitialized_algorithms.h
610	This code is directly copied from the function above. Not sure what the guidance for libcxx is, cc: @ldionne

I would have expected much larger improvements from this. Does clang generate better vectorized code for copies now?

libcxx/include/__memory/uninitialized_algorithms.h
557–562	Works for me: https://godbolt.org/z/4hh75oY9f
libcxx/include/vector
1034–1044 ↗	(On Diff #515506)	This should still not be in `vector`. Otherwise we have to duplicate this.

This revision now requires changes to proceed. · Apr 27 2023, 1:55 PM

Moved the check to uninitialized_algorithms.h

In D147741#4303326, @philnik wrote:

I would have expected a lot higher improvements from this. Does clang generate better vectorized code for copies now?

Updated the results after rerunning. With this change clang would generate memcpy for copying vector<int>.

For the following test case, the copy is deleted because the compiler figures out that it is dead code.

#include <vector>
using namespace std;
using T = int;
T vec_copy(std::vector<T> v1, std::vector<T> &v2) {
    v2 = v1;
    return 10;
}

With the above patch and the following command:

$ /usr/bin/clang++ -I/usr/local/home/llvm-project/build/include/c++/v1 -I/usr/local/home/llvm-project/build/libcxx/benchmarks/benchmark-libcxx/include -I/usr/local/home/llvm-project/libcxx/test/support -I/usr/local/home/llvm-project/libcxxabi/include -O3 -nostdinc++ -std=c++20 -S ../build/test.cpp -w -fno-exceptions -o -

_Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE: # @_Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE
        .cfi_startproc
# %bb.0:
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        xorps   %xmm0, %xmm0
        movaps  %xmm0, (%rsp)
        movq    $0, 16(%rsp)
        movq    8(%rdi), %rax
        cmpq    (%rdi), %rax
        js      .LBB0_2
# %bb.1:
        movl    $10, %eax
        addq    $24, %rsp
        .cfi_def_cfa_offset 8
        retq
.LBB0_2:
        .cfi_def_cfa_offset 32
        movq    %rsp, %rdi
        callq   _ZNKSt3__16vectorIiNS_9allocatorIiEEE20__throw_length_errorB7v170000Ev
.Lfunc_end0:
        .size   _Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE, .Lfunc_end0-_Z8vec_copyNSt3__16vectorIiNS_9allocatorIiEEEE
        .cfi_endproc

hiraditya added inline comments. · Apr 27 2023, 5:06 PM

libcxx/include/__memory/uninitialized_algorithms.h
557–562	Ah i see, removing the `const` from args made it work i think. I'll re-use your version then. Thanks so much!

Using @philnik's suggestion again as it works once a non-const version of the __uninitialized_allocator_copy is added.

hiraditya added inline comments. · Apr 27 2023, 5:21 PM

libcxx/include/__memory/uninitialized_algorithms.h
602	this version gets called for test cases like `vector<const int>`

hiraditya edited the summary of this revision. · Apr 27 2023, 5:30 PM

Harbormaster completed remote builds in B228703: Diff 517757. · Apr 27 2023, 5:36 PM

Mordante added inline comments. · Apr 27 2023, 11:04 PM

libcxx/include/__memory/uninitialized_algorithms.h
610	We have more code that does not conform to our policy, but for new code we use our policy.

philnik added inline comments. · Apr 28 2023, 7:39 AM

libcxx/include/__memory/uninitialized_algorithms.h
602	But the other overload works for that just as well, no?
610	I'm not aware that we use this particular policy in our new code. Either way, this code is redundant and shouldn't have to be added (unless I'm missing something).

hiraditya updated this revision to Diff 518006. · Apr 28 2023, 11:48 AM

Harbormaster failed remote builds in B228890: Diff 518006! · Apr 28 2023, 11:49 AM

hiraditya marked an inline comment as done. · Apr 28 2023, 11:49 AM

hiraditya added inline comments.

libcxx/include/__memory/uninitialized_algorithms.h
610	it was indeed redundant. removed it.

hiraditya updated this revision to Diff 518486. · May 1 2023, 10:23 AM

hiraditya marked an inline comment as done.

Harbormaster failed remote builds in B229263: Diff 518486! · May 1 2023, 10:23 AM

hiraditya marked an inline comment as done. · May 1 2023, 10:23 AM

Added benchmark and fix in the same patch.

hiraditya edited the summary of this revision. · May 1 2023, 3:38 PM

Harbormaster failed remote builds in B229336: Diff 518584! · May 1 2023, 3:46 PM

The implementation itself looks good, but the benchmarks need some work.

libcxx/benchmarks/ContainerBenchmarks.h
30–34 ↗	(On Diff #518584)	It looks like this doesn't do what it's supposed to: https://godbolt.org/z/vdqGrcM3P. At least with your new version, the container never gets copied. You have to add `DoNotOptimize(v)` or something similar, so the optimizer doesn't just remove the code. Then you can also drop the `__attribute__((noinline))`. The same applies to `CopyContainerInto`. I would just remove the functions and make the copy construction and assignment inline.
libcxx/include/__memory/uninitialized_algorithms.h
593	You're missing a `_LIBCPP_HIDE_FROM_ABI`.
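The point about `DoNotOptimize` can be illustrated without the benchmark library. The sketch below uses a hypothetical keep_alive helper as a stand-in for benchmark::DoNotOptimize; it assumes a GCC- or Clang-compatible compiler for the empty inline asm statement and is not the code from ContainerBenchmarks.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for benchmark::DoNotOptimize. On GCC/Clang the empty
// asm statement takes `value` as a memory input and clobbers memory, so the
// compiler must assume the object is read and cannot drop the copy feeding it.
template <class T>
inline void keep_alive(const T& value) {
#if defined(__GNUC__) || defined(__clang__)
  asm volatile("" : : "m"(value) : "memory");
#else
  (void)value; // assumption: no optimization barrier on other compilers in this sketch
#endif
}

// Copy-constructs `src` `iterations` times and returns the total number of
// elements copied. keep_alive marks every copy as observed.
inline std::size_t copy_construct_loop(const std::vector<int>& src, int iterations) {
  std::size_t total = 0;
  for (int i = 0; i < iterations; ++i) {
    std::vector<int> copy(src);
    keep_alive(copy);
    total += copy.size();
  }
  return total;
}
```

Without the keep_alive call, an optimizer is free to treat each copy as dead, which is the situation the godbolt link in the comment above describes.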

hiraditya updated this revision to Diff 519179. · May 3 2023, 11:23 AM

hiraditya edited the summary of this revision.

hiraditya marked an inline comment as done. · May 3 2023, 11:27 AM

hiraditya added inline comments.

libcxx/benchmarks/ContainerBenchmarks.h
30–34 ↗	(On Diff #518584)	the intent is to show that redundant copy of vector is removed because of the new changes. this was the motivation behind creating the bug. inlining this in the caller should have the same effect. if we try to prevent the deletion, that could be a separate test case? let me know your preference, i'm fine either way.

Harbormaster completed remote builds in B229761: Diff 519179. · May 3 2023, 1:39 PM

Updated the testcase to avoid deletion of vector, also inlined the tests.

hiraditya edited the summary of this revision. · May 8 2023, 1:34 PM

hiraditya edited the summary of this revision.

Harbormaster completed remote builds in B230699: Diff 520473. · May 8 2023, 4:51 PM

hiraditya marked an inline comment as done. · May 10 2023, 9:48 AM

LGTM % nit.

libcxx/benchmarks/ContainerBenchmarks.h
45 ↗	(On Diff #520473)	For good measure I'd add a `DoNotOptimize` after the assignment.
30–34 ↗	(On Diff #518584)	I don't think it makes sense to test the removal of redundant code. IMO the interesting part is forwarding real calls to memmove, since that is the way more likely situation in the wild. Removal of redundant code is just a nice side effect from my point of view.

This revision is now accepted and ready to land. · May 11 2023, 2:34 PM

hiraditya added inline comments. · May 12 2023, 11:50 AM

libcxx/benchmarks/ContainerBenchmarks.h
30–34 ↗	(On Diff #518584)	sgtm. the modified test (with DoNotOptimizeData) keeps the vector alive.

hiraditya updated this revision to Diff 521754. · May 12 2023, 12:00 PM

hiraditya marked 2 inline comments as done.

Harbormaster completed remote builds in B231675: Diff 521754. · May 12 2023, 2:30 PM

Addressed an issue with the pair of contiguous_iterator<int*> and sentinel_wrapper<contiguous_iterator<int*>>. That caused failures with tests added in https://reviews.llvm.org/D149826

As per @var-const

unwrap_iter can't deal with a situation where the iterator and the sentinel are different types, and unwrap_range was written for that case

philnik added inline comments. · May 15 2023, 3:55 PM

libcxx/include/__memory/uninitialized_algorithms.h
598	You should be able to use `__unwrap_range` regardless of C++ version.

Remove std=c++20 check for using unwrap_range.

hiraditya marked an inline comment as done. · May 15 2023, 5:05 PM

hiraditya added inline comments.

libcxx/include/__memory/uninitialized_algorithms.h
598	i see! fixed it.

Harbormaster completed remote builds in B232159: Diff 522389. · May 15 2023, 7:26 PM

@hiraditya The benchmark shows a significant speedup for copy construction but almost no speedup for copy assignment. Is the latter expected? If not, it might indicate an issue with the optimization or, more likely, with the benchmark.

libcxx/benchmarks/ContainerBenchmarks.h
42 ↗	(On Diff #522389)	Is it deliberate that `c1` and `c2` are the same size (even though `c1` gets overwritten)? If yes, please add a comment.
libcxx/include/__memory/uninitialized_algorithms.h
578	Nit: this name looks inconsistent with `_Type`. One option is to rename `_Type` to `_Tp`, but I'd suggest renaming `_Up` to `_Out` or similar (and perhaps `_Type` to `_In` for symmetry).
595	This line is too long, but more importantly, I think this expression is a little hard to read because of so many nested subexpressions. How about splitting it into something like the following?
	    auto __unwrapped_out = std::__unwrap_iter(__first2);
	    auto __result = std::__uninitialized_allocator_copy_impl(__alloc, __unwrapped_range.first, __unwrapped_range.second, __unwrapped_out);
	    return std::__rewrap_iter(__first2, __result);

This revision now requires changes to proceed. · May 15 2023, 8:33 PM

hiraditya marked an inline comment as done. · May 16 2023, 7:33 AM

Addressed comments.

hiraditya marked 2 inline comments as done. · May 16 2023, 7:44 AM

hiraditya added inline comments.

libcxx/benchmarks/ContainerBenchmarks.h
42 ↗	(On Diff #522389)	Yeah, there was no reason to have `c2` as the same size. Updated the test.

In D147741#4344506, @var-const wrote:

@hiraditya The benchmark shows a significant speedup for copy construction but almost no speedup for copy assignment. Is the latter expected? If not, it might indicate an issue with the optimization or, more likely, with the benchmark.

Initially i thought there should be an impact but there isn't because compiler does the right thing even as is. I can remove the test for copy assignment if that makes more sense?

Harbormaster completed remote builds in B232315: Diff 522615. · May 16 2023, 9:22 AM

In D147741#4346199, @hiraditya wrote:

In D147741#4344506, @var-const wrote:

@hiraditya The benchmark shows a significant speedup for copy construction but almost no speedup for copy assignment. Is the latter expected? If not, it might indicate an issue with the optimization or, more likely, with the benchmark.

Initially i thought there should be an impact but there isn't because compiler does the right thing even as is. I can remove the test for copy assignment if that makes more sense?

I wouldn't remove the test -- just wanted to make sure we understand why there's a difference. I'm a little surprised the compiler could optimize the assignment case but not the construction case, but if that's what the assembly shows, then I don't think we need to dig any further.

libcxx/benchmarks/ContainerBenchmarks.h
42 ↗	(On Diff #522389)	Hmm, I expected that `c1` will be empty, not `c2` (since `c2` is the one that provides the final value). Or does the optimization only apply if the existing capacity can be reused? If so, then I think the previous state with both `c1` and `c2` being large was better.

hiraditya updated this revision to Diff 522910. · May 16 2023, 11:34 PM

hiraditya added inline comments.

libcxx/benchmarks/ContainerBenchmarks.h
42 ↗	(On Diff #522389)	oops, good catch!

I wouldn't remove the test -- just wanted to make sure we understand why there's a difference. I'm a little surprised the compiler could optimize the assignment case but not the construction case, but if that's what the assembly shows, then I don't think we need to dig any further.

The assembly does show the difference; it is only the performance numbers that didn't show any measurable difference on my machine. For example, take the following test case:

#include<vector>
using namespace std;
using T = int;
T vec_copy(std::vector<T> v1, std::vector<T> &v2) {
    v2 = v1; 
    return 10; 
}

With the patch, clang generates an extra memmove in place of a for loop.

$ clang++ -I ~/g/llvm-project/build-runtimes/include/c++/v1 -I ~/g/llvm-project/build-runtimes/libcxx/benchmarks/benchmark-libcxx/include -I ~/g/llvm-project/libcxx/test/support -I ~/g/llvm-project/libcxxabi/include -O3 -nostdinc++ -std=c++20 -S test1.cpp -w -fno-exceptions

$ grep memmove test1.base.s | wc -l

$ grep memmove test1.s | wc -l

Harbormaster completed remote builds in B232512: Diff 522910. · May 17 2023, 8:49 AM

philnik accepted this revision. · May 17 2023, 7:04 PM

This revision is now accepted and ready to land. · May 17 2023, 7:04 PM

hiraditya updated this revision to Diff 525316. · May 24 2023, 1:35 PM

hiraditya edited the summary of this revision.

Herald added a subscriber: jeroen.dobbelaere. · May 24 2023, 1:35 PM

This revision was landed with ongoing or failed builds. · May 24 2023, 1:41 PM

Closed by commit rG63a2b206fa4e: [libc++, std::vector] call the optimized version of… (authored by AdityaK <1894981+hiraditya@users.noreply.github.com>).

This revision was automatically updated to reflect the committed changes.

AdityaK <1894981+hiraditya@users.noreply.github.com> added a commit: rG63a2b206fa4e: [libc++, std::vector] call the optimized version of….

Harbormaster completed remote builds in B234313: Diff 525316. · May 24 2023, 11:43 PM

We've noticed during our integrate that the following code (godbolt) started producing a compiler error after this change:

#include <vector>

struct Value {};
template <class Iter>
struct WrappedIter : Iter {
    using pointer = Value*;
    using reference = Value&;

    pointer operator-> () { return &v; }
    reference operator* () { return v;  }

private:
  Value v;
};


using WI = WrappedIter<std::vector<int>::iterator>;
void foo(WI b, WI e) {
    std::vector<Value> v(b, e);
}

Concretely, the call to vector constructor eventually gets to std::__to_address via __unwrap_iter and results in a compiler error:

/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__algorithm/unwrap_iter.h:46:32: error: no matching function for call to '__to_address'
...
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:172:6: note: candidate template ignored: could not match '_Tp *' against 'WrappedIter<std::__wrap_iter<int *>>'
  172 | _Tp* __to_address(_Tp* __p) _NOEXCEPT {
      |      ^
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:204:1: note: candidate template ignored: requirement 'integral_constant<bool, false>::value' was not satisfied [with _Pointer = WrappedIter<std::__wrap_iter<int *>>]
  204 | __to_address(const _Pointer& __p) _NOEXCEPT {
      | ^

The code itself is wrong as the wrapper (unintentionally) pretends to be a contiguous iterator, but fails to provide a proper std::to_address implementation (as the arrow function is not const).
Nevertheless, I wanted to sanity check that folks feel this new compiler error is standard-compliant. @hiraditya what are your thoughts on this?
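The remark about the arrow function not being const can be checked with a small detection trait. This is a sketch under stated assumptions: has_const_arrow, NonConstArrow, and ConstArrow are hypothetical names, and the trait only models the part of a to_address-style fallback that calls operator->() on a const reference (as the `__to_address(const _Pointer& __p)` candidate in the diagnostic above suggests).

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// C++11-compatible void_t substitute used for the detection idiom.
template <class...>
struct make_void { using type = void; };

// Hypothetical trait: true when `p.operator->()` is callable on a const
// object of type T.
template <class T, class = void>
struct has_const_arrow : std::false_type {};

template <class T>
struct has_const_arrow<
    T, typename make_void<decltype(std::declval<const T&>().operator->())>::type>
    : std::true_type {};

struct Value {};

// Same shape as the wrapper in the report: operator-> is not const-qualified.
struct NonConstArrow {
  Value v;
  Value* operator->() { return &v; }
};

// A const-qualified operator-> can be called through a const reference.
struct ConstArrow {
  Value* p;
  Value* operator->() const { return p; }
};
```

Here has_const_arrow<NonConstArrow>::value is false and has_const_arrow<ConstArrow>::value is true, which matches the observation that the reported wrapper cannot be passed through a const reference and still reach operator->.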

In D147741#4391032, @ilya-biryukov wrote:
We've noticed during our integrate that the following code (godbolt) started producing a compiler error after this change:
#include <vector>

struct Value {};
template <class Iter>
struct WrappedIter : Iter {
    using pointer = Value*;
    using reference = Value&;

    pointer operator-> () { return &v; }
    reference operator* () { return v;  }

private:
  Value v;
};


using WI = WrappedIter<std::vector<int>::iterator>;
void foo(WI b, WI e) {
    std::vector<Value> v(b, e);
}
Concretely, the call to vector constructor eventually gets to std::__to_address via __unwrap_iter and results in a compiler error:
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__algorithm/unwrap_iter.h:46:32: error: no matching function for call to '__to_address'
...
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:172:6: note: candidate template ignored: could not match '_Tp *' against 'WrappedIter<std::__wrap_iter<int *>>'
  172 | _Tp* __to_address(_Tp* __p) _NOEXCEPT {
      |      ^
/opt/compiler-explorer/clang-trunk-20230602/bin/../include/c++/v1/__memory/pointer_traits.h:204:1: note: candidate template ignored: requirement 'integral_constant<bool, false>::value' was not satisfied [with _Pointer = WrappedIter<std::__wrap_iter<int *>>]
  204 | __to_address(const _Pointer& __p) _NOEXCEPT {
      | ^
The code itself is wrong as the wrapper (unintentionally) pretends to be a contiguous iterator, but fails to provide a proper std::to_address implementation (as the arrow function is not const).
Nevertheless, I wanted to sanity check that folks feel this new compiler error is standard-compliant. @hiraditya what are your thoughts on this?

Yes, this is compliant, since you claim to be a contiguous_iterator, but don't provide the full interface. Your code would fail in the exact same way if you called std::sort, std::copy, or other algorithms that also unwrap the iterator.

@philnik makes sense, thanks for clarifying.
I wasn't sure if the standard requires to fall back to implementation for a weaker iterator since the type does not implement the corresponding concept because std::to_address can't be called.
But I didn't want to spend too much time digging through the standard.

We're hitting build failures after this change. Here's a reduced version:

$ cat /tmp/a.cc
#include <vector>

void f(volatile int *p, int n) {
  std::vector<int> v(p, p + n);
}

$ build2/bin/clang -c -std=c++20 -stdlib=libc++ /tmp/a.cc
In file included from /tmp/a.cc:1:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/vector:317:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__format/formatter_bool.h:17:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__format/concepts.h:17:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__format/format_parse_context.h:16:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/string_view:1048:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/algorithm:1946:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/memory:896:
In file included from /work/llvm-project/build2/bin/../include/c++/v1/__memory/ranges_uninitialized_algorithms.h:22:
/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:589:12: error: cannot initialize return object of type 'int *' with an rvalue of type 'volatile int *'
    return std::copy(__first1, __last1, const_cast<_RawTypeIn*>(__first2));
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:596:26: note: in instantiation of function template specialization 'std::__uninitialized_allocator_copy_impl<std::allocator<int>, volatile int, volatile int, int, nullptr>' requested here
    auto __result = std::__uninitialized_allocator_copy_impl(__alloc, __unwrapped_range.first, __unwrapped_range.second, std::__unwrap_iter(__first2));
                         ^
/work/llvm-project/build2/bin/../include/c++/v1/vector:1161:22: note: in instantiation of function template specialization 'std::__uninitialized_allocator_copy<std::allocator<int>, volatile int *, volatile int *, int *>' requested here
  __tx.__pos_ = std::__uninitialized_allocator_copy(__alloc(), __first, __last, __tx.__pos_);
                     ^
/work/llvm-project/build2/bin/../include/c++/v1/vector:790:9: note: in instantiation of function template specialization 'std::vector<int>::__construct_at_end<volatile int *, volatile int *>' requested here
        __construct_at_end(__first, __last, __n);
        ^
/work/llvm-project/build2/bin/../include/c++/v1/vector:1278:3: note: in instantiation of function template specialization 'std::vector<int>::__init_with_size<volatile int *, volatile int *>' requested here
  __init_with_size(__first, __last, __n);
  ^
/tmp/a.cc:4:20: note: in instantiation of function template specialization 'std::vector<int>::vector<volatile int *, 0>' requested here
  std::vector<int> v(p, p + n);
                   ^
1 error generated.

In D147741#4407956, @hans wrote:

We're hitting build failures after this change. Here's a reduced version:

$ cat /tmp/a.cc
#include <vector>

void f(volatile int *p, int n) {
  std::vector<int> v(p, p + n);
}

/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:589:12: error: cannot initialize return object of type 'int *' with an rvalue of type 'volatile int *'
    return std::copy(__first1, __last1, const_cast<_RawTypeIn*>(__first2));

This looks like a legit bug, calling std::copy (which gets optimized to memmove) on volatile source type seems incorrect. cc: @philnik for clarification.

In D147741#4409483, @hiraditya wrote:
In D147741#4407956, @hans wrote:
We're hitting build failures after this change. Here's a reduced version:
$ cat /tmp/a.cc
#include <vector>

void f(volatile int *p, int n) {
  std::vector<int> v(p, p + n);
}

/work/llvm-project/build2/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:589:12: error: cannot initialize return object of type 'int *' with an rvalue of type 'volatile int *'
    return std::copy(__first1, __last1, const_cast<_RawTypeIn*>(__first2));
This looks like a legit bug, calling std::copy (which gets optimized to memmove) on volatile source type seems incorrect. cc: @philnik for clarification.

I don't think std::copy gets optimized to a memmove when it gets a volatile pointer. The problem is that our return type is OutT*, but std::copy gets remove_const_t<InT>*. I think the simplest solution would be to check for is_same<__remove_const_t<_In>, __remove_const_t<_Out>> instead of is_same<__remove_cv_t<_In>, __remove_cv_t<_Out>>. @hiraditya would you fix this in a quick follow-up patch?
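The difference between the two conditions can be checked directly with standard type traits. The helper names below (same_ignoring_cv, same_ignoring_const) are hypothetical; they only restate the two is_same expressions from the comment using std::remove_cv and std::remove_const.

```cpp
#include <cassert>
#include <type_traits>

// Condition as originally landed: strips both const and volatile, so a
// volatile source type is treated as matching a plain destination type.
template <class In, class Out>
constexpr bool same_ignoring_cv() {
  return std::is_same<typename std::remove_cv<In>::type,
                      typename std::remove_cv<Out>::type>::value;
}

// Proposed condition: strips only const, so a volatile source type no longer
// matches and the pointer overload is not selected for it.
template <class In, class Out>
constexpr bool same_ignoring_const() {
  return std::is_same<typename std::remove_const<In>::type,
                      typename std::remove_const<Out>::type>::value;
}
```

For the reduced test case, In is volatile int and Out is int: same_ignoring_cv accepts the pair, while same_ignoring_const rejects it and still accepts a const int source.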

In D147741#4409500, @philnik wrote:
I don't think std::copy gets optimized to a memmove when it gets a volatile pointer. The problem is that our return type is OutT*, but std::copy gets remove_const_t<InT>*. I think the simplest solution would be to check for is_same<__remove_const_t<_In>, __remove_const_t<_Out> > instead of is_same<__remove_cv_t<_In>, __remove_cv_t<_Out> >. @hiraditya, would you fix this in a quick follow-up patch?

Yeah, just did that in https://reviews.llvm.org/D152571. I'll also add a test case soon.

AdityaK <1894981+hiraditya@users.noreply.github.com> mentioned this in rG8100aa4c02b0: [libcxx] Use the unoptimized routines for volatile source types. Jun 10 2023, 11:12 PM

Revision Contents

Path: libcxx/include/__memory/uninitialized_algorithms.h
Size: 14 lines

Diff 518006

libcxx/include/__memory/uninitialized_algorithms.h

	// -- C++ --			// -- C++ --
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef _LIBCPP___MEMORY_UNINITIALIZED_ALGORITHMS_H			#ifndef _LIBCPP___MEMORY_UNINITIALIZED_ALGORITHMS_H
	#define _LIBCPP___MEMORY_UNINITIALIZED_ALGORITHMS_H			#define _LIBCPP___MEMORY_UNINITIALIZED_ALGORITHMS_H

	#include <__algorithm/copy.h>			#include <__algorithm/copy.h>
	#include <__algorithm/move.h>			#include <__algorithm/move.h>
				#include <__algorithm/unwrap_iter.h>
	#include <__config>			#include <__config>
	#include <__iterator/iterator_traits.h>			#include <__iterator/iterator_traits.h>
	#include <__iterator/reverse_iterator.h>			#include <__iterator/reverse_iterator.h>
	#include <__memory/addressof.h>			#include <__memory/addressof.h>
	#include <__memory/allocator_traits.h>			#include <__memory/allocator_traits.h>
	#include <__memory/construct_at.h>			#include <__memory/construct_at.h>
	#include <__memory/pointer_traits.h>			#include <__memory/pointer_traits.h>
	#include <__memory/voidify.h>			#include <__memory/voidify.h>
	[... 517 lines not shown ...]
	};			};

	// Copy-construct [__first1, __last1) in [__first2, __first2 + N), where N is distance(__first1, __last1).			// Copy-construct [__first1, __last1) in [__first2, __first2 + N), where N is distance(__first1, __last1).
	//			//
	// The caller has to ensure that __first2 can hold at least N uninitialized elements. If an exception is thrown the			// The caller has to ensure that __first2 can hold at least N uninitialized elements. If an exception is thrown the
	// already copied elements are destroyed in reverse order of their construction.			// already copied elements are destroyed in reverse order of their construction.
	template <class _Alloc, class _Iter1, class _Sent1, class _Iter2>			template <class _Alloc, class _Iter1, class _Sent1, class _Iter2>
	_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter2			_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter2
	__uninitialized_allocator_copy(_Alloc& __alloc, _Iter1 __first1, _Sent1 __last1, _Iter2 __first2) {			__uninitialized_allocator_copy_impl(_Alloc& __alloc, _Iter1 __first1, _Sent1 __last1, _Iter2 __first2) {
	auto __destruct_first = __first2;			auto __destruct_first = __first2;
	auto __guard =			auto __guard =
	std::__make_exception_guard(_AllocatorDestroyRangeReverse<_Alloc, _Iter2>(__alloc, __destruct_first, __first2));			std::__make_exception_guard(_AllocatorDestroyRangeReverse<_Alloc, _Iter2>(__alloc, __destruct_first, __first2));
	while (__first1 != __last1) {			while (__first1 != __last1) {
	allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), *__first1);			allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), *__first1);
	++__first1;			++__first1;
	++__first2;			++__first2;
	}			}
	__guard.__complete();			__guard.__complete();
	return __first2;			return __first2;
	}			}

	template <class _Alloc, class _Type>			template <class _Alloc, class _Type>
				philnik [Not Done]: I think this can be fixed by using different types for the `__first1` and `__first2` pointers and adding the condition `is_same<__remove_cv_t<_Type1>, __remove_cv_t<_Type2>>::value` to the `enable_if` in the raw pointer overload. Then that overload will always have priority over the more generic one. Then this overload (the generic one) doesn't need any `enable_if`s.
				hiraditya (author) [Done]: The Iter version is still selected with this change.
				philnik [Not Done]: Works for me: https://godbolt.org/z/4hh75oY9f
				hiraditya (author) [Done]: Ah, I see. Removing the `const` from the args made it work, I think. I'll re-use your version then. Thanks so much!
	struct __allocator_has_trivial_copy_construct : _Not<__has_construct<_Alloc, _Type*, const _Type&> > {};			struct __allocator_has_trivial_copy_construct : _Not<__has_construct<_Alloc, _Type*, const _Type&> > {};

	template <class _Type>			template <class _Type>
	struct __allocator_has_trivial_copy_construct<allocator<_Type>, _Type> : true_type {};			struct __allocator_has_trivial_copy_construct<allocator<_Type>, _Type> : true_type {};

	template <class _Alloc,			template <class _Alloc,
	class _Type,			class _Type,
	class _RawType = __remove_const_t<_Type>,			class _RawType = __remove_const_t<_Type>,
				class _Up,
	__enable_if_t<			__enable_if_t<
	// using _RawType because of the allocator<T const> extension			// using _RawType because of the allocator<T const> extension
	is_trivially_copy_constructible<_RawType>::value && is_trivially_copy_assignable<_RawType>::value &&			is_trivially_copy_constructible<_RawType>::value && is_trivially_copy_assignable<_RawType>::value &&
				is_same<__remove_cv_t<_Type>, __remove_cv_t<_Up> >::value &&
	__allocator_has_trivial_copy_construct<_Alloc, _RawType>::value>* = nullptr>			__allocator_has_trivial_copy_construct<_Alloc, _RawType>::value>* = nullptr>
	_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Type*			_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Up*
	__uninitialized_allocator_copy(_Alloc&, const _Type* __first1, const _Type* __last1, _Type* __first2) {			__uninitialized_allocator_copy_impl(_Alloc&, _Type* __first1, _Type* __last1, _Up* __first2) {
				var-const [Done]: Nit: this name looks inconsistent with `_Type`. One option is to rename `_Type` to `_Tp`, but I'd suggest renaming `_Up` to `_Out` or similar (and perhaps `_Type` to `_In` for symmetry).
	// TODO: Remove the const_cast once we drop support for std::allocator<T const>			// TODO: Remove the const_cast once we drop support for std::allocator<T const>
	if (__libcpp_is_constant_evaluated()) {			if (__libcpp_is_constant_evaluated()) {
	while (__first1 != __last1) {			while (__first1 != __last1) {
	std::__construct_at(std::__to_address(__first2), *__first1);			std::__construct_at(std::__to_address(__first2), *__first1);
	++__first1;			++__first1;
	++__first2;			++__first2;
	}			}
	return __first2;			return __first2;
	} else {			} else {
	return std::copy(__first1, __last1, const_cast<_RawType*>(__first2));			return std::copy(__first1, __last1, const_cast<_RawType*>(__first2));
	}			}
	}			}

				template <class _Alloc, class _Iter1, class _Sent1, class _Iter2>
				constexpr _Iter2 __uninitialized_allocator_copy(_Alloc& __alloc, _Iter1 __first1, _Sent1 __last1, _Iter2 __first2) {
				philnik [Done]: You're missing a `_LIBCPP_HIDE_FROM_ABI`.
				return std::__rewrap_iter(__first2, std::__uninitialized_allocator_copy_impl(__alloc, std::__unwrap_iter(__first1), std::__unwrap_iter(__last1), std::__unwrap_iter(__first2)));
				}
				var-const [Done]: This line is too long, but more importantly, I think this expression is a little hard to read because of so many nested subexpressions. How about splitting it into something like the following?
				  auto __unwrapped_out = std::__unwrap_iter(__first2);
				  auto __result = std::__uninitialized_allocator_copy_impl(__alloc, __unwrapped_range.first, __unwrapped_range.second, __unwrapped_out);
				  return std::__rewrap_iter(__first2, __result);

	// Move-construct the elements [__first1, __last1) into [__first2, __first2 + N)			// Move-construct the elements [__first1, __last1) into [__first2, __first2 + N)
	// if the move constructor is noexcept, where N is distance(__first1, __last1).			// if the move constructor is noexcept, where N is distance(__first1, __last1).
				philnik [Done]: You should be able to use `__unwrap_range` regardless of C++ version.
				hiraditya (author) [Done]: I see! Fixed it.
	//			//
	// Otherwise try to copy all elements. If an exception is thrown the already copied			// Otherwise try to copy all elements. If an exception is thrown the already copied
	// elements are destroyed in reverse order of their construction.			// elements are destroyed in reverse order of their construction.
	template <class _Alloc, class _Iter1, class _Sent1, class _Iter2>			template <class _Alloc, class _Iter1, class _Sent1, class _Iter2>
				hiraditya (author) [Done]: This version gets called for test cases like `vector<const int>`.
				philnik [Done]: But the other overload works for that just as well, no?
	_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter2 __uninitialized_allocator_move_if_noexcept(			_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter2 __uninitialized_allocator_move_if_noexcept(
	_Alloc& __alloc, _Iter1 __first1, _Sent1 __last1, _Iter2 __first2) {			_Alloc& __alloc, _Iter1 __first1, _Sent1 __last1, _Iter2 __first2) {
	static_assert(__is_cpp17_move_insertable<_Alloc>::value,			static_assert(__is_cpp17_move_insertable<_Alloc>::value,
	"The specified type does not meet the requirements of Cpp17MoveInsertable");			"The specified type does not meet the requirements of Cpp17MoveInsertable");
	auto __destruct_first = __first2;			auto __destruct_first = __first2;
	auto __guard =			auto __guard =
	std::__make_exception_guard(_AllocatorDestroyRangeReverse<_Alloc, _Iter2>(__alloc, __destruct_first, __first2));			std::__make_exception_guard(_AllocatorDestroyRangeReverse<_Alloc, _Iter2>(__alloc, __destruct_first, __first2));
	while (__first1 != __last1) {			while (__first1 != __last1) {
				Mordante [Not Done]: LLVM style is no `else` after a `return`, here and in other places in this patch. Note that `if constexpr` does get the `else`, to avoid the compiler generating unneeded code.
				hiraditya (author) [Done]: This code is directly copied from the function above. Not sure what the guidance for libcxx is, cc: @ldionne
				Mordante [Not Done]: We have more code that does not conform to our policy, but for new code we use our policy.
				philnik [Done]: I'm not aware that we use this particular policy in our new code. In any case, this code is redundant and shouldn't have to be added (unless I'm missing something).
				hiraditya (author) [Done]: It was indeed redundant. Removed it.
	#ifndef _LIBCPP_HAS_NO_EXCEPTIONS			#ifndef _LIBCPP_HAS_NO_EXCEPTIONS
	allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), std::move_if_noexcept(*__first1));			allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), std::move_if_noexcept(*__first1));
	#else			#else
	allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), std::move(*__first1));			allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), std::move(*__first1));
	#endif			#endif
	++__first1;			++__first1;
	++__first2;			++__first2;
	}			}
	[... 36 lines not shown ...]
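The overload-priority point raised in the inline comments (a raw-pointer overload with separate input and output element types wins over the generic iterator overload) can be sketched outside libc++ as follows. This is an illustrative approximation only: the names `uninitialized_copy_impl` and `fast_path_hits` are hypothetical, the check on the allocator's `construct` and the exception guard from the real code are omitted, and the iterator unwrapping done by the wrapper is left out because it relies on libc++ internals.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

namespace sketch {  // hypothetical names; not the libc++ implementation

static int fast_path_hits = 0;  // instrumentation for this example only

// Generic overload: construct each element through the allocator.
// (The real code also destroys already-built elements if a constructor throws.)
template <class Alloc, class Iter1, class Sent1, class Iter2>
Iter2 uninitialized_copy_impl(Alloc& alloc, Iter1 first1, Sent1 last1, Iter2 first2) {
  for (; first1 != last1; ++first1, (void)++first2)
    std::allocator_traits<Alloc>::construct(alloc, std::addressof(*first2), *first1);
  return first2;
}

// Raw-pointer overload: more specialized than the generic one, so it is
// preferred whenever its enable_if condition holds. In and Out are separate
// parameters so that `const int*` input and `int*` output both deduce.
template <class Alloc, class In, class Out,
          class RawIn  = typename std::remove_const<In>::type,
          class RawOut = typename std::remove_const<Out>::type,
          typename std::enable_if<std::is_trivially_copy_constructible<RawIn>::value &&
                                  std::is_trivially_copy_assignable<RawIn>::value &&
                                  std::is_same<RawIn, RawOut>::value, int>::type = 0>
Out* uninitialized_copy_impl(Alloc&, In* first1, In* last1, Out* first2) {
  ++fast_path_hits;
  const std::size_t n = static_cast<std::size_t>(last1 - first1);
  if (n != 0)  // bulk copy; valid because the element type is trivially copyable
    std::memmove(const_cast<RawOut*>(first2), first1, n * sizeof(RawIn));
  return first2 + n;
}

}  // namespace sketch
```

Calling `sketch::uninitialized_copy_impl(alloc, first, last, out)` with `const int*` input and `int*` output should select the raw-pointer overload, while a `volatile int*` input fails the `is_same` check and falls back to the element-by-element overload.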

Diff Detail

Unit Tests: Failed
