Details
- Reviewers
• Quuxplusone Roman-Koshelev - Group Reviewers
Restricted Project
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
libcxx/src/string.cpp | ||
---|---|---|
445 | just a drive-by commentator - why is the wstring version so different from the string version? Surely they should be practically the same? |
libcxx/src/string.cpp | ||
---|---|---|
445 | I infer it's because string().capacity() > bufsize, but wstring().capacity() < bufsize. (A default-initialized basic_string<Ch> has non-zero capacity because of the Small String Optimization.) I also wondered if the wstring version should use the same trick — buf.resize(bufsize) followed by buf.resize(res.ptr - buf.data()). But:
So if hypothetically we did wstring buf(bufsize), that'd allocate between 44 and 80 bytes of heap every time, versus the current approach allocating between 0 and 80. (And in the case someone's stringifying a small number like "0" or "42", the current approach allocates 0 bytes, which is great.) So, having them look at least a bit different is good IMO. | |
448 | /accomodate/accommodate/ while you're here (or, look out for a merge-conflict if I fix it in main first ;)) | |
466 | Do you have a benchmark showing the performance impact? I think you should try to quantify the impact somehow — after all, all you're saving is a very predictable branch and a single stack-to-stack memcpy, right? (OTOH, I don't see how this could possibly hurt performance.) |
libcxx/src/string.cpp | ||
---|---|---|
445 | wchar is 2 bytes not 4, |
libcxx/src/string.cpp | ||
---|---|---|
445 | Note that in wchar* it will not work to write a number with the to_chars function. Otherwise, it would be possible to do this optimization at least for types that fit into capacity() | |
466 | No need to make copying where it is not needed. Also std::string buf; buf.resize(buf.capacity()); with c++20 completely constexpr |
Thanks for your patch.
Please investigate the ARM build errors.
libcxx/src/string.cpp | ||
---|---|---|
429 | Can you add an assert the buffer-size is large enough? You now depend on an implementation detail of std::string. Something like: | |
466 | +1 for a benchmark. The change that might degrade the performance are the resize calls. |
As I said
std::string buf; buf.resize(buf.capacity());
with C ++ 20 completely constexpr and equivalent
char buf[bufsize];
It is also obvious that
string(buf, res.ptr);
worse than
buf.resize(res.ptr - buf.data());
Benchmarks
$ g++ -v
gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1)
$ g++ main.cpp -O3 -lbenchmark -std=c++20
$ ./a.out
2021-05-07T21:32:02+03:00 Running ./a.out Run on (4 X 3000 MHz CPU s) CPU Caches: L1 Data 32 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 256 KiB (x4) L3 Unified 12288 KiB (x4) Load Average: 1.01, 0.76, 0.56 --------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------- to_stringNewBench/1 501 ns 501 ns 1367384 to_stringNewBench/2 537 ns 537 ns 1264337 to_stringNewBench/3 788 ns 726 ns 992086 to_stringNewBench/4 824 ns 777 ns 929673 to_stringNewBench/5 981 ns 921 ns 785107 to_stringNewBench/6 1071 ns 1000 ns 725562 to_stringNewBench/7 1201 ns 1129 ns 644721 to_stringNewBench/8 1109 ns 1109 ns 626050 to_stringNewBench/9 1164 ns 1164 ns 472391 to_stringNewBench/10 1795 ns 1676 ns 440870 to_stringNewBench/11 1208 ns 1134 ns 644179 to_stringNewBench/12 1715 ns 1608 ns 444597 to_stringNewBench/13 1247 ns 1168 ns 650858 to_stringNewBench/14 1121 ns 1046 ns 652401 to_stringNewBench/15 1784 ns 1640 ns 413110 to_stringNewBench/16 1065 ns 1064 ns 619175 to_stringOldBench/1 308 ns 288 ns 2437366 to_stringOldBench/2 697 ns 697 ns 1022340 to_stringOldBench/3 792 ns 791 ns 694918 to_stringOldBench/4 938 ns 881 ns 818895 to_stringOldBench/5 1107 ns 1032 ns 704530 to_stringOldBench/6 1529 ns 1435 ns 505306 to_stringOldBench/7 1346 ns 1265 ns 579740 to_stringOldBench/8 1559 ns 1456 ns 501307 to_stringOldBench/9 1701 ns 1567 ns 461477 to_stringOldBench/10 1733 ns 1642 ns 430777 to_stringOldBench/11 4683 ns 4378 ns 161100 to_stringOldBench/12 2030 ns 1884 ns 368859 to_stringOldBench/13 5351 ns 5019 ns 142119 to_stringOldBench/14 4544 ns 4541 ns 156984 to_stringOldBench/15 1807 ns 1806 ns 327187 to_stringOldBench/16 5348 ns 4958 ns 138795
$ clang++ -v
Ubuntu clang version 12.0.0-1ubuntu1
$ clang++ main.cpp -O3 -lbenchmark -std=c++20
$ ./a.out
2021-05-07T21:31:28+03:00 Running ./a.out Run on (4 X 3000 MHz CPU s) CPU Caches: L1 Data 32 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 256 KiB (x4) L3 Unified 12288 KiB (x4) Load Average: 0.70, 0.67, 0.53 --------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------- to_stringNewBench/1 489 ns 489 ns 1374672 to_stringNewBench/2 615 ns 570 ns 1266976 to_stringNewBench/3 783 ns 733 ns 1010880 to_stringNewBench/4 787 ns 787 ns 877548 to_stringNewBench/5 769 ns 768 ns 806926 to_stringNewBench/6 1022 ns 959 ns 759196 to_stringNewBench/7 1002 ns 926 ns 764341 to_stringNewBench/8 1051 ns 986 ns 719005 to_stringNewBench/9 1175 ns 1108 ns 658466 to_stringNewBench/10 1437 ns 1334 ns 548946 to_stringNewBench/11 894 ns 894 ns 790186 to_stringNewBench/12 1295 ns 1294 ns 438192 to_stringNewBench/13 1044 ns 981 ns 770102 to_stringNewBench/14 939 ns 853 ns 766173 to_stringNewBench/15 1466 ns 1383 ns 507655 to_stringNewBench/16 957 ns 901 ns 816543 to_stringOldBench/1 398 ns 372 ns 1874514 to_stringOldBench/2 916 ns 851 ns 835852 to_stringOldBench/3 961 ns 904 ns 790508 to_stringOldBench/4 911 ns 911 ns 754365 to_stringOldBench/5 1030 ns 1029 ns 521805 to_stringOldBench/6 1293 ns 1236 ns 581640 to_stringOldBench/7 1272 ns 1193 ns 608069 to_stringOldBench/8 1363 ns 1270 ns 573086 to_stringOldBench/9 1494 ns 1399 ns 521500 to_stringOldBench/10 1754 ns 1644 ns 443738 to_stringOldBench/11 4812 ns 4517 ns 164884 to_stringOldBench/12 1810 ns 1681 ns 441442 to_stringOldBench/13 4956 ns 4643 ns 164099 to_stringOldBench/14 4612 ns 4292 ns 163186 to_stringOldBench/15 1657 ns 1656 ns 431159 to_stringOldBench/16 4795 ns 4484 ns 160867
Source
#include <charconv> #include <random> #include <string> #include <benchmark/benchmark.h> template <typename V> std::string i_to_stringN(const V v) { std::string buf; buf.resize(buf.capacity()); const auto res = std::to_chars(buf.data(), buf.data() + buf.size(), v); buf.resize(res.ptr - buf.data()); return buf; } std::string to_stringN (int val) { return i_to_stringN(val); } std::string to_stringN (long val) { return i_to_stringN(val); } std::string to_stringN (long long val) { return i_to_stringN(val); } std::string to_stringN (unsigned val) { return i_to_stringN(val); } std::string to_stringN (unsigned long val) { return i_to_stringN(val); } std::string to_stringN (unsigned long long val) { return i_to_stringN(val); } template <typename V> std::string i_to_stringO(const V v) { constexpr size_t bufsize = std::numeric_limits<V>::digits10 + 2; char buf[bufsize]; const auto res = std::to_chars(buf, buf + bufsize, v); return std::string(buf, res.ptr); } std::string to_stringO (int val) { return i_to_stringO(val); } std::string to_stringO (long val) { return i_to_stringO(val); } std::string to_stringO (long long val) { return i_to_stringO(val); } std::string to_stringO (unsigned val) { return i_to_stringO(val); } std::string to_stringO (unsigned long val) { return i_to_stringO(val); } std::string to_stringO (unsigned long long val) { return i_to_stringO(val); } template <unsigned N> struct make_mem { using type = std::conditional_t<N <= 9, std::uint32_t, std::uint64_t>; }; template <unsigned N> using make_mem_t = typename make_mem<N>::type; std::random_device rd; std::mt19937 gen(rd()); constexpr unsigned pow(unsigned base, unsigned exp) { unsigned res = 1; for (unsigned i = 0; i < exp; ++i) { res *= base; } return res; } template<unsigned N, bool IsNew> static void to_stringD(benchmark::State& state) { using Ty = make_mem_t<N>; std::uniform_int_distribution<Ty> distrib(pow(10, N-1), pow(10, N)-1); std::vector<Ty> vec; vec.reserve(128); for(int i=0; i<128; ++i) { vec.push_back(distrib(gen)); } for (auto _ : state) { for(Ty i : vec) { if constexpr (IsNew) { std::string res = to_stringN(i); benchmark::DoNotOptimize(res); } else { std::string res = to_stringO(i); benchmark::DoNotOptimize(res); } } } } template<bool IsNew> struct to_stringBench { static void run(benchmark::State& state) { switch(state.range(0)) { case 1: { to_stringD<1, IsNew>(state); return; } case 2: { to_stringD<2, IsNew>(state); return; } case 3: { to_stringD<3, IsNew>(state); return; } case 4: { to_stringD<4, IsNew>(state); return; } case 5: { to_stringD<5, IsNew>(state); return; } case 6: { to_stringD<6, IsNew>(state); return; } case 7: { to_stringD<7, IsNew>(state); return; } case 8: { to_stringD<8, IsNew>(state); return; } case 9: { to_stringD<9, IsNew>(state); return; } case 10: { to_stringD<10, IsNew>(state); return; } case 11: { to_stringD<11, IsNew>(state); return; } case 12: { to_stringD<12, IsNew>(state); return; } case 13: { to_stringD<13, IsNew>(state); return; } case 14: { to_stringD<14, IsNew>(state); return; } case 15: { to_stringD<15, IsNew>(state); return; } case 16: { to_stringD<16, IsNew>(state); return; } default: assert(false); } } }; void to_stringNewBench(benchmark::State& state) { to_stringBench<true>::run(state); } void to_stringOldBench(benchmark::State& state) { to_stringBench<false>::run(state); } BENCHMARK(to_stringNewBench)->DenseRange(1, 16); BENCHMARK(to_stringOldBench)->DenseRange(1, 16); BENCHMARK_MAIN();
@Roman-Koshelev: By "benchmark," I think we mean putting it in the diff proper. Here's the patch I was about to suggest... https://pastebin.com/raw/Jg2htyVG ...except that then I ran that benchmark (with and without the patch to src/) and was unable to detect any performance improvement. Changing the first resize to __resize_default_init helped, but not enough.
$ bin/clang++ -O2 ../libcxx/benchmarks/to_string.bench.cpp lib/libc++.a lib/libc++abi.a -I ../libcxx/utils/google-benchmark/include/ -L /usr/local/lib -lbenchmark -o new.exe $ ./new.exe 2021-05-08T11:16:51-04:00 Running ./new.exe Run on (8 X 2200 MHz CPU s) CPU Caches: L1 Data 32 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 256 KiB (x4) L3 Unified 6144 KiB (x1) Load Average: 1.72, 2.07, 2.17 ---------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------- BM_ToString/1 9156 ns 9152 ns 74252 BM_ToString/2 9207 ns 9203 ns 74929 BM_ToString/3 10199 ns 10192 ns 67968 BM_ToString/4 10034 ns 10027 ns 68924 BM_ToString/5 11141 ns 11118 ns 63743 BM_ToString/6 11534 ns 11485 ns 62568 BM_ToString/7 11926 ns 11925 ns 58581 BM_ToString/8 12282 ns 12257 ns 52108 BM_ToString/9 13072 ns 13033 ns 51947 BM_ToString/10 13010 ns 12994 ns 53775 BM_ToWstring/1 6809 ns 6808 ns 101851 BM_ToWstring/2 7863 ns 7849 ns 88482 BM_ToWstring/3 8748 ns 8735 ns 77942 BM_ToWstring/4 9448 ns 9409 ns 77254 BM_ToWstring/5 88436 ns 88398 ns 7768 BM_ToWstring/6 88524 ns 88499 ns 7703 BM_ToWstring/7 90242 ns 90205 ns 7622 BM_ToWstring/8 92892 ns 92575 ns 7279 BM_ToWstring/9 90693 ns 90591 ns 7435 BM_ToWstring/10 93344 ns 93128 ns 7646 $ git checkout main -- ../libcxx/src/string.cpp $ ninja cxx ; ninja cxx ; ninja cxx [5/5] Creating library symlink lib/libc++.1.dylib lib/libc++.dylib [1/1] Linking CXX static library lib/libc++.a ninja: no work to do. $ bin/clang++ -O2 ../libcxx/benchmarks/to_string.bench.cpp lib/libc++.a lib/libc++abi.a -I ../libcxx/utils/google-benchmark/include/ -L /usr/local/lib -lbenchmark -o old.exe $ ./old.exe 2021-05-08T11:18:56-04:00 Running ./old.exe Run on (8 X 2200 MHz CPU s) CPU Caches: L1 Data 32 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 256 KiB (x4) L3 Unified 6144 KiB (x1) Load Average: 2.54, 2.45, 2.31 ---------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------- BM_ToString/1 8390 ns 8386 ns 79859 BM_ToString/2 9494 ns 9488 ns 72932 BM_ToString/3 10157 ns 10152 ns 67160 BM_ToString/4 10708 ns 10691 ns 66018 BM_ToString/5 12084 ns 12068 ns 57079 BM_ToString/6 12080 ns 12074 ns 56325 BM_ToString/7 13303 ns 13295 ns 51666 BM_ToString/8 12070 ns 12062 ns 56550 BM_ToString/9 12610 ns 12605 ns 55151 BM_ToString/10 12362 ns 12357 ns 55914 BM_ToWstring/1 6791 ns 6787 ns 101672 BM_ToWstring/2 7698 ns 7696 ns 89471 BM_ToWstring/3 8682 ns 8672 ns 79495 BM_ToWstring/4 9499 ns 9466 ns 76316 BM_ToWstring/5 93702 ns 93611 ns 7603 BM_ToWstring/6 95741 ns 95541 ns 7602 BM_ToWstring/7 98949 ns 98798 ns 6976 BM_ToWstring/8 94875 ns 94835 ns 7649 BM_ToWstring/9 94720 ns 94675 ns 7441 BM_ToWstring/10 94846 ns 94802 ns 7011
Are you still interested in working on this patch? We're moving to GitHub PRs and like to clean up the review queue.
Can you add an assert the buffer-size is large enough? You now depend on an implementation detail of std::string. Something like:
_LIBCPP_ASSERT(buf.size(), std::numeric_limits<V>::digits10 + 2, "Capacity is too small");
If somebody adds 128-bit support this will probably break.