Details
- Reviewers
• wash EricWF - Group Reviewers
Restricted Project - Commits
- rGc30d5101f14f: [libc++] Optimize the number of assignments in std::exclusive_scan
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
That's worse; it replaced 2n assignments with n assignments, n constructions, and n destructions. Interestingly, assigning over __saved is an optimization I should put into MSVC.
My Twitter comment was about the last init = _VSTD::move(tmp) when first == last;
for (;;) { _Tp __tmp = __b(__init, *__first); *__result = __init; ++__result; if (++__first == __last) break; __init = _VSTD::move(__tmp); }
saves the last assignment.
libcxx/include/numeric | ||
---|---|---|
343–348 | The compiler can optimize a for loop much better than a do-while loop. |
I have not run either through tests yet...
I should have done that before posting, tests fail with the 2 iterator version because it doesn't handle the self-aliasing case. This version combines the good parts of both our implementations and passes tests.
if (_UFirst != _ULast) { _Ty _Tmp(_Reduce_op(_Val, *_UFirst)); // temp to enable _First == _Dest, also requirement missing for (;;) { *_UDest = _Val; ++_UDest; ++_UFirst; if (_UFirst == _ULast) { break; } _Val = _STD move(_Tmp); // Requirement missing from N4713 _Tmp = _Reduce_op(_Val, *_UFirst); } }
libcxx/include/numeric | ||
---|---|---|
341 | @BillyONeal This is equivalent to *_UDest = _Val; in your version, and I think you could instead use *_UDest = STD move(_Val);. |
libcxx/include/numeric | ||
---|---|---|
338–339 | Wouldn't you want to move this line into the loop, and so remove the assignment to __tmp at the end of the loop? |
libcxx/include/numeric | ||
---|---|---|
338–339 | As Billy mentioned in an earlier comment, this would trade 1 construction, N assignments and 1 destruction for N constructions, and N destructions. If we think about a string, for example, I believe this version with N assignments would be faster. WDYT? |
Wouldn't you want to move this line into the loop, and so remove the assignment to __tmp at the end of the loop?
It will remove code duplication and should be the same performance-wise.