Details
- Reviewers
• wash EricWF - Group Reviewers
Restricted Project - Commits
- rGc30d5101f14f: [libc++] Optimize the number of assignments in std::exclusive_scan
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
That's worse; it replaced 2n assignments with n assignments, n constructions, and n destructions. Interestingly, assigning over __saved is an optimization I should put into MSVC.
My Twitter comment was about the last init = _VSTD::move(tmp) when first == last;
for (;;)
{
_Tp __tmp = __b(__init, *__first);
*__result = __init;
++__result;
if (++__first == __last) break;
__init = _VSTD::move(__tmp);
}saves the last assignment.
| libcxx/include/numeric | ||
|---|---|---|
| 316–321 | The compiler can optimize a for loop much better than a do-while loop. | |
I have not run either through tests yet...
I should have done that before posting, tests fail with the 2 iterator version because it doesn't handle the self-aliasing case. This version combines the good parts of both our implementations and passes tests.
if (_UFirst != _ULast) {
_Ty _Tmp(_Reduce_op(_Val, *_UFirst)); // temp to enable _First == _Dest, also requirement missing
for (;;) {
*_UDest = _Val;
++_UDest;
++_UFirst;
if (_UFirst == _ULast) {
break;
}
_Val = _STD move(_Tmp); // Requirement missing from N4713
_Tmp = _Reduce_op(_Val, *_UFirst);
}
}| libcxx/include/numeric | ||
|---|---|---|
| 314 | @BillyONeal This is equivalent to *_UDest = _Val; in your version, and I think you could instead use *_UDest = STD move(_Val);. | |
| libcxx/include/numeric | ||
|---|---|---|
| 311 | Wouldn't you want to move this line into the loop, and so remove the assignment to __tmp at the end of the loop? | |
| libcxx/include/numeric | ||
|---|---|---|
| 311 | As Billy mentioned in an earlier comment, this would trade 1 construction, N assignments and 1 destruction for N constructions, and N destructions. If we think about a string, for example, I believe this version with N assignments would be faster. WDYT? | |
Wouldn't you want to move this line into the loop, and so remove the assignment to __tmp at the end of the loop?
It will remove code duplication and should be the same performance-wise.