This is an archive of the discontinued LLVM Phabricator instance.

Add optimization to basic_string::assign for compile-time known constant values.
AbandonedPublic

Authored by mvels on Apr 23 2020, 2:50 PM.

Details

Reviewers
EricWF
ldionne
Group Reviewers
Restricted Project
Summary

This change optimizes the assign() methods for string where either the contents or lengths are compile time known constants. For small strings (< min_cap) we can execute the assignment entirely inline. For strings up to 128 bytes we allow the compiler to efficiently inline the copy operation after we call the offline __resize<>() method. Short / long branches are taken at the call site for better branch prediction and allowing FDO optimizations.

Benchmarks (unstable / google perflab):

name                                                old time/op             new time/op             delta
BM_StringAssignAsciiz_Empty_Opaque                  5.69ns ± 7%             5.97ns ± 7%     ~             (p=0.056 n=5+5)
BM_StringAssignAsciiz_Empty_Transparent             5.39ns ± 7%             0.79ns ± 8%  -85.36%          (p=0.008 n=5+5)
BM_StringAssignAsciiz_Small_Opaque                  11.2ns ± 5%             11.0ns ± 6%     ~             (p=0.548 n=5+5)
BM_StringAssignAsciiz_Small_Transparent             10.1ns ± 7%              1.0ns ± 8%  -89.76%          (p=0.008 n=5+5)
BM_StringAssignAsciiz_Large_Opaque                  23.5ns ± 7%             23.8ns ± 7%     ~             (p=0.841 n=5+5)
BM_StringAssignAsciiz_Large_Transparent             21.4ns ± 7%             12.7ns ± 7%  -40.83%          (p=0.008 n=5+5)
BM_StringAssignAsciiz_Huge_Opaque                    336ns ± 4%              327ns ± 7%     ~             (p=0.421 n=5+5)
BM_StringAssignAsciiz_Huge_Transparent               331ns ± 5%              324ns ± 7%     ~             (p=0.548 n=5+5)
BM_StringAssignAsciizMix_Opaque                     13.6ns ±10%             13.7ns ± 9%     ~             (p=0.690 n=5+5)
BM_StringAssignAsciizMix_Transparent                12.9ns ± 8%              3.6ns ± 8%  -71.82%          (p=0.008 n=5+5)

Diff Detail

Event Timeline

mvels created this revision.Apr 23 2020, 2:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2020, 2:50 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
mvels updated this revision to Diff 259719.Apr 23 2020, 2:52 PM

Adjusted for sizeof(value_type)

mvels edited the summary of this revision. (Show Details)
mvels edited the summary of this revision. (Show Details)Apr 23 2020, 2:58 PM
mvels updated this revision to Diff 259721.Apr 23 2020, 3:02 PM

Added proper constant for min short -> long promotion size

ldionne requested changes to this revision.Apr 24 2020, 4:16 AM
ldionne added a subscriber: ldionne.

Please make sure you update the ABI v2 abi list when you make these changes. It's currently broken (and this change will break it even more). You can run make check-cxx-abilist to see the failures.

Or, I guess we can also have a discussion to disable the abi list tests for unstable ABIs, since it's not that useful I guess.

This revision now requires changes to proceed.Apr 24 2020, 4:16 AM
mvels updated this revision to Diff 265492.May 21 2020, 6:34 AM

Simplified change to inlined optimizations.

mvels updated this revision to Diff 265494.May 21 2020, 6:45 AM

Fixed external instantiations for unstable (removed __resize)

mvels added a comment.May 21 2020, 6:49 AM

Please make sure you update the ABI v2 abi list when you make these changes. It's currently broken (and this change will break it even more). You can run make check-cxx-abilist to see the failures.

Or, I guess we can also have a discussion to disable the abi list tests for unstable ABIs, since it's not that useful I guess.

Thanks Luis, I noticed the ABI list checks are remove from unstable, thanks!

I will check with ericwf@ on further review, I scoped this change down to only the inlining part, optimizing capacity step ups and growth will follow in separate changes (our data indicates its a win)

mvels updated this revision to Diff 265570.May 21 2020, 12:09 PM

Restricted assign to __min_cap

Please make sure you update the ABI v2 abi list when you make these changes. It's currently broken (and this change will break it even more). You can run make check-cxx-abilist to see the failures.

Or, I guess we can also have a discussion to disable the abi list tests for unstable ABIs, since it's not that useful I guess.

Thanks Luis, I noticed the ABI list checks are remove from unstable, thanks!

They are removed from the unstable ABI, but not from the ABI v2. Those are different things (and I guess it doesn't really make sense for them to be different). But we should either decide that we don't test abi lists for ABI v2, or fix it. I suggest we fix it for now in order to make progress on this. You just need to update the ABI list for v2, it shouldn't be difficult.

mvels added a comment.May 21 2020, 1:43 PM

They are removed from the unstable ABI, but not from the ABI v2. Those are different things (and I guess it doesn't really make sense for them to be different). But we should either decide that we don't test abi lists for ABI v2, or fix it. I suggest we fix it for now in order to make progress on this. You just need to update the ABI list for v2, it shouldn't be difficult.

Ah, gotcha, I only tested V1 and unstable, I always forget there is this weird child called 'V2' :)

They are removed from the unstable ABI, but not from the ABI v2. Those are different things (and I guess it doesn't really make sense for them to be different). But we should either decide that we don't test abi lists for ABI v2, or fix it. I suggest we fix it for now in order to make progress on this. You just need to update the ABI list for v2, it shouldn't be difficult.

Ah, gotcha, I only tested V1 and unstable, I always forget there is this weird child called 'V2' :)

Well, I think we should probably remove that notion, or merge it with the "unstable" ABI. It's not really a thing since V2 is basically unstable AFAICT.

Just fix it and we can improve the situation (i.e. probably remove it) later.

Well, I think we should probably remove that notion, or merge it with the "unstable" ABI. It's not really a thing since V2 is basically unstable AFAICT.

Just fix it and we can improve the situation (i.e. probably remove it) later.

I have no way to verify / test this (cross compiling apple darwin v2 looks to be impossible?), and guessing the mangled names from x86 compiles seems dubious as well given the diffs I see on apple v1 vs linux v1

Shall I simply remove the x86_64-apple-darwin.v2.abilist insteasd? (I noticed these are simple 'check if present' targets)

Well, I think we should probably remove that notion, or merge it with the "unstable" ABI. It's not really a thing since V2 is basically unstable AFAICT.

Just fix it and we can improve the situation (i.e. probably remove it) later.

I have no way to verify / test this (cross compiling apple darwin v2 looks to be impossible?), and guessing the mangled names from x86 compiles seems dubious as well given the diffs I see on apple v1 vs linux v1

Shall I simply remove the x86_64-apple-darwin.v2.abilist insteasd? (I noticed these are simple 'check if present' targets)

Yeah, I guess that's fine. Nobody uses it anyway.

mvels updated this revision to Diff 267245.May 29 2020, 8:07 AM

Removed Apple V2 abi list

mvels updated this revision to Diff 267248.May 29 2020, 8:13 AM

Update diff after master pull

ldionne added inline comments.May 29 2020, 11:45 AM
libcxx/include/string
2306–2307

It used to be __n <= ..., now it's __n < ... -- what am I missing?

mvels marked 2 inline comments as done.Jun 18 2020, 12:27 PM
mvels added inline comments.
libcxx/include/string
2306–2307

The initial version was aimed at optimizing 2 things:

  • short optimization, i.e., [n] chars < min_cap guaranteed to fit inside any SSO or heap allocated string
  • small memcpy optimization, i.., n <= 128 which when inlined can be vectorized efficiently (i.e., a rough max memcpy inline)

For this change we should focus only on the former. The latter is better picked up as an FDO / LLVM compiler inlining opportunity.

ldionne accepted this revision.Jun 18 2020, 12:37 PM

I think you answered all my questions. Feel free to wait for Eric if he wants to review it too, but by now you know our string better than I do :)

This revision is now accepted and ready to land.Jun 18 2020, 12:37 PM
EricWF accepted this revision.Jun 18 2020, 12:53 PM
mvels updated this revision to Diff 271828.Jun 18 2020, 1:25 PM
mvels marked an inline comment as done.

Optimed ternary op for ascizz string -> always evalutate length in inlined constant string.

mvels added a comment.Jun 18 2020, 1:28 PM

Thanks for the reviews!

I am pretty confident about these changes, and the numbers held up well on our benchmarks. I'm running an internal global presubmit on our depot to double check for any weirdness before committing.

mvels updated this revision to Diff 272054.Jun 19 2020, 6:56 AM

Uploading clean diff

mvels added a comment.Jun 19 2020, 6:58 AM

Uploaded clean diff in case harbormaster is confused about diffs

mvels added a comment.Jun 19 2020, 7:48 AM

I'm mystified why the build fails, it fails on applying the patch:

$ ./scripts/buildkite/apply_patch.sh
repository exist, will reuse
working dir /mnt/disks/ssd0/agent/llvm-project-fork
Syncing local, origin and upstream...
refresh of master branch completed
Getting dependencies of 271828
Base revision is cd2553de77f2c3206deaa261a15cc7520ff2ff56
git reset, git cleanup...
Analyzing D78763
Applying diff 271828 for revision D78763...
Traceback (most recent call last):
  File "scripts/phabtalk/apply_patch2.py", line 331, in <module>
    patcher.run()
  File "scripts/phabtalk/apply_patch2.py", line 124, in run
    self._apply_diff(self.diff_id, revision_id)
  File "scripts/phabtalk/apply_patch2.py", line 240, in _apply_diff
    raise Exception('Applying patch failed:\n{}'.format(proc.stdout + proc.stderr))
Exception: Applying patch failed:
error: patch failed: libcxx/include/string:1528
error: libcxx/include/string: patch does not apply
error: patch failed: libcxx/include/__string:184
error: libcxx/include/__string: patch does not apply

Which I have no clue .... why? I locally pulled, confirmed (and uploaded) the diff from "git diff origin", yet it fails... Not sure if I should dig into rev cd2553de77f2c3206deaa261a15cc7520ff2ff56 or the whys here?

mvels added a comment.Jun 19 2020, 8:33 AM

Looks like harbormaster builds use arcanist, and arcanist has **magic ** making awful diffs?

For example, I cleanly pulled both my local master and local branch.
git diff against HEAD and origin both confirm the true diff

arc diff --preview gives ....... **magic*** (https://reviews.llvm.org/differential/diff/272089/)

I have no idea how this all works, or to have phabricator understand / build with a clean full diff against head.....

I'll likely ignore this, if this really breaks build we'll roll back, but does not look like I get any good signal here (and local builds all work fine for me)

mvels updated this revision to Diff 272093.Jun 19 2020, 8:47 AM

Running arc diff --update D78763

mvels updated this revision to Diff 272142.Jun 19 2020, 11:07 AM

arc diff with --head=HEAD 'HEAD~1'

I'm going to abandon this diff, seems arcanist, phabricator and I can't see each other eye to eye...... :\ which makes me sad,....

Create new diff (same as this) in https://reviews.llvm.org/D82220

I'm going to abandon this diff, seems arcanist, phabricator and I can't see each other eye to eye...... :\ which makes me sad,....

Create new diff (same as this) in https://reviews.llvm.org/D82220

And, another 'arcanist suprise' :\

From a clean local branch: https://reviews.llvm.org/D82221

Abandoning this change, sorry for all the noise, one day I am sure all this will be over and we all will be using plain git and all that

mvels abandoned this revision.Jun 19 2020, 11:38 AM
libcxx/include/string