This is an archive of the discontinued LLVM Phabricator instance.

ADT: Take small enough, trivially copyable T by value in SmallVector
AbandonedPublic

Authored by dexonsmith on Nov 13 2020, 3:34 PM.

Details

Summary

Change SmallVector functions taking a const T& parameter for
insertion to take T instead when T is trivially copyable and "small
enough" (the heuristic is sizeof(T) <= 2 * sizeof(void *)). When it
applies it avoids iterator invalidation gotchas.

For insert, layer insert_one_maybe_copy between insert and
insert_one_impl in order to copy Elt when we want to take it by
value.

For emplace_back, forward to push_back when we want to take the
parameter by value.

Note: the intent here is to help unblock https://reviews.llvm.org/D87326:

Overhead is likely caused by slightly more code needing to be emitted. However there is a glaring hole here that SmallVector isn't very smart about how it should take paramaters for calls like push_back.
For small trivial types it makes sense to take those by value and in those instances the current implementation of just calling grow would have no issue about reference invalidation as there no longer a reference.
Downside is it would require copying most of the small vector implementation code for this special case, kind of like what currently happens for SmallVectorTemplateBase with trivially copyable types.

Given that a lot of use cases of SmallVector seem to be for storing pointers or as the base class for SmallString, this would remove this specific overhead in these cases.

Diff Detail

Event Timeline

dexonsmith created this revision.Nov 13 2020, 3:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 13 2020, 3:34 PM
Herald added a subscriber: ributzka. · View Herald Transcript

This is from an idea @njames93 posted in https://reviews.llvm.org/D87326. I've just built/run SupportTests; can/will run check-all and/or write extra tests if others think this is worth committing.

See also https://reviews.llvm.org/D84293 which is in the same space.

njames93 added inline comments.Nov 13 2020, 4:15 PM
llvm/include/llvm/ADT/SmallVector.h
349

In relation to my comment about insert, it would be a lot easier to implement if this condition was extracted out into a static constexpr variable

static constexpr bool IsPushByVal = sizeof(T) <= 16;

A second point is maybe this should be based loosely on the target. Though I'm not super sure on that, WDYT?

sizeof(T) <= 2 * sizeof(void*)
637–642

This code is redundant if Elt is passed by value, however I don't think all(or any) compilers are smart enough to see that, wrapping this around an if block dependent on whether Elt is passed by ref would in all likelihood be a win. Same goes for the other call to insert below.

dexonsmith added inline comments.Nov 13 2020, 4:45 PM
llvm/include/llvm/ADT/SmallVector.h
637–642

Good point; I’ll update on Monday.

Do you have any performance and binary size results? I suspect that these are frequently just inlined + sroa'd so this won't make much of a difference in the end.

llvm/include/llvm/ADT/SmallVector.h
349

+1, I think that basing it on sizeof(void*) probably makes sense.

Won't we end up with a different behavior (with respect to iterator invalidation) based on the platform?

dexonsmith retitled this revision from ADT: Take small enough, trivially copyable T by value in SmallVector to ADT: Take small enough, trivially copyable T by value in SmallVector (otherwise, assert).
dexonsmith edited the summary of this revision. (Show Details)

Updates:

  • Switched size limit to 2 * sizeof(void *).
  • Rebased on top of 2c196bbc6bd897b3dcc1d87a3baac28e1e88df41.
  • Added assertions catching invalidation when the by-value optimization doesn't kick in.
  • Added tests.

Note that the new assertions caught a bug in llvm/utils/TableGen/CodeGenSchedule.cpp and I found another related one by chance (not caught by these assertions though, since emplace_back is opaque). I'm still building, so once the tests run I assume there will be a number of other failures.

dexonsmith marked 3 inline comments as done.Nov 17 2020, 1:48 PM

Won't we end up with a different behavior (with respect to iterator invalidation) based on the platform?

Yes; that's true of both choices so far (<= 2 * sizeof(void *) and <= 16). It's not ideal, but maybe it's better than the status quo? Curious if you have an idea for something that would be stable.

A third option that doesn't change behaviour per platform is to always take by-value when it's trivially copyable. Not sure how much that pessimizes large T.

dexonsmith added inline comments.Nov 17 2020, 1:49 PM
llvm/include/llvm/ADT/SmallVector.h
349

BTW, the updated patch takes both of these suggestions.

637–642

Updated patch covers this.

Note that the new assertions caught a bug in llvm/utils/TableGen/CodeGenSchedule.cpp and I found another related one by chance (not caught by these assertions though, since emplace_back is opaque). I'm still building, so once the tests run I assume there will be a number of other failures.

Actually, that was it. Probably helps that for small T the invalidation was "fixed". Maybe there would be more failures on 32-bit platforms, and/or on a bootstrap; I'll run a bootstrap before committing but don't want to spend the time if this is DOA anyway.

Do you have any performance and binary size results?

Nope, happy to collect something though. What do you think we need to see?

Note that right now the "fix invalidation" and "assert if invalid" are tied together in the same patch. Of course they're separable, but I'd be tempted to keep them together if there's a path to landing both. But if the optimization is DOA, I imagine we'd still want the assertion (and fixes for any extra assertion failures).

Note that the new assertions caught a bug in llvm/utils/TableGen/CodeGenSchedule.cpp and I found another related one by chance (not caught by these assertions though, since emplace_back is opaque). I'm still building, so once the tests run I assume there will be a number of other failures.

Actually, that was it. Probably helps that for small T the invalidation was "fixed". Maybe there would be more failures on 32-bit platforms, and/or on a bootstrap; I'll run a bootstrap before committing but don't want to spend the time if this is DOA anyway.

Do you have any performance and binary size results?

Nope, happy to collect something though. What do you think we need to see?

Note that right now the "fix invalidation" and "assert if invalid" are tied together in the same patch. Of course they're separable, but I'd be tempted to keep them together if there's a path to landing both. But if the optimization is DOA, I imagine we'd still want the assertion (and fixes for any extra assertion failures).

I think we can probably get a good signal from:

  1. Clang execution time compiling some cpp file at by-value thresholds of 0, 1 * sizeof(void*), 2 * sizeof(void*) would be enough to get a signal on whether this is measurable performance wise.
  2. Clang binary size for those 3 cases.
dexonsmith edited the summary of this revision. (Show Details)

Rebase on top of https://reviews.llvm.org/D91674, which streamlines insert, and handle emplace_back in a similar way.

I think we can probably get a good signal from:

  1. Clang execution time compiling some cpp file at by-value thresholds of 0, 1 * sizeof(void*), 2 * sizeof(void*) would be enough to get a signal on whether this is measurable performance wise.
  2. Clang binary size for those 3 cases.

That sounds good. I'll throw in a version with no threshold.

I think we can probably get a good signal from:

  1. Clang execution time compiling some cpp file at by-value thresholds of 0, 1 * sizeof(void*), 2 * sizeof(void*) would be enough to get a signal on whether this is measurable performance wise.
  2. Clang binary size for those 3 cases.

That sounds good. I'll throw in a version with no threshold.

Results of du -k path/to/clang followed by timing to compile llvm/lib/Target/X86/X86ISelLowering.cpp three times in a row:

116216	threshold-0/build/bin/clang-12
       30.31 real        29.96 user         0.34 sys
       30.10 real        29.77 user         0.31 sys
       30.48 real        30.15 user         0.32 sys
116200	threshold-1-ptr/build/bin/clang-12
       30.58 real        30.23 user         0.33 sys
       30.17 real        29.79 user         0.37 sys
       30.28 real        29.94 user         0.33 sys
116184	threshold-2-ptr/build/bin/clang-12
       30.15 real        29.85 user         0.30 sys
       29.98 real        29.66 user         0.31 sys
       30.02 real        29.70 user         0.32 sys
116216	threshold-none/build/bin/clang-12
       30.15 real        29.83 user         0.31 sys
       30.16 real        29.82 user         0.34 sys
       30.21 real        29.90 user         0.30 sys

Sizes are all really close, around 116MB. Best of three runs for user time seems to favour 2 * sizeof(void *) very slightly, but I'm not sure it's significant:

  • 29.77s: threshold of 0 (never by-value)
  • 29.79s: threshold of sizeof(void*)
  • 29.66s: threshold of 2 * sizeof(void*)
  • 29.82s: no threshold (always by-value when trivially copyable)

Happy to collect more data if that's useful.

dexonsmith retitled this revision from ADT: Take small enough, trivially copyable T by value in SmallVector (otherwise, assert) to ADT: Take small enough, trivially copyable T by value in SmallVector.
dexonsmith edited the summary of this revision. (Show Details)

Update: I decided to split out the assertions to https://reviews.llvm.org/D91744 after all. This is just rebasing on top of that.

Can you make more clear in the description the motivation for this patch? Is this mostly about iterator invalidation? Is this expected to make code safer?
I'm not sure about making the mental model a bit harder by having different iterator invalidation behavior implicitly based on T, WDYT?

Can you make more clear in the description the motivation for this patch? Is this mostly about iterator invalidation? Is this expected to make code safer?

Oh, I did this because I happened upon this comment in https://reviews.llvm.org/D87326:

Overhead is likely caused by slightly more code needing to be emitted. However there is a glaring hole here that SmallVector isn't very smart about how it should take paramaters for calls like push_back.
For small trivial types it makes sense to take those by value and in those instances the current implementation of just calling grow would have no issue about reference invalidation as there no longer a reference.
Downside is it would require copying most of the small vector implementation code for this special case, kind of like what currently happens for SmallVectorTemplateBase with trivially copyable types.

Given that a lot of use cases of SmallVector seem to be for storing pointers or as the base class for SmallString, this would remove this specific overhead in these cases.

I can update the description to be more clear if we decide I should commit this.

I'm not sure about making the mental model a bit harder by having different iterator invalidation behavior implicitly based on T, WDYT?

I'm not sure either, but I think it's worth doing if it unblocks @njames93's broader fix for iterator invalidation (at which point the mental model is simple again). @njames93 , WDYT?

Oh I didn't read about https://reviews.llvm.org/D87326 before, isn't the work there making obsolete all of the assertions you (and me before) just added by making this behavior safe?

dexonsmith added 1 blocking reviewer(s): njames93.Nov 18 2020, 6:01 PM

Oh I didn't read about https://reviews.llvm.org/D87326 before, isn't the work there making obsolete all of the assertions you (and me before) just added by making this behavior safe?

Yes, but that revision is blocked on landing due to compile-time regressions. It needs some number of commits to land first; maybe this is one of them; and I couldn't figure out how to test this one reasonably without adding the assertions; and it seems reasonable to fix existing bugs (with the assertions) while we wait for the beautiful future in case it doesn't come tomorrow.

Making @njames93 a blocking reviewer since I think we shouldn't move forward unless there appears to be a path to fixing invalidation generally (this seems like a useful incremental step, but I agree with @mehdi_amini that if things stop here the mental model is confusing for little gain).

dexonsmith edited the summary of this revision. (Show Details)Nov 18 2020, 6:07 PM

Oh I didn't read about https://reviews.llvm.org/D87326 before, isn't the work there making obsolete all of the assertions you (and me before) just added by making this behavior safe?

Assertions are great now as a quick fix, however if we can get to the point where SmallVector is able to handle those cases without the compile time regressions observed in D87326 we should strive to use that instead.
It removes contracts that are only enforced in assertion enabled builds which have the potential to cause undefined behaviour in release builds if they slip through the cracks.

Results of du -k path/to/clang followed by timing to compile llvm/lib/Target/X86/X86ISelLowering.cpp three times in a row:

116216	threshold-0/build/bin/clang-12
       30.31 real        29.96 user         0.34 sys
       30.10 real        29.77 user         0.31 sys
       30.48 real        30.15 user         0.32 sys
116200	threshold-1-ptr/build/bin/clang-12
       30.58 real        30.23 user         0.33 sys
       30.17 real        29.79 user         0.37 sys
       30.28 real        29.94 user         0.33 sys
116184	threshold-2-ptr/build/bin/clang-12
       30.15 real        29.85 user         0.30 sys
       29.98 real        29.66 user         0.31 sys
       30.02 real        29.70 user         0.32 sys
116216	threshold-none/build/bin/clang-12
       30.15 real        29.83 user         0.31 sys
       30.16 real        29.82 user         0.34 sys
       30.21 real        29.90 user         0.30 sys

Sizes are all really close, around 116MB. Best of three runs for user time seems to favour 2 * sizeof(void *) very slightly, but I'm not sure it's significant:

  • 29.77s: threshold of 0 (never by-value)
  • 29.79s: threshold of sizeof(void*)
  • 29.66s: threshold of 2 * sizeof(void*)
  • 29.82s: no threshold (always by-value when trivially copyable)

Happy to collect more data if that's useful.

Thanks! Looks in the noise. I guess inline+SROA really does clean it up very consistently :)

I don't have any objections to this patch. (It sounds like this ties into another workstream, so I'll let other folks give the final sign-off)

Thanks! Looks in the noise. I guess inline+SROA really does clean it up very consistently :)

I don't have any objections to this patch. (It sounds like this ties into another workstream, so I'll let other folks give the final sign-off)

Thanks for the review!

I'm still a bit concerned about the point Mehdi brought up. I've posted an alternative at https://reviews.llvm.org/D91837 which breaks down the problem in a different way; if that doesn't work out (e.g., lots of perf regressions) I'll circle back here.

dexonsmith abandoned this revision.Dec 23 2020, 2:43 PM

Abandoning this in favour of:

(which are split out from https://reviews.llvm.org/D91837)

The new work fixes reference invalidation for most of the same APIs (skipping emplace_back so far) but for all T.