This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
docs/
8/14
ProgrammersManual.rst
-
include/llvm/ADT/
-
llvm/
-
ADT/
1/1
SmallVector.h
-
unittests/ADT/
-
ADT/
3/5
SmallVectorTest.cpp

Differential D90884

[SmallVector] Add `SVec<T>` / `Vec<T>` convenience wrappers.
AbandonedPublic

Authored by silvas on Nov 5 2020, 2:14 PM.

Download Raw Diff

Details

Reviewers

rnk
mehdi_amini
rriddle
rengolin
dexonsmith

Summary

See the update to ProgrammersManual.rst for the guidelines for
using these types. tl;dr: SVec for "small" vectors on the stack, Vec for
vectors on the heap / in data structures.

SVec<T> is a convenience alias for SmallVector<T, N> with N

chosen automatically according to a heuristic described below.

Vec<T> is a convenience alias for SmallVector<T, 0>.

These are expected to help systematically improve SmallVector use in the
LLVM codebase, which has historically been plagued by semi-arbitrary /
cargo culted N parameters, often leading to bad outcomes due to
excessive sizeof(SmallVector<T, N>). These aliases also make
programming more convenient by avoiding edit/rebuild cycles due to
forgetting to type the N parameter.

Notes on the SVec<T> default inline elements heuristic:

The initial policy is to guarantee a maximum upper bound of 64 bytes for
sizeof(SVec<T>).

This tends to match my intuition: depending on the element size, you
want to be careful about your small size -- basically we just want to
ensure not too much stack growth (or heap allocation growth for the
heap-allocated case). Also, looking back through llvm-dev, it looks like
Reid suggested something similar in
this thread.

This default also tends to avoid pathological situations if a programmer
does SVec<SVec<T>> (also mentioned in that thread as problematic). For
these, the sizeof the outer SVec will not amplify the sizeof the
inner SVec. Of course, users can do SVec<Vec<T>> to indicate that the
inner SmallVector's should have no inline elements. Then the outer SVec
automatically adapt, and fit some inner SmallVector's inline.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	370 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp

Event Timeline

silvas created this revision.Nov 5 2020, 2:14 PM

Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2020, 2:14 PM

Herald added subscribers: llvm-commits, dexonsmith. · View Herald Transcript

silvas requested review of this revision.Nov 5 2020, 2:14 PM

silvas added reviewers: rnk, mehdi_amini, rriddle.Nov 5 2020, 2:15 PM

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

llvm/include/llvm/ADT/SmallVector.h
896

In D90884#2377497, @mehdi_amini wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Usually SmallVector's are on the stack, so all cache lines are expected to be already touched anyway. So I don't see much value in considering cache line sizes for the default size.

The default payload is not 6 pointers in all cases. E.g. for smaller types on 64-bit platforms, we will use a 32-bit size/capacity, which on 64-bit platforms results in a header that is 2 pointer in size, resulting in a 4 pointer inline storage.

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

SmallVector is definitely used for many types that are larger than void*, so we don't want to prohibit that with a static_assert.

SmallVector with a very small N is still advantageous over std::vector because SmallVector has a POD optimization.

Address typo

silvas marked an inline comment as done.Nov 5 2020, 2:47 PM

In D90884#2377497, @mehdi_amini wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

I'm not sure I understand this suggestion. There are lots of SmallVector<LargeTy> instantiations.

FWIW, I'm in favor of a conservative, low default size here. LLVM developers tend to use SmallVector for everything, even when it's inappropriate (many heap cases).

llvm/unittests/ADT/SmallVectorTest.cpp
1019	If there's no runtime code, is there any reason not to stamp this out as a series of static_asserts, or a series of explicit template instantiations of a class template that does nothing but static_assert about its template arguments? Gtest's test registration is very heavyweight.

In D90884#2377537, @rnk wrote:

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

I'm not sure I understand this suggestion. There are lots of SmallVector<LargeTy> instantiations.

Sorry I wasn't clear: I didn't mean to prevent SmallVector<LargeTy, N> but I was trying to indicate that it may not be a good default for N to be 0 ever. So I would do something like:

template <typename ElementTy> constexpr size_t calculateDefaultSmallSize() {
  static_assert(sizeof(ElementTy) < 8, "Please provide a default value for SmallVector when the element type is \"large\"");
  ...

In D90884#2377537, @rnk wrote:

In D90884#2377497, @mehdi_amini wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

The current logic would make sizeof(SmallVector) < 3 * 16B, which is 48B. The current logic is a heuristic like "if you're willing to put the header on the stack, you're willing to put 3 headers on the stack" (not 3 *additional* headers; sorry for injecting the off-by-one into this discussion)

In D90884#2377537, @rnk wrote:

In D90884#2377497, @mehdi_amini wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

I find the multiple of header-size not so easy to reason about, since the header can be any of 12B, 16B, or 24B. I think I slightly prefer a limit like <= 64 or <= 8 * sizeof(void*).

FWIW, I'm in favor of a conservative, low default size here. LLVM developers tend to use SmallVector for everything, even when it's inappropriate (many heap cases).

Agreed; I've come around over time. I wonder about defaulting to SmallVector<T, 0>, getting POD optimization / compressed header / SmallVectorImpl interop, but no actual small storage unless you ask for it (I think this was your suggestion in the thread linked above).

In D90884#2377565, @mehdi_amini wrote:
In D90884#2377537, @rnk wrote:

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

I'm not sure I understand this suggestion. There are lots of SmallVector<LargeTy> instantiations.

Sorry I wasn't clear: I didn't mean to prevent SmallVector<LargeTy, N> but I was trying to indicate that it may not be a good default for N to be 0 ever. So I would do something like:
template <typename ElementTy> constexpr size_t calculateDefaultSmallSize() {
  static_assert(sizeof(ElementTy) < 8, "Please provide a default value for SmallVector when the element type is \"large\"");
  ...

That seems artificially restrictive. What "bug" are you trying to prevent with that static_assert? It just seems to be a pain for users. Also <8 is way too strict. Maybe <=128 or something. Large types are not uncommon

In D90884#2377574, @silvas wrote:

That seems artificially restrictive. What "bug" are you trying to prevent with that static_assert? It just seems to be a pain for users. Also <8 is way too strict. Maybe <=128 or something. Large types are not uncommon

My take is that if you have a large type and you want a SmallVector,, you should be explicit about the storage. I rather read SmallVector<LargeType, 4> than SmallVector<LargeType> and have implicitly N=0.

I'm not sure I understand the size correctly, but right now any type > 24B would yield a N=0, wouldn't it?

In D90884#2377570, @dexonsmith wrote:

In D90884#2377537, @rnk wrote:

In D90884#2377497, @mehdi_amini wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

I find the multiple of header-size not so easy to reason about, since the header can be any of 12B, 16B, or 24B. I think I slightly prefer a limit like <= 64 or <= 8 * sizeof(void*).

Yeah, it's becoming clear that that's too hard to reason about. I propose that we guarantee that sizeof(SmallVector<T>) <= 8 * sizeof(void*). On 64bit platforms, gives us 4 pointers, two std::strings.

FWIW, I'm in favor of a conservative, low default size here. LLVM developers tend to use SmallVector for everything, even when it's inappropriate (many heap cases).

Agreed; I've come around over time. I wonder about defaulting to SmallVector<T, 0>, getting POD optimization / compressed header / SmallVectorImpl interop, but no actual small storage unless you ask for it (I think this was your suggestion in the thread linked above).

I do think we need some small storage, as an overwhelming number of use cases really do have that intent (at least all of the ones that I have written in recent memory). I think this default protects us from some of the more pathological cases like SmallVector<SmallVector<T>> which, without the max-absolute-size default, just lead to an exponential blowup per "layer" of smallvector.

In D90884#2377598, @mehdi_amini wrote:

In D90884#2377574, @silvas wrote:

That seems artificially restrictive. What "bug" are you trying to prevent with that static_assert? It just seems to be a pain for users. Also <8 is way too strict. Maybe <=128 or something. Large types are not uncommon

My take is that if you have a large type and you want a SmallVector,, you should be explicit about the storage. I rather read SmallVector<LargeType, 4> than SmallVector<LargeType> and have implicitly N=0.

That makes sense. As a matter of practicality, I think I agree with Duncan here that N=0 silently isn't too bad. (though I'm still wary of making N=0 the default).

I'm not sure I understand the size correctly, but right now any type > 24B would yield a N=0, wouldn't it?

It depends on the header size (will change that to an absolute size, per Duncan's suggestion).

Harbormaster completed remote builds in B77781: Diff 303259.Nov 5 2020, 3:27 PM

Harbormaster completed remote builds in B77783: Diff 303269.Nov 5 2020, 3:32 PM

dexonsmith mentioned this in D90882: [SmallVector] Add range versions of append/assign..Nov 5 2020, 3:36 PM

Add 64B upper bound, per Duncan's suggestion.

Harbormaster completed remote builds in B77791: Diff 303284.Nov 5 2020, 4:45 PM

silvas marked an inline comment as done.Nov 10 2020, 11:23 AM

silvas edited the summary of this revision. (Show Details)

mehdi_amini added inline comments.Nov 10 2020, 11:38 AM

llvm/unittests/ADT/SmallVectorTest.cpp
1013	FWIW I am still unconvinced that it is a good idea to use the default size on large objects. I'm not super comfortable with the implicit `0` as that seems to encourage mis-uses of the SmallVector.

silvas added inline comments.Nov 10 2020, 12:03 PM

llvm/unittests/ADT/SmallVectorTest.cpp
1013	I don't feel strongly about this, except on the usability side. But maybe that's a minor concern, as very large types are probably not that common. One real issue is that rejecting SmallSize==0 could result in unexpected static_assert's across platforms: sizeof(header) and sizeof(ValueType) will differ across platforms, and both affect whether the default small size will be 0. I don't see an obvious solution to that, and it seems like it could cause bot breakages. Thoughts?

Isn't the real answer a profiler and data collection.

Which SmallVector's are too large?
Which SmallVector's create heap allocations?

mehdi_amini added inline comments.Nov 10 2020, 12:13 PM

llvm/unittests/ADT/SmallVectorTest.cpp
1013	I agree the risk of portability is a concern. I'm not convinced that this is the right direction for SmallVector here though: it does not seem to me that there is an obvious universal default that always makes sense here. To me this feel like making an API that is "too easy to misuse".

In D90884#2386747, @tschuett wrote:

Isn't the real answer a profiler and data collection.

Which SmallVector's are too large?

Which SmallVector's create heap allocations?

I generally agree. I think Duncan did some of this in the past. Unfortunately, we don't have a way to en-mass tune SmallVector sizes. I think if we can adopt a default smallsize strategy, then we can start to do some of that by tuning the default strategy.

I thought the other way around. Don't use a default size, but fix cases of misuse.

silvas added inline comments.Nov 10 2020, 12:22 PM

llvm/unittests/ADT/SmallVectorTest.cpp
1013	AFAIK, the main misuse of SmallVector is to make its sizeof too big by not paying attention to sizeof(ValueType) and the number of inline elements. This patch specifically bounds that misuse by using a fixed upper bound. (that is, it seems to make the situation strictly better) I would categorize "using SmallVector but unexpectedly getting 0 inline elements" as a much more minor kind of misuse.

In D90884#2386770, @tschuett wrote:

I thought the other way around. Don't use a default size, but fix cases of misuse.

That's another approach. The downside is:

loses on the convenience of having a default SmallSize
cases of misuse are continuously being added, so this is a constant maintenance task
once we fix the big low hanging fruit, performance issues due to small size choices can be "death by 1000 papercuts", so meaningful tuning can only happen in aggregate.

Standing back a bit, I think most C++ programmers use std::vector and call it a day, so perhaps zero really is the sensible default. Are those programmers missing out and doing too many heap allocations? I tend to think not. I think setting an inline storage size without profiling is often a premature optimization. I think it's probably a lot easier to notice small, short-lived vectors in a profiler than it is to notice excessive memory usage due to poorly tuned SmallVectors on the heap.

I think "SmallVector" doesn't really mean "vector with inline storage". It really means "LLVM's optimized vector reimplementation", see the usage of uint32_t sizes and capacities.

I think it makes sense to add a low, default size to SmallVector. Both this heuristic and zero seem fine to me. This makes it easy to profile, find slow, small vector heap allocations, and tune the size later without changing APIs from std::vector to SmallVector.

I think setting an inline storage size without profiling is often a premature optimization.

I don't necessarily agree with this: I believe this is the kind of situation where you end up with flat profile where malloc shows up everywhere, and then we would spend a huge amount of resources in optimizing malloc() (like some company specialized in moving protos where optimizing memcpy is easier than avoiding the copy in the first place ;)).
I remember seeing this in clang in the past with strlen taking a few percent on the profile, but it was really hard to fix it because of how widespread it is in the codebase: this is why idioms/patterns around APIs are important.
(also it isn't only about the malloc time, but I suspect the caching behavior of using the stack for short lived value likely help).

At the moment, I much prefer the current behavior and our usage of SmallVector to a "std::vector with implicit inlined storage sometimes, maybe, but you don't really know": when I read or review LLVM code I know what it is doing without the need for too much more context (I see it a bit similarly to our guideline on not abusing auto for readability).

In D90884#2387054, @mehdi_amini wrote:

I think setting an inline storage size without profiling is often a premature optimization.

I don't necessarily agree with this: I believe this is the kind of situation where you end up with flat profile where malloc shows up everywhere, and then we would spend a huge amount of resources in optimizing malloc() (like some company specialized in moving protos where optimizing memcpy is easier than avoiding the copy in the first place ;)).
I remember seeing this in clang in the past with strlen taking a few percent on the profile, but it was really hard to fix it because of how widespread it is in the codebase: this is why idioms/patterns around APIs are important.
(also it isn't only about the malloc time, but I suspect the caching behavior of using the stack for short lived value likely help).

Agreed on the flat profile. I think that tuning this must be done in aggregate, and having a useful default here is the key to enabling that. We can make the policy arbitrarily complex, like having a SmallVectorTraits or something, where e.g. Value* can specialize it to say something like "most vectors of Value* are of operands, which are usually at most 3 operands" (don't know if that's true btw).

At the moment, I much prefer the current behavior and our usage of SmallVector to a "std::vector with implicit inlined storage sometimes, maybe, but you don't really know": when I read or review LLVM code I know what it is doing without the need for too much more context (I see it a bit similarly to our guideline on not abusing auto for readability).

If I write a std::vector on the stack in a patch, I think most reviewers will ask "why not SmallVector?" and even suggest using SmallVector by default. And then most people will just add a random smallsize and nobody cares. My favorite is 6.

I think that it's pretty clear that as a community we are using SmallVector as a vector replacement -- it's the default, and we should make it convenient to use, and make the default usage "just work" and avoid pitfalls (the main one being unexpectedly large sizeof).

Actually, our documented policy is "choose the first one from this list" and SmallVector is earlier in the list than std::vector. https://llvm.org/docs/ProgrammersManual.html#sequential-containers-std-vector-std-list-etc

What about a trait mechanism? We can provide a trait that works for all elemental type, and then it is just a matter for the user to define a trait defining a default size for their type, like they have to define a trait to use a type as DenseMap hash key?

In D90884#2387099, @mehdi_amini wrote:

What about a trait mechanism? We can provide a trait that works for all elemental type, and then it is just a matter for the user to define a trait defining a default size for their type, like they have to define a trait to use a type as DenseMap hash key?

I think that could work, but would probably be an addendum to a universal default based on bounding sizeof(SmallVector). It seems easy to add later.

The DenseMap case is different because the traits are actually needed for correctness. Here we are just talking about an optimization.

Random other proposal in the brainstorming here: introduce a subclass llvm::Vector that inherit from SmallVector and define the size as never exceeding 64B. SmallVector would always have an explicit size and Vector never: that at least address the case someone using SmallVector<LargeSize> and getting an implicit and unexpected N=0

In D90884#2387127, @mehdi_amini wrote:

Random other proposal in the brainstorming here: introduce a subclass llvm::Vector that inherit from SmallVector and define the size as never exceeding 64B. SmallVector would always have an explicit size and Vector never: that at least address the case someone using SmallVector<LargeSize> and getting an implicit and unexpected N=0

Interesting idea! What do other folks think? That would make average code even more succinct (no "Small").

My only concern would be llvm::Vector sounds to me like something that would never have inlined elements, but this proposal would mean that sometimes it has a inlined elements.

Mehdi, so I can better understand your position, your main objection is for something called "SmallVector" to have zero inlined elements unless explicitly requested?

In D90884#2387157, @silvas wrote:

My only concern would be llvm::Vector sounds to me like something that would never have inlined elements, but this proposal would mean that sometimes it has a inlined elements.

It would really be like a "small string optimization" wouldn't it? (except that it is dependent on the element type)

Mehdi, so I can better understand your position, your main objection is for something called "SmallVector" to have zero inlined elements unless explicitly requested?

Yes that is it at the moment, that does not mean that after a few days thinking about it I wouldn't get over it either ;)

In D90884#2387157, @silvas wrote:

In D90884#2387127, @mehdi_amini wrote:

Random other proposal in the brainstorming here: introduce a subclass llvm::Vector that inherit from SmallVector and define the size as never exceeding 64B. SmallVector would always have an explicit size and Vector never: that at least address the case someone using SmallVector<LargeSize> and getting an implicit and unexpected N=0

Interesting idea! What do other folks think? That would make average code even more succinct (no "Small").

My only concern would be llvm::Vector sounds to me like something that would never have inlined elements, but this proposal would mean that sometimes it has a inlined elements.

Mehdi, so I can better understand your position, your main objection is for something called "SmallVector" to have zero inlined elements unless explicitly requested?

If we add an llvm::Vector I think it should be:

template <class T> using Vector = SmallVector<T, 0>;

rather than "no more than 64B".

One option would be to also add:

template <class T> using DefaultSmallVector = SmallVector<T, kDefaultSmallSize<T>>;

(maybe with a better name), for Sean's use case of "give me a little bit of local storage if it's not expensive".

DefaultSmallVector is quite a "mouthful" though, I'd have thought we'd look for a "default container" that is easy to type if we have it everywhere.

Why are you against llvm::Vector with short string optimization? I know that std::vector does not have any inline storage, but that seems like a weak reason to me: from any behavior point of view it exposes the same interface
I see it similarly to how some implementation of std::string have a small-string optimization and others don't: it is an implementation detail.

In D90884#2390224, @mehdi_amini wrote:

DefaultSmallVector is quite a "mouthful" though, I'd have thought we'd look for a "default container" that is easy to type if we have it everywhere.

Why are you against llvm::Vector with short string optimization? I know that std::vector does not have any inline storage, but that seems like a weak reason to me: from any behavior point of view it exposes the same interface
I see it similarly to how some implementation of std::string have a small-string optimization and others don't: it is an implementation detail.

I think the difference is that small string optimization typically doesn't increase sizeof(std::string) compared to the default (it does some tricks to keep the string in the header last I checked). I think a more clear example is std::function which has 3 pointers of inline storage for small closures.

Some of the precedent here for "better std::vector" is FBVector which doesn't secretly expand sizeof for inline elements AFAICT: https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md

Also, agreed that DefaultSmallVector is quite a mouthful and unlikely to catch on.

Maybe llvm::SVector / llvm::SmallVec? That's more succinct (helps it to catch on, and echoes the intended "default" / "convenience" aspect) but still carries a name that lets people know that they are getting inline storage. Precedent for SVector is raw_svector_ostream. Personally SmallVec reads a bit nicer to me.

template <class T>
using SVector/SmallVec = SmallVector<T, kDefaultSmallSize<T>>;

I'd prefer to avoid SmallVec: it is too close to SmallVector (it is even a prefix, which will conflict with autocompletion in IDEs).
llvm::SVec or llvm::SVector on the other hand seems different enough to me to avoid confusion, and the using definition you propose seems like a good tradeoff to me!

In D90884#2390224, @mehdi_amini wrote:

DefaultSmallVector is quite a "mouthful" though, I'd have thought we'd look for a "default container" that is easy to type if we have it everywhere.

Agreed, the other suggested names seem better (I don't have a strong opinion about which is best).

Why are you against llvm::Vector with short string optimization? I know that std::vector does not have any inline storage, but that seems like a weak reason to me: from any behavior point of view it exposes the same interface
I see it similarly to how some implementation of std::string have a small-string optimization and others don't: it is an implementation detail.

As others have mentioned, I think if we added small storage that was zero cost to sizeof that would be fine to shove into llvm::Vector by-default. But having llvm::Vector be bigger than std::vector seems dangerous to me, simply because it would be surprising.

In D90884#2392543, @dexonsmith wrote:

having llvm::Vector be bigger than std::vector seems dangerous to me, simply because it would be surprising.

Oh so we should name it llvm::BigVector instead then ;)
(just joking, just joking...)

Add SVec<T> as "DefaultSmallVector"

silvas edited the summary of this revision. (Show Details)Nov 12 2020, 3:20 PM

Herald added a reviewer: rengolin. · View Herald TranscriptNov 12 2020, 3:20 PM

silvas retitled this revision from [SmallVector] Add a default small size. to Add `SVec<T>`, a SmallVector with a default small size..Nov 12 2020, 3:20 PM

silvas edited the summary of this revision. (Show Details)

mehdi_amini accepted this revision.Nov 12 2020, 3:34 PM

This revision is now accepted and ready to land.Nov 12 2020, 3:34 PM

My plan here is to get a few approvals on this patch, and then take the discussion to llvm-dev for some final aspects:

how we should document this (e.g. programmer guide addition and "default" recommendation)
to what extent we want reviewers to suggest SVec
how much we want to bulk migrate code SVec vs let it organically grow.

Harbormaster completed remote builds in B78694: Diff 304981.Nov 12 2020, 4:11 PM

In D90884#2392713, @silvas wrote:

My plan here is to get a few approvals on this patch, and then take the discussion to llvm-dev for some final aspects:

how we should document this (e.g. programmer guide addition and "default" recommendation)

to what extent we want reviewers to suggest SVec

how much we want to bulk migrate code SVec vs let it organically grow.

Two thoughts to consider incorporating into the RFC:

I'd be concerned SVector<T> / SVec<T> will be "too" ergonomic, and people won't bother with SmallVector<T, 0> when it's more appropriate. Adding Vector<T> / Vec<T> at the same time would mitigate that. Also, instances of SmallVector<T, 0> could be updated to use Vector<T> en masse without much controversy, which might help with discovery of {S,}Vector if there's resistance to updating other SmallVectors en masse (and I think there will be).

It'd be nice to have a rough draft of changes to docs/ in this patch already when the RFC goes out, essentially giving an initial opinion and what the recommendations should be. (FWIW, I feel the docs should land in the same commit as the classes if possible.)

+1 on both points!

Add Vec<T> and update ProgrammersManual.rst

silvas retitled this revision from Add `SVec<T>`, a SmallVector with a default small size. to [SmallVector] Add `SVec<T>` / `Vec<T>` convenience wrappers..Nov 13 2020, 11:42 AM

silvas edited the summary of this revision. (Show Details)

Docs look good, so this LGTM too (assuming RFC doesn't bring up major concerns).

Harbormaster completed remote builds in B78800: Diff 305225.Nov 13 2020, 12:33 PM

mehdi_amini added inline comments.Nov 13 2020, 12:40 PM

llvm/docs/ProgrammersManual.rst
1517	I don't necessarily subscribe to the stack/heap difference. If the SVec/Vec is part of a heap allocation, they retain their properties: `Vect` would always an extra heap allocation for the data while SVec keeps inlined storage in the original heap allocation (with all the benefits of SmallVector). I see it similarly to trailing allocation in heap allocated object somehow. In general, heap or stack, SmallVector and Vec behaves differently in "move" vs "copy".
1521	Should we say that we always recommend `Vec<T>` instead of `std::vector`? Are there use-cases (other than invoking external API expressed with std::vector) to use it? (actually line 1617 may make this redundant)
1525	"reference needed" ;) (or reformulate)
1526	I'd remove this.
1630–1631	`s/often/in general/` ?
1632	Is this comment still relevant? The header for SmallVector is adjusting its max size right?

dexonsmith added inline comments.Nov 13 2020, 1:17 PM

llvm/docs/ProgrammersManual.rst
1632	Here's the current logic: template <class T> using SmallVectorSizeType = typename std::conditional<sizeof(T) < 4 && sizeof(void *) >= 8, uint64_t, uint32_t>::type; I.e., it's limited to `UINT32_MAX` elements for `sizeof(T) >= 4` ("16GB should be enough for anyone"). It would probably be good to clarify this limitation, in particular since it doesn't apply to `SmallString`.

Address comments from Mehdi about the docs.

silvas marked an inline comment as done.Nov 13 2020, 1:24 PM

silvas added inline comments.

llvm/docs/ProgrammersManual.rst
1517	This is definitely a simplification, and these are just "guidelines". There are definitely good reasons to use SVec on the heap or Vec on the stack. I think this is the safest default advice we can give, especially since large SmallVector on the heap seems to be one of the main patterns that has been historically problematic.
1521	Yes, std::vector is sometimes needed, see my comment below.
1632	It is still relevant. The current logic will still in many cases only allow UINT32_MAX elements. template <class T> using SmallVectorSizeType = typename std::conditional<sizeof(T) < 4 && sizeof(void *) >= 8, uint64_t, uint32_t>::type;

mehdi_amini added inline comments.Nov 13 2020, 2:40 PM

llvm/docs/ProgrammersManual.rst

1517

Can you phrase this in a less prescriptive way? "should" comes a bit strong to me. I think it'd be enough to just say:

* ``SVec<T>`` should be used for vectors where a "small" number
  of elements are expected. (the "S" in the name stands for "Small" or
  "on the Stack").
* ``Vec<T>`` should be used for cases where inline storage is too expensive. For example inside other
  data structures, such as ``DenseMap<Value *, Vec<Value *>>``.

Harbormaster completed remote builds in B78811: Diff 305247.Nov 13 2020, 2:43 PM

Soften wording to "recommend" from "should".

slight wording tweak

silvas added inline comments.Nov 13 2020, 6:43 PM

llvm/docs/ProgrammersManual.rst
1517	softened "should" to "recommend".

mehdi_amini added inline comments.Nov 13 2020, 7:11 PM

llvm/docs/ProgrammersManual.rst
1517	Yeah but I really prefer that we focus on small and leave out the stack/heap thing: this looks very misleading to me.
1517	(and it does not provide more/useful information either)

Harbormaster completed remote builds in B78837: Diff 305291.Nov 13 2020, 7:12 PM

Harbormaster completed remote builds in B78838: Diff 305292.Nov 13 2020, 7:51 PM

Abandoned in favor of https://reviews.llvm.org/D92522

silvas abandoned this revision.Dec 3 2020, 6:35 PM

Revision Contents

Path

Size

llvm/

docs/

ProgrammersManual.rst

47 lines

include/

llvm/

ADT/

SmallVector.h

48 lines

unittests/

ADT/

SmallVectorTest.cpp

19 lines

Diff 305292

llvm/docs/ProgrammersManual.rst

Show First 20 Lines • Show All 1,492 Lines • ▼ Show 20 Lines

``TinyPtrVector<Type>`` is a highly specialized collection class that is		``TinyPtrVector<Type>`` is a highly specialized collection class that is
optimized to avoid allocation in the case when a vector has zero or one		optimized to avoid allocation in the case when a vector has zero or one
elements. It has two major restrictions: 1) it can only hold values of pointer		elements. It has two major restrictions: 1) it can only hold values of pointer
type, and 2) it cannot hold a null pointer.		type, and 2) it cannot hold a null pointer.

Since this container is highly specialized, it is rarely used.		Since this container is highly specialized, it is rarely used.

		.. _dss_vec_svec:

		``Vec<T>`` and ``SVec<T>``
		^^^^^^^^^^^^^^^^^^^^^^^^^^

		``Vec<T>`` and ``SVec<T>`` are LLVM's preferred replacement for
		``std::vector<T>``. They are convenience wrappers around
		:ref:`SmallVector <dss_smallvector>`, and thus inherit all the benefits of that
		type as compared to ``std::vector<T>``.

		The recommended usage guidelines are simple:

		* ``SVec<T>`` is recommended for vectors on the stack, where a "small" number
		of elements is expected. (the "S" in the name stands for "Small").
		* ``Vec<T>`` is recommended for vectors on the heap. This includes inside other
		data structures, such as ``DenseMap<Value , Vec<Value >>``.

		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I don't necessarily subscribe to the stack/heap difference. If the SVec/Vec is part of a heap allocation, they retain their properties: `Vect` would always an extra heap allocation for the data while SVec keeps inlined storage in the original heap allocation (with all the benefits of SmallVector). I see it similarly to trailing allocation in heap allocated object somehow. In general, heap or stack, SmallVector and Vec behaves differently in "move" vs "copy". mehdi_amini: I don't necessarily subscribe to the stack/heap difference. If the SVec/Vec is part of a heap…
		silvasAuthorUnsubmitted Done Reply Inline Actions This is definitely a simplification, and these are just "guidelines". There are definitely good reasons to use SVec on the heap or Vec on the stack. I think this is the safest default advice we can give, especially since large SmallVector on the heap seems to be one of the main patterns that has been historically problematic. silvas: This is definitely a simplification, and these are just "guidelines". There are definitely good…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions Can you phrase this in a less prescriptive way? "should" comes a bit strong to me. I think it'd be enough to just say: * ``SVec<T>`` should be used for vectors where a "small" number of elements are expected. (the "S" in the name stands for "Small" or "on the Stack"). * ``Vec<T>`` should be used for cases where inline storage is too expensive. For example inside other data structures, such as ``DenseMap<Value , Vec<Value >>``. mehdi_amini: Can you phrase this in a less prescriptive way? "should" comes a bit strong to me. I think it'd…
		silvasAuthorUnsubmitted Done Reply Inline Actions softened "should" to "recommend". silvas: softened "should" to "recommend".
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions Yeah but I really prefer that we focus on small and leave out the stack/heap thing: this looks very misleading to me. mehdi_amini: Yeah but I really prefer that we focus on small and leave out the stack/heap thing: this looks…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions (and it does not provide more/useful information either) mehdi_amini: (and it does not provide more/useful information either)
		``SVec<T>`` is just a ``SmallVector<T, N>`` with ``N`` chosen automatically
		using an internal heuristic that decides a reasonable number of
		inline elements (which can be 0 if ``sizeof(T)`` is very big!).
		If measurements show a need for more or fewer inline elements than
		mehdi_aminiUnsubmitted Done Reply Inline Actions Should we say that we always recommend `Vec<T>` instead of `std::vector`? Are there use-cases (other than invoking external API expressed with std::vector) to use it? (actually line 1617 may make this redundant) mehdi_amini: Should we say that we always recommend `Vec<T>` instead of `std::vector`? Are there use-cases…
		silvasAuthorUnsubmitted Done Reply Inline Actions Yes, std::vector is sometimes needed, see my comment below. silvas: Yes, std::vector is sometimes needed, see my comment below.
		the ``SVec<T>`` default, a ``SmallVector<T, N>`` should be used.

		``Vec<T>`` is just a ``SmallVector<T, 0>``. As described in the
		:ref:`SmallVector <dss_smallvector>` section, this has advantages over
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions "reference needed" ;) (or reformulate) mehdi_amini: "reference needed" ;) (or reformulate)
		``std::vector<T>``, even with 0 inline elements.
		mehdi_aminiUnsubmitted Done Reply Inline Actions I'd remove this. mehdi_amini: I'd remove this.

		There are exceptions to the recommendations above. Speaking generally, the
		the additional storage space of the inline storage of ``SVec<T>`` takes up more
		memory and also makes move/copy more expensive. The lack of inline elements
		in ``Vec<T>`` will result in more malloc/free calls in usage scenarios where
		the inline element storage would typically be large enough.

		.. note::

		Historically, ``SmallVector<T, N>`` has suffered from various misuses
		due to the ``N`` parameter being chosen semi-arbitrarily (such
		as described in `this thread
		<https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ>`_).
		Making it easy to use ``Vec<T>`` where appropriate, along with the principled
		default number of inline elements for ``SVec<T>`` is expected to mitigate
		those issues.

.. _dss_smallvector:		.. _dss_smallvector:

llvm/ADT/SmallVector.h		llvm/ADT/SmallVector.h
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

``SmallVector<Type, N>`` is a simple class that looks and smells just like		``SmallVector<Type, N>`` is a simple class that looks and smells just like
``vector<Type>``: it supports efficient iteration, lays out elements in memory		``vector<Type>``: it supports efficient iteration, lays out elements in memory
order (so you can do pointer arithmetic between elements), supports efficient		order (so you can do pointer arithmetic between elements), supports efficient
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	.. note::
and is no longer "private to the implementation". A name like		and is no longer "private to the implementation". A name like
``SmallVectorHeader`` might be more appropriate.		``SmallVectorHeader`` might be more appropriate.

.. _dss_vector:		.. _dss_vector:

<vector>		<vector>
^^^^^^^^		^^^^^^^^

``std::vector<T>`` is well loved and respected. However, ``SmallVector<T, 0>``		``std::vector<T>`` is well loved and respected. However, ``Vec<T>``
is often a better option due to the advantages listed above. std::vector is		is generally a better option due to the advantages listed above. std::vector is
		mehdi_aminiUnsubmitted Done Reply Inline Actions `s/often/in general/` ? mehdi_amini: `s/often/in general/` ?
still useful when you need to store more than ``UINT32_MAX`` elements or when		still useful when you need to store more than ``UINT32_MAX`` elements or when
		mehdi_aminiUnsubmitted Done Reply Inline Actions Is this comment still relevant? The header for SmallVector is adjusting its max size right? mehdi_amini: Is this comment still relevant? The header for SmallVector is adjusting its max size right?
		silvasAuthorUnsubmitted Done Reply Inline Actions It is still relevant. The current logic will still in many cases only allow UINT32_MAX elements. template <class T> using SmallVectorSizeType = typename std::conditional<sizeof(T) < 4 && sizeof(void ) >= 8, uint64_t, uint32_t>::type; silvas:* It is still relevant. The [current logic](https://github.com/llvm/llvm…
		dexonsmithUnsubmitted Not Done Reply Inline Actions Here's the current logic: template <class T> using SmallVectorSizeType = typename std::conditional<sizeof(T) < 4 && sizeof(void ) >= 8, uint64_t, uint32_t>::type; I.e., it's limited to `UINT32_MAX` elements for `sizeof(T) >= 4` ("16GB should be enough for anyone"). It would probably be good to clarify this limitation, in particular since it doesn't apply to `SmallString`. dexonsmith:* Here's the current logic: ``` template <class T> using SmallVectorSizeType = typename std…
interfacing with code that expects vectors :).		interfacing with code that expects vectors :).

One worthwhile note about std::vector: avoid code like this:		One worthwhile note about std::vector: avoid code like this:

.. code-block:: c++		.. code-block:: c++

for ( ... ) {		for ( ... ) {
std::vector<foo> V;		std::vector<foo> V;
▲ Show 20 Lines • Show All 2,456 Lines • Show Last 20 Lines

llvm/include/llvm/ADT/SmallVector.h

Show First 20 Lines • Show All 887 Lines • ▼ Show 20 Lines

/// for the case when the array is small. It contains some number of elements

/// in-place, which allows it to avoid heap allocation when the actual number of

/// elements is below that threshold. This allows normal "small" cases to be

/// fast without losing generality for large inputs.

///

/// Note that this does not attempt to be exception safe.

///

template <typename T, unsigned N>

class LLVM_GSL_OWNER SmallVector : public SmallVectorImpl<T>,

mehdi_aminiUnsubmitted

Done

// Discount the size of the header itself when calculating the maximum inline

- // bytes, so subtract one fromthe multiplier.

+ // bytes, so subtract one from the multiplier.

size_t MaxInlineBytes = (kDefaultSmallSizeHeaderSizeMultiplier - 1) *

mehdi_amini:

SmallVectorStorage<T, N> {

public:

SmallVector() : SmallVectorImpl<T>(N) {}

~SmallVector() {

// Destroy the constructed elements in the vector.

this->destroy_range(this->begin(), this->end());

}

▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines

template <unsigned Size, typename R>

SmallVector<typename std::remove_const<typename std::remove_reference<

decltype(*std::begin(std::declval<R &>()))>::type>::type,

Size>

to_vector(R &&Range) {

return {std::begin(Range), std::end(Range)};

}

/// Parameter controlling the default number of inlined elements for SVec.

///

/// We guarantee that `sizeof(SVec<T>) <= kSVecMaxSizeof`.

constexpr size_t kSVecMaxSizeof = 64;

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for variable 'kSVecMaxSizeof' [readability-identifier-naming]
not useful

Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'kSVecMaxSizeof' [readability-identifier…

template <typename ElementTy> constexpr size_t calculateSVecInlineElements() {

// Discount the size of the header itself when calculating the maximum inline

// bytes.

size_t MaxInlineBytes = kSVecMaxSizeof - sizeof(SmallVectorImpl<ElementTy>);

return MaxInlineBytes / sizeof(ElementTy);

}

/// SVec is a convenience wrapper around SmallVector with a default number of

/// inlined elements.

///

/// One major reason to use SVec over SmallVector is convenience -- there is no

/// need to choose the number of inlined elements, or to write that number in

/// the type. Also, SVec is a shorter name than SmallVector which helps as well.

///

/// SVec adjusts the number of inlined elements with a policy that bounds the

/// total `sizeof(SVec)` to never exceed a fixed size (currently 64 bytes).

/// This prevents a number of the most pathological cases of SmallVector usage.

/// For example, `SVec<SVec<SVec<T>>>` works fine and doesn't result in an

/// excessive `sizeof`.

///

/// For small types like `char`, the number of inlined elements will be larger.

/// For types like large structs, the number of inlined elements will be

/// smaller, including possibly having zero inlined elements in the extreme

/// case!

///

/// Over time, we have found that most choices of the number of inlined elements

/// for SmallVector are chosen semi-arbitrarily or cargo-culted. It is expected

/// that by tuning SVec default sizes in aggregate by adjusting the policy here,

/// we can realize codebase-wide gains that would not be measurable if done

/// by tuning individual SmallVector sizes across the codebase.

template <typename T>

using SVec = SmallVector<T, calculateSVecInlineElements<T>()>;

/// `Vec<T>` is a convenience wrapper around SmallVector with 0 inlined

/// elements, and should generally be preferred over `std::vector<T>`.

///

/// As described in

/// https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h

/// `SmallVector<T, 0>` has a number of advantages over `std::vector<T>`.

///

/// See https://llvm.org/docs/ProgrammersManual.html#vector for situations

/// in which `std::vector<T>` should be preferred over `Vec<T>`.

template <typename T> using Vec = SmallVector<T, 0>;

} // end namespace llvm

namespace std {

/// Implement std::swap in terms of SmallVector swap.

template<typename T>

inline void

swap(llvm::SmallVectorImpl<T> &LHS, llvm::SmallVectorImpl<T> &RHS) {

Show All 13 Lines

llvm/unittests/ADT/SmallVectorTest.cpp

Show First 20 Lines • Show All 993 Lines • ▼ Show 20 Lines	TEST(SmallVectorTest, InitializerList) {
V2.assign({4});		V2.assign({4});
EXPECT_TRUE(makeArrayRef(V2).equals({4}));		EXPECT_TRUE(makeArrayRef(V2).equals({4}));
V2.append({3, 2});		V2.append({3, 2});
EXPECT_TRUE(makeArrayRef(V2).equals({4, 3, 2}));		EXPECT_TRUE(makeArrayRef(V2).equals({4, 3, 2}));
V2.insert(V2.begin() + 1, 5);		V2.insert(V2.begin() + 1, 5);
EXPECT_TRUE(makeArrayRef(V2).equals({4, 5, 3, 2}));		EXPECT_TRUE(makeArrayRef(V2).equals({4, 5, 3, 2}));
}		}

		struct BigElementType {
		char Arr[10000] = {0};
		};
		template <typename ValueT> struct TestSVecSizeof {
		static_assert(sizeof(SVec<ValueT>) <= kSVecMaxSizeof, "");
		};
		using instantiateTestSVecSizeof =
		::testing::Types<TestSVecSizeof<int8_t>, TestSVecSizeof<int16_t>,
		TestSVecSizeof<int32_t>, TestSVecSizeof<int64_t>,
		TestSVecSizeof<void *>, TestSVecSizeof<BigElementType>>;

		TEST(SmallVectorTest, SpotCheckOfVec) {
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions FWIW I am still unconvinced that it is a good idea to use the default size on large objects. I'm not super comfortable with the implicit `0` as that seems to encourage mis-uses of the SmallVector. mehdi_amini: FWIW I am still unconvinced that it is a good idea to use the default size on large objects.
		silvasAuthorUnsubmitted Done Reply Inline Actions I don't feel strongly about this, except on the usability side. But maybe that's a minor concern, as very large types are probably not that common. One real issue is that rejecting SmallSize==0 could result in unexpected static_assert's across platforms: sizeof(header) and sizeof(ValueType) will differ across platforms, and both affect whether the default small size will be 0. I don't see an obvious solution to that, and it seems like it could cause bot breakages. Thoughts? silvas: I don't feel strongly about this, except on the usability side. But maybe that's a minor…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I agree the risk of portability is a concern. I'm not convinced that this is the right direction for SmallVector here though: it does not seem to me that there is an obvious universal default that always makes sense here. To me this feel like making an API that is "too easy to misuse". mehdi_amini: I agree the risk of portability is a concern. I'm not convinced that this is the right…
		silvasAuthorUnsubmitted Done Reply Inline Actions AFAIK, the main misuse of SmallVector is to make its sizeof too big by not paying attention to sizeof(ValueType) and the number of inline elements. This patch specifically bounds that misuse by using a fixed upper bound. (that is, it seems to make the situation strictly better) I would categorize "using SmallVector but unexpectedly getting 0 inline elements" as a much more minor kind of misuse. silvas: AFAIK, the main misuse of SmallVector is to make its sizeof too big by not paying attention to…
		// Spot check that Vec<T> works.
		Vec<int> V1;
		EXPECT_TRUE(V1.empty());
		V1.push_back(1);
		EXPECT_EQ(V1[0], 1);
		}
		rnkUnsubmitted Done Reply Inline Actions If there's no runtime code, is there any reason not to stamp this out as a series of static_asserts, or a series of explicit template instantiations of a class template that does nothing but static_assert about its template arguments? Gtest's test registration is very heavyweight. rnk: If there's no runtime code, is there any reason not to stamp this out as a series of…

} // end namespace		} // end namespace

This is an archive of the discontinued LLVM Phabricator instance.

[SmallVector] Add `SVec<T>` / `Vec<T>` convenience wrappers.AbandonedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 305292

llvm/docs/ProgrammersManual.rst

llvm/include/llvm/ADT/SmallVector.h

llvm/unittests/ADT/SmallVectorTest.cpp

[SmallVector] Add `SVec<T>` / `Vec<T>` convenience wrappers.
AbandonedPublic