This is an archive of the discontinued LLVM Phabricator instance.

[SmallVector] Add `SVec<T>` / `Vec<T>` convenience wrappers.
AbandonedPublic

Authored by silvas on Nov 5 2020, 2:14 PM.

Details

Summary

See the update to ProgrammersManual.rst for the guidelines for
using these types. tl;dr: SVec for "small" vectors on the stack, Vec for
vectors on the heap / in data structures.

  • SVec<T> is a convenience alias for SmallVector<T, N> with N

chosen automatically according to a heuristic described below.

  • Vec<T> is a convenience alias for SmallVector<T, 0>.

These are expected to help systematically improve SmallVector use in the
LLVM codebase, which has historically been plagued by semi-arbitrary /
cargo culted N parameters, often leading to bad outcomes due to
excessive sizeof(SmallVector<T, N>). These aliases also make
programming more convenient by avoiding edit/rebuild cycles due to
forgetting to type the N parameter.

Notes on the SVec<T> default inline elements heuristic:

The initial policy is to guarantee a maximum upper bound of 64 bytes for
sizeof(SVec<T>).

This tends to match my intuition: depending on the element size, you
want to be careful about your small size -- basically we just want to
ensure not too much stack growth (or heap allocation growth for the
heap-allocated case). Also, looking back through llvm-dev, it looks like
Reid suggested something similar in
this thread.

This default also tends to avoid pathological situations if a programmer
does SVec<SVec<T>> (also mentioned in that thread as problematic). For
these, the sizeof the outer SVec will not amplify the sizeof the
inner SVec. Of course, users can do SVec<Vec<T>> to indicate that the
inner SmallVector's should have no inline elements. Then the outer SVec
automatically adapt, and fit some inner SmallVector's inline.

Diff Detail

Event Timeline

silvas created this revision.Nov 5 2020, 2:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2020, 2:14 PM
silvas requested review of this revision.Nov 5 2020, 2:14 PM

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

llvm/include/llvm/ADT/SmallVector.h
904
silvas added a comment.EditedNov 5 2020, 2:46 PM

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Usually SmallVector's are on the stack, so all cache lines are expected to be already touched anyway. So I don't see much value in considering cache line sizes for the default size.

The default payload is not 6 pointers in all cases. E.g. for smaller types on 64-bit platforms, we will use a 32-bit size/capacity, which on 64-bit platforms results in a header that is 2 pointer in size, resulting in a 4 pointer inline storage.

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

SmallVector is definitely used for many types that are larger than void*, so we don't want to prohibit that with a static_assert.

SmallVector with a very small N is still advantageous over std::vector because SmallVector has a POD optimization.

silvas updated this revision to Diff 303269.Nov 5 2020, 2:47 PM

Address typo

silvas marked an inline comment as done.Nov 5 2020, 2:47 PM
rnk added a comment.Nov 5 2020, 2:52 PM

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

I'm not sure I understand this suggestion. There are lots of SmallVector<LargeTy> instantiations.

FWIW, I'm in favor of a conservative, low default size here. LLVM developers tend to use SmallVector for everything, even when it's inappropriate (many heap cases).

llvm/unittests/ADT/SmallVectorTest.cpp
1049

If there's no runtime code, is there any reason not to stamp this out as a series of static_asserts, or a series of explicit template instantiations of a class template that does nothing but static_assert about its template arguments? Gtest's test registration is very heavyweight.

In D90884#2377537, @rnk wrote:

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

I'm not sure I understand this suggestion. There are lots of SmallVector<LargeTy> instantiations.

Sorry I wasn't clear: I didn't mean to prevent SmallVector<LargeTy, N> but I was trying to indicate that it may not be a good default for N to be 0 ever. So I would do something like:

template <typename ElementTy> constexpr size_t calculateDefaultSmallSize() {
  static_assert(sizeof(ElementTy) < 8, "Please provide a default value for SmallVector when the element type is \"large\"");
  ...
silvas added a comment.Nov 5 2020, 3:05 PM
In D90884#2377537, @rnk wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

The current logic would make sizeof(SmallVector) < 3 * 16B, which is 48B. The current logic is a heuristic like "if you're willing to put the header on the stack, you're willing to put 3 headers on the stack" (not 3 *additional* headers; sorry for injecting the off-by-one into this discussion)

In D90884#2377537, @rnk wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

I find the multiple of header-size not so easy to reason about, since the header can be any of 12B, 16B, or 24B. I think I slightly prefer a limit like <= 64 or <= 8 * sizeof(void*).

FWIW, I'm in favor of a conservative, low default size here. LLVM developers tend to use SmallVector for everything, even when it's inappropriate (many heap cases).

Agreed; I've come around over time. I wonder about defaulting to SmallVector<T, 0>, getting POD optimization / compressed header / SmallVectorImpl interop, but no actual small storage unless you ask for it (I think this was your suggestion in the thread linked above).

silvas added a comment.Nov 5 2020, 3:10 PM
In D90884#2377537, @rnk wrote:

I'm not sure it makes sense to every get a default of a very small N in general. I guess a default like here makes sense for smaller object, what about static_assert if sizeof(T)>sizeof(void*)?

I'm not sure I understand this suggestion. There are lots of SmallVector<LargeTy> instantiations.

Sorry I wasn't clear: I didn't mean to prevent SmallVector<LargeTy, N> but I was trying to indicate that it may not be a good default for N to be 0 ever. So I would do something like:

template <typename ElementTy> constexpr size_t calculateDefaultSmallSize() {
  static_assert(sizeof(ElementTy) < 8, "Please provide a default value for SmallVector when the element type is \"large\"");
  ...

That seems artificially restrictive. What "bug" are you trying to prevent with that static_assert? It just seems to be a pain for users. Also <8 is way too strict. Maybe <=128 or something. Large types are not uncommon

That seems artificially restrictive. What "bug" are you trying to prevent with that static_assert? It just seems to be a pain for users. Also <8 is way too strict. Maybe <=128 or something. Large types are not uncommon

My take is that if you have a large type and you want a SmallVector,, you should be explicit about the storage. I rather read SmallVector<LargeType, 4> than SmallVector<LargeType> and have implicitly N=0.

I'm not sure I understand the size correctly, but right now any type > 24B would yield a N=0, wouldn't it?

silvas added a comment.Nov 5 2020, 3:22 PM
In D90884#2377537, @rnk wrote:

Seems like it'll always have a default payload equivalent of 6 pointers? Maybe you can document this? Also that makes the vector 72B, which isn't cache line aligned, should we make it 64 or 128 instead?

Right, the header is usually two pointers big (64-bit), so 3x the header is 72. Maybe keeping the size under 128 makes more sense.

I find the multiple of header-size not so easy to reason about, since the header can be any of 12B, 16B, or 24B. I think I slightly prefer a limit like <= 64 or <= 8 * sizeof(void*).

Yeah, it's becoming clear that that's too hard to reason about. I propose that we guarantee that sizeof(SmallVector<T>) <= 8 * sizeof(void*). On 64bit platforms, gives us 4 pointers, two std::strings.

FWIW, I'm in favor of a conservative, low default size here. LLVM developers tend to use SmallVector for everything, even when it's inappropriate (many heap cases).

Agreed; I've come around over time. I wonder about defaulting to SmallVector<T, 0>, getting POD optimization / compressed header / SmallVectorImpl interop, but no actual small storage unless you ask for it (I think this was your suggestion in the thread linked above).

I do think we need some small storage, as an overwhelming number of use cases really do have that intent (at least all of the ones that I have written in recent memory). I think this default protects us from some of the more pathological cases like SmallVector<SmallVector<T>> which, without the max-absolute-size default, just lead to an exponential blowup per "layer" of smallvector.

silvas added a comment.Nov 5 2020, 3:25 PM

That seems artificially restrictive. What "bug" are you trying to prevent with that static_assert? It just seems to be a pain for users. Also <8 is way too strict. Maybe <=128 or something. Large types are not uncommon

My take is that if you have a large type and you want a SmallVector,, you should be explicit about the storage. I rather read SmallVector<LargeType, 4> than SmallVector<LargeType> and have implicitly N=0.

That makes sense. As a matter of practicality, I think I agree with Duncan here that N=0 silently isn't too bad. (though I'm still wary of making N=0 the default).

I'm not sure I understand the size correctly, but right now any type > 24B would yield a N=0, wouldn't it?

It depends on the header size (will change that to an absolute size, per Duncan's suggestion).

silvas updated this revision to Diff 303284.Nov 5 2020, 3:42 PM

Add 64B upper bound, per Duncan's suggestion.

silvas marked an inline comment as done.Nov 10 2020, 11:23 AM
silvas edited the summary of this revision. (Show Details)
mehdi_amini added inline comments.Nov 10 2020, 11:38 AM
llvm/unittests/ADT/SmallVectorTest.cpp
1043

FWIW I am still unconvinced that it is a good idea to use the default size on large objects. I'm not super comfortable with the implicit 0 as that seems to encourage mis-uses of the SmallVector.

silvas added inline comments.Nov 10 2020, 12:03 PM
llvm/unittests/ADT/SmallVectorTest.cpp
1043

I don't feel strongly about this, except on the usability side. But maybe that's a minor concern, as very large types are probably not that common.

One real issue is that rejecting SmallSize==0 could result in unexpected static_assert's across platforms: sizeof(header) and sizeof(ValueType) will differ across platforms, and both affect whether the default small size will be 0.

I don't see an obvious solution to that, and it seems like it could cause bot breakages. Thoughts?

Isn't the real answer a profiler and data collection.

  • Which SmallVector's are too large?
  • Which SmallVector's create heap allocations?
mehdi_amini added inline comments.Nov 10 2020, 12:13 PM
llvm/unittests/ADT/SmallVectorTest.cpp
1043

I agree the risk of portability is a concern.

I'm not convinced that this is the right direction for SmallVector here though: it does not seem to me that there is an obvious universal default that always makes sense here. To me this feel like making an API that is "too easy to misuse".

Isn't the real answer a profiler and data collection.

  • Which SmallVector's are too large?
  • Which SmallVector's create heap allocations?

I generally agree. I think Duncan did some of this in the past. Unfortunately, we don't have a way to en-mass tune SmallVector sizes. I think if we can adopt a default smallsize strategy, then we can start to do some of that by tuning the default strategy.

I thought the other way around. Don't use a default size, but fix cases of misuse.

silvas added inline comments.Nov 10 2020, 12:22 PM
llvm/unittests/ADT/SmallVectorTest.cpp
1043

AFAIK, the main misuse of SmallVector is to make its sizeof too big by not paying attention to sizeof(ValueType) and the number of inline elements. This patch specifically bounds that misuse by using a fixed upper bound. (that is, it seems to make the situation strictly better)

I would categorize "using SmallVector but unexpectedly getting 0 inline elements" as a much more minor kind of misuse.

I thought the other way around. Don't use a default size, but fix cases of misuse.

That's another approach. The downside is:

  1. loses on the convenience of having a default SmallSize
  2. cases of misuse are continuously being added, so this is a constant maintenance task
  3. once we fix the big low hanging fruit, performance issues due to small size choices can be "death by 1000 papercuts", so meaningful tuning can only happen in aggregate.
rnk added a comment.Nov 10 2020, 1:01 PM

Standing back a bit, I think most C++ programmers use std::vector and call it a day, so perhaps zero really is the sensible default. Are those programmers missing out and doing too many heap allocations? I tend to think not. I think setting an inline storage size without profiling is often a premature optimization. I think it's probably a lot easier to notice small, short-lived vectors in a profiler than it is to notice excessive memory usage due to poorly tuned SmallVectors on the heap.

I think "SmallVector" doesn't really mean "vector with inline storage". It really means "LLVM's optimized vector reimplementation", see the usage of uint32_t sizes and capacities.

I think it makes sense to add a low, default size to SmallVector. Both this heuristic and zero seem fine to me. This makes it easy to profile, find slow, small vector heap allocations, and tune the size later without changing APIs from std::vector to SmallVector.

I think setting an inline storage size without profiling is often a premature optimization.

I don't necessarily agree with this: I believe this is the kind of situation where you end up with flat profile where malloc shows up everywhere, and then we would spend a huge amount of resources in optimizing malloc() (like some company specialized in moving protos where optimizing memcpy is easier than avoiding the copy in the first place ;)).
I remember seeing this in clang in the past with strlen taking a few percent on the profile, but it was really hard to fix it because of how widespread it is in the codebase: this is why idioms/patterns around APIs are important.
(also it isn't only about the malloc time, but I suspect the caching behavior of using the stack for short lived value likely help).

At the moment, I much prefer the current behavior and our usage of SmallVector to a "std::vector with implicit inlined storage sometimes, maybe, but you don't really know": when I read or review LLVM code I know what it is doing without the need for too much more context (I see it a bit similarly to our guideline on not abusing auto for readability).

I think setting an inline storage size without profiling is often a premature optimization.

I don't necessarily agree with this: I believe this is the kind of situation where you end up with flat profile where malloc shows up everywhere, and then we would spend a huge amount of resources in optimizing malloc() (like some company specialized in moving protos where optimizing memcpy is easier than avoiding the copy in the first place ;)).
I remember seeing this in clang in the past with strlen taking a few percent on the profile, but it was really hard to fix it because of how widespread it is in the codebase: this is why idioms/patterns around APIs are important.
(also it isn't only about the malloc time, but I suspect the caching behavior of using the stack for short lived value likely help).

Agreed on the flat profile. I think that tuning this must be done in aggregate, and having a useful default here is the key to enabling that. We can make the policy arbitrarily complex, like having a SmallVectorTraits or something, where e.g. Value* can specialize it to say something like "most vectors of Value* are of operands, which are usually at most 3 operands" (don't know if that's true btw).

At the moment, I much prefer the current behavior and our usage of SmallVector to a "std::vector with implicit inlined storage sometimes, maybe, but you don't really know": when I read or review LLVM code I know what it is doing without the need for too much more context (I see it a bit similarly to our guideline on not abusing auto for readability).

If I write a std::vector on the stack in a patch, I think most reviewers will ask "why not SmallVector?" and even suggest using SmallVector by default. And then most people will just add a random smallsize and nobody cares. My favorite is 6.

I think that it's pretty clear that as a community we are using SmallVector as a vector replacement -- it's the default, and we should make it convenient to use, and make the default usage "just work" and avoid pitfalls (the main one being unexpectedly large sizeof).

Actually, our documented policy is "choose the first one from this list" and SmallVector is earlier in the list than std::vector. https://llvm.org/docs/ProgrammersManual.html#sequential-containers-std-vector-std-list-etc

What about a trait mechanism? We can provide a trait that works for all elemental type, and then it is just a matter for the user to define a trait defining a default size for their type, like they have to define a trait to use a type as DenseMap hash key?

What about a trait mechanism? We can provide a trait that works for all elemental type, and then it is just a matter for the user to define a trait defining a default size for their type, like they have to define a trait to use a type as DenseMap hash key?

I think that could work, but would probably be an addendum to a universal default based on bounding sizeof(SmallVector). It seems easy to add later.

The DenseMap case is different because the traits are actually needed for correctness. Here we are just talking about an optimization.

Random other proposal in the brainstorming here: introduce a subclass llvm::Vector that inherit from SmallVector and define the size as never exceeding 64B. SmallVector would always have an explicit size and Vector never: that at least address the case someone using SmallVector<LargeSize> and getting an implicit and unexpected N=0

Random other proposal in the brainstorming here: introduce a subclass llvm::Vector that inherit from SmallVector and define the size as never exceeding 64B. SmallVector would always have an explicit size and Vector never: that at least address the case someone using SmallVector<LargeSize> and getting an implicit and unexpected N=0

Interesting idea! What do other folks think? That would make average code even more succinct (no "Small").

My only concern would be llvm::Vector sounds to me like something that would never have inlined elements, but this proposal would mean that sometimes it has a inlined elements.

Mehdi, so I can better understand your position, your main objection is for something called "SmallVector" to have zero inlined elements unless explicitly requested?

My only concern would be llvm::Vector sounds to me like something that would never have inlined elements, but this proposal would mean that sometimes it has a inlined elements.

It would really be like a "small string optimization" wouldn't it? (except that it is dependent on the element type)

Mehdi, so I can better understand your position, your main objection is for something called "SmallVector" to have zero inlined elements unless explicitly requested?

Yes that is it at the moment, that does not mean that after a few days thinking about it I wouldn't get over it either ;)

Random other proposal in the brainstorming here: introduce a subclass llvm::Vector that inherit from SmallVector and define the size as never exceeding 64B. SmallVector would always have an explicit size and Vector never: that at least address the case someone using SmallVector<LargeSize> and getting an implicit and unexpected N=0

Interesting idea! What do other folks think? That would make average code even more succinct (no "Small").

My only concern would be llvm::Vector sounds to me like something that would never have inlined elements, but this proposal would mean that sometimes it has a inlined elements.

Mehdi, so I can better understand your position, your main objection is for something called "SmallVector" to have zero inlined elements unless explicitly requested?

If we add an llvm::Vector I think it should be:

template <class T> using Vector = SmallVector<T, 0>;

rather than "no more than 64B".

One option would be to also add:

template <class T> using DefaultSmallVector = SmallVector<T, kDefaultSmallSize<T>>;

(maybe with a better name), for Sean's use case of "give me a little bit of local storage if it's not expensive".

DefaultSmallVector is quite a "mouthful" though, I'd have thought we'd look for a "default container" that is easy to type if we have it everywhere.

Why are you against llvm::Vector with short string optimization? I know that std::vector does not have any inline storage, but that seems like a weak reason to me: from any behavior point of view it exposes the same interface
I see it similarly to how some implementation of std::string have a small-string optimization and others don't: it is an implementation detail.

DefaultSmallVector is quite a "mouthful" though, I'd have thought we'd look for a "default container" that is easy to type if we have it everywhere.

Why are you against llvm::Vector with short string optimization? I know that std::vector does not have any inline storage, but that seems like a weak reason to me: from any behavior point of view it exposes the same interface
I see it similarly to how some implementation of std::string have a small-string optimization and others don't: it is an implementation detail.

I think the difference is that small string optimization typically doesn't increase sizeof(std::string) compared to the default (it does some tricks to keep the string in the header last I checked). I think a more clear example is std::function which has 3 pointers of inline storage for small closures.

Some of the precedent here for "better std::vector" is FBVector which doesn't secretly expand sizeof for inline elements AFAICT: https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md

Also, agreed that DefaultSmallVector is quite a mouthful and unlikely to catch on.

Maybe llvm::SVector / llvm::SmallVec? That's more succinct (helps it to catch on, and echoes the intended "default" / "convenience" aspect) but still carries a name that lets people know that they are getting inline storage. Precedent for SVector is raw_svector_ostream. Personally SmallVec reads a bit nicer to me.

template <class T>
using SVector/SmallVec = SmallVector<T, kDefaultSmallSize<T>>;

I'd prefer to avoid SmallVec: it is too close to SmallVector (it is even a prefix, which will conflict with autocompletion in IDEs).
llvm::SVec or llvm::SVector on the other hand seems different enough to me to avoid confusion, and the using definition you propose seems like a good tradeoff to me!

DefaultSmallVector is quite a "mouthful" though, I'd have thought we'd look for a "default container" that is easy to type if we have it everywhere.

Agreed, the other suggested names seem better (I don't have a strong opinion about which is best).

Why are you against llvm::Vector with short string optimization? I know that std::vector does not have any inline storage, but that seems like a weak reason to me: from any behavior point of view it exposes the same interface
I see it similarly to how some implementation of std::string have a small-string optimization and others don't: it is an implementation detail.

As others have mentioned, I think if we added small storage that was zero cost to sizeof that would be fine to shove into llvm::Vector by-default. But having llvm::Vector be bigger than std::vector seems dangerous to me, simply because it would be surprising.

having llvm::Vector be bigger than std::vector seems dangerous to me, simply because it would be surprising.

Oh so we should name it llvm::BigVector instead then ;)
(just joking, just joking...)

silvas updated this revision to Diff 304981.Nov 12 2020, 3:19 PM
silvas edited the summary of this revision. (Show Details)

Add SVec<T> as "DefaultSmallVector"

silvas edited the summary of this revision. (Show Details)Nov 12 2020, 3:20 PM
silvas retitled this revision from [SmallVector] Add a default small size. to Add `SVec<T>`, a SmallVector with a default small size..Nov 12 2020, 3:20 PM
silvas edited the summary of this revision. (Show Details)
mehdi_amini accepted this revision.Nov 12 2020, 3:34 PM
This revision is now accepted and ready to land.Nov 12 2020, 3:34 PM

My plan here is to get a few approvals on this patch, and then take the discussion to llvm-dev for some final aspects:

  • how we should document this (e.g. programmer guide addition and "default" recommendation)
  • to what extent we want reviewers to suggest SVec
  • how much we want to bulk migrate code SVec vs let it organically grow.

My plan here is to get a few approvals on this patch, and then take the discussion to llvm-dev for some final aspects:

  • how we should document this (e.g. programmer guide addition and "default" recommendation)
  • to what extent we want reviewers to suggest SVec
  • how much we want to bulk migrate code SVec vs let it organically grow.

Two thoughts to consider incorporating into the RFC:

  1. I'd be concerned SVector<T> / SVec<T> will be "too" ergonomic, and people won't bother with SmallVector<T, 0> when it's more appropriate. Adding Vector<T> / Vec<T> at the same time would mitigate that. Also, instances of SmallVector<T, 0> could be updated to use Vector<T> en masse without much controversy, which might help with discovery of {S,}Vector if there's resistance to updating other SmallVectors en masse (and I think there will be).
  1. It'd be nice to have a rough draft of changes to docs/ in this patch already when the RFC goes out, essentially giving an initial opinion and what the recommendations should be. (FWIW, I feel the docs should land in the same commit as the classes if possible.)

+1 on both points!

silvas updated this revision to Diff 305225.Nov 13 2020, 11:41 AM
silvas edited the summary of this revision. (Show Details)

Add Vec<T> and update ProgrammersManual.rst

silvas retitled this revision from Add `SVec<T>`, a SmallVector with a default small size. to [SmallVector] Add `SVec<T>` / `Vec<T>` convenience wrappers..Nov 13 2020, 11:42 AM
silvas edited the summary of this revision. (Show Details)
dexonsmith accepted this revision.Nov 13 2020, 12:07 PM

Docs look good, so this LGTM too (assuming RFC doesn't bring up major concerns).

mehdi_amini added inline comments.Nov 13 2020, 12:40 PM
llvm/docs/ProgrammersManual.rst
1517 ↗(On Diff #305225)

I don't necessarily subscribe to the stack/heap difference.

If the SVec/Vec is part of a heap allocation, they retain their properties: Vect would always an extra heap allocation for the data while SVec keeps inlined storage in the original heap allocation (with all the benefits of SmallVector).

I see it similarly to trailing allocation in heap allocated object somehow.
In general, heap or stack, SmallVector and Vec behaves differently in "move" vs "copy".

1521 ↗(On Diff #305225)

Should we say that we always recommend Vec<T> instead of std::vector? Are there use-cases (other than invoking external API expressed with std::vector) to use it?

(actually line 1617 may make this redundant)

1525 ↗(On Diff #305225)

"reference needed" ;)

(or reformulate)

1526 ↗(On Diff #305225)

I'd remove this.

1617 ↗(On Diff #305225)

s/often/in general/ ?

1618 ↗(On Diff #305225)

Is this comment still relevant? The header for SmallVector is adjusting its max size right?

dexonsmith added inline comments.Nov 13 2020, 1:17 PM
llvm/docs/ProgrammersManual.rst
1618 ↗(On Diff #305225)

Here's the current logic:

template <class T>
using SmallVectorSizeType =
    typename std::conditional<sizeof(T) < 4 && sizeof(void *) >= 8, uint64_t,
                              uint32_t>::type;

I.e., it's limited to UINT32_MAX elements for sizeof(T) >= 4 ("16GB should be enough for anyone"). It would probably be good to clarify this limitation, in particular since it doesn't apply to SmallString.

silvas updated this revision to Diff 305247.Nov 13 2020, 1:22 PM
silvas marked 3 inline comments as done.
silvas edited the summary of this revision. (Show Details)

Address comments from Mehdi about the docs.

silvas marked an inline comment as done.Nov 13 2020, 1:24 PM
silvas added inline comments.
llvm/docs/ProgrammersManual.rst
1517 ↗(On Diff #305225)

This is definitely a simplification, and these are just "guidelines". There are definitely good reasons to use SVec on the heap or Vec on the stack. I think this is the safest default advice we can give, especially since large SmallVector on the heap seems to be one of the main patterns that has been historically problematic.

1521 ↗(On Diff #305225)

Yes, std::vector is sometimes needed, see my comment below.

1618 ↗(On Diff #305225)

It is still relevant. The current logic will still in many cases only allow UINT32_MAX elements.

template <class T>
using SmallVectorSizeType =
    typename std::conditional<sizeof(T) < 4 && sizeof(void *) >= 8, uint64_t,
                              uint32_t>::type;
mehdi_amini added inline comments.Nov 13 2020, 2:40 PM
llvm/docs/ProgrammersManual.rst
1517 ↗(On Diff #305225)

Can you phrase this in a less prescriptive way? "should" comes a bit strong to me. I think it'd be enough to just say:

* ``SVec<T>`` should be used for vectors where a "small" number
  of elements are expected. (the "S" in the name stands for "Small" or
  "on the Stack").
* ``Vec<T>`` should be used for cases where inline storage is too expensive. For example inside other
  data structures, such as ``DenseMap<Value *, Vec<Value *>>``.
silvas updated this revision to Diff 305291.Nov 13 2020, 6:39 PM
silvas marked an inline comment as done.

Soften wording to "recommend" from "should".

silvas updated this revision to Diff 305292.Nov 13 2020, 6:41 PM

slight wording tweak

silvas added inline comments.Nov 13 2020, 6:43 PM
llvm/docs/ProgrammersManual.rst
1517 ↗(On Diff #305225)

softened "should" to "recommend".

mehdi_amini added inline comments.Nov 13 2020, 7:11 PM
llvm/docs/ProgrammersManual.rst
1517 ↗(On Diff #305225)

Yeah but I really prefer that we focus on small and leave out the stack/heap thing: this looks very misleading to me.

1517 ↗(On Diff #305225)

(and it does not provide more/useful information either)

silvas abandoned this revision.Dec 3 2020, 6:35 PM