This is an archive of the discontinued LLVM Phabricator instance.

ADT: SmallVector size/capacity use word-size integers when elements are small
ClosedPublic

Authored by browneee on Apr 6 2020, 7:15 PM.

Details

Summary

SmallVector currently uses 32bit integers for size and capacity to reduce
sizeof(SmallVector). This limits the number of elements to UINT32_MAX.

For a SmallVector<char>, this limits the SmallVector size to only 4GB.
Buffering bitcode output uses SmallVector<char>, but needs >4GB output.

This changes SmallVector size and capacity to conditionally use word-size
integers if the element type is small (<4 bytes). For larger elements types,
the vector size can reach ~16GB with 32bit size.

Making this conditional on the element type provides both the smaller
sizeof(SmallVector) for larger types which are unlikely to grow so large,
and supports larger capacities for smaller element types.

Diff Detail

Event Timeline

browneee created this revision.Apr 6 2020, 7:15 PM
RKSimon added a subscriber: RKSimon.Apr 7 2020, 3:06 AM
RKSimon added inline comments.
llvm/include/llvm/Bitstream/BitstreamWriter.h
19 ↗(On Diff #255567)

Can this be dropped?

dblaikie accepted this revision.Apr 7 2020, 11:25 AM

There are a few places where smaller-scoped BitcodeWriters are used where a SmallVector may be suitable, but I'm OK sacrificing those for the broader goal. If someone finds these to be performance sensitive, we can revisit/try to find some way to support both goals.

This revision is now accepted and ready to land.Apr 7 2020, 11:25 AM
browneee updated this revision to Diff 255773.Apr 7 2020, 12:39 PM
browneee marked an inline comment as done.

Fix build.

Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2020, 12:39 PM
browneee updated this revision to Diff 255831.Apr 7 2020, 3:11 PM

Fix more build errors.

browneee updated this revision to Diff 255856.Apr 7 2020, 4:37 PM

Fix build errors. Missed -DLLVM_ENABLE_PROJECTS in previous local test builds.

dexonsmith added a comment.EditedApr 7 2020, 6:04 PM

This is thanks to a commit of mine that shaved a word off of SmallVector. Some options to consider:

  1. Revert to word-size integers (size_t? uintptr_t?) for Size and Capacity for small-enough types. Could be just if sizeof(T)==1. Or maybe just for char and unsigned char.
  2. Revert my patch entirely and go back to words (these used to be void*).
  3. (Your patch, stop using SmallVector<char>.)

I think I would prefer some variation of (1) over (3).

dexonsmith requested changes to this revision.Apr 7 2020, 6:09 PM

Requesting changes just to be sure we consider the other options. I don't think it's good that SmallVector is no longer useful for large byte streams; I would prefer to fix that then stop using the type.

This revision now requires changes to proceed.Apr 7 2020, 6:09 PM
browneee updated this revision to Diff 255875.Apr 7 2020, 6:44 PM
browneee marked 2 inline comments as done and an inline comment as not done.

Build fixes in additional projects.

browneee updated this revision to Diff 255891.Apr 7 2020, 9:30 PM

Fix formatting.

browneee added inline comments.Apr 7 2020, 9:48 PM
llvm/include/llvm/Bitstream/BitstreamWriter.h
19 ↗(On Diff #255567)

It is still used to construct records (line 512).

This is thanks to a commit of mine that shaved a word off of SmallVector. Some options to consider:

  1. Revert to word-size integers (size_t? uintptr_t?) for Size and Capacity for small-enough types. Could be just if sizeof(T)==1. Or maybe just for char and unsigned char.
  2. Revert my patch entirely and go back to words (these used to be void*).
  3. (Your patch, stop using SmallVector<char>.)

I think I would prefer some variation of (1) over (3).

Hi Duncan, thanks for raising these alternatives.

Links to your prior commit for context: Review, Commit

I agree any of these options would solve the issue I'm experiencing.

Option 1:

  • I think SmallVectorBase would need to become templated.
  • The size related code would need support to two sets of edge cases.
  • The varying capacity may be surprising, and adds another variation to both both small mode and heap mode.

Option 3:

  • This patch is somewhat widespread. A more localized fix may be desirable.
  • Some inconvenience of an API change for downstream.

Do we want to increase the complexity of SmallVector somewhat? or do we want to keep the limit and affirm SmallVector is for small things?

Do we want to increase the complexity of SmallVector somewhat? or do we want to keep the limit and affirm SmallVector is for small things?

I don't think we should limit SmallVector to small things. Most std::string implementations also have the small storage optimization, but they're not limited to small things. Note that even SmallVector<T,0> has a number of conveniences for LLVM over std::vector (such as extra API, ability to use SmallVectorImpl APIs, and no pessimizations from exception handling).

Personally, I'm fine with splitting SmallVectorBase into SmallVectorBase<uintptr_t> and SmallVectorBase<uint32_t> (on 32-bit architectures, there's actually no split there). There aren't APIs that take a SmallVectorBase so there's no downside there. It doesn't seem too bad to me to do something like:

template <class SizeT> SmallVectorBase {
  typedef SizeT size_type;
  // ...
};
template <class T>
using SmallVectorSizeType =
    std::conditional<sizeof(T) < 4, uintptr_t, uint32_t>;
template <class T> SmallVectorImpl :
    SmallVectorBase<SmallVectorSizeType<T>> { ... };

If the complexity is too much, I would personally prefer to have my patch reverted (option 2 above) over making SmallVector stop working with large byte arrays.

Do we want to increase the complexity of SmallVector somewhat? or do we want to keep the limit and affirm SmallVector is for small things?

I don't think we should limit SmallVector to small things. Most std::string implementations also have the small storage optimization, but they're not limited to small things. Note that even SmallVector<T,0> has a number of conveniences for LLVM over std::vector (such as extra API, ability to use SmallVectorImpl APIs, and no pessimizations from exception handling).

Personally, I'm fine with splitting SmallVectorBase into SmallVectorBase<uintptr_t> and SmallVectorBase<uint32_t> (on 32-bit architectures, there's actually no split there). There aren't APIs that take a SmallVectorBase so there's no downside there. It doesn't seem too bad to me to do something like:

template <class SizeT> SmallVectorBase {
  typedef SizeT size_type;
  // ...
};
template <class T>
using SmallVectorSizeType =
    std::conditional<sizeof(T) < 4, uintptr_t, uint32_t>;
template <class T> SmallVectorImpl :
    SmallVectorBase<SmallVectorSizeType<T>> { ... };

If the complexity is too much, I would personally prefer to have my patch reverted (option 2 above) over making SmallVector stop working with large byte arrays.

Fair enough - that complexity seems reasonably acceptable to me if you reckon the memory size benefits are still worthwhile (did you measure them on any particular workloads? Do we have lots of fairly empty SmallVectors, etc?) if they don't apply to smaller types like this?

browneee updated this revision to Diff 256621.Apr 10 2020, 11:08 AM

Change to suggested approach: size and capacity type conditionally larger for small element types.

Also incorporate https://reviews.llvm.org/D77601

browneee retitled this revision from Change BitcodeWriter buffer to std::vector instead of SmallVector. to ADT: SmallVector size & capacity use word-size integers when elements are small..Apr 10 2020, 12:46 PM
browneee edited the summary of this revision. (Show Details)

Please update the patch description/subject line.

@dexonsmith I'll leave this to you for final approval, since it was your idea/you've been touching things here. But looks like about the right direction.

llvm/include/llvm/ADT/SmallVector.h
47

Don't think this typedef is really pulling its weight - probably just refer to the template type parameter directly?

53

I'd probably use numeric_limits here & make this static constexpr

132

I'd probably add a "using Base = SmallVectorBase<SmallVectorSizeType<T>>" here, and then use that in the ctor and grow_pod.

Also down by the other using decls maybe add "using Base::size/Base::capacity/Base::empty" so you don't have to "this->" everything.

browneee updated this revision to Diff 256803.Apr 11 2020, 2:54 PM
browneee marked 3 inline comments as done.

Address comments from dblaikie.

dblaikie added inline comments.Apr 11 2020, 5:30 PM
llvm/include/llvm/ADT/SmallVector.h
53

Making it a constexpr /variable/ gets a bit more complicated - do you happen to know off-hand whether this requires a separate/out-of-line definition if it's ODR used (ah, seems it does - in C++14 which LLVM Uses, in 17 ("A constexpr specifier used in a function or static member variable (since C++17) declaration implies inline.") it would be OK)

I think it's best not to leave a trap that might catch someone up in the future if SizeMax ends up getting ODR used (eg: in a call to std::less, rather than an explicit op<, etc) & then the lack of definition would lead to linking failure, etc. So best to leave it as a static constexpr function rather than static constexpr variable.

llvm/lib/Support/SmallVector.cpp
40–47

I think it's probably best to assert the size more like the above (but using zero-sized inline buffer), rather than using relative sizes - seems like it'd be more readable to me at least. Maybe @dexonsmith has a perspective here.

browneee updated this revision to Diff 257044.Apr 13 2020, 11:43 AM

Changed SizeMax to static constexpr function.
Changed static asserts.

browneee marked 2 inline comments as done.Apr 13 2020, 11:43 AM

Looks good to me at this point (I have some vague quandries about whether the report_fatal_error stuff could be improved/made more clear, but couldn't come up with an actionable suggestion so far) - @dexonsmith could you check this over and offer final approval?

dexonsmith accepted this revision.Apr 13 2020, 1:36 PM

Thanks for your patience, I missed the updates on Friday.

I have a couple of optional comments inline that I don't feel strongly about. LGTM either way.

Fair enough - that complexity seems reasonably acceptable to me if you reckon the memory size benefits are still worthwhile (did you measure them on any particular workloads? Do we have lots of fairly empty SmallVectors, etc?) if they don't apply to smaller types like this?

I haven't measured anything recently. Last I looked there were a number of SmallVectors inside other data structures (sometimes, sadly, SmallVector) on the heap (or stack). In some cases the main reason not to use std::vector is the exception pessimizations. It's nice to keep them small if it's reasonable to.

llvm/include/llvm/ADT/SmallVector.h
53

STL data structures have a name for this called max_size(). Should we be consistent with that?

178

Optionally we could expose max_size() as well.

This revision is now accepted and ready to land.Apr 13 2020, 1:36 PM
browneee updated this revision to Diff 257130.Apr 13 2020, 3:46 PM
browneee marked 3 inline comments as done.

Rename SizeMax() to SizeTypeMax(). Fix max_size().

I'm open to suggestions to resolve the clang tidy naming warnings. I would prefer to leave grow_pod the same, to minimize changes.

@dexonsmith I am not a committer, if the last changes looks good please submit for me. Thanks!

llvm/include/llvm/ADT/SmallVector.h
53

Good question.

This brought my attention to the existing SmallVectorTemplateCommon::max_size() which also needed to be updated.
I'm going to name this new function SizeTypeMax to best describe what it provides, and leave it separate from max_size().

178

Not done. Updated existing max_size() instead.

@dexonsmith I am not a committer, if the last changes looks good please submit for me. Thanks!

You've had a few patches in the past, I suggest you get yourself access.
https://llvm.org/docs/DeveloperPolicy.html#new-contributors

Or, let me know what you want for your GIT_COMMITTER* info.

GIT_COMMITTER_NAME=Andrew Browne
GIT_COMMITTER_EMAIL=browneee@google.com

This would be my second commit. I will request access next time - thanks @dexonsmith!

browneee updated this revision to Diff 257861.Apr 15 2020, 2:54 PM

Rebase to latest HEAD.

browneee updated this revision to Diff 258444.Apr 17 2020, 3:55 PM

Rebase to latest HEAD.

dexonsmith closed this revision.Apr 17 2020, 4:32 PM

GIT_COMMITTER_NAME=Andrew Browne
GIT_COMMITTER_EMAIL=browneee@google.com

This would be my second commit. I will request access next time - thanks @dexonsmith!

I should have said GIT_AUTHOR* info, since I'm the committer :). Just landed it as b8d08e961df1d229872c785ebdbc8367432e9752, thanks for waiting!

nikic reopened this revision.Apr 18 2020, 3:15 AM
nikic added a subscriber: nikic.

I have reverted this change, because it causes a 1% compile-time and memory usage regression. The memory usage regression is probably fine given what this change does, but the compile-time regression is not. (For context, this pretty much undoes the wins that the recent removal of waymarking gave us.)

Some notes:

  • Can you please split out the fix to grow() into a separate revision? It does not seem related to the main change, and reduces surface area.
  • I don't think the automatic switch of the size/capacity field has been justified well. We have plenty of SmallVectors in LLVM that are, indeed, small. There is no way an MCRelaxableFragment will ever end up storing a single instruction that is 4G large.
  • Similarly, I'm not really convinced about handling this in SmallVector at all. The original change here just used an std::vector in the one place where this has become an issue. That seems like a pretty good solution until there is evidence that this is really a more widespread problem.

But in any case, my primary concern here is the compile-time regression, and it's not immediately clear which part of the change it comes from.

This revision is now accepted and ready to land.Apr 18 2020, 3:15 AM

Thanks for the revert explanation and notes, nikic.

@dexonsmith what is your current thinking on going back to the original std::vector approach?

browneee updated this revision to Diff 259480.Apr 22 2020, 10:44 PM

Switch approach back to std::vector change.

browneee retitled this revision from ADT: SmallVector size & capacity use word-size integers when elements are small. to Change BitcodeWriter buffer to std::vector instead of SmallVector..Apr 22 2020, 11:15 PM
browneee edited the summary of this revision. (Show Details)
browneee edited the summary of this revision. (Show Details)

Thanks for the revert explanation and notes, nikic.

@dexonsmith what is your current thinking on going back to the original std::vector approach?

SmallVector has only been limited to UINT32_MAX size for about a year and I think it’s a pretty major regression that I broke using it for arbitrary char buffers. I don’t think that’s acceptable really. Note that there was pushback when I shrank SmallVector at all for aesthetic reasons.

Note that breaking SmallVector<char> also breaks SmallString and raw_svector_ostream for buffers that are sometimes large. This was certainly not the goal of my original commit and I think it’s the wrong result.

One thing to try I suppose is specializing just when sizeof(T)==1. But even if there’s still a compile time hit, I think making SmallVector functional is more critical. Use cases that really want something tiny can use TinyPtrVector; or if that’s not appropriate we can introduce a TinyVector that works for other types (could make it 8 bytes with small storage for 1 element if the type is 4 bytes or smaller).

This might be worth a thread on llvm-dev. Maybe no one else thinks LLVM should use SmallVectorImpl pervasively in APIs anymore.

(AFAICT MCRelaxableFragment has a SmallVector<MCFixup> and would not have been affected by the reverted commit since sizeof(MCFixup) is quite large, not sure why that was brought up.)

because it causes a 1% compile-time and memory usage regression.

Yeah, some memory regression is expected and, in my opinion, acceptable for the change.

The compile time regression presumably came from the changes to the report_fatal_error handling in SmallVector - perhaps it could be changed/omitted in this commit, and done separately to assess the cost of changes to that error checking?

I resubmitted the report_fatal_error checks again under D77601

http://llvm-compile-time-tracker.com/compare.php?from=7375212172951d2fc283c81d03c1a8588c3280c6&to=a30e7ea88e75568feed020aedae73c52de888835&stat=max-rss
http://llvm-compile-time-tracker.com/compare.php?from=7375212172951d2fc283c81d03c1a8588c3280c6&to=a30e7ea88e75568feed020aedae73c52de888835&stat=instructions

Imo impact from this part is insignificant.

Other pieces I see as possibly impacting compile time are:

  1. This correction to SmallVectorTemplateCommon::max_size(). But SizeTypeMax() is static constexpr, this seems like it could still be optimized to a constant.
-  size_type max_size() const { return size_type(-1) / sizeof(T); }
+  size_type max_size() const {
+    return std::min(this->SizeTypeMax(), size_type(-1) / sizeof(T));
+  }
  1. More function calls. They also appear fairly optimizable to me.

I may not have good insight into the actual optimization behavior here.

Ah, OK - thanks for noting that!

@nikic any sense of the noise floor/level on these measurements? It doesn't /look/ like there's much left in this that would cause problems. & I assume these measurements were made on an optimized build (so we don't have to try to improve the unoptimized code?

Other pieces I see as possibly impacting compile time are:

  1. This correction to SmallVectorTemplateCommon::max_size(). But SizeTypeMax() is static constexpr, this seems like it could still be optimized to a constant.
-  size_type max_size() const { return size_type(-1) / sizeof(T); }
+  size_type max_size() const {
+    return std::min(this->SizeTypeMax(), size_type(-1) / sizeof(T));
+  }

Perhaps you could move the value computation into a constexpr variable & just return that as needed. (could be a static local constexpr, I guess - to avoid the issues around linkage of constexpr member variables)

  1. More function calls. They also appear fairly optimizable to me.

I may not have good insight into the actual optimization behavior here.

*nod* Didn't seem especially interesting.

I don't think the automatic switch of the size/capacity field has been justified well. We have plenty of SmallVectors in LLVM that are, indeed, small. There is no way an MCRelaxableFragment will ever end up storing a single instruction that is 4G large.

@nikic - can you explain the relevance of this ^ (as @dexonsmith pointed out, MCRelaxableFragment doesn't look like it would be affected by this change - is there something we're missing about that?)

Similarly, I'm not really convinced about handling this in SmallVector at all. The original change here just used an std::vector in the one place where this has become an issue. That seems like a pretty good solution until there is evidence that this is really a more widespread problem.

I'm inclined to go with @dexonsmith's perspective here, as the author of the original change & the general attitude that SmallVector should support this kind of use case.

nikic added a comment.Apr 23 2020, 1:30 PM

@nikic any sense of the noise floor/level on these measurements? It doesn't /look/ like there's much left in this that would cause problems. & I assume these measurements were made on an optimized build (so we don't have to try to improve the unoptimized code?

The measurements are on an optimized build (default LLVM release build, so no LTO). The noise level on the "instructions" metric is very low, so that changes above 0.1% tend to be significant. The compile-time regression on the original change definitely wasn't noise (but the change from D77601 is in the noise).

Other pieces I see as possibly impacting compile time are:

  1. This correction to SmallVectorTemplateCommon::max_size(). But SizeTypeMax() is static constexpr, this seems like it could still be optimized to a constant.
-  size_type max_size() const { return size_type(-1) / sizeof(T); }
+  size_type max_size() const {
+    return std::min(this->SizeTypeMax(), size_type(-1) / sizeof(T));
+  }

Perhaps you could move the value computation into a constexpr variable & just return that as needed. (could be a static local constexpr, I guess - to avoid the issues around linkage of constexpr member variables)

The use of a function rather than a static constexpr for SizeTypeMax() was my first thought as well. It seems pretty weird to me, but maybe it's enough to fall one the wrong side of some inlining heuristic.

The only other thing that comes to mind is that grow_pod() moved into the header, which might have negative effects. It should be possible to avoid that by providing explicit template instantiations for uint32_t and uintptr_t in the cpp file.

I'll try to figure out what the cause is, but might take me a few days.

I don't think the automatic switch of the size/capacity field has been justified well. We have plenty of SmallVectors in LLVM that are, indeed, small. There is no way an MCRelaxableFragment will ever end up storing a single instruction that is 4G large.

@nikic - can you explain the relevance of this ^ (as @dexonsmith pointed out, MCRelaxableFragment doesn't look like it would be affected by this change - is there something we're missing about that?)

MCRelaxableFragment also contains a SmallVector<char>. I used this as an example where we use a SmallVector<char> with a very low upper bound on the size. (This example is not great, because the structure is already large for other reasons.)

Similarly, I'm not really convinced about handling this in SmallVector at all. The original change here just used an std::vector in the one place where this has become an issue. That seems like a pretty good solution until there is evidence that this is really a more widespread problem.

I'm inclined to go with @dexonsmith's perspective here, as the author of the original change & the general attitude that SmallVector should support this kind of use case.

Okay, I'm basically fine with that, if it is our stance that SmallVector should always be preferred over std::vector. Not really related to this revision, but it would probably help to do some renaming/aliasing to facilitate that view. Right now, the number of SmallVector<T, 0> uses in LLVM is really small compared to the std::vector<T> uses (100 vs 6000 based on a not very accurate grep). I think part of that is in the name, and calling it using Vector<T> = SmallVector<T, 0> and using VectorImpl<T> = SmallVectorImpl<T> would make it a lot more obvious that this is our preferred general purpose vector type, even if the stored data is not small.

Okay, I'm basically fine with that, if it is our stance that SmallVector should always be preferred over std::vector. Not really related to this revision, but it would probably help to do some renaming/aliasing to facilitate that view. Right now, the number of SmallVector<T, 0> uses in LLVM is really small compared to the std::vector<T> uses (100 vs 6000 based on a not very accurate grep). I think part of that is in the name, and calling it using Vector<T> = SmallVector<T, 0> and using VectorImpl<T> = SmallVectorImpl<T> would make it a lot more obvious that this is our preferred general purpose vector type, even if the stored data is not small.

Those aliases SGTM.

nikic added a comment.Apr 24 2020, 1:09 AM

Perhaps you could move the value computation into a constexpr variable & just return that as needed. (could be a static local constexpr, I guess - to avoid the issues around linkage of constexpr member variables)

The use of a function rather than a static constexpr for SizeTypeMax() was my first thought as well. It seems pretty weird to me, but maybe it's enough to fall one the wrong side of some inlining heuristic.

The only other thing that comes to mind is that grow_pod() moved into the header, which might have negative effects. It should be possible to avoid that by providing explicit template instantiations for uint32_t and uintptr_t in the cpp file.

I'll try to figure out what the cause is, but might take me a few days.

I've tried those two things: results. From the bottom, the first commit is a rebased version of the original change, the second one makes SizeTypeMax a constant instead of a function and the last one moves grow_pod back into the C++ file (I forgot to replace the UINT32_MAX references in grow_pod, but I don't think it has an impact on the conclusion). The first change is a +0.75% regression, the second is neutral and the last one is a -0.70% improvement, the remaining difference is likely noise. So it looks like the move of grow_pod into the header was the culprit.

What is rather peculiar is that the picture is similar for the max-rss numbers. I believe this is because max-rss is also influenced by the size of the clang binary itself, and apparently the move of grow_pod into the header increased it a lot. (I should probably collect clang binary size to make this easy to verify.) That means that there doesn't seem to be much of an increase in terms of actually allocated heap memory due to this change.

Taking the max-rss numbers across all three commits, the only part where memory usage increases non-trivially is the LTO -g link step, by about ~1%. Possibly some debuginfo related stuff uses SmallVector<char>.

So tl;dr looks like as long as we keep grow_pod outside the header file, this change seems to be approximately free in terms of compile-time and memory usage both.

@nikic, great news! Thanks for doing the detailed analysis.

browneee updated this revision to Diff 259949.Apr 24 2020, 12:05 PM

Switch back to size and capacity type conditionally larger approach (appologies for the noise here).

Apply performance regression solutions from @nikic

browneee retitled this revision from Change BitcodeWriter buffer to std::vector instead of SmallVector. to ADT: SmallVector size/capacity use word-size integers when elements are small.Apr 24 2020, 12:06 PM
browneee edited the summary of this revision. (Show Details)
dblaikie accepted this revision.Apr 24 2020, 12:20 PM

Okay, I'm basically fine with that, if it is our stance that SmallVector should always be preferred over std::vector. Not really related to this revision, but it would probably help to do some renaming/aliasing to facilitate that view. Right now, the number of SmallVector<T, 0> uses in LLVM is really small compared to the std::vector<T> uses (100 vs 6000 based on a not very accurate grep). I think part of that is in the name, and calling it using Vector<T> = SmallVector<T, 0> and using VectorImpl<T> = SmallVectorImpl<T> would make it a lot more obvious that this is our preferred general purpose vector type, even if the stored data is not small.

Those aliases SGTM.

I'd be slightly against, just because having a name that differs from the standard name only in case seems pretty subtle - that and momentum, we've had SmallVector around for a while & I think it's OK. I don't mind some places using std::vector either, though. Don't feel strongly enough that I'd outright stand against such an alias/change, but just expressing this amount of disfavor.

So tl;dr looks like as long as we keep grow_pod outside the header file, this change seems to be approximately free in terms of compile-time and memory usage both.

Awesome - thanks for looking into it!

nikic accepted this revision.Apr 24 2020, 1:11 PM
nikic added inline comments.
llvm/include/llvm/ADT/SmallVector.h
19

Is this include still needed?

116

Is this needed? I don't think it makes a lot of sense to allow odr-use of SizeTypeMax. As it's a protected member, it's only used in the SmallVector implementation, where we control how it is used.

llvm/lib/Support/SmallVector.cpp
47

Nit: if the when => if the or when.

dblaikie added inline comments.Apr 24 2020, 1:47 PM
llvm/include/llvm/ADT/SmallVector.h
116

It's used as a parameter to std::min, so it's already odr used & I'd rather not leave it as a trap to walk around even if we addressed that issue.

I assume if it were a constexpr local in a protected inline function it wouldn't hinder optimizations in any real way?

nikic added inline comments.Apr 24 2020, 2:32 PM
llvm/include/llvm/ADT/SmallVector.h
116

It's used as a parameter to std::min, so it's already odr used & I'd rather not leave it as a trap to walk around even if we addressed that issue.

Oh, right you are! In that case this seems fine :)

I assume if it were a constexpr local in a protected inline function it wouldn't hinder optimizations in any real way?

The change from constexpr function to constexpr static didn't change anything performance-wise, so either way works for me.

Another option is:

enum : size_t { SizeTypeMax = std::numeric_limits<Size_T>::max() };

Kind of sad that in C++14, using an enum is still the only "no nonsense" way to declare a constant.

browneee updated this revision to Diff 260006.Apr 24 2020, 3:09 PM
browneee marked 4 inline comments as done.
  • Change SizeTypeMax to a static constexpr function.
  • Fix comment typos.
  • Add comment to alert others to possible performance loss if that function is moved to the header.

@nikic: Thank you for detecting, analyzing, and solving the performance regression!

Reverted in 5cb4c3776a34d48e43d9118921d2191aee0e3d21

Fails on plaforms where uintptr_t is the same type as uint32_t.

browneee reopened this revision.Apr 24 2020, 9:30 PM
This revision is now accepted and ready to land.Apr 24 2020, 9:30 PM
browneee updated this revision to Diff 260064.Apr 24 2020, 9:32 PM

Change uintptr_t to uint64_t to ensure this does not instantiate the same template twice on platforms where uintptr_t is equivalent to uint32_t.

Also considered using the preprocessor to disable the uintptr_t instantiation, but chose to avoid preprocessor use.

browneee marked an inline comment as done.Apr 24 2020, 9:43 PM
browneee added inline comments.
llvm/include/llvm/ADT/SmallVector.h
19

SmallVectorTemplateBase<T, TriviallyCopyable>::grow() still remains in the header and uses report_bad_alloc_error().

dblaikie accepted this revision.Apr 27 2020, 1:14 PM

Seems good to me.

MaskRay closed this revision.EditedApr 27 2020, 2:34 PM
MaskRay added a subscriber: MaskRay.

@browneee Hi, it seems that you did not attach Differential Revision: to the commit dda3c19a3618dce9492687f8e880e7a73486ee98 so the differential was not closed automatically.

In previous llvm-dev discussions, people agreed that Reviewed-by: and Differential Revision: are useful and should be retained. Many others Summary: Subscribers: Reviewers: Tags: not not needed.

I usually do this when committing: arcfilter; git fetch origin master && git rebase origin/master; last-minute-testing && git push origin HEAD:master

where arcfilter is a shell function which drops unneeded Phabricator tags

arcfilter () {
        arc amend
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:$/ {sub(/^Summary: /,"");print}' | git commit --amend -F -
}
smeenai added inline comments.
llvm/include/llvm/ADT/SmallVector.h
53

Was it intentional to make this return a size_t rather than a Size_T? Clang gives a truncation warning on 32-bit platforms when you try to instantiate the template with uint64_t as a result.

dblaikie added inline comments.Apr 30 2020, 8:02 AM
llvm/include/llvm/ADT/SmallVector.h
53

I think returning size_t is the correct thing here & the fix is not to use a 64 bit size on a 32 bit machine - that was initially intended to be solved by using uintptr_t, but got lost when that turned out to cause issues on 32 bit machines.

@browneee could you fix this differently, so that when sizeof(uintptr_t) == 4 there's only one instantiation/only uint32_t is used?

browneee added a comment.EditedApr 30 2020, 11:44 AM

Thanks for the tips, MaskRay.

Yes, I expect this would fix that issue.

smeenai, SizeTypeMax() is intended to return size_t.


I see a couple of options for fixing the truncation warning on 32-bit platforms:

  1. Add an explicit cast to remove the warning.
    • Disadvantage: the second instantiation still exists even though it is unused.
static constexpr size_t SizeTypeMax() {
  return static_cast<size_t>(std::numeric_limits<Size_T>::max());
}
  1. Use a std::conditional to swap the type of one instantiation to avoid conflicts.

    In this case I'd probably swap back to using uintptr_t and disable the uint32_t on 32bit.
    • Disadvantage: the second instantiation still exists even though it is unused.
// Will be unused when instantiated with char.
// This is to avoid instantiation for uint32_t conflicting with uintptr_t on 32-bit systems.
template class llvm::SmallVectorBase<std::conditional<sizeof(void *) != 4, uint32_t, char>::type>;     
template class llvm::SmallVectorBase<uintptr_t>;
  1. Use preprocessor to disable one of the instantiations on 32-bit platforms.

    In this case I'd probably swap back to using uintptr_t and disable the uint32_t on 32bit.
    • Disadvantage: uses preprocessor
    • Disadvantage: potential for portability issues with different platforms lacking certain macros
#if __SIZEOF_POINTER__ != 4  && !defined(_WIN32) && !defined(__ILP32)
template class llvm::SmallVectorBase<uint32_t>;     
#endif    
template class llvm::SmallVectorBase<uintptr_t>;

My order of preference would be 1, 2, 3.

Is there another solution I've missed? Thoughts on which is best? @dblaikie

Thanks for the tips, MaskRay.

Yes, I expect this would fix that issue.

smeenai, SizeTypeMax() is intended to return size_t.


I see a couple of options for fixing the truncation warning on 32-bit platforms:

  1. Add an explicit cast to remove the warning.
    • Disadvantage: the second instantiation still exists even though it is unused.
static constexpr size_t SizeTypeMax() {
  return static_cast<size_t>(std::numeric_limits<Size_T>::max());
}
  1. Use a std::conditional to swap the type of one instantiation to avoid conflicts.

    In this case I'd probably swap back to using uintptr_t and disable the uint32_t on 32bit.
    • Disadvantage: the second instantiation still exists even though it is unused.
// Will be unused when instantiated with char.
// This is to avoid instantiation for uint32_t conflicting with uintptr_t on 32-bit systems.
template class llvm::SmallVectorBase<std::conditional<sizeof(void *) != 4, uint32_t, char>::type>;     
template class llvm::SmallVectorBase<uintptr_t>;
  1. Use preprocessor to disable one of the instantiations on 32-bit platforms.

    In this case I'd probably swap back to using uintptr_t and disable the uint32_t on 32bit.
    • Disadvantage: uses preprocessor
    • Disadvantage: potential for portability issues with different platforms lacking certain macros
#if __SIZEOF_POINTER__ != 4  && !defined(_WIN32) && !defined(__ILP32)
template class llvm::SmallVectorBase<uint32_t>;     
#endif    
template class llvm::SmallVectorBase<uintptr_t>;

My order of preference would be 1, 2, 3.

Is there another solution I've missed? Thoughts on which is best? @dblaikie

I don't think (3) needs to be non-portable - it could use SIZE_MAX to test if it's bigger than 2^32? (or == 2^64)? That seems like it'd be a pretty good direct test? & ensure the types written in the explicit specializations are the same types written in the header/std::conditional (seems a bit questionable to rely on uintptr_t being necessarily the same type as either uint32_t or uint64_t - but maybe that's guaranteed/written down somewhere)?

@browneee Looks like LLVM already defines LLVM_PTR_SIZE as a more portable version of __SIZEOF_POINTER__.

@browneee Looks like LLVM already defines LLVM_PTR_SIZE as a more portable version of __SIZEOF_POINTER__.

I saw LLVM_PTR_SIZE, but its definition may be based on sizeof(), so I don't think it should be used in a preprocessor condition.

SIZE_MAX looks like a good option.

(seems a bit questionable to rely on uintptr_t being necessarily the same type as either uint32_t or uint64_t - but maybe that's guaranteed/written down somewhere)?

I think in practice uintptr_t will match range and size with one of uint32_t and uint64_t, but as you suggest it might not be equivalent to either (e.g., could use all three of unsigned, unsigned long, and unsigned long long). We need to be consistent between the std::conditional in SmallVectorSizeType and the explicit instantiations (and skimming the current implementation it looks like it is consistent, doesn’t rely on uintptr_t).