This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Fix reference invalidation when self referencing a SmallVector
Needs ReviewPublic

Authored by njames93 on Sep 8 2020, 1:45 PM.

Details

Summary

Fix issues when calling methods from SmallVector that insert or append an item already inside a SmallVector where by the reference is invalidated when the container grows.

For example SmallVec.push_back(SmallVec[0]) If this call causes the vector to grow, SmallVec[0] will now be referencing invalid memory.
This also slightly speeds up SmallVec::Insert by only moving data after the insertion point once when the container grows.

This addresses https://bugs.llvm.org/show_bug.cgi?id=12728 and https://bugs.llvm.org/show_bug.cgi?id=16253

Diff Detail

Event Timeline

njames93 created this revision.Sep 8 2020, 1:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2020, 1:45 PM
njames93 requested review of this revision.Sep 8 2020, 1:45 PM

For the clang tidy naming warnings, is it better to follow the warnings, or the current convention of the file?

I haven't added specific test cases for this, but all ADT Tests did run without a hitch under asan

llvm/include/llvm/ADT/SmallVector.h
637

The function probably also needs fixing up, but there are 2 issues here, the other being the above FIXME. Would prefer to address all of that in a follow up.

njames93 updated this revision to Diff 290847.Sep 9 2020, 4:48 PM

Harden test cases for SmallVector

Make the test case Constructable track if an object has been moved as well as created and destructed.
This makes the SmallVectorTest fail without the GrowBuffer implementation.

njames93 updated this revision to Diff 290861.Sep 9 2020, 6:56 PM

Extend behaviour to more methods.

SmallVectorImpl<T>::insert(iterator, size_t, const T&);
SmallVectorImpl<T>::insert(iterator, ItTy, ItTy);

Fixed up test cases for these usages.

nikic added a subscriber: nikic.Sep 10 2020, 2:04 AM

Could you please rebase this to current master? I wasn't able to git apply it. (Sorry, Phabricator troubles. It's hard to believe that a code review tool is incapable of properly retaining merge bases.)

njames93 updated this revision to Diff 290941.Sep 10 2020, 4:52 AM

Rebased trunk and fixed up assign to use new buffer

Could you please rebase this to current master? I wasn't able to git apply it. (Sorry, Phabricator troubles. It's hard to believe that a code review tool is incapable of properly retaining merge bases.)

To be fair it wasn't a simple automatic merge fix, the last commit to the affected file definitely conflicted this patch

njames93 updated this revision to Diff 290947.Sep 10 2020, 5:12 AM

Add buffer to append(iterator, iterator) to handle SmallVec.append(SmallVec.begin(), SmallVec.end());

njames93 edited the summary of this revision. (Show Details)Sep 10 2020, 5:22 AM
njames93 updated this revision to Diff 290953.Sep 10 2020, 5:56 AM

Fix build error when appending iterators of types convertible to SmallVector::value_type

nikic added a comment.Sep 10 2020, 7:08 AM

Here's the compile-time impact I see with this change:

1.2% regression in instructions retired: https://llvm-compile-time-tracker.com/compare.php?from=3c42c0dcf631ad6b90e718df895c05f79718659f&to=606330864b2349e29beb460ae69fa41c0170674e&stat=instructions
1.0% max-rss regression: https://llvm-compile-time-tracker.com/compare.php?from=3c42c0dcf631ad6b90e718df895c05f79718659f&to=606330864b2349e29beb460ae69fa41c0170674e&stat=max-rss

The max-rss regression is presumably due to an increase in clang binary size. Manually checking the data, size-text goes from 80490755 to 81122130, i.e. 0.8% increase.

Here's the compile-time impact I see with this change:

1.2% regression in instructions retired: https://llvm-compile-time-tracker.com/compare.php?from=3c42c0dcf631ad6b90e718df895c05f79718659f&to=606330864b2349e29beb460ae69fa41c0170674e&stat=instructions
1.0% max-rss regression: https://llvm-compile-time-tracker.com/compare.php?from=3c42c0dcf631ad6b90e718df895c05f79718659f&to=606330864b2349e29beb460ae69fa41c0170674e&stat=max-rss

The max-rss regression is presumably due to an increase in clang binary size. Manually checking the data, size-text goes from 80490755 to 81122130, i.e. 0.8% increase.

What build flags are you using. I tried a release build with thinlto. Also tried to move some of this code into the cpp file by type erasing the template out of GrowBufferBase.
For the record I'm testing on a Ryzen 2600X on ubuntu 20.04 and using clang-10 as the host compiler and linking with lld-10

trunk      - size.text = 42483D6, size.binary = 6141438
this       - size.text = 3EBD276, size.binary = 5DDD940
type-erase - size.text = 3EC0B26, size.binary = 5DE1290
This comment was removed by xbolva00.
lattner resigned from this revision.Sep 10 2020, 2:49 PM

Thank you for working on this! I don't have cycles for a detailed review of the patch, but I'm thrilled to see this footgun get fixed.

I believe I filed the second bug ( https://bugs.llvm.org/show_bug.cgi?id=16253 ) a while back because I tried to do it myself (admittedly, it was myself 7 years ago, so maybe my perspective has changed over the years) and found it too difficult. That there were corner cases or something that made me unsure how to move forward - so I filed the bug and forgot about it.

Wish I'd written down more of what those corner cases were... let's see what I can construct from what I did say in the bug.

r183465: Clearly a workaround for this bug (or a fix for an incorrect use of SmallVector, if we deem this use to be out of contract for SmallVector), though a somewhat confusing/unclear commit message to justify it.
r183459: Another of the same

Hmm - I think maybe what I had trouble with was understanding how much this shuold generalize. Should this handle subranges? (if you try to append half of a vector onto itself) and if so, how? Anyone happen to know what's guaranteed by the standard (perhaps only push_back of single elements - that's easier to handle)

For the clang tidy naming warnings, is it better to follow the warnings, or the current convention of the file?

Current convention of the file, thanks!

I haven't added specific test cases for this, but all ADT Tests did run without a hitch under asan

Did they run without a hitch under asan before the change too? I imagine so.

So probably worth adding specific tests for overlapping - ones that would've failed under asan before this change, but pass with it.

The test updates you have done in the patch so far - they're intended to make the test coverage a bit more robust to have more confidence over the refactoring? Could those test changes be committed first/separately - I assume they're intended to pass before and after the patch? So might be good as a preliminary/separate patch before this one.

llvm/include/llvm/ADT/SmallVector.h
323

Prefer range-based-for rather than std::for_each, probably (here and below)?

Hmm - I think maybe what I had trouble with was understanding how much this shuold generalize. Should this handle subranges? (if you try to append half of a vector onto itself) and if so, how? Anyone happen to know what's guaranteed by the standard (perhaps only push_back of single elements - that's easier to handle)

From what I can see insert and append fully support when the range to insert is enclosed by the vector, however assign makes no such promise as it would potentially require allocating more storage even if the container itself doesn't need to grow

I haven't added specific test cases for this, but all ADT Tests did run without a hitch under asan

Did they run without a hitch under asan before the change too? I imagine so.

They did but that was only because the test cases don't grow multiple times from what I can see

So probably worth adding specific tests for overlapping - ones that would've failed under asan before this change, but pass with it.

The test updates you have done in the patch so far - they're intended to make the test coverage a bit more robust to have more confidence over the refactoring? Could those test changes be committed first/separately - I assume they're intended to pass before and after the patch? So might be good as a preliminary/separate patch before this one.

The code to make the tests more robust fail without the modifications to small vector as they expose issues for insert moving an already deleted item.

Yeah, seems that SmallVector::push_back should be fixed to allow a.push_back(a[0]). SmallVector::insert has been fixed in rL134554

Hmm - I think maybe what I had trouble with was understanding how much this shuold generalize. Should this handle subranges? (if you try to append half of a vector onto itself) and if so, how? Anyone happen to know what's guaranteed by the standard (perhaps only push_back of single elements - that's easier to handle)

From what I can see insert and append fully support when the range to insert is enclosed by the vector, however assign makes no such promise as it would potentially require allocating more storage even if the container itself doesn't need to grow

Makes sense.

I haven't added specific test cases for this, but all ADT Tests did run without a hitch under asan

Did they run without a hitch under asan before the change too? I imagine so.

They did but that was only because the test cases don't grow multiple times from what I can see

Hmm - I thought this was about self-referential inserts/push_back, etc? Was/is there also bugs related to multiple growth? Could you show a small test case that'd fail asan with the current in-tree code related to multiple growth?

So probably worth adding specific tests for overlapping - ones that would've failed under asan before this change, but pass with it.

The test updates you have done in the patch so far - they're intended to make the test coverage a bit more robust to have more confidence over the refactoring? Could those test changes be committed first/separately - I assume they're intended to pass before and after the patch? So might be good as a preliminary/separate patch before this one.

The code to make the tests more robust fail without the modifications to small vector as they expose issues for insert moving an already deleted item.

Ah, I see now the new code avoids extra moves - the old code would grow the container, then insert the elements - which could cause an extra shift of elements after the insertion point to then make space for the to-be-inserted elements. Great to have that fixed too, but I think there should be at least some test coverage for the self-insertion otherwise we could keep the efficiency gains but accidentally break the self-insertion(self-push back, etc) by doing the same operations, but in the wrong order (eg: moving over to the new buffer first, then inserting the new elements, rather than new elements first, then old elements).

Yeah, seems that SmallVector::push_back should be fixed to allow a.push_back(a[0]). SmallVector::insert has been fixed in rL134554

Ah, right, sort of a related but maybe slightly different problem - when not resizing the underlying buffer, elements are moved, then the new one is inserted so the reference may've been invalidated. Probably best to handle that in a separate/follow-up commit - any idea if the C++ standard guarantees this for std::vector, for instance? And if so, how does libc++ implement this guarantee?

llvm/include/llvm/ADT/SmallVector.h
680–682

Could this use a range-based for loop? (I realize that'd mean moving the increment into the loop - so totally OK if you reckon that'd be less readable)

Can also drop the {} on single-line constructs like this. (& the if == size above)

llvm/unittests/ADT/SmallVectorTest.cpp
35

Maybe Destroyed rather than Invalid (& I think MovedFrom is maybe the more common phrasing than MovedOut)

Could use an enum class so you don't need to have the OS_ prefix, instead ObjectState::Constructed, etc?

Yeah, seems that SmallVector::push_back should be fixed to allow a.push_back(a[0]). SmallVector::insert has been fixed in rL134554

Ah, right, sort of a related but maybe slightly different problem - when not resizing the underlying buffer, elements are moved, then the new one is inserted so the reference may've been invalidated. Probably best to handle that in a separate/follow-up commit - any idea if the C++ standard guarantees this for std::vector, for instance? And if so, how does libc++ implement this guarantee?

Ah, I see at the end of rL134554 "Thanks to Howard Hinnant for clarifying the correct behavior, and explaining how std::vector solves this problem." - so I guess it does guarantee it, and implements it in the same way, by testing address/fixing up the address.

njames93 updated this revision to Diff 291450.Sep 13 2020, 1:32 AM
njames93 marked 3 inline comments as done.

Address a few inlines

Yeah, seems that SmallVector::push_back should be fixed to allow a.push_back(a[0]). SmallVector::insert has been fixed in rL134554

It hasn't, that handles the case where the element to insert is moved when shifting the elements down to make room, but not when the element to insert is moved because of container growth.

They did but that was only because the test cases don't grow multiple times from what I can see

Hmm - I thought this was about self-referential inserts/push_back, etc? Was/is there also bugs related to multiple growth? Could you show a small test case that'd fail asan with the current in-tree code related to multiple growth?

When I say grow multiple times, I mean the container has already grown once, then when the call to insert etc is made the container grows a second time. In that case the element to insert would now be in freed memory which would cause asan to fail. However right now that test case doesn't seem to appear, it does happen where the container grows from small size to insert. but as the memory is stack based there, its fine, from asans POV, to reference it.
In D87237, I experimented by putting asan instrumentation inside SmallVector, which caused a test case to fail when ran under asan.

Yeah, seems that SmallVector::push_back should be fixed to allow a.push_back(a[0]). SmallVector::insert has been fixed in rL134554

It hasn't, that handles the case where the element to insert is moved when shifting the elements down to make room, but not when the element to insert is moved because of container growth.

Yeah, agreed. rL134554 was only half the fix (fixing the non-growth case). Your/this patch now fixes the other half (the growth case) and fixes the push_back case (which only has a problem on growth, doesn't have a problem on non-growth, unlike insert). Great! :)

They did but that was only because the test cases don't grow multiple times from what I can see

Hmm - I thought this was about self-referential inserts/push_back, etc? Was/is there also bugs related to multiple growth? Could you show a small test case that'd fail asan with the current in-tree code related to multiple growth?

When I say grow multiple times, I mean the container has already grown once,

You mean the container is in non-small mode? (it might not've grown to get there - it might've been initialized with too much data to fit in the small buffer - though I guess that still might look like "growth" so we're probably on the same page here).

then when the call to insert etc is made the container grows a second time. In that case the element to insert would now be in freed memory which would cause asan to fail. However right now that test case doesn't seem to appear, it does happen where the container grows from small size to insert. but as the memory is stack based there, its fine, from asans POV, to reference it.
In D87237, I experimented by putting asan instrumentation inside SmallVector, which caused a test case to fail when ran under asan.

OK, so I think you're saying - the current in-tree tests, and the tests you're adding, currently don't fail even though they do test self-insertion only because they're testing in small -> big (not big -> big) mode? But with ASan instrumented SmallVector, even a small->big would cause test failures of the existing tests, without this patch to fix self-insertion?

Yeah, seems that SmallVector::push_back should be fixed to allow a.push_back(a[0]). SmallVector::insert has been fixed in rL134554

It hasn't, that handles the case where the element to insert is moved when shifting the elements down to make room, but not when the element to insert is moved because of container growth.

Yeah, agreed. rL134554 was only half the fix (fixing the non-growth case). Your/this patch now fixes the other half (the growth case) and fixes the push_back case (which only has a problem on growth, doesn't have a problem on non-growth, unlike insert). Great! :)

Yeah thats whats happening here.

then when the call to insert etc is made the container grows a second time. In that case the element to insert would now be in freed memory which would cause asan to fail. However right now that test case doesn't seem to appear, it does happen where the container grows from small size to insert. but as the memory is stack based there, its fine, from asans POV, to reference it.
In D87237, I experimented by putting asan instrumentation inside SmallVector, which caused a test case to fail when ran under asan.

OK, so I think you're saying - the current in-tree tests, and the tests you're adding, currently don't fail even though they do test self-insertion only because they're testing in small -> big (not big -> big) mode? But with ASan instrumented SmallVector, even a small->big would cause test failures of the existing tests, without this patch to fix self-insertion?

The tests inside Constructable will cause the current in-tree self-insertion test to fail as that will detect copying from a destructed value in the call to insert. The asan instrumented SmallVector would also fail because it will detect accessing the poisoned inline buffer in the small->big growth. However with this patch both of these causes of failing tests will be fixed.

njames93 updated this revision to Diff 291544.Sep 14 2020, 5:01 AM

Fix for the case if anyone tries SmallVector.emplace_back(SmallVector[X]) being invalidated on growth.
Added missing else if inside SmallVector append methods.
Refactored GrowBufferBase to take the Size_T as the template arguments to reduce template instantiations.
Use global namespace new inside GrowBuffer.

njames93 updated this revision to Diff 291557.Sep 14 2020, 6:22 AM

Move grow_size into SmallVectorBase

It can live in here as it doesn't require any specific machinery from SmallVectorTemplateCommon.
This also decreases the number of template instantiations needed for the function.

Format was provided by clang-format-11. the pre-merge bot is using 10 so I think I should ignore the format messages.

then when the call to insert etc is made the container grows a second time. In that case the element to insert would now be in freed memory which would cause asan to fail. However right now that test case doesn't seem to appear, it does happen where the container grows from small size to insert. but as the memory is stack based there, its fine, from asans POV, to reference it.
In D87237, I experimented by putting asan instrumentation inside SmallVector, which caused a test case to fail when ran under asan.

OK, so I think you're saying - the current in-tree tests, and the tests you're adding, currently don't fail even though they do test self-insertion only because they're testing in small -> big (not big -> big) mode? But with ASan instrumented SmallVector, even a small->big would cause test failures of the existing tests, without this patch to fix self-insertion?

The tests inside Constructable will cause the current in-tree self-insertion test to fail as that will detect copying from a destructed value in the call to insert. The asan instrumented SmallVector would also fail because it will detect accessing the poisoned inline buffer in the small->big growth. However with this patch both of these causes of failing tests will be fixed.

Ah, makes sense - sweet!

Use global namespace new inside GrowBuffer.

Why this change? Would've thought op new overloads might be desirable, but I certainly don't know the specifics off-hand right now.

llvm/include/llvm/ADT/SmallVector.h
304–305

Do these (and other) member function calls need to be qualified with "this->"? If not, please remove that. (usually only see that needed when it's a call into a dependent base class, right?)

333

Generally prefer range-based-for loops rather than std/llvm::for_each (about the only time I might suggest for_each would be if you had an existing lambda to pass)

647

what was wrong/happening in the absence of these "else" clauses? were tests failing? I guess the fill was happening twice, maybe? That should've failed the tests with the tighter constraints you added, right? Did it?

njames93 marked 3 inline comments as done.Sep 14 2020, 7:46 PM

Use global namespace new inside GrowBuffer.

Why this change? Would've thought op new overloads might be desirable, but I certainly don't know the specifics off-hand right now.

SmallVector seems to use global namespace new for all its operations, so I thought I'd follow the convention there.

llvm/include/llvm/ADT/SmallVector.h
647

Technically nothing wrong is happening without this else. In the case where NumElts == this->size() it'll just copy to copy assign the whole vector with Elt twice. Which while bad for performance it isn't breaching any object lifetime rules. The reason no tests fail is because the tests for SmallVector::assign, just check if the contents of the SmallVector are correct, not how many times constructors and assignment operators were called.

njames93 updated this revision to Diff 291761.Sep 14 2020, 7:46 PM
njames93 marked an inline comment as done.

Remove excessive this-> from GrowBuffer.
Use range-based loop instead of for_each.

Use global namespace new inside GrowBuffer.

Why this change? Would've thought op new overloads might be desirable, but I certainly don't know the specifics off-hand right now.

SmallVector seems to use global namespace new for all its operations, so I thought I'd follow the convention there.

Fair enough - thanks for walking me through it.

llvm/include/llvm/ADT/SmallVector.h
647

Could you make the tests a bit more rigorous to test for this fix?

njames93 updated this revision to Diff 291764.Sep 14 2020, 8:07 PM
njames93 marked an inline comment as done.

Add tests for SmallVector::assign to ensure contents aren't copied excessive amounts of times.

dblaikie accepted this revision.Sep 14 2020, 8:27 PM

Great, thanks a bunch!

llvm/unittests/ADT/SmallVectorTest.cpp
475–477

Worth testing for the specific amounts separately, or does it vary over the test variations?

This revision is now accepted and ready to land.Sep 14 2020, 8:27 PM

I have checked every GrowBuffer<T> use site. They look good!

llvm/include/llvm/ADT/SmallVector.h
343

The libc++ implementation (__swap_out_circular_buffer) returns a pointer so that the call sites do not need this->begin() + EltNo. Have you thought about adopting it?

MaskRay accepted this revision.Sep 14 2020, 11:07 PM
nikic added a comment.Sep 15 2020, 1:14 AM

Tested the newest version of this patch and I'm still seeing massive regressions, even larger than before.

Compile-time (instructions retired): https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=instructions This is now a 1.4% geomean regression at O3, with 2% at O3, with tramp3d-v4 hitting 3%.

Max RSS: https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=max-rss This is now a 1.6% geomean regression at O3, with 2.1% at O0.

Clang text size goes from 80560905 to 82179908, a 2% regression (non-LTO build using GCC 9.3).

Tested the newest version of this patch and I'm still seeing massive regressions, even larger than before.

Compile-time (instructions retired): https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=instructions This is now a 1.4% geomean regression at O3, with 2% at O3, with tramp3d-v4 hitting 3%.

Max RSS: https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=max-rss This is now a 1.6% geomean regression at O3, with 2.1% at O0.

Clang text size goes from 80560905 to 82179908, a 2% regression (non-LTO build using GCC 9.3).

@njames93 - mind looking at the memory usage a bit? that's probably relatively easy to check (well, maybe not, since it'll be smeared over all the different instantiations of this template) & seems a bit surprising. Code growth is probably just what it is - but if there's a way to quantify it and check we've got minimal extra instantiations, that'd be good.

The compile-time stat: @nikic: can you measure a wall performance difference? (otherwise possible they're shorter instructions, etc/ doesn't necessarily mean this makes compile times worse, perhaps?)

Tested the newest version of this patch and I'm still seeing massive regressions, even larger than before.

Compile-time (instructions retired): https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=instructions This is now a 1.4% geomean regression at O3, with 2% at O3, with tramp3d-v4 hitting 3%.

Max RSS: https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=max-rss This is now a 1.6% geomean regression at O3, with 2.1% at O0.

Clang text size goes from 80560905 to 82179908, a 2% regression (non-LTO build using GCC 9.3).

I'm still seeing a different picture with clang-10 as the host compiler, I'm only targetting x86 which explains why my binaries are so much smaller than yours.

orig-o3  54404996
this-03  54046228
orig-lto 69549814
this-lto 65981798

The most likely reason for a larger binary in GCC is in the original file, the calls to grow weren't inlined. Now, most of that code for using a new buffer is inlined. Clang may handle things a little differently.
Maybe I could define the swap buffer methods out of line to dissuade inlining.

llvm/unittests/ADT/SmallVectorTest.cpp
475–477

If the container grows there will be 2 copy constructors called, if it doesn't, there is one copy constructor and one copy assign.

@nikic Would you be able to see what the delta with this is https://gist.github.com/njames93/f26f159f06bda9e7ed2270adb39d9b08, should apply on top of trunk. It has the most expensive part of the grow buffer defined outline to dissuade inlining, may reduce binary size and improve performance with gcc9.3

Tested the newest version of this patch and I'm still seeing massive regressions, even larger than before.

Compile-time (instructions retired): https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=instructions This is now a 1.4% geomean regression at O3, with 2% at O3, with tramp3d-v4 hitting 3%.

Max RSS: https://llvm-compile-time-tracker.com/compare.php?from=cc947207283f934c72af0eb0b1a08978c59d40a2&to=d99c6d441764431519c1c11d490e7e88ffe06775&stat=max-rss This is now a 1.6% geomean regression at O3, with 2.1% at O0.

Clang text size goes from 80560905 to 82179908, a 2% regression (non-LTO build using GCC 9.3).

@njames93 - mind looking at the memory usage a bit? that's probably relatively easy to check (well, maybe not, since it'll be smeared over all the different instantiations of this template) & seems a bit surprising. Code growth is probably just what it is - but if there's a way to quantify it and check we've got minimal extra instantiations, that'd be good.

The compile-time stat: @nikic: can you measure a wall performance difference? (otherwise possible they're shorter instructions, etc/ doesn't necessarily mean this makes compile times worse, perhaps?)

Oh, also - any chance you can get measurements with a self-host clang? (either a full two-stage bootstrap, or an LLVM 10 or 11 used to build)

llvm/unittests/ADT/SmallVectorTest.cpp
475–477

Ah, thanks!

nikic added a comment.Sep 15 2020, 1:03 PM

@nikic Would you be able to see what the delta with this is https://gist.github.com/njames93/f26f159f06bda9e7ed2270adb39d9b08, should apply on top of trunk. It has the most expensive part of the grow buffer defined outline to dissuade inlining, may reduce binary size and improve performance with gcc9.3

Unfortunately this makes things even worse, with clang binary size increasing from 80560905 to 85871535 bytes, which is an increase of 6.5%. Max RSS goes up by 3-5% accordingly.

Before looking further into what exactly is going on there, I would like to take a step back and ask whether this change is really necessary. SmallVector is some of the most performance-critical code in the LLVM project and has tens of thousands of use-sites that magnify any change to it. Given what this patch does, I think that at least some degree of either performance of code-size impact is not avoidable (though the exact impact can be mitigated). This also makes the implementation of SmallVector quite a bit more complex, with subtle invariants to upkeep.

The problem this patch is solving seems like something of an edge-case to me, and specifying that "inserting self-references requires up-front reservation of enough capacity" seems like a sufficient solution (possibly combined with some assertions to make this checkable without asan). LLVM's ADT library is thankfully in a position where it does not need to comply with the std::vector spec to the letter, and can deviate from it where useful. For example, the LLVM programming manual mentions that SmallVector forgoes some exception-safety guarantees to achieve better performance. This seems like a similar situation.

I'm sure that if sufficient effort is put into it, it's possible to make this change with a lower impact. I looked at the libc++ implementation a bit, and there are a few differences that stand out, e.g. libc++ more carefully splits the inline fast-path from the out-of-line slow-path for methods like push_back. It also seems like it provides weaker guarantees on some methods, e.g. the assign() implementation does not look grow-safe at all. Figuring out what the most efficient way to do this is would take a lot of time though, as it would involve reapplying this patch in pieces, measuring the impact individual parts have, and trying out variations. If it's necessary to do this across two different host compilers that evidently have quite different optimization behavior in this area, this becomes even more involved. So again the question: Is it really necessary?

I don't fundamentally mind not supporting this - especially with asan checks that might help catch clients in the codebase that violate the contract. But I'm guessing there's existing violations in the codebase given the previous patches/filed bugs about implementing this behavior.

So, I think either this patch needs to be landed after some tweaking to try and bring the compile size down, though as it only appears to be an issue with clang it can probably afford some regression on gcc. Or we explicitly state that small vector is not safe to self reference itself and add instrumentation to enforce that behaviour. Right now we are in a middle ground where there are some guarantees made but not enough.
I would hedge my bets that there are definitely some code clients(whether in tree or not) that are currently using SmallVector incorrectly and sometimes run into little bugs like this.
Also for the record I have just built with gcc and noticed the same kind of regressions on the text size so at least I can have a look into this myself

xbolva00 requested changes to this revision.Sep 16 2020, 1:26 AM

Tested the newest version of this patch and I'm still seeing massive regressions, even larger than before.

^.

People are hard working to improve compile times and patch like this can just kill their improvements.

I dont think we this is worth it. You can try to create SmallVectorXYZ to handle this case and be strict for SmallVector.

This revision now requires changes to proceed.Sep 16 2020, 1:26 AM
Tyker added a subscriber: Tyker.Oct 2 2020, 6:57 AM
njames93 updated this revision to Diff 299835.Oct 21 2020, 5:47 PM

Tried to control binary size when compiling for gcc. Reworked everything here (apart from the test cases)

@nikic I'd appreciate if you could test benchmark this diff using the compile time tracker.

nikic added a comment.Oct 22 2020, 9:56 AM

Here's the numbers I see with the current version of the patch:

Instructions: https://llvm-compile-time-tracker.com/compare.php?from=4b7dafd9046f0ceaadacaafe0ea4a1fb00cf70a5&to=24b013b651f8f204f633d2edffa0baa7b121e21b&stat=instructions 1.26% regression at O3, 1.72% at O0-g
Max-rss: https://llvm-compile-time-tracker.com/compare.php?from=4b7dafd9046f0ceaadacaafe0ea4a1fb00cf70a5&to=24b013b651f8f204f633d2edffa0baa7b121e21b&stat=max-rss 1.64% regression at O3, 3.55% at O0-g
Clang binary size: 82132053 to 82490366 which is an 0.4% increase

Peculiarly, even though the increase in clang binary size is indeed greatly mitigated (0.4%, where the last version had a 6.5% increase), the impact on max-rss does not seem mitigated. Maybe my assumption that the max-rss increase is caused entirely by the binary size increase was not correct.

MaskRay resigned from this revision.Oct 22 2020, 10:08 AM

If the regressions on the 'max-rss' and 'instructions' metrics cannot be addressed, maybe we should unsupport this usage. We may need a check. If it increases LLVM_ENABLE_ASSERTIONS time significantly, this can be an EXPENSIVE_CHECKS check.

Overhead is likely caused by slightly more code needing to be emitted. However there is a glaring hole here that SmallVector isn't very smart about how it should take paramaters for calls like push_back.
For small trivial types it makes sense to take those by value and in those instances the current implementation of just calling grow would have no issue about reference invalidation as there no longer a reference.
Downside is it would require copying most of the small vector implementation code for this special case, kind of like what currently happens for SmallVectorTemplateBase with trivially copyable types.

Given that a lot of use cases of SmallVector seem to be for storing pointers or as the base class for SmallString, this would remove this specific overhead in these cases.

Overhead is likely caused by slightly more code needing to be emitted. However there is a glaring hole here that SmallVector isn't very smart about how it should take paramaters for calls like push_back.
For small trivial types it makes sense to take those by value and in those instances the current implementation of just calling grow would have no issue about reference invalidation as there no longer a reference.
Downside is it would require copying most of the small vector implementation code for this special case, kind of like what currently happens for SmallVectorTemplateBase with trivially copyable types.

Given that a lot of use cases of SmallVector seem to be for storing pointers or as the base class for SmallString, this would remove this specific overhead in these cases.

That seems really valuable, and could be committed separately/ahead of the rest of this patch. I don't think you need to split the template; you can just:

using ParamT = std::conditional<should_take_by_value<T>, T, const T&>;
void push_back(ParamT Val);

Note also https://reviews.llvm.org/D84293, which proposes adding an assertion when growing is unsafe.

Downside is it would require copying most of the small vector implementation code for this special case, kind of like what currently happens for SmallVectorTemplateBase with trivially copyable types.

Given that a lot of use cases of SmallVector seem to be for storing pointers or as the base class for SmallString, this would remove this specific overhead in these cases.

That seems really valuable, and could be committed separately/ahead of the rest of this patch.

Here's a quick attempt at it: https://reviews.llvm.org/D91467

FYI, I landed assertions for various reference invalidations in https://reviews.llvm.org/D91744.

I wonder if this patch could be landed incrementally, fixing one API at a time from the bottom up. For example, we might start with just push_back or assign, hopefully side-stepping any compile-time regression by incorporating the relevant parts of https://reviews.llvm.org/D91467 to skip the slow path for small enough, trivial T.