This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add methods to SmallString for efficient concatenation
ClosedPublic

Authored by njames93 on Oct 29 2020, 4:48 AM.

Details

Summary

A common pattern when using SmallString is to repeatedly call append to build a larger string.
The issue here is the optimizer can't see through this and often has to check there is enough space in the storage for each string you try to append.
This results in lots of conditional branches and potentially multiple calls to grow needing to be emitted if the buffer wasn't large enough.
By taking an initializer_list of StringRefs, SmallString can preallocate the storage it needs for all of the StringRefs which only need to grow one time at most, then use a fast path of copying all the strings into its storage knowing there is guaranteed to be enough capacity.
By using StringRefs, this also means you can append different string like types in one go as they will all be implicitly converted to a StringRef.

Diff Detail

Event Timeline

njames93 created this revision.Oct 29 2020, 4:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2020, 4:48 AM
njames93 requested review of this revision.Oct 29 2020, 4:48 AM

Could you refresh my memory on whether reserve has the same growth semantics as other size-increasing operations? I'm slightly concerned that if reserve only reserves the exact amount required (compared to an append or similar operation exceeding the capacity which would/should cause the usual multiplication factor growth) that this change as-is might hinder performance (eg: a loop that appends two things today would be getting the multiplicative growth - but then if it ends up with a reserve(size of two strings) call in it maybe it keeps only growing just enough on each loop iteration, incurring more growths?)?

Could you refresh my memory on whether reserve has the same growth semantics as other size-increasing operations? I'm slightly concerned that if reserve only reserves the exact amount required (compared to an append or similar operation exceeding the capacity which would/should cause the usual multiplication factor growth) that this change as-is might hinder performance (eg: a loop that appends two things today would be getting the multiplicative growth - but then if it ends up with a reserve(size of two strings) call in it maybe it keeps only growing just enough on each loop iteration, incurring more growths?)?

Reserve and grow call the same function to actually grow the container, they both use the next power of 2 above the requested size. Reserve also won't allocate if it already has the capacity needed

I wonder, would it be valuable to do this for Twine? I'm thinking something like:

Optional<size_t> Twine::tryGetStorageLength() const;

which returns None if there is a leaf whose size isn't known in constant time (say, other than CStringKind, StdStringKind, and SmallStringKind). Then change Twine::toVector to use this to reserve space:

void Twine::toVector(SmallVectorImpl<char> &Out) const {
  if (auto Size = tryGetStorageLength())
    Out.reserve(Out.size() + Size);

  raw_svector_ostream OS(Out);
  print(OS);
}

If so, might that cover this use case as well?

I wonder, would it be valuable to do this for Twine? I'm thinking something like:

Optional<size_t> Twine::tryGetStorageLength() const;

which returns None if there is a leaf whose size isn't known in constant time (say, other than CStringKind, StdStringKind, and SmallStringKind). Then change Twine::toVector to use this to reserve space:

void Twine::toVector(SmallVectorImpl<char> &Out) const {
  if (auto Size = tryGetStorageLength())
    Out.reserve(Out.size() + Size);

  raw_svector_ostream OS(Out);
  print(OS);
}

If so, might that cover this use case as well?

I can't see there being as much of a win in that case, but its definitely worth exploring, not in this patch though.

dblaikie accepted this revision.Oct 29 2020, 6:59 PM

Looks good -please add a test for the assign function this patch adds too.

This revision is now accepted and ready to land.Oct 29 2020, 6:59 PM

Looks good -please add a test for the assign function this patch adds too.

There already is an assign test, should probably improve the append test so the string isn't empty before hand

njames93 updated this revision to Diff 301844.Oct 30 2020, 2:54 AM

Extend Append test so it actually has something to append to.
Demonstrate it can concatenate strings of different types.

This revision was landed with ongoing or failed builds.Oct 30 2020, 3:08 AM
This revision was automatically updated to reflect the committed changes.

Extend Append test so it actually has something to append to.
Demonstrate it can concatenate strings of different types.

Oh, soryr - not sure what I was looking at that I missed the other test. Thanks for improving the coverage a bit there in any case!