This is an archive of the discontinued LLVM Phabricator instance.

[IR] Use map for string attributes (NFC)
ClosedPublic

Authored by nikic on Apr 25 2020, 4:12 AM.

Details

Summary

Attributes (whether enum or string) are currently stored as a simple list. Enum attributes additionally use a bitset to allow quickly determining whether an attribute is set. String attributes on the other hand require a scan of the full list. As functions tend to have a lot of string attributes (at least when clang is used), this is a noticeable performance issue.

This patch adds an additional name => attribute map to the AttributeSetNode, which allows querying string attributes quickly. This results in a 3% reduction in instructions retired. Changes to memory usage seem to be in the noise (attribute sets are uniqued, and we don't tend to have more than a few dozen or hundred unique attribute sets, so adding an extra map does not have a visible cost.)

Diff Detail

Event Timeline

nikic created this revision.Apr 25 2020, 4:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2020, 4:12 AM
arsenm added inline comments.Apr 25 2020, 9:12 AM
lib/IR/AttributeImpl.h
185 ↗(On Diff #260079)

Is StringMap any better?

nikic marked an inline comment as done.Apr 25 2020, 9:43 AM
nikic added inline comments.
lib/IR/AttributeImpl.h
185 ↗(On Diff #260079)

I believe StringMap is only useful if the map needs to own the string. In this case the string is already owned by the Attribute. (StringMap allocates each map entry separately, as strings are variable-length.)

There's also CachedHashStringRef that can be used here, but I don't think that's useful either, because the map is immutable, so we do not need to worry about rehashing overhead.

tejohnson added inline comments.Apr 25 2020, 9:53 AM
lib/IR/AttributeImpl.h
185 ↗(On Diff #260079)

Won't this store the Attributes twice (once in the TrailingObjects base class and once in the DenseMap)? Would it be better to just use a StringMap here to subsume the TrailingObjects and not duplicate the Attributes? Or store a pointer to the const * value returned by the iterator?

nikic marked an inline comment as done.Apr 25 2020, 10:13 AM
nikic added inline comments.
lib/IR/AttributeImpl.h
185 ↗(On Diff #260079)

Attribute is a pointer-like type, the actual attribute objects are allocated separately and globally uniqued. So we're only duplicating pointers here.

It would in theory be possible to remove string attributes from the trailing objects and only store them in the map, but I think this would cause many complications for little benefit, as AttributeSet is not particularly memory-critical. In particular this would make it hard to iterate over all attributes, and no longer provide a fixed ordering for attributes. Possibly it would cause issues with folding set profiling as well, I'm not sure whether that depends on the ordering or not.

Nice change - thanks!

lib/IR/AttributeImpl.h
185 ↗(On Diff #260079)

While StringMap would duplicate the underlying string, the DJB hash used by StringMap should be more efficient than the generic hash for StringRef - see djbHash vs hash_combine_range_impl. Given this is a hot spot, the speed/size trade-off may favor StringMap.

nikic marked an inline comment as done.Apr 25 2020, 12:10 PM
nikic added inline comments.
lib/IR/AttributeImpl.h
185 ↗(On Diff #260079)

I gave StringMap a try, but the numbers are markedly worse: http://llvm-compile-time-tracker.com/compare.php?from=0cc4958dd2dda646abda6a7b01aeb6b50b601e6b&to=5655d81e5dd2e753248b47f26d09e86171f732d0&stat=instructions This is relative to using DenseMap of StringRef.

arsenm accepted this revision.Apr 25 2020, 1:34 PM
This revision is now accepted and ready to land.Apr 25 2020, 1:34 PM
wenlei accepted this revision.Apr 25 2020, 1:55 PM
wenlei added inline comments.
lib/IR/AttributeImpl.h
185 ↗(On Diff #260079)

Ok, thanks for measuring.

tejohnson accepted this revision.Apr 25 2020, 2:28 PM

Lgtm as well

This revision was automatically updated to reflect the committed changes.