Page MenuHomePhabricator

Compile-time computation of string attribute hashes
Needs ReviewPublic

Authored by serge-sans-paille on Nov 22 2021, 1:34 PM.

Details

Summary

This is a simplified —and hopefully easier to review— version of https://reviews.llvm.org/D114082 with a focus on the performance aspect

Basically, this change forces usage of an AttributeKey object instead of plain string as Attribute key for free-forms attributes. As these object cache their hash value and this value can be computed at compile time for the common case where we check / retrieve an attribute value, this speeds up attribute lookup.

Diff Detail

Event Timeline

serge-sans-paille requested review of this revision.Nov 22 2021, 1:34 PM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
JonChesterfield edited the summary of this revision. (Show Details)Nov 23 2021, 12:34 AM
JonChesterfield added inline comments.
llvm/include/llvm/IR/Attributes.h
80

Could order by size first here, then strncmp on equal sizes

Remove static Dict and replace it by a dict attached to LLVMContext.

Some early benchmarks, on the SQLite amalgamation, through valgrind --tool=callgrind ./bin/clang -c -o/dev/null sqlite3.c

Instruction count
Before: 6,072,172,562
After: 6,011,551,695

I think @aeubanks might be a good reviewer for this.

I like the performance wins here.

MaskRay added inline comments.Nov 23 2021, 7:27 PM
llvm/lib/IR/AttributeImpl.h
214

std::unordered_map is inefficient. Why is this change?

llvm/lib/IR/Attributes.cpp
862

lookup

2054

If you change AttributeKeys to DenseMap<CachedHashStringRef, ...>, you can construct a CachedHashStringRef and avoid a hash computation in AttributeKey Key(s);

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1531

This can be committed separately.

mehdi_amini added inline comments.
llvm/lib/IR/Attributes.cpp
125

Carrying over my comment from the original revision: it seem that you're ever only using the StringRef from the Kind in this function.
If so changing the API to use an AttributeKey seems pessimizing to me?

serge-sans-paille marked 6 inline comments as done.

Use CachedHashStringRef as suggested by @MaskRay

Nuullll added inline comments.Nov 24 2021, 4:28 AM
llvm/include/llvm/IR/Attributes.h
54
serge-sans-paille marked an inline comment as done.
serge-sans-paille added inline comments.
llvm/include/llvm/IR/Attributes.h
80

I tend to agree, but that would change how attributes are pretty-printed, so I'd rather keep that for another review.

llvm/lib/IR/AttributeImpl.h
214

Because in the specific case of AttrBuilder, std::unordered_map is actually more efficient. AttrBuilder usually only handles a few elements, I even tried an implementation based on a sorted array and it was better than other alternatives in my benchmarks. But that's for another PR.

llvm/lib/IR/Attributes.cpp
125

Kind is actually also used as a parameter of StringAttributeImpl which has been updated to store an AttributeKey, so that would just move the AttributeKey creation from the caller to the internal implementation.
And I'd rather have an homogeneous API alongside all string Attribute creation.

862

Not for std::unordered_map :-/

2054

I was looking for something along these lines, thanks!

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1531

Actually it cannot, because the AttributeKey constructor accepts a char const (&) [N] in order to get a compile-time size, and const char* doesn match this signature.

nikic added a comment.Nov 30 2021, 9:50 AM

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=3608e18a946e77a474a468304b6c3904c55dbce0&to=0ce74a09cc8533a10fb67fdec4fb6ad8de4f1153&stat=instructions Some improvement at O3, not much change for optimized builds.

A concern I have is that this may be pessimizing non-clang frontends in favor of clang. Frontends not written in C++ will always be going through the AttributeKey::get() API, which will be slower than the initial state (I think -- we still have to calculate the hash dynamically, but now we also need to intern an AttributeKey by performing an actual hash table lookup.)

As a more general thought, I believe a core (performance) problem in the current attribute design is that AttrBuilder represents string attributes as a pair of strings, and most of our attribute manipulation is indirected through AttrBuilders. This means that we end up converting string Attributes into a pair of strings and then construct an Attribute again via FoldingSet lookup. This is of course a lot of unnecessary work. Similarly, there are many hot attributes (like "target-cpu") which are basically always the same, and we could just construct the Attribute upfront and always use that. That would save more than just the attribute name hash calculation. I think we may want to reconsider the AttrBuilder design to store Attributes at least for string attributes, though of course that does make it Context-dependent.

This comment was removed by serge-sans-paille.

*comment removed, I've been doing more detailed benchmark that imply a rework of the patch*

Use a DenseSet instead of a DenseMap to store the StringPool.

Some benchmark feedback (using a release build compiled with LTO), in number of instructions

sqlite3.c -c -O0sqlite3.c -S -emit-llvmsqlite3.ll -c
before patch6.12G4.68G4.42G
after patch6.06G4.62G4.38G

I also tried some micro benchmark on the C-API and C++ API, using compile-time known Attribute name

#include "llvm-c/Core.h"
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {

  int n = strlen(argv[0]);
  LLVMContextRef ctx = LLVMContextCreate();
  LLVMModuleRef mod = LLVMModuleCreateWithName("name");
  LLVMAttributeRef Key =
      LLVMCreateStringAttribute(ctx, argv[0], n, "myvalue", 7);
  LLVMValueRef value = LLVMGetIntrinsicDeclaration(mod, 25, {}, 0);
  LLVMDumpValue(value);
  LLVMAddAttributeAtIndex(value, 0, Key);
  unsigned size;
  char buffer[100];
  for (int i = 0; i < 10000; ++i) {
    for (int j = 0; j < 1000; ++j) {
      LLVMGetStringAttributeAtIndex(value, 0, argv[0], n);
      size += n;
    }
  }
  return size;
}

Here are the result (in seconds)

C APIC++ API
before the patch0.120.12
after the patch0.140.08

So basically, the drawback of this patch is that there is a double lookup (and thus two calls to strcmp) for each dynamic attribute request.
The advantage is that there's a single lookup and no allocation for static attribute request. At least for clang, this seems to be the most common request.

rnk added a comment.Dec 14 2021, 4:49 PM

That's interesting, somehow clang spends 1% of its time for sqlite3 in attribute lookup. I wonder how we ended up doing so much string-based attribute lookup. These were not really ever intended to be used for anything semantically interesting for the middle end, they were designed as a way to pass through uninteresting target-specific details like SSE features through to the backend. I guess what happened next is that we added rules to the inliner to block inlining between functions with mismatched CPU features, and that's where these string attribute lookups are coming from. Perhaps there's a design lesson here.

Regarding LLVMGetStringAttributeAtIndex and the C API, I don't imagine that most non-C++ frontends do a lot of attribute lookups. They mainly generate IR. I think, on balance, the performance improvements in LLVM itself will outweigh any regressions from doing extra lookups via the C API during IR generation.

Update: the strategy used by `AttrBuilder` might not be the best way to build new attributes. I've proposed an alternative implementation in https://reviews.llvm.org/D115798. This is somehow orthogonal to that patch, but it removes the need for temporary copies of string attributes Key/Value in the AttrBuilder which is already a big plus.