This is an archive of the discontinued LLVM Phabricator instance.

[WIP] Normalize String Attributes
AbandonedPublic

Authored by serge-sans-paille on Nov 17 2021, 5:15 AM.

Details

Summary

LLVM IR supports free form string attributes. Those are used both internally in various passes, or to convey information from the front-end to the middle-end, and externally by third party projects.

This patch normalizes the internal part while still being open to extension for third-party project. The basic idea is to have one immutable object per string attribute, with its hash computed at compile-time. Then instead of referencing attributes per their name, one must reference them by their key.

Pros:

  • avoid typos in attribute spelling etc, turn them in compile-time error
  • it is possible to list all internal keys in one place, and eventually (not implemented in this patch) to normalize them (we currently have a mixture of underscores and hyphens). And to document them.
  • hashes are precomputed, which leads to faster indexing and faster testing for attribute presence

Diff Detail

Event Timeline

serge-sans-paille requested review of this revision.Nov 17 2021, 5:15 AM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

This is a WIP, to start the discussion. I've only turned on the X86 backend. Otherwise should be functional.

ychen added a subscriber: ychen.Nov 18 2021, 4:09 PM

Thanks for this. I planned something similar to turn string literal typo errors into compile errors as well.
The typos are really hard to identify :-P

Some suggestions on using tablegen.

llvm/include/llvm/IR/Attributes.h
977–1133

I think we can simplify these definitions via "Attributes.inc", by adding one more class tablegen class and defining the symbolic -> display name mapping in "Attributes.td".

// llvm/utils/TableGen/Attributes.cpp emitTargetsIndependentNames()
Emit({"EnumAttr", "TypeAttr", "IntAttr"}, "ATTRIBUTE_ENUM");
Emit({"StrBoolAttr"}, "ATTRIBUTE_STRBOOL");
Emit({"StrKeyAttr"}, "ATTRIBUTE_KEY");
llvm/include/llvm/IR/Attributes.td
43

Define a dedicated tablegen class for AttributeKey generation.

303

Map symbolic names with string literals.

The patch is pretty large, any chance you could split in two? First the change introducing the AttributeKey and changing the API, and then updating all the callees separately as a follow up?

llvm/include/llvm/IR/Attributes.h
84

This whole code deserves documentation I think.

llvm/lib/IR/Attributes.cpp
125

Seems like this method does not use anything than the StringRef inside the Kind, why change it to take an AttributeKey? Won't you eagerly compute the hash even if not needed?

This diff is indeed very large. I reworked it in https://reviews.llvm.org/D114394 to capture the essence of it, on which we could then rebase this patch.

rnk added subscribers: void, rnk.Nov 23 2021, 12:49 PM

it is possible to list all internal keys in one place

I think one of the original goals of adding support for string attributes (maybe @void will give some input) was specifically to avoid having one giant list with all the attributes supported by all the backends. String attributes were initially called "target dependent attributes", or something like that. I think we had a td_attrs accessor at some point. There was this idea that LLVM should not know about x86-specific attributes. There should be some kind of indirection and delegation to the target backend. LLVM has not always succeeded in following this vision, see for example the target-specific intrinsic enums, which were one giant enum until D71320.

I think, even if LLVM IR has to know all of the target-specific internally used attributes, we can try to honor that target independent vision by having separate headers for different target specific attributes. Does that sound reasonable? It's at least consistent with how we handle intrinsics (IntrinsicsX86.h, IntrinsicsAArch64.h etc).

it is possible to list all internal keys in one place

I think one of the original goals of adding support for string attributes (maybe @void will give some input) was specifically to avoid having one giant list with all the attributes supported by all the backends. String attributes were initially called "target dependent attributes", or something like that. I think we had a td_attrs accessor at some point. There was this idea that LLVM should not know about x86-specific attributes. There should be some kind of indirection and delegation to the target backend. LLVM has not always succeeded in following this vision, see for example the target-specific intrinsic enums, which were one giant enum until D71320.

I think, even if LLVM IR has to know all of the target-specific internally used attributes, we can try to honor that target independent vision by having separate headers for different target specific attributes. Does that sound reasonable? It's at least consistent with how we handle intrinsics (IntrinsicsX86.h, IntrinsicsAArch64.h etc).

The sub-review I proposed in https://reviews.llvm.org/D114394 does not introduce any list, so that should be a decent first step. As a second step, many passes already register their attribute in some local header and I can just generalize that: no big list but some local explicit definition.

ormris removed a subscriber: ormris.Jan 24 2022, 11:16 AM
serge-sans-paille abandoned this revision.Feb 9 2022, 4:39 AM