This is an archive of the discontinued LLVM Phabricator instance.

[WIP][IR] Add exten{ded,sible} metadata
Needs Review, Public

Authored by nhaehnle on Jun 22 2023, 1:26 AM.



This is an early draft of a mechanism for human-readable and
compile-time efficient metadata structures in LLVM IR, built on the
StructuredData mechanism.

Benefits of extended metadata

Extended metadata is intended to have three main advantages over
MDTuples for building complex metadata structures:

  • IR assembly can contain meaningful label names, making it easier for humans to read.
  • Extended metadata is represented using C++ objects that can have meaningfully named accessors (no more MD.getOperand(MAGIC_NUMBER)).
  • Extended metadata is represented using plain data types, so the in-memory representation is smaller and generally more efficient (no more pointer chasing from MDNode -> ValueAsMetadata -> ConstantInt).
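To make the second and third points concrete, here is a minimal, self-contained sketch of the two access styles. It does not use the LLVM headers: every type and function below (ToyConstantInt, ToyValueAsMetadata, ToyTuple, ToySettings, makeTuple, countFromTuple) is a hypothetical stand-in that only mimics the shape of the real classes, not their API.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <memory>
#include <vector>

// Toy stand-ins (hypothetical, not the LLVM classes). They only mimic the
// MDNode -> ValueAsMetadata -> ConstantInt chain of indirections.
struct ToyConstantInt {
  uint64_t Value;
};
struct ToyValueAsMetadata {
  std::shared_ptr<ToyConstantInt> Constant;
};
struct ToyTuple {
  std::vector<std::shared_ptr<ToyValueAsMetadata>> Operands;
};

// Tuple style: the reader has to know that operand 1 is the count, and
// every read follows two pointers before reaching the integer.
inline uint64_t countFromTuple(const ToyTuple &MD) {
  assert(MD.Operands.size() > 1 && "operand index out of range");
  return MD.Operands[1]->Constant->Value;
}

// Extended-metadata style: plain data behind named accessors.
class ToySettings {
  bool Enabled;
  uint32_t Count;

public:
  ToySettings(bool Enabled, uint32_t Count)
      : Enabled(Enabled), Count(Count) {}
  bool isEnabled() const { return Enabled; }
  uint32_t getCount() const { return Count; }
};

// Helper that boxes plain integers into the tuple form.
inline ToyTuple makeTuple(std::initializer_list<uint64_t> Values) {
  ToyTuple T;
  for (uint64_t V : Values) {
    auto Int = std::make_shared<ToyConstantInt>(ToyConstantInt{V});
    T.Operands.push_back(
        std::make_shared<ToyValueAsMetadata>(ToyValueAsMetadata{Int}));
  }
  return T;
}
```

Both forms carry the same two values, but only the second records in the type what each value means, and it stores them inline rather than behind separately allocated nodes.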

Debug info metadata is already built to have these advantages, but the
approach used there does not scale well:

  • Lots of boilerplate code in the definitions of the various DI* metadata classes
  • Intrusive boilerplate code in LLParser/AsmPrinter/MetadataLoader/BitcodeWriter
  • Completely ad-hoc .ll syntax gets in the way of tooling
  • Not usable by downstream users of LLVM for many reasons (intrusive code in parser/printer/loader/writer, MetadataKind is a closed enum that is switch()ed over in many places)

Extended metadata as presented here fixes all of these issues except for
the first one. For the first issue, I am considering a TableGen-based
solution along the lines of llvm-dialects and MLIR ODS, but that is not
strictly needed to make extended metadata work, and we should also
consider a CRTP-based solution at some point.

Overview of extended metadata

In a nutshell, extended metadata works as follows:

  • ExtMetadata is an abstract base class representing extended metadata. Extended metadata objects are defined by their class name and a structured data object as payload.
  • ExtMetadata classes can be registered with LLVMContexts at context creation time (the set of classes is frozen the first time an extended metadata object is created). Registering a class means:
    • Defining a C++ subclass of ExtMetadata that can be serialized and deserialized to structured data
    • Receiving a numeric class ID that is used to hook the C++ subclass into LLVM's custom RTTI system (isa<>, cast<>, etc.)
    • Being able to hook custom verification into the IR verifier
  • IR may contain extended metadata whose class (name) has *not* been registered with the LLVMContext. Such metadata is preserved as a black box using the GenericExtMetadata class. This situation may happen if IR is written out from (an intermediate stage of) a compiler built on LLVM, and this IR is then fed into generic tools like opt, llvm-reduce, llvm-dis, etc.
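The registration and fallback behavior described above could be modeled roughly as follows. This is a toy sketch, not the patch's code: ToyContext, ToyPayload, ToyExtMetadata, ToyGenericExtMetadata, ToyCounter, registerClass and create are all hypothetical names, a flat string-to-integer map stands in for structured data, and reserving class ID 0 for unregistered classes is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for structured data: field name -> integer value.
using ToyPayload = std::map<std::string, uint64_t>;

// Toy base class; ID 0 is reserved here for unregistered classes.
class ToyExtMetadata {
  unsigned ClassID;

public:
  explicit ToyExtMetadata(unsigned ID) : ClassID(ID) {}
  virtual ~ToyExtMetadata() = default;
  unsigned getClassID() const { return ClassID; }
};

// Black-box fallback: keeps the class name and payload untouched.
class ToyGenericExtMetadata : public ToyExtMetadata {
public:
  std::string ClassName;
  ToyPayload Payload;
  ToyGenericExtMetadata(std::string Name, ToyPayload P)
      : ToyExtMetadata(0), ClassName(std::move(Name)), Payload(std::move(P)) {}
};

// Example of a registered class with one typed field.
class ToyCounter : public ToyExtMetadata {
public:
  uint64_t Count;
  ToyCounter(unsigned ID, const ToyPayload &P)
      : ToyExtMetadata(ID), Count(P.count("count") ? P.at("count") : 0) {}
};

// Toy context: hands out class IDs and freezes the registry on first use.
class ToyContext {
  using Factory = std::function<std::unique_ptr<ToyExtMetadata>(
      unsigned, const ToyPayload &)>;
  std::map<std::string, std::pair<unsigned, Factory>> Classes;
  unsigned NextID = 1;
  bool Frozen = false;

public:
  unsigned registerClass(const std::string &Name, Factory F) {
    if (Frozen)
      throw std::logic_error("class registry is frozen");
    auto Inserted = Classes.emplace(Name, std::make_pair(NextID, std::move(F)));
    if (!Inserted.second)
      throw std::logic_error("duplicate class name");
    return NextID++;
  }

  std::unique_ptr<ToyExtMetadata> create(const std::string &Name,
                                         const ToyPayload &P) {
    Frozen = true; // the first creation freezes the set of classes
    auto It = Classes.find(Name);
    if (It == Classes.end()) // unknown class: preserve it as a black box
      return std::make_unique<ToyGenericExtMetadata>(Name, P);
    return It->second.second(It->second.first, P);
  }
};
```

A tool that never registered a given class name still gets a ToyGenericExtMetadata holding the original name and payload, which is what would let generic tools write the metadata back out unchanged.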

Use case(s)

I believe there could be some benefit to porting existing metadata uses
in LLVM to this new infrastructure, e.g. AA metadata, for the reasons
listed above (e.g. compile-time improvements). That said, the primary
motivation for this work is in downstream compilers.

In our graphics shader compiler use case (LLPC), there is a lot of
metadata specific to graphics APIs. One such example is rasterizer state,
which is a collection of settings (typically bools or small integers)
that tweak aspects of rasterization that are conceptually fixed-function
(but that affect the shader compilation process in some way).

In the status quo, we are faced with an awkward choice:

  • either we represent this state entirely outside of IR, which breaks common workflows like lit testing because the state is missing from .ll files
  • or we represent it using MDTuples in some way, but the tuples are far from human readable and compile-time access to them is slow.

With extended metadata, we would be able to represent rasterizer state in
human-readable form in a .ll file along the lines of:

!lgc.rasterizer.state = !{!0}

!0 = !lgc.rasterizer.state {
  discardEnable: i1 true,
  perSampleShading: i1 false,
  rasterStream: i2 0,
  ...
}

And at compile time, this structure is represented by an
lgc::RasterizerStateMetadata class that is derived from
llvm::ExtMetadata and contains all these fields as plain C++ bools or
integers, which results in code that is both easier to read and faster.
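As a rough illustration of what such a class might look like, the sketch below stores the three fields from the example as plain members and converts them to and from a generic payload. It is self-contained and does not derive from llvm::ExtMetadata: ToyRasterizerState, ToyPayload, toPayload and fromPayload are hypothetical names, the string-to-integer map is only a stand-in for structured data, and the range checks are an assumed form of the verification step.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for structured data: field name -> integer value.
using ToyPayload = std::map<std::string, uint64_t>;

// Plain-data view of the rasterizer state; field names follow the .ll example.
struct ToyRasterizerState {
  bool DiscardEnable = false;
  bool PerSampleShading = false;
  uint8_t RasterStream = 0; // i2 in the assembly form, so the range is 0..3
};

// Serialize the plain fields into the generic payload form.
inline ToyPayload toPayload(const ToyRasterizerState &S) {
  ToyPayload P;
  P["discardEnable"] = S.DiscardEnable ? 1 : 0;
  P["perSampleShading"] = S.PerSampleShading ? 1 : 0;
  P["rasterStream"] = S.RasterStream;
  return P;
}

// Deserialize, rejecting missing fields and values outside the bit width.
inline std::optional<ToyRasterizerState> fromPayload(const ToyPayload &P) {
  auto Get = [&P](const char *Key, uint64_t Limit) -> std::optional<uint64_t> {
    auto It = P.find(Key);
    if (It == P.end() || It->second >= Limit)
      return std::nullopt;
    return It->second;
  };
  auto Discard = Get("discardEnable", 2);       // i1
  auto PerSample = Get("perSampleShading", 2);  // i1
  auto Stream = Get("rasterStream", 4);         // i2
  if (!Discard || !PerSample || !Stream)
    return std::nullopt;
  ToyRasterizerState S;
  S.DiscardEnable = *Discard != 0;
  S.PerSampleShading = *PerSample != 0;
  S.RasterStream = static_cast<uint8_t>(*Stream);
  return S;
}
```

Compiler passes would then read S.DiscardEnable directly, and the payload form would only be touched when reading or writing IR.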

About this patch

As stated, this patch is an early draft for discussion, especially for
discussion of the underlying StructuredData mechanism. There is also a
followup draft patch that shows how to define and register an
ExtMetadata class. The code is able to round-trip extended metadata
through assembly and disassembly, but the implementation is clearly not
ready for production.

Notable implementation TODOs:

  • overall high-level design questions
  • proper LLVMContext-owned storage
  • proper unique'ing of objects
  • fully hook up the verifier
  • schema support for bitcode space savings
