This is an archive of the discontinued LLVM Phabricator instance.

Differential D150370

Introduce StructuredData
Needs Review · Public

Authored by nhaehnle on May 11 2023, 7:36 AM.

Details

Reviewers

jcranmer
jdoerfert
nikic
jsilvanus
Flakebi

Summary

StructuredData is a fairly general mechanism for representing extensible
data in textual IR and in bitcode.

It is intended primarily as an abstraction layer used during IR printing,
parsing and (bitcode de-)serialization.

However, it can also be used to preserve structured data as-is for
extension points in tools where a particular extension isn't understood.

Possible use cases range over:

Encoding TargetTypeInfo -- this is the use case that initially triggered this development
Extensible human-readable and compile-time efficient metadata a la debug info metadata
Human-readable and compile-time efficient modifiers on extended instructions / intrinsics

This change already includes rudimentary printing and parsing support.
Bitcode reading and writing will follow in a subsequent change.
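
As a rough illustration of the textual form this revision proposes (the LangRef part of the diff gives examples such as `{ foo: i1 true, bar: i32 10 }`), here is a minimal sketch of a printer for integer-valued fields. `SketchInt`, `SketchField`, and `printStructuredData` are hypothetical names invented for this illustration. The patch's own printer lives in AsmWriter.cpp, operates on `sdata::Value`, and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an integer sdata value: a bit width plus the bits.
struct SketchInt {
  unsigned BitWidth;
  std::uint64_t Bits;
};

using SketchField = std::pair<std::string, SketchInt>;

// Prints fields in the `{ key: iN value, ... }` shape shown in the LangRef
// examples. i1 values are printed as true/false (an assumption of this sketch).
inline std::string printStructuredData(const std::vector<SketchField> &Fields) {
  if (Fields.empty())
    return "{}";
  std::string Out = "{ ";
  for (std::size_t I = 0; I < Fields.size(); ++I) {
    const auto &[Key, V] = Fields[I];
    if (I != 0)
      Out += ", ";
    Out += Key + ": i" + std::to_string(V.BitWidth) + " ";
    if (V.BitWidth == 1)
      Out += V.Bits ? "true" : "false";
    else
      Out += std::to_string(V.Bits);
  }
  return Out + " }";
}
```

Keeping the fields as an ordered list of key/value pairs, rather than a map, preserves the order in which they were written, which is usually what a round-tripping printer wants.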

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nhaehnle created this revision. May 11 2023, 7:36 AM

Herald added a project: Restricted Project. May 11 2023, 7:36 AM

Herald added subscribers: StephenFan, hiraditya.

nhaehnle requested review of this revision. May 11 2023, 7:36 AM

Herald added a subscriber: llvm-commits.

nhaehnle added a child revision: D135202: [IR] Add a target extension type to LLVM. May 11 2023, 7:38 AM

nhaehnle removed a child revision: D135202: [IR] Add a target extension type to LLVM.

nhaehnle added a child revision: D147697: [IR] Add TargetExtTypeClass.

Harbormaster completed remote builds in B231335: Diff 521309. May 11 2023, 7:49 AM

Why does the symbol map need to be global? Shouldn’t it be per LLVMContext (which would also get rid of the locking)?

Needs tests.

llvm/docs/LangRef.rst
3527–3531	Can constants follow the way metadata (and most other things in LLVM) is encoded and prefix the type? I.e. `i32 <integer>` and `i1 true/false`

Thanks for taking a look!

In D150370#4349276, @Flakebi wrote:

Why does the symbol map need to be global? Shouldn’t it be per LLVMContext (which would also get rid of the locking)?

It's global so that we can cheaply compare an sdata::Symbol against an sdata::RegisterSymbol. This is used e.g. here: https://github.com/nhaehnle/llvm-project/blob/6bb5923a059c1120e006cb0d0cd63c5ef0806c0e/llvm/lib/IR/Type.cpp#L844
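
The trade-off described here can be sketched as follows. This is a minimal, hypothetical interner, not the patch's `sdata::Symbol` implementation: `GlobalInterner` and `InternedSymbol` are names made up for the illustration. Registering a string once yields a small integer ID, so later comparisons are integer compares, but the shared table needs a lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical process-wide string interner (illustration only).
class GlobalInterner {
public:
  // Returns a stable, non-zero ID for Str, creating one on first use.
  static unsigned intern(const std::string &Str) {
    static std::mutex Lock;
    static std::unordered_map<std::string, unsigned> Table;
    std::lock_guard<std::mutex> Guard(Lock);
    auto It = Table.find(Str);
    if (It != Table.end())
      return It->second;
    unsigned Id = static_cast<unsigned>(Table.size()) + 1;
    Table.emplace(Str, Id);
    return Id;
  }
};

// A symbol is just the interned ID; equality is a single integer compare.
struct InternedSymbol {
  unsigned Id;
  explicit InternedSymbol(const std::string &Str)
      : Id(GlobalInterner::intern(Str)) {}
  bool operator==(const InternedSymbol &RHS) const { return Id == RHS.Id; }
};
```

Because the IDs are global, a symbol held in a static variable can be compared against a symbol produced by any parser instance. The price is the lock and the process-wide state that the reviewers question.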

Needs tests.

What kind of tests do you have in mind? This change doesn't really expose anything, and there are tests in the followup change in the stack.

llvm/docs/LangRef.rst
3527–3531	Good question. I've been going back and forth on this, and the current version is as-is mostly because the various !DIxyz metadata doesn't have those prefixes. But that may not be the best example to follow. I'd be happy to change it to be more in line with metadata. Any other opinions?

What kind of tests do you have in mind?

I thought about tests for the parsing and printing code. I missed that the code is not yet used here, so please ignore my comment from before :)

Having global state makes things easy at the start (and efficient in the case here), but I fear we will run into problems later, for example when unloading and reloading libLLVM. There is a precedent for globals: command-line arguments are global state, and that is problematic.

The TargetTypeInfoDeserialize::registerSymbols() call is in the LLVMContext constructor, so it does not seem far-fetched to me to make symbols part of the context.
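
For comparison, a per-context variant of the same idea might look like the sketch below. `SketchContext` and `internSymbol` are hypothetical names, and the sketch assumes, as with LLVMContext, that a context is used by one thread at a time, so no lock is taken.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical context that owns its own symbol table (illustration only).
class SketchContext {
  std::unordered_map<std::string, unsigned> Symbols;

public:
  // Returns the ID of Str within this context, assigning one on first use.
  unsigned internSymbol(const std::string &Str) {
    unsigned NextId = static_cast<unsigned>(Symbols.size()) + 1;
    // try_emplace leaves an existing entry untouched and returns it.
    return Symbols.try_emplace(Str, NextId).first->second;
  }
};
```

The table now goes away with the context, but IDs are only meaningful within one context: the same string can receive different IDs in two contexts, so a statically registered symbol can no longer be compared by ID alone.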

llvm/docs/LangRef.rst
3527–3531	But that may not be the best example to follow. I once tried to write a parser for !DI metadata (tree-sitter, for syntax highlighting) and gave up because there were more and more edge-cases, so I’d be happy to see a more structured format ;)

nhaehnle marked an inline comment as done. Jun 20 2023, 4:52 AM

nhaehnle added inline comments.

llvm/docs/LangRef.rst
3527–3531	I'm going ahead and changing it to have the `iN` prefix. As a reasonable (I believe) consequence, the bit width is now actually part of the value, so if a user wants to distinguish between `i7 0` and `i9 0`, they can.
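
A small sketch of what "the bit width is part of the value" means in practice. `SketchInt` and `makeInt` are hypothetical stand-ins limited to widths of 1 to 64 bits; the patch itself stores an `APInt`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer value that carries its bit width.
struct SketchInt {
  unsigned BitWidth;
  std::uint64_t Bits;
};

// Builds a value, truncating Bits to BitWidth bits.
inline SketchInt makeInt(unsigned BitWidth, std::uint64_t Bits) {
  assert(BitWidth >= 1 && BitWidth <= 64 && "sketch supports widths 1..64");
  if (BitWidth < 64)
    Bits &= (std::uint64_t{1} << BitWidth) - 1;
  return SketchInt{BitWidth, Bits};
}

// Two values are equal only if both the width and the bits match.
inline bool operator==(const SketchInt &A, const SketchInt &B) {
  return A.BitWidth == B.BitWidth && A.Bits == B.Bits;
}
```

Under this definition `i7 0` and `i9 0` compare unequal even though both are numerically zero.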

Add iN prefix to integers (and bool) in structured data values

Harbormaster completed remote builds in B239993: Diff 532900. Jun 20 2023, 7:55 AM

nikic mentioned this in D147697: [IR] Add TargetExtTypeClass. Jun 20 2023, 12:37 PM

I am very concerned about making this a global structure, rather than something bound to the context.

More generally, I'm not happy that this new concept is being introduced as part of the target type implementation. This doesn't really seem helpful for this specific use case (it makes the implementation substantially more complex rather than simpler). It may well make sense as part of some larger context, but I think this larger context deserves a wider discussion (probably on discourse) to clarify what the goals of this abstraction are and make sure we have a good design for it.

Sure. It's a bit of a chicken-and-egg thing. I do think this approach is better than the piecemeal addition of things to bitcode reader/writer, which is rather subtle code that I don't think too many people understand, but at the same time, working on big changes is frustrating unless you have clear line-of-sight to actually getting something useful upstream. So I started with this small thing quite consciously on purpose.

I have a WIP change locally that leverages the same infrastructure for "extended metadata", and at least an early minimal version that does something interesting is something that I think I can put up reasonably soon so that the discussion doesn't end up overly abstract. I would also like to do the same thing for "extended instructions", but that's still a bit further out.

I am very concerned about making this a global structure, rather than something bound to the context.

You mean the symbol registration? That's ultimately a performance tradeoff. I wanted to avoid having too many string comparisons in potential hot paths, which requires interning, which requires some way to get at the intern'd ID that isn't too horrible. It would be good to understand what the actual concerns are with it; the intention is that the size of this structure is bounded by the number of static variables, so there isn't an urgent need (or even ability) to reclaim anything. But I would be happier about it if I had a way to enforce that.

FWIW, I do think this is pretty reasonable when seen as a pure IR/bitcode abstraction. I wouldn't have concerns if this were (initially at least) represented as std::pair<StringRef, sdata::Value> and only used as part of reading/writing. That use case doesn't really need any of the Symbol machinery -- we get these as strings in IR/bitcode anyway, and I don't think there is a substantial performance difference between interning the strings and then comparing IDs, and directly comparing against a small handful of strings, during parsing only.

The design you have makes a lot more sense if the sdata is supposed to be used as part of the in-memory IR as well, in which case the Symbols are important for more efficient access -- but I'm not sure if they are ideal in that context either. That would still require that we have Symbols and Values (std::variant, ugh) in the representation. Ideally we'd just have a C++ struct with properly typed members, and sdata only being used for the purposes of serializing/deserializing those structs using a general mechanism, without requiring new asm/bitcode support each time.

llvm/include/llvm/IR/StructuredData.h
26	Unused
177	Not used in this patch -- move to the last one?

Drop symbols-as-values for now

Hmm, perhaps that has been a case of premature optimization.

Indeed my intention is that "known" extended structures (in the general sense) are represented as C++ structs with properly typed members in live IR. The only exceptions are "generic" extended structures, which are used as a fallback in-memory representation to preserve any unknown extension structures as a black box. That is, in our graphics compiler my current thinking is for us to have extended metadata objects along the lines of:

!lgc.rasterizer.state { discardEnable: i1 true, perSampleShading: i1 false, ... }

In our compiler, these would be represented as an instance of an lgc::RasterizerStateMetadata class which is derived from an llvm::ExtMetadata base class and contains these fields as plain data. But when we write out intermediate IR with something like -stop-after and then run it through generic tools like opt, llvm-dis, etc., the same metadata object is represented by a llvm::GenericExtMetadata which just holds an array of std::pair<sdata::Symbol, sdata::Value>. But perhaps that could just be a std::pair<std::string, sdata::Value> instead because nobody really accesses these fields anyway.
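
The split between a typed in-memory struct and a generic key/value form can be sketched roughly as below. Everything here is hypothetical: `SketchValue`, `SketchFields`, and the `RasterizerState` struct only borrow the field names from the example above, and plain `std::string` keys stand in for the `StringRef` or `sdata::Symbol` keys under discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical generic form: an ordered list of (key, value) pairs.
using SketchValue = std::variant<bool, std::uint64_t>;
using SketchFields = std::vector<std::pair<std::string, SketchValue>>;

// Hypothetical typed form used by a tool that knows this extension.
struct RasterizerState {
  bool DiscardEnable = false;
  bool PerSampleShading = false;

  // Typed struct -> generic fields (what a printer or writer would consume).
  SketchFields serialize() const {
    return {{"discardEnable", DiscardEnable},
            {"perSampleShading", PerSampleShading}};
  }

  // Generic fields -> typed struct. Returns nullopt for an unknown key or a
  // value of the wrong kind, in which case a caller could keep the generic
  // fields as an opaque fallback.
  static std::optional<RasterizerState> deserialize(const SketchFields &Fields) {
    RasterizerState S;
    for (const auto &[Key, Val] : Fields) {
      const bool *B = std::get_if<bool>(&Val);
      if (!B)
        return std::nullopt;
      if (Key == "discardEnable")
        S.DiscardEnable = *B;
      else if (Key == "perSampleShading")
        S.PerSampleShading = *B;
      else
        return std::nullopt;
    }
    return S;
  }
};
```

In this arrangement only the (de)serialization step touches keys, which is why comparing them as plain strings, as nikic suggests, may be cheap enough.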

I'll think about this some more but removing the Symbol stuff and just replacing by StringRef is starting to sound reasonable to me.

llvm/include/llvm/IR/StructuredData.h
26	(only MDNode is truly unused, removing that)
177	Sure

Move SchemaField out of this patch

Harbormaster completed remote builds in B240167: Diff 533147. Jun 21 2023, 12:37 AM

Main change is removal of sdata::Symbol in favor of plain StringRef

Harbormaster completed remote builds in B240425: Diff 533499. Jun 22 2023, 1:47 AM

I think this larger context deserves a wider discussion

I started a thread on Discourse: https://discourse.llvm.org/t/rfc-structured-data-for-extensibility-in-llvm-ir/71527

Missing all the assembler tests (but I guess they come with the bitcode parts?)

llvm/include/llvm/IR/StructuredData.h
71	Does std::get on variant not do this for you?
llvm/lib/IR/AsmWriter.cpp
1345	seems more like a report_fatal_error kind of case

In D150370#4440349, @arsenm wrote:

Missing all the assembler tests (but I guess they come with the bitcode parts?)

Yeah, the assembler tests are with the actual use cases of this infrastructure in the next patch.

llvm/include/llvm/IR/StructuredData.h
71	std::get throws on error and we disable exceptions in LLVM. I'm not sure what that means in practice so thought it safest to add the assertion.
llvm/lib/IR/AsmWriter.cpp
1345	Really? I thought report_fatal_error was primarily for states that can be reached by bad input. This state here cannot be reached by bad input, but only by somebody extending sdata::Value and then forgetting to change this code.
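
Both inline threads above concern what happens when a `std::variant` holds something the code did not expect. The sketch below shows two standard-library options, using a hypothetical `SketchStorage` rather than the patch's `sdata::Value`. `std::get_if` returns a null pointer on a mismatch instead of throwing `std::bad_variant_access`, so it pairs naturally with an assertion when exceptions are disabled. A `std::visit` with a `static_assert` in the final branch turns a forgotten alternative into a compile error. The second part is an alternative to consider, not something proposed in the review.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the storage of a structured data value.
using SketchStorage = std::variant<std::monostate, bool, std::uint64_t>;

// Accessor that never reaches the throwing path of std::get. With assertions
// compiled out, calling this on a non-integer value is undefined behavior.
inline std::uint64_t getInt(const SketchStorage &S) {
  const std::uint64_t *P = std::get_if<std::uint64_t>(&S);
  assert(P && "value does not hold an integer");
  return *P;
}

template <class> inline constexpr bool AlwaysFalse = false;

// Printer that fails to compile if SketchStorage gains an alternative that
// is not handled here. The "i64" spelling is arbitrary for this sketch.
inline std::string printValue(const SketchStorage &S) {
  return std::visit(
      [](const auto &X) -> std::string {
        using T = std::decay_t<decltype(X)>;
        if constexpr (std::is_same_v<T, std::monostate>)
          return "<empty>";
        else if constexpr (std::is_same_v<T, bool>)
          return X ? "i1 true" : "i1 false";
        else if constexpr (std::is_same_v<T, std::uint64_t>)
          return "i64 " + std::to_string(X);
        else
          static_assert(AlwaysFalse<T>, "unhandled alternative");
      },
      S);
}
```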

dblaikie added a subscriber: dblaikie. Jun 26 2023, 3:03 PM

Thanks for providing the context for this functionality. The updated patch looks good to me. I think even if we never end up introducing more uses for this, this is not going to be an undue burden, so I think it's okay to move forward with the target type use case.

llvm/include/llvm/IR/StructuredData.h
19	DenseMap is not used.
37	Why do we need this class to be default-constructible? (I presume this is also why there is a monostate in the storage?)
llvm/lib/AsmParser/LLParser.cpp
4213	Okay, I guess this is where the default initialization is used. Possibly this should be std::optional instead -- or parse the sdata value in a separate function and return it?
llvm/lib/IR/StructuredData.cpp
24	Structured
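
The suggestion on LLParser.cpp above (parse the value in a separate function and return it, possibly as `std::optional`) could look roughly like this. The sketch works on two already-split tokens instead of LLVM's lexer, handles only the `iN <integer>` and `i1 true|false` forms, and is limited to unsigned values and widths up to 64 bits. `SketchInt` and `parseIntValue` are hypothetical names.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical integer value that carries its bit width.
struct SketchInt {
  unsigned BitWidth;
  std::uint64_t Bits;
};

// Parses the two tokens of an `iN <integer>` or `i1 true|false` value.
// Returns nullopt on any error, so no default-constructed value is needed.
inline std::optional<SketchInt> parseIntValue(std::string_view TypeTok,
                                              std::string_view ValueTok) {
  if (TypeTok.size() < 2 || TypeTok[0] != 'i')
    return std::nullopt;
  unsigned Width = 0;
  const char *TB = TypeTok.data() + 1;
  const char *TE = TypeTok.data() + TypeTok.size();
  auto [TP, TErr] = std::from_chars(TB, TE, Width);
  if (TErr != std::errc() || TP != TE || Width == 0 || Width > 64)
    return std::nullopt;

  if (ValueTok == "true" || ValueTok == "false") {
    if (Width != 1)
      return std::nullopt; // true/false can only be used with i1
    return SketchInt{1, ValueTok == "true" ? 1u : 0u};
  }

  std::uint64_t Bits = 0;
  const char *VB = ValueTok.data();
  const char *VE = VB + ValueTok.size();
  auto [VP, VErr] = std::from_chars(VB, VE, Bits);
  if (VErr != std::errc() || VP != VE)
    return std::nullopt;
  if (Width < 64)
    Bits &= (std::uint64_t{1} << Width) - 1; // truncate to the stated width
  return SketchInt{Width, Bits};
}
```

With a function of this shape the caller's local variable is only ever constructed from a successfully parsed value, which removes the need for the `monostate` alternative nikic asks about.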

khei4 added a subscriber: khei4. Jun 27 2023, 8:50 PM

Revision Contents

Path                                      Size
llvm/docs/LangRef.rst                     29 lines
llvm/include/llvm/AsmParser/LLParser.h    7 lines
llvm/include/llvm/IR/StructuredData.h     212 lines
llvm/lib/AsmParser/LLParser.cpp           76 lines
llvm/lib/IR/AsmWriter.cpp                 28 lines
llvm/lib/IR/CMakeLists.txt                1 line
llvm/lib/IR/StructuredData.cpp            95 lines

Diff 533139

llvm/docs/LangRef.rst

[...]
 source file name to the local function name.

 The syntax for the source file name is simply:

 .. code-block:: text

     source_filename = "/path/to/source.c"

+.. _structured_data:
+
+Structured Data
+---------------
+
+Dictionaries of key-value pairs are used in some cases to represent data in an
+easily extendable, human-readable manner.
+
+The labels used in key-value pairs are identifiers followed immediately by a
+colon (':'), like the label of a named basic block.
+
+:Syntax:
+
+::
+
+    sdata ::= '{' (sdata_field ',')* sdata_field? '}'
+    sdata_field ::= label sdata_value
+    sdata_value ::= 'type' type
+                ::= 'iN' integer
+                ::= 'i1' 'true' | 'i1' 'false'
+
+:Examples:
+
+::
+
+    {}
+    { layout: type float, }
+    { foo: i1 true, bar: i32 10 }
+
 .. _typesystem:

 Type System
 ===========

 The LLVM type system is one of the most important features of the
 intermediate representation. Being typed enables a number of
 optimizations to be performed on the intermediate representation
[...]

llvm/include/llvm/AsmParser/LLParser.h

[...]
 #include "LLLexer.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/AsmParser/Parser.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/FMF.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/ModuleSummaryIndex.h"
+#include "llvm/IR/StructuredData.h"
 #include <map>
 #include <optional>

 namespace llvm {
 class Module;
 class ConstantRange;
 class FunctionType;
 class GlobalObject;
[...] private:

   // Summary type id reference information.
   std::map<unsigned, std::vector<std::pair<GlobalValue::GUID *, LocTy>>>
       ForwardRefTypeIds;

   // Map of module ID to path.
   std::map<unsigned, StringRef> ModuleIdMap;

+  sdata::SymbolTableLockGuard SymbolTableLock;
+
   /// Only the llvm-as tool may set this to false to bypass
   /// UpgradeDebuginfo so it can generate broken bitcode.
   bool UpgradeDebugInfo;

   std::string SourceFileName;

 public:
   LLParser(StringRef F, SourceMgr &SM, SMDiagnostic &Err, Module *M,
[...] private:
   bool parseMDNode(MDNode *&N);
   bool parseMDNodeTail(MDNode *&N);
   bool parseMDNodeVector(SmallVectorImpl<Metadata *> &Elts);
   bool parseMetadataAttachment(unsigned &Kind, MDNode *&MD);
   bool parseInstructionMetadata(Instruction &Inst);
   bool parseGlobalObjectMetadataAttachment(GlobalObject &GO);
   bool parseOptionalFunctionMetadata(Function &F);

+  bool parseStructuredData(
+      function_ref<bool(LocTy, sdata::Symbol, LocTy, sdata::Value)>
+          ParseField);
+
   template <class FieldTy>
   bool parseMDField(LocTy Loc, StringRef Name, FieldTy &Result);
   template <class FieldTy> bool parseMDField(StringRef Name, FieldTy &Result);
   template <class ParserTy> bool parseMDFieldsImplBody(ParserTy ParseField);
   template <class ParserTy>
   bool parseMDFieldsImpl(ParserTy ParseField, LocTy &ClosingLoc);
   bool parseSpecializedMDNode(MDNode *&N, bool IsDistinct = false);

[...]

llvm/include/llvm/IR/StructuredData.h

This file was added.

//===- llvm/IR/StructuredData.h ---------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file provides structured data objects that are used as an intermediate
				// abstraction for (de)serializing extensible IR objects.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_IR_STRUCTUREDDATA_H
				#define LLVM_IR_STRUCTUREDDATA_H

				#include "llvm/ADT/APInt.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/Error.h"

				namespace llvm {

				class LLVMContext;
				class MDNode;
				class Type;

				namespace sdata {

				class RegisterSymbol;
				class SymbolTableLockGuard;

/// A symbol is a unique'd well-known string, like the key of a field in a
/// structured data dictionary, or the name of an enum value.
///
/// Use @ref RegisterSymbol to register symbol names.
///
/// WARNING: Do not use symbols for user-provided strings. Should the need to
/// store strings in structured data arise, an explicit string type
/// should be added to @ref Value.
class Symbol {
private:
  friend class RegisterSymbol;
  friend class SymbolTableLockGuard;

  unsigned Id = 0;
  StringRef String;

public:
  Symbol() = default;
  Symbol(const RegisterSymbol &RS);

  StringRef getAsString() const { return String; }

  bool operator==(const RegisterSymbol &RHS) const;
  bool operator!=(const RegisterSymbol &RHS) const { return !(*this == RHS); }
  bool operator==(const Symbol &RHS) const {
    if (Id != 0 && RHS.Id != 0)
      return Id == RHS.Id;
    return String == RHS.String;
  }
  bool operator!=(const Symbol &RHS) const { return !(*this == RHS); }
};

/// Register a constant known string as a "symbol" for use in structured data.
///
/// Symbols must be registered before creating/reading structured data that
/// uses them.
///
/// Symbols are registered and unique'd globally. They should be constructed
/// lazily with a static lifetime as needed, e.g. using the function-local
/// static variable pattern below.
///
/// Example:
/// @code
///   struct MySymbols {
///     sdata::RegisterSymbol MyKeyword("mykeyword");
///     sdata::RegisterSymbol Foo("foo");
///     sdata::RegisterSymbol Bar("bar");
///     // ...
///
///     static MySymbols &get() {
///       static MySymbols S;
///       return S;
///     }
///   };
///
///   void registerMySymbols() {
///     (void)MySymbols::get();
///   }
/// @endcode
class RegisterSymbol {
public:
  explicit RegisterSymbol(StringRef Str);

  Symbol get() const { return S; }

private:
  Symbol S;
};

/// Thread-safe access to the table of registered symbols.
///
/// A read lock on the symbol table is held for the life-time of this object.
///
/// WARNING: This mechanism should only be used by the IR parser and bitcode
/// reader! Everything else should use @ref RegisterSymbol instead.
class SymbolTableLockGuard {
public:
  SymbolTableLockGuard();
  ~SymbolTableLockGuard();

  Symbol getSymbol(LLVMContext &Context, StringRef String) const;
};

/// A value of structured data.
class Value {
private:
  using Storage = std::variant<std::monostate, APInt, Type *>;

  Storage S;

public:
  Value() = default;
  explicit Value(Type *T) : S(T) {}
  explicit Value(bool B) : S(APInt(1, B ? 1 : 0)) {}
  explicit Value(APInt I) : S(I) {}

  Value &operator=(Type *T) {
    assert(T);
    S = T;
    return *this;
  }
  Value &operator=(bool B) {
    S = APInt(1, B ? 1 : 0);
    return *this;
  }
  Value &operator=(APInt I) {
    S = I;
    return *this;
  }

  bool isAPInt() const { return std::holds_alternative<APInt>(S); }
  bool isBool() const {
    return isAPInt() && std::get<APInt>(S).getBitWidth() == 1;
  }
  bool isType() const { return std::holds_alternative<Type *>(S); }

  const APInt &getAPInt() const {
    assert(isAPInt());
    return std::get<APInt>(S);
  }
  bool getBool() const {
    assert(isBool());
    return std::get<APInt>(S).getZExtValue();
  }
  Type *getType() const {
    assert(isType());
    return std::get<Type *>(S);
  }
};

/// Describes the "schema" of a field of structured data.
///
/// This is used to describe structures for bitcode abbreviation.
class SchemaField {
public:
  enum class Type {
    /// Fixed-width APInt (possibly a boolean). TypeData is the number of bits.
    Int,

    /// LLVM type
    Type,
  };

private:
  Symbol TheKey;
  Type TheType;
  unsigned TypeData;

public:
  SchemaField(Symbol K, Type T, unsigned TD = 0)
      : TheKey(K), TheType(T), TypeData(TD) {
    assert((T != Type::Int || TD != 0) &&
           "integer schema types must have a bit width");
  }

  Symbol getKey() const { return TheKey; }
  Type getType() const { return TheType; }
  unsigned getTypeBitWidth() const {
    assert(TheType == Type::Int);
    return TypeData;
  }
};

// Convenience function to create an Error object when an error is encountered
// while deserializing structured data.
Error makeDeserializeError(const Twine &Msg);

inline Symbol::Symbol(const RegisterSymbol &RS) { *this = RS.get(); }

inline bool Symbol::operator==(const RegisterSymbol &RHS) const {
  // Use the assumption that symbols are registered before structured data is
  // created.
  return Id == RHS.get().Id;
}

} // end namespace sdata

} // end namespace llvm

#endif

llvm/lib/AsmParser/LLParser.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 4,184 Lines • ▼ Show 20 Lines	do {
if (parseGlobalTypeAndValue(C))		if (parseGlobalTypeAndValue(C))
return true;		return true;
Elts.push_back(C);		Elts.push_back(C);
} while (EatIfPresent(lltok::comma));		} while (EatIfPresent(lltok::comma));

return false;		return false;
}		}

		/// parseStructuredData
		/// ::= '{' (key value (',' key value))? ','? '}'
		///
		/// value ::= 'type' type
		/// ::= 'i1' 'true' \| 'i1' 'false'
		/// ::= 'iN' integer
		bool LLParser::parseStructuredData(
		function_ref<bool(LocTy, sdata::Symbol, LocTy, sdata::Value)> ParseField) {
		if (parseToken(lltok::lbrace, "expected '{' here"))
		return true;

		while (Lex.getKind() != lltok::rbrace) {
		if (Lex.getKind() != lltok::LabelStr)
		return tokError("expected '}' or field label here");

		LocTy KeyLoc = Lex.getLoc();
		sdata::Symbol Key = SymbolTableLock.getSymbol(Context, Lex.getStrVal());
		Lex.Lex();

		LocTy ValueLoc = Lex.getLoc();
		sdata::Value V;
		nikicUnsubmitted Not Done Reply Inline Actions Okay, I guess this is where the default initialization is used. Possibly this should be std::optional instead -- or parse the sdata value in a separate function and return it? nikic: Okay, I guess this is where the default initialization is used. Possibly this should be std…
		switch (Lex.getKind()) {
		case lltok::kw_type: {
		Lex.Lex(); // eat 'type'

		Type *T;
		if (parseType(T, /AllowVoid=/true))
		return true;

		V = sdata::Value(T);
		break;
		}
		case lltok::Type: {
		Type *Ty = Lex.getTyVal();
		if (auto *IntTy = dyn_cast<IntegerType>(Ty)) {
		Lex.Lex();

		switch (Lex.getKind()) {
		case lltok::APSInt:
		V = sdata::Value(Lex.getAPSIntVal().extOrTrunc(IntTy->getBitWidth()));
		Lex.Lex();
		break;
		case lltok::kw_true:
		case lltok::kw_false:
		if (IntTy->getBitWidth() != 1)
		return tokError("true/false can only be used with i1");
		V = sdata::Value(Lex.getKind() == lltok::kw_true);
		Lex.Lex();
		break;
		default:
		return tokError("expected an integer value");
		}

		break;
		}

		return tokError("only integer types are supported in structured data");
		}

		default:
		return tokError("expected structured data value");
		}

		if (ParseField(KeyLoc, Key, ValueLoc, V))
		return true;

		if (Lex.getKind() == lltok::rbrace)
		break;
		if (parseToken(lltok::comma, "expected ',' or '}' here"))
		return true;
		}

		Lex.Lex(); // eat the '}'
		return false;
		}
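
The inline comment above suggests parsing the value in a separate function that returns it, possibly as a std::optional, instead of default-initializing sdata::Value. A minimal standalone sketch of that shape follows. It is not LLVM code: IntValue, parseIntValue and the pre-split token vector are hypothetical stand-ins, only the integer forms of the grammar are covered, and unlike the patch it does not truncate the literal to the bit width.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the integer case of sdata::Value.
struct IntValue {
  unsigned BitWidth;
  long long Val;
};

// Parses "iN <integer>" or "i1 true|false" starting at Toks[Pos].
// On success, advances Pos past both tokens and returns the value.
// On failure, returns std::nullopt and leaves Pos unchanged.
std::optional<IntValue> parseIntValue(const std::vector<std::string> &Toks,
                                      std::size_t &Pos) {
  if (Pos + 1 >= Toks.size())
    return std::nullopt;
  const std::string &Ty = Toks[Pos];
  const std::string &Lit = Toks[Pos + 1];

  // The type token must be 'i' followed by decimal digits.
  if (Ty.size() < 2 || Ty[0] != 'i')
    return std::nullopt;
  unsigned Width = 0;
  for (std::size_t I = 1; I < Ty.size(); ++I) {
    if (Ty[I] < '0' || Ty[I] > '9')
      return std::nullopt;
    Width = Width * 10 + static_cast<unsigned>(Ty[I] - '0');
  }

  // true/false are only accepted together with i1.
  if (Lit == "true" || Lit == "false") {
    if (Width != 1)
      return std::nullopt;
    Pos += 2;
    return IntValue{1, Lit == "true" ? 1LL : 0LL};
  }

  // Otherwise the whole token must be an integer literal.
  try {
    std::size_t Used = 0;
    long long V = std::stoll(Lit, &Used);
    if (Used != Lit.size())
      return std::nullopt;
    Pos += 2;
    return IntValue{Width, V};
  } catch (const std::exception &) {
    return std::nullopt;
  }
}
```

With this shape the caller never observes a default-constructed value: it either receives a parsed value or takes the error path.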

bool LLParser::parseMDTuple(MDNode *&MD, bool IsDistinct) {
SmallVector<Metadata *, 16> Elts;
if (parseMDNodeVector(Elts))
return true;

MD = (IsDistinct ? MDTuple::getDistinct : MDTuple::get)(Context, Elts);
return false;
}
... 5,898 lines not shown ...

llvm/lib/IR/AsmWriter.cpp

... 52 lines not shown ...
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/ModuleSlotTracker.h"
#include "llvm/IR/ModuleSummaryIndex.h"
#include "llvm/IR/Operator.h"
		#include "llvm/IR/StructuredData.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/TypeFinder.h"
#include "llvm/IR/TypedPointerType.h"
#include "llvm/IR/Use.h"
#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/Casting.h"
... 1,267 lines not shown (enclosing context: static void WriteOptimizationInfo(raw_ostream &Out, const User *U) {) ...

if (const OverflowingBinaryOperator *OBO =
dyn_cast<OverflowingBinaryOperator>(U)) {
if (OBO->hasNoUnsignedWrap())
Out << " nuw";
if (OBO->hasNoSignedWrap())
Out << " nsw";
} else if (const PossiblyExactOperator *Div =
dyn_cast<PossiblyExactOperator>(U)) {
		arsenm: seems more like a report_fatal_error kind of case
		nhaehnle: Really? I thought report_fatal_error was primarily for states that can be reached by bad input. This state here cannot be reached by bad input, but only by somebody extending sdata::Value and then forgetting to change this code.
if (Div->isExact())
Out << " exact";
} else if (const GEPOperator *GEP = dyn_cast<GEPOperator>(U)) {
if (GEP->isInBounds())
Out << " inbounds";
}
}

... 1,275 lines not shown (enclosing context: public:) ...
void printVFuncId(const FunctionSummary::VFuncId VFId);
void
printNonConstVCalls(const std::vector<FunctionSummary::VFuncId> &VCallList,
const char *Tag);
void
printConstVCalls(const std::vector<FunctionSummary::ConstVCall> &VCallList,
const char *Tag);

		void
		printStructuredData(ArrayRef<std::pair<sdata::Symbol, sdata::Value>> Fields);

private:
/// Print out metadata attachments.
void printMetadataAttachments(
const SmallVectorImpl<std::pair<unsigned, MDNode *>> &MDs,
StringRef Separator);

// printInfoComment - Print a little comment after the instruction indicating
// which slot it occupies.
... 1,934 lines not shown (enclosing context: void AssemblyWriter::printUseLists(const Function *F) {) ...
if (It == UseListOrders.end())
return;

Out << "\n; uselistorder directives\n";
for (const auto &Pair : It->second)
printUseListOrder(Pair.first, Pair.second);
}

		void AssemblyWriter::printStructuredData(
		ArrayRef<std::pair<sdata::Symbol, sdata::Value>> Fields) {
		Out << "{\n";
		for (const auto &Field : Fields) {
		Out << " " << Field.first.getAsString() << ": ";
		if (Field.second.isBool()) {
		if (Field.second.getBool())
		Out << "i1 true";
		else
		Out << "i1 false";
		} else if (Field.second.isAPInt()) {
		const APInt &I = Field.second.getAPInt();
		Out << 'i' << I.getBitWidth() << ' ' << I;
		} else if (Field.second.isType()) {
		Out << "type ";
		TypePrinter.print(Field.second.getType(), Out);
		} else {
		llvm_unreachable("unhandled sdata::Value type");
		}
		Out << ",\n";
		}
		Out << "}\n";
		}
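
To make the text emitted by printStructuredData concrete, here is a small standalone sketch that mirrors its control flow with standard-library types. It is an illustration under stated assumptions, not LLVM code: Value, IntValue, TypeValue, Field and printFields are hypothetical stand-ins for sdata::Value, sdata::Symbol and the AssemblyWriter member, and types are represented by their printed name only.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the three kinds of sdata::Value in the patch.
struct IntValue {
  unsigned BitWidth;
  std::int64_t Val;
};
struct TypeValue {
  std::string Name; // printed form of the type, e.g. "i64"
};
using Value = std::variant<bool, IntValue, TypeValue>;
using Field = std::pair<std::string, Value>;

// Mirrors the layout used by printStructuredData: one field per line,
// each followed by a comma, wrapped in braces.
std::string printFields(const std::vector<Field> &Fields) {
  std::ostringstream Out;
  Out << "{\n";
  for (const Field &F : Fields) {
    Out << " " << F.first << ": ";
    if (const bool *B = std::get_if<bool>(&F.second))
      Out << (*B ? "i1 true" : "i1 false");
    else if (const IntValue *I = std::get_if<IntValue>(&F.second))
      Out << 'i' << I->BitWidth << ' ' << I->Val;
    else
      Out << "type " << std::get<TypeValue>(F.second).Name;
    Out << ",\n";
  }
  Out << "}\n";
  return Out.str();
}
```

Every field, including the last one, is followed by a comma; the grammar in parseStructuredData accepts that trailing comma.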

//===----------------------------------------------------------------------===//
// External Interface declarations
//===----------------------------------------------------------------------===//

void Function::print(raw_ostream &ROS, AssemblyAnnotationWriter *AAW,
bool ShouldPreserveUseListOrder,
bool IsForDebug) const {
SlotTracker SlotTable(this->getParent());
... 387 lines not shown ...

llvm/lib/IR/CMakeLists.txt

... 51 lines not shown (enclosing context: add_llvm_component_library(LLVMCore) ...
PrintPasses.cpp
ProfDataUtils.cpp
SafepointIRVerifier.cpp
ProfileSummary.cpp
PseudoProbe.cpp
ReplaceConstant.cpp
Statepoint.cpp
StructuralHash.cpp
		StructuredData.cpp
Type.cpp
TypedPointerType.cpp
TypeFinder.cpp
Use.cpp
User.cpp
Value.cpp
ValueSymbolTable.cpp
VectorBuilder.cpp
... 18 lines not shown ...

llvm/lib/IR/StructuredData.cpp

This file was added.

				//===- StructuredData.cpp -------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/IR/StructuredData.h"

				#include "LLVMContextImpl.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/Support/RWMutex.h"
				#include "llvm/Support/StringSaver.h"

				using namespace llvm;
				using namespace sdata;

				namespace {

				struct SymbolTable {
				sys::RWMutex Mutex;
				BumpPtrAllocator Allocator;
				StringSaver Saver{Allocator};
				nikic: Structured
				std::vector<StringRef> IdToName;
				DenseMap<StringRef, unsigned> NameToId;

				static SymbolTable &instance() {
				static SymbolTable Map;
				return Map;
				}
				};

				enum class DeserializeErrorCode : int {
				Generic = 1,
				};

				class DeserializeErrorCategory : public std::error_category {
				public:
				const char *name() const noexcept override {
				return "Structured Data Deserialize Error";
				}

				std::string message(int condition) const override {
				return "Error while deserializing structured data";
				}

				static DeserializeErrorCategory &get() {
				static DeserializeErrorCategory TheCategory;
				return TheCategory;
				}
				};

				} // anonymous namespace

				sdata::RegisterSymbol::RegisterSymbol(StringRef Str) {
				SymbolTable &ST = SymbolTable::instance();
				sys::ScopedWriter Lock(ST.Mutex);
				auto I = ST.NameToId.find(Str);
				if (I == ST.NameToId.end()) {
				StringRef Saved = ST.Saver.save(Str);
				ST.IdToName.push_back(Saved);
				I = ST.NameToId.try_emplace(Saved, ST.IdToName.size()).first;
				}

				S.Id = I->second;
				S.String = ST.IdToName[S.Id - 1];
				}

				SymbolTableLockGuard::SymbolTableLockGuard() {
				SymbolTable::instance().Mutex.lock_shared();
				}

				SymbolTableLockGuard::~SymbolTableLockGuard() {
				SymbolTable::instance().Mutex.unlock_shared();
				}

				Symbol SymbolTableLockGuard::getSymbol(LLVMContext &Ctx,
				StringRef String) const {
				SymbolTable &ST = SymbolTable::instance();
				Symbol S;
				S.Id = ST.NameToId.lookup(String);
				if (S.Id != 0)
				S.String = ST.IdToName[S.Id - 1];
				else
				S.String = Ctx.pImpl->Saver.save(String);
				return S;
				}

				Error llvm::sdata::makeDeserializeError(const Twine &Msg) {
				return createStringError(
				std::error_code(static_cast<int>(DeserializeErrorCode::Generic),
				DeserializeErrorCategory::get()),
				Msg);
				}