Index: docs/CommandGuide/tblgen.rst =================================================================== --- docs/CommandGuide/tblgen.rst +++ docs/CommandGuide/tblgen.rst @@ -57,6 +57,11 @@ Print all records to standard output (default). +.. option:: -dump-json + + Print a JSON representation of all records, suitable for further + automated processing. + .. option:: -print-enums Print enumeration values for a class. Index: docs/TableGen/BackEnds.rst =================================================================== --- docs/TableGen/BackEnds.rst +++ docs/TableGen/BackEnds.rst @@ -435,6 +435,127 @@ **Purpose**: Creates ``AttributeReference.rst`` from ``AttrDocs.td``, and is used for documenting user-facing attributes. +General BackEnds +================ + +JSON +---- + +**Purpose**: Output all the values in every ``def``, as a JSON data +structure that can be easily parsed by a variety of languages. Useful +for writing custom backends without having to modify TableGen itself, +or for performing auxiliary analysis on the same TableGen data passed +to a built-in backend. + +**Output**: + +The root of the output file is a JSON object (i.e. dictionary), +containing the following fixed keys: + +* ``!tablegen_json_version``: a numeric version field that will + increase if an incompatible change is ever made to the structure of + this data. The format described here corresponds to version 1. + +* ``!instanceof``: a dictionary whose keys are the class names defined + in the TableGen input. For each key, the corresponding value is an + array of strings giving the names of ``def`` records that derive + from that class. So ``root["!instanceof"]["Instruction"]``, for + example, would list the names of all the records deriving from the + class ``Instruction``. + +For each ``def`` record, the root object also has a key for the record +name. 
The corresponding value is a subsidiary object containing the +following fixed keys: + +* ``!superclasses``: an array of strings giving the names of all the + classes that this record derives from. + +* ``!fields``: an array of strings giving the names of all the variables + in this record that were defined with the ``field`` keyword. + +* ``!name``: a string giving the name of the record. This is always + identical to the key in the JSON root object corresponding to this + record's dictionary. (If the record is anonymous, the name is + arbitrary.) + +* ``!anonymous``: a boolean indicating whether the record's name was + specified by the TableGen input (if it is ``false``), or invented by + TableGen itself (if ``true``). + +For each variable defined in a record, the ``def`` object for that +record also has a key for the variable name. The corresponding value +is a translation into JSON of the variable's value, using the +conventions described below. + +Some TableGen data types are translated directly into the +corresponding JSON type: + +* A completely undefined value (e.g. for a variable declared without + initializer in some superclass of this record, and never initialized + by the record itself or any other superclass) is emitted as the JSON + ``null`` value. + +* ``int`` and ``bit`` values are emitted as numbers. Note that + TableGen ``int`` values are capable of holding integers too large to + be exactly representable in IEEE double precision. The integer + literal in the JSON output will show the full exact integer value. + So if you need to retrieve large integers with full precision, you + should use a JSON reader capable of translating such literals back + into 64-bit integers without losing precision, such as Python's + standard ``json`` module. + +* ``string`` and ``code`` values are emitted as JSON strings. + +* ``list<T>`` values, for any element type ``T``, are emitted as JSON + arrays.
Each element of the array is represented in turn using these + same conventions. + +* ``bits`` values are also emitted as arrays. A ``bits`` array is + ordered from least-significant bit to most-significant. So the + element with index ``i`` corresponds to the bit described as + ``x{i}`` in TableGen source. However, note that this means that + scripting languages are likely to *display* the array in the + opposite order from the way it appears in the TableGen source or in + the diagnostic ``-print-records`` output. + +All other TableGen value types are emitted as a JSON object, +containing two standard fields: ``kind`` is a discriminator describing +which kind of value the object represents, and ``printable`` is a +string giving the same representation of the value that would appear +in ``-print-records``. + +* A reference to a ``def`` object has ``kind=="def"``, and has an + extra field ``def`` giving the name of the object referred to. + +* A reference to another variable in the same record has + ``kind=="var"``, and has an extra field ``var`` giving the name of + the variable referred to. + +* A reference to a specific bit of a ``bits``-typed variable in the + same record has ``kind=="varbit"``, and has two extra fields: + ``var`` gives the name of the variable referred to, and ``index`` + gives the index of the bit. + +* A value of type ``dag`` has ``kind=="dag"``, and has two extra + fields. ``operator`` gives the initial value after the opening + parenthesis of the dag initializer; ``args`` is an array giving the + following arguments. The elements of ``args`` are arrays of length + 2, giving the value of each argument followed by its colon-suffixed + name (if any). 
For example, in the JSON representation of the dag + value ``(Op 22, "hello":$foo)`` (assuming that ``Op`` is the name of + a record defined elsewhere with a ``def`` statement): + + * ``operator`` will be an object in which ``kind=="def"`` and + ``def=="Op"`` + + * ``args`` will be the array ``[[22, null], ["hello", "foo"]]``. + +* If any other kind of value or complicated expression appears in the + output, it will have ``kind=="complex"``, and no additional fields. + These values are not expected to be needed by backends. The standard + ``printable`` field can be used to extract a representation of them + in TableGen source syntax if necessary. + How to write a back-end ======================= Index: docs/TableGen/index.rst =================================================================== --- docs/TableGen/index.rst +++ docs/TableGen/index.rst @@ -76,11 +76,14 @@ ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr, ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ... -The default backend prints out all of the records. +The default backend prints out all of the records. There is also a general +backend which outputs all the records as a JSON data structure, enabled using +the `-dump-json` option. If you plan to use TableGen, you will most likely have to write a `backend`_ that extracts the information specific to what you need and formats it in the -appropriate way. +appropriate way. You can do this by extending TableGen itself in C++, or by +writing a script in any language that can consume the JSON output. 
Example ------- Index: include/llvm/TableGen/Record.h =================================================================== --- include/llvm/TableGen/Record.h +++ include/llvm/TableGen/Record.h @@ -1900,6 +1900,8 @@ Init *resolve(Init *VarName) override; }; +void EmitJSON(RecordKeeper &RK, raw_ostream &OS); + } // end namespace llvm #endif // LLVM_TABLEGEN_RECORD_H Index: lib/TableGen/CMakeLists.txt =================================================================== --- lib/TableGen/CMakeLists.txt +++ lib/TableGen/CMakeLists.txt @@ -1,5 +1,6 @@ add_llvm_library(LLVMTableGen Error.cpp + JSONBackend.cpp Main.cpp Record.cpp SetTheory.cpp Index: lib/TableGen/JSONBackend.cpp =================================================================== --- /dev/null +++ lib/TableGen/JSONBackend.cpp @@ -0,0 +1,189 @@ +//===- JSONBackend.cpp - Generate a JSON dump of all records. -*- C++ -*-=====// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +// +// This TableGen back end generates a machine-readable representation +// of all the classes and records defined by the input, in JSON format. 
+// +//===----------------------------------------------------------------------===// + +#include "llvm/ADT/BitVector.h" +#include "llvm/Support/Debug.h" +#include "llvm/TableGen/Error.h" +#include "llvm/TableGen/Record.h" +#include "llvm/TableGen/TableGenBackend.h" +#include "llvm/Support/JSON.h" + +#define DEBUG_TYPE "json-emitter" + +using namespace llvm; + +namespace { + +class JSONEmitter { +private: + RecordKeeper &Records; + + json::Value translateInit(const Init &I); + json::Array listSuperclasses(const Record &R); + +public: + JSONEmitter(RecordKeeper &R); + + void run(raw_ostream &OS); +}; + +} // end anonymous namespace + +JSONEmitter::JSONEmitter(RecordKeeper &R) : Records(R) {} + +json::Value JSONEmitter::translateInit(const Init &I) { + + // Init subclasses that we return as JSON primitive values of one + // kind or another. + + if (isa<UnsetInit>(&I)) { + return nullptr; + } else if (auto *Bit = dyn_cast<BitInit>(&I)) { + return Bit->getValue() ? 1 : 0; + } else if (auto *Bits = dyn_cast<BitsInit>(&I)) { + json::Array array; + for (unsigned i = 0, limit = Bits->getNumBits(); i < limit; i++) + array.push_back(translateInit(*Bits->getBit(i))); + return array; + } else if (auto *Int = dyn_cast<IntInit>(&I)) { + return Int->getValue(); + } else if (auto *Str = dyn_cast<StringInit>(&I)) { + return Str->getValue(); + } else if (auto *Code = dyn_cast<CodeInit>(&I)) { + return Code->getValue(); + } else if (auto *List = dyn_cast<ListInit>(&I)) { + json::Array array; + for (auto val : *List) + array.push_back(translateInit(*val)); + return array; + } + + // Init subclasses that we return as JSON objects containing a + // 'kind' discriminator. For these, we also provide the same + // translation back into TableGen input syntax that -print-records + // would give.
+ + json::Object obj; + obj["printable"] = I.getAsString(); + + if (auto *Def = dyn_cast<DefInit>(&I)) { + obj["kind"] = "def"; + obj["def"] = Def->getDef()->getName(); + return obj; + } else if (auto *Var = dyn_cast<VarInit>(&I)) { + obj["kind"] = "var"; + obj["var"] = Var->getName(); + return obj; + } else if (auto *VarBit = dyn_cast<VarBitInit>(&I)) { + if (auto *Var = dyn_cast<VarInit>(VarBit->getBitVar())) { + obj["kind"] = "varbit"; + obj["var"] = Var->getName(); + obj["index"] = VarBit->getBitNum(); + return obj; + } + } else if (auto *Dag = dyn_cast<DagInit>(&I)) { + obj["kind"] = "dag"; + obj["operator"] = translateInit(*Dag->getOperator()); + if (auto name = Dag->getName()) + obj["name"] = name->getAsUnquotedString(); + json::Array args; + for (unsigned i = 0, limit = Dag->getNumArgs(); i < limit; ++i) { + json::Array arg; + arg.push_back(translateInit(*Dag->getArg(i))); + if (auto argname = Dag->getArgName(i)) + arg.push_back(argname->getAsUnquotedString()); + else + arg.push_back(nullptr); + args.push_back(std::move(arg)); + } + obj["args"] = std::move(args); + return obj; + } + + // Final fallback: anything that gets past here is simply given a + // kind field of 'complex', and the only other field is the standard + // 'printable' representation. + + assert(!I.isConcrete()); + obj["kind"] = "complex"; + return obj; +} + +void JSONEmitter::run(raw_ostream &OS) { + json::Object root; + + root["!tablegen_json_version"] = 1; + + // Prepare the arrays that will list the instances of every class. + // We mostly fill those in by iterating over the superclasses of + // each def, but we also want to ensure we store an empty list for a + // class with no instances at all, so we do a preliminary iteration + // over the classes, invoking std::map::operator[] to default- + // construct the array for each one. + std::map<std::string, json::Array> instance_lists; + for (const auto &C : Records.getClasses()) { + auto &Name = C.second->getNameInitAsString(); + (void)instance_lists[Name]; + } + + // Main iteration over the defs.
+ for (const auto &D : Records.getDefs()) { + auto &Name = D.second->getNameInitAsString(); + auto &Def = *D.second; + + json::Object obj; + json::Array fields; + + for (const RecordVal &RV : Def.getValues()) { + if (!Def.isTemplateArg(RV.getNameInit())) { + auto Name = RV.getNameInitAsString(); + if (RV.getPrefix()) + fields.push_back(Name); + obj[Name] = translateInit(*RV.getValue()); + } + } + + obj["!fields"] = std::move(fields); + + json::Array superclasses; + for (const auto &SuperPair : Def.getSuperClasses()) + superclasses.push_back(SuperPair.first->getNameInitAsString()); + obj["!superclasses"] = std::move(superclasses); + + obj["!name"] = Name; + obj["!anonymous"] = Def.isAnonymous(); + + root[Name] = std::move(obj); + + // Add this def to the instance list for each of its superclasses. + for (const auto &SuperPair : Def.getSuperClasses()) { + auto SuperName = SuperPair.first->getNameInitAsString(); + instance_lists[SuperName].push_back(Name); + } + } + + // Make a JSON object from the std::map of instance lists. + json::Object instanceof; + for (auto kv: instance_lists) + instanceof[kv.first] = std::move(kv.second); + root["!instanceof"] = std::move(instanceof); + + // Done. Write the output. 
+ OS << json::Value(std::move(root)) << "\n"; +} + +namespace llvm { + +void EmitJSON(RecordKeeper &RK, raw_ostream &OS) { JSONEmitter(RK).run(OS); } +} // end namespace llvm Index: test/TableGen/JSON-check.py =================================================================== --- /dev/null +++ test/TableGen/JSON-check.py @@ -0,0 +1,51 @@ +#!/usr/bin/env python + +import sys +import subprocess +import traceback +import json + +data = json.load(sys.stdin) +testfile = sys.argv[1] + +prefix = "CHECK: " + +fails = 0 +passes = 0 +with open(testfile) as testfh: + lineno = 0 + for line in iter(testfh.readline, ""): + lineno += 1 + line = line.rstrip("\r\n") + try: + prefix_pos = line.index(prefix) + except ValueError: + continue + check_expr = line[prefix_pos + len(prefix):] + + try: + exception = None + result = eval(check_expr, {"data":data}) + except Exception: + result = False + exception = traceback.format_exc().splitlines()[-1] + + if exception is not None: + sys.stderr.write( + "{file}:{line:d}: check threw exception: {expr}\n" + "{file}:{line:d}: exception was: {exception}\n".format( + file=testfile, line=lineno, + expr=check_expr, exception=exception)) + fails += 1 + elif not result: + sys.stderr.write( + "{file}:{line:d}: check returned False: {expr}\n".format( + file=testfile, line=lineno, expr=check_expr)) + fails += 1 + else: + passes += 1 + +if fails != 0: + sys.exit("{} checks failed".format(fails)) +else: + sys.stdout.write("{} checks passed\n".format(passes)) Index: test/TableGen/JSON.td =================================================================== --- /dev/null +++ test/TableGen/JSON.td @@ -0,0 +1,146 @@ +// RUN: llvm-tblgen -dump-json %s | %python %S/JSON-check.py %s + +// CHECK: data['!tablegen_json_version'] == 1 + +// CHECK: all(data[s]['!name'] == s for s in data if not s.startswith("!")) + +class Base {} +class Intermediate : Base {} +class Derived : Intermediate {} + +def D : Intermediate {} +// CHECK: 'D' in data['!instanceof']['Base'] +// 
CHECK: 'D' in data['!instanceof']['Intermediate'] +// CHECK: 'D' not in data['!instanceof']['Derived'] +// CHECK: 'Base' in data['D']['!superclasses'] +// CHECK: 'Intermediate' in data['D']['!superclasses'] +// CHECK: 'Derived' not in data['D']['!superclasses'] + +def ExampleDagOp; + +def FieldKeywordTest { + int a; + field int b; + // CHECK: 'a' not in data['FieldKeywordTest']['!fields'] + // CHECK: 'b' in data['FieldKeywordTest']['!fields'] +} + +class Variables { + int i; + string s; + bit b; + bits<8> bs; + code c; + list<int> li; + Base base; + dag d; +} +def VarNull : Variables { + // A variable not filled in at all has its value set to JSON + // 'null', which translates to Python None + // CHECK: data['VarNull']['i'] is None +} +def VarPrim : Variables { + // Test initializers that map to primitive JSON types + + int i = 3; + // CHECK: data['VarPrim']['i'] == 3 + + // Integer literals should be emitted in the JSON at full 64-bit + // precision, for the benefit of JSON readers that preserve that + // much information. Python's is one such. + int enormous_pos = 9123456789123456789; + int enormous_neg = -9123456789123456789; + // CHECK: data['VarPrim']['enormous_pos'] == 9123456789123456789 + // CHECK: data['VarPrim']['enormous_neg'] == -9123456789123456789 + + string s = "hello, world"; + // CHECK: data['VarPrim']['s'] == 'hello, world' + + bit b = 0; + // CHECK: data['VarPrim']['b'] == 0 + + // bits<> arrays are stored in logical order (array[i] is the same + // bit identified in .td files as bs{i}), which means the _visual_ + // order of the list (in default rendering) is reversed.
+ bits<8> bs = { 0,0,0,1,0,1,1,1 }; + // CHECK: data['VarPrim']['bs'] == [ 1,1,1,0,1,0,0,0 ] + + code c = [{ \" }]; + // CHECK: data['VarPrim']['c'] == r' \" ' + + list<int> li = [ 1, 2, 3, 4 ]; + // CHECK: data['VarPrim']['li'] == [ 1, 2, 3, 4 ] +} +def VarObj : Variables { + // Test initializers that map to JSON objects containing a 'kind' + // discriminator + + Base base = D; + // CHECK: data['VarObj']['base']['kind'] == 'def' + // CHECK: data['VarObj']['base']['def'] == 'D' + // CHECK: data['VarObj']['base']['printable'] == 'D' + + dag d = (ExampleDagOp 22, "hello":$foo); + // CHECK: data['VarObj']['d']['kind'] == 'dag' + // CHECK: data['VarObj']['d']['operator']['kind'] == 'def' + // CHECK: data['VarObj']['d']['operator']['def'] == 'ExampleDagOp' + // CHECK: data['VarObj']['d']['operator']['printable'] == 'ExampleDagOp' + // CHECK: data['VarObj']['d']['args'] == [[22, None], ["hello", "foo"]] + // CHECK: data['VarObj']['d']['printable'] == '(ExampleDagOp 22, "hello":$foo)' + + int undef_int; + field int ref_int = undef_int; + // CHECK: data['VarObj']['ref_int']['kind'] == 'var' + // CHECK: data['VarObj']['ref_int']['var'] == 'undef_int' + // CHECK: data['VarObj']['ref_int']['printable'] == 'undef_int' + + bits<2> undef_bits; + bits<4> ref_bits; + let ref_bits{3-2} = 0b10; + let ref_bits{1-0} = undef_bits{1-0}; + // CHECK: data['VarObj']['ref_bits'][3] == 1 + // CHECK: data['VarObj']['ref_bits'][2] == 0 + // CHECK: data['VarObj']['ref_bits'][1]['kind'] == 'varbit' + // CHECK: data['VarObj']['ref_bits'][1]['var'] == 'undef_bits' + // CHECK: data['VarObj']['ref_bits'][1]['index'] == 1 + // CHECK: data['VarObj']['ref_bits'][1]['printable'] == 'undef_bits{1}' + // CHECK: data['VarObj']['ref_bits'][0]['kind'] == 'varbit' + // CHECK: data['VarObj']['ref_bits'][0]['var'] == 'undef_bits' + // CHECK: data['VarObj']['ref_bits'][0]['index'] == 0 + // CHECK: data['VarObj']['ref_bits'][0]['printable'] == 'undef_bits{0}' + + field int complex_ref_int = !add(undef_int, 2); + //
CHECK: data['VarObj']['complex_ref_int']['kind'] == 'complex' + // CHECK: data['VarObj']['complex_ref_int']['printable'] == '!add(undef_int, 2)' +} + +// Test the !anonymous member. This is tricky because when a def is +// anonymous, almost by definition, the test can't reliably predict +// the name it will be stored under! So we have to search all the defs +// in the JSON output looking for the one that has the test integer +// field set to the right value. + +def Named { int AnonTestField = 1; } +// CHECK: data['Named']['AnonTestField'] == 1 +// CHECK: data['Named']['!anonymous'] is False + +def { int AnonTestField = 2; } +// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 2)['!anonymous'] is True + +multiclass AnonTestMulticlass<int base> { + def _plus_one { int AnonTestField = !add(base,1); } + def { int AnonTestField = !add(base,2); } +} + +defm NamedDefm : AnonTestMulticlass<10>; +// CHECK: data['NamedDefm_plus_one']['!anonymous'] is False +// CHECK: data['NamedDefm_plus_one']['AnonTestField'] == 11 +// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 12)['!anonymous'] is True + +// D47431 clarifies that a named def inside a multiclass gives a +// *non*-anonymous output record, even if the defm that instantiates +// that multiclass is anonymous.
+defm : AnonTestMulticlass<20>; +// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 21)['!anonymous'] is False +// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 22)['!anonymous'] is True Index: utils/TableGen/TableGen.cpp =================================================================== --- utils/TableGen/TableGen.cpp +++ utils/TableGen/TableGen.cpp @@ -24,6 +24,7 @@ enum ActionType { PrintRecords, + DumpJSON, GenEmitter, GenRegisterInfo, GenInstrInfo, @@ -59,6 +60,8 @@ Action(cl::desc("Action to perform:"), cl::values(clEnumValN(PrintRecords, "print-records", "Print all records to stdout (default)"), + clEnumValN(DumpJSON, "dump-json", + "Dump all records as machine-readable JSON"), clEnumValN(GenEmitter, "gen-emitter", "Generate machine code emitter"), clEnumValN(GenRegisterInfo, "gen-register-info", @@ -126,6 +129,9 @@ case PrintRecords: OS << Records; // No argument, dump all contents break; + case DumpJSON: + EmitJSON(Records, OS); + break; case GenEmitter: EmitCodeEmitter(Records, OS); break;
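
Note for readers of this patch: the following is a minimal sketch, not part of the change itself, of how a script might consume the version 1 format documented in ``BackEnds.rst`` above. The sample data is hypothetical (the record names ``D`` and ``Op`` and the fields ``bs`` and ``d`` are invented for illustration), and the helper names ``instances_of``, ``bits_to_int`` and ``dag_args`` are assumptions of this sketch rather than anything TableGen provides. In real use, ``data`` would come from ``json.load`` on the output of ``llvm-tblgen -dump-json``.

```python
import json

# Hypothetical sample shaped like version 1 `-dump-json` output. The record
# names (D, Op) and field names (bs, d) are invented for this illustration.
sample = {
    "!tablegen_json_version": 1,
    "!instanceof": {"Base": ["D"], "Derived": []},
    "D": {
        "!name": "D",
        "!anonymous": False,
        "!superclasses": ["Base"],
        "!fields": [],
        "bs": [1, 1, 1, 0, 1, 0, 0, 0],
        "d": {
            "kind": "dag",
            "printable": '(Op 22, "hello":$foo)',
            "operator": {"kind": "def", "def": "Op", "printable": "Op"},
            "args": [[22, None], ["hello", "foo"]],
        },
    },
}

# Round-trip through JSON text to mimic reading real tool output.
data = json.loads(json.dumps(sample))

# The documented version field increases on incompatible format changes.
if data["!tablegen_json_version"] != 1:
    raise ValueError("unexpected TableGen JSON version")


def instances_of(data, class_name):
    """Return the record objects for every def deriving from class_name."""
    return [data[name] for name in data["!instanceof"].get(class_name, [])]


def bits_to_int(bits):
    """Combine a bits array (element i is bit x{i}, least significant
    first) into an integer. Returns None if any element is not a concrete
    0 or 1, for example a 'varbit' reference object."""
    value = 0
    for i, bit in enumerate(bits):
        if bit not in (0, 1):
            return None
        value |= bit << i
    return value


def dag_args(dag):
    """Return (name, value) pairs for a dag object; name is None for
    arguments that have no colon-suffixed name."""
    return [(name, value) for value, name in dag["args"]]


for rec in instances_of(data, "Base"):
    print(rec["!name"], bits_to_int(rec["bs"]), dag_args(rec["d"]))
```

The ``bits_to_int`` helper relies on the documented ordering, in which index ``i`` of the array corresponds to ``x{i}`` in the TableGen source, so no reversal is needed before shifting.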