This is an archive of the discontinued LLVM Phabricator instance.


Differential D46054

[TableGen] Add a general-purpose JSON backend.
Closed, Public

Authored by simon_tatham on Apr 25 2018, 5:28 AM.


Details

Reviewers

nhaehnle

Commits

rG6a8c6cadf10c: [TableGen] Add a general-purpose JSON backend.
rL336771: [TableGen] Add a general-purpose JSON backend.

Summary

The aim of this backend is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this backend produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way.

The output data contains a JSON representation of the variable
definitions in output 'def' records, and a few pieces of metadata such
as which of those definitions are tagged with the 'field' prefix and
which defs are derived from which classes. It doesn't dump out
absolutely every piece of knowledge it _could_ produce, such as type
information and complicated arithmetic operator nodes in abstract
superclasses; the main aim is to allow consumers of this JSON dump to
essentially act as new backends, and backends don't generally need to
depend on that kind of data.
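As a rough illustration of that workflow, the Python sketch below reads JSON in the layout the patch documents in BackEnds.rst: a root object with a `!instanceof` index, plus one dictionary per `def`. It then lists the records that derive from a given class. The sample data, the `Size` field, and the `instances_of` helper are hypothetical and not part of the patch. In practice the JSON would come from running `llvm-tblgen -dump-json` on a `.td` file.

```python
import json

# Hypothetical, hand-written sample in the documented -dump-json layout.
SAMPLE = """
{
  "!tablegen_json_version": 1,
  "!instanceof": {"Instruction": ["ADD32rr", "SUB32rr"]},
  "ADD32rr": {"!name": "ADD32rr", "!anonymous": false,
              "!superclasses": ["Instruction"], "!fields": [],
              "Size": 2},
  "SUB32rr": {"!name": "SUB32rr", "!anonymous": false,
              "!superclasses": ["Instruction"], "!fields": [],
              "Size": 2}
}
"""

def instances_of(root, class_name):
    """Return the record dictionaries of every def deriving from class_name."""
    names = root["!instanceof"].get(class_name, [])
    return [root[name] for name in names]

root = json.loads(SAMPLE)
for rec in instances_of(root, "Instruction"):
    print(rec["!name"], rec["Size"])
```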

The new backend is implemented as an EmitJSON() function similar to
all of llvm-tblgen's other EmitFoo functions, except that it lives in
lib/TableGen instead of utils/TableGen on the basis that I'm expecting
to add it to clang-tblgen too in a future patch.

To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.
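The testing approach described above can be sketched roughly as follows. This is not the patch's `JSON-check.py`. The `// CHECK:` comment syntax, the variable name `data`, and the `run_checks` function are assumptions, chosen to show the idea of evaluating Python expressions taken from `.td` comments against the loaded JSON.

```python
import json
import re

# Assumed comment syntax, e.g. "// CHECK: data['X']['y'] == 3".
CHECK_RE = re.compile(r"//\s*CHECK:\s*(.*)$")

def run_checks(td_source, json_text):
    """Evaluate each CHECK expression in td_source against the parsed JSON.

    Returns a list of (line number, expression) pairs whose expression was false.
    """
    data = json.loads(json_text)
    failures = []
    for lineno, line in enumerate(td_source.splitlines(), start=1):
        m = CHECK_RE.search(line)
        if not m:
            continue
        expr = m.group(1).strip()
        # Each expression sees the parsed JSON under the name 'data'.
        if not eval(expr, {"data": data}):
            failures.append((lineno, expr))
    return failures

print(run_checks("// CHECK: data['X']['y'] == 3", '{"X": {"y": 3}}'))  # []
```

A driver script would read the `.td` file and the JSON output file, call `run_checks`, and exit with a failure status if the list is non-empty.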

Diff Detail

Repository: rL LLVM

Event Timeline

simon_tatham created this revision. (Apr 25 2018, 5:28 AM)

Harbormaster completed remote builds in B17403: Diff 143909. (Apr 25 2018, 5:28 AM)

Herald added subscribers: llvm-commits, mgorny. (Apr 25 2018, 5:28 AM)

labath added a subscriber: labath. (Apr 25 2018, 8:28 AM)

labath added inline comments.

utils/TableGen/JSONEmitter.cpp
Line 1 (On Diff #143909): You should take a look at <D45753>, which is about to add a JSON library to llvm. It would be a shame to add two of them in the same week. :)

simon_tatham added inline comments. (Apr 25 2018, 8:35 AM)

utils/TableGen/JSONEmitter.cpp
Line 1 (On Diff #143909): Ha! You're right, I hadn't noticed that. I'll replace my ad-hockery with calls to that code with great pleasure as soon as it lands – even without looking I'm sure it will be better than the 'only just enough' code I have here. Thanks for pointing it out!

@nhaehnle : while this change is blocked on that one, do you have any opinions on my design questions? Particularly the one about whether I ought to move most of the code into some sort of getAsJSONObject method in the various classes in Record.h / Record.cpp, because if I'm going to do that then I should do it before making any other detailed changes :-)

> Consider what to do about integer values that don't fit exactly into a 'double'. This code will simply emit them as decimal integer literals, which JSON parsers are within their rights to round to the nearest double precision float, losing data. Some JSON readers (e.g. Python json.load) will deliver accurate integer values anyway, but it might be better not to rely on that, and instead output very large integers in some other form, such as a JSON object containing an identifying type field and two doubles whose sum is the desired integer, or a string representation of the integer, or both.

Well. The one thing I do have a strong opinion on is that the representation of very large integers should not be different from the representation of small integers, because consumers would almost certainly get that wrong. So that leaves two options: string representations, or sending JavaScript to the hell it deserves and going with integers.
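The precision concern is easy to demonstrate in Python. The standard `json` module decodes integer literals to arbitrary-precision `int`, as noted above. Forcing the same literal through a double, simulated here with the real `parse_int` hook of `json.loads`, loses the low bit. The value used is 2**53 + 1, the smallest positive integer that an IEEE double cannot represent exactly.

```python
import json

big = 2**53 + 1  # 9007199254740993: not exactly representable as a double

text = json.dumps({"value": big})
exact = json.loads(text)["value"]                   # decoded as Python int
lossy = json.loads(text, parse_int=float)["value"]  # simulate a double-only parser

print(exact == big)       # True
print(int(lossy) == big)  # False: rounded to 9007199254740992
```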

> Consider adding the new -dump-json option to clang-tblgen as well as llvm-tblgen. (As I understand it they wouldn't do anything differently, but it seems asymmetric not to have both of them support it. They both have -print-records, after all.)

Sure, that seems like a good idea.

> Consider providing a cut-down version, enabled by another option such as '-dump-simple-json', in which all the complicated parametric expression nodes like !add and !foreach and !foldl are simply not emitted, and replaced by some kind of small object indicating that a complex expression was elided. The motivation is that I expect a lot of uses for this system would only be interested in the output fields that consist of final well-defined values of primitive type, so constructing the complicated parts is a waste of both TableGen's time and the consumer's. But I'm not sure where the line should be drawn - DAG arguments might well still need to be output in full, for example, and type information might be omittable. There may be no one good answer.

I believe there is a good answer :)

Take a look at TGParser.cpp, checkConcrete. All valid final and needed records should pass that check. There are unfortunately still some exceptions, although it'd be nice if we could get rid of those actually.

So I would argue that you should only bother emitting what fits this definition of "concrete", i.e. don't have a "complex" JSON option at all. If you do encounter something that doesn't fit the definition of concrete, just output a JSON object { "kind": "complex", "code": <getAsString()> } in its place.

On a related note, I'm not convinced you really want to print out class definitions. The benefit of printing class definitions with -print-records is that it can help you understand what's going on while you're writing class definitions, but the existing TableGen backends really don't care about class definitions. Since the idea here is to basically allow writing TableGen backends in a scripting language like Python, providing the class definitions is unlikely to be useful. It doesn't hurt, but I don't think it's a good motivation for building all this infrastructure for printing ! operations, for example.

> Decide where all this code should live. It might be better to move a lot of it into Record.cpp in the form of getAsJSONObject() methods or something like that. That would remove the risk of forgetting to update the JSON back end if a new node type is introduced - anyone forgetting to implement that method in any new subclass of Init or RecTy would be reminded by a compile error.

I'd rather keep it separate for orthogonality. Adding a new fundamental concrete data type is a rare enough occurrence, and the worst that would happen with my suggestion above is that you get an unexpected "complex" object, which is not too difficult to track down.

> Well. The one thing I do have a strong opinion on is that the representation of very large integers should not be different from the representation of small integers, because consumers would almost certainly get that wrong.

A good point – now you put it that way, I suddenly agree strongly!

> So that leaves two options: string representations, or sending JavaScript to the hell it deserves and going with integers.

Well, since my personal use cases all involve Python, and Python copes fine with arbitrary integers, I'm happy with the latter if you are :-) And I'd definitely prefer not to have to put a decode-from-string operation in every single consumer of this JSON output.

I suppose a cl::opt<bool> to switch to a more cumbersome representation of integers could always be added later, if anyone turns out to really need one.

> Take a look at TGParser.cpp, checkConcrete. All valid final and needed records should pass that check. There are unfortunately still some exceptions, although it'd be nice if we could get rid of those actually.

Ah! Yes, that seems nice. And if I'm not dumping the class definitions, then perhaps it's not worth dumping the details of all the types either, for the same reason (a 'back end' consuming this format will already know what type to expect from any field it cares about), in which case I can simplify the output representation a great deal by removing the extra level of dereference where you have to suffix ['value'] all the time.

I agree that none of my own use cases will care about any of the things that this redesign throws away – and it makes the output JSON a great deal smaller and simpler. Thanks for the suggestions!

arichardson added a subscriber: arichardson. (Apr 26 2018, 4:29 PM)

simon_tatham mentioned this in D45753: Lift JSON library from clang-tools-extra/clangd to llvm/Support. (Apr 27 2018, 5:23 AM)

simon_tatham added parent revisions: D45753: Lift JSON library from clang-tools-extra/clangd to llvm/Support., D46209: [Support] Make JSON handle doubles and int64s losslessly. (Apr 30 2018, 7:57 AM)

OK, here's my second draft. Changes since last time:

- thrown out the ad-hoc JSON emitter in favour of the new JSON library in D45753 (also requires the integer-handling followup patch D46209)
- moved the new source file into lib/TableGen where clang-tblgen will be able to get at it more easily (but I haven't actually added it to clang-tblgen yet)
- removed all the type and abstract class information, leaving only the concrete records and a couple of pieces of metadata that I know backends do actually want (list of field keywords, list of superclasses, list of instances of each class). Exotic subclasses of Init are now rendered as kind="complex" with only a printable representation.
- flattened the JSON structure by several layers to make it more convenient to consume
- added documentation of the format.

I think from my perspective this is no longer an unfinished draft; I'd be happy to commit it in this state, subject to code review approval and its dependencies landing.

simon_tatham edited the summary of this revision. (Apr 30 2018, 8:21 AM)

simon_tatham mentioned this in D46352: [TableGen] Don't quote variable name when printing !foreach. (May 2 2018, 2:02 AM)

Thanks, this already looks very good. I do have some suggestions though.

docs/TableGen/BackEnds.rst
Lines 516–518 (On Diff #144564): This is a minor point, but I suspect it would be slightly more convenient for consumers if this were instead represented as `[[arg, name], [arg, name], ...]`. So the example below would have `args` be `[[22, null], ["hello", "foo"]]`. What do you think?
lib/TableGen/JSONBackend.cpp
Lines 46–49 (On Diff #144564): C-style comments are rather uncommon in LLVM; best to be consistent and use C++-style comments here (and below).
Line 93 (On Diff #144564): I think this should be "var" instead of "variable", for consistency.
Lines 172–179 (On Diff #144564): I think it would be slightly nicer to merge this into the loop above, to avoid iterating over the same data twice.
test/TableGen/JSON-check.py
Line 49 (On Diff #144564): Cool, I didn't know this was possible.

simon_tatham added inline comments. (May 3 2018, 8:33 AM)

docs/TableGen/BackEnds.rst
Lines 516–518 (On Diff #144564): That's actually more like how I had it in the first version of the patch, and I had second thoughts and changed it to this :-) so I'm already on the fence and could easily be persuaded to change it back again. My thought was that some use cases wouldn't care about the name at all (e.g. an entire backend might use dag-typed data for some purpose that never gives a name to any argument), and those users would find it more convenient if they could get the actual arguments by just saying `node['args'][n]` instead of having to say `node['args'][n][0]` (in your version) or `node['args'][n]['value']` (as I originally had it). So I moved the names out into a separate array that you'd only have to look at if you cared about names at all. On the other hand, I can certainly see the counterargument – if you do care about names, it's nicer to have a single array to iterate over. I'm happy to change it to your style.
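To make the trade-off concrete, here is one way a consumer might read the `args` of the documented example `(Op 22, "hello":$foo)` under the pair-per-argument layout that was eventually adopted. The helpers `arg_values` and `named_args` are hypothetical, and the standard `printable` fields are omitted for brevity.

```python
# The dag (Op 22, "hello":$foo) in the adopted layout: each element of "args"
# is a [value, name] pair, where name is null (None in Python) if absent.
# The "printable" fields that the real output carries are omitted here.
dag = {
    "kind": "dag",
    "operator": {"kind": "def", "def": "Op"},
    "args": [[22, None], ["hello", "foo"]],
}

def arg_values(node):
    """Argument values only, for consumers that ignore names."""
    return [value for value, _name in node["args"]]

def named_args(node):
    """Map each argument name to its value, skipping unnamed arguments."""
    return {name: value for value, name in node["args"] if name is not None}

print(arg_values(dag))  # [22, 'hello']
print(named_args(dag))  # {'foo': 'hello'}
```

With this layout, a consumer that ignores names pays only a small unpacking step. A consumer that cares about names iterates over a single array.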
lib/TableGen/JSONBackend.cpp
Line 93 (On Diff #144564): Another thing I was trying to do (but forgot to actually mention anywhere) was to arrange for all the field names that depend on "kind" not to be the same as each other, as a means of error detection – it would stop a user accidentally mistaking a var for a varbit, by absentmindedly retrieving `node['var']` and forgetting there was another field to look at too. But that's quite a marginal consideration in the first place, and also I admit this particular choice of two different names was terrible :-) so yes, I'm happy to change over to being consistent.
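With consistent field names, the safeguard moves to the consumer, which should check the `kind` discriminator before reading any kind-specific field. A minimal sketch, using the field names of the final documented format (`def`, `var`, `index`, `args`, `printable`). The `describe` function is hypothetical.

```python
def describe(value):
    """Return a short description of one translated TableGen value."""
    if value is None:
        return "unset"
    if not isinstance(value, dict):
        # Primitives: int/bit as numbers, string/code as str, list/bits as lists.
        return repr(value)
    kind = value["kind"]
    if kind == "def":
        return f"def {value['def']}"
    if kind == "var":
        return f"var {value['var']}"
    if kind == "varbit":
        # Checking kind first avoids mistaking a varbit for a plain var.
        return f"bit {value['index']} of var {value['var']}"
    if kind == "dag":
        return f"dag with {len(value['args'])} args"
    # "complex" or anything unrecognised: fall back to the printable form.
    return value["printable"]

print(describe({"kind": "varbit", "var": "Inst", "index": 3, "printable": "Inst{3}"}))
# bit 3 of var Inst
```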
test/TableGen/JSON-check.py
Line 49 (On Diff #144564): You mean passing a string to `sys.exit`? That was new to me as well quite recently. It's one of those functions where as a C programmer I automatically assumed I already knew how its API would work, so it took me years to find a reason to read its docs!
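For reference, `sys.exit` accepts an arbitrary object. An integer becomes the exit status. A string is printed to standard error, and the process exits with status 1. The snippet below runs a child interpreter to show both cases. The `run_exit` helper is purely illustrative.

```python
import subprocess
import sys

def run_exit(arg_source):
    """Run a child Python that calls sys.exit(<arg_source>).

    Returns (exit status, stripped stderr text).
    """
    proc = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({arg_source})"],
        capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()

print(run_exit("3"))               # (3, '')
print(run_exit("'check failed'"))  # (1, 'check failed')
```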

Updated with all those review comments.

simon_tatham marked 4 inline comments as done. (May 3 2018, 9:04 AM)

Great, LGTM!

docs/TableGen/BackEnds.rst
Lines 516–518 (On Diff #144564): Yeah, I admit it was really a minor thing. Thanks for changing it though!

This revision is now accepted and ready to land. (May 3 2018, 9:27 AM)

Thanks for the review!

Of course, having got to this point, I can't actually commit it until D45753 is ready. And I'm about to be on holiday for several weeks, so if that happens soon then I probably won't notice for a while. But I won't actually forget about this patch, I promise :-)

simon_tatham mentioned this in D47430: TableGen: Streamline the semantics of NAME. (May 29 2018, 9:41 AM)

@nhaehnle, following up discussion about NAME in D47430 (and hopefully posted in the right place this time):

I was going to change this patch so that it uses !name instead of NAME for the key inside each JSON def object that gives the def's own name. (Mostly the reason I think it's useful to have such a key at all is so that client code that's consuming the JSON can pass those dictionaries to its own subfunctions without having to pass the name alongside it, so it makes sense to me to use a key that indicates that it's a JSON-specific convenience.)

But it's just struck me that there might also be a use case for knowing whether the record is anonymous or not, in the sense of whether its name is something that was deliberately specified in the TableGen input or whether it was some anonymous_123 value made up by tblgen itself.

I was thinking of adding a boolean field !anonymous, or alternatively perhaps having !key and !name (where !key is always the key under which this record is stored in the JSON root object, and !name is either the same as !key or null). Any particular preference, before I make the changes? Or is the entire idea not worth bothering with?

(Also, I noticed in passing that the IsAnonymous field isn't set reliably: the records defined by an anonymous defm have it false rather than true. That looks easy to fix, and I could fold that into this change or make it a separate one.)

There are arguments both for !key + !name and for !name + !anonymous, although thinking about it for a minute or two I weakly prefer !name + !anonymous because it matches the representation in C++. It makes it easier for people to move between JSON and C++.

P.S.: No worries about the confusion with the other review :)

And by the way, I do agree with your rationale for why !name is very useful to have in JSON. The C++ backends can (and do) use Record::getName() for the same functionality.
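The convenience argued for here is that each record's dictionary is self-describing, so helper functions can take just the dictionary. A minimal sketch, assuming the `!name` and `!anonymous` keys as finally documented. The sample records, including the `anonymous_42` name, and the `emit_enum_entry` helper are hypothetical.

```python
import json

# Hypothetical -dump-json fragment with one named and one anonymous record.
root = json.loads("""
{
  "!tablegen_json_version": 1,
  "!instanceof": {"Reg": ["R0", "anonymous_42"]},
  "R0": {"!name": "R0", "!anonymous": false,
         "!superclasses": ["Reg"], "!fields": []},
  "anonymous_42": {"!name": "anonymous_42", "!anonymous": true,
                   "!superclasses": ["Reg"], "!fields": []}
}
""")

def emit_enum_entry(record):
    # Only the record dictionary is passed in; its name travels with it.
    return f"  {record['!name']},"

named = [root[n] for n in root["!instanceof"]["Reg"] if not root[n]["!anonymous"]]
print("\n".join(emit_enum_entry(r) for r in named))  # prints "  R0,"
```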

Renamed the NAME attribute to !name in line with D47430.

Added the !anonymous attribute, a set of tests for it, and a fix in TGParser.cpp to set it correctly for anonymous defms.

OK, !name + !anonymous it is. (That was how I'd drafted it too, so I think we had the same mild preference.)

I'm afraid this patch now has a very tiny conflict with D47430, in that we've both added braces to the same if statement in TGParser::ParseDefm.

Rebase to current trunk, and revert change to 'anonymous' handling.

In line with the discussion in D47431, I've removed my previous tweak
in TGParser.cpp that sets the !anonymous flag if any part of a def's
final name was derived from an anonymous def or defm. Now the
!anonymous field in the JSON output is consistent with the existing
behavior of the isAnonymous() query function.

simon_tatham edited the summary of this revision. (Jul 9 2018, 7:05 AM)

@nhaehnle, are you still happy for me to commit this, now that its dependencies have landed and I've tweaked its handling of !anonymous?

Nearly forgot to mention the new option in the tblgen man page!

Harbormaster completed remote builds in B20195: Diff 154762. (Jul 10 2018, 12:59 AM)

Actually, on second thoughts, I'm going to assume it was overcautious to ask for a re-approval, since this version of the patch introduces no new controversy and in fact removes the only previous tweak in the TableGen core (in that I'm not trying to change the semantics of anonymous any more). So I'll commit this as is, based on the previous review approval.

Closed by commit rL336771: [TableGen] Add a general-purpose JSON backend. (authored by statham) (Jul 11 2018, 1:45 AM)

This revision was automatically updated to reflect the committed changes.

The test JSON.td fails on Windows when there's a space in the path to Python. Python should be specified in the test file as '%python' or "%python" rather than just %python. I can submit a fix for review if you're not able. Thanks!

Oops, sorry about that :-(

(I admit I didn't consider the possibility of spacey filenames at all, but now I have, it's a mild surprise to me that %python doesn't expand to an already correctly-quoted string.)

The fix sounds easy, but if you have a setup where you can actually check it works, there might be less chance of a typo if you do it rather than me...
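The failure mode is ordinary word splitting. Once a substitution expands to an unquoted path that contains a space, the command line is split at that space. Python's `shlex.split` applies POSIX-shell-like rules and shows the effect. lit's own command parsing on Windows may differ in detail, and the interpreter path below is hypothetical.

```python
import shlex

python = "C:/Program Files/Python36/python.exe"  # hypothetical path with a space

unquoted = f"{python} JSON-check.py out.json"
quoted = f'"{python}" JSON-check.py out.json'

print(shlex.split(unquoted))
# ['C:/Program', 'Files/Python36/python.exe', 'JSON-check.py', 'out.json']
print(shlex.split(quoted))
# ['C:/Program Files/Python36/python.exe', 'JSON-check.py', 'out.json']
```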

Revision Contents

Path                                           Size
llvm/trunk/docs/CommandGuide/tblgen.rst        5 lines
llvm/trunk/docs/TableGen/BackEnds.rst          121 lines
llvm/trunk/docs/TableGen/index.rst             7 lines
llvm/trunk/include/llvm/TableGen/Record.h      2 lines
llvm/trunk/lib/TableGen/CMakeLists.txt         1 line
llvm/trunk/lib/TableGen/JSONBackend.cpp        189 lines
llvm/trunk/test/TableGen/JSON-check.py         51 lines
llvm/trunk/test/TableGen/JSON.td               146 lines
llvm/trunk/utils/TableGen/TableGen.cpp         6 lines

Diff 154946

llvm/trunk/docs/CommandGuide/tblgen.rst

...
 .. option:: -class className

 Print the enumeration list for this class.

 .. option:: -print-records

 Print all records to standard output (default).

+.. option:: -dump-json
+
+Print a JSON representation of all records, suitable for further
+automated processing.
+
 .. option:: -print-enums

 Print enumeration values for a class.

 .. option:: -print-sets

 Print expanded sets for testing DAG exprs.

...

llvm/trunk/docs/TableGen/BackEnds.rst

...
 Generate ARM NEON tests for clang.

 AttrDocs
 --------

 Purpose: Creates ``AttributeReference.rst`` from ``AttrDocs.td``, and is
 used for documenting user-facing attributes.

+General BackEnds
+================
+
+JSON
+----
+
+Purpose: Output all the values in every ``def``, as a JSON data
+structure that can be easily parsed by a variety of languages. Useful
+for writing custom backends without having to modify TableGen itself,
+or for performing auxiliary analysis on the same TableGen data passed
+to a built-in backend.
+
+Output:
+
+The root of the output file is a JSON object (i.e. dictionary),
+containing the following fixed keys:
+
+* ``!tablegen_json_version``: a numeric version field that will
+  increase if an incompatible change is ever made to the structure of
+  this data. The format described here corresponds to version 1.
+
+* ``!instanceof``: a dictionary whose keys are the class names defined
+  in the TableGen input. For each key, the corresponding value is an
+  array of strings giving the names of ``def`` records that derive
+  from that class. So ``root["!instanceof"]["Instruction"]``, for
+  example, would list the names of all the records deriving from the
+  class ``Instruction``.
+
+For each ``def`` record, the root object also has a key for the record
+name. The corresponding value is a subsidiary object containing the
+following fixed keys:
+
+* ``!superclasses``: an array of strings giving the names of all the
+  classes that this record derives from.
+
+* ``!fields``: an array of strings giving the names of all the variables
+  in this record that were defined with the ``field`` keyword.
+
+* ``!name``: a string giving the name of the record. This is always
+  identical to the key in the JSON root object corresponding to this
+  record's dictionary. (If the record is anonymous, the name is
+  arbitrary.)
+
+* ``!anonymous``: a boolean indicating whether the record's name was
+  specified by the TableGen input (if it is ``false``), or invented by
+  TableGen itself (if ``true``).
+
+For each variable defined in a record, the ``def`` object for that
+record also has a key for the variable name. The corresponding value
+is a translation into JSON of the variable's value, using the
+conventions described below.
+
+Some TableGen data types are translated directly into the
+corresponding JSON type:
+
+* A completely undefined value (e.g. for a variable declared without
+  initializer in some superclass of this record, and never initialized
+  by the record itself or any other superclass) is emitted as the JSON
+  ``null`` value.
+
+* ``int`` and ``bit`` values are emitted as numbers. Note that
+  TableGen ``int`` values are capable of holding integers too large to
+  be exactly representable in IEEE double precision. The integer
+  literal in the JSON output will show the full exact integer value.
+  So if you need to retrieve large integers with full precision, you
+  should use a JSON reader capable of translating such literals back
+  into 64-bit integers without losing precision, such as Python's
+  standard ``json`` module.
+
+* ``string`` and ``code`` values are emitted as JSON strings.
+
+* ``list<T>`` values, for any element type ``T``, are emitted as JSON
+  arrays. Each element of the array is represented in turn using these
+  same conventions.
+
+* ``bits`` values are also emitted as arrays. A ``bits`` array is
+  ordered from least-significant bit to most-significant. So the
+  element with index ``i`` corresponds to the bit described as
+  ``x{i}`` in TableGen source. However, note that this means that
+  scripting languages are likely to display the array in the
+  opposite order from the way it appears in the TableGen source or in
+  the diagnostic ``-print-records`` output.
+
+All other TableGen value types are emitted as a JSON object,
+containing two standard fields: ``kind`` is a discriminator describing
+which kind of value the object represents, and ``printable`` is a
+string giving the same representation of the value that would appear
+in ``-print-records``.
+
+* A reference to a ``def`` object has ``kind=="def"``, and has an
+  extra field ``def`` giving the name of the object referred to.
+
+* A reference to another variable in the same record has
+  ``kind=="var"``, and has an extra field ``var`` giving the name of
+  the variable referred to.
+
+* A reference to a specific bit of a ``bits``-typed variable in the
+  same record has ``kind=="varbit"``, and has two extra fields:
+  ``var`` gives the name of the variable referred to, and ``index``
+  gives the index of the bit.
+
+* A value of type ``dag`` has ``kind=="dag"``, and has two extra
+  fields. ``operator`` gives the initial value after the opening
+  parenthesis of the dag initializer; ``args`` is an array giving the
+  following arguments. The elements of ``args`` are arrays of length
+  2, giving the value of each argument followed by its colon-suffixed
+  name (if any). For example, in the JSON representation of the dag
+  value ``(Op 22, "hello":$foo)`` (assuming that ``Op`` is the name of
+  a record defined elsewhere with a ``def`` statement):
+
+  * ``operator`` will be an object in which ``kind=="def"`` and
+    ``def=="Op"``
+
+  * ``args`` will be the array ``[[22, null], ["hello", "foo"]]``.
+
+* If any other kind of value or complicated expression appears in the
+  output, it will have ``kind=="complex"``, and no additional fields.
+  These values are not expected to be needed by backends. The standard
+  ``printable`` field can be used to extract a representation of them
+  in TableGen source syntax if necessary.
+
 How to write a back-end
 =======================

 TODO.

 Until we get a step-by-step HowTo for writing TableGen backends, you can at
 least grab the boilerplate (build system, new files, etc.) from Clang's
 r173931.

 TODO: How they work, how to write one. This section should not contain details
 about any particular backend, except maybe ``-print-enums`` as an example. This
 should highlight the APIs in ``TableGen/Record.h``.
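Because the documented `bits` layout puts the least-significant bit at index 0, a script that wants the numeric value can weight each element by its index. A minimal sketch: the `bits_to_int` helper is hypothetical and assumes every element is a concrete 0 or 1. It rejects `null` (unset bits) and reference objects such as `varbit` entries.

```python
def bits_to_int(bits):
    """Convert a -dump-json bits array (index 0 = least-significant) to an int.

    Hypothetical helper: raises ValueError for any element that is not a
    concrete 0 or 1, e.g. None for an unset bit or a {"kind": ...} object.
    """
    value = 0
    for i, bit in enumerate(bits):
        if bit not in (0, 1):
            raise ValueError(f"bit {i} is not a concrete 0/1: {bit!r}")
        value |= bit << i
    return value

# Following the documented ordering, `bits<4> x = 0b0011;` puts x{0} and x{1}
# at indices 0 and 1, so a script sees [1, 1, 0, 0], reversed from the source.
print(bits_to_int([1, 1, 0, 0]))  # 3
```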

llvm/trunk/docs/TableGen/index.rst

...
 .. code-block:: bash

   $ llvm-tblgen X86.td -print-enums -class=Instruction
   ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
   ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
   ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
   ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
   ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

-The default backend prints out all of the records.
+The default backend prints out all of the records. There is also a general
+backend which outputs all the records as a JSON data structure, enabled using
+the `-dump-json` option.

 If you plan to use TableGen, you will most likely have to write a `backend`_
 that extracts the information specific to what you need and formats it in the
-appropriate way.
+appropriate way. You can do this by extending TableGen itself in C++, or by
+writing a script in any language that can consume the JSON output.

 Example
 -------

 With no other arguments, `llvm-tblgen` parses the specified file and prints out all
 of the classes, then all of the definitions. This is a good way to see what the
 various definitions expand to fully. Running this on the ``X86.td`` file prints
 this (at the time of this writing):
...

llvm/trunk/include/llvm/TableGen/Record.h

...
 public:
   explicit HasReferenceResolver(Init *VarNameToTrack)
       : Resolver(nullptr), VarNameToTrack(VarNameToTrack) {}

   bool found() const { return Found; }

   Init *resolve(Init *VarName) override;
 };

+void EmitJSON(RecordKeeper &RK, raw_ostream &OS);
+
 } // end namespace llvm

 #endif // LLVM_TABLEGEN_RECORD_H

llvm/trunk/lib/TableGen/CMakeLists.txt

 add_llvm_library(LLVMTableGen
   Error.cpp
+  JSONBackend.cpp
   Main.cpp
   Record.cpp
   SetTheory.cpp
   StringMatcher.cpp
   TableGenBackend.cpp
   TGLexer.cpp
   TGParser.cpp

   ADDITIONAL_HEADER_DIRS
   ${LLVM_MAIN_INCLUDE_DIR}/llvm/TableGen
   )

llvm/trunk/lib/TableGen/JSONBackend.cpp

Property        Old Value   New Value
svn:eol-style   null        native
svn:keywords    null        Rev Date Author URL Id

				//===- JSONBackend.cpp - Generate a JSON dump of all records. -- C++ --=====//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This TableGen back end generates a machine-readable representation
				// of all the classes and records defined by the input, in JSON format.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/BitVector.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/TableGen/Error.h"
				#include "llvm/TableGen/Record.h"
				#include "llvm/TableGen/TableGenBackend.h"
				#include "llvm/Support/JSON.h"

				#define DEBUG_TYPE "json-emitter"

				using namespace llvm;

				namespace {

				class JSONEmitter {
				private:
				RecordKeeper &Records;

				json::Value translateInit(const Init &I);
				json::Array listSuperclasses(const Record &R);

				public:
				JSONEmitter(RecordKeeper &R);

				void run(raw_ostream &OS);
				};

				} // end anonymous namespace

				JSONEmitter::JSONEmitter(RecordKeeper &R) : Records(R) {}

json::Value JSONEmitter::translateInit(const Init &I) {

  // Init subclasses that we return as JSON primitive values of one
  // kind or another.

  if (isa<UnsetInit>(&I)) {
    return nullptr;
  } else if (auto *Bit = dyn_cast<BitInit>(&I)) {
    return Bit->getValue() ? 1 : 0;
  } else if (auto *Bits = dyn_cast<BitsInit>(&I)) {
    json::Array array;
    for (unsigned i = 0, limit = Bits->getNumBits(); i < limit; i++)
      array.push_back(translateInit(*Bits->getBit(i)));
    return array;
  } else if (auto *Int = dyn_cast<IntInit>(&I)) {
    return Int->getValue();
  } else if (auto *Str = dyn_cast<StringInit>(&I)) {
    return Str->getValue();
  } else if (auto *Code = dyn_cast<CodeInit>(&I)) {
    return Code->getValue();
  } else if (auto *List = dyn_cast<ListInit>(&I)) {
    json::Array array;
    for (auto val : *List)
      array.push_back(translateInit(*val));
    return array;
  }

  // Init subclasses that we return as JSON objects containing a
  // 'kind' discriminator. For these, we also provide the same
  // translation back into TableGen input syntax that -print-records
  // would give.

  json::Object obj;
  obj["printable"] = I.getAsString();

  if (auto *Def = dyn_cast<DefInit>(&I)) {
    obj["kind"] = "def";
    obj["def"] = Def->getDef()->getName();
    return obj;
  } else if (auto *Var = dyn_cast<VarInit>(&I)) {
    obj["kind"] = "var";
    obj["var"] = Var->getName();
    return obj;
  } else if (auto *VarBit = dyn_cast<VarBitInit>(&I)) {
    if (auto *Var = dyn_cast<VarInit>(VarBit->getBitVar())) {
      obj["kind"] = "varbit";
      obj["var"] = Var->getName();
      obj["index"] = VarBit->getBitNum();
      return obj;
    }
  } else if (auto *Dag = dyn_cast<DagInit>(&I)) {
    obj["kind"] = "dag";
    obj["operator"] = translateInit(*Dag->getOperator());
    if (auto name = Dag->getName())
      obj["name"] = name->getAsUnquotedString();
    json::Array args;
    for (unsigned i = 0, limit = Dag->getNumArgs(); i < limit; ++i) {
      json::Array arg;
      arg.push_back(translateInit(*Dag->getArg(i)));
      if (auto argname = Dag->getArgName(i))
        arg.push_back(argname->getAsUnquotedString());
      else
        arg.push_back(nullptr);
      args.push_back(std::move(arg));
    }
    obj["args"] = std::move(args);
    return obj;
  }

  // Final fallback: anything that gets past here is simply given a
  // kind field of 'complex', and the only other field is the standard
  // 'printable' representation.

  assert(!I.isConcrete());
  obj["kind"] = "complex";
  return obj;
}

void JSONEmitter::run(raw_ostream &OS) {
  json::Object root;

  root["!tablegen_json_version"] = 1;

  // Prepare the arrays that will list the instances of every class.
  // We mostly fill those in by iterating over the superclasses of
  // each def, but we also want to ensure we store an empty list for a
  // class with no instances at all, so we do a preliminary iteration
  // over the classes, invoking std::map::operator[] to default-
  // construct the array for each one.
  std::map<std::string, json::Array> instance_lists;
  for (const auto &C : Records.getClasses()) {
    // getNameInitAsString() returns a std::string by value, so store the
    // result directly rather than relying on reference lifetime extension.
    const auto Name = C.second->getNameInitAsString();
    (void)instance_lists[Name];
  }

  // Main iteration over the defs.
  for (const auto &D : Records.getDefs()) {
    const auto Name = D.second->getNameInitAsString();
    auto &Def = *D.second;

    json::Object obj;
    json::Array fields;

    for (const RecordVal &RV : Def.getValues()) {
      if (!Def.isTemplateArg(RV.getNameInit())) {
        // Named FieldName so it does not shadow the def's own Name.
        auto FieldName = RV.getNameInitAsString();
        if (RV.getPrefix())
          fields.push_back(FieldName);
        obj[FieldName] = translateInit(*RV.getValue());
      }
    }

    obj["!fields"] = std::move(fields);

    json::Array superclasses;
    for (const auto &SuperPair : Def.getSuperClasses())
      superclasses.push_back(SuperPair.first->getNameInitAsString());
    obj["!superclasses"] = std::move(superclasses);

    obj["!name"] = Name;
    obj["!anonymous"] = Def.isAnonymous();

    root[Name] = std::move(obj);

    // Add this def to the instance list for each of its superclasses.
    for (const auto &SuperPair : Def.getSuperClasses()) {
      auto SuperName = SuperPair.first->getNameInitAsString();
      instance_lists[SuperName].push_back(Name);
    }
  }

  // Make a JSON object from the std::map of instance lists. Iterate by
  // reference so that std::move really moves out of the map instead of
  // moving out of a per-iteration copy.
  json::Object instanceof;
  for (auto &kv : instance_lists)
    instanceof[kv.first] = std::move(kv.second);
  root["!instanceof"] = std::move(instanceof);

  // Done. Write the output.
  OS << json::Value(std::move(root)) << "\n";
}

namespace llvm {

void EmitJSON(RecordKeeper &RK, raw_ostream &OS) { JSONEmitter(RK).run(OS); }
} // end namespace llvm
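To illustrate the value schema produced by translateInit() above, here is a minimal Python sketch of two consumer-side helpers. It assumes only what this file emits: bits<N> values as arrays in logical order (element i is bit i), unset values as null, and structured values as objects carrying a 'kind' discriminator. The names bits_to_int and referenced_defs are hypothetical and not part of LLVM, and the GPR operand in the sample dag is made up for illustration.

```python
def bits_to_int(bits):
    """Convert a translated bits<N> array to an integer.

    Element i of the array is bit i, so element 0 is the least significant
    bit. Returns None if any element is not a concrete 0 or 1 (for example
    null for an unset bit, or a 'varbit' object).
    """
    value = 0
    for i, bit in enumerate(bits):
        if bit not in (0, 1):
            return None
        value |= bit << i
    return value


def referenced_defs(value):
    """Return the set of def names referenced by a translated value.

    Recurses through lists and through dag operators and arguments.
    'var', 'varbit' and 'complex' objects are skipped: they carry no
    structured def reference, only their 'printable' text.
    """
    found = set()

    def walk(v):
        if isinstance(v, list):
            for item in v:
                walk(item)
        elif isinstance(v, dict):
            kind = v.get("kind")
            if kind == "def":
                found.add(v["def"])
            elif kind == "dag":
                walk(v["operator"])
                # Each dag argument is a [value, name-or-null] pair.
                for arg_value, _arg_name in v["args"]:
                    walk(arg_value)

    walk(value)
    return found


# Example values shaped like the backend's output.
bs = [1, 1, 1, 0, 1, 0, 0, 0]  # from: bits<8> bs = { 0,0,0,1,0,1,1,1 };
dag = {
    "kind": "dag",
    "printable": "(ExampleDagOp GPR:$rd, 22)",
    "operator": {"kind": "def", "def": "ExampleDagOp",
                 "printable": "ExampleDagOp"},
    "args": [[{"kind": "def", "def": "GPR", "printable": "GPR"}, "rd"],
             [22, None]],
}
print(bits_to_int(bs))               # 23
print(sorted(referenced_defs(dag)))  # ['ExampleDagOp', 'GPR']
```

Because the dump carries no type information, a bits<N> value and a list<bit> value look identical in JSON; a consumer has to know from its own context which fields to pass to a helper like bits_to_int.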

llvm/trunk/test/TableGen/JSON-check.py

#!/usr/bin/env python

import sys
import traceback
import json

data = json.load(sys.stdin)
testfile = sys.argv[1]

prefix = "CHECK: "

fails = 0
passes = 0
with open(testfile) as testfh:
    lineno = 0
    for line in iter(testfh.readline, ""):
        lineno += 1
        line = line.rstrip("\r\n")
        try:
            prefix_pos = line.index(prefix)
        except ValueError:
            continue
        check_expr = line[prefix_pos + len(prefix):]

        try:
            exception = None
            result = eval(check_expr, {"data": data})
        except Exception:
            result = False
            exception = traceback.format_exc().splitlines()[-1]

        if exception is not None:
            sys.stderr.write(
                "{file}:{line:d}: check threw exception: {expr}\n"
                "{file}:{line:d}: exception was: {exception}\n".format(
                    file=testfile, line=lineno,
                    expr=check_expr, exception=exception))
            fails += 1
        elif not result:
            sys.stderr.write(
                "{file}:{line:d}: check returned False: {expr}\n".format(
                    file=testfile, line=lineno, expr=check_expr))
            fails += 1
        else:
            passes += 1

if fails != 0:
    sys.exit("{} checks failed".format(fails))
else:
    sys.stdout.write("{} checks passed\n".format(passes))
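JSON-check.py only evaluates each CHECK expression against the loaded dump. A consumer acting as a new backend would more likely query the metadata keys directly. A minimal sketch, assuming the top-level layout emitted by JSONEmitter::run() (records keyed by name, plus '!'-prefixed metadata keys such as '!instanceof'); the helper names records, instances_of and field_values are hypothetical, and the sample data is hand-written rather than real backend output.

```python
def records(data):
    """Yield (name, record) pairs, skipping top-level '!' metadata keys."""
    for name, rec in data.items():
        if not name.startswith("!"):
            yield name, rec


def instances_of(data, cls):
    """Sorted names of all defs that have cls among their superclasses.

    Every class gets an entry in '!instanceof', even with no instances, so a
    KeyError here means cls is not a class name at all.
    """
    return sorted(data["!instanceof"][cls])


def field_values(rec):
    """Map each variable declared with the 'field' keyword to its value."""
    return {name: rec[name] for name in rec["!fields"]}


# A hand-written dump shaped like part of the output for JSON.td.
data = {
    "!tablegen_json_version": 1,
    "!instanceof": {"Base": ["D"], "Intermediate": ["D"], "Derived": []},
    "D": {"!name": "D", "!anonymous": False, "!fields": [],
          "!superclasses": ["Base", "Intermediate"]},
    "FieldKeywordTest": {"!name": "FieldKeywordTest", "!anonymous": False,
                         "!fields": ["b"], "!superclasses": [],
                         "a": None, "b": None},
}
print(instances_of(data, "Base"))            # ['D']
print(instances_of(data, "Derived"))         # []
print([name for name, _ in records(data)])   # ['D', 'FieldKeywordTest']
```

Raising on an unknown class name, rather than returning an empty list, relies on the backend pre-populating '!instanceof' for every class, which lets a typo in a class name surface as an error instead of silently matching nothing.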

llvm/trunk/test/TableGen/JSON.td

// RUN: llvm-tblgen -dump-json %s | %python %S/JSON-check.py %s

// CHECK: data['!tablegen_json_version'] == 1

// CHECK: all(data[s]['!name'] == s for s in data if not s.startswith("!"))

class Base {}
class Intermediate : Base {}
class Derived : Intermediate {}

def D : Intermediate {}
// CHECK: 'D' in data['!instanceof']['Base']
// CHECK: 'D' in data['!instanceof']['Intermediate']
// CHECK: 'D' not in data['!instanceof']['Derived']
// CHECK: 'Base' in data['D']['!superclasses']
// CHECK: 'Intermediate' in data['D']['!superclasses']
// CHECK: 'Derived' not in data['D']['!superclasses']

def ExampleDagOp;

def FieldKeywordTest {
  int a;
  field int b;
  // CHECK: 'a' not in data['FieldKeywordTest']['!fields']
  // CHECK: 'b' in data['FieldKeywordTest']['!fields']
}

class Variables {
  int i;
  string s;
  bit b;
  bits<8> bs;
  code c;
  list<int> li;
  Base base;
  dag d;
}
def VarNull : Variables {
  // A variable not filled in at all has its value set to JSON
  // 'null', which translates to Python None
  // CHECK: data['VarNull']['i'] is None
}
def VarPrim : Variables {
  // Test initializers that map to primitive JSON types

  int i = 3;
  // CHECK: data['VarPrim']['i'] == 3

  // Integer literals should be emitted in the JSON at full 64-bit
  // precision, for the benefit of JSON readers that preserve that
  // much information. Python's is one such.
  int enormous_pos = 9123456789123456789;
  int enormous_neg = -9123456789123456789;
  // CHECK: data['VarPrim']['enormous_pos'] == 9123456789123456789
  // CHECK: data['VarPrim']['enormous_neg'] == -9123456789123456789

  string s = "hello, world";
  // CHECK: data['VarPrim']['s'] == 'hello, world'

  bit b = 0;
  // CHECK: data['VarPrim']['b'] == 0

  // bits<> arrays are stored in logical order (array[i] is the same
  // bit identified in .td files as bs{i}), which means the _visual_
  // order of the list (in default rendering) is reversed.
  bits<8> bs = { 0,0,0,1,0,1,1,1 };
  // CHECK: data['VarPrim']['bs'] == [ 1,1,1,0,1,0,0,0 ]

  code c = [{ \" }];
  // CHECK: data['VarPrim']['c'] == r' \" '

  list<int> li = [ 1, 2, 3, 4 ];
  // CHECK: data['VarPrim']['li'] == [ 1, 2, 3, 4 ]
}
def VarObj : Variables {
  // Test initializers that map to JSON objects containing a 'kind'
  // discriminator

  Base base = D;
  // CHECK: data['VarObj']['base']['kind'] == 'def'
  // CHECK: data['VarObj']['base']['def'] == 'D'
  // CHECK: data['VarObj']['base']['printable'] == 'D'

  dag d = (ExampleDagOp 22, "hello":$foo);
  // CHECK: data['VarObj']['d']['kind'] == 'dag'
  // CHECK: data['VarObj']['d']['operator']['kind'] == 'def'
  // CHECK: data['VarObj']['d']['operator']['def'] == 'ExampleDagOp'
  // CHECK: data['VarObj']['d']['operator']['printable'] == 'ExampleDagOp'
  // CHECK: data['VarObj']['d']['args'] == [[22, None], ["hello", "foo"]]
  // CHECK: data['VarObj']['d']['printable'] == '(ExampleDagOp 22, "hello":$foo)'

  int undef_int;
  field int ref_int = undef_int;
  // CHECK: data['VarObj']['ref_int']['kind'] == 'var'
  // CHECK: data['VarObj']['ref_int']['var'] == 'undef_int'
  // CHECK: data['VarObj']['ref_int']['printable'] == 'undef_int'

  bits<2> undef_bits;
  bits<4> ref_bits;
  let ref_bits{3-2} = 0b10;
  let ref_bits{1-0} = undef_bits{1-0};
  // CHECK: data['VarObj']['ref_bits'][3] == 1
  // CHECK: data['VarObj']['ref_bits'][2] == 0
  // CHECK: data['VarObj']['ref_bits'][1]['kind'] == 'varbit'
  // CHECK: data['VarObj']['ref_bits'][1]['var'] == 'undef_bits'
  // CHECK: data['VarObj']['ref_bits'][1]['index'] == 1
  // CHECK: data['VarObj']['ref_bits'][1]['printable'] == 'undef_bits{1}'
  // CHECK: data['VarObj']['ref_bits'][0]['kind'] == 'varbit'
  // CHECK: data['VarObj']['ref_bits'][0]['var'] == 'undef_bits'
  // CHECK: data['VarObj']['ref_bits'][0]['index'] == 0
  // CHECK: data['VarObj']['ref_bits'][0]['printable'] == 'undef_bits{0}'

  field int complex_ref_int = !add(undef_int, 2);
  // CHECK: data['VarObj']['complex_ref_int']['kind'] == 'complex'
  // CHECK: data['VarObj']['complex_ref_int']['printable'] == '!add(undef_int, 2)'
}

// Test the !anonymous member. This is tricky because when a def is
// anonymous, almost by definition, the test can't reliably predict
// the name it will be stored under! So we have to search all the defs
// in the JSON output looking for the one that has the test integer
// field set to the right value.

def Named { int AnonTestField = 1; }
// CHECK: data['Named']['AnonTestField'] == 1
// CHECK: data['Named']['!anonymous'] is False

def { int AnonTestField = 2; }
// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 2)['!anonymous'] is True

multiclass AnonTestMulticlass<int base> {
  def _plus_one { int AnonTestField = !add(base,1); }
  def { int AnonTestField = !add(base,2); }
}

defm NamedDefm : AnonTestMulticlass<10>;
// CHECK: data['NamedDefm_plus_one']['!anonymous'] is False
// CHECK: data['NamedDefm_plus_one']['AnonTestField'] == 11
// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 12)['!anonymous'] is True

// D47431 clarifies that a named def inside a multiclass gives a
// non-anonymous output record, even if the defm that instantiates
// that multiclass is anonymous.
defm : AnonTestMulticlass<20>;
// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 21)['!anonymous'] is False
// CHECK: next(rec for rec in data.values() if isinstance(rec, dict) and rec.get('AnonTestField') == 22)['!anonymous'] is True
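The anonymous-def checks above each repeat the same next(...) search over data.values(). A minimal sketch of a reusable helper for that pattern, assuming the same dump layout; find_records and find_unique are hypothetical names, and the 'anon_0' key in the sample data is a placeholder for whatever name TableGen actually generates. Unlike the inline checks, it skips '!'-prefixed metadata keys explicitly and raises LookupError on zero or multiple matches instead of StopIteration or silently taking the first hit.

```python
def find_records(data, **fields):
    """Return every record whose named fields equal the given values."""
    matches = []
    for name, rec in data.items():
        # '!'-prefixed top-level keys hold metadata, not records.
        if name.startswith("!") or not isinstance(rec, dict):
            continue
        if all(key in rec and rec[key] == want for key, want in fields.items()):
            matches.append(rec)
    return matches


def find_unique(data, **fields):
    """Return the single matching record, or raise LookupError."""
    matches = find_records(data, **fields)
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one record matching {}, found {}".format(
                fields, len(matches)))
    return matches[0]


# 'anon_0' is a placeholder: the real key for an anonymous def is
# generated by TableGen and should not be relied upon.
data = {
    "!tablegen_json_version": 1,
    "!instanceof": {},
    "Named": {"!name": "Named", "!anonymous": False, "!fields": [],
              "!superclasses": [], "AnonTestField": 1},
    "anon_0": {"!name": "anon_0", "!anonymous": True, "!fields": [],
               "!superclasses": [], "AnonTestField": 2},
}
print(find_unique(data, AnonTestField=2)["!anonymous"])  # True
```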

llvm/trunk/utils/TableGen/TableGen.cpp

...
 #include "llvm/TableGen/Main.h"
 #include "llvm/TableGen/Record.h"
 #include "llvm/TableGen/SetTheory.h"

 using namespace llvm;

 enum ActionType {
   PrintRecords,
+  DumpJSON,
   GenEmitter,
   GenRegisterInfo,
   GenInstrInfo,
   GenInstrDocs,
   GenAsmWriter,
   GenAsmMatcher,
   GenDisassembler,
   GenPseudoLowering,
...
   GenRegisterBank,
 };

 namespace {
   cl::opt<ActionType>
   Action(cl::desc("Action to perform:"),
          cl::values(clEnumValN(PrintRecords, "print-records",
                                "Print all records to stdout (default)"),
+                    clEnumValN(DumpJSON, "dump-json",
+                               "Dump all records as machine-readable JSON"),
                     clEnumValN(GenEmitter, "gen-emitter",
                                "Generate machine code emitter"),
                     clEnumValN(GenRegisterInfo, "gen-register-info",
                                "Generate registers and register classes info"),
                     clEnumValN(GenInstrInfo, "gen-instr-info",
                                "Generate instruction descriptions"),
                     clEnumValN(GenInstrDocs, "gen-instr-docs",
                                "Generate instruction documentation"),
...
   Class("class", cl::desc("Print Enum list for this class"),
         cl::value_desc("class name"), cl::cat(PrintEnumsCat));

 bool LLVMTableGenMain(raw_ostream &OS, RecordKeeper &Records) {
   switch (Action) {
   case PrintRecords:
     OS << Records; // No argument, dump all contents
     break;
+  case DumpJSON:
+    EmitJSON(Records, OS);
+    break;
   case GenEmitter:
     EmitCodeEmitter(Records, OS);
     break;
   case GenRegisterInfo:
     EmitRegisterInfo(Records, OS);
     break;
   case GenInstrInfo:
     EmitInstrInfo(Records, OS);
...
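With the DumpJSON action registered above, `llvm-tblgen -dump-json` writes the whole record set to stdout, as the RUN line in JSON.td shows. A minimal sketch of driving it from Python, assuming llvm-tblgen is on PATH and using its standard `-I` include-directory option; the function names are hypothetical, and rejecting an unexpected '!tablegen_json_version' is a consumer-side choice, not something the backend requires.

```python
import json
import subprocess


def dump_json_command(td_file, include_dirs=(), tblgen="llvm-tblgen"):
    """Build an llvm-tblgen command line for the -dump-json action."""
    cmd = [tblgen, "-dump-json"]
    for d in include_dirs:
        cmd += ["-I", d]  # search path for 'include' directives
    cmd.append(td_file)
    return cmd


def parse_dump(text, expected_version=1):
    """Parse -dump-json output and reject an unexpected format version."""
    data = json.loads(text)
    version = data.get("!tablegen_json_version")
    if version != expected_version:
        raise ValueError(
            "unsupported !tablegen_json_version: {!r}".format(version))
    return data


def load_records(td_file, include_dirs=(), tblgen="llvm-tblgen"):
    """Run llvm-tblgen on td_file and return the parsed JSON dump."""
    result = subprocess.run(
        dump_json_command(td_file, include_dirs, tblgen),
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return parse_dump(result.stdout)
```

For example, load_records("JSON.td") would return the same dictionary that JSON-check.py reads from its standard input in the RUN line.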
Diff 154946