This is an archive of the discontinued LLVM Phabricator instance.

Refactor Tblgen DecoderEmitter to allow multiple language output
AbandonedPublic

Authored by Rot127 on Oct 26 2022, 6:11 PM.

Details

Summary

This refactor of the TableGen DecoderEmitter allows to output the generated code in different languages (C is added here).

Please note that this review request is for feedback purpose only and not meant to be merged.

The changes were made on branch llvm-15.0.3.

Justification

The necessity for this refactor came from our wish at Capstone to generate the decoder tables and decoder methods/functions in C and not C++.
Simply patching our fork would have ended in regular maintenance work which we would like to avoid (especially since we had to do it for at least four backends).
Since translating the generated C++ to C with scripts is a mess we wanted a more fundamental solution.

Summary

The refactor does the following:

  1. To allow multiple language output of this backend, all the generated strings are managed by class inherited from PrinterInterface (see details below).
  2. Splits the single DecoderEmitter.cpp file into multiple files for readability purposes and unites them under their own namespace.
  3. Adds a sequence diagram about the generation process (to give a rough overview).
  4. Adds documentation how the decoder works (state machine) and describes how the decoding state machine is build.
  5. For readability some very long methods were shortened by extracting code into separated methods.
  6. The generation logic was not touched in any way (Except two variable renamings so they match the documentation (FilteredInstructions -> FilteredInstrSubsets, VariableInstructions -> VariableInstrSubsets)).

The newly introduced "Printer" classes (PrinterInterface and children) handle all strings which are emitted into the output stream.
For each language there exists an implementation of the abstract PrinterInterface (here C++ and C for Capstone).

The DecoderEmitter and FilterChooser have an instance of a PrinterInterface and request strings from it or trigger the writing of the code to the output stream.
Depending on the language which should be written a different implementation of the PrinterInterface is set in DecoderEmitter and FilterChooser.

Feedback request

Other backends which emit code would need a "PrinterInterface" as well (e.g. AsmWriter, RegisterInfo, InstrInfo, SubtargetInfo).

The changes made here are a lot. Hence I don't expect and wouldn't want them to be merged.
But I kindly ask you for your feedback.

Diff Detail

Event Timeline

Rot127 created this revision.Oct 26 2022, 6:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 6:11 PM
Rot127 requested review of this revision.Oct 26 2022, 6:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 6:11 PM

Thanks for this patch !

It is difficult to see the changes you made. Could you separate this into two patches -- the first one contains the all the changes except the functional change; the second one contains the functional change you made.

myhsu added a comment.Oct 27 2022, 9:21 AM

I only skimmed through the patch but I found this pretty interesting. To my best understanding, you're trying to add another layer of abstraction between disassembler generation logics and the code printer. Such that downstream projects like Capstone only need to maintain a single printer code (e.g. PrinterCapston.cpp) rather than hundreds of lines of code spanning across DecoderEmitter.cpp. If this is the case, I strongly encourage you to upstream most of the stuff here except PrinterCapstone.cpp.

Other backends which emit code would need a "PrinterInterface" as well (e.g. AsmWriter, RegisterInfo, InstrInfo, SubtargetInfo).

We can try on the disassembler first and gradually refactor other components later in other patches (because this new printer interface can co-exist with the current code printing logics, right?)

llvm/utils/TableGen/DecoderEmitter/CMakeLists.txt
6

double quotes are not necessary here, please remove them for better readability

llvm/utils/TableGen/DecoderEmitter/README.md
1

Kudos on this document!

Rot127 updated this revision to Diff 471366.Oct 27 2022, 7:04 PM

Adds the decoder split up/refactor as single diff. separated from the Printer implementation.

Rot127 updated this revision to Diff 471371.Oct 27 2022, 7:26 PM

Add diff of Printer implementation.

Thanks for this patch !

It is difficult to see the changes you made. Could you separate this into two patches -- the first one contains the all the changes except the functional change; the second one contains the functional change you made.

I updated the diff two more times they can be selected now in the history. Hopefully this was the right way of doing it?
Sorry if not. I'm new to Phabricator but wanted to have it available quickly for you guys.

There are only two none functional changes in the functional commit: one renaming of the statistic variables and the extraction of types from one header file to another (easy to spot).
Sorry if this isn't 100% clean, but those changes were made too late and couldn't easily be resorted with git rebase -i.

Rot127 added a comment.EditedOct 27 2022, 7:51 PM

To my best understanding, you're trying to add another layer of abstraction between disassembler generation logics and the code printer. Such that downstream projects like Capstone only need to maintain a single printer code (e.g. PrinterCapston.cpp) rather than hundreds of lines of code spanning across DecoderEmitter.cpp

Yes, this is the idea.

because this new printer interface can co-exist with the current code printing logics, right?

In what sense do you mean "co-exist"?
Currently the generator logic writes to OS whenever it has some code piece ready (this seems to be the case for all backends mentioned above).
With this change it no longer does it, but passes the generated code pieces on. Once all the code is generated the Printer writes it to OS.

So as long as one backend is updated at a time there is no problem. Since backends do not share printing logic in any way.

myhsu added a comment.Oct 30 2022, 4:01 PM

So as long as one backend is updated at a time there is no problem. Since backends do not share printing logic in any way.

Yes I was simply wondering if it's possible to update a single backend a time. Good to know now.

llvm/utils/TableGen/CMakeLists.txt