
[TableGen] Add Instruction custom byte sequence emission

Authored by m4yers on Aug 15 2018, 9:14 AM.



This patch is part of a larger one, D50314. It adds the ability to emit a
custom sequence of bytes attributed to an Instruction.

This feature can be used to simplify CISC machine code emission, as is done in
the M68K patch D50314. You can check out, and
M680x0MCCodeEmitter.cpp to see how it is used. In short, the process is
like this:

  • Define a set of parameterized "beads", each 8 bits long. The composition and meaning of those bits are up to the concrete backend. In the m68k case, the first 4 bits are the type and the last 4 bits are the payload.
  • Every non-pseudo instruction receives a number of those beads and stores them in the Instruction's Beads variable.
  • The beads are read by this tablegen generator, which emits an array of those beads indexed by instruction opcode.
  • The MCCodeEmitter interprets the beads of a concrete instruction to emit its machine code. In the m68k case, the type is read to interpret the payload, which can be a reference to an operand, an immediate value, or a complex pattern.
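
The steps above can be illustrated with a minimal, self-contained C++ sketch. It is not the code from this patch or from D50314: the bead types, the choice of the high nibble as the type, the zero terminator, and the names makeBead, encodeBeads, and BeadTable are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bead types. The real m68k set is larger and differs in detail.
enum BeadType : uint8_t {
  BeadTerm = 0x0,    // assumed terminator: ends the bead sequence
  BeadBits4 = 0x1,   // payload is a literal 4-bit value to emit
  BeadOperand = 0x2  // payload is an operand index; its low 4 bits are emitted
};

// Assumed layout: high nibble is the type, low nibble is the payload.
constexpr uint8_t makeBead(BeadType Type, uint8_t Payload) {
  return static_cast<uint8_t>((Type << 4) | (Payload & 0xF));
}

// Stand-in for the generated table: one bead sequence per (hypothetical) opcode.
static const uint8_t BeadTable[][5] = {
    // opcode 0: literal 0x4, operand 0, literal 0xE, operand 1
    {makeBead(BeadBits4, 0x4), makeBead(BeadOperand, 0),
     makeBead(BeadBits4, 0xE), makeBead(BeadOperand, 1),
     makeBead(BeadTerm, 0)},
    // opcode 1: two literals, no operands
    {makeBead(BeadBits4, 0xA), makeBead(BeadBits4, 0xB),
     makeBead(BeadTerm, 0), 0, 0},
};

// Stand-in for the MCCodeEmitter side: walks the beads of one instruction
// and packs the 4-bit fields they describe into bytes, high nibble first.
std::vector<uint8_t> encodeBeads(const uint8_t *Beads,
                                 const std::vector<uint8_t> &Operands) {
  std::vector<uint8_t> Out;
  unsigned Nibbles = 0;
  for (const uint8_t *B = Beads; (*B >> 4) != BeadTerm; ++B) {
    const uint8_t Type = *B >> 4;
    const uint8_t Payload = *B & 0xF;
    uint8_t Value = 0;
    switch (Type) {
    case BeadBits4:
      Value = Payload;
      break;
    case BeadOperand:
      Value = Operands.at(Payload) & 0xF;
      break;
    default:
      assert(false && "unknown bead type");
    }
    if (Nibbles % 2 == 0)
      Out.push_back(static_cast<uint8_t>(Value << 4));
    else
      Out.back() |= Value;
    ++Nibbles;
  }
  return Out;
}
```

Under these assumptions, calling encodeBeads(BeadTable[0], {0x3, 0x7}) yields the two bytes 0x43 and 0xE7. Because each sequence is terminated rather than fixed in width, instructions of different encoded lengths can share one table, which is the property the summary relies on for CISC targets.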


Event Timeline

m4yers created this revision. Aug 15 2018, 9:14 AM

Is there some testing approach for changes like this?
It really isn't ideal to use the final use-case (the target) as the only testing strategy for this.


Ignorant question: surely the problem of storing instruction encodings in tablegen has already arisen?
How does this solution differ from the existing practice?

m4yers marked an inline comment as done. Aug 16 2018, 12:00 PM
m4yers added inline comments.

It allows passing encoding strings of various lengths. It is also semantically different from TSFlags, and I thought it would be a poor choice to extend the size of TSFlags and reuse it.

efriedma added inline comments.

Most other targets use CodeEmitterGen to generate a function getBinaryCodeForInstr. This is nice, but it only works for targets with fixed-length instructions of length 4 or less.

x86 doesn't store instruction encodings in TableGen; instead, it uses a hand-written implementation of X86MCCodeEmitter::encodeInstruction, driven by flags in the instruction description.

lebedev.ri added inline comments. Aug 16 2018, 12:20 PM

(At the very least this should be explained here in the comment.)


map and vector are unused it seems.


I wonder if it would be more straightforward to go the other way around?
Then you wouldn't need to manually make sure the value is a multiple of BeadSize bits.


I'm surprised there is no abstraction for this in LLVM already.
Maybe I'm looking in the wrong places.

m4yers updated this revision to Diff 161770. Aug 21 2018, 10:55 AM
m4yers marked 7 inline comments as done.

Small fixes


I will qualify that the binary string is of arbitrary size.


Do you mean like this?

constexpr unsigned BeadSize = 8;
constexpr unsigned BeadsNumber = 24;
constexpr unsigned BeadsLength = BeadsNumber * BeadSize;

@m4yers Abandon this now that the m68k target has landed?

Herald added a project: Restricted Project. Mar 14 2021, 10:08 AM
m4yers abandoned this revision. Mar 16 2021, 1:37 AM