This is an archive of the discontinued LLVM Phabricator instance.

[tblgen][disasm] Separate encodings from instructions
ClosedPublic

Authored by dsanders on Sep 21 2018, 10:23 AM.

Details

Summary

Separate the concept of an encoding from an instruction. This will enable
the definition of additional encodings for the same instruction which can
be used to support variable length instruction sets in the disassembler
(and potentially assembler but I'm not working towards that right now)
without causing an explosion in the number of Instruction records that
CodeGen then has to pick between.

Diff Detail

Repository
rL LLVM

Event Timeline

dsanders created this revision.Sep 21 2018, 10:23 AM

The encoding of an instruction includes encodings of its operands. How are you planning to implement separate encoder/decoder methods? Right now their names are all embedded into operand definitions.

The encoding of an instruction includes encodings of its operands. How are you planning to implement separate encoder/decoder methods? Right now their names are all embedded into operand definitions.

The follow-on patch, D52369, adds the AdditionalEncoding class which defines the encoding in the same way that an Instruction does:

def : AdditionalEncoding<ADD1> {
  bits<8> Reg;
  bits<16> Inst; // You can also have bits<32> and it will still be a 16-bit encoding
  let Size = 2;
  let Inst{0-3} = 0;
  let Inst{4-7} = Reg;
  let Inst{8-15} = 0;
  ...
}

the decoders for variable fields like Reg from the above example are going to be inherited from the Instruction that the encoding maps to. This is sufficient for the majority of cases since many compressed ISA's use the same field encoding and only apply tighter range limits for the shorter encodings. There are a few that change the field encoding in a bigger way than merely truncating the bits and that case is currently only supported via AdditionalEncoding.DecoderMethod which would have to extract and decode every operand. I'm thinking that the per-operand DecoderMethods could be done with a field along the lines of:

list<NameToOperand> OperandOverrides = [NameToOperand<"Reg", SomeOperand>];

where NameToOperand is a class along the lines of:

class NameToOperand<string name, Operand operand> {
  string Name = name;
  Operand Op = operand;
}

that would then need to be plumbed into the code responsible for generating the code to extract and decode the variable fields.

The encoding side of things isn't something I've considered much so far since there's non-trivial decisions to be made to pick between the choices of valid encodings and these impact optimization as well as the validity of the generated code (e.g. jump tables sometimes need a known size to calculate the target of the jump). I've generally tried to make sure the AdditionalEncoding class makes sense from an encoder perspective but I haven't given much thought to actually generating an encoder from it yet

dsanders updated this revision to Diff 170686.Oct 23 2018, 10:03 AM

Fold in another correction that ended up in a downstream-specific patch:
Only emit 'FOO => BAR' comments encodings caused by AdditionalEncoding. For Instruction, keep the existing 'FOO' comments.

bogner added inline comments.Oct 31 2018, 10:24 AM
utils/TableGen/FixedLenDecoderEmitter.cpp
94–95 ↗(On Diff #170686)

Would simplifying by omitting this and just using brace-initialization be better?

420–422 ↗(On Diff #170686)

Why not use the operator<< here if you're going to define it above anyway?

851–853 ↗(On Diff #170686)

Same here (unless using => instead of : is important for some reason?)

dsanders added inline comments.Oct 31 2018, 10:42 AM
utils/TableGen/FixedLenDecoderEmitter.cpp
94–95 ↗(On Diff #170686)

It's needed for std::vector<EncodingAndInst>::emplace_back(). It eventually ends up failing on the placement new inside std::allocator::construct() without it

420–422 ↗(On Diff #170686)

Well spotted. It's just that I didn't notice this one when I added the operator. I'll fix this

851–853 ↗(On Diff #170686)

It's not hugely important. I used => here to indicate that we've matched the AlternateEncoding record but we're building the MCInst as if we'd matched the Instruction. I can switch this for the operator

dsanders updated this revision to Diff 171979.Oct 31 2018, 11:40 AM

Fixed the nits about not using the << operator but doing the same thing it does

bogner accepted this revision.Oct 31 2018, 3:45 PM
bogner added inline comments.
utils/TableGen/FixedLenDecoderEmitter.cpp
94–95 ↗(On Diff #170686)

I doubt that emplace_back() is much/any better than push_back({...}) for a simple value type like this, so I'd personally probably go with the simpler approach. It's pretty minor though.

This revision is now accepted and ready to land.Oct 31 2018, 3:45 PM
This revision was automatically updated to reflect the committed changes.