Separate the concept of an encoding from an instruction. This will enable
the definition of additional encodings for the same instruction which can
be used to support variable length instruction sets in the disassembler
(and potentially assembler but I'm not working towards that right now)
without causing an explosion in the number of Instruction records that
CodeGen then has to pick between.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
The encoding of an instruction includes encodings of its operands. How are you planning to implement separate encoder/decoder methods? Right now their names are all embedded into operand definitions.
The follow-on patch, D52369, adds the AdditionalEncoding class which defines the encoding in the same way that an Instruction does:
def : AdditionalEncoding<ADD1> { bits<8> Reg; bits<16> Inst; // You can also have bits<32> and it will still be a 16-bit encoding let Size = 2; let Inst{0-3} = 0; let Inst{4-7} = Reg; let Inst{8-15} = 0; ... }
the decoders for variable fields like Reg from the above example are going to be inherited from the Instruction that the encoding maps to. This is sufficient for the majority of cases since many compressed ISA's use the same field encoding and only apply tighter range limits for the shorter encodings. There are a few that change the field encoding in a bigger way than merely truncating the bits and that case is currently only supported via AdditionalEncoding.DecoderMethod which would have to extract and decode every operand. I'm thinking that the per-operand DecoderMethods could be done with a field along the lines of:
list<NameToOperand> OperandOverrides = [NameToOperand<"Reg", SomeOperand>];
where NameToOperand is a class along the lines of:
class NameToOperand<string name, Operand operand> { string Name = name; Operand Op = operand; }
that would then need to be plumbed into the code responsible for generating the code to extract and decode the variable fields.
The encoding side of things isn't something I've considered much so far since there's non-trivial decisions to be made to pick between the choices of valid encodings and these impact optimization as well as the validity of the generated code (e.g. jump tables sometimes need a known size to calculate the target of the jump). I've generally tried to make sure the AdditionalEncoding class makes sense from an encoder perspective but I haven't given much thought to actually generating an encoder from it yet
Fold in another correction that ended up in a downstream-specific patch:
Only emit 'FOO => BAR' comments encodings caused by AdditionalEncoding. For Instruction, keep the existing 'FOO' comments.
utils/TableGen/FixedLenDecoderEmitter.cpp | ||
---|---|---|
94–95 ↗ | (On Diff #170686) | Would simplifying by omitting this and just using brace-initialization be better? |
420–422 ↗ | (On Diff #170686) | Why not use the operator<< here if you're going to define it above anyway? |
851–853 ↗ | (On Diff #170686) | Same here (unless using => instead of : is important for some reason?) |
utils/TableGen/FixedLenDecoderEmitter.cpp | ||
---|---|---|
94–95 ↗ | (On Diff #170686) | It's needed for std::vector<EncodingAndInst>::emplace_back(). It eventually ends up failing on the placement new inside std::allocator::construct() without it |
420–422 ↗ | (On Diff #170686) | Well spotted. It's just that I didn't notice this one when I added the operator. I'll fix this |
851–853 ↗ | (On Diff #170686) | It's not hugely important. I used => here to indicate that we've matched the AlternateEncoding record but we're building the MCInst as if we'd matched the Instruction. I can switch this for the operator |
utils/TableGen/FixedLenDecoderEmitter.cpp | ||
---|---|---|
94–95 ↗ | (On Diff #170686) | I doubt that emplace_back() is much/any better than push_back({...}) for a simple value type like this, so I'd personally probably go with the simpler approach. It's pretty minor though. |