This is an archive of the discontinued LLVM Phabricator instance.

[TableGen][CodeEmitter] Introducing the VarLenCodeEmitterGen infrastructure
ClosedPublic

Authored by myhsu on Dec 5 2021, 7:24 PM.

Details

Summary

Full write up: https://gist.github.com/mshockwave/66e98d099256deefc062633909bb7b5b

The existing CodeEmitterGen infrastructure is unable to generate encoder function for ISAs with variable-length instructions. This patch introduces a new infrastructure to support variable-length instruction encoding, including a new TableGen syntax for writing instruction encoding directives and a new TableGen backend component, VarLenCodeEmitterGen, built on top of CodeEmitterGen.

Diff Detail

Event Timeline

myhsu created this revision.Dec 5 2021, 7:24 PM
myhsu requested review of this revision.Dec 5 2021, 7:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2021, 7:24 PM
0x59616e added inline comments.Dec 6 2021, 3:26 AM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
47–49

Hi,
First off, I want to appreciate for all of your hard work. That's awesome. Really.

I have a question: shouldn't that && be || ?

My understanding is: We have to make sure that every argument is what it should be.

In this case, the first argument should be StringInit, the second should be IntInit, etc.

So, if one of them is incorret, then it is fatal error.

But this logic in the if statement won't fail unless all of the two (or three) arguments are not what they should be.

Thanks for reading this comment :)

pengfei added a subscriber: pengfei.Dec 6 2021, 6:26 AM
myhsu updated this revision to Diff 392237.Dec 6 2021, 5:08 PM
myhsu marked an inline comment as done.

Fixed a minor logic issue.

llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
47–49

yes you're right. Thanks!

Do you have any (full/partial) followup patches you can upload to phab showing the conversion of the M68k backend to VarLenCodeEmitterGen?

myhsu added a comment.Dec 7 2021, 4:19 AM

Do you have any (full/partial) followup patches you can upload to phab showing the conversion of the M68k backend to VarLenCodeEmitterGen?

Yes I had already refactored all the arithmetic instructions in M68k (i.e. those in M68kInstrArithmetic.td). I was trying to add more documentations so I didn't show that at the beginning. I just put it in D115234.

myhsu updated this revision to Diff 393128.Dec 9 2021, 5:54 AM

Incorporated feedbacks from @ricky26 :

  • Replacing seq and seq:$dec with ascend and descend, respectively. The reason I picked these two words is because I want them to be more explicit on their placement directions (i.e. LSB->MSB and MSB->LSB).
  • Replacing the operand operator in variant (operand "$foo", 8, 7) with slice operator.

I didn't incorporate the custom encoder idea we discussed on the mailing list because so far no one is using it and I want to make the patch as minimal as possible.

myhsu added a reviewer: jrtc27.Dec 9 2021, 5:58 AM
ricky26 accepted this revision.Dec 11 2021, 4:48 PM

LGTM. There's one comment that I think might be wrong. This is cool to see and I favour it much more than the existing code beads, IMO it's much more readable. Thanks! :)

llvm/include/llvm/Target/Target.td
769

These two examples seem to be the same.

llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
67

Not really anything worth changing now (mostly a note to myself if I come to write a disassembler generator for this):
It might be worth coalescing neighbouring constant bits.

This revision is now accepted and ready to land.Dec 11 2021, 4:48 PM
myhsu added a comment.EditedDec 11 2021, 5:04 PM

LGTM. There's one comment that I think might be wrong. This is cool to see and I favour it much more than the existing code beads, IMO it's much more readable. Thanks! :)

Thank you for taking a look. I'll address those comments later. Meanwhile when I was refactoring other instructions (more specifically, control instructions that has pc-relative immediate) I found that your (encoder <custom encoder name>) directive is actually useful (using a custom encoder only on immediate that can be pc-relative). I'll push a separate review to add that feature.

myhsu updated this revision to Diff 393714.Dec 11 2021, 6:04 PM
myhsu marked 2 inline comments as done.

Updated documentations in include/llvm/Target/Target.td

llvm/include/llvm/Target/Target.td
769

you're right. I was intended to use the same arguments for both ascend and descend but forgot to change the final encodings.

llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
67

oh, right, the coalescing will only happen in the later stage ( VarLenCodeEmitterGen::emitInstructionBaseValues). I agree it might be worth it to do in VarLenInst if we want to share this class with the disassembler.

craig.topper requested changes to this revision.Dec 11 2021, 6:09 PM

Marking as changes required to flag the dangerous ArrayRef usage before this gets committed.

llvm/utils/TableGen/CodeEmitterGen.cpp
404

Given that only run is called on the VarLenCodeEmitterGen object, it might make sense to move the VarLenCodeEmitterGen class defintion into the cpp file and only expose a free function wrapper.

405

Curly brace the if body for consistency with the else

llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
72

It might be better to not use the comma operator here and just have 3 variable declarations on 3 different lines.

84

I believe by the third paragraph of this section https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements this block and the follow else should have curly braces to keep consistency.

127

This code is dangerous. It will create a dangling reference to an initializer_list that goes out of scope when the line ends. Use a plain C array here.

llvm/utils/TableGen/VarLenCodeEmitterGen.h
17

I think there is typically a blank line after the include guards.

This revision now requires changes to proceed.Dec 11 2021, 6:09 PM
craig.topper added inline comments.Dec 11 2021, 6:22 PM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
153

You can use auto here.

236

Can this use a Twine to build the message as part of the argument to report_fatal_error?

267

I think this violates the no static global constructor rule?

372

Curly braces to match the else

392

Why is this a separate std::string and not part of the stream?

404

Can this be return Case like the other changes in D115374

myhsu updated this revision to Diff 393715.Dec 11 2021, 7:44 PM
myhsu marked 12 inline comments as done.

Addressed coding standard feedbacks (thank you @craig.topper)

llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
127

I don't think it will create a dangling reference in this particular case because DagInit::get actually copies NewArgs' elements. But I'm changing to plain C array anyway.

236

Unfortunately MCInst (i.e. MI here) doesn't have something like toString so using Twine might make the code here less concise.

267

no, InstBits (and InstBits_<HW mode>) is a local variable. Here is how it looks in the generated .inc file:

void M68kMCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,
    SmallVectorImpl<MCFixup> &Fixups,
    APInt &Inst,
    APInt &Scratch,
    const MCSubtargetInfo &STI) const {
  static const APInt InstBits[] = {
    APInt::getZeroWidth(),
    APInt::getZeroWidth(),
    APInt::getZeroWidth(),
...
jrtc27 added inline comments.Dec 11 2021, 7:49 PM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
267

A local variable with static storage duration, i.e. a static global that's scoped to the function

jrtc27 added inline comments.Dec 11 2021, 7:59 PM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
127

That's not Craig's point; ArrayRef<Init *> NewArgs = {OperandName, LoBit, HiBit}; itself is broken, as NewArgs points to storage after that statement whose lifetime has ended (well, it's fine so long as you never use NewArgs, but then it's pointless), so by the time you've reached DagInit::get it's already too late, the storage is gone. Using an initialiser list for ArrayRef is solely for use within the same statement (i.e. things like foo(ArrayRef({...})).

myhsu marked an inline comment as done.Dec 11 2021, 8:28 PM
myhsu added inline comments.
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
127

I see, that makes sense. Thanks for the explanation.

267

you're right. Then I think a potential solution will be using two integer arrays:

static const unsigned Index[][2] = {
  {/*TotalBits*/36, /*Storage index*/0},  // FOO16
  {/*TotalBits*/80, /*Storage index*/1},  // FOO32
  ...
};
static const uint64_t Storage[] = {
  UINT64_C(53312), UINT64_C(53352), ...
};

And use Index to indirectly access fixed values of each instruction in Storage.

craig.topper added inline comments.Dec 11 2021, 8:33 PM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
273

Can this be a range for loop? Or can we use auto to move the declaration of IE/EE into the for statement?

275

const reference?

277

Use a range for loop and ListSeparator to manage the new line?

myhsu updated this revision to Diff 393722.Dec 11 2021, 11:42 PM
myhsu marked 5 inline comments as done.
  • Addressed minor issues.
  • Replace const APInt InstBits[] =... with two integer arrays to avoid static global ctor.

Now we will have two arrays, Index and InstBits that look like this:

unsigned Index[][2] = {
  {/*NumBits*/87, /*Index to InstBits*/0}, // FOO16
  {/*NumBits*/39, /*Index to InstBits*/2}, // FOO32
};
uint64_t InstBits[] = {
  UINT64_C(<fixed values>), UINT64_C(...), // FOO16
  UINT64_C(...),  // FOO32
};
craig.topper added inline comments.Dec 13 2021, 9:50 AM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
428

I think this can be Scratch = Scratch.zextOrSelf(BitWidth)

myhsu updated this revision to Diff 394107.Dec 13 2021, 6:45 PM
myhsu marked an inline comment as done.

Use APInt::zextOrSelf to adjust the scratch buffer size in the generated code. Also, avoid generating said code if an instruction only has fixed values.

RKSimon added inline comments.Jan 25 2022, 6:03 AM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
125

(style) Don't use auto for non-obvious return types (could be StringRef / std::string / const std::string& .......)

craig.topper added inline comments.Jan 25 2022, 10:33 AM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
379

Can this use setBitVal like the BitInit case below?

myhsu updated this revision to Diff 403210.Jan 26 2022, 4:16 AM
myhsu marked 2 inline comments as done.

Addressed feedbacks.

myhsu added a comment.Feb 6 2022, 5:12 PM

kindly ping

craig.topper added inline comments.Feb 6 2022, 9:29 PM
llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
104

nit. Is the default for HwMode ever used? The call when HwModes is empty passes -1

325

Can the indent call be done before the loop?

Though I'm not sure about the indent. The NumBits and Index don't have a new line after them so we're not at the start of a line here.

364

Use VarLenInsts.find to avoid doing two map lookups?

424

Use VarLenInsts.find() instead of looking up the map twice?

@myhsu I noticed you've created some further child patches - is ther plan to create a patch sequence that updates the M68k code entirely before pushing?

myhsu added a comment.Feb 7 2022, 2:45 AM

@myhsu I noticed you've created some further child patches - is ther plan to create a patch sequence that updates the M68k code entirely before pushing?

Not really, my current plan is to commit D115128 and D115234 right away once they're accepted. Same logics also goes to D119100 and D119101.
I'll make separate patches for rest of the M68k changes

myhsu updated this revision to Diff 406376.Feb 7 2022, 3:19 AM
myhsu marked 4 inline comments as done.

Addressed feedbacks.

llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
104

good point, I'm fixing this on the callsite.

325

Though I'm not sure about the indent. The NumBits and Index don't have a new line after them so we're not at the start of a line here.

IS and SS are actually separate streams for different parts of the code. And SS.indent(4) is indeed at the start of a line (line 391 prints the newline for it).

This revision is now accepted and ready to land.Feb 9 2022, 11:35 AM
This revision was landed with ongoing or failed builds.Feb 11 2022, 9:33 AM
This revision was automatically updated to reflect the committed changes.