This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Add support for variable length instruction in decoder generator
ClosedPublic

Authored by 0x59616e on Mar 3 2022, 6:30 PM.

Details

Summary

To support variable length instructions, I think of them as fixed length instructions with the "maximum length". For example, if there're three instructions with 2, 6 and 9 bytes, we can fit them into the algorithm by treating them all as 9 bytes.

Also, since we can't know the length of the instruction in advance, there is a function object with type void(APInt &, uint64_t) added in the parameter list of decodeInstruction and fieldFromInstruction. We can use this to supply the additional bits the decoder needs after we know the opcode of the instruction.

Finally, InstrLenTable is added to let the decoder know the length of the instructions.

See D120960 for its usage.

Diff Detail

Event Timeline

0x59616e created this revision.Mar 3 2022, 6:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 6:30 PM
0x59616e requested review of this revision.Mar 3 2022, 6:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 6:30 PM
0x59616e edited the summary of this revision. (Show Details)Mar 10 2022, 3:38 PM

Thank you for the patch.
A high level question: It seems like generating a decoder requires two phases -- First, generates a list of OperandInfo, which is currently done by populateInstruction, then emit the real (C++) code according to these OperandInfo instances. My question is, can we use our own way -- possibly putting in a separate file -- to generate these OperandInfo before supplying them to the second phase, instead of trying to piggyback everything into the existing FixedLenDecoder framework?
Because, to be honest, I'm not a big fan of overloading the BitsInit (to carry operand info). Using a BitsInit might fit well for fixed-length instructions but I feel like there are more elegant ways to handle var-length instructions. For instance, using CGIOperandList::OperandInfo::ParseOperandName to parse suboperands rather than traversing every single in/out operands in the original instruction definition.

Also, please double check all your modifications and make sure to follow the LLVM coding style, especially the naming convention of local variables.

llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
228
265

"unknown Init kind" would probably be better.

459

why do you want to change these three lines? I don't think RV is used anywhere else.

1347

ditto

2175–2181

The OI argument can be OperandInfo & to avoid copy.

2292

please use llvm::function_ref here

2295

why do we want to enlarge insn (on-demand) upon every field extractions? Can we resize insn ahead of time and do it only once?

2329

ditto

Thank you for the patch.
A high level question: It seems like generating a decoder requires two phases -- First, generates a list of OperandInfo, which is currently done by populateInstruction, then emit the real (C++) code according to these OperandInfo instances. My question is, can we use our own way -- possibly putting in a separate file -- to generate these OperandInfo before supplying them to the second phase, instead of trying to piggyback everything into the existing FixedLenDecoder framework?
Because, to be honest, I'm not a big fan of overloading the BitsInit (to carry operand info). Using a BitsInit might fit well for fixed-length instructions but I feel like there are more elegant ways to handle var-length instructions. For instance, using CGIOperandList::OperandInfo::ParseOperandName to parse suboperands rather than traversing every single in/out operands in the original instruction definition.

Sorry for not having time implementing my ideas, so I just run through it quickly (I will do it maybe this Saturday or Sunday):

We can traverse the Segments of VarLenInst and use CGIOperandList::OperandInfo::ParseOperandName to get the operand number, and then add that Segments into the correspoding OpInfo according to the operand number.

For example, if we find "dst.reg" and use CGIOperandList::OperandInfo::ParseOperandName and know "Oh, this is the third operand", then we can add this into the third OpInfo.

In this way we can avoid overload BitsInit completely.

llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
459

My understanding is: SFBits could be null pointer (take a look at the if statement below), which means SoftFail is not necessary. But using getValueAsBitsInit will trigger assertion if SoftFail can not be found.

0x59616e added inline comments.Mar 15 2022, 5:50 AM
llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
1347

Same reason as above: SFBits is not necessary (if not then the if statement below is useless) but getValueAsBitsInit dislike that.

2295

In some cases, for example, case MCD::OPC_CheckField, needs to extract the bit field before we know the length of the instruction.

But only two cases need this IIRC: OPC_ExtractField and OPC_CheckField. Doing this in fieldFromInstruction is because I'm lazy. I will fix this in the next diff.

And, can we resize insn ahead of time and do it only once ? I think it's no. We have to know the length to resize insn. So we can do this only in case MCD::OPC_Decode.

myhsu added a comment.Mar 15 2022, 1:15 PM

We can traverse the Segments of VarLenInst and use CGIOperandList::OperandInfo::ParseOperandName to get the operand number, and then add that Segments into the correspoding OpInfo according to the operand number.

For example, if we find "dst.reg" and use CGIOperandList::OperandInfo::ParseOperandName and know "Oh, this is the third operand", then we can add this into the third OpInfo.

In this way we can avoid overload BitsInit completely.

This approach sounds reasonable to me

llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
459

The use of getValueAsBitsInit might be wrong but it's irrelevant to this patch. I think it's better to put this change into another patch.

We can traverse the Segments of VarLenInst and use CGIOperandList::OperandInfo::ParseOperandName to get the operand number, and then add that Segments into the correspoding OpInfo according to the operand number.

For example, if we find "dst.reg" and use CGIOperandList::OperandInfo::ParseOperandName and know "Oh, this is the third operand", then we can add this into the third OpInfo.

In this way we can avoid overload BitsInit completely.

This approach sounds reasonable to me

Great ! I'll materialize it this weekend.

0x59616e updated this revision to Diff 418057.Mar 24 2022, 3:38 PM
0x59616e marked 9 inline comments as done.
0x59616e edited the summary of this revision. (Show Details)

address feedback

This comment was removed by 0x59616e.
0x59616e updated this revision to Diff 418152.Mar 25 2022, 1:10 AM

fix a minor problem

myhsu added a comment.Mar 29 2022, 9:45 PM

The logics look cleaner now.
Now a bigger question is: Should we still calling it FixedLenDecoderEmitter?

llvm/test/TableGen/VarLenEncoder.td
3 ↗(On Diff #418152)

I still prefer the decoder's tests either go under the FixedLenDecoderEmitter's folder or at least putting into a separate file.

51 ↗(On Diff #418152)

I don't quite understand what this !cond trying to do

llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
224

local variables should start with uppercase

1868

I'm not sure non-power-of-two # of (inlined) elements for SmallVector is recommended. If you're not sure what number to put, just use SmallVector<int> as recommended here.

1875

braces here can be removed

1900

please use !OpName.empty() or OpName.size()

1956

Did you use clang-format-diff.py rather than clang-format? I wonder why this line changed...

2360

why change to uint64_t? If you're worry about InsnType being APInt, APInt has operator=(uint64_t)

2387

ditto

2497

ditto

2502

"!= 0" in these two lines are redundant.

2541

why remove '&'?

2675

Could we add some comments here?

llvm/utils/TableGen/VarLenCodeEmitterGen.h
20–21

I think these forward declarations are obsolete now

0x59616e updated this revision to Diff 419109.Mar 30 2022, 5:02 AM
0x59616e marked 12 inline comments as done.

address some feedbacks

0x59616e updated this revision to Diff 419115.Mar 30 2022, 5:25 AM

move test into another file

The logics look cleaner now.
Now a bigger question is: Should we still calling it FixedLenDecoderEmitter?

What about DecoderEmitter.cpp ?

llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
224

Just wondering: there are some variables that is started with lowercase in the original code. For example the parameter "&def" in this function.

Is that a mistake ? or is there any special naming rule that I don't know ?

228

Naming is hard.

1956

I've tried, and it remains the same.

BTW I use "git clang-format". is there any difference between these two ?

2360

But it yells "conversion from ‘uint64_t’ {aka ‘long unsigned int’} to non-scalar type ‘llvm::APInt’ requested"

2502

g++ screams "no match for ‘operator||’ (operand types are ‘llvm::APInt’ and ‘llvm::APInt’)"

0x59616e updated this revision to Diff 419119.Mar 30 2022, 5:30 AM

address feedback

myhsu added a comment.Mar 31 2022, 9:57 AM

The logics look cleaner now.
Now a bigger question is: Should we still calling it FixedLenDecoderEmitter?

What about DecoderEmitter.cpp ?

Sounds good to me, but I would suggest you to send a RFC to the forum asking for consensus.
Especially to see if there is any comment from existing FixedLenDecoder users (e.g. AArch64 and ARM). Also, in the RFC, please briefly explain why you want to build this feature on top of FixedLenDecoder rather than writing a separate disassembler.

llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
224

it's likely that those code were committed before LLVM coding style was solidified

1956

I've tried, and it remains the same.

that's fine, just a minor issue

BTW I use "git clang-format". is there any difference between these two ?

I don't think so

2360

you're right and I was wrong about APInt::operator=(uint64_t): the operator won't be used in the case of initialization (constructor will be used instead but APInt doesn't have APInt(uint64_t)).

2387

why did you use another PtrLen variable here?

Also, please make sure all existing disassembler tests, especially targets that use FixedLenDecoder, are passing.

myhsu added inline comments.Mar 31 2022, 10:05 AM
llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
1875

I meant only braces around the for-loop, your current syntax might create ambiguity. Here is the guideline: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

0x59616e updated this revision to Diff 419599.Mar 31 2022, 6:37 PM
0x59616e marked 3 inline comments as done.

address feedback

The logics look cleaner now.
Now a bigger question is: Should we still calling it FixedLenDecoderEmitter?

What about DecoderEmitter.cpp ?

Sounds good to me, but I would suggest you to send a RFC to the forum asking for consensus.
Especially to see if there is any comment from existing FixedLenDecoder users (e.g. AArch64 and ARM). Also, in the RFC, please briefly explain why you want to build this feature on top of FixedLenDecoder rather than writing a separate disassembler.

No problem.

Also, please make sure all existing disassembler tests, especially targets that use FixedLenDecoder, are passing.

Sure.

llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
224

Ah, that makes sense.

2360

C++ is hard.

2387

quick answer: for debug purpose.

In the original code, Len is the length of the instruction field. But it is changed in the function `decodeULEB128'. That will break the debug message below.

Should we rename after this patch ? or before this patch ?

0x59616e updated this revision to Diff 421736.Apr 9 2022, 8:17 AM

I could not remember whether I've modified this patch recently.

So I update this just for ensurance.

ping. Any more suggestions ?

@myhsu Any more comments?

myhsu accepted this revision.May 1 2022, 12:33 PM

lgtm. Thanks for the patch!

This revision is now accepted and ready to land.May 1 2022, 12:33 PM

It's taken me a very long time to find time to read though this patch (I'm very sorry about that). The approach seems nice and straight forward & the generated code looks good. Thanks a lot for finding the time to get this through. 😅

lgtm. Thanks for the patch!

Thanks for your patience. This patch wouldn't be here if it were not for your help ;)

It's taken me a very long time to find time to read though this patch (I'm very sorry about that). The approach seems nice and straight forward & the generated code looks good. Thanks a lot for finding the time to get this through. 😅

My pleasure ;)

This revision was landed with ongoing or failed builds.May 2 2022, 12:37 PM
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.May 3 2022, 1:52 AM
foad added inline comments.
llvm/utils/TableGen/FixedLenDecoderEmitter.cpp
2360

Hi, this is causing a slight problem for us downstream because we're using a custom InsnType (not APInt). Have you seen the big comment emitted at line 2255? It explicitly says that your InsnType needs to "be constructible from a uint64_t". So please either update the comment, or stop using plain APInt as your InsnType :)

2502

Downstream the compiler complained about using mismatched types for operator&, InsnType and uint64_t. So please either change this back or at least update the comment on line 2255.

foad added a comment.May 3 2022, 3:08 AM

For the record, to get this to build with my downstream target, I had to add:

MyInsnType MyInsnType::operator&(const uint64_t &RHS);
bool MyInsnType::operator!=(const int &RHS);
gregmiller added a subscriber: gregmiller.EditedMay 3 2022, 9:52 AM

Hello, We are maintaining a downstream version of the monorepo based on the LLVM main branch. In a recent attempt to merge the latest upstream commits
into our monorepo we came across the following test failures after your commit.
Any help would be greatly appreciated.
Thanks
Greg


`
FAIL: llvm_regressions :: LLVM/TableGen/VarLenDecoder.td
--------------------------------------------------------------------------------
Script:
--
: 'RUN: at line 1';   /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/llvm-tblgen -gen-disassembler -I /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/../../include /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td | /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/FileCheck /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td
--
Exit Code: 1

Command Output (stderr):
--
+ : 'RUN: at line 1'
+ /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/llvm-tblgen -gen-disassembler -I /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/../../include /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td
+ /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/FileCheck /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td
/scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td:50:16: error: CHECK-NEXT: expected string not found in input
// CHECK-NEXT: MCD::OPC_Decode, 244, 1, 0, // Opcode: FOO16
               ^
<stdin>:72:57: note: scanning from here
/* 3 */ MCD::OPC_FilterValue, 8, 4, 0, 0, // Skip to: 12
                                                        ^
<stdin>:73:9: note: possible intended match here
/* 8 */ MCD::OPC_Decode, 245, 1, 0, // Opcode: FOO16
        ^

Input file: <stdin>
Check file: /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          67:  field.insertBits(bits, startBit, numBits); 
          68: } 
          69:  
          70: static const uint8_t DecoderTable43[] = { 
          71: /* 0 */ MCD::OPC_ExtractField, 3, 5, // Inst{7-3} ... 
          72: /* 3 */ MCD::OPC_FilterValue, 8, 4, 0, 0, // Skip to: 12 
next:50'0                                                             X error: no match found
          73: /* 8 */ MCD::OPC_Decode, 245, 1, 0, // Opcode: FOO16 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:50'1             ?                                             possible intended match
          74: /* 12 */ MCD::OPC_FilterValue, 9, 4, 0, 0, // Skip to: 21 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          75: /* 17 */ MCD::OPC_Decode, 246, 1, 1, // Opcode: FOO32 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          76: /* 21 */ MCD::OPC_Fail, 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~
          77:  0 
next:50'0     ~~~
          78: }; 
next:50'0     ~~~
           .
           .
           .
>>>>>>

--
`
myhsu added a comment.May 3 2022, 7:33 PM

Hello, We are maintaining a downstream version of the monorepo based on the LLVM main branch. In a recent attempt to merge the latest upstream commits
into our monorepo we came across the following test failures after your commit.
Any help would be greatly appreciated.
Thanks
Greg


`
FAIL: llvm_regressions :: LLVM/TableGen/VarLenDecoder.td
--------------------------------------------------------------------------------
Script:
--
: 'RUN: at line 1';   /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/llvm-tblgen -gen-disassembler -I /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/../../include /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td | /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/FileCheck /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td
--
Exit Code: 1

Command Output (stderr):
--
+ : 'RUN: at line 1'
+ /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/llvm-tblgen -gen-disassembler -I /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/../../include /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td
+ /scratch/gmiller/tools2/llvm_cgt/arm-llvm/RelWithAsserts/llvm/bin/FileCheck /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td
/scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td:50:16: error: CHECK-NEXT: expected string not found in input
// CHECK-NEXT: MCD::OPC_Decode, 244, 1, 0, // Opcode: FOO16
               ^
<stdin>:72:57: note: scanning from here
/* 3 */ MCD::OPC_FilterValue, 8, 4, 0, 0, // Skip to: 12
                                                        ^
<stdin>:73:9: note: possible intended match here
/* 8 */ MCD::OPC_Decode, 245, 1, 0, // Opcode: FOO16
        ^

Input file: <stdin>
Check file: /scratch/gmiller/tools2/llvm_cgt/llvm-project/llvm/test/TableGen/VarLenDecoder.td

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          67:  field.insertBits(bits, startBit, numBits); 
          68: } 
          69:  
          70: static const uint8_t DecoderTable43[] = { 
          71: /* 0 */ MCD::OPC_ExtractField, 3, 5, // Inst{7-3} ... 
          72: /* 3 */ MCD::OPC_FilterValue, 8, 4, 0, 0, // Skip to: 12 
next:50'0                                                             X error: no match found
          73: /* 8 */ MCD::OPC_Decode, 245, 1, 0, // Opcode: FOO16 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:50'1             ?                                             possible intended match
          74: /* 12 */ MCD::OPC_FilterValue, 9, 4, 0, 0, // Skip to: 21 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          75: /* 17 */ MCD::OPC_Decode, 246, 1, 1, // Opcode: FOO32 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          76: /* 21 */ MCD::OPC_Fail, 
next:50'0     ~~~~~~~~~~~~~~~~~~~~~~~~
          77:  0 
next:50'0     ~~~
          78: }; 
next:50'0     ~~~
           .
           .
           .
>>>>>>

--
`

Looks like the opcode of FOO16 and FOO32 are off by one. I think the number of pseudo opcodes is different (from the upstream one). Are include/llvm/Support/TargetOpcodes.def or include/llvm/Target/Target.td in your downstream repo different from upstream?

For the record, to get this to build with my downstream target, I had to add:

MyInsnType MyInsnType::operator&(const uint64_t &RHS);
bool MyInsnType::operator!=(const int &RHS);

Thanks for your information. I'll update the comment.

For the record, to get this to build with my downstream target, I had to add:

MyInsnType MyInsnType::operator&(const uint64_t &RHS);
bool MyInsnType::operator!=(const int &RHS);

Thanks for your information. I'll update the comment.

@myhsu Should we define our own inst type that wraps APInt, or update the comment and maintain the status quo ?

skan added a subscriber: skan.May 3 2022, 11:47 PM
snidertm added inline comments.
llvm/test/TableGen/VarLenDecoder.td
50

Is 244 the index into the instruction table where FOO16 is expected to reside?

This check on OPC_Decode seems extremely brittle. If there are any additions either in the upstream Target.td or in a downstream Target.td, won't this check and the one on line 52 will break?

0x59616e added a comment.EditedMay 5 2022, 5:10 PM

I've commited a diff in the hope of fixing the test problem:

https://github.com/llvm/llvm-project/commit/9c2121b843ff7c9846a89305b4b73e3c480fe4e7

The comment is updated :
1284ce917b5a

foad added a comment.May 11 2022, 12:15 AM

The comment is updated :
1284ce917b5a

Thanks!