Page MenuHomePhabricator

[TableGen][M68K] (Patch 1/8) Utilities for complex instruction addressing modes: CodeBeads and logical operand helper functions
ClosedPublic

Authored by myhsu on Sep 27 2020, 4:22 PM.

Details

Summary

This patch adds two components: The CodeBeads TG backend and the support for InstrInfoEmitter TG backend to emit information about logical operands.

CodeBeads are annotations for instructions to express _non-trivial amount_ of addressing modes in an easier way. Many instructions of M68K, like many other CISC architectures, can be used with variety of addressing modes. For example, JSR (i.e. function call) can be used with four different addressing modes. Without this CodeBeads utility to explicitly embed these information into instructions' TG definitions, it would be more difficult for MC to encode the instructions (e.g. need to write some sort of pattern matching code to figure out the addressing mode).

Another special property of CISC ISA is that a single instruction operand (called "logical operand" here) might consist of multiple llvm::MachineOperands. Thus this patch enables IntrInfoEmitter TG backend to (optionally) generate helper functions for logical operands.
More specifically, the llvm::<target NS>::getLogicalOperandSize to get the number of llvm::MachineOperand on a specific logical operand; llvm::<target NS>::getLogicalOperandIdx to get the llvm::MachineOperand index for a specific logical operand.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
craig.topper added inline comments.Oct 6 2020, 8:09 PM
llvm/utils/TableGen/InstrInfoEmitter.cpp
506

Probably don't need to bother avoiding trailing commas in the generated file. The C++ parsing rules allow them

576

std::max?

627

range for

myhsu updated this revision to Diff 297344.Oct 9 2020, 3:38 PM
myhsu marked 13 inline comments as done.

Addressed feedbacks, mostly formatting issues and minor coding improvements

llvm/utils/TableGen/CodeBeadsGen.cpp
51

no really...this number depends on maximum bit length among all code beads. As the TODO comment on line 55 suggested we should have a way to evaluate this dynamically. Also the for loop on line 84 need this number to reverse the byte order

Paul-C-Anagnostopoulos resigned from this revision.Oct 13 2020, 3:56 PM

A minor suggestion: Use the new true/false literals in the TableGen files. I believe it makes the code easier to read.

myhsu added a reviewer: RKSimon.
RKSimon added inline comments.Nov 16 2020, 11:01 AM
llvm/utils/TableGen/CodeBeadsGen.cpp
51

Please add a comment explaining these magic numbers.

llvm/utils/TableGen/InstrInfoEmitter.cpp
498

const auto &Row

516

const auto &Instrs

590

const auto &Row

614

const auto &Insts

llvm/include/llvm/Target/Target.td
645

Before you push this, Target.td will use 'true' and 'false'.

llvm/utils/TableGen/CodeBeadsGen.cpp
34

It's conventional to call the output stream OS.

myhsu marked 4 inline comments as done.Nov 17 2020, 3:47 PM
myhsu added inline comments.
llvm/utils/TableGen/CodeBeadsGen.cpp
51

Eventually I decided to just fix that TODO so the next revision will not have that magic number anymore

myhsu updated this revision to Diff 305917.EditedNov 17 2020, 3:57 PM
myhsu marked 5 inline comments as done.
  • Rebase to upstream master
  • Addressed feedbacks
  • Update to use the latest TG syntax

Does anyone have any more comments? Otherwise I think we can provisionally accept this.

@rengolin I think you wanted to see the entire patch series to be accepted before anything gets committed?

Does anyone have any more comments? Otherwise I think we can provisionally accept this.

LGTM too, thanks!

@jrtc27, you have marked as blocked. Have all your issues been resolved?

@rengolin I think you wanted to see the entire patch series to be accepted before anything gets committed?

Definitely. We need to make sure the whole series is sound, then wait a bit to make sure no one else has any concern, then all patches can be committed one after another.

Feel free to mark any patch of the series as accepted, so that we can speed up the convergence.

RKSimon accepted this revision.Dec 11 2020, 4:20 AM

LGTM

@jrtc27 Are you happy to approve this now? You're currently blocking the patch.

jrtc27 added inline comments.Dec 20 2020, 9:44 AM
llvm/include/llvm/Target/Target.td
642

would -> will

643–644

Reading this I don't actually understand what it does. How can I map a type to an operand of an MI?

llvm/utils/TableGen/CodeBeadsGen.cpp
9

I don't understand where the name "code bead" comes from. There's no reference to it in the GCC source, nor on the internet in general that I can obviously find, so it sounds to me like something you've invented yourself. Can we please stick to more standard terminology for things? Bead is meaningless.

96

uint8_t?

119

Array decay should render the cast unnecessary?

llvm/utils/TableGen/InstrInfoEmitter.cpp
483

Still not done; the std::string construction for Namespace is not necessary.

500–501
MaskRay added inline comments.Dec 20 2020, 11:30 AM
llvm/include/llvm/Target/Target.td
642

https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments says "don't use example - "

Many older functions do not conform to the standard but newer functions should.

RKSimon added inline comments.Dec 21 2020, 7:07 AM
llvm/utils/TableGen/CodeBeadsGen.cpp
9

Alternatively a more expansive description of code beads should be OK - e.g. are they just for helping emit CISC variable length bit encodings?

myhsu updated this revision to Diff 314140.Dec 30 2020, 12:35 PM
myhsu marked 9 inline comments as done.
  • [NFC] Addressed feedbacks

I still don't understand what problem code beads are trying to solve that isn't already solved by existing backends like X86. Why can't you just assign operands to instruction encoding bits like a normal backend?

craig.topper added inline comments.Dec 30 2020, 1:01 PM
llvm/utils/TableGen/CodeBeadsGen.cpp
18

speicifc -> specific

myhsu added a comment.Dec 30 2020, 1:37 PM

I still don't understand what problem code beads are trying to solve that isn't already solved by existing backends like X86. Why can't you just assign operands to instruction encoding bits like a normal backend?

Currently X86 - probably the only Target that shares a similar problem - uses MCInstrDesc::TSFlags to carry complex encoding info (maybe @craig.topper can shed some lights here).

TSFlags is only 64-bits wide. But some of the M68k instructions need 24 bytes to carry its encoding info. Frankly speaking, this is simply because we didn't come up with a optimal way to fit every info into 64 bits when we first created this code base long time ago (I don't think M68k has much more complex instruction formats than X86).

I'm happy to find if there is a way to fit all encoding info into TSFlags - and that will bring major code changes to this patch series for sure - if everyone here think this is the approach we should take.

Do you have any idea how much work would be necessary for the TSFlags refactor?

@rengolin @craig.topper any thoughts?

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct. It would be unfortunate to increase the size of MCInstrDesc, though. I have been playing with the idea of changing the three pointer members to indexes in order to save memory, but I'm not yet convinced it's worth the effort. There are about 67,000 instances across the targets.

myhsu added a comment.EditedJan 2 2021, 2:48 PM

Do you have any idea how much work would be necessary for the TSFlags refactor?

I think it is pretty straight forward to switch our MC code to use MCInstrDesc::TSFlags -- if we managed to rewrite our TG files to use TSFlags, which is a bigger problem. Adapting our TG code to use TSFlags requires modifying most the instruction definitions, as far as I can say.

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct.

This actually gives me an idea to directly use the MCInstrDesc::TSFlags as a pointer to auxiliary data structures. That is:

auto* AuxTable =  reinterpret_cast<uint8_t*>(MD.TSFlags);

That means we also need to modify InstrInfoEmitter. Here is my preliminary plan:

  1. In the Instruction TG class, add a new string field, ExtTSFlagsFieldName. That's say the value of this field is set to "foo"
  2. If TSFlags in Instruction TG class is 0xFF...FF, InstrInfoEmitter will lookup field "foo". This "foo" field needs to be a bits, but the size need not to be 64 bits.
  3. Following up step 2, InstrInfoEmitter construct auxiliary data structures according to the value of "foo" field.
  4. Assign pointer to auxiliary data structure as the value of MCInstrDesc::TSFlags.

The reason there is an indirection on step 1 and 2 (i.e. use ExtTSFalgsFieldName to lookup auxiliary data's field name) is because TG's bits type can only have fixed size but we want downstream users to specify the size of their auxiliary data structure.

Biggest advantage of this approach is that we barely need to modify our TG code. We don't need to increase the size of MCInstrDesc either

I'm definitely not a TG expert, so @Paul-C-Anagnostopoulos please speak out if you have any suggestion on this approach :-)

jrtc27 added a comment.Jan 2 2021, 2:52 PM

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct.

This actually gives me an idea to directly use the MCInstrDesc::TSFlags as a pointer to auxiliary data structures. That is:

auto* AuxTable =  reinterpret_cast<uint8_t*>(MD.TSFlags);

Definitely not. Integers are not pointers, and will break on CHERI. You would need to make TSFlags a uintptr_t if you want that to work, but then you only get 32 bits on 32-bit architectures.

myhsu added a comment.Jan 2 2021, 3:00 PM

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct.

This actually gives me an idea to directly use the MCInstrDesc::TSFlags as a pointer to auxiliary data structures. That is:

auto* AuxTable =  reinterpret_cast<uint8_t*>(MD.TSFlags);

Definitely not. Integers are not pointers, and will break on CHERI. You would need to make TSFlags a uintptr_t if you want that to work, but then you only get 32 bits on 32-bit architectures.

Fair enough.

Then I think another way will be reusing step 1~3 in my algorithm but generate a function for MC code to access auxiliary data structures.

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

jrtc27 added a comment.Jan 2 2021, 3:09 PM

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

That'd work, yes, so long as you either have the flag bit being _zero_ mean it's a pointer or you put the flag bit sufficiently low down (preferably bit 0) as otherwise you'll take the pointer way outside the bounds of the corresponding allocation by setting a high bit; in order to fit the bounds and 64-bit address in 128 bits we compress the bounds and rely on pointers not going "too far" out of bounds (anything other than one-past-the-end is UB in C/C++, but we relax that somewhat for compatibility, roughly in proportion to the size of the allocation), but if they do then they're marked invalid and won't work later even though you mask the bit out.

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

jrtc27 added a comment.Jan 2 2021, 5:07 PM

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

Yes, that should all be fine.

myhsu added a comment.Jan 2 2021, 5:36 PM

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

That'd work, yes, so long as you either have the flag bit being _zero_ mean it's a pointer or you put the flag bit sufficiently low down (preferably bit 0) as otherwise you'll take the pointer way outside the bounds of the corresponding allocation by setting a high bit; in order to fit the bounds and 64-bit address in 128 bits we compress the bounds and rely on pointers not going "too far" out of bounds (anything other than one-past-the-end is UB in C/C++, but we relax that somewhat for compatibility, roughly in proportion to the size of the allocation), but if they do then they're marked invalid and won't work later even though you mask the bit out.

Sounds like a plan, I will update this patch accordingly

Thanks for the brainstorming

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

That'd work, yes, so long as you either have the flag bit being _zero_ mean it's a pointer or you put the flag bit sufficiently low down (preferably bit 0) as otherwise you'll take the pointer way outside the bounds of the corresponding allocation by setting a high bit; in order to fit the bounds and 64-bit address in 128 bits we compress the bounds and rely on pointers not going "too far" out of bounds (anything other than one-past-the-end is UB in C/C++, but we relax that somewhat for compatibility, roughly in proportion to the size of the allocation), but if they do then they're marked invalid and won't work later even though you mask the bit out.

Sounds like a plan, I will update this patch accordingly

Thanks for the brainstorming

I'm not sure I follow the plan. So we're going to change the TableGen InstrInfoEmitter to put out a pointer for this target instead of the normal TSFlags constant every other target uses?

Why would refactoring TSFlags into a int-ptr union just for M68k be better than adding the CodeBeads functionality just for M68k?

@RKSimon has a good question that I will leave to others to debate.

@craig.topper Actually, a pointer is a bad idea. Instead, if it is necessary to have a separate blob of data, simply use the TSFlags field as an index into a separate table of instances. It's already a uint64_t, which makes a fine index.

I presume the use of this field and the separate table would be triggered by something in a TableGen file, not just hardwired for the M86K target.

@RKSimon has a good question that I will leave to others to debate.

@craig.topper Actually, a pointer is a bad idea. Instead, if it is necessary to have a separate blob of data, simply use the TSFlags field as an index into a separate table of instances. It's already a uint64_t, which makes a fine index.

I presume the use of this field and the separate table would be triggered by something in a TableGen file, not just hardwired for the M86K target.

Isn't this issue something that can be resolved once the backend has been merged?

It would be great if the backend could finally be available as an experimental backend as there multiple downstream projects like the Clang Kernel project that want to test this backend.

Why would refactoring TSFlags into a int-ptr union just for M68k be better than adding the CodeBeads functionality just for M68k?

I think the question is more of a "where does this belong" than "why are we doing this".

If we want to auto-generate tables at compile time, then table-gen is your friend. If this should be a bit more dynamic, then working around auto-generated tables can be cumbersome and counter productive.

The other question is "when should we do this".

We already have an implementation (CodeBeads) which I'm assuming came from an existing implementation, so (still assuming) we "know it works with m68k". The table-gen approach is so far an idea that "could" work. Working with table-gen is non-trivial.

In interest of doing one thing at a time, I'd prefer to keep the existing implementation for now and do a refactor later. As I see it, the refactor should only affect the m68k back-end.

In summary, my personal view is:

  • We have something that works now and the proposed alternative is equally unique
  • We should answer the "where" question before we go on implementing a new style
  • If it's decided a table-gen implementation fits better, we do it as a refactory, after the merge

Putting a bit TODO comment on the current CodeBeads code will help others not try to rely on it in the interim. But if they do, then this would have proven it's more than just m68k.

FWIW I still think we're better off keeping (m68k only) CodeBeads rather than refactoring a lot of code that will affect mainstream targets. After m68k has been pushed, we can reinvestigate whether to keep CodeBeads (and whether other targets would benefit from using it), or moving m68k to a refactored TSFlag mechanism - we can make either a pre-requisite for it losing its experimental status if necessary.

FWIW I still think we're better off keeping (m68k only) CodeBeads rather than refactoring a lot of code that will affect mainstream targets. After m68k has been pushed, we can reinvestigate whether to keep CodeBeads (and whether other targets would benefit from using it), or moving m68k to a refactored TSFlag mechanism - we can make either a pre-requisite for it losing its experimental status if necessary.

I very much agree with that stance and I think this is the most straight-forward approach. I think getting the backend merged so becomes visible to a broader audience - both testers and developers - will help improve the quality and squeeze out more bugs. After all, it's supposed to be marked as experimental so it's expected to not be production-ready yet.

FWIW I still think we're better off keeping (m68k only) CodeBeads rather than refactoring a lot of code that will affect mainstream targets. After m68k has been pushed, we can reinvestigate whether to keep CodeBeads (and whether other targets would benefit from using it), or moving m68k to a refactored TSFlag mechanism - we can make either a pre-requisite for it losing its experimental status if necessary.

I very much agree with that stance and I think this is the most straight-forward approach. I think getting the backend merged so becomes visible to a broader audience - both testers and developers - will help improve the quality and squeeze out more bugs. After all, it's supposed to be marked as experimental so it's expected to not be production-ready yet.

Yeah my concern is not about m68k doing weird things, it's about the code that's added in the target-independent places to support m68k that I am most concerned with, as it affects every target regardless of whether m68k is enabled.

The impact on existing targets is pretty minimal - I suppose the logical operand mappings code is the most exposed part? The CodeBeads code is relatively self-contained.

I don't think we'll gain much by trying to iterate on this further, I'd recommend accepting the patch at this point.

Yeah my concern is not about m68k doing weird things, it's about the code that's added in the target-independent places to support m68k that I am most concerned with, as it affects every target regardless of whether m68k is enabled.

Exactly, that's why I prefer to keep the CodeBeads implementation for now (that only affects the m68k target) than refactor table-gen (that would affect all).

We should avoid mixing generic refactory with experimental code, unless the refactory is a clear benefit to all targets, not just the experimental one.

I concur with @RKSimon that we should approve this code as is and get the target in.

Yeah my concern is not about m68k doing weird things, it's about the code that's added in the target-independent places to support m68k that I am most concerned with, as it affects every target regardless of whether m68k is enabled.

Exactly, that's why I prefer to keep the CodeBeads implementation for now (that only affects the m68k target) than refactor table-gen (that would affect all).

We should avoid mixing generic refactory with experimental code, unless the refactory is a clear benefit to all targets, not just the experimental one.

I concur with @RKSimon that we should approve this code as is and get the target in.

That sounds reasonable (but +1 to @RKSimon 's comment about reviewing all this "special sauce" complexity before graduating from experimental)

rengolin accepted this revision.Jan 15 2021, 1:58 PM

That sounds reasonable (but +1 to @RKSimon 's comment about reviewing all this "special sauce" complexity before graduating from experimental)

Agreed! It should be resolved before that.

@myhsu Please can you raise a bug covering the CodeBeads vs TSFlags refactor options and make it a blocker against a second bug for making the m68k backend non-experimental?

myhsu added a comment.Jan 15 2021, 2:08 PM

@myhsu Please can you raise a bug covering the CodeBeads vs TSFlags refactor options and make it a blocker against a second bug for making the m68k backend non-experimental?

Absolutely. I agree this problem should be solve before graduate from experimental target. I will create these bugs soon

rengolin added a comment.EditedJan 16 2021, 5:22 AM

Absolutely. I agree this problem should be solve before graduate from experimental target. I will create these bugs soon

You could also create a meta bug called something like "m68k in production" that all bugs that needed to be solved before we move the target to production are dependencies.

People filling bugs against the target can link to that meta bug if the issue is serious enough.

You could also create infrastructure bugs like creating buildbots, passing certain milestones in compiling programs and OSs, etc.

It'd be easy then to see that the target is good enough to be promoted to production once all those bugs (or at least all critical ones) are solved.

Absolutely. I agree this problem should be solve before graduate from experimental target. I will create these bugs soon

You could also create a meta bug called something like "m68k in production" that all bugs that needed to be solved before we move the target to production are dependencies.

People filling bugs against the target can link to that meta bug if the issue is serious enough.

You could also create infrastructure bugs like creating buildbots, passing certain milestones in compiling programs and OSs, etc.

It'd be easy then to see that the target is good enough to be promoted to production once all those bugs (or at least all critical ones) are solved.

Alright, I've created bug 48792 for tracking it.

I've also created a meta/umbrella bug 48791 for tracking the status of graduating M68k from experimental target.

I will create more bugs regarding milestones and more OS supports and link them to the meta bug.

Thanks @myhsu!

@jrtc27 - you're still blocking this - is there anything else you think we need to address here?

jrtc27 accepted this revision.Jan 18 2021, 11:21 AM

Thanks @myhsu!

@jrtc27 - you're still blocking this - is there anything else you think we need to address here?

For this review, no. There are a bunch of outstanding issues on the other reviews though that need addressing; I don't know if it makes sense to land this now or wait and land them all in one go once the other reviews are fixed and approved.

This revision is now accepted and ready to land.Jan 18 2021, 11:21 AM

Thanks @myhsu!

@jrtc27 - you're still blocking this - is there anything else you think we need to address here?

For this review, no. There are a bunch of outstanding issues on the other reviews though that need addressing; I don't know if it makes sense to land this now or wait and land them all in one go once the other reviews are fixed and approved.

The plan is for the entire patch series to be accepted before any are pushed. Given how close we are to the next release branch (26 Jan iirc) I'd like to suggest that even if they all get accepted before then, we should delay the pushes to shortly after that.

The plan is for the entire patch series to be accepted before any are pushed.

Indeed. The whole purpose of patch series is to create a dependency that we can only merge any patch once they're all approved.

Given how close we are to the next release branch (26 Jan iirc) I'd like to suggest that even if they all get accepted before then, we should delay the pushes to shortly after that.

Good point.

myhsu added a comment.Jan 19 2021, 9:27 PM

Thanks @myhsu!

@jrtc27 - you're still blocking this - is there anything else you think we need to address here?

For this review, no. There are a bunch of outstanding issues on the other reviews though that need addressing; I don't know if it makes sense to land this now or wait and land them all in one go once the other reviews are fixed and approved.

The plan is for the entire patch series to be accepted before any are pushed.

Given how close we are to the next release branch (26 Jan iirc) I'd like to suggest that even if they all get accepted before then, we should delay the pushes to shortly after that.

Good point! I'll wait until the release is over

@myhsu I think you will also need a Patch 0/8 - adding you as the m68k code owner :)

@myhsu I think you will also need a Patch 0/8 - adding you as the m68k code owner :)

Yes you're right :-) Will do.

Some minor remaining issues, but this is fine to commit once those are fixed IMO.

llvm/utils/TableGen/CodeBeadsGen.cpp
18

Still outstanding

55
95–96

Triple-slash is for doxygen

llvm/utils/TableGen/InstrInfoEmitter.cpp
512
531
610
myhsu updated this revision to Diff 326500.Feb 25 2021, 2:15 PM
myhsu marked 7 inline comments as done.
  • Addressed feedbacks
jrtc27 accepted this revision.Mar 3 2021, 6:06 AM
This revision was landed with ongoing or failed builds.Mar 8 2021, 12:33 PM
This revision was automatically updated to reflect the committed changes.