Page MenuHomePhabricator

[TableGen][M68K] (Patch 1/8) Utilities for complex instruction addressing modes: CodeBeads and logical operand helper functions
AcceptedPublic

Authored by myhsu on Sep 27 2020, 4:22 PM.

Details

Summary

This patch adds two components: The CodeBeads TG backend and the support for InstrInfoEmitter TG backend to emit information about logical operands.

CodeBeads are annotations for instructions to express _non-trivial amount_ of addressing modes in an easier way. Many instructions of M68K, like many other CISC architectures, can be used with variety of addressing modes. For example, JSR (i.e. function call) can be used with four different addressing modes. Without this CodeBeads utility to explicitly embed these information into instructions' TG definitions, it would be more difficult for MC to encode the instructions (e.g. need to write some sort of pattern matching code to figure out the addressing mode).

Another special property of CISC ISA is that a single instruction operand (called "logical operand" here) might consist of multiple llvm::MachineOperands. Thus this patch enables IntrInfoEmitter TG backend to (optionally) generate helper functions for logical operands.
More specifically, the llvm::<target NS>::getLogicalOperandSize to get the number of llvm::MachineOperand on a specific logical operand; llvm::<target NS>::getLogicalOperandIdx to get the llvm::MachineOperand index for a specific logical operand.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
myhsu added a comment.Oct 1 2020, 4:08 PM

There are plenty of lint checks that need addressing. Some are false positives and those are obvious cases, but the rest should be followed.

You must run clang-format on new files (entirely) and on localised changes of existing files (many editors have the option to select lines and format).

Interesting...the patch I updated yesterday had gone through clang-format. And I didn't see any lint warning here on Phabricator. Maybe I should run again.

Don't forget to change to the current license.

myhsu updated this revision to Diff 295702.Oct 1 2020, 5:21 PM

Update license

myhsu marked 3 inline comments as done.Oct 1 2020, 5:21 PM

Interesting...the patch I updated yesterday had gone through clang-format. And I didn't see any lint warning here on Phabricator. Maybe I should run again.

Did you check that you were using the latest llvm configuration?

myhsu updated this revision to Diff 296009.Oct 3 2020, 4:42 PM

Decouple getGenInstrBeads from <Target>MCCodeEmitter and put it as a static function in llvm::<Target namespace> with new name getMCInstrBeads

myhsu added a comment.Oct 3 2020, 4:45 PM

Interesting...the patch I updated yesterday had gone through clang-format. And I didn't see any lint warning here on Phabricator. Maybe I should run again.

Did you check that you were using the latest llvm configuration?

I made sure clang-format was fetching the the right config file. Would you mind pointing out some of the places that you think ill-format? thank you!

craig.topper added inline comments.Oct 6 2020, 8:09 PM
llvm/utils/TableGen/CMakeLists.txt
59

This list was nearly in alphabetical order. Can you put this file in the correct place.

llvm/utils/TableGen/CodeBeadsGen.cpp
51

Where do these numbers come from? Are they specific to 68K?

86

Remove commented out code?

93

You can use auto here

98

I think you can use Twine(i) here instead of std::to_string(i).

llvm/utils/TableGen/InstrInfoEmitter.cpp
466

std::max?

470

Is the find and if necessary? Won't insert avoid overwriting value if its already in the map?

474

I think you can use (Namespace + "::" + Inst->TheDef->getName()).str(). That will create a Twine for the pieces and the convert it to a std::string at the end. This should be slightly more efficient than concatenating std::strings.

497

Probably don't need to bother avoiding trailing commas in the generated file. The C++ parsing rules allow them

518

I think we should just use a range for loop here. llvm::for_each is discouraged here https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible

561

Same comment as earlier about using a Twine here.

567

std::max?

570

I think insert is sufficient.

618

range for

myhsu updated this revision to Diff 297344.Oct 9 2020, 3:38 PM
myhsu marked 13 inline comments as done.

Addressed feedbacks, mostly formatting issues and minor coding improvements

llvm/utils/TableGen/CodeBeadsGen.cpp
51

no really...this number depends on maximum bit length among all code beads. As the TODO comment on line 55 suggested we should have a way to evaluate this dynamically. Also the for loop on line 84 need this number to reverse the byte order

Paul-C-Anagnostopoulos resigned from this revision.Oct 13 2020, 3:56 PM

A minor suggestion: Use the new true/false literals in the TableGen files. I believe it makes the code easier to read.

myhsu added a reviewer: RKSimon.
RKSimon added inline comments.Nov 16 2020, 11:01 AM
llvm/utils/TableGen/CodeBeadsGen.cpp
51

Please add a comment explaining these magic numbers.

llvm/utils/TableGen/InstrInfoEmitter.cpp
489

const auto &Row

507

const auto &Instrs

581

const auto &Row

605

const auto &Insts

llvm/include/llvm/Target/Target.td
638

Before you push this, Target.td will use 'true' and 'false'.

llvm/utils/TableGen/CodeBeadsGen.cpp
34

It's conventional to call the output stream OS.

myhsu marked 4 inline comments as done.Nov 17 2020, 3:47 PM
myhsu added inline comments.
llvm/utils/TableGen/CodeBeadsGen.cpp
51

Eventually I decided to just fix that TODO so the next revision will not have that magic number anymore

myhsu updated this revision to Diff 305917.EditedNov 17 2020, 3:57 PM
myhsu marked 5 inline comments as done.
  • Rebase to upstream master
  • Addressed feedbacks
  • Update to use the latest TG syntax

Does anyone have any more comments? Otherwise I think we can provisionally accept this.

@rengolin I think you wanted to see the entire patch series to be accepted before anything gets committed?

Does anyone have any more comments? Otherwise I think we can provisionally accept this.

LGTM too, thanks!

@jrtc27, you have marked as blocked. Have all your issues been resolved?

@rengolin I think you wanted to see the entire patch series to be accepted before anything gets committed?

Definitely. We need to make sure the whole series is sound, then wait a bit to make sure no one else has any concern, then all patches can be committed one after another.

Feel free to mark any patch of the series as accepted, so that we can speed up the convergence.

RKSimon accepted this revision.Dec 11 2020, 4:20 AM

LGTM

@jrtc27 Are you happy to approve this now? You're currently blocking the patch.

jrtc27 added inline comments.Dec 20 2020, 9:44 AM
llvm/include/llvm/Target/Target.td
635

would -> will

636–637

Reading this I don't actually understand what it does. How can I map a type to an operand of an MI?

llvm/utils/TableGen/CodeBeadsGen.cpp
9

I don't understand where the name "code bead" comes from. There's no reference to it in the GCC source, nor on the internet in general that I can obviously find, so it sounds to me like something you've invented yourself. Can we please stick to more standard terminology for things? Bead is meaningless.

96

uint8_t?

119

Array decay should render the cast unnecessary?

llvm/utils/TableGen/InstrInfoEmitter.cpp
474

Still not done; the std::string construction for Namespace is not necessary.

491–492
MaskRay added inline comments.Dec 20 2020, 11:30 AM
llvm/include/llvm/Target/Target.td
635

https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments says "don't use example - "

Many older functions do not conform to the standard but newer functions should.

RKSimon added inline comments.Mon, Dec 21, 7:07 AM
llvm/utils/TableGen/CodeBeadsGen.cpp
9

Alternatively a more expansive description of code beads should be OK - e.g. are they just for helping emit CISC variable length bit encodings?

myhsu updated this revision to Diff 314140.Wed, Dec 30, 12:35 PM
myhsu marked 9 inline comments as done.
  • [NFC] Addressed feedbacks

I still don't understand what problem code beads are trying to solve that isn't already solved by existing backends like X86. Why can't you just assign operands to instruction encoding bits like a normal backend?

craig.topper added inline comments.Wed, Dec 30, 1:01 PM
llvm/utils/TableGen/CodeBeadsGen.cpp
17

speicifc -> specific

myhsu added a comment.Wed, Dec 30, 1:37 PM

I still don't understand what problem code beads are trying to solve that isn't already solved by existing backends like X86. Why can't you just assign operands to instruction encoding bits like a normal backend?

Currently X86 - probably the only Target that shares a similar problem - uses MCInstrDesc::TSFlags to carry complex encoding info (maybe @craig.topper can shed some lights here).

TSFlags is only 64-bits wide. But some of the M68k instructions need 24 bytes to carry its encoding info. Frankly speaking, this is simply because we didn't come up with a optimal way to fit every info into 64 bits when we first created this code base long time ago (I don't think M68k has much more complex instruction formats than X86).

I'm happy to find if there is a way to fit all encoding info into TSFlags - and that will bring major code changes to this patch series for sure - if everyone here think this is the approach we should take.

Do you have any idea how much work would be necessary for the TSFlags refactor?

@rengolin @craig.topper any thoughts?

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct. It would be unfortunate to increase the size of MCInstrDesc, though. I have been playing with the idea of changing the three pointer members to indexes in order to save memory, but I'm not yet convinced it's worth the effort. There are about 67,000 instances across the targets.

myhsu added a comment.EditedSat, Jan 2, 2:48 PM

Do you have any idea how much work would be necessary for the TSFlags refactor?

I think it is pretty straight forward to switch our MC code to use MCInstrDesc::TSFlags -- if we managed to rewrite our TG files to use TSFlags, which is a bigger problem. Adapting our TG code to use TSFlags requires modifying most the instruction definitions, as far as I can say.

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct.

This actually gives me an idea to directly use the MCInstrDesc::TSFlags as a pointer to auxiliary data structures. That is:

auto* AuxTable =  reinterpret_cast<uint8_t*>(MD.TSFlags);

That means we also need to modify InstrInfoEmitter. Here is my preliminary plan:

  1. In the Instruction TG class, add a new string field, ExtTSFlagsFieldName. That's say the value of this field is set to "foo"
  2. If TSFlags in Instruction TG class is 0xFF...FF, InstrInfoEmitter will lookup field "foo". This "foo" field needs to be a bits, but the size need not to be 64 bits.
  3. Following up step 2, InstrInfoEmitter construct auxiliary data structures according to the value of "foo" field.
  4. Assign pointer to auxiliary data structure as the value of MCInstrDesc::TSFlags.

The reason there is an indirection on step 1 and 2 (i.e. use ExtTSFalgsFieldName to lookup auxiliary data's field name) is because TG's bits type can only have fixed size but we want downstream users to specify the size of their auxiliary data structure.

Biggest advantage of this approach is that we barely need to modify our TG code. We don't need to increase the size of MCInstrDesc either

I'm definitely not a TG expert, so @Paul-C-Anagnostopoulos please speak out if you have any suggestion on this approach :-)

jrtc27 added a comment.Sat, Jan 2, 2:52 PM

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct.

This actually gives me an idea to directly use the MCInstrDesc::TSFlags as a pointer to auxiliary data structures. That is:

auto* AuxTable =  reinterpret_cast<uint8_t*>(MD.TSFlags);

Definitely not. Integers are not pointers, and will break on CHERI. You would need to make TSFlags a uintptr_t if you want that to work, but then you only get 32 bits on 32-bit architectures.

myhsu added a comment.Sat, Jan 2, 3:00 PM

If it isn't possible to encode the information in TSFlags, another possibility is to add a pointer to the MCInstrDesc class that points to some kind of auxiliary class/struct.

This actually gives me an idea to directly use the MCInstrDesc::TSFlags as a pointer to auxiliary data structures. That is:

auto* AuxTable =  reinterpret_cast<uint8_t*>(MD.TSFlags);

Definitely not. Integers are not pointers, and will break on CHERI. You would need to make TSFlags a uintptr_t if you want that to work, but then you only get 32 bits on 32-bit architectures.

Fair enough.

Then I think another way will be reusing step 1~3 in my algorithm but generate a function for MC code to access auxiliary data structures.

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

jrtc27 added a comment.Sat, Jan 2, 3:09 PM

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

That'd work, yes, so long as you either have the flag bit being _zero_ mean it's a pointer or you put the flag bit sufficiently low down (preferably bit 0) as otherwise you'll take the pointer way outside the bounds of the corresponding allocation by setting a high bit; in order to fit the bounds and 64-bit address in 128 bits we compress the bounds and rely on pointers not going "too far" out of bounds (anything other than one-past-the-end is UB in C/C++, but we relax that somewhat for compatibility, roughly in proportion to the size of the allocation), but if they do then they're marked invalid and won't work later even though you mask the bit out.

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

jrtc27 added a comment.Sat, Jan 2, 5:07 PM

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

Yes, that should all be fine.

myhsu added a comment.Sat, Jan 2, 5:36 PM

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

That'd work, yes, so long as you either have the flag bit being _zero_ mean it's a pointer or you put the flag bit sufficiently low down (preferably bit 0) as otherwise you'll take the pointer way outside the bounds of the corresponding allocation by setting a high bit; in order to fit the bounds and 64-bit address in 128 bits we compress the bounds and rely on pointers not going "too far" out of bounds (anything other than one-past-the-end is UB in C/C++, but we relax that somewhat for compatibility, roughly in proportion to the size of the allocation), but if they do then they're marked invalid and won't work later even though you mask the bit out.

Sounds like a plan, I will update this patch accordingly

Thanks for the brainstorming

I was thinking of using a bit in the other flags member, Flags, as opposed to TSFlags. Then there would be nothing sneaky going on in the union itself.

Making it an anonymous union would mean that all current references to TSFlags would still work, correct?

@jrtc27 Will some sort of union of a uint64_t and a pointer work? One of the Flags bits could specify which one it is.

I agree that it would be cleaner to encode the information as flags, if possible.

That'd work, yes, so long as you either have the flag bit being _zero_ mean it's a pointer or you put the flag bit sufficiently low down (preferably bit 0) as otherwise you'll take the pointer way outside the bounds of the corresponding allocation by setting a high bit; in order to fit the bounds and 64-bit address in 128 bits we compress the bounds and rely on pointers not going "too far" out of bounds (anything other than one-past-the-end is UB in C/C++, but we relax that somewhat for compatibility, roughly in proportion to the size of the allocation), but if they do then they're marked invalid and won't work later even though you mask the bit out.

Sounds like a plan, I will update this patch accordingly

Thanks for the brainstorming

I'm not sure I follow the plan. So we're going to change the TableGen InstrInfoEmitter to put out a pointer for this target instead of the normal TSFlags constant every other target uses?

Why would refactoring TSFlags into a int-ptr union just for M68k be better than adding the CodeBeads functionality just for M68k?

@RKSimon has a good question that I will leave to others to debate.

@craig.topper Actually, a pointer is a bad idea. Instead, if it is necessary to have a separate blob of data, simply use the TSFlags field as an index into a separate table of instances. It's already a uint64_t, which makes a fine index.

I presume the use of this field and the separate table would be triggered by something in a TableGen file, not just hardwired for the M86K target.

@RKSimon has a good question that I will leave to others to debate.

@craig.topper Actually, a pointer is a bad idea. Instead, if it is necessary to have a separate blob of data, simply use the TSFlags field as an index into a separate table of instances. It's already a uint64_t, which makes a fine index.

I presume the use of this field and the separate table would be triggered by something in a TableGen file, not just hardwired for the M86K target.

Isn't this issue something that can be resolved once the backend has been merged?

It would be great if the backend could finally be available as an experimental backend as there multiple downstream projects like the Clang Kernel project that want to test this backend.

Why would refactoring TSFlags into a int-ptr union just for M68k be better than adding the CodeBeads functionality just for M68k?

I think the question is more of a "where does this belong" than "why are we doing this".

If we want to auto-generate tables at compile time, then table-gen is your friend. If this should be a bit more dynamic, then working around auto-generated tables can be cumbersome and counter productive.

The other question is "when should we do this".

We already have an implementation (CodeBeads) which I'm assuming came from an existing implementation, so (still assuming) we "know it works with m68k". The table-gen approach is so far an idea that "could" work. Working with table-gen is non-trivial.

In interest of doing one thing at a time, I'd prefer to keep the existing implementation for now and do a refactor later. As I see it, the refactor should only affect the m68k back-end.

In summary, my personal view is:

  • We have something that works now and the proposed alternative is equally unique
  • We should answer the "where" question before we go on implementing a new style
  • If it's decided a table-gen implementation fits better, we do it as a refactory, after the merge

Putting a bit TODO comment on the current CodeBeads code will help others not try to rely on it in the interim. But if they do, then this would have proven it's more than just m68k.

FWIW I still think we're better off keeping (m68k only) CodeBeads rather than refactoring a lot of code that will affect mainstream targets. After m68k has been pushed, we can reinvestigate whether to keep CodeBeads (and whether other targets would benefit from using it), or moving m68k to a refactored TSFlag mechanism - we can make either a pre-requisite for it losing its experimental status if necessary.

FWIW I still think we're better off keeping (m68k only) CodeBeads rather than refactoring a lot of code that will affect mainstream targets. After m68k has been pushed, we can reinvestigate whether to keep CodeBeads (and whether other targets would benefit from using it), or moving m68k to a refactored TSFlag mechanism - we can make either a pre-requisite for it losing its experimental status if necessary.

I very much agree with that stance and I think this is the most straight-forward approach. I think getting the backend merged so becomes visible to a broader audience - both testers and developers - will help improve the quality and squeeze out more bugs. After all, it's supposed to be marked as experimental so it's expected to not be production-ready yet.

FWIW I still think we're better off keeping (m68k only) CodeBeads rather than refactoring a lot of code that will affect mainstream targets. After m68k has been pushed, we can reinvestigate whether to keep CodeBeads (and whether other targets would benefit from using it), or moving m68k to a refactored TSFlag mechanism - we can make either a pre-requisite for it losing its experimental status if necessary.

I very much agree with that stance and I think this is the most straight-forward approach. I think getting the backend merged so becomes visible to a broader audience - both testers and developers - will help improve the quality and squeeze out more bugs. After all, it's supposed to be marked as experimental so it's expected to not be production-ready yet.

Yeah my concern is not about m68k doing weird things, it's about the code that's added in the target-independent places to support m68k that I am most concerned with, as it affects every target regardless of whether m68k is enabled.

The impact on existing targets is pretty minimal - I suppose the logical operand mappings code is the most exposed part? The CodeBeads code is relatively self-contained.

I don't think we'll gain much by trying to iterate on this further, I'd recommend accepting the patch at this point.

Yeah my concern is not about m68k doing weird things, it's about the code that's added in the target-independent places to support m68k that I am most concerned with, as it affects every target regardless of whether m68k is enabled.

Exactly, that's why I prefer to keep the CodeBeads implementation for now (that only affects the m68k target) than refactor table-gen (that would affect all).

We should avoid mixing generic refactory with experimental code, unless the refactory is a clear benefit to all targets, not just the experimental one.

I concur with @RKSimon that we should approve this code as is and get the target in.

Yeah my concern is not about m68k doing weird things, it's about the code that's added in the target-independent places to support m68k that I am most concerned with, as it affects every target regardless of whether m68k is enabled.

Exactly, that's why I prefer to keep the CodeBeads implementation for now (that only affects the m68k target) than refactor table-gen (that would affect all).

We should avoid mixing generic refactory with experimental code, unless the refactory is a clear benefit to all targets, not just the experimental one.

I concur with @RKSimon that we should approve this code as is and get the target in.

That sounds reasonable (but +1 to @RKSimon 's comment about reviewing all this "special sauce" complexity before graduating from experimental)

rengolin accepted this revision.Fri, Jan 15, 1:58 PM

That sounds reasonable (but +1 to @RKSimon 's comment about reviewing all this "special sauce" complexity before graduating from experimental)

Agreed! It should be resolved before that.

@myhsu Please can you raise a bug covering the CodeBeads vs TSFlags refactor options and make it a blocker against a second bug for making the m68k backend non-experimental?

myhsu added a comment.Fri, Jan 15, 2:08 PM

@myhsu Please can you raise a bug covering the CodeBeads vs TSFlags refactor options and make it a blocker against a second bug for making the m68k backend non-experimental?

Absolutely. I agree this problem should be solve before graduate from experimental target. I will create these bugs soon

rengolin added a comment.EditedSat, Jan 16, 5:22 AM

Absolutely. I agree this problem should be solve before graduate from experimental target. I will create these bugs soon

You could also create a meta bug called something like "m68k in production" that all bugs that needed to be solved before we move the target to production are dependencies.

People filling bugs against the target can link to that meta bug if the issue is serious enough.

You could also create infrastructure bugs like creating buildbots, passing certain milestones in compiling programs and OSs, etc.

It'd be easy then to see that the target is good enough to be promoted to production once all those bugs (or at least all critical ones) are solved.

Absolutely. I agree this problem should be solve before graduate from experimental target. I will create these bugs soon

You could also create a meta bug called something like "m68k in production" that all bugs that needed to be solved before we move the target to production are dependencies.

People filling bugs against the target can link to that meta bug if the issue is serious enough.

You could also create infrastructure bugs like creating buildbots, passing certain milestones in compiling programs and OSs, etc.

It'd be easy then to see that the target is good enough to be promoted to production once all those bugs (or at least all critical ones) are solved.

Alright, I've created bug 48792 for tracking it.

I've also created a meta/umbrella bug 48791 for tracking the status of graduating M68k from experimental target.

I will create more bugs regarding milestones and more OS supports and link them to the meta bug.

Thanks @myhsu!

@jrtc27 - you're still blocking this - is there anything else you think we need to address here?

jrtc27 accepted this revision.Mon, Jan 18, 11:21 AM

Thanks @myhsu!

@jrtc27 - you're still blocking this - is there anything else you think we need to address here?

For this review, no. There are a bunch of outstanding issues on the other reviews though that need addressing; I don't know if it makes sense to land this now or wait and land them all in one go once the other reviews are fixed and approved.

This revision is now accepted and ready to land.Mon, Jan 18, 11:21 AM

Thanks @myhsu!

@jrtc27 - you're still blocking this - is there anything else you think we need to address here?

For this review, no. There are a bunch of outstanding issues on the other reviews though that need addressing; I don't know if it makes sense to land this now or wait and land them all in one go once the other reviews are fixed and approved.

The plan is for the entire patch series to be accepted before any are pushed. Given how close we are to the next release branch (26 Jan iirc) I'd like to suggest that even if they all get accepted before then, we should delay the pushes to shortly after that.

The plan is for the entire patch series to be accepted before any are pushed.

Indeed. The whole purpose of patch series is to create a dependency that we can only merge any patch once they're all approved.

Given how close we are to the next release branch (26 Jan iirc) I'd like to suggest that even if they all get accepted before then, we should delay the pushes to shortly after that.

Good point.