This is an archive of the discontinued LLVM Phabricator instance.

[M68k] Implement Disassembler
ClosedPublic

Authored by ricky26 on Mar 12 2021, 11:01 AM.

Details

Summary

This is a work-in-progress implementation of a disassembler for M68k. There are not yet any
new tests for the disassembler. I've left them until there's a general consensus that the structure
of this change is acceptable.

Questions I'm aware of:

  • Should this use Motorola or gas syntax? (At the moment it uses Motorola syntax.)
  • The disassembler produces a table at runtime for disassembly generated from the code beads. Is this okay? (This is less than ideal but as I mentioned in my llvm-dev post, it's quite complicated to write a table-gen parser for code beads.)

Diff Detail

Event Timeline

ricky26 created this revision.Mar 12 2021, 11:01 AM
myhsu added a subscriber: myhsu.Mar 16 2021, 2:30 PM
ricky26 published this revision for review.Mar 17 2021, 4:49 PM

I've marked this 'Ready for Review' since otherwise notifications don't happen. To be clear though, this is not ready to merge.

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2021, 4:49 PM
ricky26 updated this revision to Diff 335048.Apr 2 2021, 5:08 PM

Rebased, fixed some bugs & added some tests.

RKSimon added a subscriber: RKSimon.Apr 3 2021, 4:19 AM

Minor fly-by coding style comment

llvm/lib/Target/M68k/Disassembler/M68kDisassembler.cpp
63

(style) Don't re-evaluate a fixed array length.

for (unsigned I = 0, E = Buffer.size(); I != E; ++I) {`

Alternatively, AFAICT you might be able to use a for-range loop:

for (const auto &BufferValue : Buffer)
ricky26 updated this revision to Diff 337431.Apr 14 2021, 6:17 AM

Improve code style, add encoding tests to instructions.s.

ricky26 retitled this revision from WIP: [M68k] Implement Disassembler to [M68k] Implement Disassembler.Apr 14 2021, 6:17 AM

I think this is now ready for review proper.

I've done a pass tidying up the code style and I've added some encoding checks to the MC test added in the assembler.

llvm/lib/Target/M68k/Disassembler/M68kDisassembler.cpp
63

I did go through all of the loops and I only found one case where I wasn't doing something funny with the iterator (clamping the length for arrays, stepping by 2, for ex.). If I've missed any please, shout!

ricky26 updated this revision to Diff 337434.Apr 14 2021, 6:29 AM

Add loop changes missed from previous diff.

myhsu added a comment.EditedApr 14 2021, 3:52 PM

First of all, good job :-)
Though I only scanned over the patch, I'm a little concerned about the adoption of code beads here because ideally, we are going to remove code beads from LLVM in the future. As stated in one of my comments, can we get any help from the disassembler table generated by TableGen?

Also can you split any changes regarding assembly syntax into a separate patch?

llvm/lib/Target/M68k/CMakeLists.txt
14

Do we use this table anywhere? If not, is there any reason we can't use it?

llvm/lib/Target/M68k/Disassembler/CMakeLists.txt
5

Since you use getTheM68kTarget here, please explicitly link M68kInfo. I bumped into a related problem in M68kAsmParser where the linker is unable to find getTheM68kTarget symbol if you build LLVM libraries as shared objects.

llvm/lib/Target/M68k/Disassembler/M68kDisassembler.cpp
310

Is there any specific reason to use a single-case switch rather than if-statement here?

381

Can we use explicit type here

402

ditto

llvm/lib/Target/M68k/MCTargetDesc/M68kInstPrinter.cpp
196 ↗(On Diff #337434)

Can we put these changes into a separate patch (possibility combined with your changes on tests below)?
If the changes here affect the disassembler test, then maybe you can put it (the separate patch) as the dependency for the current disassembler patch.

llvm/test/MC/M68k/instructions.s
11 ↗(On Diff #337434)

Test directives related to instruction encoding is irrelevant to disassembler. Please remove them from this patch.

36 ↗(On Diff #337434)

Can we put changes on tests into a separate patch?

ricky26 updated this revision to Diff 338285.Apr 17 2021, 3:25 AM

Apply review feedback.

ricky26 updated this revision to Diff 338288.Apr 17 2021, 3:44 AM
ricky26 marked an inline comment as done.

Shorten switch to a single assert.

ricky26 marked 4 inline comments as done.Apr 17 2021, 3:45 AM

First of all, good job :-)
Though I only scanned over the patch, I'm a little concerned about the adoption of code beads here because ideally, we are going to remove code beads from LLVM in the future.

I agree but this is the path of least resistance and I don't think it's too much 'wasted work'. I'm hoping when the encoding mechanism is changed, it will be staged (introduce new encoding in table-gen, start moving components over in later patches). In which case I don't think the disassembler will get in the way. The benefit being that with this quite shallow approach (using the beads to build bitmasks) we get the extra functioning surface area of the disassembler in the mean time (and at least some of the disassembler won't need to be changed with the encoding rework).

If you think it's better to try and solve the encoding setup first, I have no qualms. I'm pretty interested in helping with the encoding rework too if I can. :)

As stated in one of my comments, can we get any help from the disassembler table generated by TableGen?

I forgot I included this. I had a look when I initially started poking around the disassembler but it looks like it is generated for instructions with the Inst field set.

There are some overrides (X86 being notable) and I considered writing an M68k extension but there're a fair few pieces of information that aren't available in table-gen which we need to parse the beads and produce a disassembly table. I figured that we would migrate over to using that (probably still with a bit of a custom extension) once we reworked the encoding.

ricky26 updated this revision to Diff 338347.Apr 17 2021, 5:56 PM

Fix up the other switch statement that I somehow missed.

First of all, good job :-)
Though I only scanned over the patch, I'm a little concerned about the adoption of code beads here because ideally, we are going to remove code beads from LLVM in the future.

I agree but this is the path of least resistance and I don't think it's too much 'wasted work'. I'm hoping when the encoding mechanism is changed, it will be staged (introduce new encoding in table-gen, start moving components over in later patches). In which case I don't think the disassembler will get in the way. The benefit being that with this quite shallow approach (using the beads to build bitmasks) we get the extra functioning surface area of the disassembler in the mean time (and at least some of the disassembler won't need to be changed with the encoding rework).

If you think it's better to try and solve the encoding setup first, I have no qualms. I'm pretty interested in helping with the encoding rework too if I can. :)

Nah, let's sort out the disassembler first.

As stated in one of my comments, can we get any help from the disassembler table generated by TableGen?

I forgot I included this. I had a look when I initially started poking around the disassembler but it looks like it is generated for instructions with the Inst field set.

There are some overrides (X86 being notable) and I considered writing an M68k extension but there're a fair few pieces of information that aren't available in table-gen which we need to parse the beads and produce a disassembly table. I figured that we would migrate over to using that (probably still with a bit of a custom extension) once we reworked the encoding.

You're right, it's probably better to consider this option after the encoding rework

myhsu accepted this revision.Apr 19 2021, 10:45 AM

conditionally LGTM.
Please fix the formatting issues before committing it. I think all of them are related to this coding style rule.

llvm/lib/Target/M68k/Disassembler/M68kDisassembler.cpp
210

remove curly braces since there is only one statement in the body

219

ditto

239

remove curly braces since there is only one statement in the body

255

ditto

336

remove curly braces since there is only one statement in the body

340

ditto

389

remove curly braces since there is only one statement in the body

410

remove curly braces since there is only one statement in the body

424

remove curly braces since there is only one statement in the body

432

ditto

472

remove curly braces since there is only one statement in the body

480

ditto

493

ditto

570

remove curly braces since there is only one statement in the body

This revision is now accepted and ready to land.Apr 19 2021, 10:45 AM
ricky26 updated this revision to Diff 338636.Apr 19 2021, 2:07 PM
ricky26 marked 14 inline comments as done.

Fixed up code style.

ricky26 updated this revision to Diff 338637.Apr 19 2021, 2:10 PM

Fixup one missed instance of the "don't use braces" rule.

This revision was landed with ongoing or failed builds.Apr 19 2021, 2:28 PM
This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.Sep 15 2021, 7:05 AM
llvm/lib/Target/M68k/Disassembler/M68kDisassembler.cpp
448

@ricky26 coverity is warning that the NumToRead = 32 case results in UB when shifting Scratch.

Could this be:

Scratch = NumRead >= 32 ? 0 : (Scratch << NumToRead);
Scratch |= Reader.readBits(NumToRead);
ricky26 added inline comments.Sep 17 2021, 2:10 PM
llvm/lib/Target/M68k/Disassembler/M68kDisassembler.cpp
448

Thanks, I always forget that shifting the full width is UB. (Is coverity something I should be watching for my changes?)

Does LLVM support host compilers with non-32-bit ints? If so, the logic will need to be more complicated.

Something like Scratch = (NumRead < (sizeof(unsigned) * 8)) ? (Scratch << NumToRead) : 0;?

If this isn't high priority I'll try and land a patch for it sometime over the coming week.

Thanks!

RKSimon added inline comments.Sep 18 2021, 12:29 AM
llvm/lib/Target/M68k/Disassembler/M68kDisassembler.cpp
448

I don't think LLVM would survive building on non-32-bit int target......

Its not high priority, but I do try to keep an eye on static analysis reports as they do sometimes raise interesting issues. Potential out of range shift amount warnings are quite common in the llvm codebase.

You can request access to the coverity reports via: https://scan.coverity.com/projects/llvm