This is an archive of the discontinued LLVM Phabricator instance.

[MC] Support returning marked up ranges in the disassembly
AcceptedPublic

Authored by seiya on Jul 23 2019, 10:21 PM.

Details

Summary

This patch introduces MCInstPrinter::setMarkupSpans, which allows the caller to
use the output of rich disassembly without parsing its output string.

An upcoming patch replaces markup() with startMarkup(), endMarkup(), and withMarkup().

This feature will be used for a disassembly highlighting feature in llvm-objdump.

Event Timeline

seiya created this revision.Jul 23 2019, 10:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2019, 10:21 PM
jhenderson added inline comments.Jul 24 2019, 6:02 AM
llvm/include/llvm/MC/MCInstPrinter.h
44

Doxygen style comments please, to match the rest of the file.

64

Why is this a unique_ptr? Can't you just have a std::vector here directly?

80

This isn't a stack, it's a vector. Use the correct terminology for the type.

Could it be a vector of ArrayRefs?

83

Why not just Spans?

85–86

Don't use names with trailing underscores. Just do NewEnabled and NewSpans or whatever.

178

I'm not sure this comment makes sense. Also, the comments here should be in doxygen style, to match existing members.

Perhaps "Set the vector to write marked up ranges to" would be better phrasing.

jhenderson added inline comments.Jul 24 2019, 6:02 AM
llvm/include/llvm/MC/MCInstPrinter.h
181–182

This feels like bad code design to me, since it would be incredibly easy for a use to forget to do this. How about creating printInst and printInstImpl functions. The former calls resetMarkup and then printInstImpl. The latter is the pure virtual method that sub-classes implement.

llvm/lib/MC/MCInstPrinter.cpp
156–157

I'd probably just delete the second sentence here.

162

Use C++ style comments, and start the comment with an upper-case letter. It would be better to write this in the passive too: "Length and InnerLength will be set later."

MaskRay added inline comments.
llvm/include/llvm/MC/MCInstPrinter.h
74

MarkupState can be nested in MCInstprinter. It isn't expected to be accessed by other files.

seiya updated this revision to Diff 212313.Jul 30 2019, 4:52 AM
seiya marked 11 inline comments as done.
  • Addressed some review comments.
seiya added inline comments.Jul 30 2019, 4:53 AM
llvm/include/llvm/MC/MCInstPrinter.h
80

This isn't a stack, it's a vector. Use the correct terminology for the type.

Here I use std::vector as stack. I'd like to use std::stack instead but it does not provide clear method. I've updated the comment to stress my intention.

Could it be a vector of ArrayRefs?

No. We need a pointer to a vector to append elements into the vector.

181–182

I'm totally agree with you and adding printInstImpl sounds good idea. Thanks!

seiya marked an inline comment as not done.Jul 30 2019, 4:57 AM
seiya marked an inline comment as done.Jul 30 2019, 5:09 AM
seiya added inline comments.
llvm/include/llvm/MC/MCInstPrinter.h
64

I think it should be a unique_ptr to prevent dangling pointer reference here:

auto *InnerSpans =
    const_cast<std::vector<MarkupSpan> *>(Span.InnerSpans.get());
M.State.Spans.back()->push_back(std::move(Span));
// Span is now moved to the "M.State.Spans.back()" but InnerSpans is still valid
// since Span.InnerSpans is an unique_ptr.
M.State.Spans.push_back(InnerSpans);
seiya planned changes to this revision.Jul 30 2019, 5:11 AM

I'll consider the way to eliminate MarkupState.reset so that we don't have to introduce printInstImpl to existing all InstPrinters.

seiya marked an inline comment as done.Jul 30 2019, 5:27 AM
seiya added inline comments.
llvm/include/llvm/MC/MCInstPrinter.h
64

Ah, I realized we don't need unique_ptr here. We can rewrite the code above as:

M.State.Spans.back()->push_back(std::move(Span));
M.State.Spans.push_back(&M.State.Spans.back()->back().InnerSpans);

I'll update the patch later.

seiya updated this revision to Diff 212529.Jul 31 2019, 1:42 AM
seiya marked 2 inline comments as done.
  • Removed MarkupState so that we don't have to update all MCInstPrinters.
  • Addressed remaining review comments.
seiya updated this revision to Diff 212530.Jul 31 2019, 1:44 AM
  • Fixed a comment.
MaskRay added inline comments.Jul 31 2019, 8:29 AM
llvm/lib/MC/MCInstPrinter.cpp
145

I suggest deleting this TODO. After rL166639 (2012), the document has never been touched. Actually, markup() has been adopted in every target so explicitly mentioning this may not be very necessary.

seiya updated this revision to Diff 212686.Jul 31 2019, 3:55 PM
seiya marked an inline comment as done.
  • Addressed a review comment.
jhenderson added inline comments.Aug 5 2019, 6:20 AM
llvm/include/llvm/MC/MCInstPrinter.h
145–146

You could just use while(!MarupSpans.empty()) MarkupSpans.pop(); instead of clear, if you want to use a std::stack.

seiya updated this revision to Diff 213581.Aug 6 2019, 4:31 AM
seiya marked 2 inline comments as done.
  • Addressed a review comment.
llvm/include/llvm/MC/MCInstPrinter.h
145–146

Fixed. First I thought iterating all elements (O(n)) is not good but it won't be issue because n is small (typically n < 3) here...

jhenderson accepted this revision.Aug 6 2019, 4:37 AM

LGTM.

llvm/include/llvm/MC/MCInstPrinter.h
145–146

I think vector::clear is likely O(n) time as well - it has to destroy all its elements.

This revision is now accepted and ready to land.Aug 6 2019, 4:37 AM
MaskRay accepted this revision.Aug 6 2019, 4:49 AM
MaskRay added inline comments.
llvm/lib/MC/MCInstPrinter.cpp
147

M.Spans.top()->emplace_back(M.Type, OS.tell(), 0, OS.tell() + Length, 0);

seiya updated this revision to Diff 213734.Aug 6 2019, 3:28 PM
seiya marked 3 inline comments as done.
  • Addressed a review comment.
seiya added inline comments.Aug 6 2019, 3:31 PM
llvm/include/llvm/MC/MCInstPrinter.h
145–146

There's no guarantee but I implicitly assumed that there must be an O(1) optimization for
trivially destructible types like pointers.

rupprecht accepted this revision.Aug 7 2019, 10:04 AM
rupprecht added inline comments.
llvm/include/llvm/MC/MCInstPrinter.h
72

Can you add brief class-level comments for MarkupStart/MarkupEnd too?

llvm/lib/MC/MCInstPrinter.cpp
159

Another design consideration would be to put the functionality of endMarkup() into the destructor MarkupStart::~MarkupStart()

e.g.

OS << startMarkup(MarkupType::Reg) << '%' << getRegisterName(RegNo)
     << endMarkup();

turns into:

OS << startMarkup(MarkupType::Reg) << '%' << getRegisterName(RegNo); /* ~MarkupStart() ends the markup */

Though it would require changing all the call sites so the register/operand/whatever can be printed as part of the same OS << ... expression, so I wouldn't recommend trying that until later. But it may be worth looking into to avoid programmer error of forgetting either of markupStart()/markupEnd()

seiya planned changes to this revision.Aug 7 2019, 7:44 PM
seiya marked an inline comment as done.

I'll try out the destructor approach suggested by @rupprecht.

llvm/lib/MC/MCInstPrinter.cpp
159

The destructor approach sounds good idea! I'll try it.

seiya updated this revision to Diff 214579.Aug 12 2019, 12:04 AM
seiya marked 3 inline comments as done.
  • Added WithMarkup.
  • Added class-level comments for MarkupStart and MarkupEnd.
This revision is now accepted and ready to land.Aug 12 2019, 12:04 AM
seiya added inline comments.Aug 12 2019, 12:04 AM
llvm/lib/MC/MCInstPrinter.cpp
159

I've added WithMarkup, which does the functionality of endMarkup in the destructor (just like WithColor).

You should update your patch title and description too.

llvm/include/llvm/MC/MCInstPrinter.h
74

This comment doesn't make much sense (it refers to startupMarkup() too many times, I think). Please fix it.

rupprecht added inline comments.Aug 12 2019, 4:42 PM
llvm/include/llvm/MC/MCInstPrinter.h
64

Can you include an example where you might have two InnerSpans?

80

When do you need to have multiple Spans on the stack?
Can you add test coverage for that case?

seiya edited the summary of this revision. (Show Details)Aug 12 2019, 10:53 PM
seiya updated this revision to Diff 214766.Aug 12 2019, 11:19 PM
seiya marked 3 inline comments as done.
  • Updated comments.

You should update your patch title and description too.

What do you think what the patch title should describe?

How about "[MC] Support returning marked up ranges in the disassembly" or "[MCInstPrinter] Add MarkupStart/MarkupEnd/WithMarkup?

llvm/include/llvm/MC/MCInstPrinter.h
80

When do you need to have multiple Spans on the stack?

The stack has multiple Spans if the marked up ranges are nested:

movq <reg:%rax>, <mem:256(<reg:%rdi>)>
     ^^^^^^^^^^  ^^^^^^^^^----------^^
      1st span    2nd span (*)
                          ^^^^^^^^^^
                           1st span in the (*)'s InnerSpans

In this example, when we start disassembling <reg:%rdi>, the pointer to InnerSpans of the <mem:...> span will be appended into the stack.

Can you add test coverage for that case?

test/MC/Disassembler/X86/marked-up.txt includes such a case. We don't add a test in this patch so this should be committed along with D65190, btw.

seiya marked an inline comment as done.Aug 12 2019, 11:26 PM
jhenderson accepted this revision.Aug 13 2019, 7:34 AM

You should update your patch title and description too.

What do you think what the patch title should describe?

How about "[MC] Support returning marked up ranges in the disassembly" or "[MCInstPrinter] Add MarkupStart/MarkupEnd/WithMarkup?

I think the first of your suggestions is good.

Nit: "This feature will be used for the highlighting feature in llvm-objdump." should be "This feature will be used for a disassembly highlighting feature in llvm-objdump."

seiya retitled this revision from [MC] Support returning a structured rich disassembly output to [MC] Support returning marked up ranges in the disassembly.Aug 13 2019, 6:02 PM
seiya edited the summary of this revision. (Show Details)
seiya edited the summary of this revision. (Show Details)

Updated the title and description. Thanks @jhenderson!

seiya updated this revision to Diff 215013.Aug 13 2019, 7:15 PM

This is the last change to this patch if it's accepted.

seiya updated this revision to Diff 215016.Aug 13 2019, 7:26 PM
  • Improved comments for WithMarkup a bit.
jhenderson accepted this revision.Aug 14 2019, 7:59 AM

LGTM. No need to get further review from my point of view after you've fixed my remaining nit.

llvm/include/llvm/MC/MCInstPrinter.h
187–188

Nit: looks like you've got two spaces before the const in these two lines.

seiya updated this revision to Diff 215348.Aug 15 2019, 2:30 AM
  • Removed redundant whitespaces.
seiya marked an inline comment as done.Aug 15 2019, 2:30 AM
rupprecht accepted this revision.Aug 15 2019, 10:54 AM
jhenderson accepted this revision.Aug 19 2019, 6:35 AM

Does this need rebase now?