This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Print target address instead of call/j/br pc-rel immediate
AbandonedPublic

Authored by sperezglz on Dec 13 2019, 12:49 AM.

Details

Summary

binutils:
400436: 74 02 je 40043a <_init+0x12>

llvm-objdump currently :
400436: 74 02 je 0x2 <_init+0x12>

Even though the symbol + offset is printed with llvm-objdump I believe it is more intuitive to see the target address as well.

Diff Detail

Event Timeline

sperezglz created this revision.Dec 13 2019, 12:49 AM
sperezglz created this object with visibility "sperezglz (Sergio Perez Gonzalez)".
sperezglz created this object with edit policy "sperezglz (Sergio Perez Gonzalez)".
Herald added a project: Restricted Project. · View Herald TranscriptDec 13 2019, 12:49 AM
sperezglz changed the visibility from "sperezglz (Sergio Perez Gonzalez)" to "Public (No Login Required)".Dec 13 2019, 9:08 AM
sperezglz changed the edit policy from "sperezglz (Sergio Perez Gonzalez)" to "All Users".

I have no issues with attempting to improve GNU compatibility, but please see my inline comments, as you appear to have degraded some of the experience for one feature that is not present in GNU objdump. Also, I'm somewhat surprised that this issue doesn't affect other targets. Do they already do the right thing?

llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
1341–1342

Rather than removing this comment, I think you should change it to say why you do nothing here.

1344

Rather than removing the syntax highlighting, I think it would be preferable to keep it, but perhaps in a different colour? @seiya added this as part of the GSOC work last summer, so he may be best placed to suggest an alternative.

Same comment applies elsewhere.

This is basically the largest blocker that prevents me from using llvm-objdump regularly / or why I am still using powerpc-linux-gnu-objdump -d, objdump -d, etc... I thought about fixing this, but it would require a lot of test update.

If lld is in your -DLLVM_ENABLE_PROJECTS= list, try ninja check-lld, there should be lots of failures. There is an llvm-objdump option --print-imm-hex which is also used to decide whether an immediate should be printed in the hexadecimal form, but the target InstrPrinter code may not consistently respect it. (I've updated many lld tests to use --print-imm-hex, but I believe there are still lots of remaining issues... After --print-imm-hex becomes the default, we should probably do another clean-up to remove the option)

Thanks for the feedback @jhenderson @MaskRay I will do as suggested and first try to scope the effort of making this consistant for all the remaining targets and check failing tests in other tools. I will also investigate if it is possible to achieve this without touching the target's InstPrinters and maybe adding a clo, or any approach that would cause less disruption and/or less effort for fixing potentially impacted tests. Get back to you as soon as I have something.

sperezglz updated this revision to Diff 235949.EditedJan 2 2020, 2:34 PM

I used a different approach in this update, adding a command line option and leaving the target's instruction printers untouched, now unless the CLO is used the output is the same as standard, that is friendlier with existing tests. The other thing I had to do was to create an "indirect call" property, only added it to x86 , this was needed because a target address shouldn't be calculated (or printed) for indirect calls and I needed a way to identify them.

bcain added a subscriber: bcain.Jan 2 2020, 2:59 PM
Jim added a subscriber: Jim.Jan 2 2020, 11:16 PM
jhenderson added inline comments.Jan 3 2020, 1:27 AM
llvm/include/llvm/MC/MCInstrDesc.h
280 ↗(On Diff #235949)

Nit: missing trailing full stop.

llvm/tools/llvm-objdump/llvm-objdump.cpp
341

"tgt" is not an obvious abbreviation for "target" to me. I'd recommend just "print-target-addr" or "print-target-address".

674

This change seems unnecessary for the rest of the change?

1484

LLVM style is to use upper-case for variable names (i.e. splitInstStream -> SplitInstStream). It's also not immediately obvious what auto represents here, so consider replacing it with the full type.

That all being said, I'm not especially keen on this approach, as it seems to tie the code here to how the instructions are printed inside the corresponding instruction printer. It feels like something that could break very easily to me. That being said, I don't have a clear suggestion for an alternative at this point. @MaskRay might do.

1485–1489

You can reduce some of the duplication here by adding the comment and symbol details outside this if:

if (PrintTargetAddress) {
  auto splitInstStream = InstPrintStream.str().rsplit('\t');
  outs() << splitInstStream.first << '\t' << Twine::utohexstr(Target);
} else {
  outs() << InstPrintStream.str();
}
outs() << CommentStream.str() << SymbolDetailsStream.str();
1486

Same comment as above. InstPrinterBuffer is not going to be used again, so why clear it?

Note that the Comments string and stream are constructed outside the loop, so need to be cleared in order to be reused. You may wish to do the same thing with the other streams for performance reasons (though I don't know for sure).

1491

What's the point of this line? It's about to go out of scope.

sperezglz added inline comments.Jan 3 2020, 9:01 AM
llvm/include/llvm/MC/MCInstrDesc.h
280 ↗(On Diff #235949)

Sure , I'll add it.

llvm/tools/llvm-objdump/llvm-objdump.cpp
341

Yes that makes sense.

674

This was necessary because this function is used in the Hexagon prettyPrinter and it was printing directly to "outs()" by default while I'm sending everything to a temporary buffer so I had to add a parameter to make it flexible, this came out on a couple of Hexagon tests failing, but with this change those work fine.

1484

Yes the variable name was a typo I'll change it and add explicit type to the variable.

About the approach, in my opinion the main advantage is that the core of changes is centralized to one file and does not involve changing every target Inst Printer, the hard requirement is that there is a "\t" between instruction and immediate.

I'll fix the stuff pointed out by the comments and leave it for discussion and reference in case other suggestions come up.

1485–1489

You are right , I'll clean it up.

1486

Yeah sorry about those two, I tried to keep it safe by clearing but lost track of the scope where they live.

MaskRay added a comment.EditedJan 3 2020, 1:48 PM

MIPS beq $9, $6, 1332
RISCV: beqz x10, 512
PPC: bl .+offset [1]

These are examples that a direct call/jump instruction may take more than one operands, or have a custom target address printing method. They demonstrate that llvm-objdump just lacks target-specific knowledge to make the dump look good. Instead, I think we should

  • Teach MCInstPrinter::printInst and MCInstrPrinter::printInstruction to take an extra parameter Address (D72172 D72180)
  • Migrate targets's *InstrInfo.td files to use a different PrintMethod, say, printBranchOperand, instead of the default printOperand.
  • In printInstruction (generated by tablegen), pass Address to printBranchOperand.
  case 1:
    // PseudoFLD, PseudoFLW, PseudoFSD, PseudoFSW, PseudoSB, PseudoSD, Pseudo...
-    printBranchOperand(MI, 1, STI, O);
+    printBranchOperand(MI, 1, Address, STI, O);
    return;
    break;

[1]: PPC currently prints bl .+offset

void PPCInstPrinter::printBranchOperand(const MCInst *MI, unsigned OpNo,
                                        raw_ostream &O) {
  if (!MI->getOperand(OpNo).isImm())
    return printOperand(MI, OpNo, O);

  // Branches can take an immediate operand.  This is used by the branch
  // selection pass to print .+8, an eight byte displacement from the PC.
  O << ".";
  int32_t Imm = SignExtend32<32>((unsigned)MI->getOperand(OpNo).getImm() << 2);
  if (Imm >= 0)
    O << "+";
  O << Imm;
}
This comment was removed by sperezglz.
sperezglz abandoned this revision.Jan 3 2020, 5:08 PM

Oh ok, I was not aware of such examples, those cases make this approach invalid. Abandoning this, thanks for the reviews.