This is an archive of the discontinued LLVM Phabricator instance.

[X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form
ClosedPublic

Authored by MaskRay on Mar 22 2020, 4:10 PM.

Details

Summary
// llvm-objdump -d output (before)
400000: e8 0b 00 00 00   callq 11
400005: e8 0b 00 00 00   callq 11

// llvm-objdump -d output (after)
400000: e8 0b 00 00 00  callq 0x400010
400005: e8 0b 00 00 00  callq 0x400015

// GNU objdump -d. The lack of 0x is not ideal because it cannot be re-assembled
400000: e8 0b 00 00 00  callq 400010
400005: e8 0b 00 00 00  callq 400015

In llvm-objdump, we pass the address of the next MCInst. Ideally we
should just thread the address of the current address, unfortunately we
cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter
requires MCInstrInfo and MCContext) to get the length of the MCInst.

MCInstPrinter::printInst has other callers (e.g llvm-mc -filetype=asm, llvm-mca) which set Address to 0. They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them.

Diff Detail

Event Timeline

MaskRay created this revision.Mar 22 2020, 4:10 PM
MaskRay planned changes to this revision.Mar 22 2020, 4:19 PM
MaskRay added a comment.EditedMar 22 2020, 4:31 PM

tools/llvm-objdump/llvm-objdump.cpp currently passes the address of the current instruction to MCInstPrinter::printInst:

IP.printInst(MI, Address.Address, "", STI, OS);

x86 requires PC-offset bias. This patch does not take the length of MCInst into account, so the result is wrong.

Without X86MCCodeEmitter::encodeInstruction (which requires MCInstrInfo and MCContext), it seems difficult to get the length of MCInst.

We could teach tools/llvm-objdump/llvm-objdump.cpp to pass the address of the next MCInst, i.e.

IP.printInst(MI, Address.Address + Bytes.size(), "", STI, OS);

but this seems ugly.

MaskRay updated this revision to Diff 251934.Mar 22 2020, 6:40 PM
MaskRay edited the summary of this revision. (Show Details)

Take PC offset bias into account

jhenderson added inline comments.Mar 24 2020, 2:03 AM
llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp
293

Whilst you're reflowing this, please fix "for example" -> "For example".

295

I don't know if it matters, but should the reference to Address be \p Address?

301–302

This might be a dumb question, since I'm not familiar with the disassembly stuff, but why is this here?

llvm/tools/llvm-objdump/llvm-objdump.cpp
739–740

This is the PC bias bit, right? I thought PC bias was a thing on other platforms too?

This is probably at least deserving of a comment.

MaskRay updated this revision to Diff 252344.Mar 24 2020, 9:08 AM
MaskRay marked 5 inline comments as done.

Improve tests

llvm/lib/Target/X86/MCTargetDesc/X86InstPrinterCommon.cpp
301–302

x86-32's address space is 32-bit. call .-4 at address 0 shall be displayed as 0xfffffffc, instead of 0xfffffffffffffffc. There are several tests relying on the behavior.

MaskRay marked an inline comment as done.Mar 24 2020, 9:23 AM
MaskRay added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
739–740

I think AArch64/PPC/RISC-V do not have PC bias. Their PC relative immediate is relative to the current instruction.

andreadb added inline comments.Mar 24 2020, 11:08 AM
llvm/test/tools/llvm-mca/X86/show-encoding.s
58 ↗(On Diff #252344)

I find the jl 0xffffffffffffffa6 less readable than the old jl -90 to be honest.

For cases like this where the address of the next instruction is unknown (and therefore set to zero), I suggest that we fall back to the old behavior.
This case in is particularly annoying here because the Imm value is negative.

Note that this is quite a common pattern for code snippets processed by mca.
The branch at the end of a mca code snippet is often a backedge.

I don't know the motivations for this change. However, can we at least avoid to change the value if the PC value is unknown? Just an idea.

MaskRay marked an inline comment as done.Mar 24 2020, 11:46 AM
MaskRay added inline comments.
llvm/test/tools/llvm-mca/X86/show-encoding.s
58 ↗(On Diff #252344)

jl -90 is not an ideal output. The unambiguous form is jl .-90

I'll add a member variable TargetAddressAsDotRelative in D72172.

I don't know the motivations for this change.

call/jump instructions not displaying the target address has always been a great pain for many users. I personally prefer GNU objdump -d just because this issue.

MaskRay marked an inline comment as done.Mar 24 2020, 11:47 AM
MaskRay added inline comments.
llvm/test/tools/llvm-mca/X86/show-encoding.s
58 ↗(On Diff #252344)
MaskRay updated this revision to Diff 252407.Mar 24 2020, 12:37 PM
MaskRay edited the summary of this revision. (Show Details)

Rebase after MCInstPrinter::PrintBranchImmAsAddress is added to the parent patch.

llvm-mc -filetype=asm and llvm-mca are unaffected now.

Rebase after MCInstPrinter::PrintBranchImmAsAddress is added to the parent patch.

llvm-mc -filetype=asm and llvm-mca are unaffected now.

Thanks!

jhenderson accepted this revision.Mar 26 2020, 1:43 AM

I think this looks okay to me. LGTM. Caveat: I don't know much about the disassembler, so I leave it up to you whether you think somebody else should review.

This revision is now accepted and ready to land.Mar 26 2020, 1:43 AM
This revision was automatically updated to reflect the committed changes.