This is an archive of the discontinued LLVM Phabricator instance.

[AArch64InstPrinter][llvm-objdump] Print ADR PC-relative label as a target address hexadecimal form
ClosedPublic

Authored by krisb on Feb 15 2023, 12:52 AM.

Details

Summary

This is similar to ADRP and matches GNU objdump:

GNU objdump:

0000000000200100 <_start>:
  200100:    adr    x0, 201000 <_start+0xf00>

llvm-objdump (before patch):

0000000000200100 <_start>:
  200100:    adr    x0, #3840

llvm-objdump (after patch):

0000000000200100 <_start>:
  200100:    adr    x0, 0x201000 <_start+0xf00>

Diff Detail

Event Timeline

krisb created this revision.Feb 15 2023, 12:52 AM
krisb published this revision for review.Feb 15 2023, 1:15 AM
krisb added reviewers: psmith, simon_tatham.
krisb added a project: Restricted Project.
simon_tatham accepted this revision.Feb 15 2023, 1:42 AM

A much needed improvement – thanks!

llvm/test/tools/llvm-objdump/ELF/AArch64/pcrel-address.yaml
6

Not that it matters for testing purposes, but did you mean to point that at 0x201000? Your other new adr matches the adrp that was there already.

This revision is now accepted and ready to land.Feb 15 2023, 1:42 AM

No objections from me. I guess that --no-print-imm-hex is not honoured as we are not printing the immediate anymore?

My one caveat is that this can give some strange answers for some code that I doubt anyone would write in practice like

.text
adr x0, -4096

Which when disassembling an object will wrap around the address space. With gnu objdump:

0000000000000000 <.text>:
   0:   10ff8000        adr     x0, fffffffffffff000 <.text+0xfffffffffffff000>

which is less helpful than the current llvm-objdump (I'm not using --no-print-imm-hex)

0000000000000000 <$x.0>:
       0: 10ff8000      adr     x0, #-0x1000

However that should only happen for code that does a relative offset outside the section which is unlikely to work in practice.

I suggest updating the description with a slightly different example as it looks like it is just changing from decimal to hex. It is doing a bit more than that. It is changing a decimal offset from the PC to the absolute address of PC + offset.

I think that most ideally, if you don't know the true address of the instruction being disassembled, the best way to disassemble ADR is to indicate explicitly that the address you're going to end up with depends on that unknown, and not make it look as if it's a constant.

So if you were disassembling a single encoding in the complete absence of any context, you'd write something like adr x0, .+0x1234 (adjusted as necessary for the representation of 'here' in the assembler syntax you're outputting). And if you're in a context like an object file, where your ADR instruction is at a known offset from the start of the section but the section in turn might end up anywhere in memory after linking, you could say something relative to the section name or to one of the symbols in it, like adr x0, .text+0x12 or adr x0, .L.stringliteral.

But that's a much bigger piece of work, and shouldn't block this change, which is a strict improvement on the previous representation!

krisb edited the summary of this revision. (Show Details)Feb 15 2023, 7:35 AM
krisb updated this revision to Diff 497672.Feb 15 2023, 7:36 AM

Make test checks more readable.

krisb added a comment.EditedFeb 15 2023, 7:37 AM

@peter.smith thank you for your comments!

No objections from me. I guess that --no-print-imm-hex is not honoured as we are not printing the immediate anymore?

Right.

My one caveat is that this can give some strange answers for some code that I doubt anyone would write in practice like

.text
adr x0, -4096

Which when disassembling an object will wrap around the address space. With gnu objdump:

0000000000000000 <.text>:
   0:   10ff8000        adr     x0, fffffffffffff000 <.text+0xfffffffffffff000>

which is less helpful than the current llvm-objdump (I'm not using --no-print-imm-hex)

0000000000000000 <$x.0>:
       0: 10ff8000      adr     x0, #-0x1000

However that should only happen for code that does a relative offset outside the section which is unlikely to work in practice.

I agree that llvm-objdump output w/o the patch looks a bit better, but the example seems to be rather artificial. So, I'm not sure we need to make this more complex, but we can, for example, check that a calculated address is still in +/-1MB range and opt out if it's not.

I suggest updating the description with a slightly different example as it looks like it is just changing from decimal to hex. It is doing a bit more than that. It is changing a decimal offset from the PC to the absolute address of PC + offset.

Done.

llvm/test/tools/llvm-objdump/ELF/AArch64/pcrel-address.yaml
6

I didn't mean to fully match addresses from adrp example, just reused the offsets w/o any specific intent.
It seems to be more readable if to remove --no-leading-addr and check for instruction address as well to demonstrate that the printed address is PC + offset.

krisb added a comment.Feb 15 2023, 7:40 AM

@simon_tatham thank you for looking at this!

I think that most ideally, if you don't know the true address of the instruction being disassembled, the best way to disassemble ADR is to indicate explicitly that the address you're going to end up with depends on that unknown, and not make it look as if it's a constant.
So if you were disassembling a single encoding in the complete absence of any context, you'd write something like adr x0, .+0x1234 (adjusted as necessary for the representation of 'here' in the assembler syntax you're outputting). And if you're in a context like an object file, where your ADR instruction is at a known offset from the start of the section but the section in turn might end up anywhere in memory after linking, you could say something relative to the section name or to one of the symbols in it, like adr x0, .text+0x12 or adr x0, .L.stringliteral.

This looks like how a target address label works, which is mostly what I want to achieve by this patch. In llvm/test/tools/llvm-objdump/ELF/AArch64/pcrel-address.yaml the following example

adr x0, 0x201004 <_start+0xf04>

demonstrates this. Label in '<>' (target address label, not sure about the notion, just using the wording from llvm-objdump code) printed relatively to a known symbol _start.
But llvm-objdump doesn't output any label if it isn't able to resolve an address against any known symbol. May be we can improve this.

peter.smith accepted this revision.Feb 15 2023, 9:22 AM

Thanks for the updates. I agree with SImon that this is an improvement worth having.

This revision was landed with ongoing or failed builds.Feb 18 2023, 8:32 AM
This revision was automatically updated to reflect the committed changes.