This is an archive of the discontinued LLVM Phabricator instance.

llvm-objdump: when ELF st_other field is set, print its value before symbol name
AbandonedPublic

Authored by adalava on May 7 2019, 12:06 PM.

Details

Summary

Functions on PowerPC64 using ELFv2 ABI may have 2 entry points: global and local. When using local entry points, symbol st_other field is set, but it's not visible on llvm-objdump, unlike objdump.
This prints ELF symbol "st_other" field value just before symbol name, as it's required for validating D59436 and possibly D56586.

$ ~/src/llvm-project-oficial/build-release-master/bin/llvm-objdump -t localentry.o

localentry.o:	file format ELF64-ppc64

SYMBOL TABLE:
0000000000000000 l    df *ABS*		 00000000 localentry.c
0000000000000000 l    d  .toc		 00000000 .toc
0000000000000000         *UND*		 00000000 .TOC.
0000000000000000 gw    F .text		 0000002c 0x60 __impl_openattt
0000000000000000 g     O .bss		 00000004 a
0000000000000000 g     F .text		 0000002c 0x60 openattt
0000000000000000 gw    F .text		 0000002c 0x60 openattt@FBSD_1.1

Note this may break old test cases based on llvm-objdump.

Event Timeline

adalava created this revision.May 7 2019, 12:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 7 2019, 12:06 PM
adalava edited the summary of this revision. (Show Details)May 7 2019, 12:21 PM
adalava added reviewers: Bigcheese, emaste, ruiu, sfertile.
adalava added a subscriber: luporl.May 7 2019, 12:27 PM

Adding MaskRay as a reviewer for their familiarity with the llvm tools.

This definitely seems a useful upgrade to me. Unfortunately it might affect a lot of lit tests. If we do intend to make a change we should try to match the output of the binutils objdump. I can see from the surrounding context we print out '.hidden', having a quick look it seems binutils will print out the visibility if its the only thing set, and the hex value otherwise (ie not both) here is an example:

other.c:

int a = 55;

__attribute__((visibility("hidden"), weak))
int foo(void) {
  return a;
}

__attribute__((visibility("protected")))
int exported_func(void) {
  return a;
}

__attribute__((visibility("protected")))
int single_entry(void) {
    return 55;
}

__attribute__((visibility("internal")))
int internal_func(void) {
  return 1234;
}

int bar(void) {
  return foo();
}

objdump output:

0000000000000000 g     O .data  0000000000000004 a
0000000000000000  w    F .text  0000000000000040 0x62 foo
0000000000000000         *UND*  0000000000000000 .TOC.
0000000000000040 g     F .text  0000000000000040 0x63 exported_func
0000000000000080 g     F .text  000000000000002c .protected single_entry
00000000000000ac g     F .text  000000000000002c .internal internal_func
00000000000000d8 g     F .text  0000000000000048 0x60 bar
adalava added a comment.EditedMay 7 2019, 3:29 PM

This definitely seems a useful upgrade to me. Unfortunately it might affect a lot of lit tests. If we do intend to make a change we should try to match the output of the binutils objdump. I can see from the surrounding context we print out '.hidden', having a quick look it seems binutils will print out the visibility if its the only thing set, and the hex value otherwise (ie not both) here is an example:

@sfertile, here's the result with your code on FreeBSD environment:

Binutils objdump 2.30

powerpc64-unknown-freebsd12.0-objdump -t other.o 

other.o:     file format elf64-powerpc-freebsd

SYMBOL TABLE:
0000000000000000 l    df *ABS*	0000000000000000 other.c
0000000000000000 l    d  .toc	0000000000000000 .toc
0000000000000000         *UND*	0000000000000000 .TOC.
0000000000000000 g     O .data	0000000000000004 a
0000000000000070 g     F .text	0000000000000048 0x60 bar
0000000000000024 g     F .text	0000000000000024 0x63 exported_func
0000000000000000  w    F .text	0000000000000024 0x62 foo
000000000000005c g     F .text	0000000000000014 .hidden internal_func
0000000000000048 g     F .text	0000000000000014 .protected single_entry

LLVM9 objdump (with patch applied)

llvm-objdump -t other.o 

other.o:	file format ELF64-ppc64

SYMBOL TABLE:
0000000000000000 l    df *ABS*		 00000000 other.c
0000000000000000 l    d  .toc		 00000000 .toc
0000000000000000         *UND*		 00000000 .TOC.
0000000000000000 g     O .data		 00000004 a
0000000000000070 g     F .text		 00000048 0x60 bar
0000000000000024 g     F .text		 00000024 0x63 exported_func
0000000000000000 gw    F .text		 00000024 .hidden 0x62 foo
000000000000005c g     F .text		 00000014 .hidden 0x2 internal_func
0000000000000048 g     F .text		 00000014 0x3 single_entry

Based on your example I list the following differences we should probably work on:

  1. if just .hidden is set, print .hidden and omit st_other value
  2. if .hidden and other bits are set, print only st_other value
  3. if just .protect is set, print .protected (and probably follow rule above when other bits are set too
  4. on symbol size column add leading zeros up to 16 positions instead of 8

I'm not sure about ELF Symbol Binding flags. Need to check if llvm-objdump is missing the "g" flag.

The relevant binutils objdump logic is contained in bfd/elf.c, specifically the bfd_elf_print_symbol() function.

The bfd_print_symbol_all case does:

  • symbol VMA
  • space
  • symbol flags
  • space
  • Section name or "(*none*)"
  • Tab character
  • st_value (if symbol is in a common section) or st_size
  • if symbol is versioned: space and then symbol version information, surrounded with () if hidden. Trailing spaces as required to end up at 12 characters total for this part.
  • space and then st_other IF NONZERO. Use ".hidden", ".internal", or ".protected" if st_other is exactly one of those values, otherwise st_other in hex.
  • space
  • symbol name.

So basically, it's either printing nothing (if st_other is 0), .internal (if st_other is 1), .hidden (if st_other is 2), .protected (if st_other is 3), or hex (anything other than these exact st_other values.)

The st_other uint8 is shared between visibility flags and elfv2 local entry point handling.

So for ELF, you would want to do this logic INSTEAD of the hidden check immediately above.

This change makes sense and it will improve compatibility with GNU objdump. For your tests (I've commented there), I usually prefer llvm-readelf/llvm-readobj to llvm-objdump. The output of objdump also looks very weird to me. I remember @emaste said so in one FreeBSD issue.. I can check how many tests (you should at least check ninja -C Release check-llvm check-lld; clang has <5 llvm-objdump tests; ) will be impacted by this change. If there aren't many, I can fix them and commit that for you :)

This change makes sense and it will improve compatibility with GNU objdump. For your tests (I've commented there), I usually prefer llvm-readelf/llvm-readobj to llvm-objdump. The output of objdump also looks very weird to me. I remember @emaste said so in one FreeBSD issue.. I can check how many tests (you should at least check ninja -C Release check-llvm check-lld; clang has <5 llvm-objdump tests; ) will be impacted by this change. If there aren't many, I can fix them and commit that for you :)

MaskRay, looks like many tests under lld/tests are using llvm-objdump, a grep for llvm-objdump on this folder returned >1000 entries.
In any case I verified that llvm-readobj really can be used for my tests. I'll rewrite them for sure, thanks!

MaskRay, looks like many tests under lld/tests are using llvm-objdump, a grep for llvm-objdump on this folder returned >1000 entries.

We can always add the new printing under an option and migrate the tests in batches, removing the option once we have migrated all the tests. If we end up going that direction I'll gladly help update some of the tests to help move this along.

See https://reviews.llvm.org/D61718 , not many tests need updating. The dump still diverges from GNU objdump in a few ways, e.g. the width of the size field, the number of spaces, the 'g' symbol etc

adalava abandoned this revision.May 9 2019, 8:53 AM

See https://reviews.llvm.org/D61718 , not many tests need updating. The dump still diverges from GNU objdump in a few ways, e.g. the width of the size field, the number of spaces, the 'g' symbol etc

Thank you, MaskRay. I'm abandoning this review in favor of D61718. Let's continue discussing there, please check my comments in D61718.