This is an archive of the discontinued LLVM Phabricator instance.

Print reasonable representations of type names in llvm-nm, readelf and readobj.
ClosedPublic

Authored by Sunil_Srivastava on Jul 31 2019, 1:38 PM.

Details

Diff Detail

Repository
rL LLVM

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2019, 1:38 PM

This patch makes no executable code changes, just the ElfSymbolTypes table.

I've added a few more reviewers who I know have been working on the GNU binutils recently.

test/tools/llvm-nm/format-sysv-type.test
44 ↗(On Diff #212652)

When looking at this offline, I missed this. It isn't great that we have IFUNC printed here, because not all ABIs support such symbol types and it actually makes us less compatible with GNU nm. I think we should change the table to print "<OS specific>: 10" here, and put a (hopefully) temporary patch in llvm-readobj to convert that particular case to IFUNC, pending the resolution of https://bugs.llvm.org/show_bug.cgi?id=42686.

jhenderson added inline comments.Aug 1 2019, 2:08 AM
test/tools/llvm-nm/format-sysv-type.test
44 ↗(On Diff #212652)

Sorry, got mixed up. This comment should probably be on the readelf test, not the llvm-nm test. Please double-check versus GNU readelf though!

jhenderson added inline comments.Aug 1 2019, 2:10 AM
lib/Object/ELFObjectFile.cpp
46 ↗(On Diff #212652)

This should probably be moved so that it's in numerical order, i.e. between 9 and 11.

grimar added a comment.Aug 1 2019, 2:46 AM

Well, my main concern is about ElfSymbolTypes array values representation.
I am not sure how my suggestion would affect the output, btw...

include/llvm/Object/ELFObjectFile.h
45 ↗(On Diff #212652)

not relative to this patch, but I wonder why this is not a vector. (I do not like having a NumElfSymbolTypes).

lib/Object/ELFObjectFile.cpp
54 ↗(On Diff #212652)

It looks a bit confusing to me. When I see "Unknown 7" it looks like its a 7th Unknown.
What do you think about hte following format?

{"Unknown(7)", "<unknown>(7)", 7},
{"Unknown(8)", "<unknown>(8)", 8},
{"Unknown(9)", "<unknown>(9)", 9},
{"OS Specific(11)", "<OS specific>(11)", 11},
{"OS Specific(12)", "<OS specific>(12)", 12},
{"Proc Specific(13)", "<processor specific>(13)", 13},
{"Proc Specific(14)", "<processor specific>(14)", 14},
{"Proc Specific(15)", "<processor specific>(15)", 15}
grimar added a comment.Aug 1 2019, 3:07 AM

I tested the output with the change I suggested and it was:

Name                  Value           Class        Type         Size
 Line  Section
os_specific_10       |                |   U  |             IFUNC|     |     |*UND*
os_specific_11       |                |   U  | <OS specific>(11)|     |     |*UND*
os_specific_12       |                |   U  | <OS specific>(12)|     |     |*UND*
proc_specific_13     |                |   U  |<processor specific>(13)|     |     |*UND*
proc_specific_14     |                |   U  |<processor specific>(14)|     |     |*UND*
proc_specific_15     |                |   U  |<processor specific>(15)|     |     |*UND*
symbol_common        |                |   U  |            COMMON|     |     |*COM*
symbol_file          |                |   U  |              FILE|     |     |*UND*
symbol_func          |                |   U  |              FUNC|     |     |*UND*
symbol_ifunc         |                |   U  |             IFUNC|     |     |*UND*
symbol_notype        |                |   U  |            NOTYPE|     |     |*UND*
symbol_obj           |                |   U  |            OBJECT|     |     |*UND*
symbol_tls           |                |   U  |               TLS|     |     |*UND*
unknown_7            |                |   U  |      <unknown>(7)|     |     |*UND*
unknown_8            |                |   U  |      <unknown>(8)|     |     |*UND*
unknown_9            |                |   U  |      <unknown>(9)|

Looks good to me, though I do not really mind to keep the original output probably.
Lets see what others think.

The output for llvm-readelf should be identical to what GNU readelf prints, which is what the patch proposes. I'm happy with other options for the LLVM output. @grimar's suggestion here sounds good to me, though I'd suggest adding a space: Unknown (7).

Sunil_Srivastava marked 3 inline comments as done.Aug 1 2019, 9:33 AM
Sunil_Srivastava added inline comments.
include/llvm/Object/ELFObjectFile.h
45 ↗(On Diff #212652)

This goes back to an already submitted patch https://reviews.llvm.org/rL355742, where NumElfSymbolTypes was introduced. The table ElfSymbolTypes was moved from ELFDumper.cpp to ELFObjectFIle.cpp because it was also needed in llvm-nm.cpp

So, ElfSymbolTypes table is initialized in ELFObjectFile.cpp but it needs to be traversed in llvm-nm.cpp (function getELFTypeName) and ELFDumper.cpp, which needs to know the size. So the size needs to be somewhere visible to all of them. ELFObjectFile.h is that common file.

But perhaps it can be made a Vector. Current value, 16, is the maximum value, BTW because the type field is a 4 bit field. We could perhaps remove the named constant and just use hardcoded 16. All those will be separate issues though, that can be takes as a cleanup patch.

lib/Object/ELFObjectFile.cpp
54 ↗(On Diff #212652)

I am open to any format for the first column; whatever the consensus is.

The second column comes from what the GNU nm prints, so that has to remain "<unknown>: 7", even with the inconsistent styles between 7, 11 and 13.

test/tools/llvm-nm/format-sysv-type.test
44 ↗(On Diff #212652)

OK. Will update the proposal after checking GNU readelf.

Sunil_Srivastava marked an inline comment as done.Aug 1 2019, 2:16 PM
Sunil_Srivastava added inline comments.
test/tools/llvm-nm/format-sysv-type.test
44 ↗(On Diff #212652)

To clarify:

For type 10,

  • GNU readelf prints IFUNC. So does llvm-nm with this proposal.
  • GNU "nm --format=sysv" prints "<OS specific>: 10". This proposal prints IFUNC.

So you want llvm-nm to print "<OS specific>: 10" , at least for now. Right ? That will change llvm-nm test.

Looking at the code of both, it is easier to leave the ElfSymbolTypes table with IFUNC and make a special case in llvm-nm.cpp.

Will wait for confirmation before making change.

MaskRay added a subscriber: ro.Aug 1 2019, 9:48 PM
jhenderson added inline comments.Aug 2 2019, 4:41 AM
test/tools/llvm-nm/format-sysv-type.test
44 ↗(On Diff #212652)

Having thought about it, here are my simple rules:

  1. Places in llvm-nm and llvm-readelf that already print IFUNC should continue to do this for this patch (but we should discuss changing it to "<OS specific>: 10" later).
  2. Places in llvm-nm and llvm-readelf that do not currently print IFUNC should continue to not print IFUNC.

In other words, the only behaviour change in llvm-nm and llvm-readelf should be to make things printing just "7" or whatever print <OS specific>/<unknown> etc. Everything else should be deferred to a later change.

Sunil_Srivastava marked an inline comment as done.Aug 2 2019, 9:13 AM
Sunil_Srivastava added inline comments.
test/tools/llvm-nm/format-sysv-type.test
44 ↗(On Diff #212652)

Hi James,

I agree with these "simple rules".

I believe the current proposal meets these rules. It does not change any IFUNC lines in tests, other than white-space changes, so it preserves the IFUNC behavior. It just fills missing entries.

Any change in the existing behavior should be a separate issue.

MaskRay added inline comments.Aug 3 2019, 7:05 PM
test/tools/llvm-nm/format-sysv-type.test
44 ↗(On Diff #212652)

readelf prints IFUNC if e_ident[EI_OSABI] is ELFOSABI_NONE, ELFOSABI_GNU, or ELFOSABI_FREEBSD (Linux and FreeBSD are the two OSes I know that adopted IFUNC). Interestingly nm doesn't print IFUNC.

I am on the fence if it is necessary to make our IFUNC logic so complex for llvm-readelf/llvm-readobj. For this change, we should stick with the "simple rules".

I believe the current proposal meets these rules. It does not change any IFUNC lines in tests, other than white-space changes, so it preserves the IFUNC behavior. It just fills missing entries.

Okay. Please make the changes suggested inline re. ordering and formatting of the output ("Unknown (7)" or possibly "Unknown (0x7)" seems reasonable to me).

Changed the table order and, typename style from "Unknown 7" to "Unknown(7)".

jhenderson added inline comments.Aug 6 2019, 2:27 AM
lib/Object/ELFObjectFile.cpp
54 ↗(On Diff #212652)

I forgot this, but as demonstrated by your test, a typical symbol type output for a symbol in LLVM output style is "Section (0x3)". I don't think we want to repeat the number, and I therefore think simply printing "Unknown"/"OS Specific" etc in the first column is sufficient - the rest of the code then will append the "(0x7)" or whatever. In other words, the entries should be 3 "Unknown" strings, 2 "OS Specific" and 3 "Proc Specific".

Sunil_Srivastava marked an inline comment as done.

Removed number from the first column of the table, as suggested by James Henderson.

Sunil_Srivastava marked an inline comment as done.Aug 8 2019, 4:26 PM
Sunil_Srivastava marked an inline comment as done.
Sunil_Srivastava added inline comments.
lib/Object/ELFObjectFile.cpp
54 ↗(On Diff #212652)

In other words, the entries should be 3 "Unknown" strings, 2 "OS Specific"
and 3 "Proc Specific".

OK. Done.

This revision is now accepted and ready to land.Aug 9 2019, 1:43 AM
This revision was automatically updated to reflect the committed changes.