This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj] Don't print out section names for STABS symbols
ClosedPublic

Authored by int3 on Sep 28 2020, 11:46 PM.

Details

Summary

This diff is similar to what D71394 did for llvm-objdump -- it avoids
trying to look up a section name for STABS symbols, since some STABS
symbol types (like N_OSO) use the n_sect field to store other data
instead of a section index.

Diff Detail

Event Timeline

int3 created this revision.Sep 28 2020, 11:46 PM
Herald added a project: Restricted Project. · View Herald Transcript
int3 requested review of this revision.Sep 28 2020, 11:46 PM

Is it possible to replace the test binary with yaml2obj?

int3 updated this revision to Diff 294898.Sep 28 2020, 11:57 PM

use yaml2obj

int3 retitled this revision from [llvm-readobj] Fix crash when printing out stabs symbols of type N_OSO to [llvm-readobj] Fix error when printing out stabs symbols of type N_OSO.Sep 28 2020, 11:57 PM
llvm/tools/llvm-readobj/MachODumper.cpp
274

type -> Type

int3 updated this revision to Diff 294899.Sep 29 2020, 12:02 AM
int3 marked an inline comment as done.

capitalize

grimar added inline comments.Sep 29 2020, 2:06 AM
llvm/test/tools/llvm-readobj/MachO/Inputs/stabs.obj.macho-x86_64.yaml
1 ↗(On Diff #294899)

Do you need a separate input file? I think you can inline it to the test file.
(See how tests in llvm-readobj/ELF do).

250 ↗(On Diff #294899)

I am not an expert in MachO, but It feels that this YAML contains excessive symbols, sections, strings etc.
The best practice is to have tests as little as possible to improve readability. Can this one be reduced?

llvm/test/tools/llvm-readobj/MachO/stabs-macho.test
1 ↗(On Diff #294899)

All of our llvm-readobj tests for ELF normally have a header with a description that explains
what is tested. I'd recommend doing this for MachO too.

Also, since your new test is in the MachO folder, you probably don't need a -macho suffix
in its name, it is excessive.

3 ↗(On Diff #294899)

Please separate CHECK line from RUN line with an empty line.

int3 marked 3 inline comments as done.Sep 29 2020, 11:58 AM
int3 added inline comments.
llvm/test/tools/llvm-readobj/MachO/Inputs/stabs.obj.macho-x86_64.yaml
250 ↗(On Diff #294899)

it's a bit of a pain to reduce yaml by hand. A lot of the extra symbols are things that get inserted into a trivial Mach-O binary by clang and/or the linker. I've removed a bunch of extra load commands but left the symbols as-is (symbols are harder to remove)

int3 updated this revision to Diff 295074.Sep 29 2020, 11:58 AM

address comments

A nit from me. Leaving the discussion on the Mach-O yaml to others. One of these days somebody's going to break and actually make yaml2obj more usable for Mach-O!

llvm/test/tools/llvm-readobj/MachO/stabs.yaml
2

(I wasn't initially sure if the "--" had some special meaning in the context of stabs - it was just easier to change the punctuation)

grimar added inline comments.Sep 30 2020, 12:41 AM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
121

I was able to quickly reduce the number of commands here to 2 by simple removing of unused parts:

ncmds:  2
...
LoadCommands:
  - cmd:             LC_SEGMENT_64
    cmdsize:         232
    segname:         __TEXT
    vmaddr:          4294967296
    vmsize:          4096
    fileoff:         0
    filesize:        4096
    maxprot:         5
    initprot:        5
    nsects:          1
    flags:           0
    Sections:
      - sectname:        __text
        segname:         __TEXT
        addr:            0x0000000100000FA0
        size:            15
        offset:          0x00000FA0
        align:           4
        reloff:          0x00000000
        nreloc:          0
        flags:           0x80000400
        reserved1:       0x00000000
        reserved2:       0x00000000
        reserved3:       0x00000000
        content:         554889E531C0C745FC000000005DC3
  - cmd:             LC_SYMTAB
    cmdsize:         24
    symoff:          4152
    nsyms:           11
    stroff:          4328
    strsize:         104
210

I was able to remove the whole ExportTrie description.

llvm/tools/llvm-readobj/MachODumper.cpp
287

I was able to comment out few cases:

switch (Type) {
case MachO::N_FUN:
//case MachO::N_STSYM:
//case MachO::N_LCSYM:
case MachO::N_BNSYM:
//case MachO::N_SLINE:
case MachO::N_ENSYM:
case MachO::N_SO:
//case MachO::N_SOL:
//case MachO::N_ENTRY:
//case MachO::N_ECOMM:
//case MachO::N_ECOML:
  return true;
default:
  return false;

And the test case passed for me. I think it means these cases are not tested properly.

int3 added inline comments.Oct 2 2020, 7:03 PM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
121

the resulting binary won't execute successfully though. I kept them because I thought it would be nice to have a somewhat representative input to the linker...

llvm/tools/llvm-readobj/MachODumper.cpp
287

I got these cases from reading stabs.h in the macOS SDK, but I'm not actually sure how to get the linker to generate them. Would you prefer I not print out the section name for all stabs sections, as was done for llvm-objdump in D71394?

Another option would be to edit the YAML to create these stabs symbols, but as above, it seemed a bit iffy to be creating something that might not match a realistic input.

grimar added inline comments.Oct 5 2020, 12:32 AM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
121

I don't think we do this normally. E.g, usually any`llvm-readobj (ELF)` test cases are as minimal as possible and we don't care about feeding them to a linker and executing. It is significantly easier to read, mantain and expand tests that are little and contain only important parts.

llvm/tools/llvm-readobj/MachODumper.cpp
287

Would you prefer I not print out the section name for all stabs sections, as was done for llvm-objdump in D71394?

I am - not an expert in MachO, so this is probably a question to someone who is more familar with MachO. I.e. what output is expected. Perhaps it is fine, since was accepted for llvm-objdump.

Another option would be to edit the YAML to create these stabs symbols, but as above, it seemed a bit iffy to be creating something that might not match a realistic input.

For testing - I think this would be a right approach. For ELF dumper part of llvm-readelf we handle many unrealistic cases to check we print a reasonable output and/or warnings and don't crash etc. llvm-readelf is a tool that can be used to inspect broken inputs, so I believe it is fine to feed it with unusual things and check it shows a resonable output.

Another point is that if you don't want to create "something that might not match a realistic input", then probably you don't need to write an untested code that supports this first of all?

To summarize, I see following solutions here:

  1. Either keep only the code for cases that are covered by test and leave the rest for follow-ups.
  2. Add all possible types of stab symbols manually to YAML.
int3 updated this revision to Diff 296195.Oct 5 2020, 8:48 AM
int3 marked an inline comment as done.

minimize test case, remove untested code

int3 added inline comments.Oct 5 2020, 8:48 AM
llvm/tools/llvm-readobj/MachODumper.cpp
287

I think keeping the code only for cases covered by test would make for pretty surprising behavior to someone who hasn't seen the corresponding test. I could see some future user being confused as to why an arbitrary subset of STABS symbols get their section names displayed. I think matching llvm-objdump's behavior (and adding a TODO) is easiest for now.

grimar added inline comments.Oct 6 2020, 12:13 AM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
181

2 points:

  1. There are a few symbols with duplicating types. E.g. there are 3 that have type == 0x64. It does not make sense to test the same type again and again. This makes the test larger for no reason.

You've previously said that "symbols are harder to remove", but I've tried and found it is actually very easy: all you need is to remove a symbol and adjust the nsyms value. E.g in the piece below I've removed duplicated symbols with type == 0x64:

    nsyms:           7
    stroff:          4328
    strsize:         104
LinkEditData:
  NameList:
    - n_strx:          45
      n_type:          0x64
      n_sect:          0
      n_desc:          0
      n_value:         0
    - n_strx:          72
      n_type:          0x66
      n_sect:          3
      n_desc:          1
      n_value:         1601361378
    - n_strx:          1
      n_type:          0x2E
      n_sect:          1
      n_desc:          0
      n_value:         4294971296
    - n_strx:          96
      n_type:          0x24
      n_sect:          1
      n_desc:          0
      n_value:         4294971296
    - n_strx:          1
      n_type:          0x24
      n_sect:          0
      n_desc:          0
      n_value:         15
    - n_strx:          1
      n_type:          0x4E
      n_sect:          1
      n_desc:          0
      n_value:         15
    - n_strx:          22
      n_type:          0x0F
      n_sect:          1
      n_desc:          0
      n_value:         4294971296
  1. It is not clear what each n_type means. Probably worth adding comments:
- n_strx:          72
  n_type:          0x66 ## N_OSO
  n_sect:          3
  n_desc:          1
  n_value:         1601361378

You also need to update the patch title. It says about "symbols of type N_OSO", but the code changes the behavior for all stabs symbols.

int3 updated this revision to Diff 296468.Oct 6 2020, 8:16 AM

update

int3 retitled this revision from [llvm-readobj] Fix error when printing out stabs symbols of type N_OSO to [llvm-readobj] Don't print out section names for STABS symbols.Oct 6 2020, 8:17 AM
int3 edited the summary of this revision. (Show Details)
int3 added a comment.Oct 6 2020, 8:19 AM

You've previously said that "symbols are harder to remove", but I've tried and found it is actually very easy: all you need is to remove a symbol and adjust the nsyms value.

Most of the hassle is in cleaning up the corresponding strings in the string table, but I guess I could have just removed the symbol entries and left the strings in. Anyway, I've cleaned up both now...

int3 updated this revision to Diff 296480.Oct 6 2020, 9:02 AM

update objcopy test

I have no more comments (except the one minor nit).
Someone else, who is more famailar with MachO should give a final approve I think.

llvm/test/tools/llvm-readobj/MachO/stabs.yaml
101

## -> # here and below.

(We should be consistent in marking comments).

You've previously said that "symbols are harder to remove", but I've tried and found it is actually very easy: all you need is to remove a symbol and adjust the nsyms value.

Most of the hassle is in cleaning up the corresponding strings in the string table, but I guess I could have just removed the symbol entries and left the strings in. Anyway, I've cleaned up both now...

I think it could be OK to have a single string, e.g "foo". Probably it is not a problem to use it for all symbols in tests like this,
as all you need to check is that something is dumped or not.

Or, it could be a set of strings of the same sizes, e.g. "foo", "bar", "zed" what would allow to computate string index much easier.
Or, a single large string, like "abcdefghijhklm", but you could use different n_strx indices (N...1) so that each symbol got a different name in result ("m". "lm", "klm" etc).

Sure, it is just a problem of yaml2elf COFF that it seems doesn't allow to assing names to symbols easily.

jhenderson accepted this revision.Oct 7 2020, 3:19 AM

Looks reasonable aside from @grimar's comments, but needs a person with more Mach-O experience to review too before this gets committed.

This revision is now accepted and ready to land.Oct 7 2020, 3:19 AM
int3 added inline comments.Oct 7 2020, 3:00 PM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
101

Did you mean # -> ##?

jhenderson added inline comments.Oct 8 2020, 1:01 AM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
101

I think that's what he meant - comments should use ##.

grimar added inline comments.Oct 8 2020, 1:18 AM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
101

Yes :)

int3 updated this revision to Diff 297652.Oct 12 2020, 11:40 AM
  1. -> ##
int3 added a subscriber: davide.

Adding @davide who seems to have contributed significantly to MachODumper.cpp in the past

davide removed a reviewer: davide.Oct 12 2020, 2:23 PM

I'm sorry, I haven't touched this code in a long time and I don't feel qualified to review it.

llvm/tools/llvm-readobj/MachODumper.cpp
631

maybe I'm missing something, but it looks like on the lines 631-639 only the case of non-stab symbols is handled, so for a stab symbol SectionName is set to "", is there another place where it's updated ? (to cover the "unless we know that .. " part of the comment).

int3 added inline comments.Oct 12 2020, 3:02 PM
llvm/tools/llvm-readobj/MachODumper.cpp
631

It's not handled. That's what the TODO comment is for :)

smeenai accepted this revision.Oct 12 2020, 3:11 PM

LGTM.

mtrent resigned from this revision.Oct 12 2020, 3:17 PM
int3 updated this revision to Diff 297741.Oct 12 2020, 6:56 PM

rename _main -> _foo; left other strings as-is since they represent paths

This revision was landed with ongoing or failed builds.Oct 12 2020, 6:56 PM
This revision was automatically updated to reflect the committed changes.