This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readelf] - Allow dumping the dynamic symbols when there is no program headers.
ClosedPublic

Authored by grimar on Sep 2 2019, 6:41 AM.

Details

Summary

D62179 introduced a regression. llvm-readelf lose the ability to dump the dynamic symbols
when there is .dynamic section with a DT_SYMTAB, but there are no program headers:
https://reviews.llvm.org/D62179#1652778

Below is a program flow before the D62179 change:

  1. Find SHT_DYNSYM.
  2. Find there is no PT_DYNAMIC => don't try to parse it.
  3. Print dynamic symbols using information about them found on step (1).

And after the change it became:

  1. Find SHT_DYNSYM.
  2. Find there is no PT_DYNAMIC => find SHT_DYNAMIC.
  3. Parse dynamic table, but fail to handle the DT_SYMTAB because of the absence of the PT_LOAD. Report the "Virtual address is not in any segment" error.

This patch fixes the issue. For doing this it checks that the value of DT_SYMTAB was mapped to a segment. If not - it ignores it.

Diff Detail

Event Timeline

grimar created this revision.Sep 2 2019, 6:41 AM
grimar marked an inline comment as done.Sep 2 2019, 6:42 AM
grimar added inline comments.
tools/llvm-readobj/ELFDumper.cpp
1655

for example when we want to dump the dynamic relocations and there is no sections table

This is used and tested by test\Object\relocation-executable.test

grimar edited the summary of this revision. (Show Details)Sep 2 2019, 6:43 AM
MaskRay added inline comments.Sep 2 2019, 9:26 AM
test/tools/llvm-readobj/dyn-symbols.test
185

The test is different from https://reviews.llvm.org/D62179#1652778

In https://reviews.llvm.org/D62179#1652778, there is no program headers (corrupted phdr), so toMappedAddr returns a nullptr. Here, DT_SYMTAB is 0. When the information from DT_SYMTAB and .dynsym conflict, we probably should let DT_DYNTAB win...

tools/llvm-readobj/ELFDumper.cpp
1656

We probably should check whether toMappedAddr returns a nullptr - which is the case if there is no program headers. This approach may be better than

if (!DynSymRegion.EntSize) {
grimar marked an inline comment as done.Sep 3 2019, 1:08 AM
grimar added inline comments.
test/tools/llvm-readobj/dyn-symbols.test
185

The test is different from https://reviews.llvm.org/D62179#1652778

I think not. Here is the same YAML but with the program headers:

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_DYN
  Machine: EM_X86_64
Sections:
  - Name:    .dynamic
    Type:    SHT_DYNAMIC
    Entries:
      - Tag:   DT_SYMTAB
        Value: 0
      - Tag:   DT_NULL
        Value: 0
DynamicSymbols:
  - Name: foo
ProgramHeaders:
  - Type: PT_LOAD
    Flags: [ PF_R ]
    VAddr: 0x0000
    PAddr: 0x0000
    Align: 8
    Sections:
      - Section: .dynsym

The 0 value for DT_SYMTAB is valid. I just removed the ProgramHeaders.

When the information from DT_SYMTAB and .dynsym conflict, we probably should let DT_DYNTAB win...

I do not understand why. Particulary I do not understand why we should take .dynsym size from the section headers table
(it is the only source it seems), but its address from another place.
The much simpler logic is to try to take everything from section header at first and fallback to DT_SYMTAB. This is what this patch does.

If we have different results, then I guess it would be correct to report a warning, but I am not sure why DT_SYMTAB should win.
Doesn't it mean we have a broken object? In this case why should we make any assumptions about which place is correct and which is not?

MaskRay added inline comments.Sep 3 2019, 1:29 AM
test/tools/llvm-readobj/dyn-symbols.test
185

There are two approaches to locate the dynamic symbol table:

  • program header -> PT_DYNAMIC -> DT_SYMTAB
  • section header -> .dynsym

The first approach is used by the final consumer of an executable/shared object: ld.so or some other interpreter/emulator. They locate the dynamic table (from a program header), parse it, and use DT_SYMTAB to resolve symbol lookups. They ignore the section header. Prioritizing the section header can potentially be cheated by a malicious binary.

// Information in the section header has priority over the information

// in a PT_DYNAMIC header.

So, I think we should probably prioritize the information in PT_DYNAMIC.

In this case, the program header does not exist. I think it is fine to ignore DT_SYMTAB in the first approach.

I don't have any particular strong feelings either way on the discussion in the comments, so I've just made a number of suggested minor comment fixes.

test/tools/llvm-readobj/dyn-symbols.test
166

dump dynamic symbols table -> dump the dynamic symbol table

192

to address -> to an address
of absence -> of the absence
of the corresponding -> of a corresponding

tools/llvm-readobj/ELFDumper.cpp
1648

That -> Doing this

1649

This -> This behaviour

1650

Remove "exist"

1651

symbols -> symbol

1652

But we -> However, we

1654

sections table -> section table

grimar marked an inline comment as done.Sep 3 2019, 2:15 AM
grimar added inline comments.
test/tools/llvm-readobj/dyn-symbols.test
185

Information in the section header has priority over the information
in a PT_DYNAMIC header.

So, I think we should probably prioritize the information in PT_DYNAMIC.

This is comment taken from here:
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-readobj/ELFDumper.cpp#L1430

And it says that for locating the .dynamic we try to take the information from the sections header first and then fallback to PT_DYNAMIC if have no SHT_DYNAMIC section. That is how llvm-readeobj scans the object now.

So are you saying we should probably reimplement it to take the information from PT_DYNAMIC at first, right?

MaskRay added inline comments.Sep 3 2019, 4:15 AM
test/tools/llvm-readobj/dyn-symbols.test
185

Yes, I think we should try PT_DYNAMIC first. I believe this is one of few things that GNU readelf -D (--use-dynamic) does. The option has been around for more than 20 years (it has been available since the commit 19990502 sourceware import). Fake .dynsym has been used by malware for years. I think we can just always default to -D.

grimar updated this revision to Diff 218454.Sep 3 2019, 7:43 AM
grimar marked 10 inline comments as done.
grimar edited the summary of this revision. (Show Details)
  • Reimplemented in according to discussion.
  • Added more test cases to verify the new behavior.
tools/llvm-readobj/ELFDumper.cpp
1648

I had to rewrite this comment completely because of behavior change...

jhenderson added inline comments.Sep 4 2019, 9:19 AM
test/tools/llvm-readobj/dyn-symbols.test
220

to address -> to an address

251

of dynamic -> of the dynamic

274–275

Won't this be implicitly generated?

tools/llvm-readobj/ELFDumper.cpp
1648

dynamic symbols section -> dynamic symbol table

(technically, if there are no section headers, there's no dynamic symbol section)

1650

has a priority -> has priority

1651

in runtime -> at runtime

Normally the -> The
(should implies normally)

1653

to address -> to an address

1656

Using DynSymRegion.EntSize to determine if the dynamic symbol table has been found via a section header feels a bit weird to me. Please at least add a comment.

1660

of dynamic -> of the dynamic

grimar updated this revision to Diff 218871.Sep 5 2019, 2:37 AM
grimar marked 10 inline comments as done.
  • Addressed review comments.
test/tools/llvm-readobj/dyn-symbols.test
274–275

Yes, but this is probably not so obvious here? It probably can raise questions,
like: "will the default SHT_DYNSYM be generated if we already have one in the YAML?".
Having 2 described makes this test case pretty straightforward.

This revision is now accepted and ready to land.Sep 5 2019, 5:51 AM
MaskRay accepted this revision.Sep 5 2019, 5:56 AM
MaskRay added inline comments.
tools/llvm-readobj/ELFDumper.cpp
1661

This warning looks good :) People will stay alert when they see this warning.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2019, 7:06 AM