This is an archive of the discontinued LLVM Phabricator instance.

[obj2yaml] - Teach tool to dump program headers.
ClosedPublic

Authored by grimar on Feb 28 2020, 4:42 AM.

Details

Summary

Currently obj2yaml does not dump program headers,
this patch teaches it to do that.

Diff Detail

Event Timeline

grimar created this revision.Feb 28 2020, 4:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2020, 4:42 AM
Herald added a subscriber: mgrang. · View Herald Transcript
grimar added inline comments.Feb 28 2020, 4:44 AM
llvm/tools/obj2yaml/elf2yaml.cpp
405

This change will go away after we land https://reviews.llvm.org/D74955

The dumpProgramHeaders() logic is similar to llvm-objcopy/ELF/Object.cpp:sectionWithSegment. It also reminds me of D74755... The attribution of an empty section on the boundary between two segments will not affect obj2yaml correctness.

llvm/tools/obj2yaml/elf2yaml.cpp
217

I haven't carefully read through the code. Just raise an opinion.

For non-SHT_NOBITS sections, sh_offset/p_offset works better than sh_addr/p_vaddr.
For SHT_NOBITS sections, sh_offset can be ignored. See D58426.

labath added a comment.Mar 2 2020, 1:25 AM

No opinion on the code, but I think that the ability to dump program headers will be very useful. I've been wanting to have a feature like this for quite some time.

The main thing I am interested in is being able to dump core files (segments without any sections), which I guess this doesn't do yet, but it seems to be a step towards it.

jhenderson added inline comments.Mar 2 2020, 1:38 AM
llvm/include/llvm/ObjectYAML/ELFYAML.h
179 ↗(On Diff #247225)

and is used

llvm/test/tools/obj2yaml/program-headers.yaml
5

--program-headers/--segments instead of -a

78

Some additional interesting test cases:

  1. Standalone empty segments
  2. Nested empty segments (at start/end/middle)
  3. Empty sections in segments (at start/end/middle)
  4. Misaligned segments?
  5. Out-of-order PT_LOAD segments?
  6. Segments with gaps in them (file size != size of all sections), where gap is at start/middle/end of segment.
  7. Segments with memsize > file size.
  8. Non-alloc sections within segments.

(Sorry if any of these are already covered)

llvm/tools/obj2yaml/elf2yaml.cpp
177

Maybe this shouldn't be auto?

217

+1 to using sh_offset. That is what llvm-objcopy uses, IIRC, and I believe yaml2obj places things based on the file offset too.

The dumpProgramHeaders() logic is similar to llvm-objcopy/ELF/Object.cpp:sectionWithSegment. It also reminds me of D74755... The attribution of an empty section on the boundary between two segments will not affect obj2yaml correctness.

Yes, I think in obj2yaml, it doesn't matter which segments a section is within, because the output should have the segments at exactly the same place as the input. There should be no moving of segments in obj2yaml, in my opinion, as it should provide as faithful a representation as possbile of the original input.

grimar planned changes to this revision.Mar 2 2020, 11:53 PM

Will update after addressing review comments.

@jhenderson listed many good test cases. It requires some work. I've left some notes.

  1. Standalone empty segments

GNU ld can create a p_memsz=p_filesz=0 PT_LOAD. lld doesn't.

  1. Nested empty segments (at start/end/middle)

PT_TLS⊆PT_LOAD(RW)

(PT_TLS(p_memsz=0, p_align=64) is used by Android Bionic)

  1. Empty sections in segments (at start/end/middle)
  2. Misaligned segments?

p_offset%p_align != p_vaddr%p_align

  1. Out-of-order PT_LOAD segments?

PT_LOAD not ordered by p_vaddr.

ELF spec: Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.

  1. Segments with gaps in them (file size != size of all sections), where gap is at start/middle/end of segment.

Type: Filler

  1. Segments with memsize > file size.

Needs non-empty SHT_NOBITS.

  1. Non-alloc sections within segments.
grimar updated this revision to Diff 249900.Mar 12 2020, 4:58 AM
grimar marked 5 inline comments as done.
grimar edited the summary of this revision. (Show Details)
  • Reimplemented, added test cases.

@jhenderson listed many good test cases. It requires some work. I've left some notes.

Thank you guys for these suggestions and comments, I used them in the update.

I've added tests for all cases listed except the next two.

Segments with gaps in them (file size != size of all sections), where gap is at start/middle/end of segment.

Type: Filler

I think we might want having a separate test for this. Also it needs more code I think.

Segments with memsize > file size.

Needs non-empty SHT_NOBITS.

I'd like to support SHT_NOBITS case separatelly too for the same reasons.

How does it sound?

grimar marked an inline comment as done.Mar 12 2020, 9:08 AM
grimar added inline comments.
llvm/test/Object/obj2yaml.test
667

.data section size == 0, I wonder if it is OK we dump it?

Can we ignore empty sections at the end of segment? I.e do what D74755 removes:

// If a section is empty it should be treated like it has a size of 1. This is
// to clarify the case when an empty section lies on a boundary between two
// segments and ensures that the section "belongs" to the second segment and
// not the first.
uint64_t SecSize = Sec.Size ? Sec.Size : 1;
MaskRay added inline comments.Mar 12 2020, 9:58 AM
llvm/test/Object/obj2yaml.test
667

I vote for removing uint64_t SecSize = Sec.Size ? Sec.Size : 1;

MaskRay added inline comments.Mar 12 2020, 10:46 AM
llvm/tools/obj2yaml/elf2yaml.cpp
202

getValueOr

224

for (; It != WorkList.end() && isInSegment<ELFT>(**It, Phdr); ++It)

The code checks VMA while isInSegment checks file offsets. It seems strange.
I think a pure O(n^2) algorithm is fine. llvm-objcopy/ELF/Object.cpp is O(n^2).

grimar updated this revision to Diff 250179.Mar 13 2020, 4:44 AM
grimar marked 4 inline comments as done.
  • Addressed review comments (simplified algorithm used).
llvm/tools/obj2yaml/elf2yaml.cpp
224

I think you are right. It is easier to use O(n*m) here, since the number of segments is not that large anyways.
Done.

  • Reimplemented, added test cases.

@jhenderson listed many good test cases. It requires some work. I've left some notes.

Thank you guys for these suggestions and comments, I used them in the update.

I've added tests for all cases listed except the next two.

Segments with gaps in them (file size != size of all sections), where gap is at start/middle/end of segment.

Type: Filler

I think we might want having a separate test for this. Also it needs more code I think.

Segments with memsize > file size.

Needs non-empty SHT_NOBITS.

I'd like to support SHT_NOBITS case separatelly too for the same reasons.

How does it sound?

llvm/test/tools/obj2yaml/program-headers.yaml
114

used->use

llvm/tools/obj2yaml/elf2yaml.cpp
203

cast<ELFYAML::Section>(*C) is better because you assert it to be non-null.

204

The two if can be combined.

grimar updated this revision to Diff 250361.Mar 14 2020, 8:43 AM
grimar marked 3 inline comments as done.
  • Rebased, addressed comments.

@jhenderson listed many good test cases. It requires some work. I've left some notes.

  1. Standalone empty segments

GNU ld can create a p_memsz=p_filesz=0 PT_LOAD. lld doesn't.

Not all segments have to be nested inside a PT_LOAD. We have several OS-specific segments in our downstream port that are not part of the memory image, but must appear in the link output, as they are used by tools other than the loader in our ecosystem, so they are not nested in PT_LOADs in our linker scripts. I imagine there are other cases too.

  • Reimplemented, added test cases.

@jhenderson listed many good test cases. It requires some work. I've left some notes.

Thank you guys for these suggestions and comments, I used them in the update.

I've added tests for all cases listed except the next two.

Segments with gaps in them (file size != size of all sections), where gap is at start/middle/end of segment.

Type: Filler

I think we might want having a separate test for this. Also it needs more code I think.

Segments with memsize > file size.

Needs non-empty SHT_NOBITS.

I'd like to support SHT_NOBITS case separatelly too for the same reasons.

How does it sound?

Those both sound reasonable to me.

llvm/include/llvm/ObjectYAML/ELFYAML.h
179 ↗(On Diff #247225)

holds -> hold

llvm/test/Object/obj2yaml.test
667

I think for obj2yaml, it's fine to do this, since obj2yaml should just recreate the layout, and the sections in the segment just help to configure the program headers.

llvm/test/tools/obj2yaml/program-headers.yaml
4

looks -> look

7

Thinking about it, do we really need this part? It's really testing yaml2obj and/or llvm-readelf, not obj2yaml.

9

dumped -> dump

24–33

You need {{$}} at the end of all of these lines, since there could be more segments after the last one in each of them.

133

into it.

(same throughout)

142

Might be worth saying "a nested dynamic segment" to highlight that being nested is also an interesting property.

158

Should this be moved to Part II?

173

This comment needs to explain "why" it must be the same as the previous segment. Otherwise, I think the first sentence is unnecessary.

"we and are" -> "we are"

291

It might be clearer to explicitly specify .empty.tls.end here?

Also, this appears to be missing a PT_TLS at the start of the segment, i.e. covering .empty.tls.start?

359

in a segment.

I disagree with this behaviour. A segment doesn't not have to contain only SHF_ALLOC sections (it is not mandated by the ELF gABI, and my organisation at least has linker scripts like that for segments other than PT_LOAD). If a segment's file offset + file size implies that it covers a non-alloc section, it should include it, no different to an alloc section.

llvm/tools/obj2yaml/elf2yaml.cpp
182–183

Could this return an Expected<std::vector<ELFYAML::PogramHeader>>? That would be cleaner than passing in something to be filled.

200–201

Maybe add a TODO here to add support for Fills?

204

As noted in the test, this SHF_ALLOC check is incorrect. It doesn't faithfully recreate what was in the object, and doesn't allow for non-loadable segments. Please remove it.

grimar planned changes to this revision.Mar 19 2020, 3:22 AM
grimar marked 2 inline comments as done.
grimar added inline comments.
llvm/tools/obj2yaml/elf2yaml.cpp
204

The approach used by this patch won't work then.

We do not create Chunks for non-allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM,
so iterating over Chunks does not work for non-allocatable case in general.

grimar marked 2 inline comments as done.Mar 19 2020, 3:27 AM
grimar added inline comments.
llvm/tools/obj2yaml/elf2yaml.cpp
204

When I experimented with how to implement it, I had a prototype that had more complex code, but had no this limitation.
I'll try to return to what I had.

jhenderson added inline comments.Mar 19 2020, 3:47 AM
llvm/tools/obj2yaml/elf2yaml.cpp
204

FWIW, our use case doesn't have the static symbol table or string table inside a segment, although I don't know of any strict reason why this shouldn't be allowed.

asmith added a subscriber: asmith.Mar 21 2020, 10:03 AM
grimar updated this revision to Diff 252004.Mar 23 2020, 6:01 AM
grimar marked 21 inline comments as done.
  • Changed implementation to support non-allocatable sections as requested.
  • Addressed review comments.
llvm/test/tools/obj2yaml/program-headers.yaml
7

I did it for documentation purposes: (1) I find it very convenient for reading and maintaining this test: with it we can check and compare obj2yaml's output with "expected" version easily. (2) This also might allow us to refer and explain (in comments) any differences we might have in outputs.
I'd really would like to keep it because of (1) if there are no strong objections.

158

I am not sure why. This tests we can have an empty segment somewhere between other ones.
Do you want me to add an additional test to Part II that will have an object with only one empty segment?

291

Done.

359

Done.

llvm/tools/obj2yaml/elf2yaml.cpp
182–183

Done. I think I did it to be consistent with dumpSymbols, but returning Expected<std::vector<ELFYAML::ProgramHeader>> is better indeed,

200–201

I avoided any additional details as I am not sure it is clear to me how exactly we are going to support them.
(e.g. for example, I guess we might want to have a command line option, as I doubt we always want to dump a data between sections).
Just having something like "TODO: add a support for Fill chunks" is probably not very useful?

grimar added a comment.EditedMar 23 2020, 6:05 AM

Note: I've changed implementation slightly. Now we dump all sections and then after dumping program headers we remove sections we do not want to see in an output.
It allows us to dump sections in one place and work with array of Chunks. It also should help to support Filler and seems makes the whole logic a bit nicer (in context of this patch).

It is possible to split this change to a separate NFC patch and I'll be happy to do it if we agree that it is a good way to go.

jhenderson added inline comments.Mar 24 2020, 2:47 AM
llvm/include/llvm/ObjectYAML/ELFYAML.h
180 ↗(On Diff #252004)

Hold -> Holds (!)

(For reference, third-person singular verbs usually have an 's' on the end, but not third-person plural - previously multiple variables were being referenced, so plural, but now there is only one)

llvm/test/tools/obj2yaml/program-headers.yaml
7

No strong objection, but might be worth moving the comment to the llvm-readelf run line, along with a statement saying the check is here to make it clear what the layout should look like.

24–33

On second thoughts, using --match-full-lines might be just as good. I don't mind either way.

158

I don't think there's a need for that. I wouldn't expect that to be a special case.

174

the same as the previous one for this

374

to segments -> in segments

llvm/tools/obj2yaml/elf2yaml.cpp
177–178

Do we need this special case? Do you have a test case that illustrates it?

229–234

This comment refers to allocatable sections, which I don't think is quite right?

grimar marked 3 inline comments as done.Mar 24 2020, 3:29 AM
grimar added inline comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
180 ↗(On Diff #252004)

Thanks.. :]

llvm/tools/obj2yaml/elf2yaml.cpp
229–234

We still only leave allocatable versions in the output, though I think this comment can be better now.
I'll split the mentioned NFC patch soon. It will include this piece of code and I'll rewrite this comment.

grimar updated this revision to Diff 252805.Mar 26 2020, 5:12 AM
grimar marked 8 inline comments as done.
  • Addressed review comments, rebased.
llvm/test/tools/obj2yaml/program-headers.yaml
24–33

I've kept {{$}} (have no preference here, {{$}} looks fine to me).

llvm/tools/obj2yaml/elf2yaml.cpp
177–178

I think we should ignore either all SHT_NULL sections or just the one with section index == 0.
When we don't, the first test case in program-headers.yaml includes it in the first segment and prints it:

- Type: PT_LOAD
  Flags: [ PF_R ]
  Sections:
   - Section: ''
   - Section: .hash
   - Section: .gnu.hash
   - Section: .dynsym
   - Section: .dynstr
jhenderson added inline comments.Mar 27 2020, 2:19 AM
llvm/tools/obj2yaml/elf2yaml.cpp
299–300

Is it perhaps worth a test case that shows that SHT_NULL sections, even not the first one, are not considered to be inside a program header?

313–314

I'm not sure this early out is necessary. The subsequent for loop will not trigger, and Ret will be returned anyway, without modification.

grimar updated this revision to Diff 253072.Mar 27 2020, 3:45 AM
grimar marked 3 inline comments as done.
  • Addressed review comments.
llvm/tools/obj2yaml/elf2yaml.cpp
299–300

Done.

313–314

Agree.

jhenderson accepted this revision.Mar 30 2020, 1:00 AM

LGTM, I think. Please wait for @MaskRay to confirm.

This revision is now accepted and ready to land.Mar 30 2020, 1:00 AM
MaskRay accepted this revision.Mar 30 2020, 3:46 PM

Looks great!

This revision was automatically updated to reflect the committed changes.