Currently obj2yaml does not dump program headers,
this patch teaches it to do that.
Details
Diff Detail
Event Timeline
llvm/tools/obj2yaml/elf2yaml.cpp | ||
---|---|---|
312 | This change will go away after we land https://reviews.llvm.org/D74955 |
The dumpProgramHeaders() logic is similar to llvm-objcopy/ELF/Object.cpp:sectionWithSegment. It also reminds me of D74755... The attribution of an empty section on the boundary between two segments will not affect obj2yaml correctness.
llvm/tools/obj2yaml/elf2yaml.cpp | ||
---|---|---|
209 | I haven't carefully read through the code. Just raise an opinion. For non-SHT_NOBITS sections, sh_offset/p_offset works better than sh_addr/p_vaddr. |
No opinion on the code, but I think that the ability to dump program headers will be very useful. I've been wanting to have a feature like this for quite some time.
The main thing I am interested in is being able to dump core files (segments without any sections), which I guess this doesn't do yet, but it seems to be a step towards it.
llvm/include/llvm/ObjectYAML/ELFYAML.h | ||
---|---|---|
179 | and is used | |
llvm/test/tools/obj2yaml/program-headers.yaml | ||
5 | --program-headers/--segments instead of -a | |
78 | Some additional interesting test cases:
(Sorry if any of these are already covered) | |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
169 | Maybe this shouldn't be auto? | |
209 | +1 to using sh_offset. That is what llvm-objcopy uses, IIRC, and I believe yaml2obj places things based on the file offset too. |
Yes, I think in obj2yaml, it doesn't matter which segments a section is within, because the output should have the segments at exactly the same place as the input. There should be no moving of segments in obj2yaml, in my opinion, as it should provide as faithful a representation as possbile of the original input.
@jhenderson listed many good test cases. It requires some work. I've left some notes.
- Standalone empty segments
GNU ld can create a p_memsz=p_filesz=0 PT_LOAD. lld doesn't.
- Nested empty segments (at start/end/middle)
PT_TLS⊆PT_LOAD(RW)
(PT_TLS(p_memsz=0, p_align=64) is used by Android Bionic)
- Empty sections in segments (at start/end/middle)
- Misaligned segments?
p_offset%p_align != p_vaddr%p_align
- Out-of-order PT_LOAD segments?
PT_LOAD not ordered by p_vaddr.
ELF spec: Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.
- Segments with gaps in them (file size != size of all sections), where gap is at start/middle/end of segment.
Type: Filler
- Segments with memsize > file size.
Needs non-empty SHT_NOBITS.
- Non-alloc sections within segments.
- Reimplemented, added test cases.
Thank you guys for these suggestions and comments, I used them in the update.
I've added tests for all cases listed except the next two.
Segments with gaps in them (file size != size of all sections), where gap is at start/middle/end of segment.
Type: Filler
I think we might want having a separate test for this. Also it needs more code I think.
Segments with memsize > file size.
Needs non-empty SHT_NOBITS.
I'd like to support SHT_NOBITS case separatelly too for the same reasons.
How does it sound?
llvm/test/Object/obj2yaml.test | ||
---|---|---|
670 | .data section size == 0, I wonder if it is OK we dump it? Can we ignore empty sections at the end of segment? I.e do what D74755 removes: // If a section is empty it should be treated like it has a size of 1. This is // to clarify the case when an empty section lies on a boundary between two // segments and ensures that the section "belongs" to the second segment and // not the first. uint64_t SecSize = Sec.Size ? Sec.Size : 1; |
llvm/test/Object/obj2yaml.test | ||
---|---|---|
670 | I vote for removing uint64_t SecSize = Sec.Size ? Sec.Size : 1; |
- Addressed review comments (simplified algorithm used).
llvm/tools/obj2yaml/elf2yaml.cpp | ||
---|---|---|
216 | I think you are right. It is easier to use O(n*m) here, since the number of segments is not that large anyways. |
Not all segments have to be nested inside a PT_LOAD. We have several OS-specific segments in our downstream port that are not part of the memory image, but must appear in the link output, as they are used by tools other than the loader in our ecosystem, so they are not nested in PT_LOADs in our linker scripts. I imagine there are other cases too.
Those both sound reasonable to me.
llvm/include/llvm/ObjectYAML/ELFYAML.h | ||
---|---|---|
179 | holds -> hold | |
llvm/test/Object/obj2yaml.test | ||
670 | I think for obj2yaml, it's fine to do this, since obj2yaml should just recreate the layout, and the sections in the segment just help to configure the program headers. | |
llvm/test/tools/obj2yaml/program-headers.yaml | ||
4 | looks -> look | |
7 | Thinking about it, do we really need this part? It's really testing yaml2obj and/or llvm-readelf, not obj2yaml. | |
9 | dumped -> dump | |
24–33 | You need {{$}} at the end of all of these lines, since there could be more segments after the last one in each of them. | |
133 | into it. (same throughout) | |
142 | Might be worth saying "a nested dynamic segment" to highlight that being nested is also an interesting property. | |
158 | Should this be moved to Part II? | |
173 | This comment needs to explain "why" it must be the same as the previous segment. Otherwise, I think the first sentence is unnecessary. "we and are" -> "we are" | |
291 | It might be clearer to explicitly specify .empty.tls.end here? Also, this appears to be missing a PT_TLS at the start of the segment, i.e. covering .empty.tls.start? | |
359 | in a segment. I disagree with this behaviour. A segment doesn't not have to contain only SHF_ALLOC sections (it is not mandated by the ELF gABI, and my organisation at least has linker scripts like that for segments other than PT_LOAD). If a segment's file offset + file size implies that it covers a non-alloc section, it should include it, no different to an alloc section. | |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
174–175 | Could this return an Expected<std::vector<ELFYAML::PogramHeader>>? That would be cleaner than passing in something to be filled. | |
192–193 | Maybe add a TODO here to add support for Fills? | |
196 | As noted in the test, this SHF_ALLOC check is incorrect. It doesn't faithfully recreate what was in the object, and doesn't allow for non-loadable segments. Please remove it. |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
---|---|---|
196 | The approach used by this patch won't work then. We do not create Chunks for non-allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM, |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
---|---|---|
196 | When I experimented with how to implement it, I had a prototype that had more complex code, but had no this limitation. |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
---|---|---|
196 | FWIW, our use case doesn't have the static symbol table or string table inside a segment, although I don't know of any strict reason why this shouldn't be allowed. |
- Changed implementation to support non-allocatable sections as requested.
- Addressed review comments.
llvm/test/tools/obj2yaml/program-headers.yaml | ||
---|---|---|
7 | I did it for documentation purposes: (1) I find it very convenient for reading and maintaining this test: with it we can check and compare obj2yaml's output with "expected" version easily. (2) This also might allow us to refer and explain (in comments) any differences we might have in outputs. | |
158 | I am not sure why. This tests we can have an empty segment somewhere between other ones. | |
291 | Done. | |
359 | Done. | |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
174–175 | Done. I think I did it to be consistent with dumpSymbols, but returning Expected<std::vector<ELFYAML::ProgramHeader>> is better indeed, | |
192–193 | I avoided any additional details as I am not sure it is clear to me how exactly we are going to support them. |
Note: I've changed implementation slightly. Now we dump all sections and then after dumping program headers we remove sections we do not want to see in an output.
It allows us to dump sections in one place and work with array of Chunks. It also should help to support Filler and seems makes the whole logic a bit nicer (in context of this patch).
It is possible to split this change to a separate NFC patch and I'll be happy to do it if we agree that it is a good way to go.
llvm/include/llvm/ObjectYAML/ELFYAML.h | ||
---|---|---|
179 | Hold -> Holds (!) (For reference, third-person singular verbs usually have an 's' on the end, but not third-person plural - previously multiple variables were being referenced, so plural, but now there is only one) | |
llvm/test/tools/obj2yaml/program-headers.yaml | ||
7 | No strong objection, but might be worth moving the comment to the llvm-readelf run line, along with a statement saying the check is here to make it clear what the layout should look like. | |
24–33 | On second thoughts, using --match-full-lines might be just as good. I don't mind either way. | |
158 | I don't think there's a need for that. I wouldn't expect that to be a special case. | |
174 | the same as the previous one for this | |
374 | to segments -> in segments | |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
169–170 | Do we need this special case? Do you have a test case that illustrates it? | |
221–226 | This comment refers to allocatable sections, which I don't think is quite right? |
llvm/include/llvm/ObjectYAML/ELFYAML.h | ||
---|---|---|
179 | Thanks.. :] | |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
221–226 | We still only leave allocatable versions in the output, though I think this comment can be better now. |
- Addressed review comments, rebased.
llvm/test/tools/obj2yaml/program-headers.yaml | ||
---|---|---|
24–33 | I've kept {{$}} (have no preference here, {{$}} looks fine to me). | |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
169–170 | I think we should ignore either all SHT_NULL sections or just the one with section index == 0. - Type: PT_LOAD Flags: [ PF_R ] Sections: - Section: '' - Section: .hash - Section: .gnu.hash - Section: .dynsym - Section: .dynstr |
llvm/tools/obj2yaml/elf2yaml.cpp | ||
---|---|---|
295–296 | Is it perhaps worth a test case that shows that SHT_NULL sections, even not the first one, are not considered to be inside a program header? | |
309–310 | I'm not sure this early out is necessary. The subsequent for loop will not trigger, and Ret will be returned anyway, without modification. |
and is used