This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
test/tools/llvm-objcopy/
-
tools/
-
llvm-objcopy/
2/2
basic-copy.test
4
hello-world.s
-
tools/
-
LLVMBuild.txt
-
llvm-objcopy/
-
CMakeLists.txt
-
LLVMBuild.txt
14/14
Object.h
36/49
Object.cpp
1/1
llvm-objcopy.h
3/3
llvm-objcopy.cpp

Differential D33964

[LLVM][llvm-objcopy] Added basic plumbing to get things started
ClosedPublic

Authored by jakehehrlich on Jun 6 2017, 3:46 PM.

Download Raw Diff

Details

Reviewers

phosek
Bigcheese

Commits

Summary

As discussed on llvm-dev I've implemented the first basic steps towards llvm-objcopy/llvm-objtool (name pending).

This change adds the ability to copy (without modification) 64-bit little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS, SHT_NULL and SHT_STRTAB sections.

Diff Detail

Repository: rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Herald added a subscriber: mgorny. · View Herald TranscriptJun 6 2017, 3:46 PM

jakehehrlich retitled this revision from [LLVM][llvm-objcopy] to [LLVM][llvm-objcopy] Added basic plumbing to get things started.Jun 6 2017, 4:04 PM

jakehehrlich added a reviewer: phosek.

jakehehrlich added a reviewer: Bigcheese.Jun 6 2017, 4:36 PM

tpimh added a subscriber: tpimh.Jun 7 2017, 12:16 AM

grimar added a subscriber: grimar.Jun 7 2017, 8:10 AM

compnerd added a subscriber: compnerd.Jun 9 2017, 2:43 PM

I added ELFT support. This demanded making a number of structural changes to how I was doing things. I decided 'initialize' no longer made sense. I also decided it was best to construct special sections like the section names table independently of reading the file. Also I previously wasn't aligning anything. I added some basic alignment support so that sections and segments are properly aligned.

Have few general comments.

test/tools/llvm-objcopy/basic-copy.test
4	I would add empty line here.
16	And here. Just to separate execution flow from check in a more obvious way.
tools/llvm-objcopy/Object.cpp
37	I would inline CompOffset, it is trivial operation that does not probably require addition comparator.
77	Count
122	Seg.Index = Index++;
190	A, B
199	LLVM code style says variable names should be uppercase. Value, Multiple, so on.
200	Is it clang-formatted ?
270	Consider using typedefs in header (example from LLD code) template <class ELFT> class Writer { public: typedef typename ELFT::Shdr Elf_Shdr; typedef typename ELFT::Ehdr Elf_Ehdr; typedef typename ELFT::Phdr Elf_Phdr;
tools/llvm-objcopy/Object.h
44	Sort by Name may be ? And looks you can do initialization inline instead of doing that in constructor below.
95	Consider using llvm::StringMap may be ?
105	I think you can inline initialization,
133	Not sure how much is in common not to include argument names in definitions. (in LLD where I spend most of time I am working on, we always do that, but may be it is ok, just noticable thing for me).
tools/llvm-objcopy/llvm-objcopy.cpp
54	ELF64LE ? I guess you want to template it.
95	You either want to remove brackets here: if (ELFObjectFile<ELF64LE> o = dyn_cast<ELFObjectFile<ELF64LE>>(&Binary)) CopyBinary(o); else reportError(InputFilename, object_error::invalid_file_type); Or probably better just to early return: if (ELFObjectFile<ELF64LE> o = dyn_cast<ELFObjectFile<ELF64LE>>(&Binary)) return CopyBinary(o); reportError(InputFilename, object_error::invalid_file_type); return X;

emaste added a subscriber: emaste.Jun 11 2017, 8:05 AM

Fixed things recommended by grimar

Changed it to use llvm::StringMap instead of std::map and also fixed the basic test.

jakehehrlich added a child revision: D34167: [llvm] Add symbol table support to llvm-objcopy.Jun 13 2017, 1:37 PM

kastiglione added a subscriber: kastiglione.Jun 14 2017, 3:04 PM

I having looked through the fine details yet, but one higher level question: I saw the discussions on the mailing list concerning being careful not to go overboard making everything work with multiple object file formats and to start with ELF. That being said, why can't just the ELF functionality of libObject be extended to write? Seems like going that way would make it easier for others to reuse in other tools.

In D33964#781402, @meadori wrote:

I having looked through the fine details yet, but one higher level question: I saw the discussions on the mailing list concerning being careful not to go overboard making everything work with multiple object file formats and to start with ELF. That being said, why can't just the ELF functionality of libObject be extended to write? Seems like going that way would make it easier for others to reuse in other tools.

As I understand it libObject reads in the file to memory and then simply reinterpret_casts the different chunks of the file into the corresponding parts. So when you get an Elf64_Shdr* from the libObject interface you actually are getting a pointer into the file data. It's sort of like "in-place" file parsing. It might even be memory mapping the file, I'm not sure. My point is that It does not have it's own internal modifiable representation of the file. It just has the raw data from the file. Changing the interface to include section/symbol removal (which involves changing file layout and modifying sections like .shstrtab and .shtab) would cause the internal workings of libObject to change a lot I think. Perhaps someone can confirm my understanding?

Okay, that makes sense. Thanks to all that explained it.

With that in mind, is there any reason not to but the read/write object implementation that is started in this patch into a new library? ISTM, that functionality would be quite useful for various object file manipulation tools.

There are several llvm projects that use similar functionality so a library should be implemented at some point. Petr spoke with Rui about this possibility of de-duplication of efforts. The conclusion of this discussion was that we should build llvm-objcopy independently and think about de-duplication later. Sean Silva expressed a similar sentiment in the llvm-objcopy proposal email. I may be speaking out of turn here but I think everyone agrees that a) there is duplication occurring across llvm here and b) now is not the right time to tackle that issue but it should defiantly be in thought about. There is some concern over weather the de-duplication even makes sense. I think as llvm-objcopy starts to be fleshed out it will become clear what can be de-duplicated and what cannot. That will give us a good idea of what should and shouldn't go in this library.

Seeing how things grow and de-duplicating as necessary seems like a reasonable strategy to me.

ping/bump

After a bit offline discussion with Petr I fixed a couple things like formatting and error handling.

LGTM, but I'd like to have a second opinion.

Thanks for working on this! It's been a low priority on my todo list for years now.

I'm a bit concerned about the section layout algorithm. It ignores the section offsets from the input sections and just computes new ones derived from the address and size. This will fail when given an ELF which has a gap in the sections in a given segment. I think the hardest part of objcopy will be making sure to keep the file in a consistent state, and I'd like to see a design that makes it hard to get wrong.

tools/llvm-objcopy/Object.cpp
172	Memory leak. Use unique_ptr.
194	Use alignTo from llvm/Support/MathExtras.h
225	Replace with alignTo?
tools/llvm-objcopy/Object.h
4	`The LLVM Compiler Infrastructure`
91	This should probably use llvm/MC/StringTableBuilder.h
101	dead code
105–107	dead code

This revision now requires changes to proceed.Jun 21 2017, 5:20 PM

Switched to using StringTableBuilder (which makes StringTableSection just a wrapper around StringTableBuilder).
I removed some dead code pointed out by BigCheese.
I switched to alignTo.
Corrected licence headers

jakehehrlich marked 4 inline comments as done.Jun 22 2017, 2:50 PM

jakehehrlich added inline comments.

tools/llvm-objcopy/Object.cpp
172	It get's emplaced into the Sections vector first thing after allocation and that unique pointer manages the memory. Would be more obvious that a leak isn't occurring I first add a new unique pointer to Sections and then assign SectionNames via a cast from there?

Bigcheese added inline comments.Jun 22 2017, 3:20 PM

tools/llvm-objcopy/Object.cpp
172	Ah, I see now. It would probably be best to do: Sections.push_back(make_unique<StringTableSection>()); SectionNames = Sections.back().get(); ...

Made it obvious that a memory leak is not occurring as recommended by BigCheese.

jakehehrlich marked 3 inline comments as done.Jun 22 2017, 5:17 PM

In working on some more other features and improvements to this I realized that I had dangling pointer issue. I was assigning pointers to Segments out of an std::vector. These pointers would become invalid on resize. To fix this I simply used unique pointers to store the segments. The code changed very little as a result since I try my best not to mess with program headers.

jhenderson added a subscriber: jhenderson.Jun 27 2017, 1:34 AM

Looking good so far to me, apart from lots of small nits for the most part.

One general nit on comments - I believe that they should all end in full stops.

tools/llvm-objcopy/Object.cpp
19	Could we typedef these as well? It will read more easily, I think.
25	Any particular reason this TODO has been left undone?
166	I personally would think that using offset to determine order may be simpler, as it removes the need to test for SHF_ALLOC. Also, sections don't have to appear in the file in the order that they appear in the section header table, so I'd only use Index as a fall-back if you get two sections with the same offset.
172	Typo -> "Decide file offsets and indexes"
174	For clarity, I'd rephrase this comment as follows: "We can put section data after the ELF header and the program headers."
178	We need to make sure of what? Perhaps "We need to check both are respected" or something to that effect.
190	Section->Index = Index++; and then drop the later increment.
226	Use typedef.
tools/llvm-objcopy/Object.h
32	ParentSegment
tools/llvm-objcopy/llvm-objcopy.cpp
74	I think this comment is in the wrong place, or even unnecessary entirely.
tools/llvm-objcopy/llvm-objcopy.h
15	Unused header?

I made the changes recommended by jhenderson. I particularly liked the sorting by offsets recommendation. I also simplified the finalize method in segment. Before it was deciding segment FileSize which it shouldn't and it was using a complicated method to find the minimum when it didn't need to.

jakehehrlich marked 10 inline comments as done.Jun 27 2017, 10:54 AM

jakehehrlich added inline comments.

tools/llvm-objcopy/Object.cpp
166	This is a really good idea. Thank you!

I forgot to add all needed full stops in my last update.

The implementation looks good from my point of view. I'm wondering about the test coverage. The current test only tests PROGBITS, NULL, and STRTAB sections. However, your original comment comment claims that it will work with NOBITS. I'd like to see that tested, if this is the case. Also test cases with segments in as well, would be good.

tools/llvm-objcopy/Object.cpp
170	"make ensure" -> "make sure" or just "ensure"

This change adds test coverage for NOBITS and program headers. It turns out there were issues with NOBITS and there were issues with segments. I fixed the bugs exposed as well

Changes from last diff (outside of tests):

Some segments (like STACK segments) don't have sections so you can't set their Offset to firstSection()->Offset. This demanded a small change to Segment::finalize
Sometimes sections that don't belong in segments would wind up in a PT_LOAD segment that had Offset == 0. This is fixed by only adding allocated sections to segments (see Object::readSectionHeaders)
I was unconditionally requesting the contents of a section. This is an error in the NOBITS case so I had to add a check to only request the section contents when there is some. (see Object::readSectionHeaders)

In D33964#794100, @jakehehrlich wrote:

Sometimes sections that don't belong in segments would wind up in a PT_LOAD segment that had Offset == 0. This is fixed by only adding allocated sections to segments (see Object::readSectionHeaders)

I disagree with this approach - yes non-allocatable sections should not end up in PT_LOAD (although using linker scripts, I wouldn't be surprised if it is possible), but it is perfectly legitimate to have segments that are not loadable with non-SHF_ALLOC sections (see the ongoing discussion in D34204 for explanation). I'm not quite sure what the case here is that you've encountered, but it sounds to me like the issue is elsewhere.

Also, I noticed that you don't ignore SHT_NULL sections, but you probably should. Large proposal 74 ELFs with many sections have metadata stored in that section. I'd suggest two things:

Don't add support for Prop74 just yet - you can check somewhere if it is one and then bail out, to be safe.
Skip the SHT_NULL section when reading section headers, and recreate it later on.

tools/llvm-objcopy/Object.cpp
108	This should probably be SHT_STRTAB \|\| SHT_NULL.

I decided to just skip index zero rather than all SHT_NULL sections. It's equivalent for normal files but also handles strange files that want to have SHT_NULL sections in the middle for some reason.

Having the dummy section be treated as a real section was causing the issue I had to fix with the SHF_ALLOC check so fixing this killed two birds with one stone.

In D33964#795981, @jakehehrlich wrote:

I decided to just skip index zero rather than all SHT_NULL sections. It's equivalent for normal files but also handles strange files that want to have SHT_NULL sections in the middle for some reason.

Having the dummy section be treated as a real section was causing the issue I had to fix with the SHF_ALLOC check so fixing this killed two birds with one stone.

Great! One minor nit, and I think we're nearly there, from my point of view. Also, at things stand, very large proposal 74 ELFs with metadata in the SHT_NULL section will be corrupted silently, I believe, so it might be nice to error if there are too many sections when trying to write.

tools/llvm-objcopy/Object.cpp
128–129	Sec->Index = Index++?

Fixed the separate index assignment and increment.

jhenderson added inline comments.Jul 3 2017, 1:38 AM

tools/llvm-objcopy/Object.cpp
162	address order -> offset order
186	Could you explain why we are using a mix of offsets and addresses here, as it is surprising to me.
197	properlly -> properly
198–199	Why only the sh_name field? 4-byte alignment could cause other struct members to be misaligned (e.g. the sh_addr and sh_offset fields), because they are 8-bytes in size.

Fixed some small issues like the comments recommend by James.

jakehehrlich marked 5 inline comments as done.Jul 5 2017, 4:29 PM

jakehehrlich added inline comments.

tools/llvm-objcopy/Object.cpp
186	The first an main point is that the Offsets change in this loop and what we want to respect is the difference between the section offsets in the original file. FirstInSeg->Offset no longer respects that information. This is changeable however by adding fields that keep track of the original file offset. This might make sense to use elsewhere as well like in the sorting algorithm where we're trying to respect the original offset as much as possible. This would free up Offset to be messed with (in what way isn't clear) before sorting occurs. So this is changeable but only by adding another field. Perhaps that is not a bad idea however. This said when I was thinking about this I never actually even thought to use Offset. I was thinking about respecting the memory image not the file image. I think as a guiding principal "respect the memory image" is a a better guiding principal but only slightly. The memory image and file image should agree for all valid files however so I'm not really opposed to changing it. I have a slight opposition but it's based on some ideas I have for changing the file layout algorithm and I don't want to justify code here based on code that hasn't been written yet. The general idea is that it is possible to enforce "respect the memory image" via const correctness/encapsulation but it is not possible to do the same for the file image because Offset will need to be changed. My personal opinion is that adding a new Offset like field that isn't quite Offset and using that field to compute the new offset here isn't really any clearer that what I have here. What do you think?
198–199	The next field (sh_type) is also a 4-byte word and thus anything after those two will be 8-byte aligned. So this is the minimal alignment I can choose which ensures that all the fields of the section header are aligned. I updated the comment to reflect this fact.

By the way, although it makes it easier for me to see the changes, your current diff only shows the changes versus the previous diff, so has lost, for example, the tests.

As noted in one of my comments, I'm not certain that this will work in the nested segment case. Is there a test for this?

tools/llvm-objcopy/Object.cpp
186	I think I follow what you're saying and broadly agree with it. We have to respect the memory image, at least for linked objects, because if we change the relative positions of two sections within a segment, then the address mapping will no longer be correct. For relocatable objects, that's not an issue, because they don't have a memory mapping yet (and segments can be ignored in this case). However, the file and memory image are not always consistent, even for valid programs. Specifically, SHT_NOBITS sections have no file image typically, but do have a memory image. Usually, this is not an issue, because the .bss section is placed at the end of the data segment. However, when using TLS (which is sometimes a nested segment in the data segment) this is not the always the case due to .tbss, because other non-TLS data usually appears after it. From my observations, different platforms exhibit different behaviour here, but at least some of them have no file image assigned for the .tbss section, but do have a memory image associated with it. As such, the layout algorithm needs to be able to handle this case. That all being said, looking at the code without actually trying it out, I'm not convinced that this will handle the nested-segment case anyway, in which case what you've got here is fine. However, I would suggest updating the comment to explain why we need to respect interstitial memory gaps.
198–199	Relative to the start of the section header, the fields will be aligned correctly. However, relative to the start of the file, they won't be: for example, in 64-bit ELF, with 8 byte address fields, the sh_name field was aligned to offset 4, the sh_type field will be at offset 8, and the 8-byte-sized sh_flags field will be at offset 12, which is misaligned.

I was wrong about the alignment of section headers as James pointed out. I fixed that.
Added comments recommended by James.
This is the full diff that I forgot to add last time

jakehehrlich marked 7 inline comments as done.Jul 6 2017, 1:03 PM

jakehehrlich added inline comments.

tools/llvm-objcopy/Object.cpp
198–199	Yep. You're right. I did my math wrong. Not sure what I was thinking there. Thanks for catching that! It should be fixed now.

Could you construct a test case that confirms the behaviour of nested segments, please? Something with a memory image like:

| Segment 1                                                        |
| Segment 2                               |
| Progbits Section | gap | NOBITS section | gap | Progbits section |

test/tools/llvm-objcopy/hello-world.s
3	I don't think this will work for many people, as LLD is not part of LLVM's main source code body, so people will be unable to run this test. That being said, I'm not sure how to work around this. Can yaml2obj help us here? (I'm not particularly familiar with that tool yet).
tools/llvm-objcopy/Object.cpp
186	I'd suggest changing the second and third sentence to something like: "We must maintain the memory image so that addresses are preserved. As, with the exception of SHT_NOBITS sections at the end of segments, the memory image is a copy of the file image, we preserve the file image as well." It explains why the file and memory image need to match, which may not be obvious to those who are unfamiliar with ELF.

In D33964#803460, @jhenderson wrote:
Could you construct a test case that confirms the behaviour of nested segments, please? Something with a memory image like:
| Segment 1                                                        |
| Segment 2                               |
| Progbits Section | gap | NOBITS section | gap | Progbits section |

I can confirm this just won't work right. As is the first Progbits section will wind up in two Segments and everything will get all messed up. For this diff I'm fine with that not being supported. I'm also not sure what the intended behavior is in this case.

test/tools/llvm-objcopy/hello-world.s
3	Oh man, that didn't even occur to me. As far as I am aware there is no way to produce program headers using yaml2obj. It never produces any program headers by default either. I think you're right and this should be fixed. I have no clue how to fix it other than adding support for program headers to yaml2obj. That's not something I've looked into however. Perhaps there is a way to require lld for this test? That way it's just unsupported for most people?

Fixed the comment as recomended

jakehehrlich marked an inline comment as done.Jul 10 2017, 11:12 AM

In D33964#803969, @jakehehrlich wrote:
In D33964#803460, @jhenderson wrote:
Could you construct a test case that confirms the behaviour of nested segments, please? Something with a memory image like:
| Segment 1                                                        |
| Segment 2                               |
| Progbits Section | gap | NOBITS section | gap | Progbits section |
I can confirm this just won't work right. As is the first Progbits section will wind up in two Segments and everything will get all messed up. For this diff I'm fine with that not being supported. I'm also not sure what the intended behavior is in this case.

I'm ok with that as well, but I'd like to understand what your plan is going forward to fix this, in case it has a significant impact on the design.

Having a progbits section in multiple segments is absolutely fine, and I'd expect the layout to be maintained as in the diagram. The alignment of all containing segments needs to be respected when placing the first section, and then they all start at the same place (or in the case where the nested segment is not at the start, at the same relative offset). There are some weird oddities when coming to NOBITS sections, because of their file/memory footprint mismatch. The simple, but probably undesirable, option is to allocate a file footprint for any NOBITS sections that are not at the end of the top-most parent segment. I think though that actually, as is the case in the diagram here, that there is special handling for nested NOBITS sections like this in loaders (specifically, I'm thinking of the .tbss case again - the example could easily be a data segment with a nested TLS segment, with the .tdata, .tbss, and .data sections). By my understanding, as long as .tbss is at the end of the nested TLS segment, it should not have a file footprint, even though it makes it look like there'll be a mismatch in the data segment.

test/tools/llvm-objcopy/hello-world.s
3	I'm afraid I don't know of a way. There might be, but if there is, somebody else will have to advise. It seems to me that having program header support in yaml2obj is very desirable. I'd offer to add it, but I do not have any time to do so.

In D33964#804772, @jhenderson wrote:

I'm ok with that as well, but I'd like to understand what your plan is going forward to fix this, in case it has a significant impact on the design.

Having a progbits section in multiple segments is absolutely fine, and I'd expect the layout to be maintained as in the diagram. The alignment of all containing segments needs to be respected when placing the first section, and then they all start at the same place (or in the case where the nested segment is not at the start, at the same relative offset). There are some weird oddities when coming to NOBITS sections, because of their file/memory footprint mismatch. The simple, but probably undesirable, option is to allocate a file footprint for any NOBITS sections that are not at the end of the top-most parent segment. I think though that actually, as is the case in the diagram here, that there is special handling for nested NOBITS sections like this in loaders (specifically, I'm thinking of the .tbss case again - the example could easily be a data segment with a nested TLS segment, with the .tdata, .tbss, and .data sections). By my understanding, as long as .tbss is at the end of the nested TLS segment, it should not have a file footprint, even though it makes it look like there'll be a mismatch in the data segment.

Well if this is in fact the intended behavior then I think that means I have to switch from "respect the memory image" to "respect the file image". If the relative offsets are used instead of relative addresses it should solve this problem with only a tiny change. If I stick to "respect the memory image" then I'd have to add all sorts of complicated book keeping to ensure that this case was handled correctly. Even if TLS isn't handled this way there might be some case like this where it is and we should respect the choice that the file made. I think this case (or even the possibility of such a case existing) convinces me that it should be "respect the file image" and not "respect the memory image".

So to handle this I propose the following changes:

Make the parent segment the lowest offset segment. I think I'd prefer "lowest offset segment" rather than "top segment" in order to account for strange things like overlapping segments in case those are also a thing. I'm slowly coming to realize that it's best to curtail my assumptions as much as possible.
Change "Offset = FirstInSeg->Offset + Section->Addr - FirstInSeg->Addr;" to "Offset = FirstInSeg->Offset + Section->OriginalOffset - FirstInSeg->OriginalOffset;" because of the TLS case you found. This way we respect the choice the input file made on this mater regardless of weather it allocates space in the file for the NOBITS section or not.
Add a parent segment field to segments so that nested segments (and even overlapping segments) can have offsets based on their parent segment and not the sections they contain.

Does this sound good? If so I'll go ahead and change this diff to use 1) and 2) now and I'll add 3) in another diff. Even with just 1) and 2) I think this should still work in the specific case you requested a test on but I'll wait to add that test when I add 3) I think.

test/tools/llvm-objcopy/hello-world.s
3	I'm adding support for program headers to yaml2obj now. Hopefully I can land that change soon.

I've spent a fair amount of time today investigating GNU objcopy's behaviour for nested segments. The quick version is that it can get into a slightly broken state if you start messing about with sections in nested segments (although it does the right thing for the normal use cases).

I used an artificial linker script to create an arbitrary nested segment with four non-empty sections in: 3 progbits sections followed by a nobits section. In the parent segment, I also added additional progbits sections and concluded with a nobits section. If objcopy is used to strip one of the progbits sections in the nested segment, the file image is left unchanged, as expected, but perhaps bizarrely, the memory image of the segment gets messed up - the addresses of the sections are unchanged, but the memory image of the segment has unexpectedly shrunk. I suspect this is a bug. The behaviour of the TLS sections gets even weirder, and there appears to be a lot of special case handling which causes things to go wrong when additional segments get stripped, so I highly recommend never adding custom sections to the TLS segment!

Anyway, that all aside, from my investigations and my understanding, what you are proposing seems good. Preserving the relative file offsets will be sufficient to cover all cases that I'm aware of, regardless of how the linker has chosen to allocate file and memory space in the segment for sections.

The overlapping segment case is weird, and I'm not sure how to construct an ELF in that state. What you are proposing sounds reasonable for this as well. The only thing I would say is consider the case where Segment1 overlaps the start of Segment2, which in turn overlaps the start of Segment3. As long as Segment3 gets moved relative to Segment1 (possibly indirectly by moving relative to Segment2), I think we're fine.

Offsets of sections that cover a segment are no longer based on relative addresses but on relative offsets from the original file. This accounts for the normal case and the strange TLS case that James pointed out and it does so regardless of weather the target system makes this choice or not for TLS.
If a section covers multiple segments then the parent segment is now canonicalized to be the lowest offset segment

jhenderson added inline comments.Jul 13 2017, 2:10 AM

tools/llvm-objcopy/Object.cpp
96–102	From a code readability point, it might be worth either breaking this into a small helper function, or to explain the algorithm in comments somewhere. I'm not asking you to add them at this point, but this is the sort of thing that would be great to have unit-tests around!
194	I've just thought about yet another problem with this: In some cases, the ELF header and/or Program Header table can appear in a segment before the first section. Consequently, we need to preserve the relative offset to the start of the section, rather than the first segment. At the moment, any space before the first section within the segment is ignored and lost. This applies to Segment::finalize() too.
220	typo: remaining
262–272	Nothing is using Shdr here. Is this missing a write, or can it be removed?
tools/llvm-objcopy/Object.h
58	Two sections with identical addresses will be considered identical by the set, which means that only one will be present. I'm pretty sure overlapping sections are bad, but an empty section could easily share the same start address without any worry. I think you need to compare the pointers themselves to prove they are not identical, i.e. the check becomes: if (Lhs->Addr == Rhs->Addr) return Lhs < Rhs; return Lhs->Addr < Rhs->Addr; Optionally, you could test other members as well, e.g. Offset or Size, but ultimately, you're going to need to test that they are not the same section somewhere. Please also add a test case for this (e.g. two identical empty sections at the same address and offset, or one empty and one non-empty section at the same address and offset).
77	`if (!Sections.empty())` maybe?
137–142	The size of some of these fields is dependent on the Elf Type (64 versus 32). I wonder if these should be based on ELFT? On the other hand, I'm quite happy that we just use the largest size for now. At the moment, since the template is only instantiated for Elf64LE, this is fine, but it won't work once 32-bit support is added, unless the point where they are assigned to the Elf Header struct is modified to cast the values accordingly.

Changed firstSection to use !empty() instead of implicitly converting size() to a bool
Segments should now keep track of empty sections. I don't think this would have caused an issue but I agree that it is a problem none the less. It shouldn't ever affect layout but it could affect other analysis.
Added a helper "is covered by" method to clean things up as recommended by James
Fixed a spelling mistake pointed out by James

jakehehrlich marked 5 inline comments as done.Jul 13 2017, 5:53 PM

jakehehrlich added inline comments.

tools/llvm-objcopy/Object.cpp
194	Yea PT_PHDR is a problem I have run into in extensions of this diff. I think it needs some special handling. I think I can support it when I officially add support for copying data of interstitial gaps and nested segments. It will still require special care however. Would you be ok with me not attempting to handle this right now? At the very least this requires handling nested segments because the PT_PHDR segment must be nested in a PT_LOAD segment. The needed information is preserved now because the original offset of the segment is stored until the segment offset needs to be computed and the original offset of the section is now stored and kept. So that gap can be recovered when performing layout of the segment. This seems to be enough to confirm that there isn't a fundamental design issue that prevents the PT_PHDR case from being handled.
262–272	This is actually correct I believe. This does point out a lack of clarity in what is going on here however. As is every section knows where their header belongs so I just pass "Out" to the section and it does the rest. This highlighted code writes the dummy section at the start of the section header table. So there's no missing write or anything. I'll add a comment explaining what's going on here.
tools/llvm-objcopy/Object.h
137–142	This was a recommendation that I got from Rui actually. To quote Rui when speaking about LLD. Integral types such as Elf{32,64}_{XWord,Word,Addr,Offset} are not useful and better to avoid. We are using uint{8,16,32,64}_t instead. It seems it improves readability. (I honestly don't memorize > the real types of these ELFT types.) ELFT::uint, whose size is 32/64 depending on ELF32/64, isn't useful. Always uint64_t to represent a value that can be 32 or 64. The waste of doing this is negligible, but it could drastically simplify > types because if your function uses only ELFT::uint, you can de-template that function by using uint64_t. I've found this to be true as well. It definitely lets you get rid of a lot of templates. If we just used the templates when you had access to them (Like in Object here) and used the largest type when we didn't (like in Segment or SectionBase) we would get this strange inconsistency in our code. I'd like to avoid that.

Thanks for adding empty-section.s. Once you have program header support in yaml2obj, I think it would be good to have the same test for empty sections in program headers, for similar reasons (actually, the way the code was written, I think that test would be the more important one).

tools/llvm-objcopy/Object.cpp
80–81	It's not clear on the surface what is meant by "Covers" (I assume it means within or similar). Also, I think you've got it backwards in the name - it returns true, if the section is in the segment, not the other way around. Also, I'm not sure what the behaviour of empty sections should be here. My instinct is to say that empty sections are "covered" by a segment, if they appear at the start, or in the middle of a segment, but not if it is only at the end. I could imagine a case with two segments with the first having the same end offset as the start of the next one. If there is an empty section at the meeting point, I don't think it makes sense for it to be in both. From a run-time point of view, I don't think it matters, but it could cause weird looking anomalies in static dumping tools, and also leads to the case of, "if the first segment moves to a lower offset, does the section move with it or stay with the other segment?"
194	I'm not sure it's specific to PT_PHDR - it's possible to put these headers in any arbitrary segment, if you want (though I'm not sure why you would in most cases), e.g. directly in the first PT_LOAD segment, with the appropriate linker script. Also, I don't think it's specific to the headers either - the file image is going to get messed up if there was empty space of any kind not covered by a section before the first section. Rather than preserve the relative offsets of sections to the first section, this code should probably preserve the offset relative to the original segment start. Example: \| Segment \| \| gap \| Section1 \| Section2 \| A B The offset of Section1 will be placed at A, if I've followed the code correctly, not at B, where it should be, thus messing up the file and therefore the memory image. Section2 will still be placed immediately after Section1 correctly. As a side point, this will also have an impact on the alignment of the first section, but assuming the input program is valid, I don't think that this will be an issue.
262–272	Ah, I missed the fact that this was a reference rather than a copy. What you are doing here is fine then. A comment might be helpful though, as you say.
tools/llvm-objcopy/Object.h
137–142	Ok, that's fine with me, as long as it was a conscious decision. I could see a case in the future when adding support for things like adding or modifying sections when it may be important to safety check that we haven't overflowed, but that's not relevant at the moment.

jakehehrlich marked 3 inline comments as done.Jul 14 2017, 3:55 PM

jakehehrlich added inline comments.

tools/llvm-objcopy/Object.cpp
80–81	I agree, "covers" is not the right word here. I'll say "within" or "inside of" I suppose. I thought I was using "cover" in the colloquial sense it is used when referring to these things but apparently I did not use it correctly. I thought it was an odd way to refer to things. I'll just refrain from using that term. Oh weather an empty section should belong to a segment I agree and came to the same conclusion by the same reasoning. I plan on never modifying segment size or contents however so I don't think this will cause an issue.
194	The difference between this case and the PHDR case is that it's easy to allocate space for t his gap during layout but it is not easy to retroactively figure out that this segment should be covering the program headers which is space that is not even up for allocation from the perspective of the current algorithm. I'm going to upload a new diff that handles this case but does not handle the PHDR case.

Fixed "sectionCoversSegment" confusion
Segments with gaps at the start are now preserved
Clarified how the SHT_NULL section was being added

One more issue that I've put inline, along with a couple of comment typos, and then there's only the tests to sort out. Good work!

It would be good to get somebody else's eyes on this though, given the scale of the new code.

tools/llvm-objcopy/Object.cpp
92	"belongs" to the second section and -> "belongs" to the second segment and
206	asign -> assign
225	I think this only should happen when the section is not in a segment. Otherwise, the section will move independently of the segment, and thus break our file image preservation. In most cases this is a no-op anyway, because the alignment should already be met, but in the case of slightly dodgy input (misaligned section within a segment), we should probably avoid making things worse by breaking the memory image. If you do this, the comment at the start of the for loop should be updated.

fixed comment typos

Ok, I'm now completely happy with the code as is (barring one more nit in a comment). Just the testing left.

I don't know how you're getting on with segments and yaml2obj, but there is precedent for checking in pre-built binaries in the Lit tests, if it's not possible to create the inputs from raw assembly or similar. That might be a way forward if yaml2obj support is going to take some time.

tools/llvm-objcopy/Object.cpp
227	Nit: I'd use "if this section is in a segment" instead of belonged in, because something might belong in something else, but for whatever reason wasn't put in there.

I clarified the comment that James mentioned

I'm hoping to get the yaml2obj program headers change pushed though this week and I'm hoping BigCheese will give this a look. A lot has changed since he last looked at it. I'll also see if Petr can't give this another look. BigCheese should be back from the ISO meeting so hopefully he can give it a look this week. So maybe, just maybe, I'll land this this week or early next week.

Looks good after you fix the use of lld in hello-world.s

This revision is now accepted and ready to land.Jul 18 2017, 3:29 PM

Fixed test case issue by removing hello-world.s and adding program-headers.test
Removed some unneeded headers

jakehehrlich added a child revision: D34480: [llvm][llvm-objcopy] Added support for outputting to binary in llvm-objcopy.Jul 19 2017, 4:19 PM

Closed by commit rL308559: [LLVM][llvm-objcopy] Added basic plumbing to get things started (authored by phosek). · Explain WhyJul 19 2017, 4:53 PM

This revision was automatically updated to reflect the committed changes.

MSan detects use-of-uninitialized-value in these new tests, please fix: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1855/steps/check-llvm%20msan/logs/stdio

Fixed MSan issue. I was not initlizing all fields of StringNames and the parent segment was sometimes uninitialized. A method "makeSection" was added to handle creation of sections so that it is all done in one place rather than spread out.
Hopefully fixed a fatal warning on a mac build be removing a variable deceleration

Somehow I messed up and uploaded the old hello-world.s test. This fixes that

Is there any way yet in yaml2obj for it to add program headers which cover more than just the stated sections (in file and memory terms)? It might be nice to have a test that simulates a segment with nobits sections in, and/or arbitrary gaps before and after sections.

Otherwise the tests look good to me as they currently stand.

Added MC as a dependency. This resolves the error when building on i686-mingw32-RA-on-linux

Some builds fail when using "make_unique" because "std::make_unique" is sometimes defined. I fixed it to use llvm::make_uinque everywhere

There was an issue on windows that caused a "permission denied" error. I *think* this is being caused by tool_output_file also opening a handle to the file and windows not allowing writing to such a file. I removed the tool_output_file code

Added llvm-objcopy to test/CMakeLists.txt

jakehehrlich updated this revision to Diff 108947.Jul 31 2017, 10:01 AM

Switched the ordering on sections in segments to use OriginalOffset instead of Address. This should be the same in general but also handles non-allocated sections being added to non PT_LOAD segments and is consistent with the rest of the code in the system.
Added llvm-objcopy to lit.cfg. This caused an odd incompatibility on windows but not linux
Added addresses to program-header.test. Without this the tests are invalid and before change 1) this would cause undefined behavior in the ordering of sections in segments because the address was 0x0 for everything. For some reason this error only occurred on windows. It would appear that the allocator on linux increases memory addresses monotonically and this caused the bug to be disguised. So now tests are valid

Revision Contents

Path

Size

test/

tools/

llvm-objcopy/

basic-copy.test

47 lines

hello-world.s

48 lines

tools/

LLVMBuild.txt

1 line

llvm-objcopy/

9 lines

21 lines

148 lines

284 lines

32 lines

99 lines

Diff 104754

test/tools/llvm-objcopy/basic-copy.test

This file was added.

				# RUN: yaml2obj %s > %t
				# RUN: llvm-objcopy %t %t2
				# RUN: llvm-readobj -sections %t2 \| FileCheck %s

				grimarUnsubmitted Done Reply Inline Actions I would add empty line here. grimar: I would add empty line here.
				!ELF
				FileHeader:
				Class: ELFCLASS64
				Data: ELFDATA2LSB
				Type: ET_EXEC
				Machine: EM_X86_64
				Sections:
				- Name: .bss
				Type: SHT_NOBITS
				Flags: [ SHF_ALLOC ]
				AddressAlign: 0x0000000000000010
				Size: 64
				grimarUnsubmitted Done Reply Inline Actions And here. Just to separate execution flow from check in a more obvious way. grimar: And here. Just to separate execution flow from check in a more obvious way.
				- Name: .text
				Type: SHT_PROGBITS
				Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
				AddressAlign: 0x0000000000000010
				Content: "00000000"

				# CHECK: Type: SHT_NULL

				# CHECK: Name: .shstrtab
				# CHECK-NEXT: Type: SHT_STRTAB
				# CHECK-NEXT: Flags [
				# CHECK-NEXT: ]

				# CHECK: Name: .bss
				# CHECK-NEXT: Type: SHT_NOBITS
				# CHECK-NEXT: Flags [
				# CHECK-NEXT: SHF_ALLOC
				# CHECK-NEXT: ]
				# CHECK-NEXT: Address:
				# CHECK-NEXT: Offset: [[OFFSET:0x[0-9A-F]+]]
				# CHECK-NEXT: Size: 64

				# CHECK: Name: .text
				# CHECK-NEXT: Type: SHT_PROGBITS
				# CHECK-NEXT: Flags [
				# CHECK-NEXT: SHF_ALLOC
				# CHECK-NEXT: SHF_EXECINSTR
				# CHECK-NEXT: ]
				# CHECK-NEXT: Address:
				# CHECK-NEXT: Offset: [[OFFSET]]
				# CHECK-NEXT: Size: 4

test/tools/llvm-objcopy/hello-world.s

This file was added.

				# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
				# RUN: ld.lld %t.o -o %t
				# RUN: llvm-objcopy %t %t2
				jhendersonUnsubmitted Not Done Reply Inline Actions I don't think this will work for many people, as LLD is not part of LLVM's main source code body, so people will be unable to run this test. That being said, I'm not sure how to work around this. Can yaml2obj help us here? (I'm not particularly familiar with that tool yet). jhenderson: I don't think this will work for many people, as LLD is not part of LLVM's main source code…
				jakehehrlichAuthorUnsubmitted Not Done Reply Inline Actions Oh man, that didn't even occur to me. As far as I am aware there is no way to produce program headers using yaml2obj. It never produces any program headers by default either. I think you're right and this should be fixed. I have no clue how to fix it other than adding support for program headers to yaml2obj. That's not something I've looked into however. Perhaps there is a way to require lld for this test? That way it's just unsupported for most people? jakehehrlich: Oh man, that didn't even occur to me. As far as I am aware there is no way to produce program…
				jhendersonUnsubmitted Not Done Reply Inline Actions I'm afraid I don't know of a way. There might be, but if there is, somebody else will have to advise. It seems to me that having program header support in yaml2obj is very desirable. I'd offer to add it, but I do not have any time to do so. jhenderson: I'm afraid I don't know of a way. There might be, but if there is, somebody else will have to…
				jakehehrlichAuthorUnsubmitted Not Done Reply Inline Actions I'm adding support for program headers to yaml2obj now. Hopefully I can land that change soon. jakehehrlich: I'm adding support for program headers to yaml2obj now. Hopefully I can land that change soon.
				# RUN: llvm-readobj -program-headers %t2 \| FileCheck %s

				.global _start
				.text
				_start:
				mov $1, %rax
				mov $1, %rdi
				mov $msg, %rsi
				mov $14, %rdx
				syscall

				mov $60, %rax
				mov $0, %rdi
				syscall

				.rodata
				msg:
				.ascii "Hello, World!\n"

				# CHECK: ProgramHeader {
				# CHECK: Type: PT_LOAD
				# CHECK-NEXT: Offset:
				# CHECK-NEXT: VirtualAddress:
				# CHECK-NEXT: PhysicalAddress:
				# CHECK-NEXT: FileSize:
				# CHECK-NEXT: MemSize:
				# CHECK-NEXT: Flags [
				# CHECK-NEXT: PF_R
				# CHECK-NEXT: ]
				# CHECK-NEXT: Alignment: 4096
				# CHECK-NEXT: }

				# CHECK: ProgramHeader {
				# CHECK-NEXT: Type: PT_LOAD
				# CHECK-NEXT: Offset:
				# CHECK-NEXT: VirtualAddress:
				# CHECK-NEXT: PhysicalAddress:
				# CHECK-NEXT: FileSize: 46
				# CHECK-NEXT: MemSize: 46
				# CHECK-NEXT: Flags [
				# CHECK-NEXT: PF_R
				# CHECK-NEXT: PF_X
				# CHECK-NEXT: ]
				# CHECK-NEXT: Alignment:
				# CHECK-NEXT: }

tools/LLVMBuild.txt

	Show All 32 Lines
	llvm-extract			llvm-extract
	llvm-jitlistener			llvm-jitlistener
	llvm-link			llvm-link
	llvm-lto			llvm-lto
	llvm-mc			llvm-mc
	llvm-mcmarkup			llvm-mcmarkup
	llvm-modextract			llvm-modextract
	llvm-nm			llvm-nm
				llvm-objcopy
	llvm-objdump			llvm-objdump
	llvm-pdbdump			llvm-pdbdump
	llvm-profdata			llvm-profdata
	llvm-rtdyld			llvm-rtdyld
	llvm-size			llvm-size
	llvm-split			llvm-split
	opt			opt
	verify-uselistorder			verify-uselistorder

	[component_0]			[component_0]
	type = Group			type = Group
	name = Tools			name = Tools
	parent = $ROOT			parent = $ROOT

tools/llvm-objcopy/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS
				Core
				Object
				Support
				)
				add_llvm_tool(llvm-objcopy
				llvm-objcopy.cpp
				Object.cpp
				)

tools/llvm-objcopy/LLVMBuild.txt

This file was added.

				;===- ./tools/llvm-objcopy/LLVMBuild.txt ------------------------ Conf ---===;
				;
				; The LLVM Compiler Infrastructure
				;
				; This file is distributed under the University of Illinois Open Source
				; License. See LICENSE.TXT for details.
				;
				;===------------------------------------------------------------------------===;
				;
				; This is an LLVMBuild description file for the components in this subdirectory.
				;
				; For more information on the LLVMBuild system, please see:
				;
				; http://llvm.org/docs/LLVMBuild.html
				;
				;===------------------------------------------------------------------------===;
				[component_0]
				type = Tool
				name = llvm-objcopy
				parent = Tools
				required_libraries = Object

tools/llvm-objcopy/Object.h

This file was added.

				//===- Object.h -------------------------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				BigcheeseUnsubmitted Done Reply Inline Actions `The LLVM Compiler Infrastructure` Bigcheese: `The LLVM Compiler Infrastructure`
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_OBJCOPY_OBJECT_H
				#define LLVM_OBJCOPY_OBJECT_H

				#include "llvm/ADT/iterator.h"
				#include "llvm/ADT/iterator_range.h"
				#include "llvm/MC/StringTableBuilder.h"
				#include "llvm/Object/Binary.h"
				#include "llvm/Object/ELFObjectFile.h"
				#include "llvm/Object/ObjectFile.h"
				#include "llvm/Support/ELF.h"
				#include "llvm/Support/FileOutputBuffer.h"
				#include "llvm/Support/TargetSelect.h"

				#include <memory>
				#include <set>

				class Segment;

				class SectionBase {
				public:
				llvm::StringRef Name;
				Segment *ParentSegment;
				uint64_t HeaderOffset;
				jhendersonUnsubmitted Done Reply Inline Actions ParentSegment jhenderson: ParentSegment
				uint32_t Index;

				uint64_t Addr = 0;
				uint64_t Align = 1;
				uint32_t EntrySize = 0;
				uint64_t Flags = 0;
				uint64_t Info = 0;
				uint64_t Link = llvm::ELF::SHN_UNDEF;
				uint64_t NameIndex = 0;
				uint64_t Offset = 0;
				uint64_t Size = 0;
				uint64_t Type = llvm::ELF::SHT_NULL;
				grimarUnsubmitted Done Reply Inline Actions Sort by Name may be ? And looks you can do initialization inline instead of doing that in constructor below. grimar: Sort by Name may be ? And looks you can do initialization inline instead of doing that in…

				virtual ~SectionBase() {}
				virtual void finalize();
				template <class ELFT> void writeHeader(llvm::FileOutputBuffer &Out) const;
				virtual void writeSection(llvm::FileOutputBuffer &Out) const = 0;
				};

				class Segment {
				private:
				struct SectionCompare {
				bool operator()(const SectionBase Lhs, const SectionBase Rhs) const {
				return Lhs->Addr < Rhs->Addr;
				}
				};
				jhendersonUnsubmitted Done Reply Inline Actions Two sections with identical addresses will be considered identical by the set, which means that only one will be present. I'm pretty sure overlapping sections are bad, but an empty section could easily share the same start address without any worry. I think you need to compare the pointers themselves to prove they are not identical, i.e. the check becomes: if (Lhs->Addr == Rhs->Addr) return Lhs < Rhs; return Lhs->Addr < Rhs->Addr; Optionally, you could test other members as well, e.g. Offset or Size, but ultimately, you're going to need to test that they are not the same section somewhere. Please also add a test case for this (e.g. two identical empty sections at the same address and offset, or one empty and one non-empty section at the same address and offset). jhenderson: Two sections with identical addresses will be considered identical by the set, which means that…

				std::set<const SectionBase *, SectionCompare> Sections;

				public:
				uint64_t Align;
				uint64_t FileSize;
				uint32_t Flags;
				uint32_t Index;
				uint64_t MemSize;
				uint64_t Offset;
				uint64_t PAddr;
				uint64_t Type;
				uint64_t VAddr;

				void finalize();
				const SectionBase *firstSection() const {
				if (Sections.size())
				return *Sections.begin();
				return nullptr;
				jhendersonUnsubmitted Done Reply Inline Actions `if (!Sections.empty())` maybe? jhenderson: `if (!Sections.empty())` maybe?
				}
				void addSection(const SectionBase *sec) { Sections.insert(sec); }
				template <class ELFT> void writeHeader(llvm::FileOutputBuffer &Out) const;
				};

				class Section : public SectionBase {
				private:
				llvm::ArrayRef<uint8_t> Contents;

				public:
				Section(llvm::ArrayRef<uint8_t> Data) : Contents(Data) {}
				void writeSection(llvm::FileOutputBuffer &Out) const override;
				};

				BigcheeseUnsubmitted Done Reply Inline Actions This should probably use llvm/MC/StringTableBuilder.h Bigcheese: This should probably use llvm/MC/StringTableBuilder.h
				// This is just a wraper around a StringTableBuilder that implements SectionBase
				class StringTableSection : public SectionBase {
				private:
				llvm::StringTableBuilder StrTabBuilder;
				grimarUnsubmitted Done Reply Inline Actions Consider using llvm::StringMap may be ? grimar: Consider using llvm::StringMap may be ?

				public:
				StringTableSection() : StrTabBuilder(llvm::StringTableBuilder::ELF) {
				Type = llvm::ELF::SHT_STRTAB;
				}

				BigcheeseUnsubmitted Done Reply Inline Actions dead code Bigcheese: dead code
				void addString(llvm::StringRef Name);
				uint32_t findIndex(llvm::StringRef Name) const;
				void finalize() override;
				void writeSection(llvm::FileOutputBuffer &Out) const override;
				grimarUnsubmitted Done Reply Inline Actions I think you can inline initialization, grimar: I think you can inline initialization,
				static bool classof(const SectionBase *S) {
				return S->Type == llvm::ELF::SHT_STRTAB;
				BigcheeseUnsubmitted Done Reply Inline Actions dead code Bigcheese: dead code
				}
				};

				template <class ELFT> class Object {
				private:
				typedef std::unique_ptr<SectionBase> SecPtr;
				typedef std::unique_ptr<Segment> SegPtr;

				typedef typename ELFT::Shdr Elf_Shdr;
				typedef typename ELFT::Ehdr Elf_Ehdr;
				typedef typename ELFT::Phdr Elf_Phdr;

				StringTableSection *SectionNames;
				std::vector<SecPtr> Sections;
				std::vector<SegPtr> Segments;

				void sortSections();
				void assignOffsets();
				void readProgramHeaders(const llvm::object::ELFFile<ELFT> &ElfFile);
				void readSectionHeaders(const llvm::object::ELFFile<ELFT> &ElfFile);
				void writeHeader(llvm::FileOutputBuffer &Out) const;
				void writeProgramHeaders(llvm::FileOutputBuffer &Out) const;
				void writeSectionData(llvm::FileOutputBuffer &Out) const;
				void writeSectionHeaders(llvm::FileOutputBuffer &Out) const;

				public:
				grimarUnsubmitted Done Reply Inline Actions Not sure how much is in common not to include argument names in definitions. (in LLD where I spend most of time I am working on, we always do that, but may be it is ok, just noticable thing for me). grimar: Not sure how much is in common not to include argument names in definitions. (in LLD where I…
				uint8_t Ident[16];
				uint64_t Entry;
				uint64_t SHOffset;
				uint32_t Type;
				uint32_t Machine;
				uint32_t Version;
				uint32_t Flags;

				Object(const llvm::object::ELFObjectFile<ELFT> &Obj);
				jhendersonUnsubmitted Done Reply Inline Actions The size of some of these fields is dependent on the Elf Type (64 versus 32). I wonder if these should be based on ELFT? On the other hand, I'm quite happy that we just use the largest size for now. At the moment, since the template is only instantiated for Elf64LE, this is fine, but it won't work once 32-bit support is added, unless the point where they are assigned to the Elf Header struct is modified to cast the values accordingly. jhenderson: The size of some of these fields is dependent on the Elf Type (64 versus 32). I wonder if these…
				jakehehrlichAuthorUnsubmitted Done Reply Inline Actions This was a recommendation that I got from Rui actually. To quote Rui when speaking about LLD. Integral types such as Elf{32,64}_{XWord,Word,Addr,Offset} are not useful and better to avoid. We are using uint{8,16,32,64}_t instead. It seems it improves readability. (I honestly don't memorize > the real types of these ELFT types.) ELFT::uint, whose size is 32/64 depending on ELF32/64, isn't useful. Always uint64_t to represent a value that can be 32 or 64. The waste of doing this is negligible, but it could drastically simplify > types because if your function uses only ELFT::uint, you can de-template that function by using uint64_t. I've found this to be true as well. It definitely lets you get rid of a lot of templates. If we just used the templates when you had access to them (Like in Object here) and used the largest type when we didn't (like in Segment or SectionBase) we would get this strange inconsistency in our code. I'd like to avoid that. jakehehrlich: This was a recommendation that I got from Rui actually. To quote Rui when speaking about LLD.
				jhendersonUnsubmitted Done Reply Inline Actions Ok, that's fine with me, as long as it was a conscious decision. I could see a case in the future when adding support for things like adding or modifying sections when it may be important to safety check that we haven't overflowed, but that's not relevant at the moment. jhenderson: Ok, that's fine with me, as long as it was a conscious decision. I could see a case in the…
				size_t totalSize() const;
				void finalize();
				void write(llvm::FileOutputBuffer &Out);
				};

				#endif

tools/llvm-objcopy/Object.cpp

This file was added.

				//===- Object.cpp ------------------------------------------------ C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				#include "Object.h"
				#include "llvm-objcopy.h"

				using namespace llvm;
				using namespace object;
				using namespace ELF;

				template <class ELFT> void Segment::writeHeader(FileOutputBuffer &Out) const {
				typedef typename ELFT::Ehdr Elf_Ehdr;
				typedef typename ELFT::Phdr Elf_Phdr;

				jhendersonUnsubmitted Done Reply Inline Actions Could we typedef these as well? It will read more easily, I think. jhenderson: Could we typedef these as well? It will read more easily, I think.
				uint8_t *Buf = Out.getBufferStart();
				Buf += sizeof(Elf_Ehdr) + Index * sizeof(Elf_Phdr);
				Elf_Phdr &Phdr = reinterpret_cast<Elf_Phdr >(Buf);
				Phdr.p_type = Type;
				Phdr.p_flags = Flags;
				Phdr.p_offset = Offset;
				jhendersonUnsubmitted Done Reply Inline Actions Any particular reason this TODO has been left undone? jhenderson: Any particular reason this TODO has been left undone?
				Phdr.p_vaddr = VAddr;
				Phdr.p_paddr = PAddr;
				Phdr.p_filesz = FileSize;
				Phdr.p_memsz = MemSize;
				Phdr.p_align = Align;
				}

				void Segment::finalize() {
				if (firstSection())
				Offset = firstSection()->Offset;
				}

				grimarUnsubmitted Done Reply Inline Actions I would inline CompOffset, it is trivial operation that does not probably require addition comparator. grimar: I would inline CompOffset, it is trivial operation that does not probably require addition…
				void SectionBase::finalize() {}

				template <class ELFT>
				void SectionBase::writeHeader(FileOutputBuffer &Out) const {
				uint8_t *Buf = Out.getBufferStart();
				Buf += HeaderOffset;
				typename ELFT::Shdr &Shdr = reinterpret_cast<typename ELFT::Shdr >(Buf);
				Shdr.sh_name = NameIndex;
				Shdr.sh_type = Type;
				Shdr.sh_flags = Flags;
				Shdr.sh_addr = Addr;
				Shdr.sh_offset = Offset;
				Shdr.sh_size = Size;
				Shdr.sh_link = Link;
				Shdr.sh_info = Info;
				Shdr.sh_addralign = Align;
				Shdr.sh_entsize = EntrySize;
				}

				void Section::writeSection(FileOutputBuffer &Out) const {
				if (Type == SHT_NOBITS)
				return;
				uint8_t *Buf = Out.getBufferStart() + Offset;
				std::copy(std::begin(Contents), std::end(Contents), Buf);
				}

				void StringTableSection::addString(StringRef Name) {
				StrTabBuilder.add(Name);
				Size = StrTabBuilder.getSize();
				}

				uint32_t StringTableSection::findIndex(StringRef Name) const {
				return StrTabBuilder.getOffset(Name);
				}

				void StringTableSection::finalize() { StrTabBuilder.finalize(); }

				void StringTableSection::writeSection(FileOutputBuffer &Out) const {
				StrTabBuilder.write(Out.getBufferStart() + Offset);
				}
				grimarUnsubmitted Done Reply Inline Actions Count grimar: Count

				template <class ELFT>
				void Object<ELFT>::readProgramHeaders(const ELFFile<ELFT> &ElfFile) {
				uint32_t Index = 0;
				jhendersonUnsubmitted Not Done Reply Inline Actions It's not clear on the surface what is meant by "Covers" (I assume it means within or similar). Also, I think you've got it backwards in the name - it returns true, if the section is in the segment, not the other way around. Also, I'm not sure what the behaviour of empty sections should be here. My instinct is to say that empty sections are "covered" by a segment, if they appear at the start, or in the middle of a segment, but not if it is only at the end. I could imagine a case with two segments with the first having the same end offset as the start of the next one. If there is an empty section at the meeting point, I don't think it makes sense for it to be in both. From a run-time point of view, I don't think it matters, but it could cause weird looking anomalies in static dumping tools, and also leads to the case of, "if the first segment moves to a lower offset, does the section move with it or stay with the other segment?" jhenderson: It's not clear on the surface what is meant by "Covers" (I assume it means within or similar).
				jakehehrlichAuthorUnsubmitted Not Done Reply Inline Actions I agree, "covers" is not the right word here. I'll say "within" or "inside of" I suppose. I thought I was using "cover" in the colloquial sense it is used when referring to these things but apparently I did not use it correctly. I thought it was an odd way to refer to things. I'll just refrain from using that term. Oh weather an empty section should belong to a segment I agree and came to the same conclusion by the same reasoning. I plan on never modifying segment size or contents however so I don't think this will cause an issue. jakehehrlich: I agree, "covers" is not the right word here. I'll say "within" or "inside of" I suppose. I…
				for (const auto &Phdr : unwrapOrError(ElfFile.program_headers())) {
				Segments.emplace_back(make_unique<Segment>());
				Segment &Seg = *Segments.back();
				Seg.Type = Phdr.p_type;
				Seg.Flags = Phdr.p_flags;
				Seg.Offset = Phdr.p_offset;
				Seg.VAddr = Phdr.p_vaddr;
				Seg.PAddr = Phdr.p_paddr;
				Seg.FileSize = Phdr.p_filesz;
				Seg.MemSize = Phdr.p_memsz;
				Seg.Align = Phdr.p_align;
				jhendersonUnsubmitted Not Done Reply Inline Actions "belongs" to the second section and -> "belongs" to the second segment and jhenderson: "belongs" to the second section and -> "belongs" to the second segment and
				Seg.Index = Index++;
				for (auto &Section : Sections) {
				if (Seg.Offset <= Section->Offset &&
				Seg.Offset + Seg.FileSize >= Section->Offset + Section->Size) {
				Seg.addSection(&*Section);
				Section->ParentSegment = &Seg;
				}
				}
				}
				}
				jhendersonUnsubmitted Done Reply Inline Actions From a code readability point, it might be worth either breaking this into a small helper function, or to explain the algorithm in comments somewhere. I'm not asking you to add them at this point, but this is the sort of thing that would be great to have unit-tests around! jhenderson: From a code readability point, it might be worth either breaking this into a small helper…

				template <class ELFT>
				void Object<ELFT>::readSectionHeaders(const ELFFile<ELFT> &ElfFile) {
				uint32_t Index = 0;
				for (const auto &Shdr : unwrapOrError(ElfFile.sections())) {
				if (Index == 0) {
				jhendersonUnsubmitted Done Reply Inline Actions This should probably be SHT_STRTAB \|\| SHT_NULL. jhenderson: This should probably be SHT_STRTAB \|\| SHT_NULL.
				++Index;
				continue;
				}
				if (Shdr.sh_type == SHT_STRTAB)
				continue;
				ArrayRef<uint8_t> Data;
				if (Shdr.sh_type != SHT_NOBITS)
				Data = unwrapOrError(ElfFile.getSectionContents(&Shdr));
				SecPtr Sec = make_unique<Section>(Data);
				Sec->Name = unwrapOrError(ElfFile.getSectionName(&Shdr));
				Sec->Type = Shdr.sh_type;
				Sec->Flags = Shdr.sh_flags;
				Sec->Addr = Shdr.sh_addr;
				Sec->Offset = Shdr.sh_offset;
				grimarUnsubmitted Done Reply Inline Actions Seg.Index = Index++; grimar: Seg.Index = Index++;
				Sec->Size = Shdr.sh_size;
				Sec->Link = Shdr.sh_link;
				Sec->Info = Shdr.sh_info;
				Sec->Align = Shdr.sh_addralign;
				Sec->EntrySize = Shdr.sh_entsize;
				Sec->Index = Index;
				Index++;
				jhendersonUnsubmitted Done Reply Inline Actions Sec->Index = Index++? jhenderson: Sec->Index = Index++?
				SectionNames->addString(Sec->Name);
				Sections.push_back(std::move(Sec));
				}
				}

				template <class ELFT> size_t Object<ELFT>::totalSize() const {
				// We already have the section header offset so we can calculate the total
				// size by just adding up the size of each section header.
				return SHOffset + Sections.size() * sizeof(Elf_Shdr) + sizeof(Elf_Shdr);
				}

				template <class ELFT> Object<ELFT>::Object(const ELFObjectFile<ELFT> &Obj) {
				const auto &ElfFile = *Obj.getELFFile();
				const auto &Ehdr = *ElfFile.getHeader();

				std::copy(Ehdr.e_ident, Ehdr.e_ident + 16, Ident);
				Type = Ehdr.e_type;
				Machine = Ehdr.e_machine;
				Version = Ehdr.e_version;
				Entry = Ehdr.e_entry;
				Flags = Ehdr.e_flags;

				Sections.push_back(make_unique<StringTableSection>());
				SectionNames = dyn_cast<StringTableSection>(Sections.back().get());
				SectionNames->Name = ".shstrtab";
				SectionNames->addString(SectionNames->Name);

				readSectionHeaders(ElfFile);
				readProgramHeaders(ElfFile);
				}

				template <class ELFT> void Object<ELFT>::sortSections() {
				// Put allocated sections in address order. Maintain ordering as closely as
				jhendersonUnsubmitted Done Reply Inline Actions address order -> offset order jhenderson: address order -> offset order
				// possible while meeting that demand however.
				auto CompareSections = [](const SecPtr &A, const SecPtr &B) {
				return A->Offset < B->Offset;
				};
				jhendersonUnsubmitted Done Reply Inline Actions I personally would think that using offset to determine order may be simpler, as it removes the need to test for SHF_ALLOC. Also, sections don't have to appear in the file in the order that they appear in the section header table, so I'd only use Index as a fall-back if you get two sections with the same offset. jhenderson: I personally would think that using offset to determine order may be simpler, as it removes the…
				jakehehrlichAuthorUnsubmitted Done Reply Inline Actions This is a really good idea. Thank you! jakehehrlich: This is a really good idea. Thank you!
				std::stable_sort(std::begin(Sections), std::end(Sections), CompareSections);
				}

				template <class ELFT> void Object<ELFT>::assignOffsets() {
				jhendersonUnsubmitted Done Reply Inline Actions "make ensure" -> "make sure" or just "ensure" jhenderson: "make ensure" -> "make sure" or just "ensure"
				// Decide file offsets and indexes.
				size_t PhdrSize = Segments.size() * sizeof(Elf_Phdr);
				BigcheeseUnsubmitted Done Reply Inline Actions Memory leak. Use unique_ptr. Bigcheese: Memory leak. Use unique_ptr.
				jakehehrlichAuthorUnsubmitted Done Reply Inline Actions It get's emplaced into the Sections vector first thing after allocation and that unique pointer manages the memory. Would be more obvious that a leak isn't occurring I first add a new unique pointer to Sections and then assign SectionNames via a cast from there? jakehehrlich: It get's emplaced into the Sections vector first thing after allocation and that unique pointer…
				BigcheeseUnsubmitted Done Reply Inline Actions Ah, I see now. It would probably be best to do: Sections.push_back(make_unique<StringTableSection>()); SectionNames = Sections.back().get(); ... Bigcheese: Ah, I see now. It would probably be best to do: ``` Sections.push_back…
				jhendersonUnsubmitted Done Reply Inline Actions Typo -> "Decide file offsets and indexes" jhenderson: Typo -> "Decide file offsets and indexes"
				// We can put section data after the ELF header and the program headers.
				uint64_t Offset = sizeof(Elf_Ehdr) + PhdrSize;
				jhendersonUnsubmitted Done Reply Inline Actions For clarity, I'd rephrase this comment as follows: "We can put section data after the ELF header and the program headers." jhenderson: For clarity, I'd rephrase this comment as follows: "We can put section data after the ELF…
				uint64_t Index = 1;
				for (auto &Section : Sections) {
				// The segment can have a different alignment than the section. We need to
				// ensure that both alignments are respected.
				jhendersonUnsubmitted Done Reply Inline Actions We need to make sure of what? Perhaps "We need to check both are respected" or something to that effect. jhenderson: We need to make sure of what? Perhaps "We need to check both are respected" or something to…
				if (Section->ParentSegment) {
				auto FirstInSeg = Section->ParentSegment->firstSection();
				if (FirstInSeg == Section.get()) {
				Offset = alignTo(Offset, Section->ParentSegment->Align);
				Section->Offset = Offset;
				}
				// We should respect interstitial gaps of allocated sections
				Offset = FirstInSeg->Offset + Section->Addr - FirstInSeg->Addr;
				jhendersonUnsubmitted Done Reply Inline Actions Could you explain why we are using a mix of offsets and addresses here, as it is surprising to me. jhenderson: Could you explain why we are using a mix of offsets and addresses here, as it is surprising to…
				jakehehrlichAuthorUnsubmitted Done Reply Inline Actions The first an main point is that the Offsets change in this loop and what we want to respect is the difference between the section offsets in the original file. FirstInSeg->Offset no longer respects that information. This is changeable however by adding fields that keep track of the original file offset. This might make sense to use elsewhere as well like in the sorting algorithm where we're trying to respect the original offset as much as possible. This would free up Offset to be messed with (in what way isn't clear) before sorting occurs. So this is changeable but only by adding another field. Perhaps that is not a bad idea however. This said when I was thinking about this I never actually even thought to use Offset. I was thinking about respecting the memory image not the file image. I think as a guiding principal "respect the memory image" is a a better guiding principal but only slightly. The memory image and file image should agree for all valid files however so I'm not really opposed to changing it. I have a slight opposition but it's based on some ideas I have for changing the file layout algorithm and I don't want to justify code here based on code that hasn't been written yet. The general idea is that it is possible to enforce "respect the memory image" via const correctness/encapsulation but it is not possible to do the same for the file image because Offset will need to be changed. My personal opinion is that adding a new Offset like field that isn't quite Offset and using that field to compute the new offset here isn't really any clearer that what I have here. What do you think? jakehehrlich: The first an main point is that the Offsets change in this loop and what we want to respect is…
				jhendersonUnsubmitted Done Reply Inline Actions I think I follow what you're saying and broadly agree with it. We have to respect the memory image, at least for linked objects, because if we change the relative positions of two sections within a segment, then the address mapping will no longer be correct. For relocatable objects, that's not an issue, because they don't have a memory mapping yet (and segments can be ignored in this case). However, the file and memory image are not always consistent, even for valid programs. Specifically, SHT_NOBITS sections have no file image typically, but do have a memory image. Usually, this is not an issue, because the .bss section is placed at the end of the data segment. However, when using TLS (which is sometimes a nested segment in the data segment) this is not the always the case due to .tbss, because other non-TLS data usually appears after it. From my observations, different platforms exhibit different behaviour here, but at least some of them have no file image assigned for the .tbss section, but do have a memory image associated with it. As such, the layout algorithm needs to be able to handle this case. That all being said, looking at the code without actually trying it out, I'm not convinced that this will handle the nested-segment case anyway, in which case what you've got here is fine. However, I would suggest updating the comment to explain why we need to respect interstitial memory gaps. jhenderson: I think I follow what you're saying and broadly agree with it. We have to respect the memory…
				jhendersonUnsubmitted Done Reply Inline Actions I'd suggest changing the second and third sentence to something like: "We must maintain the memory image so that addresses are preserved. As, with the exception of SHT_NOBITS sections at the end of segments, the memory image is a copy of the file image, we preserve the file image as well." It explains why the file and memory image need to match, which may not be obvious to those who are unfamiliar with ELF. jhenderson: I'd suggest changing the second and third sentence to something like: "We must maintain the…
				}
				if (Section->Align)
				Offset = alignTo(Offset, Section->Align);
				Section->Offset = Offset;
				grimarUnsubmitted Done Reply Inline Actions A, B grimar: A, B
				jhendersonUnsubmitted Done Reply Inline Actions Section->Index = Index++; and then drop the later increment. jhenderson: Section->Index = Index++; and then drop the later increment.
				Section->Index = Index++;
				if (Section->Type != SHT_NOBITS)
				Offset += Section->Size;
				}
				BigcheeseUnsubmitted Done Reply Inline Actions Use alignTo from llvm/Support/MathExtras.h Bigcheese: Use alignTo from llvm/Support/MathExtras.h
				jhendersonUnsubmitted Not Done Reply Inline Actions I've just thought about yet another problem with this: In some cases, the ELF header and/or Program Header table can appear in a segment before the first section. Consequently, we need to preserve the relative offset to the start of the section, rather than the first segment. At the moment, any space before the first section within the segment is ignored and lost. This applies to Segment::finalize() too. jhenderson: I've just thought about yet another problem with this: In some cases, the ELF header and/or…
				jakehehrlichAuthorUnsubmitted Not Done Reply Inline Actions Yea PT_PHDR is a problem I have run into in extensions of this diff. I think it needs some special handling. I think I can support it when I officially add support for copying data of interstitial gaps and nested segments. It will still require special care however. Would you be ok with me not attempting to handle this right now? At the very least this requires handling nested segments because the PT_PHDR segment must be nested in a PT_LOAD segment. The needed information is preserved now because the original offset of the segment is stored until the segment offset needs to be computed and the original offset of the section is now stored and kept. So that gap can be recovered when performing layout of the segment. This seems to be enough to confirm that there isn't a fundamental design issue that prevents the PT_PHDR case from being handled. jakehehrlich: Yea PT_PHDR is a problem I have run into in extensions of this diff. I think it needs some…
				jhendersonUnsubmitted Not Done Reply Inline Actions I'm not sure it's specific to PT_PHDR - it's possible to put these headers in any arbitrary segment, if you want (though I'm not sure why you would in most cases), e.g. directly in the first PT_LOAD segment, with the appropriate linker script. Also, I don't think it's specific to the headers either - the file image is going to get messed up if there was empty space of any kind not covered by a section before the first section. Rather than preserve the relative offsets of sections to the first section, this code should probably preserve the offset relative to the original segment start. Example: \| Segment \| \| gap \| Section1 \| Section2 \| A B The offset of Section1 will be placed at A, if I've followed the code correctly, not at B, where it should be, thus messing up the file and therefore the memory image. Section2 will still be placed immediately after Section1 correctly. As a side point, this will also have an impact on the alignment of the first section, but assuming the input program is valid, I don't think that this will be an issue. jhenderson: I'm not sure it's specific to PT_PHDR - it's possible to put these headers in any arbitrary…
				jakehehrlichAuthorUnsubmitted Not Done Reply Inline Actions The difference between this case and the PHDR case is that it's easy to allocate space for t his gap during layout but it is not easy to retroactively figure out that this segment should be covering the program headers which is space that is not even up for allocation from the perspective of the current algorithm. I'm going to upload a new diff that handles this case but does not handle the PHDR case. jakehehrlich: The difference between this case and the PHDR case is that it's easy to allocate space for t…
				// 'offset' should now be just after all the section data so we should set the
				// section header table offset to be exactly here. This spot might not be
				// aligned properlly however so we should align it as needed. This only takes
				jhendersonUnsubmitted Done Reply Inline Actions properlly -> properly jhenderson: properlly -> properly
				// a little bit of tweaking to ensure that the sh_name is 4 byte aligned.
				Offset = alignTo(Offset, sizeof(typename ELFT::Word));
				grimarUnsubmitted Done Reply Inline Actions LLVM code style says variable names should be uppercase. Value, Multiple, so on. grimar: LLVM code style says variable names should be uppercase. Value, Multiple, so on.
				jhendersonUnsubmitted Done Reply Inline Actions Why only the sh_name field? 4-byte alignment could cause other struct members to be misaligned (e.g. the sh_addr and sh_offset fields), because they are 8-bytes in size. jhenderson: Why only the sh_name field? 4-byte alignment could cause other struct members to be misaligned…
				jakehehrlichAuthorUnsubmitted Done Reply Inline Actions The next field (sh_type) is also a 4-byte word and thus anything after those two will be 8-byte aligned. So this is the minimal alignment I can choose which ensures that all the fields of the section header are aligned. I updated the comment to reflect this fact. jakehehrlich: The next field (sh_type) is also a 4-byte word and thus anything after those two will be 8-byte…
				jhendersonUnsubmitted Done Reply Inline Actions Relative to the start of the section header, the fields will be aligned correctly. However, relative to the start of the file, they won't be: for example, in 64-bit ELF, with 8 byte address fields, the sh_name field was aligned to offset 4, the sh_type field will be at offset 8, and the 8-byte-sized sh_flags field will be at offset 12, which is misaligned. jhenderson: Relative to the start of the section header, the fields will be aligned correctly. However…
				jakehehrlichAuthorUnsubmitted Done Reply Inline Actions Yep. You're right. I did my math wrong. Not sure what I was thinking there. Thanks for catching that! It should be fixed now. jakehehrlich: Yep. You're right. I did my math wrong. Not sure what I was thinking there. Thanks for catching…
				SHOffset = Offset;
				grimarUnsubmitted Done Reply Inline Actions Is it clang-formatted ? grimar: Is it clang-formatted ?
				}

				template <class ELFT> void Object<ELFT>::finalize() {
				sortSections();
				assignOffsets();

				jhendersonUnsubmitted Not Done Reply Inline Actions asign -> assign jhenderson: asign -> assign
				// Finalize SectionNames first so that we can assign name indexes.
				SectionNames->finalize();
				// Finally now that all offsets and indexes have been set we can finalize any
				// reamining issues.
				uint64_t Offset = SHOffset + sizeof(Elf_Shdr);
				for (auto &Section : Sections) {
				Section->HeaderOffset = Offset;
				Offset += sizeof(Elf_Shdr);
				Section->NameIndex = SectionNames->findIndex(Section->Name);
				Section->finalize();
				}

				for (auto &Segment : Segments)
				Segment->finalize();
				jhendersonUnsubmitted Done Reply Inline Actions typo: remaining jhenderson: typo: remaining
				}

				template <class ELFT>
				void Object<ELFT>::writeHeader(FileOutputBuffer &Out) const {
				uint8_t *Buf = Out.getBufferStart();
				BigcheeseUnsubmitted Done Reply Inline Actions Replace with alignTo? Bigcheese: Replace with alignTo?
				jhendersonUnsubmitted Not Done Reply Inline Actions I think this only should happen when the section is not in a segment. Otherwise, the section will move independently of the segment, and thus break our file image preservation. In most cases this is a no-op anyway, because the alignment should already be met, but in the case of slightly dodgy input (misaligned section within a segment), we should probably avoid making things worse by breaking the memory image. If you do this, the comment at the start of the for loop should be updated. jhenderson: I think this only should happen when the section is not in a segment. Otherwise, the section…
				Elf_Ehdr &Ehdr = reinterpret_cast<Elf_Ehdr >(Buf);
				jhendersonUnsubmitted Done Reply Inline Actions Use typedef. jhenderson: Use typedef.
				std::copy(Ident, Ident + 16, Ehdr.e_ident);
				jhendersonUnsubmitted Not Done Reply Inline Actions Nit: I'd use "if this section is in a segment" instead of belonged in, because something might belong in something else, but for whatever reason wasn't put in there. jhenderson: Nit: I'd use "if this section is in a segment" instead of belonged in, because something might…
				Ehdr.e_type = Type;
				Ehdr.e_machine = Machine;
				Ehdr.e_version = Version;
				Ehdr.e_entry = Entry;
				Ehdr.e_phoff = sizeof(Elf_Ehdr);
				Ehdr.e_shoff = SHOffset;
				Ehdr.e_flags = Flags;
				Ehdr.e_ehsize = sizeof(Elf_Ehdr);
				Ehdr.e_phentsize = sizeof(Elf_Phdr);
				Ehdr.e_phnum = Segments.size();
				Ehdr.e_shentsize = sizeof(Elf_Shdr);
				Ehdr.e_shnum = Sections.size();
				Ehdr.e_shstrndx = SectionNames->Index;
				}

				template <class ELFT>
				void Object<ELFT>::writeProgramHeaders(FileOutputBuffer &Out) const {
				for (auto &Phdr : Segments)
				Phdr->template writeHeader<ELFT>(Out);
				}

				template <class ELFT>
				void Object<ELFT>::writeSectionHeaders(FileOutputBuffer &Out) const {
				uint8_t *Buf = Out.getBufferStart() + SHOffset;
				Elf_Shdr &Shdr = reinterpret_cast<Elf_Shdr >(Buf);
				Shdr.sh_name = 0;
				Shdr.sh_type = SHT_NULL;
				Shdr.sh_flags = 0;
				Shdr.sh_addr = 0;
				Shdr.sh_offset = 0;
				Shdr.sh_size = 0;
				Shdr.sh_link = 0;
				Shdr.sh_info = 0;
				Shdr.sh_addralign = 0;
				Shdr.sh_entsize = 0;

				for (auto &Section : Sections)
				Section->template writeHeader<ELFT>(Out);
				}

				template <class ELFT>
				void Object<ELFT>::writeSectionData(FileOutputBuffer &Out) const {
				for (auto &Section : Sections)
				grimarUnsubmitted Done Reply Inline Actions Consider using typedefs in header (example from LLD code) template <class ELFT> class Writer { public: typedef typename ELFT::Shdr Elf_Shdr; typedef typename ELFT::Ehdr Elf_Ehdr; typedef typename ELFT::Phdr Elf_Phdr; grimar: Consider using typedefs in header (example from LLD code) ``` template <class ELFT> class…
				Section->writeSection(Out);
				}
				jhendersonUnsubmitted Not Done Reply Inline Actions Nothing is using Shdr here. Is this missing a write, or can it be removed? jhenderson: Nothing is using Shdr here. Is this missing a write, or can it be removed?
				jakehehrlichAuthorUnsubmitted Not Done Reply Inline Actions This is actually correct I believe. This does point out a lack of clarity in what is going on here however. As is every section knows where their header belongs so I just pass "Out" to the section and it does the rest. This highlighted code writes the dummy section at the start of the section header table. So there's no missing write or anything. I'll add a comment explaining what's going on here. jakehehrlich: This is actually correct I believe. This does point out a lack of clarity in what is going on…
				jhendersonUnsubmitted Not Done Reply Inline Actions Ah, I missed the fact that this was a reference rather than a copy. What you are doing here is fine then. A comment might be helpful though, as you say. jhenderson: Ah, I missed the fact that this was a reference rather than a copy. What you are doing here is…

				template <class ELFT> void Object<ELFT>::write(FileOutputBuffer &Out) {
				writeHeader(Out);
				writeProgramHeaders(Out);
				writeSectionData(Out);
				writeSectionHeaders(Out);
				}

				template class Object<ELF64LE>;
				template class Object<ELF64BE>;
				template class Object<ELF32LE>;
				template class Object<ELF32BE>;

tools/llvm-objcopy/llvm-objcopy.h

This file was added.

				//===- llvm-objcopy.h -------------------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				#ifndef LLVM_OBJCOPY_H
				#define LLVM_OBJCOPY_H

				#include "llvm/ADT/Twine.h"
				#include "llvm/Support/Error.h"

				namespace llvm {
				jhendersonUnsubmitted Done Reply Inline Actions Unused header? jhenderson: Unused header?

				LLVM_ATTRIBUTE_NORETURN extern void error(Twine Message);

				// This is taken from llvm-readobj.
				// [see here](llvm/tools/llvm-readobj/llvm-readobj.h:38)
				template <class T> T unwrapOrError(Expected<T> EO) {
				if (EO)
				return *EO;
				std::string Buf;
				raw_string_ostream OS(Buf);
				logAllUnhandledErrors(EO.takeError(), OS, "");
				OS.flush();
				error(Buf);
				}
				}

				#endif

tools/llvm-objcopy/llvm-objcopy.cpp

This file was added.

				//===- llvm-objcopy.cpp ------------------------------------------ C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				#include "llvm-objcopy.h"
				#include "Object.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/FileOutputBuffer.h"
				#include "llvm/Support/PrettyStackTrace.h"
				#include "llvm/Support/Signals.h"
				#include "llvm/Support/ToolOutputFile.h"

				#include <memory>
				#include <string>
				#include <system_error>

				using namespace llvm;
				using namespace object;
				using namespace ELF;

				// The name this program was invoked as.
				static StringRef ToolName;

				namespace llvm {

				LLVM_ATTRIBUTE_NORETURN void error(Twine Message) {
				errs() << ToolName << ": " << Message << ".\n";
				errs().flush();
				exit(1);
				}

				LLVM_ATTRIBUTE_NORETURN void reportError(StringRef File, std::error_code EC) {
				assert(EC);
				errs() << ToolName << ": '" << File << "': " << EC.message() << ".\n";
				exit(1);
				}

				LLVM_ATTRIBUTE_NORETURN void reportError(StringRef File, llvm::Error E) {
				assert(E);
				std::string Buf;
				raw_string_ostream OS(Buf);
				logAllUnhandledErrors(std::move(E), OS, "");
				OS.flush();
				errs() << ToolName << ": '" << File << "': " << Buf;
				exit(1);
				}
				}

				cl::opt<std::string> InputFilename(cl::Positional, cl::desc("<input>"));
				cl::opt<std::string> OutputFilename(cl::Positional, cl::desc("<output>"),
				grimarUnsubmitted Done Reply Inline Actions ELF64LE ? I guess you want to template it. grimar: ELF64LE ? I guess you want to template it.
				cl::init("-"));

				void CopyBinary(const ELFObjectFile<ELF64LE> &ObjFile) {
				std::unique_ptr<FileOutputBuffer> Buffer;
				Object<ELF64LE> Obj{ObjFile};
				Obj.finalize();
				ErrorOr<std::unique_ptr<FileOutputBuffer>> BufferOrErr =
				FileOutputBuffer::create(OutputFilename, Obj.totalSize(),
				FileOutputBuffer::F_executable);
				if (auto EC = BufferOrErr.getError())
				error("failed to open " + OutputFilename);
				else
				Buffer = std::move(*BufferOrErr);
				std::error_code EC;
				std::unique_ptr<tool_output_file> Out =
				make_unique<tool_output_file>(OutputFilename.data(), EC, sys::fs::F_None);
				if (EC)
				report_fatal_error(EC.message());
				Obj.write(*Buffer);
				if (auto EC = Buffer->commit())
				jhendersonUnsubmitted Done Reply Inline Actions I think this comment is in the wrong place, or even unnecessary entirely. jhenderson: I think this comment is in the wrong place, or even unnecessary entirely.
				reportError(OutputFilename, EC);
				Out->keep();
				}

				int main(int argc, char **argv) {
				// Print a stack trace if we signal out.
				sys::PrintStackTraceOnErrorSignal(argv[0]);
				PrettyStackTraceProgram X(argc, argv);
				llvm_shutdown_obj Y; // Call llvm_shutdown() on exit.
				cl::ParseCommandLineOptions(argc, argv, "llvm objcopy utility\n");
				ToolName = argv[0];
				if (InputFilename.empty()) {
				cl::PrintHelpMessage();
				return 2;
				}
				Expected<OwningBinary<Binary>> BinaryOrErr = createBinary(InputFilename);
				if (!BinaryOrErr)
				reportError(InputFilename, BinaryOrErr.takeError());
				Binary &Binary = *BinaryOrErr.get().getBinary();
				if (ELFObjectFile<ELF64LE> *o = dyn_cast<ELFObjectFile<ELF64LE>>(&Binary)) {
				CopyBinary(*o);
				grimarUnsubmitted Done Reply Inline Actions You either want to remove brackets here: if (ELFObjectFile<ELF64LE> o = dyn_cast<ELFObjectFile<ELF64LE>>(&Binary)) CopyBinary(o); else reportError(InputFilename, object_error::invalid_file_type); Or probably better just to early return: if (ELFObjectFile<ELF64LE> o = dyn_cast<ELFObjectFile<ELF64LE>>(&Binary)) return CopyBinary(o); reportError(InputFilename, object_error::invalid_file_type); return X; grimar: You either want to remove brackets here: ``` if (ELFObjectFile<ELF64LE> *o =…
				return 0;
				}
				reportError(InputFilename, object_error::invalid_file_type);
				}

This is an archive of the discontinued LLVM Phabricator instance.

[LLVM][llvm-objcopy] Added basic plumbing to get things startedClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 104754

test/tools/llvm-objcopy/basic-copy.test

test/tools/llvm-objcopy/hello-world.s

tools/LLVMBuild.txt

tools/llvm-objcopy/CMakeLists.txt

tools/llvm-objcopy/LLVMBuild.txt

tools/llvm-objcopy/Object.h

tools/llvm-objcopy/Object.cpp

tools/llvm-objcopy/llvm-objcopy.h

tools/llvm-objcopy/llvm-objcopy.cpp

[LLVM][llvm-objcopy] Added basic plumbing to get things started
ClosedPublic