This change depends on D36196
This change adds support for SHT_SYMTAB sections
Differential D34167
[llvm] Add symbol table support to llvm-objcopy jakehehrlich on Jun 13 2017, 1:37 PM. Authored by
Details
This change depends on D36196 This change adds support for SHT_SYMTAB sections
Diff Detail Event TimelineComment Actions Updated this yet again to match revisions made to other diffs. This change only depends on a diff that already has an LGTM so it should be ready for review. Comment Actions Nit: full stops at the end of all comments, please. In addition to the inline comments, I'm concerned about the behaviour for section symbols, which do not have a name directly associated with them (they could be considered to have the section name, but this is not usually recorded in the string table). As things stand, it looks to me like only one section symbol will be added, because the others will all have the same (empty) Name, and so will clash in the StringMap. I'm also not a fan of the use of Symbols and FinalSymbols as it means that there are two copies of the symbol table, which could become quite a large additional memory overhead (also such similar names don't sit well with me). Not thought about this too much, but I could think of a couple of other options: 1) Turn the FinalSymbols vector into a vector of pointers into the Symbols map, ordered in the output order; 2) Perhaps combined with my suggested alternative algorithm for finalizing the symbol table order above, simply record the new index in the symbols in the map, and then iterate over the map when writing, jumping to the appropriate location as needed.
Comment Actions
Plan 1) Keep the same writeSection algorithm in this latest diff (the algorithm proposed by James) and then locally create an std::vector<Symbol*> in finalize that can be sorted and then iterated though to assign new indexes. The sorting algorithm would use a comparision that would make any STB_LOCAL symbol come before but would ignore STB_WEAK vs STB_GLOBAL in order to better maintain the original order (addressing a particular concern raised by James). The comparison would compare by original index when binding was not an issue. Pros) Only adds 8-bytes to memory usage for each symbol and that memory is only allocated for the duration that symbols are being finalized. Plan 2) Make Symbol an intrusive list node type. When adding a symbol it must then be added to the Symbols map but also to the intrusively constructed list. A pointer to the head of this list would be maintained. When adding and removing symbols the last local symbol is kept track of. This lets us quickly add local symbols to the right place so that we never have to worry about sorting them in any way. As local symbols are removed we can update the pointer to the last local symbol. Pros) The symbol table is written in order, no sorting is needed so finalization becomes linear. This also means we only need to do 2 passes over the symbols for the whole thing. The last local symbol is kept track of by the SymbolTableSection so finalizing the Info field is easy (It's the index of the local symbol whose's next symbol is not local). My proposal is that some veriation of one of these 2 methods be used to add removeSymbol to SymbolTableSection just before symbol stripping is needed. Comment Actions
What are section symbols? It doesn't sound like I have supported them and I am not aware of what they are. I wasn't able to find them online. These are some kind of symbol for which distinct symbols can have the same name? That seems to spell problems for my overall design here. Comment Actions Which writes are we talking about here? If we've sorted things, then we'll be writing contiguously into the buffer.
My biggest concern with this is that we have to be careful not to change the symbol binding of a symbol in place, or if we did, we would need to move it within the list. If that gets forgotten, then the order will be wrong.
My initial instinct is that we should take the simpler approach (option 1), as it is less likely to go wrong as more functionality is added to llvm-objcopy. That being said, I suspect that in the general case, people aren't going to make many changes to symbol bindings, so I'd say that the order of existing symbols is unlikely to change much, making a general sort probably rather inefficient. This then suggests option 2 to be better! Do we need a map at all? Is removing individual symbols by name something that needs to be supported, and is it up to objcopy to ensure that there are no two symbols with the same name? Could we actually maintain a single vector, where the index in the vector is the symbol's index (with something recording the last local). Then, when adding symbols, insert in the corresponding place. Changing a symbol's binding from weak/global to local or vice versa becomes a move within the vector. I think this means that there would only be a memory overhead of a std::vector, plus one size_t value. One thing that occurred to me with regards to modifying the symbol table ordering that might be somewhat related to this - we have to be wary that there might be relocations (and possibly other section contents), that refer to symbol indexes, so we need to track those if we anticipate the symbol table contents changing. Section symbols are local symbols with type STT_SECTION. They do not have a name as such (their st_name field is 0, so points to the empty string), but most tools that I've encountered use the name of their corresponding section (identified by st_shndx) when displaying them. They are used by relocations primarily, to allow section relative relocations, i.e. references to some offset into a section that doesn't necessarily have its own symbol. The only thing that you have to be aware of for them in this area is that they don't have unique names. This suggests to me that we shouldn't be using the StringMap (see my above comment).
Comment Actions
I was considering having "FinalSymbols" be a local variable. If it isn't local then we can write contiguously. If it is local then all it helps us do is assign the indexes and we would have to do things noncontiguously. But I think removing the StringMap all together as you recommended is likely a better solution.
Eh, yeah. That is a fair concern. I'd rather allow for as many modifications to occur as possible. I think I prefer sorting now, especially since this latest diff doesn't have a StringMap
I think we can get the best of both worlds actually. You pointed that people won't make changes to symbol bindings or symbol order. When these things do change they likely won't change very much. Well lots of sorting algorithms are linear under these conditions. So we just need to use a sorting algorithm that takes advantage of this. Unfortunately there don't seem to be any ready implementations of such algorithms in the standard library or in llvm. This diff:
The primary reason for the StringMap before was so that when handling relocations we could look up symbol indexes (see here: https://reviews.llvm.org/D36554). Strings seemed like a better option because I thought symbol names were unique. Symbol names, unlike symbol indexes, will not change on strip operations so strings seemed like better identifiers than indexes. James pointed out however that symbol names are not necessarily unique so we can't do that. So we need another way to look up indexes of the finalized symbols. If we agree that before modification of the symbol table that symbol indexes agree with original symbol indexes then we can look up pointers to symbols very quickly. For relocation sections rather than storing a symbol name we can store a pointer to the symbol. When removing symbols we will not remove symbols one by one but instead remove a whole bunch of symbols all at once. When we do this we can reassign new Index values to each symbol so that the internal symbol index (not the original index) stays consistent with the vector the symbols are being stored in. This means that if Sym is returned by the getSymbolByIndex method that Sym->Index will agree with the index passed to getSymbolByIndex. This reassignment of indexes won't affect relocations since relocations will grab pointers to the symbols during construction before any modification has occured. Comment Actions To clarify, because I may have missed it, but are we deferring sorting for now, to a later change when it becomes important? I don't see where it's happening currently!
Comment Actions It's not currently happening. I figured once it's clear how symbols are being altered we can add the sorting algorithm. I'll add it now if you'd like, it just seemed like it's unnecessary right now. Your call.
Comment Actions Nope, I'm happy with it being added later. No point in adding unnecessary code at this point for the sake of it. I realised that there's another case that you need to add support for either now or later, namely symbols with section indexes of SHN_COMMON or SHN_ABS, as these don't refer to actual sections. I'm happy for this to be a later change though. Assuming you do this in a subsequent change, this gets a LGTM from me.
|
For completeness, please check that there are no more symbols by checking the end of the symbol table has been reached.