This change makes llvm-elfabi tool to generate a minimal stub ELF shared library just for linking purpose from a tbe file. Tested with lld.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
LGTM. The section ordering details are conventional but not supposed to be semantically meaningful anyway.
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
115 | Seems safer to use s{}. | |
128 | Making this S{} both obviates the need to set any values explicitly to zero and also ensures no field was forgotten. | |
143 | This could be a single memcpy call using Symbols.data(). | |
192 | No other flags are expected for SHT_DYNSYM or SHT_SYMTAB. | |
198 | sizeof(Elf_Addr) seems clearest here and in other such cases. | |
203 | This could be a single memcpy call using Entries.data(). | |
205–210 | Traditionally the ordering is to put all SHF_ALLOC sections before all others. .shstrtab in particular is usually the last section. | |
206 | This need not set SHF_WRITE. e.g. Fuchsia uses -z rodynamic and some other platforms use R/O .dynamic sections. The writability does not matter at link time. | |
209 | Just constexpr Elf_Dyn{} is enough here. You could also just memset directly instead of copying some zero bytes with memcpy. | |
212 | Traditional ordering between these is: .dynsym, .dynstr, .dynamic. |
Nice. Will you folks be updating llvm-ifs to use this support instead of yaml2obj soon? If so add me to the review, I will be happy to take a look.
That's the plan. When the InterfaceStub library is ready, which should be soon hopefully, we'll integrate it into llvm-ifs and remove llvm-elfabi.
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
204–205 | ||
227 | Capitalize the comment and end with a period. | |
237 | ditto | |
241 | ditto | |
298 | Usually linkers place .dynsym .dynstr before .strtab .shstrtab .symtab The source order does not reflect the actual order and can cause confusion | |
llvm/test/tools/llvm-elfabi/binary-write-sheaders.test | ||
96 | The section order should be changed. See the comment above. |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
298 | I have reordered the section headers. The StrTab here is actually .dynstr since a stub SO does not need .strtab . |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
113 | You can avoid having 2 public: areas by merging them. | |
195 | If you add DT_NULL explicitly in the code, then you can simplify these 2 methods to: size_t getSize() const { return Entries.size() * sizeof(Elf_Dyn); } void write(uint8_t *Buf) const { memcpy(Buf, Entries.data(), getSize()); } |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
183 | It is not clear why these e methods are needed? You could just call addValue on the caller side. |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
113 | The private section needs the "Elf_Sym" type alias declared in the first public section and the second public section requires the private "Symbols" variable. So I don't think I can merge the two public sections as it is a chicken and egg issue. | |
183 | I feel these methods are better for readability. I removed those in latest patch. | |
195 | I prefer to keep "add DT_NULL" in the builder class instead of explicitly in the code. It reduces the chance a maintainer add new entries after DT_NULL entry in the future. |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
113 | There is no chicken and egg issue here. C++ doesn't work like that. template <class ELFT> class ELFSymbolTableBuilder { public: using Elf_Sym = typename ELFT::Sym; ELFSymbolTableBuilder() { Elf_Sym S{}; Symbols.push_back(S); } void add(ELFStringTableBuilder &StrTabBuilder, StringRef StName, uint64_t StSize, uint8_t StBind, uint8_t StType, uint8_t StOther, uint16_t StShndx) { Elf_Sym S{}; S.st_name = StrTabBuilder.getOffset(StName); S.st_size = StSize; S.st_info = (StBind << 4) | (StType & 0xf); S.st_other = StOther; S.st_shndx = StShndx; Symbols.push_back(S); } size_t getSize() const { return Symbols.size() * sizeof(Elf_Sym); } void write(uint8_t *Buf) const { memcpy(Buf, Symbols.data(), sizeof(Elf_Sym) * Symbols.size()); } private: llvm::SmallVector<Elf_Sym, 64> Symbols; }; | |
119 | Instead of passing both StrTabBuilder abd StName you can just pass an offset. | |
195 | I think, this is a design issue. With this approach Entries doesn't contain all entries written. Though, if you want go this way, then you should be able to use memset: instead of constexpr Elf_Dyn Terminator{}; memcpy(Buf + sizeof(Elf_Dyn) * Entries.size(), &Terminator, sizeof(Elf_Dyn)); |
Also, I was unable to apply this patch. Seems it is rebased on a different one. In this way you probably might want to add a parent revision,
so that dependencies can be seen.
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
111 | I changed the value to 8, similar to other SmallVector used in this file. I think it should be small enough now. | |
113 | Thanks for pointing out. I have merged 2 public sections. | |
119 | Done. | |
195 | I changed it to memset. | |
238 | I tried locally to remove DT_SYMTAB and DT_STRTAB and confirm it does not affect linking. But I would prefer to keep these 2 entries since the generated Stub SO file does have .dynstr and .dynsym sections. There might be other tools would look for these references in .dynamic table and it would yield unexpected results if they are removed. |
Sorry about that. I thought the attachment to the parent revision would be automatic. I have manually attached the parent revision to this diff.
After revisiting this patch I have some more questions :]
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
150 | Can't this be the following? void modifyAddr(size_t Index, uint64_t Addr) { Entries[Index].d_un.d_ptr = Addr; } (the same below) | |
232 | I am not sure what is happening here. Could you explain? |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
150 | Fixed in latest diff. | |
232 | This snippet adds symbols from the ELFStub structure into the .dynsym section builder. The "1" here is a placeholder. It does not mean ".text" has to be index 1. The issue here is for non-Undefined symbols, the shndx needs to be a value other than 0. Please correct me if I am wrong, at link time, for dynamic linking, the shndx value does not matter as long as it is not SHN_UNDEF for non-Undefined symbols, that why I put 1 here, which in current state, it points to ".dynsym" itself. The TODO message means I would like to add a dummy place holder ".text" section and point symbols to it in the stub SO in case any tool need a valid shndx value. For linking with ld/lld, I don't think that is necessary. I added section name in the binary-write-sheaders.test to verify shndx is filled with expected value. I do want to add some integration test to verify this elfabi tool will work with clang and lld, but it looks like I don't have access to the lld in the llvm test. If you have any good suggestion for integration testing, please let me know. We did performed end to end testing with elfabi tool in Fuchsia build and the linked binary from stub SO were equivalent to the ones linked from original SO files. If we chose to preserve the order of symbol table, the build results were the same bit by bit. |
llvm/lib/InterfaceStub/ELFObjHandler.cpp | ||
---|---|---|
150 | Not fixed. I suggested a one line method. | |
232 |
I guess you are right. I can't imagine a good reason why shndx can be important in run time. Perhaps someone else who are more familar with the logic of existent dynamic loaders can confirm.
I think you should use "point" word, not "points" then (in the TODO comment). Now the comment sounds like it already points to .text. Also, I think it worth to update this comment to explain that the symbol index is not really important and you assign an arbitrary non-SNH_UNDEF value, which now points to .dynsym itself. I am not sure how much is useful to mention ".text" here (i.e. how much usefull would be to add this section and TODO about that plan).
I think it is enough. The behavior of LLD is tested separatelly. |
64 is too large. This costs 24*64 bytes to the class. 0 may be fine assuming you always need a dynamic allocation (which is not a big deal)