This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][MachO] Rebuild the symbol/string table in the writer
ClosedPublic

Authored by seiya on Jun 13 2019, 5:25 PM.

Details

Summary

Build the string table using StringTableBuilder, reassign symbol indices, and update symbol indices in relocations to allow adding/modifying/removing symbols from the object.

Diff Detail

Event Timeline

seiya created this revision.Jun 13 2019, 5:25 PM
seiya edited the summary of this revision. (Show Details)Jun 13 2019, 5:26 PM

I've got several stylistic comments, but don't have time to read up on Mach-O right now to review that aspect of it.

llvm/tools/llvm-objcopy/MachO/MachOWriter.cpp
124

This shouldn't be auto.

275

Standard LLVM style says that you should put the end() evaluation in the initializer part of a for loop, unless it can change:

for (auto Iter = O.SymTable.Symbols.begin(), End = O.SymTable.Symbols.end(); Iter != End; Iter++)

277

Can you just make this a raw pointer and do Iter->get()?

561

Maybe too much auto here.

llvm/tools/llvm-objcopy/MachO/Object.h
88

Is int the right type here? It seems dubious to me without knowing anything about the file format.

113

Don't need struct here I believe.

seiya updated this revision to Diff 205498.Jun 18 2019, 7:21 PM

Addressed review comments.

seiya marked 7 inline comments as done.Jun 18 2019, 7:22 PM
seiya added inline comments.
llvm/tools/llvm-objcopy/MachO/Object.h
88

I was wrong. This should be uint32_t. Thanks.

llvm/tools/llvm-objcopy/MachO/MachOReader.cpp
173
StringRef(StrTable.data() + nlist.n_strx).str()
186

i think you don't need the explicit specification of the template parameter here, it'll be deduced from the type of the second argument

llvm/tools/llvm-objcopy/MachO/Object.cpp
11

khm, why is this const_cast needed here ?

llvm/tools/llvm-objcopy/MachO/Object.h
100
SymbolEntry *getSymbolByIndex(uint32_t Index) const;
seiya updated this revision to Diff 205733.Jun 19 2019, 7:05 PM
seiya marked an inline comment as done.
  • Addressed review comments.
llvm/tools/llvm-objcopy/MachO/Object.h
227

I'm wondering if MachOWriter would be a better place for StrTableBuilder ?
MachOWriter::constructStringTable, MachOWriter::writeStringTable, MachOWriter::totalSize, MachOWriter::writeSymbolTable are the only users of this object, aren't they ?

seiya marked 5 inline comments as done.Jun 19 2019, 7:15 PM
seiya added inline comments.
llvm/tools/llvm-objcopy/MachO/Object.cpp
11

First I thought we need to modify something in the SymbolEntry pointed from the RelocationInfo.Symbol, but I realized it is not necessary in this patch.

FYI, I used the ELF implementation as reference:

class SymbolTableSection : public SectionBase {
...
public:
  const Symbol *getSymbolByIndex(uint32_t Index) const;
  Symbol *getSymbolByIndex(uint32_t Index);
};
seiya updated this revision to Diff 205734.Jun 19 2019, 7:18 PM
seiya marked 2 inline comments as done.
  • Move StrTableBuilder into MachOWriter.
llvm/tools/llvm-objcopy/MachO/Object.h
227

You're correct! I'll fix this.

seiya marked an inline comment as done.Jun 19 2019, 7:18 PM

Just to double check - do the existing tests pass with this patch ? If so - LGTM

This revision is now accepted and ready to land.Jun 19 2019, 7:49 PM
In D63309#1551411, @alexshap wrote:

Just to double check - do the existing tests pass with this patch ? If so - LGTM

Yes they pass with this patch. Thanks!

This revision was automatically updated to reflect the committed changes.

I am getting the compile warnings below when compiling on gcc version 7.3.1.

[1/4] Building CXX object tools/llvm-objcopy/CMakeFiles/llvm-objcopy.dir/MachO/MachOReader.cpp.o
/home/anushabasana/local/llvm-project/llvm/tools/llvm-objcopy/MachO/MachOReader.cpp: In instantiation of ‘std::vector<llvm::objcopy::macho::Section> llvm::objcopy::macho::extractSections(const llvm::object::MachOObjectFile::LoadCommandInfo&, const llvm::object::MachOObjectFile&, size_t&) [with SectionType = llvm::MachO::section; SegmentType = llvm::MachO::segment_command; size_t = long unsigned int]’:
/home/anushabasana/local/llvm-project/llvm/tools/llvm-objcopy/MachO/MachOReader.cpp:124:46:   required from here
/home/anushabasana/local/llvm-project/llvm/tools/llvm-objcopy/MachO/MachOReader.cpp:106:17: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
           reinterpret_cast<MachO::scattered_relocation_info *>(&R.Info)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
               ->r_scattered;
               ~~^~~~~~~~~~~
/home/anushabasana/local/llvm-project/llvm/tools/llvm-objcopy/MachO/MachOReader.cpp: In instantiation of ‘std::vector<llvm::objcopy::macho::Section> llvm::objcopy::macho::extractSections(const llvm::object::MachOObjectFile::LoadCommandInfo&, const llvm::object::MachOObjectFile&, size_t&) [with SectionType = llvm::MachO::section_64; SegmentType = llvm::MachO::segment_command_64; size_t = long unsigned int]’:
/home/anushabasana/local/llvm-project/llvm/tools/llvm-objcopy/MachO/MachOReader.cpp:129:50:   required from here
/home/anushabasana/local/llvm-project/llvm/tools/llvm-objcopy/MachO/MachOReader.cpp:106:17: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

From https://developer.apple.com/documentation/kernel/relocation_info?changes=latest_major&language=ob_7:

r_symbolnum

Indicates either an index into the symbol table (when the r_extern field is set to 1) or a section number (when the r_extern field is set to 0). As previously mentioned, sections are ordered from 1 to 255 in the order in which they appear in the LC_SEGMENT load commands. This field is set to R_ABS for relocation entries for absolute symbols, which need no relocation.

The current Mach-O reader and writer do not handle r_extern=0 relocations.

// MachO/MachOReader
void MachOReader::setSymbolInRelocationInfo(Object &O) const {
  for (auto &LC : O.LoadCommands)
    for (auto &Sec : LC.Sections)
      for (auto &Reloc : Sec.Relocations)
        if (!Reloc.Scattered) {
          auto *Info = reinterpret_cast<MachO::relocation_info *>(&Reloc.Info);
          Reloc.Symbol = O.SymTable.getSymbolByIndex(Info->r_symbolnum);  //// this can reference a section when r_extern=0
        }
}

From https://developer.apple.com/documentation/kernel/relocation_info?changes=latest_major&language=ob_7:

r_symbolnum

Indicates either an index into the symbol table (when the r_extern field is set to 1) or a section number (when the r_extern field is set to 0). As previously mentioned, sections are ordered from 1 to 255 in the order in which they appear in the LC_SEGMENT load commands. This field is set to R_ABS for relocation entries for absolute symbols, which need no relocation.

The current Mach-O reader and writer do not handle r_extern=0 relocations.

// MachO/MachOReader
void MachOReader::setSymbolInRelocationInfo(Object &O) const {
  for (auto &LC : O.LoadCommands)
    for (auto &Sec : LC.Sections)
      for (auto &Reloc : Sec.Relocations)
        if (!Reloc.Scattered) {
          auto *Info = reinterpret_cast<MachO::relocation_info *>(&Reloc.Info);
          Reloc.Symbol = O.SymTable.getSymbolByIndex(Info->r_symbolnum);  //// this can reference a section when r_extern=0
        }
}

Thank you for pointing that out. I'll take a look at that.