This is an archive of the discontinued LLVM Phabricator instance.

[NFC][llvm-nm] create a new helper function exportSymbolNamesFromFiles for --export-symbols
ClosedPublic

Authored by DiggerLin on Mar 3 2022, 8:38 AM.

Details

Summary

create a new helper function exportSymbolNamesFromFiles for --export-symbols

Diff Detail

Event Timeline

DiggerLin created this revision.Mar 3 2022, 8:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 8:38 AM
Herald added a subscriber: rupprecht. · View Herald Transcript
DiggerLin requested review of this revision.Mar 3 2022, 8:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 8:38 AM
jhenderson added inline comments.Mar 6 2022, 11:10 PM
llvm/tools/llvm-nm/llvm-nm.cpp
2240–2241

Can Filename be const (potentially by modifying other signatures too)? If so, presumably it can be a StringRef?

2270

If you change dumpSymbolNamesFromFile, the type here should change too to match.

2271–2272

This const_cast suggests the function signature is probably wrong. If it isn't, and Filename needs to be mutable in that function, I'd make a local copy of the name and pass that in instead.

Unrelated, but FileSymList is now a reference to a local variable in the calling function, which will likely cause strange behaviour, including crashes. It shouldn't have the const & bit in the type signature.

2278

Use llvm::remove_if(SymbolList, ...)

DiggerLin marked 4 inline comments as done.Mar 7 2022, 7:03 AM
DiggerLin added inline comments.
llvm/tools/llvm-nm/llvm-nm.cpp
2240–2241

thanks

2271–2272

according to https://en.cppreference.com/w/cpp/language/lifetime

The lifetime of a temporary object may be extended by binding to a const lvalue reference or to an rvalue reference (since C++11), see reference initialization for details.

when 'return SymbolList;" in dumpSymbolNamesFromFile , the compiler will create a temporary object "SymbolList" ,

So it will not crash.

2278

there is no api llvm::remove_if(SymbolList, ...) as https://en.cppreference.com/w/cpp/algorithm/remove
and if you use the llvm::remove_if(SymbolList, ...) it will compile error.

DiggerLin updated this revision to Diff 413460.Mar 7 2022, 7:11 AM
DiggerLin marked 3 inline comments as done.
jhenderson added inline comments.Mar 7 2022, 10:57 PM
llvm/tools/llvm-nm/llvm-nm.cpp
1921

Bonus patch, if you want: swap all these const std::string & for StringRef, especially since you're changing them anyway.

2271–2272

Huh, fair enough. Is there a reason not to take it by value though? I suspect it's identical in terms of codegen.

2278
DiggerLin updated this revision to Diff 413799.Mar 8 2022, 7:26 AM
DiggerLin marked 3 inline comments as done.
DiggerLin added inline comments.Mar 8 2022, 9:52 AM
llvm/tools/llvm-nm/llvm-nm.cpp
2278

thanks

DiggerLin added a comment.EditedMar 8 2022, 10:16 AM

do we still another NFC patch to archive your suggestion as "https://reviews.llvm.org/D112735" ?

struct NMObject {

std::unique_ptr<Binary> Bin; // Possibly any other members that might be necessary to store archive members.
StringRef ArchiveName;
StringRef ArchitectureName;

};

......

static std::vector<NMObject> getObjectsFromFile(StringRef InputFile) {

std::vector<NMObject> Objects;
/*
  Code from dumpSymbolNamesFromFile which retrieves the objects (and archive properties, if appropriate) inside a binary (may be a single object, or many).
  Emplace an NMObject in the vector for each such constructed object, instead of calling dumpSymbolNamesFromObject.
*/
return Objects;

}

I think the logic of the patch most meet your requirement , My suggestion we can stop to do further refactor for your suggestion.

DiggerLin removed a subscriber: james.Mar 8 2022, 11:01 AM
jhenderson accepted this revision.Mar 8 2022, 10:27 PM

LGTM.

do we still another NFC patch

To summarise, yes, we need another patch (at least one, possibly more), but I don't really mind what that patch looks like (i.e. it doesn't have to match what I suggested before).

My key end goal of the suggestions I made in D112735 was the removal of references to ExportSymbols causing early outs in otherwise common functions. I've highlighted the one section of code that has such references. I don't really mind how you achieve that end goal, as long as it leaves the code in a clean structural state (or at least as clean as it was before you started the export symbols work). My suggestion was merely one part of the wider plan, to ensure we had the information needed, and isn't necessarily the only approach, so if you have another approach then that's fine too.

Away from that, there's at least one other minor improvement I'd like to see: rename exportSymbolNamesFromFiles to dumpExportSymbolList, as I think "dump" indicates "print something" more clearly than "export". The "export symbol list" part of the name then is what is being dumped, and nicely lines up with the option name.

llvm/tools/llvm-nm/llvm-nm.cpp
1893–1903

References to ExportSymbols.

This revision is now accepted and ready to land.Mar 8 2022, 10:27 PM
This revision was landed with ongoing or failed builds.Mar 9 2022, 1:29 PM
This revision was automatically updated to reflect the committed changes.
DiggerLin added a comment.EditedMar 9 2022, 2:22 PM

My key end goal of the suggestions I made in D112735 was the removal of references to ExportSymbols causing early outs in otherwise common functions.

I do not think we can remove all the references to ExportSymbols , if we want to use the functionality (which extract object file out from archive, MachOUniversalBinary,TapiUniversal) dumpSymbolNamesFromFile. we need at least one references toExportSymbols.

we can  not implement as to get all the object files from input file into a vector(or list) , and then  dump and print the symbol for each object.

The reason: when there are error on get object file from input files.
for example, we have a test case, user has three input files from llvm-nm command to dump the symbol.

  1. there is warning on the input file1 as '"sizes with -print-size for Mach-O files are always zero.\n";"
  2. get object files OK for input file 2
  3. there are some error when get object files for input file3

    The current implement will output as:
error info for input file1
obj21 .o (input file2)
   symbol1 , XX ,XX
   symbol2, XX, XX
obj22.o (input file2)
   symbol1 , XX,XX
   symbol2, XX,XX
error info for input file3(can not get obj31.o from input file3)
obj32.o   (input file3)
   symbol1 , XX,XX
   symbol2, XX,XX
error info for input file3(can not get obj33.o from input file3)

if we get all the object file out and put into a list and dump(and print symbols) from each object file of list.(it will print out all the warning or error information first) the output will be changed to

error info for input file1.
error info for input file3(can not get obj31.o from input file3)
error info for input file3(can not get obj33.o from input file3)
obj21 .o (input file2)
    symbol1 , XX
    symbol2, XX
obj22.o (input file2)
   symbol1 , XX
   symbol2, XX
obj32.o   (input file3)
   symbol1 , XX
   symbol2, XX

I would say that the exact position of errors and warnings in the output is of lesser concern, and it's okay for them to change as the implementation changes, as long as the error messages have sufficient context in them - if they don't, this is something that could be improved though.

However, it occurs to me that a proposal of "fetch all symbols" and then "dump all symbols" may be costly in terms of memory usage. I'm usually not so fussed about memory usage, but I'd want to see what the impact of this approach would be. I guess the way to do better, without adversely impact the memory footprint of regular usage, is to have a lambda passed down the call stack, which operates on an object when it is encountered. Under regular usage, this lambda would be a function to print all symbols for that object (sorted as needed). Under export symbols usage, it would store the symbols in a global vector, which would then be printed etc at a later point. Outline code:

void forEachFile(function_ref<void(const std::vector<NMSymbol> &, /*other args as needed*/)> DoThis, ArrayRef<StringRef> Filenames) {
  // for each input file, determine file kind, call processArchive/processMachOUniversalBinary/...
}

void process*(... DoThis, ...) {
  // if multiple objects in file, loop over all doing the following, otherwise, do the following on just the one file:
  auto Syms = getSymbolsFromObject(Obj);
  DoThis(Syms, ...);
}

int main() {
...

  if (ExportSymbols) {
    std::vector<NMSymbol> Syms;
    forEachFile([&Syms](const std::vector<NMSymbol> &SymbolList, ...) {
                   Syms.insert(Syms.end(), SymbolList.begin(), SymbolList.end());
                }, InputFilenames);
    // filter, sort, print exports
  } else {
    forEachFile([](const std::vector<NMSymbol> &SymbolList){
                   // sort then print symbols
                }, InputFilenames);
  }
}

Hopefully you see what I'm getting at. This isn't really a high priority for me though, so don't feel obliged to look at this further, but if you do get a chance to look, I think it is worth considering. Hopefully the recent refactoring should put the code in a decent place for adopting something like this.

I would say that the exact position of errors and warnings in the output is of lesser concern, and it's okay for them to change as the implementation changes, as long as the error messages have sufficient context in them - if they don't, this is something that could be improved though.

However, it occurs to me that a proposal of "fetch all symbols" and then "dump all symbols" may be costly in terms of memory usage. I'm usually not so fussed about memory usage, but I'd want to see what the impact of this approach would be. I guess the way to do better, without adversely impact the memory footprint of regular usage, is to have a lambda passed down the call stack, which operates on an object when it is encountered. Under regular usage, this lambda would be a function to print all symbols for that object (sorted as needed). Under export symbols usage, it would store the symbols in a global vector, which would then be printed etc at a later point. Outline code:

void forEachFile(function_ref<void(const std::vector<NMSymbol> &, /*other args as needed*/)> DoThis, ArrayRef<StringRef> Filenames) {
  // for each input file, determine file kind, call processArchive/processMachOUniversalBinary/...
}

void process*(... DoThis, ...) {
  // if multiple objects in file, loop over all doing the following, otherwise, do the following on just the one file:
  auto Syms = getSymbolsFromObject(Obj);
  DoThis(Syms, ...);
}

int main() {
...

  if (ExportSymbols) {
    std::vector<NMSymbol> Syms;
    forEachFile([&Syms](const std::vector<NMSymbol> &SymbolList, ...) {
                   Syms.insert(Syms.end(), SymbolList.begin(), SymbolList.end());
                }, InputFilenames);
    // filter, sort, print exports
  } else {
    forEachFile([](const std::vector<NMSymbol> &SymbolList){
                   // sort then print symbols
                }, InputFilenames);
  }
}

Hopefully you see what I'm getting at. This isn't really a high priority for me though, so don't feel obliged to look at this further, but if you do get a chance to look, I think it is worth considering. Hopefully the recent refactoring should put the code in a decent place for adopting something like this.

I think I got your solution. and I am prefer to delay the patch, and  I will do my high priority work first.  Thanks again for your time and your professional and useful comments in all the patches. and I learned a lot of from your comments.