Page MenuHomePhabricator

export unique symbol list with llvm-nm new option "--export-symbols"
ClosedPublic

Authored by DiggerLin on Oct 28 2021, 10:12 AM.

Details

Summary

the patch implement of following functionality.

  1. export the symbols from archive or object files.
  2. sort the export symbols. (based on same symbol name and visibility)
  3. delete the duplicate export symbols (based on same symbol name and visibility)
  4. print out the unique and sorted export symbols (print the symbol name and visibility).

there are two new options are add in the patch

  1. --export-symbols (enable the functionality of export unique symbol)
  2. --no-rsrc (exclude the symbol name begin with "__rsrc" from be exporting from xcoff object file)

Export symbol list for xcoff object file has the same functionality as
The patch has the same functionality as
IBM CreateExportList

Diff Detail

Unit TestsFailed

TimeTest
40 msx64 debian > LLVM.tools/llvm-objdump/XCOFF::export_sym_list_ar.test
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llvm-objdump --export-unique-symbol /var/lib/buildkite-agent/builds/llvm-project/llvm/test/tools/llvm-objdump/XCOFF/Inputs/big-archive-libtest.a | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefixes=SYM /var/lib/buildkite-agent/builds/llvm-project/llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_ar.test
50 msx64 debian > LLVM.tools/llvm-objdump/XCOFF::export_sym_list_obj.test
Script: -- : 'RUN: at line 10'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llvm-objdump --export-unique-symbol /var/lib/buildkite-agent/builds/llvm-project/llvm/test/tools/llvm-objdump/XCOFF/Inputs/exp_sym.o | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefixes=SYM /var/lib/buildkite-agent/builds/llvm-project/llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

rebased the code base on the https://reviews.llvm.org/D119028 [NFC] Refactor llvm-nm symbol comparing and split sorting

DiggerLin updated this revision to Diff 406935.Feb 8 2022, 12:33 PM
DiggerLin marked 9 inline comments as done.
DiggerLin added a comment.EditedFeb 8 2022, 12:40 PM
I created a refactor patch for above and rebase current patch based on it.

Thanks.

I think you should also do a refactor whereby SymbolList is no longer a global variable - having it as a global variable is confusing, at best, and at worst risks cases where it is still populated when it shouldn't be.

for the export list, I think we need a global variable , otherwise we need a variable in main() , and then pass into each function.  The llvm-nm is c style code, I do not think we can do all the refactor in this patch.   if you have have good idea, please let me know, I will create  a separate refactor patch after current patch land.

My suggested design I already provided in my previous comment actively needs to avoid the global variable - the proposed getSymbolsFromObject function would return this list for use by the respective calling functions. Global variables make code re-use harder, and lead to temporal dependencies, both of which this patch is inflicted with, in part due to the global SymbolList. Doing the refactor up-front would significantly improve this patch.

Since we have not reach an agreement on the getSymbolsFromObject() , I am prefer to  do "removing the global variable SymbolList" in a later separate refactor patch. Our team is been blocking on the "export symbol list", I understand your concern about the quality of the code,  but I do not think "removing a global variable SymbolList" is in the scope of the patch.
  1. printExportSymbolList loops over all inputs and calls getSymbolsFromObject , I think it need to share the code the dumpSymbolNamesFromFile. and we need to added a additional if(ExportSymbols) in the dumpSymbolNamesFromFile to go to different function .

I'm not sure I follow. In my proposed design, dumpSymbolNamesFromFile should never be touched by the export symbols code route, only getSymbolsFromObject.

when printExportSymbolList loops over all inputs ,  we still need to get object files from archive or from  MachOUniversalBinary, TapiUniversal,  we need the functionality  in  "dumpSymbolNamesFromFile" ,  so we still need the dumpSymbolNamesFromFile() for printExportSymbolList().

If we need the object fetching code from dumpSymbolNamesFromFile then just pull that code into a separate function too... Here's what I'm thinking:

// Class for storing the binary and passing around associated properties, in case it's an object.
struct NMObject {
  std::unique_ptr<Binary> Bin; // Possibly any other members that might be necessary to store archive members.
  StringRef ArchiveName;
  StringRef ArchitectureName;
};

static std::vector<NMSymbol> getSymbolsFromDLInfoMachO(MachOObjectFile &MachO) {
  std::vector<NMSymbol> Symbols;
  // As original implementation of dumpSymbolsFromDLInfoMachO, but replace all references to SymbolList with Symbols.
  return Symbols;
}

static std::vector<NMSymbol> getSymbolsFromObject(NMObject &Obj) {
  std::vector<NMSymbol> Symbols;
  // Parts of original dumpSymbolNamesFromObject that get the symbols that are to be printed.
  // Filtering will be done later, rather than here, i.e. don't add the exportSymbolsForXCOFF call here.
  return Symbols;
}

static void dumpSymbolNamesFromObject(NMObject &Obj) {
  std::vector<NMSymbol> Symbols = getSymbolsFromObject(Obj);
  Symbols = sortSymbolList(std::move(Symbols)); // Or sort in place
  printSymbolList(Symbols, Obj, printName);
}

static std::vector<NMObject> getObjectsFromFile(StringRef InputFile) {
  std::vector<NMObject> Objects;
  /*
    Code from dumpSymbolNamesFromFile which retrieves the objects (and archive properties, if appropriate) inside a binary (may be a single object, or many).
    Emplace an NMObject in the vector for each such constructed object, instead of calling dumpSymbolNamesFromObject.
  */
  return Objects;
}

static void dumpSymbolNamesFromFile(StringRef InputFile) {
  std::vector<NMObject> Objects = getObjectsFromFile(InputFile);
  llvm::for_each(Objects, dumpSymbolNamesFromObject);
}

std::vector<NMSymbol> getExportSymbols(ArrayRef<NMSymbol> Symbols, NMObject &Obj) {
  std::vector<NMSymbol> ExportSymbols;
  if (auto *XCOFF = dyn_cast<XCOFFObjectFile>(Obj.Bin)) {
    // Do what's necessary to find the symbols appropriate for XCOFF export symbols.
  }
  // TODO: Add implementations for other objects here. Possibly warn otherwise.
  return ExportSymbols;
}

static void dumpExportSymbolList(ArrayRef<StringRef> InputFilenames) {
  std::vector<NMSymbol> Symbols;
  for(StringRef InputFile : InputFilenames) {
    std::vector<NMObject> FileObjects = getObjectsFromFile(InputFile);
    for(NMObject &Object : FileObjects) {
      std::vector<NMSymbol> ObjSymbols = getSymbolsFromObject(InputFile);
      ObjSymbols = getExportSymbols(ObjSymbols, Object);
      // Add ObjSymbols to the end of Symbols.
    }
  }
  Symbols = sortSymbolList(Symbols); // Or sort in place
  SymbolList.erase(std::unique(Symbols.begin(), Symbols.end()), Symbols.end());
  printExportSymbols(Symbols);
}

int main() {
  /* ... */
  if(ExportSymbols)
    dumpExportSymbolList(InputFilenames);
  else
    llvm::for_each(InputFilenames, dumpSymbolNamesFromFile);
}

I've not tested this structure out, but I'm pretty confident that it, likely with some small tweaks is both a) not a massive departure from the current structure (meaning we don't need to rewrite large amounts of code), and b) will resolve the structural issues with this patch.

thanks for your sample code, I agree with you, the patch is quite large now. if I implement as your suggestion in the patch , I believe that we will need another refactor patch which split the dumpSymbolNamesFromFile function into several small function before the patch. Our patch https://reviews.llvm.org/D119147 is depend on the patch. My suggestion is that I will do refactor as your suggestion after the current patch land and at least let the change code in XCOFFObjectFile.cpp , test case, llvm-nm.rst etc landed, and we can focus on refactoring the llvm-nm.cpp.

if you strong insisted that we should do as your suggestion in the patch first, I can do it and postpones our schedule.

FYI @hubert.reinterpretcast .

llvm/tools/llvm-nm/llvm-nm.cpp
243 ↗(On Diff #406068)

thanks

254 ↗(On Diff #406068)

thanks

285 ↗(On Diff #406068)

thanks

1716–1723 ↗(On Diff #406068)

This can be simplified, I believe as per the inline edit.

1716–1723 ↗(On Diff #406068)

thanks for simplify

1784–1788 ↗(On Diff #406068)

agree with you , but I do not think there is better choice in current structure.

1849 ↗(On Diff #406068)

without setSymFlag(Obj). the S do not have a value SymFlags, it will crash at function
bool shouldSkipPrinting()

DiggerLin added a comment.EditedFeb 8 2022, 12:42 PM
I created a refactor patch for above and rebase current patch based on it.

Thanks.

I think you should also do a refactor whereby SymbolList is no longer a global variable - having it as a global variable is confusing, at best, and at worst risks cases where it is still populated when it shouldn't be.

for the export list, I think we need a global variable , otherwise we need a variable in main() , and then pass into each function.  The llvm-nm is c style code, I do not think we can do all the refactor in this patch.   if you have have good idea, please let me know, I will create  a separate refactor patch after current patch land.

My suggested design I already provided in my previous comment actively needs to avoid the global variable - the proposed getSymbolsFromObject function would return this list for use by the respective calling functions. Global variables make code re-use harder, and lead to temporal dependencies, both of which this patch is inflicted with, in part due to the global SymbolList. Doing the refactor up-front would significantly improve this patch.

Since we have not reach an agreement on the getSymbolsFromObject() , I am prefer to  do "removing the global variable SymbolList" in a later separate refactor patch. Our team is been blocking on the "export symbol list", I understand your concern about the quality of the code,  but I do not think "removing a global variable SymbolList" is in the scope of the patch.
  1. printExportSymbolList loops over all inputs and calls getSymbolsFromObject , I think it need to share the code the dumpSymbolNamesFromFile. and we need to added a additional if(ExportSymbols) in the dumpSymbolNamesFromFile to go to different function .

I'm not sure I follow. In my proposed design, dumpSymbolNamesFromFile should never be touched by the export symbols code route, only getSymbolsFromObject.

when printExportSymbolList loops over all inputs ,  we still need to get object files from archive or from  MachOUniversalBinary, TapiUniversal,  we need the functionality  in  "dumpSymbolNamesFromFile" ,  so we still need the dumpSymbolNamesFromFile() for printExportSymbolList().

If we need the object fetching code from dumpSymbolNamesFromFile then just pull that code into a separate function too... Here's what I'm thinking:

// Class for storing the binary and passing around associated properties, in case it's an object.
struct NMObject {
  std::unique_ptr<Binary> Bin; // Possibly any other members that might be necessary to store archive members.
  StringRef ArchiveName;
  StringRef ArchitectureName;
};

static std::vector<NMSymbol> getSymbolsFromDLInfoMachO(MachOObjectFile &MachO) {
  std::vector<NMSymbol> Symbols;
  // As original implementation of dumpSymbolsFromDLInfoMachO, but replace all references to SymbolList with Symbols.
  return Symbols;
}

static std::vector<NMSymbol> getSymbolsFromObject(NMObject &Obj) {
  std::vector<NMSymbol> Symbols;
  // Parts of original dumpSymbolNamesFromObject that get the symbols that are to be printed.
  // Filtering will be done later, rather than here, i.e. don't add the exportSymbolsForXCOFF call here.
  return Symbols;
}

static void dumpSymbolNamesFromObject(NMObject &Obj) {
  std::vector<NMSymbol> Symbols = getSymbolsFromObject(Obj);
  Symbols = sortSymbolList(std::move(Symbols)); // Or sort in place
  printSymbolList(Symbols, Obj, printName);
}

static std::vector<NMObject> getObjectsFromFile(StringRef InputFile) {
  std::vector<NMObject> Objects;
  /*
    Code from dumpSymbolNamesFromFile which retrieves the objects (and archive properties, if appropriate) inside a binary (may be a single object, or many).
    Emplace an NMObject in the vector for each such constructed object, instead of calling dumpSymbolNamesFromObject.
  */
  return Objects;
}

static void dumpSymbolNamesFromFile(StringRef InputFile) {
  std::vector<NMObject> Objects = getObjectsFromFile(InputFile);
  llvm::for_each(Objects, dumpSymbolNamesFromObject);
}

std::vector<NMSymbol> getExportSymbols(ArrayRef<NMSymbol> Symbols, NMObject &Obj) {
  std::vector<NMSymbol> ExportSymbols;
  if (auto *XCOFF = dyn_cast<XCOFFObjectFile>(Obj.Bin)) {
    // Do what's necessary to find the symbols appropriate for XCOFF export symbols.
  }
  // TODO: Add implementations for other objects here. Possibly warn otherwise.
  return ExportSymbols;
}

static void dumpExportSymbolList(ArrayRef<StringRef> InputFilenames) {
  std::vector<NMSymbol> Symbols;
  for(StringRef InputFile : InputFilenames) {
    std::vector<NMObject> FileObjects = getObjectsFromFile(InputFile);
    for(NMObject &Object : FileObjects) {
      std::vector<NMSymbol> ObjSymbols = getSymbolsFromObject(InputFile);
      ObjSymbols = getExportSymbols(ObjSymbols, Object);
      // Add ObjSymbols to the end of Symbols.
    }
  }
  Symbols = sortSymbolList(Symbols); // Or sort in place
  SymbolList.erase(std::unique(Symbols.begin(), Symbols.end()), Symbols.end());
  printExportSymbols(Symbols);
}

int main() {
  /* ... */
  if(ExportSymbols)
    dumpExportSymbolList(InputFilenames);
  else
    llvm::for_each(InputFilenames, dumpSymbolNamesFromFile);
}

I've not tested this structure out, but I'm pretty confident that it, likely with some small tweaks is both a) not a massive departure from the current structure (meaning we don't need to rewrite large amounts of code), and b) will resolve the structural issues with this patch.

thanks for your sample code, I agree with you, but the patch is quite large now. if I implement as your suggestion in the patch , I believe that we will need another refactor patch which split the dumpSymbolNamesFromFile function into several small function before the patch. Our patch https://reviews.llvm.org/D119147 is depend on the patch. My suggestion is that I will do refactor as your suggestion after the current patch land and at least let the change code in XCOFFObjectFile.cpp , test case, llvm-nm.rst etc which not related to the refactor landed first, and we can focus on refactoring the llvm-nm.cpp.

if you strong suggest that we should do as your suggestion in the patch first, I can do it and postpones our schedule. thanks.

FYI @hubert.reinterpretcast .

Okay, I'm willing to accept delaying further refactoring, as long as it is definitely going to happen sooner rather than later. Just a few nits to tidy up before I think you can land this patch.

llvm/test/tools/llvm-nm/XCOFF/export-symbols-with-invalid-section-index.test
1 ↗(On Diff #406935)

Could you fold this test case into the positive test case file? Would that allow you to reuse the YAML object, using -D to provide the invalid section index?

llvm/tools/llvm-nm/llvm-nm.cpp
254 ↗(On Diff #406068)

Just noting this hasn't been done, although you can defer to another patch if you prefer.

1784–1788 ↗(On Diff #406068)

Let's at least in this patch pull the block outside of the if (!(MachO && DyldInfoOnly)) clause, since it doesn't need to be inside it.

1849 ↗(On Diff #406068)

But the code didn't have this before, so why does it need it now? Is this a result of some of the refactoring work you've already done? I take it there's a concrete reproducible you could craft that would cause the crash without this change?

1658 ↗(On Diff #406935)

Let's rename this function now: getXCOFFExports seems like a nice concise name.

DiggerLin marked 5 inline comments as done.EditedFeb 9 2022, 10:00 AM

Okay, I'm willing to accept delaying further refactoring, as long as it is definitely going to happen sooner rather than later. Just a few nits to tidy up before I think you can land this patch.

Thanks, I can start the refactor from next week.

Okay, I'm willing to accept delaying further refactoring, as long as it is definitely going to happen sooner rather than later. Just a few nits to tidy up before I think you can land this patch.

Thanks a lot , I will begin to refactor as soon as possible, I think I can begin to refactor  from late of next week.
llvm/test/tools/llvm-nm/XCOFF/export-symbols-with-invalid-section-index.test
1 ↗(On Diff #406935)

thanks.

llvm/tools/llvm-nm/llvm-nm.cpp
254 ↗(On Diff #406068)

done ,thanks

1784–1788 ↗(On Diff #406068)

thanks

1849 ↗(On Diff #406068)

the function shouldSkipPrinting() is called
in the printExportSymbolList()
but the printExportSymbolList are called after dumpSymbolNamesFromFile , The object binary file maybe be freed from memory when there are several files inputs from command line. in the function dumpSymbolNamesFromFile() ,

Expected<std::unique_ptr<Binary>> BinaryOrErr =
      createBinary(BufferOrErr.get()->getMemBufferRef(), ContextPtr);

it generate a binary and freed it at the end of the function . so if we still try to get the SymFlags of symbol from bit code object file , it will be crashed.

so I set the SymFlag for getSymbolNamesFromObject()
as

if (S.initializeFlags(Obj))
  SymbolList.push_back(S);

dumpSymbolsFromDLInfoMachO() initiate the SymFlag for SymFlags . but getSymbolNamesFromObject do not set it, for consistency, I set the SymFlag for the NMSymbol in getSymbolNamesFromObject

1658 ↗(On Diff #406935)

thanks

DiggerLin marked 4 inline comments as done.Feb 9 2022, 10:01 AM
DiggerLin updated this revision to Diff 407285.EditedFeb 9 2022, 1:53 PM
  1. address comment.
  2. move the functionality of "removing the symbols which should not be printed" to before the functionality of "removing duplication symbols". and add a new test for it. Name: __tf2value Section: .data Type: 0x0 StorageClass: C_HIDEXT AuxEntries:
    • Type: AUX_CSECT SymbolAlignmentAndType: 0x21 StorageMappingClass: XMC_TC

Sometime we have two symbols, which both has the same name and no visibility, one's storageClass is C_HIDEXT (the should not be export), another is C_EXT(the one should be exported).
When compare these two symbols for "--export-symbols", these two symbol are same, if we remove duplication symbol before removing the symbols which should not print. the Symbol with C_EXT maybe removed. it will cause none of the two symbol be printed out.

jhenderson added inline comments.Feb 10 2022, 1:40 AM
llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
5 ↗(On Diff #407285)

Tip: you can avoid the need for -DSECT=2 in the "regular" case, by using a default value in the YAML. I can't remember if the syntax is [[SECT=2]] or [[SECT:2]], but one of them should work (look for examples in other tests).

llvm/tools/llvm-nm/llvm-nm.cpp
1849 ↗(On Diff #406068)

Thanks. It would be nice we can avoid this issue with the proposed refactoring (and therefore delete this new code), but if it's needed for now, so be it.

701 ↗(On Diff #407285)

Nit: clang-format is complaining.

1668 ↗(On Diff #407285)

Should this be a continue rather than a return? Also, should it be closer to the rest of the name checks further down in this loop?

1677–1713 ↗(On Diff #407285)

I haven't got the time right now to check this myself, but you should ensure that you have symbols that trigger every possible code path that lead to them being skipped/not skipped.

A quick skim suggests at least the following:

  1. INTERNAL visibility
  2. HIDDEN visibility
  3. No visibility attribute
  4. Visibility attribute that isn't internal or hidden.
  5. Symbol with error when looking up section
  6. Symbol that is not in a section (SecIter == XCOFFObj->section_end())
  7. Text symbol
  8. Data symbol
  9. BSS symbol
  10. Symbol that is not one of the three previous types
  11. Symbol with empty name
  12. Symbol with name look-up failure
  13. __sinit prefixed symbol
  14. __sterm prefixed symbol
  15. . prefixed symbol
  16. ( prefixed symbol
  17. ____ named symbol (i.e. four underscores, nothing between prefix and suffix)
  18. __<digits>__ symbol
  19. __<digits> (no suffix)
  20. <digits>__ (no prefix)
  21. __<digits and something non digit>__ symbol
1720–1721 ↗(On Diff #407285)

This block appears after the name adjustments for __tf1 and __tf9. If the intent is that __tf1__rsrc and __tf9_rsrc are skipped with --no-rsrc, you need to check them. In fact, it probably doens't hurt to have that test case even if they should be kept.

1773–1777 ↗(On Diff #407285)

I'd be tempted to move this block right to the start of the function, since the other bits are completely irrelevant if we trigger this block.

DiggerLin updated this revision to Diff 407656.Feb 10 2022, 1:11 PM
DiggerLin marked 5 inline comments as done.
DiggerLin added inline comments.Feb 10 2022, 1:15 PM
llvm/tools/llvm-nm/llvm-nm.cpp
1668 ↗(On Diff #407285)

thanks

1677–1713 ↗(On Diff #407285)

I added all test cases you mention above except the "12 Symbol with name look-up failure"

from the source code in the XCOFFObjectFile.cpp

Expected<StringRef> XCOFFSymbolRef::getName() const {
  // A storage class value with the high-order bit on indicates that the name is
  // a symbolic debugger stabstring.
  if (getStorageClass() & 0x80)
    return StringRef("Unimplemented Debug Name");

  if (Entry32) {
    if (Entry32->NameInStrTbl.Magic != XCOFFSymbolRef::NAME_IN_STR_TBL_MAGIC)
      return generateXCOFFFixedNameStringRef(Entry32->SymbolName);

    return OwningObjectPtr->getStringTableEntry(Entry32->NameInStrTbl.Offset);
  }

  return OwningObjectPtr->getStringTableEntry(Entry64->Offset);
}

It never return an Error.

The code in the function exportSymbolsForXCOFF

Expected<StringRef> NameOrErr = Sym.getName();
   if (!NameOrErr) {
     warn(NameOrErr.takeError(), XCOFFObj->getFileName(),
          "for symbol with index " +
              Twine(XCOFFObj->getSymbolIndex(Sym.getRawDataRefImpl().p)),
          ArchiveName);
     continue;
   }
   StringRef SymName = *NameOrErr;

can be modified to

StringRef SymName = cantFail(Sym.getName());

but I prefer to keep as it is. The reason as:

if Expected<StringRef> XCOFFSymbolRef::getName() const is changed to return with Error in some day.

using StringRef SymName = cantFail(Sym.getName()); will cause an llvm-unreable.

1720–1721 ↗(On Diff #407285)

good catch, thanks , the tf1_rsrc should be export out as "__rsrc" , I changed the code and add test case for it.

jhenderson added inline comments.Feb 11 2022, 12:13 AM
llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
72 ↗(On Diff #407656)

It's probably worth adding a comment to the start of each symbol block in this YAML explaining what case(s) that symbol covers.

llvm/tools/llvm-nm/llvm-nm.cpp
1677–1713 ↗(On Diff #407285)

I think you should switch to cantFail, unless you know that you are going to change getName soon. The reasons are:

  1. it simplifies the code dramatically, making it more readable.
  2. you can't test the current code, so there's no guarantee that even if getName is changed that it is handled here appropriately.
  3. if getName is changed to return an Error at some point, call sites should be audited for additional checks/tests that are needed. As such, this case should be picked up anyway.
DiggerLin marked 3 inline comments as done.Feb 14 2022, 12:23 PM
DiggerLin added inline comments.
llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
72 ↗(On Diff #407656)

I think most the symbol name can express what we want to test. I only added some visibility comment on the YAML . and symbol name "var_extern" which test the

if (SecIter == XCOFFObj->section_end())
  continue;
5 ↗(On Diff #407285)

thanks a lot

llvm/tools/llvm-nm/llvm-nm.cpp
1677–1713 ↗(On Diff #407285)

thanks

DiggerLin marked 3 inline comments as done.

We're basically there (for now). Just some test tidy-ups/clarifications remaining.

llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
55 ↗(On Diff #408561)

I couldn't find a symbol that fits this test case: "Symbol that is not one of the three previous types", i.e. that isn't text, data, or bss. Please add a comment/rename the symbol to highlight it.

119 ↗(On Diff #408561)

I might have missed it, but I think this is the only .text symbol? I believe you want a .text symbol that is actually exported (to show that such symbols can be exported). It's okay for this to be combined with another property that is being tested (e.g. protected visibility).

195–196 ↗(On Diff #408561)

I noticed that the "StorageClass" is different between this and the "Exported" case. If you changed it to C_EXT, would the symbol be exported? I'm assuming that the "Type" field is the one that defines visibility here...

More generally, for the cases where the symbol isn't printed, if possible there should be exactly one thing that would need changing to cause it to be printed. This goes for every case (hidden, internal, empty name etc).

220–221 ↗(On Diff #408561)

I think this is the "symbol not in a section case?" If so, perhaps name it more explicitly, and/or add a comment.

226 ↗(On Diff #408561)

Probably should have a comment to highlight what is significant about this case.

235 ↗(On Diff #408561)

Probably should have a comment to highlight what is significant about this case.

244 ↗(On Diff #408561)

Probably should have a comment to highlight what is significant about this case.

253 ↗(On Diff #408561)

Probably should have a comment to highlight what is significant about this case.

llvm/tools/llvm-nm/llvm-nm.cpp
1661 ↗(On Diff #408561)

SymbolRef is designed for lightweight copying, like StringRef, so no need for const &.

DiggerLin marked 7 inline comments as done.Feb 15 2022, 12:40 PM
DiggerLin added inline comments.
llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
55 ↗(On Diff #408561)

the .debug in the section .debug which is not the text, data, or bss.

The

Name:            debug
 Section:         .debug

I think symbol name 'debug" and Section Name ".debug" can explain itself.

119 ↗(On Diff #408561)
Name:            .func
Section:         .text

means the symbol ".func" in the .text section.

 Name:  .text
Section: .text

means the symbol ".text" in the .text section.

 Name:            .func
 Section:         .text 
 
test

Do not export global symbols beginning with "."

195–196 ↗(On Diff #408561)

I noticed that the "StorageClass" is different between this and the "Exported" case. If you changed it to C_EXT, would the symbol be exported? I'm assuming that the "Type" field is the one that defines visibility here.

If you changed it to C_EXT, and visibility is Hidden, it not exported. "Type" field is the one that defines visibility.

for empty Name. it is generated by the IBM compiler xlclang, which storage name always is C_HIDEXT, always not exported. and it is no sense to export empty symbol name.

I changed StorageClass of empty symbol, "internal_var"
, "hidden_var" to C_EXT.

and add a new test

  - Name:            hidext_var
    Section:         .data
## Protected visibility.
    Type:            0x3000
    StorageClass:    C_HIDEXT
    AuxEntries:
     - Type:                   AUX_CSECT
       SymbolAlignmentAndType: 0x09
       StorageMappingClass:    XMC_RW
220–221 ↗(On Diff #408561)

I change to undef_var, which is not in any section

llvm/tools/llvm-nm/llvm-nm.cpp
1661 ↗(On Diff #408561)

no copy is not better than any lightweight copying ? I change as your suggestion anyway.

DiggerLin updated this revision to Diff 409023.Feb 15 2022, 1:03 PM
DiggerLin marked 5 inline comments as done.
jhenderson added inline comments.Feb 16 2022, 1:53 AM
llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
55 ↗(On Diff #408561)

I'd like a comment that says "A symbol that is neither text, nor data, nor bss." specifically, to show that it's not that the symbol is a debug symbol, but rather that it isn't one of those three categories. That way, if behaviour were to change in the future, and debug symbols became special, there wouldn't be a risk of loss of coverage (in theory).

119 ↗(On Diff #408561)

When I say ".text" symbol, I mean a symbol in ".text". The current logic has SecIter->isText() as a possible condition for exporting. The .func test case hits this, but is then not exported for other reasons. You need a test case that also hits this, and is exported. Otherwise, you could delete that part of the conditional and you'd see no test failures.

llvm/tools/llvm-nm/llvm-nm.cpp
1661 ↗(On Diff #408561)

You left the const in, please remove it ;)

I wouldn't be surprised if the compiler optimizes away the copy anyway, since there's no modification of Sym. The copy is then less code, so makes it a little easier to read.

DiggerLin updated this revision to Diff 409252.Feb 16 2022, 7:48 AM
DiggerLin marked an inline comment as done.
DiggerLin added inline comments.
llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
119 ↗(On Diff #408561)

I changed the

Name:            __tf1_tf1value
Section:         .data

-->

Name:            __tf1_tf1value
Section:         .text
jhenderson accepted this revision.Feb 16 2022, 7:51 AM

Okay, looks good from me. Best wait for @MaskRay too.

MaskRay accepted this revision.Feb 16 2022, 8:38 PM

LGTM.

llvm/docs/CommandGuide/llvm-nm.rst
280 ↗(On Diff #409252)
llvm/test/tools/llvm-nm/bitcode-export-sym.test
1 ↗(On Diff #409252)

Add # REQUIRES: powerpc-registered-target

This revision is now accepted and ready to land.Feb 16 2022, 8:38 PM
This revision was landed with ongoing or failed builds.Feb 17 2022, 8:37 AM
This revision was automatically updated to reflect the committed changes.
DiggerLin marked 2 inline comments as done.Feb 17 2022, 8:39 AM
DiggerLin added inline comments.
llvm/docs/CommandGuide/llvm-nm.rst
280 ↗(On Diff #409252)

thanks

llvm/test/tools/llvm-nm/bitcode-export-sym.test
1 ↗(On Diff #409252)

I wonder why we need # REQUIRES: powerpc-registered-target here . I think the llvm is cross compile, it can generate the bit code for powerpc on other platform, I added the line anyway.