Page MenuHomePhabricator

export unique symbol list for xcoff with llvm-nm new option "--export-symbols"
Needs ReviewPublic

Authored by DiggerLin on Oct 28 2021, 10:12 AM.

Details

Reviewers
jhenderson
Esme
sfertile
hubert.reinterpretcast
daltenty
MaskRay
EGuesnet
Group Reviewers
Restricted Project
Summary

the patch implement of following functionality.

  1. export the symbols of the xcoff archive or object files. (export the extern and extern weak symbol)
  2. sort the export symbols. (based on same symbol name and visibility)
  3. delete the duplicate export symbols (based on same symbol name and visibility)
  4. print out the unique and sorted export symbols (print the symbol name and visibility).

there new options are add in the patch

  1. --export-symbols (enable the functionality of export unique symbol)
  2. --no-rsrc (exclude the symbol name begin with "__rsrc" from be exporting.

The patch has the same functionality as
IBM CreateExportList

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
DiggerLin updated this revision to Diff 383833.Nov 1 2021, 10:38 AM
DiggerLin marked an inline comment as done.

Does this option really belong in llvm-objdump? It seems to me like it's for a different tool, probably llvm-nm (which simply prints symbol lists). In particular, I don't like the "suppresses other options" bit of the help message. llvm-objdump and llvm-readobj usually just do everything that's asked of them.

The additional supporting options belong in independent patches, based on the first. If you switch to using llvm-nm, you'll find that you probably won't need the --exclude-weak option, since llvm-nm already has a --no-weak option. In fact, you may be able to get away without a new option for --export-unique-symbol, by using existing llvm-nm options, such as --dynamic, I'm not sure.

Please take a look at llvm-nm, and see what you think.

Hi James,

if you run llvm-objdump --help

you will see the

llvm-objdump MachO Specific Options: 
   --exports-trie         Display mach-o exported symbols

the option "--export-unique-symbol" do the same thing for xcoff.

--export-unique-symbol  Export symbol list for xcoff object file or archive

and if you see the source of option
--macho Use MachO specific object file parser

it also suppresses other non-MachO options

in the llvm-objdump.cpp

static void dumpInput(StringRef file) {

// If we are using the Mach-O specific object file parser, then let it parse
// the file and process the command line options.  So the -arch flags can
// be used to select specific slices, etc.
if (MachOOpt) {
  parseInputMachO(file);
  return;
}

if (ExportUniqueSymbol) {
  exportSymbolInfoFromFile(file);
  printExportSymboList(outs());
  return;
}

Yes, this is true, but llvm-objdump for Mach-O (i.e. with the --macho option) is almost completely unrelated to llvm-objdump for other platforms. This, in my opinion, was a bit of a historical design mistake as most of llvm-objdump, aside from this, is generic, but there may be/have been good reasons to do it, based on the comment at the start of dumpInput. I think we can do better with XCOFF, and I think the functionality fits better with the existing llvm-nm functionality, and will enable greater code reuse. Note also that the use of the --exports-trie option does not suppress the use of other Mach-O options, like --bind or --rpaths, so your current proposal differs in this regards (--macho is not a normal option: it basically tells llvm-objdump to run completely differently, rather than just enabling specific dumps like most switches).

DiggerLin added a comment.EditedNov 2 2021, 8:20 AM

Does this option really belong in llvm-objdump? It seems to me like it's for a different tool, probably llvm-nm (which simply prints symbol lists). In particular, I don't like the "suppresses other options" bit of the help message. llvm-objdump and llvm-readobj usually just do everything that's asked of them.

The additional supporting options belong in independent patches, based on the first. If you switch to using llvm-nm, you'll find that you probably won't need the --exclude-weak option, since llvm-nm already has a --no-weak option. In fact, you may be able to get away without a new option for --export-unique-symbol, by using existing llvm-nm options, such as --dynamic, I'm not sure.

Please take a look at llvm-nm, and see what you think.

Hi James,

if you run llvm-objdump --help

you will see the

llvm-objdump MachO Specific Options: 
   --exports-trie         Display mach-o exported symbols

the option "--export-unique-symbol" do the same thing for xcoff.

--export-unique-symbol  Export symbol list for xcoff object file or archive

and if you see the source of option
--macho Use MachO specific object file parser

it also suppresses other non-MachO options

in the llvm-objdump.cpp

static void dumpInput(StringRef file) {

// If we are using the Mach-O specific object file parser, then let it parse
// the file and process the command line options.  So the -arch flags can
// be used to select specific slices, etc.
if (MachOOpt) {
  parseInputMachO(file);
  return;
}

if (ExportUniqueSymbol) {
  exportSymbolInfoFromFile(file);
  printExportSymboList(outs());
  return;
}

Yes, this is true, but llvm-objdump for Mach-O (i.e. with the --macho option) is almost completely unrelated to llvm-objdump for other platforms. This, in my opinion, was a bit of a historical design mistake as most of llvm-objdump, aside from this, is generic, but there may be/have been good reasons to do it, based on the comment at the start of dumpInput. I think we can do better with XCOFF, and I think the functionality fits better with the existing llvm-nm functionality, and will enable greater code reuse. Note also that the use of the --exports-trie option does not suppress the use of other Mach-O options, like --bind or --rpaths, so your current proposal differs in this regards (--macho is not a normal option: it basically tells llvm-objdump to run completely differently, rather than just enabling specific dumps like most switches).

Does this option really belong in llvm-objdump? It seems to me like it's for a different tool, probably llvm-nm (which simply prints symbol lists). In particular, I don't like the "suppresses other options" bit of the help message. llvm-objdump and llvm-readobj usually just do everything that's asked of them.

The additional supporting options belong in independent patches, based on the first. If you switch to using llvm-nm, you'll find that you probably won't need the --exclude-weak option, since llvm-nm already has a --no-weak option. In fact, you may be able to get away without a new option for --export-unique-symbol, by using existing llvm-nm options, such as --dynamic, I'm not sure.

Please take a look at llvm-nm, and see what you think.

Hi James,

if you run llvm-objdump --help

you will see the

llvm-objdump MachO Specific Options: 
   --exports-trie         Display mach-o exported symbols

the option "--export-unique-symbol" do the same thing for xcoff.

--export-unique-symbol  Export symbol list for xcoff object file or archive

and if you see the source of option
--macho Use MachO specific object file parser

it also suppresses other non-MachO options

in the llvm-objdump.cpp

static void dumpInput(StringRef file) {

// If we are using the Mach-O specific object file parser, then let it parse
// the file and process the command line options.  So the -arch flags can
// be used to select specific slices, etc.
if (MachOOpt) {
  parseInputMachO(file);
  return;
}

if (ExportUniqueSymbol) {
  exportSymbolInfoFromFile(file);
  printExportSymboList(outs());
  return;
}

Yes, this is true, but llvm-objdump for Mach-O (i.e. with the --macho option) is almost completely unrelated to llvm-objdump for other platforms. This, in my opinion, was a bit of a historical design mistake as most of llvm-objdump, aside from this, is generic, but there may be/have been good reasons to do it, based on the comment at the start of dumpInput. I think we can do better with XCOFF, and I think the functionality fits better with the existing llvm-nm functionality, and will enable greater code reuse.

There are XCOFFDump.cpp(COFFDump.cpp, MachODump.cpp) in the llvm-objdump framework, we can put different object format related implement in the XXXDump.cpp, if I implement the "--export-unique-symbol" option in the llvm-nm, It is not a good ideal to put the "--export-unique-symbol" related implement in the llvm-nm.cpp and it is not a  good idea to create  a separate  the XCOFFDump.cpp for the llvm-nm in the llvm-nm directory.

Note also that the use of the --exports-trie option does not suppress the use of other Mach-O options, like --bind or --rpaths, so your current proposal differs in this regards (--macho is not a normal option: it basically tells llvm-objdump to run completely differently, rather than just enabling specific dumps like most switches).

OK, I agree with you, I can implement the "--export-unique-symbol" without suppress other options.

As I already explained, I believe most of the functionality you need is already present in llvm-nm:

export the symbols of the xcoff archive or object files. (export the extern and extern weak symbol)

Assuming you mean print, llvm-nm already prints all symbols or a subset of symbols (you can use the --extern-only and/or potentially the --dynamic option (if needed - I don't know if XCOFF has a dynamic/static symbol distinction) to achieve this in llvm-nm.

delete the duplicate export symbols (which has same symbol name and visibility)

This functionality doesn't exist in llvm-nm yet to my knowledge, but would be fine, and probably simpler, to add there under a new option (e.g. --unique), since it already has lists of symbols to enable sorting. It would be trivial to uniquify that list.

sort the export symbols.

llvm-nm already provides various options for sorting, so it is unlikely you'd need to add anything here.

print out the unique and sorted export symbols (print the symbol name and visibility).

Printing out the visibility would be new, but I think is generic to at least ELF and XCOFF, so it would be easy to add a new column to llvm-nm's output, behind some option (e.g. --show-visibility).

You could then enable all the new options at once with a new XCOFF-specific llvm-nm option: --xcoff-unique-exports.

DiggerLin added a comment.EditedNov 2 2021, 9:25 AM

llvm-nm print out print out the "symbol address" , "symbol flag" , "symbol name" for each symbol
for option "--export-unique-symbol" , we do not need to print out the "symbol address" ,"symbol flag" for each symbol, instead of we need "symbol visibility" for each symbol.

if we implement the option "--export-unique-symbol" in the llvm-nm. we need to modify the following code in llvm-nm.cpp

  1. modify llvm-nm: sortAndPrintSymbolList() to disable printing the "symbol address" , "symbol flag" and enable printing "symbol visibility" if there is option "--export-unique-symbol" .
  1. and add a new visibility attribute to
struct NMSymbol  {
  uint64_t Address;
  uint64_t Size;
  char TypeChar;
  std::string Name;
  StringRef SectionName;
  StringRef TypeName;
  BasicSymbolRef Sym;
  // The Sym field above points to the native symbol in the object file,
  // for Mach-O when we are creating symbols from the dyld info the above
  // pointer is null as there is no native symbol.  In these cases the fields
  // below are filled in to represent what would have been a Mach-O nlist
  // native symbol.
  uint32_t SymFlags;
  SectionRef Section;
  uint8_t NType;
  uint8_t NSect;
  uint16_t NDesc;
  std::string IndirectName;
};
  1. add new compare function to compare (symbol name and symbol visibility") .
  1. we also have some special xcoff symbol do not be exported . for example symbols begin with "sinit", "sterm", "__tf1" do not be exported, we need to add function in the llvm-nm to filter xcoff special symbol.
  1. add a new option "--unique".

if we implement the option in ""--export-unique-symbol" in llvm-objdump.cpp, we only need to add hook to invoke "exportSymbolInfoFromXCOFFObjectFile()" (the function is implemented in the XCOFFDump.cpp, and we implement all the export_symbol_list functionality in the exportSymbolInfoFromXCOFFObjectFile function ) in llvm-objdump.cpp.
it is much clear and transparent.

If you strong suggest that we should implement the functionality in llvm-nm, I can do it.

DiggerLin updated this revision to Diff 384169.Nov 2 2021, 11:24 AM

address the comment of the option "--export-unique-symbol" not suppressing other options.

DiggerLin updated this revision to Diff 384235.Nov 2 2021, 2:31 PM

I've already stated my case. Let's see what @MaskRay thinks.

@MaskRay , can we have your option on implementing the functionality on llvm-nm or llvm-objdump?

MaskRay added a comment.EditedNov 8 2021, 11:37 AM

Sorry for my slow response.

My concern regarding the necessity of SF_XCOFF_Protected remains.
The BasicSymbolRef::Flags members are for more widely-used attributes.
An ELF/Mach-O symbol has many attributes but very few made themselves in the list.
I understand that XCOFF has protected symbols, but that is not sufficient justification putting it in the list: ELF doesn't do that.
You can bypass that by just dealing with the raw symbol information.
So, I'll keep the "Request Changes" on, until this has been resolved.

AIUI objdump originated from GNU binutils. (Then other projects ported it.)
I agree that Mach-O's overloading "objdump" (a number of Mach-O specific dumping options) was a historical design mistake.
The interface is so different. It probably should just have been separately named upfront, like "llvm-otool" we have today.

This patch focuses on symbols and I assume that you may add more symbol related dumping options.
Since you appear to be (a bit) open to the tool (to add the feature) and the option names, I agree with @jhenderson that llvm-nm is more suitable.
Some functionality you want is already there, e.g. --no-weak

if we implement the option "--export-unique-symbol" in the llvm-nm. we need to modify the following code in llvm-nm.cpp

modify llvm-nm: sortAndPrintSymbolList() to disable printing the "symbol address" , "symbol flag" and enable printing "symbol visibility" if there is option "--export-unique-symbol" .
[...]

I have read some but not much Mach-O code in llvm-nm. The Mach-O part of llvm-nm really needs some refactoring in my view (but nobody has motivation to do that now).
My gut feeling says that some Mach-O specific fields (NType/NSect/NDesc) in NMSymbol may be a design mistake.
It may be better putting the extra symbol information in a parallel table.
If the information can be easily extracted from BasicSymbolRef (perhaps a derived class), extracting the information where the dumper needs it (i.e. not storing them at all) may be better.
Since you are adding new functionality, I believe some better choices can be made and you don't necessarily follow its steps (copy its mistake).

%p/Inputs/big-archive-libtest.a

I think @jhenderson has mentioned this many times in the past, it'd be past to use yaml2obj to generate tests and avoid precanned binaries.
If yaml2obj is not ready, that perhaps means XCOFF contributors can prioritize on making that tool mature.

implement "create export list" based on llvm-nm

DiggerLin retitled this revision from export unique symbol list for xcoff with llvm-objdump new option "--export-unique-symbol" to export unique symbol list for xcoff with llvm-nm new option "--export-symbols".Wed, Dec 22, 6:16 AM
DiggerLin edited the summary of this revision. (Show Details)
jhenderson added inline comments.Wed, Jan 5, 4:08 AM
llvm/include/llvm/BinaryFormat/XCOFF.h
72

Should this be "Data Execution Protection" (i.e. upper-case "D")?

74
78
80–82
llvm/include/llvm/Object/SymbolicFile.h
122 ↗(On Diff #383076)

I think what @MaskRay is suggesting is to not use getSymbolFlags to get the symbol visibility for XCOFF, and to instead have another function (e.g. "getSymbolVisibility") which is defined in the XCOFFObjectFile interface, which returns some enum value or similar that is specific to XCOFF.

At the moment, as far as I can tell, the only use of this visibility is within code that already takes an XCOFFObjectFile, so there is no need to extend the getSymbolFlags function (and underlying SF_* enum).

llvm/include/llvm/Object/XCOFFObjectFile.h
59–70

These functions should have blank lines between them.

llvm/lib/BinaryFormat/Magic.cpp
91–94

This is unrelated, I think?

llvm/lib/Object/XCOFFObjectFile.cpp
621

Delete blank line.

623–628

ELF has "protected" and "internal" visibility, but doesn't have the problem you are trying to solve here, with the addition of SF_Internal, and not bothering with "protected". Perhaps you should see what happens with ELF objects and what flags are used (if any) for visibility there?

llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_ar.test
1 ↗(On Diff #395737)

This test is using llvm-nm, but is in the llvm-objdump directory. Please fix.

Rather than introduce a canned archive to support testing this functionality, I would strongly prefer that llvm-ar support for Big Archives be finished instead, so that the archive can be created at test time. Alternatively, don't attempt to support archives in this patch, and add that in a later one.

llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test
1 ↗(On Diff #395737)

See comment above - wrong place for this test.

Also no need to spell out what the new option does here (especially as "exporting symbol list" is not particularly clear).

4 ↗(On Diff #395737)

You probably want two different input files that both contain symbols, as regular llvm-nm output prints each object's symbols separately, rather than folding them into a single list.

9 ↗(On Diff #395737)
10–12 ↗(On Diff #395737)

"Not export" -> "Do not export"

Also remove double space at start of comments.

13 ↗(On Diff #395737)
14 ↗(On Diff #395737)

Space before opening bracket in English prose.

I'm not sure what this comment is actually trying to say though. I *think* it's trying to say that's what the format is, but in that case, I don't think it's necessary, as that is what the test is testing!

15 ↗(On Diff #395737)

All of these test cases probably want an --implicit-check-not={{.}} or they may pass spuriously, due to undesired symbols appearing in the output, other than where checked.

17 ↗(On Diff #395737)

It's not clear to me what is meant by "unique" here. I'm assuming it's referring to avoiding duplicates in the output, but in that case, what is meant by a duplicate? The name? Symbol visibility? Literally the same symbol (i.e. same object file and symbol index) etc? Depending on the intent, I think you'll need significantly more testing around this test case. Based on my reading of the code, I expect you want "symbols with same name and visibility" in which case, you don't want to use a second object file. Instead, you want two symbols in the same object file, with the same name and visibility, and also a symbol with the same name but different visibility, and a symbol with a different name, but same visibility (the latter may be indirectly covered by other cases, so may not be needed).

20 ↗(On Diff #395737)
23 ↗(On Diff #395737)
26 ↗(On Diff #395737)
28 ↗(On Diff #395737)

Delete additional blank line.

155–160 ↗(On Diff #395737)
162 ↗(On Diff #395737)

I think it would be much simpler to a) have the __rsrc symbol later in the output, and b) putting it under a separate prefix, allowing you to do something like:

# SYM:           export_var export
# RSRC-NEXT: __rsrc
# SYM-NEXT: protected_var protected
# SYM-NEXT: tf1value
# SYM-NEXT: tf9value
# WEAK-SYM-NEXT: weak_func

This would remove the need for a near-duplicate set of symbols being checked. The base case would then use --check-prefixes=SYM,RSRC,WEAK-SYM, and the no __rsrc case could just omit the RSRC prefix.

166 ↗(On Diff #395737)
168 ↗(On Diff #395737)

This won't do what you think it will. This will look for the literal string ".*" and fail if it is found. You probably wanted "{{.*}}".

Using --implicit-check-not={{.*}} is a better way of doing this, as you can then avoid the --check-prefix=ANY option for this test case too.

llvm/tools/llvm-nm/Opts.td
55

Please remember to add the new options to the llvm-nm documentation.

56–57

This help text references --export-unique-symbol but I don't think that's what you mean? Furthermore, if the __rsrc symbol is just a special class fo symbols, it doesn't need to depend on --export-symbols, and instead could be used to omit it from llvm-nm's regular symbol list (and therefore indirectly the --export-symbols list too), just like --no-weak.

Finally, could this be pushed into a separate and later patch?

llvm/tools/llvm-nm/llvm-nm.cpp
109–110
233

Do you actually need this field? Can you just query the Sym member directly?

If you do need it, I think it needs to be somewhere above the comment in this class, since it's not related to Mach-O.

661

It seems to me like you're doing several separate things in this function which are different to llvm-nm's defaults:

  1. Sorting based on the visibility
  2. Filtering out duplicates
  3. Filtering out symbols with certain visibilities
  4. Printing name and visibility columns (and not other columns like type code).
  5. Merging input file symbols into a single list.

I feel like you could achieve this in a more natural way, without needing to have a completely independent method for printing the symbols. Specifically, you could implement each of these as individual new functionality in the core logic (possibly with a user-facing option per new functionality), which "--export-symbols" then specifies automatically.

For example:

  1. The symbol visibility sorting is just another sorting mode, like --size-sort, so could be added at the point where --size-sort does something.
  2. Duplicate removal would be done based on all to-be-printed pieces of information, and could be done when printing there instead.
  3. Visibility filtering could be done at the same time as e.g. weak filtering.
  4. Column printing would be controlled by a series of flags (some on by default, others off by default), which control whether each column would be printed.
  5. Input file merging could be common behaviour.

The advantage of dividing things up like this is that you can implement and test each of them independently of each other, which makes reviewing much easier. Additionally, by adding some or all of these as user-level switches, it allows users to get exactly the symbols they want in the format they want.

DiggerLin updated this revision to Diff 398248.Fri, Jan 7, 2:53 PM
DiggerLin marked 29 inline comments as done.
DiggerLin edited the summary of this revision. (Show Details)
DiggerLin added inline comments.Fri, Jan 7, 2:56 PM
llvm/include/llvm/BinaryFormat/XCOFF.h
80–82

thanks

llvm/include/llvm/Object/SymbolicFile.h
122 ↗(On Diff #383076)

I remove the define of SF_Internal = 1U << 12

llvm/lib/BinaryFormat/Magic.cpp
91–94

yes, it need , in the function

static void dumpSymbolNamesFromFile(std::string &Filename) {
 .....
   Expected<std::unique_ptr<Binary>> BinaryOrErr =
       **createBinary(BufferOrErr.get()->getMemBufferRef(), ContextPtr);**
...
 }

createBinary will call identify_magic()

llvm/lib/Object/XCOFFObjectFile.cpp
623–628

yes, in ELF, it only deal with SF_Exported and SF_Hidden in function getSymbolFlags();

if (isExportedToOtherDSO(ESym))
   Result |= SymbolRef::SF_Exported;
 
 if (ESym->getVisibility() == ELF::STV_HIDDEN)
   Result |= SymbolRef::SF_Hidden;
llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_ar.test
1 ↗(On Diff #395737)

good catch. thanks. I agree with you that we should avoid using canned archive, The functionality of "export symbol list" was originally planned to be completed by the end of last year. but considering the workload, we postponed it to the end of January this year. I think it still a long distance to have the "big archive write" patch landed(we not ye begin to review the patch ). I am sorry that I have to use canned archive in the patch.

llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test
4 ↗(On Diff #395737)

not clear about the comment, can you explain more detail?

14 ↗(On Diff #395737)

thanks

17 ↗(On Diff #395737)

Since removing duplicate symbol(with the same name and visibility) happen at the end of the code(after merge all the output). I think it is reasonable to export symbols from two same object file and check whether it remove the duplicate symbol.

and also a symbol with the same name but different visibility, and a symbol with a different name, but same visibility (the latter may be indirectly covered by other cases, so may not be needed).

I add a new symbol "export_protested_var protected" to test it.

162 ↗(On Diff #395737)

in order to simple the review, I separate the symbol check part for different test.

llvm/tools/llvm-nm/Opts.td
56–57

we only need exclude rsrc symbol when export the symbol list sometimes, we do not need to exclude the rsrc symbol for other purpose. so I do not think we need to implement as --no-weak.

56–57

This help text references --export-unique-symbol but I don't think that's what you mean? Furthermore, if the __rsrc symbol is just a special class fo symbols, it doesn't need to depend on --export-symbols, and instead could be used to omit it from llvm-nm's regular symbol list (and therefore indirectly the --export-symbols list too), just like --no-weak.

Finally, could this be pushed into a separate and later patch?

llvm/tools/llvm-nm/llvm-nm.cpp
233

yes. Visibility can query the Sym member directly, if we do not have field "StringRef Visibility" , as cost, we need to generate the "Visibility" every time when it need
we need Visibility when sort, compare duplication, and print output symbols
I am prefer to add a new field here.

661

I think over your suggestion. My opinion is that we should adding new functionality or options only when it is required.

Input file merging could be common behavior.

I  added new function PrintMergedSymbolList() (which only print symbol name and visibility(if there is)) without adding the new option "--merge-output" .  If anyone or other object file format also need the functionality only, we can add a new patch for new option at that time  to on/off the function PrintMergedSymbolList()  and deciding which others fields need to be printed out at that time.

Duplicate removal would be done based on all to-be-printed pieces of information, and could be done when printing there instead.

if we add a new option  "--no-duplicate" to remove the duplication symbol, the option is no sense to a single xcoff object file(I think a single object file never has duplication symbol).  the option "--no-duplicate" only has sense on the merging output together.  I  added a new function removeDuplicateSortedSymbolList() and we can add a new option "--no-duplicate" in later patch to on/off removeDuplicateSortedSymbolList() if it is need.

Column printing would be controlled by a series of flags

I think this is a good idea, we can add the functionality in a separate and not urgent patch, and current patch will not depend on it.

The symbol visibility sorting is just another sorting mode, like --size-sort, so could be added at the point where --size-sort does something.

I add a new functionality compareSymbolNameVisibility, it only enable on the --option "--export-symbols" . if elf object file also need the functionality,  we can add a new option to enable compareSymbolNameVisibility() at that time.

the option will "--export-symbols" will enable the three functions at same times in current patch.
I think other new options only need to be added when not all of the three functions need to be enabled at the same time.

Visibility filtering could be done at the same time as e.g. weak filtering.

Visibility only be printed at "export symbol list" and it is always need to be printed out if there is.  I will add new option "Visibility filtering"  in later patch only there is requirement.
DiggerLin added inline comments.Fri, Jan 7, 3:04 PM
llvm/tools/llvm-nm/llvm-nm.cpp
661

sorry , it should be
"My opinion is that we should add new functionality or options only when it is need."
in above comment.

The functionality of "export symbol list" was originally planned to be completed by the end of last year. but considering the workload, we postponed it to the end of January this year. I think it still a long distance to have the "big archive write" patch landed(we not ye begin to review the patch ). I am sorry that I have to use canned archive in the patch.

This isn't really how the LLVM community works. Whilst I understand you may have company deadlines to meet, we want to ensure the quality of the code/tests etc that land in LLVM is good enough, without incurring unreasonable technical debt. In this case, this would be unreasonable, in my opinion: you can implement the majority of the export symbol list functionality without needing archive support. Archive support can and should be a separate and later patch. Actually, now that I think about it, you don't really need Big Archive support at all for this patch: use llvm-ar to create a regular Unix-style archive containing XCOFF objects at test time, and just use that to test the functionality provided in this patch. A separate patch, as noted inline, could include the idenetify magic code for Big archives, and this can be unit tested using gtest. This should be sufficient test coverage (combined with testing of big archive reading in the patch that is already under review) to cover this, without needing to ever have a canned big archive for this test. At a later point, if you wish, you could update the regular archive to a Big archive, once llvm-ar can create them.

With regards to the big archive writing patch, I've deliberately avoided reviewing that until the reading patch has landed. Please ping me on the writing patch, once it has, so that I don't forget about it. Also, for your information, I am off for two weeks from Monday. Maybe @MaskRay could continue the review in my absence?

llvm/docs/CommandGuide/llvm-nm.rst
231–232

This fixes various grammar issues, but also I think simplifies what you're saying to the bare minimum. Please rewrap to 80 columns too.

Also, please note that options in this document are ordered alphabetically, although actually this option should be in the XCOFF-specific section.

llvm/lib/BinaryFormat/Magic.cpp
91–94

It's still not really related to this patch: this should be part of the Big Archive support patch series. Note that for object file support, you don't need this code.

llvm/test/tools/llvm-nm/XCOFF/export_sym_list_ar.test
1

I strongly recommend you move archive support into a separate patch.

llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test
1

Use - not _ in test names. I think you should just drop the "obj" part of the name, and merge all the test cases into one file, as it keeps the functionality tightly focused.

7–8
9
10
13

I've said this multiple times before: add a space before opening parentheses in English prose (i.e. not code).

14

To rephrase my earlier comment, use two independent object files, rather than two identical ones as inputs. This will show the merging functionality better.

24–25

I still think you'd be better off grouping this with the test case above. If you really don't want to though, I'd use NO-RSRC, not NO_RSRC.

34

Ditto.

57

Also below.

llvm/test/tools/llvm-nm/XCOFF/not_export_shared_obj.test
1

I'd probably merge this test into the main test. You can then use the same YAML for all cases.

llvm/tools/llvm-nm/Opts.td
36

These options are supposed to be in alphabetical order. Please fix.

llvm/tools/llvm-nm/llvm-nm.cpp
109–110

Add a blank line before the comment (as per the suggested edit)

222

Restore the blank line before this comment.

661
661

To be clear, I'm not suggesting you need the actual user flags, but if you wrote the code as if those flags existed, I think the logic would be much cleaner (I see this is more-or-less what you've done, which is great).

I think this is a good idea, we can add the functionality in a separate and not urgent patch, and current patch will not depend on it.

See my comment below about OutputFormat actually (no need necessarily to implement a front-end switch, but I think the functionality of printing symbol name + visibility is just another kind of that.

666–671

No else after return.

675

Perhaps removeAdjacentDuplicates, then you don't need the comment.

679

This is a variable at this point, so should be named as such.

701

Rather than having separate printMergedSymbolList and printSymbolList which both print the symbols, I think you just need one printSymbolList function. The SymbolList is cleared in the calling function, where it really belongs, rather than where it is currently cleared at the end of printSymbolList. There are then two calling functions, printAllSymbols, which builds up a single list of symbols, and is used for this new option you're implementing, and the existing dumpSymbolNamesFromObject.

Further, I realise that the format printed for the --export-symbols is actually just another OutputFormat, like posix or sysv. By making the small structural change above, you can then just implement it much like the differences in these formats is implemented, within printSymbolList.

709

There's no need to clear the SymbolList here, as it's only used in one place.

More generally though, I think SymbolList should be a local variable passed around rather than a global variable (but that's probably another piece of work).

1673

No blank lines at start of function.

1734
1805
1884

I think the if clause should be inside sortSymbolList.

2388

This looks like you're going to print both the regular list of symbol names, and the exported list, which isn't what I was expecting. Are you sure it's what you meant to do?