This is an archive of the discontinued LLVM Phabricator instance.

[llvm-nm]add helper function to print out the object file name, archive name, architecture name
ClosedPublic

Authored by DiggerLin on Feb 22 2022, 2:17 PM.

Details

Summary
  1. added helper function printObjectNamesInfo() to print out object file name, archive name, architecture name.
  2. One small behaviors change.

    in the function dumpMachOUniversalBinaryArchAll , delete the functionality:
if (moreThanOneArch)
               outs() << "\n";

Diff Detail

Event Timeline

DiggerLin created this revision.Feb 22 2022, 2:17 PM
DiggerLin requested review of this revision.Feb 22 2022, 2:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 2:17 PM
DiggerLin retitled this revision from [llvm-nm] add helper function to print out the object file name, archive name, architecture name to [llvm-nm][refactor] add helper function to print out the object file name, archive name, architecture name.Feb 22 2022, 2:18 PM
MaskRay added inline comments.Feb 22 2022, 2:44 PM
llvm/tools/llvm-nm/llvm-nm.cpp
1851–1870

The default arguments are unused

1853
1860

remove braces for single-line simple statements

DiggerLin marked 3 inline comments as done.Feb 23 2022, 6:50 AM
DiggerLin added inline comments.
llvm/tools/llvm-nm/llvm-nm.cpp
1851–1870

thanks.

DiggerLin updated this revision to Diff 410808.Feb 23 2022, 6:51 AM
DiggerLin marked an inline comment as done.
DiggerLin edited the summary of this revision. (Show Details)
DiggerLin edited the summary of this revision. (Show Details)
DiggerLin edited the summary of this revision. (Show Details)
MaskRay added inline comments.Feb 23 2022, 10:10 AM
llvm/tools/llvm-nm/llvm-nm.cpp
1853

Not fixed.

1864

Try const SymbolicFile &Obj and fix some non-const references in the body.

jhenderson requested changes to this revision.Feb 24 2022, 12:58 AM

Please don't mark a patch as a refactor when there are behaviour changes: by definition "refactor" means a change with no change in behaviour.

Please don't change the output of llvm-nm, at least for regular ELF output and probably for all foramts, unless GNU nm output has changed. The LLVM tools like llvm-nm, llvm-readelf, llvm-objdump are supposed to produce pretty much identical output to the GNU utilities, so that they can be used as drop-in replacements. The version of GNU nm on my machine prints archive member names as member.o: not archive.a(member.o): or similar. I do agree that the proposed output is better, but we can't break this requirement, unless there's buy-in from the GNU community, as lots of people rely on automated scripts to parse the tool output.

llvm/tools/llvm-nm/llvm-nm.cpp
1865–1866

It's not clear what the difference is between PrintName and PrintObjectNameInfo from just the variable names. I suggest renaming PrintName to be more specific (I think it's PrintSymbolName right?). Also, I'd drop "Info" from the PrintObjectNameInfo variable name.

2210–2211

Label these booleans with comments, i.e. as per the inline edit.

Same goes at the other call sites.

This revision now requires changes to proceed.Feb 24 2022, 12:58 AM
DiggerLin marked 4 inline comments as done.Feb 24 2022, 9:13 AM
DiggerLin added inline comments.
llvm/tools/llvm-nm/llvm-nm.cpp
1864

Agree with you , but if I change the to const SymbolicFile &Obj, a lot of functions's parameter need to "const SymbolicFile &Obj" ,
for example, printSymbolList() ,isSymbolList64Bit() , I am prefer to keep as it is now. and another separate patch for it. if you strong suggest that we need to change it the patch. I can do it.

1865–1866

it is not printing symbol name. it print object file name relate info. I thin PrintName is too generic. I used PrintObjectName?

DiggerLin updated this revision to Diff 411158.Feb 24 2022, 9:23 AM
DiggerLin marked 2 inline comments as done.
DiggerLin retitled this revision from [llvm-nm][refactor] add helper function to print out the object file name, archive name, architecture name to [llvm-nm]add helper function to print out the object file name, archive name, architecture name.
DiggerLin edited the summary of this revision. (Show Details)
DiggerLin added inline comments.Feb 24 2022, 9:26 AM
llvm/tools/llvm-nm/llvm-nm.cpp
1865–1866

sorry for misunderstand comment, please ignore above comment.

  1. I changed PrintName to PrintSymObjectName
  2. changed PrintObjectNameInfo to PrintObjectName

in the function dumpMachOUniversalBinaryArchAll , delete the functionality:

Your commit message indicates that there's a functional change, but there's no corresponding test change. That implies there's either a missing test, or you've broken an existing test. I suggest adding a test to cover the output difference.

llvm/tools/llvm-nm/llvm-nm.cpp
1851–1870

Please remember UpperCamelCase for variable names.

That being said, I'm not convinced IsArchive is especially meaningful or even correct, given that the false case is also an archive member sometimes. I suggest just passing in an empty string for ArchiveName if the archive name shouldn't be printed.

You're also only printing one object, so I recommend against calling it printObjectNames (plural). Even though you are printing multiple names, there is only one name for the object; the rest just provide additional context.

1864

+1 to leaving it for now, but fixing in a separate patch, since the situation is no worse than it was before.

1865–1866

I'm still not a massive fan of the variable names, as they are still somewhat unclear. How about PrintSymbolObject and PrintObjectLabel respectively?

I use "Label" in this context, because it's similar to assembly or C labels, so should be familiar enough.

2195–2196

Here and throughout, there's no need to label the parameters with comments if the input makes it somewhat obvious what they may represent. I personally use it only for literals like true, false, 0, 1 and {} etc. In other words, remove the last two labels here and anywhere where a variable is being passed in, rather than a literal.

DiggerLin marked 2 inline comments as done.Feb 28 2022, 8:05 AM
This comment was removed by DiggerLin.
llvm/tools/llvm-nm/llvm-nm.cpp
1851–1870

That being said, I'm not convinced IsArchive is especially meaningful or even correct, given that the false case is also an archive member sometimes.

I changed to IsArchiveFile.

I suggest just passing in an empty string for ArchiveName if the archive name shouldn't be printed.

I do not think we can set the ArchiveName to empty if archive name should not print in the printObjectName , for the function writeFileName(errs(), ArchiveName, ArchitectureName); and printSymbolList(Obj, PrintSymObjectName, ArchiveName, ArchitectureName) still need it .
in the dumpSymbolNamesFromObject.

1865–1866

OK.

2126

I think it is bug of the code,

if (moreThanOneArch)

outs() << "\n";

if the MachOUniversalBinary has more than one object , it print as "

"
(there is one empty line here)
a.o:

XXXXXX

b.o:

XXXXXX"

if only have one object file , it printed as

"a.o: //(there is no empty before the a.o)
XXXXXXX
"

it is not consistency for first object file a.o (whether there is a empty line before the first object file name(or Lable)"

DiggerLin marked 2 inline comments as done.Feb 28 2022, 8:13 AM

in the function dumpMachOUniversalBinaryArchAll , delete the functionality:

Your commit message indicates that there's a functional change, but there's no corresponding test change. That implies there's either a missing test, or you've broken an existing test. I suggest adding a test to cover the output difference.

in the original code,

if (moreThanOneArch)

outs() << "\n";

if the MachOUniversalBinary has more than one object , it print as

"
(there is one empty line here)
a.o:

XXXXXX

b.o:

XXXXXX"

if only have one object file , it printed as

"a.o: //(there is no empty before the a.o)
XXXXXXX
"
it is not consistency for first object file a.o (whether there is a empty line before the first object file name(or Lable)" for the MachOUniversalBinary , and other file format of the code
there do test there is no empty line for the first object file where is only one object file in the MachOUniversalBinary .

I change to always print out the

"
(there is one empty line here)
a.o:

XXXXXX
"
even there is only object file in the MachOUniversalBinary.

I do not think we need test for a empty line of the first object file of the MachOUniversalBinary.

DiggerLin updated this revision to Diff 411817.Feb 28 2022, 8:29 AM

I do not think we need test for a empty line of the first object file of the MachOUniversalBinary.

It is important to test whitespace as much as non-whitespace, as we don't want regressions in how things are printed (e.g. double blank lines, no blank lines when expected etc). This can make the output look messy, apart from anything else.

Having looked at the output, I agree that having a blank line at the start makes sense, but I want to be sure we enshrine this in a test.

Tip: when posting comments including code blocks or tool output etc, use the "Code Block" formatting option (left of the quote button in the Phabricator comment toolbar). The markup is three "`" characters for a multi-line output, or just one for inline content, at each end of the block.

llvm/tools/llvm-nm/llvm-nm.cpp
1851–1870

IsArchiveFile is no better: it has the same problems:

  1. The input file for this function is not an archvie. It's an object (which may be part of an archive).
  2. The code path for !IsArchiveFile is only relevant when the input file IS a member of an archive, which is contradictory to the name of the parameter.

How about PrintArchiveName (and flipping its logic).

1973

As already requested...

1975
2005
2034
2091
2125
2148
DiggerLin marked 8 inline comments as done.Mar 1 2022, 10:52 AM
This comment was removed by DiggerLin.
llvm/tools/llvm-nm/llvm-nm.cpp
1851–1870

OK

1973

from the '!PrintFileName" , we can not know it is the value of parameter of "PrintObjectLabel" at first glance , so I keep the comment here on purpose. I delete it anyway.

DiggerLin updated this revision to Diff 412166.Mar 1 2022, 10:56 AM
DiggerLin marked 2 inline comments as done.
DiggerLin updated this revision to Diff 412291.Mar 1 2022, 4:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 4:40 PM
DiggerLin updated this revision to Diff 412294.Mar 1 2022, 5:34 PM
jhenderson added inline comments.Mar 1 2022, 10:29 PM
llvm/test/Object/nm-universal-binary.test
38 ↗(On Diff #412294)

Adding {{[[:space:]].*}} does literally nothing useful, as the whitespace at the start of a line is ignored unless using both --strict-whitespace and --match-full-lines in the FileCheck command. I believe, if you want to check the first line is empty, that you can use:

CHECK-AR:      {{^$}}
CHECK-AR-NEXT: macho-universal-archive.x86_64.i386(hello.o) (for architecture x86_64):

For later blank lines, you can do CHECK-EMPTY: immediately following a regular CHECK. CHECK-EMPTY works like CHECK-NEXT but ensures the line is completely blank instead of checking against some pattern. Since it works like CHECK-NEXT it cannot be used as the first CHECK pattern however. Conversely, {{^$}} will not always work if used after the first CHECK pattern. This is because FileCheck consumes the string it has matched up to before running the next CHECK. For example, imagine the output was:

foo
bar

And we have the following CHECKs:

CHECK: foo
CHECK-NEXT: {{^$}}
CHECK-NEXT: bar

to test that there is a blank line between foo and bar. In this case, after matching foo, the output will look like a blank line, followed by a line containing bar. The attempt to check for an empty line matches the first line (at the end of foo), which is not the line after the previous match, so the CHECK-NEXT fails.

(Aside, not something for you to fix, but this test is full of holes, and could fail to catch symbols in the output that shouldn't be present, or other additional lines).

48 ↗(On Diff #412294)

See above.

llvm/tools/llvm-nm/llvm-nm.cpp
1886

Why have you just introduced this change? How is it better than the previous code?

DiggerLin marked an inline comment as done.Mar 2 2022, 6:10 AM
DiggerLin added inline comments.
llvm/tools/llvm-nm/llvm-nm.cpp
1886

yes, in the original code of function getSymbolNamesFromObject( )

auto Symbols = Obj.symbols();
.....
 
if (DynamicSyms) {
  const auto *E = dyn_cast<ELFObjectFileBase>(&Obj);
  if (!E) {
    error("File format has no dynamic symbol table", Obj.getFileName());
    return;
  }
Symbols = E->getDynamicSymbolIterators();

The value of Symbols be changed by the line
Symbols = E->getDynamicSymbolIterators();
Symbols maybe not the equal to Obj.symbols()

so we can not use the
if ( Obj.symbols().empty() && SymbolList.empty() && !Quiet)

jhenderson added inline comments.Mar 2 2022, 6:27 AM
llvm/tools/llvm-nm/llvm-nm.cpp
1886

Thanks, I see. I think I'd prefer it if you pulled the logic on whether or not an object has symbols into a separate function (maybe hasSymbols) which takes Obj and returns the value based on the appropriate set of symbols for whether DynamicSyms is specified or not. You probably would then benefit from a separate helper function used by this new function and getSymbolNamesFromObject that returns the dynamic symbols list (or an empty list if there was an error). Something like the following:

static bool hasSymbols(SymbolicFile &Obj) {
  if (DynamicSyms)
    return !getDynamicSyms(Obj).empty();
  return !Obj.symbols().empty();
}

static <insert return type here> getDynamicSyms(SymbolFile &Obj) {
  const auto *E = dyn_cast<ELFObjectFileBase>(&Obj);
  if (!E) {
    error("File format has no dynamic symbol table", Obj.getFileName());
    return {};
  }
  return E->getDynamicSymbolIterators();
}
DiggerLin updated this revision to Diff 412408.Mar 2 2022, 7:22 AM
DiggerLin marked 4 inline comments as done.
DiggerLin added inline comments.
llvm/test/Object/nm-universal-binary.test
38 ↗(On Diff #412294)

thanks a lot for your detail explanation.

llvm/tools/llvm-nm/llvm-nm.cpp
1886

as your suggestion, the function getDynamicSyms() will be call twice .

  1. One is in the function

getSymbolNamesFromObject

2 . the other is in hasSymbols() which will be called in the function
dumpSymbolNamesFromObject()

if there is error.

if (!E) {

  error("File format has no dynamic symbol table", Obj.getFileName());
  return {};
}

the error info will be print out twice.

if I keep the code in the
as

static bool getSymbolNamesFromObject(SymbolicFile &Obj, bool &IsSymbolsEmpty) {
  auto Symbols = Obj.symbols();
  std::vector<VersionEntry> SymbolVersions;
  if (DynamicSyms) {
    const auto *E = dyn_cast<ELFObjectFileBase>(&Obj);
    if (!E) {
      error("File format has no dynamic symbol table", Obj.getFileName());
      return false;
    }
    Symbols = E->getDynamicSymbolIterators();

there is no advantage of adding a two helper function.

I do not think adding a new parameter "bool &IsSymbolsEmpty" is a big problem.

jhenderson added inline comments.Mar 2 2022, 10:30 PM
llvm/tools/llvm-nm/llvm-nm.cpp
1745

Add a comment to the function explaining the meaning of the return value.

1886

the error info will be print out twice.

Fair point. My gut says the problem is that this error is reported at too low a level in the code, and that the right way to fix this would be to send errors back up the stack via Expected and Error, with it being reported near the top-level of the code, such that an error effectively aborts dumping of one object, but not all objects. I acknowledge that this is yet another change, although I think the cost of it wouuld be fairly small. If you don't want to go that far, I understand (I'd offer to do it myself, but don't really have the time). However, in that case, I think we still need a different solution, because the new input argument is yuck. A partial implementation of the above (using Expected/Error) might be to have getDynamicSyms return an Expected, and then in getSymbolNamesFromObject, if the Expected is in a failed state, call error there, while in hasSymbols simply do consumeError. If you do this, I'd add a TODO: comment in the latter case saying the error is already reported elsewhere, but really it should be propagated up the stack, to avoid duplicate errors.

I do not think adding a new parameter "bool &IsSymbolsEmpty" is a big problem.

Sorry, but I disagree. (Related aside: I'm not really all that happy about the return type change in this patch either, but I understand it's a step towards a later goal). With your change, you're making this function do at least four different things, 1) get all the object's symbols 2) filter out the ones you care about, 3) tell the caller whether the symbols could be loaded from the object, and 4) tell the caller whether the object has any symbols. Yes, some of these were there before, but adding more to that set is not ideal, when there are alternative approaches. In clean code, it is generally preferable for each function to try to do only one thing: this makes it easier to understand what any given function is doing, making it more maintanable and easier to use by other developers in the future.

Additionally, the use of a reference argument purely for setting a secondary return type is not great for various reasons: 1) it forces calling code to provide an input boolean that is set, even if the caller doesn't care about the parameter; 2) related to 1), the caller has to define a boolean variable before the function is called, preventing the function from being used cleanly within e.g. if clauses (the boolean input argument has to be defined before the if check); 3) it raises a question of "why is one returned and the other an argument?"; 4) it raises another question: "does my boolean need to be initialised or not?"

Another side-advantage with the extra helper function is that this reduces the size of getSymbolNamesFromObject. Again, the simpler a function is, the easier it is to understand.

DiggerLin updated this revision to Diff 412722.Mar 3 2022, 7:55 AM
DiggerLin marked an inline comment as done.
DiggerLin added inline comments.Mar 3 2022, 7:57 AM
llvm/tools/llvm-nm/llvm-nm.cpp
1874

I use cantFail because if there is a error , it will happen at function getSymbolNamesFromObject first. it will not go to hasSymbols() which be called in the function dumpSymbolNamesFromObject. using cantFail let code simple here.

1886

Another side-advantage with the extra helper function is that this reduces the size of getSymbolNamesFromObject. Again, the simpler a function is, the easier it is to understand.

the helper function getDynamicSyms(SymbolicFile &Obj) can not be used in the getSymbolNamesFromObject.

for the following code still need E

if (Expected<std::vector<VersionEntry>> VersionsOrErr =
           E->readDynsymVersions())
     SymbolVersions = std::move(*VersionsOrErr);
   else
     WithColor::warning(errs(), ToolName)
         << "unable to read symbol versions: "
         << toString(VersionsOrErr.takeError()) << "\n";
 }

but the function getDynamicSyms

return E->getDynamicSymbolIterators()

So I keep the function getSymbolNamesFromObject without getDynamicSyms

Something very strange just happened and half my review comments were lost in my attempt to post. Sorry if anything ends up getting duplicated.

llvm/tools/llvm-nm/llvm-nm.cpp
1745
1749–1762

Moving the conversation from below: I think you should still use getDynamicSyms here and just use another cast call to get the ELF object again, as in the inline edit. It's reasonably straightforward, and removes the duplicated error.

1773

Are you sure this should be return false? (I don't know either way, but we should preserve behaviour).

1874

I'm not particularly comfortable with this, because it would be trivially easy for someone in the future to add another hasSymbols call site, without having first called dumpSymbolNamesFromObject. I think it would be better to return an Expected<bool> from this function, and then use cantFail at the relevant call site(s), possibly with a comment at those sites explaining why the function cannot fail.

DiggerLin updated this revision to Diff 413530.Mar 7 2022, 9:44 AM
DiggerLin marked 4 inline comments as done.
DiggerLin added inline comments.
llvm/tools/llvm-nm/llvm-nm.cpp
1773

the patch almost is NFC patch, I do not think it is a good idea to change the preserve
behaviour.

jhenderson accepted this revision.Mar 8 2022, 12:11 AM

LGTM, with one question.

llvm/tools/llvm-nm/llvm-nm.cpp
1773

I said we should preserve behaviour. My question was: does this?

This revision is now accepted and ready to land.Mar 8 2022, 12:11 AM
DiggerLin marked an inline comment as done.Mar 8 2022, 10:45 AM
DiggerLin added inline comments.
llvm/tools/llvm-nm/llvm-nm.cpp
1773

not sure your question, would you like to explain more detail?

according to the comment

// If a "-s segname sectname" option was specified and this is a Mach-O
 // file get the section number for that section in this object file.
 unsigned int Nsect = 0;
 MachOObjectFile *MachO = dyn_cast<MachOObjectFile>(&Obj);
 if (!SegSect.empty() && MachO) {
   Nsect = getNsectForSegSect(MachO);
   // If this section is not in the object file no symbols are printed.
   if (Nsect == 0)
     return false;
 }

if the user specific a section name from llvm-nm command line
"-s segname sectname", but the section not in the Mach-O object, it can not get the symbol. it should return the false.

do I answer your question ?

jhenderson added inline comments.Mar 8 2022, 9:54 PM
llvm/tools/llvm-nm/llvm-nm.cpp
1773

Not exactly. Previously, the behaviour of dumpSymbolNamesFromObject was to not print anything if the section was not found. Importantly, it also skipped the no symbols printing and also didn't print an error.

This function's comment says that return false indicates an error. As not finding a section previously didn't report an error, return false here didn't quite line up with that comment. Consequently, I wondered whether it was correct to return false, and the important factor was whether this caused a behaviour change or not. Having gone through this patch again, I now think return false is correct amd preserves the existing behaviour.