This is an archive of the discontinued LLVM Phabricator instance.

llvm-nm ignore the Import symbol file for the --export-symbol option.
ClosedPublic

Authored by DiggerLin on Aug 15 2023, 10:52 AM.

Details

Summary

On AIX OS, clang may use llvm-nm to export the symbols from all input files (see https://github.com/llvm/llvm-project/blob/515c435e378b243b1be3da1587c9e206055f2c32/clang/lib/Driver/ToolChains/AIX.cpp#L236). However, the clang command-line may include import files (identified by them starting with #!). llvm-nm previously reported "invalid object file" errors for import files, meaning that the clang driver would fail to link when import files are included this way.

In this patch, llvm-nm is changed to ignore import files when the --export-symbol option, meaning that clang will now succeed in this case.

For more information about AIX import files, see https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command

Diff Detail

Event Timeline

DiggerLin created this revision.Aug 15 2023, 10:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2023, 10:52 AM
DiggerLin requested review of this revision.Aug 15 2023, 10:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2023, 10:52 AM
DiggerLin edited the summary of this revision. (Show Details)Aug 15 2023, 11:01 AM
DiggerLin edited the summary of this revision. (Show Details)Aug 15 2023, 12:51 PM
DiggerLin edited the summary of this revision. (Show Details)

I think the changes look okay. (In your patch description, I would specify a link to the 7.3 AIX documentation rather than the 7.2 version.)

DiggerLin edited the summary of this revision. (Show Details)Aug 16 2023, 12:51 PM

I think the changes look okay. (In your patch description, I would specify a link to the 7.3 AIX documentation rather than the 7.2 version.)

thanks, change to 7.3 version

stephenpeckham accepted this revision.Aug 17 2023, 11:48 AM

URL was updated and code looks good.

This revision is now accepted and ready to land.Aug 17 2023, 11:48 AM

LGTM as well.

llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
55

Minor nit: Grammar.

I'd just like to make sure I follow what is going on here:

  1. On AIX, clang uses llvm-nm to produce a list of exported symbols for its command-line. It does this by calling llvm-nm --export-symbols on each input file to clang.
  2. Import files can be passed on the command-line like regular object files. They are distinguished from regular objects because they start with #!.
  3. Because of these two points, clang ends up passing import files to llvm-nm, which should be ignored.

I think I can buy that it should be llvm-nm's responsibility to ignore import files when using --export-symbols, rather than preventing clang from passing them to llvm-nm in the first place. If I understand it correctly, import files are essentially a list of symbols to import from shared objects. That means that conceptually by definition they don't have any exports. I can see that this patch implements that functionality. The question I have though is why should we special-case import files specifically for --export-symbols? I'm thinking that they should actually be another kind of Binary file, like COFFImportFile. It would then fit more naturally into the framework that already exists, whereas the proposed patch feels like a sticking plaster to cover a specific edge-case. If this is causing a major issue, I'm okay with this patch landing, but I'd like a TODO added to the code indicating that it is temporary and should be replaced with something more appropriate like AIXImportFile or similar in the near future.

Regardless of the decision on the above, the description in this patch could do with being reworked. My suggestion follows. I wouldn't include the quote in the final commit message, as it will get a bit verbose:

On AIX OS, clang may use llvm-nm to export the symbols from all input files (see https://github.com/llvm/llvm-project/blob/515c435e378b243b1be3da1587c9e206055f2c32/clang/lib/Driver/ToolChains/AIX.cpp#L236). However, the clang command-line may include import files (identified by them starting with #!). llvm-nm previously reported "invalid object file" errors for import files, meaning that the clang driver would fail to link when import files are included this way.

In this patch, llvm-nm is changed to ignore import files when the --export-symbol option, meaning that clang will now succeed in this case.

For more information about AIX import files, see https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command

llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
55

Also, add a "." at the end of the comment, and probably then best to quote the #! i.e. "begins with "#!"."

The term in the doc is "Import File", so you should probably use that instead of "imported symbol file". It also should reference AIX, since this isn't a generic thing. Something like:

## Test that llvm-nm ignores AIX import symbol files when using --export-symbols. These start with "#!".

(Make sure to reflow to an 80 column width).

llvm/tools/llvm-nm/llvm-nm.cpp
2267–2270

Should this be limited to AIX or similar in some way? Certainly, the comment should highlight that these import files are specific to AIX, as the format is not a generic format that all systems follow. Something like "Ignore AIX import symbol files (these files start with "#!"), when exporting symbols."

I'd also strongly consider moving the identification code into the createBinary tree. See my out-of-line comment for more detail.

DiggerLin edited the summary of this revision. (Show Details)EditedAug 18 2023, 10:34 AM
DiggerLin marked 2 inline comments as done.

yes, your understand is correct. but I need to clarify following.

The question I have though is why should we special-case import files specifically for --export-symbols?

Since the Import file is input in the clang command, there is another solution to identify the Import File in the clang driver and not pass the Import File to llvm-nm --export-symbols , after compare with the solution in the patch. we prefer the solution in the patch, if user input the random File for llvm-nm --export-symbols , we put an error message to "invalid object file`, but for the Import symbol file, we just output none of export symbol.

I'm thinking that they should actually be another kind of Binary file, like COFFImportFile. It would then fit more naturally into the framework that already exists, whereas the proposed patch feels like a sticking plaster to cover a specific edge-case. If this is causing a major issue, I'm okay with this patch landing, but I'd like a TODO added to the code indicating that it is temporary and should be replaced with something more appropriate like AIXImportFile or similar in the near future.

"The import file is not another kind of binary file (refer to: https://www.ibm.com/docs/en/aix/7.3?topic=l-ld-command#ld__a3119106d__title__1, section Import and Export File Format (-bI: and -bE: Flags)). I don't think we will obtain any information from the AIX import file in the future. Therefore, there's no need to add a class and API for it. If, in the future, we do want to retrieve information from the AIX import file, it might be a good idea to implement a class like 'AIXImportFile,' as per your suggestion.

For now, we can simply disregard the AIX import file instead of outputting the error 'invalid object file.' 'llvm-nm --export-symbols.'

llvm/tools/llvm-nm/llvm-nm.cpp
2267–2270

it only limit to AIX Import symbol File.

I'd just like to make sure I follow what is going on here:

  1. On AIX, clang uses llvm-nm to produce a list of exported symbols for its command-line. It does this by calling llvm-nm --export-symbols on each input file to clang.
  2. Import files can be passed on the command-line like regular object files. They are distinguished from regular objects because they start with #!.
  3. Because of these two points, clang ends up passing import files to llvm-nm, which should be ignored.

Correct.

The question I have though is why should we special-case import files specifically for --export-symbols? I'm thinking that they should actually be another kind of Binary file, like COFFImportFile. It would then fit more naturally into the framework that already exists, whereas the proposed patch feels like a sticking plaster to cover a specific edge-case. If this is causing a major issue, I'm okay with this patch landing, but I'd like a TODO added to the code indicating that it is temporary and should be replaced with something more appropriate like AIXImportFile or similar in the near future.

This is causing a major issue for users linking shared objects using Clang on AIX; yes. Thanks for the understanding. I have some doubts about implementing further functionality for an AIXImportFile though: It is already human-readable. Also, it seems (to me) inappropriate to model is as a Binary because it is text.

llvm/tools/llvm-nm/llvm-nm.cpp
2267–2270

@jhenderson, do you have a suggestion on how to limit it in such a way? I was under the impression that the tools were designed to work in cross-compilation environments. A host-environment-based or build configuration property could be employed, but I think your input on the choice is needed.

DiggerLin marked 2 inline comments as done.

address comment

DiggerLin added inline comments.Aug 18 2023, 10:51 AM
llvm/tools/llvm-nm/llvm-nm.cpp
2267–2270

"Ignore AIX import symbol files (these files start with "#!"), when exporting symbols."

since the import symbol files only used by linker
I am prefer to
"Ignore AIX linker import files (these files start with "#!"), when exporting symbols."

llvm/tools/llvm-nm/llvm-nm.cpp
2265

Minor nit: Wrap at 80 columns.

DiggerLin updated this revision to Diff 552000.Aug 21 2023, 6:22 AM
DiggerLin marked an inline comment as done.

address minor comment.

@jhenderson ,May I kindly ask if you have any further comments? If not, would it be acceptable for me to move forward with committing the patch? Your valuable input would be sincerely appreciated.

Sorry, I've been trying to focus on other things for the majority of the last few days, as LLVM reviewing has been taking up too much of my worktime recently.

This is causing a major issue for users linking shared objects using Clang on AIX; yes. Thanks for the understanding. I have some doubts about implementing further functionality for an AIXImportFile though: It is already human-readable. Also, it seems (to me) inappropriate to model is as a Binary because it is text.

Re. the bit about using Binary: technically any file on disk is a binary file, since even text is just a series of binary data (it just happens to be understandable as ASCII/UTF-8/... etc). In fact, Binary.h already references things like TAPI files, which are text-based - see the ID_* enum in it. The key thing is that the createBinary method recognises the file format (via the identify_magic function).

Looking at how classes inherit from Binary, it looks like you wouldn't need to implement anything special for AIXImportFile. I think the steps are simply:

  • Add an ID_ entry in the enum in Binary.h.
  • Add a small class that inherits from Binary.h. This class wouldn't actually need to do anything else, as far as I can tell, other than define a constructor/create method that sets the ID type in the base class.
  • Update identify_magic in Magic.cpp and the corresponding file_magic enum in Magic.h to support the #! sequence.
  • Add an isAIXImportFile method to Binary that can be used to see if an arbitrary Binary object is of the relevant sub-class type. This bit is probably optional.

Looking at the code, if you went with this approach, you wouldn't need to change llvm-nm at all, I think (the testing you've added would still be fine though). In fact, I'd go as far as to say that the code is designed to work like that, rather than special casing other file formats. The createBinary call in dumpSymbolNamesFromFile will produce an AIXImportFile object and then the immediately-following if/else if chain would result in returning an empty SymbolList. This would be much cleaner than the proposed patch, I think, as it avoids special-casing the import files. It's also not that much more complex overall. Finally, it would give you a place to enhance in the future, should you wish to (for example you could later turn it into a SymbolicFile that lists the imports when requested).

llvm/tools/llvm-nm/llvm-nm.cpp
2267–2270

I changed my mind on this one, having looked at the createBinary and identify_magic functions - there's no risk of conflict with other known file formats, so it's safe to not restrict this to AIX.

DiggerLin updated this revision to Diff 553111.Aug 24 2023, 7:08 AM

implement a new class AIXLinkerImportFile

rebase the patch.

jhenderson added inline comments.Aug 24 2023, 11:46 PM
llvm/include/llvm/Object/AIXLinkerImportFile.h
23–28 ↗(On Diff #553193)

I don't really follow this comment. What are you trying to explain here?

Aside: I'd delete the blank line before and after it.

llvm/lib/Object/AIXLinkerImportFile.cpp
19 ↗(On Diff #553193)

Please add a blank line between method defninitions.

That being said, why do you need an explicit destructor, rather than just not having the function present?

llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
55

Looks like this comment has regressed. Please restore it to the suggestion I made once already.

56

You're not in a dedicated directory, so temporary files like "imp.txt" need to be based on %t somehow, e.g. %t.imp or %timport.txt

57

This is a better way of checking that the output is empty.

This test case is good, but I think you should also have one where the file contains one or more symbols. The reason for this is that in the future, if you added any form of symbolic file support to AIXImportFile, you'll want to show that the listed imports don't appear as exports, which this suggested test will cover upfront.

MaskRay requested changes to this revision.EditedAug 25 2023, 12:54 AM

Changing llvm::identify_magic to recognize all files starting with #! as aix_linker_import_file has far-reaching effects. Many can well be used as shebang for shell or other scripts for other *NIX operating systems.

case '#':
  if (Magic[1] == '!')
    return file_magic::aix_linker_import_file;
  break;

For instance, if the linker script a.lds starts with #!, the following command now reproduces an error.

% ld.lld a.lds a.o
ld.lld: error: a/lds: unknown file type

This is probably not a big ideal, but I am concerned of many other identify_magic calls.

This revision now requires changes to proceed.Aug 25 2023, 12:54 AM

Changing llvm::identify_magic to recognize all files starting with #! as aix_linker_import_file is not correct. Many can well be used as shebang for shell or other scripts.

case '#':
  if (Magic[1] == '!')
    return file_magic::aix_linker_import_file;
  break;

How would you suggest that @DiggerLin proceeds then? The impact of misidentifying a file with a shebang as an AIX import file is that tools will no longer report "invalid object file" or something to that effect for such files, but will otherwise do nothing with them, so it seems relatively harmless to me.

MaskRay added inline comments.Aug 25 2023, 12:59 AM
llvm/lib/Object/ObjectFile.cpp
193 ↗(On Diff #553193)

Without a default, -Wswitch can catch future addition to the enum members. This change will nullify this use.

MaskRay added a comment.EditedAug 25 2023, 1:04 AM

Changing llvm::identify_magic to recognize all files starting with #! as aix_linker_import_file is not correct. Many can well be used as shebang for shell or other scripts.

case '#':
  if (Magic[1] == '!')
    return file_magic::aix_linker_import_file;
  break;

How would you suggest that @DiggerLin proceeds then? The impact of misidentifying a file with a shebang as an AIX import file is that tools will no longer report "invalid object file" or something to that effect for such files, but will otherwise do nothing with them, so it seems relatively harmless to me.

Sorry, I haven't analyzed the AIX use cases... I just noticed this patch and am concerned that identify_magic mischaracterizing files with shebang as AIX-specific would likely cause beyond-negligible negative impacts...
identify_magic in LLVMBinaryFormat is somewhat commonly used outside of llvm-project.

Changing llvm::identify_magic to recognize all files starting with #! as aix_linker_import_file is not correct. Many can well be used as shebang for shell or other scripts.

case '#':
  if (Magic[1] == '!')
    return file_magic::aix_linker_import_file;
  break;

How would you suggest that @DiggerLin proceeds then? The impact of misidentifying a file with a shebang as an AIX import file is that tools will no longer report "invalid object file" or something to that effect for such files, but will otherwise do nothing with them, so it seems relatively harmless to me.

Sorry, I have analyzed the AIX use cases... I just noticed this patch and am concerned that identify_magic mischaracterizing files with shebang as AIX-specific would likely cause beyond-negligible negative impacts...
identify_magic in LLVMBinaryFormat is somewhat commonly used outside of llvm-project.

Could you elaborate please what these "beyond-negligible negative impacts" are? I've highlighted the one behaviour impact that I'm aware of, which I think we can ignore as not really an issue.

It seems to me like you're blocking this patch because the AIX import file magic happens to clash with some general text file contents. How else can such a file be identified?

@DiggerLin/@hubert.reinterpretcast et al, what is preventing the change to clang to not call llvm-nm with import files?

DiggerLin marked 2 inline comments as done.EditedAug 25 2023, 6:49 AM

what is preventing the change to clang to not call llvm-nm with import files?

  1. Opening a file and checking the "#!" before passing the file to llvm-nm exports in the clang driver that looks weird. the clang driver do not have functionality to export symbols from the files, it depend on the llvm-nm exports. the output as empty for llvm-nm exports AIXLinkerImportFile is reason for us.
  2. I still prefer we previous implement the functionality in the llvm-nm as https://reviews.llvm.org/D158004?id=552000 instead of adding a new class AIXLinkerImportFile, it only effect llvm-nm --export , there is no other side-effect as MaskRay worry about. not sure what @hubert.reinterpretcast and @MaskRay's opinion
// Ignore AIX linker import files (these files start with "#!"), when
 // exporting symbols.
 const char *BuffStart = (*BufferOrErr)->getBufferStart();
 if (ExportSymbols && (*BufferOrErr)->getBufferSize() >= 2 &&
     BuffStart[0] == '#' && BuffStart[1] == '!')
   return SymbolList;
llvm/include/llvm/Object/AIXLinkerImportFile.h
23–28 ↗(On Diff #553193)

AIX linker import file contain imported symbols name in the file.
we look it not symbolic file this moment , if we want to get the symbols from the AIX linker import file in future, we maybe implement the class as derive from SymbolicFile class in future. not derived from Binary directly.

llvm/lib/Object/AIXLinkerImportFile.cpp
19 ↗(On Diff #553193)

without the explicit destructor , there will be compiler error.

DiggerLin marked 2 inline comments as done.Aug 25 2023, 6:53 AM
DiggerLin added inline comments.
llvm/lib/Object/AIXLinkerImportFile.cpp
19 ↗(On Diff #553193)

lld with generate a linker error. please see https://lld.llvm.org/missingkeyfunction

@DiggerLin/@hubert.reinterpretcast et al, what is preventing the change to clang to not call llvm-nm with import files?

The Clang driver is only one client of the functionality. Using llvm-nm to address the export-list creation process in other contexts was the intent. A common implementation within llvm-nm reduces "wheel-reinvention". It is also the case that the original utility used by the IBM compilers for AIX export list creation ignored import files.

  1. I still prefer we previous implement the functionality in the llvm-nm as https://reviews.llvm.org/D158004?id=552000 instead of adding a new class AIXLinkerImportFile, it only effect llvm-nm --export , there is no other side-effect as MaskRay worry about. not sure what @hubert.reinterpretcast and @MaskRay's opinion

I think that @MaskRay's concern about the larger-scope effects of adding #! magic supports going with the targeted direction of special handling for --export.

  1. I still prefer we previous implement the functionality in the llvm-nm as https://reviews.llvm.org/D158004?id=552000 instead of adding a new class AIXLinkerImportFile, it only effect llvm-nm --export , there is no other side-effect as MaskRay worry about. not sure what @hubert.reinterpretcast and @MaskRay's opinion

I think that @MaskRay's concern about the larger-scope effects of adding #! magic supports going with the targeted direction of special handling for --export.

I agree that https://reviews.llvm.org/D158004?id=552000 feels reasonable. --export-symbols seems to be an AIX-specific option.

static std::vector<NMSymbol> dumpSymbolNamesFromFile(StringRef Filename) {
...

  // Ignore AIX linker import files (these files start with "#!"), when
  // exporting symbols.
  const char *BuffStart = (*BufferOrErr)->getBufferStart();
  if (ExportSymbols && (*BufferOrErr)->getBufferSize() >= 2 &&
      BuffStart[0] == '#' && BuffStart[1] == '!')
    return SymbolList;

Changing llvm::identify_magic to recognize all files starting with #! as aix_linker_import_file is not correct. Many can well be used as shebang for shell or other scripts.

case '#':
  if (Magic[1] == '!')
    return file_magic::aix_linker_import_file;
  break;

How would you suggest that @DiggerLin proceeds then? The impact of misidentifying a file with a shebang as an AIX import file is that tools will no longer report "invalid object file" or something to that effect for such files, but will otherwise do nothing with them, so it seems relatively harmless to me.

Sorry, I have analyzed the AIX use cases... I just noticed this patch and am concerned that identify_magic mischaracterizing files with shebang as AIX-specific would likely cause beyond-negligible negative impacts...
identify_magic in LLVMBinaryFormat is somewhat commonly used outside of llvm-project.

Could you elaborate please what these "beyond-negligible negative impacts" are? I've highlighted the one behaviour impact that I'm aware of, which I think we can ignore as not really an issue.

It seems to me like you're blocking this patch because the AIX import file magic happens to clash with some general text file contents. How else can such a file be identified?

@DiggerLin/@hubert.reinterpretcast et al, what is preventing the change to clang to not call llvm-nm with import files?

(Sorry that I made a typo in a previous message. I mean I haven't carefully analyzed the AIX use cases.)

My reasoning is this: someone may call llvm::identify_magic to implement an emulator: check whether an file is an ELF/XCOFF/etc executable file or a script file with shebang.

switch (llvm::identify_magic(...)) {
case file_magic::elf_executable:
case file_magic::elf_shared_object:
  // run
case file_magic::xcoff_object_64:
  // run
case file_magic::unknown:
  // check shebang
}

It would be quite confusing that they now need to change case file_magic::unknown: to case file_magic::aix_linker_import_file.

DiggerLin added a comment.EditedAug 28 2023, 6:24 AM

@jhenderson , I would greatly appreciate your thoughts on the possibility of using the implement proposed in https://reviews.llvm.org/D158004?id=552000 for the patch.

MaskRay accepted this revision.Aug 28 2023, 10:36 AM

@jhenderson , I would greatly appreciate your thoughts on the possibility of using the implement proposed in https://reviews.llvm.org/D158004?id=552000 for the patch.

FWIW https://reviews.llvm.org/D158004?id=552000 looks good to me. I'll remove my "request changes" status now so that you don't need to wait on me if @jhenderson thinks fine:)

llvm/test/tools/llvm-nm/XCOFF/export-symbols.test
58

# RUN: llvm-nm --export-symbols imp.txt 2>&1 | count 0

This revision is now accepted and ready to land.Aug 28 2023, 10:36 AM
MaskRay added 1 blocking reviewer(s): jhenderson.Aug 28 2023, 10:36 AM
This revision now requires review to proceed.Aug 28 2023, 10:36 AM

Okay, thanks for the comments. I was trying to see if there was a sensible middle ground that would satisfy both my concerns and those of @MaskRay, but I don't think there really is at this point. Please could you revert to the previous llvm-nm implementation and drop the AIXImportFile stuff (make sure to address the test comments too). Sorry for the churn.

I may have some comments on the older style, but I'll wait for the revert to happen, along with any tweaks you feel you need to make before I try to explain them. I'll make sure to review this as prompty as I can once you've made that revert, so that you aren't held up any further.

DiggerLin updated this revision to Diff 554356.Aug 29 2023, 8:16 AM
DiggerLin marked 3 inline comments as done.
This revision is now accepted and ready to land.Aug 30 2023, 12:01 AM