This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] return error instead of crashing on invalid input
AbandonedPublic

Authored by Amir on Apr 29 2020, 4:04 PM.

Details

Summary

In https://bugs.llvm.org/show_bug.cgi?id=45636 firefox was linked
against stale object files that don't match profiled build.

Before the change ld.lld was crashing with a backtrace as:

LLVM ERROR: Function Import: link error:
  linking module flags 'ProfileSummary':
    IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
  PLEASE submit a bug report to https://bugs.llvm.org/ and include the
  crash backtrace.

 #0 0x000055b58915b86a llvm::sys::PrintStackTrace(llvm::raw_ostream&)
   llvm/lib/Support/Unix/Signals.inc:564:22
 ...
 #11 0x000055b589111f89 llvm::report_fatal_error(llvm::Twine const&, bool)
   llvm/lib/Support/ErrorHandling.cpp:113:27
 #12 0x000055b5891120e1 llvm/lib/Support/ErrorHandling.cpp:87:21
 #13 0x000055b58a22611f llvm::FunctionImporter::importFunctions(...)
   llvm/lib/Transforms/IPO/FunctionImport.cpp:1250:25

After the change ld.lld collects all errors during ThinLTO and reports
them as usual:

ld.lld: error: Function Import: link error:
  linking module flags 'ProfileSummary':
    IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'

Diff Detail

Event Timeline

trofi created this revision.Apr 29 2020, 4:04 PM

Could you add a test case for this?

(@tejohnson - might be the suitable reviewer here)

jdoerfert added a subscriber: jdoerfert.

Having a backtrace and the bug filing message seems excessive. Maybe you can even mention stale obj files in the message.
I'll add some reviewers that might want to chime in, otherwise I'll revisit this.

dmajor added a subscriber: dmajor.May 18 2020, 3:54 PM
lattner resigned from this revision.May 18 2020, 9:06 PM

Please address the earlier suggestions (needs a test case, and also agree it might be nice to have the message suggest stale objects as possible culprit).

trofi added a comment.Nov 12 2020, 3:42 PM

Please address the earlier suggestions (needs a test case

The failure looks like a low-level profile ID mismatch when merging profiles obtained when running tests against slightly different sources. I'll need some help on how I can craft the equivalent using lit mechanism. Original 40MB compressed input to lld is at https://bugs.llvm.org/show_bug.cgi?id=45636#c1 if it's of any help.

and also agree it might be nice to have the message suggest stale objects as possible culprit).

DR description provides new error message as:

"""
ld.lld: error: Function Import: link error:

linking module flags 'ProfileSummary':
  IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'

"""

can you suggest how it should be amended instead?

Please address the earlier suggestions (needs a test case

The failure looks like a low-level profile ID mismatch when merging profiles obtained when running tests against slightly different sources. I'll need some help on how I can craft the equivalent using lit mechanism. Original 40MB compressed input to lld is at https://bugs.llvm.org/show_bug.cgi?id=45636#c1 if it's of any help.

There is already a test for this particular error message in llvm/test/Linker/module-flags-6-a.ll. It uses llvm-link. Presumably you can use the same input files but invoke via lld to test?

and also agree it might be nice to have the message suggest stale objects as possible culprit).

DR description provides new error message as:

"""
ld.lld: error: Function Import: link error:

linking module flags 'ProfileSummary':
  IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'

"""

can you suggest how it should be amended instead?

The error message that should be changed ("IDs have conflicting values") is coming from in llvm/lib/Linker/IRMover.cpp. Maybe change it to something like "IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o' (possible use of stale object files)"? Outside of development it seems as though this would only happen in the stale object case.

trofi abandoned this revision.Oct 22 2021, 1:05 AM

I'll unlikely get to it any time soon. My environment changed quite a bit to be able to trigger the failure.

Amir updated this revision to Diff 487276.Jan 8 2023, 9:11 PM
Amir added a subscriber: Amir.

Add testcase

Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2023, 9:11 PM
Herald added a subscriber: emaste. · View Herald Transcript
Amir commandeered this revision.Jan 8 2023, 9:12 PM
Amir added a reviewer: trofi.

I'm currently running into this issue on a buildbot, likely due to stale ccache files: https://lab.llvm.org/buildbot/#/builders/246/builds/1664

Thanks for picking this up. Question on the test.

lld/test/ELF/stale-profile-error.ll
9

Does this test fail without the change in this patch? It seems the main difference is that it dies more nicely with an error rather than a crash with backtrace, so it looks like this test would also pass without this change. Is there a way to make it fail with the crash?

MaskRay added inline comments.Jan 9 2023, 12:27 PM
lld/test/ELF/stale-profile-error.ll
2

Move this to lto/ and add REQUIRES: x86

8

Use ;; for non-RUN non-CHECK comments in lld/test/.

MaskRay added inline comments.Jan 9 2023, 12:29 PM
lld/test/ELF/stale-profile-error.ll
2

Ideally use a llvm/test/ThinLTO or llvm/test/LTO. This test isn't so related to lld https://maskray.me/blog/2021-08-08-toolchain-testing#the-test-checks-at-the-wrong-layer