This is an archive of the discontinued LLVM Phabricator instance.

[IR][Modules] Improve quality of diagnostic messages for non-fatal warnings during module linking.
ClosedPublic

Authored by kristina on Oct 5 2018, 3:20 PM.

Details

Summary

Currently when a non-fatal warning is emitted, linking of modules can typically continue, however, the user is flooded with non-helpful warnings that do not hint at the possible source of the warning. This is especially relevant when DWARFv4 and DWARFv5 modules are linked which can happen when migrating to newer debug info.

This produces a flurry of messages along the lines of:

ld.lld: warning: linking module flags 'Dwarf Version': IDs have conflicting values
ld.lld: warning: linking module flags 'Dwarf Version': IDs have conflicting values 
ld.lld: warning: linking module flags 'Dwarf Version': IDs have conflicting values

That kind of output is not especially helpful when trying to pinpoint the culprit source modules. This relatively simple patch improves the quality of non-fatal warnings from IRMover by emitting names of the source modules and in case of DWARF version mismatches, the mismatching versions (common case being 4 and 5).

That diagnostics code lack any test coverage and I can't think of a way to easily and deliberately produce mismatching modules using lit tests, and this diagnostic is not used as part of any current lit test in all of LLVM. Adding test coverage to all such diagnostics would be a much bigger task in itself and should likely be addressed separately since currently all of them lack any coverage.

Diff Detail

Repository
rL LLVM

Event Timeline

kristina created this revision.Oct 5 2018, 3:20 PM

Could this be tested with two .ll files with mismatched DWARF Versions, lto'd together like other LTO tests (in, for example, lld/test/ELF/lto/)

kristina added a comment.EditedOct 8 2018, 1:23 PM

Could this be tested with two .ll files with mismatched DWARF Versions, lto'd together like other LTO tests (in, for example, lld/test/ELF/lto/)

This would require introducing a category of tests for IR linker diagnostics, which currently do not exist, likely due to their relative insignificance and a somewhat (from my perspective, given how often modulemaps are forgotten and not updated and it not being noticed for a while) low number of module users overall, especially for projects outside of LLVM-related scope (and as I said, the actual diagnostics emitted and their validity/suitability have zero coverage, hence my remark about it being something for another time as it involves a fairly significant bulk of changes, if I understand it correctly). A regression in things like metadata naming on the other hand (like renaming Dwarf Version to something else) would be ABI breaking in terms of IR as well so if it does get renamed, a ton of other tests would fail as well.

This is just a quality-of-life improvement for dealing with migrating to DWARF5 as well as general migration of bitcode based builds forward over to LLVM 8 and finding stale bitcode archives easier for people who use modular builds involving linking against a variety of cached pre-built bitcode archives, which saves time when trying to work out which caches need purging. (The original patch was a rougher version of this that I added to my local fork to try to hunt down sources of these warnings, this is just a slightly more refined version of something I've been using for a while now in an upstream-friendly form)

Could this be tested with two .ll files with mismatched DWARF Versions, lto'd together like other LTO tests (in, for example, lld/test/ELF/lto/)

This would require introducing a category of tests for IR linker diagnostics,

I'm not sure what you mean by a new category of tests - lit tests are fairly general purpose. Inputs and expected output.

Here's one I created for this code change:

test/Linker/metadata-mismatch.test:
  ; RUN: llvm-link %p/Inputs/metadata-mismatch-a.ll %p/Inputs/metadata-mismatch-b.ll -S 2>&1 | FileCheck %s
  ; CHECK: warning: linking conflicting DWARF versions: v5 from {{.*}}/metadata-mismatch-b.ll to v4 in llvm-link
test/Linker/Inputs/metadata-mismatch-a.ll:
  !llvm.module.flags = !{!1}
  !1 = !{i32 2, !"Dwarf Version", i32 4}
test/Linker/Inputs/metadata-mismatch-b.ll:
  !llvm.module.flags = !{!1}
  !1 = !{i32 2, !"Dwarf Version", i32 5}

which currently do not exist, likely due to their relative insignificance and a somewhat (from my perspective, given how often modulemaps are forgotten and not updated and it not being noticed for a while) low number of module users overall,

C++/clang modules, you mean? This diagnostic is independent of that, right? It could hit anyone using LTO (at least Apple, Google, and Sony use it extensively, so far as I know?) where a build passed different -gN flags to different compiles (pretty uncommon - which is why people probably aren't too worried aobut these diagnostic experiences)

especially for projects outside of LLVM-related scope

LLVM-related scope?

(and as I said, the actual diagnostics emitted and their validity/suitability have zero coverage,

Yeah, pity about the lack of coverage - which is why it's great to try to fix that when the opportunity arises, like now.

hence my remark about it being something for another time as it involves a fairly significant bulk of changes, if I understand it correctly).

I'd love to better understand what large changes this might entail/why it looks like it might require a lot of work - would be great to address that problem (either whatever info lead to the misunderstanding - or an actual high cost for such work) - hopefully teh example I gave above demonstrates how this can be tested without too much work.

A regression in things like metadata naming on the other hand (like renaming Dwarf Version to something else) would be ABI breaking in terms of IR as well so if it does get renamed, a ton of other tests would fail as well.

This is just a quality-of-life improvement for dealing with migrating to DWARF5 as well as general migration of bitcode based builds forward over to LLVM 8 and finding stale bitcode archives easier for people who use modular builds involving linking against a variety of cached pre-built bitcode archives, which saves time when trying to work out which caches need purging. (The original patch was a rougher version of this that I added to my local fork to try to hunt down sources of these warnings, this is just a slightly more refined version of something I've been using for a while now in an upstream-friendly form)

Also super curious about how C++/Clang modules buidls relate to this - they don't cache bitcode, do they? Sounds like you're dealing with a ThinLTO/incremental LTO build? That's separate from Clang Modules. (Clang modules with LLDB and -gmodules can cache debug info in the .pcm files - but that's stored as actual object code DWARF, not LLVM bitcode/IR debug info).

(also, maybe we can avoid a special case for DWARF here? And have the diagnostic print both values of whatever the metadata field is - a more general solution?)

Thanks for the fix/improvements!

In D52952#1258114, @dblaikie wrote:
C++/clang modules, you mean? This diagnostic is independent of that, right? It could hit anyone using LTO (at least Apple, Google, and Sony use it extensively, so far as I know?) where a build passed different -gN flags to different compiles (pretty uncommon - which is why people probably aren't too worried aobut these diagnostic experiences)

Those you mentioned have separate compiler teams, some multiple of them, that occasionally contribute to upstream. The changes that land in downstream forks are usually cherry-picked from upstream, speaking from experience, so that has very little relevance when it comes to upstream modular builds breaking. Since it's not a big change I do not see why (after all I did it too) downstream forks would be in a hurry to get those changes upstreamed (The pre-cleanup version of this sat in my repository for weeks).

Dwarf Version is metadata stored in bitcode files, the conflicts like this is what happens when a module (a bitcode archive) is linked against an incompatible module (which could very well be a self-contained single bitcode file for a single TU since if I understand correctly, since LLVM7-8 have been very much module-oriented internally, even when a module is a single TU, which is extremely common) with a different version for the Dwarf Version metadata property. I never said anything about actual DWARF data being embedded in bitcode archives, only the metadata property. No DWARF data is emitted at any of those points. This merely pins down a conflict in metadata. Nothing more.

Yeah, pity about the lack of coverage - which is why it's great to try to fix that when the opportunity arises, like now.

Considering it currently has zero coverage, adding a new set of tests that, also likely need to cover out of scope (for this patch) existing diagnostics for various other metadata attributes (there is quite a few of them) would likely belong in a separate diff addressing the lack of coverage in this area. Because of that I do not think it's a good idea to fuse diffs that add clusters of new tests for things that previously had zero coverage and changes like these.

(also, maybe we can avoid a special case for DWARF here? And have the diagnostic print both values of whatever the metadata field is - a more general solution?)

It's the most common conflict I've come across, it's already part of the ABI and has a defined format. I do not believe there is a unified way to concisely represent all metadata property values especially when it may not even be human-friendly when it comes to reading it. This also implicates assumptions about further formats of metadata, naming of it, and layout of it, all of which is subject to change, while this particular property is part of the currently stable ABI on IR level. Therefore Dwarf Version is handled differently because it's known how such an attribute is handled and what it contains.

I'd love to better understand what large changes this might entail/why it looks like it might require a lot of work - would be great to address that problem (either whatever info lead to the misunderstanding - or an actual high cost for such work) - hopefully teh example I gave above demonstrates how this can be tested without too much work.

As previously mentioned, adding tests like this would likely be done in bulk for various other metadata attributes if you wanted to have coverage for those diagnostics, because it's a lot more convenient and is completely separate from any changes in code, since reiterating, none of this currently has any coverage.

Also super curious about how C++/Clang modules buidls relate to this - they don't cache bitcode, do they? Sounds like you're dealing with a ThinLTO/incremental LTO build? That's separate from Clang Modules. (Clang modules with LLDB and -gmodules can cache debug info in the .pcm files - but that's stored as actual object code DWARF, not LLVM bitcode/IR debug info).

More specifically actually emitting bitcode archives for libraries, ready for being pulled in for (Thin)LTO and static linking. The actual libraries can be considered modules in their own right.

I understand the desire for increasing test coverage and you may consider this a bad attitude but I fail to see how regression tests would make that much of a difference (a very very tiny increase in coverage for a single metadata attribute) for this specific case, unless it was done for IRMover as a whole, in which case yes that would provide a significant increase in coverage. I don't believe this patch is the place to start this process, however. (I should also mention that there are cases where there's a desire for increased coverage just for the sake of increased coverage, with tests that do not actually really test for any regressions).

However if you're unhappy with lack of bundling/starting of adding coverage tests for these particular diagnostics (ones coming from IRMover), I'm okay with abandoning this diff and keeping it downstream. It helped me so I figured it would help other people, in-scope, there is very little to test, it's simply adding additional information to existing diagnostics.

aprantl added a subscriber: bruno.Oct 8 2018, 4:23 PM
This revision was not accepted when it landed; it landed in state Needs Review.Oct 8 2018, 6:19 PM
This revision was automatically updated to reflect the committed changes.

Spoke with @dblaikie, appreciate the explanation, seems I had Clang pcm modules and LLVM IR modules mixed up hence the confusion over modulemaps. The actual issue boiled down to a build system update/migration and -dwarf-4 being dropped into the default set of build flags, and then undone shortly after, but not before some IR archives were recompiled, which caused the downstream issue I've been trying to hunt down, with some modules ending up with wrong metadata. Thank you for finishing the revision, should be easier to diagnose if anyone runs into a similar type of problem in the future.