[ThinLTO] Fix ThinLTO crash while destroying context
ClosedPublic

Authored by ncharlie on May 24 2017, 11:40 AM.

Details

Summary

[ThinLTO] Fix ThinLTO crash while destroying context

Fix for PR32763

An assert that checks if a Ref was untracked fails during ThinLTO context cleanup. The issue is because lazy loading temporary nodes didn't properly track ValueAsMetadata nodes. This patch ensures that the temporary nodes are properly tracked when they're replaced with the value.

Diff Detail

ncharlie created this revision.May 24 2017, 11:40 AM
davide added a reviewer: pcc.May 24 2017, 12:16 PM

Thanks for your first contribution, great GSoC start!
One note: this manifests with ThinLTO, but it might be a more generic metadata issue. Are you able to reproduce without ThinLTO (and maybe with a single file?)
If not, can you explain why?

Thanks for your first contribution, great GSoC start!

Thanks! This was a good chance to look closely at the code to start learning how it works.

One note: this manifests with ThinLTO, but it might be a more generic metadata issue. Are you able to reproduce without ThinLTO (and maybe with a single file?)

The bug only happens when lazy loading Metadata. Lazy loading occurs when module level metadata is being imported by ThinLTO (see https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/Bitcode/Reader/MetadataLoader.cpp;303885$781). I believe this is the only path that leads to lazy loading, so I don't think it can happen elsewhere.

I'm probably a good person to review this, but it's been a while since I've thought this so I might need help to refresh my context. This use-tracking code is unfortunately rather subtle.

Also, the testcase looks like it could possibly be reduced...

include/llvm/IR/Metadata.h
716–723

Notice that in MDOperand, MetadataTracking is sometimes called with an Owner argument.

This seems to only be used when MDNode::setOperand is called on a *uniqued* metadata node, or during MDNode initialization, when MDNode::makeUniqued() is called.

Since these operands are only for *distinct* metadata nodes, what you've done below looks correct... can you confirm you have the same understanding I do?

1275–1283

I wonder if it would be more or less clear to use:

MetadataTracking::untrack(this);
assert(!Use && "Somehow this is still being tracked...");

here, instead of just Use = nullptr.

test/Linker/Inputs/lazy-load-temporaries-cleanup.b.ll
5 ↗(On Diff #100119)

Does this need to be !llvm.asan.globals, or can you make it !named and then prune the arguments to something minimal?

Thanks for your first contribution, great GSoC start!

Thanks! This was a good chance to look closely at the code to start learning how it works.

One note: this manifests with ThinLTO, but it might be a more generic metadata issue. Are you able to reproduce without ThinLTO (and maybe with a single file?)

The bug only happens when lazy loading Metadata. Lazy loading occurs when module level metadata is being imported by ThinLTO (see https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/Bitcode/Reader/MetadataLoader.cpp;303885$781). I believe this is the only path that leads to lazy loading, so I don't think it can happen elsewhere.

If there is no other path to exercise LazyLoading of Metadata, let's just create one! (add an option to llvm-dis for example).

If there is no other path to exercise LazyLoading of Metadata, let's just create one! (add an option to llvm-dis for example).

I agree.

If there is no other path to exercise LazyLoading of Metadata, let's just create one! (add an option to llvm-dis for example).

I agree.

Or it could be an option to llvm-link (with a single file), where lazy-loading perhaps is more natural.

ncharlie updated this revision to Diff 100419.May 26 2017, 8:54 AM

fixed tests

ncharlie added inline comments.May 26 2017, 8:56 AM
include/llvm/IR/Metadata.h
716–723

Notice that in MDOperand, MetadataTracking is sometimes called with an Owner argument.

Yep, I can confirm that the only places I've seen the Owner argument used is in MDNode::setOperand. It's also used in MetadataAsValue::track (though I don't think I've seen that happen in my test cases).

Since these operands are only for *distinct* metadata nodes, what you've done below looks correct... can you confirm you have the same understanding I do?

Yeah - this placeholder is only used for distinct nodes (the only place the they're created is when an attempt to get a distinct and resolved metadata fails: https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/Bitcode/Reader/MetadataLoader.cpp;303994$926)

1275–1283

Use = nullptr is more direct and shorter, but they both accomplish the same goal. I can have it call untrack instead if you want.

test/Linker/Inputs/lazy-load-temporaries-cleanup.b.ll
5 ↗(On Diff #100119)

I can use !named but I don't think I can simplify the number of arguments any further, otherwise lazy loading isn't triggered.

mehdi_amini added inline comments.May 26 2017, 9:24 AM
test/Linker/Inputs/lazy-load-temporaries-cleanup.b.ll
5 ↗(On Diff #100119)

There is an option in the bitcode reader/writer to disable the threshold of the minimum number of MDs to trigger lazy-loading.

pcc added inline comments.May 26 2017, 12:04 PM
test/ThinLTO/X86/lazy-load-temporaries-cleanup.ll
5 ↗(On Diff #100419)

The llvm test suite cannot depend on lld.

Did you try llvm-dis -materialize-metadata?

ncharlie updated this revision to Diff 100620.May 29 2017, 9:50 AM

Simplified test

@pcc @mehdi_amini I've simplified the tests using your recommendations. Note that to get lazy loading to work with llvm-dis, I had to set the IsImporting bool to true - I'm not sure if this is correct, but I couldn't set the value to true otherwise. Should I add a flag to allow enabling/disabling it instead of hardcoding it?

The current patch seems to be missing context on Phab (-U999999). Please remember to add it on the next iteration.

include/llvm/IR/Metadata.h
1280

Nit: should be a space between if and (*Use). Running clang-format on your diff should clean this up. See the script at the bottom of the docs:
https://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting

1282

Please at least add a comment saying that this is equivalent to MetadataTracking::untrack.

I do weakly prefer the untrack and assert combination, because it's easier to scan the code looking for balanced track/untrack calls. Also easier to add logging to track/untrack to sort out what's going on when there's a bug. I guess my point is it improves how clear the code is globally.

But it's a weak preference, and Use = nullptr is certainly more clear locally. What do others think?

tools/llvm-dis/llvm-dis.cpp
145–147

I haven't thought about whether this should be hardcoded (likely depends on whether we're losing important coverage in the false case -- have you looked at that?), but there should certainly be a /*What is this doing?=*/ comment.

mehdi_amini added inline comments.May 29 2017, 10:25 AM
test/ThinLTO/X86/lazy-load-temporaries-cleanup.ll
4 ↗(On Diff #100620)

Just a nit: usually we don't split the RUN commands using a temporary file unless needed (i.e. use a pipe, opt can print on stdout and llvm-dis reads on stdin).

ncharlie updated this revision to Diff 100622.May 29 2017, 10:26 AM

Added context

ncharlie marked 4 inline comments as done.May 29 2017, 11:36 AM
ncharlie added inline comments.
tools/llvm-dis/llvm-dis.cpp
145–147

Yup, just ran the test suite and a few tests will fail if I hard-code it. Should I just add a hidden flag to the llvm-dis utility then?

ncharlie updated this revision to Diff 101407.Jun 5 2017, 8:08 AM
  • Added hidden "set-importing" flag to llvm-dis to set the IsImporting boolean for testing.
  • Ran clang-format
  • Added comment
  • Smaller test
ncharlie marked 6 inline comments as done.Jun 5 2017, 8:10 AM

Addressed comments. Formatted with clang-format, so that's why there are so many extra changes in the diff.

Addressed comments. Formatted with clang-format, so that's why there are so many extra changes in the diff.

The common practice is that we never clang-format the full files. That's why LLVM has clang/tools/clang-format/git-clang-format ; if your commit is the last one in the history you can run git clang-format HEAD~ and it'll format only the statements touched by this last commit (you can also format changes that are about to be committed with git clang-format -f).

davide added a comment.Jun 5 2017, 8:40 AM

Addressed comments. Formatted with clang-format, so that's why there are so many extra changes in the diff.

The common practice is that we never clang-format the full files. That's why LLVM has clang/tools/clang-format/git-clang-format ; if your commit is the last one in the history you can run git clang-format HEAD~ and it'll format only the statements touched by this last commit (you can also format changes that are about to be committed with git clang-format -f).

So, clang-formatt'ing the whole file is generally a good thing.
What I recommend is splitting that change in a separate NFC commit, and then reapply your patch on top of it.

ncharlie updated this revision to Diff 102117.Jun 10 2017, 8:08 AM

Removed extraneous formatting.

davide requested changes to this revision.Jun 17 2017, 4:21 PM

This one is getting there, one minor comment.

test/ThinLTO/X86/lazy-load-temporaries-cleanup.ll
3 ↗(On Diff #102117)

one of the two files is empty, so maybe you can just inline?

Also, maybe dump the output of llvm-dis and check with filecheck?

This revision now requires changes to proceed.Jun 17 2017, 4:21 PM
ncharlie updated this revision to Diff 103021.Jun 19 2017, 5:27 AM

Simplified test.

ncharlie marked an inline comment as done.Jun 19 2017, 5:27 AM
Prazek removed a subscriber: Prazek.Jun 19 2017, 1:57 PM

@dexonsmith @pcc any updates on this?

mehdi_amini accepted this revision.Aug 5 2017, 4:03 PM
mehdi_amini added inline comments.
include/llvm/IR/Metadata.h
1275–1283

My 2 cents: a call to untrack carries better the intent.

ncharlie updated this revision to Diff 110184.Aug 8 2017, 6:36 AM
ncharlie marked an inline comment as done.
davide accepted this revision.Aug 8 2017, 10:41 PM

LGTM

This revision is now accepted and ready to land.Aug 8 2017, 10:41 PM
ncharlie updated this revision to Diff 111265.Aug 15 2017, 3:22 PM

oops - forgot a cast