This is an archive of the discontinued LLVM Phabricator instance.

[IRMover] Avoid materializing global value that belongs to not-yet-linked module
ClosedPublic

Authored by ychen on Sep 1 2020, 8:41 PM.

Details

Summary

We saw the same assertion failure mentioned here
https://bugs.llvm.org/show_bug.cgi?id=42063 in our internal tests.

The failure happens in the same circumstance as D47898 and D66814 where
uniqueing of DICompositeTypes causes Mapper::mapValue to be called on
GlobalValues(G) from a not-yet-linked module(M). The following type-mapping for
G may not complete correctly (fail to unique types etc. depending on the
the complexity of the types) because IRLinker::computeTypeMapping is not done for M
in this path.

D47898 and D66814 fixed some type-mapping issue after Mapper::mapValue
is called on G. However, it seems it did not handle some complex cases. I
think we should delay linking globals like G until its owing module is
linked. In this way, we could save unnecessary type mapping and prune
these corner cases. It is also supposed to reduce the total number of structs
ending up in the combined module.

D47898 is reverted (its test is kept) because it regresses the test case here.
D66814 could also be reverted (the check-all looks good). But it looks reasonable
anyway, so I thought I should keep it.

Also tested the patch with clang self-host regularLTO/ThinLTO build, things look
good as well.

Diff Detail

Event Timeline

ychen created this revision.Sep 1 2020, 8:41 PM
ychen requested review of this revision.Sep 1 2020, 8:41 PM
ychen retitled this revision from fix pprbug29096 to fix .Sep 1 2020, 8:41 PM
ychen retitled this revision from fix to Fix pr42063.Sep 1 2020, 8:58 PM
ychen updated this revision to Diff 289355.Sep 1 2020, 11:02 PM
  • ready for review
ychen retitled this revision from Fix pr42063 to [IRMover] Avoid linking global value that belongs to not-yet-linked module.Sep 1 2020, 11:05 PM
ychen edited the summary of this revision. (Show Details)
ychen retitled this revision from [IRMover] Avoid linking global value that belongs to not-yet-linked module to [IRMover] Avoid materializing global value that belongs to not-yet-linked module.Sep 1 2020, 11:17 PM

I need to think through this some more, but a couple questions:

Was the clang self host that you tried regular or Thin LTO? If regular please also try ThinLTO.

Do you have a test case showing the failure you got? I think it will be important to add if at all possible.

ychen added a comment.Sep 2 2020, 10:53 AM

I need to think through this some more, but a couple questions:

Was the clang self host that you tried regular or Thin LTO? If regular please also try ThinLTO.

I was testing the patch with regular LTO. Will try ThinLTO also.

Do you have a test case showing the failure you got? I think it will be important to add if at all possible.

The original test case is pretty big. I'll try harder to get it reduced.

ychen added a comment.Sep 2 2020, 10:58 AM

I need to think through this some more, but a couple questions:

Was the clang self host that you tried regular or Thin LTO? If regular please also try ThinLTO.

Do you have a test case showing the failure you got? I think it will be important to add if at all possible.

Before I could get a reduced test case, could you please try the reproducer attached in PR42063? It seems to have dependencies on fuchsia.

ormris added a subscriber: ormris.Sep 3 2020, 4:05 PM
ychen updated this revision to Diff 292377.EditedSep 16 2020, 4:52 PM
  • update test to demonstrate the bug
  • actually the bug could be fixed by reverting D47898
  • ThinLTO self-host Clang looks good.

gentle ping...

It sounds like even the test case that was added for D47898 passes when that fix is reverted? Can you also try the original test case from https://bugs.llvm.org/show_bug.cgi?id=37684, which is just a little bit different? Since this handling has been problematic, I'd like to understand what changed to make that older fix unnecessary.

ychen added a comment.Oct 7 2020, 2:40 PM

It sounds like even the test case that was added for D47898 passes when that fix is reverted?

Sorry, that's not what I meant. I meant to say that D47898 regressed the test case in this patch. (Ps, I've verified D47898 does fix its own test case).

It sounds like even the test case that was added for D47898 passes when that fix is reverted?

Sorry, that's not what I meant. I meant to say that D47898 regressed the test case in this patch. (Ps, I've verified D47898 does fix its own test case).

Ok got it. I should have re-read the original summary again! You are essentially reverting D47898 and replacing with a more comprehensive fix. Thanks for adding a test case. I added some suggestions to the test, mostly just comment clarity.

llvm/test/LTO/X86/Inputs/type-mapping-bug4_1.ll
12

I assume t1 refers to %t1.o, etc? Would be clearer to spell that out here and elsewhere.

25

Since in the future it won't be obvious what "this" fix is, suggest making this something like "Without the fix in D87001 to delay materialization of @d until its module is linked".

llvm/test/LTO/X86/type-mapping-bug4.ll
5

I assume this fails without this fix? Would also be good to add some checking of the resulting linked code.

15

The steps vs stages is confusing to me. Suggest noting above where you describe the stages where the steps fit. Also suggest mentioning where these steps are described, i.e. "due to step3.1 described in type-mapping-bug4_1.ll". Or maybe if you mention above where listing stages which step is included in which stage you could say there where each step is described.

27

Similar comment to the one in type-mapping-bug_1.ll - explicitly specify what the fix is.

ychen updated this revision to Diff 296827.Oct 7 2020, 4:25 PM
  • fix comments, especially remove references to "step"s. The sequence of events are described in "stage"s instead.
ychen updated this revision to Diff 296829.Oct 7 2020, 4:40 PM
ychen marked 2 inline comments as done.
  • Check for preopt IR dump
ychen marked 3 inline comments as done.Oct 7 2020, 4:40 PM
tejohnson accepted this revision.Oct 7 2020, 4:48 PM

LGTM with a minor comment nit and some clarification needed about stage 2 vs stage 3 in the description as noted below.

llvm/test/LTO/X86/Inputs/type-mapping-bug4_1.ll
30

This mentions stage2, seems inconsistent with stage3.2?

llvm/test/LTO/X86/type-mapping-bug4.ll
11

nit "t2.o" here and t1.o on the line above.

This revision is now accepted and ready to land.Oct 7 2020, 4:48 PM
ychen updated this revision to Diff 296836.EditedOct 7 2020, 5:59 PM
  • fix more comment
ychen edited the summary of this revision. (Show Details)Oct 7 2020, 6:12 PM
This revision was landed with ongoing or failed builds.Oct 7 2020, 6:23 PM
This revision was automatically updated to reflect the committed changes.