This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Link in only necessary DICompileUnit operands
Needs ReviewPublic

Authored by tejohnson on Jan 21 2016, 7:12 PM.

Details

Summary

As discussed on IRC, for function importing we don't need to link in
the type, enums and many other metadata hanging off the DICompileUnit
for the source module. With this patch, we will only map in the needed
DISubprograms and the DIFile (which is required) and a few simple
non-metadata members.

Do we also need to link in the DIImportedEntity though? We are marking any
DISubprograms reached from them as needed (findNeededSubprograms()).

Diff Detail

Event Timeline

tejohnson updated this revision to Diff 45639.Jan 21 2016, 7:12 PM
tejohnson retitled this revision from to [ThinLTO] Link in only necessary DICompileUnit operands.
tejohnson updated this object.
tejohnson added reviewers: aprantl, mehdi_amini.
tejohnson added a subscriber: llvm-commits.

Should the tests be testing that, when you import the DISubprogram, that you get any types needed by that DISubprogram (& only those types)?

test/Transforms/FunctionImport/funcimport_debug.ll
17

Does this test the right thing? (if you revert your change and run this test, does it fail?)

I would've expected the following DICompileUnit to match, and there to be no DICompileUnit match between one and the next. Whereas I would've guessed this was trying to check that there were no enums in the DICompileUNit checked on the next line?

Should the tests be testing that, when you import the DISubprogram, that you get any types needed by that DISubprogram (& only those types)?

Good idea, I will add a new test that has some types.

test/Transforms/FunctionImport/funcimport_debug.ll
17

Good catch - it doesn't work. I modified the test to ensure that there is no enums here by specifying the expected preceeding operand (emissionKind), here and in the other test. Confirmed that backing out my change causes this to fail.

Is there a better way to do this?

tejohnson updated this revision to Diff 45693.Jan 22 2016, 9:18 AM
  • Fix test to correctly ensure that enums does not exist in the imported DICompileUnit.
aprantl added inline comments.Jan 22 2016, 9:25 AM
lib/Linker/IRMover.cpp
1573

We also need to preserve the splitDebugName. When module debugging is enabled there will be several additional skeleton compile units without any children in the module that point to the (clang) modules that contain shared debug type information which may be referenced by types that hang off the imported function.

I also don't see a reason to drop the string fields like flags and producer.

aprantl added inline comments.Jan 22 2016, 9:27 AM
lib/Linker/IRMover.cpp
1571

Why not use a DIBuilder for this?

tejohnson added inline comments.Jan 22 2016, 9:31 AM
lib/Linker/IRMover.cpp
1571

I looked at DIBuilder last night and it wasn't clear to me how to use that in this context where I need to call MapMetadata to map in the transitively referenced MD from source to dest.

1573

Ok, will map those MDString fields in as well.

tejohnson updated this revision to Diff 45695.Jan 22 2016, 9:50 AM
  • Map in MDStrings (producer, flags, splitDebugFilename)
dblaikie added inline comments.Jan 22 2016, 10:32 AM
test/Transforms/FunctionImport/funcimport_debug.ll
16–17

Nothing great I can think of, unfortunately.

mehdi_amini edited edge metadata.Jan 22 2016, 5:54 PM

I still hit an assert with this patch:

Assertion failed: (I != Map.end() && "Missing identifier in type map"), function resolve, file include/llvm/IR/DebugInfoMetadata.h, line 89.

Investigating...

So I debugged the assertion I see and the issue seems that we don't import anymore a retained type.

DwarfDebug::beginModule() calls TypeIdentifierMap = generateDITypeIdentifierMap(CU_Nodes); to build (from the retained types list) a map from type name to matching DICompositeType.

When emitting a DISubprogram, getOrCreateSubprogramDIE will call resolve(SP->getScope()) which query the map.
It seems that any scope for a DISubprogram needs to be in the retained types list, is it correct?

In D16440#334379, @joker.eph wrote:

So I debugged the assertion I see and the issue seems that we don't import anymore a retained type.

DwarfDebug::beginModule() calls TypeIdentifierMap = generateDITypeIdentifierMap(CU_Nodes); to build (from the retained types list) a map from type name to matching DICompositeType.

When emitting a DISubprogram, getOrCreateSubprogramDIE will call resolve(SP->getScope()) which query the map.
It seems that any scope for a DISubprogram needs to be in the retained types list, is it correct?

Is it the case that you can reach the SP and composite types from the retained type, but not vice versa? It sounds like it if it is building a map from retained type. In that case my change would not map those in (since the retained type list on the compile unit is no longer mapped, only things we can reach from the imported functions). I'll have to change the patch to build up the same type of map and then map in any retained types that map to something that was mapped in.

Anyone know offhand if there is anything else hanging off the compile unit that is like this?

In D16440#334379, @joker.eph wrote:

So I debugged the assertion I see and the issue seems that we don't import anymore a retained type.

DwarfDebug::beginModule() calls TypeIdentifierMap = generateDITypeIdentifierMap(CU_Nodes); to build (from the retained types list) a map from type name to matching DICompositeType.

When emitting a DISubprogram, getOrCreateSubprogramDIE will call resolve(SP->getScope()) which query the map.
It seems that any scope for a DISubprogram needs to be in the retained types list, is it correct?

Is it the case that you can reach the SP and composite types from the retained type, but not vice versa? It sounds like it if it is building a map from retained type. In that case my change would not map those in (since the retained type list on the compile unit is no longer mapped, only things we can reach from the imported functions). I'll have to change the patch to build up the same type of map and then map in any retained types that map to something that was mapped in.

Anyone know offhand if there is anything else hanging off the compile unit that is like this?

I reproduced this issue with a build of xalancbmk from SPEC cpu2006. The issue happens when composite types (on the retained types list) are used as the scope on a DISubprogram, because the association is done by MDString identifier. E.g.:

!78 = !DICompositeType(tag: DW_TAG_class_type, name: "XMLPlatformUtils", scope: !17, file: !16, line: 104, size: 8, align: 8, elements: !79, identifier: "_ZTSN11xercesc_2_516XMLPlatformUtilsE")
...
!93 = !DISubprogram(name: "Initialize", linkageName: "_ZN11xercesc_2_516XMLPlatformUtils10InitializeEPKcS2_PNS_12PanicHandlerEPNS_13MemoryManagerE", scope: !"_ZTSN11xercesc_2_516XMLPlatformUtilsE", file: !16, line: 204, type: !94, isLocal: false, isDefinition: false, scopeLine: 204, flags: DIFlagPublic | DIFlagPrototyped, isOptimized: true)

Why aren't these correlated by the metadata ID instead? I.e. !93 instead of !"_ZTSN11xercesc_2_516XMLPlatformUtilsE" in the DISubprogram scope? Then no TypeIdentifierMap would be needed.

In any case, I added some handling for this, so that we detect and map in retained types onto the new DICompileUnit if they correspond to a DISubprogram that is being mapped in. However, I just hit another similar issue. In this case, the retained type DICompositeType identifier is the base type of a DIDerivedType needed by a DISubprogram that was mapped in. E.g.:

!3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "PanicReasons", scope: !"_ZTSN11xercesc_2_512PanicHandlerE", file: !4, line: 116, size: 32, align: 32, elements: !5, identifier: "_ZTSN11xercesc_2_512PanicHandler12PanicReasonsE")
...
!107 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !"_ZTSN11xercesc_2_512PanicHandler12PanicReasonsE")

where the DIDerivedType is the DISubroutineType for a DISubprogram that was mapped in. I'll need to extend my handling a bit to get this case. Same question here though, why doesn't it just use "baseType: !3"?

tejohnson edited edge metadata.Jan 26 2016, 8:52 AM
tejohnson added a subscriber: dexonsmith.

!78 = !DICompositeType(tag: DW_TAG_class_type, name: "XMLPlatformUtils", scope: !17, file: !16, line: 104, size: 8, align: 8, elements: !79, identifier: "_ZTSN11xercesc_2_516XMLPlatformUtilsE")
...
!93 = !DISubprogram(name: "Initialize", linkageName: "_ZN11xercesc_2_516XMLPlatformUtils10InitializeEPKcS2_PNS_12PanicHandlerEPNS_13MemoryManagerE", scope: !"_ZTSN11xercesc_2_516XMLPlatformUtilsE", file: !16, line: 204, type: !94, isLocal: false, isDefinition: false, scopeLine: 204, flags: DIFlagPublic | DIFlagPrototyped, isOptimized: true)

Correction, the DISubprogram that references the composite type (and was pulled in due to an import) is:
!103 = distinct !DISubprogram(name: "alignPointerForNewBlockAllocation", linkageName: "_ZN11xercesc_2_516XMLPlatformUtils33alignPointerForNewBlockAllocationEm", scope: !"_ZTSN11xercesc_2_516XMLPlatformUtilsE", file: !104, line: 851, type: !105, isLocal: false, isDefinition: true, scopeLine: 852, flags: DIFlagPrototyped, isOptimized: true, declaration: !107, variables: !108)

Why aren't these correlated by the metadata ID instead? I.e. !93 instead of !"_ZTSN11xercesc_2_516XMLPlatformUtilsE" in the DISubprogram scope? Then no TypeIdentifierMap would be needed.

In any case, I added some handling for this, so that we detect and map in retained types onto the new DICompileUnit if they correspond to a DISubprogram that is being mapped in. However, I just hit another similar issue. In this case, the retained type DICompositeType identifier is the base type of a DIDerivedType needed by a DISubprogram that was mapped in. E.g.:

!3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "PanicReasons", scope: !"_ZTSN11xercesc_2_512PanicHandlerE", file: !4, line: 116, size: 32, align: 32, elements: !5, identifier: "_ZTSN11xercesc_2_512PanicHandler12PanicReasonsE")
...
!107 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !"_ZTSN11xercesc_2_512PanicHandler12PanicReasonsE")

where the DIDerivedType is the DISubroutineType for a DISubprogram that was mapped in. I'll need to extend my handling a bit to get this case. Same question here though, why doesn't it just use "baseType: !3"?

It turns out the second problem was caused by the fix for the first. The subprogram that referenced the DIDerivedType !107 was only mapped in because it is in the list of elements for the DICompositeType !78 that was the scope for the function needed due to importing. So the set of what debug metadata needs to be imported keeps exploding as the problem is addressed.

Is there any way to avoid this? Does the complete DICompositeType !78 need to be recreated in the importing module? It isn't clear to me which debug type metadata needs to be fully recreated in the importing module, I know the original premise behind this patch was that it was sufficient for it to exist in the original module. But getOrCreateSubprogramDIE wants to get the ContextDIE for the subprogram.

It seems that

Why aren't these correlated by the metadata ID instead? I.e. !93 instead of !"_ZTSN11xercesc_2_516XMLPlatformUtilsE" in the DISubprogram scope? Then no TypeIdentifierMap would be needed.

I think it is to break cycles. String are a convenient way to break dependencies :)

I discussed with Duncan yesterday, the ultimate plan he had last year seemed to be to reverse all the links: Subprograms would point to the CompileUnit instead of the opposite for instance. And ultimately there wouldn't be a need for the named metadata dbg.cu.

tejohnson updated this revision to Diff 46298.Jan 28 2016, 11:30 AM

Ensure that all needed retained types are mapped in, and as declarations.

The retained composite types needed in the dest module include not only
those reached by mapped subprograms, but also those in the scope-chain
of other needed composite types. Because the full composite type will
be emitted in the original module, only the type declaration is needed
in the dest module. Therefore, a new composite type is constructed for
the dest module, and only fields required for a declaration are
populated, and the FlagFwdDecl flag is set.

Different support is needed to detect the set of retained types
referenced by imported functions for normal metadata linking vs
postpass metadata linking. The former can simply look at which
retained types were added to the value map, whereas the latter needs to
recursively locate the retained types reached from needed subprograms
(which have not yet been mapped). This is similar to how we recursively
identify needed subprograms in the post-pass linking case.

The test cases were augmented to contain composite types referenced
in various ways, and check that the expected ones are imported as
type declarations.

Also tested by building 483.xalancbmk from SPEC with and without
postpass metadata linking. In both cases the size of the binary
was reduced around 28% with this patch.

It's a little unclear from your description - are you importing /any/ types as definitions? Or are all imported types being imported as declarations? (I think the latter would be good/correct/minimal/simpler)

test/Transforms/FunctionImport/funcimport_debug.ll
18

It might be clearer to match the retained types list (assuming all the types go in there, which I think they do - at least the ones using mangled names, etc) and then match all the types referenced from there, then you wouldn't need the CHECK-NOTs which are a bit brittle here anyway (if types are emitted in a different order, the CHECK-NOTs wouldn't catch them - they'd only catch the case where the type is emitted between the two surrounding CHECK'd types)

Just a thought. (& could check that things like vtableHolder, etc, aren't emitted in the declarations you're interested in)

It's a little unclear from your description - are you importing /any/ types as definitions? Or are all imported types being imported as declarations? (I think the latter would be good/correct/minimal/simpler)

I think so. At least for composite types. Are there any cases, other than from the retainedTypes list on the compile unit, where a DICompositeType node is referenced by its address rather than by UUID? I haven't seen any, which means that we would not import any while mapping in metadata on imported functions/instructions. In that case they are only imported as declarations with this patch, since they are only imported by the new linkImportedCompileUnit routine.

Note that there are DIDerivedTypes that are referenced from imported functions (e.g. via the subprogram type array) and therefore mapped in when the functions and instructions are imported and mapped. Is there a distinction between definition and declaration for these types?

test/Transforms/FunctionImport/funcimport_debug.ll
18

Good idea. I already found and fixed some issues where my existing CHECK-NOTs were not behaving as expected. Will do that.

tejohnson updated this revision to Diff 46311.Jan 28 2016, 1:51 PM

Explicitely cross-check retainedTypes list against DICompositeTypes
instead of CHECK-NOTs. Also, check full DICompositeType to ensure only
expected fields are present.

I think so. At least for composite types. Are there any cases, other than from the retainedTypes list on the compile unit, where a DICompositeType node is referenced by its address rather than by UUID?

Try types without a mangled name - I suppose one example might be something like:

a.cpp:
void b();
int main() {

b();

}

b.cpp:
namespace {
struct anon {
};
void bimpl(anon) {
}
}
void b() {

bimpl(anon());

}

If bimpl is imported into a (presumably after b is imported into a) then you'll see a type without a mangled name... what you should do with this is perhaps a difficult question. A debugger's going to have a hard time identifying these types (or even the two 'bimpl' functions) as the same when they're in two different object files & flagged as being "local" entities...

Confirmed that in this case, the composite type for anon is in the subroutine type list for bimpl, and that when we import bimpl into a.cpp we therefore import the full composite type definition.

It isn't clear to me from your last sentence what we should do here. Does having the full type definition imported make it easier or harder for the debugger?

Thanks,
Teresa

  • Dave

I just checked on one of our projects and my ThinLTO think time improves from 3m31 (211s) to 3m8 (188s), i.e. 10% improvement with the current version of the patch.
(note this is the full link time, including optimization and codegen!)

Ping. Any other comments on this patch or can it go in?

ahatanak added inline comments.
lib/Linker/IRMover.cpp
1395

While I was building a project using thin-LTO, I found out that the call to MapMetadata here causes an assert if CU-getSubprograms().get() returns null (the assert is triggered in MapMetadataImpl when isa<MDString> is called on a null ptr). Should we check whether CU->getSubprograms() is empty before calling the function? Mehdi also suggested that we can do the check inside MapMetadaImpl and have it return null if the first argument is null. That might be better if there are other call sites that passes a null pointer as the first argument.

tejohnson added inline comments.Feb 16 2016, 9:07 AM
lib/Linker/IRMover.cpp
1395

Yes, either of those sound like they would fix this. I am traveling all week and won't be able to make any changes until early next week. Will see if I can make the change in MapMetadataImpl then, otherwise here.

tejohnson updated this revision to Diff 48732.Feb 22 2016, 2:20 PM

Handle a null MD passed to MapMetadata to address problem reported by
ahatanak.

tejohnson updated this revision to Diff 48931.Feb 24 2016, 7:07 AM

Ensure retained types are imported exactly once, as declaration, and
used both in the retained types list and in any referencing scope chains
for other imported retained types. Expand test to cover this case.

tejohnson updated this revision to Diff 48960.Feb 24 2016, 10:03 AM

Add a new option to control full importing of types. By default import
only type decls instead of full defs.

Add a new option to control full importing of types. By default import
only type decls instead of full defs.

Let me know if this is too big of a hammer. For now it will essentially turn off this entire patch, meaning not only will we get the full type defs, but we also import the full dbg.cu (including enums, etc).

Ping.

Mehdi - any more issues in your internal testing? Does this look good now?
Thanks

Sorry, I haven't been able to conduct more testing on this one since your last update :(

With the flag that allow to disable this optimization is something goes wrong (or the platform debugger isn't happy after that), I wouldn't oppose to this patch.
I'd rather have someone familiar with debug info to approve this revision, but I'm also OK to iterate on it post-commit if we find other issues.
Any of Adrian, David, or Duncan can express an opinion here?

dblaikie added inline comments.Mar 10 2016, 3:16 PM
lib/Linker/IRMover.cpp
1386

I'm not sure this is true. Here's a type you might have to import that is not in the retained types list:

namespace {
struct foo { };
void f(foo) { }
}
struct bar { };
void f(bar) { }
void f() {
  f(foo());
  f(bar());
}

The 'foo' type is not in the retained types list, but it may need to be imported if 'f(foo)' is imported, yes?

1398

Pretty sure you don't need the base type, size, alignment, and offset in a declaration - check equivalent declarations that are generated by clang? (check that the declarations this code creates look like natural declarations in the original source language, etc)

1411

This name seems a bit misleading. If I'm reading the code correctly, this is used for any type's scope context - which might not be a type at all. (it could be a namespace, for example) - and I assume the non-type cases come out in the "map the type metadata normally" part?

1427

Name seems a bit off - this links in potentially multiple compile units, not just a single one.

1444

What's postpass metadata linking? (what's the alternative?)

1445

I'm not really following why we need to iterate the retained types list - wouldn't we import everything starting from the subprograms we needed to import, not the retained types list?

1534

What if nothing was imported from this CU?

It seems to me we'd want to start at the imported functions, follow their DISubprograms, and import that way, rather than walking all the CUs and subprograms - maybe?

tejohnson updated this revision to Diff 51343.Mar 22 2016, 2:35 PM
  • Address review comments from dblaikie.

Hey Teresa,

How will the changes we were discussing with Adrian yesterday on IRC will affect this patch?

I wonder if your approach here has still any gain with the new DebugInfo representation that Adrian and Duncan worked on?

In D16440#410434, @joker.eph wrote:

I wonder if your approach here has still any gain with the new DebugInfo representation that Adrian and Duncan worked on?

At the very least it is stale with Duncan's change in r267296 to eliminate remove MDString-based type references and Adrian's change in r266446 to remove the DISubprogramArray from the DICompileUnit.

For now the retainedTypes list still exists on the DICompileUnit, however, in Duncan's r267296 from this weekend he notes that the retainedTypes list is now unnecessary, so I assume he is removing that imminently?

Once the retainedTypes list is gone, most of this is unnecessary. However, this patch was also nulling out several other fields on the imported DICompileUnit, namely EnumTypes, GlobalVariables, ImportedEntities and Macros. What is the plan for those? Gping forward, should the DICompileUnit still be modified in some way when it is imported?