This is an archive of the discontinued LLVM Phabricator instance.

Add DITypeIndex for CodeView and emit it for locals
Needs ReviewPublic

Authored by rnk on Apr 18 2016, 1:43 PM.

Details

Summary

This is subclass of Metadata that essentially contains one 32-bit
integer. It can stand in as a DITypeRef or a DIScopeRef in certain
cases. If you attempt to resolve such a DITypeRef, you will get an
assertion failure.

Currently the indexes can only represent the simple types below 0x1000.
In the next patch I'll add support for type streams, and these indexes
will be able to represent compound types such as record and function
types.

Diff Detail

Event Timeline

rnk updated this revision to Diff 54111.Apr 18 2016, 1:43 PM
rnk retitled this revision from to Add DITypeIndex for CodeView and emit it for locals.
rnk updated this object.
rnk added reviewers: dexonsmith, aprantl, dblaikie, echristo.
rnk added a subscriber: llvm-commits.

Missing update to LangRef?

dblaikie added inline comments.Apr 18 2016, 1:56 PM
unittests/IR/MetadataTest.cpp
1407

X y = X(z); -> X y(z); perhaps?

and/or should DITypeRef be implicitly convertible from a DIType?

I like this. I think you're missing updates to ValueMapper.cpp, so
llvm-link will crash if you leave this as is. For the code there,
it would be best to treat DITypeIndex exactly like MDString.

It might be nice to have a single isa target for types that (1)
don't have operands and (2) never change address. Basically, things
like MDString and DITypeRef (and DIExpression should be moved
over, too...).

That'll keep down code duplication in ValueMapper; it'll probably be
useful in other places that don't really care about specific
semantics.

Maybe "ImmutableMetadata"?

rnk added a comment.Apr 18 2016, 2:20 PM

Maybe "ImmutableMetadata"?

I think the main property they share is that they have no metadata operands: they are leaves of the metadata graph. So, maybe LeafMetadata? MDLeaf? DILeaf?

unittests/IR/MetadataTest.cpp
1407

The primary TypedDINodeRef constructor is marked explicit, so I shied away from adding an implicit constructor. However, it is constructing from a generic Metadata*, and an implicit constructor from T* would be pretty safe.

aprantl added inline comments.Apr 18 2016, 2:22 PM
include/llvm/IR/DebugInfoMetadata.h
2391

Does CU here mean DICompileUnit or llvm::Module? It's probably better to spell it out.

lib/AsmParser/LLParser.cpp
4287

Comment update?
Mehdi already mentioned that this should also be in LangRef.rst.

lib/IR/DIBuilder.cpp
689

Are we loosing type safety through this change? Can you double-check that there is a Verifier check that ensures that Function->getType() resolves to a DISubroutineType?

lib/IR/Metadata.cpp
422

whitespace

lib/IR/Verifier.cpp
990

Ah there it is :-)

aprantl edited edge metadata.Apr 18 2016, 2:27 PM

Could you please also (doesn't have to be part of this patch) update SourceLevelDebugging.rst with information about LLVM's CodeView implementation?

rnk updated this revision to Diff 54121.Apr 18 2016, 2:45 PM
rnk edited edge metadata.
  • Add LangRef and a linker test case for DITypeIndex
aprantl added inline comments.Apr 18 2016, 2:59 PM
docs/LangRef.rst
4298 ↗(On Diff #54121)

Could you please spell out CU?
Should we mention that this is used to implement CodeView types?

aaboud added a subscriber: aaboud.Apr 19 2016, 12:22 PM

I am a little confused.

  1. Are you trying to replace DIBasicType?
  2. Will Clang be generating DITypeIndex for Windows target (COFF debug info) and DIBasicType/DICompositeType/DIDerivedType/DISubroutineType for Linux target (DWARF debug info)?
  3. You did not explain how you will represent the compound types using this DITypeIndex! 3.a. Who will set the index field in DITypeIndex for compound types? can clang FE do that? 3.b. Will CodeView emitter emit that index directly? what will happen if we do not want to emit some of the types, do not we need to rearrange the indices? 3.c. Will this work also for emitting a pdb file? can the compiler set these indices also when emitting types to pdb file instead of COFF object file?

I would appreciate if you can answer these questions.
Thanks,
Amjad

rnk marked 2 inline comments as done.Apr 19 2016, 2:20 PM

I am a little confused.

  1. Are you trying to replace DIBasicType?

Yes

  1. Will Clang be generating DITypeIndex for Windows target (COFF debug info) and DIBasicType/DICompositeType/DIDerivedType/DISubroutineType for Linux target (DWARF debug info)?

Yes

  1. You did not explain how you will represent the compound types using this DITypeIndex!

My plan is that we will have named metadata, like llvm.dbg.cvtypes, which will be a list of pairs of CUs and type streams. The type stream will be a simple MDString. This avoids having DICompileUnit reference the type streams directly, while still allowing us to associate the type stream with the CU. This is similar to how we associated subprograms with CUs currently.

Later we can add a specialized DI node to represent type streams (DITypeStream/Blob?) if we think it's worth it. It's not very efficient for the entire type stream to participate in MDString uniquing, for example, and printing a bunch of binary hex escapes is very ugly.

3.a. Who will set the index field in DITypeIndex for compound types? can clang FE do that?

Yes, the frontend will drive llvm::codeview::MemoryTypeTableBuilder to do this.

3.b. Will CodeView emitter emit that index directly? what will happen if we do not want to emit some of the types, do not we need to rearrange the indices?

Yes, if you want to change or drop some type information, your tool will need to be format-aware. LTO doesn't need to do this, it can simply concatenate the llvm.dbg.cvtypes lists. The backend can do the merging and renumbering when it emits .debug$T.

3.c. Will this work also for emitting a pdb file? can the compiler set these indices also when emitting types to pdb file instead of COFF object file?

I have no plans to implement MSVC /Zi-style PDB emission, where the compiler communicates with a PDB server process that brokers type indices. If you did this, I could easily imagine replacing the type stream with a sentinel that says "this type index refers to that PDB".

I *do* want clang to avoid duplicating type information when the type was defined in another module, similar to what we do for DWARF. For now if we detect that case, we can emit a forward declaration for the type in the type stream to give it a type index. In the future, we may want a way to represent a type index from some stream other than that of the current CU.

include/llvm/IR/DebugInfoMetadata.h
2391

The DICompileUnit. I want LTO to be functionally correct, if inefficient, in the presence of this data. For now, this means having two CUs and two type streams in one module, and fixing up the type indices in the backend. We can teach LTO to merge the type streams later.

Hi Reid,
Thanks for the answers, now as I understand more your plan, I have more questions:

  1. You want to use indices into list of MDString per CU to improve LTO linkage and reduce type footprint by improving merging types from different compile unites. right? 1.a. I claim that we can achieve that by doing a deep compare on current debug info metadata (based DWARF), do you agree?
  2. Regarding the MDString, it should be generated canonically, otherwise the compare will fail, right? 2.a. Can you give example on what it will contain for class types? 2.b. I claim that we can generate that MDString today by traveling on "DI*Type" entries (based dwarf), do you agree or you think that we already lost some information? 2.c. I assume you will still need to generate "DI*Type" entries for compound types (maybe with the index into the MDString list), as you cannot get all information from the MDString, can you? 2.d. Do you have estimation on how much size the "llvm.dbg.cvtypes" will consume? Can you assure we will end up reducing the IR size and not increasing it?
  3. If the idea is to reduce the size of the LTO IR, why we are not doing this also for DWARF, why just for CV types?

Thanks,
Amjad

rnk updated this revision to Diff 55255.Apr 27 2016, 10:25 AM
  • Rebase
dblaikie edited edge metadata.Apr 27 2016, 11:08 AM

Generally looks good - should there be more test coverage, though? At least testing one case of a type other than int being generated into the code view? (to demonstrate the new codepath, etc - it looks like the only things testing the CodeView type reference emission are testing the old "everything is int" implementation?)

lib/Bitcode/Writer/ValueEnumerator.cpp
605

That's probably going to trigger the parens precedence warning - I think you need to put the || x inside the original outer ()

lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
712–713

Is this a supported case? Sounds like if that happens it's corrupt debug info, no? (catch this in the verifier & then just assert that it'll always be present here?)

714

static_cast rather than C style cast?

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
590–591

Under what conditions would this fail? Is there a test case for it?

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
1209–1211

Can use 'auto*' on the LHS when the type is explicit in the cast on the RHS (similarly elswhere in this change) - style guide suggests this as acceptable, but not sure it explicitly favors it, up to you.

lib/IR/DebugInfoMetadata.cpp
573–574

Is this the first use of TypeAllocator for something other than types? Should it be renamed, or at least a comment added? (or a different allocator used?)

test/DebugInfo/COFF/local-variables.ll
285

Presumably this type should be pruned from all the codeview test cases now, as unreferenced metadata?

test/Linker/ditypeindex.ll
33–36 ↗(On Diff #55255)

Presumably you don't need to check that all this other stuff roundtrips - just the indexes, right? Your change didn't have anything to do with those other bits.

Oh, I see, these aren't CHECK lines, they're for mutating this file. - not sure if that's ideal/necessary. (& won't that mean this test "REQUIRES: shell"? Which is probably best avoided (I assume we generally try to avoid that, but don't know for sure), but should be added if you're going to keep the grep/sed/cat/etc)

I'd just write it as a second file - at least that's what we've usually done, I think.

unittests/IR/MetadataTest.cpp
84

Is this conversion implicit? (I think it is, but didn't entirely follow all the layers of macro goo, inheritance, etc)

unittests/Transforms/Utils/Cloning.cpp
422

Prefer copy init over direct init where both are valid:

DITypeRef DFuncType = DBuilder.createSubroutineType(ParamTypes);

(this avoids any accidental explicit conversions from being performed)
(same comment on the previous instance of the same construct in this file)

rnk marked 3 inline comments as done.Apr 27 2016, 12:47 PM

Generally looks good - should there be more test coverage, though? At least testing one case of a type other than int being generated into the code view? (to demonstrate the new codepath, etc - it looks like the only things testing the CodeView type reference emission are testing the old "everything is int" implementation?)

I changed local-variable.ll to have some 'unsigned' variables.

lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
712–713

IMO it is useful to allow users to emit codeview for IR that was generated with DWARF in mind. You'll still get line tables etc. Rejecting it in the verifier and asserting here would make that pretty hard.

714

You don't like C++ conversion-style casts? :)

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
590–591

The verifier checks that this is a DISubroutineType, DITypeIndex, or null. 'resolve' internally casts to DIType, so if you attempt to generate DWARF from IR with type indices, you will crash here. You will also crash at every other 'resolve' call site, though.

Do you think we should apply the same logic that we have on the CV side and have 'resolve' return nullptr for a DITypeIndex?

lib/IR/DebugInfoMetadata.cpp
573–574

Oops, I assumed it was used for DI types, not LLVM IR types... Well, they have the same lifetimes. I'm going to update the doc comment to indicate that they also live here.

test/Linker/ditypeindex.ll
33–36 ↗(On Diff #55255)

Fine. Personally, I find it convenient to be able to see the whole test case all at once, but it is not a common style and pretty confusing.

unittests/IR/MetadataTest.cpp
84

No, it is not. I drafted a change to make it implicit, but I think Duncan said he'd rather not do that.

unittests/Transforms/Utils/Cloning.cpp
422

Both instances need explicit conversion, though.

rnk added a comment.Apr 27 2016, 12:48 PM

Generally looks good - should there be more test coverage, though? At least testing one case of a type other than int being generated into the code view? (to demonstrate the new codepath, etc - it looks like the only things testing the CodeView type reference emission are testing the old "everything is int" implementation?)

Yeah, the linker test case has some non-int variables, so I felt covered, but it doesn't actually emit indices to an object file. I changed local-variable.ll to have some 'unsigned' variables so now that's covered.

rnk updated this revision to Diff 55298.Apr 27 2016, 12:49 PM
rnk edited edge metadata.
  • Address review comments
dexonsmith resigned from this revision.Oct 19 2020, 11:06 AM