This is an archive of the discontinued LLVM Phabricator instance.

[codeview] Adding support for CodeView types
AbandonedPublic

Authored by aaboud on May 19 2016, 8:07 AM.

Details

Summary

Added support for CodeView Types.
Now type indices are auto generated.
The idea is:

  1. to lower types from LLVM DIType to CodeViewType during assembly processing (codeview symbol processing).
  2. assigned indices to all codeview types according to order they are going to be emitted in.
  3. emit the codeview types into .Debug$T section.

Note: This patch only adds support for Basic types and function type (and id).

Once this patch is committed, it will be relatively easy to support all other kind of types.

Few LIT tests were modified to be aligned with the changes and two LIT tests that covers basic types and function type were added.

Diff Detail

Event Timeline

aaboud updated this revision to Diff 57793.May 19 2016, 8:07 AM
aaboud retitled this revision from to [codeview] Adding support for CodeView types.
aaboud updated this object.
aaboud added a subscriber: llvm-commits.
zturner edited edge metadata.May 19 2016, 8:13 AM

I'll let someone else comment on the record emission.

include/llvm/DebugInfo/CodeView/TypeRecord.h
880

This looks wrong to me. It does not include the length of the name. Also, I would prefer if the 2 + were raised somehow, because it is common to every single type of CodeView record, and this logic of knowing that a type record has 2 header bytes should be common somewhere.

Also, each type record knows how to deserialize itself (see the deserialize method above), should we centralize the serialization logic here as well using a similar pattern?

majnemer added inline comments.May 19 2016, 9:09 AM
lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
812

This can't be right. You want to transform the index if the name has "long" somewhere in it?
I'd suggest matching against "long INT" explicitly.

838

Please use // for comments.

849

Pointers lean right.

861–862

Please use clang-format here, you have an abnormal amount of space between the pointee type and the star.

896

The LLVM style is just to use unsigned instead of unsigned int.

898–900

These extra braces are a little jarring.

902–917

I'd just do:

if (CodeViewType *CVType = getMappedCodeViewType(Type))
  return CVType;

switch (Tag) {
case dwarf::DW_TAG_base_type:
  return lowerTypeBasic(cast<DIBasicType>(Type));
case dwarf::DW_TAG_subroutine_type:
  return lowerTypeSubroutine(cast<DISubroutineType>(Type));
default:
  // TODO: Enable this assert once all types are implemented
  //assert(Tag != Tag && "Unhandled type Tag!");
  return nullptr;
}
914

Er, Tag != Tag? Perhaps you want llvm_unreachable ?

956–961

If there is a single path between a variable and it's initialization, we typically sink the definition to the initializer.

990–995

Ditto.

1134

unsigned

1143

unsigned

1154–1156

Please make this one line.

1172

llvm_unreachable?

lib/CodeGen/AsmPrinter/CodeViewDebug.h
152–154

Could this be DeleteContainerPointers(TypeTable) ?

rnk edited edge metadata.May 19 2016, 9:16 AM

Thanks Amjad! Adrian Mccarthy, who I added to the review, was also just getting started on this approach. At a high level, I'd like to keep as much CV serialization logic in lib/DebugInfo/CodeView as possible.

My thinking was that for every CV type record, we'd serialize the record in memory using MemoryTypeTableBuilder as we encounter DITypes. After translation, for each record we would call some new virtual MCStreamer method like EmitCVType. In MCObjectStreamer, this would delegate directly to EmitBytes. In MCAsmStreamer, this would do more work to try to make the output pretty. My initial idea is that it would emit each record as a sequence of 32-bit hex integers preceded by a big block comment generated by the same type dumper that llvm-readobj and llvm-pdbdump use.

Thoughts?

include/llvm/DebugInfo/CodeView/TypeRecord.h
880

We won't need to know the layout size if we use MemoryTypeTableBuilder or something like it.

lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
1095

Why not assign type indices as we lower types, and then maintain a mapping from DIType* to TypeIndex? That seems like it would be considerably more efficient for C++ programs with lots of types.

1181

I actually wanted to move away from serializing type records using the MCStreamer interface, because it completely duplicates the CV type record layout. Instead I was suggesting to Adrian Mccarthy that we use MemoryTypeTableBuilder.

We can preserve the pretty assembly printing functionality by calling the dumper on our serialized CV types and printing that as a comment before every binary record. If we do that, we should emit CV records as a series of 32-hex bit words instead of using "EmitBytes", which uses the .ascii directive, which is useless when trying to read the assembly.

lib/CodeGen/AsmPrinter/CodeViewType.h
1

Is there a reason you can't modify TypeRecord.h to make its types suitable for this?

dblaikie edited edge metadata.May 19 2016, 9:25 AM

Here are some basic thoughts - I'd probably wait for Reid to chime in before getting too deep into all this feedback. He might have some higher level feedback, outstanding patches, etc, to worry about.

include/llvm/DebugInfo/CodeView/TypeRecord.h
880

Where's the 2 come from? Could use a comment.

& if this size is being used as part of the output format, you'd probably need to make sure the Layout struct is packed, etc. But might be better not to rely on that at all, if you can help it.

lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
257

We usually skip {} on single line blocks (general code review feedback, not just this instance)

874–881

Looks like you could skip the CVType variable and just use VoidType directly. (& I might consider inverting the condition/returning early to reduce indentation)

902

Sink the variable to its declaration (& that'll also avoid a -Wparentheses warning here)

906–907

Sink the variable declaration into the switch condition. Oh, then the variable would be unused - remove the variable and just "switch (Type->getTag())"

931–932

Probably just sink these two variables into their only use - their initialization expressions are pretty self explanatory.

956–957

I don't /think/ we usually do this sort of alignment, but I could be wrong.

970–971

Formatting ("} else {"). Probably just run the whole thing through clang-format.

1004

Prefer !empty over size in this context

1040–1042

Move declarations to their first use or otherwise as close as possible (just above the if/else for CVType, for example). Reduces the scope/makes it easier to understand their use, etc. (general feedback, not just this instance)

Actually you could probably skip CVType entirely and just return:

if (...)
  return nullptr;
return lowerType(Type);
1048–1050

Then roll this declaration into the if:

if (const auto *ClassType = dyn_cast<DIType>(...))

Oh, the variable's unused - use isa<DIType> instead.

1079–1082

Maybe use the conditional operator here

1122

Probably more efficient to emit the original bytes, then emit a null byte - rather than copying the whole string, adding a null terminator, then emitting that.

1126–1127

I think we have some utility functions for computing teh nearest alignment, which might make this easier to read.

1182–1189

Not sure if this is idiomatic of other parts of the file - but mostly I'd just sink all these variables into their only use (especially since there's the "addComment" line above each emission to indicate what's being emitted in case the expression's not self documenting enough. (so adding a variable name doesn't necessarily add lots of value)

Goes for the other emit functions below.

lib/CodeGen/AsmPrinter/CodeViewDebug.h
108

Could you use a MapVector to keep the vector and map together. Or, alternatively, to save space, you could create a temporary vector at the point of emission, and sort based on the type id.

152–154

Should TypeTable contain unique_ptrs instead of raw pointers - then this manual memory management wouldn't be needed?

If it is needed, you can use DeleteContainerPointers to do this work.

lib/CodeGen/AsmPrinter/CodeViewType.cpp
1

Pretty much all the functions in this file are trivial enough to be defined inline in the header.

Though the virtual dtors do need to be out of line for the weak vtable style guide requirement/recommendation.

lib/CodeGen/AsmPrinter/CodeViewType.h
47

Any reason for this to be a virtual hierarchy if it has no other virtual functions? Should it have some?

65

Seems weird for this to be protected. Nothing else is derived from this type so maybe make it private? (same point for other CodeViewType derived classes)

Also, the naming scheme might be beter as "CodeView<X>Type" rather than "CodeViewType<X>" (eg: CodeViewBasicType here)

153

Please use 'override' where applicable.

157–158

Should return by reference, not pointer, I think?

160

Probably skip this if it just returns getArgumentList().size() ?

test/DebugInfo/COFF/basic-types.ll
20–21

Pretty sure you don't need uses of variables to cause them to be emitted at -O0, right?

test/DebugInfo/COFF/function-type.ll
6–8

There's a function type you could test in the basic-types test case, perhaps? (not a hard requirement - it's a tricky tradeoff between multiple mini tests in a single test case, versus pulling things out into entirely separate test cases to make it simpler/more obvious)

Hi all,
Thanks a lot for all your valuable comments.

Reid, I agree with you that we can reuse TypeRecord instead of introducing a new CodeViewType.
I also understand your design and approach. And I am willing to refactor my patch to suit them.

However, I must mention two concerns:

  1. Can we be sure that we can generate the correct index for each type record during creation of that record?
    • In some cases you want to revisit the type record you generated to fix some of its attributes (like the attribute "ClassOptions::ContainsNestedClass").
  2. PDB support
    • I know that you do not want to support PDB, but we must keep it into consideration.
    • When emitting types to PDB we do not know the index of each type record before we actually emit the record.
    • In the current design we create the type record index when we write the record (before we emit it to the object/pdb file).

Once again, the approach you have in mind can work if we solve the above two issues.
One way to solve them is to allow fixing data in the written record, although it would be better if we can delay the creation of the record buffer till we know the final value of the data.

Any thoughts?

lib/CodeGen/AsmPrinter/CodeViewDebug.h
108

It will not work once I add support of class types.
Because the map will have two dimensions, [Class-Type][DIType] -> CodeViewType.
And the order we want to save in the vector will get lost in this two dimentson map.

test/DebugInfo/COFF/function-type.ll
6–8

You are right, but I wanted to have a better test for LF_ARGLIST, the function in the basic-types test is very simple.
Also, having a test for each type rather than having one test for all types will help understanding the reason for the failure faster.
Finally, we do not have a lot of types, something around 10, is that too much for tests?

Notice that I am not trying to have a test for each CodeView type entry, and will merge some entries in same test where it make sense.

rnk added a comment.May 23 2016, 9:27 AM

Hi all,
Thanks a lot for all your valuable comments.

Reid, I agree with you that we can reuse TypeRecord instead of introducing a new CodeViewType.
I also understand your design and approach. And I am willing to refactor my patch to suit them.

Great!

However, I must mention two concerns:

  1. Can we be sure that we can generate the correct index for each type record during creation of that record?
    • In some cases you want to revisit the type record you generated to fix some of its attributes (like the attribute "ClassOptions::ContainsNestedClass").

I guess the issue is that classes don't contain their nested classes, instead the nested class points back at the parent record through its scope field. I think here we've found the first point where we'll need to extend the existing metadata. Consider the IR metadata generated for this example:

struct A {
  int a;
  struct B { int b; };
};
A a;
//A::B b;

Today the metadata we generate does not mention B at all, and that is a problem we're going to have to fix if we want our type records to be bitwise identical across TUs. I think we should change clang to mention a forward declaration of B in the elements list of A, and then we don't need to go back and patch up existing type records.

  1. PDB support
    • I know that you do not want to support PDB, but we must keep it into consideration.
    • When emitting types to PDB we do not know the index of each type record before we actually emit the record.
    • In the current design we create the type record index when we write the record (before we emit it to the object/pdb file).

I assume the way we support types in PDBs is by doing synchronous IPCs or file IO at the point of the call to TypeTableBuilder::writeRecord. Is that not similar to what MSVC does?

Once again, the approach you have in mind can work if we solve the above two issues.
One way to solve them is to allow fixing data in the written record, although it would be better if we can delay the creation of the record buffer till we know the final value of the data.

Any thoughts?

Hi Reid,

Today the metadata we generate does not mention B at all, and that is a problem we're going to have to fix if we want our type records to be bitwise identical across TUs. I think we should change clang to mention a forward declaration of B in the elements list of A, and then we don't need to go back and patch up existing type records.

I wanted to show you that the current debug info metadata contain (almost) everything we need to generate codeview types and symbols.
The reason you keep thinking that we need to change the metadata is that you decided on this implementation direction.

I assume the way we support types in PDBs is by doing synchronous IPCs or file IO at the point of the call to TypeTableBuilder::writeRecord. Is that not similar to what MSVC does?

If we will end up emitting the type record to the pdb file at "TypeTableBuilder::writeRecord", then I assume we will get the correct index after all, but can we do that when supporting pdb files? and if so, why not do that for COFF files two?


Finally, one minor issue regarding debuggablity and testing, are we fine to get ".debug$T" section printed in assembly mode like this, instead of have each "Type record data" string split and explained by field?

.section        .debug$T,"dr"
.long   4                       # Debug section magic
.short  6                       # Type record length
.asciz  "\001\022\000\000\000"  # Type record data
.short  14                      # Type record length
.asciz  "\b\020\003\000\000\000\000\000\000\000\000\020\000" # Type record data
.short  12                      # Type record length
.asciz  "\001\026\000\000\000\000\001\020\000\000f" # Type record data
rnk added a comment.May 24 2016, 9:43 AM

I wanted to show you that the current debug info metadata contain (almost) everything we need to generate codeview types and symbols.
The reason you keep thinking that we need to change the metadata is that you decided on this implementation direction.

Well, I think my example shows that we *must* change our type emission to at least mention A::B. I also think it would be convenient to make A reference A::B, so that we don't have to do a complete module pass over all DIType nodes.

I assume the way we support types in PDBs is by doing synchronous IPCs or file IO at the point of the call to TypeTableBuilder::writeRecord. Is that not similar to what MSVC does?

If we will end up emitting the type record to the pdb file at "TypeTableBuilder::writeRecord", then I assume we will get the correct index after all, but can we do that when supporting pdb files? and if so, why not do that for COFF files two?

I'm not sure I understand the question, but yes, I claim we can get correct type indices at the point of the writeRecord call. We can do it today when building type info streams in memory, and we can do it in the future by looking the type up in the PDB on-disk type hash table.

Finally, one minor issue regarding debuggablity and testing, are we fine to get ".debug$T" section printed in assembly mode like this, instead of have each "Type record data" string split and explained by field?

I'd like to improve on this by doing the following, as mentioned previously: "... emit each record as a sequence of 32-bit hex integers preceded by a big block comment generated by the same type dumper that llvm-readobj and llvm-pdbdump use."

zturner resigned from this revision.Jun 2 2016, 8:38 PM
zturner removed a reviewer: zturner.
aaboud abandoned this revision.Jul 27 2016, 12:04 PM

Most of the feature in this revision was already committed in a way or another into LLVM.