This is an archive of the discontinued LLVM Phabricator instance.

Macro support in LLVM IR
ClosedPublic

Authored by aaboud on Nov 15 2015, 5:39 AM.

Details

Summary

This patch introduce DIMacro and DIMacroFile debug info metadata in the LLVM IR to support macros.

Note: DIBuilder API for creating macro entries and emitting macro in dwarf sections will be committed in separate patches.

Diff Detail

Repository
rL LLVM

Event Timeline

aaboud updated this revision to Diff 40233.Nov 15 2015, 5:39 AM
aaboud retitled this revision from to Macro support in LLVM IR.
aaboud updated this object.
aaboud added reviewers: dblaikie, samsonov, echristo.
aaboud set the repository for this revision to rL LLVM.
aaboud added a subscriber: llvm-commits.

This seems to be missing any changes to LangRef.

Do we really need to add a new field to DICompileUnit? Why would the compile unit need to reference all the macros?

I don't understand why you need MacroNode separate from DIMacro. DIMacro seems to be the only subclass, so why split it? Also, is MacroNode meant for something other than debug info? If it's for debug info, it should have a DI-prefix. It should also inherit from DINode.

This seems to be missing any changes to LangRef.

If I understood you correctly, you are referring to IR documentation that I forgot to update according to these changes, right?

Do we really need to add a new field to DICompileUnit? Why would the compile unit need to reference all the macros?

If not in the DICompileUnit then where do you suggest the macro tree be referred from? its own named metadata?
According to dwarf, macro in ".debug_macinfo" section is referenced from compile-unit DIE (DW_TAG_compile_unit) using the DW_AT_macro_info attribute.

I don't understand why you need MacroNode separate from DIMacro. DIMacro seems to be the only subclass, so why split it? Also, is MacroNode meant for something other than debug info? If it's for debug info, it should have a DI-prefix. It should also inherit from DINode.

At first I wanted to use DINode as the subclass for both DIMacro and DIMacroFile, however, DINode assumes a DW_TAG_* enumerator while the macro is using different enumerator DW_MACINFO_*. For the same reason, the MacroNode cannot inherit from DINode.
Also, DIMacro cannot be the base class as it is different than DIMacroFile, and both needs a base class.
Regarding the name of MacroNode, I am fine with any name you pick, does DIMacroNode sound better?

If I understood you correctly, you are referring to IR documentation that I

forgot to update according to these changes, right?

Yes :).

[AA] OK I will do that when I upload a new patch.

I'm not speaking about this from the DWARF angle, I really know almost
nothing about DWARF. The IR is used to generate DWARF, but it doesn't
actually match the DWARF (from what I understand). We need to/have the
opportunity to figure things out on an as-needed-basis in the backend.

The problem with referring to something from the compile unit is that even
when the relevant code gets optimized out, we'll still have references to the
full debug info graph. This makes bugpoint reduction painful (the testcases
aren't actually reduced), causes scalability problems for LTO, and bloats the
debug info with descriptions that the end-user has no need for.

IMO, references from the IR should keep alive the actual macros that are
relevant in the emitted object code (after optimizations), and the rest should
just be dropped. This is the direction we're generally going with
subprograms (hoping to remove the subprograms array from the compile
unit soon).

However, I frankly don't understand the whole picture here, since you didn't
attach any changes to CFE for where macros get inserted into the code.

[AA] According to my understanding of macros, the compiler should pass them to the debug info as is. That is why they look the same all the way from the preprocessor till they are emitted in the object file.
Optimizations will change nothing in the macro debug info, and it is up to other debug info (such as line table) to hide or expose unreachable macros from the user.
In other words, you cannot optimize macros.
Also, macro debug info is related to the compile unit, so when we link two compile units, each will have its own macro debug info.
In DWARF5 there is a way to share encoding of macros between compile unites, but still each will have its own.

It would be great if somebody, who know better than me the macro debug info definition, can give his opinion on this point.

Ah, it doesn't use DW_TAG. Should we move DINode to DITagNode and
then create a common base class called DINode?

[AA] I thought that MDNode is the base class, why should we have another base class?
Notice, that I am not doing something new to LLVM debug info, we already have the DILocation and DIExpression classes that inherit from MDNode and not DINode.
So, why cannot we have also the DIMacroNode that inherits from MDNode and has nothing to do with DINode?

aaboud updated this revision to Diff 40644.Nov 19 2015, 7:25 AM

This new patch contains the following modifications:

  1. Name change: "MacroNode" -> "DIMacroNode"
  2. Added documentation for "DIMacro" and "DIMacroFile" to LangRef.

Note: I did not document "DIMacroNode" in the LangRef as it is not part of the LLVM IR textual representation, similar to "DINode" that is not mentioned either in the LangRef document.

Thanks for the comments.
Please, see my answers below.

docs/LangRef.rst
3696 ↗(On Diff #40644)

I assume these line endings are an artifact of the diff, and won't be committed?

If you mean these lines, then they will be committed, actually, this was in the rst file before my changes.
But, I think it will not be shown in the HTML LangRef document.

lib/Bitcode/Reader/BitcodeReader.cpp
2207 ↗(On Diff #40644)

The canonical place for macros: should really be *before* dwoId.
It's one of the set of arrays. Same probably makes sense for the C++ APIs. The most important thing is LangRef and AsmWriter. Please move it!

I tried that approach at first. However, it will not work, as it will break the backward compatibility.
There are some LIT tests that have 'dwoId' as record number 14, if I change the order these bitcode files will fail to load using the new bitcode reader.

Do you think that I should not care about backward compatibility and change the order back such that 'macros' appear before 'dwoId'?

aaboud added a comment.Dec 2 2015, 1:50 AM

Hi Alexey,
You asked me to do some changes, and I explained why I cannot do them.
Can you read my answers and let me know if you approve committing the patch like this or you still think I should do the changes?

Sorry for the delay, was away all last week and haven't made it back
through my inbox yet. I just saw your ping to Alexey.

samsonov edited edge metadata.Dec 2 2015, 10:39 AM

Hi Alexey,
You asked me to do some changes, and I explained why I cannot do them.
Can you read my answers and let me know if you approve committing the patch like this or you still think I should do the changes?

Me? What changes do you refer to?

aaboud added a comment.Dec 3 2015, 4:24 AM

Hi Alexey,
You asked me to do some changes, and I explained why I cannot do them.
Can you read my answers and let me know if you approve committing the patch like this or you still think I should do the changes?

Me? What changes do you refer to?

Sorry, I got confused :)
Duncan was the one who asked me to do the changes, he already answered my ping to you.

Any requirements for bitcode record order are unrelated to textual IR syntax and C++ API. Do what you have to do to read old bitcode, but that shouldn't block making the C++ API and textual IR natural.

I don't really care where the macros show up in the bitcode records, since there are relatively few humans that read those.

I will then do the right order in LLVM textual IR, i.e. "macros" before "dwoId", but will keep the other order in the bitcode to be able to be backward compatible.
Will upload a new version soon.

aaboud updated this revision to Diff 41878.Dec 4 2015, 7:40 AM
aaboud edited edge metadata.

Followed Duncan comment and moved "macros" field in DICompileUnit to be before "dwoId" field in all places (including API and textual LLVM IR) except bitcode, where "dwoId" is emitted before "macros" for backward compatibility.

The IR changes LGTM. Please wait for an LGTM from Adrian or David or Eric
(unless you've already had one?) for the broader direction.

The IR changes LGTM. Please wait for an LGTM from Adrian or David or Eric
(unless you've already had one?) for the broader direction.

Thanks Duncan.
David, can you approve this patch?

Thanks,
Amjad

aprantl edited edge metadata.Dec 8 2015, 8:51 AM

The only concern I've got with this is whether we shouldn't future-proof this by using the DWARF5-style DW_MACRO_.* constants internally, since the old ones aren't even defined in the DWARF5 standard document any more and thus might be harder to find documentation for.

Context: In DWARF5 ( http://dwarfstd.org/ShowIssue.php?issue=110722.1 ) the DW_MACINFO_.* constants will be renamed to DW_MACRO_*. This will be fun to implement in dwarfdump because the same numeric value has different names depending on the DWARF version.

aaboud added a comment.Dec 8 2015, 9:04 AM

The only concern I've got with this is whether we shouldn't future-proof this by using the DWARF5-style DW_MACRO_.* constants internally, since the old ones aren't even defined in the DWARF5 standard document any more and thus might be harder to find documentation for.

Context: In DWARF5 ( http://dwarfstd.org/ShowIssue.php?issue=110722.1 ) the DW_MACINFO_.* constants will be renamed to DW_MACRO_*. This will be fun to implement in dwarfdump because the same numeric value has different names depending on the DWARF version.

Thanks Adrian for the comment.
I understand your concern, but I think it is not related to this patch of macro support in LLVM IR.
I intentionally, split code into several parts, so the concern can be raised only on the problematic ones.

Do you think that this patch might not fit DWARF5?

probinson edited edge metadata.Dec 8 2015, 6:50 PM

The only concern I've got with this is whether we shouldn't future-proof this by using the DWARF5-style DW_MACRO_.* constants internally, since the old ones aren't even defined in the DWARF5 standard document any more and thus might be harder to find documentation for.

Older versions of the DWARF spec will remain available; dwarfstd.org has them back to 1.1. I'm not concerned about the constant names becoming unavailable.
On the contrary, the DWARF 5 constants aren't officially defined/published yet, and using them is speculative. I'd prefer to see DW_MACINFO for now, and add DW_MACRO later.

Context: In DWARF5 ( http://dwarfstd.org/ShowIssue.php?issue=110722.1 ) the DW_MACINFO_.* constants will be renamed to DW_MACRO_*. This will be fun to implement in dwarfdump because the same numeric value has different names depending on the DWARF version.

Yeah... reading the data from the object file will be unambiguous (different section name) but pretty printing it might get kind of tedious.

Thanks Adrian for the comment.
I understand your concern, but I think it is not related to this patch of macro support in LLVM IR.
I intentionally, split code into several parts, so the concern can be raised only on the problematic ones.

Do you think that this patch might not fit DWARF5?

Conveying the macro data from the frontend through the IR should work the same way for either, I'd guess. DWARF 5 will have differences in detail of emitting the data into the object file and that will probably have to be done separately anyway.

aaboud added a comment.Dec 9 2015, 6:42 AM

Thanks Paul for the comment.
This is what I was trying to say, I understand the concerns regarding DWARF5, but I am sure that once DWARF5 is official this design will easily be upgraded to support it.

Do you still think I need to change anything with this patch?
Otherwise, can you give me approval for committing?

Thanks,
Amjad

The only concern I've got with this is whether we shouldn't future-proof this by using the DWARF5-style DW_MACRO_.* constants internally, since the old ones aren't even defined in the DWARF5 standard document any more and thus might be harder to find documentation for.

Older versions of the DWARF spec will remain available; dwarfstd.org has them back to 1.1. I'm not concerned about the constant names becoming unavailable.
On the contrary, the DWARF 5 constants aren't officially defined/published yet, and using them is speculative. I'd prefer to see DW_MACINFO for now, and add DW_MACRO later.

Context: In DWARF5 ( http://dwarfstd.org/ShowIssue.php?issue=110722.1 ) the DW_MACINFO_.* constants will be renamed to DW_MACRO_*. This will be fun to implement in dwarfdump because the same numeric value has different names depending on the DWARF version.

Yeah... reading the data from the object file will be unambiguous (different section name) but pretty printing it might get kind of tedious.

Thanks Adrian for the comment.
I understand your concern, but I think it is not related to this patch of macro support in LLVM IR.
I intentionally, split code into several parts, so the concern can be raised only on the problematic ones.

Do you think that this patch might not fit DWARF5?

Conveying the macro data from the frontend through the IR should work the same way for either, I'd guess. DWARF 5 will have differences in detail of emitting the data into the object file and that will probably have to be done separately anyway.

aprantl accepted this revision.Dec 9 2015, 8:47 AM
aprantl edited edge metadata.

Since the numeric values are identical there are no binary compatibility issues and the constants in the assembler could be renamed later and auto-upgraded without too much effort.

thanks!

This revision is now accepted and ready to land.Dec 9 2015, 8:47 AM
This revision was automatically updated to reflect the committed changes.