This is an archive of the discontinued LLVM Phabricator instance.

Separate DIERef vs. user_id_t: m_function_scope_qualified_name_map
ClosedPublic

Authored by jankratochvil on Feb 14 2020, 12:48 PM.

Details

Summary

As discussed in D73206 there is both DIERef and user_id_t and sometimes (for DWZ) we need to encode Main CU into them and sometimes we cannot as it is unavailable at that point and at the same time not even needed.
I have also noticed DIERef and user_id_t in fact contain the same information which can be seen in SymbolFileDWARF::GetUID.
(Offtopic: Moreover UserID is also another form of user_id_t.)
SB* API/ABI is already using user_id_t and it needs to encode Main CU for DWZ. Therefore what about making DIERef the identifier not containing Main CU and user_id_t the identifier containing Main CU?
@labath also said:

I think it would be good to have only one kind of "user id". What are the cases where you need a main-cu-less user id?

This patch does that in a small scale, is it the proper way forward?
(Personally I think the non-MainCU and plus-MainCU forms should be the same, either both user_id_t or both DIERef - and to drop the other form completely - but that can be always easily refactored any time.)

Diff Detail

Event Timeline

jankratochvil created this revision.Feb 14 2020, 12:48 PM
jankratochvil retitled this revision from Unify DIERef vs. user_id_t to Separate DIERef vs. user_id_t: m_function_scope_qualified_name_map.Feb 16 2020, 6:43 AM

+ @JDevlieghere for a reason why it would be nice to have cross platform "debug map" test(s).

> I have also noticed DIERef and user_id_t in fact contain the same information which can be seen in SymbolFileDWARF::GetUID.

I am afraid the situation is not that simple. Your statement is true for all "elf" forms of dwarf (regular, dwo, dwp, type units, etc.), and also for mac dSYM files, but it is not true in case of mac "debug map" scenario. In this case, a user_id_t also encodes the symbol file / module ID. You can best see this in the function doing the opposite transformation (SymbolFileDWARF::DecodeUID), which returns a SymbolFileDWARF in addition to a DIERef.

The macos debug map works by creating multiple modules for each .o file, and them mangling them so that they appear to come from a single module. From five miles up, this is pretty similar to what "split dwarf" does, but it has one important distinction -- in split dwarf, the dwo files can only be interpreted together with the main executable file, which contains the linked portions of the debug info (addresses, mainly), while in case of a debug map, the .o files contain fully standalone dwarf, and the "relinking" that we do is outside of the scope of the dwarf spec and works by the linker leaving breadcrumbs about what it has done in a custom format.

For this reason the debug map, and the split dwarf features were implemented on different levels inside SymbolFileDWARF, and one of them is included in DIERef, while the other isn't. The main confusing part about this is that split dwarf creates multiple SymbolFile objects (SymbolFileDWARFDwo, SymbolFileDWARFDwp, SymbolFileDWARFDwoDwp :/), even though all of these things are really a part of a single SymbolFile object that happens to be spread across multiple files -- this is something I am working on fixing (first by removing the Dwp flavours), and why want to use your MainCU concept for split dwarf too (hopefully that would rid us of SymbolFileDWARFDwo).

The reason I am saying all of this is to illustrate why I think you can't make user_id_t and DIERef the same thing -- the former needs to be globally unique, whereas the latter is local to a single "symbol file dwarf" (with big quotes).

However, I am not sure what all of this says about this patch. In principle, I don't see a big problem with changing the type of this field. In fact, this field used to hold a DIERef until I changed that in D63322. However, there wasn't a strong reason for that -- I did it because it was a) convenient; b) more memory-efficient. We can change it back if it helps you in achieving your goal (and btw, thank you for doing this in small steps). It's just that currently it's not at all clear to me what that goal is. Maybe you could give a rough outline of where are you going with this. For example, how will the user_id_t decoding process look like in the end?

In fact, this field used to hold a DIERef until I changed that in D63322.

I did not expect that. OK, this D74637 + my D74690 try to just revert D63322.

I did it because it was
b) more memory-efficient.

Why? Both DIERef and user_id_t sizeof is 8.

Maybe you could give a rough outline of where are you going with this.

I am trying to reduce user_id_t usage as much as possible. And then to add MainCU to user_id_t (but no longer to DIERef). As construction of user_id_t with MainCU needs additional information no longer contained in DIERef it will need some additional parameter in the caller chain like I did in D73206. As the new parameter is used only for DWZ its addition is not much popular so it should be limited to as few cases as possible.

For example, how will the user_id_t decoding process look like in the end?

In my dwz branch the MainCU encoding is now combined for both DIERef and user_id_t but the plan is to encode it only into user_id_t. It still fits into 64-bits reusing the DWO field (which is never used together with DWZ). Current implementation:

I did it because it was
b) more memory-efficient.

Why? Both DIERef and user_id_t sizeof is 8.

Ah, sorry, I misremembered that (and confused DIERef with DWARFDIE). I think what happened is that at the time I was writing that patch, I was planning to increase the size of DIERef. But in the end, that did not materialize (we chose to drop the somewhat redundant cu_offset field instead).

Maybe you could give a rough outline of where are you going with this.

I am trying to reduce user_id_t usage as much as possible. And then to add MainCU to user_id_t (but no longer to DIERef). As construction of user_id_t with MainCU needs additional information no longer contained in DIERef it will need some additional parameter in the caller chain like I did in D73206.

Ok, this part makes sense. It's hard for me to evaluate the rest, as the code your linking to still assumes that the MainCU is stored in the DIERef, which you now say you want to change. Suppose these patches are accepted (let's call them tentatively accepted). What would be the next steps?

the MainCU is stored in the DIERef, which you now say you want to change.

Just some wording: If MainCU was in DIERef then MainCU needs to be also in DWARFDIE which means DWARFDIE must grow 16->24 bytes which you do not want. So transitively yes, I cannot store MainCU into DIERef.

Suppose these patches are accepted (let's call them tentatively accepted). What would be the next steps?

I find this patch as a NFC cleanup to the codebase - to satisfy a new premise user_id_t is used as little as possible and thus only for external interfaces which must not deal with MainCU in any way.
Its larger goal is to satisfy this item of the big DWZ plan you made:

I think it would be good to have only one kind of "user id". What are the cases where you need a main-cu-less user id?

Sure thanks for all the reviews.

labath accepted this revision.Feb 17 2020, 5:21 AM

Suppose these patches are accepted (let's call them tentatively accepted). What would be the next steps?

I find this patch as a NFC cleanup to the codebase - to satisfy a new premise user_id_t is used as little as possible and thus only for external interfaces which must not deal with MainCU in any way.

Ok, I think I can go with that. I wanted to get a feel for where your going (something mid-level plan between this patch, and the "big DWZ plan") to see where this is going, but I think we can accept this just on the basis that it is a revert of D63322, which turned out to be a dud. And the changes are not that big so it's not a big deal if we need to modify this.

This revision is now accepted and ready to land.Feb 17 2020, 5:21 AM
This revision was automatically updated to reflect the committed changes.