This is an archive of the discontinued LLVM Phabricator instance.

[Debuginfo][DWARF][NFC] Add paired methods working with DWARFDebugInfoEntry.
ClosedPublic

Authored by avl on May 20 2022, 3:52 AM.

Details

Summary

This review is extracted from D96035.

DWARF Debuginfo classes have two representations for DIEs: DWARFDebugInfoEntry
(short) and DWARFDie(extended). Depending on the task, it might be more convenient
to use DWARFDebugInfoEntry or/and DWARFDie. DWARFUnit class already has methods
working with DWARFDie and DWARFDebugInfoEntry. This patch adds more
methods working with DWARFDebugInfoEntry to have paired functionality.

Diff Detail

Event Timeline

avl created this revision.May 20 2022, 3:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2022, 3:52 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
avl requested review of this revision.May 20 2022, 3:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2022, 3:52 AM

I would actually request that no one can play with DWARFDebugInfoEntry items at all except for the DWARFDie class. The main issue is you can ask the wrong DWARFUnit class to do something with a DWARFDebugInfoEntry object that it doesn't own. We ran into this problem in LLDB and this is why we created the DWARFDie class in LLDB. If you encapsulate everything into the DWARFDie class and control all entry points, then this mistake can not be made since everyone needs to use the DWARFDie class. If you do want it add it, I would suggest adding an assert that verifies that the DWARFDebugInfoEntry belongs to the current DWARFUnit in each new function that takes a "DWARFDebugInfoEntry *", but I would still rather see everyone use DWARFDie if possible for safety reason and to avoid having an assert in each call.

avl added a comment.May 21 2022, 12:12 AM

I would actually request that no one can play with DWARFDebugInfoEntry items at all except for the DWARFDie class. The main issue is you can ask the wrong DWARFUnit class to do something with a DWARFDebugInfoEntry object that it doesn't own. We ran into this problem in LLDB and this is why we created the DWARFDie class in LLDB. If you encapsulate everything into the DWARFDie class and control all entry points, then this mistake can not be made since everyone needs to use the DWARFDie class. If you do want it add it, I would suggest adding an assert that verifies that the DWARFDebugInfoEntry belongs to the current DWARFUnit in each new function that takes a "DWARFDebugInfoEntry *", but I would still rather see everyone use DWARFDie if possible for safety reason and to avoid having an assert in each call.

The reason to use DWARFDebugInfoEntry is the performance concern.
The non optimized DWARF for clang binary has more than 160 million dies:

llvm-dwarfdump --debug-info clang | grep -c DW_TAG 
162452994

There are a many methods which create dies, f.e.:

DWARFDie getDIEAtIndex(unsigned Index) {
  return DWARFDie(this, getDebugInfoEntry(Index));
}

DWARFDie getDIEForOffset(uint64_t Offset) {
  if (Optional<uint32_t> DieIdx = getDIEIndexForOffset(Offset))
    return DWARFDie(this, &DieArray[*DieIdx]);

  return DWARFDie();
}

...

these methods are called for each die more than once, this causes the DWARFDie class constructor to be called a very large number of times.
While using DWARFDebugInfoEntry is cheap. They are allocated once at DieArray and later are passed as the pointers(No need to call constructor).

Other than that, passing DWARFDie by value leads to copying more data - 16 bytes instead of 8 bytes for DWARFDebugInfoEntry*.

All above lead to noticable performance impact.

If we can agree on using asserts to check that the proper unit is used(as you suggested) then I propose to use that approach.
Will update this review to use assertions checking DWARFUnit.

You mention performance concerns - even in an optimized build? What sort of performance data do you have? Would be good to have that documented here in the review/in the commit message to justify the use of these more error-prone APIs.

avl added a comment.May 23 2022, 12:03 PM

You mention performance concerns - even in an optimized build?

yes, the performance improvement could be achieved even in an optimized build.

What sort of performance data do you have?

When I replaced ~50% usages of DWARFDie with DWARFDebugInfoEntry for D96035 I got 1-2% improvement.

A 1 to 2 percent improvement is probably going to go away if the asserts are enabled when it has to do the check each time. If would still hesitate to add this kind of API as it will encourage others to use the APIs. I would be ok with it if the methods are protected or private and if we can add the DWARFLinker as a friend class to allow access to these performant but dangerous APIs only in certain situations that really require it. I really want to avoid most people from being able to use these APIs if possible to make sure we don't end up with bugs. We had many many bugs that stemmed from mismatching DWARFUnit + DWARFDebugIntoEntry before the DWARFDie class was created, mostly when doing cross CU references or in Fission cases where you have the skeleton CU vs the actual DWO CU + a DWARFDebugInfoEntry that comes from the actual DWO file. So there is a real danger to the safety and correctness of the code here.

avl added a comment.May 23 2022, 2:17 PM

A 1 to 2 percent improvement is probably going to go away if the asserts are enabled when it has to do the check each time. If would still hesitate to add this kind of API as it will encourage others to use the APIs. I would be ok with it if the methods are protected or private and if we can add the DWARFLinker as a friend class to allow access to these performant but dangerous APIs only in certain situations that really require it. I really want to avoid most people from being able to use these APIs if possible to make sure we don't end up with bugs. We had many many bugs that stemmed from mismatching DWARFUnit + DWARFDebugIntoEntry before the DWARFDie class was created, mostly when doing cross CU references or in Fission cases where you have the skeleton CU vs the actual DWO CU + a DWARFDebugInfoEntry that comes from the actual DWO file. So there is a real danger to the safety and correctness of the code here.

if all of these bugs could be caught by assertions then the problem probably does not look too severe?
Running debug version of the application will show the problems.

Anyway, if we do not want these methods to be used by others - I am OK to make these methods protected and make DWARFLinker the only one using them.

A 1 to 2 percent improvement is probably going to go away if the asserts are enabled when it has to do the check each time. If would still hesitate to add this kind of API as it will encourage others to use the APIs. I would be ok with it if the methods are protected or private and if we can add the DWARFLinker as a friend class to allow access to these performant but dangerous APIs only in certain situations that really require it. I really want to avoid most people from being able to use these APIs if possible to make sure we don't end up with bugs. We had many many bugs that stemmed from mismatching DWARFUnit + DWARFDebugIntoEntry before the DWARFDie class was created, mostly when doing cross CU references or in Fission cases where you have the skeleton CU vs the actual DWO CU + a DWARFDebugInfoEntry that comes from the actual DWO file. So there is a real danger to the safety and correctness of the code here.

if all of these bugs could be caught by assertions then the problem probably does not look too severe?
Running debug version of the application will show the problems.

Anyway, if we do not want these methods to be used by others - I am OK to make these methods protected and make DWARFLinker the only one using them.

I personally would be fine if there were guardrails on this feature where we make them protected, though I am not a code owner the DWARF parser in LLVM, so some input from the LLVM DWARF community would be good to verify they agree with this direction.

A 1 to 2 percent improvement is probably going to go away if the asserts are enabled when it has to do the check each time.

Though the asserts would only be done in +Asserts builds - still providing the performance in release/non-asserts builds.

If would still hesitate to add this kind of API as it will encourage others to use the APIs. I would be ok with it if the methods are protected or private and if we can add the DWARFLinker as a friend class to allow access to these performant but dangerous APIs only in certain situations that really require it. I really want to avoid most people from being able to use these APIs if possible to make sure we don't end up with bugs. We had many many bugs that stemmed from mismatching DWARFUnit + DWARFDebugIntoEntry before the DWARFDie class was created, mostly when doing cross CU references or in Fission cases where you have the skeleton CU vs the actual DWO CU + a DWARFDebugInfoEntry that comes from the actual DWO file. So there is a real danger to the safety and correctness of the code here.

Fair enough. I don't feel super strongly about it/sounds OK to me.

avl updated this revision to Diff 433436.Jun 1 2022, 9:46 AM

addressed comments: make methods working with DWARFDebugInfoEntry to be protected.

Looks good to me, but someone that owns the DWARF parser in LLVM should give the final ok.

aprantl accepted this revision.Jul 26 2022, 3:05 PM
This revision is now accepted and ready to land.Jul 26 2022, 3:05 PM
avl updated this revision to Diff 448579.Jul 29 2022, 4:30 AM

rebased.