This is an archive of the discontinued LLVM Phabricator instance.

Exposes interface to free up caching data structure in DWARFDebugLine and DWARFUnit for memory management
ClosedPublic

Authored by florinpapa on Oct 22 2020, 9:20 PM.

Details

Summary

This is minimum changes extracted from https://reviews.llvm.org/D78950. The old patch tried to add LRU eviction of caching data structure. Due to multiple layers of interfaces that users could be using, it was not clear where to put the functionality. While we work out on where to put that functionality, it'll be great to add this minimum interface change so that the user could implement their own memory management. More specifically:

  • Add a clearLineTable method for DWARFDebugLine which erases the given offset from the LineTableMap.
  • DWARFDebugContext adds the clearLineTableForUnit method that leverages clearLineTable to remove the object corresponding to a given compile unit, for memory management purposes. When it is referred to again, the line table object will be repopulated.

Diff Detail

Event Timeline

netforce created this revision.Oct 22 2020, 9:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 22 2020, 9:20 PM
netforce requested review of this revision.Oct 22 2020, 9:20 PM

The main motivation for going down the D78950 route was to ensure the code was testable (and preferably in a live codepath that other folks are using, etc) - so having some test coverage for this would still be necessary/highly preferred, though perhaps unit test coverage could be achieved/would suffice?

(looks like the patch description is incomplete? Seems there's a partial sentence at the end?)

One thing to note is that a line table can be shared by multiple dwarf units. This regularly happens with type units. Theoretically, compile units can share a line table too, though that would be a pretty unusual setup...

One thing to note is that a line table can be shared by multiple dwarf units. This regularly happens with type units. Theoretically, compile units can share a line table too, though that would be a pretty unusual setup...

FWIW, that does come up for LLVM when doing any kind of LTO + non-integrated assembler. (the assembly syntax doesn't provide a way to describe two distinct line tables, so both CUs end up having to refer to one line table)

Though in that case, using this API wouldn't break things, right? (unless they were being used concurrently, which would be problematic from the start - since the lazy-query/line table parsing API would race to begin with, I think?) It would mean that two CUs sharing a line table, if you invalidate one CU's line table (which is shared) then both/either CU, on its next line table query, would see a performance hit?

I guess the more involved alternative would be a reference counting scheme so that one could clear the line table from one CU and it would be a no-op unless all CUs sharing that line table had cleared it. I don't think that use case is necessary to support at the moment, so I'm fine with the more aggressive version that's implemented here.

labath added a comment.Nov 3 2020, 2:13 AM

One thing to note is that a line table can be shared by multiple dwarf units. This regularly happens with type units. Theoretically, compile units can share a line table too, though that would be a pretty unusual setup...

FWIW, that does come up for LLVM when doing any kind of LTO + non-integrated assembler. (the assembly syntax doesn't provide a way to describe two distinct line tables, so both CUs end up having to refer to one line table)

Interesting. I am moderately surprised that this does not cause any problems in the consumers. I don't know if this would actually break anything, but I believe it will cause lldb to (re)parse the line table for each CU that refers to it.

Though in that case, using this API wouldn't break things, right? (unless they were being used concurrently, which would be problematic from the start - since the lazy-query/line table parsing API would race to begin with, I think?) It would mean that two CUs sharing a line table, if you invalidate one CU's line table (which is shared) then both/either CU, on its next line table query, would see a performance hit?

Yeah, this should break anything. I don't think this needs to be a show-stopper, but I wanted to make sure you are aware of that and are fine with that kind of a performance hit.

I guess the more involved alternative would be a reference counting scheme so that one could clear the line table from one CU and it would be a no-op unless all CUs sharing that line table had cleared it. I don't think that use case is necessary to support at the moment, so I'm fine with the more aggressive version that's implemented here.

Another option would be to decouple line table and die clearing and expose separate interfaces for clearing each one -- thereby pushing the reference counting problem to the user.

One thing to note is that a line table can be shared by multiple dwarf units. This regularly happens with type units. Theoretically, compile units can share a line table too, though that would be a pretty unusual setup...

FWIW, that does come up for LLVM when doing any kind of LTO + non-integrated assembler. (the assembly syntax doesn't provide a way to describe two distinct line tables, so both CUs end up having to refer to one line table)

Interesting. I am moderately surprised that this does not cause any problems in the consumers. I don't know if this would actually break anything, but I believe it will cause lldb to (re)parse the line table for each CU that refers to it.

Yeah, could totally believe that. Might be unfortunate for a large full LTO binary built without the integrated assembler, but otherwise wouldn't come up.

Though in that case, using this API wouldn't break things, right? (unless they were being used concurrently, which would be problematic from the start - since the lazy-query/line table parsing API would race to begin with, I think?) It would mean that two CUs sharing a line table, if you invalidate one CU's line table (which is shared) then both/either CU, on its next line table query, would see a performance hit?

Yeah, this should break anything. I don't think this needs to be a show-stopper, but I wanted to make sure you are aware of that and are fine with that kind of a performance hit.

Fair, for sure - good to be aware/explicit about that.

I guess the more involved alternative would be a reference counting scheme so that one could clear the line table from one CU and it would be a no-op unless all CUs sharing that line table had cleared it. I don't think that use case is necessary to support at the moment, so I'm fine with the more aggressive version that's implemented here.

Another option would be to decouple line table and die clearing and expose separate interfaces for clearing each one -- thereby pushing the reference counting problem to the user.

*nod* I don't think this /probably/ merits that nuance, but certainly something to keep in mind.

@netforce is this still of interest? To you/anyone else? (just curious/not important to me personally)

Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2022, 7:20 PM
florinpapa commandeered this revision.Mar 24 2022, 7:23 PM
florinpapa added a reviewer: netforce.

Taking over this change, as it is still useful and I'd like to get it through.

Rebase against origin/master

dblaikie requested changes to this revision.Mar 25 2022, 1:33 PM

Based on offline discussion: It'd be nice to at least have unit testing for this functionality (if/since it won't be used by llvm-symbolizer or other llvm tools) & some comments in the unit test about why it exists.

This revision now requires changes to proceed.Mar 25 2022, 1:33 PM

Address comments

florinpapa edited the summary of this revision. (Show Details)May 17 2022, 2:53 PM

Update test comment.

dblaikie accepted this revision.May 17 2022, 3:22 PM

Looks OK otherwise

llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
402 ↗(On Diff #430193)

Could this be written as:

ASSERT_TRUE(ExpectedLineTable4);

? (similarly for other bool tests - in general, explicitly calling operator overloads is a bit "weird" and best avoided if possible - if there's an issue with gtest only testing implicit conversion, then maybe it'd be suitable to explicitly cast to bool: ASSERT_true((bool)ExpectedLineTable4))

This revision is now accepted and ready to land.May 17 2022, 3:22 PM

Avoid calling operator overloads in unit test.

florinpapa added inline comments.May 17 2022, 4:08 PM
llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp
402 ↗(On Diff #430193)

if there's an issue with gtest only testing implicit conversion, then maybe it'd be suitable to explicitly cast to bool: ASSERT_true((bool)ExpectedLineTable4))

How do I test that? I have been using ninja check-llvm-unit for testing so far, is that enough?

402 ↗(On Diff #430193)

I had to go with the latter, as the build failed without the explicit cast.

Perform rebase.

Hello, this change was approved. Could one of the reviewers help commit this, since I don't have permissions? The failing tests look like unrelated flakes, as they are timeouts and seem to fail differently after each rebase.

This revision was landed with ongoing or failed builds.May 23 2022, 8:23 PM
This revision was automatically updated to reflect the committed changes.