This is an archive of the discontinued LLVM Phabricator instance.

Adds LRU caching of compile units in DWARFContext.
Needs ReviewPublic

Authored by netforce on Apr 27 2020, 11:34 AM.

Details

Summary

This adds LRU caching of compile units in DWARFContext to reduce memory consumption of llvm-symbolizer. When llvm-symbolize symbolizes addresses from various compile units, it keeps the internal data structure (e.g. DIE and line table) for all relevant compile units until it finishes running. This leads to memory bloat when symbolizing many addresses. The memory usage can be limited by only keeping a fixed number of compile units in the memory.

This might make it longer to run because sometimes it has to parse the same compile unit as it's kicked out of memory. For better performance, LRU caching is used to keep the recently used compile units in the memory while kicking out the ones not used recently.

Confirmed this reduces the memory usage significantly (1.3GB -> 441MB when symbolizing a clang binary).
$ nm clang | sort -n | gawk '{printf "0x%s\n", $1}' | /usr/bin/time -f "RSS: %M KB\nexecution_time: %E\n" ./llvm-symbolizer-before -a -demangle -obj clang > /dev/null
RSS: 1314996 KB
execution_time: 0:04.11

$ nm clang | sort -n | gawk '{printf "0x%s\n", $1}' | /usr/bin/time -f "RSS: %M KB\nexecution_time: %E\n" ./llvm-symbolizer-after -a -demangle -obj clang > /dev/null
RSS: 441300 KB
execution_time: 0:08.15

Using Valgrind/Massif we could also confirm it limits the memory usage growth while it keeps growing without the change.
$ nm clang | sort -n | gawk '{printf "0x%s\n", $1}' | /usr/bin/valgrind --tool=massif --massif-out-file=massif.out.before ./llvm-symbolizer-before -a -demangle -obj clang > /dev/null
$ ms_print massif.out.before
`--------------------------------------------------------------------------------
Command: ./llvm-symbolizer-before -a -demangle -obj clang
Massif arguments: --massif-out-file=massif.out.before

ms_print arguments: massif.out.before

GB

1.208^ :

  |                                                                  @@@#::
  |                                                               @@@@@@#::
  |                                                            @@@@@@@@@#::
  |                                                        @@@@@@@@@@@@@#::
  |                                                     ::@@ @@@@@@@@@@@#::
  |                                                 ::::::@@ @@@@@@@@@@@#::
  |                                              @@@::: ::@@ @@@@@@@@@@@#::
  |                                           @@@@@@::: ::@@ @@@@@@@@@@@#::
  |                                       @@@@@@ @@@::: ::@@ @@@@@@@@@@@#::
  |                                    @@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |                                ::@:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |                              ::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |                         :@:::::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |                     @@@::@:: ::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |                   @@@ @::@:: ::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |                @@:@@@ @::@:: ::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |           @@@@@@@ @@@ @::@:: ::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |       ::@@@ @@ @@ @@@ @::@:: ::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
  |    @@@: @@@ @@ @@ @@@ @::@:: ::: @:@@@@@ @@@ @@@::: ::@@ @@@@@@@@@@@#::
0 +----------------------------------------------------------------------->Gi
  0                                                                   19.40

...`

$ nm clang | sort -n | gawk '{printf "0x%s\n", $1}' | /usr/bin/valgrind --tool=massif --massif-out-file=massif.out.after ./llvm-symbolizer-after -a -demangle -obj clang > /dev/null
$ ms_print massif.out.after
`--------------------------------------------------------------------------------
Command: ./llvm-symbolizer-after -a -demangle -obj clang
Massif arguments: --massif-out-file=massif.out.after

ms_print arguments: massif.out.after

MB

97.32^ #

  |              #                                                         
  |              #                                                         
  |              #          :                        ::                    
  |              #          :                        :                     
  |              #          :                        :                     
  |              #          :                        :                     
  |              #          :           @@:          :             :       
  |              #          :           @ :          : :           :       
  |             :#          :           @ :          : :           :       
  |          :  :#          :        :  @ :        : : :        @  :       
  |          :  :#      @@  :::      :: @ :        ::: :        @:::       
  |          ::::#      @   ::       :: @ :      ::::: :      ::@: :   :   
  |     : :  :: :#@@   :@ :::: : : :::::@ ::  :::: ::: : :    ::@: ::: :  :
  | ::::::::::: :#@ ::::@ : :: ::: : :::@ ::@::: : ::: :::::::::@: :::@::::
  | :: ::::: :: :#@ :: :@ : :: :::@: :::@ ::@::: : ::: :::::: ::@: :::@::::
  | :: ::::: :: :#@ :: :@ : :: :::@: :::@ ::@::: : ::: :::::: ::@: :::@::::
  | :: ::::: :: :#@ :: :@ : :: :::@: :::@ ::@::: : ::: :::::: ::@: :::@::::
  | :: ::::: :: :#@ :: :@ : :: :::@: :::@ ::@::: : ::: :::::: ::@: :::@::::
  | :: ::::: :: :#@ :: :@ : :: :::@: :::@ ::@::: : ::: :::::: ::@: :::@::::
0 +----------------------------------------------------------------------->Gi
  0                                                                   49.33

...`

Diff Detail

Event Timeline

netforce created this revision.Apr 27 2020, 11:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2020, 11:34 AM

Do you think this is always a win, i.e., also for LLDB, or should this be done in a higher layer?

netforce added a comment.EditedApr 27 2020, 5:47 PM

Do you think this is always a win, i.e., also for LLDB, or should this be done in a higher layer?

In terms of memory usage, this will always make it smaller or equal to the current implementation except the small amount used for the LRU data structure. It could increase execution time or CPU usage, since it is possible that some compile units need to be parsed multiple times due to the cache eviction. How much it is affected depends on the access pattern of the information. In our use case of llvm-symbolizer and our own symbolization library, it could save huge memory with reasonable degradation of CPU usage.

Since DWARFContext is used as entry point of the API, I thought it makes sense to place the caching functionality there. Although I'm not very familiar with LLDB, if this hurts the performance of it, it can opt out of using the caching by giving negative LRUSize when creating DWARFContext.

In case reviewers agree that it makes more sense to put it in a higher layer, I believe that's possible too after making a couple private member functions public.

Is it possible to add some tests for the LRU logic?

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
117–124

Can this be a separate utility class? No need to overburden DWARFContext.

321

Does this method need to be in the public interface of the class? Right now, it looks like an implementation detail that should be hidden from users.

402

It looks like this should also be private, no?

Do you think this is always a win, i.e., also for LLDB, or should this be done in a higher layer?

Whether this is a win or not will definitely depend on the usage patterns, and it's very hard to know that for sure before trying it out on a specific use case. My gut feeling would be that this wouldn't help lldb's memory usage much because lldb already stores so much other state, but it may have a big (negative) impact on cpu usage. However, given that this functionality can be turned off (thought that could be better documented) I don't think we need to burden ourselves with that too much with what would happen with lldb in particular.

That said, I'm not really sure what to make of this patch. What was they criterion for choosing where to place the referCompileUnit calls ? Right, now they seem to be present on a couple of high-level APIs, but those functions are definitely not the only way to access dwarf units. Indeed, once you start thinking about the lower level APIs, things start to get a lot more fuzzier. And dangerous -- for instance, this makes it very easy to turn a perfectly valid DWARFDie object into a landmine if some operation happens to "garbage-collect" the DWARFDebugInfoEntry object that it refers to.

I think that example would definitely speak for a separate layer with a high level api, which offers only a limited (and more controlled) way of accessing the information in the dwarf file. How feasible is that in the current situation -- I don't know.

Although I'm not very familiar with LLDB, if this hurts the performance of it, it can opt out of using the caching by giving negative LRUSize when creating DWARFContext.

I think it's important to note here that lldb does not use (most of) llvm dwarf parsing code right now -- it has it's own semi-forked versions of a lot of stuff. However, we are looking into making it reuse more (all?) of llvm dwarf parsing code, which is why it is important to ensure the interface stays flexible enough.

Is it possible to add some tests for the LRU logic?

When we decide where to put caching, I'll see if I can add some unit tests.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
117–124

Once we determine which layer we place this LRU caching first, I'll revisit this and see if it would be cleaner to use a separate utility class.

321

I'll revisit this too when we decide which layer to put the caching. For the current implementation, it seems it doesn't need to be public.

402

Again, I'll revisit this when we decide which layer to put the caching. For the current implementation, we need this to be public as we have our symbolization tool directly accessing the compile units.

Do you think this is always a win, i.e., also for LLDB, or should this be done in a higher layer?

Whether this is a win or not will definitely depend on the usage patterns, and it's very hard to know that for sure before trying it out on a specific use case. My gut feeling would be that this wouldn't help lldb's memory usage much because lldb already stores so much other state, but it may have a big (negative) impact on cpu usage. However, given that this functionality can be turned off (thought that could be better documented) I don't think we need to burden ourselves with that too much with what would happen with lldb in particular.

That said, I'm not really sure what to make of this patch. What was they criterion for choosing where to place the referCompileUnit calls ? Right, now they seem to be present on a couple of high-level APIs, but those functions are definitely not the only way to access dwarf units. Indeed, once you start thinking about the lower level APIs, things start to get a lot more fuzzier. And dangerous -- for instance, this makes it very easy to turn a perfectly valid DWARFDie object into a landmine if some operation happens to "garbage-collect" the DWARFDebugInfoEntry object that it refers to.

I think that example would definitely speak for a separate layer with a high level api, which offers only a limited (and more controlled) way of accessing the information in the dwarf file. How feasible is that in the current situation -- I don't know.

Although I'm not very familiar with LLDB, if this hurts the performance of it, it can opt out of using the caching by giving negative LRUSize when creating DWARFContext.

I think it's important to note here that lldb does not use (most of) llvm dwarf parsing code right now -- it has it's own semi-forked versions of a lot of stuff. However, we are looking into making it reuse more (all?) of llvm dwarf parsing code, which is why it is important to ensure the interface stays flexible enough.

I agree that it's better to keep the interface flexible enough. As I mentioned in previous comment, I'm open to moving this to a different layer. Since I'm new to LLVM, please advise me where would be the best place for this.

Here's my thought process of placing this in DWARFContext. I only looked at two users of this DWARF API (llvm-symbolizer and our own symbolization tool). llvm-symbolizer access it mainly through DIContext interface from which DWARFContext inherits. Our symbolization tool access it via both DWARFContext and DWARFUnit. So, I thought DWARFContext would be a good place to put this in order for both of them to use it. I'm sorry I didn't take a look at LLDB.

With that being said, I'm OK with moving the caching to the symbolizer layer. Since they access the API through DWARFContext, it needs to expose a couple more member functions to make it possible:

  1. getCompileUnitForAddress
  2. clearLineTableForUnit

Do you think this is better in terms of keeping the API flexible enough? Or do you suggest another way? Please let me know. Thanks.

(adding more debug info folks)

Do you think this is better in terms of keeping the API flexible enough?

Umm... probably?

I don't have a comprehensive overview of the users of the DWARF parser either, but I'm sure there are users that want to access it through the lower level APIs. Lldb will most likely be one of those users -- even after it starts using llvm parsers completely I think it's fairly unlikely it would use APIs like getLocalsForAddress, but rather do something custom. However, all of that is vapourware, so its hard to reason about that.

It may be more interesting to look at other (real) users. One such user which comes to mind is (llvm-)dsymutil, as it accesses dwarf in a fairly complex way. Jonas, what do you think?

(adding more debug info folks)

Do you think this is better in terms of keeping the API flexible enough?

Umm... probably?

I don't have a comprehensive overview of the users of the DWARF parser either, but I'm sure there are users that want to access it through the lower level APIs. Lldb will most likely be one of those users -- even after it starts using llvm parsers completely I think it's fairly unlikely it would use APIs like getLocalsForAddress, but rather do something custom. However, all of that is vapourware, so its hard to reason about that.

It may be more interesting to look at other (real) users. One such user which comes to mind is (llvm-)dsymutil, as it accesses dwarf in a fairly complex way. Jonas, what do you think?

I took a look at llvm-symbolizer code considering how to fit the caching there, and I also took a brief look at dsymutil to see how it uses the DWARF library.

llvm-symbolizer (more specifically code in lib/DebugInfo/Symbolize) mostly access the library through DIContext interface, and it can be either DWARFContext or PDBContext, while the caching is quite DWARFContext-specific.

dsymutil creates DWARFContext and give it to DWARFLinker library, then it seems it uses lower level API (DWARFUnit) to access the information. Here's an example.

So, both high level API (DIContext) and low level API (DWARFUnit) are being used depending on the the users, and it makes it difficult to decide where the best place to put the caching in is. Here are a few options I could think of.

  1. Place caching in DWARFContext (as the revision currently is) but turn it off by default.

In this way, llvm-symbolizer could benefit from the revision without making its client code too complicated. At the same time, it'll be noop for other users that access the library with lower level API. If they need caching later, they can choose between implementing it using the lower level API or using the one in DWARFContext.

  1. Move caching to llvm-symbolizer

This minimizes the disruption in the API (both DWARFContext and DWARFUnit). However, I expect that it needs pretty messy changes in llvm-symbolizer because it uses DIContext interface: we need to make it distinguish the implementation of DIContext and apply caching only when it's DWARFContext.

  1. Just expose the necessary member functions without implementing caching anywhere in LLVM source tree

This also needs minimum disruption in the API (as is in 2), and it's good enough for our own use case (our own symbolization tool that access the library via both DWARFContext and DWARFUnit). However, none of LLVM tool will benefit from caching until they implement it.

I'm good with all three options, but I'm personally leaning toward option 1. Which one should we follow for the sake of the API? Or is there any other way? Please let me know. Thank you!

This is a friendly ping. Could you please let me know which direction we should go? Thank you.

labath added a comment.May 6 2020, 2:37 AM

First off, let me say that I don't feel qualified to set the direction here, nor I am fully familiar with all of these interfaces. However, since I am already involved in here, I am going to say something anyway. :)

The way I would imagine this working ideally is that there would be two layers. The lower layer would provide more explicit access to the debug info, and it would provide it's users with the ability to manually manage the memory usage. It would be the responsibility of the users to not shoot themselves in the foot if they use it.

The second layer would offer a higher level view of the debug info and it would manage the memory (semi)automatically. Being high level it would/could/should also abstract away the difference between the different debug info formats.

The first layer would roughly correspond to the current DWARFUnit, DWARFContext, etc. classes, with one important difference DWARFContext would not implement DIContext -- it would be a standalone class. The second layer would correspond to the DIContext class, and any of the (new) helper classes it needs to manage the memory and perform the abstractions.

The main advantage I see in that is the breaking of the is-a relationship between DWARFContext and DIContext. Without it the interfaces seems rather shoot-footy because one can happily play around with the lower-level DWARFContext apis (which don't manage memory), and then accidentally call some higher level method which comes from DIContext , which does the management and will then cause the debug info to disappear from under you.

As I said, I don't know how easy would be to reach this state, nor whether it would be an acceptable state for other stakeholders...

netforce added a comment.EditedMay 7 2020, 11:52 AM

First off, let me say that I don't feel qualified to set the direction here, nor I am fully familiar with all of these interfaces. However, since I am already involved in here, I am going to say something anyway. :)

The way I would imagine this working ideally is that there would be two layers. The lower layer would provide more explicit access to the debug info, and it would provide it's users with the ability to manually manage the memory usage. It would be the responsibility of the users to not shoot themselves in the foot if they use it.

The second layer would offer a higher level view of the debug info and it would manage the memory (semi)automatically. Being high level it would/could/should also abstract away the difference between the different debug info formats.

The first layer would roughly correspond to the current DWARFUnit, DWARFContext, etc. classes, with one important difference DWARFContext would not implement DIContext -- it would be a standalone class. The second layer would correspond to the DIContext class, and any of the (new) helper classes it needs to manage the memory and perform the abstractions.

The main advantage I see in that is the breaking of the is-a relationship between DWARFContext and DIContext. Without it the interfaces seems rather shoot-footy because one can happily play around with the lower-level DWARFContext apis (which don't manage memory), and then accidentally call some higher level method which comes from DIContext , which does the management and will then cause the debug info to disappear from under you.

As I said, I don't know how easy would be to reach this state, nor whether it would be an acceptable state for other stakeholders...

Hi Pavel,

I like your two layers idea, since it could give better control to the lower level API users and enable the new feature for the higher level API users without complicating the higher level API.

However, I don't think I have enough understanding of the wholesome picture of the DWARF API to make it in a better shape by refactoring it, which I believe is needed to decouple DWARFContext from DIContext. Furthermore, that seems quite out of the scope of this revision.

So, here's my suggestion that could make this revision align better with the direction you suggested, but leaving the refactoring for a later revision from someone who has better knowledge of the API.

  1. I'll expose primitive member functions in DWARFUnit and DWARFContext so that lower level API users can control memory management themselves.
  2. I'll introduce a new wrapper class (say CachedDWARFContext?) that implements DIContext with the LRU caching and having an instance of DWARFContext.
  3. I'll let llvm-symbolizer uses the new wrapper class where it currently uses DWARFContext as DIContext.

I think this is one step closer to what you suggest without letting an unqualified person (= me) substantially modify the API (= DWARFContext) that's used by several different clients. It'll also make it easier to decouple DWARFContext from DIContext in a later revision.

WDYT? If you think this is reasonable, I'll prepare the changes. We can talk more about the details there and ask the other stakeholders if this is OK too.

Thank you,
HK

Hi Pavel,

This is a friendly ping. Could you please see if my previous comment sounds OK?

Thanks,
HK

(sorry about the delay... the usual blurb about being busy, not being qualified etc.)

Yes, that sounds like a reasonable design to me. I think it's definitely worth creating a patch to see how it looks like in practice.

Hi Pavel,

Thank you for the confirmation. I'll shortly prepare a patch in that direction.

Thanks,
HK

netforce updated this revision to Diff 264947.May 19 2020, 9:35 AM

Reflected the discussion we've had so far. Made minimum change to the lower level API (DWARFContext, DWARFDebugLine, and DWARFUnit) to make it possible for the high level API and symbolization tool to implement caching. Added a higher level API (DWARFCachedDIContext) that implements DIContext with LRU caching so that llvm-symbolizer could benefit from the caching without code change. It is also a step toward decoupling DWARFContext from DIContext in the future which would be desirable for other use of DWARFContext (e.g. LLDB). Finally, added a unit test for the LRU logic.

netforce marked 9 inline comments as done.May 19 2020, 9:43 AM

Is it possible to add some tests for the LRU logic?

Added a unit test for the LRU logic. Thank you.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
117–124

Added DWARFCachedDIContext to implement DIContext with LRU caching without modifying DWARFContext much. Also added a utility class (DWARFUnitLRUCache) there.

321

To expose memory management feature to higher level API or tools, this needs to be public now. Please refer to the previous discussion for more context.

402

To implement caching in a separate class, this needs to be public. Actually, it seems it has already become public in another (previous) revision.

(sorry about the delay... the usual blurb about being busy, not being qualified etc.)

Yes, that sounds like a reasonable design to me. I think it's definitely worth creating a patch to see how it looks like in practice.

Hi Pavel,

Created the patch as we discussed. PTAL.

Thank you,
HK

netforce updated this revision to Diff 265064.May 19 2020, 3:25 PM

Sorry I mistakenly dropped the CMakeLists.txt files in the previous patch. Added them.

I think this is definitely cleaner than the previous patch. However, I still see some potential problems here. The way that this patch implements the reference counting means not all "usages" of DWARFUnits will be recorded. For example, during a call to GetLocalsForAddress, we end up calling Die.getAttributeValueAsReferencedDie(DW_AT_type)) here. The type DIE may end up being in a different compile unit (DW_FORM_ref_addr, DW_FORM_ref_sig8, DW_FORM_GNU_ref_alt, only the first one seems to be implemented correctly), which can cause additional units to be parsed "behind your back". Now, the question is what to do about it...

If we don't do anything about that, then the code will still be "correct", but it may end up using more memory than you expect. And if we try to do something about it, then we're back to the problem of DIEs disappearing from under you -- that function continues to use the Die object after it gets the type, but getting the type can cause the memory that Die points to to "disappear"...

I don't know whether you're ok with this quirk of the implementation. I also don't know whether we are ok with it. At this point it would really be nice to get someone else's opinion on all of this...

FYI, I haven't abandoned this patch. I have been waiting for more opinions including dblaikie@'s.

I'll be on leave from work for the next 6 weeks. So, once I get back from it, I'll revisit the comments then and try to figure out how to move forward.

Thank you!

FYI, I haven't abandoned this patch. I have been waiting for more opinions including dblaikie@'s.

I'll be on leave from work for the next 6 weeks. So, once I get back from it, I'll revisit the comments then and try to figure out how to move forward.

Thank you!

Yep, it's on my list of things to review! Sorry for the delay.

FYI, I haven't abandoned this patch. I have been waiting for more opinions including dblaikie@'s.

I'll be on leave from work for the next 6 weeks. So, once I get back from it, I'll revisit the comments then and try to figure out how to move forward.

Thank you!

Yep, it's on my list of things to review! Sorry for the delay.

No problem at all. Please take your time as I'll have to get back in 6 weeks. Thank you!

We are still waiting for discussion on where to put this cache eviction / memory management functionality to better serve existing user via different API layers. While we try to make progress on that, it'll be great if we expose the minimum interface change so that we could reduce the our memory footprint by implementing cache eviction out of LLVM source tree. So, I extracted the minimum changes and sent out https://reviews.llvm.org/D90006 for review. PTAL. Thank you.