This is an archive of the discontinued LLVM Phabricator instance.

Rate limit progress reporting
AbandonedPublic

Authored by saugustine on May 17 2023, 11:12 AM.

Details

Summary

Reporting progress for every DIE read turns out to be very slow when
run over a remote connection such as ssh. We have a report of it
taking over 30 minutes to load the DWARF for Chrome via ssh (which
transfers every single write), but only about a minute over
Chrome Remote Desktop, which is a video-conferencing-style link and
so doesn't update nearly as often.

For a 7k-DIE target, this improves the speed of reading on my personal
machine (entirely local) by about 3%; data below. Over slower, remote
connections the improvement is likely much greater.

Top of trunk:
(lldb) target create "crash_test"
Current executable set to 'crash_test' (aarch64).
(lldb) log timers dump
12.509606661 sec (total: 12.510s; child: 0.000s; count: 7826) for void DWARFUnit::ExtractDIEsRWLocked()
...

With this change:
(lldb) target create "crash_test"
Current executable set to 'crash_test' (aarch64).
(lldb) log timers dump
12.139054862 sec (total: 12.139s; child: 0.000s; count: 7826) for void DWARFUnit::ExtractDIEsRWLocked()

Diff Detail

Event Timeline

saugustine created this revision. · May 17 2023, 11:12 AM
Herald added a project: Restricted Project. · View Herald Transcript · May 17 2023, 11:12 AM
saugustine requested review of this revision. · May 17 2023, 11:12 AM

This is a nice proof of concept, but I think we should go with a time-based approach to rate limit this. (Anyone else in LLDB know if we have some utils around to help with this?)

e.g. if the first 1000 files are small and the last 1000 are large, the user will see this quickly get to 1000/2000 and then appear to hang, until it magically/surprisingly finishes.

With a time-based rate limit, you can watch it slowly chug along: 1001/2000, 1002/2000, etc. If there is a hanging problem, you'll deterministically see it hang on the same file number.
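
For illustration, here is a minimal sketch of that time-based approach. DoUnitOfWork, ReportProgress, and the 100 ms interval are placeholders for this discussion, not existing LLDB API:

#include <chrono>
#include <cstddef>

void DoUnitOfWork(size_t index);                     // hypothetical work item
void ReportProgress(size_t completed, size_t total); // hypothetical reporter

void IndexAll(size_t total) {
  using clock = std::chrono::steady_clock;
  constexpr auto kInterval = std::chrono::milliseconds(100); // assumed value
  auto last_report = clock::now() - kInterval; // so the first item reports too
  for (size_t i = 0; i < total; ++i) {
    DoUnitOfWork(i);
    const auto now = clock::now();
    if (now - last_report >= kInterval || i + 1 == total) {
      ReportProgress(i + 1, total); // the final update is never suppressed
      last_report = now;
    }
  }
}

Note that the final update is reported unconditionally, so the bar always ends at total/total regardless of the interval.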

Given that here we know the total_progress in advance, and assuming the operation is relatively evenly distributed, would it make sense to report on every percentage point? We could do this in ManualDWARFIndex::Index, or we could add something like a "Policy" or "Granularity" to the progress class (similar to what report_increment is doing in this class). This seems preferable to the rather arbitrary value of reporting every 1000.
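
A similar sketch for the percentage-granularity idea, reusing the same hypothetical helpers as above; only the reporting condition changes:

#include <cstddef>
#include <cstdint>

void DoUnitOfWork(size_t index);                     // hypothetical work item
void ReportProgress(size_t completed, size_t total); // hypothetical reporter

void IndexAll(size_t total) {
  uint64_t last_percent = UINT64_MAX;
  for (size_t i = 0; i < total; ++i) {
    DoUnitOfWork(i);
    const uint64_t percent = uint64_t(i + 1) * 100 / total;
    if (percent != last_percent) { // at most ~100 reports per operation
      ReportProgress(i + 1, total);
      last_percent = percent;
    }
  }
}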

I also like Jordan's rate limiting idea. In my mind that should be a property of the broadcaster. Different tools (e.g. vscode vs the command line) could specify different values when registering a listener.

Switch rate-limiting to a time-based mechanism

This update switches to a time-based approach as suggested by Jordan. However, the timing is about the same as the original, I believe because calling getCurrentTime on every iteration is about as slow as printing the progress report itself.

It is probably still a win over very slow connections, where printing is even slower while the cost of checking the time stays the same.

What would be ideal is a timing thread that wakes up every X seconds and prints the results, but there isn't a good mechanism for that, and doing that portably is way out of scope for this.
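
For what it's worth, a sketch of what such a timing thread could look like, assuming workers only bump an atomic counter; every name here is illustrative rather than existing LLDB code:

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>

class PeriodicProgressPrinter {
public:
  PeriodicProgressPrinter(uint64_t total, std::chrono::milliseconds period)
      : m_total(total), m_period(period), m_thread([this] { Run(); }) {}

  ~PeriodicProgressPrinter() {
    {
      std::lock_guard<std::mutex> guard(m_mutex);
      m_done = true;
    }
    m_cv.notify_one();
    m_thread.join();
    Print(); // always show the final count
  }

  // Called from the worker thread(s); cheap, no locking or printing.
  void Increment() { m_completed.fetch_add(1, std::memory_order_relaxed); }

private:
  void Run() {
    std::unique_lock<std::mutex> lock(m_mutex);
    // Wake every period and print; exit once the destructor sets m_done.
    while (!m_cv.wait_for(lock, m_period, [this] { return m_done; }))
      Print();
  }

  void Print() const {
    // '\r' redraws the same terminal line on each update.
    std::fprintf(stderr, "\r%llu/%llu",
                 (unsigned long long)m_completed.load(std::memory_order_relaxed),
                 (unsigned long long)m_total);
  }

  std::atomic<uint64_t> m_completed{0};
  const uint64_t m_total;
  const std::chrono::milliseconds m_period;
  std::mutex m_mutex;
  std::condition_variable m_cv;
  bool m_done = false;
  std::thread m_thread; // declared last so it starts after everything else
};

A worker would call Increment() once per unit of work, so the hot path stays a single relaxed atomic add; only the timer thread ever touches the output stream.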

Shall we just switch to a percentage, printing an update every percent?

That has the issues Jordan described, where things appear to progress quickly, and then may grind to a halt due to some big DIE.

But I think the perfect shouldn't be the enemy of the good here.

> This update switches to a time-based approach as suggested by Jordan. However, the timing is about the same as the original, I believe because calling getCurrentTime on every iteration is about as slow as printing the progress report itself.
>
> It is probably still a win over very slow connections, where printing is even slower while the cost of checking the time stays the same.
>
> What would be ideal is a timing thread that wakes up every X seconds and prints the results, but there isn't a good mechanism for that, and doing that portably is way out of scope for this.

As I said in my previous comment, if we want to do anything timeout-based it should be done in the broadcaster/listener machinery so that other types of events can benefit from it too. I think the issue you're describing can be solved at that level, for example by blocking for the desired time and flushing all but the last event received until that point.

> Shall we just switch to a percentage, printing an update every percent?
>
> That has the issues Jordan described, where things appear to progress quickly, and then may grind to a halt due to some big DIE.
>
> But I think the perfect shouldn't be the enemy of the good here.

Yes, that's definitely a shortcoming of this approach. What makes this somewhat less bad in my mind is that the inaccuracy is bounded by the granularity: e.g. if you report every percentage point, the error can never be more than a percentage point. It would be nice if the consumer could set this property though, but I don't think there's a straightforward way to do that.

Adding support for a generic, user-specified, rate limit to the listeners would be my preferred solution, but I don't know how much work that would be.

Moved the rate-limiting to Debugger.[cpp|h]

Also wrote a custom getCurrentTime function, which doesn't do much of
the extra work the Timer.h version does.

With this change, the timing is much better:

On my local machine, for a 93k DIE application, I get the following
timings:

1 second rate limit
(lldb) log timers dump
580.971832328 sec (total: 580.972s; child: 0.000s; count: 93007) for void DWARFUnit::ExtractDIEsRWLocked()

0 second rate limit, but with this change in place
663.114765369 sec (total: 663.115s; child: 0.000s; count: 93007) for void DWARFUnit::ExtractDIEsRWLocked()

Without this change in place
651.826884735 sec (total: 651.827s; child: 0.000s; count: 93007) for void DWARFUnit::ExtractDIEsRWLocked()

saugustine retitled this revision from "Proof of concept for reducing progress-reporting frequency" to "Rate limit progress reporting". · May 19 2023, 3:41 PM

Any more thoughts on this from the reviewers?

My suggestion was to add rate limiting support to the listener so that all events could benefit from this. The current patch seems unnecessarily ad hoc: it's limited to (1) progress events and (2) the default event handler. I expect that lldb-vscode users suffer from this as well, for example.

saugustine abandoned this revision. · May 26 2023, 9:33 AM

I'll let someone with a better understanding of the proper implementation take it from here.

> I also like Jordan's rate limiting idea. In my mind that should be a property of the broadcaster. Different tools (e.g. vscode vs the command line) could specify different values when registering a listener.

This makes sense: we could augment lldb_private::Listener with additional members that keep track of when the last broadcast time was, and if we're rate limiting. Then we could change the implementation of Listener::GetEvent(lldb::EventSP &event_sp, const Timeout<std::micro> &timeout) to continuously churn through m_events, returning the most recent one by the time the rate limiting window is over, and discarding any intermediate ones in between.

One thing I'm not sure of though is how we'll avoid an unnecessary pause for rate limiting on the last item. This patch avoids that because it checks data->GetCompleted() != data->GetTotal() to decide if we should actually rate limit. In the generic case, how does the listener know that an event it returns is the final one, and that it should ignore the rate limiting delay?

I think we could address that by adding a bool m_important member to lldb::Event, and then it would be up to the broadcaster to set that to true for the last one (or any intermediate ones that are similarly important & should be immediately shown, e.g. warnings/errors). Would that be reasonable?
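
A standalone model of that idea might look like the following. This is not LLDB's actual Listener; the class, the queue, and the `important` field (standing in for the proposed m_important flag) are simplified stand-ins:

#include <chrono>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>

struct Event {
  bool important = false; // stands in for the proposed m_important flag
};
using EventSP = std::shared_ptr<Event>;

class RateLimitedListener {
public:
  explicit RateLimitedListener(std::chrono::milliseconds rate_limit)
      : m_rate_limit(rate_limit) {}

  void Post(EventSP event) {
    {
      std::lock_guard<std::mutex> guard(m_mutex);
      m_events.push_back(std::move(event));
    }
    m_cv.notify_one();
  }

  // Drain the queue for one rate-limit window and return only the newest
  // event; important events (final update, warnings) return immediately.
  EventSP GetEvent() {
    const auto deadline = std::chrono::steady_clock::now() + m_rate_limit;
    EventSP newest;
    std::unique_lock<std::mutex> lock(m_mutex);
    for (;;) {
      while (!m_events.empty()) {
        newest = std::move(m_events.front());
        m_events.pop_front();
        if (newest->important)
          return newest; // skip the remaining delay
      }
      if (m_cv.wait_until(lock, deadline) == std::cv_status::timeout)
        return newest; // may be null if nothing arrived in the window
    }
  }

private:
  const std::chrono::milliseconds m_rate_limit;
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::deque<EventSP> m_events;
};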

> I also like Jordan's rate limiting idea. In my mind that should be a property of the broadcaster. Different tools (e.g. vscode vs the command line) could specify different values when registering a listener.
>
> This makes sense: we could augment lldb_private::Listener with additional members that keep track of when the last broadcast time was, and if we're rate limiting. Then we could change the implementation of Listener::GetEvent(lldb::EventSP &event_sp, const Timeout<std::micro> &timeout) to continuously churn through m_events, returning the most recent one by the time the rate limiting window is over, and discarding any intermediate ones in between.
>
> One thing I'm not sure of though is how we'll avoid an unnecessary pause for rate limiting on the last item. This patch avoids that because it checks data->GetCompleted() != data->GetTotal() to decide if we should actually rate limit. In the generic case, how does the listener know that an event it returns is the final one, and that it should ignore the rate limiting delay?
>
> I think we could address that by adding a bool m_important member to lldb::Event, and then it would be up to the broadcaster to set that to true for the last one (or any intermediate ones that are similarly important & should be immediately shown, e.g. warnings/errors). Would that be reasonable?

The later the decision not to update is made, the more work is wasted. Even the fairly simple solution that checked time in a somewhat expensive way (via the misnamed getCurrentTime, which also gets memory usage) ended up being slower overall. In my opinion, the problem is that a single DIE is too small a unit of work to be worth reporting on.

The other tricky part I missed before is this bit in between pulling the event off the listener queue and deciding to show it:

auto *data = ProgressEventData::GetEventDataFromEvent(event_sp.get());
if (!data)
  return;

// Do some bookkeeping for the current event, regardless of whether we're
// going to show the progress.
const uint64_t id = data->GetID();
if (m_current_event_id) {
  ...
  if (id != *m_current_event_id)
    return;
  if (data->GetCompleted() == data->GetTotal())
    m_current_event_id.reset();
} else {
  m_current_event_id = id;
}

When we pull the event off the listener queue, we need to do post-processing ourselves to decide if it's even a progress event at all, and if it's meant for us. If we put the listener in charge of rate limiting and returning only the most recent event, it needs to know that the event it's returning is interesting. Otherwise the rate limiting might hide all the interesting events. On top of that, there are events that are *interesting* even if we don't want to show them.

Another option is to provide Listener::GetRateLimitedEvents (name TBD) instead. It (potentially) blocks for the rate limiting period and then returns a *list* of all the events that have happened within that time frame. Then we let the caller process it as it pleases and display only the most recent relevant one. It feels a little weird at first but might actually make sense compared to alternatives?
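
One possible shape for that API, again modeled standalone rather than against the real class; the blocking sleep stands in for whatever waiting mechanism an actual implementation would use, and the member names are assumptions:

#include <chrono>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct Event; // payload omitted in this sketch
using EventSP = std::shared_ptr<Event>;

class Listener {
public:
  // Block for the whole rate-limiting window, then hand back everything
  // that arrived; the caller filters and displays what it cares about.
  std::vector<EventSP> GetRateLimitedEvents(std::chrono::milliseconds window) {
    std::this_thread::sleep_for(window);
    std::lock_guard<std::mutex> guard(m_events_mutex);
    std::vector<EventSP> events(m_events.begin(), m_events.end());
    m_events.clear();
    return events;
  }

private:
  std::mutex m_events_mutex;     // assumed to mirror the existing queue's mutex
  std::vector<EventSP> m_events; // filled by the broadcaster side
};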

I think it would be nice to abstract the rate limiting somewhere, although I'm not sure anymore if having it directly in the broadcast/listen classes makes sense.

labath added a comment. · Jun 7 2023, 5:10 AM

I agree that we should have a rate limiting mechanism very close to the source, to avoid wasting work for events that aren't going to be used. This is particularly important for debug info parsing, where we have multiple threads working in parallel and taking a lock even just to check whether we should report something could be a choke point.

> What would be ideal is a timing thread that wakes up every X seconds and prints the results, but there isn't a good mechanism for that, and doing that portably is way out of scope for this.

I've implemented something like that in D152364. Let me know what you think. I like it because the act of reporting progress doesn't block the reporting thread in any way. (At least for update-string-free updates, that is, but I expect that we won't be sending update strings for extremely high-frequency events.) However, I'm not entirely sure whether it meets everyone's use cases, mainly because I don't know what those use cases are (e.g. this implementation can "lose" an update string if they are coming too fast -- is that acceptable?)

> In my opinion, the problem is that a single DIE is too small a unit of work to be worth reporting on.

I don't think this is correct. The unit of reporting is a single DWARF unit, which feels OK, assuming we don't do anything egregious for each update. What might have confused you is this code here, which tries to parse DIE trees for all units (and updates progress after each unit). However, in my test at least, the DWARF units had all their DIEs extracted by the time we got to this point, which meant that code was essentially doing nothing else except generating progress reports. I'd be tempted to just remove progress reporting from this step altogether, though if we go with something like the patch above (where a single update just increments an atomic var), then I guess keeping it in would not be such a problem either.