This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/
-
llvm/
-
DebugInfo/Symbolize/
-
Symbolize/
-
Symbolize.h
-
Debuginfod/
4/4
Debuginfod.h
-
lib/
-
DebugInfo/Symbolize/
-
Symbolize/
-
Symbolize.cpp
-
Debuginfod/
-
CMakeLists.txt
40/42
Debuginfod.cpp

Differential D114845

[llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking local debuginfo.
ClosedPublic

Authored by noajshu on Nov 30 2021, 10:12 PM.

Download Raw Diff

Details

Reviewers

dblaikie
phosek
mysterymath

Commits

rGbabef908cc13: [llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking…

Summary

This library implements the class DebuginfodCollection, which scans a set of directories for binaries, classifying them according to whether they contain debuginfo. This also provides the DebuginfodServer, an HTTPServer which serves debuginfod's /debuginfo and /executable endpoints. This is intended as the final new supporting library required for llvm-debuginfod.

As implemented here, DebuginfodCollection only finds ELF binaries and DWARF debuginfo. All other files are ignored. However, the class interface is format-agnostic. Generalizing to support other platforms will require refactoring of LLVM's object parsing libraries to eliminate use of report_fatal_error (e.g. when reading WASM files), so that the debuginfod daemon does not crash when it encounters a malformed file on the disk.

The DebuginfodCollection is tested by end-to-end tests of the debuginfod server (D114846).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

noajshu created this revision.Nov 30 2021, 10:12 PM

Herald added subscribers: hiraditya, mgorny. · View Herald TranscriptNov 30 2021, 10:12 PM

noajshu requested review of this revision.Nov 30 2021, 10:12 PM

Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2021, 10:12 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B136838: Diff 390901.Nov 30 2021, 10:13 PM

noajshu edited the summary of this revision. (Show Details)Nov 30 2021, 10:20 PM

Herald added subscribers: sunfish, aheejin. · View Herald TranscriptNov 30 2021, 10:20 PM

noajshu edited the summary of this revision. (Show Details)Nov 30 2021, 10:22 PM

noajshu edited the summary of this revision. (Show Details)

noajshu added a parent revision: D112758: [llvm] [Debuginfo] Debuginfod client library..

noajshu added a child revision: D114846: [llvm] [Debuginfod] LLVM debuginfod server..Nov 30 2021, 10:27 PM

phosek added a subscriber: phosek.Dec 2 2021, 12:01 PM

phosek added inline comments.

llvm/lib/Debuginfod/Debuginfod.cpp
268	I expect this loop to be the performance bottleneck, so I think we should consider using a `ThreadPool` here as well to process files in parallel. That's the strategy used by elfutils' debuginfod as well. This would likely require rethinking the API to allow sharing the thread pool between the file processing and request handling in D114846.

noajshu edited the summary of this revision. (Show Details)Dec 15 2021, 7:25 PM

Add DebuginfodServer class. Add multithreaded directory scanning (unstable).

noajshu retitled this revision from [llvm] [DebugInfo] DebuginfodCollection for tracking local debuginfo. (WIP) to [llvm] [DebugInfo] DebuginfodCollection and DebuginfodServer for tracking local debuginfo. (WIP).Dec 16 2021, 11:18 AM

noajshu edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B139703: Diff 394931.Dec 16 2021, 11:31 AM

Add federation to other debuginfod servers.

llvm/lib/Debuginfod/Debuginfod.cpp
268	Good idea! I've added this, but I find the behavior unstable when the concurrency is >1 and a large directory is scanned. I've searched around for the bug in my code and no luck. I'm wondering if it's possible some of the supporting libraries I'm using are not thread-safe or are leaking memory. But it's more likely I'm missing something obvious. Anyways, going to upload as it's WIP and maybe there is a better task queue architecture you could recommend. PS: the `TaskQueue` in LLVM is not appropriate as it is for serialized sequence of tasks.

Harbormaster completed remote builds in B139756: Diff 395000.Dec 16 2021, 2:57 PM

noajshu mentioned this in D114415: [llvm] [Debuginfod] Add HTTP Server to Debuginfod library..Jan 17 2022, 7:19 PM

dblaikie added a subscriber: dblaikie.Jan 17 2022, 8:33 PM

noajshu added a parent revision: D114415: [llvm] [Debuginfod] Add HTTP Server to Debuginfod library..Jan 24 2022, 5:22 PM

Fix concurrency bug and rebase against main.

noajshu marked an inline comment as done.Jan 25 2022, 6:03 PM

noajshu added inline comments.

llvm/lib/Debuginfod/Debuginfod.cpp
268	The concurrency bug has been fixed.

Harbormaster completed remote builds in B145647: Diff 403095.Jan 26 2022, 10:19 PM

noajshu updated this revision to Diff 406295.Feb 6 2022, 2:54 PM

noajshu edited the summary of this revision. (Show Details)

noajshu added reviewers: dblaikie, phosek.

noajshu retitled this revision from [llvm] [DebugInfo] DebuginfodCollection and DebuginfodServer for tracking local debuginfo. to [llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking local debuginfo..Feb 6 2022, 3:02 PM

Harbormaster completed remote builds in B147853: Diff 406295.Feb 6 2022, 4:34 PM

fche2 added a subscriber: fche2.Feb 22 2022, 12:30 PM

fche2 added inline comments.

llvm/lib/Debuginfod/Debuginfod.cpp
355	Don't you need a mutex-guard on this operation?
370	Ditto re. mutex-guard?

noajshu added inline comments.Feb 22 2022, 1:36 PM

llvm/lib/Debuginfod/Debuginfod.cpp
355	Thanks so much for catching this! I propose to switch to RW-Mutex so the readers can read without blocking each other.

Use RWMutex to protect readers and writers of the debug binaries and binaries collections.

Harbormaster completed remote builds in B151687: Diff 411708.Feb 27 2022, 4:45 PM

noajshu marked 2 inline comments as done.Feb 27 2022, 4:46 PM

phosek added a reviewer: mysterymath.Feb 28 2022, 11:04 AM

mysterymath added inline comments.Mar 1 2022, 5:07 PM

llvm/include/llvm/Debuginfod/Debuginfod.h
90	I don't think the `= 1` does anything, since providing a constructor inhibits generation of a default one.
101	Since all members are public, this should be a `struct`. This also fits with how it's used, post-construction.
llvm/lib/Debuginfod/Debuginfod.cpp
247–248	This seems a bit odd to unconditionally put to stdout; is this necessary? Otherwise, the whole routine just becomes while (true) { if (Error Err = update()) return Err; std::this_thread::sleep_for(Interval); } return Error::success();
255	llvm_unreachable()?
345–349	Is there an advantage for manually managing the concurrency here, over passing it as an argument to ThreadPool, `std::min`-ed with the hardware concurrency? From a cursor look at ThreadPool's API, each async call after the thread pool is full should just more-or-less push a std::function<void()> onto a vector.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 5:07 PM

noajshu added inline comments.Mar 2 2022, 7:21 AM

llvm/lib/Debuginfod/Debuginfod.cpp
247–248	Thanks, I agree this is awkward. The first update is logged to stdout is so the lit tests in D114846 can tell when it's safe to query the server for debuginfo. If the test client pinged the server before it had found the binary it will return a 404 and the test will flake. An alternative is to add a time delay before the client pings the server. This is discouraged as there is timing variability across systems. We could hide this output except for when "verbose" logging is enabled, or similar. Would this be good?

fche2 added inline comments.Mar 2 2022, 7:46 AM

llvm/lib/Debuginfod/Debuginfod.cpp
247–248	(For reference, the elfutils debuginfod exports prometheus metrics about the progress of its operations, so that an external testsuite process can synchronize.)

mysterymath added inline comments.Mar 2 2022, 11:05 AM

llvm/lib/Debuginfod/Debuginfod.cpp
247–248	The first update is logged to stdout is so the lit tests in D114846 can tell when it's safe to query the server for debuginfo. If the test client pinged the server before it had found the binary it will return a 404 and the test will flake. Ah, this sounds like it'd be something generally useful then; it's not uncommon for servers to print once they're ready to serve. Maybe we could make the startup process more explicit: initialize the cache, then bring up HTTP server, print something like "Serving on ...", then set up the continuous updates.
323–325	I poked around, and it doesn't look like dbgs() offers any thread-safety guarantees. If not, these async jobs may have have oddly interleaved logs at best, and undefined behavior at worst.

Replace inserts to dbgs() with use of synchronized logging class

Harbormaster completed remote builds in B152430: Diff 412791.Mar 3 2022, 11:44 AM

Thanks @mysterymath and @fche2 for many very helpful comments!
@mysterymath suggested we could perform the first update manually, then print a message like "ready to accept connections". This way the test client knows when it can ask for artifacts. This seems logical so I will make this change in D114846.

Regarding logging:
@mysterymath pointed out that logging by inserting strings to dbgs() is thread-unsafe. @fche2 pointed out that elfutils' debuginfod exports Prometheus metrics. I would advocate for keeping some logging facility in the application if only because it is helpful for testing and debugging the code. I am not aware of an existing logging framework in LLVM, so I have created a simple std::queue-based logging class DebuginfodLog using a sys::Mutex to synchronize access. In the future this could be upgraded to support Prometheus exports or other features. Please let me know if you think this will suffice, or if you would prefer an alternative solution to the logging problem. Thanks a lot!

llvm/lib/Debuginfod/Debuginfod.cpp
345–349	As this is within a directory iterator loop, my concern was for when there is a large number of files within that directory. If we add tasks to the ThreadPool faster than they are completed, the memory usage of that vector of `std::function<void()>`s becomes unbounded. So I thought it best to manage the progress through the loop more manually. What do you think?

mysterymath added inline comments.Mar 3 2022, 12:20 PM

llvm/lib/Debuginfod/Debuginfod.cpp
345–349	I'm unsure whether or not the buildup would ever cause problems in practice; then again, I just finished cleaning up an unbounded memory usage problem that broke in production. I'll defer that determination to others with more experience. Assuming we keep the semantics here, it seems like what we'd really want is a version of ThreadPool that blocks submission of additional requests if the thread pool is full. This would provide feedback to stop the iterator from producing additional entries that cannot yet be handled (and would thus need to be stored). This seems like a useful abstraction in its own right, and there's definitely prior art. I'd suggest either wrapping ThreadPool to provide such an API (simplified for the purposes of this file) or adding it as an option to ThreadPool. The first would be a slight modification to the code you have to abstract out the management of NumTasksRemaining. It'd also probably be more elegant to have the blocking async call (s) wait on a condition variable, so that the next task that finishes can signal that there is now a thread available, rather than busy-waiting with sleep().

noajshu added inline comments.Mar 28 2022, 3:23 PM

llvm/lib/Debuginfod/Debuginfod.cpp
345–349	Thanks for these suggestions! I agree on all points. It's only a small change to ThreadPool to let us wait for room in the queue with a condition variable: bool queueEmptyUnlocked() { return Tasks.empty(); } void ThreadPool::waitQueue() { // Wait for the queue to be empty std::unique_lock<std::mutex> LockGuard(QueueLock); CompletionCondition.wait(LockGuard, [&] { return queueEmptyUnlocked(); }); } I also collected some data to find out when the unbounded memory usage of the queue of jobs could actually be a problem in production. From my measurements, each job in the pool's queue consumes approximately N + 320 bytes of memory, where N is the number of bytes in the file path. For the right system setup, this could indeed consume lots of memory. Briefly, filesystem caching could allow millions of file paths to be traversed in seconds, but actually reading those files could take longer, causing a queue buildup. However, if these files are ELF binaries they will end up in our `StringMap` in memory anyways with the current implementation. So the unbounded memory usage will be a problem regardless for this user, if their files are mostly ELF binaries. Therefore, only the user who has millions of non-ELF files mixed in with their smaller number of ELF binaries could meaningfully benefit from us waiting here to submit more jobs to the queue. For example, a developer user like myself with limited local memory and millions of files that are not ELF binaries. When I plug in the numbers for my own filesystem, I could fit one job for each file in my development directory with about .4 GB of total memory usage. This is a comfortable margin on my own system but I'm unsure about other users. So if it seems reasonable I will just make the small tweaks to `ThreadPool` API to be safe.

Update ThreadPool API to allow waiting until the queue is empty, allowing Debuginfod to avoid unbounded queue size without manual concurrency management.

Herald added a subscriber: dexonsmith. · View Herald TranscriptMar 31 2022, 4:12 PM

noajshu marked 2 inline comments as done.Mar 31 2022, 4:13 PM

Harbormaster completed remote builds in B157282: Diff 419584.Mar 31 2022, 4:13 PM

mysterymath added inline comments.Mar 31 2022, 4:44 PM

llvm/lib/Debuginfod/Debuginfod.cpp
354	If I'm reading this right, wouldn't this loop dispatch one item to the queue, wait for the queue to be empty, dispatch another item, wait for the queue to be empty, etc. It seems like this disables parallelism entirely. I'd have expected this to wait until the queue was "not full"; then items would be dispatched until max concurrency was reached, and the next item dispatched the moment a thread becomes free.

noajshu added inline comments.Mar 31 2022, 5:32 PM

llvm/lib/Debuginfod/Debuginfod.cpp
354	I don't think this should disable parallelism entirely as the queue will become empty as soon as the job starts processing in a worker thread, rather than when that job finishes. Let me double check this.

noajshu added inline comments.Mar 31 2022, 6:21 PM

llvm/lib/Debuginfod/Debuginfod.cpp
354	On a closer look, you're right about it disabling parallelism. This is due to `ThreadPool`'s internals! Further modification of the `ThreadPool` is required. Also I think it would be simple enough to parametrize as `waitQueueSize(size_t Size)`, blocking until the queue has at most Size tasks.

noajshu updated this revision to Diff 419602.Mar 31 2022, 7:17 PM

Fix incorrect waiting for queue size in ThreadPool.

Harbormaster completed remote builds in B157297: Diff 419602.Mar 31 2022, 7:18 PM

Simplify implementation of ThreadPool::waitQueueSize

Harbormaster completed remote builds in B157300: Diff 419606.Mar 31 2022, 7:42 PM

noajshu added inline comments.Mar 31 2022, 7:43 PM

llvm/lib/Debuginfod/Debuginfod.cpp
354	I corrected the implementation of `waitQueueSize`. Thank you for catching this! Although it takes a `Size` parameter, I leave it as the default of 0 here.

noajshu marked an inline comment as done.May 2 2022, 7:36 PM

Update comments in Debuginfod.cpp and remove declaration of functions getCachedOrDownloadSource and getCachedOrDownloadExecutable from Debuginfod.h as they are only used internally by the debuginfod client.

Harbormaster completed remote builds in B162365: Diff 426567.May 2 2022, 8:31 PM

Thank you to all of the reviewers for your extremely helpful comments!

I believe there are no unresolved comments at this point, however I have two questions:

First, I wonder if it wouldn't hurt to add a /source endpoint, which doesn't check the local DebuginfodCollection at all but simply skips straight to using the client.
This way, users who run a local debuginfod server would only have to set all their known public servers in the DEBUGINFOD_URLS variable once when they start llvm-debuginfod. When they use a client tool (like llvm-symbolizer) it would suffice to point to their local server only, as all requests would be federated.

Second, I was wondering if there is any desire to split this out into two or more diffs. For example, although the changes to ThreadPool and Symbolize are quite small I'm happy to separate them out if desired.

Thank you!

Add updateIfStale to safely update Collection if it is stale (possibly before a periodic update), and add locking to update() to prevent races with other callers.

Harbormaster completed remote builds in B166416: Diff 432195.May 25 2022, 10:47 PM

noajshu mentioned this in D114846: [llvm] [Debuginfod] LLVM debuginfod server..May 25 2022, 10:56 PM

It looks like something went a bit screwy with the last patch; the diff from that patch to the previous deletes a bunch of code, including waitQueueSize, which is still called. Would you try re-uploading the latest changes?

In D114845#3487357, @noajshu wrote:

First, I wonder if it wouldn't hurt to add a /source endpoint, which doesn't check the local DebuginfodCollection at all but simply skips straight to using the client.
This way, users who run a local debuginfod server would only have to set all their known public servers in the DEBUGINFOD_URLS variable once when they start llvm-debuginfod. When they use a client tool (like llvm-symbolizer) it would suffice to point to their local server only, as all requests would be federated.

Seems to me like we'd eventually want the source endpoint to behave like the other endpoints: first try looking things up locally, then federate. Both behaviors would be naturally supported when we get around to implementing the source endpoint. The behavior you describe would be the one you'd get if you provide an empty list of paths to scan.

Second, I was wondering if there is any desire to split this out into two or more diffs. For example, although the changes to ThreadPool and Symbolize are quite small I'm happy to separate them out if desired.

IMO, the symbolize change seems safe enough, but ThreadPool is a pretty foundational library. It seems wise to separate this out and loop in the folks who've tended to touch the thread pool; they may have more specific opinions.

Add back accidentally-deleted changes (thanks @mysterymath !)

Harbormaster completed remote builds in B166730: Diff 432668.May 27 2022, 4:32 PM

Seems to me like we'd eventually want the source endpoint to behave like the other endpoints: first try looking things up locally, then federate. Both behaviors would be naturally supported when we get around to implementing the source endpoint.

I agree. In this case I won't add it here, but leave this for future revisions.

IMO, the symbolize change seems safe enough, but ThreadPool is a pretty foundational library. It seems wise to separate this out and loop in the folks who've tended to touch the thread pool; they may have more specific opinions.

Ok, I will split out those changes separately!

mysterymath added inline comments.May 31 2022, 2:52 PM

llvm/include/llvm/Debuginfod/Debuginfod.h
96	Since the Message is commonly formed by concatenation, making this take `const Twine& Message` instead should save some heap allocations when logging.
101	From how this is used, it looks like it'd be simpler to make pop() blocking and non-Optional.
llvm/lib/Debuginfod/Debuginfod.cpp
261	Prefer StringRef over auto, since the type is obvious and concrete.
268	const std::string& over auto; I think as written this will even copy the string.
324	EC, from the LLVM variable naming convention
326	`I` and `E`, from the LLVM variable naming convention
360	std::lock_guard should work with RWMutexes for exclusive locking, and it's easier to read than doing it manually.
375
383	nit: remove empty line
404	nit: remove empty line
425
431–432	This path will always usually cause a "collection was not stale" error message if the binary was not found, which isn't as good as just "binary not found." It's also surprising that updateIfStale() will throw an error if the collection was not stale; usually methods with that naming convention do nothing, since the name suggests that it's not an error to violate the precondition.
433
443
448
454
462
472
496	This `return` doesn't do anything; remove.

noajshu mentioned this in D126815: [llvm] [Support] [Debuginfod] waitQueueSize for ThreadPool.Jun 1 2022, 1:18 PM

noajshu added a parent revision: D126815: [llvm] [Support] [Debuginfod] waitQueueSize for ThreadPool.

Remove changes to ThreadPool, which were moved to D126815.

dexonsmith removed a subscriber: dexonsmith.Jun 1 2022, 1:23 PM

Harbormaster completed remote builds in B167346: Diff 433518.Jun 1 2022, 1:23 PM

Replace manual locking of RWMutex with RAII locks throughout, incorporate other suggested changes.

Harbormaster completed remote builds in B167363: Diff 433544.Jun 1 2022, 2:14 PM

noajshu marked 18 inline comments as done.Jun 1 2022, 2:18 PM

noajshu added inline comments.

llvm/lib/Debuginfod/Debuginfod.cpp
360	Thanks! Since we're `std::shared_lock<RWMutex> lock(write);
431–432	Good point, how about this? We return an `Expected<bool>` where if there are no errors during `update()`, returns whether the collection got updated. If it's not stale, we don't bother checking for the path again.

noajshu marked an inline comment as done.Jun 1 2022, 2:20 PM

noajshu added inline comments.

llvm/lib/Debuginfod/Debuginfod.cpp
360	Good idea, I switched everything to use `std::` raii locks and it's much cleaner.

LGTM, but please wait until at least sometime next week to submit, so there's space for any last-chance comments on this one.

This revision is now accepted and ready to land.Jun 1 2022, 2:25 PM

Add back declaration of getDefaultDebuginfodUrls, as it is used by llvm-symbolizer.

Harbormaster completed remote builds in B167397: Diff 433586.Jun 1 2022, 4:22 PM

Correct usage of Timer

Harbormaster completed remote builds in B167442: Diff 433646.Jun 1 2022, 9:13 PM

Per the discussion on D126815, we switch the concurrency model to one in which workers directly advance the shared (lock-protected) directory iterator.

We add a condition variable to notify the main update thread when the iteration is complete and all workers have returned. We have removed the dependency on waitQueueSize so that we can proceed without D126815.

Harbormaster completed remote builds in B170411: Diff 437761.Jun 16 2022, 5:56 PM

mysterymath added inline comments.Jun 17 2022, 10:01 AM

llvm/lib/Debuginfod/Debuginfod.cpp
390–396	Since each async call exits as soon as `I == E \|\| EC`, this block can be replaced with just `ThreadPool.wait()`, and there's no need to maintain a separate condition variable. (ThreadPool maintains one internally for this purpose.) It also looks like the decrements of NumActiveWorkers and the outer loop are interleaved in a way that adds some complexity to the situation; this would remove the need for NumActiveWorkers, which clears that up too.

Rebase against main and replace NumActiveThreads counting with use of ThreadPoolTaskGroup.

noajshu marked an inline comment as done.Jun 21 2022, 1:40 PM

noajshu added inline comments.

llvm/lib/Debuginfod/Debuginfod.cpp
390–396	great idea! Since the first draft of this revision, ThreadPoolTaskGroup was merged into LLVM. It nicely fits our use case, here we just use `IteratorGroup.wait();`. (now that I am aware of the new Task Group feature, I propose to close D126815 altogether, as it does appear quite ad-hoc in light of the clean alternative of using a task group)

Harbormaster completed remote builds in B171183: Diff 438821.Jun 21 2022, 1:41 PM

noajshu removed parent revisions: D126815: [llvm] [Support] [Debuginfod] waitQueueSize for ThreadPool, D114415: [llvm] [Debuginfod] Add HTTP Server to Debuginfod library., D112758: [llvm] [Debuginfo] Debuginfod client library..Jul 6 2022, 1:00 PM

noajshu edited the summary of this revision. (Show Details)

This revision was landed with ongoing or failed builds.Jul 6 2022, 1:02 PM

Closed by commit rGbabef908cc13: [llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking… (authored by noajshu). · Explain Why

This revision was automatically updated to reflect the committed changes.

noajshu marked an inline comment as done.

noajshu added a commit: rGbabef908cc13: [llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking….

This doesn't build: http://45.33.8.238/linux/80459/step_4.txt

In D114845#3633730, @thakis wrote:

This doesn't build: http://45.33.8.238/linux/80459/step_4.txt

(fixed in 39ed08f8d452)

I think this causes link errors in shared library builds:

https://lab.llvm.org/buildbot/#/builders/207/builds/8193
https://lab.llvm.org/buildbot/#/builders/181/builds/5860

In D114845#3633866, @Meinersbur wrote:

I think this causes link errors in shared library builds:

https://lab.llvm.org/buildbot/#/builders/207/builds/8193
https://lab.llvm.org/buildbot/#/builders/181/builds/5860

Thanks, I'm trying to reproduce this with a local build to see what the problem is.

I have a fix, pushing now.

noajshu mentioned this in rG3703f5132718: [Debuginfod] Try to fix shared library build after babef908cc1 (D114845) and….Jul 6 2022, 4:01 PM

It's been 6 hours since https://lab.llvm.org/buildbot/#/builders/57/builds/19588 started failing with

FAILED: lib/libLLVMDebuginfod.so.15git

In D114845#3634447, @hubert.reinterpretcast wrote:
It's been 6 hours since https://lab.llvm.org/buildbot/#/builders/57/builds/19588 started failing with
FAILED: lib/libLLVMDebuginfod.so.15git

Sorry about this, I'll fix this right away.

In D114845#3634447, @hubert.reinterpretcast wrote:
It's been 6 hours since https://lab.llvm.org/buildbot/#/builders/57/builds/19588 started failing with
FAILED: lib/libLLVMDebuginfod.so.15git

fixed by 819a7f98cd6d

In D114845#3634557, @noajshu wrote:

fixed by 819a7f98cd6d

Thanks!

noajshu removed a child revision: D114846: [llvm] [Debuginfod] LLVM debuginfod server..Jul 7 2022, 10:52 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

DebugInfo/

Symbolize/

Symbolize.h

2 lines

Debuginfod/

Debuginfod.h

82 lines

lib/

DebugInfo/

Symbolize/

Symbolize.cpp

4 lines

Debuginfod/

CMakeLists.txt

1 line

Debuginfod.cpp

315 lines

Diff 442671

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

Show First 20 Lines • Show All 237 Lines • ▼ Show 20 Lines	public:

size_t size() { return Bin.getBinary()->getData().size(); }		size_t size() { return Bin.getBinary()->getData().size(); }

private:		private:
OwningBinary<Binary> Bin;		OwningBinary<Binary> Bin;
std::function<void()> Evictor;		std::function<void()> Evictor;
};		};

		Optional<ArrayRef<uint8_t>> getBuildID(const ELFObjectFileBase *Obj);

} // end namespace symbolize		} // end namespace symbolize
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_DEBUGINFO_SYMBOLIZE_SYMBOLIZE_H		#endif // LLVM_DEBUGINFO_SYMBOLIZE_SYMBOLIZE_H

llvm/include/llvm/Debuginfod/Debuginfod.h

	//===-- llvm/Debuginfod/Debuginfod.h - Debuginfod client --------- C++ --===//			//===-- llvm/Debuginfod/Debuginfod.h - Debuginfod client --------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	///			///
	/// \file			/// \file
	/// This file contains the declarations of getCachedOrDownloadArtifact and			/// This file contains several declarations for the debuginfod client and
	/// several convenience functions for specific artifact types:			/// server. The client functions are getDefaultDebuginfodUrls,
	/// getCachedOrDownloadSource, getCachedOrDownloadExecutable, and			/// getCachedOrDownloadArtifact, and several convenience functions for specific
	/// getCachedOrDownloadDebuginfo. This file also declares			/// artifact types: getCachedOrDownloadSource, getCachedOrDownloadExecutable,
	/// getDefaultDebuginfodUrls and getDefaultDebuginfodCacheDirectory.			/// and getCachedOrDownloadDebuginfo. For the server, this file declares the
	///			/// DebuginfodLogEntry and DebuginfodServer structs, as well as the
				/// DebuginfodLog, DebuginfodCollection classes.
	///			///
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_DEBUGINFOD_DEBUGINFOD_H			#ifndef LLVM_DEBUGINFOD_DEBUGINFOD_H
	#define LLVM_DEBUGINFOD_DEBUGINFOD_H			#define LLVM_DEBUGINFOD_DEBUGINFOD_H

				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/StringMap.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
				#include "llvm/Debuginfod/HTTPServer.h"
	#include "llvm/Support/Error.h"			#include "llvm/Support/Error.h"
	#include "llvm/Support/MemoryBuffer.h"			#include "llvm/Support/MemoryBuffer.h"
				#include "llvm/Support/Mutex.h"
				#include "llvm/Support/RWMutex.h"
				#include "llvm/Support/Timer.h"

	#include <chrono>			#include <chrono>
				#include <queue>

	namespace llvm {			namespace llvm {

	typedef ArrayRef<uint8_t> BuildIDRef;			typedef ArrayRef<uint8_t> BuildIDRef;

	typedef SmallVector<uint8_t, 10> BuildID;			typedef SmallVector<uint8_t, 10> BuildID;

	/// Finds default array of Debuginfod server URLs by checking DEBUGINFOD_URLS			/// Finds default array of Debuginfod server URLs by checking DEBUGINFOD_URLS
	Show All 28 Lines

	/// Fetches any debuginfod artifact using the specified local cache directory,			/// Fetches any debuginfod artifact using the specified local cache directory,
	/// server URLs, and request timeout (in milliseconds). If the artifact is			/// server URLs, and request timeout (in milliseconds). If the artifact is
	/// found, uses the UniqueKey for the local cache file.			/// found, uses the UniqueKey for the local cache file.
	Expected<std::string> getCachedOrDownloadArtifact(			Expected<std::string> getCachedOrDownloadArtifact(
	StringRef UniqueKey, StringRef UrlPath, StringRef CacheDirectoryPath,			StringRef UniqueKey, StringRef UrlPath, StringRef CacheDirectoryPath,
	ArrayRef<StringRef> DebuginfodUrls, std::chrono::milliseconds Timeout);			ArrayRef<StringRef> DebuginfodUrls, std::chrono::milliseconds Timeout);

				class ThreadPool;

				struct DebuginfodLogEntry {
				std::string Message;
				DebuginfodLogEntry() = default;
				DebuginfodLogEntry(const Twine &Message);
				};

				class DebuginfodLog {
				std::mutex QueueMutex;
				std::condition_variable QueueCondition;
				std::queue<DebuginfodLogEntry> LogEntryQueue;
				mysterymathUnsubmitted Done Reply Inline Actions I don't think the `= 1` does anything, since providing a constructor inhibits generation of a default one. mysterymath: I don't think the `= 1` does anything, since providing a constructor inhibits generation of a…

				public:
				// Adds a log entry to end of the queue.
				void push(DebuginfodLogEntry Entry);
				// Adds a log entry to end of the queue.
				void push(const Twine &Message);
				mysterymathUnsubmitted Done Reply Inline Actions Since the Message is commonly formed by concatenation, making this take `const Twine& Message` instead should save some heap allocations when logging. mysterymath: Since the Message is commonly formed by concatenation, making this take `const Twine& Message`…
				// Blocks until there are log entries in the queue, then pops and returns the
				// first one.
				DebuginfodLogEntry pop();
				};

				mysterymathUnsubmitted Done Reply Inline Actions Since all members are public, this should be a `struct`. This also fits with how it's used, post-construction. mysterymath: Since all members are public, this should be a `struct`. This also fits with how it's used…
				mysterymathUnsubmitted Done Reply Inline Actions From how this is used, it looks like it'd be simpler to make pop() blocking and non-Optional. mysterymath: From how this is used, it looks like it'd be simpler to make pop() blocking and non-Optional.
				/// Tracks a collection of debuginfod artifacts on the local filesystem.
				class DebuginfodCollection {
				SmallVector<std::string, 1> Paths;
				sys::RWMutex BinariesMutex;
				StringMap<std::string> Binaries;
				sys::RWMutex DebugBinariesMutex;
				StringMap<std::string> DebugBinaries;
				Error findBinaries(StringRef Path);
				Expected<Optional<std::string>> getDebugBinaryPath(BuildIDRef);
				Expected<Optional<std::string>> getBinaryPath(BuildIDRef);
				// If the collection has not been updated since MinInterval, call update() and
				// return true. Otherwise return false. If update returns an error, return the
				// error.
				Expected<bool> updateIfStale();
				DebuginfodLog &Log;
				ThreadPool &Pool;
				Timer UpdateTimer;
				sys::Mutex UpdateMutex;

				// Minimum update interval, in seconds, for on-demand updates triggered when a
				// build-id is not found.
				double MinInterval;

				public:
				DebuginfodCollection(ArrayRef<StringRef> Paths, DebuginfodLog &Log,
				ThreadPool &Pool, double MinInterval);
				Error update();
				Error updateForever(std::chrono::milliseconds Interval);
				Expected<std::string> findDebugBinaryPath(BuildIDRef);
				Expected<std::string> findBinaryPath(BuildIDRef);
				};

				struct DebuginfodServer {
				HTTPServer Server;
				DebuginfodLog &Log;
				DebuginfodCollection &Collection;
				DebuginfodServer(DebuginfodLog &Log, DebuginfodCollection &Collection);
				};

	} // end namespace llvm			} // end namespace llvm

	#endif			#endif

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

Show First 20 Lines • Show All 321 Lines • ▼ Show 20 Lines	for (auto N : Obj.notes(P, Err))
if (N.getType() == ELF::NT_GNU_BUILD_ID &&		if (N.getType() == ELF::NT_GNU_BUILD_ID &&
N.getName() == ELF::ELF_NOTE_GNU)		N.getName() == ELF::ELF_NOTE_GNU)
return N.getDesc();		return N.getDesc();
consumeError(std::move(Err));		consumeError(std::move(Err));
}		}
return {};		return {};
}		}

		} // end anonymous namespace

Optional<ArrayRef<uint8_t>> getBuildID(const ELFObjectFileBase *Obj) {		Optional<ArrayRef<uint8_t>> getBuildID(const ELFObjectFileBase *Obj) {
Optional<ArrayRef<uint8_t>> BuildID;		Optional<ArrayRef<uint8_t>> BuildID;
if (auto *O = dyn_cast<ELFObjectFile<ELF32LE>>(Obj))		if (auto *O = dyn_cast<ELFObjectFile<ELF32LE>>(Obj))
BuildID = getBuildID(O->getELFFile());		BuildID = getBuildID(O->getELFFile());
else if (auto *O = dyn_cast<ELFObjectFile<ELF32BE>>(Obj))		else if (auto *O = dyn_cast<ELFObjectFile<ELF32BE>>(Obj))
BuildID = getBuildID(O->getELFFile());		BuildID = getBuildID(O->getELFFile());
else if (auto *O = dyn_cast<ELFObjectFile<ELF64LE>>(Obj))		else if (auto *O = dyn_cast<ELFObjectFile<ELF64LE>>(Obj))
BuildID = getBuildID(O->getELFFile());		BuildID = getBuildID(O->getELFFile());
else if (auto *O = dyn_cast<ELFObjectFile<ELF64BE>>(Obj))		else if (auto *O = dyn_cast<ELFObjectFile<ELF64BE>>(Obj))
BuildID = getBuildID(O->getELFFile());		BuildID = getBuildID(O->getELFFile());
else		else
llvm_unreachable("unsupported file format");		llvm_unreachable("unsupported file format");
return BuildID;		return BuildID;
}		}

} // end anonymous namespace

ObjectFile *LLVMSymbolizer::lookUpDsymFile(const std::string &ExePath,		ObjectFile *LLVMSymbolizer::lookUpDsymFile(const std::string &ExePath,
const MachOObjectFile *MachExeObj,		const MachOObjectFile *MachExeObj,
const std::string &ArchName) {		const std::string &ArchName) {
// On Darwin we may find DWARF in separate object file in		// On Darwin we may find DWARF in separate object file in
// resource directory.		// resource directory.
std::vector<std::string> DsymPaths;		std::vector<std::string> DsymPaths;
StringRef Filename = sys::path::filename(ExePath);		StringRef Filename = sys::path::filename(ExePath);
DsymPaths.push_back(		DsymPaths.push_back(
▲ Show 20 Lines • Show All 426 Lines • Show Last 20 Lines

llvm/lib/Debuginfod/CMakeLists.txt

Show All 19 Lines	add_llvm_library(LLVMDebuginfod
${LLVM_MAIN_INCLUDE_DIR}/llvm/Debuginfod		${LLVM_MAIN_INCLUDE_DIR}/llvm/Debuginfod

LINK_LIBS		LINK_LIBS
${imported_libs}		${imported_libs}

LINK_COMPONENTS		LINK_COMPONENTS
Support		Support
Symbolize		Symbolize
		DebugInfoDWARF
)		)

llvm/lib/Debuginfod/Debuginfod.cpp

//===-- llvm/Debuginfod/Debuginfod.cpp - Debuginfod client library --------===// //===-- llvm/Debuginfod/Debuginfod.cpp - Debuginfod client library --------===//

// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

/// ///

/// \file /// \file

/// ///

/// This file defines the fetchInfo function, which retrieves /// This file contains several definitions for the debuginfod client and server.

/// any of the three supported artifact types: (executable, debuginfo, source /// For the client, this file defines the fetchInfo function. For the server,

/// file) associated with a build-id from debuginfod servers. If a source file /// this file defines the DebuginfodLogEntry and DebuginfodServer structs, as

/// is to be fetched, its absolute path must be specified in the Description /// well as the DebuginfodLog, DebuginfodCollection classes. The fetchInfo

/// argument to fetchInfo. /// function retrieves any of the three supported artifact types: (executable,

/// debuginfo, source file) associated with a build-id from debuginfod servers.

/// If a source file is to be fetched, its absolute path must be specified in

/// the Description argument to fetchInfo. The DebuginfodLogEntry,

/// DebuginfodLog, and DebuginfodCollection are used by the DebuginfodServer to

/// scan the local filesystem for binaries and serve the debuginfod protocol.

/// ///

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#include "llvm/Debuginfod/Debuginfod.h" #include "llvm/Debuginfod/Debuginfod.h"

#include "llvm/ADT/StringRef.h" #include "llvm/ADT/StringRef.h"

#include "llvm/BinaryFormat/Magic.h"

#include "llvm/DebugInfo/DWARF/DWARFContext.h"

#include "llvm/DebugInfo/Symbolize/Symbolize.h"

#include "llvm/Debuginfod/HTTPClient.h" #include "llvm/Debuginfod/HTTPClient.h"

#include "llvm/Object/Binary.h"

#include "llvm/Object/ELFObjectFile.h"

#include "llvm/Object/ObjectFile.h"

#include "llvm/Support/CachePruning.h" #include "llvm/Support/CachePruning.h"

#include "llvm/Support/Caching.h" #include "llvm/Support/Caching.h"

#include "llvm/Support/Errc.h" #include "llvm/Support/Errc.h"

#include "llvm/Support/Error.h" #include "llvm/Support/Error.h"

#include "llvm/Support/FileUtilities.h" #include "llvm/Support/FileUtilities.h"

#include "llvm/Support/Path.h" #include "llvm/Support/Path.h"

#include "llvm/Support/ThreadPool.h"

#include "llvm/Support/xxhash.h" #include "llvm/Support/xxhash.h"

#include <atomic>

namespace llvm { namespace llvm {

static std::string uniqueKey(llvm::StringRef S) { return utostr(xxHash64(S)); } static std::string uniqueKey(llvm::StringRef S) { return utostr(xxHash64(S)); }

// Returns a binary BuildID as a normalized hex string. // Returns a binary BuildID as a normalized hex string.

// Uses lowercase for compatibility with common debuginfod servers. // Uses lowercase for compatibility with common debuginfod servers.

static std::string buildIDToString(BuildIDRef ID) { static std::string buildIDToString(BuildIDRef ID) {

return llvm::toHex(ID, /*LowerCase=*/true); return llvm::toHex(ID, /*LowerCase=*/true);

} }

Expected<SmallVector<StringRef>> getDefaultDebuginfodUrls() { Expected<SmallVector<StringRef>> getDefaultDebuginfodUrls() {

const char *DebuginfodUrlsEnv = std::getenv("DEBUGINFOD_URLS"); const char *DebuginfodUrlsEnv = std::getenv("DEBUGINFOD_URLS");

if (DebuginfodUrlsEnv == nullptr) if (DebuginfodUrlsEnv == nullptr)

return SmallVector<StringRef>(); return SmallVector<StringRef>();

SmallVector<StringRef> DebuginfodUrls; SmallVector<StringRef> DebuginfodUrls;

StringRef(DebuginfodUrlsEnv).split(DebuginfodUrls, " "); StringRef(DebuginfodUrlsEnv).split(DebuginfodUrls, " ");

return DebuginfodUrls; return DebuginfodUrls;

} }

/// Finds a default local file caching directory for the debuginfod client,

/// first checking DEBUGINFOD_CACHE_PATH.

Expected<std::string> getDefaultDebuginfodCacheDirectory() { Expected<std::string> getDefaultDebuginfodCacheDirectory() {

if (const char *CacheDirectoryEnv = std::getenv("DEBUGINFOD_CACHE_PATH")) if (const char *CacheDirectoryEnv = std::getenv("DEBUGINFOD_CACHE_PATH"))

return CacheDirectoryEnv; return CacheDirectoryEnv;

SmallString<64> CacheDirectory; SmallString<64> CacheDirectory;

if (!sys::path::cache_directory(CacheDirectory)) if (!sys::path::cache_directory(CacheDirectory))

return createStringError( return createStringError(

errc::io_error, "Unable to determine appropriate cache directory."); errc::io_error, "Unable to determine appropriate cache directory.");

▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines if (Client.responseCode() != 200)

continue; continue;

// Return the path to the artifact on disk. // Return the path to the artifact on disk.

return std::string(AbsCachedArtifactPath); return std::string(AbsCachedArtifactPath);

} }

return createStringError(errc::argument_out_of_domain, "build id not found"); return createStringError(errc::argument_out_of_domain, "build id not found");

} }

DebuginfodLogEntry::DebuginfodLogEntry(const Twine &Message)

: Message(Message.str()) {}

void DebuginfodLog::push(const Twine &Message) {

push(DebuginfodLogEntry(Message));

}

void DebuginfodLog::push(DebuginfodLogEntry Entry) {

{

std::lock_guard<std::mutex> Guard(QueueMutex);

LogEntryQueue.push(Entry);

}

QueueCondition.notify_one();

}

DebuginfodLogEntry DebuginfodLog::pop() {

{

std::unique_lock<std::mutex> Guard(QueueMutex);

// Wait for messages to be pushed into the queue.

QueueCondition.wait(Guard, [&] { return !LogEntryQueue.empty(); });

}

mysterymathUnsubmitted

Done

This seems a bit odd to unconditionally put to stdout; is this necessary?

Otherwise, the whole routine just becomes

while (true) {
  if (Error Err = update())
    return Err;
  std::this_thread::sleep_for(Interval);
}
return Error::success();

mysterymath: This seems a bit odd to unconditionally put to stdout; is this necessary? Otherwise, the whole…

noajshuAuthorUnsubmitted

Done

Thanks, I agree this is awkward. The first update is logged to stdout is so the lit tests in D114846 can tell when it's safe to query the server for debuginfo. If the test client pinged the server before it had found the binary it will return a 404 and the test will flake.

An alternative is to add a time delay before the client pings the server. This is discouraged as there is timing variability across systems.

We could hide this output except for when "verbose" logging is enabled, or similar. Would this be good?

noajshu: Thanks, I agree this is awkward. The first update is logged to stdout is so the lit tests in…

fche2Unsubmitted

Done

(For reference, the elfutils debuginfod exports prometheus metrics about the progress of its operations, so that an external testsuite process can synchronize.)

fche2: (For reference, the elfutils debuginfod exports prometheus metrics about the progress of its…

mysterymathUnsubmitted

Done

The first update is logged to stdout is so the lit tests in D114846 can tell when it's safe to query the server for debuginfo. If the test client pinged the server before it had found the binary it will return a 404 and the test will flake.

Ah, this sounds like it'd be something generally useful then; it's not uncommon for servers to print once they're ready to serve.

Maybe we could make the startup process more explicit: initialize the cache, then bring up HTTP server, print something like "Serving on ...", then set up the continuous updates.

mysterymath: > The first update is logged to stdout is so the lit tests in D114846 can tell when it's safe…

std::lock_guard<std::mutex> Guard(QueueMutex);

if (!LogEntryQueue.size())

llvm_unreachable("Expected message in the queue.");

DebuginfodLogEntry Entry = LogEntryQueue.front();

LogEntryQueue.pop();

return Entry;

mysterymathUnsubmitted

Done

llvm_unreachable()?

mysterymath: llvm_unreachable()?

}

DebuginfodCollection::DebuginfodCollection(ArrayRef<StringRef> PathsRef,

DebuginfodLog &Log, ThreadPool &Pool,

double MinInterval)

: Log(Log), Pool(Pool), MinInterval(MinInterval) {

mysterymathUnsubmitted

Done

Prefer StringRef over auto, since the type is obvious and concrete.

mysterymath: Prefer StringRef over auto, since the type is obvious and concrete.

for (StringRef Path : PathsRef)

Paths.push_back(Path.str());

}

Error DebuginfodCollection::update() {

std::lock_guard<sys::Mutex> Guard(UpdateMutex);

if (UpdateTimer.isRunning())

phosekUnsubmitted

Done

I expect this loop to be the performance bottleneck, so I think we should consider using a ThreadPool here as well to process files in parallel. That's the strategy used by elfutils' debuginfod as well.

This would likely require rethinking the API to allow sharing the thread pool between the file processing and request handling in D114846.

phosek: I expect this loop to be the performance bottleneck, so I think we should consider using a…

noajshuAuthorUnsubmitted

Done

Good idea!
I've added this, but I find the behavior unstable when the concurrency is >1 and a large directory is scanned.
I've searched around for the bug in my code and no luck. I'm wondering if it's possible some of the supporting libraries I'm using are not thread-safe or are leaking memory. But it's more likely I'm missing something obvious. Anyways, going to upload as it's WIP and maybe there is a better task queue architecture you could recommend.
PS: the TaskQueue in LLVM is not appropriate as it is for serialized sequence of tasks.

noajshu: Good idea! I've added this, but I find the behavior unstable when the concurrency is >1 and a…

noajshuAuthorUnsubmitted

Done

The concurrency bug has been fixed.

noajshu: The concurrency bug has been fixed.

mysterymathUnsubmitted

Done

const std::string& over auto; I think as written this will even copy the string.

mysterymath: const std::string& over auto; I think as written this will even copy the string.

UpdateTimer.stopTimer();

UpdateTimer.clear();

for (const std::string &Path : Paths) {

Log.push("Updating binaries at path " + Path);

if (Error Err = findBinaries(Path))

return Err;

}

Log.push("Updated collection");

UpdateTimer.startTimer();

return Error::success();

}

Expected<bool> DebuginfodCollection::updateIfStale() {

if (!UpdateTimer.isRunning())

return false;

UpdateTimer.stopTimer();

double Time = UpdateTimer.getTotalTime().getWallTime();

UpdateTimer.startTimer();

if (Time < MinInterval)

return false;

if (Error Err = update())

return std::move(Err);

return true;

}

Error DebuginfodCollection::updateForever(std::chrono::milliseconds Interval) {

while (true) {

if (Error Err = update())

return Err;

std::this_thread::sleep_for(Interval);

}

llvm_unreachable("updateForever loop should never end");

}

static bool isDebugBinary(object::ObjectFile *Object) {

// TODO: handle PDB debuginfo

std::unique_ptr<DWARFContext> Context = DWARFContext::create(

*Object, DWARFContext::ProcessDebugRelocations::Process);

const DWARFObject &DObj = Context->getDWARFObj();

unsigned NumSections = 0;

DObj.forEachInfoSections([&](const DWARFSection &S) { NumSections++; });

return NumSections;

}

static bool hasELFMagic(StringRef FilePath) {

file_magic Type;

std::error_code EC = identify_magic(FilePath, Type);

if (EC)

return false;

switch (Type) {

case file_magic::elf:

case file_magic::elf_relocatable:

case file_magic::elf_executable:

case file_magic::elf_shared_object:

case file_magic::elf_core:

return true;

mysterymathUnsubmitted

Done

EC, from the LLVM variable naming convention

mysterymath: EC, from the LLVM variable naming convention

default:

mysterymathUnsubmitted

Done

I poked around, and it doesn't look like dbgs() offers any thread-safety guarantees. If not, these async jobs may have have oddly interleaved logs at best, and undefined behavior at worst.

mysterymath: I poked around, and it doesn't look like dbgs() offers any thread-safety guarantees. If not…

return false;

mysterymathUnsubmitted

Done

I and E, from the LLVM variable naming convention

mysterymath: `I` and `E`, from the LLVM variable naming convention

}

Error DebuginfodCollection::findBinaries(StringRef Path) {

std::error_code EC;

sys::fs::recursive_directory_iterator I(Twine(Path), EC), E;

std::mutex IteratorMutex;

ThreadPoolTaskGroup IteratorGroup(Pool);

for (unsigned WorkerIndex = 0; WorkerIndex < Pool.getThreadCount();

WorkerIndex++) {

IteratorGroup.async([&, this]() -> void {

std::string FilePath;

while (true) {

{

// Check if iteration is over or there is an error during iteration

std::lock_guard<std::mutex> Guard(IteratorMutex);

if (I == E || EC)

return;

// Grab a file path from the directory iterator and advance the

// iterator.

FilePath = I->path();

I.increment(EC);

}

mysterymathUnsubmitted

Done

Is there an advantage for manually managing the concurrency here, over passing it as an argument to ThreadPool, std::min-ed with the hardware concurrency?
From a cursor look at ThreadPool's API, each async call after the thread pool is full should just more-or-less push a std::function<void()> onto a vector.

mysterymath: Is there an advantage for manually managing the concurrency here, over passing it as an…

noajshuAuthorUnsubmitted

Done

As this is within a directory iterator loop, my concern was for when there is a large number of files within that directory. If we add tasks to the ThreadPool faster than they are completed, the memory usage of that vector of std::function<void()>s becomes unbounded. So I thought it best to manage the progress through the loop more manually. What do you think?

noajshu: As this is within a directory iterator loop, my concern was for when there is a large number of…

mysterymathUnsubmitted

Done

I'm unsure whether or not the buildup would ever cause problems in practice; then again, I just finished cleaning up an unbounded memory usage problem that broke in production. I'll defer that determination to others with more experience.

Assuming we keep the semantics here, it seems like what we'd really want is a version of ThreadPool that blocks submission of additional requests if the thread pool is full. This would provide feedback to stop the iterator from producing additional entries that cannot yet be handled (and would thus need to be stored). This seems like a useful abstraction in its own right, and there's definitely prior art.

I'd suggest either wrapping ThreadPool to provide such an API (simplified for the purposes of this file) or adding it as an option to ThreadPool. The first would be a slight modification to the code you have to abstract out the management of NumTasksRemaining. It'd also probably be more elegant to have the blocking async call (s) wait on a condition variable, so that the next task that finishes can signal that there is now a thread available, rather than busy-waiting with sleep().

mysterymath: I'm unsure whether or not the buildup would ever cause problems in practice; then again, I just…

noajshuAuthorUnsubmitted

Done

Thanks for these suggestions! I agree on all points.

It's only a small change to ThreadPool to let us wait for room in the queue with a condition variable:

bool queueEmptyUnlocked() { return Tasks.empty(); }

void ThreadPool::waitQueue() {
  // Wait for the queue to be empty
  std::unique_lock<std::mutex> LockGuard(QueueLock);
  CompletionCondition.wait(LockGuard, [&] { return queueEmptyUnlocked(); });
}

I also collected some data to find out when the unbounded memory usage of the queue of jobs could actually be a problem in production. From my measurements, each job in the pool's queue consumes approximately N + 320 bytes of memory, where N is the number of bytes in the file path. For the right system setup, this could indeed consume lots of memory. Briefly, filesystem caching could allow millions of file paths to be traversed in seconds, but actually reading those files could take longer, causing a queue buildup.

However, if these files are ELF binaries they will end up in our StringMap in memory anyways with the current implementation. So the unbounded memory usage will be a problem regardless for this user, if their files are mostly ELF binaries.

Therefore, only the user who has millions of non-ELF files mixed in with their smaller number of ELF binaries could meaningfully benefit from us waiting here to submit more jobs to the queue. For example, a developer user like myself with limited local memory and millions of files that are not ELF binaries. When I plug in the numbers for my own filesystem, I could fit one job for each file in my development directory with about .4 GB of total memory usage. This is a comfortable margin on my own system but I'm unsure about other users. So if it seems reasonable I will just make the small tweaks to ThreadPool API to be safe.

noajshu: Thanks for these suggestions! I agree on all points. It's only a small change to ThreadPool to…

// Inspect the file at this path to determine if it is debuginfo.

if (!hasELFMagic(FilePath))

continue;

mysterymathUnsubmitted

Done

If I'm reading this right, wouldn't this loop dispatch one item to the queue, wait for the queue to be empty, dispatch another item, wait for the queue to be empty, etc. It seems like this disables parallelism entirely.

I'd have expected this to wait until the queue was "not full"; then items would be dispatched until max concurrency was reached, and the next item dispatched the moment a thread becomes free.

mysterymath: If I'm reading this right, wouldn't this loop dispatch one item to the queue, wait for the…

noajshuAuthorUnsubmitted

Done

I don't think this should disable parallelism entirely as the queue will become empty as soon as the job starts processing in a worker thread, rather than when that job finishes. Let me double check this.

noajshu: I don't think this should disable parallelism entirely as the queue will become empty as soon…

noajshuAuthorUnsubmitted

Done

On a closer look, you're right about it disabling parallelism. This is due to ThreadPool's internals! Further modification of the ThreadPool is required.

Also I think it would be simple enough to parametrize as waitQueueSize(size_t Size), blocking until the queue has at most Size tasks.

noajshu: On a closer look, you're right about it disabling parallelism. This is due to `ThreadPool`'s…

noajshuAuthorUnsubmitted

Done

I corrected the implementation of waitQueueSize. Thank you for catching this!

Although it takes a Size parameter, I leave it as the default of 0 here.

noajshu: I corrected the implementation of `waitQueueSize`. Thank you for catching this! Although it…

Expected<object::OwningBinary<object::Binary>> BinOrErr =

fche2Unsubmitted

Done

Don't you need a mutex-guard on this operation?

fche2: Don't you need a mutex-guard on this operation?

noajshuAuthorUnsubmitted

Done

Thanks so much for catching this!
I propose to switch to RW-Mutex so the readers can read without blocking each other.

noajshu: Thanks so much for catching this! I propose to switch to [[ https://llvm.

object::createBinary(FilePath);

if (!BinOrErr) {

consumeError(BinOrErr.takeError());

continue;

mysterymathUnsubmitted

Done

std::lock_guard should work with RWMutexes for exclusive locking, and it's easier to read than doing it manually.

mysterymath: std::lock_guard should work with RWMutexes for exclusive locking, and it's easier to read than…

noajshuAuthorUnsubmitted

Done

Thanks! Since we're `std::shared_lock<RWMutex> lock(write);

noajshu: Thanks! Since we're `std::shared_lock<RWMutex> lock(write);

noajshuAuthorUnsubmitted

Done

Good idea, I switched everything to use std:: raii locks and it's much cleaner.

noajshu: Good idea, I switched everything to use `std::` raii locks and it's much cleaner.

}

object::Binary *Bin = std::move(BinOrErr.get().getBinary());

if (!Bin->isObject())

continue;

// TODO: Support non-ELF binaries

object::ELFObjectFileBase *Object =

dyn_cast<object::ELFObjectFileBase>(Bin);

if (!Object)

continue;

fche2Unsubmitted

Done

Ditto re. mutex-guard?

fche2: Ditto re. mutex-guard?

Optional<BuildIDRef> ID = symbolize::getBuildID(Object);

if (!ID)

continue;

mysterymathUnsubmitted

Done

// Wait for empty queue before proceeding to the next file to avoid

- // unbounded memory usage

+ // unbounded memory usage.

Pool.waitQueueSize();

mysterymath:

std::string IDString = buildIDToString(ID.getValue());

if (isDebugBinary(Object)) {

std::lock_guard<sys::RWMutex> DebugBinariesGuard(DebugBinariesMutex);

DebugBinaries[IDString] = FilePath;

} else {

std::lock_guard<sys::RWMutex> BinariesGuard(BinariesMutex);

Binaries[IDString] = FilePath;

}

mysterymathUnsubmitted

Done

nit: remove empty line

mysterymath: nit: remove empty line

}

});

}

IteratorGroup.wait();

std::unique_lock<std::mutex> Guard(IteratorMutex);

if (EC)

return errorCodeToError(EC);

return Error::success();

}

Expected<Optional<std::string>>

DebuginfodCollection::getBinaryPath(BuildIDRef ID) {

Log.push("getting binary path of ID " + buildIDToString(ID));

mysterymathUnsubmitted

Done

Since each async call exits as soon as I == E || EC, this block can be replaced with just ThreadPool.wait(), and there's no need to maintain a separate condition variable. (ThreadPool maintains one internally for this purpose.)
It also looks like the decrements of NumActiveWorkers and the outer loop are interleaved in a way that adds some complexity to the situation; this would remove the need for NumActiveWorkers, which clears that up too.

mysterymath: Since each async call exits as soon as `I == E || EC`, this block can be replaced with just…

noajshuAuthorUnsubmitted

Done

great idea!

Since the first draft of this revision, ThreadPoolTaskGroup was merged into LLVM. It nicely fits our use case, here we just use IteratorGroup.wait();.

(now that I am aware of the new Task Group feature, I propose to close D126815 altogether, as it does appear quite ad-hoc in light of the clean alternative of using a task group)

noajshu: great idea! Since the first draft of this revision, [[ https://llvm.

std::shared_lock<sys::RWMutex> Guard(BinariesMutex);

auto Loc = Binaries.find(buildIDToString(ID));

if (Loc != Binaries.end()) {

std::string Path = Loc->getValue();

return Path;

}

return None;

}

mysterymathUnsubmitted

Done

nit: remove empty line

mysterymath: nit: remove empty line

Expected<Optional<std::string>>

DebuginfodCollection::getDebugBinaryPath(BuildIDRef ID) {

Log.push("getting debug binary path of ID " + buildIDToString(ID));

std::shared_lock<sys::RWMutex> Guard(DebugBinariesMutex);

auto Loc = DebugBinaries.find(buildIDToString(ID));

if (Loc != DebugBinaries.end()) {

std::string Path = Loc->getValue();

return Path;

}

return None;

}

Expected<std::string> DebuginfodCollection::findBinaryPath(BuildIDRef ID) {

{

// Check collection; perform on-demand update if stale.

Expected<Optional<std::string>> PathOrErr = getBinaryPath(ID);

if (!PathOrErr)

return PathOrErr.takeError();

Optional<std::string> Path = *PathOrErr;

if (!Path) {

mysterymathUnsubmitted

Done

Expected<std::string> DebuginfodCollection::findBinaryPath(BuildIDRef ID) {

{

- // check collection, perform on-demand update if stale

+ // Check collection; perform on-demand update if stale.

Expected<Optional<std::string>> PathOrErr = getBinaryPath(ID);

mysterymath:

Expected<bool> UpdatedOrErr = updateIfStale();

if (!UpdatedOrErr)

return UpdatedOrErr.takeError();

if (*UpdatedOrErr) {

// Try once more.

PathOrErr = getBinaryPath(ID);

if (!PathOrErr)

mysterymathUnsubmitted

Not Done

This path will always usually cause a "collection was not stale" error message if the binary was not found, which isn't as good as just "binary not found." It's also surprising that updateIfStale() will throw an error if the collection was not stale; usually methods with that naming convention do nothing, since the name suggests that it's not an error to violate the precondition.

mysterymath: This path will always usually cause a "collection was not stale" error message if the binary…

noajshuAuthorUnsubmitted

Not Done

Good point, how about this? We return an Expected<bool> where if there are no errors during update(), returns whether the collection got updated. If it's not stale, we don't bother checking for the path again.

noajshu: Good point, how about this? We return an `Expected<bool>` where if there are no errors during…

return PathOrErr.takeError();

mysterymathUnsubmitted

Done

return std::move(Err);

- // try once more

+ // Try once more.

PathOrErr = getBinaryPath(ID);

mysterymath:

Path = *PathOrErr;

}

if (Path)

return Path.getValue();

}

// Try federation.

Expected<std::string> PathOrErr = getCachedOrDownloadExecutable(ID);

if (!PathOrErr)

mysterymathUnsubmitted

Done

return Path.getValue();

}

- // federation

+ // Try federation.

Expected<std::string> PathOrErr = getCachedOrDownloadExecutable(ID);

mysterymath:

consumeError(PathOrErr.takeError());

// Fall back to debug binary.

return findDebugBinaryPath(ID);

}

mysterymathUnsubmitted

Done

consumeError(PathOrErr.takeError());

- // fall-back to debug binary

+ // Fall back to debug binary.

return findDebugBinaryPath(ID);

mysterymath:

Expected<std::string> DebuginfodCollection::findDebugBinaryPath(BuildIDRef ID) {

// Check collection; perform on-demand update if stale.

Expected<Optional<std::string>> PathOrErr = getDebugBinaryPath(ID);

if (!PathOrErr)

return PathOrErr.takeError();

mysterymathUnsubmitted

Done

Expected<std::string> DebuginfodCollection::findDebugBinaryPath(BuildIDRef ID) {

{

- // check collection, perform on-demand update if stale

+ // Check collection; perform on-demand update if stale.

Expected<Optional<std::string>> PathOrErr = getDebugBinaryPath(ID);

mysterymath:

Optional<std::string> Path = *PathOrErr;

if (!Path) {

Expected<bool> UpdatedOrErr = updateIfStale();

if (!UpdatedOrErr)

return UpdatedOrErr.takeError();

if (*UpdatedOrErr) {

// Try once more.

PathOrErr = getBinaryPath(ID);

mysterymathUnsubmitted

Done

return std::move(Err);

- // try once more

+ // Try once more.

PathOrErr = getDebugBinaryPath(ID);

mysterymath:

if (!PathOrErr)

return PathOrErr.takeError();

Path = *PathOrErr;

}

if (Path)

return Path.getValue();

// Try federation.

return getCachedOrDownloadDebuginfo(ID);

mysterymathUnsubmitted

Done

return Path.getValue();

}

- // federation

+ // Try federation.

return getCachedOrDownloadDebuginfo(ID);

mysterymath:

}

DebuginfodServer::DebuginfodServer(DebuginfodLog &Log,

DebuginfodCollection &Collection)

: Log(Log), Collection(Collection) {

cantFail(

Server.get(R"(/buildid/(.*)/debuginfo)", [&](HTTPServerRequest Request) {

Log.push("GET " + Request.UrlPath);

std::string IDString;

if (!tryGetFromHex(Request.UrlPathMatches[0], IDString)) {

Request.setResponse(

{404, "text/plain", "Build ID is not a hex string\n"});

return;

}

BuildID ID(IDString.begin(), IDString.end());

Expected<std::string> PathOrErr = Collection.findDebugBinaryPath(ID);

if (Error Err = PathOrErr.takeError()) {

consumeError(std::move(Err));

Request.setResponse({404, "text/plain", "Build ID not found\n"});

return;

}

streamFile(Request, *PathOrErr);

}));

cantFail(

mysterymathUnsubmitted

Done

This return doesn't do anything; remove.

mysterymath: This `return` doesn't do anything; remove.

Server.get(R"(/buildid/(.*)/executable)", [&](HTTPServerRequest Request) {

Log.push("GET " + Request.UrlPath);

std::string IDString;

if (!tryGetFromHex(Request.UrlPathMatches[0], IDString)) {

Request.setResponse(

{404, "text/plain", "Build ID is not a hex string\n"});

return;

}

BuildID ID(IDString.begin(), IDString.end());

Expected<std::string> PathOrErr = Collection.findBinaryPath(ID);

if (Error Err = PathOrErr.takeError()) {

consumeError(std::move(Err));

Request.setResponse({404, "text/plain", "Build ID not found\n"});

return;

}

streamFile(Request, *PathOrErr);

}));

}

} // namespace llvm } // namespace llvm

This is an archive of the discontinued LLVM Phabricator instance.

[llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking local debuginfo.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 442671

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

llvm/include/llvm/Debuginfod/Debuginfod.h

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

llvm/lib/Debuginfod/CMakeLists.txt

llvm/lib/Debuginfod/Debuginfod.cpp

[llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking local debuginfo.
ClosedPublic