This is an archive of the discontinued LLVM Phabricator instance.

[llvm] [Debuginfod] Add HTTP Server to Debuginfod library.
ClosedPublic

Authored by noajshu on Nov 22 2021, 9:47 PM.

Details

Summary

This provides a minimal HTTP server interface and an implementation wrapping cpp-httplib in the Debuginfod library. If the Curl HTTP client is available (D112753) the server is tested by pinging it with the client.

Diff Detail

Unit TestsFailed

Event Timeline

noajshu created this revision.Nov 22 2021, 9:47 PM
noajshu requested review of this revision.Nov 22 2021, 9:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 9:47 PM
phosek added inline comments.Nov 22 2021, 11:13 PM
llvm/include/llvm/Support/HTTPServer.h
1 ↗(On Diff #389094)

This file is missing LLVM header.

llvm/lib/Support/HTTPServer.cpp
1 ↗(On Diff #389094)

This file is missing LLVM header.

19 ↗(On Diff #389094)

I wonder if we should perhaps move this to a separate method (like get) and take the path as an argument so we can potentially support multiple handlers.

23 ↗(On Diff #389094)

Having both Response and Resp is confusing, can we use a different name here?

57 ↗(On Diff #389094)

Calling listen on an unbound server should probably return an error rather than success.

noajshu updated this revision to Diff 389348.Nov 23 2021, 4:00 PM

Incorporate feedback, refactor, add unit tests, and rebase against D113218.

noajshu marked 5 inline comments as done.Nov 23 2021, 4:05 PM
noajshu added inline comments.
llvm/lib/Support/HTTPServer.cpp
19 ↗(On Diff #389094)

My initial thought was to support multiple "handlers" using one big handler that would parse the URL and dispatch to the appropriate sub-handler. However it does seem more user-friendly to expose the same interface as cpp-httplib, letting the HTTPServer deal with the url parsing.

noajshu retitled this revision from [llvm] [Support] (WIP) Add HTTP Client Support library. to [llvm] [Support] (WIP) Add HTTP Server Support library..Nov 23 2021, 4:45 PM
noajshu updated this revision to Diff 390457.Nov 29 2021, 1:58 PM
noajshu marked an inline comment as done.
noajshu retitled this revision from [llvm] [Support] (WIP) Add HTTP Server Support library. to [llvm] [Support] Add HTTP Server Support library..

Implement streaming responses and file streaming helper; add unit tests of client timeouts and streaming string / file responses.

What's the ultimate use intended for this? For testing the debuginfod client functionality (the llvm-symbolizer functionality is tested with a smaller(?) python http server - perhaps that could be used more & we could avoid having this C++ HTTP server implementation?)?

What's the ultimate use intended for this? For testing the debuginfod client functionality (the llvm-symbolizer functionality is tested with a smaller(?) python http server - perhaps that could be used more & we could avoid having this C++ HTTP server implementation?)?

It's for the server part of the LLVM debuginfod implementation. elfutils debuginfod has two parts: (1) debuginfod-find client and the corresponding library that could be integrated into other tools and (2) debuginfod which is a small daemon that periodically scans a set of directories, indexes any debugging information it finds and serves it over the builtin HTTP server using the debuginfod protocol. We have been focusing on #1 so far but we would also like to implement #2 which is really important for local development. https://groups.google.com/g/llvm-dev/c/jFdq0qYtKqM/m/1dLcYUGBBAAJ has more details.

What's the ultimate use intended for this? For testing the debuginfod client functionality (the llvm-symbolizer functionality is tested with a smaller(?) python http server - perhaps that could be used more & we could avoid having this C++ HTTP server implementation?)?

It's for the server part of the LLVM debuginfod implementation. elfutils debuginfod has two parts: (1) debuginfod-find client and the corresponding library that could be integrated into other tools and (2) debuginfod which is a small daemon that periodically scans a set of directories, indexes any debugging information it finds and serves it over the builtin HTTP server using the debuginfod protocol. We have been focusing on #1 so far but we would also like to implement #2 which is really important for local development. https://groups.google.com/g/llvm-dev/c/jFdq0qYtKqM/m/1dLcYUGBBAAJ has more details.

Oh, fair enough - just checking it wasn't only being implemented for testing. Would the python script from the llvm-symbolizer test be adequate for the production/local developer scenarios you have in mind? (I don't mind the C++ too much, but just trying to understand the landscape, tradeoffs, etc)

What's the ultimate use intended for this? For testing the debuginfod client functionality (the llvm-symbolizer functionality is tested with a smaller(?) python http server - perhaps that could be used more & we could avoid having this C++ HTTP server implementation?)?

It's for the server part of the LLVM debuginfod implementation. elfutils debuginfod has two parts: (1) debuginfod-find client and the corresponding library that could be integrated into other tools and (2) debuginfod which is a small daemon that periodically scans a set of directories, indexes any debugging information it finds and serves it over the builtin HTTP server using the debuginfod protocol. We have been focusing on #1 so far but we would also like to implement #2 which is really important for local development. https://groups.google.com/g/llvm-dev/c/jFdq0qYtKqM/m/1dLcYUGBBAAJ has more details.

Oh, fair enough - just checking it wasn't only being implemented for testing. Would the python script from the llvm-symbolizer test be adequate for the production/local developer scenarios you have in mind? (I don't mind the C++ too much, but just trying to understand the landscape, tradeoffs, etc)

The python script used in D113717 (which might also get used to test D112759) only takes care of the HTTP static file serving needed for debuginfod. There is another component, which is to search the filesystem for debug binaries and assemble the collection of artifacts to serve. I'll be publishing a diff shortly which implements this functionality using LLVM's object parsing and filesystem utilities. Then we will have all the pieces needed for a basic C++ debuginfod in LLVM.

There are workarounds we could use to get a simple debuginfod server working without this LLVM HTTP server. E.g., we could use CGI and FastCGI, or we could create symlinks / copies to the discovered debug data in a single static file serving path. These workarounds came with trade offs, and there are other tools in LLVM that could take advantage of a cross-platform HTTP server such as bisectd (D113030). This is what led to the goal of getting an HTTP server in LLVM's supporting libraries. (A side benefit is that we can now thoroughly unit test the HTTP client by letting it communicate with the server, but this isn't the motivation :) .)

What's the ultimate use intended for this? For testing the debuginfod client functionality (the llvm-symbolizer functionality is tested with a smaller(?) python http server - perhaps that could be used more & we could avoid having this C++ HTTP server implementation?)?

It's for the server part of the LLVM debuginfod implementation. elfutils debuginfod has two parts: (1) debuginfod-find client and the corresponding library that could be integrated into other tools and (2) debuginfod which is a small daemon that periodically scans a set of directories, indexes any debugging information it finds and serves it over the builtin HTTP server using the debuginfod protocol. We have been focusing on #1 so far but we would also like to implement #2 which is really important for local development. https://groups.google.com/g/llvm-dev/c/jFdq0qYtKqM/m/1dLcYUGBBAAJ has more details.

Oh, fair enough - just checking it wasn't only being implemented for testing. Would the python script from the llvm-symbolizer test be adequate for the production/local developer scenarios you have in mind? (I don't mind the C++ too much, but just trying to understand the landscape, tradeoffs, etc)

The python script used in D113717 (which might also get used to test D112759) only takes care of the HTTP static file serving needed for debuginfod. There is another component, which is to search the filesystem for debug binaries and assemble the collection of artifacts to serve. I'll be publishing a diff shortly which implements this functionality using LLVM's object parsing and filesystem utilities. Then we will have all the pieces needed for a basic C++ debuginfod in LLVM.

There are workarounds we could use to get a simple debuginfod server working without this LLVM HTTP server. E.g., we could use CGI and FastCGI, or we could create symlinks / copies to the discovered debug data in a single static file serving path. These workarounds came with trade offs, and there are other tools in LLVM that could take advantage of a cross-platform HTTP server such as bisectd (D113030). This is what led to the goal of getting an HTTP server in LLVM's supporting libraries. (A side benefit is that we can now thoroughly unit test the HTTP client by letting it communicate with the server, but this isn't the motivation :) .)

Fair enough - thanks for the context!

llvm/include/llvm/Support/HTTPServer.h
29–31 ↗(On Diff #390457)

Should the first element instead be a separate member, since it seems it's not like the rest of the elements/will be handled differently?

llvm/lib/Support/HTTPServer.cpp
58 ↗(On Diff #390457)

Could this parameter (the last lambda, the "CompletionHandler" member in StreamingHTTPResponse) be move-only (non-copyable) in that case it could capture-by-move the std::unique_ptr<MemoryBuffer> and be a bit more robust in terms of memory management?

67 ↗(On Diff #390457)

Perhaps HTTPRequestHandler should be an llvm::function_ref since it doesn't appear to outlive this call, if I'm understanding the code correctly?

94 ↗(On Diff #390457)

should this be const auto&? (I'm not sure how big/expensive to copy these objects are, perhaps it's suitable to copy them)

101–102 ↗(On Diff #390457)

Looks like this could use StringRef instead of std::string, to avoid copying the contents? Or are there cases where a provider might want to return content by value, rather than from some underlying/longer-lived storage? (worth providing something that could be optimal for both cases somehow - like providing an optional std::string out parameter that the provider could populate if it doesn't have its own long-lived backing store?)

125–130 ↗(On Diff #390457)

If the underlying API requires a c-style string, maybe simpler to expose that up through the caller layers? Rather than allocating a buffer to create a null terminated string when the caller might already have one to pass down anyway? (alternatively, I guess, if you use Twine as the parameter type rather than StringRef, then at least in the cases where the Twine does refer to a null terminated string it won't have to copy the string again - currently since the HostInterface is StringRef, it'd always have to copy into the buffer and append a null terminator)

142–144 ↗(On Diff #390457)

Follow-up work to use httplib set_error_handler to improve these error messages, perhaps?

158–162 ↗(On Diff #390457)

This looks out of date, presumably it doesn't compile? (HTTPServer doesn't declare any ctors, so I guess this would fail to compile)

llvm/unittests/Support/HTTPServer.cpp
73 ↗(On Diff #390457)

probably make this by-value?

77 ↗(On Diff #390457)

Is some corresponding de-initialize/teardown required/desirable? (perhaps some scoped initialize/de-initialize) or put the initialize in a global ctor (or test fixture (SetUp? Whatever the one is that's called before any test (but not before every test individually)) in the test file if there's no expectation/desire to initialize/teardown around each test?

82 ↗(On Diff #390457)

The MemoryBuffer:: probably isn't needed here, I'd have thought?

184–189 ↗(On Diff #390457)

timing tests like this can be sensitive to machine load - perhaps setting the acceptable timeout much higher than the actual timeout would be good to increase test stability? (eg: the 40 is probably fine - the actual 50ms delay shouldn't ever be shorter than requested, and so it probably shouldn't tickle that test (though it's possible - if the test code were delayed in reaching the timeout call while the delay was already running, it could appear as though the delay was shorter than expected) - but changing the 60 up to something well above the threshold might be good).

Though also, it's not your job to test that HTTPClient implements timeouts correctly, only that you pass them down through the API - so if there's a more robust way to test that property without actually having to run real timeouts (or at least not testing them so precisely) it might be helpful.

206 ↗(On Diff #390457)

I think there's probably a matcher that matches all the elements in a more terse format than having to check each separately? ( https://stackoverflow.com/q/1460703 )

215–217 ↗(On Diff #390457)

Prefer llvm_unreachable over assert(false)

noajshu marked 6 inline comments as done.Dec 11 2021, 2:49 PM
noajshu added inline comments.
llvm/lib/Support/HTTPServer.cpp
58 ↗(On Diff #390457)

That's a great idea! Cpp-httplib converts the content provider to a ContentProvider which is a std::function. This ends up copying the lambda.

I was thinking we might refactor the interface, so instead of returning the response (or response provider) you can interact with a "HTTPServerRequest" object similar to how cpp-httplib's own API works.
Their API is more flexible. For example, you can decide at runtime whether you want to do streaming response handling or return a single response string. In the wrapper implementation here you have to either return a StreamingHTTPResponse or an HTTPResponse, fixed at compile time.

Even if we mirror the httplib interface, the problem remains that we cannot pass a non-copyable lambda directly to httplib's set_content_provider. I'm not sure if there is a nice way around this problem.

67 ↗(On Diff #390457)

Ah, I think the nomenclature is misleading. HTTPServer::get should be called something like HTTPServer::registerGetHandler. So it returns, and then you can later call bind() etc., but the function provided in the argument may be called much later after listen when there is a request to be handled.

llvm/unittests/Support/HTTPServer.cpp
77 ↗(On Diff #390457)

There is a discussion about the need for a "manual" global HTTPClient::initialize() here. Briefly, Curl's global initialization is not thread-safe and may load other libs; the loader lock on windows prevents use of a static ctor to initialize. We added HTTPClient::initialize() to InitLLVM for convenience, but we had to remove it when HTTPClient got moved to the Debuginfod lib. So now the HTTPClient::initialize must be called "manually". By the way, there is ongoing discussion within libcurl community on making curl global init thread safe.

On the other hand, we can safely deinitialize with a static destructor as you suggest.
This is a great idea which jhenderson also suggested on D113717, and you can see the implementation here:
https://reviews.llvm.org/D113717#change-LO0j60bMM9xj
So there is no need for the de-initialize / teardown here, pending that diff.

dblaikie added inline comments.Dec 13 2021, 6:23 PM
llvm/lib/Support/HTTPServer.cpp
58 ↗(On Diff #390457)

Ah, oh well. Thanks for the context!

noajshu updated this revision to Diff 394718.Dec 15 2021, 7:00 PM
noajshu marked 4 inline comments as done.

Refactor HTTPServerRequest interface to be more close to cpphttplib, and address all of @dblaikie 's comments.

Hi @dblaikie thank you so much for the review! I believe I have addressed all your comments, however they disappeared because I moved the HTTPServer into the Debuginfod library. You can see them on the previous revision.

llvm/lib/Support/HTTPServer.cpp
101–102 ↗(On Diff #390457)

I've switched it to take a StringRef -- for the only place this is used currently (streaming chunks of a file) you have a persistent MemoryBuffer which is responsible for the storage.

125–130 ↗(On Diff #390457)

It's a good point that no matter whether we use cpp-httplib or something else, ultimately the system call will need a c-style string. I've changed it to a const char* and I could switch to a Twine if it's better.

142–144 ↗(On Diff #390457)

Good idea!
It looks like httplib's set_error_handler is used to set the default response content (e.g. an error page) when the user's handler returns an HTTP error code (>400). More detailed error information is available to us by checking errno, but cpp-httplib doesn't check it other than to see if the bind / listen was successful.

In a future TCPSocket implementation for LLVM it would be great to parse the errno better to give more helpful messages to the user.

llvm/unittests/Support/HTTPServer.cpp
82 ↗(On Diff #390457)

Ah, sadly it is required, unless you can see a better way around the following.
The Body is a WritableMemoryBuffer, whose override of getBuffer returns an MutableArrayRef<char> (as opposed to the parent's virtual method MemoryBuffer::getBuffer which returns a StringRef). For some reason there is no operator== implemented between MutableArrayRef<char> and std::string. So we have to manually specify use of the parent method

184–189 ↗(On Diff #390457)

This is a really good observation. I'm concerned that even with a lot of extra wait time, it might flake under some extreme system load. Let's remove it for now, leaving just the test in which a timeout is expected.
HTTPClient::setTimeout is just directly calling curl_easy_setopt so I think that the libcurl tests should cover it for the case where a (nonzero) timeout does not expire.

dblaikie accepted this revision.Jan 17 2022, 1:31 PM

I think this is mostly OK. (still, might be worth initializing/deinitializing in test fixtures, to isolate the tests a bit more - but probably not a big deal/not the sort of bugs we're likely to see (eg: where one test taints the libcurl state and another fails in weird ways because of that taint))

llvm/unittests/Support/HTTPServer.cpp
82 ↗(On Diff #390457)

Seems like addressing the missing comparison would be good as a separate patch.

This revision is now accepted and ready to land.Jan 17 2022, 1:31 PM
noajshu retitled this revision from [llvm] [Support] Add HTTP Server Support library. to [llvm] [Debuginfo] Add HTTP Server to Debuginfod library..Jan 17 2022, 6:48 PM
noajshu edited the summary of this revision. (Show Details)
noajshu updated this revision to Diff 400692.Jan 17 2022, 7:03 PM
noajshu marked an inline comment as done.

Use test fixtures to perform HTTP Client initialize and teardown for the client-server tests.

@dblaikie Thanks for taking another look and the LGTM! Per your suggestion I've added a test fixture to call the HTTPClient initialization and teardown methods. Thanks for all the comments on this patch.

It seems at first llvm-debuginfod will be the sole user tool. So, my inclination is to hold off on committing this until the llvm-debuginfod server (D114845 and D114846) is ready to land.

@dblaikie Thanks for taking another look and the LGTM! Per your suggestion I've added a test fixture to call the HTTPClient initialization and teardown methods. Thanks for all the comments on this patch.

It seems at first llvm-debuginfod will be the sole user tool. So, my inclination is to hold off on committing this until the llvm-debuginfod server (D114845 and D114846) is ready to land.

Yep, sounds fair.

noajshu updated this revision to Diff 406283.Feb 6 2022, 2:18 PM

Replace caputure-by-reference of request handler with capture-by-copy, and add a unit test to verify temporary handlers can be registered and work correctly.

noajshu updated this revision to Diff 406294.Feb 6 2022, 2:52 PM

Fix typo in unit tests.

noajshu retitled this revision from [llvm] [Debuginfo] Add HTTP Server to Debuginfod library. to [llvm] [Debuginfod] Add HTTP Server to Debuginfod library..Feb 6 2022, 3:02 PM
noajshu updated this revision to Diff 411709.Feb 27 2022, 4:45 PM

Rebase against main

noajshu updated this revision to Diff 438472.Jun 20 2022, 2:08 PM

Rebase against main

Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2022, 2:08 PM
noajshu updated this revision to Diff 438815.Jun 21 2022, 1:32 PM

Fix HTTP Server / Client unit tests that were broken after rebase

This revision was landed with ongoing or failed builds.Jul 6 2022, 11:57 AM
This revision was automatically updated to reflect the committed changes.