This provides a minimal HTTP server interface and an implementation wrapping cpp-httplib in the Debuginfod library. If the Curl HTTP client is available (D112753) the server is tested by pinging it with the client.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/include/llvm/Support/HTTPServer.h | ||
---|---|---|
1 ↗ | (On Diff #389094) | This file is missing LLVM header. |
llvm/lib/Support/HTTPServer.cpp | ||
1 ↗ | (On Diff #389094) | This file is missing LLVM header. |
19 ↗ | (On Diff #389094) | I wonder if we should perhaps move this to a separate method (like get) and take the path as an argument so we can potentially support multiple handlers. |
23 ↗ | (On Diff #389094) | Having both Response and Resp is confusing, can we use a different name here? |
57 ↗ | (On Diff #389094) | Calling listen on an unbound server should probably return an error rather than success. |
llvm/lib/Support/HTTPServer.cpp | ||
---|---|---|
19 ↗ | (On Diff #389094) | My initial thought was to support multiple "handlers" using one big handler that would parse the URL and dispatch to the appropriate sub-handler. However it does seem more user-friendly to expose the same interface as cpp-httplib, letting the HTTPServer deal with the url parsing. |
Implement streaming responses and file streaming helper; add unit tests of client timeouts and streaming string / file responses.
What's the ultimate use intended for this? For testing the debuginfod client functionality (the llvm-symbolizer functionality is tested with a smaller(?) python http server - perhaps that could be used more & we could avoid having this C++ HTTP server implementation?)?
It's for the server part of the LLVM debuginfod implementation. elfutils debuginfod has two parts: (1) debuginfod-find client and the corresponding library that could be integrated into other tools and (2) debuginfod which is a small daemon that periodically scans a set of directories, indexes any debugging information it finds and serves it over the builtin HTTP server using the debuginfod protocol. We have been focusing on #1 so far but we would also like to implement #2 which is really important for local development. https://groups.google.com/g/llvm-dev/c/jFdq0qYtKqM/m/1dLcYUGBBAAJ has more details.
Oh, fair enough - just checking it wasn't only being implemented for testing. Would the python script from the llvm-symbolizer test be adequate for the production/local developer scenarios you have in mind? (I don't mind the C++ too much, but just trying to understand the landscape, tradeoffs, etc)
The python script used in D113717 (which might also get used to test D112759) only takes care of the HTTP static file serving needed for debuginfod. There is another component, which is to search the filesystem for debug binaries and assemble the collection of artifacts to serve. I'll be publishing a diff shortly which implements this functionality using LLVM's object parsing and filesystem utilities. Then we will have all the pieces needed for a basic C++ debuginfod in LLVM.
There are workarounds we could use to get a simple debuginfod server working without this LLVM HTTP server. E.g., we could use CGI and FastCGI, or we could create symlinks / copies to the discovered debug data in a single static file serving path. These workarounds came with trade offs, and there are other tools in LLVM that could take advantage of a cross-platform HTTP server such as bisectd (D113030). This is what led to the goal of getting an HTTP server in LLVM's supporting libraries. (A side benefit is that we can now thoroughly unit test the HTTP client by letting it communicate with the server, but this isn't the motivation :) .)
Fair enough - thanks for the context!
llvm/include/llvm/Support/HTTPServer.h | ||
---|---|---|
29–31 ↗ | (On Diff #390457) | Should the first element instead be a separate member, since it seems it's not like the rest of the elements/will be handled differently? |
llvm/lib/Support/HTTPServer.cpp | ||
58 ↗ | (On Diff #390457) | Could this parameter (the last lambda, the "CompletionHandler" member in StreamingHTTPResponse) be move-only (non-copyable) in that case it could capture-by-move the std::unique_ptr<MemoryBuffer> and be a bit more robust in terms of memory management? |
67 ↗ | (On Diff #390457) | Perhaps HTTPRequestHandler should be an llvm::function_ref since it doesn't appear to outlive this call, if I'm understanding the code correctly? |
94 ↗ | (On Diff #390457) | should this be const auto&? (I'm not sure how big/expensive to copy these objects are, perhaps it's suitable to copy them) |
101–102 ↗ | (On Diff #390457) | Looks like this could use StringRef instead of std::string, to avoid copying the contents? Or are there cases where a provider might want to return content by value, rather than from some underlying/longer-lived storage? (worth providing something that could be optimal for both cases somehow - like providing an optional std::string out parameter that the provider could populate if it doesn't have its own long-lived backing store?) |
125–130 ↗ | (On Diff #390457) | If the underlying API requires a c-style string, maybe simpler to expose that up through the caller layers? Rather than allocating a buffer to create a null terminated string when the caller might already have one to pass down anyway? (alternatively, I guess, if you use Twine as the parameter type rather than StringRef, then at least in the cases where the Twine does refer to a null terminated string it won't have to copy the string again - currently since the HostInterface is StringRef, it'd always have to copy into the buffer and append a null terminator) |
142–144 ↗ | (On Diff #390457) | Follow-up work to use httplib set_error_handler to improve these error messages, perhaps? |
158–162 ↗ | (On Diff #390457) | This looks out of date, presumably it doesn't compile? (HTTPServer doesn't declare any ctors, so I guess this would fail to compile) |
llvm/unittests/Support/HTTPServer.cpp | ||
73 ↗ | (On Diff #390457) | probably make this by-value? |
77 ↗ | (On Diff #390457) | Is some corresponding de-initialize/teardown required/desirable? (perhaps some scoped initialize/de-initialize) or put the initialize in a global ctor (or test fixture (SetUp? Whatever the one is that's called before any test (but not before every test individually)) in the test file if there's no expectation/desire to initialize/teardown around each test? |
82 ↗ | (On Diff #390457) | The MemoryBuffer:: probably isn't needed here, I'd have thought? |
184–189 ↗ | (On Diff #390457) | timing tests like this can be sensitive to machine load - perhaps setting the acceptable timeout much higher than the actual timeout would be good to increase test stability? (eg: the 40 is probably fine - the actual 50ms delay shouldn't ever be shorter than requested, and so it probably shouldn't tickle that test (though it's possible - if the test code were delayed in reaching the timeout call while the delay was already running, it could appear as though the delay was shorter than expected) - but changing the 60 up to something well above the threshold might be good). Though also, it's not your job to test that HTTPClient implements timeouts correctly, only that you pass them down through the API - so if there's a more robust way to test that property without actually having to run real timeouts (or at least not testing them so precisely) it might be helpful. |
206 ↗ | (On Diff #390457) | I think there's probably a matcher that matches all the elements in a more terse format than having to check each separately? ( https://stackoverflow.com/q/1460703 ) |
215–217 ↗ | (On Diff #390457) | Prefer llvm_unreachable over assert(false) |
llvm/lib/Support/HTTPServer.cpp | ||
---|---|---|
58 ↗ | (On Diff #390457) | That's a great idea! Cpp-httplib converts the content provider to a ContentProvider which is a std::function. This ends up copying the lambda. I was thinking we might refactor the interface, so instead of returning the response (or response provider) you can interact with a "HTTPServerRequest" object similar to how cpp-httplib's own API works. Even if we mirror the httplib interface, the problem remains that we cannot pass a non-copyable lambda directly to httplib's set_content_provider. I'm not sure if there is a nice way around this problem. |
67 ↗ | (On Diff #390457) | Ah, I think the nomenclature is misleading. HTTPServer::get should be called something like HTTPServer::registerGetHandler. So it returns, and then you can later call bind() etc., but the function provided in the argument may be called much later after listen when there is a request to be handled. |
llvm/unittests/Support/HTTPServer.cpp | ||
77 ↗ | (On Diff #390457) | There is a discussion about the need for a "manual" global HTTPClient::initialize() here. Briefly, Curl's global initialization is not thread-safe and may load other libs; the loader lock on windows prevents use of a static ctor to initialize. We added HTTPClient::initialize() to InitLLVM for convenience, but we had to remove it when HTTPClient got moved to the Debuginfod lib. So now the HTTPClient::initialize must be called "manually". By the way, there is ongoing discussion within libcurl community on making curl global init thread safe. On the other hand, we can safely deinitialize with a static destructor as you suggest. |
llvm/lib/Support/HTTPServer.cpp | ||
---|---|---|
58 ↗ | (On Diff #390457) | Ah, oh well. Thanks for the context! |
Refactor HTTPServerRequest interface to be more close to cpphttplib, and address all of @dblaikie 's comments.
Hi @dblaikie thank you so much for the review! I believe I have addressed all your comments, however they disappeared because I moved the HTTPServer into the Debuginfod library. You can see them on the previous revision.
llvm/lib/Support/HTTPServer.cpp | ||
---|---|---|
101–102 ↗ | (On Diff #390457) | I've switched it to take a StringRef -- for the only place this is used currently (streaming chunks of a file) you have a persistent MemoryBuffer which is responsible for the storage. |
125–130 ↗ | (On Diff #390457) | It's a good point that no matter whether we use cpp-httplib or something else, ultimately the system call will need a c-style string. I've changed it to a const char* and I could switch to a Twine if it's better. |
142–144 ↗ | (On Diff #390457) | Good idea! In a future TCPSocket implementation for LLVM it would be great to parse the errno better to give more helpful messages to the user. |
llvm/unittests/Support/HTTPServer.cpp | ||
82 ↗ | (On Diff #390457) | Ah, sadly it is required, unless you can see a better way around the following. |
184–189 ↗ | (On Diff #390457) | This is a really good observation. I'm concerned that even with a lot of extra wait time, it might flake under some extreme system load. Let's remove it for now, leaving just the test in which a timeout is expected. |
I think this is mostly OK. (still, might be worth initializing/deinitializing in test fixtures, to isolate the tests a bit more - but probably not a big deal/not the sort of bugs we're likely to see (eg: where one test taints the libcurl state and another fails in weird ways because of that taint))
llvm/unittests/Support/HTTPServer.cpp | ||
---|---|---|
82 ↗ | (On Diff #390457) | Seems like addressing the missing comparison would be good as a separate patch. |
Use test fixtures to perform HTTP Client initialize and teardown for the client-server tests.
@dblaikie Thanks for taking another look and the LGTM! Per your suggestion I've added a test fixture to call the HTTPClient initialization and teardown methods. Thanks for all the comments on this patch.
It seems at first llvm-debuginfod will be the sole user tool. So, my inclination is to hold off on committing this until the llvm-debuginfod server (D114845 and D114846) is ready to land.
Replace caputure-by-reference of request handler with capture-by-copy, and add a unit test to verify temporary handlers can be registered and work correctly.