Provides an implementation of HTTPClient that wraps libcurl.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Currently there are no unit tests in this diff. I wonder if we need unit tests for the interfacing with Curl? @labath suggested an approach that uses a CurlInterface type class and mocks it up.
Another option could be to merge this with D112751 and make it so that the base HTTPClient just uses curl implementations by default. Then there would just be one (non-abstract) HTTPClient class.
llvm/lib/Support/CurlHTTPClient.cpp | ||
---|---|---|
45–46 ↗ | (On Diff #383719) | I'm not sure if we need this bookkeeping here. It makes the API redundant, and the same logic could be implemented in BufferedHTTPResponseHandler. |
51–54 ↗ | (On Diff #383719) | I don't think this function adds any value now... |
81–82 ↗ | (On Diff #383719) | Are you sure this works? I was expecting to see a CurlHTTPRequest object here. And the reason I am looking at this is because I was wondering if we could store the llvm::Error (from the previous commit) inside CurlHTTPRequest. That way an Error member wouldn't be a part of any public API, and here its lifetime would be clearly limited to the duration of this function. |
Implement CurlRequest error state, switch Handler.handle* methods to return Errors.
Rebase on top of D112751.
If we do want to merge the CulrHTTPClient into the base HTTPClient so that the default class just uses curl, what's the preferable way to do this? One option is just to move all code in CurlHTTPClient.cpp into HTTPClient.cpp, wrapping it in a #ifdef LLVM_ENABLE_CURL.
llvm/lib/Support/CurlHTTPClient.cpp | ||
---|---|---|
81–82 ↗ | (On Diff #383719) | Good catch. That was a mistake. There are no end-to-end tests running actual curl requests so I did not catch this. The only test checks the debuginfod client. I like the idea of storing the error in the CurlHTTPRequest. I have implemented this change. |
I'm not aware of an exact precedent (due to this introducing a class instead of a bunch of free functions, etc.), but if I try to apply what happens with other external dependencies (see e.g. Compression.h) I get a static bool (CURL)HTTPClient::isAvailable() function, and an llvm_unreachable inside all of the member functions (when curl is unavailable).
llvm/include/llvm/Support/CurlHTTPClient.h | ||
---|---|---|
20 ↗ | (On Diff #384309) | This definitely shouldn't be at the top level, but I'm not sure if you need it at all (I count exactly one usage of that). |
35 ↗ | (On Diff #384309) | did you want a reference here? Plain const on a by-value argument is fairly useless. |
llvm/lib/Support/CurlHTTPClient.cpp | ||
32 ↗ | (On Diff #384309) | remove this, and have caller use std::move |
97 ↗ | (On Diff #384309) | /*no else*/ if (CurlRequest.ErrorState) return CurlRequest.ErrorState; ? |
Avoid polluting namespace with top-level typedef.
Pass HTTPRequest by const reference.
Almost.
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
88 | I think this should be static, so it can be invoked before creating a client object. |
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
82 | Hi @labath, do you think we should follow the style guide recommendation for fallible constructors instead of using curlInit? That is, HTTPClient would have a private constructor and a static Expected<HTTPClient> Create(), which would just run the code inside curlInit. |
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
82 | Umm... maybe? I don't think that doing this would be a bad choice, but for me, the more interesting question is whether we should rely on the implicit curl_global_init call that happens as a part of this. This seems to be discouraged by curl, and this kind of initialization is not a particularly nice thing for a library (which llvm is) to do. If we can guarantee that curl_global_init was called, then we might not even need a fallible constructor here. The man page is not particularly clear on this, but I would expect that the only reason this can fail in this case is memory (or some other soft resource) exhaustion, and llvm is not robust against those anyway... |
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
82 | Thank you so much for pointing out this issue. Three solutions come to mind:
// HTTPClient.cpp class CurlGlobalState { public: CurlGlobalState() { curl_global_init(CURL_GLOBAL_ALL); } ~CurlGlobalState() { curl_global_cleanup(CURL_GLOBAL_ALL); } }; static const CurlGlobalState CurlState; The only potential issue here is that a library user loading LLVM Support as a Windows DLL could hit a deadlock (noted here, more details here).
static HTTPClient::globalInit(); static HTTPClient::globalCleanup(); and just asking users to call these at the beginning and end of the program.
Throw the global init and cleanup into InitLLVM which seems to have been made for purposes just like this. Do you lean towards any particular solution? |
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
82 | After some more looking, I think the ManagedStatic template may be just what we want. I'm running some experiments to see if it will fit our needs here. |
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
82 | ManagedStatic would get around the ban on global constructors, but delaying the initialization until the first use means that the this could race with other calls to curl_global_init. If all uses go through this code, then it would be fine, but there's no way of knowing what will other libraries linked to the main application be doing. This wouldn't be the first nor the worst case of llvm doing something non-library-friendly (I'm thinking of signal handlers here). However, it is not a practice I support, so I would prefer a solution (like InitLLVM) which lets the main application choose when and how the initialization happens. |
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
82 | Thank you for pointing out the issues with using ManagedStatic. I've moved the global init / deinit into static methods and modified InitLLVM's ctor / dtor to call these. |
llvm/lib/Support/InitLLVM.cpp | ||
---|---|---|
63 | Does "isAvailable" need to be checked? The non-curl implementation doesn't fail on initialize or cleanup, I think? It's just a no-op there. So maybe calling these without the conditionality would be OK? |
Remove unnecessary checks of isAvailable before global init/deinit of HTTP client.
Also only define IsInitialized state for the curl client implementation enabled by -DLLVM_ENABLE_CURL, as it is not needed for the unimplemented client.
llvm/lib/Support/InitLLVM.cpp | ||
---|---|---|
63 | Thanks, indeed, good point. I removed this and only declare IsInitialized when the curl client is enabled since it serves no purpose for the "unimplemented" form of the HTTP client. |
llvm/lib/Support/HTTPClient.cpp | ||
---|---|---|
112 | Generally this function should accept Error by value (or rvalue reference) and callers should pass with std::move. | |
179–181 | I /think/ the ErrorState should be tested/returned first, otherwise if the Code codepath is hit (or the CurlRes codepath above) will hit an assertion due to the ErrorState being unchecked. |
All return paths from Curl-enabled HTTPClient::perform after initialization of CurlHTTPRequest pass back the CurlRequest.ErrorState so it gets checked.
I'm thinking about llvm-config.h vs config.h (the difference is that the first is shipped alongside the compiler and exposed to users). We put most of our defines in the latter, but I can't really say why this shouldn't be exposed through llvm-config.h, so meh...
In Chromium we just noticed that our clang binary grew a dependency on libcurl-gnutls.so.4 after this change.
That seems odd since clang doesn't have any business talking to the network. I assume this is really for Debuginfod?
If I understand correctly, the HTTPClient::initialize() call in InitLLVM::InitLLVM() forces the library into *all* LLVM tools by default. The extra dependency seems odd for some tools ("did clang start phoning home with my source?") and doing the initialization as part of each compile step also seems wasteful. Couldn't the initialization happen lazily on the first call to HTTPClient from tools that need it instead?
+1. I think a better model to follow is how we link libxml2 only into lld instead into all binaries. (Which means we'd move HTTPClient.cpp out of Support into its own library, which seems like it makes sense anyways.) See https://reviews.llvm.org/D35819 for the example.
Thanks for bringing this up. Yes, this is for the debuginfod client library (D112758). The default value for LLVM_ENABLE_CURL is ON. So if it finds libcurl on your system, it will use it.
If your clang build were configured with -DLLVM_ENABLE_CURL=OFF, the dependency on libcurl would disappear.
Do you think we should change the default value to OFF?
If I understand correctly, the HTTPClient::initialize() call in InitLLVM::InitLLVM() forces the library into *all* LLVM tools by default. The extra dependency seems odd for some tools ("did clang start phoning home with my source?") and doing the initialization as part of each compile step also seems wasteful. Couldn't the initialization happen lazily on the first call to HTTPClient from tools that need it instead?
Curl's global initialization is not thread-safe and should be called at the beginning of the program. This is why we added it to InitLLVM. There is some relevant discussion on this here: https://reviews.llvm.org/D112753#inline-1082560
If LLVM_ENABLE_CURL is OFF then the initialize is a no-op. However, if built with curl, tools that do not use it will still call the initialization step. As you point out, this is wasteful. So do you think we should remove it from InitLLVM and require user tools to separately init/deinit? Or, should we add a default-false argument InitializeHTTPClient to InitLLVM's ctor?
Thanks again for bringing this up.
Thanks, we could move this to a separate library. We would also have to remove the global init/deinit from InitLLVM or else Support would depend on this library.
Yup, that sounds right.
If it takes a while to move this to a separate lib, I think it'd be good to change the default here to OFF until then :)
Thanks! I just added you as a reviewer of D115131 which moves the HTTPClient.cpp to the Debuginfod library.
This breaks building on FreeBSD, where I see:
-- Found LibXml2: /usr/local/lib/libxml2.so (found version "2.9.12") -- Looking for xmlReadMemory -- Looking for xmlReadMemory - found -- Found CURL: /usr/local/lib/libcurl.so (found version "7.79.1") -- Looking for curl_easy_init -- Looking for curl_easy_init - found -- Looking for el_init in edit -- Looking for el_init in edit - found ...
and then later:
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o /usr/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/dim/obj/llvm-llvmorg-14-init-11213-ge0b259f22c00-freebsd13-amd64-ninja-clang-rel-1/lib/Support -I/home/dim/src/llvm/llvm-project/llvm/lib/Support -I/home/dim/obj/llvm-llvmorg-14-init-11213-ge0b259f22c00-freebsd13-amd64-ninja-clang-rel-1/include -I/home/dim/src/llvm/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -UNDEBUG -std=c++14 -fno-exceptions -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o -c /home/dim/src/llvm/llvm-project/llvm/lib/Support/HTTPClient.cpp /home/dim/src/llvm/llvm-project/llvm/lib/Support/HTTPClient.cpp:23:10: fatal error: 'curl/curl.h' file not found #include <curl/curl.h> ^~~~~~~~~~~~~ 1 error generated.
It looks like the -I flags aren't correctly detected, since it doesn't pass -I/usr/local/include, as it should:
$ curl-config --cflags -I/usr/local/include $ curl-config --libs -L/usr/local/lib -lcurl
No, as I went back in history to find this commit that introduced the feature. I'm not a cmake expert, but I think the problem lies in the way features are tested in llvm/cmake/config-ix.cmake, e.g.:
if(CURL_FOUND) # Check if curl we found is usable; for example, we may have found a 32-bit # library on a 64-bit system which would result in a link-time failure. cmake_push_check_state() list(APPEND CMAKE_REQUIRED_INCLUDES ${CURL_INCLUDE_DIRS}) list(APPEND CMAKE_REQUIRED_LIBRARIES ${CURL_LIBRARY}) check_symbol_exists(curl_easy_init curl/curl.h HAVE_CURL) cmake_pop_check_state() if(LLVM_ENABLE_CURL STREQUAL FORCE_ON AND NOT HAVE_CURL) message(FATAL_ERROR "Failed to configure curl") endif() endif()
As far as I understand it, the cmake_pop_check_state() after the checks *restores* the CMAKE_REQUIRED_INCLUDES to its previous value, and therefore removes the appended CURL_INCLUDE_DIRS value.
Strangely, most of the external library checks go this way, e.g. for libxml just above:
if(LibXml2_FOUND) # Check if libxml2 we found is usable; for example, we may have found a 32-bit # library on a 64-bit system which would result in a link-time failure. cmake_push_check_state() list(APPEND CMAKE_REQUIRED_INCLUDES ${LIBXML2_INCLUDE_DIRS}) list(APPEND CMAKE_REQUIRED_LIBRARIES ${LIBXML2_LIBRARIES}) list(APPEND CMAKE_REQUIRED_DEFINITIONS ${LIBXML2_DEFINITIONS}) check_symbol_exists(xmlReadMemory libxml/xmlreader.h HAVE_LIBXML2) cmake_pop_check_state() if(LLVM_ENABLE_LIBXML2 STREQUAL FORCE_ON AND NOT HAVE_LIBXML2) message(FATAL_ERROR "Failed to configure libxml2") endif() endif()
I simply don't understand how this can work, if the detected include dirs are not in a system-default location, such as /usr/include...
Thanks!
My hypothesis is that the imported libs added to Support are not pulling in their includes to the target because they are linked with PRIVATE. If this is the case, maybe we need to add ADDITIONAL_HEADER_DIRS includes for those here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/CMakeLists.txt#L252
Thanks, I checked and indeed just before rG54b35c09b2cd it still fails in the same way, but after rG54b35c09b2cd it works!
llvm/lib/Support/HTTPClient.cpp | ||
---|---|---|
153 | For non-Debug build types, LLVM_ENABLE_ASSERTIONS=off is the default and assert is a no-op. I fixed this in 3dd2d4c0a239ce78421d51491456cb1a075ad2f3 |
Hi @labath, do you think we should follow the style guide recommendation for fallible constructors instead of using curlInit? That is, HTTPClient would have a private constructor and a static Expected<HTTPClient> Create(), which would just run the code inside curlInit.