This is an archive of the discontinued LLVM Phabricator instance.

[llvm] [Support] Add CURL HTTP Client.
ClosedPublic

Authored by noajshu on Oct 28 2021, 1:42 PM.

Details

Summary

Provides an implementation of HTTPClient that wraps libcurl.

Diff Detail

Event Timeline

noajshu created this revision.Oct 28 2021, 1:42 PM
noajshu requested review of this revision.Oct 28 2021, 1:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2021, 1:42 PM
noajshu updated this revision to Diff 383719.Oct 31 2021, 10:31 PM

Refactor to considerably simplify CurlHTTPClient.

Currently there are no unit tests in this diff. I wonder if we need unit tests for the interfacing with Curl? @labath suggested an approach that uses a CurlInterface type class and mocks it up.

Another option could be to merge this with D112751 and make it so that the base HTTPClient just uses curl implementations by default. Then there would just be one (non-abstract) HTTPClient class.

labath added inline comments.Nov 2 2021, 6:55 AM
llvm/lib/Support/CurlHTTPClient.cpp
45–46 ↗(On Diff #383719)

I'm not sure if we need this bookkeeping here. It makes the API redundant, and the same logic could be implemented in BufferedHTTPResponseHandler.

51–54 ↗(On Diff #383719)

I don't think this function adds any value now...

81–82 ↗(On Diff #383719)

Are you sure this works? I was expecting to see a CurlHTTPRequest object here.

And the reason I am looking at this is because I was wondering if we could store the llvm::Error (from the previous commit) inside CurlHTTPRequest. That way an Error member wouldn't be a part of any public API, and here its lifetime would be clearly limited to the duration of this function.

noajshu updated this revision to Diff 384271.Nov 2 2021, 4:31 PM
noajshu marked 3 inline comments as done.

Implement CurlRequest error state, switch Handler.handle* methods to return Errors.
Rebase on top of D112751.

If we do want to merge the CulrHTTPClient into the base HTTPClient so that the default class just uses curl, what's the preferable way to do this? One option is just to move all code in CurlHTTPClient.cpp into HTTPClient.cpp, wrapping it in a #ifdef LLVM_ENABLE_CURL.

llvm/lib/Support/CurlHTTPClient.cpp
81–82 ↗(On Diff #383719)

Good catch. That was a mistake. There are no end-to-end tests running actual curl requests so I did not catch this. The only test checks the debuginfod client.

I like the idea of storing the error in the CurlHTTPRequest. I have implemented this change.

noajshu updated this revision to Diff 384279.Nov 2 2021, 5:05 PM

replace curlSetOpt with curl_easy_setopt

noajshu updated this revision to Diff 384309.Nov 2 2021, 7:51 PM

Move added Curl cmake var from D112758 to D112753.

labath added a comment.Nov 3 2021, 2:35 AM

If we do want to merge the CulrHTTPClient into the base HTTPClient so that the default class just uses curl, what's the preferable way to do this? One option is just to move all code in CurlHTTPClient.cpp into HTTPClient.cpp, wrapping it in a #ifdef LLVM_ENABLE_CURL.

I'm not aware of an exact precedent (due to this introducing a class instead of a bunch of free functions, etc.), but if I try to apply what happens with other external dependencies (see e.g. Compression.h) I get a static bool (CURL)HTTPClient::isAvailable() function, and an llvm_unreachable inside all of the member functions (when curl is unavailable).

llvm/include/llvm/Support/CurlHTTPClient.h
20 ↗(On Diff #384309)

This definitely shouldn't be at the top level, but I'm not sure if you need it at all (I count exactly one usage of that).

35 ↗(On Diff #384309)

did you want a reference here? Plain const on a by-value argument is fairly useless.

llvm/lib/Support/CurlHTTPClient.cpp
32 ↗(On Diff #384309)

remove this, and have caller use std::move

97 ↗(On Diff #384309)

/*no else*/ if (CurlRequest.ErrorState) return CurlRequest.ErrorState; ?

noajshu updated this revision to Diff 384631.Nov 3 2021, 6:48 PM
noajshu marked 3 inline comments as done.

Avoid polluting namespace with top-level typedef.
Pass HTTPRequest by const reference.

noajshu updated this revision to Diff 384644.Nov 3 2021, 8:12 PM
noajshu marked an inline comment as done.

Use LLVM_ENABLE_CURL to decide which implementation is used for HTTPClient.

If we do want to merge the CulrHTTPClient into the base HTTPClient so that the default class just uses curl, what's the preferable way to do this? One option is just to move all code in CurlHTTPClient.cpp into HTTPClient.cpp, wrapping it in a #ifdef LLVM_ENABLE_CURL.

I'm not aware of an exact precedent (due to this introducing a class instead of a bunch of free functions, etc.), but if I try to apply what happens with other external dependencies (see e.g. Compression.h) I get a static bool (CURL)HTTPClient::isAvailable() function, and an llvm_unreachable inside all of the member functions (when curl is unavailable).

Thanks! Is this what you had in mind?

labath added a comment.Nov 5 2021, 7:45 AM

If we do want to merge the CulrHTTPClient into the base HTTPClient so that the default class just uses curl, what's the preferable way to do this? One option is just to move all code in CurlHTTPClient.cpp into HTTPClient.cpp, wrapping it in a #ifdef LLVM_ENABLE_CURL.

I'm not aware of an exact precedent (due to this introducing a class instead of a bunch of free functions, etc.), but if I try to apply what happens with other external dependencies (see e.g. Compression.h) I get a static bool (CURL)HTTPClient::isAvailable() function, and an llvm_unreachable inside all of the member functions (when curl is unavailable).

Thanks! Is this what you had in mind?

Almost.

llvm/include/llvm/Support/HTTPClient.h
90

I think this should be static, so it can be invoked before creating a client object.

noajshu updated this revision to Diff 385203.Nov 5 2021, 3:15 PM

Rebase against D112751

noajshu updated this revision to Diff 385686.Nov 8 2021, 7:26 PM
noajshu marked an inline comment as done.

Default build with libcurl to "ON".
Several style / commenting changes.

noajshu added inline comments.Nov 8 2021, 10:02 PM
llvm/include/llvm/Support/HTTPClient.h
83

Hi @labath, do you think we should follow the style guide recommendation for fallible constructors instead of using curlInit? That is, HTTPClient would have a private constructor and a static Expected<HTTPClient> Create(), which would just run the code inside curlInit.

labath added inline comments.Nov 9 2021, 4:18 AM
llvm/include/llvm/Support/HTTPClient.h
83

Umm... maybe?

I don't think that doing this would be a bad choice, but for me, the more interesting question is whether we should rely on the implicit curl_global_init call that happens as a part of this. This seems to be discouraged by curl, and this kind of initialization is not a particularly nice thing for a library (which llvm is) to do.

If we can guarantee that curl_global_init was called, then we might not even need a fallible constructor here. The man page is not particularly clear on this, but I would expect that the only reason this can fail in this case is memory (or some other soft resource) exhaustion, and llvm is not robust against those anyway...

noajshu marked an inline comment as not done.Nov 9 2021, 10:07 AM
noajshu added inline comments.
llvm/include/llvm/Support/HTTPClient.h
83

Thank you so much for pointing out this issue.

Three solutions come to mind:

  1. The elegant C++ solution:
// HTTPClient.cpp
class CurlGlobalState {
public:
  CurlGlobalState() {
    curl_global_init(CURL_GLOBAL_ALL);
  }
  ~CurlGlobalState() {
    curl_global_cleanup(CURL_GLOBAL_ALL);
  }
};
static const CurlGlobalState CurlState;

The only potential issue here is that a library user loading LLVM Support as a Windows DLL could hit a deadlock (noted here, more details here).

  1. The less-elegant solution: exposing the global init and cleanup functions to the user, e.g. wrapped by:
static HTTPClient::globalInit();
static HTTPClient::globalCleanup();

and just asking users to call these at the beginning and end of the program.

  1. The LLVM-ey compromise:

Throw the global init and cleanup into InitLLVM which seems to have been made for purposes just like this.

Do you lean towards any particular solution?

noajshu added inline comments.Nov 9 2021, 8:37 PM
llvm/include/llvm/Support/HTTPClient.h
83

After some more looking, I think the ManagedStatic template may be just what we want. I'm running some experiments to see if it will fit our needs here.

labath added inline comments.Nov 10 2021, 12:55 AM
llvm/include/llvm/Support/HTTPClient.h
83

ManagedStatic would get around the ban on global constructors, but delaying the initialization until the first use means that the this could race with other calls to curl_global_init. If all uses go through this code, then it would be fine, but there's no way of knowing what will other libraries linked to the main application be doing.

This wouldn't be the first nor the worst case of llvm doing something non-library-friendly (I'm thinking of signal handlers here). However, it is not a practice I support, so I would prefer a solution (like InitLLVM) which lets the main application choose when and how the initialization happens.

noajshu updated this revision to Diff 386284.Nov 10 2021, 12:34 PM

Perform global curl (de)initialization in static member functions of HTTPClient.

noajshu updated this revision to Diff 386288.Nov 10 2021, 12:43 PM

Invoke Curl global init / deinit from InitLLVM's ctor / dtor.

noajshu added inline comments.Nov 10 2021, 8:26 PM
llvm/include/llvm/Support/HTTPClient.h
83

Thank you for pointing out the issues with using ManagedStatic. I've moved the global init / deinit into static methods and modified InitLLVM's ctor / dtor to call these.

noajshu updated this revision to Diff 386643.Nov 11 2021, 1:30 PM

Add cmakedefine for LLVM_ENABLE_CURL.

dblaikie added inline comments.
llvm/lib/Support/InitLLVM.cpp
63 ↗(On Diff #386643)

Does "isAvailable" need to be checked? The non-curl implementation doesn't fail on initialize or cleanup, I think? It's just a no-op there. So maybe calling these without the conditionality would be OK?

noajshu updated this revision to Diff 388389.Nov 18 2021, 10:17 PM
noajshu marked an inline comment as done.

Remove unnecessary checks of isAvailable before global init/deinit of HTTP client.
Also only define IsInitialized state for the curl client implementation enabled by -DLLVM_ENABLE_CURL, as it is not needed for the unimplemented client.

noajshu added inline comments.Nov 18 2021, 10:20 PM
llvm/lib/Support/InitLLVM.cpp
63 ↗(On Diff #386643)

Thanks, indeed, good point. I removed this and only declare IsInitialized when the curl client is enabled since it serves no purpose for the "unimplemented" form of the HTTP client.

dblaikie added inline comments.Nov 19 2021, 9:20 AM
llvm/lib/Support/HTTPClient.cpp
107

Generally this function should accept Error by value (or rvalue reference) and callers should pass with std::move.

174–176

I /think/ the ErrorState should be tested/returned first, otherwise if the Code codepath is hit (or the CurlRes codepath above) will hit an assertion due to the ErrorState being unchecked.

noajshu updated this revision to Diff 388594.Nov 19 2021, 12:14 PM
noajshu marked an inline comment as done.

All return paths from Curl-enabled HTTPClient::perform after initialization of CurlHTTPRequest pass back the CurlRequest.ErrorState so it gets checked.

dblaikie accepted this revision.Nov 19 2021, 12:26 PM

Looks good to me - how about you, @labath ?

This revision is now accepted and ready to land.Nov 19 2021, 12:26 PM

I'm thinking about llvm-config.h vs config.h (the difference is that the first is shipped alongside the compiler and exposed to users). We put most of our defines in the latter, but I can't really say why this shouldn't be exposed through llvm-config.h, so meh...

noajshu updated this revision to Diff 389070.Nov 22 2021, 5:04 PM

Rename unit tests and add additional test.

noajshu updated this revision to Diff 389342.Nov 23 2021, 3:47 PM

Rebase against D112751.

noajshu updated this revision to Diff 390203.Nov 28 2021, 8:51 AM

Add implementation of HTTPClient::setTimeout

noajshu updated this revision to Diff 390788.Nov 30 2021, 12:37 PM

Change timeouts to be in std::chrono::milliseconds

dblaikie accepted this revision.Nov 30 2021, 12:56 PM

Generally still looks good to me.

noajshu edited the summary of this revision. (Show Details)Dec 2 2021, 11:08 AM
noajshu updated this revision to Diff 391404.Dec 2 2021, 11:20 AM
noajshu marked 3 inline comments as done.

Rebase against main.

This revision was landed with ongoing or failed builds.Dec 2 2021, 12:35 PM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Dec 4 2021, 12:07 AM

In Chromium we just noticed that our clang binary grew a dependency on libcurl-gnutls.so.4 after this change.

That seems odd since clang doesn't have any business talking to the network. I assume this is really for Debuginfod?

If I understand correctly, the HTTPClient::initialize() call in InitLLVM::InitLLVM() forces the library into *all* LLVM tools by default. The extra dependency seems odd for some tools ("did clang start phoning home with my source?") and doing the initialization as part of each compile step also seems wasteful. Couldn't the initialization happen lazily on the first call to HTTPClient from tools that need it instead?

thakis added a subscriber: thakis.Dec 4 2021, 4:25 AM

+1. I think a better model to follow is how we link libxml2 only into lld instead into all binaries. (Which means we'd move HTTPClient.cpp out of Support into its own library, which seems like it makes sense anyways.) See https://reviews.llvm.org/D35819 ​for the example.

In Chromium we just noticed that our clang binary grew a dependency on libcurl-gnutls.so.4 after this change.

That seems odd since clang doesn't have any business talking to the network. I assume this is really for Debuginfod?

Thanks for bringing this up. Yes, this is for the debuginfod client library (D112758). The default value for LLVM_ENABLE_CURL is ON. So if it finds libcurl on your system, it will use it.
If your clang build were configured with -DLLVM_ENABLE_CURL=OFF, the dependency on libcurl would disappear.
Do you think we should change the default value to OFF?

If I understand correctly, the HTTPClient::initialize() call in InitLLVM::InitLLVM() forces the library into *all* LLVM tools by default. The extra dependency seems odd for some tools ("did clang start phoning home with my source?") and doing the initialization as part of each compile step also seems wasteful. Couldn't the initialization happen lazily on the first call to HTTPClient from tools that need it instead?

Curl's global initialization is not thread-safe and should be called at the beginning of the program. This is why we added it to InitLLVM. There is some relevant discussion on this here: https://reviews.llvm.org/D112753#inline-1082560
If LLVM_ENABLE_CURL is OFF then the initialize is a no-op. However, if built with curl, tools that do not use it will still call the initialization step. As you point out, this is wasteful. So do you think we should remove it from InitLLVM and require user tools to separately init/deinit? Or, should we add a default-false argument InitializeHTTPClient to InitLLVM's ctor?

Thanks again for bringing this up.

+1. I think a better model to follow is how we link libxml2 only into lld instead into all binaries. (Which means we'd move HTTPClient.cpp out of Support into its own library, which seems like it makes sense anyways.) See https://reviews.llvm.org/D35819 ​for the example.

Thanks, we could move this to a separate library. We would also have to remove the global init/deinit from InitLLVM or else Support would depend on this library.

thakis added a comment.Dec 4 2021, 4:48 PM

Yup, that sounds right.

If it takes a while to move this to a separate lib, I think it'd be good to change the default here to OFF until then :)

Yup, that sounds right.

If it takes a while to move this to a separate lib, I think it'd be good to change the default here to OFF until then :)

Thanks! I just added you as a reviewer of D115131 which moves the HTTPClient.cpp to the Debuginfod library.

dim added subscribers: emaste, jrtc27, dim.Dec 8 2021, 1:17 PM

This breaks building on FreeBSD, where I see:

-- Found LibXml2: /usr/local/lib/libxml2.so (found version "2.9.12")
-- Looking for xmlReadMemory
-- Looking for xmlReadMemory - found
-- Found CURL: /usr/local/lib/libcurl.so (found version "7.79.1")
-- Looking for curl_easy_init
-- Looking for curl_easy_init - found
-- Looking for el_init in edit
-- Looking for el_init in edit - found
...

and then later:

FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o
/usr/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/dim/obj/llvm-llvmorg-14-init-11213-ge0b259f22c00-freebsd13-amd64-ninja-clang-rel-1/lib/Support -I/home/dim/src/llvm/llvm-project/llvm/lib/Support -I/home/dim/obj/llvm-llvmorg-14-init-11213-ge0b259f22c00-freebsd13-amd64-ninja-clang-rel-1/include -I/home/dim/src/llvm/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections  -O3 -DNDEBUG -UNDEBUG -std=c++14  -fno-exceptions -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/HTTPClient.cpp.o -c /home/dim/src/llvm/llvm-project/llvm/lib/Support/HTTPClient.cpp
/home/dim/src/llvm/llvm-project/llvm/lib/Support/HTTPClient.cpp:23:10: fatal error: 'curl/curl.h' file not found
#include <curl/curl.h>
         ^~~~~~~~~~~~~
1 error generated.

It looks like the -I flags aren't correctly detected, since it doesn't pass -I/usr/local/include, as it should:

$ curl-config --cflags
-I/usr/local/include

$ curl-config --libs
-L/usr/local/lib -lcurl

Thanks @dim, I have a patch that should fix this D115189
Would that solve the breakage on freebsd?

dim added a comment.Dec 8 2021, 1:39 PM

Thanks @dim, I have a patch that should fix this D115189
Would that solve the breakage on freebsd?

No, as I went back in history to find this commit that introduced the feature. I'm not a cmake expert, but I think the problem lies in the way features are tested in llvm/cmake/config-ix.cmake, e.g.:

if(CURL_FOUND)
  # Check if curl we found is usable; for example, we may have found a 32-bit
  # library on a 64-bit system which would result in a link-time failure.
  cmake_push_check_state()
  list(APPEND CMAKE_REQUIRED_INCLUDES ${CURL_INCLUDE_DIRS})
  list(APPEND CMAKE_REQUIRED_LIBRARIES ${CURL_LIBRARY})
  check_symbol_exists(curl_easy_init curl/curl.h HAVE_CURL)
  cmake_pop_check_state()
  if(LLVM_ENABLE_CURL STREQUAL FORCE_ON AND NOT HAVE_CURL)
    message(FATAL_ERROR "Failed to configure curl")
  endif()
endif()

As far as I understand it, the cmake_pop_check_state() after the checks *restores* the CMAKE_REQUIRED_INCLUDES to its previous value, and therefore removes the appended CURL_INCLUDE_DIRS value.

Strangely, most of the external library checks go this way, e.g. for libxml just above:

if(LibXml2_FOUND)
  # Check if libxml2 we found is usable; for example, we may have found a 32-bit
  # library on a 64-bit system which would result in a link-time failure.
  cmake_push_check_state()
  list(APPEND CMAKE_REQUIRED_INCLUDES ${LIBXML2_INCLUDE_DIRS})
  list(APPEND CMAKE_REQUIRED_LIBRARIES ${LIBXML2_LIBRARIES})
  list(APPEND CMAKE_REQUIRED_DEFINITIONS ${LIBXML2_DEFINITIONS})
  check_symbol_exists(xmlReadMemory libxml/xmlreader.h HAVE_LIBXML2)
  cmake_pop_check_state()
  if(LLVM_ENABLE_LIBXML2 STREQUAL FORCE_ON AND NOT HAVE_LIBXML2)
    message(FATAL_ERROR "Failed to configure libxml2")
  endif()
endif()

I simply don't understand how this can work, if the detected include dirs are not in a system-default location, such as /usr/include...

Thanks @dim, I have a patch that should fix this D115189
Would that solve the breakage on freebsd?

No, as I went back in history to find this commit that introduced the feature. I'm not a cmake expert, but I think the problem lies in the way features are tested in llvm/cmake/config-ix.cmake, e.g.:

if(CURL_FOUND)
  # Check if curl we found is usable; for example, we may have found a 32-bit
  # library on a 64-bit system which would result in a link-time failure.
  cmake_push_check_state()
  list(APPEND CMAKE_REQUIRED_INCLUDES ${CURL_INCLUDE_DIRS})
  list(APPEND CMAKE_REQUIRED_LIBRARIES ${CURL_LIBRARY})
  check_symbol_exists(curl_easy_init curl/curl.h HAVE_CURL)
  cmake_pop_check_state()
  if(LLVM_ENABLE_CURL STREQUAL FORCE_ON AND NOT HAVE_CURL)
    message(FATAL_ERROR "Failed to configure curl")
  endif()
endif()

As far as I understand it, the cmake_pop_check_state() after the checks *restores* the CMAKE_REQUIRED_INCLUDES to its previous value, and therefore removes the appended CURL_INCLUDE_DIRS value.

Strangely, most of the external library checks go this way, e.g. for libxml just above:

if(LibXml2_FOUND)
  # Check if libxml2 we found is usable; for example, we may have found a 32-bit
  # library on a 64-bit system which would result in a link-time failure.
  cmake_push_check_state()
  list(APPEND CMAKE_REQUIRED_INCLUDES ${LIBXML2_INCLUDE_DIRS})
  list(APPEND CMAKE_REQUIRED_LIBRARIES ${LIBXML2_LIBRARIES})
  list(APPEND CMAKE_REQUIRED_DEFINITIONS ${LIBXML2_DEFINITIONS})
  check_symbol_exists(xmlReadMemory libxml/xmlreader.h HAVE_LIBXML2)
  cmake_pop_check_state()
  if(LLVM_ENABLE_LIBXML2 STREQUAL FORCE_ON AND NOT HAVE_LIBXML2)
    message(FATAL_ERROR "Failed to configure libxml2")
  endif()
endif()

I simply don't understand how this can work, if the detected include dirs are not in a system-default location, such as /usr/include...

Thanks!
My hypothesis is that the imported libs added to Support are not pulling in their includes to the target because they are linked with PRIVATE. If this is the case, maybe we need to add ADDITIONAL_HEADER_DIRS includes for those here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/CMakeLists.txt#L252

@dim, I just committed D115189 which should solve your problem. Thanks for your help and please let me know if your build still fails.

dim added a comment.Dec 9 2021, 1:22 AM

@dim, I just committed D115189 which should solve your problem. Thanks for your help and please let me know if your build still fails.

Thanks, I checked and indeed just before rG54b35c09b2cd it still fails in the same way, but after rG54b35c09b2cd it works!

MaskRay added a subscriber: MaskRay.Feb 4 2022, 5:34 PM
MaskRay added inline comments.
llvm/lib/Support/HTTPClient.cpp
148

For non-Debug build types, LLVM_ENABLE_ASSERTIONS=off is the default and assert is a no-op.

I fixed this in 3dd2d4c0a239ce78421d51491456cb1a075ad2f3