This is an archive of the discontinued LLVM Phabricator instance.

[llvm] [Support] Add HTTP Client Support library.
ClosedPublic

Authored by noajshu on Oct 28 2021, 1:14 PM.

Details

Summary

This patch implements a small HTTP client library consisting primarily of the HTTPRequest, HTTPResponseHandler, and BufferedHTTPResponseHandler classes. Unit tests of the HTTPResponseHandler and BufferedHTTPResponseHandler are included.

Diff Detail

Event Timeline

noajshu created this revision.Oct 28 2021, 1:14 PM
noajshu requested review of this revision.Oct 28 2021, 1:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2021, 1:14 PM
noajshu edited the summary of this revision. (Show Details)Oct 28 2021, 1:16 PM
noajshu added a reviewer: phosek.
noajshu edited the summary of this revision. (Show Details)
noajshu updated this revision to Diff 383150.Oct 28 2021, 1:39 PM

Remove extraneous curl.h include.

noajshu updated this revision to Diff 383450.Oct 29 2021, 12:02 PM
noajshu added a reviewer: labath.

Refactor to three primary parent classes HTTPClient, HTTPRequest, HTTPResponseHandler.
This is a more straightforward design that presents a simpler usage pattern. E.g., we can do this:

Expected<HTTPResponseBuffer> ResponseOrErr = Client.get("http://example.com");
if (Error Err = ResponseOrErr)
    return Err;
outs() << ResponseOrErr->Body;
noajshu updated this revision to Diff 383493.Oct 29 2021, 1:55 PM

Improve doxygen comments and formatting.

noajshu updated this revision to Diff 383717.Oct 31 2021, 10:29 PM

Refactor to isolate most logic in HTTPResponseHandler.

@labath thank you for your comments on the previous version of this diff before splitting!
I have refactored and simplified things, so that we have three main classes:

  • HTTPClient
  • HTTPRequest
  • HTTPResponseHandler

This diff just adds the skeleton, while the next in the sequence (D112753) adds the CURL client itself. Do you think the base HTTPClient should just use curl by default? In that case perhaps these diffs could be merged.

By the way, I wanted to share some of the motivation for the added complexity of these three classes instead of just an httpGet function.
I am thinking ahead to their use in the debuginfod client and server. The HTTPClient and HTTPResponseHandler can be extended for concurrent requests or streaming response handlers, which both could be used for debuginfod.
The client could use concurrent requests to send a HEAD to all N known server URLs at once, rather than serially requesting the asset from each server. Large debugging assets could be processed piece-by-piece using a StreamingHTTPResponseHandler. This would be used during federation by a debuginfod server, enabling immediate write of data chunks received from the peer server back to the client.
Curl’s docs say that re-use of handles is key to good performance and this is part of the reason to have a Client that reuses a handle when running multiple requests. However I haven't measured this performance difference myself.

labath added a comment.Nov 2 2021, 6:43 AM

I've found the error handling confusing. If the intention is that the HTTPResponseHandler can terminate the request by returning zero, then it seems to me that it would be better to have it return llvm::Error, and have the HTTPClient class translate that to whatever action is required ( == storing the error in some variable and returning 0 to curl). That would make the BufferedHTTPResponse class cleaner, as it would no longer have to deal with an Error member, and the overall api would be more understandable.

llvm/include/llvm/Support/HTTPClient.h
41

Should this be a period, or is something missing?

45

I'm having trouble reconciling this with the "cannot fail" comment above. Can you elaborate on what the two comments mean?

71–76

Having an Error member is strange enough (because of it's destruction semantics), but making it a public member is unheard of. It would be much better to make these private and have a member function returning Expected<HTTPResponseBuffer>.

But it would be much better to not have the Error member in the first place.

95

I don't see any methods returning an error

llvm/lib/Support/HTTPClient.cpp
51

if you used llvm::to_integer here, you would be able to use any type you want (size_t perhaps), and would get more natural failure semantics (no need for ! -- to_integer returns true on success).

llvm/unittests/Support/HTTPClient.cpp
43–46

EXPECT_EQ(Handler.ResponseBuffer.Body->getBuffer(), "...");

158–163

What is this supposed to test? This test pretty much only executes code which is defined in the test itself...

@labath I completely agree the new error handling is awkward and I prefer your proposed Error handling between the ResponseHandler and the Client. I'll change all three handle*() methods to return Error (then convert it to 0 for curl in the client + hook functions)

llvm/unittests/Support/HTTPClient.cpp
43–46

Thanks! Because the Body is actually a WritableMemoryBuffer, I need an extra namespace qualifier on getBuffer for this to work.

On that note -- do you think I should change the HTTP response body to be a MemoryBuffer (read-only)?

noajshu added inline comments.Nov 2 2021, 12:42 PM
llvm/lib/Support/HTTPClient.cpp
51

Ah, the perfect fit. Thanks!

llvm/unittests/Support/HTTPClient.cpp
158–163

I agree and I will remove the mockHTTPClientTest.

Do you think there is any value in the BrokenConnection test? I would be happy to remove this test as well. The only non-test code executed in that one is Client.perform(Request) and Client.get("") which is effectively a change-detector for the short convenience methods of the Client.

noajshu updated this revision to Diff 384247.Nov 2 2021, 2:55 PM

Update error handling of HTTPResponseHandler and HTTPClient

noajshu marked 5 inline comments as done and an inline comment as not done.Nov 2 2021, 3:58 PM
labath added a comment.Nov 3 2021, 2:15 AM

This looks good to me now, modulo some details in inline comments.

llvm/lib/Support/HTTPClient.cpp
30–32

= default;

llvm/unittests/Support/HTTPClient.cpp
18

This block looks weird -- I'd just make a separate test for that.

43–46

I don't see a need for that -- I can see how the caller might want to modify the buffer for its own purposes. and he can always convert it to a read-only buffer if he does not (the opposite is harder).

158–163

Yeah, I'd probably remove them, or at least significantly simplify them. If you want to test that the virtual perform is getting called, then all you need to do is mock that method -- you don't really need to provide a fake implementation of that method, and everything that gets called by it...

noajshu updated this revision to Diff 384612.Nov 3 2021, 4:58 PM
noajshu marked 4 inline comments as done.

Address final comments.

noajshu updated this revision to Diff 384633.Nov 3 2021, 6:53 PM
labath added a comment.Nov 5 2021, 7:35 AM

I think this is as good as it can get. What you do after that -- I don't know.

It wasn't clear to me whether you forgot about the mockHTTPClientBrokenConnectionTest part, or chose not to do that, but I added an additional comment to describe how I would do this.

llvm/lib/Support/HTTPClient.cpp
30

= default

llvm/unittests/Support/HTTPClient.cpp
141–142

MOCK_METHOD2(perform, Error(HTTPRequest, HTTPResponseHandler)), and then just set expectations in the test. The entire MockHTTPResponseHandler business can go away.

noajshu updated this revision to Diff 385186.Nov 5 2021, 2:08 PM

Make perform(Request, Handler) non-virtual, adopting D112753's convention for HTTPClient implementation.

noajshu updated this revision to Diff 385191.Nov 5 2021, 2:13 PM

nit, improve comment

noajshu updated this revision to Diff 385199.Nov 5 2021, 3:04 PM

Make HTTPClient::isAvailable static.

noajshu added a comment.EditedNov 5 2021, 3:17 PM

I think this is as good as it can get. What you do after that -- I don't know.

It wasn't clear to me whether you forgot about the mockHTTPClientBrokenConnectionTest part, or chose not to do that, but I added an additional comment to describe how I would do this.

Thanks Pavel! I decided to remove the mock client tests, because they only hit the few-line convenience methods. I'm happy to add them, I'm just unsure how to avoid making HTTPClient::perform(Request, Handler) virtual. In any case they will be hit in subsequent testing after a working client implementation (i.e. Curl in D112753) is added.

llvm/unittests/Support/HTTPClient.cpp
141–142

Thanks! It seems the only library code we hit with these tests is the small convenience methods perform(Request) and get(Url). Is it worth making the perform method virtual to be able to create the mock class?

noajshu updated this revision to Diff 385601.Nov 8 2021, 12:50 PM
noajshu marked 4 inline comments as done.

Final style updates.

noajshu updated this revision to Diff 385687.Nov 8 2021, 7:27 PM

Remove extraneous #define DEBUG_TYPE

labath added inline comments.Nov 9 2021, 4:26 AM
llvm/unittests/Support/HTTPClient.cpp
141–142

Probably not. It made sense while curl was implemented in a separate class, but I don't think it's worth going out of our way to make this work.

Couple of minor API design issues, but haven't dug in much.

@labath - what's your state on this and the subsequent patches? What would be best for me to focus on/do here to help out?

llvm/lib/Support/HTTPClient.cpp
26

Twine's mostly meant for when a string value may only be conditionally manifest (such as in a logging library where some dynamic log level would dictate whether the value is logged) - if the string is unconditionally manifest, Twine probably isn't the right tool to use here.

You could pass a SmallString by value & move it into the Url, but since it's a smallstring, it has a big sizeof and might not be the most efficient thing to move around. Alternatively just pass a StringRef.

(similar feedback up the stack/for other uses of Twine)

llvm/unittests/Support/HTTPClient.cpp
65–72

If this code isn't touching/implementing to_integer, it's probably not necessary/appropriate to re-test it here, should just rely on its existing definition/documentation/testing elsewhere.

86

Perhaps avoid the hardcoded 19 somehow? Like you could declare the string as a named char array, then use sizeof to get the length (-1 for the implicit null terminator, if that's not meant to be included)

noajshu updated this revision to Diff 386058.Nov 9 2021, 9:50 PM
noajshu marked 4 inline comments as done.

Respond to feedback from @dblaikie.

Thanks for the comments @dblaikie! I've converted to StringRef parameters throughout.

@labath - what's your state on this and the subsequent patches? What would be best for me to focus on/do here to help out?

I've pretty much done all I can with this series. I am happy with the code quality. Test coverage is ok. There's one ongoing discussion in https://reviews.llvm.org/D112753#inline-1083341, that I'd like to hear your thoughts on (if you have any), but that's a fairly small detail.

I'd say the main thing that's missing is for someone to say "this belongs in llvm".

noajshu updated this revision to Diff 386263.Nov 10 2021, 11:27 AM

Move global curl initialization / cleanup to static HTTPClient::initialize / HTTPClient::cleanup.

dblaikie accepted this revision.Nov 17 2021, 1:48 PM

@labath - what's your state on this and the subsequent patches? What would be best for me to focus on/do here to help out?

I've pretty much done all I can with this series. I am happy with the code quality. Test coverage is ok. There's one ongoing discussion in https://reviews.llvm.org/D112753#inline-1083341, that I'd like to hear your thoughts on (if you have any), but that's a fairly small detail.

I'd say the main thing that's missing is for someone to say "this belongs in llvm".

Fair enough - Yeah, I think the direction's worth having a go at - be nice to have it supported in llvm-symbolizer, etc.

llvm/include/llvm/Support/HTTPClient.h
31

Usually best to write op overloads as non-members (possibly as friends, if that's necessary/helpful) to avoid different conversion candidates an the LHS and RHS.

35

Skip const here

107

Skip top-level const here - while it's generally out of character for LLVM code, it's also not part of the function declaration (const here doesn't change the type of the function/doesn't change how people call it, it's only an implementation detail)

llvm/lib/Support/HTTPClient.cpp
25

Skip top-level "const" here - it's out of character for LLVM code.

74

skip const here

79–81

Use = default; here, perhaps?

85–89

Is there any use for this variable? It's currently only written and never read, by the looks of it. Maybe that's worth deferring until some use is implemented.

This revision is now accepted and ready to land.Nov 17 2021, 1:48 PM
noajshu updated this revision to Diff 388386.Nov 18 2021, 9:52 PM
noajshu marked 7 inline comments as done.

Remove unnecessary IsInitialized variable. Also remove some const qualifiers and replace empty ctor/dtor with =default

noajshu updated this revision to Diff 389341.Nov 23 2021, 3:46 PM

Small style/formatting update.

noajshu updated this revision to Diff 390202.Nov 28 2021, 8:50 AM

Add HTTPClient::setTimeout method.

dblaikie added inline comments.Nov 29 2021, 1:47 PM
llvm/unittests/Support/HTTPClient.cpp
76

This probably doesn't test what it's intended to test - the string will terminate at the first '\0' - if you want to test embedded null characters, you'd need to specify an explicit length when creating the StringRef explicitly, rather than relying on StringRef(const char*) conversion.

noajshu updated this revision to Diff 390465.Nov 29 2021, 2:12 PM
noajshu marked an inline comment as done.

String literal fix for unit test with embedded null characters.

llvm/unittests/Support/HTTPClient.cpp
76

Thanks for pointing this out! I have added a fix using C++14 string literals. If you prefer I can also change this to use a hardcoded string length instead.

dblaikie accepted this revision.Nov 29 2021, 3:47 PM
dblaikie added inline comments.
llvm/lib/Support/HTTPClient.cpp
74–75

Looks like this API doesn't require the HTTPResponseHandler to have a virtual dtor - if these objects will usually be owned directly, rather than indirectly through a unique_ptr of the base type, etc, then the dtor can be non-virtual (but the base type will have to make the dtor protected and concrete derived classes would need to be marked final)

llvm/unittests/Support/HTTPClient.cpp
76

Oh, great!

If the string data doesn't need to outlive the handleHeaderLine call, you could roll it in:

...handleHeaderLine("Content-Length: \0\0\0"s),
...

Rather than having a standalone named variable. Though if you prefer the named variable to document the code intent, that's totally fair too!

noajshu updated this revision to Diff 390771.Nov 30 2021, 11:43 AM
noajshu marked 2 inline comments as done.

Change HTTPRequestHandler dtor from virtual to protected.

Thanks for the tip on the protected dtor! I also changed the setTimeout to use std::chrono::milliseconds following the discussion on D112758.

noajshu updated this revision to Diff 390786.Nov 30 2021, 12:37 PM

remove extraneous semicolon

dblaikie accepted this revision.Nov 30 2021, 12:54 PM

Generally (still) looks good to me.

noajshu edited the summary of this revision. (Show Details)Dec 1 2021, 3:42 PM
This revision was automatically updated to reflect the committed changes.