This patch implements a small HTTP client library consisting primarily of the HTTPRequest, HTTPResponseHandler, and BufferedHTTPResponseHandler classes. Unit tests of the HTTPResponseHandler and BufferedHTTPResponseHandler are included.
Details
Diff Detail
Event Timeline
Refactor to three primary parent classes HTTPClient, HTTPRequest, HTTPResponseHandler.
This is a more straightforward design that presents a simpler usage pattern. E.g., we can do this:
Expected<HTTPResponseBuffer> ResponseOrErr = Client.get("http://example.com"); if (Error Err = ResponseOrErr) return Err; outs() << ResponseOrErr->Body;
@labath thank you for your comments on the previous version of this diff before splitting!
I have refactored and simplified things, so that we have three main classes:
- HTTPClient
- HTTPRequest
- HTTPResponseHandler
This diff just adds the skeleton, while the next in the sequence (D112753) adds the CURL client itself. Do you think the base HTTPClient should just use curl by default? In that case perhaps these diffs could be merged.
By the way, I wanted to share some of the motivation for the added complexity of these three classes instead of just an httpGet function.
I am thinking ahead to their use in the debuginfod client and server. The HTTPClient and HTTPResponseHandler can be extended for concurrent requests or streaming response handlers, which both could be used for debuginfod.
The client could use concurrent requests to send a HEAD to all N known server URLs at once, rather than serially requesting the asset from each server. Large debugging assets could be processed piece-by-piece using a StreamingHTTPResponseHandler. This would be used during federation by a debuginfod server, enabling immediate write of data chunks received from the peer server back to the client.
Curl’s docs say that re-use of handles is key to good performance and this is part of the reason to have a Client that reuses a handle when running multiple requests. However I haven't measured this performance difference myself.
I've found the error handling confusing. If the intention is that the HTTPResponseHandler can terminate the request by returning zero, then it seems to me that it would be better to have it return llvm::Error, and have the HTTPClient class translate that to whatever action is required ( == storing the error in some variable and returning 0 to curl). That would make the BufferedHTTPResponse class cleaner, as it would no longer have to deal with an Error member, and the overall api would be more understandable.
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
41 | Should this be a period, or is something missing? | |
45 | I'm having trouble reconciling this with the "cannot fail" comment above. Can you elaborate on what the two comments mean? | |
71–76 | Having an Error member is strange enough (because of it's destruction semantics), but making it a public member is unheard of. It would be much better to make these private and have a member function returning Expected<HTTPResponseBuffer>. But it would be much better to not have the Error member in the first place. | |
95 | I don't see any methods returning an error | |
llvm/lib/Support/HTTPClient.cpp | ||
51 | if you used llvm::to_integer here, you would be able to use any type you want (size_t perhaps), and would get more natural failure semantics (no need for ! -- to_integer returns true on success). | |
llvm/unittests/Support/HTTPClient.cpp | ||
43–46 | EXPECT_EQ(Handler.ResponseBuffer.Body->getBuffer(), "..."); | |
158–163 | What is this supposed to test? This test pretty much only executes code which is defined in the test itself... |
@labath I completely agree the new error handling is awkward and I prefer your proposed Error handling between the ResponseHandler and the Client. I'll change all three handle*() methods to return Error (then convert it to 0 for curl in the client + hook functions)
llvm/unittests/Support/HTTPClient.cpp | ||
---|---|---|
43–46 | Thanks! Because the Body is actually a WritableMemoryBuffer, I need an extra namespace qualifier on getBuffer for this to work. On that note -- do you think I should change the HTTP response body to be a MemoryBuffer (read-only)? |
llvm/lib/Support/HTTPClient.cpp | ||
---|---|---|
51 | Ah, the perfect fit. Thanks! | |
llvm/unittests/Support/HTTPClient.cpp | ||
158–163 | I agree and I will remove the mockHTTPClientTest. Do you think there is any value in the BrokenConnection test? I would be happy to remove this test as well. The only non-test code executed in that one is Client.perform(Request) and Client.get("") which is effectively a change-detector for the short convenience methods of the Client. |
This looks good to me now, modulo some details in inline comments.
llvm/lib/Support/HTTPClient.cpp | ||
---|---|---|
30–32 | = default; | |
llvm/unittests/Support/HTTPClient.cpp | ||
18 | This block looks weird -- I'd just make a separate test for that. | |
43–46 | I don't see a need for that -- I can see how the caller might want to modify the buffer for its own purposes. and he can always convert it to a read-only buffer if he does not (the opposite is harder). | |
158–163 | Yeah, I'd probably remove them, or at least significantly simplify them. If you want to test that the virtual perform is getting called, then all you need to do is mock that method -- you don't really need to provide a fake implementation of that method, and everything that gets called by it... |
I think this is as good as it can get. What you do after that -- I don't know.
It wasn't clear to me whether you forgot about the mockHTTPClientBrokenConnectionTest part, or chose not to do that, but I added an additional comment to describe how I would do this.
llvm/lib/Support/HTTPClient.cpp | ||
---|---|---|
30 | = default | |
llvm/unittests/Support/HTTPClient.cpp | ||
141–142 | MOCK_METHOD2(perform, Error(HTTPRequest, HTTPResponseHandler)), and then just set expectations in the test. The entire MockHTTPResponseHandler business can go away. |
Make perform(Request, Handler) non-virtual, adopting D112753's convention for HTTPClient implementation.
Thanks Pavel! I decided to remove the mock client tests, because they only hit the few-line convenience methods. I'm happy to add them, I'm just unsure how to avoid making HTTPClient::perform(Request, Handler) virtual. In any case they will be hit in subsequent testing after a working client implementation (i.e. Curl in D112753) is added.
llvm/unittests/Support/HTTPClient.cpp | ||
---|---|---|
141–142 | Thanks! It seems the only library code we hit with these tests is the small convenience methods perform(Request) and get(Url). Is it worth making the perform method virtual to be able to create the mock class? |
llvm/unittests/Support/HTTPClient.cpp | ||
---|---|---|
141–142 | Probably not. It made sense while curl was implemented in a separate class, but I don't think it's worth going out of our way to make this work. |
Couple of minor API design issues, but haven't dug in much.
@labath - what's your state on this and the subsequent patches? What would be best for me to focus on/do here to help out?
llvm/lib/Support/HTTPClient.cpp | ||
---|---|---|
26 | Twine's mostly meant for when a string value may only be conditionally manifest (such as in a logging library where some dynamic log level would dictate whether the value is logged) - if the string is unconditionally manifest, Twine probably isn't the right tool to use here. You could pass a SmallString by value & move it into the Url, but since it's a smallstring, it has a big sizeof and might not be the most efficient thing to move around. Alternatively just pass a StringRef. (similar feedback up the stack/for other uses of Twine) | |
llvm/unittests/Support/HTTPClient.cpp | ||
65–72 | If this code isn't touching/implementing to_integer, it's probably not necessary/appropriate to re-test it here, should just rely on its existing definition/documentation/testing elsewhere. | |
86 | Perhaps avoid the hardcoded 19 somehow? Like you could declare the string as a named char array, then use sizeof to get the length (-1 for the implicit null terminator, if that's not meant to be included) |
Thanks for the comments @dblaikie! I've converted to StringRef parameters throughout.
I've pretty much done all I can with this series. I am happy with the code quality. Test coverage is ok. There's one ongoing discussion in https://reviews.llvm.org/D112753#inline-1083341, that I'd like to hear your thoughts on (if you have any), but that's a fairly small detail.
I'd say the main thing that's missing is for someone to say "this belongs in llvm".
Move global curl initialization / cleanup to static HTTPClient::initialize / HTTPClient::cleanup.
Fair enough - Yeah, I think the direction's worth having a go at - be nice to have it supported in llvm-symbolizer, etc.
llvm/include/llvm/Support/HTTPClient.h | ||
---|---|---|
32 | Usually best to write op overloads as non-members (possibly as friends, if that's necessary/helpful) to avoid different conversion candidates an the LHS and RHS. | |
36 | Skip const here | |
108 | Skip top-level const here - while it's generally out of character for LLVM code, it's also not part of the function declaration (const here doesn't change the type of the function/doesn't change how people call it, it's only an implementation detail) | |
llvm/lib/Support/HTTPClient.cpp | ||
26 | Skip top-level "const" here - it's out of character for LLVM code. | |
75 | skip const here | |
80–82 | Use = default; here, perhaps? | |
86–90 | Is there any use for this variable? It's currently only written and never read, by the looks of it. Maybe that's worth deferring until some use is implemented. |
Remove unnecessary IsInitialized variable. Also remove some const qualifiers and replace empty ctor/dtor with =default
llvm/unittests/Support/HTTPClient.cpp | ||
---|---|---|
76 | This probably doesn't test what it's intended to test - the string will terminate at the first '\0' - if you want to test embedded null characters, you'd need to specify an explicit length when creating the StringRef explicitly, rather than relying on StringRef(const char*) conversion. |
String literal fix for unit test with embedded null characters.
llvm/unittests/Support/HTTPClient.cpp | ||
---|---|---|
76 | Thanks for pointing this out! I have added a fix using C++14 string literals. If you prefer I can also change this to use a hardcoded string length instead. |
llvm/lib/Support/HTTPClient.cpp | ||
---|---|---|
74–75 | Looks like this API doesn't require the HTTPResponseHandler to have a virtual dtor - if these objects will usually be owned directly, rather than indirectly through a unique_ptr of the base type, etc, then the dtor can be non-virtual (but the base type will have to make the dtor protected and concrete derived classes would need to be marked final) | |
llvm/unittests/Support/HTTPClient.cpp | ||
76 | Oh, great! If the string data doesn't need to outlive the handleHeaderLine call, you could roll it in: ...handleHeaderLine("Content-Length: \0\0\0"s), ... Rather than having a standalone named variable. Though if you prefer the named variable to document the code intent, that's totally fair too! |
Thanks for the tip on the protected dtor! I also changed the setTimeout to use std::chrono::milliseconds following the discussion on D112758.
Usually best to write op overloads as non-members (possibly as friends, if that's necessary/helpful) to avoid different conversion candidates an the LHS and RHS.