Page MenuHomePhabricator

[llvm] [Support] [Debuginfo] Add http and debuginfod client libraries and llvm-debuginfod-find tool

Authored by noajshu on Oct 6 2021, 11:27 AM.



This patch implements debuginfod and http client libraries with local caching. A few unit and lit tests are included.
The client libraries are only built if the cmake configuration flag is passed:
LibCURL is required to build the clients.
The standalone tool llvm-debuginfod-find is also provided; it wraps the debuginfod client library.
The lit tests specific to this tool can be run e.g. with:
LIT_FILTER='.*llvm-debuginfod.*' ninja -C build check-llvm
Thanks in advance for comments and critiques!

This has been split into the following sub-diffs:

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
noajshu added inline comments.Oct 20 2021, 11:00 AM

The fallback would kick in when cache_directory comes up empty-handed. This can happen on linux if neither $XDG_CACHE_HOME nor $HOME are in the environment. I would have no problem with removing the fallback and failing in this case as the user can always specify the current directory using --cache-dir anyways.


Thanks, updated!
I agree with the comments that these unit tests are not adding much value. One option until we have a server in llvm would be to GET, although that does require an internet connection. By the lower level APIs, are you referring to the static callback functions used to implement httpGet, or to the CURL library itself?

noajshu updated this revision to Diff 381026.Oct 20 2021, 11:02 AM
noajshu marked 2 inline comments as done.

Remove cache dir fallback to current directory in debuginfod-find.
Use EXPECT_THAT_EXPECTED in unit tests, switch functions out of anonymous namespaces (use static instead)

noajshu updated this revision to Diff 381065.Oct 20 2021, 12:36 PM
noajshu marked 3 inline comments as done.

Make namespaces smaller by switching library implementations to use using namespace llvm; and llvm:: prefixing.

labath added inline comments.Oct 21 2021, 1:12 AM

Why isn't this next to the other llvm_canonicalize_cmake_booleans calls?

(It would also help if you upload the full context with your patch (arcanist does it automatically, and you can achieve it with git diff -U9999 if you do it manually).


Small strings are typically not returned by value, and I don't think that the performance of this is particularly critical (if it was, you'd probably use a SmallVectorImpl<char>& by-ref argument), so I'd probably use a std::string here.


I see. In that case, I think it would be good to also add set(LLVM_ENABLE_CURL ... CACHE) somewhere, so that this appears as a user-settable value in UIs and such.


an unset HOME variable is going to be most likely an accident, and i think the usage of CWD would be surprising in that case. So, I'd leave the fallback out, but this is your tool, so I'm going to leave that up to you.


it seems #undef DEBUG_TYPE is rarely used in cpp files (headers are a different story), and when it is, it's usually because it's #defined to a different files immediately afterwards. Undefining at the end of a file seems pointless.


I mean the httpGet function itself -- we generally don't test private implementation details (directly).
In an ideal world I'd split this patch into two (or even three, with the first part being the introduction of httpGet), and each would come with it's own tests. Testing the error message is nice to have, but it just scratches the surface. In httpGet, the content-length handling seems interesting to test, for example.

But yes, you'd need some way to create/mock the server connection for that to work...

noajshu updated this revision to Diff 381311.Oct 21 2021, 10:18 AM

Add debuginfod-find.test which is simpler and less general-purpose, and does not require special command-line quoting behavior or JSON configuration.
This test incorporates Pavel's suggestion to assign the port to the first available, and avoids using timing to orchestrate the client and server. This would replace the llvm-debuginfod-find.test + scheme.

noajshu updated this revision to Diff 381312.Oct 21 2021, 10:23 AM

Only use llvm_canonicalize_cmake_boolean where it's needed, in llvm/test/CMakeLists.txt with the other invocations.

noajshu marked an inline comment as done.Oct 21 2021, 10:24 AM
noajshu added inline comments.

Thanks and just moved this!

noajshu updated this revision to Diff 381364.Oct 21 2021, 12:45 PM
noajshu marked 2 inline comments as done.

Remove old end-to-end tests and add LLVM_ENABLE_CURL cmake variable cache string.

noajshu updated this revision to Diff 381367.Oct 21 2021, 12:58 PM
noajshu marked 5 inline comments as done.

Various style and usability updates.
Remove use of llvm_unreachable in handling user error, extraneous #undef DEBUG_TYPE, add Cmake cache vars, remove CWD fallback for cache dir.

noajshu updated this revision to Diff 381368.Oct 21 2021, 1:01 PM

fetchDebuginfo returns a std::string instead of small string.

labath added inline comments.Oct 22 2021, 1:25 AM
1–2 ↗(On Diff #381368)

This seems fine for now, though if you start having more of these, it would definitely be good to factor some of this out.

You can consider renaming the file to .py to get syntax highlighting.

21 ↗(On Diff #381368)

Where will this be storing the downloaded files? It should probably be %t or some derivative. And you should clean the folder before the test.

28–29 ↗(On Diff #381368)

The way this is written now, the test would pass if I replaced debuginfod-find with /bin/true. Is there anything else you can check here?


You're only checking that the user specified _at least_ one of the arguments.

noajshu updated this revision to Diff 382514.Oct 26 2021, 8:39 PM
noajshu marked an inline comment as done.

Refactor HTTP Client (Curl wrapper).
This refactor enabled writing more meaningful unit tests and fixes several bugs in the client.

noajshu updated this revision to Diff 382517.Oct 26 2021, 9:01 PM

Workaround for unit tests, to avoid need for parent's destructor to invoke child's override.
There is a more general explanation of why this does not work here: more details here

noajshu added inline comments.Oct 26 2021, 9:20 PM

Hi Pavel, first off thanks for all of your comments. I refactored the HTTP client to enable meaningful unit tests. It's now split into two new class hierarchies rooted at HTTPRequest and HTTPResponseHandler. This way CURL can be swapped out with a a different HTTP backend, including a simulated backend during unit testing.
Similarly, the buffered response handler can be unit tested in isolation without data going through a socket. This allows some nontrivial tests of how it handles content-lengths.
I am very interested in your thoughts on this refactor, and if you have ideas for other unit tests of the HTTP client or feedback on the ones here. I wonder if it makes sense to refactor the Debuginfod client library as well to enable more meaningful unit tests there. And please feel free to suggest a different approach if you feel the refactor moves us in the wrong direction.

For now I will work on addressing your comments on the other areas of the diff.
Since it is even larger now, it could certainly make sense to split off into a few diffs. One way to do this could be:

[diff 0 ] HTTP Client + unit tests
[diff 1] Debuginfod Client library + unit tests
[diff 2] Debuginfod-Find tool + end-to-end (lit) tests

phosek added inline comments.Oct 27 2021, 12:40 AM

This is unnecessary.


I'd move the cURL-based implementation into a separate header and source file, for example Support/CURLClient.{h,cpp}. Otherwise, anyone who includes HTTPClient.h automatically pulls in <curl/curl.h> which is undesirable.


This is unnecessary.


Do these need to public?


Do these need to be public?


This function seems unnecessary as it's functionality is now effectively provided by HTTPRequest and HTTPRequestConfig which is more flexible (for example they can potentially support methods other than Get).


Rather than printing the error to stderr which may be undesirable for example when using LLVM as a library, I think it'd be better to store the error inside the class (it might be possible to use ErrorList) and then return it from curlPerform.


The same here.


Could this method be inlined directly into CurlHTTPRequest::performRequest?

Well... I hate to say it, but I found the version very easy to get lost in. It seems to me you're exposing a lot more implementation details than it is necessary (including the entire curl.h header, which is something we try to avoid. And the fact that tests don't even go through the public interface make it hard to keep track of what is being tested. If we are going to do this, maybe we could use a pattern like this:

class CurlInterface {
  virtual Error init()  = 0;
  virtual Error cleanup() = 0;
  virtual Error perform() = 0;
  virtual void setWriteFunction(...) = 0;
  // etc. 
  // It doesn't have to match the curl interface verbatim. Feel free 
  // to perform basic simplifications and c++-ifications of the 
  // interface. You'll need to do some of those to avoid dependence 
  // on curl.h anyway.

Expected<HTTPResponseBuffer> httpGet(const Twine &Url, CurlInterface &curl);
Expected<HTTPResponseBuffer> httpGet(const Twine &Url); // calls the first function with the default/real curl implementation

This shouldn't expose too much implementation details, and this interface could even conceivably by useful by someone who wants to use a custom http-ish library/transport/thingy.

Then, in the test, you could use gmock to program a fake curl implementation:

class CurlMock: public CurlInterface {
  MOCK_METHOD0(init, Error());

TEST(httpGet, whatever) {
  CurlMock curl;
  EXPECT_CALL(curl, setWriteFunction).WillOnce(SaveArg<0>(&writeFunc));
  EXPECT_CALL(curl, perform).WillOnce([&] {
    EXPECT_THAT_ERROR(writeFunc(...), llvm::Succeeded());
    EXPECT_THAT_ERROR(writeFunc(...), llvm::Failed());
    return llvm::Error::success();
  EXPECT_THAT_ERROR(httpGet("http://foo", curl), llvm::Succeeded())

How does that sound?


the llvm:: qualifications are not necessary here


drop .str().str(), it's cleaner


Yeah, if we're going to have tests for each of these components, I'd definitely recommend splitting them out into separate patches.

noajshu updated this revision to Diff 382731.Oct 27 2021, 11:34 AM

Add CurlHTTPRequest::ErrorState using ErrorList to track errors during curl http request lifecycle.
Also miscellaneous style updates, redundant namespace qualifiers.

noajshu marked 3 inline comments as done.Oct 27 2021, 12:56 PM
noajshu added inline comments.

Great idea, I've added this and removed the prints to stderr. The ErrorState will also surface for other Error-returning methods of CurlHTTPRequest.

noajshu updated this revision to Diff 382765.Oct 27 2021, 1:22 PM
noajshu marked an inline comment as done.

Updates to CurlHTTPRequest access specifiers.

noajshu updated this revision to Diff 382793.Oct 27 2021, 2:21 PM
noajshu marked 5 inline comments as done.

Remove extraneous curlPerform method from CurlHTTPRequest

noajshu updated this revision to Diff 382799.Oct 27 2021, 2:32 PM

Remove httpGet function.

noajshu updated this revision to Diff 382863.Oct 27 2021, 5:40 PM
noajshu marked 2 inline comments as done.

Improvements to regression (lit) tests.
Use %t as the asset cache directory and clean up directory before each test. Verify local file contents match expected contents.

noajshu updated this revision to Diff 382871.Oct 27 2021, 6:20 PM

Split out CurlHTTPRequest into CURLClient.{h,cpp}.

noajshu updated this revision to Diff 383132.Oct 28 2021, 12:51 PM

Rebase against main.

noajshu edited the summary of this revision. (Show Details)Oct 28 2021, 1:26 PM
noajshu edited the summary of this revision. (Show Details)Fri, Oct 29, 11:03 AM

@labath Thank you for your helpful feedback on this diff!
I have split into 4 sub-diffs. The HTTP client/request/response handler architecture is in D112751 and the curl client is in D112753. Could we continue the discussion there?


Nope -- updated most to private, and curlInit to protected (for unit testing).


The curlHandle{HeaderLine,BodyChunk} method is invoked by the static curl{HeaderLine,BodyChunk}MethodHook function, which then needs access.

I tried instead having curl directly call back to a private method, but then the implicit this argument is replaced by the first parameter passed by curl, which is the pointer to the header/body contents.


Agreed that this could be removed and on the flexibility point I bet another use case might be a StreamingHTTPResponseHandler. This would be especially nice if the localCache is refactored to allow caching objects chunk-by-chunk -- as then large debuginfo would not hog memory (as with the BufferedHTTPResponseHandler).

noajshu abandoned this revision.Wed, Nov 24, 12:38 PM

Abandoning due to split into separate patches.