This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Implement path and URI translation for remote index
ClosedPublic

Authored by kbobyrev on Jul 1 2020, 1:40 AM.

Diff Detail

Event Timeline

kbobyrev created this revision.Jul 1 2020, 1:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2020, 1:40 AM
kbobyrev updated this revision to Diff 274719.Jul 1 2020, 1:42 AM

%s/PrefixPath/ProjectRoot/g

Generally I think this is missing high-level documentation describing the translation that happens at each level - I had to refer back to the notes from our meeting to understand what the scheme was.
I think Marshalling.h is probably a good place to document how this works. (Not exactly what prefixes get stripped by which function, but rather what the backing index contains, what goes over the wire, and what the clangd on the client side sees).

clang-tools-extra/clangd/index/remote/Client.cpp
89

this should be a std::string - no need to rely on the caller to keep the value alive

clang-tools-extra/clangd/index/remote/Client.h
21

This is a bit low-level - it describes what is being done but not really why. As a caller of this function it doesn't really tell me what I should pass.

Suggest something like "ProjectRoot is an absolute path on the local machine to the source tree described by the remote index. Paths returned by the index will be treated as relative to this directory."

I'd also suggest calling this IndexRoot instead of ProjectRoot, because the latter is a fairly overloaded concept and they need not coincide.

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp
32

mention somewhere that this is code used on the *client* side to do translation, and that RelativePath came from the server

36

scheme is always "file", no need to make it a parameter?

39

20 is too small. Suggest 256 or so (SmallString is on the stack, so it's cheap)

41

you can use URI::createFile which does less work and never fails

47

this looks like a no-op cast

50

Again this is a bit low level - "translates a URI from the server's backing index to a relative path suitable to send over the wire to the client"?

53

uriToRelativePath for consistency?

59

you're crashing here with an unhandled Expected. Add a test?
(I'd suggest elog or log, too)

62

static_cast is a pretty surprising/uncommon way to write this conversion....
I'd suggest either Result.str.str() or std::string(result)

63

need to handle the case where this returns false. Probably log and return ""?

Again, we should have a test for this.

132–136

this needs to do path translation in the case where the header is a URI

148

and here

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h
42

nit: Clangd -> clangd

55–59

what does "serializes clangd" mean?

clang-tools-extra/clangd/index/remote/server/Server.cpp
35

nit: call this variable IndexRoot rather than IndexPrefixPath, for consistency?

68–69

for hygiene, I think we should pass this in as a constructor parameter rather than accessing the flag directly.

You probably also want to add a little validation: this needs to be an absolute path, and you may need to convert slashes to unix-style, too.

clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp
58

this is testPath("unittest"), isn't it?

I mean, forward vs backslashes are going to differ, but the current expression is going to produce C:\test/unittest which doesn't seem better.

120

test name is backwards - TEST(RemoteMarshallingTest, URITranslation) ?

sammccall added inline comments.Jul 2 2020, 1:16 AM
clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp
37

we should also bail out if RelativePath is an absolute path.

63

we really should ensure the result has unix slashes.

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h
55–59

The Ref marshaller isn't an appropriate place to put the detailed documentation on how path translation works.
It could go in the file documentation, or we could wrap the marshalling methods into a class we could put it there.
(The latter is somewhat appealing because it means you don't need to explicitly pass the root around everywhere, you can do normalization and have a place to store it, etc)

56

Somewhere we need to describe what the format of ProjectRoot is: trailing slash or not, unix slashes or native, etc.
(Even if the current code works in any variation, it's easier to reason about if we have a tightly "normalized" form)

clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp
56–69

Is passing the empty string here actually valid? I think you probably want to pass testRoot() or some unixified equivalent

120

I think this test is written in a way where it won't work on windows.
I'd suggest making heavier use of testPath() instead.

125

this is not (lexically) a relative path, it's an absolute path.
For the way it's used in this test it doesn't matter that much (just confusing)

130

OK, this assertion does actually show that we're sending absolute paths over the wire and calling them relative.
The confusion here is more serious because it's in the wire format and harder to address later.
There are other technical reasons: true relative paths still leave us the absolute-path namespace to represent paths outside the source tree if we want that for some reason.

kbobyrev updated this revision to Diff 275613.Jul 6 2020, 2:19 AM
kbobyrev marked 19 inline comments as done.

Store the progress so that I don't lose it. Still WIP.

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp
62

Okay, sure! I just thought .str().str() looks really weird, but std::string is probably a good choice.

clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp
56–69

Why testroot? That's what I'm stripping from URI body, the URI is unittest:///TestTU.h, hence I'm stripping from /TestTU.h, there is no /clangd-test there. Could you elaborate?

58

I don't understand: std::string(testRoot()) + '/' is /clangd-test/ on unix and testPath("unittest") is /clangd-test/unittest. I understand the slashes argument, but could you please elaborate on the former?

kbobyrev updated this revision to Diff 275667.Jul 6 2020, 4:46 AM
kbobyrev marked 12 inline comments as done.

Resolve the rest of patch comments.

kbobyrev updated this revision to Diff 275670.Jul 6 2020, 4:50 AM

Remove IndexRoot conversion to UNIX slashes from the Marshalling hot path.

kbobyrev updated this revision to Diff 275672.Jul 6 2020, 4:51 AM

Use std::string() instead of static_cast<std::string>()

Harbormaster failed remote builds in B63001: Diff 275670!
Harbormaster failed remote builds in B63003: Diff 275672!
kbobyrev updated this revision to Diff 275945.Jul 7 2020, 1:53 AM

Don't convert test paths to system native, UNIX native is good enough.

kbobyrev updated this revision to Diff 276029.Jul 7 2020, 5:56 AM

Convert prefix paths to native slash format for _serialization_ calls
(deserializations should use UNIX slashes).

kbobyrev updated this revision to Diff 276030.Jul 7 2020, 6:04 AM

Add assertions for ensuring paths have UNIX slashes during deserialization.

This looks better to me but:

  • i think we need to consider what happens when paths are not under the expected path, and handle that in some appropriate way
  • I find it hard to see what the tests are actually testing - would be good to make them more explicit, using a few minimal helpers, and avoiding fancy abstractions like URI schemes
clang-tools-extra/clangd/index/remote/Index.proto
102–105

storees -> stores

isn't it conceptually simpler to talk about a "relative path" rather than a "part of a URI body"?
(apart from anything else, no need to worry about whether it's escaped or not...)

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp
36

mention error return (or return Optional)

52

this should check the scheme too - it must be file
(we can't reasonably treat IndexedProjectRoot as scheme-relative... but not to any *particular* scheme!)

119

nit: indexedprojectroot -> indexroot, unless there's some reason for the asymmetry? it seems like the same concept in the opposite direction

123–124

how do you want to handle the case where the path is in fact not relative stored under the root?

133

Don't try to parse if it's not a URI (starts with < or ").
As written this code will crash in debug mode if it's not a URI, due to unhandled error. Please add a test!

145

you seem to have this check in a few places - I'd suggest it should be in the relativePathToURI function and there should be some plan for handling errors.

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h
15

nit: llvm -> llvm-project

42

again, one particular overload is not really the place for this comment, I think. (File comment looks good)

52

this also needs the index root, for the proximity path (I'm not sure why they're not URIs, that's unfortunate...)

clang-tools-extra/clangd/index/remote/server/Server.cpp
50

please pass this in as a constructor param rather than accessing the flag directly

clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp
25

The File scheme is special. Using a different URI scheme here when we only support file in production is confusing and seems like a last resort to be used when we can't make it work any other way.
(For example we use it in lit tests because we must specify input filenames and the presence/absence of drive letters caused problems)
What's being solved there that the much smaller hammer of testPath() plus a local testPathURI() helper can't solve?

34

At this point I really think we should be writing these symbols out instead of trying to parse them from C++ - it's pretty hard to see exactly what fields are set and what is tested when the test is so highly factored.

If you like this approach think everything new is covered in other tests, I'd just remove the changes to this one.

58

sorry, I guess I meant testRoot("/").

124

auto -> std::string (here and in other places where the type is neither overly unreadable nor spelled in the function name)

124

it's not great practice to be assembling the test data from prefix + relative if the code we're testing is splitting it into prefix + relative - easy for the same bugs to cancel each other out.

Is testPath("remote/dir/MarshallingTests.cpp") directly too long?

128

(I'd suggest a helper testPathURI("remote/dir/MarshallingTests.cpp") for this)

132

hmm, I don't think that's the reason.
Either that's the contract of fromProtobuf, or it's not required and we shouldn't do it.

148

nit: use a file path rather than directory?

kbobyrev updated this revision to Diff 276366.Jul 8 2020, 4:12 AM
kbobyrev marked 12 inline comments as done.

Store progress. Still WIP.

kbobyrev marked 2 inline comments as done.Jul 8 2020, 4:21 AM
kbobyrev added inline comments.
clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp
25

This is dealing with the consequences of test infrastructure being set up in a way that makes all URIs in Symbols and Refs use "unittest" scheme, this helper simply translates those URIs into "file" URIs which are the ones being supported in production. I can not understand why this is not desirable but I'll just clear all URIs and leave those two tests as sanity checks.

132

Ah, llvm::sys::path::append already does that for me. Okay, makes sense to remove this then.

kbobyrev updated this revision to Diff 276454.Jul 8 2020, 8:52 AM
kbobyrev marked 10 inline comments as done.

Address the rest of the comments and refactor code.

Patch is ready for the review again.

kbobyrev updated this revision to Diff 276642.Jul 9 2020, 12:35 AM

Fix formatting on one line.

kbobyrev updated this revision to Diff 276646.Jul 9 2020, 12:40 AM

Rebase on top of master

kbobyrev updated this revision to Diff 276647.Jul 9 2020, 12:43 AM

Replace Sym in Client response with a more generic name (it can be a reference).

sammccall accepted this revision.Jul 9 2020, 1:53 AM

Rest is just nits, thanks!

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp
36

these functions are nice. It might be clearer to expose them and test them directly, particularly for error cases. Up to you.

37

can we add an assert that the index root is valid? (ends with a platform-appropriate slash)

43

message needs a bit more context. Remote index client got absolute path from server: {0}?

48

nit: lots of const auto throughout - we tend not to use const for local values (as opposed to references or pointers) so this is a bit confusing, particularly when combined with auto. Up to you though.

55

can we add an assert that IndexRoot is valid?

58

Similarly: Remote index got bad URI from client: ...

62

I think it's fine to fold this into the previous check: "bad URI" covers it

66

This isn't a valid IndexRoot, we should assert rather than handle this case I think.

123

shouldn't this return llvm::None, too? a SymbolLocation isn't valid without a path.

140

nit: isLiteralInclude in Headers.h (sorry, had forgotten about this func).
Calling that function probably avoids the need for the comment too.

144

isn't the rest of this uriToRelativePath?

180

{} with no index is a nice idea but I think not actually supported

clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h
19

nit "is passed passed to the remote server".

20

"It contains absolute path to the project root... relative path has unix slashes"

can we pull this out into a separate paragraph at the end? The rest is describing the flow, and this describes an API detail.

20

nit: "*the* absolute path", "*the* relative path" etc.

This revision is now accepted and ready to land.Jul 9 2020, 1:53 AM
kbobyrev updated this revision to Diff 276692.Jul 9 2020, 3:44 AM
kbobyrev marked 15 inline comments as done.

Address post-LGTM round of comments.

This revision was automatically updated to reflect the committed changes.
kbobyrev marked an inline comment as done.Jul 9 2020, 3:55 AM
kbobyrev added inline comments.
clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp
62

But it has different message :( ParsedURI.takeError wouldn't make sense here.