I will replace the existing URI struct in Protocol.h with the new URI and rename FileURI to URI in a followup patch.
Details
Diff Detail
- Repository
- rCTE Clang Tools Extra
- Build Status
Buildable 13927 Build 13927: arc lint + arc unit
Event Timeline
Lots of quibbles here, some things we can certainly defer. Happy to chat offline if you're not sure...
clangd/URI.cpp | ||
---|---|---|
18 | This isn't correct, we need to escape some characters, turn \ to /, etc. It's not quite obvious what to do here in general, because URI escaping rules are defined in terms of the full generic URI syntax. I'd suggest:
Splitting authority/scheme/path allows "file" schema to just care about the path part, not the // prefix, without mandating that all schemas have authorities.
(The reserved characters basically have no business in filenames, apart from / which you don't want escaped. This allows custom URI schemes to be flexible) | |
36 | this error needs to be handled, it shouldn't be an assert | |
clangd/URI.h | ||
20 | Why allow an invalid state? This would probably mean making it a class and having schema/path be readonly... | |
20 | This doesn't really describe what it's for. Consider: /// A URI describes the location of a source file. /// In the simplest case, this is a file:// URI that directly encodes the absolute path to a file. /// More abstract cases are possible: a shared index service might expose repo:// URIs that /// are relative to the source control root. | |
22–23 | nit: scheme, not schema (they're synonyms, but URIs always talk about scheme) | |
25 | I think calling this "path" is confusing, because there's ambiguity between the URI path component, and the file path that you're ultimately trying to transform it into. Consider "body", without changing semantics. | |
32 | (if you keep this: isValid(), or just valid(), or operator bool()) | |
39 | fromURI is confusing, because the input is a String and the output is a URI. | |
46 | Hmm, it manages file paths in exactly one schema :-) / \brief URIScheme is an extension point for teaching clangd to recognize a custom URI scheme. | |
54 | nit: move this to the registry? | |
65 | Uri not in schema: why would the framework even call getAbsolutePath in that case? We should have a way to handle errors though, e.g. file:// URIs that have relative paths. Return Expected rather than use the empty string for signaling? | |
66 | make currentfile the last param and optional? Uri is the primary param | |
70 | again:
| |
75 | this should be defined in the cpp file | |
90 | There are other possible failure cases. | |
91 | this is part of the "user" API, not the extension API, move up above? | |
91 | I think you want a public function to turn an absolute path into a file:// URI. |
Thanks for the comments! I've refactored the code according to your suggestions. PTAL
Few more interface nits, sorry for picking here! Will get through the impl in the morning.
clangd/URI.cpp | ||
---|---|---|
43 | Don't you want the opposite here, convert from unix to native? | |
50 | this error can be returned by the framework already (it's valid for all paths) | |
57 | conversely, here you want native (which is the default) | |
58 | hmm, file:///foo/bar is more widely used than file:/foo/bar. | |
clangd/URI.h | ||
24 | You probably also want to describe roughly the subset of URI parsing we do. e.g. | |
27 | e.g. "https" | |
29 | e.g. "//reviews.llvm.org" | |
31 | e.g. "/D41946" | |
43 | I think "the first matching scheme" is a bit confusing - I guess here "scheme" refers to URIScheme, but in practice these should be 1:1 with schemes. | |
46 | this is the "more public" API, so the semantics of CurrentFile should probably be defined here, and merely referred to below. | |
63 | Hmm - what does "users" mean :-) | |
65 | nit: I'm not sure this longer comment would really help someone use this API. | |
76 | I think "the file from which the request is issued" is overly-specified (and a little unclear). For instance, consider this workflow: So maybe llvm::StringRef HintPath- "A related path, such as the current file or working directory, which can help disambiguate when the same file exists in many workspaces". | |
80 | You should probably pass authority here too, or we're committing all schemes to ignore authority | |
82 | similarly, this should return (authority, body), not just body. | |
90 | this seems easy enough to test via uri::create rather than exposing it publicly, but up to you. | |
97 | nit: you probably don't want anything between URIScheme and this - if you keep percent*, move them below? |
clangd/URI.cpp | ||
---|---|---|
94 | Performance doesn't really matter here, but it looks like this is intended to save time by precomputing, and I don't think it's any faster than the direct version of percentEncode which might be easier to read: for (unsigned char C : Content) if (shouldEscape(C)) OS << '%' << format_hex_no_prefix(C, 2); else OS << C; where shouldEscape just does some range checks. | |
95 | 128 should be 256, right? | |
123 | most implementations seem to just treat treat the % as literal in this case, e.g. "%41%" -> "A%", which makes this function unable to fail, which simplifies parse() a bit | |
125 | alternatively, using more llvm and less std: if (*I == '%' && I + 2 < Content.end() && isHexDigit(*(I+1)) && isHexDigit(*(I+2)) { Result.push_back(hexFromNibbles(*(I+1), *(I+2))); I += 2; } else Result.push_back(*I); | |
136 | nit: i'd probably prefer to keep "uri" referring to the whole URI, and call the one you mutate something else | |
138 | consider StringRef::split which avoids some index arithmetic: pair<StringRef, StringRef> SchemeSplit = Uri.split(':'); if (SchemeSplit.second == "") return error U.Scheme = percentDecode(SchemeSplit.first); // then SchemeSplit.second contains the rest | |
149 | this is e.g. http://llvm.org | |
155 | this is e.g. file:///foo/bar, which is definitely valid! (we should have a test for this one!) | |
168 | body may be empty. (http://llvm.org again) | |
unittests/clangd/URITests.cpp | ||
115 | please include some typical examples: "file:///etc/passwd" | |
117 | the body should be "//x/y/z" here - the / that terminates the authority is part of the path. | |
143 | this is the same as the next one | |
144 | these are both valid | |
148 | this is valid, authority may be empty |
Sorry, our comments crossed...
clangd/URI.cpp | ||
---|---|---|
57 | yes. convert_to_slash(path, style) assumes that path is in style. convert_to_slash("c:\\foo.txt", windows) --> "c:/foo.txt" | |
clangd/URI.h | ||
29 | You're right! my mistake. |
- Addressed review comments.
- Check windows vs unit paths in tests.
- Properly check absolute paths.
clangd/URI.cpp | ||
---|---|---|
57 | I see. Thanks for the clarification! | |
138 | split is neat. But here we want to distinguish between http: vs http while it's not trivial wth split. Similar for / following the authority if we want to keep body '/', say, in http://llvm.org/. | |
155 | Right! Realized file:////foo/bar was weird while preparing the previous diff ;) |
LG with some nits addressed
clangd/URI.cpp | ||
---|---|---|
47 | should this be native(Path, posix)? | |
90 | nit: reserved and I don't think "for readability" is accurate. | |
138 | I don't understand what you mean by distinguishing http: vs http - "http://foo" will split into {"http", "foo"}. "httpfoo" will not split at all, and "http::foo" will split into "http" and "://foo". Agreed regarding authority/body. | |
clangd/URI.h | ||
32 | nit: prefer to omit \brief unless you want the brief comment to be something other than the first sentence. (I know we have this in lots of places - it adds noise, and I finally got around to looking it up in the style guide!) | |
82 | Still confused about this API, but let's chat offline? | |
90 | Hmm, I think if individual schemes need to do encoding/decoding, then our API is wrong :-) | |
92 | nit: unreserved | |
unittests/clangd/URITests.cpp | ||
2 | you may want a test that round-trips getVirtualTestFilePath("x") through a file URI and back again. |
Why allow an invalid state?
I think I'd lean towards more type safety here: having a few factory functions that always produce valid URIs (like fromPath) and some that return Expected<URI> (like parse), but require URI itself to be valid.
This would probably mean making it a class and having schema/path be readonly...