Details
- Reviewers
- amccarth - EricWF - curdeius - ldionne 
- Group Reviewers
- Restricted Project 
- Commits
- rG929f0bcc24e2: [libcxx] Implement parsing of root_name for paths on windows
Diff Detail
Event Timeline
This one is still unreviewed, and is more of generic libcxx code than windows specific API calls - could @curdeius have a look?
I may be repeating myself, but my biggest remark is that there are no tests. I know there's no libcxx CI on Windows, but still, that will make reviewing easier, looking what are the new test cases and what has been forgotten.
| libcxx/src/filesystem/operations.cpp | ||
|---|---|---|
| 202–205 | This looks strange. IIUC it means that you consumed a separator, and then, if TkStart != nulltptr, you consumed drive (e.g. "C:"). Apart from that, TkStart returned by consumeRootName isn't used after the if (in makeState). | |
| 207–211 | You don't check for TkStart not being null. | |
| 210 | Why RStart? You should use TkStart (+ 1?) if it's non-null. | |
| 222 | Pretty much the same remarks as above. No null check of TkStart. | |
| 338 | As below, I don't really like allowing P==nullptr. | |
| 347–359 | This function, when called with N==2 will consume e.g. //, \\, but also /x, and \x. | |
| 364 | I'm not a big fan of allowing P being nullptr. | |
| 398–402 | Nit: unnecessary braces around one-line bodies. | |
Thanks for having a look!
Yeah, that's a good point, and I'm sorry that they aren't available for viewing here. Making the tests compatible with windows requires a huge number of patches (50+), and reviewer time is very limited. I started out there, but as it was clear that they weren't getting handled at a rate where it'd be possible get them beforehand, I postponed occupying reviewer time with them. I hoped to at least get the actual functionality merged by the LLVM 12 branch date (getting reviewer time for that alone is a stretch).
Also for the tests, there's no clear policy for what way of handling the differing expected behaviours is acceptable - I tried the waters with e.g. D89943, which hasn't gotten an opinion yet. (E.g. is it ok with this kind of lots of ifdefs in the tables, or should they be split out into separate files, or what.) FWIW, that particular test update actually happens to cover mostly the things that this patch touches.
There's a bit of nuance difference between what's returned from MSVC STL and libc++ with these patches though, in a mostly pathological case though. If iterating over the path ///foo, libcxx today only returns the tokens / and foo while MSVC STL returns /// and foo. But that's more of an overall libcxx implementation issue (or intentional design?) than an issue about how to implement parsing of the windows specific features. I'd love to get to nailing down those fine details when the feature actually can be compiled and there's actual motion in getting the tests fixed for windows, but I don't want to postpone getting a generally decent implementation merged until the next release in 6 months, given the extremely scarce reviewer availability.
| libcxx/src/filesystem/operations.cpp | ||
|---|---|---|
| 202–205 | This is for the decrement function, which iterates backwards. So when parsing "c:/" backwards, this first consumes the separator. If we hit the end (which would happen for a path like "/" but not for "c:/"), it's considered a root dir. If it's not the last token, but followed by (parsing-wise, i.e. backwards) a root name, and that root name is at the start of the string, then the separator also is a root dir. When parsing forward, the use of consumeRootName is easy as we only need to try to call it at the very beginning. When parsing backwards, we need to speculatively test it at a few places, in order to know how to classify a previous separator. So here, the root name parsed by consumeRootName isn't actually consumed, i.e. we don't really use the TkStart it returns for returning parts of the string to the caller, but only for classifying the separator. | |
| 207–211 | If TkStart == REnd, it can't be null at the same time. | |
| 210 | No, this is a case of speculative use of consumeRootName backwards - if we didn't end up at REnd it wasn't actually a root name for real, and we just handle it like we did before. | |
| 222 | If TkStart == REnd, it can't be null. If TkStart returned from consumeRootName is something else than REnd, then this reduces down to exactly the same code as before (modulo renaming the variable from TkEnd to TkStart, making it consistent with the other cases in this function, as the pointer is to the start of a token, when tokenizing backwards). If consumeRootName returned REnd, then we found a root name. If we consumed a separator in SepEnd, we should return this separately as a root dir, otherwise return only the root name. I.e. possible cases are "foo/bar" (normal case), "foo/c:/bar" (consumeRootName returns a non-null TkStart, but it isn't REnd, so we proceed like we used to do before), c:/bar (we've consumed bar before, SepEnd points at, or actually one char before, the separator, and we return the separator as root dir) and c:bar (we've consumed bar and now consume the root name c:). | |
| 347–359 | The existing function consumeSeparator actually is a bit misnamed, it consumes one or more separators. This function checks that it consumed exactly N separators, otherwise we return null and don't consume anything. Should this maybe be called consumeExactlyNSeparators for clarity? (That name is a bit clumsy though.) | |
| 364 | This (and the other case) was added for making consumeNetworkRoot more elegant. I could also spell it out like this: if (P < End) {
  P = consumeSeparators(P, End, 2);
  if (P == nullptr)
    return nullptr;
  return consumeName(P, End);
} else {
  P = consumeName(P, End);
  if (P == nullptr)
    return nullptr;
  return consumeSeparators(P, End, 2);
} | |
Renamed the consumeSeparators method to consumeNSeparators and added a comment pointing out that it consumes exactly that amount of separators. Removed unnecessary braces around single lines. (The consumeSeparator method can be renamed in a separate patch afterwards, I'd avoid doing it here to keep the diff as readable as possible.)
This looks good to me, in that I believe it will correctly parse typical native Windows path names. Only tests will tell, but I understand that there are many patches necessary to enable the tests.
As for the libcxx style issues, I'll leave that to other reviewers.
I'd like to see a name change to clear up the consumeSeparators confusion, but I won't block on that.
| libcxx/src/filesystem/operations.cpp | ||
|---|---|---|
| 338 | I wouldn't object to an assertion if passing a nullptr is actually a bug, but it's not clear to me that it is. Since this is a private method, I'm not very concerned about a libcxx user making a bad call here--the risk is contained. | |
| 347–359 | Perhaps the existing function should be consumeSeparators (plural) and this one should be consumeNSeparators. Fixing the name of the existing method makes this one's implementation more obvious. | |
I'll send a separate patch for renaming that one later - but I'll hold off of it for now until the dust settles, to reduce the number of non-essential patches flooding the review queue.
This looks strange. IIUC it means that you consumed a separator, and then, if TkStart != nulltptr, you consumed drive (e.g. "C:").
Can it happen anywhere? /C:?
Apart from that, TkStart returned by consumeRootName isn't used after the if (in makeState).
Shouldn't it be used instead of SepEnd + 1 (if non null)?