This is an archive of the discontinued LLVM Phabricator instance.

[20/N] [libcxx] Implement parsing of root_name for paths on windows
ClosedPublic

Authored by mstorsjo on Nov 10 2020, 8:44 AM.

Diff Detail

Event Timeline

mstorsjo created this revision.Nov 10 2020, 8:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2020, 8:44 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
mstorsjo updated this revision to Diff 308894.Dec 2 2020, 12:22 AM

Updated after moving one bit of this patch into D91135.

mstorsjo updated this revision to Diff 310798.Dec 10 2020, 1:31 AM
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Removed an unnecessary static within an anonymous namespace.

Ping, this one is still entirely unreviewed - @curdeius @amccarth?

As a verbal description, this parses out the "c:" or "\\host" bit from the start of a path as the "root name" part, both when parsing/iterating forwards and backwards over the path.

This one is still unreviewed, and is more of generic libcxx code than windows specific API calls - could @curdeius have a look?

I may be repeating myself, but my biggest remark is that there are no tests. I know there's no libcxx CI on Windows, but still, that will make reviewing easier, looking what are the new test cases and what has been forgotten.

libcxx/src/filesystem/operations.cpp
202–205

This looks strange. IIUC it means that you consumed a separator, and then, if TkStart != nulltptr, you consumed drive (e.g. "C:").
Can it happen anywhere? /C:?

Apart from that, TkStart returned by consumeRootName isn't used after the if (in makeState).
Shouldn't it be used instead of SepEnd + 1 (if non null)?

207–211

You don't check for TkStart not being null.

210

Why RStart? You should use TkStart (+ 1?) if it's non-null.

222

Pretty much the same remarks as above. No null check of TkStart.
Then, SepEnd is used instead of TkStart.

338

As below, I don't really like allowing P==nullptr.
Shouldn't it be an assertion and callers should ensure it's non-null?

347–359

This function, when called with N==2 will consume e.g. //, \\, but also /x, and \x.
Is it intended?
I was rather expecting a loop calling consumeSeparator up to N times.

363

I'm not a big fan of allowing P being nullptr.

397–401

Nit: unnecessary braces around one-line bodies.

I may be repeating myself, but my biggest remark is that there are no tests. I know there's no libcxx CI on Windows, but still, that will make reviewing easier, looking what are the new test cases and what has been forgotten.

Thanks for having a look!

Yeah, that's a good point, and I'm sorry that they aren't available for viewing here. Making the tests compatible with windows requires a huge number of patches (50+), and reviewer time is very limited. I started out there, but as it was clear that they weren't getting handled at a rate where it'd be possible get them beforehand, I postponed occupying reviewer time with them. I hoped to at least get the actual functionality merged by the LLVM 12 branch date (getting reviewer time for that alone is a stretch).

Also for the tests, there's no clear policy for what way of handling the differing expected behaviours is acceptable - I tried the waters with e.g. D89943, which hasn't gotten an opinion yet. (E.g. is it ok with this kind of lots of ifdefs in the tables, or should they be split out into separate files, or what.) FWIW, that particular test update actually happens to cover mostly the things that this patch touches.

There's a bit of nuance difference between what's returned from MSVC STL and libc++ with these patches though, in a mostly pathological case though. If iterating over the path ///foo, libcxx today only returns the tokens / and foo while MSVC STL returns /// and foo. But that's more of an overall libcxx implementation issue (or intentional design?) than an issue about how to implement parsing of the windows specific features. I'd love to get to nailing down those fine details when the feature actually can be compiled and there's actual motion in getting the tests fixed for windows, but I don't want to postpone getting a generally decent implementation merged until the next release in 6 months, given the extremely scarce reviewer availability.

libcxx/src/filesystem/operations.cpp
202–205

This is for the decrement function, which iterates backwards. So when parsing "c:/" backwards, this first consumes the separator. If we hit the end (which would happen for a path like "/" but not for "c:/"), it's considered a root dir. If it's not the last token, but followed by (parsing-wise, i.e. backwards) a root name, and that root name is at the start of the string, then the separator also is a root dir.

When parsing forward, the use of consumeRootName is easy as we only need to try to call it at the very beginning. When parsing backwards, we need to speculatively test it at a few places, in order to know how to classify a previous separator.

So here, the root name parsed by consumeRootName isn't actually consumed, i.e. we don't really use the TkStart it returns for returning parts of the string to the caller, but only for classifying the separator.

207–211

If TkStart == REnd, it can't be null at the same time.

210

No, this is a case of speculative use of consumeRootName backwards - if we didn't end up at REnd it wasn't actually a root name for real, and we just handle it like we did before.

222

If TkStart == REnd, it can't be null.

If TkStart returned from consumeRootName is something else than REnd, then this reduces down to exactly the same code as before (modulo renaming the variable from TkEnd to TkStart, making it consistent with the other cases in this function, as the pointer is to the start of a token, when tokenizing backwards).

If consumeRootName returned REnd, then we found a root name. If we consumed a separator in SepEnd, we should return this separately as a root dir, otherwise return only the root name.

I.e. possible cases are "foo/bar" (normal case), "foo/c:/bar" (consumeRootName returns a non-null TkStart, but it isn't REnd, so we proceed like we used to do before), c:/bar (we've consumed bar before, SepEnd points at, or actually one char before, the separator, and we return the separator as root dir) and c:bar (we've consumed bar and now consume the root name c:).

347–359

The existing function consumeSeparator actually is a bit misnamed, it consumes one or more separators. This function checks that it consumed exactly N separators, otherwise we return null and don't consume anything.

Should this maybe be called consumeExactlyNSeparators for clarity? (That name is a bit clumsy though.)

363

This (and the other case) was added for making consumeNetworkRoot more elegant. I could also spell it out like this:

if (P < End) {
  P = consumeSeparators(P, End, 2);
  if (P == nullptr)
    return nullptr;
  return consumeName(P, End);
} else {
  P = consumeName(P, End);
  if (P == nullptr)
    return nullptr;
  return consumeSeparators(P, End, 2);
}
curdeius added inline comments.Jan 8 2021, 1:44 PM
libcxx/src/filesystem/operations.cpp
202–205

I have indeed forgotten about iterating backwards. Thank you for the explanation.

347–359

Or just consumeNSeparators.
And then rename consumeSeparator to sth like consumeAllSeparators.
Maybe?

363

I see. That ain't pretty either.

mstorsjo removed rG LLVM Github Monorepo as the repository for this revision.Jan 8 2021, 1:52 PM
mstorsjo updated this revision to Diff 315506.Jan 8 2021, 1:55 PM

Renamed the consumeSeparators method to consumeNSeparators and added a comment pointing out that it consumes exactly that amount of separators. Removed unnecessary braces around single lines. (The consumeSeparator method can be renamed in a separate patch afterwards, I'd avoid doing it here to keep the diff as readable as possible.)

mstorsjo updated this revision to Diff 315507.Jan 8 2021, 1:56 PM
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Readd the repo, for CI...

curdeius accepted this revision.Jan 8 2021, 2:00 PM

LGTM. I'm looking forward to seeing all your 50+ patches landed :).

This looks good to me, in that I believe it will correctly parse typical native Windows path names. Only tests will tell, but I understand that there are many patches necessary to enable the tests.

As for the libcxx style issues, I'll leave that to other reviewers.

I'd like to see a name change to clear up the consumeSeparators confusion, but I won't block on that.

libcxx/src/filesystem/operations.cpp
338

I wouldn't object to an assertion if passing a nullptr is actually a bug, but it's not clear to me that it is. Since this is a private method, I'm not very concerned about a libcxx user making a bad call here--the risk is contained.

347–359

Perhaps the existing function should be consumeSeparators (plural) and this one should be consumeNSeparators. Fixing the name of the existing method makes this one's implementation more obvious.

I'd like to see a name change to clear up the consumeSeparators confusion, but I won't block on that.

I'll send a separate patch for renaming that one later - but I'll hold off of it for now until the dust settles, to reduce the number of non-essential patches flooding the review queue.

amccarth accepted this revision.Jan 11 2021, 2:00 PM
mstorsjo updated this revision to Diff 320109.Jan 29 2021, 5:10 AM
mstorsjo removed rG LLVM Github Monorepo as the repository for this revision.
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Rerunning CI.

mstorsjo updated this revision to Diff 324204.Feb 16 2021, 11:45 PM

Retrigger CI

mstorsjo updated this revision to Diff 324205.Feb 16 2021, 11:45 PM
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Readd the forgotten repo field

ldionne accepted this revision.Feb 17 2021, 5:32 AM
This revision is now accepted and ready to land.Feb 17 2021, 5:32 AM
This revision was landed with ongoing or failed builds.Feb 17 2021, 5:48 AM
This revision was automatically updated to reflect the committed changes.