This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] [test] Account for differences in a trailing slash in weakly_canonical
ClosedPublic

Authored by mstorsjo on Mar 15 2021, 10:50 AM.

Details

Summary

This seems to be a "documented" quirk in libc++'s implementation of
weakly_canonical (in a fixme comment in the weakly_canonical test).
Together with a difference between windows and posix regarding whether
paths can go through nonexistent dirs, this results in a difference in
a trailing slash. Just document this as expected.

Diff Detail

Event Timeline

mstorsjo requested review of this revision.Mar 15 2021, 10:50 AM
mstorsjo created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2021, 10:50 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Wouldn't it be judicious to try to fix the underlying problem (as noted in the fixme comment)?
It seems to be of a similar nature as http://wg21.link/LWG3096, i.e. the fact whether (a part of) the path exists or not shouldn't matter.
WDYT?

Wouldn't it be judicious to try to fix the underlying problem (as noted in the fixme comment)?
It seems to be of a similar nature as http://wg21.link/LWG3096, i.e. the fact whether (a part of) the path exists or not shouldn't matter.
WDYT?

Well, maybe, but I'm afraid that might open a potentially big can of worms... Also, for reference, both MS STL and libstdc++ seem to have the same quirk regarding weakly_canonical sometimes returning paths with a trailing slash and sometimes not.

Wouldn't it be judicious to try to fix the underlying problem (as noted in the fixme comment)?
It seems to be of a similar nature as http://wg21.link/LWG3096, i.e. the fact whether (a part of) the path exists or not shouldn't matter.
WDYT?

Well, maybe, but I'm afraid that might open a potentially big can of worms... Also, for reference, both MS STL and libstdc++ seem to have the same quirk regarding weakly_canonical sometimes returning paths with a trailing slash and sometimes not.

In that case I think it would be better to remove the FIXME and document the quirk including the fact that this matches the behaviour of MS STL and libstdc++.
WDYT?

mstorsjo updated this revision to Diff 331070.Mar 16 2021, 12:46 PM

Degraded the fixme comment to a mere "note".

Mordante accepted this revision as: Mordante.Mar 17 2021, 9:48 AM

I like the new comments!

@curdeius WDYT about @Mordante 's suggestion to just degrade the fixme to a note, as the other implementations seem to have the exact same quirk?

curdeius accepted this revision.Mar 19 2021, 8:49 AM

LGTM.

I must admit that I went down the rabbit hole a bit... But that's a different story and one might argue that the current behaviour is OK.

TL;DR:
The problem is in part in __weakly_canonical that should check whether the input path has (and so the output needs) a trailing separator.
The other part is in __canonical that calls ::realpath and this function, in turn, removes the trailing separator.

http://eel.is/c++draft/fs.op.weakly.canonical#2 says:

Postconditions: The returned path is in normal form ([fs.path.generic]).

http://eel.is/c++draft/fs.op.weakly.canonical#5:

Remarks: Implementations should avoid unnecessary normalization such as when canonical has already been called on the entirety of p.

So, I imply that canonical() should return the path in normal form.

But then, http://eel.is/c++draft/fs.op.canonical#1 says:

Effects: Converts p to an absolute path that has no symbolic link, dot, or dot-dot elements in its pathname in the generic format.

First of all, this phrase is ambiguous to me because it can be understood as either of:

  1. "in the generic format" concerns elements in its pathname: There is no such element of the path p that would be in the generic format.
  2. "in the generic format" concerns an absolute path: Converts p to an absolute path (...) in the generic format. So the result is in the generic format.

To be consistent with weakly_canonical, the interpretation should be 1.

Also, the normalization described in http://eel.is/c++draft/fs.path.generic#6 (as done by lexically_normal) does keep trailing separators (apart from the case where the path ends with "../" where it gets removed, cf. http://eel.is/c++draft/fs.path.generic#6.7).
Hence I would expect that weakly_canonical (and canonical) return normalized paths and keep trailing separators in the same way as lexically_normal does.
I see no reason for a distinction between existing and non-existing paths through the trailing separator.

So, the current tests for weakly_canonical seem wrong because they remove the trailing separator, specifically:

{static_env.SymlinkToDir / "dir2/.", static_env.Dir / "dir2"},
/// FIXME....
{static_env.SymlinkToDir / "dir2/./", static_env.Dir / "dir2"},

should be

{static_env.SymlinkToDir / "dir2/.", static_env.Dir / "dir2/"},
{static_env.SymlinkToDir / "dir2/./", static_env.Dir / "dir2/"},

I.e. having "path/." or "path/./" should not remove the separator after path.

The problem is in part in __weakly_canonical that should check whether the input path has (and so the output needs) a trailing separator.
The other part is in __canonical that calls ::realpath and this function, in turn, removes the trailing separator.

This revision is now accepted and ready to land.Mar 19 2021, 8:49 AM

I must admit that I went down the rabbit hole a bit... But that's a different story and one might argue that the current behaviour is OK.

Thanks for doing the rabbit hole digging! That reinforces my initial feeling that I wouldn't want to go messing around in that unless strictly needed...