This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Make path::absolute return paths with separators in preferred form
AbandonedPublic

Authored by mstorsjo on Mar 5 2021, 11:55 PM.

Details

Reviewers
Quuxplusone
Group Reviewers
Restricted Project
Summary

This fixes the fs.op.absolute tests with libc++. The testcase
was earlier modified in 321f696920630be1b3c93e2a8b965c624ddd646c
to match the behaviour of MS STL.

Diff Detail

Event Timeline

mstorsjo requested review of this revision.Mar 5 2021, 11:55 PM
mstorsjo created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2021, 11:55 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone requested changes to this revision.Mar 6 2021, 8:44 AM
Quuxplusone added a subscriber: Quuxplusone.
Quuxplusone added inline comments.
libcxx/src/filesystem/operations.cpp
701

Your friendly neighborhood return x expert says: Yikes! Remember that make_preferred() modifies the string in place (which is OK, and I checked that it's efficient) but then returns an lvalue reference to the path, not a copy of the path!

So the old code was RVO'ing the prvalue result of __do_absolute. The new code, on Windows and even on POSIX where make_preferred is a no-op, is doing a copy-construct from the lvalue result of make_preferred() into the return slot.

Please, at a minimum, change to use return cwd;.
If you would like to do larger surgery on __do_absolute, to make it more efficient, that would be awesome and I'd definitely take a look at it. (Right now I don't even understand why it takes cwd at all; it seems like we do that so that __canonical can throw a filesystem_error with path2=cwd, but that's not even what cppreference says should happen so I don't know why we do that. Some digging will be required by whoever takes this on.)

This revision now requires changes to proceed.Mar 6 2021, 8:44 AM
mstorsjo added inline comments.Mar 6 2021, 9:18 AM
libcxx/src/filesystem/operations.cpp
701

I'm sorry, I didn't follow entirely what you suggest to change, can you clarify? We're not returning cwd. Returning with make_preferred() is a bit obscure, agreed - the alternative way of expressing the same would be:

path ret = __do_absolute();
ret.make_preferred();
return ret;

Not sure if that's better or worse wrt optimizations, but it's more understandable.

Quuxplusone added inline comments.Mar 6 2021, 9:35 AM
libcxx/src/filesystem/operations.cpp
701

can you clarify? We're not returning cwd.

Oops, right. The thing you wrote, path ret = __do_absolute(); ret.make_preferred(); return ret;, is indeed what I meant to say.

It is strictly better w.r.t. optimizations; it will do move-construct on paper, and move-elision (RVO) in practice. The code I objected to would have done a full copy-construct.

mstorsjo updated this revision to Diff 328777.Mar 6 2021, 10:00 AM

Using a separate return value variable as requested

Quuxplusone added a subscriber: curdeius.

OK, LGTM now.
@curdeius ping! in search of a green light from the libc++ approver :)

I already wanted to hit the big green button when I have got a second thought. http://eel.is/c++draft/fs.op.absolute doesn't seem, to my reading, to mandate returning a path with preferred separators. I agree that current_path returns a path in native format, but I don't see why`absolute` should force preferred separators. A user passing a path with non-native separators to this function may well know what they are doing and not be willing to pay the cost of transforming separators to the preferred form, however slight this cost might be. What's your opinion on this?

On a different note, I think I'll have a closer look at some of the FS tests... The synopses in tests are hum, weird e.g. https://github.com/llvm/llvm-project/blob/main/libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.absolute/absolute.pass.cpp#L13.
Also, you maybe know, why do we test only non-throwing version of absolute, cf. https://github.com/llvm/llvm-project/blob/main/libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.absolute/absolute.pass.cpp#L50? I believe it's the case for some other functions too.

I already wanted to hit the big green button when I have got a second thought. http://eel.is/c++draft/fs.op.absolute doesn't seem, to my reading, to mandate returning a path with preferred separators. I agree that current_path returns a path in native format, but I don't see why`absolute` should force preferred separators. A user passing a path with non-native separators to this function may well know what they are doing and not be willing to pay the cost of transforming separators to the preferred form, however slight this cost might be. What's your opinion on this?

Yes, there's certainly a pretty big gray zone here with respect to how they should behave.

One other reference that we have is how the MS STL behaves - but it's of course not normative in any way. In this case, I had fixed up libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.absolute/absolute.pass.cpp to produce strict matches with MS STL already earlier, but if we don't add this change, we could change the testcase, e.g. like this:

--- a/libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.absolute/absolute.pass.cpp
+++ b/libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.absolute/absolute.pass.cpp
@@ -42,8 +42,8 @@ TEST_CASE(basic_test)
     } TestCases [] = {
         {"", cwd / ""},
         {"foo", cwd / "foo"},
-        {"foo/", cwd / "foo" / ""},
-        {"/already_absolute", cwd.root_path() / "already_absolute"}
+        {"foo/", cwd / "foo/"},
+        {"/already_absolute", cwd.root_name() / "/already_absolute"}
     };

That folds the forward slash into the output so that it's kept as is for those cases, and produces strict matches with our current implementation in libc++, but then would make our testcase fail when run against MS STL. (There's plenty of cases where our tests either disagree with MS STL or where our tests are overly specific anyway, so that's not a strict criterion, but having tests that pass with both libc++ and MS STL IMO.)

A third alternative would be to make the absolute.pass.cpp test use separator-agnostic comparisons (i.e. path::operator==, while it currently uses PathEq() which does a.native() == b.native()), which loosens the test a bit.

Out of these options, I felt that making the absolute function return paths with preferred separators was the most sensible choice. Although, one shouldn't be modifying the implementation just for ease of testing.

On a different note, I think I'll have a closer look at some of the FS tests... The synopses in tests are hum, weird e.g. https://github.com/llvm/llvm-project/blob/main/libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.absolute/absolute.pass.cpp#L13.

Yes, there's for sure things that can be cleaned up in these tests. I'm trying to avoid being distracted with other things for now (other than things that are requested during review), as there's still a bit to go until there's zero failing tests in my setups. :-)

Also, you maybe know, why do we test only non-throwing version of absolute, cf. https://github.com/llvm/llvm-project/blob/main/libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.absolute/absolute.pass.cpp#L50? I believe it's the case for some other functions too.

I don't think there's any intent behind it, only accidental omissions.

I would rather go for your third alternative (so using operator== instead of PathEq) to test what is asked by standard and, optionally, test libc++ specifics using LIBCPP_ONLY macro or its pairs (e.g. using PathEq on non-Windows).
I know it loosens the test a bit, but as you say some tests are overly specific.

Anyway, all that according to my understanding of http://eel.is/c++draft/fs.op.absolute and other related parts of the standard. I'll be happy to hear your and others' interpretation (if it differs).

I would rather go for your third alternative (so using operator== instead of PathEq) to test what is asked by standard and, optionally, test libc++ specifics using LIBCPP_ONLY macro or its pairs (e.g. using PathEq on non-Windows).
I know it loosens the test a bit, but as you say some tests are overly specific.

Ok, I can go with that.

Anyway, all that according to my understanding of http://eel.is/c++draft/fs.op.absolute and other related parts of the standard. I'll be happy to hear your and others' interpretation (if it differs).

Well that one does say that it may have the same semantics as GetFullPathNameW on windows, and GetFullPathNameW does return a path with all backslashes.

Indeed, but the examples in the standard are non-normative.
BTW, libstdc++ doesn't bother to call make_preferred (https://godbolt.org/z/6Eabq4).

Indeed, but the examples in the standard are non-normative.
BTW, libstdc++ doesn't bother to call make_preferred (https://godbolt.org/z/6Eabq4).

Fair enough (although libstdc++ is pretty incomplete on windows in my experience - I wouldn't use it much for a reference).

Fair enough (although libstdc++ is pretty incomplete on windows in my experience - I wouldn't use it much for a reference).

Maybe but that wasn't on Windows!

Fair enough (although libstdc++ is pretty incomplete on windows in my experience - I wouldn't use it much for a reference).

Maybe but that wasn't on Windows!

Oh, right, I missed that. Well then the example doesn't say much does it, as make_preferred is a no-op on posix platforms (where backslashes are treated as a fairly regular char part of a filename), does it?

Oh, silly me. You're totally right.
It doesn't change though that make_preferred isn't used. FWIW, https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/src/filesystem/ops.cc#L66.

Oh, silly me. You're totally right.
It doesn't change though that make_preferred isn't used. FWIW, https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/src/filesystem/ops.cc#L66.

Yeah, that's true. But as a I said, it's not much of a reference for intended behaviour on Windows, as it's kinda spotty.

So I guess the spec doesn't forbid it, but doesn't either say it should be done - so the only reason to do it would be to superficially match the MS STL behaviour. And I'd agree with your point that we shouldn't be changing actual behaviour just to superficially match things like that. So with that, D98215 sounds like a more sensible choice - if we get another reviewer to +1 that one too :-)

mstorsjo abandoned this revision.Mar 9 2021, 7:00 AM

Abandoning this one in favour of D98215.