This is an archive of the discontinued LLVM Phabricator instance.

[Support/Path] Add path::is_absolute_gnu
ClosedPublic

Authored by tinti on Sep 14 2020, 7:25 PM.

Details

Summary

[Support/Path] Add path::is_absolute_gnu

Implements IS_ABSOLUTE_PATH from GNU tools.

C++17 is_absolute behavior is different the from the behavior defined by GNU
tools.

According to cppreference.com, C++17 states: "An absolute path is a path
that unambiguously identifies the location of a file without reference
to an additional starting location."

In other words, the rules are:

  1. POSIX style paths with nonempty root directory are absolute.
  2. Windows style paths with nonempty root name and root directory are absolute.
  3. No other paths are absolute.

GNU rules are:

  1. Paths starting with a path separator are absolute.
  2. Windows style paths are also absolute if they start with a character followed by ':'.
  3. No other paths are absolute.

On Windows style the path "C:\Users\Default" has "C:" as root name and "\"
as root directory.

Hence "C:" on Windows is absolute under GNU rules and not absolute under
C++17 because it has no root directory. Likewise "/" and "\" on Windows are
absolute under GNU and are not absolute under C++17 due to empty root name.

Related to PR46368.

Diff Detail

Event Timeline

tinti created this revision.Sep 14 2020, 7:25 PM
tinti requested review of this revision.Sep 14 2020, 7:25 PM

On Windows, paths starting with drive letter pattern are absolute.

This isn't a difference is it? That's also the behaviour for is_absolute, I believe?

llvm/include/llvm/Support/Path.h
474–477
479

You need to explain the purpose of the style parameter too. I don't know why this is missing from the other doc comments around here.

llvm/lib/Support/Path.cpp
708

I'd put this function adjacent to is_absolute, like it is in the header.

Also, I think you can use the same naming style for variables as in is_absolute (i.e. lowerCameCase).

Rather than implement this whole function from scratch, maybe you could just do the bit that's different from is_absolute first, and then just call is_absolute?

rengolin added inline comments.Sep 15 2020, 3:35 AM
llvm/lib/Support/Path.cpp
722

Neither 7: nor $: are valid paths on Windows, I think.

Perhaps std::isalpha(P[0]) would help.

tinti added a comment.Sep 15 2020, 6:52 PM
llvm/include/llvm/Support/Path.h
474–477

Ok.

479

The style will allow one to behave like Linux running o Windows or vice-versa.

Yes. They are missing indeed. That is why I didn't even thought in putting them.

llvm/lib/Support/Path.cpp
708

I'd put this function adjacent to is_absolute, like it is in the header.

Ok.

Also, I think you can use the same naming style for variables as in is_absolute (i.e. lowerCameCase).

Ok. Sorry clang-tidy.

Rather than implement this whole function from scratch, maybe you could just do the bit that's different from is_absolute first, and then just call is_absolute?

I tried. And I intentionally avoided it. My original implementation called is_absolute at the end.
Now I don't remember if it fails in one of my tests (which I removed because I tested with the data from TEST(Support, Path)) or if it is not needed.

My argument is that It needs to be as readable as possible. I tried to red is_absolute and may:

  • call has_root_directory()
    • call root_directory()
      • create an iterator()
      • call is_separator()
  • call has_root_name()
    • call root_name()
      • create an iterator()
      • call is_separator()

I think is too much complexity to one thing that must be false. After reading the code from GNU it is more clear. The #ifdef only sets or unsets dos_based.

See below:

#define IS_DIR_SEPARATOR_1(dos_based, c)				\
  (((c) == '/')								\
   || (((c) == '\\') && (dos_based)))

#define HAS_DRIVE_SPEC_1(dos_based, f)			\
  ((f)[0] && ((f)[1] == ':') && (dos_based))

#define IS_ABSOLUTE_PATH_1(dos_based, f)		 \
  (IS_DIR_SEPARATOR_1 (dos_based, (f)[0])		 \
   || HAS_DRIVE_SPEC_1 (dos_based, f))

Comparing with my rules:

  1. Paths starting with '/';
#define IS_DIR_SEPARATOR_1(dos_based, c)				\
  (((c) == '/')								\
   || (((c) == '\\') && (dos_based)))
  1. On Windows, paths starting with '\\';
#define IS_DIR_SEPARATOR_1(dos_based, c)				\
  (((c) == '/')								\
   || (((c) == '\\') && (dos_based)))
  1. On Windows, paths starting with a drive letter pattern.
#define HAS_DRIVE_SPEC_1(dos_based, f)			\
  ((f)[0] && ((f)[1] == ':') && (dos_based))

#define IS_ABSOLUTE_PATH_1(dos_based, f)		 \
  (IS_DIR_SEPARATOR_1 (dos_based, (f)[0])		 \
   || HAS_DRIVE_SPEC_1 (dos_based, f))

Any other condition must be false. I don't thin is worth to call is_absolute.

713

I wanted to even avoid calling is_separator.

What do you think?

718

/ has already been tested. There is no need to repeat here.

722

Neither 7: nor $: are valid paths on Windows, I think.

Perhaps std::isalpha(P[0]) would help.

I thought it too but the code on GNU does not check isalpha.

#define HAS_DRIVE_SPEC_1(dos_based, f)			\
  ((f)[0] && ((f)[1] == ':') && (dos_based))

The Wikipedia also points an extension [1].

If access to more filesystems than Z: is required under Windows NT, Volume Mount Points must be used.[11] However, it is possible to mount non-letter drives, such as 1:, 2:, or !: using the command line SUBST utility in Windows XP or later (i.e. SUBST 1: C:\TEMP), but it is not officially supported and may break programs that assume that all drives are letters A: to Z:.

There is code on LLDB that does the way you suggested [2].

[1] https://en.wikipedia.org/wiki/Drive_letter_assignment#Common_assignments
[2] https://github.com/llvm/llvm-project/blob/bc9b6b33a0d5760370e72ae06c520c25aa8d61c6/lldb/source/Utility/FileSpec.cpp#L315

tinti added a comment.Sep 15 2020, 7:14 PM

On Windows, paths starting with drive letter pattern are absolute.

This isn't a difference is it? That's also the behaviour for is_absolute, I believe?

I forgot this comment.

is_absolute is really hard to read. But I think you are right.

I think it considers c:\ as absolute but does not c: [1].

[1] https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/Path.cpp#L69

On Windows, paths starting with drive letter pattern are absolute.

This isn't a difference is it? That's also the behaviour for is_absolute, I believe?

I forgot this comment.

is_absolute is really hard to read. But I think you are right.

I think it considers c:\ as absolute but does not c: [1].

[1] https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/Path.cpp#L69

That would make sense because c: is not a valid path on Windows. For example, if you try to do cd c: or dir c: in a command prompt, you just end up using the current directory, wherever you are. If you are deliberately accepting <drive letter>:, then you need to say "On Windows, paths that are just are drive letter pattern are absolute", since the "starting with" includes "C:\foo" for example, which is the same for is_absolute.

I might actually be inclined to change the description to say simply that it follow's GNU's behaviour, and not focus on the differences, but rather state the GNU behaviour:

  1. Paths starting with a path separator are absolute.
  2. Windows style paths are also absolute if they start with a character followed by ':'.
  3. No other paths are absolute.

I'd also put this description in the function comment in the header.

llvm/lib/Support/Path.cpp
708

I've been giving this some thought, and looked at the code. I think it is important that we exactly match GNU's behaviour here, since we are specifically saying this function is intended to be GNU compatible. Consequently, I don't think any more that we should call is_absolute at all from this function, since that potentially has different behaviour and might accept things we don't want it to. Actually, in the current situation I think it is a strict subset of the behaviour we want, but I can't guarantee that going forward.

713

If we change this to be is_separator(P.front(), Style), won't that mean we don't need the second check below? I feel very confident that we don't need to worry that is_separator's behaviour will change.

722

We should allow what GNU allows, even if it doesn't make much sense. It's unlikely to matter that much, and the code is simpler this way.

llvm/unittests/Support/Path.cpp
95

You also need the following cases, I think:

/foo    // paths don't just have to be the directory separator
\\foo   // ditto (windows only)
c:\     // ditto (for drive letters)
!:      // drive char can be a non-letter
c       // drive char must be followed by ':'
:       // ':' without drive char is not a drive identifier
xx:     // drive identifier only accepted for single character
c:abc\\ // drive identifier without directory separator immediately following
97
tinti added inline comments.Sep 16 2020, 2:30 AM
llvm/lib/Support/Path.cpp
713

If we change this to be is_separator(P.front(), Style), won't that mean we don't need the second check below? I feel very confident that we don't need to worry that is_separator's behaviour will change.

Yes. It looks so.

rengolin added inline comments.Sep 16 2020, 3:54 AM
llvm/lib/Support/Path.cpp
722

@tinti: Why, Windows? Why?!

@jhenderson: good point.

tinti updated this revision to Diff 292747.Sep 18 2020, 3:24 AM
tinti edited the summary of this revision. (Show Details)

Add more tests;
Simplify is_gnu_style_absolute;
Reword comments;
Reword commit message.

jhenderson added inline comments.Sep 21 2020, 12:39 AM
llvm/include/llvm/Support/Path.h
475

It might be worth adding to the is_absolute comment expanding the LLVM rules, so that people don't read this and go "okay, what are the differences?"

478–479

I think this comment can be more helpful, as suggested in the inline edit. NB, I can't remember doxygen syntax for references to other parameters, so please adjust accordingly.

llvm/lib/Support/Path.cpp
690
llvm/unittests/Support/Path.cpp
90

It's probably simpler to just make this an array, and use universal initialization syntax, i.e. something like:

std::tuple<StringRef, bool, bool> Paths[] {
  { "", false, false },
  { "/", true, true },
  ...
};

N.B: I haven't checked the syntax to ensure I've got this 100% correct.

95

You missed the : only case.

tinti updated this revision to Diff 293132.Sep 21 2020, 4:14 AM
tinti marked 7 inline comments as done.
tinti retitled this revision from [Support/Path] Add path::is_gnu_style_absolute to [Support/Path] Add path::is_gnu_absolute.
tinti edited the summary of this revision. (Show Details)
  • Drop the "_style" in the name of the function.
  • Add ":" test.
  • Add comment about LLVM is_absolute.
  • Add Vector constructor.

Now I think it is better to call it "is_gnu_absolute". The "style" is already used for style::{posix,windows,native}.
After reading the comments it was confusing. It might sound like we were introducing a new style::gnu which is not the case.

I can revert back if you don't agree. Please let me know.

  • Drop the "_style" in the name of the function.
  • Add ":" test.
  • Add comment about LLVM is_absolute.
  • Add Vector constructor.

Now I think it is better to call it "is_gnu_absolute". The "style" is already used for style::{posix,windows,native}.
After reading the comments it was confusing. It might sound like we were introducing a new style::gnu which is not the case.

I can revert back if you don't agree. Please let me know.

I'd go with is_absolute_gnu instead. I think it reads a little better, personally, if you are dropping "style" (I don't mind either way on whether "style" is present in the name).

llvm/include/llvm/Support/Path.h
467
475

As mentioned in my previous comment, I think you should move the comment to the is_absolute function. I'd also remove reference to LLVM, since this applies to the standard C++17 is_absolute function too, I believe.

490
llvm/unittests/Support/Path.cpp
90

Why are you still using SmallVector here? I suggested an array before, and I don't see any benefit to the SmallVector (we don't need vector operations for this test).

tinti marked an inline comment as done.Sep 21 2020, 1:31 PM

I have done several rewords.

I think it is better now. I hope I haven't missed more suggestions.

llvm/include/llvm/Support/Path.h
475

I missed the move part.

llvm/unittests/Support/Path.cpp
90

My bad. I missed this one too.

tinti updated this revision to Diff 293241.Sep 21 2020, 1:39 PM
tinti retitled this revision from [Support/Path] Add path::is_gnu_absolute to [Support/Path] Add path::is_absolute_gnu.
tinti edited the summary of this revision. (Show Details)

Remove SmallVector.
Move comment to is_absolute.
Reword documentation.
Add references.

A couple of remaining comment suggestions, otherwise I think this looks good.

llvm/include/llvm/Support/Path.h
454

It might be worth adding the suggested quote, taken from https://en.cppreference.com/w/cpp/filesystem/path/is_absrel prior to the list.

llvm/lib/Support/Path.cpp
690

You can delete "on GNU", since it's implied (we're in is_absolute_gnu already).

tinti updated this revision to Diff 293693.Sep 23 2020, 4:25 AM
tinti marked 2 inline comments as done.
tinti edited the summary of this revision. (Show Details)

Add reference to cppreference.com.
Improve comments.

tinti edited the summary of this revision. (Show Details)Sep 23 2020, 4:25 AM
tinti added inline comments.
llvm/include/llvm/Support/Path.h
454

Ok.

Useful:

Notes: The path "/" is absolute on a POSIX OS, but is relative on Windows.
llvm/lib/Support/Path.cpp
690

Ok. I agree.

I tried to avoid make affirmations that would not be true depending on C++17 or GNU. In such cases I tried to add context.

jhenderson accepted this revision.Sep 23 2020, 5:53 AM

LGTM, thanks for working on this.

This revision is now accepted and ready to land.Sep 23 2020, 5:53 AM
This revision was automatically updated to reflect the committed changes.
tinti added a comment.Sep 24 2020, 3:40 AM

This patch added an issue in some builds:

06:25:37 ../unittests/Support/Path.cpp:91:7: error: chosen constructor is explicit in copy-initialization
06:25:37       {"", false, false},  {"/", true, true},      {"/foo", true, true},
06:25:37       ^~~~~~~~~~~~~~~~~~
06:25:37 /proj/flexasic/app/llvm/8.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
06:25:37         constexpr tuple(_UElements&&... __elements)
06:25:37                   ^

See [1].

The fix is to use std::make_tuple.

Fixed by Mikael [2].

[1] http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/45180
[2] https://github.com/llvm/llvm-project/commit/a1217620a87f66616c15e869d56783ba18e51b12