Page MenuHomePhabricator

[9/N] [libcxx] Implement the stat function family on top of native windows APIs

Authored by mstorsjo on Nov 10 2020, 1:43 AM.



While the windows CRTs (the modern UCRT, and the legacy msvcrt.dll that mingw still often defaults to) do provide stat functions, they're a bit lacking - they only provide second precision on the modification time, lack support for symlinks and a few other details.

Instead reimplement them using a couple windows native functions, getting exactly the info we need. (Technically, the implementation within the CRT calls these functions anyway.)

If we only need a few fields, we could also do with fewer calls, as a later optimization.

Diff Detail

Event Timeline

mstorsjo created this revision.Nov 10 2020, 1:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2020, 1:43 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
mstorsjo requested review of this revision.Nov 10 2020, 1:43 AM
mstorsjo retitled this revision from [libcxx] Implement the stat function family on top of native windows APIs to [9/N] [libcxx] Implement the stat function family on top of native windows APIs.Nov 10 2020, 1:43 AM
rnk resigned from this revision.Nov 10 2020, 12:27 PM
mstorsjo updated this revision to Diff 305078.Nov 13 2020, 3:40 AM

Made the set_errno() helper take the error code as a parameter, to avoid double calls to GetLastError() if the caller already has fetched and inspected the error code.

LGTM as long as the run-time calculation of the default value for set_errno is legit.


Whoa! I thought default argument values had to be compile-time constants. I've never seen this done before. Is this a newer C++ feature?

mstorsjo added inline comments.Nov 18 2020, 1:15 PM

No idea actually, I just tried it and it worked. If it wouldn't, it'd be trivial to make a parameterless overload of the function that just calls set_errno(GetLastError()) anyway.

curdeius added inline comments.

It's nothing new actually.
You must just be aware of the place where names used in the default argument are bound (declaration) and when they are evaluated (call) so to speak.

LGTM as long as the run-time calculation of the default value for set_errno is legit.

Can you set it as formally approved from your side, to maybe ease with getting it looked at by the libc++ approvers?

mstorsjo updated this revision to Diff 312720.Dec 18 2020, 1:33 AM
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Rebased on top of stylistic changes in earlier patches, retriggering CI

ldionne requested changes to this revision.Jan 13 2021, 8:21 AM
ldionne added a subscriber: ldionne.
ldionne added inline comments.

Why are those defines necessary?


Since the approach here is to make Windows into a POSIX-y system by adding stat & friends, would it be possible to hide this complexity in some internal header like windows_posix_shims.h (or whatever the name)?

This revision now requires changes to proceed.Jan 13 2021, 8:21 AM
mstorsjo added inline comments.Jan 13 2021, 8:31 AM

Various versions of C runtimes used have fewer or more of them defined: to make things consistent across them, I undefine all existing defines and use our own.


Possibly, but there's a couple gotchas:

  1. Windows C runtimes actually do have a stat() function, but it's a bit limited (it only provides second precision for times, it doesn't expose the permissions, it doesn't provide something similar to st_dev), so we make our own available in the detail namespace here.
  1. Not all of these are exact replacements of posix functions. For symlinks (in D91143) I split the symlink function into symlink_file and symlink_dir.
  1. Like 2. above, we might want to tweak the interface of the functions outside of the posix interface. E.g. for the stat() function, it's currently implemented by a handful of API calls, to fill in all used fields. If a caller only wants some fields, we could add a flag parameter to detail::stat here, indicating exactly which fields we need populated, as a later optimization.
mstorsjo added inline comments.Jan 13 2021, 8:50 AM

I guess such a clarifying comment would help here.

mstorsjo updated this revision to Diff 316435.Jan 13 2021, 9:40 AM

Added clarifying comments in the header, describing why we're providing our own stat implementation, and that the C runtime headers provide possibly conflicting defines that we undef.

ldionne added inline comments.Jan 18 2021, 10:46 AM

Just to understand, you're talking about the runtime headers for various versions of Windows? Is this not possible to fix by requiring a recent-ish version of those headers when building libc++?


I guess my actual ask is to properly define *some* cross-platform API that we call from this file. If it's not exactly POSIX, that's fine, however that should be documented.

We're porting the library to a new platform, so my preference would be that we delineate the minimal API that needs to be implemented by the system for the rest of the filesystem library to work. I'm not attached to whether we do it in this patch or later, but I think we need to do it somewhere in this patch list (I haven't looked at all of them, so if you do that later on, please LMK).

mstorsjo added inline comments.Jan 18 2021, 11:07 AM

No, it's more involved than that. There's two major toolchain styles for Windows; MSVC (the official, nonredistributable tools) and mingw (opensource, public-domain remake of windows headers). They target a number of different C runtime versions. (On Windows, the C runtime was traditionally not a system component - the system shipped with one legacy version, MSVC toolchains ship a newer one that you bundle with your app, and on Windows 10, there's both the legacy and the modern one shipped in the system).

So practically wrt this, it's not so much about requiring a new enough version - there's two entirely different toolchains that we need to support, and this particular area (stat functions, structs and related defines/macros) is differing annoyingly between the two.

Both of them provide constants _S_IFMT up to _S_IFREG. Neither of them provides _S_IFBLK, _S_IFLNK or _S_IFSOCK. Mingw does provide S_ISDIR up to S_ISREG, MSVC doesn't. Neither of them provide S_ISBLK, S_ISLNK or S_ISSOCK.

We don't want to use the actual stat*() functions provided by the C runtimes (either version) because none of them really do what we want them to - so that's why I'd prefer to just flat out ignore the differing mess of what those toolchains provide in this context, as we provide our own implementation anyway.


Well we do that, more or less, but implemented within operations.cpp within an anonymous namespace (so the code calls detail::stat() instead of ::stat()). D91143 takes care of most of what is easily abstractable across platforms.

For functions that aren't quite as easy to abstract away (where the std::function we're implementing is the abstraction itself) I've gone with ifdefs within the individual functions, see e.g. D91170, where there's already ifdefs with two different codepaths within the function (for varying posix platforms).

mstorsjo added inline comments.Jan 18 2021, 1:35 PM

Also - I don't mind spending time on trying to factor out that namespace to a separate header for such abstractions, or something like that - but I would very much prefer to do it after this patch set. Right now the easily abstracted bits are already abstracted within the detail namespace in operations.cpp.

I'm pretty convinced that splitting out such a header and abstracting it isn't entirely trivial and would run yet at least another couple rounds in review before it's ok - and at the current rate, a couple rounds in review is a couple weeks at least, and I really don't want to block the patchset with such a refactor, this close to the LLVM 12 branch (Tuesday next week). (The patch has been up for review since Nov 10.)

This is all internal implementation detail, none of it is visible to users.

mstorsjo added inline comments.Jan 19 2021, 6:08 AM

Ok, it doesn't seem too bad in the end - I'll post a refresh of this and the rest of the patches later today/tonight...

mstorsjo updated this revision to Diff 317567.Jan 19 2021, 7:47 AM

Moved posix reimplementations to a separate header.

mstorsjo updated this revision to Diff 317568.Jan 19 2021, 7:48 AM
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Retrigger CI.

mstorsjo added inline comments.Jan 19 2021, 8:47 AM

This bit doesn't come until a following patch; I'll move that part of the comment to that patch in the next iteration.

ldionne accepted this revision.Thu, Jan 28, 3:14 PM

I like this much better, where we're hiding the details into a separate header. Yes, there's no difference in terms of code changes, but now one can read operations.cpp without worrying about a bunch of Windows details.

This revision is now accepted and ready to land.Thu, Jan 28, 3:14 PM
mstorsjo updated this revision to Diff 320064.Fri, Jan 29, 12:35 AM
mstorsjo removed rG LLVM Github Monorepo as the repository for this revision.
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Rerunning CI before pushing

This revision was landed with ongoing or failed builds.Fri, Jan 29, 3:39 AM
This revision was automatically updated to reflect the committed changes.