https://bugs.chromium.org/p/llvm/issues/detail?id=19
rdar://87912416
Details
- Reviewers
• Quuxplusone ldionne - Group Reviewers
Restricted Project - Commits
- rG4f67a909902d: [libc++] Fix TOCTOU issue with std::filesystem::remove_all
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Further discussion is on /r/cpp: https://old.reddit.com/r/cpp/comments/s8ok0h/possible_toctou_vulnerabilities_in/hti8jyt/
(Remember I told you Niall Douglas was once working on a thing for secure filesystem? There he is talking about it!)
libcxx/include/__filesystem/directory_options.h | ||
---|---|---|
26 | Please add a trailing comma here too, so you don't have to touch this line next time. :) | |
libcxx/src/filesystem/directory_iterator.cpp | ||
200 | Surely you should check if (fd == -1) (or even if (fd < 0)) here, and bail down to line 206 if needed. | |
207 | Pre-existing: allow_eacces | |
libcxx/src/filesystem/operations.cpp | ||
1377 | Doesn't this flatten/round-trip everything back through path, which means you have the symlink vulnerability again at this level? Suppose I ask to remove_all("/tmp/foo"). The STL securely/atomically opens /tmp/foo as a directory (detecting and rejecting any attempt by me to rm -rf /tmp/foo ; ln -sf /root /tmp/foo). Then it starts iterating over that open directory. (At this point I can ln -sf /root /tmp/foo if I want, but it won't matter because the STL is already iterating over the real inode and no longer cares what's at that path in the filesystem.) I believe that a proper fix for this issue requires using [openat](https://linux.die.net/man/2/openat) at every level. As soon as the code touches fs::path, it's game over. (Bonus: fs::path does a ton of heap allocation, but with openat I suspect you never need to allocate, do you?) |
Completely new approach using parent file descriptors. I'm aware this patch as-is won't compile on Windows -- I'll fix that once I'm confident the base patch is alright. Windows is not affected by this issue anyway.
libcxx/src/filesystem/directory_iterator.cpp | ||
---|---|---|
207 | Fixed in a separate NFC commit. | |
libcxx/src/filesystem/operations.cpp | ||
1377 | Yeah, I think you're right. I don't know how to retrofit that on top of directory_iterator though. We could technically do it using recursive_directory_iterator, but it would be way more complicated. I want to get this patch landed ASAP, so I'm going to upload another approach based entirely on top of the openat, unlinkat & friends API, without using directory_iterator at all. Please take a look -- once we're confident we're solving the problem properly, we can try to figure out how to polish the rough edges after landing it. |
libcxx/src/filesystem/filesystem_common.h | ||
---|---|---|
572 ↗ | (On Diff #403052) | (non-libc++ expert) Sorry if this is a dumb question: is capture_errno() API? is it safe to have in a header? |
libcxx/src/filesystem/directory_iterator.cpp | ||
---|---|---|
77 | I wonder if it would be simpler to just move remove_all into this .cpp file. A priori (without seeing that this diff is where they came from) it's weird to see static functions in a .h file. | |
libcxx/src/filesystem/filesystem_common.h | ||
570 ↗ | (On Diff #403052) | Seems like a perfect place for if (struct dirent *dir_entry_ptr = ::readdir(dir_stream)) { ... (and swap the if/else branches). Btw, for anyone else who's confused why we aren't using ::readdir_r, apparently ::readdir is equally thread-safe on modern systems (at least, those modern systems documented by LWN ;)) https://lwn.net/Articles/696474/ We should expect that dir_entry_ptr will point to some memory located physically inside the footprint of *dir_stream (as opposed to some static buffer or something). However, I now also see that this code simply moved from directory_iterator.cpp (and have suggested that maybe remove_all should move over there instead), in which case there's no need to drive-by refactor any of this particular function. |
libcxx/src/filesystem/operations.cpp | ||
1381 | Any reason for intmax_t here but uintmax_t in the function return type? I suggest consistency, but don't care which. | |
1383 | Consider a documentary static_assert(std::is_same_v<decltype(str), std::string_view> here, because comparing auto to a string literal smells like it might be a pointer comparison, and that would suck. | |
1384 | Style: For this if-else ladder, I'd prefer either what-you've-got-minus-all-the-else-keywords, or if (ec == errc::no_such_file_or_directory) { // Not an error; `p` might have moved or been deleted already. ec.clear(); return 0; } else if (ec == errc::not_a_directory) { // Remove `p` as a normal file instead. ec.clear(); ~~~ i.e., put the commentary for each branch on the branch, and cuddle the elses as usual. | |
1385 | If anything below were actually to throw an exception, then our count would be wrong. But this simplifies the early-return cases even in the absence of exceptions, so I assume that's why you did it. | |
1394 | FWIW, this line is still racey: So, this line right here is already the state of the art, and I can't think of anything better to do than to be OK with it. Also, at first glance I can't see any way to really "exploit" this race. Either (happy path) you remove the now-empty directory you intended to; or the attacker moves that now-empty directory out of the way and puts their own thing in its place which you fail to delete (because it's not a directory, or because it's a non-empty directory); or the attacker moves that now-empty directory out of the way and puts their own empty directory in its place and you delete it (but it was an empty directory, and it wasn't a symlink (because it was a directory), so who cares). | |
1400 | It would seem simpler to move this down below the if, and get rid of the if (stream != nullptr) inside the lambda. (Also personally I'd capture [&] rather than [stream], because nothing weird is going on here.) | |
libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.remove_all/toctou.pass.cpp | ||
38–39 | Running this test deletes whatever I have in /tmp/mydir/victim_del? I suggest not committing this as-is. 😛 | |
79 | (If this is intended for commit at all) Consider adding another test for the subdirectory case I brought up in my previous review. |
libcxx/src/filesystem/directory_iterator.cpp | ||
---|---|---|
77 | I hadn't noticed they were static. I think it would be entirely fine to make them non-static and leave them in the header, but I'd rather not move remove_all into this file, it seems pretty weird to have that operation (and only that one) inside directory_iterator.cpp. Edit: After looking more at the contents of the file, there's a bunch of static functions and almost everything is defined in an anonymous namespace. Honestly, it's just weird. I'll make the move as-is in a separate commit and we can take an action item to go back and refactor this later. | |
libcxx/src/filesystem/filesystem_common.h | ||
570 ↗ | (On Diff #403052) | Right, I think I'll move the code in a separate commit to make this less confusing. |
572 ↗ | (On Diff #403052) | filesystem_common.h is only used inside the sources of libc++ used for building the shared library. But yeah, I think it would be fine regardless, it's just a helper function. |
libcxx/src/filesystem/operations.cpp | ||
1381 | Sorry, that was a typo. It should have been uintmax_t, thanks for spotting. | |
1385 | Yes, simplifying the cleanup (without worrying about exceptions) is the purpose of this class. I am assuming that nothing throws in the filesystem code because we are always using the error_code version of functions. | |
1394 | Right, I agree -- I hadn't thought about this race but I think we're both on the same page that it is benign (and also unavoidable without different OS APIs). |
Use _AIX instead of MVS to guard.
@daltenty Note that it looks like AIX does provide those POSIX functions, but the tests fail
when I start using my implementation on AIX. If someone who works on AIX has an appetite to
take a look, it might be good to enable the new (safe) implementation on AIX too.
I'd like to ship this before LLVM 14, so if folks can take a look, it would be awesome.
Also, adding vendors so they are aware of this. In particular, we don't implement this fix on AIX, Windows and MinGW . On AIX it's because somehow the remove_all test starts failing when we use the implementation based on openat(), and on MinGW/Windows it's because the OS doesn't implement the right APIs AFAICT. If people who manage these platforms want to take a look, this is the notice.
Looking at Rust's fix over https://github.com/rust-lang/rust/commit/4f0ad1c92ca08da6e8dc17838070975762f59714 seems like there is API added in Windows 10 to solve it (I don't know how effective though).
I'm not going to work on this myself but leaving the link in case somebody else wants to give it a try.
Thanks, noted. I don't believe I have the bandwidth to fix this right now before the 14.x branch early next week though, so the current form of the patch seems sensible wrt Windows I think.
The use of SetFileInformationByHandle(FileDispositionInfoEx) with FILE_DISPOSITION_FLAG_POSIX_SEMANTICS doesn't seem to be the core of the fix. The core of the fix is to rewrite directory iteration with something that operates on an open handle, instead of something given a path. In the Rust fix, this is done with GetFileInformationByHandleEx(FileIdBothDirectoryInfo) (which I would believe exists earlier).
So we could open a handle to the intended path with FILE_FLAG_OPEN_REPARSE_POINT (so it doesn't follow symlinks), then inspect whether it's a regular file or a directory, and if a directory, iterate over its contents without closing the handle and using a path name again. (Currently we use the generic directory iterators, which are built on top of FindFirstFileW/FindNextFileW.)
libcxx/src/filesystem/operations.cpp | ||
---|---|---|
49 | This header can't be included unconditionally on all OSes. |
Thanks for pinging us on this. After taking a look at the AIX test failure, and dumping the error_code we get back from the new implementation, I think this is actually due to some ambiguity in the expected errno when the combination of O_DIRECTORY and O_NOFOLLOW is used and the path is a symlink.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
O_DIRECTORY If path resolves to a non-directory file, fail and set errno to [ENOTDIR]. O_NOFOLLOW If path names a symbolic link, fail and set errno to [ELOOP]. `
See the following test program:
#include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <stdio.h> #include <errno.h> int main() { mkdir("foo", S_IRWXU); symlink("foo", "bar"); int ret=openat(AT_FDCWD, "bar", O_CLOEXEC | O_RDONLY | O_DIRECTORY | O_NOFOLLOW); if (errno==ENOTDIR) { printf("ENOTDIR\n"); } else if (errno==ELOOP) { printf("ELOOP\n"); } return 0; }
Which will it seems will give you ENOTDIR on MacOS and some Linux, but gives ELOOP on AIX (and interestingly RHEL Linux on Power).
libcxx/src/filesystem/operations.cpp | ||
---|---|---|
1471 | Seems like we need to address the too_many_symbolic_link_levels error case with O_NOFOLLOW |
Sure thing, thanks a lot for investigating. I'm re-uploading the patch with a fix and I'm enabling the fix on AIX -- let's see if CI is happy (I can't test on that platform locally).
The release branch is apparently being cut today. I'm going to ship this now because I don't want to miss the deadline with this fix, but please feel free to comment if you see potential issues/improvements and I'll implement them (and cherry-pick onto LLVM 14).
I've just hit the same case for GCC on AIX. Arguably POSIX is clear and AIX is right. For the case being discussed we have a symlink to a directory, so the path *names* a symlink, but *resolves* to a directory. So ELOOP is right.
I'll raise this with glibc.
Please add a trailing comma here too, so you don't have to touch this line next time. :)