Port libFuzzer's fork mode to Windows.
Implement Windows versions of MkDir, RmDir, and IterateDirRecursive to do this.
Don't print error messages under new normal uses of FileSize (on a non-existent file).
Implement portable way of piping output to /dev/null.
Fix test for Windows and comment fork-sigusr.test on why it won't be ported to Win.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 28552 Build 28551: arc lint + arc unit
Event Timeline
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp | ||
---|---|---|
82–83 | I mentioned this elsewhere, but reparse point does not imply that the target of the reparse point is a file. It could be a directory junction, or something else entirely. Here's all the known reparse tags: #define IO_REPARSE_TAG_MOUNT_POINT (0xA0000003L) #define IO_REPARSE_TAG_HSM (0xC0000004L) #define IO_REPARSE_TAG_HSM2 (0x80000006L) #define IO_REPARSE_TAG_SIS (0x80000007L) #define IO_REPARSE_TAG_WIM (0x80000008L) #define IO_REPARSE_TAG_CSV (0x80000009L) #define IO_REPARSE_TAG_DFS (0x8000000AL) #define IO_REPARSE_TAG_SYMLINK (0xA000000CL) #define IO_REPARSE_TAG_DFSR (0x80000012L) #define IO_REPARSE_TAG_DEDUP (0x80000013L) #define IO_REPARSE_TAG_NFS (0x80000014L) #define IO_REPARSE_TAG_FILE_PLACEHOLDER (0x80000015L) #define IO_REPARSE_TAG_WOF (0x80000017L) #define IO_REPARSE_TAG_WCI (0x80000018L) #define IO_REPARSE_TAG_WCI_1 (0x90001018L) #define IO_REPARSE_TAG_GLOBAL_REPARSE (0xA0000019L) #define IO_REPARSE_TAG_CLOUD (0x9000001AL) #define IO_REPARSE_TAG_CLOUD_1 (0x9000101AL) #define IO_REPARSE_TAG_CLOUD_2 (0x9000201AL) #define IO_REPARSE_TAG_CLOUD_3 (0x9000301AL) #define IO_REPARSE_TAG_CLOUD_4 (0x9000401AL) #define IO_REPARSE_TAG_CLOUD_5 (0x9000501AL) #define IO_REPARSE_TAG_CLOUD_6 (0x9000601AL) #define IO_REPARSE_TAG_CLOUD_7 (0x9000701AL) #define IO_REPARSE_TAG_CLOUD_8 (0x9000801AL) #define IO_REPARSE_TAG_CLOUD_9 (0x9000901AL) #define IO_REPARSE_TAG_CLOUD_A (0x9000A01AL) #define IO_REPARSE_TAG_CLOUD_B (0x9000B01AL) #define IO_REPARSE_TAG_CLOUD_C (0x9000C01AL) #define IO_REPARSE_TAG_CLOUD_D (0x9000D01AL) #define IO_REPARSE_TAG_CLOUD_E (0x9000E01AL) #define IO_REPARSE_TAG_CLOUD_F (0x9000F01AL) #define IO_REPARSE_TAG_CLOUD_MASK (0x0000F000L) #define IO_REPARSE_TAG_APPEXECLINK (0x8000001BL) #define IO_REPARSE_TAG_PROJFS (0x9000001CL) #define IO_REPARSE_TAG_STORAGE_SYNC (0x8000001EL) #define IO_REPARSE_TAG_WCI_TOMBSTONE (0xA000001FL) #define IO_REPARSE_TAG_UNHANDLED (0x80000020L) #define IO_REPARSE_TAG_ONEDRIVE (0x80000021L) #define IO_REPARSE_TAG_PROJFS_TOMBSTONE (0xA0000022L) #define IO_REPARSE_TAG_AF_UNIX (0x80000023L) | |
189 | Note that this will not follow directory symlinks or junctions. Is that the desired behavior? | |
190 | This doesn't look correct to me. You can have a reparse point to something that itself isn't a file. If it's a link, I think you need to check the reparse tag in FindInfo.dwReserved0 as described here |
Test change LGTM.
I don't know much about Windows, so will rely on Zach's os someone else's review.
Thanks!
This patch wasn't quite ready for review but it looks like the comments were very useful. Thanks everyone!
Zach: I'll try to address your comments about links soon.
For the record, I'm also fixing an issue with /dev/null being used on Win even though it doesn't work (I'm replacing it with NUL).
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp | ||
---|---|---|
190 | Another, perhaps less dirty way of handling the IsLink() case is to call CreateFile() on it with FILE_FLAG_BACKUP_SEMANTICS, which will cause it to follow symlinks, then call GetFileInformationByHandle. That's actually probably the safest way, because I think you can have a reparse point which reparses to another reparse point, and rather than deal with all that stuff, it seems better to just let Windows follow the link for you, and then ask it what it opened. | |
190 | Clarity edit: FILE_FLAG_BACKUP_SEMANTICS will jsut cause it to not fail if it's a directory. It follows symlinks by default (unless you specify FILE_FLAG_OPEN_REPARSE_POINT |
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp | ||
---|---|---|
82–83 | See my comment below, I think I need to do this to handle the edge-case example. Separately, I'm wondering if I should also handle other attributes such as FILE_ATTRIBUTE_DEVICE, FILE_ATTRIBUTE_OFFLINE, FILE_ATTRIBUTE_VIRTUAL and FILE_ATTRIBUTE_SYSTEM since these don't seem like things I want libFuzzer to treat as files. | |
189 | I just tested my code. I think it does handle symlinks and junctions correctly at least if these are to directories (I guess they have FILE_ATTRIBUTE_DIRECTORY set when they are directories) Here's what I did to test. Using an Administrator cmd.exe: mkdir test-dir cd test-dir echo "contents" > file.txt mkdir dir :: symlink mklink file-sym-link file.txt :: hard link mklink /H file-hardlink1 file.txt :: hard link to symlink mklink /H file-hardlink2 file-symlink :: symlink to dir mklink /D dir-symlink dir :: junction to dir mklink /J dir-junction dir mklink dir\file-symlink2 file.txt :: Breaks but I haven't found anything that handles this. mklink edge-case dir Then when I pass test-dir to IterateDirRecursive with print statements for DirPostCallback and DirPreCallback I get: File: <PATH>\test-dir\dir\file-symlink2 Dir: <PATH>\test-dir\dir File: <PATH>\test-dir\dir-junction\file-symlink2 Dir: <PATH>\test-dir\dir-junction File: <PATH>\test-dir\dir-symlink\file-symlink2 Dir: <PATH>\test-dir\dir-symlink File: <PATH>\test-dir\edge-case File: <PATH>\test-dir\file-hardlink1 File: <PATH>\test-dir\file-hardlink2 File: <PATH>\test-dir\file-symlink File: <PATH>\test-dir\file.txt Dir: <PATH>\test-dir Other than "File" printing for edge-case these results seem correct to me. |
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp | ||
---|---|---|
190 | Sorry I didn't see this round of comments before I posted mine. Some context behind why I wrote things the way they are now: I'm trying to avoid calling the Windows File APIs more than necessary because (I think) they are too slow to be used like their posix counterparts. I found (thanks to a bug report from a user) that when running libFuzzer on a large corpus, libFuzzer could spend ~50% of the time in ReadDirToVectorOfUnits and functions it calls, such as ListFilesInDirRecursive, GetEpoch and CreateFile (this seemed to crop up when using -jobs=N). Of course, if there's a correctness problem with the current approach, I can use the higher level API functions, but I wanted to see what you think of my most recent results before switching. Maybe if we call CreateFile only on possible links, then perf won't be negatively impacted in 99% of cases since typically we will only be dealing with actual files. |
I'm not sure this should be landed yet since IterateDirRecursive treats edge-case as a file.
I suspect that is harmless, but ideally it wouldn't do this.
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp | ||
---|---|---|
190 | in my experience, the only File system APIs that are slow are those that deal with paths. If you already have a Handle, everything is pretty fast. That said, if you were to put something under an IsLink() test, that should also be fast even if you do a CreateFile inside of that branch, simply due to the fact that links are not that common, so the branch would rarely be taken. Still, avoiding them is fine as long as it works. |
This seems to break:
File: <PATH>\test-dir\dir\file-symlink2 File: <PATH>\test-dir\dir\y.txt Dir: <PATH>\test-dir\dir File: <PATH>\test-dir\dir-junction\file-symlink2 File: <PATH>\test-dir\dir-junction\y.txt Dir: <PATH>\test-dir\dir-junction File: <PATH>\test-dir\dir-Shortcut.lnk File: <PATH>\test-dir\dir-symlink\file-symlink2 File: <PATH>\test-dir\dir-symlink\y.txt Dir: <PATH>\test-dir\dir-symlink File: <PATH>\test-dir\edge-case File: <PATH>\test-dir\file-hardlink1 File: <PATH>\test-dir\file-hardlink2 File: <PATH>\test-dir\file-symlink File: <PATH>\test-dir\file.txt FindFirstFileA failed with error 5 for directory: <PATH>\ss64\$Recycle.Bin\blah; exiting
I'm not sure what the correct behavior in this case is though.
Actually just use a different directory then. Like mount D: instead of C:. Recycle bin is protected, so you probably won't be able to get directly into it without special permissions.
What I'm really wondering about is whether a volume mounted with mountvol will fall into your directory branch or your file branch. It should go into your directory branch.
That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.
FWIW I think you could reproduce this same bug without mountvol by just making a directory and creating a Deny All ACL on it, and that should definitely be handled.
It takes the directory branch.
Do we want it to take either though? Is this a common use case?
This particular case has me worried because if LF used delete as its callback function for files it would be like calling rm -rf ~/
That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.
Done.
It's not really that common, but is it fundamentally any different than creating a junction (or directory symlink) to your root drive? If you follow symlinks, this is always going to be a problem, and the only *real* solution is to just not follow any directory reparse points.
That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.
Done.
I'm not sure this should be landed yet since IterateDirRecursive treats edge-case as a file.
I suspect that is harmless, but ideally it wouldn't do this.
Actually I don't think there are negative consequences from treating this as a file since I haven't seen anything treat a case like this as a directory.
Fair point.
That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.
Done.
@zturner
Do you think there's anything that needs fixing in IterateDirRecursive that should prevent this from landing?
(mostly for myself): I did notice one other blocking issue that should be fixed before landing. libFuzzer isn't cleaning up after itself when using fork mode if I send it ctrl-C.
Since Ctrl-C doesn't quite work in fork mode on Linux, I don't think this should block Windows.
@kcc
I tried Ctrl-Cing out of fork mode on Linux with this patch.
The temp directories seemed to get cleaned up anyway.
I mentioned this elsewhere, but reparse point does not imply that the target of the reparse point is a file. It could be a directory junction, or something else entirely. Here's all the known reparse tags: