This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer][Windows] Port fork mode to Windows
ClosedPublic

Authored by metzman on Feb 21 2019, 8:27 AM.

Details

Summary

Port libFuzzer's fork mode to Windows.
Implement Windows versions of MkDir, RmDir, and IterateDirRecursive to do this.
Don't print error messages under new normal uses of FileSize (on a non-existent file).
Implement portable way of piping output to /dev/null.
Fix test for Windows and comment fork-sigusr.test on why it won't be ported to Win.

Event Timeline

metzman created this revision.Feb 21 2019, 8:27 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 21 2019, 8:27 AM
Herald added subscribers: llvm-commits, Restricted Project, jdoerfert. · View Herald Transcript
metzman updated this revision to Diff 187804.Feb 21 2019, 8:37 AM
  • Better comments
metzman updated this revision to Diff 187805.Feb 21 2019, 8:39 AM
  • more consistency
metzman retitled this revision from initial to [libFuzzer][Windows] Port fork mode to Windows.Feb 21 2019, 8:41 AM
metzman edited the summary of this revision. (Show Details)
zturner added inline comments.
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp
82–83

I mentioned this elsewhere, but reparse point does not imply that the target of the reparse point is a file. It could be a directory junction, or something else entirely. Here's all the known reparse tags:

#define IO_REPARSE_TAG_MOUNT_POINT              (0xA0000003L)       
#define IO_REPARSE_TAG_HSM                      (0xC0000004L)       
#define IO_REPARSE_TAG_HSM2                     (0x80000006L)       
#define IO_REPARSE_TAG_SIS                      (0x80000007L)       
#define IO_REPARSE_TAG_WIM                      (0x80000008L)       
#define IO_REPARSE_TAG_CSV                      (0x80000009L)       
#define IO_REPARSE_TAG_DFS                      (0x8000000AL)       
#define IO_REPARSE_TAG_SYMLINK                  (0xA000000CL)       
#define IO_REPARSE_TAG_DFSR                     (0x80000012L)       
#define IO_REPARSE_TAG_DEDUP                    (0x80000013L)       
#define IO_REPARSE_TAG_NFS                      (0x80000014L)       
#define IO_REPARSE_TAG_FILE_PLACEHOLDER         (0x80000015L)       
#define IO_REPARSE_TAG_WOF                      (0x80000017L)       
#define IO_REPARSE_TAG_WCI                      (0x80000018L)       
#define IO_REPARSE_TAG_WCI_1                    (0x90001018L)       
#define IO_REPARSE_TAG_GLOBAL_REPARSE           (0xA0000019L)       
#define IO_REPARSE_TAG_CLOUD                    (0x9000001AL)       
#define IO_REPARSE_TAG_CLOUD_1                  (0x9000101AL)       
#define IO_REPARSE_TAG_CLOUD_2                  (0x9000201AL)       
#define IO_REPARSE_TAG_CLOUD_3                  (0x9000301AL)       
#define IO_REPARSE_TAG_CLOUD_4                  (0x9000401AL)       
#define IO_REPARSE_TAG_CLOUD_5                  (0x9000501AL)       
#define IO_REPARSE_TAG_CLOUD_6                  (0x9000601AL)       
#define IO_REPARSE_TAG_CLOUD_7                  (0x9000701AL)       
#define IO_REPARSE_TAG_CLOUD_8                  (0x9000801AL)       
#define IO_REPARSE_TAG_CLOUD_9                  (0x9000901AL)       
#define IO_REPARSE_TAG_CLOUD_A                  (0x9000A01AL)       
#define IO_REPARSE_TAG_CLOUD_B                  (0x9000B01AL)       
#define IO_REPARSE_TAG_CLOUD_C                  (0x9000C01AL)       
#define IO_REPARSE_TAG_CLOUD_D                  (0x9000D01AL)       
#define IO_REPARSE_TAG_CLOUD_E                  (0x9000E01AL)       
#define IO_REPARSE_TAG_CLOUD_F                  (0x9000F01AL)       
#define IO_REPARSE_TAG_CLOUD_MASK               (0x0000F000L)       
#define IO_REPARSE_TAG_APPEXECLINK              (0x8000001BL)       
#define IO_REPARSE_TAG_PROJFS                   (0x9000001CL)       
#define IO_REPARSE_TAG_STORAGE_SYNC             (0x8000001EL)       
#define IO_REPARSE_TAG_WCI_TOMBSTONE            (0xA000001FL)       
#define IO_REPARSE_TAG_UNHANDLED                (0x80000020L)       
#define IO_REPARSE_TAG_ONEDRIVE                 (0x80000021L)       
#define IO_REPARSE_TAG_PROJFS_TOMBSTONE         (0xA0000022L)       
#define IO_REPARSE_TAG_AF_UNIX                  (0x80000023L)
189

Note that this will not follow directory symlinks or junctions. Is that the desired behavior?

190

This doesn't look correct to me. You can have a reparse point to something that itself isn't a file. If it's a link, I think you need to check the reparse tag in FindInfo.dwReserved0 as described here

kcc added a reviewer: zturner.Feb 21 2019, 9:54 AM
kcc added a subscriber: kcc.

Test change LGTM.
I don't know much about Windows, so will rely on Zach's os someone else's review.

Thanks!

metzman marked 2 inline comments as done.Feb 21 2019, 10:13 AM

This patch wasn't quite ready for review but it looks like the comments were very useful. Thanks everyone!
Zach: I'll try to address your comments about links soon.

metzman added a comment.EditedFeb 21 2019, 10:17 AM

For the record, I'm also fixing an issue with /dev/null being used on Win even though it doesn't work (I'm replacing it with NUL).

zturner added inline comments.Feb 21 2019, 11:13 AM
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp
190

Another, perhaps less dirty way of handling the IsLink() case is to call CreateFile() on it with FILE_FLAG_BACKUP_SEMANTICS, which will cause it to follow symlinks, then call GetFileInformationByHandle. That's actually probably the safest way, because I think you can have a reparse point which reparses to another reparse point, and rather than deal with all that stuff, it seems better to just let Windows follow the link for you, and then ask it what it opened.

190

Clarity edit: FILE_FLAG_BACKUP_SEMANTICS will jsut cause it to not fail if it's a directory. It follows symlinks by default (unless you specify FILE_FLAG_OPEN_REPARSE_POINT

metzman marked 2 inline comments as done.Feb 21 2019, 11:41 AM
metzman added inline comments.
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp
82–83

See my comment below, I think I need to do this to handle the edge-case example.

Separately, I'm wondering if I should also handle other attributes such as FILE_ATTRIBUTE_DEVICE, FILE_ATTRIBUTE_OFFLINE, FILE_ATTRIBUTE_VIRTUAL and FILE_ATTRIBUTE_SYSTEM since these don't seem like things I want libFuzzer to treat as files.

189

I just tested my code. I think it does handle symlinks and junctions correctly at least if these are to directories (I guess they have FILE_ATTRIBUTE_DIRECTORY set when they are directories)

Here's what I did to test.

Using an Administrator cmd.exe:

mkdir test-dir
cd test-dir

echo "contents" > file.txt
mkdir dir

:: symlink
mklink file-sym-link file.txt

::  hard link
mklink /H file-hardlink1 file.txt

:: hard link to symlink
mklink /H file-hardlink2 file-symlink

:: symlink to dir
mklink /D dir-symlink dir

:: junction to dir
mklink /J dir-junction dir

mklink dir\file-symlink2 file.txt

:: Breaks but I haven't found anything that handles this.
mklink edge-case dir

Then when I pass test-dir to IterateDirRecursive with print statements for DirPostCallback and DirPreCallback I get:

File: <PATH>\test-dir\dir\file-symlink2
Dir: <PATH>\test-dir\dir
File: <PATH>\test-dir\dir-junction\file-symlink2
Dir: <PATH>\test-dir\dir-junction
File: <PATH>\test-dir\dir-symlink\file-symlink2
Dir: <PATH>\test-dir\dir-symlink
File: <PATH>\test-dir\edge-case
File: <PATH>\test-dir\file-hardlink1
File: <PATH>\test-dir\file-hardlink2
File: <PATH>\test-dir\file-symlink
File: <PATH>\test-dir\file.txt
Dir: <PATH>\test-dir

Other than "File" printing for edge-case these results seem correct to me.
I assume it isn't possible to treat edge-case as a directory (no other windows software I've seen does).
I think I need to change what I consider a symbolic link to a file so that I don't tread edge-case as I file, what do you think?

metzman updated this revision to Diff 187854.Feb 21 2019, 1:32 PM
metzman marked 2 inline comments as done.
  • fix
  • Format and fix issue with /dev/null
  • fix
metzman updated this revision to Diff 187857.Feb 21 2019, 1:39 PM
  • fix
  • remove comment
  • remove comment and simplify
metzman added inline comments.Feb 21 2019, 1:42 PM
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp
190

Sorry I didn't see this round of comments before I posted mine.

Some context behind why I wrote things the way they are now: I'm trying to avoid calling the Windows File APIs more than necessary because (I think) they are too slow to be used like their posix counterparts.

I found (thanks to a bug report from a user) that when running libFuzzer on a large corpus, libFuzzer could spend ~50% of the time in ReadDirToVectorOfUnits and functions it calls, such as ListFilesInDirRecursive, GetEpoch and CreateFile (this seemed to crop up when using -jobs=N).

Of course, if there's a correctness problem with the current approach, I can use the higher level API functions, but I wanted to see what you think of my most recent results before switching.

Maybe if we call CreateFile only on possible links, then perf won't be negatively impacted in 99% of cases since typically we will only be dealing with actual files.
For what it's worth, using CreateFileA with FILE_FLAG_BACKUP_SEMANTICS doesn't seem to work in all cases.
If I use IsFile from line 26 instead of IsLink I get this error: CreateFileA() failed for "<PATH>\test-dir\dir\file-symlink2" (Error code: 2)
Whereas this file was handled correctly in the past.

metzman updated this revision to Diff 187858.Feb 21 2019, 1:44 PM
  • undo accidental linespace changes

Can you try mountvol <PATH>\mount C:\ and see what you print in this case?

metzman planned changes to this revision.Feb 21 2019, 1:50 PM

I'm not sure this should be landed yet since IterateDirRecursive treats edge-case as a file.
I suspect that is harmless, but ideally it wouldn't do this.

zturner added inline comments.Feb 21 2019, 1:50 PM
compiler-rt/lib/fuzzer/FuzzerIOWindows.cpp
190

in my experience, the only File system APIs that are slow are those that deal with paths. If you already have a Handle, everything is pretty fast. That said, if you were to put something under an IsLink() test, that should also be fast even if you do a CreateFile inside of that branch, simply due to the fact that links are not that common, so the branch would rarely be taken.

Still, avoiding them is fine as long as it works.

metzman added a comment.EditedFeb 21 2019, 1:56 PM

Can you try mountvol <PATH>\mount C:\ and see what you print in this case?

This seems to break:

File: <PATH>\test-dir\dir\file-symlink2
File: <PATH>\test-dir\dir\y.txt
Dir: <PATH>\test-dir\dir
File: <PATH>\test-dir\dir-junction\file-symlink2
File: <PATH>\test-dir\dir-junction\y.txt
Dir: <PATH>\test-dir\dir-junction
File: <PATH>\test-dir\dir-Shortcut.lnk
File: <PATH>\test-dir\dir-symlink\file-symlink2
File: <PATH>\test-dir\dir-symlink\y.txt
Dir: <PATH>\test-dir\dir-symlink
File: <PATH>\test-dir\edge-case
File: <PATH>\test-dir\file-hardlink1
File: <PATH>\test-dir\file-hardlink2
File: <PATH>\test-dir\file-symlink
File: <PATH>\test-dir\file.txt
FindFirstFileA failed with error 5 for directory: <PATH>\ss64\$Recycle.Bin\blah; exiting

I'm not sure what the correct behavior in this case is though.

Can you try mountvol <PATH>\mount C:\ and see what you print in this case?

This seems to break:

File: <PATH>\test-dir\dir\file-symlink2
File: <PATH>\test-dir\dir\y.txt
Dir: <PATH>\test-dir\dir
File: <PATH>\test-dir\dir-junction\file-symlink2
File: <PATH>\test-dir\dir-junction\y.txt
Dir: <PATH>\test-dir\dir-junction
File: <PATH>\test-dir\dir-Shortcut.lnk
File: <PATH>\test-dir\dir-symlink\file-symlink2
File: <PATH>\test-dir\dir-symlink\y.txt
Dir: <PATH>\test-dir\dir-symlink
File: <PATH>\test-dir\edge-case
File: <PATH>\test-dir\file-hardlink1
File: <PATH>\test-dir\file-hardlink2
File: <PATH>\test-dir\file-symlink
File: <PATH>\test-dir\file.txt
FindFirstFileA failed with error 5 for directory: <PATH>\ss64\$Recycle.Bin\blah; exiting

I'm not sure what the correct behavior in this case is though.

Actually just use a different directory then. Like mount D: instead of C:. Recycle bin is protected, so you probably won't be able to get directly into it without special permissions.

What I'm really wondering about is whether a volume mounted with mountvol will fall into your directory branch or your file branch. It should go into your directory branch.

That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.

Can you try mountvol <PATH>\mount C:\ and see what you print in this case?

This seems to break:

File: <PATH>\test-dir\dir\file-symlink2
File: <PATH>\test-dir\dir\y.txt
Dir: <PATH>\test-dir\dir
File: <PATH>\test-dir\dir-junction\file-symlink2
File: <PATH>\test-dir\dir-junction\y.txt
Dir: <PATH>\test-dir\dir-junction
File: <PATH>\test-dir\dir-Shortcut.lnk
File: <PATH>\test-dir\dir-symlink\file-symlink2
File: <PATH>\test-dir\dir-symlink\y.txt
Dir: <PATH>\test-dir\dir-symlink
File: <PATH>\test-dir\edge-case
File: <PATH>\test-dir\file-hardlink1
File: <PATH>\test-dir\file-hardlink2
File: <PATH>\test-dir\file-symlink
File: <PATH>\test-dir\file.txt
FindFirstFileA failed with error 5 for directory: <PATH>\ss64\$Recycle.Bin\blah; exiting

I'm not sure what the correct behavior in this case is though.

Actually just use a different directory then. Like mount D: instead of C:. Recycle bin is protected, so you probably won't be able to get directly into it without special permissions.

What I'm really wondering about is whether a volume mounted with mountvol will fall into your directory branch or your file branch. It should go into your directory branch.

That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.

FWIW I think you could reproduce this same bug without mountvol by just making a directory and creating a Deny All ACL on it, and that should definitely be handled.

metzman updated this revision to Diff 187866.Feb 21 2019, 2:31 PM
  • fail gracefully

Can you try mountvol <PATH>\mount C:\ and see what you print in this case?

This seems to break:

File: <PATH>\test-dir\dir\file-symlink2
File: <PATH>\test-dir\dir\y.txt
Dir: <PATH>\test-dir\dir
File: <PATH>\test-dir\dir-junction\file-symlink2
File: <PATH>\test-dir\dir-junction\y.txt
Dir: <PATH>\test-dir\dir-junction
File: <PATH>\test-dir\dir-Shortcut.lnk
File: <PATH>\test-dir\dir-symlink\file-symlink2
File: <PATH>\test-dir\dir-symlink\y.txt
Dir: <PATH>\test-dir\dir-symlink
File: <PATH>\test-dir\edge-case
File: <PATH>\test-dir\file-hardlink1
File: <PATH>\test-dir\file-hardlink2
File: <PATH>\test-dir\file-symlink
File: <PATH>\test-dir\file.txt
FindFirstFileA failed with error 5 for directory: <PATH>\ss64\$Recycle.Bin\blah; exiting

I'm not sure what the correct behavior in this case is though.

Actually just use a different directory then. Like mount D: instead of C:. Recycle bin is protected, so you probably won't be able to get directly into it without special permissions.

What I'm really wondering about is whether a volume mounted with mountvol will fall into your directory branch or your file branch. It should go into your directory branch.

It takes the directory branch.
Do we want it to take either though? Is this a common use case?
This particular case has me worried because if LF used delete as its callback function for files it would be like calling rm -rf ~/

That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.

Done.

Can you try mountvol <PATH>\mount C:\ and see what you print in this case?

This seems to break:

File: <PATH>\test-dir\dir\file-symlink2
File: <PATH>\test-dir\dir\y.txt
Dir: <PATH>\test-dir\dir
File: <PATH>\test-dir\dir-junction\file-symlink2
File: <PATH>\test-dir\dir-junction\y.txt
Dir: <PATH>\test-dir\dir-junction
File: <PATH>\test-dir\dir-Shortcut.lnk
File: <PATH>\test-dir\dir-symlink\file-symlink2
File: <PATH>\test-dir\dir-symlink\y.txt
Dir: <PATH>\test-dir\dir-symlink
File: <PATH>\test-dir\edge-case
File: <PATH>\test-dir\file-hardlink1
File: <PATH>\test-dir\file-hardlink2
File: <PATH>\test-dir\file-symlink
File: <PATH>\test-dir\file.txt
FindFirstFileA failed with error 5 for directory: <PATH>\ss64\$Recycle.Bin\blah; exiting

I'm not sure what the correct behavior in this case is though.

Actually just use a different directory then. Like mount D: instead of C:. Recycle bin is protected, so you probably won't be able to get directly into it without special permissions.

What I'm really wondering about is whether a volume mounted with mountvol will fall into your directory branch or your file branch. It should go into your directory branch.

It takes the directory branch.
Do we want it to take either though? Is this a common use case?
This particular case has me worried because if LF used delete as its callback function for files it would be like calling rm -rf ~/

It's not really that common, but is it fundamentally any different than creating a junction (or directory symlink) to your root drive? If you follow symlinks, this is always going to be a problem, and the only *real* solution is to just not follow any directory reparse points.

That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.

Done.

I'm not sure this should be landed yet since IterateDirRecursive treats edge-case as a file.
I suspect that is harmless, but ideally it wouldn't do this.

Actually I don't think there are negative consequences from treating this as a file since I haven't seen anything treat a case like this as a directory.

Can you try mountvol <PATH>\mount C:\ and see what you print in this case?

This seems to break:

File: <PATH>\test-dir\dir\file-symlink2
File: <PATH>\test-dir\dir\y.txt
Dir: <PATH>\test-dir\dir
File: <PATH>\test-dir\dir-junction\file-symlink2
File: <PATH>\test-dir\dir-junction\y.txt
Dir: <PATH>\test-dir\dir-junction
File: <PATH>\test-dir\dir-Shortcut.lnk
File: <PATH>\test-dir\dir-symlink\file-symlink2
File: <PATH>\test-dir\dir-symlink\y.txt
Dir: <PATH>\test-dir\dir-symlink
File: <PATH>\test-dir\edge-case
File: <PATH>\test-dir\file-hardlink1
File: <PATH>\test-dir\file-hardlink2
File: <PATH>\test-dir\file-symlink
File: <PATH>\test-dir\file.txt
FindFirstFileA failed with error 5 for directory: <PATH>\ss64\$Recycle.Bin\blah; exiting

I'm not sure what the correct behavior in this case is though.

Actually just use a different directory then. Like mount D: instead of C:. Recycle bin is protected, so you probably won't be able to get directly into it without special permissions.

What I'm really wondering about is whether a volume mounted with mountvol will fall into your directory branch or your file branch. It should go into your directory branch.

It takes the directory branch.
Do we want it to take either though? Is this a common use case?
This particular case has me worried because if LF used delete as its callback function for files it would be like calling rm -rf ~/

It's not really that common, but is it fundamentally any different than creating a junction (or directory symlink) to your root drive? If you follow symlinks, this is always going to be a problem, and the only *real* solution is to just not follow any directory reparse points.

Fair point.

That said, it looks like this does find a more general bug, which is that if the directory is inaccessible you will fail and terminate the iteration. You should probably handle this gracefully and just return from the function is FindFirstFile fails.

Done.

@zturner
Do you think there's anything that needs fixing in IterateDirRecursive that should prevent this from landing?

(mostly for myself): I did notice one other blocking issue that should be fixed before landing. libFuzzer isn't cleaning up after itself when using fork mode if I send it ctrl-C.

metzman updated this revision to Diff 187894.Feb 21 2019, 5:23 PM
  • Fix race so that max_total_time works with fork mode and improve error messages.

Since Ctrl-C doesn't quite work in fork mode on Linux, I don't think this should block Windows.

zturner accepted this revision.Feb 22 2019, 9:15 AM
This revision is now accepted and ready to land.Feb 22 2019, 9:15 AM
metzman updated this revision to Diff 188375.Feb 26 2019, 7:30 AM

Run clang-format-diff

metzman edited the summary of this revision. (Show Details)Feb 26 2019, 7:31 AM

@kcc
I tried Ctrl-Cing out of fork mode on Linux with this patch.
The temp directories seemed to get cleaned up anyway.

This revision was automatically updated to reflect the committed changes.