Page MenuHomePhabricator

Fix tmp files being left on Windows builds.
ClosedPublic

Authored by akhuang on May 18 2021, 4:37 PM.

Details

Summary

Clang writes object files by first writing to a .tmp file and then
renaming to the final .obj name. On Windows, if a compile is killed
partway through the .tmp files don't get deleted.

Currently it seems like RemoveFileOnSignal takes care of deleting the
tmp files on Linux, but on Windows we need to call
setDeleteDisposition on tmp files so that they are deleted when
closed.

This patch switches to using TempFile to create the .tmp files we write
when creating object files, since it uses setDeleteDisposition on Windows.
This change applies to both Linux and Windows for consistency.

Diff Detail

Unit TestsFailed

TimeTest
40 msx64 debian > MLIR.Examples/Toy/Ch5::affine-lowering.mlir
Script: -- : 'RUN: at line 1'; toyc-ch5 /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Examples/Toy/Ch5/affine-lowering.mlir -emit=mlir-affine 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Examples/Toy/Ch5/affine-lowering.mlir
50 msx64 debian > MLIR.Examples/Toy/Ch6::affine-lowering.mlir
Script: -- : 'RUN: at line 1'; toyc-ch6 /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Examples/Toy/Ch6/affine-lowering.mlir -emit=mlir-affine 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Examples/Toy/Ch6/affine-lowering.mlir
50 msx64 debian > MLIR.Examples/Toy/Ch7::affine-lowering.mlir
Script: -- : 'RUN: at line 1'; toyc-ch7 /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Examples/Toy/Ch7/affine-lowering.mlir -emit=mlir-affine 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/mlir/test/Examples/Toy/Ch7/affine-lowering.mlir

Event Timeline

akhuang created this revision.May 18 2021, 4:37 PM
akhuang requested review of this revision.May 18 2021, 4:37 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 18 2021, 4:37 PM
akhuang added inline comments.May 18 2021, 4:49 PM
clang/lib/Frontend/CompilerInstance.cpp
882

I'm saving the windows HANDLE here instead of using fd because for some reason calling _get_osfhandle(fd) in UnmarkFileForDeletion fails.

It also doesn't work if I don't duplicate the handle, but I don't entirely understand why. (I think when we try to rename the file, it fails with file not found, so maybe it gets deleted?)

Do you think the existing crash tests can be modified to validate that .tmp files are deleted indeed?

clang/lib/Frontend/CompilerInstance.cpp
831

What do you think about having the comment inside the #idef _WIN32 ?

llvm/include/llvm/Support/FileSystem.h
994 ↗(On Diff #346302)

It is unclear to me what is the right formatting for function names here? Is it what clang-tidy suggests or lowercase first, like the other functions?

For other reviewers, I know Duncan, @dexonsmith, has also spent some time working on the clang output file management APIs. And @jansvoboda11 as well, I think.

clang/lib/Frontend/CompilerInstance.cpp
720

This error is unchecked, so we don't know if the disposition set call succeeds.

831

Honestly, OF_Delete is documented as only working on Windows, so I think it would be fair to remove the ifdef, and keep this comment.

885

This API also returns an error code that must be checked.

I'll have to dig into my archived email tomorrow.

I seem to recall some thrashing on this topic a few months ago. If I'm remembering correctly, setting the disposition to delete temporary files on Windows was causing problems with Rust builds because you can't always set the delete disposition (e.g., for a file on a network drive). I think it got pulled out, and then put back in in a limited way.

It certainly would be nice to clean up temporaries left because of a crash, but it's even better not to crash. :-) Lots of programs will leave partially written files if they die unexpectedly, and sometimes they can be useful when figuring out what happened. Part of me wonders whether we should reclaim orphaned temporaries on the next run instead. That might not be hard if we had a predictable location and naming scheme.

Another crazy idea would be to have the driver sit idly while the compiler and linker do their thing, and then clean up behind them if necessary.

Tests, however, always need to be cleaned up one way or another.

I'll take a closer look in the morning.

I seem to recall some thrashing on this topic a few months ago. If I'm remembering correctly, setting the disposition to delete temporary files on Windows was causing problems with Rust builds because you can't always set the delete disposition (e.g., for a file on a network drive). I think it got pulled out, and then put back in in a limited way.

It landed in rG64ab2b6825c5aeae6e4afa7ef0829b89a6828102 - it fixed the initial Rust issue. I can add a Windows 7 bot if you think it's worth it. But I'm happy to occasionally maintain the Win7 port for as long as it's needed.

At some point, the duplicate handle must be closed. I don't see that happening. I've added an inline comment where I think it should be done.

(I find it weird that duplicating the handle seems necessary.)

At a high level, it seems a shame that llvm::support::fs doesn't have create-temporary-file and keep-temporary-file operations to hide all this detail from the frontend.

clang/lib/Frontend/CompilerInstance.cpp
721

IIUC, OF.Handle is the duplicate handle, and we're done with it at this point. It should be closed, before doing things like trying to rename/move the file.

amccarth requested changes to this revision.May 19 2021, 12:08 PM

Sorry, forgot to Request Changes.

This revision now requires changes to proceed.May 19 2021, 12:08 PM

At some point, the duplicate handle must be closed. I don't see that happening. I've added an inline comment where I think it should be done.

(I find it weird that duplicating the handle seems necessary.)

At a high level, it seems a shame that llvm::support::fs doesn't have create-temporary-file and keep-temporary-file operations to hide all this detail from the frontend.

So, there is a TempFile class in llvm::support::fs with create temp file and keep temp file operations, and it would be nice to use that. The issue I was running into is that it uses a file descriptor (int fd) to keep track of the file, but for some reason, the function to convert the fd to a handle (_get_osfhandle) was not working (exits the program when called). I haven't yet figured out why this is happening

akhuang marked 2 inline comments as done.May 20 2021, 4:10 PM

At some point, the duplicate handle must be closed. I don't see that happening. I've added an inline comment where I think it should be done.

(I find it weird that duplicating the handle seems necessary.)

At a high level, it seems a shame that llvm::support::fs doesn't have create-temporary-file and keep-temporary-file operations to hide all this detail from the frontend.

So, there is a TempFile class in llvm::support::fs with create temp file and keep temp file operations, and it would be nice to use that. The issue I was running into is that it uses a file descriptor (int fd) to keep track of the file, but for some reason, the function to convert the fd to a handle (_get_osfhandle) was not working (exits the program when called). I haven't yet figured out why this is happening

Ok, apparently the whole issue with it not being able to open the handle is just because we were passing shouldClose=true to the raw_fd_ostream, which makes it close the file when done writing.
Since that's figured out, I think using TempFile makes the most sense

akhuang updated this revision to Diff 346882.May 20 2021, 4:11 PM

Change to using TempFiles

aganea added inline comments.May 21 2021, 5:50 AM
clang/include/clang/Frontend/CompilerInstance.h
170

Can we always use TempFile? And remove TempFilename in that case?

clang/lib/Frontend/CompilerInstance.cpp
729

I think the idea overall in LLVM is to defer the error rendering as late as possible. You could probably only keep an Error variable here, and render it below in the diagnostic with toString(E).

734

Here you can probably use ECError or errorCodeToError() with what is said slightly above.

829–830

Usually it is better to avoid platform-specific logic. Could we use TempFile all the time on all platforms?

Cool! This is getting much simpler. My remaining comments are mostly musings and you can tell me to buzz off.

But I'd like to see @aganea's questions addressed. It does seem redundant to have to keep the file name when we already have an object that represents the file.

llvm/lib/Support/Path.cpp
1237

I'm curious if this path has ever been exercised.

I see that rename_handle is implemented with MoveFileExW on Windows. MoveFileExW docs say it will fail if you're trying to move a _directory_ to another device, but that it can do copy+delete to move a _file_ to another device. (That might require adding MOVEFILE_COPY_ALLOWED to the flags passed in the MoveFileExW in lib\Support\Windows\Path.inc near line 485.)

Anyway, it just seems like we're re-implementing functionality the OS calls can already do for us.

https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-movefileexw

1243

If I understand correctly, we're using the delete disposition flag to get the OS to clean up the file in case we crash. But does that prevent us from just deleting it outright once we know it's safe to do so?

In other words if we could just delete the file here, then the divergence between Win32 and others could possibly be reduced.

1253

Possibly stupid idea: What if RemoveFileOnSignal and DontRemoveFileOnSignal did the Win32-specific delete disposition flag twiddling? With that encapsulated, and assuming we can still explicitly delete a Windows file that's already marked for deletion, it seems like all of the special handling here could go away.

Then again, the ...OnSignal things are in sys rather than sys::fs, so maybe it's not such a good idea.

I'm not trying to make more work for Amy. The immediate goal probably isn't well served by this level of overthinking. But it would be nice to see all this get simpler in the long term.

akhuang marked 2 inline comments as done.May 21 2021, 11:47 AM

Do you think the existing crash tests can be modified to validate that .tmp files are deleted indeed?

Looks like the existing crash tests use #pragma clang __debug crash but in the case where clang crashes it's actually fine. I guess the only cases where this patch actually applies is when the program is killed -- don't know if there's a good way to do that in the test suite?

clang/lib/Frontend/CompilerInstance.cpp
734

I think this path goes away if we always use TempFile

829–830

Hm, probably, that would make things simpler.

llvm/lib/Support/Path.cpp
1253

@rnk suggested that too - I think it would definitely make things nicer, but yeah, the fact that it's in sys and not sys::fs is inconvenient.

akhuang updated this revision to Diff 347090.May 21 2021, 11:48 AM
akhuang marked an inline comment as done.

Use TempFile on both linux and windows

akhuang added inline comments.May 21 2021, 3:27 PM
llvm/lib/Support/Path.cpp
1237

according to https://reviews.llvm.org/D50048 we used to pass MOVEFILE_COPY_ALLOWED but then removed it because it calls rename_internal which fails.

akhuang edited the summary of this revision. (Show Details)May 21 2021, 3:29 PM
rnk added a comment.Tue, May 25, 2:33 PM

Nice, this seems to be working out.

clang/include/clang/Frontend/CompilerInstance.h
170

Having a field and type both named TempFile is setting off my portability alarms. I think some old versions of GCC don't like this. Maybe the field can be called Temporary? TemporaryFile? I dono.

clang/lib/Frontend/CompilerInstance.cpp
709–710

Can this be consumeError(OF.TempFile->discard())?

I find consumeError hard to read, I wish it were ignoreErrors or something like that. Short of that, please add a comment at the top of this function that we ignore any errors about failing to remove files.

720–721

Ditto re consumeError simplification

729

ErrorMsg seems unused now

734

Can you skip the toString and just do << std::move(E)? I see an operator<< overload for it:
https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/Diagnostic.h#L1518

llvm/lib/Support/Path.cpp
1264

Please revert unrelated whitespace changes

akhuang updated this revision to Diff 347802.Tue, May 25, 3:14 PM
akhuang marked 5 inline comments as done.

address comments

rnk accepted this revision.Wed, May 26, 4:21 PM

I think this look good. Adrian, are your concerns addressed?

amccarth accepted this revision.Thu, May 27, 8:15 AM

LGTM.

Yeah, this is good. My remaining comments are all speculations about how to improve this further in the future, but they aren't directly applicable to the goal here.

This revision is now accepted and ready to land.Thu, May 27, 8:15 AM
dyung added a subscriber: dyung.Tue, Jun 8, 5:17 PM

Hi Amy, after your change, the compile time for one of our internal test files went from about 1 second to 30 seconds. I have put a repro and the details in PR50628, can you please take a look?

yep, I'll look into it, probably sometime tomorrow--

rnk added inline comments.Wed, Jun 9, 2:53 PM
clang/lib/Frontend/CompilerInstance.cpp
857–858

I think the bug is here: the third parameter is bool unbuffered, not file flags, so we are opening the file for unbuffered writing, and that is really slow.

Nice catch Reid.

clang/lib/Frontend/CompilerInstance.cpp
857–858

Yowza! In addition to a fix, we need some memes.

  • Why do we even have that lever?
  • This constructor has too many overloads. Please remove three.
  • [Facepalm] bool parameters. [Double facepalm] Negative bool parameter names that default to false.
Quuxplusone added inline comments.
clang/lib/Frontend/CompilerInstance.cpp
857–858

In addition to a fix, we need some memes.

https://quuxplusone.github.io/blog/2020/04/18/default-function-arguments-are-the-devil/ ;)
I count 8 different signatures for raw_fd_ostream's constructor, 3 of which come from the offending default-argument'ed overload. And none of them are explicit, which means e.g. {42, false} is a valid argument for a function taking const raw_fd_ostream&.

Unrelated to argument lists, but I would add "using raw new instead of std::make_unique" to this line's lengthy list of sins.

akhuang added inline comments.Wed, Jun 9, 4:14 PM
clang/lib/Frontend/CompilerInstance.cpp
857–858

oh whoops, thanks for the catch - for now I guess I'll just remove the extra parameter