This is an archive of the discontinued LLVM Phabricator instance.

[Support] [Windows] Manually clean up temp files on network shares
ClosedPublic

Authored by mstorsjo on Oct 15 2021, 4:00 AM.

Details

Summary

Since D81803 / 79657e2339b58bc01fe1b85a448bb073d57d90bb, temp files
created on network shares don't set "Disposition.DeleteFile = true".
This flag normally takes care of removing the temp file both if the
process exits abnormally, and when the file is closed cleanly.

Even if we can't seem to use the flag on network shares, we can
at least make sure to remove the temp files when they are closed
cleanly.

This patch adjusts setDeleteDisposition to take a bool &Delete
reference, to let the caller know whether it actually set the
flag or not (on network shares, it returns a success error code
but doesn't do what it was asked to).

When setDeleteDisposition is called from openNativeFile, we can't
easily pass this information back to the caller without adding
extra plumbing through all the layers of functions up to the
caller in TempFile::create.

Instead have TempFile::create() do a duplicate call to
setDeleteDisposition, to see whether it actually sets the
desired flag. If not, set a Windows specific flag in TempFile,
and manually remove the file in discard().

This fixes https://github.com/mstorsjo/llvm-mingw/issues/233.

Diff Detail

Event Timeline

mstorsjo created this revision.Oct 15 2021, 4:00 AM
mstorsjo requested review of this revision.Oct 15 2021, 4:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 15 2021, 4:00 AM
jhenderson added inline comments.Oct 15 2021, 6:12 AM
llvm/lib/Support/Path.cpp
1212

I'm not really liking the duplicate code between this and the below non-Windows block. Is there a way we could avoid this duplication?

1245–1246

Similar comment to above: this code is duplicated a few lines down. Perhaps another function, or maybe even mode the code into setDeleteDisposition somehow?

mstorsjo added inline comments.Oct 15 2021, 6:17 AM
llvm/lib/Support/Path.cpp
1212

Sure, it should be pretty straightforward to make it one generic codepath with a couple smaller platform specific ifdefs instead.

1245–1246

Moving it into setDeleteDisposition is hard as that one isn't a member of TempFile, but I can try to factorize the duplication at least.

mstorsjo updated this revision to Diff 380100.Oct 15 2021, 2:16 PM

Get rid of most of the code duplication as requested.

jhenderson accepted this revision.Oct 18 2021, 12:29 AM

Mostly minor comments. LGTM with them addressed (or not, if you disagree).

llvm/lib/Support/Path.cpp
1209–1211

Whilst modifying the comment, let's capitalize windows -> Windows (since it's a proper noun). I also think it needs a comma.

1237–1238

I think the consensus I've seen elsewhere is that a lambda is a function object, i.e. a variable, not a function, so needs UpperCamelCase name styling.

1325

Nit: clang-format.

1333
llvm/lib/Support/Windows/Path.inc
404

Perhaps this variable should be renamed to WillAutoDelete or something like that? That might help express the meaning of changing its value. Although I'm not certain, since there's really two meanings - should delete and will auto delete.

(I don't particularly like the non-const input reference, but I don't see a better alternative at this point)

1189

Perhaps ShouldDelete.

This revision is now accepted and ready to land.Oct 18 2021, 12:29 AM
mstorsjo marked 7 inline comments as done.Oct 18 2021, 1:15 AM

Thanks for the review! I'll leave this open for further comments and then apply tomorrow unless there's differing opinions.

mstorsjo updated this revision to Diff 380300.Oct 18 2021, 1:15 AM

Applied the suggested changes.

rnk added a comment.Oct 18 2021, 12:57 PM

Here's an idea to try to make this a bit cleaner. Maybe setDeleteDisposition should return an error code when the path is on a network drive. That return value would be propagated back up to the caller who sets OF_Delete. In this case, that is TempFile::create. At that point, TempFile can retry temporary file creation without OF_Delete.

The downside to this is that we will create the file, set the disposition (fails), delete the file, and recreate it again without OF_Delete. To improve on that, we could have some kind of canSetDeleteDisposition portable API, which would feed into the choice to set the OF_Delete flag.

Here's an idea to try to make this a bit cleaner. Maybe setDeleteDisposition should return an error code when the path is on a network drive. That return value would be propagated back up to the caller who sets OF_Delete. In this case, that is TempFile::create. At that point, TempFile can retry temporary file creation without OF_Delete.

Hmm... It would indeed be cleaner in the sense that setDeleteDisposition doesn't lie about whether it succeeded or not - but that would require using a specific error code for it that the caller would look for (so we don't retry without the flag for any random error) and hope that we don't map other real system errors to that error code.

The downside to this is that we will create the file, set the disposition (fails), delete the file, and recreate it again without OF_Delete.

Yeah, that's in practice probably a much bigger overhead than just calling setDeleteDisposition one more time.

To improve on that, we could have some kind of canSetDeleteDisposition portable API, which would feed into the choice to set the OF_Delete flag.

Hmm, I guess that could work too. It requires mimicing the logic/checks from setDeleteDisposition - or maybe we could remove the logic from there altogether? (I think it should be kept though, so that OF_Delete behaves roughly as documented even in those cases, though.)

rnk added a comment.Oct 18 2021, 1:37 PM

Here's an idea to try to make this a bit cleaner. Maybe setDeleteDisposition should return an error code when the path is on a network drive. That return value would be propagated back up to the caller who sets OF_Delete. In this case, that is TempFile::create. At that point, TempFile can retry temporary file creation without OF_Delete.

Hmm... It would indeed be cleaner in the sense that setDeleteDisposition doesn't lie about whether it succeeded or not - but that would require using a specific error code for it that the caller would look for (so we don't retry without the flag for any random error) and hope that we don't map other real system errors to that error code.

Yep, it mostly requires making a new error category and defining a new error. A bit annoying, but not too hard. Alternatively, if this is all handled in TempFile::create and no recovery is needed, you can use a generic error code.

The downside to this is that we will create the file, set the disposition (fails), delete the file, and recreate it again without OF_Delete.

Yeah, that's in practice probably a much bigger overhead than just calling setDeleteDisposition one more time.

To improve on that, we could have some kind of canSetDeleteDisposition portable API, which would feed into the choice to set the OF_Delete flag.

Hmm, I guess that could work too. It requires mimicing the logic/checks from setDeleteDisposition - or maybe we could remove the logic from there altogether? (I think it should be kept though, so that OF_Delete behaves roughly as documented even in those cases, though.)

I think it's already reasonably well-factored already. It all just boils down to calling is_local on the directory path where the temporary file will be created, right?

Here's an idea to try to make this a bit cleaner. Maybe setDeleteDisposition should return an error code when the path is on a network drive. That return value would be propagated back up to the caller who sets OF_Delete. In this case, that is TempFile::create. At that point, TempFile can retry temporary file creation without OF_Delete.

Hmm... It would indeed be cleaner in the sense that setDeleteDisposition doesn't lie about whether it succeeded or not - but that would require using a specific error code for it that the caller would look for (so we don't retry without the flag for any random error) and hope that we don't map other real system errors to that error code.

Yep, it mostly requires making a new error category and defining a new error. A bit annoying, but not too hard. Alternatively, if this is all handled in TempFile::create and no recovery is needed, you can use a generic error code.

Well there's a couple uses in TempFile::keep() too. Then again, with a bit more changes, we could stop calling them too, if we aren't using the delete disposition for this file altogether.

The downside to this is that we will create the file, set the disposition (fails), delete the file, and recreate it again without OF_Delete.

Yeah, that's in practice probably a much bigger overhead than just calling setDeleteDisposition one more time.

To improve on that, we could have some kind of canSetDeleteDisposition portable API, which would feed into the choice to set the OF_Delete flag.

Hmm, I guess that could work too. It requires mimicing the logic/checks from setDeleteDisposition - or maybe we could remove the logic from there altogether? (I think it should be kept though, so that OF_Delete behaves roughly as documented even in those cases, though.)

I think it's already reasonably well-factored already. It all just boils down to calling is_local on the directory path where the temporary file will be created, right?

Yes, I guess so.

I guess this sounds like a decent improvement overall - I can try to give this a shot, in a day or two.

rnk added a comment.Oct 18 2021, 1:58 PM

Well there's a couple uses in TempFile::keep() too. Then again, with a bit more changes, we could stop calling them too, if we aren't using the delete disposition for this file altogether.

The RemoveOnClose boolean seems like it could work there, though. It essentially tracks, "do I need to delete manually, or will the OS take care of it for me". Maybe I misunderstood and it isn't quite that.

Anyway, thanks for giving it a try.

mstorsjo updated this revision to Diff 380774.Oct 19 2021, 1:56 PM

Updated according to what was discussed with @rnk.

I ran into one big surprise regarding this though: If we don't pass OF_Delete, we don't set the DELETE access mode on the opened file. And without that, renaming the temp file via SetFileInformationByHandle(FileRenameInfo) fails. (Renaming it via the fallback codepath MoveFileEx succeeds though.) So either we need to always pass DELETE, or we'd need to add yet another windows specific open flag, OF_Deletable (in addition to the existing OF_Delete).

Then on the other hand; if we always end up opening files with DELETE, then we could just skip passing OF_Delete and try calling setDeleteDisposition altogether, and return the result in an output parameter like in the previous iteration (although I guess it can be argued that this approach is cleaner anyway).

Other open questions: I now change setDeleteDisposition to error out on a network file system. This is a behaviour change if someone else is using OF_Delete on a network file system - but it's probably ok? There are no other uses of OF_Delete in the llvm-project monorepo though.

Should we give RemoveOnClose an even more explicit name? As it's now unclear whether it says that the OS will implicitly remove it on close, or that we must remove it manually on close. ManuallyRemoveOnClose sounds quite inelegant though...

rnk added a subscriber: probinson.Oct 19 2021, 2:19 PM

I think this is reasonable

@probinson, is there someone at Sony with Windows FS experience who can sign off on this change? LLVM will now request delete access when it opens files for writing.

llvm/lib/Support/Windows/Path.inc
1112

I think this should probably be under if (FA_Write), since only need delete access when opening a file for writing. We want to be able to open readonly files for reading.

Otherwise, yeah, I think it makes sense to request delete access for any file that LLVM needs to write. LLVM rarely attempts to update files in place, and it matches the Unix model (I think).

mstorsjo updated this revision to Diff 380862.Oct 19 2021, 11:47 PM

Moved to setting DELETE based on FA_Write, kept the old clause for OF_Delete too for clarity. Removed conditions around calling setDeleteDisposition(false) as the temp files are opened with the DELETE flag anyway.

This also appears to fix another llvm bug filed on a windows ram drive issue (https://bugs.llvm.org/show_bug.cgi?id=52080), which is cool.

llvm/lib/Support/Windows/Path.inc
417

Should we change this comment (or maybe just remove this section?), since it seems like this patch tries to avoid this path?

This also appears to fix another llvm bug filed on a windows ram drive issue (https://bugs.llvm.org/show_bug.cgi?id=52080), which is cool.

Oh, nice!

llvm/lib/Support/Windows/Path.inc
417

Good catch! Yes we should change that. I'll revise the patch.

mstorsjo updated this revision to Diff 381087.Oct 20 2021, 1:39 PM

Update a comment about leaving temp files behind.

@rnk - Actually, we need to backtrack a bit on this. This breaks two unit tests in clang, CrossTU/./CrossTUTests.exe/CrossTranslationUnit.CanLoadFunctionDefinition and CrossTU/./CrossTUTests.exe/CrossTranslationUnit.IndexFormatCanBeParsed.

The core of the issue lies here: https://github.com/llvm/llvm-project/blob/main/clang/unittests/CrossTU/CrossTranslationUnitTest.cpp#L164-L171

The test first creates a temporary file with llvm::sys::fs::createTemporaryFile (which is a different thing than TempFile and uses RemoveFileOnSignal for cleanup), writes data to it, then while keeping the file open, opens with with std::ifstream here: https://github.com/llvm/llvm-project/blob/main/clang/lib/CrossTU/CrossTranslationUnit.cpp#L152-L157

If the temporary file is created with the DELETE flag, opening it a second time fails.

So with that in mind, we either need to add another open flag, OF_DeleteAccessButDontTryToSetDeleteDisposition which we only set in TempFile::create, or backtrack to just always passing OF_Delete, letting setDeleteDisposition fail silently for non-local filesystems and either call setDeleteDisposition another time (with a reference parameter indicating whether it succeeded or not) or just guessing with is_local whether it did succeed or not.

Neither of them are really clean...

Or should we unwind the whole design of using OF_Delete and just make TempFile use the same sys::RemoveFileOnSignal design as on unix? That would be to revert https://github.com/llvm/llvm-project/commit/3ecd20430cace7eb9e78ef0a9ff9cc95a377feb9. It doesn't say the reason for switching to this mechanism in the first place, other than "We won't see the temp file no more.". (The other major commit involved, that adjusted the design for handling TempFile on windows, is https://github.com/llvm/llvm-project/commit/881ba104656c40098d4bc90c52613c08136f0fe1 / D48051. I think parts of that can be reverted/simplified in that case then too?)

mstorsjo updated this revision to Diff 382448.Oct 26 2021, 1:31 PM

Changed OF_Delete to only set the delete access mode but not set the delete disposition. This simplifies the decision making, as TempFile::Create can just call setDeleteDisposition on its own and check the return value.

In theory, we could still consider using RemoveFileOnSignal for the case when manually removing the file at the end, but I think that'd complicate the code even further, so the files will be left behind on network drives on crashes.

akhuang accepted this revision.Oct 26 2021, 4:05 PM

lgtm!

Hmm, I think this would work with D111875 then, if I’d update it to remove the check for the specific error code on the return from setDeleteDisposition?

It'll be great! Current diff still doesn't work with the RAM-disk https://sourceforge.net/projects/imdisk-toolkit/:
if (EC == errc::not_supported) { misses errc::function_not_supported.

I've successfully tested locally this approach: if (EC == errc::not_supported || EC == errc::function_not_supported) {

Hmm, I think this would work with D111875 then, if I’d update it to remove the check for the specific error code on the return from setDeleteDisposition?

It'll be great! Current diff still doesn't work with the RAM-disk https://sourceforge.net/projects/imdisk-toolkit/:
if (EC == errc::not_supported) { misses errc::function_not_supported.

I've successfully tested locally this approach: if (EC == errc::not_supported || EC == errc::function_not_supported) {

Thanks for testing! Maybe it'd be best and safest to remove that if statement altogether - as long as we successfully opened the file, we don't need to bother about the various reasons why setDeleteDisposition can fail, as we can just skip that and go on with the manual deletion.

mstorsjo updated this revision to Diff 382792.Oct 27 2021, 2:20 PM

Don't check the exact error code from setDeleteDisposition, just proceed with the alternative codepath if it failed, for whatever reason. This fixes creating temp files on a ram disk (which errors out from realPathFromHandle in setDeleteDisposition).

Just for the record for clarity, in D112376 I considered an alternative approach, to remove the whole use of delete-on-close flags on Windows, to just use the RemoveFileOnSignal function instead. That has the much more visible drawback that temp files are left behind if processes are killed.

I'll go ahead and land this probably tomorrow, if there's nothing more to add here - it seems that this one doesn't have any objections now.

rnk added a comment.Oct 27 2021, 2:56 PM

Thanks for working on this, this was quite the rewrite.