This is an archive of the discontinued LLVM Phabricator instance.

[Support] Class to facilitate file locking
ClosedPublic

Authored by sepavloff on Apr 28 2020, 10:33 PM.

Details

Summary

This change define RAII class, that facilitate using file locks.

Diff Detail

Event Timeline

sepavloff created this revision.Apr 28 2020, 10:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2020, 10:33 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
dblaikie added inline comments.Apr 28 2020, 10:47 PM
llvm/include/llvm/Support/FileSystem.h
1178

Perhaps this type should be boolean testable (for the error), or using a factory function to create one that returns llvm::Error (wrapped around the error_code) on failure:

{
Expected<FileLocker> F = TryLock(FD, Timeout);
if (!F) {
}
...
}

Hmm - if unlocking can fail, and that failure needs to be handled, this doesn't seem suitable for an RAII type device...

sepavloff marked an inline comment as done.Apr 28 2020, 11:50 PM

This is a replacement of D78897.

llvm/include/llvm/Support/FileSystem.h
1178

Perhaps this type should be boolean testable (for the error)

The class has method locked, which can be used to check if lock was successfully acquired, and method error for those applications that need more information than just the fact of failure.

Hmm - if unlocking can fail, and that failure needs to be handled, this doesn't seem suitable for an RAII type device...

If the failure to unlock needs to be handled, the user would use method unlock provided by this class. In most expected use cases however this failure need not be handled. User defines RAII object and do things that require exclusive access. Upon destruction lock is released and if something goes wrong at that moment, this fact is just ignored.

dblaikie added inline comments.Apr 29 2020, 2:31 PM
llvm/include/llvm/Support/FileSystem.h
1178

Perhaps this type should be boolean testable (for the error)

The class has method locked, which can be used to check if lock was successfully acquired, and method error for those applications that need more information than just the fact of failure.

Yeah, I think it'd be better if this was only accessible via a free function that returns Expected<FileLocker> so that the Error must be checked for/can't be accidentally ignored.

Hmm - if unlocking can fail, and that failure needs to be handled, this doesn't seem suitable for an RAII type device...

If the failure to unlock needs to be handled, the user would use method unlock provided by this class. In most expected use cases however this failure need not be handled. User defines RAII object and do things that require exclusive access. Upon destruction lock is released and if something goes wrong at that moment, this fact is just ignored.

Fair enough/that sounds OK then. But it should return llvm::Error to make sure errors are handled/not ignored

Updated patch

sepavloff marked an inline comment as done.Apr 30 2020, 10:45 AM
sepavloff added inline comments.
llvm/include/llvm/Support/FileSystem.h
1178

Added function tryLock for using Expected<FileLocker>

labath added inline comments.May 4 2020, 12:25 AM
llvm/include/llvm/Support/FileSystem.h
1189–1192

This implicitly deletes the copy constructor, but it still leaves this class with a default copy-assignment operator. Defining a move assignment would make the class safer and more consistent.

1228

I don't think there should be a default value for the timeout for two reasons:

  • it makes it very unobvious that there is any timing out happening (tryLock(FD) looks a lot like mutex.try_lock() which only does an instantaneous lock check with no retries or time outs)
  • the appropriate value of the timeout sounds like something that is going to be very dependent on the exact use case, and the person using it should make a deliberate choice about the value (and explain it). (Speaking of that, I am still not sure your use case really needs timeouts, though that's more of a discussion for the other patch.)
1229–1231

If the intention is that people always use this function to lock files, then it would be better to remove the relevant FileLocker constructor and do the locking here. That would also simplify the class as it could maintain the invariant that it is created only if locking was successful and it wouldn't need to carry the error_code member.

sepavloff updated this revision to Diff 262111.May 5 2020, 7:53 AM

Updated patch

  • Lock logic is moved to tryLock function,
  • Changed implementation of FileLocker accordingly,
  • Removed default agrument of tryLock.
sepavloff marked 4 inline comments as done.May 5 2020, 8:01 AM
sepavloff added inline comments.
llvm/include/llvm/Support/FileSystem.h
1189–1192

Removed copy operations explicitly, added missing operator.

1228

You are right, removed default argument.

Timeout is a form of protection against deadlocks. For the particular task of writing to log file infinite wait time is also acceptable, but in other use cases (such as unit tests) timeouts may be convenient.

1229–1231

Makes sense. Moved lock logic to to function tryLock.

labath added inline comments.May 6 2020, 1:58 AM
llvm/include/llvm/Support/FileSystem.h
1178

Looking at the unit test I realized that fs::tryLockFile and fs::tryLock do quite different things, but that this difference is not obvious from the function names. The way I would deal with that is by making tryLock a static member function fs::FileLocker::tryLock (and then make the FileLocker constructor private), but I see that David has requested a free function. David, what are your thoughts on that?

1228

I don't think unit tests are a good motivating example. There are plenty of ways a test can not terminate without locking files (or anything else). The solution to that is to have a (fairly high) timeout at the level of the test harness. That we already have.

So if the unit tests are the main motivation for introducing these timeouts, I think we should remove them, as they add a fair amount of complexity (and flakyness) to the equation.

Even for the clang use case, whether one prefers a deadlock over the program silently doing the wrong thing (perhaps when the system is under heavy load, like it is when running the test suite), depends on what exactly is he trying to achieve. But that's something I don't feel qualified to judge...

llvm/unittests/Support/Path.cpp
42–44

I guess these leaked in from the dependant patch somehow...

2167

A cleaner way to write this would be:

Expected<fs::FileLocker> L = fs::tryLock(...);
ASSERT_THAT_EXPECTED(L, Succeeded());
// do the other checks..
2187–2204
L = fs::tryLock(FD);
ASSERT_THAT_EXPECTED(L, Succeeded());
EXPECT_THAT_EXPECTED(fs::tryLock(Stream.getFD()), Failed());
ASSERT_NO_ERROR(L->unlock());
EXPECT_THAT_EXPECTED(fs::tryLock(Stream.getFD()), Succeeded());
2211–2222

This could be rewritten in a similar manner, though it's not clear to me what is it testing that wasn't already covered by the tests above.

sepavloff updated this revision to Diff 262398.May 6 2020, 8:54 AM
sepavloff marked an inline comment as done.

Updated patch

  • Changed lock function name to getLock,
  • Reorganized unit test.
sepavloff marked 6 inline comments as done.May 6 2020, 9:19 AM
sepavloff added inline comments.
llvm/include/llvm/Support/FileSystem.h
1178

I renamed the function into getLock. It looks not so categorical as lockFile and is unlikely to be confused with tryLockFile. Using static member functions leads to long names, it decreases readability.

1228

Unit tests are not a motivation for using timeouts, but they demonstrates the benefits. If users can specify timeouts, they have possibility to handle deadlocks. It is up to user to handle such case or not, but the possibility exists. Without timeouts user must behave as if deadlocks never happen. It limits usability of the solution and it is the motivation.

llvm/unittests/Support/Path.cpp
42–44

Yes, rebased the patch properly.

2167

Thank you for the idea!
However the variant with lock object local to if is recommended way to make locks, so at least one example should be present. But other I reworked in the proposed way.

2187–2204

Reworked this place but with small change: temporary values are left in variables. It is convenient in tests, as if something goes wrong, it is easier to find reason of the problem in debugger.

2211–2222

Removed this test. Previously it checked double unlock, but in the new implementation of Locker it is not possible to diagnose such case.

labath added inline comments.May 7 2020, 2:16 AM
llvm/include/llvm/Support/FileSystem.h
1178

Having to guess what a short name does also hurts readability. So does inconsistency with other similar classes. getLock is less likely to be confused with tryLockFile, but its connection to the FileLocker class is still fairly vague for my taste, and it has lost the information that it only "tries" to get the lock, which I do find important.

As for consistency, I think this class is most similar to TempFile class in this very file. That class also uses a static factory function TempFile::create to return an Expected<TempFile>. Other uses of static factory functions in the support library that I could find by a quick search are GlobPattern::create, MemoryBuffer::getFile***, SpecialCaseList::create, Error::success. There are also uses of free functions to create an object, but these generally have a different purpose -- to enable automatic template argument deduction (e.g. make_range) and not to provide error handling in a -fno-exceptions world.

1228

Well.. it's not that the user _can_ specify timeouts. In the current setup, the user _must_ specify a timeout -- there is no blocking equivalent of this function. Which is fine, if the only use case we have for it right now wants/needs/etc. a timeout (we are not writing a general purpose library here). However, I remain unconvinced that a timeout is called for in this situation (but you don't have to convince me here -- that's another review).

llvm/unittests/Support/Path.cpp
2167

If you want to keep the if statement, that's fine but you should still change the ADD_FAILURE thingy to an EXPECT_THAT_EXPECTED, because the latter has a much better error message.

2187–2204

I don't feel strongly about that, but I do want to point out that this pattern:
a) generates worse error messages ("EC did not return errc::success" vs "fs::tryLock(...) did not return errc::success")
b) its not consistent with our other tests (grep for ASSERT_NO_ERROR).
Generally, our goal for writing tests is to make the failure messages good enough so you don't need to use a debugger to figure out what went wrong (because you can't debug a build bot).

sepavloff updated this revision to Diff 262862.May 8 2020, 6:37 AM

Updated patch

  • Added blocking lock method,
  • Updated unit tests.
sepavloff marked 4 inline comments as done.May 8 2020, 6:54 AM
sepavloff added inline comments.
llvm/include/llvm/Support/FileSystem.h
1178

The problem is not in long names only. The classes you mention (TempFile, MemoryBuffer and so on) represent some program entities. In contrast, FileLocker is a helper class, it does not designate any entity in a program, it exists only to execute some action at scope exit. Ideally user does not need to know about it. We can reach that by using auto specifier:

if (auto L = fs::getLock(FD)) {
    // ... do action that require file to be locked.
} else {
    // ... handle lock error.
};

That is why a free function is more convenient than a method.

1228

Collection of build statistics usually is a side effect of build. If something goes wrong and lock cannot be released, it can cause build failure. It is often more acceptable to get incomplete statistics than to break a build. Timeouts can work as watchdog, excluding log machinery from culprits.

However in some cases blocking operations may be more preferable. I think we can solve this problem by adding blocking function as well. It is done in this version.

llvm/unittests/Support/Path.cpp
2167

Ok, changed.

2187–2204

Got it. Reworked as you recommended.

labath added inline comments.May 11 2020, 6:52 AM
llvm/include/llvm/Support/FileSystem.h
1178

I'll admit that there are some differences between this class and the others, but I don't believe they are sufficiently convincing to justify deviating from the existing practice. I think that the notion of a "program entity" is very vague. I could certainly think of a lock on a file as a "thing" that can be owned, passed around (it has move constructors), etc. And the existence of the unlock function means that the user may need to be aware of the class.

scope_exit is a class whose only function is to run an action at scope exit, and it is indeed normally used as auto x = make_scope_exit(...), but I don't think that's because we don't want users to be aware of the classes existance. It's because it uses templates for efficiency, and spelling the template name out would be difficult or impossible (lambdas have no names). Given llvm's fairly cautious stance towards auto, if there was a way to achieve the same thing without templates, I'm pretty sure we would be spelling out that class everywhere.

1228

(I think adding the blocking is good because it may prevent needless proliferation of calls to the timeouted versions. I just want to clarify that I did not consider that as a requirement for this patch.)

(please address the clang-format lint code review feedback in changed/added lines to remove osme of the noise from this review)

llvm/include/llvm/Support/FileSystem.h
1178

Sorry for the delay getting back to this review!

Looking at the unit test I realized that fs::tryLockFile and fs::tryLock do quite different things, but that this difference is not obvious from the function names. The way I would deal with that is by making tryLock a static member function fs::FileLocker::tryLock (and then make the FileLocker constructor private), but I see that David has requested a free function. David, what are your thoughts on that?

Sorry, the main point was "not a public constructor" - some kind of named constructor so that the result is "Expected<FileLocker>" to enforce testing of failure to lock. I don't mind if it's a static member function of FileLocker, or perhaps a non-static member function of raw_ostream, instead than exposing the FD from the raw_ostream.

+1 to everything else @labath has said about this being perfectly reasonable to name & scope guards being a fairly special case of unnameable types and I wouldn't apply it in this case.

1228

+1 please don't include timeout support if there's no immediate planned use case.

sepavloff marked 2 inline comments as done.May 14 2020, 9:30 AM
sepavloff added inline comments.
llvm/include/llvm/Support/FileSystem.h
1178

I'll admit that there are some differences between this class and the others, but I don't believe they are sufficiently convincing to justify deviating from the existing practice.

The existing practice is to declare RAII object in a scope as a local variable. It was the initial variant, which was rejected because it didn't force a user to check if lock was successful, such check was supported but not enforced.

I think that the notion of a "program entity" is very vague. I could certainly think of a lock on a file as a "thing" that can be owned, passed around (it has move constructors), etc.

You are definitely right. A thing that is an auxiliary class can become a full-fledged entity in other design. A lock, which would allow ownership transfer, makes sense as a member of more elaborated class hierarchy. Such lock could support more flexible lock mechanism and provide checking facility. In this case it is an entity, not merely a RAII object.

The class FileLocker on the other hand is a simple class, that supports only the simplest kind of file locking. It has move constructor only because it is necessary to use it with Expected. The only function it has is lock release. Hardly it need a factory method, as this function is already implemented by getLock.

And the existence of the unlock function means that the user may need to be aware of the class.

This is common practice, when RAII class has possibility to free resources early, for example ContextRAII::pop or ScopeRAII::destroy.

scope_exit is a class whose only function is to run an action at scope exit, and it is indeed normally used as auto x = make_scope_exit(...), but I don't think that's because we don't want users to be aware of the classes existance.

Sure, there is no intention to hide the class from user. However if user can be unaware about existence of a class, the class is auxiliary.

IIUC the problem now is whether to create RAII object with a free function or a method of FileLocker? What is the advantage of this solution?

1228

The time out support is motivated by the following consideration.

At present compiler jobs executed in parallel build are independent. Each compiler process has exclusive access to any file it writes to. Allowing writes to the same log file creates dependency between compiler processes, which may affect build process. For example, the following hypothetical scenario. If some process having lock to the shared log file is suspended, other processes may be suspended waiting the lock. The memory is occupied with waiting processes, and the build can fail due to resource shortage. Using timeouts can alleviate this issue at expense of log completeness.

sepavloff updated this revision to Diff 264015.May 14 2020, 9:31 AM

Adressed formatting issues

@labath - what do you think, for retrieving an Expected<FileLocker> - static member function of FileLocker, or a non-static member function of raw_ostream?

llvm/include/llvm/Support/FileSystem.h
1228

That's a nice hypothetical scenario - but please add it when there's a concrete need in the LLVM project for that functionality, not before.

(I'm moving this discussion out-of-line as it's getting rather long.

I'll admit that there are some differences between this class and the others, but I don't believe they are sufficiently convincing to justify deviating from the existing practice.

The existing practice is to declare RAII object in a scope as a local variable. It was the initial variant, which was rejected because it didn't force a user to check if lock was successful, such check was supported but not enforced.

It's not the existing practice when the creation of an object (RAII or not) can fail for reasons which are outside of the programmers control. Locking a mutex can not fail, (except for timeouts, if you use the timeout constructor). I'd be willing to go with that idea (to emulate std::unique_lock), if we can agree to declare all other locking problems to be "programmer errors", and assert they don't happen.

linux documents the following flock errors:

  • EBADF fd is not an open file descriptor. -- definitely a programmer error
  • EINTR While waiting to acquire a lock, the call was interrupted by delivery of a signal caught by a handler -- can/should be handled by retrying
  • EINVAL operation is invalid. -- sounds like a programmer error?
  • ENOLCK The kernel ran out of memory for allocating lock records. -- not really programmer error, but llvm is not robust against resource exhaustion anyway
  • EWOULDBLOCK The file is locked and the LOCK_NB flag was selected. -- timeout

I think I would be fine with asserting on all of these (except EWOULDBLOCK). The main question i have is when can EINVAL be returned. If it's just a garbage operation argument then I'm fine. But if it's possible that we get this if the underlying fs does not support locking, then this would not be ok, since the programmer cannot know whether that is supported -- other OSs document an EOPNOTSUPP result, but they still leave it fairly vague about when does that apply (e.g. can it happen for regular files too, or only for the more "exotic" file descriptors).

You are definitely right. A thing that is an auxiliary class can become a full-fledged entity in other design. A lock, which would allow ownership transfer, makes sense as a member of more elaborated class hierarchy. Such lock could support more flexible lock mechanism and provide checking facility. In this case it is an entity, not merely a RAII object.

I'm sorry, but I still find the entity-vs-auxiliary line too fine to walk on.

The class FileLocker on the other hand is a simple class, that supports only the simplest kind of file locking. It has move constructor only because it is necessary to use it with Expected. The only function it has is lock release. Hardly it need a factory method, as this function is already implemented by getLock.

That goes both ways -- with a factory method, there's no need for the getLock function. And regardless of the reason, the class does have move operations, so it does support ownership transfer.

And the existence of the unlock function means that the user may need to be aware of the class.

This is common practice, when RAII class has possibility to free resources early, for example ContextRAII::pop or ScopeRAII::destroy.

True, but those classes are not constructed with auto (hence one can't say the users aren't aware of their existence), and they construction cannot fail (so they avoid the entire free-vs-static debate).

IIUC the problem now is whether to create RAII object with a free function or a method of FileLocker? What is the advantage of this solution?

Yes, I think you understand correctly. As for the advantage, the killer argument for me is consistency. I also find that the getLock vs lockFile does not emphasize enough the difference in semantics between the two functions. This could be alleviated by changing the name to something like getFileLocker, but I don't see a need to argue the merits of that given the first point. I also don't expect this class to be used so widely to be worth optimizing the couple of characters of typing.

@labath - what do you think, for retrieving an Expected<FileLocker> - static member function of FileLocker, or a non-static member function of raw_ostream?

Both seem fine to me. The thing which, for me, would really speak in favor of a raw_(fd_)ostream method is if we wanted to make the locker class smarter and have it automatically flush the stream before unlocking. As it stands now, the user still has to manually call flush (or remember to use unbuffered I/O), otherwise the entire locking may be for nothing.

llvm/include/llvm/Support/FileSystem.h
1178

My reply is out-of-line, as this thread is getting rather long.

1228

I believe that is not a hypothetical scenario, but *the* scenario because of which we're having this entire debate in the first place (see the dependent patch) :)

Updated patch

  • Free function getLock was replaced by raw_fd_stream::tryToLock,
  • Removed variant of lock with timeout.
labath accepted this revision.May 20 2020, 2:31 AM

This looks good to me, modulo a couple of inline comments. But let's give a chance for @dblaikie to look this over too...

llvm/include/llvm/Support/FileSystem.h
1181

If this stays as public, I think it should be documented very explicitly that this takes an already-locked file descriptor. Alternatively, we could still introduce the static factory function here (:P) to do the locking, but have the raw_ostream function as the "recommended" way of locking a raw_fd_ostream (actually, without exposing the file descriptor, it will essentially be the _only_ way of doing it).

llvm/lib/Support/Error.cpp
85–90

I'd move this next to the definitions of other ErrorInfoBase members (line 53--54).

llvm/unittests/Support/Path.cpp
2174

This line here is not going anything because the EXPECT macro already "consumes" the error. The double consumes for Errors are mostly harmless in practice, but I already got complaints from some clang-tidy checks about this, when I accidentally introduced them. So... consider yourself warned. :)

I know you want to demonstrate the "recommended" usage. That is commendable, but I don't think it really works out here...

This revision is now accepted and ready to land.May 20 2020, 2:31 AM

Updated patch

  • constructor FileLocker(int) became private,
  • small change in unit test.
sepavloff marked 3 inline comments as done.May 20 2020, 11:19 PM

@labath, thank you for the review!
Maybe you could look at the patch D78896 on which this patch depends?

llvm/include/llvm/Support/FileSystem.h
1181

As @dblaikie proposed earlier, this constructor is now made private.

llvm/unittests/Support/Path.cpp
2174

Indeed, thank you for for the catch. Avoid using EXPECT_THAT_EXPECTED here.

This looks good to me, modulo a couple of inline comments. But let's give a chance for @dblaikie to look this over too...

Yep, looks about right to me. Thanks!

The two changes to Error.h (message/cantFail) look like they're unrelated to this patch & should be removed/not committed?

This looks good to me, modulo a couple of inline comments. But let's give a chance for @dblaikie to look this over too...

Yep, looks about right to me. Thanks!

The two changes to Error.h (message/cantFail) look like they're unrelated to this patch & should be removed/not committed?

I think they're needed because this introduces a circular dependency between raw_ostream and Error classes. Does not seem unreasonable to me, but if we don't want that, then we need to revisit this idea...

sepavloff updated this revision to Diff 265686.May 22 2020, 2:37 AM

Removed unnecessary changes

The two changes to Error.h (message/cantFail) look like they're unrelated to this patch & should be removed/not committed?

I think they're needed because this introduces a circular dependency between raw_ostream and Error classes. Does not seem unreasonable to me, but if we don't want that, then we need to revisit this idea...

Exactly. I got compilation errors because of Expected used in declaration of raw_ostream::tryToLock. But now I tried removing these changes and compilation was successful. Probably the recent Visual Studio update solved this problem. So I removed these changes. If however someone using old MS compiler will see compilation fails, we know what to do.

dblaikie accepted this revision.May 22 2020, 1:26 PM

The two changes to Error.h (message/cantFail) look like they're unrelated to this patch & should be removed/not committed?

I think they're needed because this introduces a circular dependency between raw_ostream and Error classes. Does not seem unreasonable to me, but if we don't want that, then we need to revisit this idea...

Exactly. I got compilation errors because of Expected used in declaration of raw_ostream::tryToLock. But now I tried removing these changes and compilation was successful. Probably the recent Visual Studio update solved this problem. So I removed these changes. If however someone using old MS compiler will see compilation fails, we know what to do.

Nah, that's a fine reason - thanks for explaining!

sepavloff updated this revision to Diff 265975.May 25 2020, 1:13 AM

Updated patch

  • Get rid of clang-tidy warnings.
sepavloff updated this revision to Diff 271983.Jun 19 2020, 3:03 AM

New method names, fixed unit test

Review of D78903 demonstrated that using the name tryToLock for
blocking operation is misleading. So the method name was changed
to lock. Attribute LLVM_NODISCARD should prevent from ignoring
return value.

Also due to the change of implementation of lock operation on POSIX
systems, existing unit test is no more valid as it uses locks per
handle inside the same process. This test however still can be run
on Windows, so it was adapted for changed interface.

This revision was landed with ongoing or failed builds.Jul 29 2020, 11:43 PM
This revision was automatically updated to reflect the committed changes.