This is an archive of the discontinued LLVM Phabricator instance.

[Support] Add file lock/unlock functions
ClosedPublic

Authored by sepavloff on Apr 26 2020, 11:32 PM.

Details

Summary

New functions lockFile and unlockFile implement simple file locking.
They lock or unlock entire file. This must be enough to support
simulataneous writes to log files in parallel builds.

Diff Detail

Event Timeline

sepavloff created this revision.Apr 26 2020, 11:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2020, 11:32 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
labath added inline comments.Apr 27 2020, 5:36 AM
llvm/include/llvm/Support/FileSystem.h
1140–1141

this would be nicer as a std::chrono type.

1145

Should this be called tryLockFile, or maybe even tryLockFileFor (to mirror e.g. std::timed_mutex::try_lock_for) ?

llvm/unittests/Support/Path.cpp
2046–2047

Have you considered using flock(2) instead of F_SETLK? That might give you semantics which are a bit saner and a bit closer to what happens on windows (though file locking is always weird on posix systems)...

Updated patch according to reviewer's notes

sepavloff marked 5 inline comments as done.Apr 28 2020, 12:48 AM
sepavloff added inline comments.
llvm/include/llvm/Support/FileSystem.h
1140–1141

Agree.

1145

Make sense. Changed to tryLockFile.

llvm/unittests/Support/Path.cpp
2046–2047

IIUC, flock is not a POSIX call. GLIBC implements it on top of fcntl. The implementation also contains vague statement that it represents different mechanism on 4BSD: https://github.com/bminor/glibc/blob/92954ffa5a5662fbfde14febd7e5dcc358c85470/sysdeps/posix/flock.c#L18 . So I would refrain from using it, as the code must work on Linux, MacOS and *BSD. POSIX calls looks more portable.

Looping in Kamil for his knowledge of "obscure" operating systems and other historical trivia.

However, I am still wondering (as I've alluded to in the other review) if there isn't a simpler way to guarantee atomic appends to a "log" file.

llvm/unittests/Support/Path.cpp
2046–2047

You're right that it is not a posix call -- I did not realize that. However, a brief search seems to indicate that all major operating systems (I tried linux, mac, openbsd, freebsd, netbsd) do have this function.

I'm not sure in what situation is the glibc function you linked to used (glibc build system is very opaque to me), but it is definitely not used on linux, as linux kernel has first class support for this via SYS_flock. I'd expect the BSDs (that includes macs) to do the same, as they document flock as behaving differently than fcntl locks.

You're right that fcntl locks are more portable on paper, but I am not really sure that is true in practice. OTOH, I am sure that the fcntl lock semantics are very weird. One example is given in the bsd man pages:

... This semantic means that applica-
tions must be aware of any files that a subroutine library may access.
For example if an application for updating the password file locks the
password file database while making the update, and then calls
getpwnam(3) to retrieve a record, the lock will be lost because
getpwnam(3) opens, reads, and closes the password database.  The database
close will release all locks that the process has associated with the
database, even if the library routine never requested a lock on the data-
base.

I also have very bad memories of trying to use this function, because the deadlock detection algorithm used can create false positives in multithreaded applications.

If the goal is to synchronize writes to a highly contended log file, would it be better (and feasible) to have the individual threads/processes write timestamped output to separate streams that can be merged after-the-fact?

llvm/lib/Support/Windows/Path.inc
1271

::Sleep(1) sleeps for _at least_ one millisecond, but possibly for much longer. The default tick frequency on Windows is about 16 ms. (Many apps boost the system's timer frequency to 1 ms, but that's not universal and not recommended except for short periods when an app needs to display real-time media.)

Sleeping too long once isn't a big deal. But Counter increments by 1 ms each time through the loop, regardless of how long it actually took, so this is likely to sleep too long many times. If the user requests a 10 ms timeout, they could actually wait 160 ms (in some near-worst case scenario).

If this tracked the actual time elapsed, it probably would never be worse than 16 ms.

You can use chrono's high resolution clock or Windows APIs like ::GetTickCount or ::QueryPerformanceCounter to find out how long the thread actually slept.

sepavloff updated this revision to Diff 261166.Apr 30 2020, 2:46 AM
sepavloff marked 2 inline comments as done.

Updated patch

  • fcntl was replaced for flock on unix,
  • std::chrono was used to check timeout,
  • unit test was enabled to unix as well.
sepavloff marked 2 inline comments as done.Apr 30 2020, 2:54 AM
sepavloff added inline comments.
llvm/lib/Support/Windows/Path.inc
1271

Indeed, Timeout was in fact a number of attempts. Reworked this place using std::chrono.

llvm/unittests/Support/Path.cpp
2046–2047

After some hesitation I implemented your idea to use flock instead of fcntl. The concern was problems with locking on NFS shares, but it seems this was an issue for old implementation.

Using flock allows to enable unit test for unix as well.

Thanks for the timeout fix on Windows. I'm happy now, but I'm not qualified to approve other parts of this patch.

If the goal is to synchronize writes to a highly contended log file, would it be better (and feasible) to have the individual threads/processes write timestamped output to separate streams that can be merged after-the-fact?

Such way is possible, but expensive. There must be a tool that collects these separate reports. And this tool must somehow know about build process so that it could pick up right files. Of course, it can pick up every report file in a build tree, but it this case somebody must clean the tree prior to build with getting statistics and this is a source of human errors. Such way is OK if there is some integrated system and running make is made by it and not by the user.

Instead, ability of compiler to log child process statistics to given file provide simple and fast way to obtain resource consumption data, which may be used not only by compiler developer but also by testers or other persons who need such data.

labath added a comment.May 4 2020, 1:30 AM

This looks good to me too. Just some stylistic comments inline.

llvm/lib/Support/Unix/Path.inc
34

add space before <

1055–1063

You could reduce nesting here by flipping the condition: if (flock(...) == 0) return std::error_code();

llvm/lib/Support/Windows/Path.inc
1267

No else after return.

labath added inline comments.May 4 2020, 1:30 AM
llvm/include/llvm/Support/FileSystem.h
1136–1137

Given the very different locking semantics on different OSs, it would be good to explicitly mention what this functions does (or does not) promise. Something along the lines of that this is guaranteed to work only if other processes also try to lock the file the same way, and that it is unspecified whether holding a lock prevents another process from modifying the said file.

llvm/lib/Support/Windows/Path.inc
1285

same here.

llvm/unittests/Support/Path.cpp
2045

Maybe a test where the second lock is taken on a separate thread, and the first lock is released while the second thread is waiting for it to become available?

2046–2047

Thanks. We can always revisit this if it turns out to be an issue somewhere.

sepavloff updated this revision to Diff 262090.May 5 2020, 6:09 AM

Updated patch

  • Small format corrections,
  • Added test on locking in threads.
sepavloff marked 7 inline comments as done.May 5 2020, 6:14 AM
sepavloff added inline comments.
llvm/unittests/Support/Path.cpp
2045

Added test lockFileThread.

labath added inline comments.May 5 2020, 7:09 AM
llvm/include/llvm/Support/FileSystem.h
1136

Is the "advisory" part really true on windows? My impression is that is not true (at least not in the sense that posix uses this term). WriteFile documentation says:

If part of the file is locked by another process and the write operation overlaps the locked portion, WriteFile fails.
1138

I find this "may change it" part very ambiguous. Did you mean that the process may assume that no other process changes that file (because the operating system guarantees that) or that the process must assume that no other process will change it (because there is no mechanism in the OS to prevent it).

Note that if my thoughts on WriteFile above are true, then neither of the two interpretations are correct, and all we can say is something like. "Attempts to lock the file by other processes will fail/block, but the caller should not assume that the file cannot be modified by uncooperating processes who access it without locking."

llvm/unittests/Support/Path.cpp
2080–2135

I get the impression this code is much more complicated then needed. There's a lot of synchronization going on but it still does not guarantee the that the file is unlocked while the other thread is inside the tryLock call (my goal was to get coverage for the while loop). How about something like:

EC = fs::tryLockFile(FD1);
ASSERT_NO_ERROR(EC);
EC = fs::tryLockFile(FD2);
ASSERT_ERROR(EC);
std::thread LockThread([&] { EC2 = fs::tryLockFile(FD2, std::chrono::minutes(1)); });
std::this_thread::sleep_for(std::chrono::seconds(1));
EC = fs::unlockFile(FD1);
ASSERT_NO_ERROR(EC);
LockThread.join();
ASSERT_NO_ERROR(EC2);
EC = fs::unlockFile(FD2);
ASSERT_NO_ERROR(EC);

It still does not guarantee that the other thread is inside the tryLockFile call, but it comes as close as we can get, and it avoids all the condition_variable overhead.

2089

I don't think this is a good use of auto per the coding standards http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable.

sepavloff updated this revision to Diff 262861.May 8 2020, 6:31 AM
sepavloff marked an inline comment as done.

Updated patch

  • Added blocking lock function,
  • Reorganized unit tests.
krytarowski added inline comments.May 11 2020, 5:01 AM
llvm/unittests/Support/Path.cpp
2046–2047

flock() is implemented as a system call on BSDs since 1983, this means that it is pretty universal.

I think this patch is mostly ok, but I'm still waiting on responses to two comments I made earlier.

llvm/include/llvm/Support/FileSystem.h
1138

It would be nice to get this clarified as well..

llvm/unittests/Support/Path.cpp
2080–2135

Waiting on a response to this. I still feel that this can be organized in a much simpler way without using so many explicit synchronization primitives.

sepavloff updated this revision to Diff 265477.May 21 2020, 5:09 AM

Updated patch

  • Fixed documentation for tryLockFile,
  • Added comments to unit test.
sepavloff marked 2 inline comments as done.May 21 2020, 5:17 AM
sepavloff added inline comments.
llvm/include/llvm/Support/FileSystem.h
1138

I changed the wording in this point.

llvm/unittests/Support/Path.cpp
2080–2135

I put a diagram explaining what the test does. Actually there are two events, which ensures that the first attempt to lock file occurs in Thread2 after Thread1 locks the file but before it releases it. So the two calls to tryLockFile checks both cases, successful and unsuccessful. Each event requires a mutex and a condvar, so we have 4 synchronization objects. Simpler variants do not guarantee checking the both cases.

labath added inline comments.May 28 2020, 9:02 AM
llvm/unittests/Support/Path.cpp
2080–2135

I believe that each of these events requires *synchronization*, but a condition variable is not the only way to achieve that. Starting and joining a thread is also a form of synchronization, and it is much simpler (and something you have to do anyway.

So, instead of starting a thread which, as a first order of business blocks on a condition variable, you could just delay starting the thread until such a time that the condition would be satisfied. Basically -- remove cv.wait_for and replace cv.notify with the creation of the thread object. Then, instead of waiting for the other thread to unblock you, you can just join it.

For the second condition variable, you just create a fresh thread again. Having the thread bodies be smaller would make the code much easier to follow, and it's not like we have to worry about the performance overhead of creating a bunch of small threads here...

sepavloff updated this revision to Diff 267154.May 29 2020, 2:53 AM

Updated unit test

sepavloff marked an inline comment as done.May 29 2020, 3:02 AM
sepavloff added inline comments.
llvm/unittests/Support/Path.cpp
2080–2135

Starting the second thread inside the first instead of waiting event indeed makes the test more compact, I rewrote the test accordingly.

As for the second lock try, which happens after the file is unlocked, using new thread makes the logic more obscure. I moved synchronization stuff into the new class Event, it must make the test shorter and clearer.

labath added inline comments.May 29 2020, 5:15 AM
llvm/unittests/Support/Path.cpp
2080–2135

Creating the Event class does make it a bit better, but I still maintain that this test is too complicated for what it really tests. Take a look at the following test:

TEST_F(FileSystemTest, lockFileThread) {
#if LLVM_ENABLE_THREADS
  int FD1, FD2;
  SmallString<64> TempPath;
  ASSERT_NO_ERROR(fs::createTemporaryFile("test", "temp", FD1, TempPath));
  FileRemover Cleanup(TempPath);
  ASSERT_NO_ERROR(fs::openFileForReadWrite(TempPath, FD2, fs::CD_OpenExisting,
                                           fs::OF_Append));


  ASSERT_NO_ERROR(fs::tryLockFile(FD1));
  ASSERT_ERROR(fs::tryLockFile(FD2));
  std::future<std::error_code> Future = std::async(std::launch::async, [&] {
    return fs::tryLockFile(FD2, std::chrono::seconds(5));
  });
  ASSERT_NO_ERROR(fs::unlockFile(FD1));
  ASSERT_NO_ERROR(Future.get());
  fs::unlockFile(FD2);

  ASSERT_NO_ERROR(fs::tryLockFile(FD1));
  ASSERT_ERROR(fs::tryLockFile(FD2));
  Future = std::async(std::launch::async, [&] { return fs::lockFile(FD2); });
  ASSERT_NO_ERROR(fs::unlockFile(FD1));
  ASSERT_NO_ERROR(Future.get());
  fs::unlockFile(FD2);

#endif
}

It tests the same thing as the test you wrote -- I obtained by applying series of semantics-preserving simplifications to it. This included fairly simple things like:

  • inlining
  • replacing patterns like std::thread(foo).join() with direct calls to foo
  • moving code which does not block outside of a thread -- e.g. asserting that a lock attempt fails does not need to be done in a separate thread because it does not block. Only the blocking calls do.
  • replacing a thread consisting of a single expression with a call to std::async
  • removing unused variables produced by all of this

However, the end result is a test which is about 3 times shorter than the original (28 lines vs 88), and it's almost linear -- each parallel section is only three lines long. I think it'd be pretty hard to argue that this is not more readable than the original.

sepavloff marked an inline comment as done.May 31 2020, 9:57 PM
sepavloff added inline comments.
llvm/unittests/Support/Path.cpp
2080–2135

Thank you very much for the code and detailed explanations!

The way your code checks the result of fs::tryLockFile means it relies on particular sequence of statement execution. For the test to be successful the main thread after the creation of a separate thread must continue execution and execute fs::unlockFile. In this case when the separate thread starts, it sees unlocked file. It is the most probable case, but not the single. If rescheduling of the main thread occurs after thread creation but before execution of fs::unlockFile or there is a core ready to execute the new thread, this test will fail.

There is no guarantee of ordering statement execution in different threads unless synchronization objects are used.

labath added inline comments.Jun 1 2020, 2:11 AM
llvm/unittests/Support/Path.cpp
2080–2135

I'm sorry, but I am unable to follow this line of reasoning. You're talking about this block of code, right?

std::future<std::error_code> Future = std::async(std::launch::async, [&] {
  return fs::tryLockFile(FD2, std::chrono::seconds(5));
});
ASSERT_NO_ERROR(fs::unlockFile(FD1));
ASSERT_NO_ERROR(Future.get());
fs::unlockFile(FD2);

Before the std::async statement, FD1 is locked, FD2 is unlocked. After it, there are two actions that can execute in arbitrary order (or concurrently): fs::unlockFile(FD1) on the main thread and fs::tryLockFile(FD2, std::chrono::seconds(5)) on the "async" thread. If the unlockFile executes first, it will unlock the file, and the subsequent tryLockFile will immediately succeed. If tryLockFile is scheduled first, then it will get a lock failure and will start to wait. While it waits the main thread will get scheduled, unlock the file, and then the tryLockFile will succeed again. If an evil scheduler decides to not schedule the main thread for five seconds, then tryLockFile will fail, but there's nothing we can do about that except increase the timeout.

This is the exact same situation that can happen with condition variables:

// thread 2
    ...
    DoUnlockEvent.signal();
    if (UseBlockingCall)
      ECT2b = fs::lockFile(FD2);
    else
      ECT2b = fs::tryLockFile(FD2, std::chrono::seconds(5));
    ...

// thread 1
    ECT1a = fs::tryLockFile(FD1);
    if (ECT1a)
      return;
    auto Thread2 = std::thread(Thread2Body);
    DoUnlockEvent.wait();
    ECT1b = fs::unlockFile(FD1);
    ...

After thread1 is unblocked by DoUnlockEvent.signal();, we again have two runnable threads (the if statement on thread 2, and the fs::unlockFile(FD1) call on thread 1) and it is up to the scheduler to determine their order.

It's not true that there are no synchronization objects here. We have std::future and the thread object contained within. Creation of the future (via std::async) establishes a happens-before relationship between the actions taken before std::async is called on the main thread, and the body of the async thread. This is exactly what happens with DoUnlockEvent.signal() and DoUnlockEvent.wait(). And calling future::get establishes a happens-before relationship between the body of the async thread and the code that comes after the get call. That's exactly what would happen with Thread2.join() in your example.

My point is that "launching a async thread" is a simpler way of synchronizing than "waiting on a cv", and "getting a future" is simpler than "joining a thread which 'returns' a result through a global variable".

sepavloff updated this revision to Diff 267893.Jun 2 2020, 8:10 AM

Updated unit test

sepavloff marked an inline comment as done.Jun 2 2020, 8:13 AM
sepavloff added inline comments.
llvm/unittests/Support/Path.cpp
2080–2135

Sorry, I didn't notice that you use operation with enough long timeout. In this case no flaky behavior should be observed.

I updated the unit test.

Thank you very much!

labath accepted this revision.Jun 2 2020, 9:51 AM

Awesome, let's get this out the door.

This revision is now accepted and ready to land.Jun 2 2020, 9:51 AM
This revision was automatically updated to reflect the committed changes.
ro added a subscriber: ro.Jun 3 2020, 1:27 AM

This patch broke the Solaris buildbots (Builder clang-solaris11-sparcv9 Build #5494, Builder clang-solaris11-amd64 Build #4477):

[24/2656] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Path.cpp.o
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/Path.cpp.o 
/usr/gcc/7/bin/c++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Support -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support -Iinclude -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/include -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/include/llvm/Support/Solaris -fPIC -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -fdiagnostics-color -ffunction-sections -fdata-sections -O3    -UNDEBUG -std=c++14  -fno-exceptions -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Path.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Path.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Path.cpp.o -c /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Path.cpp
In file included from /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Path.cpp:1151:0:
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc: In function ‘std::error_code llvm::sys::fs::tryLockFile(int, std::chrono::milliseconds)’:
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1055:21: error: ‘LOCK_EX’ was not declared in this scope
     if (::flock(FD, LOCK_EX | LOCK_NB) == 0)
                     ^~~~~~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1055:21: note: suggested alternative: ‘LOCK_HELD’
     if (::flock(FD, LOCK_EX | LOCK_NB) == 0)
                     ^~~~~~~
                     LOCK_HELD
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1055:31: error: ‘LOCK_NB’ was not declared in this scope
     if (::flock(FD, LOCK_EX | LOCK_NB) == 0)
                               ^~~~~~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1055:31: note: suggested alternative: ‘_CLOCK_T’
     if (::flock(FD, LOCK_EX | LOCK_NB) == 0)
                               ^~~~~~~
                               _CLOCK_T
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc: In function ‘std::error_code llvm::sys::fs::lockFile(int)’:
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1068:19: error: ‘LOCK_EX’ was not declared in this scope
   if (::flock(FD, LOCK_EX) == 0)
                   ^~~~~~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1068:19: note: suggested alternative: ‘LOCK_HELD’
   if (::flock(FD, LOCK_EX) == 0)
                   ^~~~~~~
                   LOCK_HELD
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc: In function ‘std::error_code llvm::sys::fs::unlockFile(int)’:
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1074:19: error: ‘LOCK_UN’ was not declared in this scope
   if (::flock(FD, LOCK_UN) == -1)
                   ^~~~~~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/lib/Support/Unix/Path.inc:1074:19: note: suggested alternative: ‘LONG_MIN’
   if (::flock(FD, LOCK_UN) == -1)
                   ^~~~~~~
                   LONG_MIN

As detailed e.g. in x/sys/unix: Solaris lacks Flock (plus related constants) #11113, Solaris
doesn't support the non-standard flock.

sepavloff reopened this revision.Jun 7 2020, 9:48 PM

Because flock is unsupported on Solaris, this patch has been rewritten on top of fcntl facility. It also required unit test to be rewritten as well.

If there is no objections, I will commit the new patch in couple of days.

This revision is now accepted and ready to land.Jun 7 2020, 9:48 PM
sepavloff updated this revision to Diff 269104.Jun 7 2020, 9:49 PM

Updated patch

  • flock call was replaced by fcntl in hope that this change would enable this facility on Solaris OS, where flock is unavailable.
  • As locks created by fcntl work on per-process basis, unit test based on multithreding does not work. A new unit test where locks are obtained in different processes was added.

If nobody will argue in couple of days, this patch will be committed.

MaskRay added inline comments.Jun 7 2020, 10:01 PM
llvm/lib/Support/Unix/Path.inc
1066

I feel uneasy with usleep(1000). Why is it needed?

labath added a comment.Jun 8 2020, 3:21 AM

It's a pitty about flock&solaris. It's a much saner api, and a lot closer to the windows implementation.

I guess we have to stick to fcntl then. The new version looks good. I just think it would be good to acknowledge the weirdness of the flock api in the method documentation. Maybe something like "Care should be taken when using this function in a multithreaded context, as it may not prevent other threads in the same process from obtaining a lock on the same file, even if they are using a different file descriptor."

llvm/lib/Support/Unix/Path.inc
1066

That's because the os provides no mechanism to wait for a lock to become available for a given amount of time (well.... one could use SIGALRMs to achieve that, but those come with problems of their own..)

Using exponential backoff for this might be a good idea, but I also think that can wait until this becomes a problem.

llvm/unittests/Support/ProgramTest.cpp
369 ↗(On Diff #269104)

Although this will work in this particular case, using Twines in this way is very dangerous (this is _almost_ the same as the one that http://llvm.org/docs/ProgrammersManual.html#llvm-adt-twine-h warns about). As performance is definitely not relevant here, it would be better to just use strings.

MaskRay added inline comments.Jun 8 2020, 8:42 AM
llvm/lib/Support/Unix/Path.inc
1066

The standard try_* (std::mutex::try_lock, pthread_mutex_trylock, etc) return immediately upon a failure. The backoff strategy should be done by the application to avoid the misnomer.

MaskRay requested changes to this revision.Jun 8 2020, 8:42 AM
This revision now requires changes to proceed.Jun 8 2020, 8:42 AM
sepavloff updated this revision to Diff 269429.Jun 9 2020, 1:01 AM

Updated patch

  • Rebased,
  • Added note about using in multithreaded environment,
  • Used std::string instead of Twine in unit test.
sepavloff marked 2 inline comments as done.Jun 9 2020, 1:16 AM

It's a pitty about flock&solaris. It's a much saner api, and a lot closer to the windows implementation.

I guess we have to stick to fcntl then. The new version looks good. I just think it would be good to acknowledge the weirdness of the flock api in the method documentation. Maybe something like "Care should be taken when using this function in a multithreaded context, as it may not prevent other threads in the same process from obtaining a lock on the same file, even if they are using a different file descriptor."

Added the note in the description of the function in FileSystem.h.

llvm/lib/Support/Unix/Path.inc
1066

If the function is called as tryLockFile(FD) it behaves exactly as the standard try_* functions, it returns immediately.

The backoff strategy should be done by the application to avoid the misnomer.

This use case is often and even standard library provides functions that attempt to do an operation in specified time (*_for). As the only expected usage of file locks is writing to log file by concurrent processes and it requires repeating the operation if unsuccessful, so integrations of the wait loop into the service function seems natural.

labath added inline comments.Jun 9 2020, 2:41 AM
llvm/lib/Support/Unix/Path.inc
1066

@MaskRay, I'm not sure how to interpret your comment -- would your concerns be addressed if this function was renamed to tryLockFileFor (and deleting the default argument value) ? Because I too think that would be a better name for this function (though I don't feel very strongly about that). tryLockFile could then be implemented as tryLockFileFor(seconds(0)), or not implemented at all, if it's not needed.

As for the retries, I agree with @sepavloff, that it makes sense to implement them here. The way I see it the "retries" are an implementation detail and the function as a whole still behaves like the other try_lock_for functions. E.g. description of std::timed_mutex::try_lock_for says: "tries to lock the mutex, returns if the mutex has been unavailable for the specified timeout duration" -- that is still true. The main difference is that the function returns an error_code instead of a bool, but there's not much that can be done about that, as this operation can fail for a lot of other reasons.

I think this is sort similar to std::atomic<T>::fetch_add. E.g., arm64 does not have the equivalent of the lock add instruction, so this function compiles to a ldaxr+stlxr loop, which attempts the update several times until successful -- the caller does not have to retry manually. (And down at the microarchitectural level, even lock add probably uses retries&timeouts to implement the functionality.)

MaskRay added inline comments.Jun 9 2020, 9:44 AM
llvm/lib/Support/Unix/Path.inc
1066

Adding both try and for to the name looks good to me.

POSIX uses timedlock which may be considered as well.

sepavloff updated this revision to Diff 269754.Jun 10 2020, 1:11 AM

Updated patch

  • tryLockFile was renamed to tryLockFileWithTimeout,
  • added new function tryLockFile, which returns immediately.
sepavloff marked an inline comment as done.Jun 10 2020, 1:21 AM

It's a pitty about flock&solaris. It's a much saner api, and a lot closer to the windows implementation.

I guess we have to stick to fcntl then. The new version looks good. I just think it would be good to acknowledge the weirdness of the flock api in the method documentation. Maybe something like "Care should be taken when using this function in a multithreaded context, as it may not prevent other threads in the same process from obtaining a lock on the same file, even if they are using a different file descriptor."

Added the note in the description of the function in FileSystem.h.

llvm/lib/Support/Unix/Path.inc
1066

Using for is convenient for standard functions, because they accept single argument of type std::chrono::duration, so an expression like:

f.wait_for(2500ms)

looks self-documented. In the case of tryLockFile it becomes something like:

tryLockFileFor(FileHandle, 2500ms)

which is pretty ugly.

I renamed tryLockFile to tryLockFileWithTimeout, which must be unambiguous, although long. Also a function tryLock was added, that does not wait and return immediately.

sepavloff updated this revision to Diff 269760.Jun 10 2020, 1:38 AM

Forgotten changes

Updated patch

Changed name of the function tryLockFileWithTimeout to tryLockFile. This
name is shorter thus more readable. Expression like tryLockFile(SomeFile)
looks natural, tryLockFile(SomeFile, 2500ms) also is understandable without
documentation. Setting default timeout value to zero is natural, as a single
attempt to lock, which is associted with try* functions, can be considered
as a particular case of very short timeout.

Any feedback?

sepavloff updated this revision to Diff 276917.Jul 9 2020, 11:17 PM

Rebased patch

@amccarth Could you please review Windows part of this patch? Thank you.

amccarth accepted this revision.Jul 27 2020, 10:33 AM

As I noted on April 30, I'm OK with the Windows portions of this. I didn't explicitly "Accept" because I didn't want to pre-empt the concerns of the other reviewers.

I see that @MaskRay is still marked as requesting revisions.

As I noted on April 30, I'm OK with the Windows portions of this. I didn't explicitly "Accept" because I didn't want to pre-empt the concerns of the other reviewers.

I see that @MaskRay is still marked as requesting revisions.

Sorry for trouble, the thread is so long that I missed your review.
Thank you!

MaskRay accepted this revision.Jul 27 2020, 11:59 PM
This revision is now accepted and ready to land.Jul 27 2020, 11:59 PM
This revision was automatically updated to reflect the committed changes.