Page MenuHomePhabricator

[clang][DirectoryWatcher] Upstream DirectoryWatcher

Authored by jkorous on Feb 19 2019, 5:13 PM.



This patch contains implementation of DirectoryWatcher from

We are starting new push to upstream the index-while-building feature in clang and this is one of the dependencies.

Original author is David Farler, other contributors are Argyrios Kyrtzidis, Thomas Roughton and Alex Lorenz.

Part of this implementation was included in the review below so I am adding @tschuett and @yvvan in case they are interested.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gribozavr added inline comments.May 20 2019, 7:51 AM

"a relative path" -- relative to what?


Is it really a path to the directory?




Users are not going to use this typedef in their code, so I feel like it is obfuscating code -- could you expand it in places where it is used?


Make it a static function in the DirectoryWatcher?


Don't write what something is used for (it can change), write what something is.

"If true, all async actions are requested to stop."


I think this is too much code in the initializer list. Could you move the body of the lambda into a method, and call it from this lambda?


NAME_MAX is 255, add some for inotify_event, and multiply by 30... we get almost 8 Kb on the stack. Should we allocate on the heap instead?


Please use "//" comments.


Please use AlignedCharArray from llvm/include/llvm/Support/AlignOf.h


What is the role of the timeout and why does it need to be so small?


I think this is too much code in the initializer list. Could you move the body of the lambda into a method, and call it from this lambda?


No need for the semicolon.


Use llvm::make_unique (and let unique_ptr do an implicit conversion to base).


No need to check. macOS will always have this file. If it is not there, it is a big issue anyway.


... LINK_FLAGS "-framework CoreServices"

No need for an intermediate variable.








I'm certain this sleep will be flaky on heavily-loaded CI machines. If you are going to leave it as a sleep, please make it 1s. But is there really no better way?


I would strongly prefer if you used the gmock matchers (like Contains); as written, when the test fails, the only error we would get would be like "expected: true, actual: false".

jkorous marked 2 inline comments as done.May 20 2019, 12:56 PM

Thanks for taking a look @gribozavr!

I briefly scanned the rest of your comments and I agree with you (or don't really have a strong opinion) in all cases. I'll work on that today.


The whole idea is that we can't block on read() if we ever want to stop watching the directory, release resources (file descriptors, threads) and correctly destruct the DirectoryWatcher instance either

  • because of a bug in some other thread in the implementation
  • or asynchronous client action (e. g. destructor being called) in main application thread

The timeout adds latency in those scenarios.


That was exactly my thinking! Honestly, I don't know - I wasn't able to come up with any reasonably simple, deterministic approach even on a single platform :(
I eventually picked 0.1s as a tradeoff between slowing the test for everyone and getting less false positives.

The problem as I understand it is that we're making changes and monitoring them asynchronously with no guarantee from the kernel API (true for FSEvents, not 100% about inotify) about when (if) we receive notifications.

If you have any idea for robust testing approach I'd be totally happy to use it.

jkorous marked 36 inline comments as done.EditedMay 20 2019, 7:51 PM

I addressed most of the comments.

What is left:

  • rewrite tests with gmock matchers
  • write test for metadata of a file in watched dir being changed

I used WatchedDirRemoved.


I personally didn't like the C-like std::string& error in the interface. But I feel like having a polymorphic class (which doesn't really fit into ErrorOr) is lesser evil than the original pimpl design.


I reworded the comment.


You're right - seems like static factory methods are the LLVM way.


Actually this is wrong but for a different reason - these are link dependencies of the implementation. Tests (or any other client) shouldn't care about it.

jkorous updated this revision to Diff 200389.May 20 2019, 7:52 PM
jkorous marked 4 inline comments as done.

Addressed comments.
Changed semantics of one of std::atomic<bool> in linux implementation.

gribozavr added inline comments.Tue, May 21, 9:24 AM

"alignof()" expression is standard C++ since C++11. (No need for underscores.)


use 'auto' to store the return value of make_unique?


Waking up 1000 times a second is not great -- it will lead to battery drain on laptops etc.

Please see for non-busy-wait options.

kadircet added inline comments.Wed, May 22, 10:09 AM

nit: get rid of the else


naming, it should be capital P


while (!Stop) ?


why the loop always tries to empty the queue? what would be broken if we simply stopped if Stop was set?

Also if that is important current implementation doesn't seem to be achiving that, since some events can be pushed to the queue after while loop exits.


maybe add a condition variable for queue state and waint on it?


why not use two condition variables for stop and finishedinitscan?

jkorous marked 9 inline comments as done.Wed, May 22, 12:50 PM

Thanks for taking a look Kadir!
After yesterday's discussion with Dmitri I removed all those busy waits. Seems like the code is not much more complex now. I am going to update the diff and off to fixing the tests.


I changed the design - stopping condition for this loop is now finding WatcherGotInvalidated in the queue itself.

jkorous updated this revision to Diff 200810.Wed, May 22, 12:50 PM
jkorous marked an inline comment as done.

Remove busy waits.

jkorous updated this revision to Diff 200811.Wed, May 22, 12:52 PM
jkorous updated this revision to Diff 201114.Thu, May 23, 6:43 PM

Reimplemented tests with std::futures which allowed to use more generous timeout while not slowing down the happy paths.

jkorous marked 6 inline comments as done.Thu, May 23, 6:49 PM
jkorous added inline comments.

I found a way how to use more generous timeout without slowing down the test for every run - inspired by Argyrios' idea about using different thread + semaphore.

I am using 3 seconds for now. If that's not enough, just let me know.


Since the tests are using are based on something like "eventual correctness" instead of one-time check I didn't use gmock matchers but implemented some custom diagnostics.

Example of the failed test:

/Users/jankorous/src/llvm-monorepo/llvm-project/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:185: Failure
Value of: *TestConsumer.Result()
  Actual: false
Expected: true
Expected but not seen non-initial events:
Removed a
Unexpected non-initial events seen:
Added a
jkorous marked an inline comment as done.Fri, May 24, 1:01 PM
jkorous updated this revision to Diff 201321.Fri, May 24, 1:25 PM

Specify what "file modified" means and add a test for metadata change

jkorous updated this revision to Diff 201338.Fri, May 24, 2:37 PM
  • simplify link libraries in cmake
  • fix Release build (messed-up asserts)
jkorous updated this revision to Diff 201744.Tue, May 28, 12:44 PM

Remove DirectoryWatcher::Event::EventKind::Added

One more thing.

On macOS FSEvents are coalesced more or less at will and it became quite apparent when I was creating automatic tests - I was for example receiving coalesced events Added & Modified & Removed. We had a discussion about how to deal with this and it turned out that the existing client doesn't actually care whether a file was created or whether it was modified. I believe that by removing the EventKind::Added we still keep the interface sane while removing the ambiguity.

Very nice testing approach!


Looks like triple slashes on empty lines got removed, splitting the doc comment.


Three slashes for doc comments?


Don't write what something is used for currently, such comments go out of date quickly. (Just delete the last sentence.)





Three slashes for doc comments.


Extra semicolon.


Since it closes the file descriptors in the destructor, I feel like it should also be responsible for calling pipe in the constructor.


Three slashes for doc comments.


"consumes", "pushes"


Please add a period at the end of the comment (everywhere in the patch).


Please name functions consistently -- there's both consume() that starts with lowercase, and Result() that starts with uppercase.

Please refer to the current naming rules in the style guide and apply everywhere in the patch.


"If the following assertions fail, it is a sign that ..."

Also you can stream the message into the EXPECT_TRUE, it will be printed if the assertion fails.

EXPECT_TRUE(...) << "whatever";


Test names start with an uppercase letter (InitialScanSync). Please apply everywhere in the patch.


Add /*waitForInitialSync=*/ ? (everywhere in the patch)


Delete the comma and wrap onto one line.


80 columns.

jkorous marked 20 inline comments as done.Fri, May 31, 12:19 PM
jkorous added inline comments.

Sure, no problem.

I naturally tend to err on the side of over-commenting as I think I'd appreciate it myself if I had to understand the code without prior knowledge - not saying it's intentional or that I have a strong reason to do that though. You seem to have a strong preference for not having out-of-date comments with low-ish information value. Just out of curiosity - is it based on any particular reason or experience?


I know what you mean - my "oop feel" was telling me it's wrong too. It's not just the pipe in this class but also the inotify descriptor in the watcher class.

The problem is that the only reasonable way how to communicate failures from constructors that I am aware of are exceptions which we don't use in llvm. That's why I moved most of the stuff that can fail even before any work is actually started to the factory method (with the exception of epoll file descriptor as that felt like making its scope unnecessarily larger).

Thinking about it now, I am starting to doubt that it makes life any easier for client code as it still has to cope with failure communicated as WatcherGotInvalidated event via receiver.

What do you think?


Sorry about that. This is definitely my weak point.

jkorous updated this revision to Diff 202467.Fri, May 31, 12:21 PM
jkorous marked 2 inline comments as done.

Addressed comments.

gribozavr accepted this revision.Fri, May 31, 12:56 PM
gribozavr added inline comments.

is it based on any particular reason or experience?

Yes, primarily working on the Swift compiler and the standard library, when everything was changing very quickly. My current work also confirms the same -- a lot of time when I see a comment about the "current usage" or other incidental information that is not the contract of the API, it tends to be outdated. Approximately nobody will change such a comment when another user is added ("I'm just reusing the code..."), even when the quoted user already became non-representative of the usage pattern, or the usage pattern has changed. However, when changing the API contract people typically do change the comment.

It also makes sense to me in abstract: reading that X happens to be used for Y does not necessarily help understand X better -- it is only a cross-reference that I could find myself with an IDE command; I still need to understand the design of Y and the interaction with X, and then using my past experience infer what X was intended to be.

Saying that X is intended to be used only by Y is a different story of course, that's design documentation.

Providing an example of usage is also fine, but it should be phrased as an example that can't become stale.


You could add a factory function to SemaphorePipe, but... I feel like trying to recover from a failure in the pipe() call is a bit like trying to recover from a memory allocation failure. The process is probably hitting a file descriptor limit or something like that, and is likely going to fail anyway. I'd probably trigger a fatal error if pipe() fails.

This class is a lot like pthread_mutex_init -- it can fail "gracefully", but there's no way for the caller to recover -- the caller needs a mutex to proceed.

This revision is now accepted and ready to land.Fri, May 31, 12:56 PM
gribozavr added inline comments.Fri, May 31, 12:58 PM

Use pipe2() with O_CLOEXEC, to avoid leaking the file descriptors to child processes?

jkorous marked 3 inline comments as done.Mon, Jun 3, 5:56 PM

I fixed the rest.

There are still some questions you raised that I just responded to and kept them as not Done. Feel free to take a look. If nothing comes up I'll commit this on Wednesday.


Thank you.


I see what you mean and mostly agree. Anyways, let's allow clients to handle such funny moments themselves as much as they can.

I'll factor out the pipe call to the factory method.


You're right, I didn't account for this. Added the flag also to inotify_init1() and epoll_create1() calls.

jkorous updated this revision to Diff 202833.Mon, Jun 3, 5:58 PM

linux implementation

  • factory method for SemaphorePipe
  • *_CLOEXEC flags