This is an archive of the discontinued LLVM Phabricator instance.

Improve module.pcm lock file performance on machines with high core counts
ClosedPublic

Authored by Ladd on Oct 29 2019, 11:01 AM.

Details

Summary

When building a large Xcode project with multiple module dependencies, and mixed Objective-C & Swift, I observed a large number of clang processes stalling at zero CPU for 30+ seconds throughout the build. This was especially prevalent on my 18-core iMac Pro.

After some sampling, the major cause appears to be the lock file implementation for precompiled modules in the module cache. When the lock is heavily contended by multiple clang processes, the exponential backoff runs in lockstep, with some of the processes sleeping for 30+ seconds in order to acquire the file lock.

In the attached patch, I implemented a more aggressive polling mechanism that limits the sleep interval to a max of 500ms, and randomizes the wait time. I preserved a limited form of exponential backoff. I also updated the code to use cross-platform timing, thread sleep, and random number capabilities available in C++11.

On iMac Pro (2.3 GHz Intel Xeon W, 18 core):

Xcode 11.1 bundled clang:

502.2 seconds (average of 5 runs)

Custom clang build with LockFileManager patch applied:

276.6 seconds (average of 5 runs)

This is a 1.82x speedup for this use case.

On MacBook Pro (4 core 3.1GHz Intel i7):

Xcode 11.1 bundled clang:

539.4 seconds (average of 2 runs)

Custom clang build with LockFileManager patch applied:

509.5 seconds (average of 2 runs)

As expected, machines with fewer cores benefit less from this change.

Call graph:
    2992 Thread_393602   DispatchQueue_1: com.apple.main-thread  (serial)
      2992 start  (in libdyld.dylib) + 1  [0x7fff6a1683d5]
        2992 main  (in clang) + 297  [0x1097a1059]
          2992 driver_main(int, char const**)  (in clang) + 2803  [0x1097a5513]
            2992 cc1_main(llvm::ArrayRef<char const*>, char const*, void*)  (in clang) + 1608  [0x1097a7cc8]
              2992 clang::ExecuteCompilerInvocation(clang::CompilerInstance*)  (in clang) + 3299  [0x1097dace3]
                2992 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)  (in clang) + 509  [0x1097dcc1d]
                  2992 clang::FrontendAction::Execute()  (in clang) + 42  [0x109818b3a]
                    2992 clang::ParseAST(clang::Sema&, bool, bool)  (in clang) + 185  [0x10981b369]
                      2992 clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&)  (in clang) + 37  [0x10983e9b5]
                        2992 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&)  (in clang) + 141  [0x10983ecfd]
                          2992 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*)  (in clang) + 695  [0x10983f3b7]
                            2992 clang::Parser::ParseObjCAtDirectives(clang::Parser::ParsedAttributesWithRange&)  (in clang) + 637  [0x10a9be9bd]
                              2992 clang::Parser::ParseModuleImport(clang::SourceLocation)  (in clang) + 170  [0x10c4841ba]
                                2992 clang::Parser::ParseModuleName(clang::SourceLocation, llvm::SmallVectorImpl<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >&, bool)  (in clang) + 503  [0x10c485267]
                                  2992 clang::Preprocessor::Lex(clang::Token&)  (in clang) + 316  [0x1098285cc]
                                    2992 clang::Preprocessor::LexAfterModuleImport(clang::Token&)  (in clang) + 690  [0x10cc7af62]
                                      2992 clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool)  (in clang) + 7989  [0x10bba6535]
                                        2992 compileAndLoadModule(clang::CompilerInstance&, clang::SourceLocation, clang::SourceLocation, clang::Module*, llvm::StringRef)  (in clang) + 296  [0x10bba8318]
                                          2992 llvm::LockFileManager::waitForUnlock()  (in clang) + 91  [0x10b6953ab]
                                            2992 nanosleep  (in libsystem_c.dylib) + 199  [0x7fff6a22c914]
                                              2992 __semwait_signal  (in libsystem_kernel.dylib) + 10  [0x7fff6a2a0f32]

Diff Detail

Event Timeline

Ladd created this revision.Oct 29 2019, 11:01 AM
Ladd edited the summary of this revision. (Show Details)Oct 29 2019, 11:03 AM
Ladd edited the summary of this revision. (Show Details)Oct 29 2019, 11:21 AM
benlangmuir edited reviewers, added: bruno; removed: benlangmuir.Oct 29 2019, 11:36 AM

I haven't work on this in a long time; replacing myself with Bruno as reviewer.

Ladd added a comment.Nov 18 2019, 3:43 PM

Hi all,

I'm a new contributor to LLVM -- pinging again on this patch. It's been sitting for three weeks -- any chance I can get a review?

The underlying change is pretty straightforward, and produces a 1.8x build job speedup for my specific scenario.

arphaman edited reviewers, added: Bigcheese; removed: doug.gregor.Nov 18 2019, 3:58 PM
arphaman added subscribers: Bigcheese, arphaman.

Hi all,

I'm a new contributor to LLVM -- pinging again on this patch. It's been sitting for three weeks -- any chance I can get a review?

The underlying change is pretty straightforward, and produces a 1.8x build job speedup for my specific scenario.

Hi, I added @Bigcheese who should be able to help you land this patch.

A side comment: please upload your patch with full context (git -U9999), as it will help the reviewers to review it.

Bigcheese accepted this revision.Nov 19 2019, 10:38 AM

A few minor points, but with those fixed this looks good to me. Thanks for the patch!

llvm/lib/Support/LockFileManager.cpp
49

80 column limit.

317

80 col

339–340

These declarations should be sunk to their smallest enclosing scope.

341–342

Pretty sure these don't need to be arrays, you should just be able to take the address of a kevent.

341–343

These don't need the struct keyword in C++, although it's already used elsewhere in the code.

353

No space before & here, and 80 col.

370

No space before ( here.

390

80 col

392

Remove extra spaces after ( and before ).

This revision is now accepted and ready to land.Nov 19 2019, 10:38 AM
Ladd updated this revision to Diff 230122.Nov 19 2019, 12:21 PM

Thanks for the review -- revised diff contains formatting updates per review. I had to leave the struct in for the kevent declarations, or I get:

LockFileManager.cpp:348:9: error: must use

'struct' tag to refer to type 'kevent' in this scope
Ladd marked 8 inline comments as done.Nov 19 2019, 12:22 PM
Ladd marked an inline comment as done.Nov 19 2019, 12:25 PM
Ladd added a comment.Nov 19 2019, 12:29 PM

I don't have commit access -- please commit for me if it looks good in revised form.

Hi @Ladd. I found this path too late:) Please check https://reviews.llvm.org/D70563. Few corner cases are covered there, which aren't covered in your patch.

llvm/lib/Support/LockFileManager.cpp
357

If lock file will be deleted in between opening file and calling this method, kQueue, will wait for full time and will end up with a timeout

you can check this behavior in source code, provided in https://reviews.llvm.org/D70563

358

In the original code, there's an additional check if the lock owner has died.
How about listening to that event as well?
again, you can take a pick here https://reviews.llvm.org/D70563

Ladd added a comment.Nov 25 2019, 12:14 PM

Thanks @PaulTaykalo -- I'll try to take a look this week.

@Ladd Any progress on this one?

I would love to see one of these land, but I'll need to find some time to do a deeper look at the two.

Ladd added a comment.EditedFeb 12 2020, 11:54 AM

Apologies -- I haven't had time to try to reconcile these patches. The edge cases that @PaulTaykalo identified seem worth considering.

I'd be open to simplifying my patch to remove the kqueue, and using just the random wait behavior. This produces basically the same performance win in my measurements, and is not subject to the edge case. We could then pursue the more clever kqueue-based waiting as a follow-on.

Ladd added a comment.Mar 4 2020, 12:17 PM

@Bigcheese -- any more thoughts on this one? It would be helpful to have some version of this fix in the release compiler.

@Bigcheese -- any more thoughts on this one? It would be helpful to have some version of this fix in the release compiler.

I think I'd be fine with just the random wait then. Can you prep a patch for that?

Ladd updated this revision to Diff 248842.EditedMar 6 2020, 3:02 PM
Ladd edited the summary of this revision. (Show Details)

Simplified version of the change, removing event-based waiting. Also, adopted and changed definition of MaxSeconds, new in LLVM since I wrote the original patch.

Ladd added a comment.Mar 13 2020, 10:38 AM

@Bigcheese -- does the revised patch look OK? If so, what do I need to do to get it merged?

Bigcheese accepted this revision.Mar 13 2020, 12:42 PM

Sorry for the delay. LGTM. I'll commit with you as the author.

Ladd added a comment.Mar 18 2020, 2:33 PM

Squeaky wheel here -- should I expect to see this committed soon?

Macro shipit:

This revision was automatically updated to reflect the committed changes.
Ladd added a comment.Mar 23 2020, 3:25 PM

Thanks! Greatly appreciated.

@Ladd Hi there.
Are you going to have another patch for an event-based solution? Or should I make it using all the changes were it this patch originally?
Also, I don't have a big project that I can play with, so in case if we're going with another patch from my side, can I ask you to try the event-based solution and check the performance? Thanks

Ladd added a comment.Mar 26 2020, 8:25 AM

I’m not planning on doing an event-based version — feel free to give it a try. My time is somewhat limited, but I can try to benchmark your solution when it’s ready.