This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO][ELF] Add a ThinLTO warning if cache_size_bytes or cache_size_files is too small for the current link job.
ClosedPublic

Authored by MaggieYi on Oct 10 2022, 8:18 AM.

Details

Summary

In this patch, a ThinLTO warning is added if cache_size_bytes or cache_size_files is too small for the current link job. The warning recommends the user to consider adjusting --thinlto-cache-policy.

A specific case for ThinLTO cache pruning is that the current build is huge, and the cache wasn't big enough to hold the intermediate object files of that build. So in doing that build, a file would be cached, and later in that same build it would be evicted. This was significantly decreasing the effectiveness of the cache. By giving this warning, the user could identify the required cache size/files and improve ThinLTO link speed.

Diff Detail

Event Timeline

MaggieYi created this revision.Oct 10 2022, 8:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2022, 8:18 AM
MaggieYi requested review of this revision.Oct 10 2022, 8:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2022, 8:18 AM

Thanks! Couple suggestions below.

lld/ELF/LTO.cpp
345

Can this handling be incorporated into pruneCache which is called a bit further below? The advantage of that is that it gets invoked by all cache users, which is more than just lld/ELF (i.e. the other lld formats, the old LTO API, and looks like lldb uses this as well). You could maybe just emit the warning the first time a file is pruned due to either of the size limits.

lld/test/ELF/lto/cache-warnings.ll
27

I'm concerned that this test will end up being brittle and start failing as the compiler evolves to change the IR and make the bitcode larger or smaller. I think you can use command substitution via $() in the RUN line to perhaps compute the size of the files and +/- 5 from the total.

MaskRay added inline comments.Oct 11 2022, 10:08 PM
lld/ELF/LTO.cpp
203

message(

Drop trailing .. Please follow the style of other messages and diagnostics.

Be terse in some wording; suggest the number of created files (number) exceeds the maximum ... (number)

lld/test/ELF/lto/cache-warnings.ll
3

Non-RUN non-CHECK comments use ;; for new tests.

MaggieYi updated this revision to Diff 468959.Oct 19 2022, 10:31 AM

Update the patch following the reviewers comments, which include:

  1. Just emit a warning the first time a file is pruned due to either of the size/number limits and add a test for it.
  2. Use python and command substitution via $() in the RUN line to compute the size of the files and +/- 5 from the total.
  3. Use lld 'warn()' to generate the warning.
  4. Use ';;' for Non-RUN non-CHECK comments.

Thanks Teresa and Fangrui for the comments. Sorry for the delay in updating the patch. I have been trying to find a way to get the warning invoked by all cache users.

As mentioned above, I cannot incorporate the handling into pruneCache.

I managed to get the total cache file sizes, which come out from the current link, by modifying the llvm/lib/LTO/LTO.cpp file.

diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index dc28b681a151..a24f3149bead 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -74,6 +74,10 @@ cl::opt<bool> EnableLTOInternalization(
     "enable-lto-internalization", cl::init(true), cl::Hidden,
     cl::desc("Enable global value internalization in LTO"));

+/// The required cache size for the current link job. If thinLTOCacheDir is
+/// not specified, it is disabled by default.
+static uint64_t RequireCacheSize = 0;
+
 // Computes a unique hash for the Module considering the current list of
 // export/import and other global analysis results.
 // The hash is produced in \p Key.
@@ -1292,16 +1296,32 @@ public:
       return RunThinBackend(AddStream);

     SmallString<40> Key;
+    uint64_t CachedFileSize = 0;
     // The module may be cached, this helps handling it.
     computeLTOCacheKey(Key, Conf, CombinedIndex, ModuleID, ImportList,
                        ExportList, ResolvedODR, DefinedGlobals, CfiFunctionDefs,
                        CfiFunctionDecls);
-    Expected<AddStreamFn> CacheAddStreamOrErr = Cache(Task, Key);
+    Expected<AddStreamFn> CacheAddStreamOrErr = Cache(Task, Key, &CachedFileSize);
     if (Error Err = CacheAddStreamOrErr.takeError())
       return Err;
     AddStreamFn &CacheAddStream = *CacheAddStreamOrErr;
-    if (CacheAddStream)
-      return RunThinBackend(CacheAddStream);
+    if (CacheAddStream) {
+      auto Result = RunThinBackend(CacheAddStream);
+      // Get the file size of new created cache file.
+      Expected<std::unique_ptr<CachedFileStream>> StreamOrErr =
+          CacheAddStream(Task);
+      if (Error Err = StreamOrErr.takeError())
+        report_fatal_error(std::move(Err));
+      std::unique_ptr<CachedFileStream> &Stream = *StreamOrErr;
+      auto FileName = Stream->ObjectPathName;
+      std::error_code EC = llvm::sys::fs::file_size(FileName, CachedFileSize);
+      if (EC)
+        report_fatal_error(Twine("Unable to obtain file size for a file ") +
+                           FileName + ": " + EC.message() + "\n");
+      RequireCacheSize += CachedFileSize;
+      return Result;
+    }
+    RequireCacheSize += CachedFileSize;

     return Error::success();
   }

I also have to change FileCache by adding a new pointer type parameter CachedFileSize:

diff --git a/llvm/include/llvm/Support/Caching.h b/llvm/include/llvm/Support/Caching.h
index bef23ae757f2..017bfd232afb 100644
--- a/llvm/include/llvm/Support/Caching.h
+++ b/llvm/include/llvm/Support/Caching.h
@@ -52,8 +52,8 @@ using AddStreamFn =
 ///
 /// if (AddStreamFn AddStream = Cache(Task, Key))
 ///   ProduceContent(AddStream);
-using FileCache =
-    std::function<Expected<AddStreamFn>(unsigned Task, StringRef Key)>;
+using FileCache = std::function<Expected<AddStreamFn>(
+    unsigned Task, StringRef Key, uint64_t *CachedFileSize)>;

diff --git a/llvm/lib/Support/Caching.cpp b/llvm/lib/Support/Caching.cpp
index d6902f660e39..6b9f7bd4f1d0 100644
--- a/llvm/lib/Support/Caching.cpp
+++ b/llvm/lib/Support/Caching.cpp
@@ -37,7 +37,8 @@ Expected<FileCache> llvm::localCache(Twine CacheNameRef,
   TempFilePrefixRef.toVector(TempFilePrefix);
   CacheDirectoryPathRef.toVector(CacheDirectoryPath);

-  return [=](unsigned Task, StringRef Key) -> Expected<AddStreamFn> {
+  return [=](unsigned Task, StringRef Key,
+             uint64_t *CachedFileSize = nullptr) -> Expected<AddStreamFn> {
     // This choice of file name allows the cache to be pruned (see pruneCache()
     // in include/llvm/Support/CachePruning.h).
     SmallString<64> EntryPath;
@@ -48,6 +49,12 @@ Expected<FileCache> llvm::localCache(Twine CacheNameRef,
         Twine(EntryPath), sys::fs::OF_UpdateAtime, &ResultPath);
     std::error_code EC;
     if (FDOrErr) {
+      if (CachedFileSize) {
+        sys::fs::file_status Status;
+        EC = sys::fs::status(*FDOrErr, Status);
+        if (!EC)
+          *CachedFileSize = Status.getSize();
+      }
       ErrorOr<std::unique_ptr<MemoryBuffer>> MBOrErr =
           MemoryBuffer::getOpenFile(*FDOrErr, EntryPath,
                                     /*FileSize=*/-1,

If the developers use FileCache but don’t need to get the cached file sizes, they need to pass the nullptr, e.g.

Expected<AddStreamFn> CacheAddStreamOrErr = Cache(Task, UniqueKey, nullptr);

I am not sure whether this method is acceptable?

Could you please let me know your thoughts on this?

Thanks,

lld/ELF/LTO.cpp
203

Thanks, fixed it by using 'warn()' function and modified the message information.

345

Thanks.

  1. Just emit a warning the first time a file is pruned and add a test for it.
  2. I am not able to put this handling into pruneCache since pruneCache performs pruning for all cache files in the given cache directory. In this patch, we add up the sizes of the cache files that come out of the current link job. The warning is only given if cache_size_bytes or cache_size_files is too small for the total sizes/number of cache files that come out of the current link job. In the pruneCache function, there is not any information about cache files which were used by the current link job. Therefore, I cannot use pruneCache.
lld/test/ELF/lto/cache-warnings.ll
3

Thanks, fixed.

27

Thanks, the test is fixed following your suggestion.

MaggieYi updated this revision to Diff 468979.Oct 19 2022, 11:08 AM

Updating the correct patch.

Changes include:

  1. Just emit a warning the first time a file is pruned due to either of the size/number limits and add a test for this.
  2. Use python and command substitution via $() in the RUN line to compute the size of the files and +/- 5 from the total.
  3. Use lld 'warn()' to generate the warning.
  4. Use ';;' for Non-RUN non-CHECK comments.
MaskRay added inline comments.Oct 19 2022, 11:27 AM
lld/ELF/LTO.cpp
210

lld uses camelCase variable names.

218

Drop trailing period:)

I haven't read through the patch yet.

MaggieYi updated this revision to Diff 469260.Oct 20 2022, 9:33 AM

The changes include:

  1. Corrected the cache-warnings.ll test to pass both Linux and Windows.
  2. Corrected variable names.
  3. Dropped trailing in the message.
MaggieYi added inline comments.Oct 20 2022, 9:39 AM
lld/ELF/LTO.cpp
210

Thanks, fixed.

218

Thanks, fixed :) .

Gentle ping ..

I think I might be able to incorporate checkCacheCapacity into pruneCache. I am working on this.

MaggieYi updated this revision to Diff 472686.Nov 2 2022, 10:51 AM

Update the patch following the reviewers' comments, which include:

  1. Incorporate checkCacheCapacity into pruneCache.
  2. Emit two warnings if a file is pruned due to reach both size and number limits. If both cache files' size and number exceed the limits, I think it's better to print separated warnings in order to let the user adjusts cache_size_bytes and cache_size_files at the same time.
  3. Use llvm support 'llvm::errs()' to generate the warning.
  4. Add support and lit tests for other lld formats: COFF, MachO and Wasm.
  5. For COFF format, I removed 'WARN-NOT' check since the warning is printed twice. The reason is that 'LLD_IN_TEST' is set to 2 for COFF. 'LLD_IN_TEST' determines how many times main is run inside each process. Therefore, main is run twice which result in two warnings.
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2022, 10:51 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Thanks, I like the new version with the checking centralized in pruneCache. A few suggestions below.

lld/test/ELF/lto/cache-warnings.ll
30

I'm not sure invoking lld via python subprocess is ideal for testing. I think this could be done much more simply via shell substitution via $() (you have to REQUIRE shell though). Even if you need python to compute the total file sizes initially you could then do something like

RUN: ld.lld .... --thinlto-cache-policy=prune_interval=0s:cache_size_bytes=$(($(cat size.txt) + 5)) ...

I know I have seen $(cat somefile.txt) used before, so I assume the evaluation described above should also work - can you give it a try to simplify a lot of this?

llvm/include/llvm/Support/CachePruning.h
83

If all clients are passing in files with your patch, probably just go ahead and remove the default parameter.

llvm/lib/Support/CachePruning.cpp
270

Probably using the warning interface is better than errs(). You could probably pass in the ErrorHandler if needed.

Also, @MaskRay had an earlier suggestion about the format of the message that I think would be good to incorporate (including the number of files in the limit and the actual number of files that caused the limit to be exceeded): https://reviews.llvm.org/D135590#inline-1308533

MaggieYi updated this revision to Diff 473788.EditedNov 7 2022, 2:11 PM

Thanks for your comments.

I have updated the patch following reviewers' comments:

  1. Modified the format of the message including the number of files in the limit and the actual number of files that caused the limit to be exceeded.
  2. Corrected variable names since LLVM uses 'SamelCase' variable names.
  3. Since warnings are moved from 'lld' to 'llvm/support', used llvm 'WithColor::warning()' (instead of lld 'warn()') to generate the warning.
  4. Simplified the tests using shell substitution via $().
MaggieYi added inline comments.Nov 7 2022, 2:21 PM
lld/test/ELF/lto/cache-warnings.ll
30

Thanks, I have simplified the tests following your suggestion.

llvm/include/llvm/Support/CachePruning.h
83

There is an exceptional case. In the 'lldb' project, in the constructor of 'DataFileCache', 'pruneCache' function is called before 'localCache' function, which is different from other cases. In this case, the cache pruning is not considered/caused by the current 'localCache'. The new warnings should not be given in this case. Therefore, I use the default parameter to cover the above situation.

llvm/lib/Support/CachePruning.cpp
270

Thanks, since warnings are moved from 'lld' to 'llvm/support', I have used llvm 'WithColor::warning()' (instead of 'lld::warn()') to generate the warning.

@MaskRay, sorry to miss your earlier suggestion. @tejohnson, thanks for your reminding.
I have updated the warning message following the suggestion.

tejohnson accepted this revision.Nov 8 2022, 11:51 AM

lgtm but please wait a bit to see if @MaskRay has any more comments

This revision is now accepted and ready to land.Nov 8 2022, 11:51 AM
MaskRay accepted this revision as: MaskRay.Nov 8 2022, 9:24 PM
MaskRay added inline comments.
lld/test/COFF/lto-cache-warnings.ll
2

Does any feature require shell? I feel that every RUN line is normal and can be interpreted by lit, so shell is likely unneeded. But you can check this by running the test on a Windows machine which I assume you have access to:)

7

Nit: -r is more common than -R.

Note, if you do ; RUN: rm -rf %t && mkdir %t && cd %t, you can drop every occurrence %t below and just use an arbitrary word as the directory name, and you can avoid the slightly strange %t.cache reference in %python -c "import os, sys; os.chdir(r\"%t.cache\");

24

Use single quotes to avoid some escapes.

llvm/lib/Support/CachePruning.cpp
263

files.size() is greater than the number of inputs by one

264

llvmcache.timestamp was just mentioned above, so repeating it here seems verbose (and if we decide to rename, there is one comment needing to update).

267

Technically it may not be +1. It's + config->ltoPartitions (see config->ltoPartitions = args::getInteger(args, OPT_lto_partitions, 1); and ParallelCodeGenParallelismLevel). I suggest that you avoid being so specific in this sentence.

269

const size_t

Drop a pair of parentheses below: if (Policy.MaxSizeFiles && ActualNums > Policy.MaxSizeFiles)

MaggieYi updated this revision to Diff 474332.Nov 9 2022, 12:59 PM

@tejohnson and @MaskRay, thanks for reviewing the patch.

I have updated the patch following @MaskRay's suggestions.

@MaskRay, please let me know if there are any further changes required. Thanks.

lld/test/COFF/lto-cache-warnings.ll
2

As mentioned in the Teresa's comment, cache_size_bytes=$(($(cat %t.size.txt)+5)) requires shell. I have checked the test on a Windows machine and it failed in the RUN command which contains cache_size_bytes=$(($(cat %t.size.txt)+5)).

7

Thanks, fixed! For Macho test, I used ;RUN: cd %t, since the .o files exist in the %t folder as well.

24

Fixed, thanks.

llvm/lib/Support/CachePruning.cpp
264

Thanks, I have removed the timestamp file name.

267

Thanks for pointing this out, I have modified the sentence.

269

Fixed, thanks.

MaskRay accepted this revision.Nov 10 2022, 1:20 PM

Hi @MaggieYi, thanks for the updates. You can probably mark comments as done to indicate there is no further action item. Note: if you mark some comments, you can either click "Submit" or upload a new diff (arc diff) to cause all comments to actually be marked as done.

lld/test/COFF/lto-cache-warnings.ll
23

Is fd 2 intended to be redirected to %t.size.txt as well?

MaggieYi updated this revision to Diff 474684.Nov 11 2022, 1:24 AM
MaggieYi marked 17 inline comments as done.

Removed '2>&1' from 'RUN: %python'.

lld/test/COFF/lto-cache-warnings.ll
23

Thanks, I have removed '2>&1'.

@MaskRay, thanks again. I have marked comments as done.

MaskRay accepted this revision.Nov 12 2022, 9:31 AM