This is an archive of the discontinued LLVM Phabricator instance.

Speed up preamble building by replacing the slow translateFile call by a new, faster isMainFile check
ClosedPublic

Authored by arphaman on May 12 2020, 6:46 PM.

Details

Summary

[clang][Preprocessor] Replace the slow translateFile call by a new, faster isMainFile check

The commit 3c28a2dc6bdc331e5a0d8097a5fa59d06682b9d0 introduced the check that checks if we're
trying to re-enter a main file when building a preamble. Unfortunately this slowed down the preamble
compilation by 80-90% in some test cases, as translateFile is really slow. This change checks
to see if the FileEntry is the main file without calling translateFile, but by using the new
isMainFile check instead. This speeds up preamble building by 1.5-2x for certain test cases that we have.

rdar://59361291

Diff Detail

Event Timeline

arphaman created this revision.May 12 2020, 6:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2020, 6:46 PM
arphaman updated this revision to Diff 263781.May 13 2020, 10:47 AM

fix assertion in the unit test.

IIUC the issue is that SourceManager::translateFile() basically consists of two blocks of code:

// First, check the main file ID, since it is common to look for a
// location in the main file.
if (MainFileID.isValid()) {
  bool Invalid = false;
  const SLocEntry &MainSLoc = getSLocEntry(MainFileID, &Invalid);
  if (Invalid)
    return FileID();

  if (MainSLoc.isFile()) {
    const ContentCache *MainContentCache =
        MainSLoc.getFile().getContentCache();
    if (MainContentCache && MainContentCache->OrigEntry == SourceFile)
      return MainFileID;
  }
}

and

  // The location we're looking for isn't in the main file; look
  // through all of the local source locations.
...

The comments suggest that the first block is a heuristic related to our case and the second block I would assume being the expensive part. SourceManager::getFileEntryRefForID implementation seems similar to the first block.

It makes sense to me to avoid the expensive search. I'm just wondering - how much speedup do we get with caching the value?

IIUC the issue is that SourceManager::translateFile() basically consists of two blocks of code:

// First, check the main file ID, since it is common to look for a
// location in the main file.
if (MainFileID.isValid()) {
  bool Invalid = false;
  const SLocEntry &MainSLoc = getSLocEntry(MainFileID, &Invalid);
  if (Invalid)
    return FileID();

  if (MainSLoc.isFile()) {
    const ContentCache *MainContentCache =
        MainSLoc.getFile().getContentCache();
    if (MainContentCache && MainContentCache->OrigEntry == SourceFile)
      return MainFileID;
  }
}

and

  // The location we're looking for isn't in the main file; look
  // through all of the local source locations.
...

The comments suggest that the first block is a heuristic related to our case and the second block I would assume being the expensive part. SourceManager::getFileEntryRefForID implementation seems similar to the first block.

It makes sense to me to avoid the expensive search. I'm just wondering - how much speedup do we get with caching the value?

Good question, let me check if caching can be avoided.

arphaman updated this revision to Diff 263847.May 13 2020, 1:51 PM

Drop caching, it's not need for the speedup.

@jkorous it looks like dropping caching works too, this achieves similar perf results.

Could you please add a test for this method to clang/unittests/Basic/SourceManagerTest.cpp?

clang/include/clang/Basic/SourceManager.h
816

Please mention the assertion this method makes about the state of the instance it's called on.

clang/lib/Basic/SourceManager.cpp
397

I don't really understand all the details here but shouldn't we use this comparison?

bool operator==(const FileEntryRef &LHS, const FileEntryRef &RHS)
arphaman updated this revision to Diff 264044.May 14 2020, 11:27 AM

Added test and a comment.

arphaman marked an inline comment as done.May 14 2020, 11:29 AM
arphaman added inline comments.
clang/lib/Basic/SourceManager.cpp
397

The == comparison can return false for symlinks due to the different name, but we want to return true for symlinks to the main file as well.

jkorous accepted this revision.May 14 2020, 12:57 PM

LGTM! Thanks Alex!

This revision is now accepted and ready to land.May 14 2020, 12:57 PM
dexonsmith added inline comments.May 14 2020, 1:01 PM
clang/lib/Basic/SourceManager.cpp
397

I suggest adding a comment to the code explaining this when you land it, something like:

// Compare UIDs directly so that symlinks compare equal.
This revision was automatically updated to reflect the committed changes.