This is an archive of the discontinued LLVM Phabricator instance.

[clang][deps] Fix traversal of precompiled dependencies
ClosedPublic

Authored by jansvoboda11 on Mar 12 2022, 12:17 PM.

Details

Summary

The code for traversing precompiled dependencies is somewhat complicated and contains a dangling iterator bug.

This patch simplifies the code and fixes the bug.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Mar 12 2022, 12:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2022, 12:17 PM
jansvoboda11 requested review of this revision.Mar 12 2022, 12:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2022, 12:17 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I'm not a big fan of moving to the recursive call. Creating a separate StringMap for each iteration seems awfully wasteful.

I'd prefer fixing the iterative algorithm so that it's correct and easy to verify. This can be done with a stack that points at the stable pointers into the StringMap:

StringMap<std::string> Imports;
SmallVector<StringMapEntry<std::string> *> Worklist;
PrebuiltModuleListener Listener(Imports, Worklist, InputFiles, VisitInputFiles);

while (!Worklist.empty()) {
  StringMapEntry<std::string> *Import = Worklist.pop_back_val();
  // Deal with "Import".
}

and in PrebuiltModuleListener only push to the worklist for newly-discovered imports:

void visitImport(StringRef ModuleName, StringRef Filename) override {
  auto I = PrebuiltModuleFiles.insert({ModuleName, Filename.str()});
  if (I.second)
    Worklist.push_back(&*I.first);
}

Also, is it possible to add a test to exercise the (fixed) the dangling iterator bug?

Throw away recursive implementation, add reproducer

Fair enough, iterative implementation will be better.

I simplified it a bit by using the in-out parameter ModuleFiles to keep track of visited files. I also switched to using SmallVector for the newly discovered (not-yet-visited) imports, which allows using the suggested pop_back_val and avoids using (potentially dangling) iterator.

The test case is a bit unwieldy, since the old implementation only failed when the StringMap got rehashed (with 16 entries).

jansvoboda11 edited the summary of this revision. (Show Details)Mar 15 2022, 4:03 AM
dexonsmith accepted this revision.Mar 15 2022, 12:48 PM

LGTM, with another comment inline (up to you whether to do something with it).

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
91

These filenames will usually be longer than std::string's small storage, requiring separate heap allocations for each filename. Can the worklist just point at the stable pointer StringMapEntry<std::string>*, where a filename copy and heap allocation has already been made? (This has also has a nice side effect of handling 3x as many worklist items before needing to allocate, since the element size goes from 24B to 8B.)

clang/test/ClangScanDeps/modules-pch-dangling.c
12

split-file makes this so-called "unwieldy" testcase amazingly easy to read!

This revision is now accepted and ready to land.Mar 15 2022, 12:48 PM
This revision was landed with ongoing or failed builds.Mar 16 2022, 4:18 AM
This revision was automatically updated to reflect the committed changes.