This is an archive of the discontinued LLVM Phabricator instance.

[clang][modules] Mark fewer identifiers as out-of-date

Authored by jansvoboda11 on May 23 2023, 6:44 PM.



In clang-scan-deps contexts, the number of interesting identifiers in PCM files is fairly low (only macros), while the number of identifiers in the importing instance is high (builtins). Marking the whole identifier table out-of-date triggers lots of benign and expensive calls to ASTReader::updateOutOfDateIdentifiers(). (That unfortunately happens even for unused identifiers due to SemaRef.IdResolver.begin(II) line in ASTWriter::WriteASTCore().)

This patch makes the main code path more similar to C++ modules, where the PCM files have INTERESTING_IDENTIFIERS section which lists identifiers that get created in the identifier table of the importing instance and marked as out-of-date. The only difference is that the main code path doesn't *create* identifiers in the table and relies on the importing instance calling ASTReader::get() when creating new identifier on-demand. It only marks existing identifiers as out-of-date.

This speeds up clang-scan-deps by 5-10%. Impact on explicitly built modules TBD.

Diff Detail

Event Timeline

jansvoboda11 created this revision.May 23 2023, 6:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 6:44 PM
Herald added a subscriber: ributzka. · View Herald Transcript
jansvoboda11 requested review of this revision.May 23 2023, 6:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 6:44 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
benlangmuir added inline comments.May 31 2023, 10:10 AM

Why did this change from getOwn to get?

jansvoboda11 added inline comments.May 31 2023, 10:47 AM

That's a fluke on my part, this should've remained getOwn(). These don't differ very much for C++ modules, but getOwn() is the intended functionality here for sure. I'll fix that in next revision.

Rebase and switch back to IdentifierTable::getOwn()

jansvoboda11 marked an inline comment as done.Jun 1 2023, 2:50 PM
benlangmuir accepted this revision.Jun 1 2023, 3:02 PM

LGTM, but I would prefer at least one more person who understands the identifier handling code review this. It's been years since I thought about this code.

This revision is now accepted and ready to land.Jun 1 2023, 3:02 PM
jansvoboda11 added a project: Restricted Project.Jun 1 2023, 4:28 PM

Thanks, Ben. Maybe someone subscribed to clang-modules could take a look?

Bigcheese accepted this revision.Jun 6 2023, 2:59 PM

Looks good, although I'm not an expert on this bit either.

This revision was landed with ongoing or failed builds.Jul 4 2023, 3:58 AM
This revision was automatically updated to reflect the committed changes.

I finally got around to testing this patch on compiles of explicit modules. The number of elapsed CPU cycles decreased (by up to 1.8% for large modules) and the cumulative size of PCM files decreased by 2% (though small modules are a few bytes larger). These findings give me confidence to land this.