This is an archive of the discontinued LLVM Phabricator instance.

[clang] Store filename per include instead of mutating filename
Needs ReviewPublic

Authored by benlangmuir on Nov 2 2022, 5:04 PM.

Details

Summary

Depends on https://reviews.llvm.org/D137303

This change is not ready to land, but I wanted to put it up for early review in case the approach looks wrong to anyone. I'm also curious if there are any related cases this does not handle. I verified it fixes the /showIncludes and preprocessor examples mentioned in the comments on https://reviews.llvm.org/D135220 and https://github.com/llvm/llvm-project/issues/58726.


When including a header, store the filename per include (per FileInfo)
rather than storing it once in the ContentCache and mutating it. This
fixes the spelling of includes for -E and --show-includes when working
with symlinks/hardlinks for headers included multiple times.

Logically, the filename and FileEntryRef belong to the FileInfo, but to
avoid increasing the size of SLocEntry we indirect them in a new struct
NamedContentCache.

TODO

  • Evaluate the performance cost of indirecting ContentCache an extra level.
  • Audit uses of FileInfos to see if we need to do extra for for NamedFileInfos.
  • Consider making the FileEntryRef changes here the default -- it doesn't make sense to me that FileEntryRef == or DenseMap would match FileEntry pointer semantics instead of filename semantics.

Diff Detail

Event Timeline

benlangmuir created this revision.Nov 2 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2022, 5:04 PM
benlangmuir requested review of this revision.Nov 2 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2022, 5:04 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
benlangmuir edited the summary of this revision. (Show Details)Nov 2 2022, 5:10 PM
benlangmuir added reviewers: dexonsmith, bnbarham, hans.

Consider making the FileEntryRef changes here the default -- it doesn't make sense to me that FileEntryRef == or DenseMap would match FileEntry pointer semantics instead of filename semantics.

Yeah, that was something I added, and I agree it's unfortunate / hard to reason about. It was a concession to make it easier to replace FileEntry* with FileEntryRef seamlessly in lots of places with NFC; the logic is "we want only one entry FileEntry in this map; let the first one win". Maybe most places would prefer a functionality change though, and, either way, the DenseMaps that need this unique-by-FileEntry* behaviour should probably be the ones using a custom DenseMapInfo.

dexonsmith added inline comments.Nov 2 2022, 5:28 PM
clang/include/clang/Basic/SourceManager.h
135–139

It's possible this could be factored out as a follow-up, and/or moved up to the Named ContentCache level. (Not sure... asking a question really...)

147–163

Have you thought about whether it makes sense for these fields to be shared for all FileEntryRefs to the same FileEntry? (Maybe it does! just checking)

261–263

I think this patch should (or does?) implement this FIXME.

265–268

I think you can delete this field. You're effectively implementing the FIXME.

dexonsmith added inline comments.Nov 2 2022, 5:34 PM
clang/include/clang/Basic/SourceManager.h
652–662

It feels expensive to have both of these maps. I wonder if instead we could do something like:

DenseMap<const FileEntry*, TinyPtrSet<NamedContentCache*>> FirstFileInfo;

I.e., lookup by FileEntry*, then linear search for the right FileEntryRef.

Similar idea: just point at NamedContentCache* directly here, but add NamedContentCache *NamedContentCache::Next to turn it into a linked list. Also lookup by FileEntry* and then a linear search for the right ref.

Note that FileInfo itself would have a direct pointer to the right thing; no need for a linear search.

Thanks for the timely review!

the DenseMaps that need this unique-by-FileEntry* behaviour should probably be the ones using a custom DenseMapInfo.

Yeah, that's the way I'm currently leaning. Make the default for DenseMap and == be path equality and have a separate DenseMapInfo and isSameFileEntry for cases that want it. Thankfully there seems to be very little code depending on either of these. Though if I take your suggestion for FileInfos it will avoid needing a DenseMap<FileInfoRef in this patch specifically.

clang/include/clang/Basic/SourceManager.h
135–139

I think this belongs here. It is used to lazily get the contents of the file (see the getBuffer functions).

147–163

Yes, I think this split actually makes the behaviour clearer. The remaining fields are related to the contents, while the extracted fields are for the name. You only need one SourceLineCache, etc. for one set of file contents.

261–263

I think it can still be None/nullptr for some buffers. I'll verify that. IIRC the trick to removing Filename is to create virtual file refs in some places that currently don't have one.

652–662

Yeah, the TinyPtrSet approach here is something I thought about. I started with the two DenseMaps since it's simpler, but I can look at this again. Probably either of your suggestions are faster in practice, even if they have a worse worst case in theory, though I'll try to confirm that before committing to one or the other.

dexonsmith added inline comments.Nov 2 2022, 6:28 PM
clang/include/clang/Basic/SourceManager.h
261–263

Oh, maybe you're right. But if you're able to implement, it might help pay for some of the memory overhead of your patch.

dexonsmith added inline comments.Nov 2 2022, 6:35 PM
clang/include/clang/Basic/SourceManager.h
135–139

You're right. I wonder what if that laziness is inherently load-bearing. Seems odd that clang would want to create a FileID without very quickly trying to open it. But maybe that's important somewhere/somehow. Certainly off-topic for this patch though.

bnbarham added inline comments.Nov 3 2022, 10:09 AM
clang/include/clang/Basic/SourceManager.h
147–163

Makes sense to me. Might be worth adding a comment for that when you're done + fixing up the comment on ContentsEntry since it still references Entry (which I assume is what OrigEntry used to be called?).

314

I believe NamedContentCache can be const here now that we're not setting its filename.