This is an archive of the discontinued LLVM Phabricator instance.

[clang] NFCI: Use FileEntryRef in FileManager's UID mapping
AbandonedPublic

Authored by jansvoboda11 on Jan 27 2023, 12:08 PM.

Details

Summary

Per the FileManager documentation, VirtualFileEntries is a subset of SeenFileEntries. This means we only have to iterate over the latter to create complete UID mapping. This allows us to map UID to FileEntryRef instead of FileEntry.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Jan 27 2023, 12:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 12:08 PM
Herald added a subscriber: ributzka. · View Herald Transcript
jansvoboda11 requested review of this revision.Jan 27 2023, 12:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 12:08 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
rmaz accepted this revision.Jan 27 2023, 12:22 PM
This revision is now accepted and ready to land.Jan 27 2023, 12:22 PM

Iterating over SeenFileEntries and skipping VirtualFileEntries looks good to me.

I'm not sure about changing this to FileEntryRef though. The way this API works you only get one per unique file, which is well suited to FileEntry * which has the same uniquing behaviour. In this case you're going to get a FileEntryRef, but *which* ref you get is non-deterministic if there were multiple refs for the same file (depends on hash table iteration order). Also, it will never give you a vfs mapped path since it's skipping those (V.dyn_cast<FileEntry *>()).

I think if we want to change this to FileEntryRef it needs to be deterministic which ref you get.

rmaz added a comment.Jan 27 2023, 1:52 PM

I think if we want to change this to FileEntryRef it needs to be deterministic which ref you get.

I think this might be the root of the problem we are seeing: depending on build configuration sometimes our build inputs are hard links that in the case of identical inputs point to the same inode. In that case we are seeing non-deterministic header paths serialized in pcm files. IIUC the header files are serialized based in their unique ID, so it wouldn't be possible to handle this case, is this right?

In that case we are seeing non-deterministic header paths serialized in pcm files. IIUC the header files are serialized based in their unique ID, so it wouldn't be possible to handle this case, is this right?

If compiling a single pcm accesses multiple hard links with the same UID, then it would not be possible to use the set of UIDs to get the "right path". At best we could make it get a deterministic path -- e.g. if we tracked the order of access.

If compiling a single pcm accesses only one hard link but it's an implicit module build and the overall compiler invocation accesses another one with the same UID, you are currently in the same situation as above, though in theory you could avoid it by not sharing the FileManager across implicit module builds. In clang-scan-deps we don't share the FileManager and we get away with it because we have a separate filesystem caching layer. In a normal implicit modules build I don't know how much this would cost.

If you're on a platform that supports fcntl with F_GETPATH such as Darwin, another possible source of non-determinism is that the underlying OS may return a non-deterministic RealPath from openFileForRead when there are multiple hard links.

Sounds like the easies way out is serializing all FileEntryRef objects we know in deterministic order.

rmaz added a comment.Jan 27 2023, 2:17 PM

If compiling a single pcm accesses multiple hard links with the same UID, then it would not be possible to use the set of UIDs to get the "right path". At best we could make it get a deterministic path -- e.g. if we tracked the order of access.

With you so far. Is there a reason we need to use the UIDs in this case? Would it be possible to refactor GetUniqueIDMapping to instead populate an array with all the FileEntryRefs that had been seen and serialize that instead?

rmaz added a comment.Feb 6 2023, 9:24 AM

Sounds like the easies way out is serializing all FileEntryRef objects we know in deterministic order.

How about something like https://reviews.llvm.org/D143414 ?

jansvoboda11 abandoned this revision.Feb 7 2023, 10:03 AM