This is an archive of the discontinued LLVM Phabricator instance.

[modules] Add `-fno-absolute-module-directory` flag for relocatable modules
Needs ReviewPublic

Authored by andrewjcg on Sep 1 2018, 12:49 AM.

Details

Summary

Currently, modules built using module map files embed the path to the directory
that houses their inputs (e.g. headers and module map file) into the output
module file. This path is embedded as an absolute path and the various input
files are either relativized to this path (if they are underneath it) or also
embedded as an absolute path.

This adds a flag which disables making the module directory an absolute path.
To support relative module directory paths, this diff also updates the serialized
AST to write out an extra bit with serialized paths to indicate whether they are
relative to the module directory or not (before this diff, these were handled by
absolutify paths that weren't relative the module directory).

Diff Detail

Event Timeline

andrewjcg created this revision.Sep 1 2018, 12:49 AM
andrewjcg added a subscriber: elsteveogrande.

I'm not sure this is the best approach, but I wasn't sure of a better one (to support module files w/o absolute paths). Another approach I tried, was relativizing the other input files (from outside the module directory) using chains of ../ (e.g. ../../../../../other/modules/module.modulemap), but this causes issues when symlinks appear in the module directory path. Another potential option could be to add a bit to the serialized module format for each input, to allow some inputs to mark themselves as relative to the CWD, rather than the module directory.

andrewjcg updated this revision to Diff 163621.Sep 1 2018, 1:16 AM

fix umbrella writing

andrewjcg updated this revision to Diff 166709.Sep 24 2018, 9:54 AM

Dropping the module directory entirely and fully resolving paths on serialization
broke some things during deserialization, specifically when the deserializer wanted
to update paths to use an alternate module directory.

This switches to a different strategy of only relativizing the paths that are
actually under the module home dir, and adding a bit to the serialized paths to
indiciate this. This bit is read on deserializiation to determine whether the
path is resolved against the module directory or not.

andrewjcg retitled this revision from [modules] Add `-fdisable-module-directory` flag for relocatable modules to [modules] Add `-fno-absolute-module-directory` flag for relocatable modules.Sep 24 2018, 9:54 AM
andrewjcg edited the summary of this revision. (Show Details)
modocache added subscribers: manmanren, bruno, modocache.

Friendly ping! Could someone recommend a reviewer for this? Or is there something wrong with the patch?

Looking at the blame I can see @manmanren; I believe you're not actively working in this part of the codebase, but could you review, or recommend someone who could? Or maybe @bruno?

I am not sure if this is the best approach, but the implementation looks okay to me. @bruno @rsmith What do you think?

Manman

include/clang/Serialization/ASTReader.h
2241 ↗(On Diff #166709)

Can you expand on this comment? Skip the relative bit?

include/clang/Serialization/ASTWriter.h
642 ↗(On Diff #166709)

Can you add comment here on how we set IsRelativeModuleDirectory?

650 ↗(On Diff #166709)

Can you update the comment to say that record is updated with the extra bit?

I don't think we need to change the serialization format for this: a serialized path beginning with / is already treated as absolute and any other path is already treated as relative, so we don't need a flag to carry that information.

I'm also not convinced we need to put this behind a flag. It would seem reasonable to me to simply always emit the MODULE_DIRECTORY as relative (if we found the module via a relative path), at least for an explicitly-built module. (For an implicitly built module, we probably need to track the absolute path so we can detect whether we found the right PCM file.)

lib/Serialization/ASTWriter.cpp
4531–4552

The intent of this function is to use an absolute form of both the file path and BaseDirectory so that we can strip off BaseDirectory even if the file path ends up absolute somehow. Rather than changing BaseDirectory above and then changing the code here to compensate, could you only change the code that writes out the MODULE_DIRECTORY record to write a relative path?

bruno added a comment.Nov 29 2018, 6:33 PM

Thanks for working on this @andrewjcg

I'm also not convinced we need to put this behind a flag. It would seem reasonable to me to simply always emit the MODULE_DIRECTORY as relative (if we found the module via a relative path), at least for an explicitly-built module. (For an implicitly built module, we probably need to track the absolute path so we can detect whether we found the right PCM file.)

+1

andrewjcg marked 2 inline comments as done.Nov 29 2018, 8:14 PM

I don't think we need to change the serialization format for this: a serialized path beginning with / is already treated as absolute and any other path is already treated as relative, so we don't need a flag to carry that information.

But I think we need this since we now have two types of relative paths -- a CWD-relative path and a module-home-relative path -- and we use this flag to discern them for the AST reader. Previously, because cleanPathForOutput would always absolutify input paths, we didn't need this flag -- any relative path was relative the module home and all other paths were absolute.

I attempted another take on this diff which just made all paths relative the CWD (and did away with module home relative paths), but this didn't work since the reader substitutes in a new module home.

I'm also not convinced we need to put this behind a flag. It would seem reasonable to me to simply always emit the MODULE_DIRECTORY as relative (if we found the module via a relative path), at least for an explicitly-built module.

Yeah, makes sense. Will fix this.

lib/Serialization/ASTWriter.cpp
4531–4552

So we also several non-module-home relative paths get processed here, so just I think we'd have to throw away the output of cleanPathForOutput if the Changed == false, to preserve the input relative path when non-module-home-relative?

rsmith added a comment.Dec 3 2018, 6:21 PM

I don't think we need to change the serialization format for this: a serialized path beginning with / is already treated as absolute and any other path is already treated as relative, so we don't need a flag to carry that information.

But I think we need this since we now have two types of relative paths -- a CWD-relative path and a module-home-relative path -- and we use this flag to discern them for the AST reader. Previously, because cleanPathForOutput would always absolutify input paths, we didn't need this flag -- any relative path was relative the module home and all other paths were absolute.

Ah, of course, that's exactly what I was missing :) I guess I live in a too--fmodule-map-file-home-is-cwd-centric world where the two are always the same (even when modules are relocated).

Following the 0-1-infinity principle, I think it might make sense to have a 'catalog' of prefixes for paths (relative to the module, relative to the compiler's CWD, relative to the compiler's resource directory, relative to the sysroot) that we try stripping off, with different markers to say which one we removed. (Right now, if you relocate the compiler, you invalidate any .pcm file that references files in its resource directory, for instance.) But this seems like a step in a good direction.

Just checking that we're on the same page here... suppose I do this:

  • compile module described in foo/module.modulemap (with no -fmodule-map-file-home-is-cwd, so module-relative paths have the foo/ prefix stripped off them)
  • use -Ibar/ to find some textual headers

Then, if I relocate the foo/ module to elsewhere/foo and pass the corresponding pcm file to a compilation using that module, I will still expect to find the bar/ files referenced by the pcm file relative to the compiler's working directory, not in elsewhere/bar. Is that what you're expecting, or would you expect the file paths to be emitted as module-relative ../bar/... paths? (Note that the latter can break if foo is a symlink.)

lib/Serialization/ASTReader.cpp
2094–2096 ↗(On Diff #166709)

This'd be more readable (throughout the patch) spelled as IsRelativeToModuleDirectory. (I was wondering "What is a relative module directory?")