Page MenuHomePhabricator

Provide -fsource-dir flag in Clang
AcceptedPublic

Authored by phosek on Sep 18 2020, 12:17 PM.

Details

Summary

This flag can be used to relativize source paths againts a given
directory. Compared to -f*-prefix-map flags, it avoids the absolute
path on the compiler command line analogously to -fdebug-compile-dir,
which is important distributed compilation scenarios.

Diff Detail

Event Timeline

phosek created this revision.Sep 18 2020, 12:17 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 18 2020, 12:17 PM
phosek requested review of this revision.Sep 18 2020, 12:17 PM

This change is trying to address the issues raised in D83154. There are still some open questions:

  • Is -fsource-dir the best name for this flag?
  • I'm not sure if make_relative should be applied to all source paths, or only paths that start with SourceDir which would exclude system paths outside of the source directory (e.g. it's probably undesirable to relativize paths to /usr/include)?
  • If we decide to exclude source paths outside of the source directory, should we support -fsource-dir to be specified more then once to handle multiple source directories?
keith added a comment.Sep 21 2020, 5:08 PM

This change is trying to address the issues raised in D83154. There are still some open questions:

  • Is -fsource-dir the best name for this flag?

I think this name works as long as its recommended usage doesn't end up including directories outside of the source root.

  • I'm not sure if make_relative should be applied to all source paths, or only paths that start with SourceDir which would exclude system paths outside of the source directory (e.g. it's probably undesirable to relativize paths to /usr/include)?

This is one place where I think the *-prefix-map flags have a UX advantage. For example in the Apple toolchain, there are places where the Xcode paths are embedded in this data, and you may want to exclude them by doing -f*-prefix-map=/path/to/Xcode.app=STANDARD_XCODE_PATH and remapping that later as needed to increase reproducibility

  • If we decide to exclude source paths outside of the source directory, should we support -fsource-dir to be specified more then once to handle multiple source directories?

I think this would be a bit confusing usage wise, so I would say it should act like -fdebug-compilation-dir where only 1 can be passed.

keith accepted this revision.Mon, Nov 2, 8:43 AM
This revision is now accepted and ready to land.Mon, Nov 2, 8:43 AM
phosek updated this revision to Diff 303567.Fri, Nov 6, 3:17 PM

@rnk @vsk does this look good to you as well?

Ping? Is it OK to land this?

I missed this before, and it's an interesting problem.

I have a few questions:

  • I'm not clear on the expected interaction between the current working directory of the compiler and -fsource-dir. What happens if they're different?
  • I'm curious if you've considered other solutions that tie the source directory more closely to the current working directory, such as a flag -frelative-to-working-dir, which makes things relative to the compiler's working directory. Is that too restrictive?
  • Have you looked at the similar "MakeRelative" logic in ASTWriter.cpp? Can this logic be shared somehow?
  • What's the expected behaviour when building modules implicitly with -fmodules? What value (if any) should they get for -fsource-dir?
  • How should this affect the dependency output (e.g., clang-scan-deps and .d files), if at all?
  • Have you considered having -fsource-dir add entries to the prefix maps, rather than duplicating logic?

I am a bit concerned about mixing the concepts of "relative base path" with "source directory", since the latter sounds to me like where pristine sources live and that's not always the right relative base path.

For example, consider a layout like this:

/working/dir/
             sources/
                     lib/a/a.c
                     include/a/a.h
                     include/a/a.td
             build/
                   include/lib/a/a.inc
                   lib/a/a.o
             toolchain/usr/
                           bin/clang
                           lib/clang/11.0/include/stdint.h
             sdk/usr/include/
                             stdint.h
                             c++/v1/
                                    vector
                                    stdint.h

It seems cleaner to have everything relative to /working/dir, not /working/dir/sources, since you can name all compiler inputs and outputs with the former. But the latter is conceptually the source directory.

WDYT?

clang/lib/Lex/PPMacroExpansion.cpp
1545–1547

What does this do with -fsource-dir /a/b for the file /a/b.c? (It doesn't seem quite right for it to turn it into .c or ../b.c...)

I missed this before, and it's an interesting problem.

I have a few questions:

  • I'm not clear on the expected interaction between the current working directory of the compiler and -fsource-dir. What happens if they're different?

They're completely independent, -fsource-dir is only used for coverage mapping and macros.

  • I'm curious if you've considered other solutions that tie the source directory more closely to the current working directory, such as a flag -frelative-to-working-dir, which makes things relative to the compiler's working directory. Is that too restrictive?

I think it depends on the use cases and user expectations. For example, in our case we have layout like this:

/working/dir/
             sources/
                     lib/a/a.c
                     include/a/a.h
             out/
                     default/

Here out/default is the working directory so if we used -frelative-to-working-dir, we would end up with paths like ../../sources/lib/a/a.c in the coverage mapping or macro expansions like __FILE__. While this may be acceptable for macro expansions, it's definitely undesirable in coverage mapping.

I can think of some potential solutions. We use could use -fprofile-prefix-map=../../= to strip the ../../ prefix. Alternatively we could introduce a new flags in tools like llvm-cov and llvm-profdata to strip the prefix during post-processing.

  • Have you looked at the similar "MakeRelative" logic in ASTWriter.cpp? Can this logic be shared somehow?

I wasn't aware of that bit but once we decide on the approach, I'll try to deduplicate the logic.

  • What's the expected behaviour when building modules implicitly with -fmodules? What value (if any) should they get for -fsource-dir?

Right now this flag doesn't apply to modules at all so there's no effect.

  • How should this affect the dependency output (e.g., clang-scan-deps and .d files), if at all?

The same as for modules. We could consider extending it to support modules and dependency files as well although I'd prefer to do it separately.

  • Have you considered having -fsource-dir add entries to the prefix maps, rather than duplicating logic?

That might be an option if we land D83154 first. It wasn't clear if we're going to have both but it might make sense to introduce -fprofile-prefix-map and then build -fsource-dir on top.

I am a bit concerned about mixing the concepts of "relative base path" with "source directory", since the latter sounds to me like where pristine sources live and that's not always the right relative base path.

For example, consider a layout like this:

/working/dir/
             sources/
                     lib/a/a.c
                     include/a/a.h
                     include/a/a.td
             build/
                   include/lib/a/a.inc
                   lib/a/a.o
             toolchain/usr/
                           bin/clang
                           lib/clang/11.0/include/stdint.h
             sdk/usr/include/
                             stdint.h
                             c++/v1/
                                    vector
                                    stdint.h

It seems cleaner to have everything relative to /working/dir, not /working/dir/sources, since you can name all compiler inputs and outputs with the former. But the latter is conceptually the source directory.

Yes, I agree. Ultimately it's up to the user to decide what output they expect. I guess this is one advantage of using the -frelative-to-working-dir approach where it's unambiguous and user can use other flags to do additional rewriting if needed.

WDYT?

Thanks for the detailed feedback!

clang/lib/Lex/PPMacroExpansion.cpp
1545–1547

That's definitely undesirable. We could introduce a new function like llvm::sys::path::has_prefix to compare path components instead pure string comparison to avoid this issue.

-fsource-dir is only used for coverage mapping and macros.

Oh, interesting; it would be nice to consistently make all paths relative paths.

Here out/default is the working directory so if we used -frelative-to-working-dir, we would end up with paths like ../../sources/lib/a/a.c in the coverage mapping or macro expansions like __FILE__. While this may be acceptable for macro expansions, it's definitely undesirable in coverage mapping.

I can think of some potential solutions. We use could use -fprofile-prefix-map=../../= to strip the ../../ prefix.

This solution makes sense to me.

  • What's the expected behaviour when building modules implicitly with -fmodules? What value (if any) should they get for -fsource-dir?

Right now this flag doesn't apply to modules at all so there's no effect.

I'm not quite following this. I don't see logic to avoid this flag applying to modules, although I may be missing something. I believe modules built explicitly via the command-line will happily accept and apply this option, and, similarly, I think modules built implicitly by cloning a CompilerInvocation will pick up the same value as the importing CompilerInstance.

It seems cleaner to have everything relative to /working/dir, not /working/dir/sources, since you can name all compiler inputs and outputs with the former. But the latter is conceptually the source directory.

Yes, I agree. Ultimately it's up to the user to decide what output they expect. I guess this is one advantage of using the -frelative-to-working-dir approach where it's unambiguous and user can use other flags to do additional rewriting if needed.

If we want to handle it consistently everywhere, I wonder if we'd want a feature in the FileManager to do this, potentially handling many other cases. E.g., instead of canonicalizing to an absolute path, it would canonicalize to a path relative to working directory. Then callers just need to worry about the prefix map. I'm not sure though.