This is an archive of the discontinued LLVM Phabricator instance.

[clang][modules] NFCI: Pragma diagnostic mappings: write/read FileID instead of SourceLocation
AcceptedPublic

Authored by jansvoboda11 on Nov 1 2022, 5:04 PM.

Details

Summary

For pragma diagnostic mappings, we always write/read SourceLocation with offset 0. This is equivalent to just writing a FileID, which is exactly what this patch starts doing.

Depends on D137211.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Nov 1 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 1 2022, 5:04 PM
Herald added a subscriber: ributzka. · View Herald Transcript
jansvoboda11 requested review of this revision.Nov 1 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 1 2022, 5:04 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
dexonsmith accepted this revision.Nov 1 2022, 5:22 PM
dexonsmith added a subscriber: dexonsmith.

LGTM, although I there's format-is-probably-compatible-version as a courtesy for tooling, does that need a bump here?

This revision is now accepted and ready to land.Nov 1 2022, 5:22 PM

LGTM, although I there's format-is-probably-compatible-version as a courtesy for tooling, does that need a bump here?

Good point, I'll bump VERSION_MAJOR in clang/include/clang/Serialization/ASTBitCodes.h before committing.

This revision was landed with ongoing or failed builds.Nov 1 2022, 7:06 PM
This revision was automatically updated to reflect the committed changes.
alexfh added a subscriber: alexfh.Nov 20 2022, 12:26 PM

Heads up: this commit has broken compilation in a small number of cases: some #includes from modularized headers now fail to be found. I'm still trying to figure out what exactly happened, but some behavior has definitely changed by this commit.

Looks like our tests fail because ReadFileID doesn't translate file ID as ReadSourceLocation/TranslateSourceLocation do. Please see the prototype fix inline.

clang/lib/Serialization/ASTReader.cpp
6343

This doesn't work as before, likely because ReadFileID doesn't do TranslateSourceLocation.

Our tests fail.

I tried calling TranslateSourceLocation here and the tests passed:

SourceLocation Loc = Diag.SourceMgr->getComposedLoc(FID, 0);
SourceLocation Loc2 = TranslateSourceLocation(F, Loc);
auto IDAndOffset = SourceMgr.getDecomposedLoc(Loc2);

// Note that we don't need to set up Parent/ParentOffset here, because
// we won't be changing the diagnostic state within imported FileIDs
// (other than perhaps appending to the main source file, which has no
// parent).
auto &F = Diag.DiagStatesByLoc.Files[IDAndOffset.first];

Sorry I don't know the codebase, so this fix is definitely ugly :) But it shows the problem.

I just realized @jansvoboda11 is probably out on holiday this week (IIRC, Apple usually gets this week off). Since this was committed almost a month ago, I'm guessing this isn't enough of a blocker that we need to revert rather than wait until next week (and there are some commits on top that rely on this!). But probably reverting the stack would be an option if it's critical.

In the meantime, I'll try to help.

clang/lib/Serialization/ASTReader.cpp
6343

I don't think that's the issue, since ReadFileID() calls TranslateFileID, which should seems like it should be equivalent.

However, I notice that the post-increment for Idx got dropped! Can you try replacing the line of code with the following and see if that fixes your tests (without any extra TranslateSourceLocation logic)?

FileID FID = ReadFileID(F, Record, Idx++);

If so, maybe you can contribute that fix with a reduced testcase? I suggest adding me, @vsapsai, @Bigcheese, and @jansvoboda11 as reviewers.

@alexfh, maybe you can check if this fixes your tests as well?

(If this is the issue, it's a bit surprising we don't have existing tests covering this case... and embarrassing I missed it when reviewing initially!)

alexfh added inline comments.Nov 25 2022, 6:09 AM
clang/lib/Serialization/ASTReader.cpp
6343

I've noticed the dropped Idx post-increment as well, but I went a step further and looked at the ReadFileID implementation, which is actually doing a post-increment itself, and accepts Idx by reference:

FileID ReadFileID(ModuleFile &F, const RecordDataImpl &Record,
                  unsigned &Idx) const {
  return TranslateFileID(F, FileID::get(Record[Idx++]));
}

Thus, it seems to be correct. But what @eaeltsin has found is actually a problem for us. I'm currently trying to make an isolated test case, but it's quite tricky (as header modules are =\). It may be the case that our build setup relies on something clang doesn't explicitly promises, but the fact is that the behavior (as observed by our build setup) has changed. I'll try to revert the commit for now to get us unblocked and provide a test case as soon as I can.

jansvoboda11 added inline comments.Nov 28 2022, 9:50 AM
clang/lib/Serialization/ASTReader.cpp
6343

Thanks for helping out @dexonsmith, we did have the week off.

@eaeltsin @alexfhDone, are you able to provide the failing test case? I'm happy to look into it and re-land this with a fix.

alexfh added a subscriber: jgorbe.Nov 29 2022, 7:17 AM
alexfh added inline comments.
clang/lib/Serialization/ASTReader.cpp
6343

I've spent multiple hours trying to extract an observable test case. It turned out to be too hairy of a yaq to shave: for each compilation a separate sandboxed environment is created with a separate symlink tree with just the inputs necessary for that action. Some of the inputs are prebuilt module files (e.g. for libc++) that are version-locked with the compiler. So far @jgorbe and I could reduce this to four compilation steps with their own symlink trees with inputs. While I could figure out some of the factors that affect reproducibility (for example, symlinks are important, since making a deep copy of the input directories makes the issue disappear), it will take a few more hours of concentrated yak shaving to bring this to a shareable state. I'm not sure I have much more time to sink into investigating this.

It seems like examining code based on @eaeltsin's finding may be a more fruitful path to synthesizing a regression test. Could you try following that path?

alexfh added inline comments.Nov 29 2022, 7:40 AM
clang/lib/Serialization/ASTReader.cpp
6343

One more observation: -fmodules-embed-all-files and -Wno-mismatched-tags compiler options turned out to be important.

dexonsmith added inline comments.Nov 29 2022, 8:24 AM
clang/lib/Serialization/ASTReader.cpp
6343

Maybe @eaeltsin can help, but I don't see any reason to think that testcase will be easier. Typically we don't revert without a testcase or at least some way to understand the problem and make progress.

(Maybe @jansvoboda11 has ideas for extra instrumentation in the compiler to better understand what's going on with your setup?)

alexfh added inline comments.Nov 29 2022, 4:30 PM
clang/lib/Serialization/ASTReader.cpp
6343

I've managed to get rid of the precompiled module files and now I have something much more observable. It will take some more time to brush it up though.

alexfh added inline comments.Nov 29 2022, 5:53 PM
clang/lib/Serialization/ASTReader.cpp
6343

The standalone repro is here:

The files should be unpacked to a directory (e.g. /tmp/repro/standalone) and the repro.sh script can be invoked with CLANG environment variable pointing to the clang binary being tested. The script will create two symlink trees and start clang twice. The second clang invocation fails with clang containing this patch.

$ CLANG=/tmp/repro/clang-good /tmp/repro/standalone/repro.sh
++ dirname /tmp/repro/standalone/repro.sh
+ DIR=/tmp/repro/standalone
+ cd /tmp/repro/standalone
+ rm -rf 1 2
+ mkdir -p 1/q/a_proto/mypkg 1/p/a_proto/mypkg 2/q/a_proto/mypkg 2/p/b_proto/mypkg
+ ln -sf /tmp/repro/standalone/files/a_proto.cppmap 1/
+ ln -sf /tmp/repro/standalone/files/a.pb.h /tmp/repro/standalone/files/a.proto.h 1/p/a_proto/mypkg/
+ ln -sf /tmp/repro/standalone/files/a.pb.h /tmp/repro/standalone/files/a.proto.h 1/q/a_proto/mypkg/
+ ln -sf /tmp/repro/standalone/files/a_proto.cppmap /tmp/repro/standalone/files/b_proto.cppmap 2/
+ ln -sf /tmp/repro/standalone/files/a.pb.h 2/q/a_proto/mypkg/
+ ln -sf /tmp/repro/standalone/files/b.pb.h 2/p/b_proto/mypkg/
+ CC_ARGS=('-Wno-mismatched-tags' '-Xclang=-fmodules-embed-all-files' '-Xclang=-fmodules-local-submodule-visibility' '-fmodules' '-fno-implicit-modules' '-fno-implicit-module-maps' '-std=gnu++17' '-Xclang=-fmodule-map-file-home-is-cwd')
+ export CC_ARGS
+ : /tmp
+ cd /tmp/repro/standalone/1
+ /tmp/repro/clang-good -Wno-mismatched-tags -Xclang=-fmodules-embed-all-files -Xclang=-fmodules-local-submodule-visibility -fmodules -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 -Xclang=-fmodule-map-file-home-is-cwd -I q/a_proto -fmodule-name=//a -fmodule-map-file=a_proto.cppmap -xc++ -Xclang=-emit-module -c a_proto.cppmap -o /tmp/a.pcm
+ cd /tmp/repro/standalone/2
+ /tmp/repro/clang-good -Wno-mismatched-tags -Xclang=-fmodules-embed-all-files -Xclang=-fmodules-local-submodule-visibility -fmodules -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 -Xclang=-fmodule-map-file-home-is-cwd -I q/b_proto -I q/a_proto -fmodule-name=//b -fmodule-map-file=b_proto.cppmap -Xclang=-fmodule-file=/tmp/a.pcm -xc++ -Xclang=-emit-module -c b_proto.cppmap -o /tmp/b.pcm

$ CLANG=/tmp/repro/clang-bad /tmp/repro/standalone/repro.sh
++ dirname /tmp/repro/standalone/repro.sh
+ DIR=/tmp/repro/standalone
+ cd /tmp/repro/standalone
+ rm -rf 1 2
+ mkdir -p 1/q/a_proto/mypkg 1/p/a_proto/mypkg 2/q/a_proto/mypkg 2/p/b_proto/mypkg
+ ln -sf /tmp/repro/standalone/files/a_proto.cppmap 1/
+ ln -sf /tmp/repro/standalone/files/a.pb.h /tmp/repro/standalone/files/a.proto.h 1/p/a_proto/mypkg/
+ ln -sf /tmp/repro/standalone/files/a.pb.h /tmp/repro/standalone/files/a.proto.h 1/q/a_proto/mypkg/
+ ln -sf /tmp/repro/standalone/files/a_proto.cppmap /tmp/repro/standalone/files/b_proto.cppmap 2/
+ ln -sf /tmp/repro/standalone/files/a.pb.h 2/q/a_proto/mypkg/
+ ln -sf /tmp/repro/standalone/files/b.pb.h 2/p/b_proto/mypkg/
+ CC_ARGS=('-Wno-mismatched-tags' '-Xclang=-fmodules-embed-all-files' '-Xclang=-fmodules-local-submodule-visibility' '-fmodules' '-fno-implicit-modules' '-fno-implicit-module-maps' '-std=gnu++17' '-Xclang=-fmodule-map-file-home-is-cwd')
+ export CC_ARGS
+ : /tmp
+ cd /tmp/repro/standalone/1
+ /tmp/repro/clang-bad -Wno-mismatched-tags -Xclang=-fmodules-embed-all-files -Xclang=-fmodules-local-submodule-visibility -fmodules -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 -Xclang=-fmodule-map-file-home-is-cwd -I q/a_proto -fmodule-name=//a -fmodule-map-file=a_proto.cppmap -xc++ -Xclang=-emit-module -c a_proto.cppmap -o /tmp/a.pcm
+ cd /tmp/repro/standalone/2
+ /tmp/repro/clang-bad -Wno-mismatched-tags -Xclang=-fmodules-embed-all-files -Xclang=-fmodules-local-submodule-visibility -fmodules -fno-implicit-modules -fno-implicit-module-maps -std=gnu++17 -Xclang=-fmodule-map-file-home-is-cwd -I q/b_proto -I q/a_proto -fmodule-name=//b -fmodule-map-file=b_proto.cppmap -Xclang=-fmodule-file=/tmp/a.pcm -xc++ -Xclang=-emit-module -c b_proto.cppmap -o /tmp/b.pcm
While building module '//b':
In file included from <module-includes>:1:
In file included from ./p/b_proto/mypkg/b.pb.h:3:
q/a_proto/mypkg/a.pb.h:3:10: fatal error: 'mypkg/a.proto.h' file not found
#include "mypkg/a.proto.h"
         ^~~~~~~~~~~~~~~~~
1 error generated.
dexonsmith added inline comments.Nov 29 2022, 6:04 PM
clang/lib/Serialization/ASTReader.cpp
6343

Great!

jansvoboda11 added inline comments.
clang/lib/Serialization/ASTReader.cpp
6343

Thank you for the reproducer!

I (finally) got around to looking into this properly.

What's happening here is that Clang deserializes a.pcm when compiling b.pcm. Prior to this patch, ASTReader would see the pragma diagnostic mapping for "mypkg/a.proto.h" and call SourceManager::getDecomposedLoc(). This ends up calling ASTReader::getInputFile(), which registers "mypkg/a.proto.h" in the FileManager (since -fmodules-embed-all-files makes it "transient"), making it visible to HeaderSearch.

With this patch, we just read the FileID out of a.pcm without doing anything else. Since no other place in the compiler takes care to call ASTReader::getInputFile(), "mypkg/a.proto.h" doesn't get registered in FileManager. It's not reachable from any of the -I directories either, so the header search fails.

Seems like Clang prior to this patch compiles your repro by accident. Another way to confirm this is removing the #pragma clang diagnostic lines in "mypkg/a.proto.h". That makes the compilation to fail even prior to this patch.

@rsmith Can you confirm that -fmodules-embed-all-files that you added in rG919ce235 is only supposed to make PCMs usable without requiring their input files on disk? (As opposed to making those input files available to the importer.)

@alexfh & @eaeltsin: Could you try passing the appropriate include directory for "mypkg/a.proto.h" to the compilation of b.pcm and confirm this makes the problem go away with this patch applied?

jansvoboda11 reopened this revision.Aug 28 2023, 5:13 PM
This revision is now accepted and ready to land.Aug 28 2023, 5:13 PM

Now that the behaviour change is understood, maybe it'd be useful to split the patch in two:

  • First, this patch, plus a call to ASTReader::getInputFile() only for its side effects, to make this patch actually NFC.
  • Second, committed a few days later, a patch that removes the call and adds a test confirming -I is no longer implied by -fembed-all-input-files.

That way, the future temporary reverts won't churn the file format. For example, downstreams that hit problems and need to temporarily revert can just revert the second patch.

(... assuming that @rsmith/others confirm that implying -I is undesirable... or, if it's desired, surely there's a more explicit way to implement it.)

clang/lib/Serialization/ASTWriter.cpp
3013

This is where I'm suggesting a side-effects-only call to ASTReader::getInputFile() could be added. (Or to something that transitively will call it.)