This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Don't emit checksums when compiling a preprocessed CPP
ClosedPublic

Authored by aganea on Apr 4 2019, 1:08 PM.

Details

Summary

When compiling from an already preprocessed CPP, the checksums generated in the debug info would be those of the (input) preprocessed CPP, not those of the actual #includes.
Note how all the files have the same checksum below:

** Module: "E:\RD\tests\clang_hash\main.obj"

     0 E:\RD\tests\clang_hash\lib.h (MD5: 136293700AE501A1FB76EBD273C8D288)
     1 C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt\stdio.h (MD5: 136293700AE501A1FB76EBD273C8D288)
     2 E:\RD\tests\clang_hash\main.cpp (MD5: 136293700AE501A1FB76EBD273C8D288)
     3 C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt\corecrt_stdio_config.h (MD5: 136293700AE501A1FB76EBD273C8D288)

When debugging in Visual Studio an EXE linked with the OBJ above, Visual Studio complains about the source files being different from when they were compiled, since the sources are compared by hash.

This patch simply clears the checksums to match MSVC behavior. Visual Studio will simply bypass the hash checking in that case, and hapily open the source file.

** Module: "E:\RD\tests\clang_hash\main.obj"

     0 c:\program files (x86)\windows kits\10\include\10.0.17763.0\ucrt\stdio.h (None)
     1 c:\program files (x86)\windows kits\10\include\10.0.17763.0\ucrt\corecrt_wstdio.h (None)
     2 c:\program files (x86)\windows kits\10\include\10.0.17763.0\ucrt\corecrt_stdio_config.h (None)
     3 c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\vcruntime_new.h (None)
     4 e:\rd\tests\clang_hash\main.cpp (None)
     5 e:\rd\tests\clang_hash\lib.h (None)

We could write more complicated code to open the files from the corresponding #line directives, but we have no guarantee the preprocessed CPP is compiled in the same conditions as when it was pre-processed. Actually, when using Fastbuild, the files are preprocessed locally on the user's PC; then the preprocessed content is sent and compiled remotely on another PC that does not have the source code.

Thanks to Lambert Clara for the preliminary patch!

Fixes PR41215

Diff Detail

Repository
rC Clang

Event Timeline

aganea created this revision.Apr 4 2019, 1:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2019, 1:08 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
aganea edited the summary of this revision. (Show Details)Apr 4 2019, 1:13 PM
rnk added inline comments.Apr 4 2019, 1:32 PM
lib/CodeGen/CGDebugInfo.cpp
435–436

I was unhappy with this check. It seems a bit ridiculous to compare the filenames by string when we surely have ids we could look at, but I wasn't able to figure out how.

test/Driver/cl-preprocess-md5.cpp
1 ↗(On Diff #193764)

This should be a much smaller test in test/CodeGen. You can check in an already pre-processed file with line markers in it, run clang -cc1 -emit-llvm, and check that the DIFile has no checksum. Clang's test suite can't depend on lld, and I'm not sure if it has deps on llvm-pdbutil either.

aganea updated this revision to Diff 193886.EditedApr 5 2019, 8:50 AM
aganea marked 2 inline comments as done.
aganea edited the summary of this revision. (Show Details)

I made a more elegant change, where no additional lookups or string compares are required.

Just a minor issue, I noted btw that clang /E does not generate #line directives, but simply #. In contrast, MSVC cl /E generates #line in the preprocessed output. The issue is that if compiling clang's preprocessed output with MSVC, it doesn't like # alone and throws an error. ie:

$ cat test.cpp
void test() { }

$ clang-cl /c /E test.cpp >test-pre.cpp

$ cat test-pre.cpp
# 1 "C:\\Users\\aganea\\Desktop\\test\\test.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 361 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "C:\\Users\\aganea\\Desktop\\test\\test.cpp" 2
void test() { }

$ cl /c test-pre.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27027.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

test-pre.cpp
C:\Users\aganea\Desktop\test\test-pre.cpp(1): error C2019: expected preprocessor directive, found '1'
C:\Users\aganea\Desktop\test\test-pre.cpp(2): error C2019: expected preprocessor directive, found '1'
C:\Users\aganea\Desktop\test\test-pre.cpp(3): error C2019: expected preprocessor directive, found '1'
C:\Users\aganea\Desktop\test\test-pre.cpp(4): error C2019: expected preprocessor directive, found '3'
C:\Users\aganea\Desktop\test\test-pre.cpp(5): error C2019: expected preprocessor directive, found '1'
C:\Users\aganea\Desktop\test\test-pre.cpp(6): error C2019: expected preprocessor directive, found '1'
C:\Users\aganea\Desktop\test\test-pre.cpp(7): error C2019: expected preprocessor directive, found '1'

$ cl /c /E test.cpp >test-pre.cpp

$ cat test-pre.cpp
#line 1 "C:\\Users\\aganea\\Desktop\\test\\test.cpp"
void test() { }

$ cl /c test-pre.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27027.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

test-pre.cpp
thakis added a subscriber: thakis.Apr 5 2019, 11:12 AM

See http://llvm-cs.pcc.me.uk/tools/clang/lib/Frontend/VerifyDiagnosticConsumer.cpp#756 for a "is same file" example. I'm not sure adding a bool to PresumedLoc is worth it for this.

See http://llvm-cs.pcc.me.uk/tools/clang/lib/Frontend/VerifyDiagnosticConsumer.cpp#756 for a "is same file" example. I'm not sure adding a bool to PresumedLoc is worth it for this.

Yeah that works. However I'm getting two conflicting directions - I thought the worry was to avoid an extra runtime cost when comparing strings (see first version). @rnk wanted at first a compare by FileID, although it looks like it's more costly to do that, because all paths end up in SourceManager::getFileIDSlow. Just by tracing the code on a small preprocessed CPP, it looks like a costly solution. Using IsFromSameFile ends up in SourceManager::getFileIDSlow several times per iteration - whereas the solution proposed above (boolean) has zero runtime cost. I'm worried that large files with lots of #line will be much slower to compile. What do you think?

aganea added a comment.EditedApr 5 2019, 11:46 AM

I'll try profiling several solutions on a large unity/jumbo file and get back with some results.

rnk added a reviewer: rsmith.Apr 5 2019, 4:14 PM

+@rsmith for the PresumedLoc change

From glancing on the PresumedLoc computation code, I think this bool might be the way to go. You could make it a bit more "free" by stealing a bit from the column, if we're concerned about size.

FYI, I'm off to EuroLLVM after this and return in about two weeks.

aganea updated this revision to Diff 196321.Apr 23 2019, 1:59 PM
aganea retitled this revision from [clang-cl] Don't emit checksums when compiling a preprocessed CPP to [DebugInfo] Don't emit checksums when compiling a preprocessed CPP.

Steal one bit from PresumedLoc::Column as suggested by @rnk.
Ping @rsmith !

Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2019, 1:59 PM
aganea changed the repository for this revision from rL LLVM to rC Clang.Apr 23 2019, 3:39 PM
aganea removed a project: Restricted Project.
aganea removed a subscriber: llvm-commits.
rnk added a comment.Apr 25 2019, 4:13 PM

See http://llvm-cs.pcc.me.uk/tools/clang/lib/Frontend/VerifyDiagnosticConsumer.cpp#756 for a "is same file" example. I'm not sure adding a bool to PresumedLoc is worth it for this.

@thakis, do you still object to adding info to PresumedLoc for this? I think in the absence of review from @rsmith we should go forward with this.

I had tried to do this in r333311 but some bots complained, so I reverted it and then didn't follow through. Thanks for doing this!
When I tried it, I took advantage of existing tracking of line directives at the file level, so there shouldn't be a need to add a Line flag to PresumedLoc?
What I did was something like this, in computeChecksum():

const SrcMgr::SLocEntry &Entry = SM.getSLocEntry(FID, &Invalid);
if (Invalid || !Entry.isFile() || Entry.getFile().hasLineDirectives())
  return None;

Thanks Paul, your solution is even better. I'll apply rL333311 locally - if everything's fine, do you mind if I re-land it again?

Thanks Paul, your solution is even better. I'll apply rL333311 locally - if everything's fine, do you mind if I re-land it again?

I suggest you do *not* use my patch unmodified, because it stops generating any checksums as soon as it finds one preprocessed file. The functionality in your patch seems better, to skip checksums only for those files that look like they were preprocessed. If you want to take my patch and remove the extra flag from CGDebugInfo that "remembers" that we've previously seen a file with line directives, that would be fine.

(Just for the record, I'm happy with whatever y'all end up with.)

uabelho resigned from this revision.May 3 2019, 12:18 AM
aganea updated this revision to Diff 199599.EditedMay 15 2019, 6:52 AM
aganea marked an inline comment as done.

Updated again with a different solution. We can't just act on Entry.getFile().hasLineDirectives() because a directive such as #line 100 simply overrides the __LINE__, not __FILE__ (we need to retain and hash the previous __FILE__ in that case). Please see an example here in the IBM documentation.
Also we can't compare by FileID because a LineEntry simply stores a string (called filename) but the buffer for that file isn't stored in the SourceManager.

Minor stuff. This solution is surprisingly simple, the main question being (and I don't have an answer) whether increasing the size of PresumedLoc is okay.

lib/Basic/SourceManager.cpp
1460

The comment would work better as a proper sentence, and on the line before the statement.

lib/CodeGen/CGDebugInfo.cpp
429

While the comment describes the desirable result, it is actually not describing what is happening right here. Maybe something like this:

Compute the checksum if possible. If the location is affected by a #line directive that refers to a file, PLoc will have an invalid FileID, and we will correctly get no checksum.

test/CodeGen/debug-info-file-checksum.c
13

These directives don't need the -LABEL suffix.

aganea updated this revision to Diff 199659.May 15 2019, 12:32 PM
aganea marked 3 inline comments as done.

Minor stuff. This solution is surprisingly simple, the main question being (and I don't have an answer) whether increasing the size of PresumedLoc is okay.

Thanks Paul! PresumedLoc seems to be used only as a temporary (on the stack).

Ping! Any further comments?

This revision is now accepted and ready to land.May 20 2019, 6:46 AM
This revision was automatically updated to reflect the committed changes.