This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][ctu] Fix wrong 'multiple definitions' errors caused by space characters in lookup names when parsing the ctu index file
ClosedPublic

Authored by OikawaKirie on May 17 2021, 11:22 PM.

Details

Summary

This error was found when analyzing MySQL with CTU enabled.

When there are space characters in the lookup name, the current delimiter searching strategy will make the file path wrongly parsed.
And when two lookup names have the same prefix before their first space characters, a 'multiple definitions' error will be wrongly reported.

e.g. The lookup names for the two lambda exprs in the test case are c:@S@G@F@G#@Sa@F@operator int (*)(char)#1 and c:@S@G@F@G#@Sa@F@operator bool (*)(char)#1 respectively. And their prefixes are both c:@S@G@F@G#@Sa@F@operator when using the first space character as the delimiter.

Solving the problem by adding a length for the lookup name, making the index items in the format of <USR-Length>:<USR File> <Path>.


In the test case of this patch, we found that it will trigger a "triple mismatch" warning when using clang -cc1 to analyze the source file with CTU using the on-demand-parsing strategy in Darwin systems. And this problem is also encountered in D75665, which is the patch introducing the on-demand parsing strategy.
We temporarily bypass this problem by using the loading-ast-file strategy.

Refer to the discourse topic for more details.

Diff Detail

Event Timeline

OikawaKirie created this revision.May 17 2021, 11:22 PM
OikawaKirie requested review of this revision.May 17 2021, 11:22 PM

I don't really like having multiple files with the same name.
And the importer TU should be simple to be simply cat-ed into a temporal file.
At that point, you could put the importee's content into this file. It would result in a single, self-contained test case.

I'm not really familiar with the extdefmap part, but I'm surprised that we are using spaces as separators.
Shouldn't we consider using a different character?

clang/test/Analysis/ctu-lookup-name-with-space.cpp
8

Probably splitting this up into multiple lines would result in a more readable solution.

Something along these lines should work:

cat >%t/compile_commands.json <<EOL
line 1
line 2
...
EOL
12–24

Why do you need two separate invocations? Couldn't you just merge these?
I've seen cases where -verify was used in conjunction with FileCheck.

OikawaKirie marked an inline comment as done.May 18 2021, 2:29 AM

I'm not really familiar with the extdefmap part, but I'm surprised that we are using spaces as separators.
Shouldn't we consider using a different character?

I prefer the idea of changing the delimiter character, but it may lead to modifying a lot of test cases.
I think we'd better make this change in another revision in the future if we do want to change it.

clang/test/Analysis/ctu-lookup-name-with-space.cpp
8

The suggestion is great, however I cannot find a way to write the RUN commands.
Could you please tell me how to write the commands in this way? It is also useful to help me merging the test case into one file.

12–24

I forgot the --allow-empty argument during writing this test case. I will merge them in an update.

First of all, thank you for the patch!

We had a meeting with my colleges (@steakhal, @gamesh411) and we agreed in the following. This issue you are trying to solve here is indeed a serious problem, but we'd like to suggest an alternative and perhaps more durable solution. In CrossTranslationUnitContext::getLookupName(const NamedDecl *ND) it would be possible to extend the returned string with a prefix that encodes the length of the USR string.
So, instead of

c:@F@g#I# ctu-other.cpp.ast

we'd get

9:c:@F@g#I# ctu-other.cpp.ast

This way, we could handle even file names with spaces in them.

There are quite a few places where the extdef mappings should be updated.
For discovering them I suggest you asserting the new file format (only for detecting them!). This way if you miss one, it wouldn't silently 'work' somehow, but raise your attention.

There are a few references to the format of this mapping in the clang/docs/analyzer/user-docs/CrossTranslationUnit.rst and probably in other files.
Those should be updated to match the new format.

I'm sorry for burdening you with all of this, but I think this is the way to make this parsing more robust. I really appreciate your work.

OikawaKirie marked an inline comment as done.
OikawaKirie edited the summary of this revision. (Show Details)

It has been a long period since the last discussion, I hope you can still remember this bug. And apologize for the delay.

Updated as required, the lookup name generator CrossTranslationUnitContext::getLookupName and parser parseCrossTUIndex are modified.
And assertions are added before accessing the CTU index mapping for input lookup names to be searched.

Corresponding test cases and documents are also updated.

Please let me know if there are other files to be updated.

Looks good.
Please get rid of the macro stuff, consider something along the lines I proposed for the parsing stuff.
Also clang-format the code you touch.
I haven't checked the docs and the comments of the codebase, but I'll assume you grepped and fixed all occurrences.
I look forward to this, thank you for working on this @OikawaKirie.

clang/docs/analyzer/user-docs/CrossTranslationUnit.rst
84
clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172
175–180

Please do something about this macro.
Encapsulate the logic in some other way.

460
clang/test/Analysis/func-mapping-test.cpp
50–62

I think you could add your lambda stuff to this file.
This is really the place for testing this.
The test you created actually demonstrating the CTU issue is also valuable IMO, so you can leave it, but have a copy here.

  1. Fix formatting bugs
  2. Update lookup name format as <USR-Length>:<USR> <File-Path> in all comments and documents
  3. Add the new test case as a part of clang/test/Analysis/func-mapping-test.cpp to verify the lookup name
  4. Change the macros for asserting the lookup name format to an NDEBUG wrapped function
clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172

The source of lookup name of the function being imported is function CrossTranslationUnitContext::getLookupName. Keeping the length in the mapping can avoid parsing the lookup name during importing.

steakhal added inline comments.Dec 6 2021, 12:54 AM
clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172

Okay; you can copy the original StringRef to have that. But by consuming it on one path makes the code much more readable.

183

charactor -> character

184–188

You should probably use more elaborative names, I wouldn't know what this does if I hadn't reviewed this patch.

450

The assertion speaks for itself. It rarely needs additional documentation.

clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
8–9

I would rather put these into the importee()

14

Why do you need to have a div by zero warning?

OikawaKirie added inline comments.Dec 6 2021, 2:47 AM
clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172

The getAsInterger call can also check whether the content before the first colon is an integer. Therefore, a sub-string operation is required here.

clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
8–9

The lambda exprs will not be included in the CTU index file if they are declared in a normal function.

14

I am not sure whether we should test if an external function can be correctly imported. Hence, I wrote a div-by-zero bug here. A call to function clang_analyzer_warnIfReached is also OK here.

As the imported lambda expr will not be called, I think I can only test whether CTU works via another normal function.

Please mark comments 'done' if they are done.

clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172

I don't doubt that your proposed way of doing this works and is efficient.
What I say is that I think there is room for improvement in the non-functional aspects, in the readability. However, it's not really a blocking issue, more of a personal taste.

clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
8–9

I see.

14

AFAIK importing a function and import-related stuff are orthogonal to actually emitting bug reports produced by the analyzer. That being said, if the importee() would have an empty body, the observable behavior would remain the same. And this is what I propose now.

OikawaKirie added inline comments.Dec 6 2021, 5:09 AM
clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172

I know what you are considering, it is clearer and more readable by consuming the length, then the USR. However, to correctly separate the USR and file path, the length of USR-Length is also required, which makes it impossible to just *consume* the length at the beginning.

Another way of solving this problem is to re-create the string with the USR-Length and the USR after parsing, but I think it is not a good solution.

BTW, is it necessary to assert the USR-Length to be greater than zero? I think it is more reasonable to report *invalid format* rather than assert the value, as it can be provided by the user.

clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
14

Sorry, but I am not quite clear about your suggestions on this function.

steakhal added inline comments.Dec 6 2021, 5:57 AM
clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172

I think what causes the misunderstanding is the definition of consume in the context of StringRef.

const StringRef Copy = Line;
Line.consumeInteger(...); // Line advances forward by the number of characters that were parsed as an integral value.
// Copy still refers to the unmodified, original characters.
// I can use it however I want.

// `Line` is a suffix of `Copy`, and the `.end()` should be the same, only `.begin()` should differ.

I hope that caused the miscommunication.


BTW, is it necessary to assert the USR-Length to be greater than zero? I think it is more reasonable to report *invalid format* rather than assert the value, as it can be provided by the user.

Yeah, sure!

clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
13–15

Also fixup the return type in the declaration within the main TU. Also add the // expected-no-diagnostics comment to the primary TU.

OikawaKirie added inline comments.Dec 9 2021, 12:51 AM
clang/lib/CrossTU/CrossTranslationUnit.cpp
154–172

I think I have figured out what have been misunderstood.

In the current patch, I just modify function CrossTranslationUnitContext::getLookupName by adding a length at the beginning. Therefore, the lookup name for the CTU query will have the length part. And for the sake of simplicity and efficiency, the length together with the USR is stored in the mapping as the key.

To correctly parse the <USR-Length>:<USR> part, I cannot just consume the <USR-Length> at the beginning. Otherwise, I cannot know the length of <USR-Length>: part, which makes it impossible to parse the entire <USR-Length>:<USR> part, even though the original Line is copied.

I will update the approach of adding the length part. Since the length is only used during parsing the CTU index, I will modify function createCrossTUIndexString to add the length and revert the changes to function CrossTranslationUnitContext::getLookupName to keep the lookup name for CTU query unchanged.

clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
13–15

Yes, you are right.
I was misled by myself.

  1. Revert function CrossTranslationUnitContext::getLookupName
  2. Add length when dumping the CTU index mapping via function createCrossTUIndexString
  3. Remove the assertions during CTU map query process
  4. Make function parseCrossTUIndexItem more readable
steakhal accepted this revision.Dec 13 2021, 3:30 AM

I think it looks great. Thanks.

This revision is now accepted and ready to land.Dec 13 2021, 3:30 AM

When running my test case ctu-lookup-name-with-space.cpp on Windows, llvm-lit reports 'cp': command not found. And this is the reason why it fails on Windows.
And when I remove the cps and replace them with original file names, clang reports YAML:1:293: error: Unrecognized escape code, it seems that the static analyzer only reads compilation database in YAML format on Windows.
Should I disable this test case on Windows? Or is there any other approaches to make it work on Windows?

Should I disable this test case on Windows? Or is there any other approaches to make it work on Windows?

I'm fine with disabling this test on Windows.

Fix YAML:1:293: error: Unrecognized escape code error by replacing lit substitution pattern %S to %/S.
Fix cp problems by removing the file copy operations.

steakhal accepted this revision.Dec 16 2021, 1:37 AM

It seems this patch has nothing to do with the failure in the Linux build. I think it is now ready to land.
Thanks a lot for your suggestions during the revision.

Could you please commit this patch on my behalf? Thanks.
Ella Ma <alansnape3058@gmail.com>

This revision was landed with ongoing or failed builds.Dec 16 2021, 8:48 AM
This revision was automatically updated to reflect the committed changes.

This commit seems to have caused a test to fail: https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/26118/testReport/

Can you fix the failure or revert the patch?

I do not know how this error happens. Maybe we can currently revert this patch an have another try in the future.

thakis added a subscriber: thakis.Dec 16 2021, 5:00 PM

This breaks tests on macOS: http://45.33.8.238/macm1/23920/step_7.txt

Please take a look and revert for now if it takes a while to fix.

Could you please revert this on my behalf? I currently have no idea to fix this problem.

It seems that it is not this patch that triggers the problem, which is similar to D75665.
IMO it is the problem of on-demand-parsing, but I do not have a Mac M1 device to reproduce this bug.
Maybe we can just land this patch by restricting the test case to be executed only on Linux, just as what D75665 does (rG5cc18516c483 vs rG97e07d0c352c), and leave the problem for future fixes.

Could you please do the update as provided below and land this patch again? @steakhal or other reviewers?

clang/test/Analysis/ctu-lookup-name-with-space.cpp
14

Adding this line here.

We shouldn't skip mac targets. I CC ASTImporter folks, they probably have an M1.

arichardson added inline comments.
clang/test/Analysis/ctu-lookup-name-with-space.cpp
14

Disabling the test on non- Linux is not a good idea IMO since it means we lose coverage on other platforms. My guess is that you just need to specify an explicit triple in the clang invocations.

We shouldn't skip mac targets. I CC ASTImporter folks, they probably have an M1.

I am not intended to ignore this problem triggered on M1. However, I think it is not this patch that leads to this problem, it just triggers it.
I mean we can just disable the test case temporarily on M1, and fix this problem as well as enable this patch and the one of on-demand-parsing in another patch.
I think they trigger the same problem for the same reason on M1.

Besides, it seems to be the problem of ASTUnit::LoadFromCommandLine, rather than the ASTImporter.

clang/test/Analysis/ctu-lookup-name-with-space.cpp
14

AFAIK, we cannot do that. If this test case is executed on different platforms, we cannot determine the triple ahead of time and specify it in the invocation list.

steakhal reopened this revision.Dec 22 2021, 1:02 AM

We shouldn't skip mac targets. I CC ASTImporter folks, they probably have an M1.

I am not intended to ignore this problem triggered on M1. However, I think it is not this patch that leads to this problem, it just triggers it.
I mean we can just disable the test case temporarily on M1, and fix this problem as well as enable this patch and the one of on-demand-parsing in another patch.
I think they trigger the same problem for the same reason on M1.

Besides, it seems to be the problem of ASTUnit::LoadFromCommandLine, rather than the ASTImporter.

Prior to this patch, it worked on M1; after landing it broke something, so we clearly shouldn't land this.
We should add a test-case demonstrating the problem with M1 with a given configuration.
Then we need to track down and fix the underlying issue causing it. That should be done probably in a separate patch and add it as a parent patch to this one.

If all of these are done, we can probably land both of them.

clang/test/Analysis/ctu-lookup-name-with-space.cpp
14

If we were to pin the triple, then each platform would emit the correct AST dumps according to that platform - ~~ cross-compilation.

This revision is now accepted and ready to land.Dec 22 2021, 1:02 AM
steakhal requested changes to this revision.Dec 22 2021, 1:02 AM
This revision now requires changes to proceed.Dec 22 2021, 1:02 AM

Prior to this patch, it worked on M1; after landing it broke something, so we clearly shouldn't land this.

I do not think it is this patch that breaks the functionality on M1, as it depends on the *on-demand-parsing* feature that is not tested on M1 currently.

We should add a test-case demonstrating the problem with M1 with a given configuration.

If I got it correct (it is on-demand-parsing that triggers the problem), this problem can be triggered by enabling the test case of D75665 on M1.

Then we need to track down and fix the underlying issue causing it. That should be done probably in a separate patch and add it as a parent patch to this one.

If all of these are done, we can probably land both of them.

Maybe currently a simpler way is trying to use AST dump to load the external TU to be imported, rather than on-demand-parsing, which can make us fix this failure with the test case still enabled on M1.

I will have a series of tests on my concerns later, and I will reply with my results if I can find something.

I have confirmed that this problem is not due to this patch.
Besides, on Mac, both m1 and intel, the on-demand-parsing as well as loading an AST file generated via driver argument -emit-ast will also trigger this problem.
However, when loading an AST file generated via cc1 argument -emit-pch, the problem is not triggered.


See the example below, which is executed on an intel Mac laptop with clang 13.0.0.

/tmp/test/test.c:

void f();
void g() { f(); }

/tmp/test/importee.c:

void f() { }

/tmp/test/odp/externalDefMap.txt:

c:@F@f /tmp/test/importee.c

/tmp/test/odp/invocations.yaml:

"/tmp/test/importee.c": ["gcc", "-c", "/tmp/test/importee.c"]

/tmp/test/ast/externalDefMap.txt:

c:@F@f /tmp/test/ast/importee.c.ast

When executing the analyzer with CTU analysis via on-demand-parsing:

/tmp/test$ clang -cc1 -analyze -analyzer-checker=core -analyzer-config experimental-enable-naive-ctu-analysis=true,ctu-dir=odp,ctu-invocation-list=invocations.yaml test.c

Or loading AST file generated via driver argument -emit-ast:

/tmp/test$ clang -emit-ast importee.c -o ast/importee.c.ast
/tmp/test$ clang -cc1 -analyze -analyzer-checker=core -analyzer-config experimental-enable-naive-ctu-analysis=true,ctu-dir=ast test.c

The same diagnostic message is generated, though the triples are different from the ones for m1.

warning: imported AST from '/tmp/test/importee.c' had been generated for a different target, current: x86_64-apple-darwin21.2.0, imported: x86_64-apple-macosx12.0.0 [-Wctu]

However, the problem will not be triggered if triple is given:
(On demand parsing: setting the triple of the entry file to the one of imported ASTUnit)

/tmp/test$ clang -cc1 -analyze -analyzer-checker=core -analyzer-config experimental-enable-naive-ctu-analysis=true,ctu-dir=odp,ctu-invocation-list=invocations.yaml test.c -triple x86_64-apple-macosx12.0.0

(AST)

/tmp/test$ clang -target arm-apple-macosx -emit-ast importee.c -o ast/importee.c.ast
/tmp/test$ clang -cc1 -analyze -analyzer-checker=core -analyzer-config experimental-enable-naive-ctu-analysis=true,ctu-dir=ast test.c -triple arm-apple-macosx

Or the AST file is generated via cc1 argument -emit-pch:

/tmp/test$ clang -cc1 -emit-pch importee.c -o ast/importee.c.ast
/tmp/test$ clang -cc1 -analyze -analyzer-checker=core -analyzer-config experimental-enable-naive-ctu-analysis=true,ctu-dir=ast test.c

I think we can bypass the problem temporarily by loading the AST file generated by cc1 argument -emit-pch, just as shown in the last code snippet above.

Replace on-demand-parsing with loading AST file for the new test case.
Tested on Linux and MacOS(x86).
If it can also pass the CI test on Windows, I think we can have another try on landing this patch.

Besides, as mentioned above, to trigger the problem of target triple on MacOS, we can simply remove the requirement of Linux system for the two test cases of on-demand-parsing, i.e. clang/test/Analysis/ctu-on-demand-parsing.c and clang/test/Analysis/ctu-on-demand-parsing.cpp.

To make it work on Windows, Linux, and Mac OS, using echo to create the external function map, and using AST file for CTU analysis.
Tested on Windows, Linux, and Mac OS under x64.

I think I have found out the reason for the problem, and it proved my guesses.

When executing the test case of the static analyzer, we usually use %clang_analyze_cc1 as the entry, which is %clang_cc1 -analyze. And we usually do not set a target triple, as it is not required by the analyzer. However, things are complicated in Darwin Unix.

In Darwin, the default target triple is ARCH-apple-darwinXX.XX.XX, where ARCH is the architecture (e.g. x86_64) and XX.XX.XX is the version of the Darwin system. And the default target triple will be then adjusted to ARCH-apple-SYSTEMXX.XX.XX by the Driver::Darwin::ComputeEffectiveClangTriple in driver related code, where SYSTEM can be watchos, tvos, ios and macosx.

In the clang driver, the adjusted target triple will be passed to the new cc1 process; whereas in tooling and ASTUnit::LoadFromCommandLine, the adjusted target triple will be used to generate cc1 arguments to create CompilerInvocation. But when executing bare clang -cc1, if the target triple argument is not provided, it remains the default ARCH-apple-darwinXX.XX.XX. And this is the reason for the conflict.

The CTU on-demand-parsing mechanism uses ASTUnit::LoadFromCommandLine to load external ASTs. The tool clang-check uses clang tooling to parse the entry file. Therefore, both target triples are the adjusted ones, which can be matched. And so is the driver (clang --analyze ...). But not the bare %clang_cc1, its target triple is the default one.


Let's have a look at the simple example, suppose externalDefMap.txt and invocations.yaml are generated correctly.

input.cc:

void bar();
void foo() { bar(); }

importee.cc:

void bar() { }

Using driver:

clang -v --analyze input.cc -Xanalyzer -analyzer-config -Xanalyzer experimental-enable-naive-ctu-analysis=true,ctu-dir=.

Output: OK, adjusted to adjusted

(in-process)
/path/to/clang-15 -cc1 -triple x86_64-apple-macosx10.15.0 ...

Using clang-check:

clang-check -analyze input.cc -- -v -Xanalyzer -analyzer-config -Xanalyzer experimental-enable-naive-ctu-analysis=true,ctu-dir=.

Output: OK, adjusted to adjusted

clang Invocation:
 "/path/to/clang-tool" "-cc1" "-triple" "x86_64-apple-macosx10.15.0"

Using cc1:

clang -cc1 -v -analyze input.cc  -analyzer-checker=core -analyzer-config experimental-enable-naive-ctu-analysis=true,ctu-dir=.

Output: ERROR, default to adjusted

warning: imported AST from 'importee.cc' had been generated for a different target, current: x86_64-apple-darwin19.6.0, imported: x86_64-apple-macosx10.15.0 [-Wctu]

What do you think is a better way to fix this problem? @gamesh411 @steakhal @martong
Using clang-check to run the test case seems to be a good way to overcome the problem, but the problem still exists.
However, IMO it is not a good idea to make clang cc1 to adjust the default target triple manually.

steakhal added a subscriber: NoQ.Feb 15 2022, 6:27 AM

Thank you @OikawaKirie for the thorough investigation and explanation, even annotations and examples. I really appreciate it.
However, you already surpassed my knowledge regarding the frontend, cc1, and other driver magic transformations.

I think it would be great to invite some Apple folks and experts in the driver flag stuff. That being said this interesting behavior would be a nice candidate for a post on the clang discourse forum.
@NoQ might also have something to say about this target triple magic.

steakhal added a subscriber: keith.

Hi @keith, I've seen you commented on a clang driver-related issue: "Why does march=native not work on Apple M1?"
You might also have some valuable insight about this weird behavior about the detected target triple in a regular mode and using the -cc1 mode.
You don't need to go through the whole conversation, the last few comments should be enough.
Feel free to invite more people if you think.

keith added a comment.Feb 18 2022, 5:17 PM

Hi @keith, I've seen you commented on a clang driver-related issue: "Why does march=native not work on Apple M1?"
You might also have some valuable insight about this weird behavior about the detected target triple in a regular mode and using the -cc1 mode.
You don't need to go through the whole conversation, the last few comments should be enough.
Feel free to invite more people if you think.

I'm not very familiar with the logic or history here, but I did look into this a bit, and I don't see a clear solution. I think the logic that lives in the Darwin driver code for creating the triple would have to be called from the default triple logic that's used in the cc1 case. I imagine to make this test pass you'd be best off making sure you always call clang or cc1, and not intermix them, or provide a default triple in all cases that's stable if possible. I do think it would be nice to unify the triple logic between these 2 places, but I imagine folks who know this code better than I do have opinions there.

Thanks, @keith.

I agree with @keith to commit this patch without using on-demand-parsing through cc1.
As this patch has nothing to do with the target triple issue we found.

In the current version, I use the PCH file to load the external TU.
And it seems to work fine on my system and on the Windows CI.

IMO, maybe we can just leave a FIXME or something else in the test case and commit this patch to fix the original problem we want to fix.
(of course, re-submit to rerun the test case on Linux)
What do you think? @steakhal

Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2022, 8:39 PM

IMO, maybe we can just leave a FIXME or something else in the test case and commit this patch to fix the original problem we want to fix.
(of course, re-submit to rerun the test case on Linux)
What do you think? @steakhal

I agree.
Please update the summary to have a reference to the discourse question you posted about this.

OikawaKirie edited the summary of this revision. (Show Details)
  • Add FIXME in test case.
  • Add discourse topic link in summary.
steakhal accepted this revision.Mar 21 2022, 6:08 AM

Okay, let's give it another shot. Please monitor any buildbot failures and revert promptly if needed.

This revision is now accepted and ready to land.Mar 21 2022, 6:08 AM