This is an archive of the discontinued LLVM Phabricator instance.

[index][analyzer][ctu] Eliminate white spaces in the CTU lookup name.
Abandoned, Public

Authored by OikawaKirie on May 10 2021, 3:33 AM.

Details

Summary

In the analyzer, the CTU definition index file requires that the CTU lookup name (i.e. the USR of a function decl or a global variable decl) contain no white space. Otherwise, the index file is parsed incorrectly.
However, the analyzer cannot easily tell whether a given white space ' ' is the separator between the index string and the file path or simply part of the index string or the path. Splitting at either the first or the last white space is therefore unreliable. Because a file path may legitimately contain white space, the better choice seems to be eliminating all white space from the index string.
This patch replaces the white space ' ' emitted for an unsupported type with a question mark '?', which solves the problem and, in my view, also makes the index more reasonable.
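The ambiguity described in the summary can be illustrated with a minimal sketch. The two helpers below are hypothetical and are not the analyzer's actual index parser; the sample lookup name and path used in the usage note are made up for illustration only.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not the analyzer's parser): split an index line
// into (lookup name, file path) at the FIRST space.
std::pair<std::string, std::string> splitAtFirstSpace(const std::string &Line) {
  const std::string::size_type Pos = Line.find(' ');
  if (Pos == std::string::npos)
    return {Line, ""};
  return {Line.substr(0, Pos), Line.substr(Pos + 1)};
}

// Hypothetical helper: split the same kind of line at the LAST space.
std::pair<std::string, std::string> splitAtLastSpace(const std::string &Line) {
  const std::string::size_type Pos = Line.rfind(' ');
  if (Pos == std::string::npos)
    return {Line, ""};
  return {Line.substr(0, Pos), Line.substr(Pos + 1)};
}
```

For a made-up line such as `c:@F@f# # /tmp/my dir/a.cpp`, where the intended name is `c:@F@f# #` and the intended path is `/tmp/my dir/a.cpp`, neither helper recovers the intended pair. If the space inside the name is replaced by '?', splitting at the first space yields the intended name and path even though the path itself contains a space.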

Event Timeline

OikawaKirie created this revision. May 10 2021, 3:33 AM
OikawaKirie requested review of this revision. May 10 2021, 3:33 AM

Maybe we could also handle this kind of type instead of leaving it 'unhandled'? What Type is it?

The member function pointer type, see the test case.

Although handling this kind of type would be ideal, my point is that white space should still be removed from the USR.
Currently, the white space character separates the index string from the file path in the output of clang-extdef-mapping.
When parsing that output, it is difficult to determine where an index string containing white space characters ends.
Therefore, IMO the white space character should not be used in the index string.

To clarify, I was suggesting that in addition to removing the space from unhandled types, we also handle the member function pointer type and not leave it in this fallback case.
Types should have unique USR characters so that overloaded functions (overloaded on the type parameter) have unique USRs.
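The uniqueness concern can be shown with a toy model. Everything in the sketch below is an assumption made for illustration: the type kinds, the encoding characters, and the helper names are hypothetical and do not reflect Clang's actual USR generation.

```cpp
#include <string>
#include <vector>

// Toy type kinds; hypothetical, not Clang's type hierarchy.
enum class ToyType { Int, Float, MemberFunctionPointer, Other };

// Toy encoder: every handled type gets its own character, anything else
// falls back to '?'. The characters are made up for this sketch.
char encodeToyType(ToyType T, bool HandleMemberPointers) {
  switch (T) {
  case ToyType::Int:
    return 'I';
  case ToyType::Float:
    return 'f';
  case ToyType::MemberFunctionPointer:
    if (HandleMemberPointers)
      return 'M';
    break;
  case ToyType::Other:
    break;
  }
  return '?'; // shared fallback for every unhandled type
}

// Build a toy lookup name from a function name and its parameter types.
std::string toyLookupName(const std::string &Name,
                          const std::vector<ToyType> &Params,
                          bool HandleMemberPointers) {
  std::string Result = Name + "#";
  for (ToyType T : Params) {
    Result += encodeToyType(T, HandleMemberPointers);
    Result += '#';
  }
  return Result;
}
```

In this model, two overloads of `f` that differ only in an unhandled parameter type both produce `f#?#` while member function pointers go through the fallback. Once that type has its own character, the two names differ.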

OikawaKirie abandoned this revision. May 17 2021, 11:30 PM

As the test case in revision D102669 shows, eliminating all white space characters from the USR seems neither possible nor reasonable.
This patch has been split into revision D102669, which fixes the wrongly parsed CTU index file, and revision D102614, which handles the member pointer type mentioned here.
Please continue the review in those two revisions; I will close this one.

Thanks to all reviewers.