This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add libc-tidy.
AbandonedPublic

Authored by PaulkaToast on Apr 1 2020, 5:22 PM.

Details

Summary

The libc-tidy utility will be used to implement static checks to ensure our implementation standards are met. It will contain checks that are too libc specific to be used with clang-tidy as they may interact with the build system or TableGen files.

This patch:

  1. Creates a libc-tidy executable under the bin folder.
  2. Adds a check to ensure that functions declared with the LLVM_LIBC_ENTRYPOINT macro are public libc functions. The ground truth for these function are derived from the TableGen files.
  3. Adds regressions tests for libc-tidy checks.

Diff Detail

Event Timeline

PaulkaToast created this revision.Apr 1 2020, 5:22 PM
miscco added a subscriber: miscco.Apr 2 2020, 4:05 AM
miscco added inline comments.
libc/utils/LibcTidy/LibcTidyTool.cpp
51

could we shorten that to return restOfLine.contains("LIBC-NOLINT")

77

rfind searches for the last occurrence of the sub-string. Do you expect multiple qualifiers?

abrachet added inline comments.Apr 2 2020, 3:37 PM
libc/utils/HdrGen/PublicAPICommand.h
47

While we're moving this, we could get rid of the extra . at the end.

libc/utils/LibcTidy/LibcTidyTool.cpp
56

What is the purpose of this and the end comment?

67

Remove virtual and add override

84

No point for const

103–105

logAllUnhandledErrors(std::move(Err), errs()) here probably.

libc/utils/LibcTidy/TableGenUtil.cpp
43

We could make pair explicitly const

49–51

llvm::transform(functionNameList, std::back_inserter(funcNames), [](StringRef s) { return std::string(s); }); maybe

libc/utils/LibcTidy/TableGenUtil.h
20

Can these be StringRef?

PaulkaToast marked 14 inline comments as done.

Moved away from using internal TGParser in favor of publicly available TableGenMain.

libc/utils/LibcTidy/LibcTidyTool.cpp
56

I added these comments to give us a place to add additional checks when the time comes, since I don't see us having so many that we need to put them in their own folder/files but we will have more than one.

77

No we don't, I changed this to use StringRef's startswith since its much more readable. thanks!

84

Ah, sorry I might be missing something. What is the reasoning for not using const here? I thought it's good practice to use const anytime you don't plan to modify the variable? To prevent mistakes like ID += 1; as this compiles but yields runtime errors.

103–105

I change the implementation to utilize TableGenMain instead so this shouldn't be an issue anymore.

libc/utils/LibcTidy/TableGenUtil.cpp
49–51

I think this is a little too fancy for me. Don't get me wrong I love it, just difficult to read. (:

The TableGen part LGTM. I have left couple of comments about the clang-tidy part.

I have a higher level question. Given our rule for include files, the check performed by the qualified call checker seems redundant to me. Am I right, or am I missing something?

libc/utils/LibcTidy/LibcTidyTool.cpp
78

I think they way it is written is actually what we want. But, it took me a long time to figure out why it was working at all because of the way you wrote your test. I put in some explanation over there about what is not expected in the test.

So, the way this check is written, if it finds a call-site to a public function provided by LLVM-libc, it checks whether it resolves to the LLVM-libc declaration. I think we want the check to be just this. We do not need that the call-site actually uses the fully qualified name. As long as we ensure the call-site resolves to an LLVM-libc function, C++ name mangling will give us enough confidence that it is not getting mixed up with a function with the same function from another libc.

libc/utils/LibcTidy/tests/Test.cpp
20 ↗(On Diff #257461)

This error is confusing. At least, it confused me. IIUC, an error will be reported even if the call were to be made in this fashion:

::libc_api_func();

From the user's point of view, this is total fine: they are explicitly calling the global version so there should be no error. Even if we had a non-global namespace like this, an error will be reported:

namespace nn {
void libc_api_func() {}
}
...
nn::libc_api_func();

Such uses cases are relevant in tests and I think we should allow them in tests. Also, did you intend to allow a use case like this:

namespace __llvm_libc {
void some_func() {
  libc_api_func(); // Unqualified call to libc_api_func.
}
}

The TableGen part LGTM. I have left couple of comments about the clang-tidy part.

I have a higher level question. Given our rule for include files, the check performed by the qualified call checker seems redundant to me. Am I right, or am I missing something?

I was indeed missing one particular case here. Our header file rules protect us from header file mixup. Likewise, our policy of enclosing everything within the __llvm_libc namespace protects us from symbol mixup with another libc. But, there is one case for which this patch gives us additional protection. Example:

// some_entrypoint.cpp
#include "include/string.h" // This makes the public strlen available

namespace __llvm_libc {

void LLVM_LIBC_ENTRYPOINT(some_entrypoint)(...) {
  // We want this to be disallowed.
  // Note that this strlen is not being picked from the system libc header
  // but is picked up from LLVM libc's public string.h header. This can
  // potentially lead to symbol mixup with another libc.
  ::strlen(...);
}

} // namespace __llvm_libc
libc/utils/LibcTidy/LibcTidyTool.cpp
73

Should we restrict to only entrypoints? Seems to me like we should do this for all function calls? That is, LLVM libc implementations (not redirectors) should only call functions declared in the namespace __llvm_libc.

libc/utils/LibcTidy/tests/Test.cpp
20 ↗(On Diff #257461)

I think this test is mostly correct. Few minor changes:

  1. Enclose Test in the namespace __llvm_libc.
  2. The call at line 20 should not result in an error after that.
  3. Add a call to ::libc_api_func which should result in an error.
  4. The error message in general should say something like, error: call to {function} not declared in __llvm_libc namespace. You can additionally point to the file:line of the function declaration to which the erroneous call resolves to.
PaulkaToast marked 5 inline comments as done.
PaulkaToast added inline comments.
libc/utils/LibcTidy/tests/Test.cpp
20 ↗(On Diff #257461)

Added some clarifying comments and tweaked the error message so it should make a big more sense now.

20 ↗(On Diff #257461)
  1. Done.
  2. Yes, this is now valid and doesn't produce an error.
  3. Done.
  4. Modified error message and now it also points to the function that the call resolves to.
PaulkaToast edited the summary of this revision. (Show Details)
abrachet added inline comments.Apr 23 2020, 8:05 PM
libc/utils/LibcTidy/CalleeNamespaceCheck.h
21 ↗(On Diff #259511)

Is it a large change to worry about global variables?

39 ↗(On Diff #259511)

Maybe !funcDecl->... makes more sense? But if not this should at least be nullptr

43 ↗(On Diff #259511)

From looking at the ast dump there is a reference to the declaration being called. I'm sure there is a way to see if that declaration was namespaced or not. But I wonder if we anyway want to require we use the full namespaced name?

libc/utils/LibcTidy/EntrypointNameCheck.h
26–27

We can omit the MD and range names to avoid any unused variable warnings

libc/utils/LibcTidy/LibcTidyTool.cpp
84

It would prevent that, but I think it could become verbose. const is useful to ensure a contract between functions not as much within one. It's a stylistic choice I don't have a large preference.

108

This might work because std::string::operator[] returns CharT & but I think it's confusing. Probably programName.data() is easier. TableGenMain shouldn't require a char * though it should be const char *.

113

Why are these shared_ptr and not unique_ptr?

We could also initialize like std::vector<...> checks {std::make_shared<CalleeNamespaceCheck>(), ... }; instead of calling push_back twice.

libc/utils/LibcTidy/LibcTidyTool.h
17

Do we care about a nolint on the previous line? Is that already handled by clang-tidy?

33–34

Do we want these to be pure virtual or is there a reason they do nothing?

libc/utils/LibcTidy/TableGenUtil.h
18

FWIW, there exists an llvm::StringSet. Why would you use it over this? No clue.

libc/utils/LibcTidy/tests/.clang-format
2

I think clang-format keeps looking in .. until it finds a .clang-format I might be wrong though. In any case is it needed to have a .clang-format file?

PaulkaToast marked 15 inline comments as done.
PaulkaToast added inline comments.
libc/utils/LibcTidy/CalleeNamespaceCheck.h
21 ↗(On Diff #259511)

Do you mean global variables that might not be in the namespace?

39 ↗(On Diff #259511)

Ah yes thank you for catching this, not good practice on my part (:

43 ↗(On Diff #259511)

That was my original idea, however @sivachandra mentioned that he doesn't want to enforce a style such as requiring the call to be fully qualified and that if it resolves to the correct implementation that is sufficient. The benefit, from what I understand is that this allows for a more general c++ looking code since its not very common to call qualified names from within a namespace but I'm sure Siva can speak more about the rationale.

libc/utils/LibcTidy/LibcTidyTool.cpp
73

This new implementation does this for all function calls, ignoring cases where the resolving function is system or compiler provided.

108

Tried this out and it complains about .data() returning a const pointer. .data() returning a char * is a C++17 feature and seems like LLVM is using C++14 https://llvm.org/docs/CodingStandards.html#c-standard-versions

113

AFAIK they can't be unique_ptr because the checks lifetime need to extend until the end of Tool.run(), but we'd have to hand over ownership when calling check-registerPPCallbacks() this is why I went with the shared_ptr instead. I'm not sure if this is the best approach but I'm open to suggestions.

Also thank you this is a great pattern and eliminates the push_back(). (:

libc/utils/LibcTidy/LibcTidyTool.h
17

This is intended to be quite lightweight so I didn't bother implementing a previous line nolint. I figured that using no-lint is a rare case. However if in the future this becomes cumbersome we could possibly add it in later.

33–34

The registerMatchers might be a good candidate for a pure virtual function. However, registerPPCallbacks is only needed for cases where we need information from the pre-processor, so having an empty implementation there is probably better. In this patch we even have an example of a check that doesn't need a registerMatchers function because we register the AST matchers in the preprocessor callback. That was the motivation for making them do nothing by default.

libc/utils/LibcTidy/TableGenUtil.h
18

This actually cleaned it up quite a bit!

libc/utils/LibcTidy/tests/.clang-format
2

Yes, unfortunately we have to tell clang-format to not break over long lines. This is because in FileCheck, the following lines are semantically different.

// CHECK: :[[@LINE-1]]:3: error: Function libc_api_func must call to internal implementation in `__llvm_libc` namespace.

and

// CHECK: :[[@LINE-1]]:3: error: Function libc_api_func must call to internal
// implementation in `__llvm_libc` namespace.

see also: https://github.com/llvm/llvm-project/blob/master/clang-tools-extra/test/.clang-format

abrachet added inline comments.Apr 24 2020, 12:00 AM
libc/utils/LibcTidy/EntrypointNameCheck.h
71

Presumably an expensive copy can this be a reference?

libc/utils/LibcTidy/LibcTidyTool.cpp
33

Is there a reason these aren't static?

104–106

You can pass non const to const. So making argv just char ** and then we don't need to copy into the string should work I believe.

113

I think it should be fine to use unique_ptr and then just std::move(checks) along with using auto & in a few places

117

auto & wether we can use unique_ptr or not.

sivachandra marked 2 inline comments as done.Apr 24 2020, 11:37 AM
sivachandra added inline comments.
libc/utils/LibcTidy/CalleeNamespaceCheck.h
21 ↗(On Diff #259511)

Hmm, may be? Should we wait until we have an example which causes a problem?

39 ↗(On Diff #259511)

AFAIU, constructors and operators have fully qualified name and the getQualifiedNameAsString should give you the name. Do you have an example where this is breaking?

43 ↗(On Diff #259511)

The high level point here is that we are not doing these checks as matter of style enforcement. We are doing this to ensure that the code we write does not accidentally resolve to functions from another libc. The style requirement is probably relevant, but feels overbearing at this point. If we find a case where lack of style is causing a problem, we can revisit this.

Keep in mind that this does not close the door for reviewers requesting to qualify calls to improve readability.

libc/utils/LibcTidy/LibcTidyTool.cpp
51

To improve navigation, you should put this class in a different file and call this file something like Main.cpp.

libc/utils/LibcTidy/LibcTidyTool.h
16

Not needed?

17

Same question as @abrachet: Doesn't clang-tidy already do this? We should avoid writing custom C++ source parsers unless existing clang or other APIs do not serve our purpose.

libc/utils/LibcTidy/tests/CalleeNamespaceTest.cpp
33 ↗(On Diff #259810)

Why do you need NOLINT on this? This should resolve to __llvm_libc::libc_api_func, no?

libc/utils/LibcTidy/tests/EntrypointNameTest.cpp
19

This is good. But, what we really want additionally is couple more things:

  1. The name listed in the build rule matches the name of the entrypoint function in the sources.
  2. That there is an implementation of the entrypoint function in the sources, and that it is listed with LLVM_LIBC_ENTRYPOINT macro.

I think #1 requires that we add a command line arg to libc-tidy?

abrachet added inline comments.Apr 24 2020, 12:47 PM
libc/utils/LibcTidy/CalleeNamespaceCheck.h
21 ↗(On Diff #259511)

I think stdout and friends would be worth having this for but we don't have them yet. Waiting is perfectly fine.

libc/utils/LibcTidy/LibcTidyTool.cpp
59

Remove inFile

libc/utils/LibcTidy/LibcTidyTool.h
14

Looks like this include isn't needed either

libc/utils/LibcTidy/TableGenUtil.cpp
23–26

funcNames.insert(functionNameList.begin(), functionNameList.end())

libc/utils/LibcTidy/tests/CalleeNamespaceTest.cpp
22–24 ↗(On Diff #259810)

FWIW, va_* functions almost certainly are macros which expand to __builtin_va_*. I'm not sure there exist any system provided function calls (which have definitions in public headers that is).

libc/utils/LibcTidy/tests/mock_api.td
9–13

I don't think its required to have these empty they can be omitted.

PaulkaToast marked 14 inline comments as done.
PaulkaToast added inline comments.
libc/utils/LibcTidy/CalleeNamespaceCheck.h
39 ↗(On Diff #259511)

Clang doesn't provide the qualified name as a string for these types of special functions. https://clang.llvm.org/doxygen/classclang_1_1NamedDecl.html#a7fab32caa3b7d601128265248260fd37

See this error produced:

libc-tidy: workspace/llvm-project/llvm/../clang/include/clang/AST/Decl.h:251: llvm::StringRef clang::NamedDecl::getName() const: Assertion `Name.isIdentifier() && "Name is not a simple identifier"' failed.

on this line:
https://github.com/llvm/llvm-project/blob/60f1d2636626a0b92b6ad2cb86eea78cb1cd7b33/libc/src/signal/linux/signal.h#L33
and
https://github.com/llvm/llvm-project/blob/4fd92cc475567f2001344d7049fe7ac592e8bdbe/libc/utils/testutils/StreamWrapper.h#L24
for example.

libc/utils/LibcTidy/LibcTidyTool.cpp
104–106

Yeah that's what I thought too for passing char * to something that expects const char * but this results in this error ):

workspace/llvm-project/libc/utils/LibcTidy/LibcTidyTool.cpp:95:32: error: no matching constructor for initialization of 'tooling::CommonOptionsParser'
  tooling::CommonOptionsParser OptionsParser(argc, argv, LibCTidyCategory);
                               ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
workspace/llvm-project/llvm/../clang/include/clang/Tooling/CommonOptionsParser.h:75:3: note: candidate constructor not viable: no known conversion from 'char **' to 'const char **' for 2nd argument
  CommonOptionsParser(int &argc, const char **argv,
117

Okay, I think I was using unique_ptr incorrectly by not passing the vector as a reference in some places, although now we can't use the initializer pattern because under the hood it makes a copy. https://stackoverflow.com/questions/9618268/initializing-container-of-unique-ptrs-from-initializer-list-fails-with-gcc-4-7

libc/utils/LibcTidy/LibcTidyTool.h
17

clang-tidy currently doesn't seem to be written in a way that we can expand on it in our source tree similar to lib tooling. Their nolint code is marked static and only visible here: https://github.com/llvm/llvm-project/blob/cb1ee34e9d32fce84613827693a8ed3aff1d36cf/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp#L270

Their diagnostics consumer is deeply linked to the rest of clang-tidy so we can't pull out the parts we need.
This could be possible if they exposed isNOLINTFound()in a header file.

libc/utils/LibcTidy/tests/CalleeNamespaceTest.cpp
33 ↗(On Diff #259810)

I meant to pre-fix this with :: thanks for catching this.

libc/utils/LibcTidy/tests/EntrypointNameTest.cpp
19

I have plans for this, but it requires changes to the build system. I think this patch is already quite large. Could we try expand on this by adding more checks in another patch after getting these initial checks landed?

Harbormaster completed remote builds in B54641: Diff 260020.
PaulkaToast marked 4 inline comments as done.

Realized that CalleeNamespaceCheck doesn't require any tablegen information now that we check all function calls instead of just entrypoints. Removed the check from this patch and created a new one in the clang-tidy tree.

PaulkaToast edited the summary of this revision. (Show Details)Apr 26 2020, 7:56 PM
PaulkaToast edited the summary of this revision. (Show Details)
PaulkaToast abandoned this revision.May 7 2020, 3:00 AM

Abandoned in favor of D79192.