This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add integration tests.
ClosedPublic

Authored by PaulkaToast on Apr 30 2020, 11:47 AM.

Details

Summary

This patch aims to add integration tests to check the following:

  1. Header files are generated as expected.
  2. Libc functions have the correct public name.
  3. Libc functions have the correct return type and parameter types.
  4. Symbols are exposed in the public lib.a files.

Diff Detail

Event Timeline

PaulkaToast created this revision.Apr 30 2020, 11:47 AM

Few high level comments and questions:

  • Instead of creating a target for every entrypoint, have you considered generating a single file which:
    1. Includes all header files listed in the api.td.
    2. Lists cpp:Function pointers to all functions as required by api.td.

There definitely are problems that need to be solved with this approach. For example, how do you ensure all public header files are already present in the build/.../libc/include directory. But, the appeal of this approach is that one does not have to create explicit or implicit targets for "integration tests". There would be just one target which would test everything in one go.

  • In general, I don't think we should expect that internal API to match the public API. For example, we might choose to give internal types names which are different from the public names. So, I think the second kind of tests you mention in your description is not required as it is restrictive.
  • Using -nostdinc would require us to explicitly list path to the linux headers for entrypoints which depend on the linux header files, which will very likely bring all headers into the include path. Do you have any ideas on how to get around it?
  • Using -nostdlib results in an incomplete link. For ELF, most linkers only give a warning about missing _start symbol, but the resulting exe is bad/incomplete. Do you have any plans to ensure that the link is complete, resulting in an exe which can be executed?
dxf added a subscriber: dxf.May 1 2020, 6:56 PM

Few high level comments and questions:

  • Instead of creating a target for every entrypoint, have you considered generating a single file which:
    1. Includes all header files listed in the api.td.
    2. Lists cpp:Function pointers to all functions as required by api.td.

There definitely are problems that need to be solved with this approach. For example, how do you ensure all public header files are already present in the build/.../libc/include directory. But, the appeal of this approach is that one does not have to create explicit or implicit targets for "integration tests". There would be just one target which would test everything in one go.

+1 to doing this just from api.td. As for the includes, I think it would be wise for us to make the header files dependencies of llvmlibc or maybe better create a libc_headers target to generate all of our headers.

  • In general, I don't think we should expect that internal API to match the public API. For example, we might choose to give internal types names which are different from the public names. So, I think the second kind of tests you mention in your description is not required as it is restrictive.

Agreed. I think there are a few places already we use an internal __llvm_libc types I think these would break. That being said because we are doing that, it is important to test that the external api is compatible with the internal one, I cant think of a great way to do that though.

libc/utils/HdrGen/PrototypeTestGen/PrototypeTestGen.cpp
49

It was just recently renamed to Function But I think it makes more semantic sense to use cpp::IsSame<[type], decltype(EntryPointNameOption)>

PaulkaToast marked 2 inline comments as done.May 7 2020, 2:41 AM
  • Instead of creating a target for every entrypoint, have you considered generating a single file which:
    1. Includes all header files listed in the api.td.
    2. Lists cpp:Function pointers to all functions as required by api.td.

There definitely are problems that need to be solved with this approach. For example, how do you ensure all public header files are already present in the build/.../libc/include directory. But, the appeal of this approach is that one does not have to create explicit or implicit targets for "integration tests". There would be just one target which would test everything in one go.

Good idea, this is done and the new integration test file now looks like this:

...
#include <unistd.h>
#include <sys/mman.h>
#include <math.h>

int main() {
  static_assert(__llvm_libc::cpp::IsSame<int(int), decltype(raise)>::Value, "raise prototype in TableGen does not match public header");
  static_assert(__llvm_libc::cpp::IsSame<int(int, const struct sigaction *__restrict, struct sigaction *__restrict), decltype(sigaction)>::Value, "sigaction prototype in TableGen does not match public header");
  static_assert(__llvm_libc::cpp::IsSame<int(sigset_t *, int), decltype(sigdelset)>::Value, "sigdelset prototype in TableGen does not match public header");
...

Thanks to @abrachet's suggestion of using IsSame we also get nice error messages. As for ensuring all public header files are present I added a top-level target for the headers that we can depend on here (same as @abrachet mentioned)

  • In general, I don't think we should expect that internal API to match the public API. For example, we might choose to give internal types names which are different from the public names. So, I think the second kind of tests you mention in your description is not required as it is restrictive.

Done, this only tests the public/libc user perspective now.

  • Using -nostdinc would require us to explicitly list path to the linux headers for entrypoints which depend on the linux header files, which will very likely bring all headers into the include path. Do you have any ideas on how to get around it?

For now I've commented out -nostdinc with a TODO, the most straightforward way would be to have our public headers not depend on Linux dev headers but another approach could be to use our clang-tidy RestrictSystemIncludes check to ensure only the right Linux header gets included (this would add a dependency on clang-tidy for the integration tests though)

  • Using -nostdlib results in an incomplete link. For ELF, most linkers only give a warning about missing _start symbol, but the resulting exe is bad/incomplete. Do you have any plans to ensure that the link is complete, resulting in an exe which can be executed?

My earlier iteration of this had us linking with the loader for a complete executable link. However, since we now check everything statically, I think it's enough if the integration test compiles and links as this ensures the libc symbols are present and the types are compatible. We could add the loader again if you think it's necessary.

libc/utils/HdrGen/PrototypeTestGen/PrototypeTestGen.cpp
49

Thanks!

abrachet added inline comments.May 7 2020, 3:00 PM
libc/utils/HdrGen/PrototypeTestGen/PrototypeTestGen.cpp
25

Is there a reason for this to be namespaced? Should we use __llvm_libc instead?

28

Unecessary.

31–33

Don't need the brackets

37–42

These could be combined to

auto it = G->FunctionSpecMap.find(entrypoint);
if (it == G->FunctionSpecMap.end()) {} // error
llvm::Record *functionSpec = *it;

so as to not search twice.

46–49

_Noreturn isn't a C++ keyword, otherwise the type system shouldn't care about attributes no matter where they show up. I think instead of removing these manually we should just include __llvm-libc-common.h which will define this as [[noreturn]] we might in the future use other macros in type names.

53

for (size_t i = 0, size = args.size(); i < size; ++i)? I believe the code standards prefers this

76

& not necessary, I believe.

libc/utils/HdrGen/PublicAPICommand.cpp
111

PublicHeaders.emplace(Header)

187

explicit maybe?

280–282

Is this simpler than just using std::make_unique<APIGenerator>(Records); wherever it would be used?

PaulkaToast marked 11 inline comments as done.
PaulkaToast added inline comments.
libc/utils/HdrGen/PrototypeTestGen/PrototypeTestGen.cpp
46–49

We are already expanding this

projects/libc/lib/public_integration_test.cpp:23:42: error: expected variable name or 'this' in lambda capture list
  static_assert(__llvm_libc::cpp::IsSame<_Noreturn void(void), decltype(abort)>::Value, "abort prototype in TableGen does not match public header");
                                         ^
projects/libc/include/__llvm-libc-common.h:21:20: note: expanded from macro '_Noreturn'
#define _Noreturn [[noreturn]]

It seems square bracket [[attributes]] cannot be used on plain types/templates/outside definitions, the compiler thinks we are trying to define a lambda function.
https://godbolt.org/z/bMFzfy

Trying to use [[attributes]] in a using statement results in a similar error.

libc/utils/HdrGen/PublicAPICommand.cpp
280–282

I didn't want to take the whole class and expose it, just the parts I needed.

PaulkaToast edited the summary of this revision. (Show Details)May 7 2020, 7:39 PM
...
#include <unistd.h>
#include <sys/mman.h>
#include <math.h>

int main() {
  static_assert(__llvm_libc::cpp::IsSame<int(int), decltype(raise)>::Value, "raise prototype in TableGen does not match public header");
  static_assert(__llvm_libc::cpp::IsSame<int(int, const struct sigaction *__restrict, struct sigaction *__restrict), decltype(sigaction)>::Value, "sigaction prototype in TableGen does not match public header");
  static_assert(__llvm_libc::cpp::IsSame<int(sigset_t *, int), decltype(sigdelset)>::Value, "sigdelset prototype in TableGen does not match public header");
...

Thanks to @abrachet's suggestion of using IsSame we also get nice error messages. As for ensuring all public header files are present I added a top-level target for the headers that we can depend on here (same as @abrachet mentioned)

LG

  • Using -nostdinc would require us to explicitly list path to the linux headers for entrypoints which depend on the linux header files, which will very likely bring all headers into the include path. Do you have any ideas on how to get around it?

For now I've commented out -nostdinc with a TODO, the most straightforward way would be to have our public headers not depend on Linux dev headers but another approach could be to use our clang-tidy RestrictSystemIncludes check to ensure only the right Linux header gets included (this would add a dependency on clang-tidy for the integration tests though)

I think we should be complete here. Lets split this change into two changes:

  1. First change will add a target to build the main object but not the full executable. In this change, we will make use of clang-tidy to ensure only LLVM-libc or platform or compiler headers are being included.
  2. Second change will add a target to link this main object with the loader object making it a true integration test.

Before you make changes, I want to bring up few other high-level points for discussion:

  1. The list of public headers should be determined by the target config. However, we don't have a way for a config to list the public headers it wants. We can take two approaches here again: provide a rule to list a target for public headers, or just list the targets for the headers in .txt file. I would prefer keeping it simple and just requiring .txt file which defines a CMake list of targets for public headers. The CMake file listing the integration test target will include this file to set up the dependencies. Something like this:
# config/linux/headers.txt:

set(PUBLIC_HEADERS
  libc.include.math
  libc.inlcude.signal
  ...
)

And:

# integration_test/CMakeLists.txt
include("config/${LIBC_TARGET_OS}/headers.txt")
...
  1. The list of entrypoints that go into a .a file should also be determined by the target config. This can also be done by listing the entrypoint targets in a, say entrypoints.txt file. The file lib/CMakeLists.txt can include this file and get the entrypoints for the various static libraries, just like above. This mechanism can also help us skip building and testing entrypoint targets if a config doesn't list it in its entrypoints.txt file.

WDYT?

Sorry for the delayed response!

I like idea of a .txt for both the headers and the entrypoints. This is nice because it also prevents us from building and including platform specific headers e.g fcntl.h which is on Linux but not on Windows.
Will send out a revised diff soon. (:

As for the Linux dev headers I think instead of a work around using clang-tidy, it might just be better to remove our dependency on the dev headers all together, especially since the dev headers differ between platforms such as sigset_t on AARCH64 vs x86.
I can send out a separate patch out for this as well.

As for the Linux dev headers I think instead of a work around using clang-tidy, it might just be better to remove our dependency on the dev headers all together, especially since the dev headers differ between platforms such as sigset_t on AARCH64 vs x86.
I can send out a separate patch out for this as well.

What do you mean remove the dependency on dev headers? Does it mean that we make copies of linux definitions in LLVM libc tree?

As for the Linux dev headers I think instead of a work around using clang-tidy, it might just be better to remove our dependency on the dev headers all together, especially since the dev headers differ between platforms such as sigset_t on AARCH64 vs x86.
I can send out a separate patch out for this as well.

What do you mean remove the dependency on dev headers? Does it mean that we make copies of linux definitions in LLVM libc tree?

As we discussed offline, the approach I was thinking would involve duplicating a lot of definitions from the linux headers. So I decided to take the approach you suggested and use clang-tidy to ensure we're including the correct headers in the test.

sivachandra added inline comments.May 28 2020, 11:19 AM
libc/CMakeLists.txt
64

Why is this change required?

libc/cmake/modules/LLVMLibCLibraryRules.cmake
93

Call this entry point *name* list throughout to avoid confusion with target names.

libc/cmake/modules/LLVMLibCObjectRules.cmake
1

Seems to me like the changes in this file should be separated out into a different patch.

libc/config/linux/entrypoints.txt
1 ↗(On Diff #266719)

This LGTM. But, do you think it makes sense to put this in a target machine specific directory (config/linux/x86_64/entrypoints.txt)? For example, almost everything under LIBC_ENTRYPOINTS currently is x86_64 specific. So, putting these in a machine specific list helps us build the libc for different machine types independent of each of other.

libc/config/linux/headers.txt
1 ↗(On Diff #266719)

The same would be true for headers also I suppose but we can do that separately as it does not interfere with the current patch?

libc/lib/CMakeLists.txt
62

AFAICT, this is the only place this rule is being used. Do you see it being used anywhere else? If not, just inline everything here. May be even put it under the test directory and not here?

libc/utils/HdrGen/PublicAPICommand.h
42

I see only one specialization of this class. In which case, why is this need?

PaulkaToast marked 7 inline comments as done.
PaulkaToast added inline comments.
libc/CMakeLists.txt
64

Due to the changes below its no longer required (:

libc/cmake/modules/LLVMLibCObjectRules.cmake
1

Yes, I could pull it out into another patch but this is the only current use case for this change so maybe its better to leave it in this patch? wdyt?

libc/config/linux/entrypoints.txt
1 ↗(On Diff #266719)

I'm not necessarily opposed to this, although I'm not sure I know of a case where the libc differs for different architectures on linux. If the need arises it should be a very simple change to make if/when we encounter it.

libc/config/linux/headers.txt
1 ↗(On Diff #266719)

yes, we can evaluate this in a different patch.

libc/lib/CMakeLists.txt
62

Ah, yes this was useful for when I was testing individual libraries but now there is no need for the function so I'll inline it instead. This also removes the reordering issue from above. Thanks!

libc/utils/HdrGen/PublicAPICommand.h
42

Right now the APIGenerator is defined in the .cpp file, so this is used to expose the parts we want. The alternative is to expose the APIGenerator class in the header, but that'd be a larger refactor. It can be done though, which would you prefer?

sivachandra marked an inline comment as done.May 29 2020, 9:38 AM
sivachandra added inline comments.
libc/cmake/modules/LLVMLibCObjectRules.cmake
1

OK

libc/config/linux/entrypoints.txt
1 ↗(On Diff #266719)

My comment is more for our convenience while we are building the libc for different target machine architectures. Having just one linux list means that while the implementation for other architectures is incomplete, our build infrastructure will keep failing at different places.

libc/utils/HdrGen/PublicAPICommand.h
42

Yes. Refactor it separately. It need not really be a refactor. I am OK even if you just make API generator public. As it stands in this patch, the specialization and the base class have no meaning as makeAPIReader is really not a factory method.

Splitting the CMake parts and the hdrgen parts into two patches makes it easier for me to review as well.

PaulkaToast marked 2 inline comments as done.
PaulkaToast added inline comments.
libc/config/linux/entrypoints.txt
1 ↗(On Diff #266719)

Ah, this is a good point. If you are porting to another architecture and building it out its much more convenient to do it in pieces, a few entrpoints at a time, which multiple lists allow. Done.

libc/utils/HdrGen/PublicAPICommand.h
42

Okay, this is now pulled into a desperate patch: D80832.
When that patch gets merged in I'll pull it in here and update the diff.

sivachandra added inline comments.May 29 2020, 1:12 PM
libc/utils/HdrGen/PublicAPICommand.h
42

[This is very likely a typo but funny.]
No, my intention was not to make you feel desperate.

PaulkaToast marked an inline comment as done.May 29 2020, 3:31 PM
PaulkaToast added inline comments.
libc/utils/HdrGen/PublicAPICommand.h
42

I'm so sorry that was a typo, meant to write "separate". 😅

sivachandra accepted this revision.Jun 1 2020, 9:21 PM

Accepting now. The one comment I added can be addressed separately.

libc/test/src/CMakeLists.txt
96 ↗(On Diff #267708)

Why not include the loader here?

This revision is now accepted and ready to land.Jun 1 2020, 9:21 PM
PaulkaToast marked an inline comment as done.Jun 2 2020, 12:20 PM
libc/test/src/CMakeLists.txt
96 ↗(On Diff #267708)

Yes, this will be incorporated in part 2, as per this comment.

https://reviews.llvm.org/D79192#2027676

I think we should be complete here. Lets split this change into two changes:

  1. First change will add a target to build the main object but not the full executable. In this change, we will make use of clang-tidy to ensure only LLVM-libc or platform or compiler headers are being included.
  2. Second change will add a target to link this main object with the loader object making it a true integration test.
This revision was automatically updated to reflect the committed changes.