This is an archive of the discontinued LLVM Phabricator instance.

[libc++][Android] Enable libc++ testing on Android
ClosedPublic

Authored by rprichard on Dec 1 2022, 2:50 PM.

Details

Reviewers
EricWF
Group Reviewers
Restricted Project
Restricted Project
Restricted Project
Commits
rGd173ce4a670e: [libc++][Android] Support libc++ testing on Android (#69274)
Summary

This patch adds libc++ support for Android L (Android 5.0+) and up,
tested using the Android team's current compiler, a recent version of
the AOSP sysroot, and the x86[-64] Android Emulator.

CMake and Lit Configuration:

Add runtimes/cmake/android/Arch-${ARCH}.cmake files that configure
CMake to cross-compile to Android without using CMake's built-in NDK
support (which only works with an actual packaged NDK).

Add libcxx/cmake/caches/AndroidNDK.cmake that builds and tests libc++
(and libc++abi) for Android. This file configures libc++ to match what
the NDK distributes, e.g.:

  • libc++_shared.so (includes libc++abi objects, there is no libc++abi.so). libunwind is linked statically but not exported.
  • libc++_static.a (does not include libc++abi) and libc++abi.a
  • std::__ndk1 namespace
  • All the libraries are built with __ANDROID_API__=21, even when they are linked to something targeting a higher API level.

(However, when the Android LLVM team builds these components, they do
not use these CMake cache files. Instead they use Python scripts to
configure the builds. See
https://android.googlesource.com/toolchain/llvm_android/.)

Add llvm-libc++[abi].android-ndk.cfg.in files that test the Android
NDK's libc++_shared.so. These files can target old or new Android
devices. The Android LLVM team uses these test files to test libc++ for
both arm/arm64 and x86/x86_64 architectures.

The Android testing mode works by setting %{executor} to adb_run.py,
which uses adb push and adb shell to run tests remotely. adb_run.py
always runs tests as the "shell" user even on an old emulator where
"adb unroot" doesn't work. The script has workarounds for old Android
devices. The script uses a Unix domain socket on the host
(--job-limit-socket) to restrict concurrent adb invocations. Compiling
the tests is a major part of libc++ testing run-time, so it's desirable
to exploit all the host cores without overburdening the test devices,
which can have far fewer cores.

BuildKite CI:

Add a builder to run-buildbot, android-ndk-*, that uses Android Clang
and an Android sysroot to build libc++, then starts an Android emulator
container to run tests.

Run the emulator and an adb server in a separate Docker container
(libcxx-ci-android-emulator), and create a separate Docker image for
each emulator OS system image. Set ADB_SERVER_SOCKET to connect to the
container's adb server. Running the only adb server inside the
container makes cleanup more reliable between test runs, e.g. the adb
client doesn't create a ~/.android directory and the adb server can
be restarted along with the emulator using docker stop/run. (N.B. The
emulator insists on connecting to an adb server and will start one
itself if it can't connect to one.)

The suffix to the android-ndk-* job is a label that concisely specifies
an Android SDK emulator image. Enable CI testing on two configurations:

  • "system-images;android-21;default;x86" ==> 21-def-x86
  • "system-images;android-33;google_apis;x86_64" ==> 33-goog-x86_64

Mark tests XFAIL for Android:

Add three Android lit features:

  • android
  • android-device-api=(21,22,23,...)
  • LIBCXX-ANDROID-FIXME (for failures that need follow-up work)

Mark failing test with XFAIL (and sometimes UNSUPPORTED):

  • Disable various FS tests (because SELinux blocks FIFO and hard linking).
  • Disable various locale tests (because Bionic has limited locale support).

Diff Detail

Event Timeline

rprichard created this revision.Dec 1 2022, 2:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 2:50 PM
rprichard requested review of this revision.Dec 1 2022, 2:50 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 1 2022, 2:50 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
philnik added a subscriber: philnik.Dec 1 2022, 3:16 PM

IMO it would be good to also add the Android job with this patch to make sure everything works properly. You should be able to just add a job to libcxx/utils/ci/buildkite-pipeline.yml. See https://libcxx.llvm.org/AddingNewCIJobs.html#addingnewcijobs for more details.

libcxx/cmake/caches/AndroidNDK.cmake
22

Does Android actually use a different ABI namespace, or is this just to avoid name clashes when testing? If it's just fur testing purposes I'd suggest to rename it to something like __test.

32

This flag isn't honoured anymore. The experimental library will always be built.

libcxx/utils/ci/run-buildbot-container-android
1 ↗(On Diff #479443)

IIUC this allows anybody to run the Android test suite in a docker container? If that's the case, it's really nice.

rprichard added inline comments.Dec 1 2022, 4:39 PM
libcxx/cmake/caches/AndroidNDK.cmake
22

Android has a few different builds of libc++. In the "platform libc++" (/system/lib[64]/libc++.so), the namespace is the ordinary __1. Apps aren't supposed to use the platform libc++, though, and instead they redistribute a libc++_shared.so from the Android NDK. libc++_shared.so uses an __ndk1 namespace. Both the filename and the namespace are supposed to help isolate the NDK libc++ from the platform STL.

With recent Android versions, there are also "APEX modules" that have a copy of libc++.so in them. This copy is very similar to the platform libc++.so, but typically built against an older API level.

Naming it __test for this test configuration seems reasonable.

32

I'll remove this line then. It should be fine to build the library, even though I think the experimental library isn't available in the platform or the NDK currently.

libcxx/utils/ci/run-buildbot-container-android
1 ↗(On Diff #479443)

Yes, it should become really easy for others to run the libc++ tests on Android.

rprichard updated this revision to Diff 479478.EditedDec 1 2022, 4:58 PM

Switch namespace from __ndk1 to __test.

rprichard updated this revision to Diff 481104.Dec 7 2022, 4:25 PM

Add feature LIBCXX-ANDROID-FIXME.

IMO it would be good to also add the Android job with this patch to make sure everything works properly. You should be able to just add a job to libcxx/utils/ci/buildkite-pipeline.yml. See https://libcxx.llvm.org/AddingNewCIJobs.html#addingnewcijobs for more details.

+1 That makes reviewing the code easier.

rprichard updated this revision to Diff 482314.Dec 12 2022, 5:01 PM

Reword a comment.

It's awesome to see this taking form! This will be the first time that we have an upstream CI bot running via an emulator, and I know there are many more use cases for this so it's great to break trail.

I have a few comments but this LGTM. I agree with others that it would be nice to add a CI runner in the same patch to test that it all works.

libcxx/cmake/caches/AndroidNDK.cmake
22

Ideally, this cache file would reflect the way you actually build for the NDK. The executables in the test suite should reliably use -nostdlib++ and so on to make sure that we're linking only against the just-built libc++, so I am not sure whether these hoops are necessary. Did you notice that the tests were not being linked/run against the right library?

The same comment holds for the _static renaming above.

libcxx/utils/ci/run-buildbot
716–719

It would be sweet to be able to do this automatically when we startup the test suite. But this is probably fine to improve incrementally, this is already a giant step in the right direction.

libcxxabi/test/configs/llvm-libc++abi-android-ndk.cfg.in
36

Not attached to this line: You should update our platform support matrix to add Android if it's not already there.

smeenai added a subscriber: smeenai.Jan 7 2023, 2:10 AM
smeenai added inline comments.
libcxx/cmake/caches/AndroidNDK.cmake
22

Yeah, I feel like it'd be less confusing if this used the __ndk inline namespace like the actual NDK's libc++ does.

rprichard updated this revision to Diff 526851.May 30 2023, 5:11 PM

Various updates:

  • Use the std::__ndk1 namespace to match how the NDK libc++ is built.
  • Use a POSIX semaphore to limit concurrent adb jobs (libcxx/utils/semaphore_tool.py, --job-limit-semaphore) more than host jobs are limited.
  • Define android and LIBCXX-ANDROID-FIXME features in features.py.
  • Add a android_libcxx_kind={ndk,platform} param to libcxx[abi]/test/configs/llvm-libc++[abi]-android.cfg.in, which allows for testing either the NDK build of libc++ or the platform (/system/lib[64]/libc++.so) build of libc++ for Android.

I still haven't added anything to libcxx/utils/ci/buildkite-pipeline.yml. It's not clear to me whether that should be part of this patch, because there are several things that must happen first before CI tests will work:

  • I need to create a new docker image to add the Android stuff to it, and host it somewhere.
  • I need to setup a special buildbot on a VM somewhere that uses the new docker image.
  • There are a bunch of patches I have uploaded to Phabricator currently that are required to get tests running cleanly.
  • There are a few other fixes for tests that I don't have uploaded, because I'm not quite sure what we want to do (need to disable certain filesystem tests, and there are a couple of demangler problems involving floating-point).
rprichard retitled this revision from [libc++][Android] Enable libc++ testing using adb on an x86-64 emulator to [libc++][Android] Enable libc++ testing on Android.May 30 2023, 5:31 PM
rprichard edited the summary of this revision. (Show Details)

I still haven't added anything to libcxx/utils/ci/buildkite-pipeline.yml. It's not clear to me whether that should be part of this patch, because there are several things that must happen first before CI tests will work:

  • I need to create a new docker image to add the Android stuff to it, and host it somewhere.
  • I need to setup a special buildbot on a VM somewhere that uses the new docker image.

You don't have to use a Docker image for a CI bot. Although it would be really nice, this is not a requirement, and if you want to do this it can still be done later.

  • There are a bunch of patches I have uploaded to Phabricator currently that are required to get tests running cleanly.

Any tests that fail can be marked LIBCXX-ANDROID-FIXME for now. Or do you mean something more general?

  • There are a few other fixes for tests that I don't have uploaded, because I'm not quite sure what we want to do (need to disable certain filesystem tests, and there are a couple of demangler problems involving floating-point).

Anything that doesn't work can be marked LIBCXX-ANDROID-FIXME for now. If it turns out these tests can't work on Android it can still be updated later.

I was wondering whether I can land this patch adding Android testing to buildkite-pipeline.yml while the Android libc++ tests are still failing, or if the buildbot isn't set up yet. e.g. I wouldn't want non-Android-related LLVM patches to be affected. (Maybe I should land all the Android-related work first, and only then add the entry to the yml file.)

I'm also wondering how the buildbot is restarted when the ldionne/libcxx-builder Docker image changes.

I still haven't added anything to libcxx/utils/ci/buildkite-pipeline.yml. It's not clear to me whether that should be part of this patch, because there are several things that must happen first before CI tests will work:

  • I need to create a new docker image to add the Android stuff to it, and host it somewhere.
  • I need to setup a special buildbot on a VM somewhere that uses the new docker image.

You don't have to use a Docker image for a CI bot. Although it would be really nice, this is not a requirement, and if you want to do this it can still be done later.

This patch has a Dockerfile.android that produces a libcxx-builder-android image -- I just need to publish it somewhere. I think we want to keep the Android stuff out of the main ldionne/libcxx-builder Docker image because it's big? When I measured it back in December, adding Android support would increase the size from 4.7GB to 12.5GB. The Android SDK is large (6.8G), and I also need to install the openjdk-11-jdk Ubuntu package (400M). The buildbot could download the SDK when it starts, but then a local run of run-buildbot android-ndk would be slow.

  • There are a bunch of patches I have uploaded to Phabricator currently that are required to get tests running cleanly.

Any tests that fail can be marked LIBCXX-ANDROID-FIXME for now. Or do you mean something more general?

It's not that many patches, but Android isn't a properly supported target yet in libc++, so maybe that discouraged reviewing? Some of the patches are small but required to make many tests work, e.g. D137130 and D137131 in particular. D139497 is needed to get three tests passing. D137129 is also simple and fixes one test.

  • There are a few other fixes for tests that I don't have uploaded, because I'm not quite sure what we want to do (need to disable certain filesystem tests, and there are a couple of demangler problems involving floating-point).

Anything that doesn't work can be marked LIBCXX-ANDROID-FIXME for now. If it turns out these tests can't work on Android it can still be updated later.

That's probably a good way to disable some of the tests, like test_demangle.pass.cpp and copy_file_large.pass.cpp, which fail on some devices.

Filesystem tests are a problem still:

  • Security policy (SELinux...) on Android blocks the shell (default non-root) user from creating/using FIFOs, domain sockets, or hard links. The tests can't be run as root because some tests verify that APIs fail from lack of permissions. There are already targets that lack FIFOs and domain sockets, so I have a WIP patch extending that to __ANDROID__. That patch also disables a few places that create hard links. I had wondered if there should be a more principled way of indicating (lack of) support for special kinds of files in the test suite. (Also: I suppose the tests could sometimes run as root, and drop privileges as needed? Probably not worth doing.)
  • The L (API 21) non-emulator devices I test with don't allow symlinks for the shell user, so all(?) the filesystem tests fail (static_test_env() assert fails). The tests succeed on an L emulator. The ability to create symlinks was present in K and is restored in M. Maybe use some sort of "has symlinks" feature to disable all the filesystem tests? Or maybe the test suite can be refactored so the non-symlink tests run. From Android's POV, this also doesn't seem worth it because the filesystem code is sufficiently tested (on the emulator, or on newer devices).

I was wondering whether I can land this patch adding Android testing to buildkite-pipeline.yml while the Android libc++ tests are still failing, or if the buildbot isn't set up yet. e.g. I wouldn't want non-Android-related LLVM patches to be affected. (Maybe I should land all the Android-related work first, and only then add the entry to the yml file.)

I think the way to go is marking all the failing tests as XFAIL and go from there. This avoids regressions and makes it easier to verify that a patch actually does what it claims to do.

I'm also wondering how the buildbot is restarted when the ldionne/libcxx-builder Docker image changes.

I still haven't added anything to libcxx/utils/ci/buildkite-pipeline.yml. It's not clear to me whether that should be part of this patch, because there are several things that must happen first before CI tests will work:

  • I need to create a new docker image to add the Android stuff to it, and host it somewhere.
  • I need to setup a special buildbot on a VM somewhere that uses the new docker image.

You don't have to use a Docker image for a CI bot. Although it would be really nice, this is not a requirement, and if you want to do this it can still be done later.

This patch has a Dockerfile.android that produces a libcxx-builder-android image -- I just need to publish it somewhere. I think we want to keep the Android stuff out of the main ldionne/libcxx-builder Docker image because it's big? When I measured it back in December, adding Android support would increase the size from 4.7GB to 12.5GB. The Android SDK is large (6.8G), and I also need to install the openjdk-11-jdk Ubuntu package (400M). The buildbot could download the SDK when it starts, but then a local run of run-buildbot android-ndk would be slow.

That is indeed quite a lot of extra data. Maybe we could just have a second image ldionne/libcxx-builder-android. Then everybody who has to can access and update it without problems.

  • There are a bunch of patches I have uploaded to Phabricator currently that are required to get tests running cleanly.

Any tests that fail can be marked LIBCXX-ANDROID-FIXME for now. Or do you mean something more general?

It's not that many patches, but Android isn't a properly supported target yet in libc++, so maybe that discouraged reviewing?

I can't speak for everybody, but I personally am much more reluctant to accept a patch that isn't for a supported platform. Having a CI that tells you that the patch doesn't break anything major is quite helpful, especially since you don't have to assume that the person tested it properly.

Some of the patches are small but required to make many tests work, e.g. D137130 and D137131 in particular. D139497 is needed to get three tests passing. D137129 is also simple and fixes one test.

  • There are a few other fixes for tests that I don't have uploaded, because I'm not quite sure what we want to do (need to disable certain filesystem tests, and there are a couple of demangler problems involving floating-point).

Anything that doesn't work can be marked LIBCXX-ANDROID-FIXME for now. If it turns out these tests can't work on Android it can still be updated later.

That's probably a good way to disable some of the tests, like test_demangle.pass.cpp and copy_file_large.pass.cpp, which fail on some devices.

Filesystem tests are a problem still:

  • Security policy (SELinux...) on Android blocks the shell (default non-root) user from creating/using FIFOs, domain sockets, or hard links. The tests can't be run as root because some tests verify that APIs fail from lack of permissions. There are already targets that lack FIFOs and domain sockets, so I have a WIP patch extending that to __ANDROID__. That patch also disables a few places that create hard links. I had wondered if there should be a more principled way of indicating (lack of) support for special kinds of files in the test suite. (Also: I suppose the tests could sometimes run as root, and drop privileges as needed? Probably not worth doing.)
  • The L (API 21) non-emulator devices I test with don't allow symlinks for the shell user, so all(?) the filesystem tests fail (static_test_env() assert fails). The tests succeed on an L emulator. The ability to create symlinks was present in K and is restored in M. Maybe use some sort of "has symlinks" feature to disable all the filesystem tests? Or maybe the test suite can be refactored so the non-symlink tests run. From Android's POV, this also doesn't seem worth it because the filesystem code is sufficiently tested (on the emulator, or on newer devices).

If the tests behave differently on different platforms it's also an option to mark then as // UNSUPPORTED: LIBCXX-ANDROID-FIXME. The only difference is that we don't know whether a problem has been fixed somehow.

I was wondering whether I can land this patch adding Android testing to buildkite-pipeline.yml while the Android libc++ tests are still failing, or if the buildbot isn't set up yet. e.g. I wouldn't want non-Android-related LLVM patches to be affected. (Maybe I should land all the Android-related work first, and only then add the entry to the yml file.)

I think the way to go is marking all the failing tests as XFAIL and go from there. This avoids regressions and makes it easier to verify that a patch actually does what it claims to do.

These are the currently-failing tests with just this patch, https://gist.github.com/rprichard/4a8e9b160e7adc472778699c62ae1fff.

I can mark these tests as XFAIL. Would I do that in this patch or could I do that in an earlier commit?

This patch has a Dockerfile.android that produces a libcxx-builder-android image -- I just need to publish it somewhere. I think we want to keep the Android stuff out of the main ldionne/libcxx-builder Docker image because it's big? When I measured it back in December, adding Android support would increase the size from 4.7GB to 12.5GB. The Android SDK is large (6.8G), and I also need to install the openjdk-11-jdk Ubuntu package (400M). The buildbot could download the SDK when it starts, but then a local run of run-buildbot android-ndk would be slow.

That is indeed quite a lot of extra data. Maybe we could just have a second image ldionne/libcxx-builder-android. Then everybody who has to can access and update it without problems.

Yes, a second image makes sense to me. I'm not sure who should own it, but since @ldionne generates the base image, it could also make sense for him to generate the Android-specific one that builds on it.

I was wondering whether I can land this patch adding Android testing to buildkite-pipeline.yml while the Android libc++ tests are still failing, or if the buildbot isn't set up yet. e.g. I wouldn't want non-Android-related LLVM patches to be affected. (Maybe I should land all the Android-related work first, and only then add the entry to the yml file.)

I think the way to go is marking all the failing tests as XFAIL and go from there. This avoids regressions and makes it easier to verify that a patch actually does what it claims to do.

These are the currently-failing tests with just this patch, https://gist.github.com/rprichard/4a8e9b160e7adc472778699c62ae1fff.

I can mark these tests as XFAIL. Would I do that in this patch or could I do that in an earlier commit?

For AIX, it looks like some initial test code was added in D111244. Later, D111359 defined LIBCXX-AIX-FIXME, marked the test files, and added the entry in buildkite-pipeline.yml.

I was wondering whether I can land this patch adding Android testing to buildkite-pipeline.yml while the Android libc++ tests are still failing, or if the buildbot isn't set up yet. e.g. I wouldn't want non-Android-related LLVM patches to be affected. (Maybe I should land all the Android-related work first, and only then add the entry to the yml file.)

I think the way to go is marking all the failing tests as XFAIL and go from there. This avoids regressions and makes it easier to verify that a patch actually does what it claims to do.

These are the currently-failing tests with just this patch, https://gist.github.com/rprichard/4a8e9b160e7adc472778699c62ae1fff.

I can mark these tests as XFAIL. Would I do that in this patch or could I do that in an earlier commit?

Both options seem fine to me. Adding them in an earlier commit would probably make sense to keep this patch a bit more focused.

This patch has a Dockerfile.android that produces a libcxx-builder-android image -- I just need to publish it somewhere. I think we want to keep the Android stuff out of the main ldionne/libcxx-builder Docker image because it's big? When I measured it back in December, adding Android support would increase the size from 4.7GB to 12.5GB. The Android SDK is large (6.8G), and I also need to install the openjdk-11-jdk Ubuntu package (400M). The buildbot could download the SDK when it starts, but then a local run of run-buildbot android-ndk would be slow.

That is indeed quite a lot of extra data. Maybe we could just have a second image ldionne/libcxx-builder-android. Then everybody who has to can access and update it without problems.

Yes, a second image makes sense to me. I'm not sure who should own it, but since @ldionne generates the base image, it could also make sense for him to generate the Android-specific one that builds on it.

There are actually a few core contributor who can update the image. The ones I'm aware of are myself, Mark and Louis. I think it would make sense that we could update the android image and whoever is our Android contact (I guess that's you?).

I was wondering whether I can land this patch adding Android testing to buildkite-pipeline.yml while the Android libc++ tests are still failing, or if the buildbot isn't set up yet. e.g. I wouldn't want non-Android-related LLVM patches to be affected. (Maybe I should land all the Android-related work first, and only then add the entry to the yml file.)

I'm also wondering how the buildbot is restarted when the ldionne/libcxx-builder Docker image changes.

I still haven't added anything to libcxx/utils/ci/buildkite-pipeline.yml. It's not clear to me whether that should be part of this patch, because there are several things that must happen first before CI tests will work:

  • I need to create a new docker image to add the Android stuff to it, and host it somewhere.
  • I need to setup a special buildbot on a VM somewhere that uses the new docker image.

You don't have to use a Docker image for a CI bot. Although it would be really nice, this is not a requirement, and if you want to do this it can still be done later.

This patch has a Dockerfile.android that produces a libcxx-builder-android image -- I just need to publish it somewhere. I think we want to keep the Android stuff out of the main ldionne/libcxx-builder Docker image because it's big? When I measured it back in December, adding Android support would increase the size from 4.7GB to 12.5GB. The Android SDK is large (6.8G), and I also need to install the openjdk-11-jdk Ubuntu package (400M). The buildbot could download the SDK when it starts, but then a local run of run-buildbot android-ndk would be slow.

  • There are a bunch of patches I have uploaded to Phabricator currently that are required to get tests running cleanly.

Any tests that fail can be marked LIBCXX-ANDROID-FIXME for now. Or do you mean something more general?

It's not that many patches, but Android isn't a properly supported target yet in libc++, so maybe that discouraged reviewing? Some of the patches are small but required to make many tests work, e.g. D137130 and D137131 in particular. D139497 is needed to get three tests passing. D137129 is also simple and fixes one test.

I'm not discouraged reviewing this, especially since it adds CI support. But several of us work as volunteer which means we have limited time to work on libc++.

For the way forward, I've a question. What is the intention regarding running the CI?

Are you going to host it yourself like AIX/FreeBSD do or is it the intention the run on our existing Linux runners?

[...]

I'm not discouraged reviewing this, especially since it adds CI support. But several of us work as volunteer which means we have limited time to work on libc++.

For the way forward, I've a question. What is the intention regarding running the CI?

Are you going to host it yourself like AIX/FreeBSD do or is it the intention the run on our existing Linux runners?

My current plan is to use a different Docker image (libcxx-builder-android vs libcxx-builder) and run Android-specific buildkite agents. We could use the same Docker image and Linux runners for both, but the existing Docker image has gigabytes of GCC and Clang, whereas the Android Docker image has gigabytes of emulator OS images. (libcxx-builder is ~6GB while my current libcxx-builder-android is ~12GB with two OS images and only one NDK Clang.) My tentative plan is to host the Android-specific VMs in the same GCE project that hosts the current Linux runners. For the moment, I've also had to switch from using Container-Optimized OS (COS) to Ubuntu GCE images, because COS doesn't currently support KVM.

Rebase the commit.

Redo a lot of the CI and emulator handling. Expand the testing from a
single NDK emulator-based test to two different emulator images:

  • Android L x86 (32-bit)
  • Android T x86-64

Add another lit feature, android-device-api=${API} that can be used
to XFAIL tests that only fail on old devices. This is common because
the oldest supported OS (L) is from 2014, and a lot has changed since
then w.r.t. Bionic and SELinux config, for example.

Remove ci/android-set-user.sh and simplify Docker user management.
Split the existing Dockerfile into Dockerfile.base to use as the base
for a new vendor/android/Dockerfile.

Add the two emulator configs to buildkite-pipeline.yml and run-buildbot.
Add XFAIL and UNSUPPORTED to tests such that the two CI configs should
pass cleanly. (There are a few more tests that will fail on configs
that aren't being tested, e.g. NAN string to long double conversion
is broken only on old LP64 devices, but that config isn't added in
this patch.)

rprichard updated this revision to Diff 530440.EditedJun 12 2023, 2:48 AM

Switch the Android run-buildbot-container from --device /dev/kvm back to --privileged, but add an option to switch it back to --device /dev/kvm.

I've been trying to figure out a way to do the obvious thing of mapping the container's libcxx-builder UID:GID (1000:1000) to my UID:GID on the host so I don't have to grant special permissions to run the tests (to ${MONOREPO_ROOT}/build and ${MONOREPO_ROOT}/lib/abi/*). It *seems* like something that ought to be straightforward. I was able to get it working with Docker's userns-remap mode but then that broke multi-stage Docker builds. For the moment, I need to use --privileged to make nsjail work, so make that the default. Personally I'm just using chown and chmod to let the 1000 UID modify my checkout.

rprichard edited the summary of this revision. (Show Details)Jun 12 2023, 2:50 AM

[...]

I'm not discouraged reviewing this, especially since it adds CI support. But several of us work as volunteer which means we have limited time to work on libc++.

For the way forward, I've a question. What is the intention regarding running the CI?

Are you going to host it yourself like AIX/FreeBSD do or is it the intention the run on our existing Linux runners?

My current plan is to use a different Docker image (libcxx-builder-android vs libcxx-builder) and run Android-specific buildkite agents. We could use the same Docker image and Linux runners for both, but the existing Docker image has gigabytes of GCC and Clang, whereas the Android Docker image has gigabytes of emulator OS images. (libcxx-builder is ~6GB while my current libcxx-builder-android is ~12GB with two OS images and only one NDK Clang.) My tentative plan is to host the Android-specific VMs in the same GCE project that hosts the current Linux runners. For the moment, I've also had to switch from using Container-Optimized OS (COS) to Ubuntu GCE images, because COS doesn't currently support KVM.

Thanks for the information. Then having two different Docker images indeed makes sense.

The patch is quite large. I think it would be better to land it in smaller parts, preferable after we see these changes have a green Android CI.

It would be great if you can have a buildbot running in CGE so we can see the patch passes the CI. Could you reach out to @ldionne to get a buildkite token and talk to him about how we want to maintain the Android Docker images. I think it would make sense that you should be able to maintain them.

As mentioned in the review, please move the Docker changes for the Linux CI in a new patch.

libcxx/utils/ci/Dockerfile
29 ↗(On Diff #530440)

Can you make a separate review of this file and the new Dockerfile.base. I really would like to test and land these separately.

The patch is quite large. I think it would be better to land it in smaller parts, preferable after we see these changes have a green Android CI.

Sure, I can split parts of it off. I think the XFAIL additions and the libcxx/lib/abi* files are a good place to start.

It would be great if you can have a buildbot running in CGE so we can see the patch passes the CI. Could you reach out to @ldionne to get a buildkite token and talk to him about how we want to maintain the Android Docker images. I think it would make sense that you should be able to maintain them.

ldionne gave me a buildkite token a while ago, and I set up a test server last night, so this patch did pass CI: https://buildkite.com/llvm-project/libcxx-ci/builds/26029

I see that libcxx only supports the last two stable releases of Clang (15 and 16), but the latest NDK is r25c, which is between Clang 13 and 14. NDK r26 should be released sometime this year with a compiler newer than Clang 16.

The patch is quite large. I think it would be better to land it in smaller parts, preferable after we see these changes have a green Android CI.

Sure, I can split parts of it off. I think the XFAIL additions and the libcxx/lib/abi* files are a good place to start.

The ABI could remain, but I would like to see the XFAIL annotations and the Dockerfile changes in two separate commits.

It would be great if you can have a buildbot running in CGE so we can see the patch passes the CI. Could you reach out to @ldionne to get a buildkite token and talk to him about how we want to maintain the Android Docker images. I think it would make sense that you should be able to maintain them.

ldionne gave me a buildkite token a while ago, and I set up a test server last night, so this patch did pass CI: https://buildkite.com/llvm-project/libcxx-ci/builds/26029

nice thanks! This makes is a lot easier to verify the patch works :-)

I see that libcxx only supports the last two stable releases of Clang (15 and 16), but the latest NDK is r25c, which is between Clang 13 and 14. NDK r26 should be released sometime this year with a compiler newer than Clang 16.

How often are Android NDKs released? I would like to know how often we Android uses compilers we no longer officially support.

I see that libcxx only supports the last two stable releases of Clang (15 and 16), but the latest NDK is r25c, which is between Clang 13 and 14. NDK r26 should be released sometime this year with a compiler newer than Clang 16.

How often are Android NDKs released? I would like to know how often we Android uses compilers we no longer officially support.

https://github.com/android/ndk/wiki/NDK-Release-Process says what we aim for. Resource constraints for the past few years mean that in practice only the LTS has been shipped. The realistic answer to this question is "annually".

The patch is quite large. I think it would be better to land it in smaller parts, preferable after we see these changes have a green Android CI.

Sure, I can split parts of it off. I think the XFAIL additions and the libcxx/lib/abi* files are a good place to start.

The ABI could remain, but I would like to see the XFAIL annotations and the Dockerfile changes in two separate commits.

Yeah that makes sense.

FWIW, I'm experimenting with a different way of packaging the Android test components. Instead of testing using the Android NDK, maybe we could use an ordinary upstream LLVM with an Android sysroot. The sysroot is much smaller (e.g. a few hundred MB IIRC), so I think we could add it to ldionne/libcxx-builders. Maybe we could then have a completely independent Docker image that contains the emulator and Android OS image. It might even be possible to reuse the existing Linux builders, but with KVM turned on.

https://github.com/android/ndk/wiki/NDK-Release-Process says what we aim for. Resource constraints for the past few years mean that in practice only the LTS has been shipped. The realistic answer to this question is "annually".

Thanks for the info, one year would match 2 LLVM releases.

FWIW, I'm experimenting with a different way of packaging the Android test components. Instead of testing using the Android NDK, maybe we could use an ordinary upstream LLVM with an Android sysroot. The sysroot is much smaller (e.g. a few hundred MB IIRC), so I think we could add it to ldionne/libcxx-builders. Maybe we could then have a completely independent Docker image that contains the emulator and Android OS image. It might even be possible to reuse the existing Linux builders, but with KVM turned on.

A few hundered MB doesn't sound too bad.

rprichard updated this revision to Diff 538331.Jul 8 2023, 1:09 AM
rprichard edited the summary of this revision. (Show Details)

Switch Android testing to using the existing copies of Clang in ldionne/libcxx-builder. Add a recent Android sysroot and the platform-tools (e.g. adb) to libcxx-builder. Start an Android emulator in a sibling container to the libcxx-builder container. Use one Docker image per emulator OS image.

The existing LLVM toolchains have a few issues -- they lack Android builtins and libunwind.a, and their -rtlib/--unwindlib settings have the wrong default for Android. I added a ci/vendor/android/build-toolchain.sh script that builds the builtins and libunwind.a and installs them into the new resource dir. It then generates clang/clang++ wrapper scripts that specify -resource-dir and fix the -rtlib/--unwindlib defaults. I did try to build libunwind.a at the same time as libc++abi and libc++ but had a couple of problems -- see the comment in build-toolchain.sh for details.

With this approach, it should be possible to use the same GCE builder VMs for Android and non-Android Linux testing, but for the moment, I'm using separate builders (BuildKite tag, os=linux vs os=android, which I specify using a BUILDKITE_AGENT_TAGS env var). I uploaded the emulator images to Google Artifact Registry but they're not publicly downloadable yet. Running the build-emulator-images.sh script will build them locally.

Herald added subscribers: llvm-commits, Restricted Project, Enna1, delcypher. · View Herald TranscriptJul 8 2023, 1:09 AM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptJul 8 2023, 1:09 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
rprichard edited the summary of this revision. (Show Details)Jul 8 2023, 1:10 AM
rprichard updated this revision to Diff 538344.Jul 8 2023, 3:24 AM
rprichard edited the summary of this revision. (Show Details)

Run 'docker info' before 'docker image inspect ...' to make sure that Docker is working.

Switch adb_run's job limiting from POSIX semaphore to a Unix domain socket. Instead of needing a SIGINT handler in adb_run.py to release the semaphore, we can instead rely on the OS to close the socket connection.

Remove the cxx_add_android_flags function from libcxx/CMakeLists.txt that was adding -Wno-unused-command-line-argument. Previously, this patch used CMake's NDK support, which was adding an unnecessary -stdlib=libc++. Now that the patch uses just an Android sysroot with an ordinary LLVM build, CMake does not add -stdlib=libc++, so we don't need to pass -Wno-unused-command-line-argument.

rprichard edited the summary of this revision. (Show Details)Jul 14 2023, 12:59 AM
rprichard updated this revision to Diff 540310.Jul 14 2023, 1:08 AM
rprichard edited the summary of this revision. (Show Details)

rebase

rprichard added a comment.EditedJul 14 2023, 1:24 AM

I split off duplicated the Dockerfile changes to D155271.

rprichard updated this revision to Diff 540318.Jul 14 2023, 1:33 AM

In _job_limit_socket_thread, discard the address returned from accept.

rprichard updated this revision to Diff 540323.Jul 14 2023, 1:51 AM

utils/adb_run.py: Print syncing command lines with -v.

utils/libcxx/test/android.py: Add a missing type annotation to make mypy happy.

I copied the two ABI lists into D155341. I think this patch is ready for review now.

rprichard updated this revision to Diff 540580.Jul 14 2023, 3:37 PM

Fix indentation in libcxx/test/support/filesystem_test_helper.h to make git-clang-format happy.

Can you make a stacked review of this patch and have the two split of patches as parent revisions. Then we review what really is intended to land. After the Docker file is approved we need to rebuild the Docker image so it's available for all runners.

rprichard updated this revision to Diff 541233.Jul 17 2023, 2:17 PM

Now that D155271 and D155341 are parent revisions of this revision, remove their edits from this revision. I expect this change to break the Android testing because the Android abilist files will be missing.

ldionne added inline comments.Aug 10 2023, 10:06 AM
libcxx/cmake/caches/AndroidNDK.cmake
26–29

I think what you want here instead is

CMAKE_C_COMPILER_WORKS = True
CMAKE_CXX_COMPILER_WORKS = True

like in https://reviews.llvm.org/D150766. And in the future this shouldn't be required in caches, it should just work out of the box.

40–41

I would suggest treating those as entirely different builds and giving them their own cache files. There are ways to avoid duplication (e.g. include(AndroidBase.cmake)) without adding a ad-hoc knob here.

libcxx/test/configs/llvm-libc++-android.cfg.in
15–23 ↗(On Diff #541233)

Here, you should be able to simply specify --param target_triple=<WHATEVER> when running lit, and get rid of target_api altogether.

libcxx/test/libcxx/modules_include.gen.py
35 ↗(On Diff #541233)

This needs to be updated when you rebase, there's two references in this file now.

libcxx/test/std/input.output/string.streams/stringstream/stringstream.members/gcount.pass.cpp
11–13

Per live review, this comment can be updated.

14

For now, I would prefer that we keep this LIBCXX-ANDROID-FIXME so we can have a separate discussion about how to handle low-memory environments in the test suite (more generally).

libcxx/utils/ci/buildkite-pipeline.yml
1142

I think we should make the run-buildbot script stop the emulator unconditionally. It would be nice to keep the property that all the commands in the buildkite pipeline are just direct calls to run-buildbot (with a few exceptions).

1147

Instead of this, you could run libcxx/utils/ci/run-buildbot android-ndk-<VERSION>, similarly to what we do for the macOS backdeployment?

libcxx/utils/ci/run-buildbot
50

So this goes away.

715–718

Not needed anymore.

721–728

I feel like this is a workaround for not having the equivalent of an apt.llvm.org build of the toolchain. We should make this available on the system we're testing on instead (which probably means building it into the docker image).

libcxx/utils/ci/vendor/android/Dockerfile.emulator
2

Note to self: review stopped here.

rprichard updated this revision to Diff 549826.Aug 14 2023, 1:49 AM
rprichard marked 9 inline comments as done.

Address some of the review comments.

Mark some of the comments as Done.

rprichard updated this revision to Diff 549856.Aug 14 2023, 3:07 AM

Disable flaky notify_one.pass.cpp test for Android, and remove the unnecessary adb socat workaround now that adb -a start-server works.

rprichard edited the summary of this revision. (Show Details)Aug 14 2023, 3:12 AM
ldionne added inline comments.Aug 14 2023, 11:48 AM
libcxx/utils/ci/vendor/android/Dockerfile.emulator.base
2 ↗(On Diff #549856)

Is it necessary to have a .base and a non-base Dockerfile? If so, let's leave a comment to explain why. Otherwise, consider merging both files.

22 ↗(On Diff #549856)

Can we use a URL that points to the latest version so that we get a new version of the command-line tools when we rebuild that docker image?

Or do we want to be intentional about the version we're using here so that we only bump it when we really want to? If so, what is going to be the process for bumping it up? Is that something that you or other Android folks can commit to doing on a regular basis? Otherwise, this will be forgotten forever and will never be updated.

libcxx/utils/ci/vendor/android/build-toolchain.sh
1 ↗(On Diff #549856)

Note to self: This would also go away if we downloaded a pre-built toolchain instead.

libcxx/utils/ci/vendor/android/emulator-wait-for-ready.sh
24
libcxx/utils/ci/vendor/android/start-emulator.sh
28

This assumes that the docker image named $(docker_image_of_emu_img ${EMU_IMG}) is somehow available on whatever's running the CI. We should probably instead publish the emulator's Docker image somewhere (maybe the same place we publish our other docker images) and use that.

libunwind/cmake/caches/Android.cmake
1 ↗(On Diff #549856)

This would go away if we downloaded a pre-built toolchain and used that toolchain's libunwind instead.

rprichard marked 10 inline comments as done.
rprichard edited the summary of this revision. (Show Details)

Address review comments.

rprichard edited the summary of this revision. (Show Details)Aug 15 2023, 10:58 PM
rprichard added inline comments.
libcxx/utils/ci/vendor/android/Dockerfile.emulator.base
22 ↗(On Diff #549856)

Can we use a URL that points to the latest version so that we get a new version of the command-line tools when we rebuild that docker image?

I asked about this in a Google-internal discussion group but haven't heard anything yet. It doesn't look possible at the moment, unless we want to download an XML file and parse it to find the latest stable commandlinetools package.

rprichard updated this revision to Diff 550984.Aug 16 2023, 9:06 PM
rprichard marked an inline comment as done.

Rebase.

rprichard updated this revision to Diff 551399.EditedAug 17 2023, 11:52 PM

The last rebase accounted for about a month's worth of changes in LLVM. Fix the Android tests that were broken by the rebase:

  • fmemopen isn't available until Android M (API 23).
  • The std::system_error string has no OS error component in several std::print tests. I suspect libc++ itself is broken here. See https://reviews.llvm.org/D150044#4597254.
  • The to_std_containers.pass.cpp test is segfaulting on 32-bit x86 Android L. The test needs to be compiled with -mstackrealign to work around an OS bug in pre-N.

For the -mstackrealign issue, see:

smeenai added inline comments.Aug 24 2023, 4:25 PM
libcxx/cmake/caches/AndroidNDK.cmake
14

You aren't caching this, so it won't have any effect. It's also the default though, so you don't need to specify it explicitly.

rprichard updated this revision to Diff 557663.Oct 10 2023, 1:30 AM

Use stdin=subprocess.DEVNULL for adb_run.py subprocesses, except for the main adb shell invocation where we want to forward stdin. This change was necessary to fix stdin forwarding on a Pixel 6 (oriole) device. (However, the API 33 emulator tests passed even without this change. Not sure why.)

Disable three stdin tests on pre-shell_v2 devices for now:

  • libcxx/test/std/input.output/iostream.objects/narrow.stream.objects/cin.sh.cpp
  • libcxx/test/std/input.output/iostream.objects/wide.stream.objects/wcin-imbue.sh.cpp
  • libcxx/test/std/input.output/iostream.objects/wide.stream.objects/wcin.sh.cpp

The tests pass on shell_v2 devices. I _think_ that happens because adb shell <command> uses a pipe for stdin, and this pipe is closed on the device when echo closes the pipe on the host. Closing stdin signals to the test program to stop waiting for more input.

With pre-shell_v2 devices, the test hangs. Pre-shell_v2 devices always allocate a pty, even for adb shell <command>. It appears that the pty is in line editing mode, so it's necessary to send a newline or carriage-return after the 1234 to propagate the line of input into the test program's stdin. Maybe there's some way to fix stdin forwarding in adb_run.py, (e.g. by reading chunks of input in adb_run.py, framing them, then passing each frame through the pty to some shell code on the device that unframes the input and forwards it to the test program via a pipe).

rprichard updated this revision to Diff 557666.Oct 10 2023, 2:08 AM

Cache the LIBCXX_ABI_VERSION variable in AndroidNDK.cmake.

rprichard marked an inline comment as done.Oct 10 2023, 2:12 AM
rprichard added inline comments.
libcxx/cmake/caches/AndroidNDK.cmake
14

Thanks for noticing this! I uploaded a fix.

EricWF requested changes to this revision.Oct 11 2023, 11:02 AM
EricWF added a subscriber: EricWF.

Can we break this up into smaller changes?

Like, lets land the test suite comments first. Because we can land those right away.

Then maybe split out the changes to ci/vendor/android/.
And then finally the rest of the changes in a third review?

This revision now requires changes to proceed.Oct 11 2023, 11:02 AM
This revision was not accepted when it landed; it landed in state Needs Revision.Oct 19 2023, 1:58 PM
This revision was automatically updated to reflect the committed changes.