This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Run the clang-format and generated-output checks on the "service" queue
ClosedPublic

Authored by Quuxplusone on Apr 28 2021, 3:33 AM.

Details

Summary

As these jobs only run in a couple seconds, and block starting of
other jobs, they can run on the "service" queue which doesn't get
blocked by other long running jobs.

Diff Detail

Event Timeline

mstorsjo created this revision.Apr 28 2021, 3:33 AM
mstorsjo requested review of this revision.Apr 28 2021, 3:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2021, 3:33 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Sounds very good to me if it works as advertised. I don't know enough about buildkite to know if it works as advertised, though. :)

(FWIW, I've also occasionally wished that one or two of the builders were pulled up into a new section, e.g. run just the C++03 and C++20 tests and then - wait before running the rest — this would lighten the load due to PRs-that-would-ultimately-fail-anyway by a factor of like 3.5, admittedly at the cost of some latency. Just leaving that here in case it resonates with someone.)

Mordante accepted this revision as: Mordante.Apr 28 2021, 10:39 AM
Mordante added a subscriber: Mordante.

LGTM, sounds great if we can the results of these test quickly!

(FWIW, I've also occasionally wished that one or two of the builders were pulled up into a new section, e.g. run just the C++03 and C++20 tests and then - wait before running the rest — this would lighten the load due to PRs-that-would-ultimately-fail-anyway by a factor of like 3.5, admittedly at the cost of some latency. Just leaving that here in case it resonates with someone.)

Sometimes I would like this. On the other hand I feel it's good to get the results of all supported C++ versions directly. I wouldn't object to another section to split the Apple builds in 2 parts. Apple still seems to be the limiting factor for our builds.

mstorsjo updated this revision to Diff 341440.Apr 29 2021, 2:06 AM

Rerunning to hopefully get a clean CI record without spurious unrelated errors.

Surprisingly, the check-generated-output testcase failed this time around, while it didn't yesterday. Maybe it does rely on something that not all workers in the "service" worker pool have installed? @Mordante
do you happen to be able to figure out what went wrong in this CI run?

Then on the other hand, as the whole next block of jobs (C++03..C++20) all run on the "libcxx-builders" queue, I don't think there'll be any practical difference for how quick we get to the final block of jobs with this change...

Quuxplusone commandeered this revision.Apr 29 2021, 6:19 AM
Quuxplusone added a reviewer: mstorsjo.

Surprisingly, the check-generated-output testcase failed this time around, while it didn't yesterday. Maybe it does rely on something that not all workers in the "service" worker pool have installed?

Link to logs: https://buildkite.com/llvm-project/libcxx-ci/builds/2879#f4496127-b31e-4e94-b3b8-cec119946838
The raw log output starts like this:

+ grep -q '^--- a' /var/lib/buildkite-agent/builds/llvm-project/build/check-generated-output/generated_output.patch
+ grep -rn '[^ -~]' libcxx/include/
libcxx/include/concepts:1:// -*- C++ -*-
libcxx/include/concepts:2://===-------------------------- concepts ----------------------------------===//
libcxx/include/concepts:4:// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
libcxx/include/concepts:5:// See https://llvm.org/LICENSE.txt for license information.
libcxx/include/concepts:6:// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
libcxx/include/concepts:10:#ifndef _LIBCPP_CONCEPTS
libcxx/include/concepts:11:#define _LIBCPP_CONCEPTS
libcxx/include/concepts:14:    concepts synopsis

...and so on.
This is the same output I get locally from running grep -rn '[ ^-~]' ../libcxx/include/concepts (notice the swapping of ^ and space).
Perhaps this is a known bug?

@mstorsjo, I'm going to commandeer to add grep --version to the script and keep poking buildkite until it fails again; with a --version in hand, we can at least file a StackOverflow question and see if anyone's ever seen this issue before.

add grep --version (printf debugging ftw!)

@mstorsjo, I'm going to commandeer to add grep --version to the script and keep poking buildkite until it fails again; with a --version in hand, we can at least file a StackOverflow question and see if anyone's ever seen this issue before.

Thanks! Please do.

Btw, for quicker CI iteration/testing, you can remove (or comment out) all the other CI jobs in the patch - even though it adds a bit more noise to the patch and mailing list history.

grep --version on the normal queue. Shoulda done this the first time.

Surprisingly, the check-generated-output testcase failed this time around, while it didn't yesterday. Maybe it does rely on something that not all workers in the "service" worker pool have installed? @Mordante
do you happen to be able to figure out what went wrong in this CI run?

This looks weird. I've never seen anything like this before.
Thanks Arthur for looking into it!

Quuxplusone set the repository for this revision to rG LLVM Github Monorepo.

https://stackoverflow.com/questions/67320156/misbehavior-of-gnu-grep-when-grepping-for-ignores-spaces/

StackOverflow says "check your locale" — it's possible that we're grepping in a locale where in the locale's collation order e.g. the letter a sorts first, then space, then ~, so in that collation order it is true that a matches [^ -~]. Using the ASCII locale LANG=C should fix that.

If it does fix it, imma land this.

Try setting LANG=C a different way, at the top of this script.
Roll in some <iterator> cleanup just to get testing on it.

Try some more printf debugging. I don't suppose there's a way to ssh into this machine...? :P

continue printf-debugging

Turns out that the "service" machine had LC_ALL=en_US.UTF-8 and LC_COLLATE=en_US.UTF-8, even though it had LANG=.
Let's try unsetting those two env vars.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 30 2021, 5:58 AM
This revision was automatically updated to reflect the committed changes.

The service queue isn't meant for that. It's meant for BuildKite infrastructure jobs only. If we want to reduce the latency for getting the result of those jobs, we can investigate other options like agent priorities or setting up a special queue in amongst the libc++ agents themselves.

Also, please ping me on those infrastructure related reviews, I never got a chance to see this before it got shipped.

@ldionne: Sorry; okay.
I'm going to recommit the LC_COLLATE part of this patch for tech debt's sake, and then put the rest out of my mind. But I do think we (by which I mean you and @mstorjo) should look into setting up a special queue, because this really did seem to speed up the builds.

@ldionne: Sorry; okay.
I'm going to recommit the LC_COLLATE part of this patch for tech debt's sake, and then put the rest out of my mind. But I do think we (by which I mean you and @mstorjo) should look into setting up a special queue, because this really did seem to speed up the builds.

Yeah, no worries.