As these jobs only run in a couple seconds, and block starting of
other jobs, they can run on the "service" queue which doesn't get
blocked by other long running jobs.
Details
- Reviewers
ldionne Mordante mstorsjo - Group Reviewers
Restricted Project - Commits
- rG1b885573327d: [libc++] Revert the change that runs clang-format and generated-output in the…
rG6712534ebc6f: [libc++] [test] Run the clang-format and generated-output checks on the…
Diff Detail
Event Timeline
Sounds very good to me if it works as advertised. I don't know enough about buildkite to know if it works as advertised, though. :)
(FWIW, I've also occasionally wished that one or two of the builders were pulled up into a new section, e.g. run just the C++03 and C++20 tests and then - wait before running the rest — this would lighten the load due to PRs-that-would-ultimately-fail-anyway by a factor of like 3.5, admittedly at the cost of some latency. Just leaving that here in case it resonates with someone.)
LGTM, sounds great if we can the results of these test quickly!
Sometimes I would like this. On the other hand I feel it's good to get the results of all supported C++ versions directly. I wouldn't object to another section to split the Apple builds in 2 parts. Apple still seems to be the limiting factor for our builds.
Surprisingly, the check-generated-output testcase failed this time around, while it didn't yesterday. Maybe it does rely on something that not all workers in the "service" worker pool have installed? @Mordante
do you happen to be able to figure out what went wrong in this CI run?
Then on the other hand, as the whole next block of jobs (C++03..C++20) all run on the "libcxx-builders" queue, I don't think there'll be any practical difference for how quick we get to the final block of jobs with this change...
Surprisingly, the check-generated-output testcase failed this time around, while it didn't yesterday. Maybe it does rely on something that not all workers in the "service" worker pool have installed?
Link to logs: https://buildkite.com/llvm-project/libcxx-ci/builds/2879#f4496127-b31e-4e94-b3b8-cec119946838
The raw log output starts like this:
+ grep -q '^--- a' /var/lib/buildkite-agent/builds/llvm-project/build/check-generated-output/generated_output.patch + grep -rn '[^ -~]' libcxx/include/ libcxx/include/concepts:1:// -*- C++ -*- libcxx/include/concepts:2://===-------------------------- concepts ----------------------------------===// libcxx/include/concepts:4:// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. libcxx/include/concepts:5:// See https://llvm.org/LICENSE.txt for license information. libcxx/include/concepts:6:// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception libcxx/include/concepts:10:#ifndef _LIBCPP_CONCEPTS libcxx/include/concepts:11:#define _LIBCPP_CONCEPTS libcxx/include/concepts:14: concepts synopsis
...and so on.
This is the same output I get locally from running grep -rn '[ ^-~]' ../libcxx/include/concepts (notice the swapping of ^ and space).
Perhaps this is a known bug?
@mstorsjo, I'm going to commandeer to add grep --version to the script and keep poking buildkite until it fails again; with a --version in hand, we can at least file a StackOverflow question and see if anyone's ever seen this issue before.
Thanks! Please do.
Btw, for quicker CI iteration/testing, you can remove (or comment out) all the other CI jobs in the patch - even though it adds a bit more noise to the patch and mailing list history.
This looks weird. I've never seen anything like this before.
Thanks Arthur for looking into it!
StackOverflow says "check your locale" — it's possible that we're grepping in a locale where in the locale's collation order e.g. the letter a sorts first, then space, then ~, so in that collation order it is true that a matches [^ -~]. Using the ASCII locale LANG=C should fix that.
If it does fix it, imma land this.
Try setting LANG=C a different way, at the top of this script.
Roll in some <iterator> cleanup just to get testing on it.
Try some more printf debugging. I don't suppose there's a way to ssh into this machine...? :P
Turns out that the "service" machine had LC_ALL=en_US.UTF-8 and LC_COLLATE=en_US.UTF-8, even though it had LANG=.
Let's try unsetting those two env vars.
The service queue isn't meant for that. It's meant for BuildKite infrastructure jobs only. If we want to reduce the latency for getting the result of those jobs, we can investigate other options like agent priorities or setting up a special queue in amongst the libc++ agents themselves.
Also, please ping me on those infrastructure related reviews, I never got a chance to see this before it got shipped.
@ldionne: Sorry; okay.
I'm going to recommit the LC_COLLATE part of this patch for tech debt's sake, and then put the rest out of my mind. But I do think we (by which I mean you and @mstorjo) should look into setting up a special queue, because this really did seem to speed up the builds.