This is an archive of the discontinued LLVM Phabricator instance.

[Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation
ClosedPublic

Authored by aganea on Nov 4 2019, 1:56 PM.

Details

Summary

This patch is an optimization for speed: whenever possible, it will avoid creating a new process for the -cc1 invocation, and instead will call the cc1 tool inside the calling process (clang tool).
On Windows, this has a moderate impact on build times (please see timings in the comments below)

CFG (control flow guard) is disabled on Windows.

This patch also mildly improves build times on a Haswell 6-core Linux PC.

@Meinersbur reported no change in timings on a many-core Linux machine.


If you'd like to try this on your configuration, use the script below to ensure a standardized unit of measure:
(this is only for Windows)

Warning: the script pulls and reverts all local changes!
Please edit the script before running, to provide the right filename for applying the patch.

The script can be used such as:

# run the test (lengthy, hours!)
> powershell .\abba_test.ps1

By default, it'll do a 2-stage build, including cmake'ing with a fixed set of options; followed by AB/BA testing where it'll alternatively rebuild using Clang 10 with and without the patch, for at least 5 hours. If you want to fiddle the number of hours it runs, you can do:

> powershell .\abba_test.ps1 50

Where '50' represents the number of hours it'll run. You can also skip building the two stages, if you already did it, by setting the second parameter:

> powershell .\abba_test.ps1 50 $True

You then end up with something like that:

Total iterations:  6
     |         Min       |        Mean       |       Median      |         Max       |
   A |  00:11:55.8588259 |  00:12:11.5531275 |  00:12:09.5867878 |  00:12:30.2468040 |
   B |  00:09:35.1556958 |  00:09:52.5302980 |  00:09:55.2329476 |  00:10:04.4994028 |
Diff | -00:02:20.7031301 | -00:02:19.0228296 | -00:02:14.3538402 | -00:02:25.7474012 |

If you find this useful, I can maybe convert it to a python script and send a review separately.

Diff Detail

Event Timeline

aganea created this revision.Nov 4 2019, 1:56 PM
Herald added a project: Restricted Project. · View Herald Transcript
aganea updated this revision to Diff 227779.Nov 4 2019, 2:05 PM

Fix missing llvm::InitializeAllTargets() in driver.cpp

aganea marked 3 inline comments as done.Nov 4 2019, 2:15 PM

A few remarks:

clang/lib/Driver/Job.cpp
318

See my other comment about LLVM_THREAD_LOCAL

clang/tools/driver/driver.cpp
341

LLVM_THREAD_LOCAL is here for the llvm-buildozer application I've presented at the LLVM conference. It's there because main() is entered from several threads, thus all clang needs to be thread-safe (I have several other patches to achieve that).
If you want this patch without LLVM_THREAD_LOCAL I'm fine with that.

llvm/lib/Support/Windows/Signals.inc
827

When this function is entered through the global exception handler, returning EXCEPTION_EXECUTE_HANDLER lets the CRT do its own exit with the exception code.

This patch reduced the build time from 12 to 7 minutes? Sounds too good to be true. Thanks for the work! I would primarily interested in this to avoid determining the -cc1 command line when debugging.

However, the implementation of calling it's own main function gives the impression to be more like a hack. How much restructuring would be needed to properly create a CompilerInstance inside the driver instead?

clang/lib/Driver/Job.cpp
337–352

This looks fragile as it will break when the user chooses to rename the executable (clang-cuda, --driver-mode=...,...). Why not moving the ClangDriverMain logic into the clangDriver module. Or, at least, pass it through a lambda. It would also make it easier to use clangDriver as a library (if that was ever an intended use-case?!?)

clang/tools/driver/driver.cpp
340

Could you add a comment for the purpose of this function? It's not obvious outside the context of this patch.

Meinersbur added inline comments.Nov 4 2019, 4:28 PM
clang/lib/Driver/Job.cpp
337–352

I tried this on Linux and it did not work out-of-the box (ClangDriverMain being nullptr) due to the executable's name being clang-10. After fixing, check-clang failed with 5 errors in the Driver test.

The performance benefit was within noise: 6m20s vs. 6m18s on a 2-Socket 28 cores-per-processor (112 SMT threads) Skylake-Xeon system for ninja all check-clang, assertions enabled.

thakis added a comment.Nov 5 2019, 7:57 AM

Thanks for sending this out!

Instead of the dynamic lookup of that symbol, what do you think about passing in the function via a normal api? That way, the type checker and linker help us keep things working; dynamic lookup is always a bit subtle and hard to grep for. (llvm-cs wouldn't know about the current use, for example.) See https://reviews.llvm.org/D69848 for a rough sketch.

For robust crash handling, I figured on non-win we'd have a signal handler that on signal self-execs with a "actually, do use a cc1 process" and then use the existing crash machinery, but maybe that's not needed.

aganea added a comment.Nov 5 2019, 8:09 AM

Thanks for the feedback @Meinersbur!

This patch is mainly geared towards Windows users. I'm not expecting anything on Linux, you already have all the bells & whistles there :-) Although I definitely see improvements on my Linux box. Would the distro make a difference? Mine is:

$ uname -a
Linux (name edited) 5.0.0-29-generic #31~18.04.1-Ubuntu SMP Thu Sep 12 18:29:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

This patch reduced the build time from 12 to 7 minutes? Sounds too good to be true.

Starting up and cooling down a large process like clang on Windows is very expensive. A fair amount of cpu time is spent in the clang driver, essentially in the kernel, in the range of 30 - 100 ms per process:
(timings below are before this patch)

We don't have fork() on Windows meaning that the kernel needs to re-start a the cc1 process from scratch, allocate pages, remap the EXE, allocate heap, allocate TLS, start the CRT, etc. Also, InitLLVM::InitLLVM is expensive because it calls into dbghelp.dll and loads symbols - just that can sometimes take up to 10 ms per process (depending on the system load).
In addition, recent Windows builds have all sorts of kernel regressions related to process creation and virtual pages allocation. Bruce Dawson has several blog entries about all this.
We're simply mitigating these issues. Upgrading the 36-core server to Windows 10 build 1903 will probably decrease the gap (12 min -> 7 min). However I would still expect users of Clang on pre-1903 builds for a few years from now.

I will fix the other issues you've mentioned.

aganea added a comment.Nov 5 2019, 8:31 AM

Thanks for sending this out!

Instead of the dynamic lookup of that symbol, what do you think about passing in the function via a normal api? That way, the type checker and linker help us keep things working; dynamic lookup is always a bit subtle and hard to grep for. (llvm-cs wouldn't know about the current use, for example.) See https://reviews.llvm.org/D69848 for a rough sketch.

Absolutely, it make more sense, I can't remember why I used the dynamic lookup :-( I have this patch lying around since last year.

For robust crash handling, I figured on non-win we'd have a signal handler that on signal self-execs with a "actually, do use a cc1 process" and then use the existing crash machinery, but maybe that's not needed.

Right now on Windows I'm doing about that, not in the exception handler, but a bit later on in clang/lib/Driver/Driver.cpp, L1338. The worse that can happen would be a memory corruption, which would crash Driver::generateCompilationDiagnostics() and prevent displaying the preprocessed file and the cmd-line. But you would still see the callstack (which is rendered now in the SEH, see llvm/lib/Support/CrashRecoveryContext.cpp, L194 - I still need to hook that up for the posix implementation). Ideally we could render some of these things in advance (or pre-allocate memory at least) to minimize the risk of a crash when displaying crash diagnostics.

aganea updated this revision to Diff 228729.EditedNov 11 2019, 10:42 AM
aganea marked 2 inline comments as done.

Addressed comments & finished the Linux part. All tests pass.

@Meinersbur : Just to show the difference between Windows & Linux, here's some timings for running the tests over the same git checkout, compiled with clang-9.
Everything was already built, I used ninja check-all and only the tests ran.
LLVM_ENABLE_PROJECTS = "clang;lld;clang-tools-extra;compiler-rt"

6-core Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHzUbuntu 18.04770.07s (12 min 50 sec)59182 tests ran
6-core Intel(R) Xeon(R) W-2135 CPU @ 3.70GHzWindows 10 build 19031779.83s (29 min 39 sec)56589 tests ran

This patch does not make much difference for running the tests on Windows (it's saving only 30 sec). The friction in the OS is so high, improving just clang.exe does not change much.

hans added a comment.Nov 12 2019, 7:13 AM

Thanks for sharing this!

I still need to take a proper look, but my first thought is this:

Instead of invoking main() (or a similar function like ClangDriverMain) as an alternative to spinning up the cc1 process, would it be possible to call ExecuteCC1Tool() directly? I guess it would be a little less general, but maybe it would be more straight-forward?

Meinersbur added inline comments.Nov 12 2019, 10:27 AM
clang/lib/Driver/Job.cpp
345

Why is this check necessary? Why not assuming that if Driver::Main is set, it will be the right one?

Does this change crash recovery semantics in any meaningful way? Will we still be able to get stack traces on all platforms when the compiler crashes?

llvm/lib/Support/CrashRecoveryContext.cpp
196

You can fix this by writing:

static bool wrap_function_call(function_ref<void()> Fn, bool EnableExceptionHandler, unsigned &RetCode)
{
   __try {
     Fn();
     return true;
  } __except (EnableExceptionHandler
                  ? LLVMUnhandledExceptionFilter(GetExceptionInformation())
                  : 1) {
    RetCode = GetExceptionCode();
    return false;
  }
}

bool CrashRecoveryContext::RunSafely(function_ref<void()> Fn) {
    ...
    bool Result = wrap_function_call(EnableExceptionHandler, Fn, RetCode);
    ...
}
aganea marked an inline comment as done.EditedNov 12 2019, 11:55 AM

Instead of invoking main() (or a similar function like ClangDriverMain) as an alternative to spinning up the cc1 process, would it be possible to call ExecuteCC1Tool() directly? I guess it would be a little less general, but maybe it would be more straight-forward?

Possibly, yes, I'll have to check. There might be a few edge cases where we re-enter ClangDriverMain() twice if I recall correctly - I'll check.

Does this change crash recovery semantics in any meaningful way? Will we still be able to get stack traces on all platforms when the compiler crashes?

Yes, the changes in CrashRecoveryContext allow the same side-effects as the global exception handler. You would see the callstack and minidump in the same way as before.

There's still a minimal risk of memory/heap/CRT corruption, even with CrashRecoveryContext (after a crash). But that would possibly affect LLVMUnhandledExceptionFilter as well, and right now (without this patch), that handler runs in the context of the crashed process (not the calling one) so it wouldn't be worse than what it is now.

I could write a follow-up patch to prepare on startup the cmd-line invoked by Driver::generateCompilationDiagnostics(), instead of preparing after a crash. We could also pre-allocate a few virtual pages on advance, and use that in a BumpAllocator, instead of allocating after the crash. We could also merge the feature of llvm-symbolizer into clang.exe, so that we could call-back into clang.exe to render the callstack on stdout, even if llvm-symbolizer is not present.

The only drawback I see now is that we don't get the coredump on Linux, because the program doesn't end through the kernel signal handler. However, given that @Meinersbur says he sees no improvement on his side, we could disable this optimization on non-win? (or hide it behind a disabled flag on non-win)

clang/lib/Driver/Job.cpp
345

The driver builds phases that do not always call the cc1 process. Simply stating clang a.cpp would invoke clang -cc1, then the linker. In the later case, even if we have Driver::Main it doesn't mean we should use it. There are a number of other edge cases of the same kind, such as /fallback or build cuda files.

aganea updated this revision to Diff 229333.Nov 14 2019, 9:36 AM
aganea marked an inline comment as done.

Call into ExecuteCC1Tool() directly, as suggested by @hans
Add more comments.
Remove pThis in CrashRecoveryContext.cpp:RunSafely() as suggested by @zturner

hans added a comment.Nov 15 2019, 1:45 AM

Thanks for the update! I think calling ExecuteCC1Tool instead of main is an improvement (there's no need to think about "re-entering" the process, it's really just a function call). I still don't see why it needs to go through a function pointer. I was hoping the code could be something like:

In Command::Execute:
if the tool we're going to execute is cc1 (not the gen-reproducer though),
then instead of forking, call ExecuteCC1Tool with the args directly (inside a context so that we can recover from a crash)

Do you think that's achievable?

clang/include/clang/Driver/Driver.h
209

Why do we need a variable for this? Couldn't the code just invoke the function directly?

aganea added a comment.EditedNov 15 2019, 6:32 AM

@hans : Simply because ExecuteCC1Tool() lives in the clang tool (clang/tools/driver/driver.cpp) and we're calling it from the clangDriver.lib (clang/lib/Driver/Job.cpp). The clangDriver.lib is linked into many other tools (clang-refactor, clang-rename, clang-diff, clang-check, libclang, ...) If I reference the cc1 function directly, we end of with linker errors in all those tools.
We could maybe move the tool code into the clangDriver.lib, but I'm not sure it's practical as the tool pulls on many other libraries. I thought the callback was the path of least resistance. Let me know if we could do it otherwise.

hans added a comment.Nov 18 2019, 5:33 AM

@hans : Simply because ExecuteCC1Tool() lives in the clang tool (clang/tools/driver/driver.cpp) and we're calling it from the clangDriver.lib (clang/lib/Driver/Job.cpp). The clangDriver.lib is linked into many other tools (clang-refactor, clang-rename, clang-diff, clang-check, libclang, ...) If I reference the cc1 function directly, we end of with linker errors in all those tools.
We could maybe move the tool code into the clangDriver.lib, but I'm not sure it's practical as the tool pulls on many other libraries. I thought the callback was the path of least resistance. Let me know if we could do it otherwise.

Oh, I see. I didn't think about that. That's annoying, but maybe using the variable makes sense then.

clang/include/clang/Driver/Driver.h
207

It's not really a callback though. How about just "Pointer to the ExecuteCC1Tool function, if available."
It would also be good if the comment explained why the pointer is needed.
And why does it need to be thread-local? Can different threads end up with different values for this?

clang/include/clang/Driver/Job.h
136 ↗(On Diff #229333)

Is there another way of avoiding in-process-cc1 after a crash? For example, could the Job know that it's being built for generating the crash report? It seems unfortunate to introduce this widely-scoped variable here.

clang/lib/Driver/Job.cpp
337–350

I don't think this form of initialization is common in LLVM code.

345

Now that we're not calling main() anymore, but cc1 directly -- is this checking still necessary?

clang/tools/driver/driver.cpp
308

It's not really re-entering. Maybe "If we call the cc1 tool directly in the driver process" is clearer?

llvm/include/llvm/Support/CrashRecoveryContext.h
105

I don't think this form of initialization is common in LLVM. Same below.

llvm/lib/Support/CrashRecoveryContext.cpp
24

Can we move this declaration, and the one for SignalHandler, into some header file? Declaring it manually like this doesn't seem so nice.

aganea updated this revision to Diff 230143.Nov 19 2019, 2:11 PM
aganea marked 10 inline comments as done.
aganea edited the summary of this revision. (Show Details)

Addressed @hans' comments.

I've also added a script in the summary, if you'd like to reproduce the results on your end, and to ensure we all run the same thing.

aganea added inline comments.Nov 19 2019, 2:14 PM
clang/lib/Driver/Job.cpp
345

Yes it is - see my other comment just above.

The driver builds phases that do not always call the cc1 process. Simply stating clang a.cpp would invoke clang -cc1, then the linker, say link.exe. In this later case (invoking link.exe), even if we have Driver::Main it doesn't mean we should use it. There are a number of other edge cases of the same kind, such as /fallback or building cuda files, where a different executable is invoked along the way."

aganea retitled this revision from [Clang][Driver] Bypass cc1 process and re-enter driver main to [Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation.Nov 19 2019, 2:18 PM
aganea edited the summary of this revision. (Show Details)
ormris added a subscriber: ormris.Nov 19 2019, 2:27 PM
hans added inline comments.Nov 20 2019, 6:21 AM
clang/lib/Driver/Job.cpp
337

I think it would be clearer with a boolean "CallCC1" or something instead of a function pointer.

345

Okay, but couldn't we find a cleaner way of figuring out that the Command is a cc1 invocation? The thing that creates the Command would know.. maybe it could set a flag or something? Or maybe instead of creating a Command it would create something else?

aganea updated this revision to Diff 230527.Nov 21 2019, 1:47 PM
aganea marked 3 inline comments as done.
aganea added inline comments.
clang/lib/Driver/Job.cpp
345

Yeah you're right, I've refactored and added a CC1Command instead, makes things much cleaner, thank you!

I've split the CrashRecoveryContext changes into another patch to minimize the risks, please see D70568. That patch will need to go first before this one.

russell.gallop added a comment.EditedNov 22 2019, 6:03 AM

Thanks for this change. I applied this patch (prior to splitting out https://reviews.llvm.org/D70568) and it built and worked okay (I only see one clang-cl in process explorer).

I don't see anything like same performance improvement however. I did my own assessment of build times. I built 2 x stage 1 toolchains using clang 9.0.0 with and without this patch (applied to 1b9ef3bbb595206b0097b7adec2c1b69eae6fab4). Then I used each of those to build LLVM (no clang or lld) again.

CMake settings:

# Build stage 1 (same with and without patch)
cmake -G Ninja -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe" -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang;lld -DLLVM_TARGETS_TO_BUILD=X86 -DCLANG_ENABLE_STATIC_ANALYZER=OFF -DCLANG_ENABLE_ARCMT=OFF ../llvm

# Build stage 2 without patch
cmake -G Ninja -DCMAKE_C_COMPILER="<llvm-project.git>/build_stage1_no_patch/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="<llvm-project.git>/build_stage1_no_patch/bin/clang-cl.exe" -DCMAKE_LINKER="<llvm-project.git>/build_stage1_no_patch/bin/lld-link.exe" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86 ../llvm

# Build stage 2 with patch
cmake -G Ninja -DCMAKE_C_COMPILER="<llvm-project.git>/build_stage1_patch/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="<llvm-project.git>/build_stage1_patch/bin/clang-cl.exe" -DCMAKE_LINKER="<llvm-project.git>/build_stage1_patch/bin/lld-link.exe" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86 ../llvm

These both built the patched code.

I got a ~1.5% performance improvement, reducing "ninja all" from 571.5s to 562.5s seconds on a 6 core machine (average over three 3 builds on a quiet machine), times below. For reference I am using winver 1803 and have CFG enabled.

Assuming that time_saved ≈ (process_invocation_overhead * compiler_jobs) / parallel_build_jobs
or (rearranging) process_invocation_overhead ≈ (time_saved * parallel_build_jobs) / compiler_jobs
(I know this is not perfectly true as build jobs will tail off towards the end of a build, but I think is okay for a ballpark estimate.)

I saved 9 seconds, over 1720 compiler jobs (from ninja -v -n all | grep -c clang-cl), with 14 parallel build jobs which gives the process invocation overhead of about 73ms which is in the range you mentioned above (30-100ms). Therefore I think that this is in the right ballpark.

I can't see how this would get 20% (from the table) or 30% improvement (from the first graph). In abba_test.ps1 you include running tests (due to check-all) so it is possible that you are saving time on a very large number of clang driver invocations on very small files. This is helpful for LLVM developers but I don't think it's representative of other builds. Alternatively, is the process overhead higher on your machine(s) for some reason (e.g. security software).

To be clear, I'm not against this patch going in, and I can confirm that it is a performance improvement so good to have. I just can't see where the claimed 20-30% saving on build time comes from.

Time data for 3 builds (seconds)
No patch (571.349, 575.353, 567.638)
With Patch (560.870, 563.110, 563.368)

aganea edited the summary of this revision. (Show Details)Nov 25 2019, 8:53 AM

Thanks for the feedback Russell!

Can you possibly try again with /MT? (ie. -DLLVM_USE_CRT_RELEASE=MT)

In the abba_test.ps1 script, ninja check-all is only used for preparing clang.exe with the patch. The A/B loop does not use check-all.

I've modified the abba_test.ps1 script to only build llvm/, with the same options as you do. There was also a bug in the timespan calculation, the test would last a lot more than you specified.

I also found out that on some of our multi-socket servers, Ninja defaulted to only the # threads from the first socket. If you had a 2x 18-core system, Ninja would default to 38 threads (18 x 2 + 2) instead of 74 threads (2 x 18 x 2 + 2). This behavior seems to be random, depending on the system. This has been fixed by Ninja PR 1674, I've tested all our systems with this patch and they now all default to the proper # of threads.

I'm doing a new round of tests, I'll get back with updated figures.

hans added a comment.Nov 26 2019, 2:23 AM

Thanks! This is starting to look really good now.

clang/lib/Driver/Job.cpp
319

Maybe better to just return a bool for success/failure since there is not really a process exit code yet.

(But I'm not even sure this is worth a separate method; see comment further down.)

422

For CC1Command, PrepareExecution doesn't do very much (since it doesn't use response files). Because of that, I'm not sure breaking out PrepareExecution into a separate method is worth it. I'd just copy the part that applies to CC1Command here. Then we wouldn't need to check the return value from PrepareExecution (it can't fail if response files aren't used), and we also don't need to intercept SetResponsefile -- we'd just ignore it even if it's set.

451

llvm_unreachable("msg") is usually preferred instead of assert(0 && "msg")

clang/tools/driver/driver.cpp
313

This feels a little risky to me. Do we still need to rely on argv[1], now that we have a nice CC1Command abstraction?

Thanks for the feedback Russell!

Can you possibly try again with /MT? (ie. -DLLVM_USE_CRT_RELEASE=MT)

I tried adding this to my stage 1 builds and it didn't make any significant difference (times in seconds):

No patch (573.879, 568.519, 567.709)
With Patch (578.253, 558.515, 562.368)

In case it makes a difference, this is from a VS2017 command prompt with Visual Studio 15.9.17.

In the abba_test.ps1 script, ninja check-all is only used for preparing clang.exe with the patch. The A/B loop does not use check-all.

Ah, yes. Sorry, my mistake.

I also found out that on some of our multi-socket servers, Ninja defaulted to only the # threads from the first socket. If you had a 2x 18-core system, Ninja would default to 38 threads (18 x 2 + 2) instead of 74 threads (2 x 18 x 2 + 2). This behavior seems to be random, depending on the system. This has been fixed by Ninja PR 1674, I've tested all our systems with this patch and they now all default to the proper # of threads.

I'm not testing on a multi-socket system so don't believe I need that patch.

I'm doing a new round of tests, I'll get back with updated figures.

Thanks.

I ran the new version of abba_test.ps1 (just building LLVM) with small modifications to comment out the "Validate Ninja" step and change "check-all" to "all" to save time as I'm concentrating on build build speed. This means that I'm applying it to the same git revision as you. For clang_bypass_cc1.patch I used the patch I downloaded before (before spinning off D70568). Running for one hour on a six core machine (winver 1803, 14 ninja jobs, CFG enabled, recently rebooted) I get:

Total iterations:  2

     |         Min       |        Mean       |       Median      |         Max       |
   A |  00:15:13.0773389 |  00:15:13.7219710 |  00:15:13.7219710 |  00:15:14.3666030 |
   B |  00:15:08.6134425 |  00:15:09.0363284 |  00:15:09.0363284 |  00:15:09.4592142 |
Diff | -00:00:04.4638964 | -00:00:04.6856426 | -00:00:04.6856426 | -00:00:04.9073888 |

So in this case it saved 0.5% of time. Using the previous maths, with 2378 clang-cl jobs, this implies process creation time of 29ms. This was fairly soon after a reboot. Maybe I'm just lucky and none of the process creation problems are affecting me on this system (at this moment).

So in this case it saved 0.5% of time. Using the previous maths, with 2378 clang-cl jobs, this implies process creation time of 29ms. This was fairly soon after a reboot. Maybe I'm just lucky and none of the process creation problems are affecting me on this system (at this moment).

It looks like the git apply didn't work, but the script continued so this was a duff experiment, please ignore. I'll try to fix this and run it again.

It looks like the git apply didn't work, but the script continued so this was a duff experiment, please ignore. I'll try to fix this and run it again.

Thanks Russell! Can you try running the script overnight? It'll give a better result and we'll see the error bar.

Maybe I'm just lucky and none of the process creation problems are affecting me on this system (at this moment).

I'm running tests on 6 differents Windows systems (both desktops and servers), with different OS revisions, and they all behave differently. With this patch, some see better improvements than others. I'll post the results once I have a better picture.

It looks like the git apply didn't work, but the script continued so this was a duff experiment, please ignore. I'll try to fix this and run it again.

Note for other reviewers, the abba_test.ps1 script uses a git format patch, not the format you can download from Phabricator.

Thanks Russell! Can you try running the script overnight? It'll give a better result and we'll see the error bar.

Okay. With the patch sorted, I ran for 10 hours overnight.

Total iterations:  21

     |         Min       |        Mean       |       Median      |         Max       |
   A |  00:14:11.0340636 |  00:14:33.2494597 |  00:14:25.3644475 |  00:15:25.4082274 |
   B |  00:13:57.9445995 |  00:14:20.3073502 |  00:14:09.0937819 |  00:16:06.2885457 |
Diff | -00:00:13.0894641 | -00:00:12.9421096 | -00:00:16.2706656 |  00:00:40.8803183 |

From looking at the individual times this was stable for a couple of hours then there were some particularly long runs (on A and B), possibly some scheduled maintenance tasks, so I'd recommend using the median value to ignore these outliers. Based on the median this saves about 1.9% for me.

aganea added a comment.EditedDec 2 2019, 3:19 PM

You're right indeed Russell, my bad, this gain is not as important as I initially claimed -- however there's a gain.

Several factors came into play in my previous testings:

  1. The latest ninja 1.9.0 doesn't use all cpu sockets on some Windows systems (see this), and as it uses +2 cores in addition to the detected # of cores, this possibly induces some unwanted side-effects (such as the speed-up seen in the summary).
  2. I was using Clang 9.0 RC4, which is compiled with -DLLVM_ENABLE_ASSERTIONS=ON. When @hans sent the email for 9.0.1 I realized this flag is turned off only for the final release :-(
  3. I was using all sorts of options (/MT /GS- /arch:AVX) which actually improved build times, when compared with the baseline Clang release (see graphs below)

We indeed have a security Windows service running on our machines which we cannot disable.

I have re-run the tests on 6 different configurations (see graph for specs)
When using this patch along with -GNinja -DLLVM_ENABLE_PROJECTS=clang;lld;llvm -DLLVM_USE_CRT_RELEASE=MT -DCMAKE_CXX_FLAGS="/arch:AVX /GS-" -DCMAKE_C_FLAGS="/arch:AVX /GS-", it reduces build times by about 30 sec on many-cores machines, and about 15-20 sec on 6-cores. The 28-core machine did not see any improvements, I cannot explain why.
The variability is also reduced when this patch is applied. This seems to be consistent across all machines.

WARNING: the X coordinate (time) does not start at zero

In my sense, the patch remains valid overall:

  • It is easier to debug with just one process.
  • Later on, I'd like to put forward a patch which (optionally) replaces the CRT allocator with rpmalloc, and not having this current patch would have less impact.
  • I have an alternate proposal for D52193 and D69582 which uses a thread pool instead of a process pool, and this patch is needed for that purpose (however that requires reviving & solving the thread-safe cl::opt RFC).
aganea edited the summary of this revision. (Show Details)Dec 2 2019, 3:25 PM

Thanks for the updated timings. I have no objection to this going in.

I haven't gone through the code changes in detail so I think you should probably get approval from someone who has (such as @hans).

hans added a comment.Dec 3 2019, 6:11 AM

I'm also supported, even though the gains are smaller than first thought. It depends on D70568 though.

@aganea Please disable the new behavior for the Darwin platform. We rely on the fact that Clang -cc1 processes crash to report crashes using system's crash reporting infrastructure.

aganea added a comment.Dec 4 2019, 6:55 AM

@aganea Please disable the new behavior for the Darwin platform. We rely on the fact that Clang -cc1 processes crash to report crashes using system's crash reporting infrastructure.

Will do of course. Just for my own curiosity, how do you actually disable the LLVM signal handler when you ship clang to customers? Through the env var LLVM_DISABLE_CRASH_REPORT? Or maybe this is disabled by default in your downstream fork? Does SignalHandler() get called?

Should I disable it through a #ifndef __APPLE__? Or something else?

thakis added a comment.Dec 4 2019, 5:39 PM

@aganea Please disable the new behavior for the Darwin platform. We rely on the fact that Clang -cc1 processes crash to report crashes using system's crash reporting infrastructure.

We'll want to use this in chromium on Mac. There should probably be a cmake def to pick the default driver mode for this (and an explicit flag to pick it at runtime). We can then set that cmake def.

I guess Duncan's comments on https://reviews.llvm.org/D70568 were for this patch. I think it would be good to have:

  • A cmake option that controls the default behavior. It should still fork for Darwin by default, the detection should be done through CMake when it's building for a Darwin platform.
  • A driver/cc1 flag that allows the user to override the behavior.

This will allow users to avoid cc1 forking either by building their own Clang, or by passing in a flag with their build options. WDYT?

I guess Duncan's comments on https://reviews.llvm.org/D70568 were for this patch. I think it would be good to have:

That's right, I was confused about which review I was on.

  • A cmake option that controls the default behavior. It should still fork for Darwin by default, the detection should be done through CMake when it's building for a Darwin platform.
  • A driver/cc1 flag that allows the user to override the behavior.
  • I don't think a -cc1 flag is reasonable, because it affects the caller, not -cc1 itself. I would be confused by this I think.
  • Driver flag seems a bit too user-facing, but probably fine.
  • Another (maybe odd) option would be an environment variable, such as CLANG_FORK_CC1=1 or CLANG_FORK_CC1=0, which overrides the default.

This will allow users to avoid cc1 forking either by building their own Clang, or by passing in a flag with their build options. WDYT?

Right. Some way to control at runtime, especially so we can test both ways of launching in check-clang but also probably useful for other scenarios, and a cmake flag to set the default so vendors can choose.

aganea added a comment.EditedDec 5 2019, 11:28 AM

I did this in driver.cpp:

// Whether the cc1 tool should be called inside the current process, or forked
// to a new new process.
bool UseCC1ProcessFork = CLANG_FORK_CC1;

StringRef ForkCC1Str = ::getenv("CLANG_FORK_CC1");
if (!ForkCC1Str.empty()) {
  UseCC1ProcessFork = ForkCC1Str.compare("1") == 0 ||
                      ForkCC1Str.compare_lower("on") == 0 ||
                      ForkCC1Str.compare_lower("true") == 0;
}
if (!UseCC1ProcessFork) {
  // Here we provide a shortcut for calling the -cc1 cmd-line within the
  // same process, instead of starting a new process. This saves a some
  // execution time of Windows, as process creation can be expensive on
  // that platform.
  TheDriver.CC1Main = &ExecuteCC1Tool;
}

With a new option in cmake to control the default behavior:

set(CLANG_FORK_CC1 OFF BOOL
    "Whether clang should use a new process for the CC1 invocation")

if (APPLE)
  set(CLANG_FORK_CC1 1)
endif()

And the corresponding flag in config.h:

/* If we want to fork a new process clang.exe for the CC1 invocation */
#cmakedefine01 CLANG_FORK_CC1

This option needs to be controlled at large (otherwise some TUs might compile with forking "on" and others with forking "off", depending on what happens in the build scripts), so I thought the env variable would make more sense. Do you think a new cmd-line option is worth it, or is the env var enough?

The environment variable should be enough IMHO

rnk added a comment.Dec 5 2019, 1:14 PM

I'd be fine with the cmake var and the env var and no driver flag.

I think clang developers on Darwin probably want the single-process clang -c behavior (that's how I interpreted what Duncan said), even though Apple as a distributor of clang wants the multi-process behavior. I would skip the platform check and in upstream clang and suggest that Apple customize the CMake variable when producing the XCode tools package. But, that's just my opinion, please disagree if you don't like that idea.

Lastly, we could make the environment variable more cross-platform by calling it CLANG_SPAWN_CC1 to be inclusive of OSs where fork doesn't exist. Besides, doesn't clang use posix_spawn on Mac anyway?

aganea updated this revision to Diff 232861.Dec 9 2019, 7:49 AM
aganea marked 4 inline comments as done.

Addressed all comments.
Added CLANG_SPAWN_CC1 cmake flag & environment variable to control whether we want a cc1 process or not. It is disabled by default. I left out the if (APPLE) part in cmake, as suggested by @rnk.

aganea added inline comments.Dec 9 2019, 7:49 AM
clang/tools/driver/driver.cpp
313

Risky in what way? What would you suggest? The contents of argv are exactly the same in both cases: when called from the CC1Command, or explicitly from the command-line.

hans marked an inline comment as done.Dec 11 2019, 6:32 AM

I'm basically happy with this, just some minor comments.

clang/lib/Driver/Job.cpp
378

I think this gets printed by "clang -###" so maybe just print something short here. Perhaps just an "(in-process)" prefix? Then we don't need to change so many test expectations also.

387

Do we need to disable afterwards?

clang/tools/driver/driver.cpp
265

Maybe just do the "!!" thing like for the environment variables above? It's not pretty, but at least that would be consistent, and it avoids the problem of deciding what values mean "on" and what mean "off".

313

You're right, I suppose the code was already doing this.

niosHD added a subscriber: niosHD.Dec 11 2019, 6:37 AM
Trass3r added a subscriber: Trass3r.Jan 8 2020, 5:52 AM
jdoerfert resigned from this revision.Jan 8 2020, 8:23 AM
aganea updated this revision to Diff 237043.Jan 9 2020, 5:54 AM
aganea marked 4 inline comments as done.
aganea added inline comments.
clang/lib/Driver/Job.cpp
387

I moved this line to clang/tools/driver/driver.cpp like discussed previously.

clang/tools/driver/driver.cpp
265

Not if we want to negate the CLANG_SPAWN_CC1 compile-time flag initialized just above. !!::getenv will only say the option is there, it won't say what we should do. My rationale was, if the env variable is there, its value overrides the compile-time flag. Unless we want something along the lines of CLANG_ENABLE_SPAWN_CC1 / CLANG_DISABLE_SPAWN_CC1?

ychen added a subscriber: ychen.Jan 9 2020, 9:37 AM
hans added inline comments.Jan 10 2020, 8:14 AM
clang/tools/driver/driver.cpp
265

Ah, I didn't think about that.

This feels a bit fragile though, for example if the user sets it to "yes" instead of "on", it won't be obvious that it has the opposite effect. What do you think about only supporting 1 and 0, and printing a helpful error for other values?

Also, the comparisons could be done with equals(_lower) instead of the compare methods.

aganea updated this revision to Diff 237465.Jan 10 2020, 5:49 PM
aganea marked 2 inline comments as done.
aganea added inline comments.
clang/tools/driver/driver.cpp
265

As requested.

hans accepted this revision.Jan 13 2020, 1:18 AM

Looks good to me (with a minor comment). Thanks!

clang/tools/driver/driver.cpp
268

I think just checking SpawnCC1Str == "0" or == "1" would be easier.

This revision is now accepted and ready to land.Jan 13 2020, 1:18 AM
This revision was automatically updated to reflect the committed changes.
aganea marked an inline comment as done.

Attention Apple maintainers: after this patch, do not forget to add -DCLANG_SPAWN_CC1=ON to your release scripts, to retain your previous crash reporting behavior (see discussion above). @dexonsmith @arphaman

Thanks for seeing this through!

env var or flag

Our build setup can easily add commandline flags, but it can't easily add env vars. So I think this should have a driver mode flag.

For people who asked for an env var: Why do you need an env var for this? Almost all compiler settings are accessible by flag but not by env var. Why is this one special?

This broke NetBSD buildbot:

+ : 'RUN: at line 1'
+ env -u CLANG_SPAWN_CC1 /home/motus/netbsd8/netbsd8/build-stage2/bin/clang -c /data/motus/netbsd8/netbsd8/llvm-project/clang/test/Driver/cc1-spawnprocess.c -o /dev/null
env: unknown option -- u
Usage: env [-i] [name=value ...] [command]

Please fix.

clang/test/Driver/cc1-spawnprocess.c
1 ↗(On Diff #237677)

env -u is not portable.

joerg added a subscriber: joerg.Jan 14 2020, 7:27 AM
joerg added inline comments.
clang/test/Driver/cc1-spawnprocess.c
1 ↗(On Diff #237677)

I think just going for env CLANG_SPAWN_CC1= works for the purpose of this test.

Trass3r removed a subscriber: Trass3r.Jan 14 2020, 8:55 AM

Hi!

This patch breaks scan-build-py which parses the output of "-###" to get -cc1 command. There might be other tools with the same problems. Could we either remove (in-process) from CC1Command::Print or add a line break?

Having the last line as a valid invocation is valuable and there might be tools relying on that.

NoQ added a comment.Jan 18 2020, 6:45 AM

This patch breaks scan-build-py which parses the output of "-###" to get -cc1 command.

Uh-oh, thanks for catching this. The same applies to the regular scan-build. Unfortunately, neither is properly tested. We really need to get this fixed.

@thakis landed a patch to control this behavior on the cmd-line, adding -fno-integrated-cc1 would solve your issue (ie. revert to old behavior). @NoQ @xazax.hun is that an acceptable solution?

There might be other tools parsing the output of -###. I think adding a line break in the output is better/easier than updating all those tools (amd making them potentially slower, not profiting from this patch).

wenlei added a subscriber: wenlei.Jan 18 2020, 11:48 PM

@aganea, @hans This patch broke response file expansion for preprocessor via -Wp - all our internal builds failed with this patch because we invoke clang with response file passed to preprocessor, e.g. clang++ ... -Wp,@pp.rsp @cc.rsp. Looks like we can't just call ExecuteCC1Tool directly, because response file expansion happens in main, so with this patch, -Wp,@pp.rsp won't be expanded properly.

repo:

CLANG_SPAWN_CC1=0  clang++ -c  -Wp,@a.rsp  @a.rsp test.cpp

error: error reading '@rr.rsp'
1 error generated.

and it works with CLANG_SPAWN_CC1=1

aganea added a comment.EditedJan 19 2020, 7:35 AM

@wenlei Taking a look now.
@hans I remember now why I was re-entering main() in the first iteration of this patch. Response files were one reason. And handling /MP was another, which triggers the re-entrance twice (once for CL driver, and once for cc1).

bjope added a subscriber: bjope.Jan 21 2020, 7:11 AM

When I build the clang on trunk, using clang 8.0 with asan enabled, and then run the clang lit tests I see lots of failures in Clang :: InterfaceStubs/driver-test.c and Clang :: InterfaceStubs/driver-test2.c (and maybe the faults I get in Clang :: Driver/cl-showfilenames.c comes from this patch as well).

The failures go away if I revert this patch (commit b4a99a061f517e60985667e39519f601). To make the revert I also needed to revert commit 8e5018e990b701391e6c33ba85b012343df67272 which is based on this patch.

Examples of failures:

==130643==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 192 byte(s) in 3 object(s) allocated from:
    #0 0xd0ea38 in operator new(unsigned long) /workspace/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106
    #1 0x8153b36 in make_unique<clang::GenerateInterfaceIfsExpV1Action> /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:3132:28
    #2 0x8153b36 in CreateFrontendBaseAction /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:68
    #3 0x8153b36 in clang::CreateFrontendAction(clang::CompilerInstance&) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:137
    #4 0x8156a22 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:287:39
    #5 0xd23349 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/tools/driver/cc1_main.cpp:239:15
    #6 0x7a0082e in operator() /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Job.cpp:402:34
    #7 0x7a0082e in void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_1>(long) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/ADT/STLExtras.h:108
    #8 0x6634feb in operator() /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/ADT/STLExtras.h:125:12
    #9 0x6634feb in llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../lib/Support/CrashRecoveryContext.cpp:396
    #10 0x79fefaf in clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Job.cpp:402:12
    #11 0x795265d in clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Compilation.cpp:182:15
    #12 0x7953922 in clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Compilation.cpp:233:19
    #13 0x79a4d0e in clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Driver.cpp:1473:5
    #14 0xd19eac in main /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/tools/driver/driver.cpp:493:21
    #15 0x7f882be4e544 in __libc_start_main (/lib64/libc.so.6+0x22544)

Direct leak of 184 byte(s) in 1 object(s) allocated from:
    #0 0xd0ea38 in operator new(unsigned long) /workspace/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106
    #1 0x81537b4 in make_unique<clang::EmitAssemblyAction> /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:3132:28
    #2 0x81537b4 in CreateFrontendBaseAction /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:52
    #3 0x81537b4 in clang::CreateFrontendAction(clang::CompilerInstance&) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:137
    #4 0x8156a22 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:287:39
    #5 0xd23349 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/tools/driver/cc1_main.cpp:239:15
    #6 0x7a0082e in operator() /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Job.cpp:402:34
    #7 0x7a0082e in void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_1>(long) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/ADT/STLExtras.h:108
    #8 0x6634feb in operator() /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/ADT/STLExtras.h:125:12
    #9 0x6634feb in llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../lib/Support/CrashRecoveryContext.cpp:396
    #10 0x79fefaf in clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Job.cpp:402:12
    #11 0x795265d in clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Compilation.cpp:182:15
    #12 0x7953922 in clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Compilation.cpp:233:19
    #13 0x79a4d0e in clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Driver.cpp:1473:5
    #14 0xd19eac in main /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/tools/driver/driver.cpp:493:21
    #15 0x7f882be4e544 in __libc_start_main (/lib64/libc.so.6+0x22544)

Indirect leak of 589860 byte(s) in 3 object(s) allocated from:
    #0 0xcd7ca7 in calloc /workspace/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:155
    #1 0x6694273 in safe_calloc /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/Support/MemAlloc.h:40:18
    #2 0x6694273 in llvm::StringMapImpl::init(unsigned int) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../lib/Support/StringMap.cpp:61
    #3 0xe4f90ee in StringMap /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/ADT/StringMap.h:251:7
    #4 0xe4f90ee in clang::IdentifierTable::IdentifierTable(clang::IdentifierInfoLookup*) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Basic/IdentifierTable.cpp:60
    #5 0xe49ab27 in clang::Preprocessor::Preprocessor(std::__1::shared_ptr<clang::PreprocessorOptions>, clang::DiagnosticsEngine&, clang::LangOptions&, clang::SourceManager&, clang::HeaderSearch&, clang::ModuleLoader&, clang::IdentifierInfoLookup*, bool, clang::TranslationUnitKind) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Lex/Preprocessor.cpp:92:7
    #6 0x7db6092 in __compressed_pair_elem<std::__1::shared_ptr<clang::PreprocessorOptions> &&, clang::DiagnosticsEngine &, clang::LangOptions &, clang::SourceManager &, clang::HeaderSearch &, clang::CompilerInstance &, nullptr_t &&, bool &&, clang::TranslationUnitKind &, 0, 1, 2, 3, 4, 5, 6, 7, 8> /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:2156:9
    #7 0x7db6092 in __compressed_pair<std::__1::allocator<clang::Preprocessor> &, std::__1::shared_ptr<clang::PreprocessorOptions> &&, clang::DiagnosticsEngine &, clang::LangOptions &, clang::SourceManager &, clang::HeaderSearch &, clang::CompilerInstance &, nullptr_t &&, bool &&, clang::TranslationUnitKind &> /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:2259
    #8 0x7db6092 in __shared_ptr_emplace<std::__1::shared_ptr<clang::PreprocessorOptions>, clang::DiagnosticsEngine &, clang::LangOptions &, clang::SourceManager &, clang::HeaderSearch &, clang::CompilerInstance &, nullptr_t, bool, clang::TranslationUnitKind &> /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:3672
    #9 0x7db6092 in make_shared<std::__1::shared_ptr<clang::PreprocessorOptions>, clang::DiagnosticsEngine &, clang::LangOptions &, clang::SourceManager &, clang::HeaderSearch &, clang::CompilerInstance &, nullptr_t, bool, clang::TranslationUnitKind &> /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:4331
    #10 0x7db6092 in make_shared<clang::Preprocessor, std::__1::shared_ptr<clang::PreprocessorOptions>, clang::DiagnosticsEngine &, clang::LangOptions &, clang::SourceManager &, clang::HeaderSearch &, clang::CompilerInstance &, nullptr_t, bool, clang::TranslationUnitKind &> /proj/flexasic/app/llvm/8.0/bin/../include/c++/v1/memory:4710
    #11 0x7db6092 in clang::CompilerInstance::createPreprocessor(clang::TranslationUnitKind) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Frontend/CompilerInstance.cpp:389
    #12 0x7ef58bf in clang::FrontendAction::BeginSourceFile(clang::CompilerInstance&, clang::FrontendInputFile const&) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Frontend/FrontendAction.cpp:744:8
    #13 0x7dc809f in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Frontend/CompilerInstance.cpp:964:13
    #14 0x8156a4d in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:290:25
    #15 0xd23349 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/tools/driver/cc1_main.cpp:239:15
    #16 0x7a0082e in operator() /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Job.cpp:402:34
    #17 0x7a0082e in void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_1>(long) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/ADT/STLExtras.h:108
    #18 0x6634feb in operator() /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../include/llvm/ADT/STLExtras.h:125:12
    #19 0x6634feb in llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../lib/Support/CrashRecoveryContext.cpp:396
    #20 0x79fefaf in clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Job.cpp:402:12
    #21 0x795265d in clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Compilation.cpp:182:15
    #22 0x7953922 in clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Compilation.cpp:233:19
    #23 0x79a4d0e in clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/lib/Driver/Driver.cpp:1473:5
    #24 0xd19eac in main /workspace//llvm-master-sanitize-asan/llvm/build-all-asan/../../clang/tools/driver/driver.cpp:493:21
    #25 0x7f882be4e544 in __libc_start_main (/lib64/libc.so.6+0x22544)
smeenai added a subscriber: plotfi.Jan 21 2020, 9:37 AM

CC @plotfi for the InterfaceStubs failures

CC @plotfi for the InterfaceStubs failures

Thanks

CC @plotfi for the InterfaceStubs failures

@smeenai thanks for the CC. I have been looking into this asan failure and it isn't actually caused by the interface stubs AST consumer at all. What is triggering it is the pipeline setup. Here is what I gather:

  • Two standard cc1 CompileJobs followed by two Interface Stubs cc1 CompileJobs (or any other Job type for that matter): asan failure
  • if I clear the jobs list in Driver.cpp prior to appending the Interface Stubs jobs: No asan failure.
  • One standard cc1 CompileJob followed by one Interface Stubs/Alternate cc1 Compile job: No asan failure.

TL;DR
The InterfaceStubs ASTConsumer CompileJob is not causing the ASAN failure here, it's the setup of having a standard compilation pipeline followed by a secondary side-care pipeline thats triggering it in -fintegrated-cc1 mode. I am still looking into this and any help would be much appreciated. In the meantime I could -fno-integrated-cc1 the driver.c InterfaceStubs tests to quite the bots if that is what people want (and file a bug on bugzilla).

hctim added a subscriber: hctim.Jan 21 2020, 3:18 PM

In the meantime I could -fno-integrated-cc1 the driver.c InterfaceStubs tests to quite the bots if that is what people want (and file a bug on bugzilla).

Yes, please. A temporary workaround here that can be resolved later would be much appreciated to unblock our bots. Normally, I'd prefer a rollback-and-resubmit-with-fixes, but given that there have been multiple dependent changes submitted that would all have to be rolled back - whatever's the quickest way to get us back online without resorting to rolling back everything would be great :)

hctim added a comment.Jan 23 2020, 2:27 PM

For tracking purposes.

The ASan leaks have been worked around in two separate patches in order to make the bots green again:

@plotfi is graciously debugging :)

hans added a comment.Jan 23 2020, 2:41 PM

For tracking purposes.

The ASan leaks have been worked around in two separate patches in order to make the bots green again:

Thanks! I've cherry-picked these to the 10.x branch in 2dd6b91f35edb967f329f0437b53ea14395aa770 and bfaba51f07d1b79ff6e71da106c2b7e153874b1d

thakis added a comment.Feb 5 2020, 7:50 AM

For tracking purposes.

The ASan leaks have been worked around in two separate patches in order to make the bots green again:

@plotfi is graciously debugging :)

How is the debugging going? If it takes a while to fix, maybe it's better to make -emit-interface-stub imply -fno-integrated-cc1 until it's fixed for real, instead of adding that in the tests? As-is, the tests are happy but the feature is likely broken (?)

I've filed this bug against this patch: https://bugs.llvm.org/show_bug.cgi?id=44823

We've discovered in cases where multiple TUs are invoked in a single command line that the memory between calls isn't cleaned up like it was previously. This results in really high memory use from a compilation.

Experimentation has shown that removing the addition of the -disable-free flag from the CC1Command command lines fixes the issue. However, I believe we should only remove that in cases where it is NOT the last CC1 command (since the disable-free has some pretty solid performance improvements).

plotfi added a comment.Feb 7 2020, 6:01 PM

@thakis I got sidetracked by some other things and had not made more progress, if progress is not made yes I think -emit-interface-stubs should imply -fno-integrated-cc1 if that seems like the right thing to do to you. @erichkeane Thanks for filing the bugzilla bug. I was having some problems with my bugzilla login myself and wasn't able to do so. I'll track progress if this bug and decide shortly if my feature should imply no-integraed-cc1. Thanks Guys.

For tracking purposes.

The ASan leaks have been worked around in two separate patches in order to make the bots green again:

@plotfi is graciously debugging :)

How is the debugging going? If it takes a while to fix, maybe it's better to make -emit-interface-stub imply -fno-integrated-cc1 until it's fixed for real, instead of adding that in the tests? As-is, the tests are happy but the feature is likely broken (?)

hans added a comment.Feb 10 2020, 7:02 AM

@erichkeane Thanks for filing the bugzilla bug. I was having some problems with my bugzilla login myself and wasn't able to do so. I'll track progress if this bug and decide shortly if my feature should imply no-integraed-cc1. Thanks Guys.

I think that bug was for the problem of high memory usage when doing multiple compilations in one clang invocation.

I've filed https://bugs.llvm.org/show_bug.cgi?id=44865 for the interface stubs vs asan issue to keep track of it easier. Please cc yourselves and track progress there.

Hello @plotfi and @hctim -- rG20f1abe306d0 should solve the initial issues you were seeing, please let me know if it doesn't.

On AIX and PPC LE Linux after this change we are seeing invalid accesses when the backend asserts/fatal_errors. Looks like the driver and CC1 now share some global TimerGroup state that points to Timers which got created by CC1 but didn't get cleanup after the assert. Reported as: https://bugs.llvm.org/show_bug.cgi?id=45164

Example failure:

./bin/clang ./llvm-project/clang/test/CodeGenObjC/illegal-UTF8.m -S -target powerpc-ibm-aix
...
fatal error: error in backend: Unhandled linkage when mapping linkage to StorageClass.
Stack dump:
...
*******************
AddressSanitizer:DEADLYSIGNAL
=================================================================
==37262==ERROR: AddressSanitizer: SEGV on unknown address 0x000010715768 (pc 0x3fffb7cdb2c4 bp 0x3e1fb764cdb0 sp 0x3fffffffc460 T0)
==37262==The signal is caused by a UNKNOWN memory access.
    #0 0x3fffb7cdb2c0 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) (/lib64/libstdc++.so.6+0x11b2c0)
    #1 0x16ca75c0 in PrintRecord /home/daltenty/llvm/dev/llvm-project/llvm/include/llvm/Support/Timer.h:180:21
    #2 0x16ca75c0 in construct<llvm::TimerGroup::PrintRecord, llvm::TimeRecord &, std::basic_string<char> &, std::basic_string<char> &> /opt/rh/devtoolset-7/root/usr/lib/gcc/ppc64le-redhat-linux/7/../../../../include/c++/7/ext/new_allocator.h:136:23
    #3 0x16ca75c0 in construct<llvm::TimerGroup::PrintRecord, llvm::TimeRecord &, std::basic_string<char> &, std::basic_string<char> &> /opt/rh/devtoolset-7/root/usr/lib/gcc/ppc64le-redhat-linux/7/../../../../include/c++/7/bits/alloc_traits.h:475:8
    #4 0x16ca75c0 in void std::vector<llvm::TimerGroup::PrintRecord, std::allocator<llvm::TimerGroup::PrintRecord> >::_M_realloc_insert<llvm::TimeRecord&, std::string&, std::string&>(__gnu_cxx::__normal_iterator<llvm::TimerGroup::PrintRecord*, std::vector<llvm::TimerGroup::PrintRecord, std::allocator<llvm::TimerGroup::PrintRecord> > >, llvm::TimeRecord&, std::string&, std::string&) /opt/rh/devtoolset-7/root/usr/lib/gcc/ppc64le-redhat-linux/7/../../../../include/c++/7/bits/vector.tcc:415:4
    #5 0x16ca2220 in emplace_back<llvm::TimeRecord &, std::basic_string<char> &, std::basic_string<char> &> /opt/rh/devtoolset-7/root/usr/lib/gcc/ppc64le-redhat-linux/7/../../../../include/c++/7/bits/vector.tcc:105:4
    #6 0x16ca2220 in llvm::TimerGroup::prepareToPrintList(bool) /home/daltenty/llvm/dev/llvm-project/llvm/lib/Support/Timer.cpp:359:19
    #7 0x16ca2498 in llvm::TimerGroup::print(llvm::raw_ostream&, bool) /home/daltenty/llvm/dev/llvm-project/llvm/lib/Support/Timer.cpp:373:5
    #8 0x16ca2a4c in llvm::TimerGroup::printAll(llvm::raw_ostream&) /home/daltenty/llvm/dev/llvm-project/llvm/lib/Support/Timer.cpp:391:9
    #9 0x107f1940 in main /home/daltenty/llvm/dev/llvm-project/clang/tools/driver/driver.cpp:535:3
    #10 0x3fffb79b497c in generic_start_main.isra.0 (/lib64/libc.so.6+0x2497c)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib64/libstdc++.so.6+0x11b2c0) in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
==37262==ABORTING

On AIX and PPC LE Linux after this change we are seeing invalid accesses when the backend asserts/fatal_errors. Looks like the driver and CC1 now share some global TimerGroup state that points to Timers which got created by CC1 but didn't get cleanup after the assert. Reported as: https://bugs.llvm.org/show_bug.cgi?id=45164

From the bug report, it looks like we are discovering more problems associated with this patch that affect the 10.0 release. I've marked the bug as blocking release-10.0.0 for triage purposes for now.

Looking at the issue now, it should be a straightforward fix. Will send a patch ASAP.

dim added a subscriber: dim.May 29 2020, 12:49 PM

FWIW, this change causes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630, see in particular https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630#c18.

For some reason, not spawning a new process for the cc1 stage can lead to non-reproducible output. The exact cause is not known yet.

hans added a comment.Jun 3 2020, 4:33 AM
In D69825#2063979, @dim wrote:

FWIW, this change causes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630, see in particular https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630#c18.

For some reason, not spawning a new process for the cc1 stage can lead to non-reproducible output. The exact cause is not known yet.

Can you share the preprocessed source and compiler command-line for printf.c, either on the bug or here?

dim added a comment.Jun 3 2020, 9:30 AM
In D69825#2063979, @dim wrote:

FWIW, this change causes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630, see in particular https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630#c18.

For some reason, not spawning a new process for the cc1 stage can lead to non-reproducible output. The exact cause is not known yet.

Can you share the preprocessed source and compiler command-line for printf.c, either on the bug or here?

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630#c26, and I've also uploaded the repro files to https://www.andric.com/freebsd/clang/bug246630-repro.tar.xz.

Running rGa3220dffcb1 under valgrind (on Linux) shows:

==120363== Conditional jump or move depends on uninitialised value(s)
==120363==    at 0x1634474: llvm::ConstantExpr::getGetElementPtr(llvm::Type*, llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool, llvm::Optional<unsigned int>, llvm::Type*) (Constants.cpp:2191)
==120363==    by 0x112D6D9: getGetElementPtr (Constants.h:1163)
==120363==    by 0x112D6D9: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1005)
==120363==    by 0x112DF70: (anonymous namespace)::ConstantFoldInstOperandsImpl(llvm::Value const*, unsigned int, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1039)
==120363==    by 0x112C165: (anonymous namespace)::ConstantFoldConstantImpl(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*, llvm::SmallDenseMap<llvm::Constant*, llvm::Constant*, 4u, llvm::DenseMapInfo<llvm::Constant*>, llvm::detail::DenseMapPair<llvm::Constant*, llvm::Constant*> >&) [clone .part.0] (ConstantFolding.cpp:1114)
==120363==    by 0x112C5CF: llvm::ConstantFoldConstant(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1194)
==120363==    by 0x188F410: prepareICWorklistFromFunction (InstructionCombining.cpp:3584)
==120363==    by 0x188F410: combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) (InstructionCombining.cpp:3703)
==120363==    by 0x189205F: runOnFunction (InstructionCombining.cpp:3789)
==120363==    by 0x189205F: llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) (InstructionCombining.cpp:3768)
==120363==    by 0x16F4352: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1482)
==120363==    by 0x16F4DE8: llvm::FPPassManager::runOnModule(llvm::Module&) (LegacyPassManager.cpp:1518)
==120363==    by 0x16F51A2: runOnModule (LegacyPassManager.cpp:1583)
==120363==    by 0x16F51A2: llvm::legacy::PassManagerImpl::run(llvm::Module&) (LegacyPassManager.cpp:1695)
==120363==    by 0x1FF4CFE: EmitAssembly (BackendUtil.cpp:954)
==120363==    by 0x1FF4CFE: clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (BackendUtil.cpp:1677)
==120363==    by 0x2C471A8: clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (CodeGenAction.cpp:335)
==120363==  Uninitialised value was created by a stack allocation
==120363==    at 0x112C653: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:832)

Trying to reduce this now.

dim added a comment.Jun 4 2020, 10:56 AM
In D69825#2071687, @dim wrote:

Trying to reduce this now.

Unfortunately this turned out to be a red herring, as the test case got reduced by creduce to just:

a() { b(""); }

The valgrind warnings are also different when you build clang with clang, instead of gcc as I did first. In that case, you get dozens of these instead:

==525651== Conditional jump or move depends on uninitialised value(s)
==525651==    at 0xBB1D71: SimplifyAndInst(llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&, unsigned int) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0xBBF6D0: ThreadBinOpOverPHI(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&, unsigned int) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0xBB1F87: SimplifyAndInst(llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&, unsigned int) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x13898E8: llvm::InstCombiner::visitAnd(llvm::BinaryOperator&) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x1365A6D: llvm::InstCombiner::run() (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x1367AEC: combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, bool, unsigned int, llvm::LoopInfo*) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x1369369: llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x121015B: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x3BA5188: (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x1210B3F: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x19A503A: clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)
==525651==    by 0x2500BD4: clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (in /home/dim/obj/llvm-llvmorg-10.0.0-53-gf79cd71e145-linux5-x86_64-ninja-rel-1/bin/clang-10)

I'm unsure how to continue.

dim added a comment.Jun 4 2020, 1:37 PM

For completeness sake, here are the valgrind outputs of clang rGf7f1abdb889 (llvmorg-11-init-16778) built by gcc 9.3.0-10ubuntu2 (in RelWithDebInfo mode):

And the exact same version of clang, built by clang 10.0.0-4ubuntu1:

.

dim added a subscriber: asbirlea.Jun 13 2020, 3:31 PM

Okay, after a long while of attempting to make a reproduction scenario, I finally managed one that consistently worked. As BSD make puts in different redirections for stderr when running in -j mode, it turned out that I could simply run the compilation twice on a terminal, first normally, then with stderr redirected to /dev/null, e.g. my test script is:

#!/bin/sh

exec </dev/tty >/dev/tty 2>&1

ulimit -c 0
trap "rm -f testcase[12].s testcase[12].s-* testcase-*.s.tmp" EXIT

${CLANG:-clang} -fintegrated-cc1 -Werror -O2 -std=gnu99 -fstack-protector-strong -S testcase.c -o testcase1.s || exit 1
${CLANG:-clang} -fintegrated-cc1 -Werror -O2 -std=gnu99 -fstack-protector-strong -S testcase.c -o testcase2.s 2>/dev/null || exit 1

cmp -s testcase1.s testcase2.s
test $? = 1 || exit 1

exit 0

(Here testcase.c is a copy of the .c file in https://www.andric.com/freebsd/clang/bug246630-repro.tar.xz)

Using this script I could finally bisect, and arrived at the following commit which fixes the issue, i.e. it makes the outputs of both clang runs the same again, and it is rG0cecafd647cc ("[BasicAA] Make BasicAA a cfg pass"). @asbirlea seems to indicate in the commit message that it influences the way the BasicAA pass is freed/invalidaed or not during runs?

So is this commit fixing it as an unintended side effect, or would it be according to expectation? If the latter, we should merge it into 10.0.1, and I will create a PR for it.