This is an archive of the discontinued LLVM Phabricator instance.

[Clang] After integrated-cc1, ignore -disable-free when there are more than one job in the queue
Needs ReviewPublic

Authored by aganea on Feb 11 2020, 2:33 PM.

Details

Summary

A CC1Command was previously always burying pointers (ie. skipping object deletion). This lead to higher memory usage scenarios, when the inputs or cmd-line flags expand to more than one driver job.

With this patch, when several jobs are created by the driver, and at least one CC1Command is in the job queue, remove -disable-free for CC1Commands other than the last one.
If the last job is not a CC1Command, this patch removes -disable-free altogether for all CC1Commands.

Fixes PR44823.

Diff Detail

Event Timeline

aganea created this revision.Feb 11 2020, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2020, 2:33 PM
aganea updated this revision to Diff 244028.Feb 11 2020, 4:05 PM
aganea edited the summary of this revision. (Show Details)

Wording.

rnk added inline comments.Feb 11 2020, 4:23 PM
clang/include/clang/Driver/Job.h
57–58

Pack a new field after an existing bool for maximum micro optimization. :)

138

I guess I lean towards making this a field, default initialized to false, set to true in the CC1Command constructor. Less relocations, etc.

clang/lib/Driver/Driver.cpp
3759

*std::prev(Jobs.end())? Or *Jobs.getJobs().back(), not quite sure.

aganea updated this revision to Diff 244055.Feb 11 2020, 7:23 PM
aganea marked 3 inline comments as done.

As suggested by Reid.

hans added inline comments.Feb 12 2020, 4:18 AM
clang/include/clang/Driver/Job.h
90

I think Reid just meant put the bool fields after each other to minimize padding. Using bitfields is taking it one step further. I think in LLVM we generally use "unsigned" for bitfield types, but I'm not sure using bitfields at all is worth it here.

134–135

I hadn't heard the term "burying pointers" before, and the name is also pretty broad: "clean up" could refer to a lot more than just freeing memory.

Maybe "enableFree()" or something would be a better name?

clang/lib/Driver/Driver.cpp
3758

Is this if-statement really necessary? If you just do the for-loop directly, it will not execute if Jobs is empty anyway.

Also, with an old-school for-loop it's easier to not iterate over the last element. How about something like:

for (size_t i = 0, e = C.getJobs().size(); i + 1 < e; i++) {
  auto &Job = C.getJobs().getJobs()[i];
  if (Job.InProcess)
    Job.forceCleanup();
}
clang/lib/Driver/Job.cpp
325

I wish it were possible to avoid adding the -disable-free flag to in-process-cc1 jobs in the first place, but I guess maybe that's not practical.

Isn't a better approach to only do in-process cc1 if there's just one TU?

Isn't a better approach to only do in-process cc1 if there's just one TU?

One of the reasons of the in-process cc1 was the debuggability. If we can compile several TUs in-process, why not doing it?

As for build times, we found that even with the clean up, in-process+clean-up is faster than out-of-process+disable-free. That is, on Windows and in the context of the llvm-buildozer app I've presented at the LLVM conf., where all the TUs for a project are compiled in the same process (meaning there's a clean up after each TU).

Any other issues you were seeing with this current approach?

I'd think that everyone debugging clang always passes a single TU to it, so I'm not sure debuggability does much here :)

The -disable-free code has never been used in normal compilations, so we didn't have to worry about this path. This patch here brings us to 3 modes (in-process cc1 with and without disable-free, out-of-process cc1). My suggestion keeps us to two modes (in-process cc1, out-of-process cc1).

The performance benefit of disable-free is still valuable, I'd not want to make every TU pay the price for a multiple TU invocation. Additionally, if we're willing to put up with the 3rd form (which i think is worth it), the multiple TU case cleaning up on all but the last makes sense.

aganea updated this revision to Diff 244172.EditedFeb 12 2020, 7:49 AM
aganea marked 7 inline comments as done.

Address @hans' comments.

While it'd be good to have a bot running without -disable-free, I concur with @thakis. This patch will probably introduce too much (untested) variability for release 10.0, if merged.
Let's enable integrate-cc1 just for one TU for the 10.0, and then we'll see if we want to land this patch or not in the trunk for the 11.0.
I'll open another review.

clang/include/clang/Driver/Job.h
90

Reid said "pack" and "field" and my Pavlovian reflex made me think "bitfields". Yes, plain bools are just fine.

134–135

"burying", as in https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Support/BuryPointer.h

Changed the comment and the function to enableFree().

clang/lib/Driver/Job.cpp
325

That was my original intent, but the -disable-free flag is added by Clang::ConstructJob() and at that point we don't know yet how many jobs we will have, nor their nature.

I forgot another reason why "only do cc1 with a single TU" might be a good idea: In case crash recovery turns out to not work in all cases with in-process cc1, we could add a signal handler that on crash make the driver exec() itself with -fno-integrated-cc1 added, so that the crash is then processed again with out-of-process cc1. (At least on non-Windows.) I prototyped this at https://github.com/nico/hack/blob/master/crash_signal.cc a while ago (in a standalone program). This only works if the crashing cc1 invocation is the first thing the driver does.

ychen added a subscriber: ychen.May 22 2020, 11:50 AM
ychen added a comment.May 22 2020, 4:44 PM

I'd think that everyone debugging clang always passes a single TU to it, so I'm not sure debuggability does much here :)

The -disable-free code has never been used in normal compilations, so we didn't have to worry about this path. This patch here brings us to 3 modes (in-process cc1 with and without disable-free, out-of-process cc1). My suggestion keeps us to two modes (in-process cc1, out-of-process cc1).

What is the downside of 3 modes except maintaining one more mode in the future?