This is an archive of the discontinued LLVM Phabricator instance.

[clang-format-diff.py] give clang-format-diff a job pool (10x speed)
ClosedPublic

Authored by seanptmaher on Jan 8 2023, 12:22 PM.

Details

Summary

This patch changes the implementation of clang-format-diff.py to
start up many clang-format processes in parallel in order to speed
up clang-format-diff.py by several orders of magnitude on large
patches (less than a minute instead of several tens of minutes).

Diff Detail

Event Timeline

seanptmaher created this revision.Jan 8 2023, 12:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2023, 12:22 PM
seanptmaher requested review of this revision.Jan 8 2023, 12:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2023, 12:22 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Update python formatting

MyDeveloperDay requested changes to this revision.Jan 9 2023, 1:32 AM

You've lost the first part of the patch in this latest diff

This revision now requires changes to proceed.Jan 9 2023, 1:32 AM
MyDeveloperDay added a project: Restricted Project.
MyDeveloperDay removed a subscriber: owenpan.

Possibly fix the patchset? Sorry I've never used phabricator berfore.

Can you add an option e.g. -j N where N is a positive integer and defaults to 1? Also, can you include a use case in which we can see the ~10x improvement on runtime?

MyDeveloperDay requested changes to this revision.Jan 12 2023, 7:16 AM

On my laptop running

import os;
import multiprocessing


def main():
    num = len(os.sched_getaffinity(0))
    numcpu = multiprocessing.cpu_count()
    print ('affinity', num, '\n');
    print ('cpucount', numcpu, '\n');

if __name__ == '__main__':
  main()

produces

affinity 16

cpucount 16

Does the * 8 mean I might get 128?

Do I want 16 parallel clang-format processes running at once? (especially if some of the files are memory hungry)

I'm with @owenpan here I am happy to use a -j N option when I need more power, and N default should be 1 otherwise we could be causing a regression, I'm not a fan of processes that assume thats the only thing running on my machine (cough visual studio!), and there always should be some way of saying do what we did before.

clang/tools/clang-format/clang-format-diff.py
129

is this 128 parallel processes?

130

is 8 a good number here?

This revision now requires changes to proceed.Jan 12 2023, 7:16 AM

Add jobs flag

Yeah, the jobs flag is a much better implementation. Thanks for the comments. I've implemented that. I was indeed making the assumption that this is all that's running, etc. as you mentioned.

The use case I was using this with was running diff formatting of very large diffs (automated mass refactors changing thousands of files) on the chromium codebase -- I'm not sure how to "include" such a use case as you mentioned. I could write a benchmark if you'd like?

fix the formatting

seanptmaher marked 2 inline comments as done.Jan 12 2023, 10:18 AM

The use case I was using this with was running diff formatting of very large diffs (automated mass refactors changing thousands of files) on the chromium codebase -- I'm not sure how to "include" such a use case as you mentioned. I could write a benchmark if you'd like?

I was thinking about trying out whatever you did when reviewing this patch, so it would be helpful if you could share the steps/commands you used. A benchmark is unnecessary, but having some tests would be great.

Fix bug in -j implementation where proper waiting wasn't happening

It's slightly hard to test this because it involves making a huge diff of a git repo. If you're willing to jump through the hoops of cloning chromium (honestly, this is a pain in the ass. It's too much hard drive space and you've got to install depot_tools. If you don't already have google code on your machine, you're probably going to want this to be a temporary thing) and running a few commands, I think I've figured out how you could test this.

First, clone the repo using the instructions you find here: https://www.chromium.org/developers/how-tos/get-the-code/

Then, checkout commit hash 0e9d17d1b6414621228115ee535c7becf67e1c62.

Then, apply the patch file that I've just attached. It is a 6000-file change, where I've changed a bunch of includes.

Then, finally, to format that patch using the clang-format-diff.py file, run the following:

$ time git diff -U0 --no-color --relative HEAD^ | buildtools/clang_format/script/clang-format-diff.py -p1 -i

On my machine, that gives:

________________________________________________________
Executed in   16.56 mins    fish           external
   usr time   15.01 mins    1.02 millis   15.01 mins
   sys time    1.70 mins    0.71 millis    1.70 mins

This will take quite some time. After it finishes, save a diff of the filesystem with git add . && git diff --cached > ../slow-diff, (important not to save it in your working directory or else you'd mess up the next diff!)
Then git reset . && git restore . to start over.

Then, run the same command but with the new clang-format-diff.py, and pass it as many jobs as your computer could handle.

$ time git diff -U0 --no-color --relative HEAD^ | path/to/the/new/clang-format-diff.py -p1 -i -j{16,128}

On my machine, that runs in just above 20 seconds.

________________________________________________________
Executed in   22.24 secs    fish           external
   usr time   16.71 mins    0.00 millis   16.71 mins
   sys time    2.37 mins    2.05 millis    2.37 mins

Finally, save a diff of this again, like something like so: git add . && git diff --cached > ../fast-diff

Then finally you can diff the two diffs:

spvw@this ~/Documents/code/chromium/src ((eb2c565c…))> diff ../slow-diff ../fast-diff
spvw@this ~/Documents/code/chromium/src ((eb2c565c…))>

Showing that the two are identical.

Are there other ways in which you'd want to test this?

i'm not sure how to write unit-test-like tests for this, so if you'd like me to, I'd appreciate a pointer...

No need to include os anymore, I'm not checking number of cores

Thanks a lot for the review, by the way.

I realize the patch wasn't up to snuff initially, but glad to get it in order with your help.

MyDeveloperDay accepted this revision.Jan 14 2023, 4:44 AM

I can see why you have the desire for something that takes 22 seconds and not 16 mins!

This revision is now accepted and ready to land.Jan 14 2023, 4:44 AM

I can't actually commit this -- could one of you? Or has it already been done? I remember reading that someone had to manually push the patch upstream.

Hey, bump on this. Still waiting on the commit.

I can't actually commit this -- could one of you? Or has it already been done? I remember reading that someone had to manually push the patch upstream.

Can we have the name and email you want to use for your patch? See https://llvm.org/docs/Phabricator.html#committing-someone-s-change-from-phabricator.

Hey,

Sure. The name is Sean Maher and the email is spvw@chromium.org

Thanks for your help.

This revision was landed with ongoing or failed builds.Mar 2 2023, 11:59 AM
This revision was automatically updated to reflect the committed changes.