This is an archive of the discontinued LLVM Phabricator instance.

Script to match open Phabricator reviews with potential reviewers
ClosedPublic

Authored by kristof.beyls on Apr 27 2018, 7:59 AM.

Details

Summary

At the last EuroLLVM, I gave a lightning talk about code review
statistics on Phabricator reviews and what we could derive from that
to try and reduce waiting-for-review bottlenecks. (see
https://llvm.org/devmtg/2018-04/talks.html#Lightning_2).

One of the items I pointed to is a script we've been using internally
for a little while to try and match open Phabricator reviews to people
who might be able to review them well. I received quite a few requests
to share that script, so here it is.

Warning: this is prototype quality!

The script uses 2 similar heuristics to try and match open reviews with
potential reviewers:

  • If there is overlap between the lines of code touched by the patch-under-review and lines of code that a person has written, that person may be a good reviewer.
  • If there is overlap between the files touched by the patch-under-review and the source files that a person has made changes to, that person may be a good reviewer.

The script provides a percentage for each of the above heuristics and
emails a summary. For example, this morning, I received the following
summary from the script in my inbox for patches-under-review where some
change was made in the past 24 hours:

SUMMARY FOR kristof.beyls@arm.com (found 8 reviews):
[3.37%/41.67%] https://reviews.llvm.org/D46018 '[GlobalISel][IRTranslator] Split aggregates during IR translation' by Amara Emerson
[0.00%/100.00%] https://reviews.llvm.org/D46111 '[ARM] Enable misched for R52.' by Dave Green
[0.00%/50.00%] https://reviews.llvm.org/D45770 '[AArch64] Disable spill slot scavenging when stack realignment required.' by Paul Walker
[0.00%/40.00%] https://reviews.llvm.org/D42759 '[CGP] Split large data structres to sink more GEPs' by Haicheng Wu
[0.00%/25.00%] https://reviews.llvm.org/D45189 '[MachineOutliner][AArch64] Keep track of functions that use a red zone in AArch64MachineFunctionInfo and use that instead of checking for noredzone in the MachineOutliner' by Jessica Paquette
[0.00%/25.00%] https://reviews.llvm.org/D46107 '[AArch64] Codegen for v8.2A dot product intrinsics' by Oliver Stannard
[0.00%/12.50%] https://reviews.llvm.org/D45541 '[globalisel] Update GlobalISel emitter to match new representation of extending loads' by Daniel Sanders
[0.00%/6.25%] https://reviews.llvm.org/D44386 '[x86] Introduce the pconfig/enclv instructions' by Gabor Buella

The first percentage in square brackers is the percentage of lines in
the patch-under-review that changes lines that I wrote. The second
percentage is the percentage of files that I made at least some
changes to out of all of the files touched by the patch-under-review.

Both the script and the heuristics are far from perfect, but I've
heard positive feedback from the few colleagues the script has been
sending a summary to every day - hearing that this does help them to
quickly find patches-under-review they can help to review.

The script takes quite some time to run (I typically see it running
for 2 to 3 hours on weekdays when it gets started by a cron job early
in the morning). There are 2 reasons why it takes a long time:

  • The REST api into Phabricator isn't very efficient, i.e. a lot of uninteresting data needs to be fetched. The script tries to reduce this overhead partly by caching info it has fetched on previous runs, so as to not have to refetch lots of Phabricator state on each run.
  • The script uses git blame to find for each line of code in the patch who wrote the original line of code being altered. git blame is sloooowww....

Anyway - to run this script:

  • First install a virtualenv as follows (using Python2.7 - Python3 is almost certainly not going to work):

$ virtualenv venv
$ . ./venv/bin/activate
$ pip install Phabricator

  • Then to run the script, looking for open reviews that could be done by X.Y@company.com, run (in the venv):

$ python ./find_interesting_reviews.py X.Y@company.com

Please note that "X.Y@company.com" needs to be the exact email address
(capitalization is important) that the git LLVM repository knows the
person as. Multiple email addresses can be specified on the command
line. Note that the script as is will email the results to all email
addresses specified on the command line - so be careful not to spam
people accidentally!

Diff Detail

Event Timeline

kristof.beyls created this revision.Apr 27 2018, 7:59 AM

This should probably go into llvm/utils, not the root.

It would also be good to have the opposite script - "given git diff / commits, whom should i add as reviewers?"
It could evaluate CODE_OWNERS.txt, git log / git shortlog -sne of the changed files, git blame.

Do you plan on actually commiting this to the repository, or would you like any feedback nontheless?

One usability nit that I see is send_emails(). It would be great to be able to configure the SMTP connection, to disable the feature altogether, and to seperately specify the recipient of that e-mail: I use E-Mail in my local network, but do not allow for any outgoing traffic.

This should probably go into llvm/utils, not the root.

It would also be good to have the opposite script - "given git diff / commits, whom should i add as reviewers?"

I once wrote the following one-liner to do this. It works pretty well but it can be *very* slow. It will suggest a top 10 of reviewers for your staged changes.

git diff --name-only --cached | xargs -n 1 git blame --porcelain | grep \"^author \" | sort | uniq -c | sort -nr | head -10

It could evaluate CODE_OWNERS.txt, git log / git shortlog -sne of the changed files, git blame.

Hi Kristof. Thanks for this.

One more heuristic that might help getting quick reviews:

  1. If there is overlap between the files touched by the patch-under-review and and a 'reviewer' of recent changes to same file, that person may be a good reviewer.

This heuristic is helpful where original implementor may have moved on, but new active reviewers in same area are happy to review.
Some commit messages have 'reviewers' listed and that could be used.

Do you plan on actually commiting this to the repository, or would you like any feedback nontheless?

I definitely welcome any feedback.
I wasn't sure on committing to the repo, but with the comments coming in and the new feature requests popping up already, it seems like I should prepare to commit it to the repository, so that new features can be developed and shared incrementally.
As @lebedev.ri suggested, it probably should go somewhere in llvm/utils. Given there seems to be scope for other reviewing-related scripts, maybe it should go in a separate directory, e.g. llvm/utils/Reviewing.

One usability nit that I see is send_emails(). It would be great to be able to configure the SMTP connection, to disable the feature altogether, and to seperately specify the recipient of that e-mail: I use E-Mail in my local network, but do not allow for any outgoing traffic.

Agreed, more flexibility is needed on the emailing-feature of the script. I assume that the script is also too hard-coded in some other locations as this is really a prototype containing what was needed in my environment.

I hope to find some time next week to look both into finding a good location to commit this script and the email-feature improvements you suggest .

Hi Kristof. Thanks for this.

One more heuristic that might help getting quick reviews:

  1. If there is overlap between the files touched by the patch-under-review and and a 'reviewer' of recent changes to same file, that person may be a good reviewer.

This heuristic is helpful where original implementor may have moved on, but new active reviewers in same area are happy to review.
Some commit messages have 'reviewers' listed and that could be used.

Interesting idea, thanks for sharing, Javed!
I'll need a bit of time to think on how this could be best implemented.

Thanks!

Kristof

This should probably go into llvm/utils, not the root.

It would also be good to have the opposite script - "given git diff / commits, whom should i add as reviewers?"

I once wrote the following one-liner to do this. It works pretty well but it can be *very* slow. It will suggest a top 10 of reviewers for your staged changes.

git diff --name-only --cached | xargs -n 1 git blame --porcelain | grep \"^author \" | sort | uniq -c | sort -nr | head -10

It could evaluate CODE_OWNERS.txt, git log / git shortlog -sne of the changed files, git blame.

Thanks for sharing, Jonas!

Your one-liner could become the start of a new "suggest_reviewers" script?

MaskRay added a subscriber: MaskRay.May 3 2018, 4:14 PM

Can this script be made Python 3 compatible? Here is one place where running it under python3 will fail

    people_to_look_for = [e.decode('utf-8') for e in args.email_addresses]
AttributeError: 'str' object has no attribute 'decode'

Updating diff to make it add the script to llvm/utils/Reviewing.
I chose to put it in a new folder "Reviewing", since ideas for further scripts related to reviewing have been made on this thread.

For all of the feature requests raised so far on this review thread: I think they all make sense, but also think it's best to add them incrementally. In other words: commit the script as is and iterate through the feature requests and add them as individual commits.

It'd be nice if one of the reviewers could share an opinion on whether the above sounds like a good plan. If so, I'll go ahead with the above plan.

Thanks,

Kristof

lebedev.ri added inline comments.May 7 2018, 7:17 AM
utils/Reviewing/find_interesting_reviews.py
2

Can #!/usr/bin/env python be prepended as the first line?

2

Also, the standard licensing blurb at the top is missing.

kristof.beyls marked 2 inline comments as done.
kristof.beyls added inline comments.
utils/Reviewing/find_interesting_reviews.py
2

Added in latest updated patch.

2

I tried to find an example in any of the other python scripts under util, but I couldn't see any licensing blurbs in any of the existing python files.
Maybe best not to invent a licensing blurb in Python syntax here and just leave it as is?

ioeric removed a reviewer: ioeric.May 9 2018, 2:28 PM
ioeric added a subscriber: ioeric.
JDevlieghere requested changes to this revision.May 14 2018, 3:52 AM

There's a few small inconsistencies (such as spaces after print, between operators, etc) that could be automatically solved by running a formatter (e.g. YAPF) over the source code.

utils/Reviewing/find_interesting_reviews.py
4

Nit: can we sort the imports alphabetically?

85

Nit: You have different approaches to string formatting throughout this file. Personally I prefer the "".format() approach but as long as we're consistent I'd be happy.

150

Nit: We usually make comments full sentences in LLVM.

433

I'm guessing the commented out code is for logging? Maybe we could use the logger for this with a low verbosity?

533

Maybe turn this into a constant?

590
if __name__ == "__main__":
    main()
This revision now requires changes to proceed.May 14 2018, 3:52 AM
kristof.beyls marked 2 inline comments as done.

Many thanks for the detailed review, Jonas!
I think I've addressed all your comments - very useful feedback!

kristof.beyls marked 8 inline comments as done.May 17 2018, 3:10 AM

Oh - and I indeed also ran the source code through YAPF - thanks for that pointer, I didn't know that existed.

JDevlieghere accepted this revision.May 17 2018, 7:23 AM

Awesome, thanks Kristof! Two more nits, but overall this looks good to me.

utils/Reviewing/find_interesting_reviews.py
9

Usually import and from are grouped/sorted separately:

import a
import b 
import c
from a import foo
from b import bar
94

If this is a constant I'd make it all caps. Same for the class below.

This revision is now accepted and ready to land.May 17 2018, 7:23 AM
This revision was automatically updated to reflect the committed changes.
Ka-Ka added a subscriber: Ka-Ka.May 28 2018, 7:51 AM