At the last EuroLLVM, I gave a lightning talk about code review
statistics on Phabricator reviews and what we could derive from that
to try and reduce waiting-for-review bottlenecks. (see
https://llvm.org/devmtg/2018-04/talks.html#Lightning_2).
One of the items I pointed to is a script we've been using internally
for a little while to try and match open Phabricator reviews to people
who might be able to review them well. I received quite a few requests
to share that script, so here it is.
Warning: this is prototype quality!
The script uses 2 similar heuristics to try and match open reviews with
potential reviewers:
- If there is overlap between the lines of code touched by the patch-under-review and lines of code that a person has written, that person may be a good reviewer.
- If there is overlap between the files touched by the patch-under-review and the source files that a person has made changes to, that person may be a good reviewer.
The script provides a percentage for each of the above heuristics and
emails a summary. For example, this morning, I received the following
summary from the script in my inbox for patches-under-review where some
change was made in the past 24 hours:
SUMMARY FOR kristof.beyls@arm.com (found 8 reviews):
[3.37%/41.67%] https://reviews.llvm.org/D46018 '[GlobalISel][IRTranslator] Split aggregates during IR translation' by Amara Emerson
[0.00%/100.00%] https://reviews.llvm.org/D46111 '[ARM] Enable misched for R52.' by Dave Green
[0.00%/50.00%] https://reviews.llvm.org/D45770 '[AArch64] Disable spill slot scavenging when stack realignment required.' by Paul Walker
[0.00%/40.00%] https://reviews.llvm.org/D42759 '[CGP] Split large data structres to sink more GEPs' by Haicheng Wu
[0.00%/25.00%] https://reviews.llvm.org/D45189 '[MachineOutliner][AArch64] Keep track of functions that use a red zone in AArch64MachineFunctionInfo and use that instead of checking for noredzone in the MachineOutliner' by Jessica Paquette
[0.00%/25.00%] https://reviews.llvm.org/D46107 '[AArch64] Codegen for v8.2A dot product intrinsics' by Oliver Stannard
[0.00%/12.50%] https://reviews.llvm.org/D45541 '[globalisel] Update GlobalISel emitter to match new representation of extending loads' by Daniel Sanders
[0.00%/6.25%] https://reviews.llvm.org/D44386 '[x86] Introduce the pconfig/enclv instructions' by Gabor Buella
The first percentage in square brackers is the percentage of lines in
the patch-under-review that changes lines that I wrote. The second
percentage is the percentage of files that I made at least some
changes to out of all of the files touched by the patch-under-review.
Both the script and the heuristics are far from perfect, but I've
heard positive feedback from the few colleagues the script has been
sending a summary to every day - hearing that this does help them to
quickly find patches-under-review they can help to review.
The script takes quite some time to run (I typically see it running
for 2 to 3 hours on weekdays when it gets started by a cron job early
in the morning). There are 2 reasons why it takes a long time:
- The REST api into Phabricator isn't very efficient, i.e. a lot of uninteresting data needs to be fetched. The script tries to reduce this overhead partly by caching info it has fetched on previous runs, so as to not have to refetch lots of Phabricator state on each run.
- The script uses git blame to find for each line of code in the patch who wrote the original line of code being altered. git blame is sloooowww....
Anyway - to run this script:
- First install a virtualenv as follows (using Python2.7 - Python3 is almost certainly not going to work):
$ virtualenv venv
$ . ./venv/bin/activate
$ pip install Phabricator
- Then to run the script, looking for open reviews that could be done by X.Y@company.com, run (in the venv):
$ python ./find_interesting_reviews.py X.Y@company.com
Please note that "X.Y@company.com" needs to be the exact email address
(capitalization is important) that the git LLVM repository knows the
person as. Multiple email addresses can be specified on the command
line. Note that the script as is will email the results to all email
addresses specified on the command line - so be careful not to spam
people accidentally!