Page MenuHomePhabricator

New pass to produce more easily-read IR.

Authored by arnt on Apr 12 2019, 8:27 AM.



This doesn't change the compiled output, merely reorders the basic blocks.
Frontends tend to create basic blocks in the order they discover that they
need the blocks, which is often not a great order for reading and
understanding the IR output.

The new pass is never called by default, but can be invoked with opt:

opt -S -passes improve-reading-order somefile.bc | less

Diff Detail

Event Timeline

arnt created this revision.Apr 12 2019, 8:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2019, 8:27 AM
arnt added a comment.Apr 12 2019, 8:31 AM

This patch's reviewer selection algorithm: Those who have changed two or more files in the same directory recently. I hope you don't mind.

MaskRay added inline comments.Apr 13 2019, 3:34 AM

Good catch! This is irrelevant to this revision. You may commit it separately.


BasicBlock * lastSoFar = nullptr; -> BasicBlock *LastSoFar = nullptr;
for( -> for (
docs/Proposals/VariableNames.rst (lower_case vs lowerCase is still a debate; (I prefer snake_case :) ))


Missing -S

MaskRay added inline comments.Apr 13 2019, 5:33 AM

For a not-necessarily-sorted set, you may use SmallPtrSet or DenseSet.


This can be a SetVector (since we want determinism but set::set is slow)


In some pessimistic cases, preferredNextBlock is of O(|input|^2) and this while loop is cubic..

Do you want to rewrite the loop to use a worklist?

if (condition) {
  complex   /////// I believe the current algorithm only reorders the 'then' part
} else {
  complex    /////// this is not touched
arnt updated this revision to Diff 195175.Apr 15 2019, 6:45 AM
arnt marked 3 inline comments as done.

Reworked based on good comments from maskray; thanks.

arnt added a comment.Apr 15 2019, 6:46 AM

I pushed an update now. Will this comment submit the unsubmitted drafts?


Can do, but I'll do it at the same time, OK? Haven't figured out how to push from my git monorepo to the svn source of truth yet.


The while loop exits as soon as preferredNextBlock() returns non-null, so it's quadratic, not cubic.

I'm not sure whether a rewrite for speed is worthwhile. It'll help if a function has very many dead blocks, exception handlers or targets for a single switch, but it's not clear to me whether it'll ever make such a function readable. Sucking quickly instead of slowly isn't worth much effort.

On the other hand, if a function has, say, a thousand dead blocks, then one may have a particularly strong reason to inspect that function, and printing it shouldn't take thirty minutes.

So I'm on the fence.

I rewrote. You pointed me to SetVector and SetVector wants to be used ;) The new version's weak point is a thousands-wide switch where half the branches fall through to the other half, and the fallen-through-to branches contain many basic blocks and loops and so on. I'll upload a new revision later today.


Ooops, fixed. Why did the tests pass without? It seems I've been running check-llvm-unit instead of check-llvm.

arnt updated this revision to Diff 195182.Apr 15 2019, 7:51 AM

Ran clang-format.

arnt added a comment.Apr 30 2019, 2:46 AM


neither of you have commented in the past couple of weeks. Should I:

  • look for other reviewers? if so, do you have suggestions?
  • abandon the feature and merely submit the corrected comment (".inc file")?

Sorry to bother you. I do understand that reviewing isn't high on anyone's priority list.


neither of you have commented in the past couple of weeks. Should I:

  • look for other reviewers? if so, do you have suggestions?
  • abandon the feature and merely submit the corrected comment (".inc file")?

    Sorry to bother you. I do understand that reviewing isn't high on anyone's priority list.

Looks like there were some coding-convention comments that are not yet addressed (e.g., for variable names).

arnt updated this revision to Diff 199402.May 14 2019, 4:49 AM

This revision updates the variable names as requested by hfinkel.

Specifically, I updated all variables and parameter names to start with an
upper-case letter, except loop iterators (for which the upper-case rule
appears to be honoured more in the breach than in the observance).

It does not contain any code changes. But I added some more documentation.

I don't think I will do any more with this unless someone comments on the
substance or purpose of the patch. Commenting on code style is fine, but
a total absence of other comments gets tiresome.

jdoerfert accepted this revision.EditedMay 14 2019, 8:39 AM
jdoerfert added a subscriber: jdoerfert.

I added various formatting and code comments but other than that, this LGTM.

So, please update according to the comments or respond to them before committing.


Don't open a namespace in a header.

namespace llvm {

Most "new PM" transformations only expose the run method. See the preferredNextBlock definition comment.




Variable names commonly start with an upper case letter: b -> BB.
You can also simply "iterate" over F: BasicBlock &BB : F


uint -> uint64_t (to match getZExtValue)
Also, why do you do sorting through a std::map? Is there a guarantee that it will be sorted (in the way you need it)?
I'd put all in a SmallVector<std::pair<uint64_t, uint64_t>, 8> Ordered., then
call llvm::sort(Ordered);




With a vector, as described above, it will be a Successors.append(Ordered.begin(), Ordered.end()).


auto u -> BasicBlock *BB


It took me a while to see that you only want to find one witness:
Add a break, remove the !HadingTowardsUnreachable.


I don't understand the interplay between HeadingTowardsUnreachable and LeadsToUnreachable. Maybe add a comment to explain.


I would assume you could use LoopInfo and the nesting depth of loops to reason here as well.


If you move this at the beginning of the file you can make it static.


i is associated with integers (in my mind). BasicBlock &PossiblePredBB maybe

This revision is now accepted and ready to land.May 14 2019, 8:39 AM
fhahn added inline comments.May 15 2019, 11:54 AM

Or use MapVector, if it accepts std::pair as KeyT

As a way to test this on a larger number of cases, could you add it to the default pass pipeline (maybe at multiple points) and bootstrap clang with it (or build the test-suite & co)?

arnt marked 9 inline comments as done.Jun 7 2019, 12:51 PM

Updated diff coming in a few minutes...


I kept SetVector; the performance isn't a problem and its order produces nice results. (Performance is generally not a problem here since the pass's application limits the data size. A million-line function isn't readable in any reading order, see?)


There must be dozens of ways to get the order I want (heaviest branch weight first, falling back to input order), all of which have essentially the same performance: fast enough for any function a human can aspire to read.

Why did I pick the one I picked? Most likely because I had worked on STL-heavy code in the days before, and used the same idioms. IMO it doesn't matter. MapVector would be as good, but changing now would just be effort without significant improvement.


OK... but let me try here first. It's about happypaths, sad paths and even sadder paths, like code that throws an exception while trying to throw or handle another exception.

Suppose that a function contains a couple of blocks that end with a return and a couple of blocks that end with unreachables. In that case the pass will generally try to output everything that can lead to a return first, but eventually it has to start outputting the blocks whose successors end with either/both unreachables.

Once it has started on a block that is postdominated by one of those unreachables, then it's best to output everything related to that unreachable. For example, if the two unreachables are for throwing "null pointer dereferenced" and "index out of bounds", then it's best to print all of the null-pointer blocks in one group, and all of the index-checking blocks in another group.

So that's what it tries to do: If the current block is already heading for an unreachable, then distinguishes between blocks that lead to the "current" unreachable, and other, even sadder paths unreachables.

I renamed one variable and added a comment. Thanks. Compiling now.


I would assume so too, but I still haven't come across any functions where I'm unsatisified with the output order and think that LoopInfo would help.

(I've used this to read code for most of a year now.)

arnt updated this revision to Diff 203601.Jun 7 2019, 12:52 PM
arnt marked an inline comment as done.

Updates from review by jdoerfert etc

arnt marked 3 inline comments as done.Jun 7 2019, 12:57 PM

Now with the ones I missed.

Left out are: Some for-loop variable names (AIUI the upper-case rule is widely ignored for those, I join that chorus), and at least one case where I don't see any reason to change type (from std::map) because it's work for insignificant benefit. Maybe I'd do it differently if writing from scratch.


Ugh. I missed this. Coming.


I didn't notice that from the ones I skimmed... is this a rule, and if so, what motivates it?

lebedev.ri added inline comments.

different pass and debug name will be very confusing.

arnt updated this revision to Diff 203608.Jun 7 2019, 1:12 PM
arnt marked an inline comment as done.

Respond to a few more comments, by lebedevri and jdoerfert.

I haven't had time to read through the explanation that is to become a comment yet.


Afaik, not a hard rule, just something I feel is being adapted as the norm.


git llvm push HEAD~2 will push the last two commits. -n is for dryrun.


Please do not use uint, I'm with him:

arnt added a comment.Jun 21 2019, 12:38 PM


thanks for pinging.

I've been on vacation since jdoerfert's review (and his blessing to land the patch). I live in Munich, where schools close for two weeks around this time and families pack up to go eat gelato in Italy.

As far as I can remember I attended to everything jdoerfert brought up, and my next task ought to be to rebase it to the current tip-of-trunk and (find out how to) land it.

My primary purpose for these four patches is to learn about the LLVM patch process. I picked four rather different patches precisely because they're different. If some of the four fail I won't cry. But if this is landed I'll rebase another of the four and see if I can get it in, too.

arnt updated this revision to Diff 206811.Jun 27 2019, 2:58 AM
arnt marked 4 inline comments as done.

Attends to most comments, not all.

A reqeust to introduce break/continue/goto is more than I'll do. That's
where I stop.

arnt added a comment.Jun 27 2019, 3:03 AM

I rebased onto this morning's LLVM now.

I don't have commit access (as far as I can tell at least) and cannot find a way to land code from phabricator, so if this is to land, one of the reviewers has to commit it for me. Is that right?


If it pleases you.


Sorry, adding break, goto or continue is more than I'll do without compelling, specific reason.