Page MenuHomePhabricator

New pass to produce more easily-read IR.
AcceptedPublic

Authored by arnt on Apr 12 2019, 8:27 AM.

Details

Summary

This doesn't change the compiled output, merely reorders the basic blocks.
Frontends tend to create basic blocks in the order they discover that they
need the blocks, which is often not a great order for reading and
understanding the IR output.

The new pass is never called by default, but can be invoked with opt:

opt -S -passes improve-reading-order somefile.bc | less

Diff Detail

Event Timeline

arnt created this revision.Apr 12 2019, 8:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2019, 8:27 AM
arnt added a comment.Apr 12 2019, 8:31 AM

This patch's reviewer selection algorithm: Those who have changed two or more files in the same directory recently. I hope you don't mind.

MaskRay added inline comments.Apr 13 2019, 3:34 AM
llvm/lib/Passes/PassBuilder.cpp
1700

Good catch! This is irrelevant to this revision. You may commit it separately.

llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
52

BasicBlock * lastSoFar = nullptr; -> BasicBlock *LastSoFar = nullptr;
for( -> for (

https://llvm.org/docs/CodingStandards.html
docs/Proposals/VariableNames.rst (lower_case vs lowerCase is still a debate; (I prefer snake_case :) ))

llvm/test/Transforms/Util/improve-reading-order.ll
2

Missing -S

MaskRay added inline comments.Apr 13 2019, 5:33 AM
llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
53

For a not-necessarily-sorted set, you may use SmallPtrSet or DenseSet.

57

This can be a SetVector (since we want determinism but set::set is slow)

135

In some pessimistic cases, preferredNextBlock is of O(|input|^2) and this while loop is cubic..

Do you want to rewrite the loop to use a worklist?

if (condition) {
  complex   /////// I believe the current algorithm only reorders the 'then' part
} else {
  complex    /////// this is not touched
}
arnt updated this revision to Diff 195175.Apr 15 2019, 6:45 AM
arnt marked 3 inline comments as done.

Reworked based on good comments from maskray; thanks.

arnt added a comment.Apr 15 2019, 6:46 AM

I pushed an update now. Will this comment submit the unsubmitted drafts?

llvm/lib/Passes/PassBuilder.cpp
1700

Can do, but I'll do it at the same time, OK? Haven't figured out how to push from my git monorepo to the svn source of truth yet.

llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
135

The while loop exits as soon as preferredNextBlock() returns non-null, so it's quadratic, not cubic.

I'm not sure whether a rewrite for speed is worthwhile. It'll help if a function has very many dead blocks, exception handlers or targets for a single switch, but it's not clear to me whether it'll ever make such a function readable. Sucking quickly instead of slowly isn't worth much effort.

On the other hand, if a function has, say, a thousand dead blocks, then one may have a particularly strong reason to inspect that function, and printing it shouldn't take thirty minutes.

So I'm on the fence.

I rewrote. You pointed me to SetVector and SetVector wants to be used ;) The new version's weak point is a thousands-wide switch where half the branches fall through to the other half, and the fallen-through-to branches contain many basic blocks and loops and so on. I'll upload a new revision later today.

llvm/test/Transforms/Util/improve-reading-order.ll
2

Ooops, fixed. Why did the tests pass without? It seems I've been running check-llvm-unit instead of check-llvm.

arnt updated this revision to Diff 195182.Apr 15 2019, 7:51 AM

Ran clang-format.

arnt added a comment.Apr 30 2019, 2:46 AM

Hi,

neither of you have commented in the past couple of weeks. Should I:

  • look for other reviewers? if so, do you have suggestions?
  • abandon the feature and merely submit the corrected comment (".inc file")?

Sorry to bother you. I do understand that reviewing isn't high on anyone's priority list.

Hi,

neither of you have commented in the past couple of weeks. Should I:

  • look for other reviewers? if so, do you have suggestions?
  • abandon the feature and merely submit the corrected comment (".inc file")?

    Sorry to bother you. I do understand that reviewing isn't high on anyone's priority list.

Looks like there were some coding-convention comments that are not yet addressed (e.g., for variable names).

arnt updated this revision to Diff 199402.May 14 2019, 4:49 AM

This revision updates the variable names as requested by hfinkel.

Specifically, I updated all variables and parameter names to start with an
upper-case letter, except loop iterators (for which the upper-case rule
appears to be honoured more in the breach than in the observance).

It does not contain any code changes. But I added some more documentation.

I don't think I will do any more with this unless someone comments on the
substance or purpose of the patch. Commenting on code style is fine, but
a total absence of other comments gets tiresome.

jdoerfert accepted this revision.EditedMay 14 2019, 8:39 AM
jdoerfert added a subscriber: jdoerfert.

I added various formatting and code comments but other than that, this LGTM.

So, please update according to the comments or respond to them before committing.

llvm/include/llvm/Transforms/Utils/ImproveReadingOrder.h
19

Don't open a namespace in a header.

namespace llvm {
...
}
38

Most "new PM" transformations only expose the run method. See the preferredNextBlock definition comment.

llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
16

Leftover

58

Variable names commonly start with an upper case letter: b -> BB.
You can also simply "iterate" over F: BasicBlock &BB : F

97

uint -> uint64_t (to match getZExtValue)
Also, why do you do sorting through a std::map? Is there a guarantee that it will be sorted (in the way you need it)?
I'd put all in a SmallVector<std::pair<uint64_t, uint64_t>, 8> Ordered., then
call llvm::sort(Ordered);

100

Weight

110

With a vector, as described above, it will be a Successors.append(Ordered.begin(), Ordered.end()).

122

auto u -> BasicBlock *BB

124

It took me a while to see that you only want to find one witness:
Add a break, remove the !HadingTowardsUnreachable.

132

I don't understand the interplay between HeadingTowardsUnreachable and LeadsToUnreachable. Maybe add a comment to explain.

135

I would assume you could use LoopInfo and the nesting depth of loops to reason here as well.

157

If you move this at the beginning of the file you can make it static.

166

i is associated with integers (in my mind). BasicBlock &PossiblePredBB maybe

This revision is now accepted and ready to land.May 14 2019, 8:39 AM
fhahn added inline comments.May 15 2019, 11:54 AM
llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
97

Or use MapVector, if it accepts std::pair as KeyT https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h

As a way to test this on a larger number of cases, could you add it to the default pass pipeline (maybe at multiple points) and bootstrap clang with it (or build the test-suite & co)?

arnt marked 9 inline comments as done.Fri, Jun 7, 12:51 PM

Updated diff coming in a few minutes...

llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
53

I kept SetVector; the performance isn't a problem and its order produces nice results. (Performance is generally not a problem here since the pass's application limits the data size. A million-line function isn't readable in any reading order, see?)

97

There must be dozens of ways to get the order I want (heaviest branch weight first, falling back to input order), all of which have essentially the same performance: fast enough for any function a human can aspire to read.

Why did I pick the one I picked? Most likely because I had worked on STL-heavy code in the days before, and used the same idioms. IMO it doesn't matter. MapVector would be as good, but changing now would just be effort without significant improvement.

132

OK... but let me try here first. It's about happypaths, sad paths and even sadder paths, like code that throws an exception while trying to throw or handle another exception.

Suppose that a function contains a couple of blocks that end with a return and a couple of blocks that end with unreachables. In that case the pass will generally try to output everything that can lead to a return first, but eventually it has to start outputting the blocks whose successors end with either/both unreachables.

Once it has started on a block that is postdominated by one of those unreachables, then it's best to output everything related to that unreachable. For example, if the two unreachables are for throwing "null pointer dereferenced" and "index out of bounds", then it's best to print all of the null-pointer blocks in one group, and all of the index-checking blocks in another group.

So that's what it tries to do: If the current block is already heading for an unreachable, then distinguishes between blocks that lead to the "current" unreachable, and other, even sadder paths unreachables.

I renamed one variable and added a comment. Thanks. Compiling now.

135

I would assume so too, but I still haven't come across any functions where I'm unsatisified with the output order and think that LoopInfo would help.

(I've used this to read code for most of a year now.)

arnt updated this revision to Diff 203601.Fri, Jun 7, 12:52 PM
arnt marked an inline comment as done.

Updates from review by jdoerfert etc

arnt marked 3 inline comments as done.Fri, Jun 7, 12:57 PM

Now with the ones I missed.

Left out are: Some for-loop variable names (AIUI the upper-case rule is widely ignored for those, I join that chorus), and at least one case where I don't see any reason to change type (from std::map) because it's work for insignificant benefit. Maybe I'd do it differently if writing from scratch.

llvm/include/llvm/Transforms/Utils/ImproveReadingOrder.h
19

Ugh. I missed this. Coming.

38

I didn't notice that from the ones I skimmed... is this a rule, and if so, what motivates it?

lebedev.ri added inline comments.
llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
18

different pass and debug name will be very confusing.

arnt updated this revision to Diff 203608.Fri, Jun 7, 1:12 PM
arnt marked an inline comment as done.

Respond to a few more comments, by lebedevri and jdoerfert.

I haven't had time to read through the explanation that is to become a comment yet.

llvm/include/llvm/Transforms/Utils/ImproveReadingOrder.h
38

Afaik, not a hard rule, just something I feel is being adapted as the norm.

llvm/lib/Passes/PassBuilder.cpp
1700

git llvm push HEAD~2 will push the last two commits. -n is for dryrun.

llvm/lib/Transforms/Utils/ImproveReadingOrder.cpp
116

Please do not use uint, I'm with him: https://stackoverflow.com/a/5678057