This is an archive of the discontinued LLVM Phabricator instance.

Remove equal BBs from a function
AbandonedPublic

Authored by avt77 on Mar 3 2017, 6:41 AM.

Details

Summary

It's one of the issues raised in PR27136: the duplicated epilog BBs in a function. This patch introduces the possible decision. It's only the preliminary version of the patch that's why we have a lot of DEBUGs inside it and that's why it's still buggy, e.g. we can't build bootstrap if apply this patch. The main purpose of this patch is to introduce intention and to get the first feedback on the suggested approach.

Diff Detail

Event Timeline

avt77 created this revision.Mar 3 2017, 6:41 AM
RKSimon added a subscriber: llvm-commits.
davide requested changes to this revision.Mar 5 2017, 3:33 PM

Is it possible to address this at IR level? Anyway, comments inline.

lib/CodeGen/RemoveEqualBB.cpp
2

FuncletLayout ?

11–13

This comment doesn't seem quite right.

48–49

You seem to pass MLI around but you don't use for anything real (the only line that actually uses it is commented out).

54

Why do you need the dominator here?

192–197

Commented out code.

253–257

Commented out code.

264–267

ditto.

lib/CodeGen/TargetPassConfig.cpp
670–671

Why are you putting at this point in the pipeline?

test/CodeGen/X86/loop-search.ll
1–3

This needs more tests.

This revision now requires changes to proceed.Mar 5 2017, 3:33 PM
avt77 updated this revision to Diff 90837.Mar 7 2017, 5:54 AM
avt77 edited edge metadata.

I implemented requirements raised by Davide and fix an issue with bootstrap: now it works properly. Next steps: I'm going to collect statistic on bootsrap such as: number of removed BBs and instructions inside those BBs, the total size changing, compile time. When it's done I'm going to extend transformation with other kinds of BBs (e.g. parts of EH, -O1 (size) optimizations) and/or terminators.

You fixed all the style comments, so thanks. But you glossed over the design questions I asked. In particular, before getting performance numbers, there are two questions unanswered:

  1. Why can't you do this in the middle-end? I assume there's a valid reason, but I'll be happy if you can actually elaborate.
  2. Granted we want this in the backend, why are you putting the pass at that specific point in the pipeline?
avt77 updated this revision to Diff 91561.Mar 13 2017, 7:57 AM

Now I found and fixed issues raised during bootstrap. The bootstrap works perfectly now. And I got some numbers: on such huge files like clang, llc, etc. we have about 0.1% size of file decreasing. At the moment I'm using the most possible conservative approach but collected numbers show that we can get better results: it's for sure.

Davide, I'm doing this transformation at this point of pipeline because I'd like to do it after all other layout related transformations. Is it the answer on your question? And the same answer about middle level: I'd like to do this transformation when all others are done but there are still something what can be optimized. Maybe I'm wrong?

MatzeB edited edge metadata.

Did you see the new MachineOutliner yet? It is also merging equivalent code but is not restricted to a single basic block but find duplicate sequences of instructions and can also merge across functions.

MatzeB requested changes to this revision.Mar 22 2017, 7:18 AM

We also have TailMerging (part of BranchFolding) which should merges equivalent instruction sequences going to a return and is enabled by default. Apparently it doesn't catch the case in loop-search.ll (and you didn't provide additional examples/tests unfortunately) but we should better search for bugs there instead of adding a new pass.

This revision now requires changes to proceed.Mar 22 2017, 7:18 AM
avt77 added a comment.Mar 22 2017, 7:58 AM

Matthias,
Thank you for the fast reply.

! In D30572#707521, @MatzeB wrote:
Did you see the new MachineOutliner yet? It is also merging equivalent code but is not restricted to a single basic block but find duplicate sequences of instructions and can also merge across functions.

Now, I did not see it but it seems it does more complex thing: inside my patch I'm calling it "candidates for -O1 optimizations. That's fine that you did it already!

We also have TailMerging (part of BranchFolding) which should merges equivalent instruction sequences going to a return and is enabled by default. Apparently it doesn't catch the case in loop-search.ll (and you didn't provide additional examples/tests unfortunately) but we should better search for bugs there instead of adding a new pass.

I've seen this code but it did not work for me that's why I implemented the new pass. But I deadly agree that it's better to fix the possible bugs inside this pass instead of adding a new pass: I'll try to redesign my job.
One comment: I bootstrapped Clang with my patch and found hundreds of duplicated BBs. Especially many of them were accessed through Jump Tables. Do you know about this issue?
And more: TailMerging works with return only or with unconditional jumps as well?

And more: TailMerging works with return only or with unconditional jumps as well?

Given something like

BB0:

A
B
C
RET

BB1:

C
RET

BB2:

B
C
RET

tail merging should give you something like:

BB0:

A
B
# fallthrough

BB1:

C
RET

BB2:

B
JMP BB1

That's at least my understanding, but I am no expert in the pass. Better do your own experiments/code reading.

paquette requested changes to this revision.EditedMar 22 2017, 11:42 AM
paquette added a subscriber: paquette.

I think the comments in this pass could use some improvement. It would be nice to be able to get the gist of what's going on without having to read through all of the code.

Does this pass only remove identical MachineBasicBlocks that end in returns/tail calls? If so, like Matthias said, we might just have a TailMerging bug. If it's more general, then what does this pass do aside from fix the bug in the test?

Also, since this is a new pass, there should be a new test written for it as well as the modification to the test you already included.

(... also I'm apparently also jpaquette in the reviewer list. Apparently I signed up to Phabricator once years ago and forgot entirely. Weird.)

lib/CodeGen/RemoveEqualBB.cpp
10–11

I don't really understand this comment. It makes it sound like what you're trying to do is remove any two equal basic blocks in the program. Maybe expand on it a bit and give some motivation? It would make the rest of the code easier to understand.

It also might be good to rename the pass for similar reasons; if I saw this as a command line option I might be a little confused.

71

MBB ought to never be null, so you can pass it as a reference.

72

Generally try to avoid use of auto when the type isn't some huge template soup. In more complicated functions, it makes it a bit clearer what's going on.

78

This function should probably have a function explaining what it's doing and the responsibilities of each parameter.

Also, a bunch of these could probably be references rather than pointers. That'd make it clear when things can't be null.

100–101

Might seem a tad pedantic, but comments should be full sentences. :) Also, watch out for spelling/grammar errors.

http://llvm.org/docs/CodingStandards.html#commenting

106

There are enough instances of MBB4Replace->getNumber() around here that it might be worth it to just give it its own variable. Say, MBB4ReplaceNumber or something?

130

The code above or below? The stuff below alters stuff in JTI, while this skips Pred if the condition holds.

147

You don't have to comment out asserts. In fact, it's a pretty good idea to use them a lot:

http://llvm.org/docs/CodingStandards.html#assert-liberally

165

This doesn't seem to be used anywhere.

167–170

This can probably all be removed?

177

Maybe "Basic blocks that will not be removed from F by the pass."

179–180

Similarly, "Basic blocks that will no longer be in F at the end of the pass."

183

I'd like it if X and Y were named more descriptively if only to remind me what they are later in the code. Even MBB1 and MBB2 or BB1 and BB2 would be helpful, since at least then I would know I'm talking about two MachineBasicBlocks.

http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

202

Variable names should start with capital letters. I would also give this a better name. Maybe EqualMBBs?

http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

242

Is this only an X86 pass? I would keep target-specific stuff out of the pass.

245

If you aren't using the commented out stuff yet, I'd remove it.

265–267

If there's only one statement in your if, you don't need the braces.

293–294

This seems like it can be removed.

Now I found and fixed issues raised during bootstrap. The bootstrap works perfectly now. And I got some numbers: on such huge files like clang, llc, etc. we have about 0.1% size of file decreasing. At the moment I'm using the most possible conservative approach but collected numbers show that we can get better results: it's for sure.

Davide, I'm doing this transformation at this point of pipeline because I'd like to do it after all other layout related transformations. Is it the answer on your question? And the same answer about middle level: I'd like to do this transformation when all others are done but there are still something what can be optimized. Maybe I'm wrong?

Sorry, missed this one.
44 kilobytes on llc release (44MB in release) seem a saving really small to justify the complexity of a new pass.
You should really consider leveraging existing pass to do this transformation as already pointed out and even before that, you may want to conduct an analysis on small/medium/large size executables in order to understand what's the number of duplicated basic block and therefore what's the hypothetical gain.

Last but not least, why do you want this transformation ;) ?

avt77 added a comment.Apr 25 2017, 1:13 AM

Hi All,
Reading the sources of TailMerging Pass I discovered that it has special switch "tail-merge-size" allowing to resolve the issue from loop-serch.ll test. The default value of the switch is 3 but if I change it as 2 then everything works fine.
Because of that I decided to abandon this review :-(
I'm going to investigate the possibility to change the default value. If it is not allowed for any reasons (compile time, target specific requirements, etc.) I'll implement special hook in Target as it's suggested in sources.

Any thoughts about possible default value for "tail-merge-size"? I suppose it could be 1 because we can have a lot of BBs with single "ret" (or "unconditional jump" or ...) instruction and we'd like to merge all such BBs. The compilation time should not increase dramatically (from my point of view) but we need investigations of course.

avt77 abandoned this revision.Apr 25 2017, 4:26 AM