It's one of the issues raised in PR27136: the duplicated epilog BBs in a function. This patch introduces the possible decision. It's only the preliminary version of the patch that's why we have a lot of DEBUGs inside it and that's why it's still buggy, e.g. we can't build bootstrap if apply this patch. The main purpose of this patch is to introduce intention and to get the first feedback on the suggested approach.
Diff Detail
Event Timeline
Is it possible to address this at IR level? Anyway, comments inline.
lib/CodeGen/RemoveEqualBB.cpp | ||
---|---|---|
2 | FuncletLayout ? | |
11–13 | This comment doesn't seem quite right. | |
48–49 | You seem to pass MLI around but you don't use for anything real (the only line that actually uses it is commented out). | |
54 | Why do you need the dominator here? | |
192–197 | Commented out code. | |
253–257 | Commented out code. | |
264–267 | ditto. | |
lib/CodeGen/TargetPassConfig.cpp | ||
670–671 | Why are you putting at this point in the pipeline? | |
test/CodeGen/X86/loop-search.ll | ||
1–3 | This needs more tests. |
I implemented requirements raised by Davide and fix an issue with bootstrap: now it works properly. Next steps: I'm going to collect statistic on bootsrap such as: number of removed BBs and instructions inside those BBs, the total size changing, compile time. When it's done I'm going to extend transformation with other kinds of BBs (e.g. parts of EH, -O1 (size) optimizations) and/or terminators.
You fixed all the style comments, so thanks. But you glossed over the design questions I asked. In particular, before getting performance numbers, there are two questions unanswered:
- Why can't you do this in the middle-end? I assume there's a valid reason, but I'll be happy if you can actually elaborate.
- Granted we want this in the backend, why are you putting the pass at that specific point in the pipeline?
Now I found and fixed issues raised during bootstrap. The bootstrap works perfectly now. And I got some numbers: on such huge files like clang, llc, etc. we have about 0.1% size of file decreasing. At the moment I'm using the most possible conservative approach but collected numbers show that we can get better results: it's for sure.
Davide, I'm doing this transformation at this point of pipeline because I'd like to do it after all other layout related transformations. Is it the answer on your question? And the same answer about middle level: I'd like to do this transformation when all others are done but there are still something what can be optimized. Maybe I'm wrong?
Did you see the new MachineOutliner yet? It is also merging equivalent code but is not restricted to a single basic block but find duplicate sequences of instructions and can also merge across functions.
We also have TailMerging (part of BranchFolding) which should merges equivalent instruction sequences going to a return and is enabled by default. Apparently it doesn't catch the case in loop-search.ll (and you didn't provide additional examples/tests unfortunately) but we should better search for bugs there instead of adding a new pass.
Matthias,
Thank you for the fast reply.
! In D30572#707521, @MatzeB wrote:
Did you see the new MachineOutliner yet? It is also merging equivalent code but is not restricted to a single basic block but find duplicate sequences of instructions and can also merge across functions.
Now, I did not see it but it seems it does more complex thing: inside my patch I'm calling it "candidates for -O1 optimizations. That's fine that you did it already!
I've seen this code but it did not work for me that's why I implemented the new pass. But I deadly agree that it's better to fix the possible bugs inside this pass instead of adding a new pass: I'll try to redesign my job.
One comment: I bootstrapped Clang with my patch and found hundreds of duplicated BBs. Especially many of them were accessed through Jump Tables. Do you know about this issue?
And more: TailMerging works with return only or with unconditional jumps as well?
And more: TailMerging works with return only or with unconditional jumps as well?
Given something like
BB0:
A B C RET
BB1:
C RET
BB2:
B C RET
tail merging should give you something like:
BB0:
A B # fallthrough
BB1:
C RET
BB2:
B JMP BB1
That's at least my understanding, but I am no expert in the pass. Better do your own experiments/code reading.
I think the comments in this pass could use some improvement. It would be nice to be able to get the gist of what's going on without having to read through all of the code.
Does this pass only remove identical MachineBasicBlocks that end in returns/tail calls? If so, like Matthias said, we might just have a TailMerging bug. If it's more general, then what does this pass do aside from fix the bug in the test?
Also, since this is a new pass, there should be a new test written for it as well as the modification to the test you already included.
(... also I'm apparently also jpaquette in the reviewer list. Apparently I signed up to Phabricator once years ago and forgot entirely. Weird.)
lib/CodeGen/RemoveEqualBB.cpp | ||
---|---|---|
10–11 | I don't really understand this comment. It makes it sound like what you're trying to do is remove any two equal basic blocks in the program. Maybe expand on it a bit and give some motivation? It would make the rest of the code easier to understand. It also might be good to rename the pass for similar reasons; if I saw this as a command line option I might be a little confused. | |
71 | MBB ought to never be null, so you can pass it as a reference. | |
72 | Generally try to avoid use of auto when the type isn't some huge template soup. In more complicated functions, it makes it a bit clearer what's going on. | |
78 | This function should probably have a function explaining what it's doing and the responsibilities of each parameter. Also, a bunch of these could probably be references rather than pointers. That'd make it clear when things can't be null. | |
100–101 | Might seem a tad pedantic, but comments should be full sentences. :) Also, watch out for spelling/grammar errors. | |
106 | There are enough instances of MBB4Replace->getNumber() around here that it might be worth it to just give it its own variable. Say, MBB4ReplaceNumber or something? | |
130 | The code above or below? The stuff below alters stuff in JTI, while this skips Pred if the condition holds. | |
147 | You don't have to comment out asserts. In fact, it's a pretty good idea to use them a lot: | |
165 | This doesn't seem to be used anywhere. | |
167–170 | This can probably all be removed? | |
177 | Maybe "Basic blocks that will not be removed from F by the pass." | |
179–180 | Similarly, "Basic blocks that will no longer be in F at the end of the pass." | |
183 | I'd like it if X and Y were named more descriptively if only to remind me what they are later in the code. Even MBB1 and MBB2 or BB1 and BB2 would be helpful, since at least then I would know I'm talking about two MachineBasicBlocks. http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly | |
202 | Variable names should start with capital letters. I would also give this a better name. Maybe EqualMBBs? http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly | |
242 | Is this only an X86 pass? I would keep target-specific stuff out of the pass. | |
245 | If you aren't using the commented out stuff yet, I'd remove it. | |
265–267 | If there's only one statement in your if, you don't need the braces. | |
293–294 | This seems like it can be removed. |
Sorry, missed this one.
44 kilobytes on llc release (44MB in release) seem a saving really small to justify the complexity of a new pass.
You should really consider leveraging existing pass to do this transformation as already pointed out and even before that, you may want to conduct an analysis on small/medium/large size executables in order to understand what's the number of duplicated basic block and therefore what's the hypothetical gain.
Last but not least, why do you want this transformation ;) ?
Hi All,
Reading the sources of TailMerging Pass I discovered that it has special switch "tail-merge-size" allowing to resolve the issue from loop-serch.ll test. The default value of the switch is 3 but if I change it as 2 then everything works fine.
Because of that I decided to abandon this review :-(
I'm going to investigate the possibility to change the default value. If it is not allowed for any reasons (compile time, target specific requirements, etc.) I'll implement special hook in Target as it's suggested in sources.
Any thoughts about possible default value for "tail-merge-size"? I suppose it could be 1 because we can have a lot of BBs with single "ret" (or "unconditional jump" or ...) instruction and we'd like to merge all such BBs. The compilation time should not increase dramatically (from my point of view) but we need investigations of course.
FuncletLayout ?