The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Details
Diff Detail
Event Timeline
Hi,
D69446 seems to introduce something similar. Are these patches related or is this just coincidence?
Thank you for pointing out this. It could be by coincidence. I have presented this in LLVM Developers' Meeting in 10/22/2019. I didn't realize someone submitted D69446 on 10/25/2019.
In D71027, I have also fixed incorrect logic in maintaining the side-effect information for outlined function at line 1295. This bug shows up during repeated machine outliner.
Please let me know if you have any questions.
Okay, thanks for clarifying.
Would it be possible to separate both patches? The fix can then be reviewed independently from repeated outlining itself.
Sure. I have updated the patch. Please note that the repeating machine outliner may not work correctly without fixing the logic at line 1265.
Thanks! I've commented on D69446 to see if the revision is still active. So I suggest giving this some more time.
Unfortunately no update so far. If it's okay for the other reviewers (@paquette @aschwaighofer) I suggest reviewing @jinlin 's patches.
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
1510–1512 | I think that allowing the number of repeats to be 0 is unintuitive. Would it be possible to output an error when the number of repeats is less than 1 instead of silently changing it? | |
1515–1517 | Can you add some debug output here that says why the outliner stopped running? E.g.
This debug output could then be used in a separate testcase showing that repeated outlining terminates as expected. | |
llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir | ||
65–71 | The IR instructions inside each of the IR functions here should not be necessary. |
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
1510–1512 | Done. | |
1515–1517 | Done. | |
llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir | ||
65–71 | Please let me know how to make it work. Initially I used those instructions to generate the first iteration machine outlining functions. If I simply remove them, it will not work. |
llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir | ||
---|---|---|
65–71 | Now I understand what you mean. I have updated the test based on your suggestions. Thank you. |
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
102 | Could we rename this to something like machine-outline-repeat-counts or machine-outline-runs? The term outlining itself is somewhat ambigious and the prefix machine would be more consistent. | |
103 | Same here: I suggest something like *...to apply machine outlining* |
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
1510 | Why is this indirection necessary? | |
llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir | ||
2 | Could you run this test with a higher number for -outline-repeat-count to verfiy that the outlining stops when no changes are found and nothing changes (beside the name of the outlined functions)? |
llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir | ||
---|---|---|
2 | I have updated the test case to verify that no more machine outlining is performance after the iteration is greater than 2. |
llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir | ||
---|---|---|
2–3 | I think that it would make sense to use different check prefixes for each level of outlining. That would make it easier to show explicitly what happens when you add extra iterations. # RUN: llc -mtriple=aarch64--- -run-pass=prologepilog -run-pass=machine-outliner -machine-outline-runs=2 -verify-machineinstrs %s -o - | FileCheck %s --check-prefix TWO-RUNS # RUN: llc -mtriple=aarch64--- -run-pass=prologepilog -run-pass=machine-outliner -machine-outline-runs=4 -verify-machineinstrs %s -o - | FileCheck %s --check-prefix FOUR-RUNS Also you shouldn't need --run-pass=prologepilog here. | |
114–116 | Can you check which instructions are in these functions? |
Thanks for updating this!
I have a couple nits, but I think this is pretty close to being ready. :)
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
1512 | Grammar nit: "Expect NumRepeat for machine outlining to be greater than or equal to 1" | |
1516 | Nit: pull I into the loop: for (unsigned I = 0; I < NumRepeat; I++) { // ... } | |
1518–1519 | Can we have a small testcase showing that this happens? | |
1524–1525 | Can we have a small testcase showing that this happens? | |
llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir | ||
140–141 | Can you add a comment explaining why these functions are not expected? |
Thank you Jessica and David for reviewing my changes. Appreciate your excellent suggestions.
I have made minor changes in test case llvm/test/CodeGen/AArch64/machine-outliner-iterative.mir.
Testing Time: 128.22s
Expected Passes : 36105 Expected Failures : 163 Unsupported Tests : 336
Could we rename this to something like machine-outline-repeat-counts or machine-outline-runs? The term outlining itself is somewhat ambigious and the prefix machine would be more consistent.