This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Support specialising recursive functions
ClosedPublic

Authored by SjoerdMeijer on Jul 21 2021, 1:08 AM.

Details

Summary

This adds support for specialising recursive functions. For example, consider this program:

int Global = 1;
void recursiveFunc(int *arg) {
  if (*arg < 4) {
    print(*arg);
    int Temp = *arg + 1;
    recursiveFunc(&Temp);
  }
}
void main() {
  recursiveFunc(&Global);
}

after 3 iterations of function specialisation, followed by inlining of the specialised versions of recursiveFunc, the main function looks like this:

void main() {
  print(1);
  print(2);
  print(3);
}

To support this, the following has been added:

  • An update of the solver and the state of the new specialised functions,
  • An optimisation to propagate constant stack values after each iteration of function specialisation, which is necessary for the next iteration to recognise the constant values and trigger.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Jul 21 2021, 1:08 AM
SjoerdMeijer requested review of this revision.Jul 21 2021, 1:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2021, 1:08 AM

The example in the summary looks a little bit scary at first glance. I thought it might specialize 1000 times if the code looked like this:

int Global = 1;
void recursiveFunc(int *arg) {
  if (*arg < 1000) {
    print(*arg);
    int Temp = *arg + 1;
    recursiveFunc(&Temp);
  }
}
void main() {
  recursiveFunc(&Global);
}

And I found that it is controlled by FuncSpecializationMaxIters which is 1 by default.
And my question is: what would be different if we don't change the value for FuncSpecializationMaxIters?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
Line 74: mark this with static.

Line 84: maybe we need a better name, such as getPromotableAlloca.

Thanks for looking at this!
I have inlined my responses to some queries below, and will now address the code comments.

The example in the summary looks a little bit scary at first glance. I thought it might specialize 1000 times if the code looked like this:

int Global = 1;
void recursiveFunc(int *arg) {
  if (*arg < 1000) {
    print(*arg);
    int Temp = *arg + 1;
    recursiveFunc(&Temp);
  }
}
void main() {
  recursiveFunc(&Global);
}

And I found that it is controlled by FuncSpecializationMaxIters which is 1 by default.

Exactly: this is controlled by the option -function-specialization-max-iters, which indeed defaults to 1, but in the new tests you'll see that higher values are used to do more iterations and specialise recursive functions.

And my question is: what would be different if we don't change the value for FuncSpecializationMaxIters?

I have no plan to change the default number of iterations at this point. First, I wanted to get this feature in, so that it can be controlled with this option. Later, when we start looking at enabling the function specialisation pass by default and need to look at compile times again, we can revisit what would be acceptable. Or perhaps we need an "aggressive" option that sets the number of iterations to some value. But yeah, getting function specialisation enabled by default is going to be difficult, and I didn't want to make it even more difficult by considering changing the default at this point.

Addressed comments.

I have no plan to change the default number of iterations at this point.

Yeah, neither do I. If the default behavior doesn't change, we don't need to measure the numbers again, which saves time.

Or perhaps we do need a "aggressive" options, that sets the number of iterations to some value.

This is the problem. Since the number of recursive functions that get specialized may increase with FuncSpecializationMaxIters, it may be scary to users (maybe other compiler engineers tuning performance). I think the problem comes from there being no restriction between iterations.
One solution may be to add a structure like Set<Set<Function*>> to record the functions specialized, and a structure Set<Function*, Argument*> to record the functions specialized for a specific argument. Then we could give a higher penalty to the recursive cases for the same argument. The design of the data structures may not be the best, but I think it states my concern. Simply put, I don't want the number of specialized functions to grow linearly and without bound as FuncSpecializationMaxIters increases.

I am happy to look into that. Shall we do that as a follow up? This gives us the baseline that we can experiment with and get numbers for, then see if we can improve that.

I would prefer to implement it in this diff, since I think it is better to make the patch self-contained. We could still experiment with this patch without troubling others.

Okay, if we want to do this here, then we need clarity on the exact problem, which is lacking at the moment. You mentioned "scary" a few times, but there's nothing scary here. ;-) First of all, this is an extra option that people need to buy into, so changes won't bother anyone.

About the suspected problem:

I think the problem may come from there is no restriction between iterations.

I don't think this is the case. In subsequent iterations, exactly the same cost model and heuristics are applied. Thus, I expect that if a function was not specialised before, it won't be specialised in a later iteration, except for recursive functions, which need to be triggered by a very specific optimisation between iterations.

Thus, without exact clarity on the problem, this feels like premature optimisation to me, so we can do two things:

  • I guess it's up to me to show with performance numbers whether there is a problem or not. Before doing that, I would appreciate it if you could elaborate on the suspected problems (just to make sure I understand them).
  • If you're happy with my explanation thus far, and given that this is opt-in anyway, we can defer the compile-time analysis that I need to do anyway when I look into enabling function specialisation by default.

First of all, this is an extra option that people need to buy into, so changes won't bother anyone.

Agreed. But that's not a good reason to do things arbitrarily just because it is not the default behavior.

About the suspected problem:

I think the problem may come from there is no restriction between iterations.

I don't think this is the case. In subsequent iterations, exactly the same cost model and heuristics are applied. Thus, I expect that if a function was not specialised before, it won't be specialised in a later iteration, except for recursive functions, which need to be triggered by a very specific optimisation between iterations.

Yeah, the recursive functions are the problem.

Okay, if we want to do this here, then we need clarity on the exact problem,

Let me give a more formal description.
(1) Call the set of functions newly generated in the i-th iteration Fs[i], with Fs[0] the initial set of functions. The number of functions generated in the i-th iteration is then |Fs[i]|.
(2) The penalty to specialize a function would be Penalty(F) * NumSpecialized[i], where NumSpecialized[i] = |Fs[1]| + |Fs[2]| + ... + |Fs[i]|.
(3) The bonus to specialize a function would be Bonus(F, ArgNo). For a normal recursive function F and its specialized version SF, it is normal that Bonus(F, ArgNo) == Bonus(SF, ArgNo) and Penalty(F) == Penalty(SF). This means SF may be specialized again if NumSpecialized[i] is not yet large enough.

From this we can see that the total number of specialized functions is controlled by NumSpecialized, which increases linearly.
But here is the problem: as implemented, Bonus(F, ArgNo) increases exponentially with the depth of loops. So we may get into trouble if we meet a recursive function with deeply nested loops.
What we want to do is add the iteration count i to the cost model, i.e. Penalty(F, i).

For example,

; opt -function-specialization -func-specialization-max-iters=100  -S %s
@Global = internal constant i32 1, align 4

define internal void @recursiveFunc(i32* nocapture readonly %arg) {
  %temp = alloca i32, align 4
  %arg.load = load i32, i32* %arg, align 4
  %arg.cmp = icmp slt i32 %arg.load, 10000
  br i1 %arg.cmp, label %loop1, label %ret.block

loop1:
  br label %loop2

loop2:
  br label %loop3

loop3:
  br label %loop4

loop4:
  br label %block6

block6:
  call void @print_val(i32 %arg.load)
  %arg.add = add nsw i32 %arg.load, 1
  store i32 %arg.add, i32* %temp, align 4
  call void @recursiveFunc(i32* nonnull %temp)
  br label %loop4.end

loop4.end:
  %exit_cond1 = call i1 @exit_cond()
  br i1 %exit_cond1, label %loop4, label %loop3.end

loop3.end:
  %exit_cond2 = call i1 @exit_cond()
  br i1 %exit_cond2, label %loop3, label %loop2.end

loop2.end:
  %exit_cond3 = call i1 @exit_cond()
  br i1 %exit_cond3, label %loop2, label %loop1.end

loop1.end:
  %exit_cond4 = call i1 @exit_cond()
  br i1 %exit_cond4, label %loop1, label %ret.block

ret.block:
  ret void
}

define i32 @main() {
  call void @recursiveFunc(i32* nonnull @Global)
  ret i32 0
}

declare dso_local void @print_val(i32)
declare dso_local i1 @exit_cond()

I guess I would be happy if recursiveFunc got specialized fewer than 4 times, even when we set func-specialization-max-iters to 100.

I guess it's up to me to show with performance numbers if there is a problem

I prefer to analyse problems from the cost-model side instead of running large workloads every time, since there are too many parameters and the patterns are very complex, so problems in the code can easily be missed. Although fixing bugs is very common, I think it is better to avoid problems once we have noticed them.

Thanks for elaborating on this!
See my comments inline.

Let me give a more formal description.
(1) Call the set of functions newly generated in the i-th iteration Fs[i], with Fs[0] the initial set of functions. The number of functions generated in the i-th iteration is then |Fs[i]|.
(2) The penalty to specialize a function would be Penalty(F) * NumSpecialized[i], where NumSpecialized[i] = |Fs[1]| + |Fs[2]| + ... + |Fs[i]|.
(3) The bonus to specialize a function would be Bonus(F, ArgNo). For a normal recursive function F and its specialized version SF, it is normal that Bonus(F, ArgNo) == Bonus(SF, ArgNo) and Penalty(F) == Penalty(SF). This means SF may be specialized again if NumSpecialized[i] is not yet large enough.

From this we can see that the total number of specialized functions is controlled by NumSpecialized, which increases linearly.

100% agreed so far. This is indeed how things work at the moment to recursively/linearly specialise recursive functions.

But here is the problem: as implemented, Bonus(F, ArgNo) increases exponentially with the depth of loops. So we may get into trouble if we meet a recursive function with deeply nested loops.
What we want to do is add the iteration count i to the cost model, i.e. Penalty(F, i).

Ok, thanks, I am going to look into this!

For example (the same IR module as quoted above):

I guess I would be happy if recursiveFunc got specialized fewer than 4 times, even when we set func-specialization-max-iters to 100.

I am looking into this now. But now that we have cleared up the problem in previous messages, i.e. we are on the same page about how things work for recursive functions, I think we need to discuss what the cost-model changes would look like, because in this example the numbers are arbitrary and a rationale is missing:

I guess I would be happy if recursiveFunc got specialized fewer than 4 times, even when we set func-specialization-max-iters to 100.

Why 4 times? In other words, what is the metric here? Is that code-size or something else?

4 times is an arbitrary number. I thought the best would be only 1 time, but I worried that would be too strict, so I said 4 times. Now I think we should be more strict, so it would be better to specialize it only once.

how things work for recursive functions, I think we need to discuss what the cost-model changes would look like

I think we need to handle recursive functions specially, since it is common that a recursive function f and its specialized version f' have the same cost and bonus. We need to record which functions have been specialized (and for which argument). Then, when we are about to specialize a function for a specific argument we have already specialized, we could forbid it or give it a much higher penalty. The details are missing, but the key point is that we need to handle recursive functions specially instead of tuning or refactoring the cost model.

4 times is an arbitrary number. I thought the best would be only 1 time, but I worried that would be too strict, so I said 4 times. Now I think we should be more strict, so it would be better to specialize it only once.

I can't follow any of this. There are only arbitrary numbers here; I can't follow the decision making because no rationale is given.
Taking your example, specialising 4 times, and after cleanup passes like inline, instcombine, and simplifycfg, this gives exactly the code I would expect.
And that's the problem with this discussion: it is based on an idea, not on any data that we can verify. For *this* example, I don't see which high-level criterion is the problem:

  • compile-times,
  • code-size,
  • performance.

how things work for recursive functions, I think we need to discuss what the cost-model changes would look like

I think we need to handle recursive functions specially, since it is common that a recursive function f and its specialized version f' have the same cost and bonus. We need to record which functions have been specialized (and for which argument). Then, when we are about to specialize a function for a specific argument we have already specialized, we could forbid it or give it a much higher penalty. The details are missing, but the key point is that we need to handle recursive functions specially instead of tuning or refactoring the cost model.

To help this discussion and make it concrete, can you take your own example and point out exactly on which of the code-generation criteria I mentioned before you would like to specialise this only once?

The problem I have with this discussion is that it is based on an idea that I cannot verify with data. While I understand the general idea, and implementing a penalty is easy, I simply don't know what that penalty should be. I could put a random/arbitrary number in, but why would we do that? That's why I still prefer the approach of getting this foundation in place, because it is opt-in, and then we can tune it further.

Hi, for this example:

(the same IR module as quoted above)

If we run bin/opt -function-specialization -func-specialization-max-iters=10 -S %s, we find that recursiveFunc gets specialized 18 times. I believe we agree that recursiveFunc shouldn't be specialized that many times.
Then I ran bin/opt -function-specialization -func-specialization-max-iters=10 -inline -instcombine -simplifycfg -S %s and got:

; ModuleID = 'RecursiveFunc.ll'
source_filename = "RecursiveFunc.ll"

@Global = internal constant i32 1, align 4
@funcspec.arg = internal constant i32 2
@funcspec.arg.3 = internal constant i32 3
@funcspec.arg.5 = internal constant i32 4
@funcspec.arg.7 = internal constant i32 5
@funcspec.arg.9 = internal constant i32 6
@funcspec.arg.11 = internal constant i32 7
@funcspec.arg.13 = internal constant i32 8
@funcspec.arg.15 = internal constant i32 9
@funcspec.arg.17 = internal constant i32 10
@funcspec.arg.19 = internal constant i32 11

define internal void @recursiveFunc(i32* nocapture readonly %arg) {
  %temp = alloca i32, align 4
  %arg.load = load i32, i32* %arg, align 4
  %arg.cmp = icmp slt i32 %arg.load, 10000
  br i1 %arg.cmp, label %loop1, label %ret.block

loop1:                                            ; preds = %loop1.end, %0
  br label %loop2

loop2:                                            ; preds = %loop2.end, %loop1
  br label %loop3

loop3:                                            ; preds = %loop3.end, %loop2
  br label %loop4

loop4:                                            ; preds = %loop4, %loop3
  call void @print_val(i32 %arg.load)
  %arg.add = add nsw i32 %arg.load, 1
  store i32 %arg.add, i32* %temp, align 4
  call void @recursiveFunc(i32* nonnull %temp)
  %exit_cond1 = call i1 @exit_cond()
  br i1 %exit_cond1, label %loop4, label %loop3.end

loop3.end:                                        ; preds = %loop4
  %exit_cond2 = call i1 @exit_cond()
  br i1 %exit_cond2, label %loop3, label %loop2.end

loop2.end:                                        ; preds = %loop3.end
  %exit_cond3 = call i1 @exit_cond()
  br i1 %exit_cond3, label %loop2, label %loop1.end

loop1.end:                                        ; preds = %loop2.end
  %exit_cond4 = call i1 @exit_cond()
  br i1 %exit_cond4, label %loop1, label %ret.block

ret.block:                                        ; preds = %loop1.end, %0
  ret void
}

define i32 @main() {
  br label %loop1.i

loop1.i:                                          ; preds = %loop1.end.i, %0
  br label %loop2.i

loop2.i:                                          ; preds = %loop2.end.i, %loop1.i
  br label %loop3.i

loop3.i:                                          ; preds = %loop3.end.i, %loop2.i
  br label %loop4.i

loop4.i:                                          ; preds = %recursiveFunc.2.exit.i, %loop3.i
  call void @print_val(i32 1)
  br label %loop1.i.i

loop1.i.i:                                        ; preds = %loop1.end.i.i, %loop4.i
  br label %loop2.i.i

loop2.i.i:                                        ; preds = %loop2.end.i.i, %loop1.i.i
  br label %loop3.i.i

loop3.i.i:                                        ; preds = %loop3.end.i.i, %loop2.i.i
  br label %loop4.i.i

loop4.i.i:                                        ; preds = %recursiveFunc.4.exit.i.i, %loop3.i.i
  call void @print_val(i32 2)
  br label %loop1.i.i.i

loop1.i.i.i:                                      ; preds = %loop1.end.i.i.i, %loop4.i.i
  br label %loop2.i.i.i

loop2.i.i.i:                                      ; preds = %loop2.end.i.i.i, %loop1.i.i.i
  br label %loop3.i.i.i

loop3.i.i.i:                                      ; preds = %loop3.end.i.i.i, %loop2.i.i.i
  br label %loop4.i.i.i

loop4.i.i.i:                                      ; preds = %recursiveFunc.6.exit.i.i.i, %loop3.i.i.i
  call void @print_val(i32 3)
  br label %loop1.i.i.i.i

loop1.i.i.i.i:                                    ; preds = %loop1.end.i.i.i.i, %loop4.i.i.i
  br label %loop2.i.i.i.i

loop2.i.i.i.i:                                    ; preds = %loop2.end.i.i.i.i, %loop1.i.i.i.i
  br label %loop3.i.i.i.i

loop3.i.i.i.i:                                    ; preds = %loop3.end.i.i.i.i, %loop2.i.i.i.i
  br label %loop4.i.i.i.i

loop4.i.i.i.i:                                    ; preds = %recursiveFunc.8.exit.i.i.i.i, %loop3.i.i.i.i
  call void @print_val(i32 4)
  br label %loop1.i.i.i.i.i

loop1.i.i.i.i.i:                                  ; preds = %loop1.end.i.i.i.i.i, %loop4.i.i.i.i
  br label %loop2.i.i.i.i.i

loop2.i.i.i.i.i:                                  ; preds = %loop2.end.i.i.i.i.i, %loop1.i.i.i.i.i
  br label %loop3.i.i.i.i.i

loop3.i.i.i.i.i:                                  ; preds = %loop3.end.i.i.i.i.i, %loop2.i.i.i.i.i
  br label %loop4.i.i.i.i.i

loop4.i.i.i.i.i:                                  ; preds = %recursiveFunc.10.exit.i.i.i.i.i, %loop3.i.i.i.i.i
  call void @print_val(i32 5)
  br label %loop1.i.i.i.i.i.i

loop1.i.i.i.i.i.i:                                ; preds = %loop1.end.i.i.i.i.i.i, %loop4.i.i.i.i.i
  br label %loop2.i.i.i.i.i.i

loop2.i.i.i.i.i.i:                                ; preds = %loop2.end.i.i.i.i.i.i, %loop1.i.i.i.i.i.i
  br label %loop3.i.i.i.i.i.i

loop3.i.i.i.i.i.i:                                ; preds = %loop3.end.i.i.i.i.i.i, %loop2.i.i.i.i.i.i
  br label %loop4.i.i.i.i.i.i

loop4.i.i.i.i.i.i:                                ; preds = %recursiveFunc.12.exit.i.i.i.i.i.i, %loop3.i.i.i.i.i.i
  call void @print_val(i32 6)
  br label %loop1.i.i.i.i.i.i.i

loop1.i.i.i.i.i.i.i:                              ; preds = %loop1.end.i.i.i.i.i.i.i, %loop4.i.i.i.i.i.i
  br label %loop2.i.i.i.i.i.i.i

loop2.i.i.i.i.i.i.i:                              ; preds = %loop2.end.i.i.i.i.i.i.i, %loop1.i.i.i.i.i.i.i
  br label %loop3.i.i.i.i.i.i.i

loop3.i.i.i.i.i.i.i:                              ; preds = %loop3.end.i.i.i.i.i.i.i, %loop2.i.i.i.i.i.i.i
  br label %loop4.i.i.i.i.i.i.i

loop4.i.i.i.i.i.i.i:                              ; preds = %recursiveFunc.14.exit.i.i.i.i.i.i.i, %loop3.i.i.i.i.i.i.i
  call void @print_val(i32 7)
  br label %loop1.i.i.i.i.i.i.i.i

loop1.i.i.i.i.i.i.i.i:                            ; preds = %loop1.end.i.i.i.i.i.i.i.i, %loop4.i.i.i.i.i.i.i
  br label %loop2.i.i.i.i.i.i.i.i

loop2.i.i.i.i.i.i.i.i:                            ; preds = %loop2.end.i.i.i.i.i.i.i.i, %loop1.i.i.i.i.i.i.i.i
  br label %loop3.i.i.i.i.i.i.i.i

loop3.i.i.i.i.i.i.i.i:                            ; preds = %loop3.end.i.i.i.i.i.i.i.i, %loop2.i.i.i.i.i.i.i.i
  br label %loop4.i.i.i.i.i.i.i.i

loop4.i.i.i.i.i.i.i.i:                            ; preds = %recursiveFunc.16.exit.i.i.i.i.i.i.i.i, %loop3.i.i.i.i.i.i.i.i
  call void @print_val(i32 8)
  br label %loop1.i.i.i.i.i.i.i.i.i

loop1.i.i.i.i.i.i.i.i.i:                          ; preds = %loop1.end.i.i.i.i.i.i.i.i.i, %loop4.i.i.i.i.i.i.i.i
  br label %loop2.i.i.i.i.i.i.i.i.i

loop2.i.i.i.i.i.i.i.i.i:                          ; preds = %loop2.end.i.i.i.i.i.i.i.i.i, %loop1.i.i.i.i.i.i.i.i.i
  br label %loop3.i.i.i.i.i.i.i.i.i

loop3.i.i.i.i.i.i.i.i.i:                          ; preds = %loop3.end.i.i.i.i.i.i.i.i.i, %loop2.i.i.i.i.i.i.i.i.i
  br label %loop4.i.i.i.i.i.i.i.i.i

loop4.i.i.i.i.i.i.i.i.i:                          ; preds = %recursiveFunc.18.exit.i.i.i.i.i.i.i.i.i, %loop3.i.i.i.i.i.i.i.i.i
  call void @print_val(i32 9)
  br label %loop1.i.i.i.i.i.i.i.i.i.i

loop1.i.i.i.i.i.i.i.i.i.i:                        ; preds = %loop1.end.i.i.i.i.i.i.i.i.i.i, %loop4.i.i.i.i.i.i.i.i.i
  br label %loop2.i.i.i.i.i.i.i.i.i.i

loop2.i.i.i.i.i.i.i.i.i.i:                        ; preds = %loop2.end.i.i.i.i.i.i.i.i.i.i, %loop1.i.i.i.i.i.i.i.i.i.i
  br label %loop3.i.i.i.i.i.i.i.i.i.i

loop3.i.i.i.i.i.i.i.i.i.i:                        ; preds = %loop3.end.i.i.i.i.i.i.i.i.i.i, %loop2.i.i.i.i.i.i.i.i.i.i
  br label %loop4.i.i.i.i.i.i.i.i.i.i

loop4.i.i.i.i.i.i.i.i.i.i:                        ; preds = %loop4.i.i.i.i.i.i.i.i.i.i, %loop3.i.i.i.i.i.i.i.i.i.i
  call void @print_val(i32 10)
  call void @recursiveFunc(i32* nonnull @funcspec.arg.19)
  %exit_cond1.i.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i.i.i.i.i.i.i.i, label %loop4.i.i.i.i.i.i.i.i.i.i, label %loop3.end.i.i.i.i.i.i.i.i.i.i

loop3.end.i.i.i.i.i.i.i.i.i.i:                    ; preds = %loop4.i.i.i.i.i.i.i.i.i.i
  %exit_cond2.i.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i.i.i.i.i.i.i.i, label %loop3.i.i.i.i.i.i.i.i.i.i, label %loop2.end.i.i.i.i.i.i.i.i.i.i

loop2.end.i.i.i.i.i.i.i.i.i.i:                    ; preds = %loop3.end.i.i.i.i.i.i.i.i.i.i
  %exit_cond3.i.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i.i.i.i.i.i.i.i, label %loop2.i.i.i.i.i.i.i.i.i.i, label %loop1.end.i.i.i.i.i.i.i.i.i.i

loop1.end.i.i.i.i.i.i.i.i.i.i:                    ; preds = %loop2.end.i.i.i.i.i.i.i.i.i.i
  %exit_cond4.i.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i.i.i.i.i.i.i.i, label %loop1.i.i.i.i.i.i.i.i.i.i, label %recursiveFunc.18.exit.i.i.i.i.i.i.i.i.i

recursiveFunc.18.exit.i.i.i.i.i.i.i.i.i:          ; preds = %loop1.end.i.i.i.i.i.i.i.i.i.i
  %exit_cond1.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i.i.i.i.i.i.i, label %loop4.i.i.i.i.i.i.i.i.i, label %loop3.end.i.i.i.i.i.i.i.i.i

loop3.end.i.i.i.i.i.i.i.i.i:                      ; preds = %recursiveFunc.18.exit.i.i.i.i.i.i.i.i.i
  %exit_cond2.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i.i.i.i.i.i.i, label %loop3.i.i.i.i.i.i.i.i.i, label %loop2.end.i.i.i.i.i.i.i.i.i

loop2.end.i.i.i.i.i.i.i.i.i:                      ; preds = %loop3.end.i.i.i.i.i.i.i.i.i
  %exit_cond3.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i.i.i.i.i.i.i, label %loop2.i.i.i.i.i.i.i.i.i, label %loop1.end.i.i.i.i.i.i.i.i.i

loop1.end.i.i.i.i.i.i.i.i.i:                      ; preds = %loop2.end.i.i.i.i.i.i.i.i.i
  %exit_cond4.i.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i.i.i.i.i.i.i, label %loop1.i.i.i.i.i.i.i.i.i, label %recursiveFunc.16.exit.i.i.i.i.i.i.i.i

recursiveFunc.16.exit.i.i.i.i.i.i.i.i:            ; preds = %loop1.end.i.i.i.i.i.i.i.i.i
  %exit_cond1.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i.i.i.i.i.i, label %loop4.i.i.i.i.i.i.i.i, label %loop3.end.i.i.i.i.i.i.i.i

loop3.end.i.i.i.i.i.i.i.i:                        ; preds = %recursiveFunc.16.exit.i.i.i.i.i.i.i.i
  %exit_cond2.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i.i.i.i.i.i, label %loop3.i.i.i.i.i.i.i.i, label %loop2.end.i.i.i.i.i.i.i.i

loop2.end.i.i.i.i.i.i.i.i:                        ; preds = %loop3.end.i.i.i.i.i.i.i.i
  %exit_cond3.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i.i.i.i.i.i, label %loop2.i.i.i.i.i.i.i.i, label %loop1.end.i.i.i.i.i.i.i.i

loop1.end.i.i.i.i.i.i.i.i:                        ; preds = %loop2.end.i.i.i.i.i.i.i.i
  %exit_cond4.i.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i.i.i.i.i.i, label %loop1.i.i.i.i.i.i.i.i, label %recursiveFunc.14.exit.i.i.i.i.i.i.i

recursiveFunc.14.exit.i.i.i.i.i.i.i:              ; preds = %loop1.end.i.i.i.i.i.i.i.i
  %exit_cond1.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i.i.i.i.i, label %loop4.i.i.i.i.i.i.i, label %loop3.end.i.i.i.i.i.i.i

loop3.end.i.i.i.i.i.i.i:                          ; preds = %recursiveFunc.14.exit.i.i.i.i.i.i.i
  %exit_cond2.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i.i.i.i.i, label %loop3.i.i.i.i.i.i.i, label %loop2.end.i.i.i.i.i.i.i

loop2.end.i.i.i.i.i.i.i:                          ; preds = %loop3.end.i.i.i.i.i.i.i
  %exit_cond3.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i.i.i.i.i, label %loop2.i.i.i.i.i.i.i, label %loop1.end.i.i.i.i.i.i.i

loop1.end.i.i.i.i.i.i.i:                          ; preds = %loop2.end.i.i.i.i.i.i.i
  %exit_cond4.i.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i.i.i.i.i, label %loop1.i.i.i.i.i.i.i, label %recursiveFunc.12.exit.i.i.i.i.i.i

recursiveFunc.12.exit.i.i.i.i.i.i:                ; preds = %loop1.end.i.i.i.i.i.i.i
  %exit_cond1.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i.i.i.i, label %loop4.i.i.i.i.i.i, label %loop3.end.i.i.i.i.i.i

loop3.end.i.i.i.i.i.i:                            ; preds = %recursiveFunc.12.exit.i.i.i.i.i.i
  %exit_cond2.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i.i.i.i, label %loop3.i.i.i.i.i.i, label %loop2.end.i.i.i.i.i.i

loop2.end.i.i.i.i.i.i:                            ; preds = %loop3.end.i.i.i.i.i.i
  %exit_cond3.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i.i.i.i, label %loop2.i.i.i.i.i.i, label %loop1.end.i.i.i.i.i.i

loop1.end.i.i.i.i.i.i:                            ; preds = %loop2.end.i.i.i.i.i.i
  %exit_cond4.i.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i.i.i.i, label %loop1.i.i.i.i.i.i, label %recursiveFunc.10.exit.i.i.i.i.i

recursiveFunc.10.exit.i.i.i.i.i:                  ; preds = %loop1.end.i.i.i.i.i.i
  %exit_cond1.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i.i.i, label %loop4.i.i.i.i.i, label %loop3.end.i.i.i.i.i

loop3.end.i.i.i.i.i:                              ; preds = %recursiveFunc.10.exit.i.i.i.i.i
  %exit_cond2.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i.i.i, label %loop3.i.i.i.i.i, label %loop2.end.i.i.i.i.i

loop2.end.i.i.i.i.i:                              ; preds = %loop3.end.i.i.i.i.i
  %exit_cond3.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i.i.i, label %loop2.i.i.i.i.i, label %loop1.end.i.i.i.i.i

loop1.end.i.i.i.i.i:                              ; preds = %loop2.end.i.i.i.i.i
  %exit_cond4.i.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i.i.i, label %loop1.i.i.i.i.i, label %recursiveFunc.8.exit.i.i.i.i

recursiveFunc.8.exit.i.i.i.i:                     ; preds = %loop1.end.i.i.i.i.i
  %exit_cond1.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i.i, label %loop4.i.i.i.i, label %loop3.end.i.i.i.i

loop3.end.i.i.i.i:                                ; preds = %recursiveFunc.8.exit.i.i.i.i
  %exit_cond2.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i.i, label %loop3.i.i.i.i, label %loop2.end.i.i.i.i

loop2.end.i.i.i.i:                                ; preds = %loop3.end.i.i.i.i
  %exit_cond3.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i.i, label %loop2.i.i.i.i, label %loop1.end.i.i.i.i

loop1.end.i.i.i.i:                                ; preds = %loop2.end.i.i.i.i
  %exit_cond4.i.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i.i, label %loop1.i.i.i.i, label %recursiveFunc.6.exit.i.i.i

recursiveFunc.6.exit.i.i.i:                       ; preds = %loop1.end.i.i.i.i
  %exit_cond1.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i.i, label %loop4.i.i.i, label %loop3.end.i.i.i

loop3.end.i.i.i:                                  ; preds = %recursiveFunc.6.exit.i.i.i
  %exit_cond2.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i.i, label %loop3.i.i.i, label %loop2.end.i.i.i

loop2.end.i.i.i:                                  ; preds = %loop3.end.i.i.i
  %exit_cond3.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i.i, label %loop2.i.i.i, label %loop1.end.i.i.i

loop1.end.i.i.i:                                  ; preds = %loop2.end.i.i.i
  %exit_cond4.i.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i.i, label %loop1.i.i.i, label %recursiveFunc.4.exit.i.i

recursiveFunc.4.exit.i.i:                         ; preds = %loop1.end.i.i.i
  %exit_cond1.i.i = call i1 @exit_cond()
  br i1 %exit_cond1.i.i, label %loop4.i.i, label %loop3.end.i.i

loop3.end.i.i:                                    ; preds = %recursiveFunc.4.exit.i.i
  %exit_cond2.i.i = call i1 @exit_cond()
  br i1 %exit_cond2.i.i, label %loop3.i.i, label %loop2.end.i.i

loop2.end.i.i:                                    ; preds = %loop3.end.i.i
  %exit_cond3.i.i = call i1 @exit_cond()
  br i1 %exit_cond3.i.i, label %loop2.i.i, label %loop1.end.i.i

loop1.end.i.i:                                    ; preds = %loop2.end.i.i
  %exit_cond4.i.i = call i1 @exit_cond()
  br i1 %exit_cond4.i.i, label %loop1.i.i, label %recursiveFunc.2.exit.i

recursiveFunc.2.exit.i:                           ; preds = %loop1.end.i.i
  %exit_cond1.i = call i1 @exit_cond()
  br i1 %exit_cond1.i, label %loop4.i, label %loop3.end.i

loop3.end.i:                                      ; preds = %recursiveFunc.2.exit.i
  %exit_cond2.i = call i1 @exit_cond()
  br i1 %exit_cond2.i, label %loop3.i, label %loop2.end.i

loop2.end.i:                                      ; preds = %loop3.end.i
  %exit_cond3.i = call i1 @exit_cond()
  br i1 %exit_cond3.i, label %loop2.i, label %loop1.end.i

loop1.end.i:                                      ; preds = %loop2.end.i
  %exit_cond4.i = call i1 @exit_cond()
  br i1 %exit_cond4.i, label %loop1.i, label %recursiveFunc.1.exit

recursiveFunc.1.exit:                             ; preds = %loop1.end.i
  ret i32 0
}

declare dso_local void @print_val(i32)

declare dso_local i1 @exit_cond()

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #0

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

hmmmm, I guess it must not be your expected code, right?

For *this* example, I don't see which high level criteria is the problem:
compile-times,
code-size,
performance.

I think all of them would be affected. The impact on code-size is obvious. For the compile-time, it would take a long time to compile if we set the argument func-specialization-max-iters to 2000.

That's why I still prefer the approach of getting this foundation in place, because it is opt-in, then we can tune this further.

I prefer to use a small program for analysis. My workflow to optimize would be:

Run big workloads -> analyse the hot part -> extract a small example from the hot part -> optimize for the hot part -> prove the effects with the big workloads.

It is hard to optimize big workloads directly since many details are missing, so I think it is better to talk about a small example if we can. Generally it takes a lot of time to extract a small program from a workload and find the insight, but if we can talk about the small example directly, it is much better.

I can't follow any of this. There are only arbitrary numbers here. I can't follow the decision making behind this, because no rationale is given.

My bad. I mean we shouldn't specialize recursiveFunc an arbitrary number of times. The limit number may need more discussion.

If we run bin/opt -function-specialization -func-specialization-max-iters=10 -S %s, we could find that recursiveFunc would get specialized 18 times.

This is not true. It will get specialised 10 times.

Then I run bin/opt -function-specialization -func-specialization-max-iters=10 -inline -instcombine -simplifycfg -S %s, I got:
<SNIP>
hmmmm, I guess it must not be your expected code, right?

Hmmm, sorry, this is exactly what I expect. Look at the generated code:

_main:                                  ; @main
      stp     x20, x19, [sp, #-32]!           ; 16-byte Folded Spill
      stp     x29, x30, [sp, #16]             ; 16-byte Folded Spill
      adrp    x19, _funcspec.arg.17@PAGE
      add     x19, x19, _funcspec.arg.17@PAGEOFF
LBB1_1:
      mov     w0, #1
      bl      _print_val
LBB1_2:
      mov     w0, #2
      bl      _print_val
LBB1_3:
      mov     w0, #3
      bl      _print_val
LBB1_4:
      mov     w0, #4
      bl      _print_val
LBB1_5:
      mov     w0, #5
      bl      _print_val
LBB1_6:
      mov     w0, #6
      bl      _print_val
LBB1_7:
      mov     w0, #7
      bl      _print_val
LBB1_8:
      mov     w0, #8
      bl      _print_val
LBB1_9:
      mov     w0, #9
      bl      _print_val
      mov     x0, x19
      bl      _recursiveFunc
      bl      _exit_cond
      tbnz    w0, #0, LBB1_9
      bl      _exit_cond
      tbnz    w0, #0, LBB1_9
      bl      _exit_cond
      tbnz    w0, #0, LBB1_9
      bl      _exit_cond
      tbnz    w0, #0, LBB1_9
      bl      _exit_cond
      tbnz    w0, #0, LBB1_8
      bl      _exit_cond
      tbnz    w0, #0, LBB1_8
      bl      _exit_cond
      tbnz    w0, #0, LBB1_8
      bl      _exit_cond
      tbnz    w0, #0, LBB1_8
      bl      _exit_cond
      tbnz    w0, #0, LBB1_8
      bl      _exit_cond
      tbnz    w0, #0, LBB1_7
      bl      _exit_cond
      tbnz    w0, #0, LBB1_7
      bl      _exit_cond
      tbnz    w0, #0, LBB1_7
      bl      _exit_cond
      tbnz    w0, #0, LBB1_7
      bl      _exit_cond
      <SNIP: a few more>

That seems fairly optimal to me, so I don't know why we wouldn't be doing this, or what we would need to do differently.

I think all of them would be affected. The impact on code-size is obvious. For the compile-time, it would take a long time to compile if we set the argument func-specialization-max-iters to 2000.

I am sorry, but this is an unreasonable number. There is no point in discussing such a big number, because the default will never be even close to that, and if you do want to set it very high, then that's opt-in and your responsibility.

If we run bin/opt -function-specialization -func-specialization-max-iters=10 -S %s, we could find that recursiveFunc would get specialized 18 times.

This is not true. It will get specialised 10 times.

Sorry, I counted the suffix number directly. But it still doesn't make sense to me: recursiveFunc shouldn't be specialized so many times. I guess we are in consensus on this.

Then I run bin/opt -function-specialization -func-specialization-max-iters=10 -inline -instcombine -simplifycfg -S %s, I got:
<SNIP>
hmmmm, I guess it must not be your expected code, right?

Hmmm, sorry, this is exactly what I expect. Look at the generated code:

<SNIP: same generated code as quoted above>

That seems fairly optimal to me, so I don't know why we wouldn't be doing this, or what we would need to do differently.

hmmmm, the generated code does not look good enough to me. There are too many redundancies in the main function.
The key point I want to make is that we shouldn't specialize recursiveFunc so many times. If you feel it is OK, then how about setting func-specialization-max-iters to 20 or 100? It should be easy to agree that there should be a limit for recursive functions.

I think all of them would be affected. The impact on code-size is obvious. For the compile-time, it would take a long time to compile if we set the argument func-specialization-max-iters to 2000.

I am sorry, but this is an unreasonable number. There is no point in discussing such a big number, because the default will never be even close to that, and if you do want to set it very high, then that's opt-in and your responsibility.

I just want to note that the compile time increases significantly with the parameter func-specialization-max-iters.
For the above example, I got the following results by using opt to compile
(compile options: -function-specialization -func-specialization-max-iters=<N> -inline -instcombine -simplifycfg):

func-specialization-max-iters | compile time (seconds)
                            1 | 0.0096
                           10 | 0.1253
                           20 | 0.4127
                           50 | 2.4648
                          100 | 5.9211

hmm, that does not look good.


Since the discussion involves different perspectives on reviewing a patch, it may be helpful to hear the opinions of other reviewers: @fhahn @jaykang10 @chill

Matt added a subscriber: Matt.Jul 28 2021, 2:57 PM

Looks like we agree after all! :-) Because I fully agree with this:

The key point I want to make is that we shouldn't specialize recursiveFunc so many times. If you feel it is OK, then how about setting func-specialization-max-iters to 20 or 100?

An early version of the function specialisation pass had a so-called "aggressive mode". This meant that it was (still only) doing 10 iterations. That's roughly what I envision it could be. But to be clear, I expect it to be a (very) low number, simply because when we start to specialise more, it's going to cost more (compile-time). There's indeed no way around that, and that's why we agree on this. The cost of specialising more is not a problem with the implementation of this patch (but again, simply because of the resulting additional functions/instructions).

About what the value of -func-specialization-max-iters= should be exactly, I don't know yet. I would like to defer that question to some point later, when I pick up enabling function specialisation. I would like to see if we can get it enabled first with -func-specialization-max-iters=1, and afterwards see if we can increase it, or whether it needs to remain opt-in.

Like I said, I think we agree. That's why I would like to see first if we can finish this review together, which I think would be a nicer result than getting other people involved at this point.

It looks like the different opinions come from different perspectives. I prefer to look into the details.

Or let me ask another question: do you think it matters that recursive functions get specialized many times when func-specialization-max-iters increases? If yes, I think I could fix it in successive patches. It shouldn't be too hard.

Apart from this, the overall patch looks good to me. If we can get consensus on the above question, I will look into the implementation details.

So yes, I think we also agree on this. Let me summarise this then:

  • At the moment the number of specialised recursive functions grows linearly by increasing func-specialization-max-iters. But that's by design, to support recursive functions.
  • And that obviously comes at a compile-time cost. That's why, realistically, in its current form func-specialization-max-iters will never be a big number (say e.g. more than 20).
  • Since compile-time is directly controlled by func-specialization-max-iters, that will remain at 1 for the moment. Users can opt-in and pay the cost if they want. Our ambition is to set it to something higher, but that depends on the next steps.

Next, and crucially, to address your points:

  • 100% agreed of course: if we can reduce compile times, or come up with better cost-modelling, we should do that!
  • I think it will be very easy to add that as a follow up. For example, passing an iteration number to the bonus/penalty calculation function is trivial. We just need to know what exactly the changes to the cost-model should be. I think this data is missing at the moment, see the next point.
  • The 10 iterations for the "aggressive mode" was not a coincidence: this was chosen to completely get rid of a recursive function in exchange2 from SPEC. When we start looking into recursive functions and increasing the max iterations, we will still want to support that case, and hopefully others too to make it general, and then find a good compile-time trade-off.

So yes, I am happy to support you or look into any improvements myself in this area to improve things if we can.

Yeah, now things are getting clearer. The overall patch looks good to me. I will try to reduce the number of times a recursive function gets specialized when func-specialization-max-iters increases in follow-up patches.

llvm/lib/Transforms/Utils/SCCPSolver.cpp
1259–1261 ↗(On Diff #360772)

When would PI be nullptr? It looks like we didn't update the solver in time.

SjoerdMeijer added inline comments.Aug 3 2021, 12:38 AM
llvm/lib/Transforms/Utils/SCCPSolver.cpp
1259–1261 ↗(On Diff #360772)

Yeah, that is a good point.
I have looked into this, but did not yet get to the bottom of it. Either way, I thought that simply returning is more graceful than an assert (which does not even trigger in a release build), especially if it is a matter of not updating the solver in time. I thought the solver should be robust against these things, and thus that returning is better.

ChuanqiXu added inline comments.Aug 3 2021, 3:04 AM
llvm/lib/Transforms/Utils/SCCPSolver.cpp
1259–1261 ↗(On Diff #360772)

It looks like this diff triggers the assertion and simply removes it. If that is the case, we need more explanation for removing it. Although there are redundant assertions, we can't remove them arbitrarily.
If this diff doesn't trigger the assertion, I think we could drop this change and leave it to follow-up patches (like making SCCPSolver more robust against not updating everything in the beginning).

I have removed the changes in the SCCP solver, and have restored the cleanup of the ssa_copy intrinsics that are introduced by the solver.

Removing these intrinsics was already necessary for supporting recursive functions, but it is also the easiest way to keep the solver's internal state sane for these intrinsics while invoking it iteratively.

SjoerdMeijer added inline comments.Aug 3 2021, 12:55 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
532

This is now removeSSACopy, see line 188.

ChuanqiXu accepted this revision.Aug 3 2021, 6:58 PM

It looks good now. Although I agree that it is better to handle it in SCCPSolver, it could be done by an independent refactoring patch.

This revision is now accepted and ready to land.Aug 3 2021, 6:58 PM

Thanks for the help with this patch, and that sounds like a good plan to me.

This revision was automatically updated to reflect the committed changes.