Calls on NVPTX are unusually expensive (for one thing, lots of state
needs to be saved to memory, which is slow), so make the inlininer much
more aggressive.
Details
Diff Detail
Event Timeline
I ended up chatting with Hal about this and he made a really great point about this. I had been thinking that it is really brittle to have the target provided inlining threshold be an absolute number instead of a multiplier / ratio as you have it.
However, Hal pointed out that this creates a coupling that could also be problematic. Consider an out-of-tree target with a carefully tuned inlining threshold multiplier. If lots of targets do this, changing the threshold could become extremely problematic because small changes would would still disturb the target-specific tunings. His suggestion was to use an absolute threshold from the target in the absense of an explicitly specified command line flag.
Thinking about this more, I think it still presents the same problem. Consider a change to the inliner that significantly changes the rate at which we inline things. It might be useful to be able to adjust the threshold when making the change to keep most inlining decisions neutral across targets.
I'm not 100% sure which is the best mechanism here. Or this may point out a problem with any of these mechanisms.
One possible alternative is to not have the size-based inlining be target configurable, and to make this exclusively handled by the proposed runtime cost estimation based inlining when that goes in...
Oh ignore this. I got the wrong revision. I'll go replay this on the right revision. Sorry.
Consider an out-of-tree target with a carefully tuned inlining threshold multiplier. If lots of targets do this, changing the threshold could become extremely problematic because small changes would would still disturb the target-specific tunings.
My thought was, the inlining multiplier is such a coarse knob -- set atop rather coarse heuristics -- that tuning it particularly carefully for a target might not make much sense.
I don't know if that's true or not, but it's at least an argument one could make.
His suggestion was to use an absolute threshold from the target in the absense of an explicitly specified command line flag. Thinking about this more, I think it still presents the same problem. Consider a change to the inliner that significantly changes the rate at which we inline things. It might be useful to be able to adjust the threshold when making the change to keep most inlining decisions neutral across targets.
I guess this (and the previous one, to an extent) are a question of API stability guarantees. To the extent that we don't promise to have a stable API for out-of-tree targets, we could change the inliner and at the same time break compilation for any out-of-tree targets that set the threshold. They'll have to fix the error and recompute their inlining threshold.
That doesn't seem so unreasonable to me -- it's not like we'd be doing this once a month, or even every release. And as an out-of-tree target, you'd only be broken if you explicitly opted in to this tuning mechanism, which we could make clear comes with this possibility of future intentional breakage.
Oh ignore this. I got the wrong revision. I'll go replay this on the right revision. Sorry.
(Maybe this was posted on the wrong patch? Your comment seemed quite sane to me. :)
No, my comment should have been on D18560 with the general discussion. You might want to move your response there as well.