The original motivation here is a performance problem encountered with the following IR, for which getSpillCost() takes an unreasonable amount of time: https://gist.github.com/nikic/a5bbc62632d68f76dd8fa9a4e81d9594
However, it turns out that the source of the perf problem is a correctness problem as well: The current implementation walks the tree nodes, remembering the previously visited node and inspecting all instructions between the previous and the current node. Because this is a tree and not just a chain, the "previous" node in the tree vector may actually come after the current one in the basic block. In that case the implementation ends up scanning through the whole block (even going through the effort of wrapping around from the start of the block to the end and continuing from there). If this happens often enough, it is not just very slow, it also produces gigantic spill costs (I saw costs >500000 for the sample code).
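To illustrate the failure mode, here's a simplified sketch of the problematic scan (not the exact in-tree code, just the shape of the loop as described above):

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Simplified illustration: walk backwards from Inst until PrevInst is found,
// counting calls along the way. If PrevInst actually comes *after* Inst in the
// block, the walk falls off the start of the block, wraps around to the end,
// and ends up visiting nearly the whole block for this node pair.
static unsigned countCallsBetween(Instruction *Inst, Instruction *PrevInst) {
  unsigned NumCalls = 0;
  Instruction *I = Inst->getPrevNode();
  while (I != PrevInst) {
    if (!I) {
      I = &Inst->getParent()->back(); // wrap around to the bottom of the block
      continue;
    }
    if (isa<CallInst>(I))
      ++NumCalls; // every call crossed is charged towards the spill cost
    I = I->getPrevNode();
  }
  return NumCalls;
}
```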
I'm changing the implementation to compute the cost in a single pass from the root node to the start of the basic block, keeping track of which values are currently live along the way. This resolves both the performance and the correctness issues.
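For illustration, this is roughly the shape of the new computation (a simplified sketch; `spillCostSinglePass`, `IsTreeScalar` and `SpillCostPerValue` are placeholder names, and the real bookkeeping in the patch is more involved):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Single pass from the root node's instruction back to the start of its block,
// tracking which tree scalars are currently live and charging a spill cost for
// every call crossed while such values are live.
static unsigned spillCostSinglePass(Instruction *Root,
                                    function_ref<bool(Value *)> IsTreeScalar,
                                    unsigned SpillCostPerValue) {
  SmallPtrSet<Value *, 16> LiveValues;
  unsigned Cost = 0;
  for (Instruction *I = Root; I; I = I->getPrevNode()) {
    // Walking backwards, a value's live range ends at its definition.
    LiveValues.erase(I);
    // Values still live at this point would have to survive across a call here.
    if (isa<CallInst>(I))
      Cost += SpillCostPerValue * LiveValues.size();
    // Uses seen while walking backwards make their operands live.
    for (Value *Op : I->operands())
      if (IsTreeScalar(Op))
        LiveValues.insert(Op);
  }
  return Cost;
}
```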
The downside here is that I'm stopping at the edge of the basic block. I'm not sure how I would extend this to work across multiple basic blocks, and I'm not sure whether that would even be worthwhile in the context of what the SLP vectorizer does.
(Side note: Intrinsic calls probably shouldn't count towards the spill cost. In this example the "calls" are all fshr/fshl, which are certainly not going to cause any spilling...)
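For the side note, one possible (hypothetical, not part of this patch) refinement would be to treat only non-intrinsic calls as potential spill points:

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Hypothetical helper: only ordinary calls are treated as spill points;
// intrinsic calls like fshl/fshr lower to plain instructions.
static bool mayRequireSpill(const Instruction &I) {
  if (const auto *CI = dyn_cast<CallInst>(&I))
    return CI->getIntrinsicID() == Intrinsic::not_intrinsic;
  return false;
}
```

Of course, some intrinsics (memcpy, various libm ones) do end up as real calls, so an actual change here would need to be more nuanced than this.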