Calculating the spill cost is expensive: we walk every instruction among the scalars of the VectorizableTree in the region to find CallInst instances. We can avoid that by looking for call instructions while the scheduler region is being extended. For example, I measured that while building SPEC FP 2006 (the C and C++ benchmarks that build), a CallInst is present in only about 2% of all cost-estimation calculations for vectorizable kernels. With this change, getSpillCost() is invoked only when required.
This is part of the https://reviews.llvm.org/D57779 change, "[SLP] Add support for throttling".
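A minimal sketch of the idea, assuming heavily simplified stand-in types: `Instr`, `Opcode`, `Tree`, `extendRegion` and `estimateTreeCost` are hypothetical and are not LLVM's SLP vectorizer API, and the cost model inside `getSpillCost` is a placeholder. The point it shows is that a call is recorded once, at the moment the region grows, so the expensive spill-cost walk is skipped whenever no call was seen.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for LLVM instructions; not the real API.
enum class Opcode { Add, Load, Store, Call };

struct Instr {
  Opcode Op;
};

// Hypothetical stand-in for the SLP tree state.
struct Tree {
  // Cleared as soon as a call is seen while the scheduling region grows.
  bool NoCallInst = true;
  // Counts how often the expensive spill-cost walk actually runs.
  int SpillCostQueries = 0;

  // Stand-in for the expensive getSpillCost() walk over the region.
  // The cost formula is a placeholder, not LLVM's spill-cost model.
  int getSpillCost(const std::vector<Instr> &Region) {
    ++SpillCostQueries;
    int Calls = 0;
    for (const Instr &I : Region)
      if (I.Op == Opcode::Call)
        ++Calls;
    return Calls * 10;
  }
};

// Hypothetical hook: called for each instruction added when the
// scheduler region is extended. The call is noted here, so no later
// rescan of all scalars is needed.
void extendRegion(Tree &R, std::vector<Instr> &Region, const Instr &I) {
  Region.push_back(I);
  if (I.Op == Opcode::Call)
    R.NoCallInst = false;
}

// Hypothetical cost query: pays for the spill-cost walk only when the
// region is known to contain a call.
int estimateTreeCost(Tree &R, const std::vector<Instr> &Region, int VecCost) {
  int Cost = VecCost;
  if (!R.NoCallInst)
    Cost += R.getSpillCost(Region);
  return Cost;
}
```

In the common call-free case the query is a single flag check; only regions that really contain a call fall back to the full walk.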
I'm not sure it is a good idea to add this to the tree here. Maybe just add an internal flag to BlockScheduling and, after scheduling the region, update R->NoCallInst by reading that flag? Or return the flag as one of the results of BS.tryScheduleBundle.
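The suggested alternative keeps the flag inside the scheduler instead of the tree. A rough sketch of both variants, assuming simplified hypothetical types: this `BlockScheduling`, `Instr`, `ScheduleResult`, `extendSchedulingRegion` and `hasCallInRegion` are illustrative only and do not mirror LLVM's actual class or the real signature of `tryScheduleBundle`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction; not LLVM's Instruction/CallInst.
struct Instr {
  bool IsCall = false;
};

// Hypothetical result type: whether the bundle was scheduled, plus
// whether extending the region pulled in any call.
struct ScheduleResult {
  bool Scheduled;
  bool SawCall;
};

// Hypothetical scheduler that owns the "region has a call" flag.
class BlockScheduling {
  std::vector<Instr> Region;
  bool RegionHasCall = false; // internal flag, updated as the region grows

  void extendSchedulingRegion(const Instr &I) {
    Region.push_back(I);
    RegionHasCall |= I.IsCall;
  }

public:
  // Variant 1: report the flag as part of the scheduling result.
  ScheduleResult tryScheduleBundle(const std::vector<Instr> &Bundle) {
    for (const Instr &I : Bundle)
      extendSchedulingRegion(I);
    return {/*Scheduled=*/!Bundle.empty(), RegionHasCall};
  }

  // Variant 2: let the caller read the flag after the region is scheduled,
  // e.g. to set the tree's NoCallInst.
  bool hasCallInRegion() const { return RegionHasCall; }
};
```

Either way the tree no longer needs to track calls itself; the caller derives something like `NoCallInst = !BS.hasCallInRegion()` once scheduling is done.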