The change is to solve PR17409 (https://llvm.org/bugs/show_bug.cgi?id=17409) and its several duplicates. The change is divided into three parts for easier review.
The major issue of analyzeSiblingValues is when a virtreg is splitted to N siblings with the same original VNI, and there are some PHIDefs generated in its splitting, the VNInfo of every sibling will be added to the Dependents of all other siblings, which creates a NxN network. traceSiblingValue is propagating SibValue info through this NxN network so it has NxN time complexity. In addition, selectOrSplit is called for all the N siblings sequentially. When reg pressure is high, a large percentage of siblings will be spilled (let's suppose N/2 siblings will be spilled), and traceSiblingValue will be called N/2 times indirectly from selectOrSplit, then there will be N^3 time complexity in total.
analyzeSiblingValues has two major usages: One is to figure out the SibValueInfo::SpillVNI of the virtReg to be spill so the spill can be hoisted to the place after SpillVNI->def and redundent spills are eliminated at the same time. The other is to trace the sibling copies back to the original value so the computation of the original value can be used for rematerialization. We replace analyzeSiblingValues by reimplementing these functionalities in Part1 and Part2.
Part1:
Instead of figure out the place to hoist spill for each virtReg to be spilled, we do that all at once when allocatePhysRegs is done. With all spills in place, we group spills with the same values (having the same OrigVNI). For each group of the spills with equal values, first we remove redundent spills dominated by other spill in the group, then traverse the dominate tree in post-order and hoist the spills to less frequent dominate tree node. Since spill can be hoisted to a cold dominate tree node without any sibling's VNI->def in the node, it can be better than the original implementation.
I didn't follow Jakub's proposal in pr17409 to change hoistCopiesForSize because redundent backcopies seen in one round of splitting is limited. Suppose Vreg1 is splitted to Vreg2, Vreg3 and Vreg4 in the first round of splitting, and Vreg4 is further splitted to Vreg5 and Vreg6 in the second round of splitting, the redundent backcopies between Vreg2=Vreg3 and Vreg5=Vreg6 cannot be found (I caught more than 100 such cases in llvm testsuites which left redundent spills in the final asm code)
Part2:
To find out the computation of the original value for rematerialization, we always query the inst at OrigVNI->def. To handle the case that the inst at OrigVNI->def has been removed during rematerialization, we change rematerialization to not delete the inst at OrigVNI->def even if it is already dead. In stead, we change the dest vreg to a new vreg (The new vreg will not be reg allocated so it will not affect the allocation of other vregs), save the inst to a set named as DeadRemats, and shrink the original dest vreg in the same way as previous. The insts in DeadRemats will be removed after allocatePhysRegs is done.
Part3:
The Part3 of the change is to clean up the code related with analyzeSiblingValues.
Test with all three parts combined on x86_64-linux-gnu:
- The compile time for 1.c in pr24618 dropped from 0.34s to 0.25s The compile time for interpreter_goto_table_impl.ii in pr24618 dropped from 176.80s to 66.86s. I cannot verify the patch using tests in pr17409 because most bugs related with asan/ubsan have been workaround on sanitizer side.
- llvm testsuites. Perf: mostly neutral except one perf regression I havn't addressed: SingleSource/Benchmarks/Misc/mandel. The reason has been understood. It is because we didn't do local spill hoist which the original implementation did. The usage of the local hoist is described in the comment in propagateSiblingValue (Starting with // This is an alternative def earlier in the same MBB.) The local hoist cannot be done after allocatePhysRegs so I will address it in a separate patch. CC time: MultiSource/Applications/sqlite3/sqlite3 has 1.5% improvement steadily.
- I havn't cleaned up the llvm unit tests because I expect there will be many changes to the patches during the review. I will clean them up later.
Is there a way this could be computed from the split map?