This patch improves performance on T99 as shown here (libquantum 0.2.4):
https://docs.google.com/spreadsheets/d/1Lo1o2E1NjrpkwS7DvYYWsiVvPdd93h7KBaqeptMrZPY/edit?usp=sharing
Increasing LoopMicroOpBufferSize in the T99 scheduler model makes loop unrolling more aggressive, which helps performance on T99.
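
For reviewers unfamiliar with the mechanism: as far as I understand it, the generic unrolling preferences use the scheduler model's loop micro-op buffer size as the partial unrolling threshold, so a larger value lets more copies of the loop body fit. The sketch below is a deliberately simplified model of that relationship, not LLVM's actual code. The function name and the sizes 12, 32 and 128 are illustrative assumptions, not values taken from this patch.

```cpp
#include <algorithm>

// Simplified model (assumption): the unrolled body must fit within the
// loop micro-op buffer size, so the partial unroll count is roughly
// bufferSize / loopSize, and never less than 1 (no unrolling).
// The real unroller also accounts for backedge instructions, trip counts
// and other heuristics that are ignored here.
unsigned partialUnrollCount(unsigned loopSize, unsigned loopMicroOpBufferSize) {
    if (loopSize == 0 || loopMicroOpBufferSize == 0)
        return 1; // nothing to unroll, or no buffer size set in the model
    return std::max(1u, loopMicroOpBufferSize / loopSize);
}
```

Under this model, a hypothetical 12-instruction loop body is unrolled 2 times with a buffer size of 32 and 10 times with a buffer size of 128.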
Test case included.