This patch teaches the Load/Store optimisation pass to generate correct Thumb1 LDMIA/STMIA instructions and fully enables the pass for Thumb1.
The pass was disabled for Thumb1 before because the current algorithm always generates non-writeback Load/Store multiples first, and then tries to merge any applicable base register updates into the LDM/STM. Thumb1 only has LDM/STM with base register writeback, so this approach doesn't work there.
In a nutshell, my patch directly generates the Thumb1 tLDMIA[_UPD] and tSTMIA_UPD instructions. It then scans the rest of the current block and tries to adjust any later instructions that read the base register so that they account for the offset added by the writeback. If this isn't possible, the base register is reset right before the next instruction that uses it.
Detailed breakdown of the changes:
When merging Loads/Stores into LDM/STM, there are a few different cases:
- The base register is dead after the merged instruction. In this case, we don't need to do anything since it doesn't matter that the merged instruction updated the base register.
- The base register isn't dead after the merged instruction. With the current algorithm, future uses of the base register assume that there was no previous writeback (which is incorrect). This case can again be split up:
  - All the future instructions that use the base register are also Loads/Stores (e.g. tLDRBi, tSTRHi) with immediate offsets. We can simply update the offset, as long as the updated offset is a legal immediate (i.e. >= 0).
  - Some instructions that use the base register don't use immediate offsets or can't be updated with the additional offset. This could be a Load/Store with a register offset, for example, or a Load/Store at a memory address lower than the base register. In this case, we need to undo the writeback by inserting a SUB instruction just before the problematic use of the base register. Note that if there are multiple writebacks to the base register, there might already be a SUB inserted to reset a previous writeback. The algorithm will merge these SUBs (as long as the offset is a legal immediate) so that no more than one reset per base register use is generated. This is the same approach other compilers seem to take for Thumb1 LDM/STM peepholing.
- The base register is alive at the end of the block, in which case we need to reset the base register in a similar fashion to above (unless we're in an exit block).
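
The cases above can be sketched on a toy instruction list. This is only an illustration, not the code in the patch: `Inst`, `Kind`, `fixBaseUses` and `kMaxSubImm` are hypothetical names, offsets are plain byte counts, the "legal immediate" check for Loads/Stores is reduced to the ">= 0" condition mentioned above, and the 255 limit for a SUB is an assumption based on an 8-bit immediate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for machine instructions (hypothetical, not LLVM's API).
enum class Kind {
  LoadStoreImm, // Load/Store using [base, #imm]
  SubBase,      // SUB base, #imm that resets an earlier writeback
  OtherBaseUse, // any other read of the base (e.g. register offset)
  BaseDef,      // base register is overwritten without being read
  Unrelated     // does not touch the base register
};

struct Inst {
  Kind kind;
  int imm; // byte offset (LoadStoreImm) or amount (SubBase), else unused
};

constexpr int kMaxSubImm = 255; // assumed limit of an 8-bit SUB immediate

// After a writeback LDM/STM advanced the base by wbBytes, fix up the
// instructions from index `first` to the end of the block.
void fixBaseUses(std::vector<Inst> &block, std::size_t first, int wbBytes,
                 bool baseLiveOut) {
  assert(wbBytes > 0 && "writeback must advance the base");
  for (std::size_t i = first; i < block.size(); ++i) {
    Inst &I = block[i];
    switch (I.kind) {
    case Kind::Unrelated:
      continue;
    case Kind::BaseDef:
      return; // old base value is dead from here on, nothing to undo
    case Kind::LoadStoreImm:
      if (I.imm - wbBytes >= 0) { // offset stays legal: fold it in
        I.imm -= wbBytes;
        continue;
      }
      break; // would address below the new base: needs a reset
    case Kind::SubBase:
      if (I.imm + wbBytes <= kMaxSubImm) { // merge into the existing reset
        I.imm += wbBytes;
        return;
      }
      break;
    case Kind::OtherBaseUse:
      break;
    }
    // Undo the writeback just before the problematic use.
    block.insert(block.begin() + static_cast<std::ptrdiff_t>(i),
                 Inst{Kind::SubBase, wbBytes});
    return;
  }
  // No reset happened inside the block; reset at the end if still live.
  if (baseLiveOut)
    block.push_back(Inst{Kind::SubBase, wbBytes});
}
```

For example, with an 8-byte writeback, a block containing a Load at offset 12 followed by a register-offset use would end up with the Load at offset 4 and a `SUB base, #8` inserted before the second instruction. Passing `baseLiveOut = false` models both a dead base register and an exit block.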
Since we're already using writeback-only LDM/STM, the later stages of the pass (merging base register updates into the generated non-writeback LDM/STM for Thumb2/ARM targets) become unnecessary.
There is no intended functionality change for non-Thumb1 targets.
assert(isThumb1);