This is an archive of the discontinued LLVM Phabricator instance.

Fix and enable the Load/Store optimisation pass for Thumb1
ClosedPublic

Authored by mroth on May 14 2014, 3:43 AM.

Details

Reviewers
jmolloy
Summary

This patch adds support in the Load/Store optimisation pass to correctly generate Thumb1 LDMIA/STMIA instructions and fully enables the pass.

The reason this was disabled before is that the current algorithm always generates non-writeback Load/Store multiples first, and then tries to merge any applicable base register updates into the LDM/STM. Thumb1 only has LDM/STM with base register writeback, so this approach doesn't really work there. In a nutshell, my patch directly generates the Thumb1 tLDMIA[_UPD] and tSTMIA_UPD instructions. It then scans over the current block and tries to update any future instructions that read the base register with the new offset added from the writeback. If this isn't possible, the base register is reset right before the next instruction that uses it.

Detailed breakdown of the changes:
When merging Loads/Stores into LDM/STM, there are a few different cases:

  • The base register is dead after the merged instruction. In this case, we don't need to do anything since it doesn't matter that the merged instruction updated the base register.
  • The base register isn't dead after the merged instruction. With the current algorithm, future uses of the base register assume that there was no previous writeback (which is incorrect). This case can again be split up:
    • All the future instructions that use the base register are also Loads/Stores (e.g. tLDRBi, tSTRHi) with immediate offsets. We can simply update the offset (as long as the updated offset is a legal immediate of course, i.e. >= 0).
    • There are some instructions that use the base register that don't use immediate offsets or can't be updated with the additional offset. This could be a Load/Store with a register offset for example, or a Load/Store at a memory address lower than the base register. In this case, we need to undo the writeback by inserting a SUB instruction just before the problematic use of the base register. Note that if there are multiple writebacks to the base register, there might already be a SUB inserted to reset a previous writeback. The algorithm will merge these SUBs (as long as the offset is a legal immediate) so that no more than 1 reset per base register use is generated. This is the same approach other compilers seem to take for Thumb1 LDM/STM peepholing.
    • The base register is alive at the end of the block, in which case we need to reset the base register in a similar fashion to above (unless we're in an exit block).

Since we're already using writeback-only LDM/STM, the later stages of the pass (merging base register updates into the generated non-writeback LDM/STM for Thumb2/ARM targets) become unnecessary.

There is no intended functionality change for non-Thumb1 targets.

Diff Detail

Event Timeline

mroth updated this revision to Diff 9380.May 14 2014, 3:43 AM
mroth retitled this revision from to Fix and enable the Load/Store optimisation pass for Thumb1.
mroth updated this object.
mroth edited the test plan for this revision. (Show Details)
t.p.northover edited the test plan for this revision. (Show Details)May 14 2014, 3:51 AM
t.p.northover added a subscriber: Unknown Object (MLST).

Hi Moritz,

The patch looks good, aside from the few comments. Also, would be good to have store tests as well.

cheers,
--renato
PS: when submitting patches to Phab, use "diff -U99" so that we get a bit of context around the changes.

lib/Target/ARM/ARMLoadStoreOptimizer.cpp
322

assert(isThumb1);

326

This conditional reverted (and merged with the last on in this block) would reduce the indentation of the whole block below.

338

This block is repeated almost identically. I'd be tempted to make it a switch(Opc) to create the MO, than have a single (perhaps slightly more complex) conditional to set the InsertSub. That could also be merged to the check right after this, and make the whole function half of its size (and easier to read).

521

break;

mroth updated this revision to Diff 9445.May 15 2014, 9:34 AM

Hi Renato,

thanks a lot for the feedback. I've updated my diff with the changes.

Changing the UpdateBaseRegUses function wasn't quite as easy as I thought, as there are various parts of the code that are duplicated but with subtle differences (some branches return, others don't etc). I tried a few things as you suggested and ended up with a much nicer function, so let me know what you think. It looks like the conditional over the loop body has to stay there an instruction can both read and kill the base register, in which case we might still have to insert a SUB.

Regarding the tests, I fully agree with you. There is a test case for STR->STM merging in patch 0004 (a fairly trivial / small patch, see my mail on the list), which uses memcpy to generate the chain of LDM/STM to be merged. If you want, I could of course change it and include it with this commit. James said he'd take care of the commits when code review is complete so he'll make sure that this test is pushed at the same time as the patch.

Cheers
Moritz

Hi Moritz,

The patch looks good to me. If James is happy with the tests, I'm happy to.

cheers,
--renato

jmolloy accepted this revision.May 16 2014, 7:33 AM
jmolloy added a reviewer: jmolloy.
jmolloy added a subscriber: jmolloy.

Hi Moritz,

Following Renato's LGTM, I've applied the patch sequence for you, ending in r208994.

Cheers,

James

This revision is now accepted and ready to land.May 16 2014, 7:33 AM
mroth closed this revision.Sep 24 2014, 10:07 AM