Page MenuHomePhabricator

Fold Address Computations into Load/Store instructions for AArch64
Needs ReviewPublic

Authored by ramred01 on May 23 2019, 3:54 AM.



When compiling with -Oz, the indexed or offset address computations with or without scaling for Load/Store instructions is not folding into the Load/Store instructions for AArch64 even when the AArch64 ISA has load/store instructions with addressing modes that allow for indexed/offset addresses with/without scaling.

This has been fixed by identifying the load/store instructions and then scanning backwards for the ADD/SUB instructions that perform the address computation for the load/store instructions. Thereafter, we replace the ADD/SUB and the Load/Store instruction with an appropriate Load/Store instruction, which has the address computation folded in.

Diff Detail

Event Timeline

ramred01 created this revision.May 23 2019, 3:54 AM
t.p.northover added inline comments.May 23 2019, 5:32 AM

I think these names could be improved. Maybe AddrCalcI and MemI?


I don't see why we're checking ADDWrs here. This is an address computation so it'll always be at 64-bits.

Even if it hypothetically wasn't (say with 32-bit pointers) the size of the arithmetic has no bearing on the type being stored, which is what the W in STRWui and LDRWui refer to.


Are you planning to add more variants in the immediate future? If not, it's probably best to convert this to an assertion.


Can we start sentences with a capital letter please.


This is not a useful comment.


I don't believe this is reachable. It's certainly wrong: no ldp or stp instructions allow the sum of two registers, let alone with a shift.


What's the benefit of doing the transformation at all if we don't get to remove the update instruction? I think there should be an earlier check that the load/store is the only user before doing anything.

If it remains, the test should probably be whether the register is killed by its use in I.


I don't think this is the right place for your code. What you're looking for is a fundamentally but not necessarily different set of mergeability criteria.

You could easily imagine someone extending this function to support ADDXrs with the intention to apply it in the same way as ADDXri, and that's what this function should be saved for.

Instead, I think we need a distinctive pair of names for what you're doing and the existing transformation. Perhaps mergeUpdateToIndexedMemOp and mergeUpdateToSingleUser, though I'm not entirely happy with those and I'll think some more.


LLVM style braces start on the same line as the case.


I think this needs significantly better validation. The acceptable shifts depend strongly on what load or store this is being folded into. Non-paired loads only allow shifts of 0 or the data width, and paired loads don't allow any shift.


I think this needs refactoring, we shouldn't be hard-coding what mergeUpdateInsn supports outside that function. Instead, the merge should be able to bail gracefully when it doesn't understand ADDWrs or whatever.


You can prune all the metadata out of this, and I think we also need more than just one test. There are many different ways this transformation could potentially go wrong.