Use the TargetFrameLowering::orderFrameObjects hook to change the allocation
order of stack frame locations to increase the number of load/store pair
instructions generated.
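As a rough illustration of the idea (not the heuristic used in this patch), the sketch below orders frame indices so that same-sized objects accessed back to back end up adjacent, which is the situation in which a load/store pair can be formed. `FrameObject`, `orderForPairing`, and the "count consecutive accesses" affinity measure are all hypothetical names and assumptions made for this example. It is plain C++ and does not call any LLVM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stack object; this is not an LLVM type.
struct FrameObject {
  int Index;     // frame index
  unsigned Size; // size in bytes
};

// Returns frame indices in a proposed allocation order. Accesses lists the
// frame indices touched, in program order (an assumed input for this sketch).
std::vector<int> orderForPairing(const std::vector<FrameObject> &Objects,
                                 const std::vector<int> &Accesses) {
  // Map frame index -> size for quick lookup.
  std::map<int, unsigned> SizeOf;
  for (const FrameObject &O : Objects) {
    assert(O.Size > 0 && "frame objects must have a non-zero size");
    SizeOf[O.Index] = O.Size;
  }

  // Count consecutive accesses to two distinct objects of the same size.
  std::map<std::pair<int, int>, unsigned> Affinity;
  for (std::size_t I = 1; I < Accesses.size(); ++I) {
    int A = Accesses[I - 1], B = Accesses[I];
    if (A == B)
      continue;
    auto ItA = SizeOf.find(A), ItB = SizeOf.find(B);
    if (ItA == SizeOf.end() || ItB == SizeOf.end())
      continue; // access to an object we were not asked to order
    if (ItA->second != ItB->second)
      continue; // only same-sized objects are treated as pairable here
    ++Affinity[std::make_pair(std::min(A, B), std::max(A, B))];
  }

  // Visit candidate pairs from highest to lowest affinity. stable_sort keeps
  // the map's key order for ties, so the result is deterministic.
  typedef std::pair<std::pair<int, int>, unsigned> PairCount;
  std::vector<PairCount> Pairs(Affinity.begin(), Affinity.end());
  std::stable_sort(Pairs.begin(), Pairs.end(),
                   [](const PairCount &L, const PairCount &R) {
                     return L.second > R.second;
                   });

  // Greedily place each pair side by side if neither member is placed yet.
  std::vector<int> Order;
  std::set<int> Placed;
  for (const PairCount &P : Pairs) {
    int A = P.first.first, B = P.first.second;
    if (Placed.count(A) || Placed.count(B))
      continue;
    Order.push_back(A);
    Order.push_back(B);
    Placed.insert(A);
    Placed.insert(B);
  }

  // Append everything else in its original order.
  for (const FrameObject &O : Objects)
    if (Placed.insert(O.Index).second)
      Order.push_back(O.Index);
  return Order;
}
```

A real implementation of the hook would reorder the list of frame indices it is handed in place and would derive access information from the machine instructions; this sketch only shows the ordering step under the stated assumptions.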
Event Timeline
The code looks reasonable and it appears to do what's advertised.
Did you measure how many more instructions are paired? I assume this is mostly a code size reduction optimization. Did you run into any issues with compile-time?
test/CodeGen/AArch64/stack-layout-pairing.ll | ||
---|---|---|
38 | I don't understand this comment about optimizing for size. |
Here are some relevant stat diffs from compiling the llvm test-suite:
27 (+0.10 %) aarch64-ldst-opt - Number of load/store from unscaled generated
2276 (+2.18 %) aarch64-ldst-opt - Number of load/store pair instructions generated
8 (+1.74 %) aarch64-ldst-opt - Number of narrow zero stores promoted
5 (+0.02 %) branchfolding - Number of block tails merged
2 (+0.00 %) branchfolding - Number of branches optimized
5 (+0.01 %) branchfolding - Number of dead blocks removed
-2297 (-0.08 %) mccodeemitter - Number of MC instructions emitted
-2928 (-0.02 %) pei - Number of bytes used for stack in all functions
I'm still collecting compile time data.
test/CodeGen/AArch64/stack-layout-pairing.ll | ||
---|---|---|
38 | Fixed. That was left over from a previous version of the change. |
My compile-time testing showed no significant differences when compiling the llvm test-suite benchmarks at -O1.
This generally looks good to me, although I haven't taken a very in-depth look. Given the code complexity, it'd be nice to have more than 2 test cases, though.
James
James,
Thanks for taking a look. I can write more tests, but I don't really know that I can increase the code coverage in doing so, since the code doesn't really have many different cases to handle. Do you have any thoughts on what kind of additional testing should be done?
Ping? I've added some more test cases (and a couple more reviewers in case any of you would care to take a look).