This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Improve handling of stack accesses in Thumb-1.
ClosedPublic

Authored by john.brawn on Feb 12 2015, 8:53 AM.

Details

Summary

Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR, STR, and ADD only allow offsets that are a multiple of 4. Make some changes to better make use of these instructions:

  • Use word loads for anyext byte and halfword loads from the stack.
  • Enforce 4-byte alignment on objects accessed in this way, to ensure that the offset is valid.
  • Do the same for objects whose frame index is used, in order to avoid having to use more than one ADD to generate the frame index.
  • Correct how many bits of offset we think AddrModeT1_s has.
  • Make the load/store optimizer able to cope with AddrModeT1_s.
  • Fiddle with a bunch of tests to cope with the code generation changes that the above all causes, typically where the use of SP-based addressing causes less callee-saved registers to be used and thus saved.

Diff Detail

Repository
rL LLVM

Event Timeline

john.brawn updated this revision to Diff 19837.Feb 12 2015, 8:53 AM
john.brawn retitled this revision from to [ARM] Improve handling of stack accesses in Thumb-1..
john.brawn updated this object.
john.brawn edited the test plan for this revision. (Show Details)
john.brawn set the repository for this revision to rL LLVM.
john.brawn added a subscriber: Unknown Object (MLST).
jmolloy accepted this revision.Feb 12 2015, 10:24 AM
jmolloy added a reviewer: jmolloy.

Hi John,

This looks reasonable to me. I'd wait for someone else like Tim to give it the go-ahead though before committing.

Before you commit, please remove braces around one-liner if-statements.

Cheers,

James

This revision is now accepted and ready to land.Feb 12 2015, 10:24 AM
john.brawn updated this revision to Diff 19890.Feb 13 2015, 7:45 AM
john.brawn updated this object.
john.brawn added a reviewer: t.p.northover.

While working on improving the spill estimation I realised that it wasn't that which was causing frame offsets to be considered invalid, but instead that ARMII::AddrModeT1_s was treated as having 5 bits of offset but it actually has 8 bits of offset (5 bits is true for FP-based loads, but FP-based loads aren't used in Thumb-1). I've attached a new patch that fixes that also.

Hi John,

LGTM.

cheers,
--renato

Thanks. Could you commit this for me please?

John

Hi John,

I'm seeing many failures on Thumb tests, are you seeing that, too?

Failing Tests (7):

LLVM :: CodeGen/ARM/2015-01-21-thumbv4t-ldstr-opt.ll
LLVM :: CodeGen/ARM/debug-frame-vararg.ll
LLVM :: CodeGen/ARM/frame-register.ll
LLVM :: CodeGen/ARM/thumb1-varalloc.ll
LLVM :: CodeGen/ARM/thumb1_return_sequence.ll
LLVM :: CodeGen/Thumb/stm-merge.ll
LLVM :: CodeGen/Thumb/vargs.ll

ex.

llvm/test/CodeGen/Thumb/stm-merge.ll:13:10: error: expected string not found in input
; CHECK: stm r[[BASE:[0-9]]]!, {{.*}}
         ^
<stdin>:23:2: note: scanning from here
 .fnstart
 ^
<stdin>:31:2: note: possible intended match here
 str r0, [sp]
 ^

or

llvm/test/CodeGen/Thumb/vargs.ll:39:10: error: expected string not found in input
; CHECK: pop
         ^
<stdin>:42:6: note: scanning from here
 pop {r3}
     ^
<stdin>:63:50: note: possible intended match here
 .section __DATA,__nl_symbol_ptr,non_lazy_symbol_pointers
                                             ^

Yes I see them. Also, I've realised that the patch breaks things on big-endian.

John

john.brawn updated this revision to Diff 20177.Feb 18 2015, 8:14 AM
john.brawn updated this object.

New patch with test failures fixed. Some test failures were due to the load/store optimizer not understanding the AddrModeT1_s loads/stores, so it now does understand them, but an annoyingly large amount were due to tests needing adjustment due to different codegen. The widening of loads also now only happens little-endian, and not in thumb2 as then we have other instructions with more flexible addressing.

New patch with test failures fixed. Some test failures were due to the load/store optimizer not understanding the AddrModeT1_s loads/stores, so it now does understand them, but an annoyingly large amount were due to tests needing adjustment due to different codegen. The widening of loads also now only happens little-endian, and not in thumb2 as then we have other instructions with more flexible addressing.

Most of the tests were not generic enough to begin with. But due to their specific nature, I'm not sure we can improve them more than what you did.

James, are you happy with the LoadStoreOptimizer changes?

Otherwise, LGTM. Thanks!

Yeah this looks fine.

Thanks. Could you or Renato commit this for me please?

John

I'm running the tests now, will commit as soon as it finishes.

cheers,
--renato

rengolin closed this revision.Feb 25 2015, 6:43 AM

r230496