This is an archive of the discontinued LLVM Phabricator instance.

[ARM][ARMLoadStoreOptimizer]
ClosedPublic

Authored by LukeCheeseman on Sep 14 2018, 3:48 AM.

Details

Summary
  • The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers
  • This is an UNPREDICTABLE instruction and shouldn't be done
  • It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch
  • This fixes https://bugs.llvm.org/show_bug.cgi?id=38389

Diff Detail

Event Timeline

LukeCheeseman created this revision.Sep 14 2018, 3:48 AM
LukeCheeseman edited the summary of this revision. (Show Details)Sep 14 2018, 7:41 AM
efriedma added a subscriber: efriedma.

How hard would it be to generate two vstmia instructions? I think you can just move the limit check into the "Merge following instructions where possible" loop. The generated code is more efficient, and it's easier to verify the testcase doesn't break for some other reason.

lib/Target/ARM/ARMLoadStoreOptimizer.cpp
1094

This limit isn't actually useful...? There are only 32 float registers anyway.

1859

How is this change related?

  • Remove extra limit cases that were unnecessary
  • Check the Count/Limit on each iteration of the merge loop
  • Replace guard with assert, previously only candidates that couldn't be merge were one instruction long
efriedma added inline comments.Sep 18 2018, 2:29 PM
lib/Target/ARM/ARMLoadStoreOptimizer.cpp
1054

Instead of disabling merging, could we just do something like "if (Count == Limit) break;", so we merge the first 16 stores, then consider any remaining stores as a separate set?

Please create a .mir test that just checks the ARMLoadStoreOptimizer pass. See test/CodeGen/ARM/vldm-liveness.mir or test/CodeGen/ARM/load_store_opt_kill.mir for example.

LukeCheeseman added inline comments.Sep 19 2018, 3:40 AM
lib/Target/ARM/ARMLoadStoreOptimizer.cpp
1054

Ah yes, that's a much better idea. Thanks

  • Break out of the merge loop so that 16 doubleword register loads get merged and the next registers get inspected independently
  • Add a MIR test that checks only the arm load store optimizer pass
LukeCheeseman marked 2 inline comments as done.Sep 19 2018, 6:32 AM
  • Removing some extraneous MIR input
  • Reduce MIR test, remove lots of unnecessary info
MatzeB accepted this revision.Sep 21 2018, 9:12 AM

The new .mir test is nicely concise now. Maybe we can even do without the .ll test now?

LGTM with nitpicks addressed and when Eli agrees.

test/CodeGen/ARM/load_store_opt_reg_limit.mir
2–10

You should be able to drop the whole IR block too I think.

12

Better style: # CHECK-LABEL: name: _Z2brR1lI1ME (may also consider choosing a simpler function name)

This revision is now accepted and ready to land.Sep 21 2018, 9:12 AM
  • Remove .ll test and simplify mir test
LukeCheeseman marked 2 inline comments as done.Sep 21 2018, 10:16 AM
LukeCheeseman closed this revision.Sep 24 2018, 5:29 AM

Closed by r342872