- The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers
- This is an UNPREDICTABLE instruction and shouldn't be done
- It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch
- This fixes https://bugs.llvm.org/show_bug.cgi?id=38389
Details
Diff Detail
Event Timeline
How hard would it be to generate two vstmia instructions? I think you can just move the limit check into the "Merge following instructions where possible" loop. The generated code is more efficient, and it's easier to verify the testcase doesn't break for some other reason.
lib/Target/ARM/ARMLoadStoreOptimizer.cpp | ||
---|---|---|
1080 | This limit isn't actually useful...? There are only 32 float registers anyway. | |
1859 | How is this change related? |
- Remove extra limit cases that were unnecessary
- Check the Count/Limit on each iteration of the merge loop
- Replace guard with assert, previously only candidates that couldn't be merge were one instruction long
lib/Target/ARM/ARMLoadStoreOptimizer.cpp | ||
---|---|---|
1054 | Instead of disabling merging, could we just do something like "if (Count == Limit) break;", so we merge the first 16 stores, then consider any remaining stores as a separate set? |
Please create a .mir test that just checks the ARMLoadStoreOptimizer pass. See test/CodeGen/ARM/vldm-liveness.mir or test/CodeGen/ARM/load_store_opt_kill.mir for example.
lib/Target/ARM/ARMLoadStoreOptimizer.cpp | ||
---|---|---|
1054 | Ah yes, that's a much better idea. Thanks |
- Break out of the merge loop so that 16 doubleword register loads get merged and the next registers get inspected independently
- Add a MIR test that checks only the arm load store optimizer pass
MIR can be simplified more; see http://llvm.org/docs/MIRLangRef.html#simplifying-mir-files .
The new .mir test is nicely concise now. Maybe we can even do without the .ll test now?
LGTM with nitpicks addressed and when Eli agrees.
test/CodeGen/ARM/load_store_opt_reg_limit.mir | ||
---|---|---|
2–10 ↗ | (On Diff #166434) | You should be able to drop the whole IR block too I think. |
12 ↗ | (On Diff #166434) | Better style: # CHECK-LABEL: name: _Z2brR1lI1ME (may also consider choosing a simpler function name) |
Instead of disabling merging, could we just do something like "if (Count == Limit) break;", so we merge the first 16 stores, then consider any remaining stores as a separate set?