This feature enables the fusion of AES crypto operations on Cortex A57, as recommended in its Software Optimisation Guide, section 4.13, and on Exynos M1.
On A57, it improves the results of a proprietary benchmark by about 20%.
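As a rough illustration of what such a fusion check might look like, here is a minimal, self-contained sketch. It assumes the fusible pairs are AESE followed by AESMC and AESD followed by AESIMC. The `Opcode` enum, the `Subtarget` struct, and `shouldFuseAES` are hypothetical names invented for this example; they are not the identifiers used in the patch.

```cpp
#include <cassert>

// Hypothetical opcode set: the names mirror AArch64 mnemonics,
// but this is not LLVM's opcode enum.
enum class Opcode { AESE, AESD, AESMC, AESIMC, ADD, Other };

// Hypothetical stand-in for a subtarget that may or may not
// have the AES fusion feature enabled.
struct Subtarget {
  bool HasFuseAES = false;
};

// Returns true if the scheduler should keep First and Second
// back to back so that the core can fuse them.
// Assumption: only AESE+AESMC and AESD+AESIMC are fusible.
bool shouldFuseAES(const Subtarget &ST, Opcode First, Opcode Second) {
  if (!ST.HasFuseAES)
    return false;
  switch (Second) {
  case Opcode::AESMC:
    return First == Opcode::AESE;
  case Opcode::AESIMC:
    return First == Opcode::AESD;
  default:
    return false;
  }
}
```

Keying the check on the second instruction keeps the predicate cheap: most instruction pairs fall through to the default case after a single comparison.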
Differential D28491
[AArch64] Add new subtarget feature to fuse AES crypto operations
Closed, Public. Authored by evandro on Jan 9 2017, 3:10 PM.
Event Timeline
evandro added a parent revision: D28489: [CodeGen] Move MacroFusion to the target. Jan 9 2017, 3:11 PM
The MacroFusion pass is currently being added before the RA runs. However, since the AArch64ExpandPseudo pass is run after the RA (in AArch64PassConfig::addPreSched2()), I wonder if it'd make more sense to run the MISched after the RA as well, and not before as it is now. Thoughts?

There are a number of benefits to running the scheduler before register allocation (for example, we can still reduce register pressure). We already have the PostMachineScheduler for scheduling again after regalloc (it's based on the same MISched framework but added considerably later in the pipeline; see also TargetSubtargetInfo::enablePostRAScheduler()).

I'm asking this because, looking further at other pairs of instrs that A57 fuses, such as ADRP/ADD, they only appear in the instr stream after pseudo expansion.

Well, if there is no reason to ever break the instructions apart, then using a Pseudo instruction and expanding that later may be the easier solution. Is that the case for the AES instructions?

No, since they are pretty opaque. But the pseudo MOVaddr is expanded into the pair ADRP/ADD only after the RA. On A57, it's important to schedule them back to back, e.g., by running the MISched after the RA instead of before.
Or rather, I wonder why pseudo expansion is happening this late, when they are very simple instrs in AArch64. Methinks that expanding them sooner would expose them to more optimizations, yes?

This revision is now accepted and ready to land. Jan 30 2017, 10:38 AM
Closed by commit rL293738: [AArch64] Add new subtarget feature to fuse AES crypto operations (authored by evandro). Jan 31 2017, 7:06 PM
This revision was automatically updated to reflect the committed changes.
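The ADRP/ADD point raised in the thread can be shown with a toy model. The sketch below is not LLVM code: the `Op` enum, `expandPseudos`, and `countAdrpAddPairs` are hypothetical names, and the instruction stream is just a vector of opcodes. It only demonstrates that a pass looking for adjacent ADRP/ADD pairs finds none until the MOVaddr pseudo has been expanded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a toy instruction stream.
enum class Op { MOVaddr, ADRP, ADD, Other };

// Toy pseudo expansion: each MOVaddr becomes ADRP followed by ADD.
std::vector<Op> expandPseudos(const std::vector<Op> &In) {
  std::vector<Op> Out;
  for (Op O : In) {
    if (O == Op::MOVaddr) {
      Out.push_back(Op::ADRP);
      Out.push_back(Op::ADD);
    } else {
      Out.push_back(O);
    }
  }
  return Out;
}

// Counts adjacent ADRP/ADD pairs, i.e. the pairs a fusion-aware
// pass could see in the stream it is given.
std::size_t countAdrpAddPairs(const std::vector<Op> &Stream) {
  std::size_t N = 0;
  for (std::size_t I = 0; I + 1 < Stream.size(); ++I)
    if (Stream[I] == Op::ADRP && Stream[I + 1] == Op::ADD)
      ++N;
  return N;
}
```

In this model, a scheduler that runs before expansion sees only the pseudo and has nothing to pair, which is the ordering concern the comments describe.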
Revision Contents
Diff 84372
llvm/lib/Target/AArch64/AArch64.td
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
llvm/lib/Target/AArch64/AArch64SchedA57.td
llvm/lib/Target/AArch64/AArch64SchedM1.td
llvm/lib/Target/AArch64/AArch64Subtarget.h
llvm/test/CodeGen/AArch64/misched-fusion-aes.ll
llvm/test/CodeGen/AArch64/misched-fusion.ll
The features seem to be sorted alphabetically (same with the Exynos entry).