This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add new subtarget feature to fuse AES crypto operations
ClosedPublic

Authored by evandro on Jan 9 2017, 3:10 PM.

Details

Summary

This feature enables the fusion of such operations on Cortex A57, as recommended in its Software Optimisation Guide, section 4.13, and on Exynos M1.

On A57, it improves the results of a proprietary benchmark by about 20%.

Diff Detail

Repository
rL LLVM

Event Timeline

evandro updated this revision to Diff 83716.Jan 9 2017, 3:10 PM
evandro retitled this revision from to [AArch64] Add new subtarget feature to fuse AES crypto operations.
evandro updated this object.
evandro set the repository for this revision to rL LLVM.
evandro added a subscriber: llvm-commits.
MatzeB added inline comments.Jan 9 2017, 3:13 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2059–2063

Have you actually tested this properly? I am pretty sure this function is only called for terminator instructions. I am currently working on bigger rewrites of the macrofusion code to allow fusion inside of basic blocks, I don't think it is possible today.

@MatzeB, please see the parent patch D28489.

MatzeB added inline comments.Jan 9 2017, 3:19 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2059–2063

Just saw that you linked a dependent patch about that.

The MacroFusion pass is currently being added before the RA runs. However, since the AArch64ExpandPseudo pass is run after the RA (in AArch64PassConfig::addPreSched2()), I wonder if it'd make more sense to run the MISched after the RA as well, and not before as it is now. Thoughts?

MatzeB edited edge metadata.EditedJan 11 2017, 1:38 PM

The MacroFusion pass is currently being added before the RA runs. However, since the AArch64ExpandPseudo pass is run after the RA (in AArch64PassConfig::addPreSched2()), I wonder if it'd make more sense to run the MISched after the RA as well, and not before as it is now. Thoughts?

There are a number of benefits when running the scheduler before register allocation (for example we can still reduce register pressure). We already have the PostMachineScheduler for scheduling again after regalloc (it's based on the same MISched framework but added considerably later in the pipeline; see also TargetSubtargetInfo::enablePostRAScheduler()).

The MacroFusion pass is currently being added before the RA runs. However, since the AArch64ExpandPseudo pass is run after the RA (in AArch64PassConfig::addPreSched2()), I wonder if it'd make more sense to run the MISched after the RA as well, and not before as it is now. Thoughts?

There are a number of benefits when running the scheduler before register allocation (for example we can still reduce register pressure). We already have the PostMachineScheduler for scheduling again after regalloc (it's based on the same MISched framework but added considerably later in the pipeline; see also TargetSubtargetInfo::enablePostRAScheduler()).

I'm asking this because, looking further at other pairs of instrs that A57 fuses, such as ADRP/ADD, they only appear in the instr stream after pseudo expansion.

The MacroFusion pass is currently being added before the RA runs. However, since the AArch64ExpandPseudo pass is run after the RA (in AArch64PassConfig::addPreSched2()), I wonder if it'd make more sense to run the MISched after the RA as well, and not before as it is now. Thoughts?

There are a number of benefits when running the scheduler before register allocation (for example we can still reduce register pressure). We already have the PostMachineScheduler for scheduling again after regalloc (it's based on the same MISched framework but added considerably later in the pipeline; see also TargetSubtargetInfo::enablePostRAScheduler()).

I'm asking this because, looking further at other pairs of instrs that A57 fuses, such as ADRP/ADD, they only appear in the instr stream after pseudo expansion.

Well if there is no reason to ever break the instructions apart, then using a Pseudo instruction and expanding that later may be the easier solution, is that the case for the AES instructions?

I'm asking this because, looking further at other pairs of instrs that A57 fuses, such as ADRP/ADD, they only appear in the instr stream after pseudo expansion.

Well if there is no reason to ever break the instructions apart, then using a Pseudo instruction and expanding that later may be the easier solution, is that the case for the AES instructions?

No, since they are pretty opaque. But the pseudo MOVaddr is expanded into the pair ADRP/ADD only after the RA. On A57, it's important to schedule them back to back, e.g., by running the MISched after the RA instead of before.

No, since they are pretty opaque. But the pseudo MOVaddr is expanded into the pair ADRP/ADD only after the RA. On A57, it's important to schedule them back to back, e.g., by running the MISched after the RA instead of before.

Or rather, I wonder why pseudo expansion is happening this late, when they are very simple instrs in AArch64. Methinks that expanding them sooner would expose them to more optimizations, yes?

evandro updated this revision to Diff 84372.Jan 13 2017, 1:59 PM
evandro edited edge metadata.
evandro updated this revision to Diff 86106.Jan 27 2017, 12:57 PM
evandro edited the summary of this revision. (Show Details)

Look good overal

llvm/lib/Target/AArch64/AArch64.td
183

The features seem to be sorted alphabetically (same with the Exynos entry).

llvm/test/CodeGen/AArch64/misched-fusion.ll
13–19

Why is this test affected here? I see no AES instructions.

evandro marked an inline comment as done.Jan 30 2017, 8:26 AM
evandro added inline comments.
llvm/test/CodeGen/AArch64/misched-fusion.ll
13–19

Indeed, this test shouldn't be part of this patch.

evandro marked an inline comment as done.Jan 30 2017, 8:29 AM
evandro updated this revision to Diff 86294.Jan 30 2017, 8:44 AM
MatzeB accepted this revision.Jan 30 2017, 10:38 AM

LGTM

This revision is now accepted and ready to land.Jan 30 2017, 10:38 AM
evandro updated this revision to Diff 86539.Jan 31 2017, 5:48 PM

Final patch after approval.

This revision was automatically updated to reflect the committed changes.