This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Load Balancing for AES instructions on Cortex-A57
AbandonedPublic

Authored by zzheng on Nov 6 2014, 12:57 PM.

Details

Summary

On Cortex-A57, an AES chain achieves best performance if the accumulation register is kept the same through the entire chain. This patch utilizes the A57 FP load balancing pass to enforce that we emit such AES instruction sequence.

Diff Detail

Event Timeline

zzheng updated this revision to Diff 15889.Nov 6 2014, 12:57 PM
zzheng retitled this revision from to [AArch64] Load Balancing for AES instructions on Cortex-A57.
zzheng updated this object.
zzheng edited the test plan for this revision. (Show Details)
zzheng added a subscriber: Unknown Object (MLST).
aadg added a subscriber: aadg.Nov 6 2014, 1:20 PM

The code looks ok, but I'm not sure about the architecture decisions. I'd rather Tim or James had a look.

cheers,
--renato

lib/Target/AArch64/AArch64A57FPLoadBalancing.cpp
566

You could have moved isAES up and made Change one line

test/CodeGen/AArch64/aes-load-balancing.ll
15

can you use CHECK variables here? Like:

;CHECK: aese v[[accum:0-9]].16b, ...
zzheng updated this revision to Diff 16004.Nov 10 2014, 12:12 PM

Patch revised to address Renato's comments.

Thanks Zheng! Let's see what Tim or James have to say.

cheers,
--renato

jmolloy requested changes to this revision.Nov 11 2014, 2:40 AM
jmolloy edited edge metadata.

Hi Zhao,

There are some subtleties here that I don't think you're handling correctly. The major one is tied operands, which are not handled right and are the reason I removed handling of MLAv2f32 from this pass.

Consider a chain where the end instruction is the same as the kill instruction (which means the end dest register cannot be changed). The fixup code currently will look at the last instruction and see something like:

%D0<def,tied0> = AES... %D0<kill,tied0>, %D1...

It will look at the uses first, totally ignore the tied constraint and change %D0 to be whatever the chain register is. It will then not look at the def, because it knows it is illegal to change the def. We then have a broken instruction.

When you add proper handling for this (along with a decent amount of tests, because this is incredibly fiddly to get right), we can re-add support for MLAv2f32. Which would be nice :)

Secondly I think your algorithm is going the wrong way. It is chaining the instructions going forward through the instruction stream instead of backwards. Consider, it's possible that we can't change the dest operand of the last instruction in the chain. As the operand is tied, the accumulator has to be the same as the dest. So choosing the chaining register as the dest of the last instruction seems the right thing to do. Currently your algorithm could get to the last instruction and not be able to change it. If you implement this algorithm, you actually don't need to handle tied operands specially (although I still would like to see proper assertions in place that they're not corrupted), because you never get to the situation where you can change an accum but not a dest.

Thirdly, are AES instruction statically scheduled by the core? My belief is "no", which is fine, but they should not contribute to the computed static parity. If "yes", then they should probably be sorted to appear in the ready queue before FMADDs.

Cheers,

James

This revision now requires changes to proceed.Nov 11 2014, 2:40 AM

James,

Thanks for the feed back. To make sure I understand correctly: the way we build chains (mul-mla or aes) is forward, which is fine.

But it's better to color the chains backward for the G->getLast() to handle tied dest register. This would requires us to iterate from the last instructions to the first of a chain in ColorChain() and make appropriate changes to scavengeRegister().

Is this the plan you envisioned to add back support for MLAv2f32?

Thanks,
Zhaoshi

mcrosier resigned from this revision.Aug 5 2015, 11:17 AM
mcrosier removed a reviewer: mcrosier.
zzheng abandoned this revision.Mar 16 2017, 11:10 AM