This is an archive of the discontinued LLVM Phabricator instance.

AArch64: use some crypto instructions during CodeGen
Accepted, Public

Authored by t.p.northover on Nov 27 2019, 3:14 AM.

Details

Summary

Most of the crypto instructions introduced in v8.2a are far too complex to be worth pattern-matching on for normal IR, but a handful are simple XORs and rotations, which we can spot. This implements support for all of them that can be done in TableGen (unfortunately, XAR would need to match shifts by both N and 64-N, a relationship that can't be specified there).
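For reference, the "simple XORs and rotations" in question can be sketched per 64-bit lane in plain C. This is only an illustrative model of the architectural semantics of EOR3, BCAX, RAX1 and XAR, not the TableGen patterns from the patch; the function names are hypothetical, and the real instructions operate on vector registers.

```c
#include <stdint.h>

/* Rotate helpers written as the usual pair of shifts. The second shift
 * amount is masked so that n == 0 stays well defined in C. */
static uint64_t rotl64(uint64_t x, unsigned n) {
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

static uint64_t rotr64(uint64_t x, unsigned n) {
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}

/* EOR3: three-way exclusive OR. */
uint64_t eor3_lane(uint64_t n, uint64_t m, uint64_t a) {
    return n ^ m ^ a;
}

/* BCAX: bit clear and XOR, i.e. n ^ (m & ~a). */
uint64_t bcax_lane(uint64_t n, uint64_t m, uint64_t a) {
    return n ^ (m & ~a);
}

/* RAX1: XOR with the second operand rotated left by one bit. */
uint64_t rax1_lane(uint64_t n, uint64_t m) {
    return n ^ rotl64(m, 1);
}

/* XAR: XOR, then rotate right by an immediate. Expressed as shifts, this
 * needs both N and 64-N, which is the pairing the summary says cannot be
 * described in a TableGen pattern. */
uint64_t xar_lane(uint64_t n, uint64_t m, unsigned imm) {
    return rotr64(n ^ m, imm);
}
```

RAX1 only ever rotates by the constant 1, so its shift pair is fixed at 1 and 63, whereas XAR takes an arbitrary immediate.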

Event Timeline

t.p.northover created this revision. Nov 27 2019, 3:14 AM
fhahn added a subscriber: fhahn. Nov 27 2019, 11:43 AM

Could we run into issues similar to D70673, where we match the patterns without considering the uses, resulting in worse code if operands are used multiple times?

In this case, on our CPUs it looks like there's no difference between how these instructions and a single eor execute, in either latency or pipelines. Since the final operation in all of these patterns is an eor anyway, I think that means it's not a worry.

fhahn accepted this revision. Nov 28 2019, 4:07 AM

In this case, on our CPUs it looks like there's no difference between how these instructions and a single eor execute, in either latency or pipelines. Since the final operation in all of these patterns is an eor anyway, I think that means it's not a worry.

Right, IIUC, assuming the cost of BCAX/EOR3 is the same as the cost of any of the source instructions in the pattern (or, xor, and, not), then it should always be profitable. I could not find the cost of BCAX/EOR3 for the Cortex-A75 (the latest public software optimization guide I have), but it would be good to have people check this for other CPUs.

Adding a few additional people to give them a chance to double-check if that assumption holds for various CPUs.

LGTM, unless there are any concerns for other CPUs in the next few days (probably good to wait until mid next week, with Thanksgiving).

Also, it would be great to add a few test cases with multiple users (and a comment why the generated code is fine), to make clear we considered those cases.
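As a rough illustration of what such a multi-user case might look like, here is a scalar C stand-in (the real patterns are on vector types, and the names below are hypothetical). The and-not result has two users: the final XOR and the caller.

```c
#include <stdint.h>

/* Hypothetical result pair: the XOR result plus the intermediate value. */
typedef struct {
    uint64_t x; /* a ^ (b & ~c), the shape BCAX computes */
    uint64_t t; /* b & ~c, also needed on its own        */
} pair64;

pair64 bcax_with_extra_use(uint64_t a, uint64_t b, uint64_t c) {
    /* t has a second user, so it must be materialised regardless of
     * whether the XOR below is folded into a BCAX. */
    uint64_t t = b & ~c;

    /* Assumption taken from the discussion above: unfolded, this would be
     * an and-not followed by an eor; folded, an and-not followed by a
     * BCAX. The instruction count is the same either way, so if BCAX
     * costs no more than eor the folded form should not be worse. */
    pair64 r = { a ^ t, t };
    return r;
}
```

Whether that equal-cost assumption holds is exactly what the reviewers ask to be checked per CPU.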

This revision is now accepted and ready to land. Nov 28 2019, 4:07 AM