This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Tie source and destination operands for AESMC/AESIMC.
ClosedPublic

Authored by fhahn on Jul 12 2017, 5:06 AM.

Details

Summary

Most CPUs implementing AES fusion require instruction pairs of the form

AESE Vn, _
AESMC Vn, Vn

and

AESD Vn, _
AESIMC Vn, Vn

The constraint is added to AES(I)MC instructions which use the result of
an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which
constraint source and destination registers to be the same.

A nice side effect of this change is that now all possible pairs are
scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll
test case.

I had to update aes_load_store. The version I added initially was very
reduced and with the new constraint, AESE/AESMC could not be scheduled
back-to-back. I updated the test to be more realistic and still expose
the same scheduling problem as the initial test case.

Diff Detail

Event Timeline

fhahn created this revision.Jul 12 2017, 5:06 AM
evandro edited edge metadata.Jul 12 2017, 12:18 PM

I thought that the data dependency was enough for fusion in general:

AESE Vi, Vj
AESMC Vk, Vi

And:

AESD Vi, Vj
AESIMC Vk, Vi
fhahn added a comment.Jul 12 2017, 2:00 PM

I thought that the data dependency was enough for fusion in general:

It is for the MacroFusion pass and they will be scheduled back-to-back, regardless of the register constraint.

AESE Vi, Vj
AESMC Vk, Vi

And:

AESD Vi, Vj
AESIMC Vk, Vi

But unless Vk == Vi for AESMC/AESIMC, the most CPUs won't be able to fuse the pair and won't execute it more efficiently.

But unless Vk == Vi for AESMC/AESIMC, the most CPUs won't be able to fuse the pair and won't execute it more efficiently.

Perhaps most CPUs require this, but certainly not all. Thus, would there be a way to make this constraint specific to such CPUs?

fhahn added a comment.Jul 12 2017, 2:59 PM

But unless Vk == Vi for AESMC/AESIMC, the most CPUs won't be able to fuse the pair and won't execute it more efficiently.

Perhaps most CPUs require this, but certainly not all. Thus, would there be a way to make this constraint specific to such CPUs?

One solution I could think of would be to add pseudo AESMC/AESIMC instructions and expand them to either AESMC/AESIMC with constraint or without constraint, depending on the target-features. There probably is a more elegant solution I am not aware of though, so I would appreciate any input :)

fhahn updated this revision to Diff 106612.Jul 14 2017, 3:59 AM
fhahn edited the summary of this revision. (Show Details)

I've introduced AES(I)MCTrr pseudo instructions which have the "src = dst" constraint and patterns to use them for AES instruction pairs, if the result of the AESE/D in the pair has a single user.

I think this change keeps the codegen impact on CPUs that do not require the constraint to fuse AES instructions as small as possible, as the constraint should reduce register pressure and I don't expect AESMC instructions with the same source/destination register to be slower than if the registers are different.

@evandro does this address your concern? In my opinion adding a separate attribute seems not worth it.

fhahn added a comment.Jul 19 2017, 7:13 AM

Ping. Do you think the route with pseudo instructions is a workable approach?

I kind of prefer the previous patch for its being less intrusive, but I'm afraid that it would create problems for the assembler too, wouldn't it?

fhahn added a comment.Jul 20 2017, 3:01 PM

The previous patch had a smaller diff and was simpler, because it added the constraint to all AESMC/AESIMC instructions. The latest version of the patch only applies the constraint to AESMC/AESIMC instructions that are part of a pair and for targets with FeatureFuseAES, so the impact on the generated code is smaller. I think applying the constraint only to instruction pairs is nice, but I would be fine with adding the constraint to all AESMC/AESIMC instructions too.

t.p.northover edited edge metadata.Jul 20 2017, 3:15 PM

I'm afraid that it would create problems for the assembler too, wouldn't it?

Yeah, the original causes failures in test/MC. I wonder why that wasn't spotted earlier?

Anyway, I'm not sure the "single-use" heuristic is actually what we should care about. As far as I understand AES these instructions are only ever going to be seen with single-uses in reality so optimizing for other cases is probably premature.

For out-of-order non-fused implementations the register shouldn't matter as far as I can see, which leaves in-order unfused implementations. I've got much less experience there, but don't they tend to have early forwarding to dependent instructions anyway?

So perhaps a hybrid approach where the Pattern always picks these pseudo-instructions would be best?

fhahn updated this revision to Diff 107705.Jul 21 2017, 1:21 PM
fhahn edited the summary of this revision. (Show Details)

I'm afraid that it would create problems for the assembler too, wouldn't it?

Yeah, the original causes failures in test/MC. I wonder why that wasn't spotted earlier?

I'm not entirely sure what you mean here. I've updated one of the AArch64 MC tests to build with +fuse-aes, to make sure the constraint does not mess up MC codegen.

Anyway, I'm not sure the "single-use" heuristic is actually what we should care about. As far as I understand AES these instructions are only ever going to be seen with single-uses in reality so optimizing for other cases is probably premature.

For out-of-order non-fused implementations the register shouldn't matter as far as I can see, which leaves in-order unfused implementations. I've got much less experience there, but don't they tend to have early forwarding to dependent instructions anyway?

So perhaps a hybrid approach where the Pattern always picks these pseudo-instructions would be best?

Yes, the single-use case is what we should see in practice. I've removed the single use pattern from the patch.

I'm not entirely sure what you mean here.

I downloaded and built the very first patch, and the obvious test/MC/AArch64 tests failed because they expected "aesmc v0.16b, v1.16b" to be valid syntax. It's not an issue with the newest patch but it confirms evandro's suspicion about the earlier one.

Yes, the single-use case is what we should see in practice. I've removed the single use pattern from the patch.

Thanks. I don't know about @evandro, but I think this is looking pretty reasonable now. I certainly can't think of a fundamentally better approach.

evandro accepted this revision.Jul 28 2017, 8:36 AM

I agree, this is a sensible approach.

However, the pseudo indicator in the middle of the instruction (AES.*T.*) looks too much like an instruction. Can this be made more explicit that it's a pseudo, perhaps as P_AES.* or some such?

This revision is now accepted and ready to land.Jul 28 2017, 8:36 AM

Not convinced by a Hungarian tagging like that. It's not what happens with existing pseudo-instructions. If anything they have a semi-descriptive name: "MOVaddr", "MOVaddrJT", "CMP_SWAP_8", ...

fhahn added a comment.Jul 28 2017, 1:36 PM

How about something like AESIMCrrTied?

How about something like AESIMCrrTied?

LGTM

t.p.northover accepted this revision.Jul 28 2017, 1:59 PM

Yep, I'd be happy with that too. Commit away!

fhahn updated this revision to Diff 108790.Jul 29 2017, 8:46 AM

Renamed pseudo instructions, will commit soon

fhahn closed this revision.Jul 29 2017, 1:36 PM