This is an archive of the discontinued LLVM Phabricator instance.


Differential D119720

[ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum
ClosedPublic

Authored by lenary on Feb 14 2022, 7:30 AM.

Details

Reviewers

tmatheson
stuij
chill
dmgreen
simon_tatham

Commits

rG3a24df992cf8: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

Summary

This adds a late Machine Pass to work around a Cortex CPU Erratum
affecting Cortex-A57 and Cortex-A72:

Cortex-A57 Erratum 1742098
Cortex-A72 Erratum 1655431

The pass inserts instructions to make the inputs to the fused AES
instruction pairs no longer trigger the erratum. Here the pass errs on
the side of caution, inserting the instructions wherever we cannot prove
that the inputs came from a safe instruction.

As the pass is executed very late in the ARM backend pipeline, it has to
reconstruct the Register Dataflow Graph, for which it uses the RDFGraph
utilities used by other backends.

This initial version will stop at the start of the basic block containing
the first AES instruction, but having the full RDF Graph available means
we should be able to be more efficient with AES encryption and
decryption loops in future, if we wish.

The pass is used:

for Cortex-A57 and Cortex-A72,
for "generic" cores (which are used when using -march=),
when the user specifies -mfix-cortex-a57-aes-1742098 or -mfix-cortex-a72-aes-1655431 in the command-line arguments to clang.
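
The conservative behaviour described in the summary can be illustrated with a small, self-contained sketch. It is not the code from ARMFixCortexA57AES1742098Pass.cpp: `ToyInstr` and `protectBlock` are hypothetical names, the "safe" flag is supplied by hand rather than derived from real opcodes, and the model covers a single basic block only. It simply shows the idea of inserting a protecting `vorr` before an AES instruction for every input that has not been proven safe.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction; this is not LLVM's MachineInstr.
struct ToyInstr {
  std::string Name;       // mnemonic, for display only
  int Def;                // register written, or -1 for none
  std::vector<int> Uses;  // registers read
  bool SafeDef;           // assumed: this write is known not to trigger the erratum
  bool IsAES;             // true for the AES instructions being protected
};

// Walk one block forwards. Before each AES instruction, insert a protecting
// full-register move (modelled as "vorr") for every input register that has
// not been proven safe. Registers live into the block count as unproven.
std::vector<ToyInstr> protectBlock(const std::vector<ToyInstr> &Block) {
  std::set<int> KnownSafe;
  std::vector<ToyInstr> Out;
  for (const ToyInstr &I : Block) {
    if (I.IsAES) {
      for (int R : I.Uses) {
        if (KnownSafe.count(R))
          continue;
        Out.push_back({"vorr", R, {R}, true, false});
        KnownSafe.insert(R);
      }
    }
    Out.push_back(I);
    if (I.Def >= 0) {
      if (I.SafeDef)
        KnownSafe.insert(I.Def);
      else
        KnownSafe.erase(I.Def);
    }
  }
  return Out;
}
```

In this model, a block consisting of an unsafe write to register 0 followed by an AES instruction reading registers 0 and 1 gains two `vorr` instructions: one for the unsafe definition and one for the live-in register.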

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lenary created this revision. Feb 14 2022, 7:30 AM

Herald added subscribers: dang, hiraditya, kristof.beyls, mgorny. Feb 14 2022, 7:30 AM

lenary requested review of this revision. Feb 14 2022, 7:30 AM

Herald added projects: Restricted Project, Restricted Project. Feb 14 2022, 7:30 AM

Herald added a subscriber: cfe-commits.

@kparzysz I've tagged you due to the changes in RDFGraph, which I believe you are the owner of. The asserts are hit in llvm/test/CodeGen/ARM/inlineasm-error-t-toofewregs.ll: the register allocator chokes on the test because there are not enough registers for the inline assembly. llc then reports the error but continues running passes (including this one), which then core dumps on the asserts I've removed (the first one hit is in DataFlowGraph::buildStmt). When I run the same testcase with -verify-machineinstrs, there is no verification error, but the verifier seems to have decided that analysing physical reg liveness is too hard (see e.g. MachineVerifier.cpp line 2990). I haven't worked out how else to prevent this pass running if the register allocation is known to be wrong, without analysing every instruction before building the RDFGraph. I wasn't sure of the right approach, so I wanted to post the patch and then discuss it, rather than the other way around (so you can see the testcase).

Harbormaster completed remote builds in B149383: Diff 408413. Feb 14 2022, 8:03 AM

Ping

I have a high-level question regarding RDF, as I've not seen it used in many other places, so it may be under-tested on Arm systems at the moment. Currently, for all code, this builds an RDF graph, analyzes the graph for fairly rare instructions, then fixes up the code based on that. It might be best to avoid the (possibly expensive?) RDF graph generation for the common case where the instructions are not present, by checking that the instruction exists first.

It might then be simpler to just search back for the def of a register, considering that in most code the instruction we are looking for should be fairly rare and we won't expect to need to find defs in bulk. That might be simpler overall, and avoid some of the difficulties of RDF.

llvm/test/CodeGen/ARM/aes-erratum-fix.ll
49	Adding arm_aapcs_vfpcc will make the function "hardfp", which might be useful for testing inputs from arguments that don't need to be passed via gpr regs.

lenary planned changes to this revision. Mar 28 2022, 3:54 AM

Herald added a project: Restricted Project. Mar 28 2022, 3:54 AM

Herald added a subscriber: MaskRay.

In D119720#3339855, @dmgreen wrote:

I have a high-level question regarding RDF, as I've not seen it used in many other places, so it may be under-tested on Arm systems at the moment. Currently, for all code, this builds an RDF graph, analyzes the graph for fairly rare instructions, then fixes up the code based on that. It might be best to avoid the (possibly expensive?) RDF graph generation for the common case where the instructions are not present, by checking that the instruction exists first.

I haven't had time to see what effect this has on compile times, but I don't see the point in iterating over every instruction in the function checking the opcode, only to then iterate over them all again constructing the rdfgraph. That seems to make the approach a lot more expensive, if I'm then going to iterate over the rdfgraph looking for specific instructions again.

It might then be simpler to just search back for the def of a register, considering that in most code the instruction we are looking for should be fairly rare and we won't expect to need to find defs in bulk. That might be simpler overall, and avoid some of the difficulties of RDF.

The reason for using the rdfgraph was to allow us to improve this at a later date if we found the performance issues a problem, i.e. by hoisting the vorr past phis so we can save on executing them in loops. Given the issues with rdfgraph, I'm going to reimplement this as a Block-local analysis and fixup pass without the rdfgraph, which is effectively what the pass does in the current patch.
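
The "search back for the def of a register" idea discussed in this exchange can be sketched in a few lines. This is only an illustration under simplifying assumptions: a block is modelled as a list of defined register numbers (with -1 for "no def"), and `findLocalDef` is a hypothetical helper, not an LLVM API. The committed pass ended up using ReachingDefAnalysis instead.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// DefRegs[I] is the register written by instruction I of a basic block, or -1
// if instruction I writes no register. Scan backwards from the instruction at
// UseIdx (exclusive) to find the closest earlier definition of Reg.
// Returns std::nullopt if Reg is not defined earlier in the block, i.e. the
// value is live-in and a conservative pass has to treat it as unproven.
std::optional<std::size_t> findLocalDef(const std::vector<int> &DefRegs,
                                        std::size_t UseIdx, int Reg) {
  for (std::size_t I = UseIdx; I-- > 0;) {
    if (DefRegs[I] == Reg)
      return I;
  }
  return std::nullopt;
}
```

A caller would then check whether the instruction at the returned index is on the list of known-safe definitions, and insert the protecting instruction otherwise.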

lenary mentioned this in D122747: [NFC][ARM] Tests for Cortex-A57 and Cortex-A72 Fused AES Erratum. Mar 30 2022, 10:08 AM

lenary added a parent revision: D122747: [NFC][ARM] Tests for Cortex-A57 and Cortex-A72 Fused AES Erratum. Mar 30 2022, 10:09 AM

Rewrite pass in terms of ReachingDefAnalysis
Split tests into separate commit, for ease of review.

@kparzysz I have rewritten this to avoid using RDFGraph, so I don't think this needs you to review it any more.

llvm/test/CodeGen/ARM/aes-erratum-fix.ll
49	Yeah, seems I was too zealous with removing some of these attributes.

lenary planned changes to this revision. Mar 30 2022, 10:16 AM

lenary added inline comments.

llvm/lib/CodeGen/RDFGraph.cpp
1096	I forgot to remove these comments, will update shortly.
llvm/lib/Target/ARM/ARMSubtarget.h
543	I missed this, will update again shortly.

Remove whitespace change in ARMSubtarget
Remove commented-out debug lines in RDFGraph

lenary marked an inline comment as done. Mar 30 2022, 11:48 AM

dmgreen added inline comments. Mar 31 2022, 1:10 AM

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
146	Can/should all these use findFirstPredOperandIdx? And is it worth checking for more instructions? Anything that sets a Q register, or is that too broad?
251	This needn't scan through checking for the instruction that the loop below is going to check for too.
307	Can you explain more about the IsLiveIn && UnsafeCount==0 case? Am I understanding correctly that it would be:
	function(q0, ...)
	lotsofcode...
	q0 = load
	aes q0
	Is there a better way to detect that the live-in doesn't matter in cases like that?
llvm/lib/Target/ARM/ARMTargetMachine.cpp
585	Passes can't insert new instructions (or move things further apart) after ConstantIslandPass. The branches/constant pools it has created may go out of range of the instructions that use them. Would it be OK to move it before that?

Thanks for the review. Lots of comments inline, hopefully Phab doesn't mangle the large one.

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
146	`findFirstPredOperandIdx` doesn't work as lots of these instructions are not marked `isPredicable` in the tablegen. I'm not sure if we want to solve that in this work, or as a follow-up (I'd lean towards follow-up). I believe "anything that sets a Q register" is too broad, as we model subregister insertion as setting the whole register in LLVM, but I'm not sure that micro-architecturally they are actually doing that. This is why I've tried to only add 64- and 128-bit setting instructions rather than ones that are less wide. Originally I also included the `VMOVv2f32` instructions that are now at the bottom of this switch, but I felt that might have been too risky.
251	Ack. Will remove. I think this is vestigial from a previous (unshared) version of the patch which was doing something more complex in the loop.
307	I don't believe there is, and this comes down to issues with the `RDA.getGlobalReachingDefs` implementation, which I want to fix/enhance, but in a follow-up patch.
	To start with, this is actually not a problem, as the pass is intended to be conservative, and we know the clobbers are a no-op at the architectural level (we insert them for their micro-architectural effects). So code will still do the right thing, but maybe with a little too much overhead in the case you showed.
	However, this is necessary in some other cases, such as:
	function(q0)
	  code
	  conditional branch to L2
	L1:
	  q0 = safe_op(…)
	  branch to L3
	L2:
	  code without update to q0
	L3:
	  aes q0
	In this case, `AllDefs` is a set containing one single defining instruction for Q0, because there is only one within the function (which is all that `RDA.getGlobalReachingDefs` can report instructions for). But in my example, we need to protect q0 on the other paths, because the safe definitions of q0, when considered as a set, do not entirely dominate the AES use of q0 (this is slightly stretching the conventional definition of dominance, but think of this as "there exists a path from entry to the aes, which does not contain any of the safe instructions". Sadly, MachineDominance doesn't allow us to make this kind of query either!).
	In this case though, it is safe to insert the protection at function entry, because that will (by definition) dominate all the AES uses, and the protection doesn't need to be dominated by the safe definitions, as we know they're safe.
	I intend to follow up this initial patch with an enhancement to ReachingDefAnalysis which will also provide the information that you have a set of defs inside the function, and also you're live-in, as this is required info for any conservative pass using the ReachingDefAnalysis. I felt, however, that given the pass is safe as-is, it was good to proceed without this enhancement.
llvm/lib/Target/ARM/ARMTargetMachine.cpp
585	TIL. I'll add a comment about the constant island pass as well. Should I also look at the restrictions on the Branch Targets pass? I can imagine we also don't want to separate instructions once we've calculated their targets locally?
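
The query described in the reply on line 307 above ("there exists a path from entry to the aes, which does not contain any of the safe instructions") can be sketched as a plain graph search. This is a simplified illustration, not how the pass or ReachingDefAnalysis works: the CFG is modelled at block granularity, a block flagged in `HasSafeDef` is assumed to make the register safe for everything that follows it (including an AES use later in the same block), and `hasUnprotectedPath` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Succs[B] lists the successor blocks of block B. HasSafeDef[B] is true if
// block B contains a safe definition of the register of interest.
// Returns true if the AES block can be reached from Entry along a path that
// never passes through a block with a safe definition. In that case the
// live-in value may reach the AES instruction and needs protection.
bool hasUnprotectedPath(const std::vector<std::vector<std::size_t>> &Succs,
                        const std::vector<bool> &HasSafeDef,
                        std::size_t Entry, std::size_t AESBlock) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<std::size_t> Work;
  if (!HasSafeDef[Entry]) {
    Seen[Entry] = true;
    Work.push_back(Entry);
  }
  while (!Work.empty()) {
    std::size_t B = Work.back();
    Work.pop_back();
    if (B == AESBlock)
      return true;
    for (std::size_t S : Succs[B]) {
      if (Seen[S] || HasSafeDef[S])
        continue;
      Seen[S] = true;
      Work.push_back(S);
    }
  }
  return false;
}
```

With blocks numbered entry = 0, L1 = 1, L2 = 2, L3 = 3 and a safe definition only in L1, the path entry, L2, L3 avoids the safe definition, so the function reports that protection is needed. That matches the reasoning in the reply for inserting the protection at function entry.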

dmgreen added inline comments. Mar 31 2022, 2:41 AM

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
201–208	Are these vmov of an immediate? Are they not safe? I was expecting it to be the lane sets (VSETLNi8) and other scalar instructions that were unsafe.
307	OK sounds good.
llvm/lib/Target/ARM/ARMTargetMachine.cpp
585	Yeah - It sounds like the BTI would need to remain as the first instruction in the block.

Harbormaster completed remote builds in B157011: Diff 419208. Mar 31 2022, 3:43 AM

A few comments before I post the next version of the patch.

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
146	I'm wrong about this. The tablegen `isPredicable` is an override, other code might also set `isPredicable` to true, so I think `findFirstPredOperandIdx` should work too.
201–208	I have received the information on what is safe and what is not, and the next version of the patch will have this correct.
307	One note is that the exact problem you describe does show up in the tests (in `aese_set64_via_ptr`, where the `vldr` is "safe"), so when the pass is enhanced, we will notice the improvements.
llvm/lib/Target/ARM/ARMTargetMachine.cpp
585	Turns out the AES pass doesn't have to come before the BTI pass, because AES instructions are only available on A-profile, and BTI is M-profile. I still will move it to before all these passes anyway, just so it's clear what is going on.

Updated set of safe instructions
Address reviewer feedback, including reordering passes.

lenary marked an inline comment as done. Apr 7 2022, 7:01 AM

Harbormaster completed remote builds in B158473: Diff 421193. Apr 7 2022, 8:07 AM

Looks OK to me, as far as I can see. Is it worth adding a few extra instructions that may come up?

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
147	Perhaps add these, if they are safe:
	VBICd/q
	VBICi's, VORRi's
	VBIF/VBIT/VBSL/VBSP
	VCEQ/VCNE/etc?
	VDUP?
	VEXT?
	VMVN imm equivalents of VMOV's
	VREV's?
	VSHL's, VSHR's?
	I'm not sure if they will be very useful, but they are the kind of instructions that may come up in aes algorithms.
llvm/lib/Target/ARM/ARMTargetMachine.cpp
582–583	"No new instructions may be inserted" -> "Block sizes cannot be increased" And maybe "will affect the offsets used for accessing these constants." -> "may push the branch ranges and load offsets of accessing constant pools out of range."
583–585	It's not about "not inserting instructions" exactly - it will replace pseudos with all kinds of new instructions :) The pseudos needed to have a conservative size through ConstantIslandPass to allow that, though. It does make sure that it will not move instructions further apart from their targets.
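
The ordering constraint discussed for ARMTargetMachine.cpp (no growing of blocks after ConstantIslandPass) can be illustrated with a toy range check. This is only a sketch under assumed numbers: real ARM and Thumb branch and literal-load ranges depend on the encoding and on PC-relative offsets, and `distanceBytes` and `branchReaches` are hypothetical helpers, not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sizes[I] is the size in bytes of instruction I. Returns the byte distance
// from the start of instruction From to the start of instruction To, assuming
// From <= To <= Sizes.size().
unsigned distanceBytes(const std::vector<unsigned> &Sizes, std::size_t From,
                       std::size_t To) {
  return std::accumulate(Sizes.begin() + From, Sizes.begin() + To, 0u);
}

// True if a forward branch (or constant-pool load) at BranchIdx can still
// reach TargetIdx, given a maximum forward range of MaxRange bytes.
bool branchReaches(const std::vector<unsigned> &Sizes, std::size_t BranchIdx,
                   std::size_t TargetIdx, unsigned MaxRange) {
  return distanceBytes(Sizes, BranchIdx, TargetIdx) <= MaxRange;
}
```

If a branch sits exactly at the limit of its range when offsets are finalised, inserting even one 4-byte instruction between it and its target afterwards makes `branchReaches` return false. This is why a pass that inserts instructions has to run before the offsets are fixed.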

dmgreen added a reviewer: simon_tatham. Apr 8 2022, 2:05 AM

simon_tatham added inline comments. Apr 11 2022, 6:16 AM

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
13–20	This description would leave me still confused if I didn't happen to already know roughly what the plan was. It jumps in half way through the explanation that someone would need if they were coming to this pass cold. (E.g. it talks about "the VORRq" before having even mentioned that there is one, let alone why there is one.) How about the suggested text as a rewording?
311	nit: repeated word

Ack to all the comment clarifications, will update patch with those soon (probably tomorrow).

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
147	I'm very keen to avoid scope-creep on this patch, so I'm going to push back on this comment. We know this list as given is safe (and have had internal confirmation). I've sent a new email internally with your list of instructions, to find out if they're safe too, but I'd like any answer to that to be part of a follow-up patch rather than blocking this patch for yet longer. I believe what I have today is correct, even if the list is not optimal for all expected AES code sequences.

dmgreen added inline comments. May 10 2022, 5:54 AM

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
147	Yeah that sounds OK. So long as you address Simon's comments and follow up with the instructions at a later date, this LGTM.

Address comment nits
Rebase

lenary marked 3 inline comments as done. May 12 2022, 9:42 AM

lenary added inline comments.

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
13–20	I have taken this feedback on board, and reworded the explanation based on your suggestion, with some slight ordering changes and a little more detail about the complex cases and some separation between the erratum itself, and the workaround we have chosen.

Harbormaster completed remote builds in B164125: Diff 428978. May 12 2022, 11:02 AM

Thanks. LGTM

This revision is now accepted and ready to land. May 13 2022, 1:21 AM

Thanks, that explanation looks fine. (And I agree that re-paragraphing it made more sense than my version)

lenary mentioned this in rG48ad639036db: [NFC][ARM] Tests for Cortex-A57 and Cortex-A72 Fused AES Erratum. May 13 2022, 2:41 AM

This revision was landed with ongoing or failed builds. May 13 2022, 2:48 AM

Closed by commit rG3a24df992cf8: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum (authored by Archibald Elliott <archibald.elliott@arm.com>).

This revision was automatically updated to reflect the committed changes.

lenary added a commit: rG3a24df992cf8: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum.

Revision Contents

Path	Size
clang/include/clang/Driver/Options.td	14 lines
clang/lib/Driver/ToolChains/Arch/ARM.cpp	10 lines
clang/test/Driver/arm-fix-cortex-a57-aes-1742098.c	25 lines
llvm/lib/CodeGen/RDFGraph.cpp	34 lines
llvm/lib/Target/ARM/ARM.h	2 lines
llvm/lib/Target/ARM/ARM.td	12 lines
llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp	402 lines
llvm/lib/Target/ARM/ARMSubtarget.h	1 line
llvm/lib/Target/ARM/ARMTargetMachine.cpp	3 lines
llvm/lib/Target/ARM/CMakeLists.txt	1 line
llvm/test/CodeGen/ARM/O3-pipeline.ll	2 lines
llvm/test/CodeGen/ARM/aes-erratum-fix.ll	82 lines

Diff 419202

clang/include/clang/Driver/Options.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

[3,400 lines not shown]
	def mgeneral_regs_only : Flag<["-"], "mgeneral-regs-only">, Group<m_Group>,			def mgeneral_regs_only : Flag<["-"], "mgeneral-regs-only">, Group<m_Group>,
	HelpText<"Generate code which only uses the general purpose registers (AArch64/x86 only)">;			HelpText<"Generate code which only uses the general purpose registers (AArch64/x86 only)">;
	def mfix_cmse_cve_2021_35465 : Flag<["-"], "mfix-cmse-cve-2021-35465">,			def mfix_cmse_cve_2021_35465 : Flag<["-"], "mfix-cmse-cve-2021-35465">,
	Group<m_arm_Features_Group>,			Group<m_arm_Features_Group>,
	HelpText<"Work around VLLDM erratum CVE-2021-35465 (ARM only)">;			HelpText<"Work around VLLDM erratum CVE-2021-35465 (ARM only)">;
	def mno_fix_cmse_cve_2021_35465 : Flag<["-"], "mno-fix-cmse-cve-2021-35465">,			def mno_fix_cmse_cve_2021_35465 : Flag<["-"], "mno-fix-cmse-cve-2021-35465">,
	Group<m_arm_Features_Group>,			Group<m_arm_Features_Group>,
	HelpText<"Don't work around VLLDM erratum CVE-2021-35465 (ARM only)">;			HelpText<"Don't work around VLLDM erratum CVE-2021-35465 (ARM only)">;
				def mfix_cortex_a57_aes_1742098 : Flag<["-"], "mfix-cortex-a57-aes-1742098">,
				Group<m_arm_Features_Group>,
				HelpText<"Work around Cortex-A57 Erratum 1742098 (ARM only)">;
				def mno_fix_cortex_a57_aes_1742098 : Flag<["-"], "mno-fix-cortex-a57-aes-1742098">,
				Group<m_arm_Features_Group>,
				HelpText<"Don't work around Cortex-A57 Erratum 1742098 (ARM only)">;
				def mfix_cortex_a72_aes_1655431 : Flag<["-"], "mfix-cortex-a72-aes-1655431">,
				Group<m_arm_Features_Group>,
				HelpText<"Work around Cortex-A72 Erratum 1655431 (ARM only)">,
				Alias<mfix_cortex_a57_aes_1742098>;
				def mno_fix_cortex_a72_aes_1655431 : Flag<["-"], "mno-fix-cortex-a72-aes-1655431">,
				Group<m_arm_Features_Group>,
				HelpText<"Don't work around Cortex-A72 Erratum 1655431 (ARM only)">,
				Alias<mno_fix_cortex_a57_aes_1742098>;
	def mfix_cortex_a53_835769 : Flag<["-"], "mfix-cortex-a53-835769">,			def mfix_cortex_a53_835769 : Flag<["-"], "mfix-cortex-a53-835769">,
	Group<m_aarch64_Features_Group>,			Group<m_aarch64_Features_Group>,
	HelpText<"Workaround Cortex-A53 erratum 835769 (AArch64 only)">;			HelpText<"Workaround Cortex-A53 erratum 835769 (AArch64 only)">;
	def mno_fix_cortex_a53_835769 : Flag<["-"], "mno-fix-cortex-a53-835769">,			def mno_fix_cortex_a53_835769 : Flag<["-"], "mno-fix-cortex-a53-835769">,
	Group<m_aarch64_Features_Group>,			Group<m_aarch64_Features_Group>,
	HelpText<"Don't workaround Cortex-A53 erratum 835769 (AArch64 only)">;			HelpText<"Don't workaround Cortex-A53 erratum 835769 (AArch64 only)">;
	def mmark_bti_property : Flag<["-"], "mmark-bti-property">,			def mmark_bti_property : Flag<["-"], "mmark-bti-property">,
	Group<m_aarch64_Features_Group>,			Group<m_aarch64_Features_Group>,
[3,209 lines not shown]

clang/lib/Driver/ToolChains/Arch/ARM.cpp

[727 lines not shown; context: if (!Args.getLastArg(options::OPT_mcmse))]
<< A->getOption().getName() << "-mcmse";		<< A->getOption().getName() << "-mcmse";

if (A->getOption().matches(options::OPT_mfix_cmse_cve_2021_35465))		if (A->getOption().matches(options::OPT_mfix_cmse_cve_2021_35465))
Features.push_back("+fix-cmse-cve-2021-35465");		Features.push_back("+fix-cmse-cve-2021-35465");
else		else
Features.push_back("-fix-cmse-cve-2021-35465");		Features.push_back("-fix-cmse-cve-2021-35465");
}		}

		// This also handles the -m(no-)fix-cortex-a72-aes-1655431 arguments via aliases.
		if (Arg *A = Args.getLastArg(options::OPT_mfix_cortex_a57_aes_1742098,
		options::OPT_mno_fix_cortex_a57_aes_1742098)) {
		if (A->getOption().matches(options::OPT_mfix_cortex_a57_aes_1742098)) {
		Features.push_back("+fix-cortex-a57-aes-1742098");
		} else {
		Features.push_back("-fix-cortex-a57-aes-1742098");
		}
		}

// Look for the last occurrence of -mlong-calls or -mno-long-calls. If		// Look for the last occurrence of -mlong-calls or -mno-long-calls. If
// neither options are specified, see if we are compiling for kernel/kext and		// neither options are specified, see if we are compiling for kernel/kext and
// decide whether to pass "+long-calls" based on the OS and its version.		// decide whether to pass "+long-calls" based on the OS and its version.
if (Arg *A = Args.getLastArg(options::OPT_mlong_calls,		if (Arg *A = Args.getLastArg(options::OPT_mlong_calls,
options::OPT_mno_long_calls)) {		options::OPT_mno_long_calls)) {
if (A->getOption().matches(options::OPT_mlong_calls))		if (A->getOption().matches(options::OPT_mlong_calls))
Features.push_back("+long-calls");		Features.push_back("+long-calls");
} else if (KernelOrKext && (!Triple.isiOS() || Triple.isOSVersionLT(6)) &&		} else if (KernelOrKext && (!Triple.isiOS() || Triple.isOSVersionLT(6)) &&
[241 lines not shown]

clang/test/Driver/arm-fix-cortex-a57-aes-1742098.c

This file was added.

				// RUN: %clang -### %s -target arm-none-none-eabi -march=armv8a -mfix-cortex-a57-aes-1742098 2>&1 | FileCheck %s --check-prefix=FIX
				// RUN: %clang -### %s -target arm-none-none-eabi -march=armv8a -mno-fix-cortex-a57-aes-1742098 2>&1 | FileCheck %s --check-prefix=NO-FIX

				// RUN: %clang -### %s -target arm-none-none-eabi -march=armv8a -mfix-cortex-a72-aes-1655431 2>&1 | FileCheck %s --check-prefix=FIX
				// RUN: %clang -### %s -target arm-none-none-eabi -march=armv8a -mno-fix-cortex-a72-aes-1655431 2>&1 | FileCheck %s --check-prefix=NO-FIX

				// RUN: %clang -### %s -target arm-none-none-eabi -march=armv8a 2>&1 | FileCheck %s --check-prefix=UNSPEC
				// RUN: %clang -### %s -target arm-none-none-eabi -march=armv8a 2>&1 | FileCheck %s --check-prefix=UNSPEC

				// This test checks that "-m(no-)fix-cortex-a57-aes-1742098" and
				// "-m(no-)fix-cortex-a72-aes-1655431" cause the "fix-cortex-a57-aes-1742098"
				// target feature to be passed to `clang -cc1`.
				//
				// This feature is also enabled in the backend for the two affected CPUs and the
				// "generic" cpu (used when only specifying -march), but that won't show up on
				// the `clang -cc1` command line.
				//
				// We do not check whether this option is correctly specified for the CPU: users
				// can specify the "-mfix-cortex-a57-aes-1742098" option with "-mcpu=cortex-a72"
				// and vice-versa, and will still get the fix, as the target feature and the fix
				// is the same in both cases.

				// FIX: "-target-feature" "+fix-cortex-a57-aes-1742098"
				// NO-FIX: "-target-feature" "-fix-cortex-a57-aes-1742098"
				// UNSPEC-NOT: "-target-feature" "{{[+-]}}fix-cortex-a57-aes-1742098"
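
The behaviour this test checks (both spellings of the flag map to one target feature, the last occurrence wins, and nothing is forwarded when no flag is given) can be modelled with a small standalone function. This is a sketch only: `resolveAESFixFeature` is a hypothetical name, and it imitates what the driver change in ARM.cpp does via `Args.getLastArg` and the option aliases rather than calling any clang API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Scan the command line left to right. Each recognised flag overrides the
// previous decision, so the last occurrence wins. The Cortex-A72 spellings
// behave as aliases of the Cortex-A57 ones. Returns std::nullopt when no flag
// was given, in which case no target feature is forwarded at all.
std::optional<std::string>
resolveAESFixFeature(const std::vector<std::string> &Args) {
  std::optional<std::string> Result;
  for (const std::string &A : Args) {
    if (A == "-mfix-cortex-a57-aes-1742098" ||
        A == "-mfix-cortex-a72-aes-1655431")
      Result = "+fix-cortex-a57-aes-1742098";
    else if (A == "-mno-fix-cortex-a57-aes-1742098" ||
             A == "-mno-fix-cortex-a72-aes-1655431")
      Result = "-fix-cortex-a57-aes-1742098";
  }
  return Result;
}
```

The three outcomes correspond to the FIX, NO-FIX and UNSPEC check prefixes in the test above.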

llvm/lib/CodeGen/RDFGraph.cpp

//===- RDFGraph.cpp -------------------------------------------------------===//		//===- RDFGraph.cpp -------------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Target-independent, SSA-based data flow graph for register data flow (RDF).		// Target-independent, SSA-based data flow graph for register data flow (RDF).
[1,078 lines not shown; context: for (NodeAddr<DefNode*> DA : IA.Addr->members_if(IsDef, *this)) {]
if (Visited.count(DA.Id))		if (Visited.count(DA.Id))
continue;		continue;
if (DA.Addr->getFlags() & NodeAttrs::Clobbering)		if (DA.Addr->getFlags() & NodeAttrs::Clobbering)
continue;		continue;

NodeList Rel = getRelatedRefs(IA, DA);		NodeList Rel = getRelatedRefs(IA, DA);
NodeAddr<DefNode*> PDA = Rel.front();		NodeAddr<DefNode*> PDA = Rel.front();
RegisterRef RR = PDA.Addr->getRegRef(*this);		RegisterRef RR = PDA.Addr->getRegRef(*this);
#ifndef NDEBUG		// #ifndef NDEBUG
// Assert if the register is defined in two or more unrelated defs.		// // Assert if the register is defined in two or more unrelated defs.
// This could happen if there are two or more def operands defining it.		// // This could happen if there are two or more def operands defining it.
if (!Defined.insert(RR.Reg).second) {		// if (!Defined.insert(RR.Reg).second) {
MachineInstr *MI = NodeAddr<StmtNode*>(IA).Addr->getCode();		// MachineInstr *MI = NodeAddr<StmtNode*>(IA).Addr->getCode();
dbgs() << "Multiple definitions of register: "		// dbgs() << "Multiple definitions of register: "
<< Print<RegisterRef>(RR, *this) << " in\n " << *MI << "in "		// << Print<RegisterRef>(RR, *this) << " in\n " << *MI << "in "
<< printMBBReference(*MI->getParent()) << '\n';		// << printMBBReference(*MI->getParent()) << '\n';
llvm_unreachable(nullptr);		// llvm_unreachable(nullptr);
}		// }
#endif		// #endif
// Push the definition on the stack for the register and all aliases.		// Push the definition on the stack for the register and all aliases.
// The def stack traversal in linkNodeUp will check the exact aliasing.		// The def stack traversal in linkNodeUp will check the exact aliasing.
DefM[RR.Reg].push(DA);		DefM[RR.Reg].push(DA);
for (RegisterId A : PRI.getAliasSet(RR.Reg)) {		for (RegisterId A : PRI.getAliasSet(RR.Reg)) {
// Check that we don't push the same def twice.		// Check that we don't push the same def twice.
assert(A != RR.Reg);		assert(A != RR.Reg);
DefM[A].push(DA);		DefM[A].push(DA);
}		}
[184 lines not shown; context: for (unsigned OpN = 0; OpN < NumOps; ++OpN) {]
if (TOI.isClobbering(In, OpN))		if (TOI.isClobbering(In, OpN))
Flags |= NodeAttrs::Clobbering;		Flags |= NodeAttrs::Clobbering;
if (TOI.isFixedReg(In, OpN))		if (TOI.isFixedReg(In, OpN))
Flags |= NodeAttrs::Fixed;		Flags |= NodeAttrs::Fixed;
if (IsCall && Op.isDead())		if (IsCall && Op.isDead())
Flags |= NodeAttrs::Dead;		Flags |= NodeAttrs::Dead;
NodeAddr<DefNode*> DA = newDef(SA, Op, Flags);		NodeAddr<DefNode*> DA = newDef(SA, Op, Flags);
SA.Addr->addMember(DA, *this);		SA.Addr->addMember(DA, *this);
assert(!DoneDefs.test(R));		// assert(!DoneDefs.test(R));
DoneDefs.set(R);		DoneDefs.set(R);
}		}

// Process reg-masks (as clobbers).		// Process reg-masks (as clobbers).
BitVector DoneClobbers(TRI.getNumRegs());		BitVector DoneClobbers(TRI.getNumRegs());
for (unsigned OpN = 0; OpN < NumOps; ++OpN) {		for (unsigned OpN = 0; OpN < NumOps; ++OpN) {
MachineOperand &Op = In.getOperand(OpN);		MachineOperand &Op = In.getOperand(OpN);
if (!Op.isRegMask())		if (!Op.isRegMask())
[293 lines not shown; context: #ifndef NDEBUG]
RegisterSet Defs;		RegisterSet Defs;
#endif		#endif

// Link all nodes (upwards in the data-flow) with their reaching defs.		// Link all nodes (upwards in the data-flow) with their reaching defs.
for (NodeAddr<RefNode*> RA : SA.Addr->members_if(P, *this)) {		for (NodeAddr<RefNode*> RA : SA.Addr->members_if(P, *this)) {
uint16_t Kind = RA.Addr->getKind();		uint16_t Kind = RA.Addr->getKind();
assert(Kind == NodeAttrs::Def || Kind == NodeAttrs::Use);		assert(Kind == NodeAttrs::Def || Kind == NodeAttrs::Use);
RegisterRef RR = RA.Addr->getRegRef(*this);		RegisterRef RR = RA.Addr->getRegRef(*this);
#ifndef NDEBUG		// #ifndef NDEBUG
// Do not expect multiple defs of the same reference.		// // Do not expect multiple defs of the same reference.
assert(Kind != NodeAttrs::Def || !Defs.count(RR));		// assert(Kind != NodeAttrs::Def || !Defs.count(RR));
Defs.insert(RR);		// Defs.insert(RR);
#endif		// #endif

auto F = DefM.find(RR.Reg);		auto F = DefM.find(RR.Reg);
if (F == DefM.end())		if (F == DefM.end())
continue;		continue;
DefStack &DS = F->second;		DefStack &DS = F->second;
if (Kind == NodeAttrs::Use)		if (Kind == NodeAttrs::Use)
linkRefUp<UseNode*>(SA, RA, DS);		linkRefUp<UseNode*>(SA, RA, DS);
else if (Kind == NodeAttrs::Def)		else if (Kind == NodeAttrs::Def)
[201 lines not shown]

llvm/lib/Target/ARM/ARM.h

[51 lines not shown; context: FunctionPass *createThumb2SizeReductionPass(]
std::function<bool(const Function &)> Ftor = nullptr);		std::function<bool(const Function &)> Ftor = nullptr);
InstructionSelector *		InstructionSelector *
createARMInstructionSelector(const ARMBaseTargetMachine &TM, const ARMSubtarget &STI,		createARMInstructionSelector(const ARMBaseTargetMachine &TM, const ARMSubtarget &STI,
const ARMRegisterBankInfo &RBI);		const ARMRegisterBankInfo &RBI);
Pass *createMVEGatherScatterLoweringPass();		Pass *createMVEGatherScatterLoweringPass();
FunctionPass *createARMSLSHardeningPass();		FunctionPass *createARMSLSHardeningPass();
FunctionPass *createARMIndirectThunks();		FunctionPass *createARMIndirectThunks();
Pass *createMVELaneInterleavingPass();		Pass *createMVELaneInterleavingPass();
		FunctionPass *createARMFixCortexA57AES1742098Pass();

void LowerARMMachineInstrToMCInst(const MachineInstr *MI, MCInst &OutMI,		void LowerARMMachineInstrToMCInst(const MachineInstr *MI, MCInst &OutMI,
ARMAsmPrinter &AP);		ARMAsmPrinter &AP);

void initializeARMParallelDSPPass(PassRegistry &);		void initializeARMParallelDSPPass(PassRegistry &);
void initializeARMLoadStoreOptPass(PassRegistry &);		void initializeARMLoadStoreOptPass(PassRegistry &);
void initializeARMPreAllocLoadStoreOptPass(PassRegistry &);		void initializeARMPreAllocLoadStoreOptPass(PassRegistry &);
void initializeARMBranchTargetsPass(PassRegistry &);		void initializeARMBranchTargetsPass(PassRegistry &);
void initializeARMConstantIslandsPass(PassRegistry &);		void initializeARMConstantIslandsPass(PassRegistry &);
void initializeARMExpandPseudoPass(PassRegistry &);		void initializeARMExpandPseudoPass(PassRegistry &);
void initializeThumb2SizeReducePass(PassRegistry &);		void initializeThumb2SizeReducePass(PassRegistry &);
void initializeThumb2ITBlockPass(PassRegistry &);		void initializeThumb2ITBlockPass(PassRegistry &);
void initializeMVEVPTBlockPass(PassRegistry &);		void initializeMVEVPTBlockPass(PassRegistry &);
void initializeMVETPAndVPTOptimisationsPass(PassRegistry &);		void initializeMVETPAndVPTOptimisationsPass(PassRegistry &);
void initializeARMLowOverheadLoopsPass(PassRegistry &);		void initializeARMLowOverheadLoopsPass(PassRegistry &);
void initializeARMBlockPlacementPass(PassRegistry &);		void initializeARMBlockPlacementPass(PassRegistry &);
void initializeMVETailPredicationPass(PassRegistry &);		void initializeMVETailPredicationPass(PassRegistry &);
void initializeMVEGatherScatterLoweringPass(PassRegistry &);		void initializeMVEGatherScatterLoweringPass(PassRegistry &);
void initializeARMSLSHardeningPass(PassRegistry &);		void initializeARMSLSHardeningPass(PassRegistry &);
void initializeMVELaneInterleavingPass(PassRegistry &);		void initializeMVELaneInterleavingPass(PassRegistry &);
		void initializeARMFixCortexA57AES1742098Pass(PassRegistry &);

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_ARM_ARM_H		#endif // LLVM_LIB_TARGET_ARM_ARM_H

llvm/lib/Target/ARM/ARM.td

(532 lines not shown)	def FeaturePACBTI : SubtargetFeature<"pacbti", "HasPACBTI", "true",
"Target Identification">;		"Target Identification">;

/// Don't place a BTI instruction after return-twice constructs (setjmp).		/// Don't place a BTI instruction after return-twice constructs (setjmp).
def FeatureNoBTIAtReturnTwice : SubtargetFeature<"no-bti-at-return-twice",		def FeatureNoBTIAtReturnTwice : SubtargetFeature<"no-bti-at-return-twice",
"NoBTIAtReturnTwice", "true",		"NoBTIAtReturnTwice", "true",
"Don't place a BTI instruction "		"Don't place a BTI instruction "
"after a return-twice">;		"after a return-twice">;

		def FeatureFixCortexA57AES1742098 : SubtargetFeature<"fix-cortex-a57-aes-1742098",
		"FixCortexA57AES1742098", "true",
		"Work around Cortex-A57 Erratum 1742098 / Cortex-A72 Erratum 1655431 (AES)">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ARM architecture class		// ARM architecture class
//		//

// A-series ISA		// A-series ISA
def FeatureAClass : SubtargetFeature<"aclass", "ARMProcClass", "AClass",		def FeatureAClass : SubtargetFeature<"aclass", "ARMProcClass", "AClass",
"Is application profile ('A' series)">;		"Is application profile ('A' series)">;

(599 lines not shown)
include "ARMScheduleA57.td"		include "ARMScheduleA57.td"
include "ARMScheduleM4.td"		include "ARMScheduleM4.td"
include "ARMScheduleM7.td"		include "ARMScheduleM7.td"

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ARM processors		// ARM processors
//		//
// Dummy CPU, used to target architectures		// Dummy CPU, used to target architectures
def : ProcessorModel<"generic", CortexA8Model, []>;		def : ProcessorModel<"generic", CortexA8Model, [FeatureFixCortexA57AES1742098]>;

// FIXME: Several processors below are not using their own scheduler		// FIXME: Several processors below are not using their own scheduler
// model, but one of similar/previous processor. These should be fixed.		// model, but one of similar/previous processor. These should be fixed.

def : ProcNoItin<"arm8", [ARMv4]>;		def : ProcNoItin<"arm8", [ARMv4]>;
def : ProcNoItin<"arm810", [ARMv4]>;		def : ProcNoItin<"arm810", [ARMv4]>;
def : ProcNoItin<"strongarm", [ARMv4]>;		def : ProcNoItin<"strongarm", [ARMv4]>;
def : ProcNoItin<"strongarm110", [ARMv4]>;		def : ProcNoItin<"strongarm110", [ARMv4]>;
(292 lines not shown)

def : ProcessorModel<"cortex-a57", CortexA57Model, [ARMv8a, ProcA57,		def : ProcessorModel<"cortex-a57", CortexA57Model, [ARMv8a, ProcA57,
FeatureHWDivThumb,		FeatureHWDivThumb,
FeatureHWDivARM,		FeatureHWDivARM,
FeatureCrypto,		FeatureCrypto,
FeatureCRC,		FeatureCRC,
FeatureFPAO,		FeatureFPAO,
FeatureAvoidPartialCPSR,		FeatureAvoidPartialCPSR,
FeatureCheapPredicableCPSR]>;		FeatureCheapPredicableCPSR,
		FeatureFixCortexA57AES1742098]>;

def : ProcessorModel<"cortex-a72", CortexA57Model, [ARMv8a, ProcA72,		def : ProcessorModel<"cortex-a72", CortexA57Model, [ARMv8a, ProcA72,
FeatureHWDivThumb,		FeatureHWDivThumb,
FeatureHWDivARM,		FeatureHWDivARM,
FeatureCrypto,		FeatureCrypto,
FeatureCRC]>;		FeatureCRC,
		FeatureFixCortexA57AES1742098]>;

def : ProcNoItin<"cortex-a73", [ARMv8a, ProcA73,		def : ProcNoItin<"cortex-a73", [ARMv8a, ProcA73,
FeatureHWDivThumb,		FeatureHWDivThumb,
FeatureHWDivARM,		FeatureHWDivARM,
FeatureCrypto,		FeatureCrypto,
FeatureCRC]>;		FeatureCRC]>;

def : ProcNoItin<"cortex-a75", [ARMv82a, ProcA75,		def : ProcNoItin<"cortex-a75", [ARMv82a, ProcA75,
(151 lines not shown)

llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp

This file was added.

//===-- ARMFixCortexA57AES1742098Pass.cpp ---------------------------------===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

// This pass works around a Cortex Core Fused AES erratum:

// - Cortex-A57 Erratum 1742098

// - Cortex-A72 Erratum 1655431

// The intention is this:

// - Any 128-bit or 64-bit writes to the neon input register of an AES fused

// pair are safe (the inputs are to the AESE/AESD instruction).

// - Any 32-bit writes to the input register are unsafe, but these may happen

// in another function, or only on some control flow paths. In these cases,

// conservatively insert the VORRq anyway.

// - So, analyse both inputs to the AESE/AESD instruction, inserting a VORR if

// you cannot prove they're on a list of allowed instructions.

//===----------------------------------------------------------------------===//

simon_tatham:

// - Cortex-A72 Erratum 1655431

- // The intention is this:

- // - Any 128-bit or 64-bit writes to the neon input register of an AES fused

- // pair are safe (the inputs are to the AESE/AESD instruction).

- // - Any 32-bit writes to the input register are unsafe, but these may happen

- // in another function, or only on some control flow paths. In these cases,

- // conservatively insert the VORRq anyway.

- // - So, analyse both inputs to the AESE/AESD instruction, inserting a VORR if

- // you cannot prove they're on a list of allowed instructions.

+ // This erratum is triggered if an input vector register to AESE or AESD was

+ // last written by an instruction that only updated 32 bits of it. A workaround

+ // is to insert r = VORRq r,r before the AES instruction, because that updates

+ // all 128 bits of the register unconditionally, making it safe.

+ //

+ // This check must be applied to both input registers of AESE/AESD.

+ //

+ // If more than one instruction might have been the last write to the register,

+ // then the VORRq must be inserted if any of the possibilities was unsafe.

//===----------------------------------------------------------------------===//

This description would leave me still confused if I didn't happen to already know roughly what the plan was. It jumps in half way through the explanation that someone would need if they were coming to this pass cold. (E.g. it talks about "the VORRq" before having even mentioned that there is one, let alone why there is one.)

How about the suggested text as a rewording?

lenary (author):

I have taken this feedback on board, and reworded the explanation based on your suggestion, with some slight ordering changes and a little more detail about the complex cases and some separation between the erratum itself, and the workaround we have chosen.
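The rule in the suggested comment can be sketched as a small standalone C++ model. This is only an illustration under stated assumptions, not the code in the patch: `WriteWidth`, `ModelDef`, and `needsVORRq` are hypothetical names, and the boolean flag stands for a value whose last writer lies outside the function being analysed.

```cpp
#include <vector>

// Hypothetical model: how many bits of the vector register a write updated.
enum class WriteWidth { Bits32, Bits64, Bits128 };

// One instruction that might have been the last write to an AESE/AESD input.
struct ModelDef {
  WriteWidth Width;
};

// Returns true if `r = VORRq r, r` should be inserted before the AES
// instruction. The fix is needed if any possible last write only updated
// 32 bits, or if the last write cannot be seen at all (no known writers, or
// the value may come from outside the function).
bool needsVORRq(const std::vector<ModelDef> &PossibleLastWrites,
                bool MayComeFromOutsideFunction) {
  if (MayComeFromOutsideFunction || PossibleLastWrites.empty())
    return true;
  for (const ModelDef &D : PossibleLastWrites)
    if (D.Width == WriteWidth::Bits32)
      return true;
  return false;
}
```

The model is deliberately conservative: anything it cannot prove safe gets the VORRq, which matches the stated intent of the pass.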


#include "ARM.h"

#include "ARMBaseInstrInfo.h"

#include "ARMBaseRegisterInfo.h"

#include "ARMSubtarget.h"

#include "Utils/ARMBaseInfo.h"

#include "llvm/ADT/STLExtras.h"

#include "llvm/ADT/SmallPtrSet.h"

#include "llvm/ADT/SmallVector.h"

#include "llvm/ADT/StringRef.h"

#include "llvm/CodeGen/MachineBasicBlock.h"

#include "llvm/CodeGen/MachineFunction.h"

#include "llvm/CodeGen/MachineFunctionPass.h"

#include "llvm/CodeGen/MachineInstr.h"

#include "llvm/CodeGen/MachineInstrBuilder.h"

#include "llvm/CodeGen/MachineInstrBundleIterator.h"

#include "llvm/CodeGen/MachineOperand.h"

#include "llvm/CodeGen/ReachingDefAnalysis.h"

#include "llvm/CodeGen/Register.h"

#include "llvm/CodeGen/TargetRegisterInfo.h"

#include "llvm/IR/DebugLoc.h"

#include "llvm/InitializePasses.h"

#include "llvm/MC/MCInstrDesc.h"

#include "llvm/Pass.h"

#include "llvm/PassRegistry.h"

#include "llvm/Support/Debug.h"

#include "llvm/Support/raw_ostream.h"

#include <assert.h>

#include <stdint.h>

using namespace llvm;

#define DEBUG_TYPE "arm-fix-cortex-a57-aes-1742098"

//===----------------------------------------------------------------------===//

namespace {

class ARMFixCortexA57AES1742098 : public MachineFunctionPass {

public:

static char ID;

explicit ARMFixCortexA57AES1742098() : MachineFunctionPass(ID) {

initializeARMFixCortexA57AES1742098Pass(*PassRegistry::getPassRegistry());

}

bool runOnMachineFunction(MachineFunction &F) override;

MachineFunctionProperties getRequiredProperties() const override {

return MachineFunctionProperties().set(

MachineFunctionProperties::Property::NoVRegs);

}

StringRef getPassName() const override {

return "ARM fix for Cortex-A57 AES Erratum 1742098";

}

void getAnalysisUsage(AnalysisUsage &AU) const override {

AU.addRequired<ReachingDefAnalysis>();

AU.setPreservesCFG();

MachineFunctionPass::getAnalysisUsage(AU);

}

private:

// This is the information needed to insert the fixup in the right place.

struct AESFixupLocation {

MachineBasicBlock *Block;

// The fixup instruction will be inserted *before* InsertionPt.

MachineInstr *InsertionPt;

MachineOperand *MOp;

};

void analyzeMF(MachineFunction &MF, ReachingDefAnalysis &RDA,

const ARMBaseRegisterInfo *TRI,

SmallVectorImpl<AESFixupLocation> &FixupLocsForFn) const;

void insertAESFixup(AESFixupLocation &FixupLoc, const ARMBaseInstrInfo *TII,

const ARMBaseRegisterInfo *TRI) const;

static bool isFirstAESPairInstr(MachineInstr &MI);

static bool isSafeAESInput(MachineInstr &MI);

};

char ARMFixCortexA57AES1742098::ID = 0;

} // end anonymous namespace

INITIALIZE_PASS_BEGIN(ARMFixCortexA57AES1742098, DEBUG_TYPE,

"ARM fix for Cortex-A57 AES Erratum 1742098", false,

false)

INITIALIZE_PASS_DEPENDENCY(ReachingDefAnalysis);

INITIALIZE_PASS_END(ARMFixCortexA57AES1742098, DEBUG_TYPE,

"ARM fix for Cortex-A57 AES Erratum 1742098", false, false)

//===----------------------------------------------------------------------===//

bool ARMFixCortexA57AES1742098::isFirstAESPairInstr(MachineInstr &MI) {

unsigned Opc = MI.getOpcode();

return Opc == ARM::AESD || Opc == ARM::AESE;

}

bool ARMFixCortexA57AES1742098::isSafeAESInput(MachineInstr &MI) {

auto CondCodeIsAL = [&MI](unsigned CCIdx) -> bool {

assert(MI.getDesc().OpInfo[CCIdx].isPredicate() &&

"CCIdx operand must be a predicate");

return MI.getOperand(CCIdx).getImm() == (int64_t)ARMCC::AL;

};

switch (MI.getOpcode()) {

// Unknown: Assume not safe.

default:

return false;

// 128-bit wide AES instructions

case ARM::AESD:

case ARM::AESE:

case ARM::AESMC:

case ARM::AESIMC:

return true;

// 128-bit and 64-bit wide bitwise ops (when condition = al)

case ARM::VANDd:

case ARM::VANDq:

case ARM::VORRd:

case ARM::VORRq:

case ARM::VEORd:

case ARM::VEORq:

case ARM::VMVNd:

case ARM::VMVNq:

return CondCodeIsAL(3);

// VMOV of 64-bit value between D registers (when condition = al)

dmgreen:

Can/should all these use findFirstPredOperandIdx?

And is it worth checking for more instructions? Anything that sets a Q register, or is that too broad?

lenary (author):

findFirstPredOperandIdx doesn't work, as lots of these instructions are not marked isPredicable in the tablegen. I'm not sure if we want to solve that in this work, or as a follow-up (I'd lean towards follow-up).

I believe "anything that sets a Q register" is too broad, as we model subregister insertion as setting the whole register in LLVM, but I'm not sure that micro-architecturally they are actually doing that. This is why I've tried to only add 64- and 128-bit setting instructions rather than ones that are less wide. Originally I also included the VMOVv2f32 instructions that are now at the bottom of this switch, but I felt that might have been too risky.

lenary (author):

I'm wrong about this. The tablegen isPredicable is an override, other code might also set isPredicable to true, so I think findFirstPredOperandIdx should work too.

case ARM::VMOVD:

dmgreen:

Perhaps add these, if they are safe:
VBICd/q
VBICi's, VORRi's
VBIF/VBIT/VBSL/VBSP
VCEQ/VCNE/etc?
VDUP? VEXT?
VMVN imm equivalents of VMOV's
VREV's?
VSHL's, VSHR's?
I'm not sure if they will be very useful, but they are the kind of instructions that may come up in AES algorithms.

lenary (author):

I'm very keen to avoid scope creep on this patch, so I'm going to push back on this comment.

We know this list as given is safe (and have had internal confirmation). I've sent a new email internally with your list of instructions, to find out if they're safe too, but I'd like any answer to that to be part of a follow-up patch rather than blocking this patch for yet longer.

I believe what I have today is correct, even if the list is not optimal for all expected AES code sequences.

dmgreen:

Yeah that sounds OK. So long as you address Simon's comments and follow up with the instructions at a later date, this LGTM.

return CondCodeIsAL(2);

// VMOV of 64 bit value from GPRs (when condition = al)

case ARM::VMOVDRR:

return CondCodeIsAL(3);

// VMOV of 64-bit immediate into D or Q registers (when condition = al)

case ARM::VMOVv2i64:

case ARM::VMOVv1i64:

return CondCodeIsAL(2);

// Loads (when condition = al)

// VLD Dn, [Rn, #imm]

case ARM::VLDRD:

return CondCodeIsAL(3);

// VLDM

case ARM::VLDMDDB_UPD:

case ARM::VLDMDIA_UPD:

return CondCodeIsAL(2);

case ARM::VLDMDIA:

return CondCodeIsAL(1);

// VLDn to all lanes (to one single lane is unsafe).

case ARM::VLD1d64:

case ARM::VLD1q64:

case ARM::VLD1d32:

case ARM::VLD1q32:

case ARM::VLD2b32:

case ARM::VLD2d32:

case ARM::VLD2q32:

case ARM::VLD1d16:

case ARM::VLD1q16:

case ARM::VLD2d16:

case ARM::VLD2q16:

case ARM::VLD1d8:

case ARM::VLD1q8:

case ARM::VLD2b8:

case ARM::VLD2d8:

case ARM::VLD2q8:

return CondCodeIsAL(3);

case ARM::VLD3d32:

case ARM::VLD3q32:

case ARM::VLD3d16:

case ARM::VLD3q16:

case ARM::VLD3d8:

case ARM::VLD3q8:

return CondCodeIsAL(5);

case ARM::VLD4d32:

case ARM::VLD4q32:

case ARM::VLD4d16:

case ARM::VLD4q16:

case ARM::VLD4d8:

case ARM::VLD4q8:

return CondCodeIsAL(6);

// Always Unsafe:

// VMOV of smaller immediate into D or Q

case ARM::VMOVv2f32:

case ARM::VMOVv4f32:

case ARM::VMOVv2i32:

case ARM::VMOVv4i32:

case ARM::VMOVv4i16:

case ARM::VMOVv8i16:

case ARM::VMOVv8i8:

case ARM::VMOVv16i8:

return false;

dmgreen:

Are these VMOVs of an immediate? Are they not safe?

I was expecting it to be the lane sets (VSETLNi8) and other scalar instructions that were unsafe.

lenary (author):

I have received the information on what is safe and what is not, and the next version of the patch will have this correct.

};

return false;

}
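The inline discussion in this function asks whether the hard-coded operand indices passed to `CondCodeIsAL` could be replaced by a lookup of the first predicate operand. A minimal standalone sketch of that idea follows. `ModelOperand`, `findFirstPredicateIdx`, and `isUnconditional` are hypothetical names that only model the shape of the lookup; they are not LLVM types or functions. Treating an instruction with no predicate operand as "not provably unconditional" is an assumption made here to stay conservative.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for per-operand metadata; not an LLVM type.
struct ModelOperand {
  bool IsPredicate; // true for the condition-code operand
  int64_t Imm;      // condition-code value when IsPredicate is true
};

// ARM "always" condition code (encoding 0b1110).
constexpr int64_t CondAL = 14;

// Scan for the first predicate operand rather than hard-coding its index.
std::optional<unsigned>
findFirstPredicateIdx(const std::vector<ModelOperand> &Ops) {
  for (unsigned I = 0; I < Ops.size(); ++I)
    if (Ops[I].IsPredicate)
      return I;
  return std::nullopt;
}

// Only report "unconditional" when a predicate operand exists and is AL.
bool isUnconditional(const std::vector<ModelOperand> &Ops) {
  std::optional<unsigned> Idx = findFirstPredicateIdx(Ops);
  return Idx.has_value() && Ops[*Idx].Imm == CondAL;
}
```

With a lookup like this, each `case` group would no longer need to know where the predicate sits in its operand list, which is the maintenance concern behind the review comment.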

bool ARMFixCortexA57AES1742098::runOnMachineFunction(MachineFunction &F) {

LLVM_DEBUG(dbgs() << "***** ARMFixCortexA57AES1742098 *****\n");

auto &STI = F.getSubtarget<ARMSubtarget>();

// Fix not requested or AES instructions not present: skip pass.

if (!STI.hasAES() || !STI.fixCortexA57AES1742098())

return false;

const ARMBaseRegisterInfo *TRI = STI.getRegisterInfo();

const ARMBaseInstrInfo *TII = STI.getInstrInfo();

auto &RDA = getAnalysis<ReachingDefAnalysis>();

// Analyze whole function to find instructions which need fixing up...

SmallVector<AESFixupLocation> FixupLocsForFn{};

analyzeMF(F, RDA, TRI, FixupLocsForFn);

// ... and fix the instructions up all at the same time.

bool Changed = false;

LLVM_DEBUG(dbgs() << "Inserting " << FixupLocsForFn.size() << " fixup(s)\n");

for (AESFixupLocation &FixupLoc : FixupLocsForFn) {

insertAESFixup(FixupLoc, TII, TRI);

Changed |= true;

}

return Changed;

}

void ARMFixCortexA57AES1742098::analyzeMF(

MachineFunction &MF, ReachingDefAnalysis &RDA,

const ARMBaseRegisterInfo *TRI,

SmallVectorImpl<AESFixupLocation> &FixupLocsForFn) const {

unsigned MaxAllowedFixups = 0;

for (MachineBasicBlock &MBB : MF) {

// Early return if no instructions are the start of an AES Pair.

if (!llvm::any_of(MBB.instrs(), isFirstAESPairInstr))

continue;

dmgreen:

This needn't scan through checking for the instruction that the loop below is going to check for too.

lenary (author):

Ack. Will remove. I think this is vestigial from a previous (unshared) version of the patch which was doing something more complex in the loop.

for (MachineInstr &MI : MBB) {

if (!isFirstAESPairInstr(MI))

continue;

// Found an instruction to check the operands of.

LLVM_DEBUG(dbgs() << "Found AES Pair starting: " << MI);

assert(MI.getNumExplicitOperands() == 3 && MI.getNumExplicitDefs() == 1 &&

"Unknown AES Instruction Format. Expected 1 def, 2 uses.");

// A maximum of two fixups should be inserted for each AES pair (one per

// register use).

MaxAllowedFixups += 2;

// Inspect all operands, choosing whether to insert a fixup.

for (MachineOperand &MOp : MI.uses()) {

SmallPtrSet<MachineInstr *, 1> AllDefs{};

RDA.getGlobalReachingDefs(&MI, MOp.getReg(), AllDefs);

// Planned Fixup: This should be added to FixupLocsForFn at most once.

AESFixupLocation NewLoc{&MBB, &MI, &MOp};

// In small functions with loops, this operand may be both a live-in and

// have definitions within the function itself. These will need a fixup.

bool IsLiveIn = MF.front().isLiveIn(MOp.getReg());

// If the register doesn't have defining instructions, and is not a

// live-in, then something is wrong and the fixup must always be

// inserted to be safe.

if (!IsLiveIn && AllDefs.size() == 0) {

LLVM_DEBUG(dbgs()

<< "Fixup Planned: No Defining Instrs found, not live-in: "

<< printReg(MOp.getReg(), TRI) << "\n");

FixupLocsForFn.emplace_back(NewLoc);

continue;

}

auto IsUnsafe = [](MachineInstr *MI) -> bool {

return !isSafeAESInput(*MI);

};

size_t UnsafeCount = llvm::count_if(AllDefs, IsUnsafe);

// If there are no unsafe definitions...

if (UnsafeCount == 0) {

// ... and the register is not live-in ...

if (!IsLiveIn) {

// ... then skip the fixup.

LLVM_DEBUG(dbgs() << "No Fixup: Defining instrs are all safe: "

<< printReg(MOp.getReg(), TRI) << "\n");

continue;

}

// Otherwise, the only unsafe "definition" is a live-in, so insert the

// fixup at the start of the function.

LLVM_DEBUG(dbgs()

<< "Fixup Planned: Live-In (with safe defining instrs): "

dmgreen:

Can you explain more about the IsLiveIn && UnsafeCount==0 case? Am I understanding correctly that it would be:

function(q0, ...)
  lotsofcode...
  q0 = load
  aes q0

Is there a better way to detect that the live-in doesn't matter in cases like that?

lenary (author):

I don't believe there is, and this comes down to issues with the RDA.getGlobalReachingDefs implementation, which I want to fix/enhance, but in a follow-up patch.

To start with, this is actually not a problem, as the pass is intended to be conservative, and we know the clobbers are a no-op at the architectural level (we insert them for their micro-architectural effects). So code will still do the right thing, but maybe with a little too much overhead in the case you showed.

However, this is necessary in some other cases, such as:

function(q0)
   code
   conditional branch to L2
L1:
   q0 = safe_op(…)
   branch to L3
L2:
   code without update to q0
L3:
   aes q0

In this case, AllDefs is a set containing one single defining instruction for Q0, because there is only one within the function (which is all that RDA.getGlobalReachingDefs can report instructions for).

But in my example, we *need* to protect q0 on the other paths, because the safe definitions of q0, when considered as a set, do not entirely dominate the AES use of q0 (this is slightly stretching the conventional definition of dominance, but think of this as "there exists a path from entry to the aes, which does not contain any of the safe instructions". Sadly, MachineDominance doesn't allow us to make this kind of query either!).

In this case though, it is safe to insert the protection at function entry, because that will (by definition) dominate all the AES uses, and the protection doesn't need to be dominated by the safe definitions, as we know they're safe.

I intend to follow up this initial patch with an enhancement to ReachingDefAnalysis which will report both that there is a set of defs inside the function and that the register is live-in, as this is required info for any conservative pass using the ReachingDefAnalysis. I felt, however, that given the pass is safe as-is, it was good to proceed without this enhancement.

dmgreen:

OK sounds good.

lenary (author):

One note is that the exact problem you describe does show up in the tests (in aese_set64_via_ptr, where the vldr is "safe"), so when the pass is enhanced, we will notice the improvements.
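The query described in this thread ("there exists a path from entry to the AES instruction which does not contain any of the safe instructions") can be sketched as a plain graph search. This is a standalone model at basic-block granularity, not an LLVM API: `liveInCanReachUse`, the successor lists, and the `SafeDef` flags are all hypothetical. It assumes block 0 is the entry block, and that a safe definition in a block precedes any use in that same block.

```cpp
#include <vector>

// Succs[B] lists the successors of block B; block 0 is the function entry.
// SafeDef[B] is true if block B contains a safe definition of the register.
// Returns true if some path from entry reaches UseBlock without passing
// through any block that holds a safe definition, meaning the live-in value
// can still be the one the AES instruction reads.
bool liveInCanReachUse(const std::vector<std::vector<unsigned>> &Succs,
                       const std::vector<bool> &SafeDef, unsigned UseBlock) {
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<unsigned> Worklist;
  if (!SafeDef[0]) {
    Visited[0] = true;
    Worklist.push_back(0);
  }
  while (!Worklist.empty()) {
    unsigned B = Worklist.back();
    Worklist.pop_back();
    if (B == UseBlock)
      return true;
    for (unsigned S : Succs[B]) {
      // Never walk into a block with a safe definition: it covers the path.
      if (!Visited[S] && !SafeDef[S]) {
        Visited[S] = true;
        Worklist.push_back(S);
      }
    }
  }
  return false;
}
```

For the L1/L2/L3 example above (entry = 0, L1 = 1, L2 = 2, L3 = 3), the path through L2 avoids the safe definition, so the search reports that the live-in still needs protecting. For the single-block `q0 = load; aes q0` example it reports that it does not, which is the case the current pass handles conservatively.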


<< printReg(MOp.getReg(), TRI) << "\n");

NewLoc.Block = &MF.front();

NewLoc.InsertionPt = &*NewLoc.Block->begin();

LLVM_DEBUG(dbgs() << "Moving Fixup for Live-In to immediately before "

simon_tatham:

nit: repeated word

<< *NewLoc.InsertionPt);

FixupLocsForFn.emplace_back(NewLoc);

continue;

}

// If a fixup is needed in more than one place, then the best place to

// insert it is adjacent to the use rather than introducing a fixup

// adjacent to each def.

// FIXME: It might be better to hoist this to the start of the BB, if

// possible.

if (IsLiveIn || UnsafeCount > 1) {

LLVM_DEBUG(dbgs() << "Fixup Planned: Multiple unsafe defining instrs "

"(including live-ins): "

<< printReg(MOp.getReg(), TRI) << "\n");

FixupLocsForFn.emplace_back(NewLoc);

continue;

}

assert(UnsafeCount == 1 && !IsLiveIn &&

"At this point, there should be one unsafe defining instrs "

"and the defined register should not be a live-in.");

SmallPtrSetIterator<MachineInstr *> It =

llvm::find_if(AllDefs, IsUnsafe);

assert(It != AllDefs.end() &&

"UnsafeCount == 1 but No Unsafe MachineInstr found.");

MachineInstr *DefMI = *It;

LLVM_DEBUG(

dbgs() << "Fixup Planned: Found single unsafe defining instrs for "

<< printReg(MOp.getReg(), TRI) << ": " << *DefMI);

// There is one unsafe defining instruction, which needs a fixup. It is

// generally good to hoist the fixup to be adjacent to the defining

// instruction rather than the using instruction, as the using

// instruction may be inside a loop when the defining instruction is

// not.

MachineBasicBlock::iterator DefIt = DefMI;

++DefIt;

if (DefIt != DefMI->getParent()->end()) {

LLVM_DEBUG(dbgs() << "Moving Fixup to immediately after " << *DefMI

<< "And immediately before " << *DefIt);

NewLoc.Block = DefIt->getParent();

NewLoc.InsertionPt = &*DefIt;

}

FixupLocsForFn.emplace_back(NewLoc);

}

assert(FixupLocsForFn.size() <= MaxAllowedFixups &&

"Inserted too many fixups for this function.");

}
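The placement decisions made in `analyzeMF` above can be summarised as a small pure function. This is a restatement for readers, not code from the patch: `FixupPlacement` and `chooseFixupPlacement` are hypothetical names, and the three inputs correspond to `IsLiveIn`, `AllDefs.size()`, and `UnsafeCount` in the function above.

```cpp
#include <cstddef>

// Where the VORRq fixup for one AES operand ends up.
enum class FixupPlacement { None, BeforeUse, AfterDef, FunctionEntry };

FixupPlacement chooseFixupPlacement(bool IsLiveIn, std::size_t NumDefs,
                                    std::size_t UnsafeCount) {
  // No visible definition and not a live-in: be safe, fix at the use.
  if (!IsLiveIn && NumDefs == 0)
    return FixupPlacement::BeforeUse;
  // Every visible definition is safe.
  if (UnsafeCount == 0)
    return IsLiveIn ? FixupPlacement::FunctionEntry : FixupPlacement::None;
  // Several unsafe sources (a live-in counts as one): one fixup at the use.
  if (IsLiveIn || UnsafeCount > 1)
    return FixupPlacement::BeforeUse;
  // Exactly one unsafe definition: fix it right after that definition.
  return FixupPlacement::AfterDef;
}
```

One detail the sketch leaves out: in the pass, the "after the definition" case keeps the fixup at the use when the defining instruction is the last one in its block.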

void ARMFixCortexA57AES1742098::insertAESFixup(

AESFixupLocation &FixupLoc, const ARMBaseInstrInfo *TII,

const ARMBaseRegisterInfo *TRI) const {

MachineOperand *OperandToFixup = FixupLoc.MOp;

assert(OperandToFixup->isReg() && "OperandToFixup must be a register");

Register RegToFixup = OperandToFixup->getReg();

LLVM_DEBUG(dbgs() << "Inserting VORRq of " << printReg(RegToFixup, TRI)

<< " before: " << *FixupLoc.InsertionPt);

// Insert the new `VORRq qN, qN, qN`. There are a few details here:

// The uses are marked as killed, even if the original use of OperandToFixup

// is not killed, as the new instruction is clobbering the register. This is

// safe even if there are other uses of `qN`, as the VORRq is value-wise a no-op

// (it is inserted for microarchitectural reasons).

// The def and the uses are still marked as Renamable if the original register

// was, to avoid having to rummage through all the other uses and defs and

// unset their renamable bits.

unsigned Renamable = OperandToFixup->isRenamable() ? RegState::Renamable : 0;

BuildMI(*FixupLoc.Block, FixupLoc.InsertionPt, DebugLoc(),

TII->get(ARM::VORRq))

.addReg(RegToFixup, RegState::Define | Renamable)

.addReg(RegToFixup, RegState::Kill | Renamable)

.addImm((uint64_t)ARMCC::AL)

.addReg(ARM::NoRegister);

}

// Factory function used by ARMTargetMachine to add the pass to

// the passmanager.

FunctionPass *llvm::createARMFixCortexA57AES1742098Pass() {

return new ARMFixCortexA57AES1742098();

}

llvm/lib/Target/ARM/ARMSubtarget.h

//===-- ARMSubtarget.h - Define Subtarget for the ARM ----------- C++ ---===//		//===-- ARMSubtarget.h - Define Subtarget for the ARM ----------- C++ ---===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file declares the ARM specific subclass of TargetSubtargetInfo.		// This file declares the ARM specific subclass of TargetSubtargetInfo.
(525 lines not shown)	getMVEVectorCostFactor(TargetTransformInfo::TargetCostKind CostKind) const {
if (CostKind == TargetTransformInfo::TCK_CodeSize)		if (CostKind == TargetTransformInfo::TCK_CodeSize)
return 1;		return 1;
return MVEVectorCostFactor;		return MVEVectorCostFactor;
}		}

bool ignoreCSRForAllocationOrder(const MachineFunction &MF,		bool ignoreCSRForAllocationOrder(const MachineFunction &MF,
unsigned PhysReg) const override;		unsigned PhysReg) const override;
unsigned getGPRAllocationOrder(const MachineFunction &MF) const;		unsigned getGPRAllocationOrder(const MachineFunction &MF) const;

		lenary (author): I missed this, will update again shortly.
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_ARM_ARMSUBTARGET_H		#endif // LLVM_LIB_TARGET_ARM_ARMSUBTARGET_H

llvm/lib/Target/ARM/ARMTargetMachine.cpp

(101 lines not shown)	extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeARMTarget() {
initializeMVEVPTBlockPass(Registry);		initializeMVEVPTBlockPass(Registry);
initializeMVETPAndVPTOptimisationsPass(Registry);		initializeMVETPAndVPTOptimisationsPass(Registry);
initializeMVETailPredicationPass(Registry);		initializeMVETailPredicationPass(Registry);
initializeARMLowOverheadLoopsPass(Registry);		initializeARMLowOverheadLoopsPass(Registry);
initializeARMBlockPlacementPass(Registry);		initializeARMBlockPlacementPass(Registry);
initializeMVEGatherScatterLoweringPass(Registry);		initializeMVEGatherScatterLoweringPass(Registry);
initializeARMSLSHardeningPass(Registry);		initializeARMSLSHardeningPass(Registry);
initializeMVELaneInterleavingPass(Registry);		initializeMVELaneInterleavingPass(Registry);
		initializeARMFixCortexA57AES1742098Pass(Registry);
}		}

static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {		static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
if (TT.isOSBinFormatMachO())		if (TT.isOSBinFormatMachO())
return std::make_unique<TargetLoweringObjectFileMachO>();		return std::make_unique<TargetLoweringObjectFileMachO>();
if (TT.isOSWindows())		if (TT.isOSWindows())
return std::make_unique<TargetLoweringObjectFileCOFF>();		return std::make_unique<TargetLoweringObjectFileCOFF>();
return std::make_unique<ARMElfTargetObjectFile>();		return std::make_unique<ARMElfTargetObjectFile>();
(455 lines not shown)	void ARMPassConfig::addPreEmitPass() {
if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
addPass(createARMBlockPlacementPass());		addPass(createARMBlockPlacementPass());
addPass(createARMOptimizeBarriersPass());		addPass(createARMOptimizeBarriersPass());
}		}
}		}

void ARMPassConfig::addPreEmitPass2() {		void ARMPassConfig::addPreEmitPass2() {
addPass(createARMBranchTargetsPass());		addPass(createARMBranchTargetsPass());
addPass(createARMConstantIslandPass());		addPass(createARMConstantIslandPass());
addPass(createARMLowOverheadLoopsPass());		addPass(createARMLowOverheadLoopsPass());
		dmgreen: "No new instructions may be inserted" -> "Block sizes cannot be increased". And maybe "will affect the offsets used for accessing these constants." -> "may push the branch ranges and load offsets of accessing constant pools out of range."

		addPass(createARMFixCortexA57AES1742098Pass());
		dmgreen: Passes can't insert new instructions (or move things further apart) after ConstantIslandPass. The branches/constant pools it has created may go out of range of the instructions that use them. Would it be OK to move it before that?
		lenary: TIL. I'll add a comment about the constant island pass as well. Should I also look at the restrictions on the Branch Targets pass? I can imagine we also don't want to separate instructions once we've calculated their targets locally?
		dmgreen: Yeah - It sounds like the BTI would need to remain as the first instruction in the block.
		lenary: Turns out the AES pass doesn't have to come before the BTI pass, because AES instructions are only available on A-profile, and BTI is M-profile. I will still move it to before all these passes anyway, just so it's clear what is going on.
		dmgreen: It's not about "not inserting instructions" exactly - it will replace pseudos with all kinds of new instructions :) The pseudos needed to have a conservative size through ConstantIslandPass to allow that, though. It does make sure that it will not move instructions further apart from their targets.
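
To make the concern in this thread concrete, here is a toy model of why growing a block after constant-island placement is risky. It is only a sketch: `ToyBlock`, `blockOffset`, `poolInRange` and the `MaxDisp` limit are hypothetical names and values invented for illustration, not anything taken from ARMConstantIslandPass.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: none of these types or functions exist in LLVM.
struct ToyBlock {
  uint32_t SizeInBytes; // total encoded size of the block
};

// Byte offset at which block `Index` starts, i.e. the sum of the sizes of
// all blocks laid out before it.
uint32_t blockOffset(const std::vector<ToyBlock> &Blocks, std::size_t Index) {
  assert(Index <= Blocks.size() && "index past the end of the function");
  uint32_t Offset = 0;
  for (std::size_t I = 0; I < Index; ++I)
    Offset += Blocks[I].SizeInBytes;
  return Offset;
}

// True if a pc-relative load at the start of block `User` can still reach a
// constant pool placed at the start of block `Pool`. `MaxDisp` is an assumed
// maximum forward displacement, not a real encoding limit.
bool poolInRange(const std::vector<ToyBlock> &Blocks, std::size_t User,
                 std::size_t Pool, uint32_t MaxDisp) {
  uint32_t From = blockOffset(Blocks, User);
  uint32_t To = blockOffset(Blocks, Pool);
  return To >= From && To - From <= MaxDisp;
}
```

With blocks of 8, 1016 and 4 bytes and an assumed limit of 1024, the pool in the third block is exactly in range of the first. Adding a single 4-byte instruction to the middle block afterwards pushes it out of range, which is the situation a pass running after constant-island placement has to avoid.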

if (TM->getTargetTriple().isOSWindows()) {		if (TM->getTargetTriple().isOSWindows()) {
// Identify valid longjmp targets for Windows Control Flow Guard.		// Identify valid longjmp targets for Windows Control Flow Guard.
addPass(createCFGuardLongjmpPass());		addPass(createCFGuardLongjmpPass());
// Identify valid eh continuation targets for Windows EHCont Guard.		// Identify valid eh continuation targets for Windows EHCont Guard.
addPass(createEHContGuardCatchretPass());		addPass(createEHContGuardCatchretPass());
}		}
}		}

llvm/lib/Target/ARM/CMakeLists.txt

Show All 26 Lines	add_llvm_target(ARMCodeGen
ARMBasicBlockInfo.cpp		ARMBasicBlockInfo.cpp
ARMBranchTargets.cpp		ARMBranchTargets.cpp
ARMCallingConv.cpp		ARMCallingConv.cpp
ARMCallLowering.cpp		ARMCallLowering.cpp
ARMConstantIslandPass.cpp		ARMConstantIslandPass.cpp
ARMConstantPoolValue.cpp		ARMConstantPoolValue.cpp
ARMExpandPseudoInsts.cpp		ARMExpandPseudoInsts.cpp
ARMFastISel.cpp		ARMFastISel.cpp
		ARMFixCortexA57AES1742098Pass.cpp
ARMFrameLowering.cpp		ARMFrameLowering.cpp
ARMHazardRecognizer.cpp		ARMHazardRecognizer.cpp
ARMInstructionSelector.cpp		ARMInstructionSelector.cpp
ARMISelDAGToDAG.cpp		ARMISelDAGToDAG.cpp
ARMISelLowering.cpp		ARMISelLowering.cpp
ARMInstrInfo.cpp		ARMInstrInfo.cpp
ARMLegalizerInfo.cpp		ARMLegalizerInfo.cpp
ARMParallelDSP.cpp		ARMParallelDSP.cpp

llvm/test/CodeGen/ARM/O3-pipeline.ll

	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: ARM Branch Targets			; CHECK-NEXT: ARM Branch Targets
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: ARM constant island placement and branch shortening pass			; CHECK-NEXT: ARM constant island placement and branch shortening pass
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	; CHECK-NEXT: ReachingDefAnalysis			; CHECK-NEXT: ReachingDefAnalysis
	; CHECK-NEXT: ARM Low Overhead Loops pass			; CHECK-NEXT: ARM Low Overhead Loops pass
				; CHECK-NEXT: ReachingDefAnalysis
				; CHECK-NEXT: ARM fix for Cortex-A57 AES Erratum 1742098
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Optimization Remark Emitter			; CHECK-NEXT: Machine Optimization Remark Emitter
	; CHECK-NEXT: ARM Assembly Printer			; CHECK-NEXT: ARM Assembly Printer
	; CHECK-NEXT: Free MachineFunction			; CHECK-NEXT: Free MachineFunction

llvm/test/CodeGen/ARM/aes-erratum-fix.ll

Show All 18 Lines
declare <16 x i8> @llvm.arm.neon.aesimc(<16 x i8>)		declare <16 x i8> @llvm.arm.neon.aesimc(<16 x i8>)


define arm_aapcs_vfpcc void @aese_zero(<16 x i8>* %0) nounwind {		define arm_aapcs_vfpcc void @aese_zero(<16 x i8>* %0) nounwind {
; CHECK-FIX-LABEL: aese_zero:		; CHECK-FIX-LABEL: aese_zero:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r0]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r0]
; CHECK-FIX-NEXT: vmov.i32 q9, #0x0		; CHECK-FIX-NEXT: vmov.i32 q9, #0x0
		; CHECK-FIX-NEXT: vorr q9, q9, q9
; CHECK-FIX-NEXT: aese.8 q9, q8		; CHECK-FIX-NEXT: aese.8 q9, q8
; CHECK-FIX-NEXT: aesmc.8 q8, q9		; CHECK-FIX-NEXT: aesmc.8 q8, q9
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r0]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r0]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%2 = load <16 x i8>, <16 x i8>* %0, align 8		%2 = load <16 x i8>, <16 x i8>* %0, align 8
%3 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> zeroinitializer, <16 x i8> %2)		%3 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> zeroinitializer, <16 x i8> %2)
%4 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %3)		%4 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %3)
store <16 x i8> %4, <16 x i8>* %0, align 8		store <16 x i8> %4, <16 x i8>* %0, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_once_via_ptr(<16 x i8>* %0, <16 x i8>* %1) nounwind {		define arm_aapcs_vfpcc void @aese_once_via_ptr(<16 x i8>* %0, <16 x i8>* %1) nounwind {
; CHECK-FIX-LABEL: aese_once_via_ptr:		; CHECK-FIX-LABEL: aese_once_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r0]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r0]
; CHECK-FIX-NEXT: vld1.64 {d18, d19}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d18, d19}, [r1]
; CHECK-FIX-NEXT: aese.8 q9, q8		; CHECK-FIX-NEXT: aese.8 q9, q8
; CHECK-FIX-NEXT: aesmc.8 q8, q9		; CHECK-FIX-NEXT: aesmc.8 q8, q9
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%3 = load <16 x i8>, <16 x i8>* %1, align 8		%3 = load <16 x i8>, <16 x i8>* %1, align 8
%4 = load <16 x i8>, <16 x i8>* %0, align 8		%4 = load <16 x i8>, <16 x i8>* %0, align 8
		dmgreen: Adding arm_aapcs_vfpcc will make the function "hardfp", which might be useful for testing inputs from arguments that don't need to be passed via gpr regs.
		lenary: Yeah, seems I was too zealous with removing some of these attributes.
%5 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %3, <16 x i8> %4)		%5 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %3, <16 x i8> %4)
%6 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %5)		%6 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %5)
store <16 x i8> %6, <16 x i8>* %1, align 8		store <16 x i8> %6, <16 x i8>* %1, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc <16 x i8> @aese_once_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {		define arm_aapcs_vfpcc <16 x i8> @aese_once_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {
; CHECK-FIX-LABEL: aese_once_via_val:		; CHECK-FIX-LABEL: aese_once_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q1, q1, q1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q1, q0		; CHECK-FIX-NEXT: aese.8 q1, q0
; CHECK-FIX-NEXT: aesmc.8 q0, q1		; CHECK-FIX-NEXT: aesmc.8 q0, q1
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%3 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %1, <16 x i8> %0)		%3 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %1, <16 x i8> %0)
%4 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %3)		%4 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %3)
ret <16 x i8> %4		ret <16 x i8> %4
}		}
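
The CHECK-FIX lines in aese_zero, aese_once_via_ptr and aese_once_via_val can be read as instances of one conservative rule: if an input to `aese.8` cannot be shown to come from a safe producer, put a `vorr qN, qN, qN` in front of it. The sketch below is a toy, single-block model of that rule. `ToyInst`, `isAssumedSafeProducer` and `fixBlock` are hypothetical names, and the "safe" opcode list is an assumption chosen only to agree with these three tests. The actual pass works on the RDF graph and may place the `vorr` elsewhere (for example at function entry), which this toy does not reproduce.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only; unrelated to the real MachineInstr / RDF classes.
struct ToyInst {
  std::string Opcode;            // e.g. "vld1.64", "aese.8"
  std::string Def;               // Q register written, or "" if none
  std::vector<std::string> Uses; // Q registers read
};

// ASSUMPTION: opcodes treated as safe producers, picked to match the
// CHECK-FIX lines above; this is not the list used by the actual pass.
bool isAssumedSafeProducer(const std::string &Opcode) {
  static const std::set<std::string> Safe = {"vld1.64", "vorr", "aesmc.8"};
  return Safe.count(Opcode) != 0;
}

// Returns a copy of Block with "vorr qN, qN, qN" inserted directly before
// each aese.8 for every input whose most recent definition in the block is
// missing (e.g. a function argument) or not an assumed-safe producer.
std::vector<ToyInst> fixBlock(const std::vector<ToyInst> &Block) {
  std::vector<ToyInst> Out;
  for (const ToyInst &I : Block) {
    if (I.Opcode == "aese.8") {
      for (const std::string &Reg : I.Uses) {
        bool Safe = false;
        // Walk backwards to the most recent definition of Reg.
        for (auto It = Out.rbegin(); It != Out.rend(); ++It) {
          if (It->Def == Reg) {
            Safe = isAssumedSafeProducer(It->Opcode);
            break;
          }
        }
        if (!Safe)
          Out.push_back({"vorr", Reg, {Reg, Reg}});
      }
    }
    Out.push_back(I);
  }
  return Out;
}
```

Fed the two instructions of aese_once_via_val, the toy inserts `vorr` for q1 and then q0, since both arrive as arguments with no visible definition. Fed aese_zero, it inserts one `vorr` for q9 after the `vmov.i32`, and none for q8, which was loaded by `vld1.64`.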

Show All 20 Lines	; CHECK-FIX-NEXT: bx lr
%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %1, align 8		store <16 x i8> %9, <16 x i8>* %1, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc <16 x i8> @aese_twice_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {		define arm_aapcs_vfpcc <16 x i8> @aese_twice_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {
; CHECK-FIX-LABEL: aese_twice_via_val:		; CHECK-FIX-LABEL: aese_twice_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q1, q1, q1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q1, q0		; CHECK-FIX-NEXT: aese.8 q1, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q1		; CHECK-FIX-NEXT: aesmc.8 q8, q1
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q0, q8		; CHECK-FIX-NEXT: aesmc.8 q0, q8
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%3 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %1, <16 x i8> %0)		%3 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %1, <16 x i8> %0)
%4 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %3)		%4 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %3)
%5 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %0)		%5 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %0)
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	6:
%12 = add nuw i32 %7, 1		%12 = add nuw i32 %7, 1
%13 = icmp eq i32 %12, %0		%13 = icmp eq i32 %12, %0
br i1 %13, label %5, label %6		br i1 %13, label %5, label %6
}		}

define arm_aapcs_vfpcc <16 x i8> @aese_loop_via_val(i32 %0, <16 x i8> %1, <16 x i8> %2) nounwind {		define arm_aapcs_vfpcc <16 x i8> @aese_loop_via_val(i32 %0, <16 x i8> %1, <16 x i8> %2) nounwind {
; CHECK-FIX-LABEL: aese_loop_via_val:		; CHECK-FIX-LABEL: aese_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q1, q1, q1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB6_2		; CHECK-FIX-NEXT: beq .LBB6_2
; CHECK-FIX-NEXT: .LBB6_1: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB6_1: @ =>This Inner Loop Header: Depth=1
; CHECK-FIX-NEXT: aese.8 q1, q0		; CHECK-FIX-NEXT: aese.8 q1, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q1, q1		; CHECK-FIX-NEXT: aesmc.8 q1, q1
; CHECK-FIX-NEXT: bne .LBB6_1		; CHECK-FIX-NEXT: bne .LBB6_1
; CHECK-FIX-NEXT: .LBB6_2:		; CHECK-FIX-NEXT: .LBB6_2:
Show All 16 Lines	7:
br i1 %13, label %5, label %7		br i1 %13, label %5, label %7
}		}

define arm_aapcs_vfpcc void @aese_set8_via_ptr(i8* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set8_via_ptr(i8* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aese_set8_via_ptr:		; CHECK-FIX-LABEL: aese_set8_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r0]		; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r0]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i8, i8* %0, align 1		%5 = load i8, i8* %0, align 1
%6 = insertelement <16 x i8> %1, i8 %5, i64 0		%6 = insertelement <16 x i8> %1, i8 %5, i64 0
%7 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %6)		%7 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %6)
%8 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %7)		%8 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %7)
store <16 x i8> %8, <16 x i8>* %2, align 8		store <16 x i8> %8, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set8_via_val(i8 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set8_via_val(i8 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aese_set8_via_val:		; CHECK-FIX-LABEL: aese_set8_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.8 d16[0], r0		; CHECK-FIX-NEXT: vmov.8 d16[0], r0
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = insertelement <16 x i8> %4, i8 %0, i64 0		%5 = insertelement <16 x i8> %4, i8 %0, i64 0
%6 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %5, <16 x i8> %1)		%6 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %5, <16 x i8> %1)
%7 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %6)		%7 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %6)
store <16 x i8> %7, <16 x i8>* %2, align 8		store <16 x i8> %7, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set8_cond_via_ptr(i1 zeroext %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set8_cond_via_ptr(i1 zeroext %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set8_cond_via_ptr:		; CHECK-FIX-LABEL: aese_set8_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB9_2		; CHECK-FIX-NEXT: beq .LBB9_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]		; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]
; CHECK-FIX-NEXT: .LBB9_2:		; CHECK-FIX-NEXT: .LBB9_2:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %9		br i1 %0, label %6, label %9

6:		6:
%7 = load i8, i8* %1, align 1		%7 = load i8, i8* %1, align 1
%8 = insertelement <16 x i8> %2, i8 %7, i64 0		%8 = insertelement <16 x i8> %2, i8 %7, i64 0
br label %9		br label %9

9:		9:
%10 = phi <16 x i8> [ %8, %6 ], [ %2, %4 ]		%10 = phi <16 x i8> [ %8, %6 ], [ %2, %4 ]
%11 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %5, <16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %5, <16 x i8> %10)
%12 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %11)		%12 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %11)
store <16 x i8> %12, <16 x i8>* %3, align 8		store <16 x i8> %12, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set8_cond_via_val(i1 zeroext %0, i8 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set8_cond_via_val(i1 zeroext %0, i8 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set8_cond_via_val:		; CHECK-FIX-LABEL: aese_set8_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB10_2		; CHECK-FIX-NEXT: beq .LBB10_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.8 d16[0], r1		; CHECK-FIX-NEXT: vmov.8 d16[0], r1
; CHECK-FIX-NEXT: .LBB10_2: @ %select.end		; CHECK-FIX-NEXT: .LBB10_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
%6 = insertelement <16 x i8> %5, i8 %1, i64 0		%6 = insertelement <16 x i8> %5, i8 %1, i64 0
%7 = select i1 %0, <16 x i8> %6, <16 x i8> %5		%7 = select i1 %0, <16 x i8> %6, <16 x i8> %5
%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %2)		%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %2)
%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %3, align 8		store <16 x i8> %9, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set8_loop_via_ptr(i32 %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set8_loop_via_ptr(i32 %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set8_loop_via_ptr:		; CHECK-FIX-LABEL: aese_set8_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB11_1:		; CHECK-FIX-NEXT: .LBB11_1:
; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]		; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB11_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB11_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB11_2		; CHECK-FIX-NEXT: bne .LBB11_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load i8, i8* %1, align 1		%5 = load i8, i8* %1, align 1
Show All 26 Lines
; CHECK-FIX-LABEL: aese_set8_loop_via_val:		; CHECK-FIX-LABEL: aese_set8_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB12_1:		; CHECK-FIX-NEXT: .LBB12_1:
; CHECK-FIX-NEXT: vmov.8 d0[0], r1		; CHECK-FIX-NEXT: vmov.8 d0[0], r1
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB12_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB12_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB12_2		; CHECK-FIX-NEXT: bne .LBB12_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
Show All 21 Lines	11:
br i1 %17, label %9, label %11		br i1 %17, label %9, label %11
}		}

define arm_aapcs_vfpcc void @aese_set16_via_ptr(i16* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set16_via_ptr(i16* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aese_set16_via_ptr:		; CHECK-FIX-LABEL: aese_set16_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r0:16]		; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r0:16]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i16, i16* %0, align 2		%5 = load i16, i16* %0, align 2
%6 = bitcast <16 x i8> %1 to <8 x i16>		%6 = bitcast <16 x i8> %1 to <8 x i16>
%7 = insertelement <8 x i16> %6, i16 %5, i64 0		%7 = insertelement <8 x i16> %6, i16 %5, i64 0
%8 = bitcast <8 x i16> %7 to <16 x i8>		%8 = bitcast <8 x i16> %7 to <16 x i8>
%9 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %8)
%10 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %9)		%10 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %9)
store <16 x i8> %10, <16 x i8>* %2, align 8		store <16 x i8> %10, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set16_via_val(i16 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set16_via_val(i16 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aese_set16_via_val:		; CHECK-FIX-LABEL: aese_set16_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.16 d16[0], r0		; CHECK-FIX-NEXT: vmov.16 d16[0], r0
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = bitcast <16 x i8>* %2 to <8 x i16>*		%4 = bitcast <16 x i8>* %2 to <8 x i16>*
%5 = load <8 x i16>, <8 x i16>* %4, align 8		%5 = load <8 x i16>, <8 x i16>* %4, align 8
%6 = insertelement <8 x i16> %5, i16 %0, i64 0		%6 = insertelement <8 x i16> %5, i16 %0, i64 0
%7 = bitcast <8 x i16> %6 to <16 x i8>		%7 = bitcast <8 x i16> %6 to <16 x i8>
%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %1)		%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %1)
%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %2, align 8		store <16 x i8> %9, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set16_cond_via_ptr(i1 zeroext %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set16_cond_via_ptr(i1 zeroext %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set16_cond_via_ptr:		; CHECK-FIX-LABEL: aese_set16_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB15_2		; CHECK-FIX-NEXT: beq .LBB15_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]		; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]
; CHECK-FIX-NEXT: .LBB15_2:		; CHECK-FIX-NEXT: .LBB15_2:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %10		br i1 %0, label %6, label %10

6:		6:
Show All 13 Lines	12:
%16 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %15)		%16 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %15)
store <16 x i8> %16, <16 x i8>* %3, align 8		store <16 x i8> %16, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set16_cond_via_val(i1 zeroext %0, i16 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set16_cond_via_val(i1 zeroext %0, i16 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set16_cond_via_val:		; CHECK-FIX-LABEL: aese_set16_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB16_2		; CHECK-FIX-NEXT: beq .LBB16_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.16 d16[0], r1		; CHECK-FIX-NEXT: vmov.16 d16[0], r1
; CHECK-FIX-NEXT: .LBB16_2: @ %select.end		; CHECK-FIX-NEXT: .LBB16_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = bitcast <16 x i8>* %3 to <8 x i16>*		%5 = bitcast <16 x i8>* %3 to <8 x i16>*
%6 = load <8 x i16>, <8 x i16>* %5, align 8		%6 = load <8 x i16>, <8 x i16>* %5, align 8
%7 = insertelement <8 x i16> %6, i16 %1, i64 0		%7 = insertelement <8 x i16> %6, i16 %1, i64 0
%8 = select i1 %0, <8 x i16> %7, <8 x i16> %6		%8 = select i1 %0, <8 x i16> %7, <8 x i16> %6
%9 = bitcast <8 x i16> %8 to <16 x i8>		%9 = bitcast <8 x i16> %8 to <16 x i8>
%10 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %9, <16 x i8> %2)		%10 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %9, <16 x i8> %2)
%11 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %10)
store <16 x i8> %11, <16 x i8>* %3, align 8		store <16 x i8> %11, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set16_loop_via_ptr(i32 %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set16_loop_via_ptr(i32 %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set16_loop_via_ptr:		; CHECK-FIX-LABEL: aese_set16_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB17_1:		; CHECK-FIX-NEXT: .LBB17_1:
; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]		; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB17_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB17_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB17_2		; CHECK-FIX-NEXT: bne .LBB17_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load i16, i16* %1, align 2		%5 = load i16, i16* %1, align 2
Show All 28 Lines
; CHECK-FIX-LABEL: aese_set16_loop_via_val:		; CHECK-FIX-LABEL: aese_set16_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB18_1:		; CHECK-FIX-NEXT: .LBB18_1:
; CHECK-FIX-NEXT: vmov.16 d0[0], r1		; CHECK-FIX-NEXT: vmov.16 d0[0], r1
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB18_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB18_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB18_2		; CHECK-FIX-NEXT: bne .LBB18_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
Show All 23 Lines	13:
br i1 %19, label %11, label %13		br i1 %19, label %11, label %13
}		}

define arm_aapcs_vfpcc void @aese_set32_via_ptr(i32* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set32_via_ptr(i32* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aese_set32_via_ptr:		; CHECK-FIX-LABEL: aese_set32_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r0:32]		; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r0:32]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i32, i32* %0, align 4		%5 = load i32, i32* %0, align 4
%6 = bitcast <16 x i8> %1 to <4 x i32>		%6 = bitcast <16 x i8> %1 to <4 x i32>
%7 = insertelement <4 x i32> %6, i32 %5, i64 0		%7 = insertelement <4 x i32> %6, i32 %5, i64 0
%8 = bitcast <4 x i32> %7 to <16 x i8>		%8 = bitcast <4 x i32> %7 to <16 x i8>
%9 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %8)
%10 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %9)		%10 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %9)
store <16 x i8> %10, <16 x i8>* %2, align 8		store <16 x i8> %10, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set32_via_val(i32 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set32_via_val(i32 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aese_set32_via_val:		; CHECK-FIX-LABEL: aese_set32_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.32 d16[0], r0		; CHECK-FIX-NEXT: vmov.32 d16[0], r0
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = bitcast <16 x i8>* %2 to <4 x i32>*		%4 = bitcast <16 x i8>* %2 to <4 x i32>*
%5 = load <4 x i32>, <4 x i32>* %4, align 8		%5 = load <4 x i32>, <4 x i32>* %4, align 8
%6 = insertelement <4 x i32> %5, i32 %0, i64 0		%6 = insertelement <4 x i32> %5, i32 %0, i64 0
%7 = bitcast <4 x i32> %6 to <16 x i8>		%7 = bitcast <4 x i32> %6 to <16 x i8>
%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %1)		%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %1)
%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %2, align 8		store <16 x i8> %9, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set32_cond_via_ptr(i1 zeroext %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set32_cond_via_ptr(i1 zeroext %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set32_cond_via_ptr:		; CHECK-FIX-LABEL: aese_set32_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB21_2		; CHECK-FIX-NEXT: beq .LBB21_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]		; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]
; CHECK-FIX-NEXT: .LBB21_2:		; CHECK-FIX-NEXT: .LBB21_2:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %10		br i1 %0, label %6, label %10

6:		6:
[13 lines not shown]	12:
%16 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %15)		%16 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %15)
store <16 x i8> %16, <16 x i8>* %3, align 8		store <16 x i8> %16, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set32_cond_via_val(i1 zeroext %0, i32 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set32_cond_via_val(i1 zeroext %0, i32 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set32_cond_via_val:		; CHECK-FIX-LABEL: aese_set32_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB22_2		; CHECK-FIX-NEXT: beq .LBB22_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.32 d16[0], r1		; CHECK-FIX-NEXT: vmov.32 d16[0], r1
; CHECK-FIX-NEXT: .LBB22_2: @ %select.end		; CHECK-FIX-NEXT: .LBB22_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = bitcast <16 x i8>* %3 to <4 x i32>*		%5 = bitcast <16 x i8>* %3 to <4 x i32>*
%6 = load <4 x i32>, <4 x i32>* %5, align 8		%6 = load <4 x i32>, <4 x i32>* %5, align 8
%7 = insertelement <4 x i32> %6, i32 %1, i64 0		%7 = insertelement <4 x i32> %6, i32 %1, i64 0
%8 = select i1 %0, <4 x i32> %7, <4 x i32> %6		%8 = select i1 %0, <4 x i32> %7, <4 x i32> %6
%9 = bitcast <4 x i32> %8 to <16 x i8>		%9 = bitcast <4 x i32> %8 to <16 x i8>
%10 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %9, <16 x i8> %2)		%10 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %9, <16 x i8> %2)
%11 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %10)
store <16 x i8> %11, <16 x i8>* %3, align 8		store <16 x i8> %11, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set32_loop_via_ptr(i32 %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set32_loop_via_ptr(i32 %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set32_loop_via_ptr:		; CHECK-FIX-LABEL: aese_set32_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB23_1:		; CHECK-FIX-NEXT: .LBB23_1:
; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]		; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB23_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB23_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB23_2		; CHECK-FIX-NEXT: bne .LBB23_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load i32, i32* %1, align 4		%5 = load i32, i32* %1, align 4
[28 lines not shown]
; CHECK-FIX-LABEL: aese_set32_loop_via_val:		; CHECK-FIX-LABEL: aese_set32_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB24_1:		; CHECK-FIX-NEXT: .LBB24_1:
; CHECK-FIX-NEXT: vmov.32 d0[0], r1		; CHECK-FIX-NEXT: vmov.32 d0[0], r1
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB24_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB24_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB24_2		; CHECK-FIX-NEXT: bne .LBB24_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
[21 lines not shown]	13:
%18 = add nuw i32 %15, 1		%18 = add nuw i32 %15, 1
%19 = icmp eq i32 %18, %0		%19 = icmp eq i32 %18, %0
br i1 %19, label %11, label %13		br i1 %19, label %11, label %13
}		}

define arm_aapcs_vfpcc void @aese_set64_via_ptr(i64* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set64_via_ptr(i64* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-NOSCHED-LABEL: aese_set64_via_ptr:		; CHECK-FIX-NOSCHED-LABEL: aese_set64_via_ptr:
; CHECK-FIX-NOSCHED: @ %bb.0:		; CHECK-FIX-NOSCHED: @ %bb.0:
		; CHECK-FIX-NOSCHED-NEXT: vorr q0, q0, q0
; CHECK-FIX-NOSCHED-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NOSCHED-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NOSCHED-NEXT: vldr d0, [r0]		; CHECK-FIX-NOSCHED-NEXT: vldr d0, [r0]
; CHECK-FIX-NOSCHED-NEXT: aese.8 q8, q0		; CHECK-FIX-NOSCHED-NEXT: aese.8 q8, q0
; CHECK-FIX-NOSCHED-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NOSCHED-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NOSCHED-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NOSCHED-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NOSCHED-NEXT: bx lr		; CHECK-FIX-NOSCHED-NEXT: bx lr
;		;
; CHECK-CORTEX-FIX-LABEL: aese_set64_via_ptr:		; CHECK-CORTEX-FIX-LABEL: aese_set64_via_ptr:
; CHECK-CORTEX-FIX: @ %bb.0:		; CHECK-CORTEX-FIX: @ %bb.0:
		; CHECK-CORTEX-FIX-NEXT: vorr q0, q0, q0
; CHECK-CORTEX-FIX-NEXT: vldr d0, [r0]		; CHECK-CORTEX-FIX-NEXT: vldr d0, [r0]
; CHECK-CORTEX-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-CORTEX-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-CORTEX-FIX-NEXT: aese.8 q8, q0		; CHECK-CORTEX-FIX-NEXT: aese.8 q8, q0
; CHECK-CORTEX-FIX-NEXT: aesmc.8 q8, q8		; CHECK-CORTEX-FIX-NEXT: aesmc.8 q8, q8
; CHECK-CORTEX-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-CORTEX-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-CORTEX-FIX-NEXT: bx lr		; CHECK-CORTEX-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i64, i64* %0, align 8		%5 = load i64, i64* %0, align 8
%6 = bitcast <16 x i8> %1 to <2 x i64>		%6 = bitcast <16 x i8> %1 to <2 x i64>
%7 = insertelement <2 x i64> %6, i64 %5, i64 0		%7 = insertelement <2 x i64> %6, i64 %5, i64 0
%8 = bitcast <2 x i64> %7 to <16 x i8>		%8 = bitcast <2 x i64> %7 to <16 x i8>
%9 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %4, <16 x i8> %8)
%10 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %9)		%10 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %9)
store <16 x i8> %10, <16 x i8>* %2, align 8		store <16 x i8> %10, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set64_via_val(i64 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aese_set64_via_val(i64 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aese_set64_via_val:		; CHECK-FIX-LABEL: aese_set64_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: vmov.32 d16[0], r0		; CHECK-FIX-NEXT: vmov.32 d16[0], r0
; CHECK-FIX-NEXT: vmov.32 d16[1], r1		; CHECK-FIX-NEXT: vmov.32 d16[1], r1
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = bitcast <16 x i8>* %2 to <2 x i64>*		%4 = bitcast <16 x i8>* %2 to <2 x i64>*
%5 = load <2 x i64>, <2 x i64>* %4, align 8		%5 = load <2 x i64>, <2 x i64>* %4, align 8
%6 = insertelement <2 x i64> %5, i64 %0, i64 0		%6 = insertelement <2 x i64> %5, i64 %0, i64 0
%7 = bitcast <2 x i64> %6 to <16 x i8>		%7 = bitcast <2 x i64> %6 to <16 x i8>
%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %1)		%8 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %7, <16 x i8> %1)
%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %2, align 8		store <16 x i8> %9, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set64_cond_via_ptr(i1 zeroext %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set64_cond_via_ptr(i1 zeroext %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set64_cond_via_ptr:		; CHECK-FIX-LABEL: aese_set64_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: vldrne d0, [r1]		; CHECK-FIX-NEXT: vldrne d0, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %10		br i1 %0, label %6, label %10

6:		6:
[13 lines not shown]	12:
%16 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %15)		%16 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %15)
store <16 x i8> %16, <16 x i8>* %3, align 8		store <16 x i8> %16, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set64_cond_via_val(i1 zeroext %0, i64 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set64_cond_via_val(i1 zeroext %0, i64 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set64_cond_via_val:		; CHECK-FIX-LABEL: aese_set64_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: ldr r1, [sp]		; CHECK-FIX-NEXT: ldr r1, [sp]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: beq .LBB28_2		; CHECK-FIX-NEXT: beq .LBB28_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.32 d16[0], r2		; CHECK-FIX-NEXT: vmov.32 d16[0], r2
; CHECK-FIX-NEXT: vmov.32 d16[1], r3		; CHECK-FIX-NEXT: vmov.32 d16[1], r3
; CHECK-FIX-NEXT: .LBB28_2: @ %select.end		; CHECK-FIX-NEXT: .LBB28_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = bitcast <16 x i8>* %3 to <2 x i64>*		%5 = bitcast <16 x i8>* %3 to <2 x i64>*
%6 = load <2 x i64>, <2 x i64>* %5, align 8		%6 = load <2 x i64>, <2 x i64>* %5, align 8
%7 = insertelement <2 x i64> %6, i64 %1, i64 0		%7 = insertelement <2 x i64> %6, i64 %1, i64 0
%8 = select i1 %0, <2 x i64> %7, <2 x i64> %6		%8 = select i1 %0, <2 x i64> %7, <2 x i64> %6
%9 = bitcast <2 x i64> %8 to <16 x i8>		%9 = bitcast <2 x i64> %8 to <16 x i8>
%10 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %9, <16 x i8> %2)		%10 = call <16 x i8> @llvm.arm.neon.aese(<16 x i8> %9, <16 x i8> %2)
%11 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %10)
store <16 x i8> %11, <16 x i8>* %3, align 8		store <16 x i8> %11, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aese_set64_loop_via_ptr(i32 %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aese_set64_loop_via_ptr(i32 %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aese_set64_loop_via_ptr:		; CHECK-FIX-LABEL: aese_set64_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB29_1:		; CHECK-FIX-NEXT: .LBB29_1:
; CHECK-FIX-NEXT: vldr d0, [r1]		; CHECK-FIX-NEXT: vldr d0, [r1]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB29_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB29_2: @ =>This Inner Loop Header: Depth=1
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
[36 lines not shown]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB30_1:		; CHECK-FIX-NEXT: .LBB30_1:
; CHECK-FIX-NEXT: vmov.32 d0[0], r2		; CHECK-FIX-NEXT: vmov.32 d0[0], r2
; CHECK-FIX-NEXT: ldr r1, [sp]		; CHECK-FIX-NEXT: ldr r1, [sp]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.32 d0[1], r3		; CHECK-FIX-NEXT: vmov.32 d0[1], r3
; CHECK-FIX-NEXT: .LBB30_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB30_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aese.8 q8, q0		; CHECK-FIX-NEXT: aese.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesmc.8 q8, q8		; CHECK-FIX-NEXT: aesmc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB30_2		; CHECK-FIX-NEXT: bne .LBB30_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
[23 lines not shown]	13:
br i1 %19, label %11, label %13		br i1 %19, label %11, label %13
}		}

define arm_aapcs_vfpcc void @aesd_zero(<16 x i8>* %0) nounwind {		define arm_aapcs_vfpcc void @aesd_zero(<16 x i8>* %0) nounwind {
; CHECK-FIX-LABEL: aesd_zero:		; CHECK-FIX-LABEL: aesd_zero:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r0]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r0]
; CHECK-FIX-NEXT: vmov.i32 q9, #0x0		; CHECK-FIX-NEXT: vmov.i32 q9, #0x0
		; CHECK-FIX-NEXT: vorr q9, q9, q9
; CHECK-FIX-NEXT: aesd.8 q9, q8		; CHECK-FIX-NEXT: aesd.8 q9, q8
; CHECK-FIX-NEXT: aesimc.8 q8, q9		; CHECK-FIX-NEXT: aesimc.8 q8, q9
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r0]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r0]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%2 = load <16 x i8>, <16 x i8>* %0, align 8		%2 = load <16 x i8>, <16 x i8>* %0, align 8
%3 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> zeroinitializer, <16 x i8> %2)		%3 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> zeroinitializer, <16 x i8> %2)
%4 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %3)		%4 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %3)
store <16 x i8> %4, <16 x i8>* %0, align 8		store <16 x i8> %4, <16 x i8>* %0, align 8
[15 lines not shown]	; CHECK-FIX-NEXT: bx lr
%6 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %5)		%6 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %5)
store <16 x i8> %6, <16 x i8>* %1, align 8		store <16 x i8> %6, <16 x i8>* %1, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc <16 x i8> @aesd_once_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {		define arm_aapcs_vfpcc <16 x i8> @aesd_once_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {
; CHECK-FIX-LABEL: aesd_once_via_val:		; CHECK-FIX-LABEL: aesd_once_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q1, q1, q1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q1, q0		; CHECK-FIX-NEXT: aesd.8 q1, q0
; CHECK-FIX-NEXT: aesimc.8 q0, q1		; CHECK-FIX-NEXT: aesimc.8 q0, q1
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%3 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %1, <16 x i8> %0)		%3 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %1, <16 x i8> %0)
%4 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %3)		%4 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %3)
ret <16 x i8> %4		ret <16 x i8> %4
}		}

[20 lines not shown]	; CHECK-FIX-NEXT: bx lr
%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %1, align 8		store <16 x i8> %9, <16 x i8>* %1, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc <16 x i8> @aesd_twice_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {		define arm_aapcs_vfpcc <16 x i8> @aesd_twice_via_val(<16 x i8> %0, <16 x i8> %1) nounwind {
; CHECK-FIX-LABEL: aesd_twice_via_val:		; CHECK-FIX-LABEL: aesd_twice_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q1, q1, q1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q1, q0		; CHECK-FIX-NEXT: aesd.8 q1, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q1		; CHECK-FIX-NEXT: aesimc.8 q8, q1
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q0, q8		; CHECK-FIX-NEXT: aesimc.8 q0, q8
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%3 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %1, <16 x i8> %0)		%3 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %1, <16 x i8> %0)
%4 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %3)		%4 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %3)
%5 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %0)		%5 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %0)
[47 lines not shown]	6:
%12 = add nuw i32 %7, 1		%12 = add nuw i32 %7, 1
%13 = icmp eq i32 %12, %0		%13 = icmp eq i32 %12, %0
br i1 %13, label %5, label %6		br i1 %13, label %5, label %6
}		}

define arm_aapcs_vfpcc <16 x i8> @aesd_loop_via_val(i32 %0, <16 x i8> %1, <16 x i8> %2) nounwind {		define arm_aapcs_vfpcc <16 x i8> @aesd_loop_via_val(i32 %0, <16 x i8> %1, <16 x i8> %2) nounwind {
; CHECK-FIX-LABEL: aesd_loop_via_val:		; CHECK-FIX-LABEL: aesd_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q1, q1, q1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB37_2		; CHECK-FIX-NEXT: beq .LBB37_2
; CHECK-FIX-NEXT: .LBB37_1: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB37_1: @ =>This Inner Loop Header: Depth=1
; CHECK-FIX-NEXT: aesd.8 q1, q0		; CHECK-FIX-NEXT: aesd.8 q1, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q1, q1		; CHECK-FIX-NEXT: aesimc.8 q1, q1
; CHECK-FIX-NEXT: bne .LBB37_1		; CHECK-FIX-NEXT: bne .LBB37_1
; CHECK-FIX-NEXT: .LBB37_2:		; CHECK-FIX-NEXT: .LBB37_2:
[16 lines not shown]	7:
br i1 %13, label %5, label %7		br i1 %13, label %5, label %7
}		}

define arm_aapcs_vfpcc void @aesd_set8_via_ptr(i8* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set8_via_ptr(i8* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aesd_set8_via_ptr:		; CHECK-FIX-LABEL: aesd_set8_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r0]		; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r0]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i8, i8* %0, align 1		%5 = load i8, i8* %0, align 1
%6 = insertelement <16 x i8> %1, i8 %5, i64 0		%6 = insertelement <16 x i8> %1, i8 %5, i64 0
%7 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %6)		%7 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %6)
%8 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %7)		%8 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %7)
store <16 x i8> %8, <16 x i8>* %2, align 8		store <16 x i8> %8, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set8_via_val(i8 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set8_via_val(i8 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aesd_set8_via_val:		; CHECK-FIX-LABEL: aesd_set8_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.8 d16[0], r0		; CHECK-FIX-NEXT: vmov.8 d16[0], r0
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = insertelement <16 x i8> %4, i8 %0, i64 0		%5 = insertelement <16 x i8> %4, i8 %0, i64 0
%6 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %5, <16 x i8> %1)		%6 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %5, <16 x i8> %1)
%7 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %6)		%7 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %6)
store <16 x i8> %7, <16 x i8>* %2, align 8		store <16 x i8> %7, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set8_cond_via_ptr(i1 zeroext %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set8_cond_via_ptr(i1 zeroext %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set8_cond_via_ptr:		; CHECK-FIX-LABEL: aesd_set8_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB40_2		; CHECK-FIX-NEXT: beq .LBB40_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]		; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]
; CHECK-FIX-NEXT: .LBB40_2:		; CHECK-FIX-NEXT: .LBB40_2:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %9		br i1 %0, label %6, label %9

6:		6:
%7 = load i8, i8* %1, align 1		%7 = load i8, i8* %1, align 1
%8 = insertelement <16 x i8> %2, i8 %7, i64 0		%8 = insertelement <16 x i8> %2, i8 %7, i64 0
br label %9		br label %9

9:		9:
%10 = phi <16 x i8> [ %8, %6 ], [ %2, %4 ]		%10 = phi <16 x i8> [ %8, %6 ], [ %2, %4 ]
%11 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %5, <16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %5, <16 x i8> %10)
%12 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %11)		%12 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %11)
store <16 x i8> %12, <16 x i8>* %3, align 8		store <16 x i8> %12, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set8_cond_via_val(i1 zeroext %0, i8 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set8_cond_via_val(i1 zeroext %0, i8 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set8_cond_via_val:		; CHECK-FIX-LABEL: aesd_set8_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB41_2		; CHECK-FIX-NEXT: beq .LBB41_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.8 d16[0], r1		; CHECK-FIX-NEXT: vmov.8 d16[0], r1
; CHECK-FIX-NEXT: .LBB41_2: @ %select.end		; CHECK-FIX-NEXT: .LBB41_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
%6 = insertelement <16 x i8> %5, i8 %1, i64 0		%6 = insertelement <16 x i8> %5, i8 %1, i64 0
%7 = select i1 %0, <16 x i8> %6, <16 x i8> %5		%7 = select i1 %0, <16 x i8> %6, <16 x i8> %5
%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %2)		%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %2)
%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %3, align 8		store <16 x i8> %9, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set8_loop_via_ptr(i32 %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set8_loop_via_ptr(i32 %0, i8* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set8_loop_via_ptr:		; CHECK-FIX-LABEL: aesd_set8_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB42_1:		; CHECK-FIX-NEXT: .LBB42_1:
; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]		; CHECK-FIX-NEXT: vld1.8 {d0[0]}, [r1]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB42_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB42_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB42_2		; CHECK-FIX-NEXT: bne .LBB42_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load i8, i8* %1, align 1		%5 = load i8, i8* %1, align 1
[26 lines not shown]
; CHECK-FIX-LABEL: aesd_set8_loop_via_val:		; CHECK-FIX-LABEL: aesd_set8_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB43_1:		; CHECK-FIX-NEXT: .LBB43_1:
; CHECK-FIX-NEXT: vmov.8 d0[0], r1		; CHECK-FIX-NEXT: vmov.8 d0[0], r1
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB43_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB43_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB43_2		; CHECK-FIX-NEXT: bne .LBB43_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
[21 lines not shown]	11:
br i1 %17, label %9, label %11		br i1 %17, label %9, label %11
}		}

define arm_aapcs_vfpcc void @aesd_set16_via_ptr(i16* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set16_via_ptr(i16* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aesd_set16_via_ptr:		; CHECK-FIX-LABEL: aesd_set16_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r0:16]		; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r0:16]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i16, i16* %0, align 2		%5 = load i16, i16* %0, align 2
%6 = bitcast <16 x i8> %1 to <8 x i16>		%6 = bitcast <16 x i8> %1 to <8 x i16>
%7 = insertelement <8 x i16> %6, i16 %5, i64 0		%7 = insertelement <8 x i16> %6, i16 %5, i64 0
%8 = bitcast <8 x i16> %7 to <16 x i8>		%8 = bitcast <8 x i16> %7 to <16 x i8>
%9 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %8)
%10 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %9)		%10 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %9)
store <16 x i8> %10, <16 x i8>* %2, align 8		store <16 x i8> %10, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set16_via_val(i16 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set16_via_val(i16 zeroext %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aesd_set16_via_val:		; CHECK-FIX-LABEL: aesd_set16_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.16 d16[0], r0		; CHECK-FIX-NEXT: vmov.16 d16[0], r0
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = bitcast <16 x i8>* %2 to <8 x i16>*		%4 = bitcast <16 x i8>* %2 to <8 x i16>*
%5 = load <8 x i16>, <8 x i16>* %4, align 8		%5 = load <8 x i16>, <8 x i16>* %4, align 8
%6 = insertelement <8 x i16> %5, i16 %0, i64 0		%6 = insertelement <8 x i16> %5, i16 %0, i64 0
%7 = bitcast <8 x i16> %6 to <16 x i8>		%7 = bitcast <8 x i16> %6 to <16 x i8>
%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %1)		%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %1)
%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %2, align 8		store <16 x i8> %9, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set16_cond_via_ptr(i1 zeroext %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set16_cond_via_ptr(i1 zeroext %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set16_cond_via_ptr:		; CHECK-FIX-LABEL: aesd_set16_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB46_2		; CHECK-FIX-NEXT: beq .LBB46_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]		; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]
; CHECK-FIX-NEXT: .LBB46_2:		; CHECK-FIX-NEXT: .LBB46_2:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %10		br i1 %0, label %6, label %10

6:		6:
[... 13 lines not shown ...]	12:
%16 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %15)		%16 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %15)
store <16 x i8> %16, <16 x i8>* %3, align 8		store <16 x i8> %16, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set16_cond_via_val(i1 zeroext %0, i16 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set16_cond_via_val(i1 zeroext %0, i16 zeroext %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set16_cond_via_val:		; CHECK-FIX-LABEL: aesd_set16_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB47_2		; CHECK-FIX-NEXT: beq .LBB47_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.16 d16[0], r1		; CHECK-FIX-NEXT: vmov.16 d16[0], r1
; CHECK-FIX-NEXT: .LBB47_2: @ %select.end		; CHECK-FIX-NEXT: .LBB47_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = bitcast <16 x i8>* %3 to <8 x i16>*		%5 = bitcast <16 x i8>* %3 to <8 x i16>*
%6 = load <8 x i16>, <8 x i16>* %5, align 8		%6 = load <8 x i16>, <8 x i16>* %5, align 8
%7 = insertelement <8 x i16> %6, i16 %1, i64 0		%7 = insertelement <8 x i16> %6, i16 %1, i64 0
%8 = select i1 %0, <8 x i16> %7, <8 x i16> %6		%8 = select i1 %0, <8 x i16> %7, <8 x i16> %6
%9 = bitcast <8 x i16> %8 to <16 x i8>		%9 = bitcast <8 x i16> %8 to <16 x i8>
%10 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %9, <16 x i8> %2)		%10 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %9, <16 x i8> %2)
%11 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %10)
store <16 x i8> %11, <16 x i8>* %3, align 8		store <16 x i8> %11, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set16_loop_via_ptr(i32 %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set16_loop_via_ptr(i32 %0, i16* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set16_loop_via_ptr:		; CHECK-FIX-LABEL: aesd_set16_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB48_1:		; CHECK-FIX-NEXT: .LBB48_1:
; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]		; CHECK-FIX-NEXT: vld1.16 {d0[0]}, [r1:16]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB48_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB48_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB48_2		; CHECK-FIX-NEXT: bne .LBB48_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load i16, i16* %1, align 2		%5 = load i16, i16* %1, align 2
[... 28 lines not shown ...]
; CHECK-FIX-LABEL: aesd_set16_loop_via_val:		; CHECK-FIX-LABEL: aesd_set16_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB49_1:		; CHECK-FIX-NEXT: .LBB49_1:
; CHECK-FIX-NEXT: vmov.16 d0[0], r1		; CHECK-FIX-NEXT: vmov.16 d0[0], r1
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB49_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB49_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB49_2		; CHECK-FIX-NEXT: bne .LBB49_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
[... 23 lines not shown ...]	13:
br i1 %19, label %11, label %13		br i1 %19, label %11, label %13
}		}

define arm_aapcs_vfpcc void @aesd_set32_via_ptr(i32* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set32_via_ptr(i32* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aesd_set32_via_ptr:		; CHECK-FIX-LABEL: aesd_set32_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r0:32]		; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r0:32]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i32, i32* %0, align 4		%5 = load i32, i32* %0, align 4
%6 = bitcast <16 x i8> %1 to <4 x i32>		%6 = bitcast <16 x i8> %1 to <4 x i32>
%7 = insertelement <4 x i32> %6, i32 %5, i64 0		%7 = insertelement <4 x i32> %6, i32 %5, i64 0
%8 = bitcast <4 x i32> %7 to <16 x i8>		%8 = bitcast <4 x i32> %7 to <16 x i8>
%9 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %8)
%10 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %9)		%10 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %9)
store <16 x i8> %10, <16 x i8>* %2, align 8		store <16 x i8> %10, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set32_via_val(i32 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set32_via_val(i32 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aesd_set32_via_val:		; CHECK-FIX-LABEL: aesd_set32_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.32 d16[0], r0		; CHECK-FIX-NEXT: vmov.32 d16[0], r0
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = bitcast <16 x i8>* %2 to <4 x i32>*		%4 = bitcast <16 x i8>* %2 to <4 x i32>*
%5 = load <4 x i32>, <4 x i32>* %4, align 8		%5 = load <4 x i32>, <4 x i32>* %4, align 8
%6 = insertelement <4 x i32> %5, i32 %0, i64 0		%6 = insertelement <4 x i32> %5, i32 %0, i64 0
%7 = bitcast <4 x i32> %6 to <16 x i8>		%7 = bitcast <4 x i32> %6 to <16 x i8>
%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %1)		%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %1)
%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %2, align 8		store <16 x i8> %9, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set32_cond_via_ptr(i1 zeroext %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set32_cond_via_ptr(i1 zeroext %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set32_cond_via_ptr:		; CHECK-FIX-LABEL: aesd_set32_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB52_2		; CHECK-FIX-NEXT: beq .LBB52_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]		; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]
; CHECK-FIX-NEXT: .LBB52_2:		; CHECK-FIX-NEXT: .LBB52_2:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %10		br i1 %0, label %6, label %10

6:		6:
[... 13 lines not shown ...]	12:
%16 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %15)		%16 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %15)
store <16 x i8> %16, <16 x i8>* %3, align 8		store <16 x i8> %16, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set32_cond_via_val(i1 zeroext %0, i32 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set32_cond_via_val(i1 zeroext %0, i32 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set32_cond_via_val:		; CHECK-FIX-LABEL: aesd_set32_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: beq .LBB53_2		; CHECK-FIX-NEXT: beq .LBB53_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.32 d16[0], r1		; CHECK-FIX-NEXT: vmov.32 d16[0], r1
; CHECK-FIX-NEXT: .LBB53_2: @ %select.end		; CHECK-FIX-NEXT: .LBB53_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = bitcast <16 x i8>* %3 to <4 x i32>*		%5 = bitcast <16 x i8>* %3 to <4 x i32>*
%6 = load <4 x i32>, <4 x i32>* %5, align 8		%6 = load <4 x i32>, <4 x i32>* %5, align 8
%7 = insertelement <4 x i32> %6, i32 %1, i64 0		%7 = insertelement <4 x i32> %6, i32 %1, i64 0
%8 = select i1 %0, <4 x i32> %7, <4 x i32> %6		%8 = select i1 %0, <4 x i32> %7, <4 x i32> %6
%9 = bitcast <4 x i32> %8 to <16 x i8>		%9 = bitcast <4 x i32> %8 to <16 x i8>
%10 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %9, <16 x i8> %2)		%10 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %9, <16 x i8> %2)
%11 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %10)
store <16 x i8> %11, <16 x i8>* %3, align 8		store <16 x i8> %11, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set32_loop_via_ptr(i32 %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set32_loop_via_ptr(i32 %0, i32* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set32_loop_via_ptr:		; CHECK-FIX-LABEL: aesd_set32_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB54_1:		; CHECK-FIX-NEXT: .LBB54_1:
; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]		; CHECK-FIX-NEXT: vld1.32 {d0[0]}, [r1:32]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB54_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB54_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB54_2		; CHECK-FIX-NEXT: bne .LBB54_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load i32, i32* %1, align 4		%5 = load i32, i32* %1, align 4
[... 28 lines not shown ...]
; CHECK-FIX-LABEL: aesd_set32_loop_via_val:		; CHECK-FIX-LABEL: aesd_set32_loop_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB55_1:		; CHECK-FIX-NEXT: .LBB55_1:
; CHECK-FIX-NEXT: vmov.32 d0[0], r1		; CHECK-FIX-NEXT: vmov.32 d0[0], r1
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB55_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB55_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB55_2		; CHECK-FIX-NEXT: bne .LBB55_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
[... 21 lines not shown ...]	13:
%18 = add nuw i32 %15, 1		%18 = add nuw i32 %15, 1
%19 = icmp eq i32 %18, %0		%19 = icmp eq i32 %18, %0
br i1 %19, label %11, label %13		br i1 %19, label %11, label %13
}		}

define arm_aapcs_vfpcc void @aesd_set64_via_ptr(i64* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set64_via_ptr(i64* %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-NOSCHED-LABEL: aesd_set64_via_ptr:		; CHECK-FIX-NOSCHED-LABEL: aesd_set64_via_ptr:
; CHECK-FIX-NOSCHED: @ %bb.0:		; CHECK-FIX-NOSCHED: @ %bb.0:
		; CHECK-FIX-NOSCHED-NEXT: vorr q0, q0, q0
; CHECK-FIX-NOSCHED-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NOSCHED-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NOSCHED-NEXT: vldr d0, [r0]		; CHECK-FIX-NOSCHED-NEXT: vldr d0, [r0]
; CHECK-FIX-NOSCHED-NEXT: aesd.8 q8, q0		; CHECK-FIX-NOSCHED-NEXT: aesd.8 q8, q0
; CHECK-FIX-NOSCHED-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NOSCHED-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NOSCHED-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NOSCHED-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NOSCHED-NEXT: bx lr		; CHECK-FIX-NOSCHED-NEXT: bx lr
;		;
; CHECK-CORTEX-FIX-LABEL: aesd_set64_via_ptr:		; CHECK-CORTEX-FIX-LABEL: aesd_set64_via_ptr:
; CHECK-CORTEX-FIX: @ %bb.0:		; CHECK-CORTEX-FIX: @ %bb.0:
		; CHECK-CORTEX-FIX-NEXT: vorr q0, q0, q0
; CHECK-CORTEX-FIX-NEXT: vldr d0, [r0]		; CHECK-CORTEX-FIX-NEXT: vldr d0, [r0]
; CHECK-CORTEX-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-CORTEX-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-CORTEX-FIX-NEXT: aesd.8 q8, q0		; CHECK-CORTEX-FIX-NEXT: aesd.8 q8, q0
; CHECK-CORTEX-FIX-NEXT: aesimc.8 q8, q8		; CHECK-CORTEX-FIX-NEXT: aesimc.8 q8, q8
; CHECK-CORTEX-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-CORTEX-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-CORTEX-FIX-NEXT: bx lr		; CHECK-CORTEX-FIX-NEXT: bx lr
%4 = load <16 x i8>, <16 x i8>* %2, align 8		%4 = load <16 x i8>, <16 x i8>* %2, align 8
%5 = load i64, i64* %0, align 8		%5 = load i64, i64* %0, align 8
%6 = bitcast <16 x i8> %1 to <2 x i64>		%6 = bitcast <16 x i8> %1 to <2 x i64>
%7 = insertelement <2 x i64> %6, i64 %5, i64 0		%7 = insertelement <2 x i64> %6, i64 %5, i64 0
%8 = bitcast <2 x i64> %7 to <16 x i8>		%8 = bitcast <2 x i64> %7 to <16 x i8>
%9 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %4, <16 x i8> %8)
%10 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %9)		%10 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %9)
store <16 x i8> %10, <16 x i8>* %2, align 8		store <16 x i8> %10, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set64_via_val(i64 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {		define arm_aapcs_vfpcc void @aesd_set64_via_val(i64 %0, <16 x i8> %1, <16 x i8>* %2) nounwind {
; CHECK-FIX-LABEL: aesd_set64_via_val:		; CHECK-FIX-LABEL: aesd_set64_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: vmov.32 d16[0], r0		; CHECK-FIX-NEXT: vmov.32 d16[0], r0
; CHECK-FIX-NEXT: vmov.32 d16[1], r1		; CHECK-FIX-NEXT: vmov.32 d16[1], r1
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%4 = bitcast <16 x i8>* %2 to <2 x i64>*		%4 = bitcast <16 x i8>* %2 to <2 x i64>*
%5 = load <2 x i64>, <2 x i64>* %4, align 8		%5 = load <2 x i64>, <2 x i64>* %4, align 8
%6 = insertelement <2 x i64> %5, i64 %0, i64 0		%6 = insertelement <2 x i64> %5, i64 %0, i64 0
%7 = bitcast <2 x i64> %6 to <16 x i8>		%7 = bitcast <2 x i64> %6 to <16 x i8>
%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %1)		%8 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %7, <16 x i8> %1)
%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)		%9 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %8)
store <16 x i8> %9, <16 x i8>* %2, align 8		store <16 x i8> %9, <16 x i8>* %2, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set64_cond_via_ptr(i1 zeroext %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set64_cond_via_ptr(i1 zeroext %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set64_cond_via_ptr:		; CHECK-FIX-LABEL: aesd_set64_cond_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: vldrne d0, [r1]		; CHECK-FIX-NEXT: vldrne d0, [r1]
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = load <16 x i8>, <16 x i8>* %3, align 8		%5 = load <16 x i8>, <16 x i8>* %3, align 8
br i1 %0, label %6, label %10		br i1 %0, label %6, label %10

6:		6:
[... 13 lines not shown ...]	12:
%16 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %15)		%16 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %15)
store <16 x i8> %16, <16 x i8>* %3, align 8		store <16 x i8> %16, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set64_cond_via_val(i1 zeroext %0, i64 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set64_cond_via_val(i1 zeroext %0, i64 %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set64_cond_via_val:		; CHECK-FIX-LABEL: aesd_set64_cond_via_val:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: ldr r1, [sp]		; CHECK-FIX-NEXT: ldr r1, [sp]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: beq .LBB59_2		; CHECK-FIX-NEXT: beq .LBB59_2
; CHECK-FIX-NEXT: @ %bb.1:		; CHECK-FIX-NEXT: @ %bb.1:
; CHECK-FIX-NEXT: vmov.32 d16[0], r2		; CHECK-FIX-NEXT: vmov.32 d16[0], r2
; CHECK-FIX-NEXT: vmov.32 d16[1], r3		; CHECK-FIX-NEXT: vmov.32 d16[1], r3
; CHECK-FIX-NEXT: .LBB59_2: @ %select.end		; CHECK-FIX-NEXT: .LBB59_2: @ %select.end
		; CHECK-FIX-NEXT: vorr q8, q8, q8
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = bitcast <16 x i8>* %3 to <2 x i64>*		%5 = bitcast <16 x i8>* %3 to <2 x i64>*
%6 = load <2 x i64>, <2 x i64>* %5, align 8		%6 = load <2 x i64>, <2 x i64>* %5, align 8
%7 = insertelement <2 x i64> %6, i64 %1, i64 0		%7 = insertelement <2 x i64> %6, i64 %1, i64 0
%8 = select i1 %0, <2 x i64> %7, <2 x i64> %6		%8 = select i1 %0, <2 x i64> %7, <2 x i64> %6
%9 = bitcast <2 x i64> %8 to <16 x i8>		%9 = bitcast <2 x i64> %8 to <16 x i8>
%10 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %9, <16 x i8> %2)		%10 = call <16 x i8> @llvm.arm.neon.aesd(<16 x i8> %9, <16 x i8> %2)
%11 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %10)		%11 = call <16 x i8> @llvm.arm.neon.aesimc(<16 x i8> %10)
store <16 x i8> %11, <16 x i8>* %3, align 8		store <16 x i8> %11, <16 x i8>* %3, align 8
ret void		ret void
}		}

define arm_aapcs_vfpcc void @aesd_set64_loop_via_ptr(i32 %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {		define arm_aapcs_vfpcc void @aesd_set64_loop_via_ptr(i32 %0, i64* %1, <16 x i8> %2, <16 x i8>* %3) nounwind {
; CHECK-FIX-LABEL: aesd_set64_loop_via_ptr:		; CHECK-FIX-LABEL: aesd_set64_loop_via_ptr:
; CHECK-FIX: @ %bb.0:		; CHECK-FIX: @ %bb.0:
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB60_1:		; CHECK-FIX-NEXT: .LBB60_1:
; CHECK-FIX-NEXT: vldr d0, [r1]		; CHECK-FIX-NEXT: vldr d0, [r1]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r2]
; CHECK-FIX-NEXT: .LBB60_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB60_2: @ =>This Inner Loop Header: Depth=1
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
[... 36 lines not shown ...]
; CHECK-FIX-NEXT: cmp r0, #0		; CHECK-FIX-NEXT: cmp r0, #0
; CHECK-FIX-NEXT: bxeq lr		; CHECK-FIX-NEXT: bxeq lr
; CHECK-FIX-NEXT: .LBB61_1:		; CHECK-FIX-NEXT: .LBB61_1:
; CHECK-FIX-NEXT: vmov.32 d0[0], r2		; CHECK-FIX-NEXT: vmov.32 d0[0], r2
; CHECK-FIX-NEXT: ldr r1, [sp]		; CHECK-FIX-NEXT: ldr r1, [sp]
; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vld1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: vmov.32 d0[1], r3		; CHECK-FIX-NEXT: vmov.32 d0[1], r3
; CHECK-FIX-NEXT: .LBB61_2: @ =>This Inner Loop Header: Depth=1		; CHECK-FIX-NEXT: .LBB61_2: @ =>This Inner Loop Header: Depth=1
		; CHECK-FIX-NEXT: vorr q0, q0, q0
; CHECK-FIX-NEXT: aesd.8 q8, q0		; CHECK-FIX-NEXT: aesd.8 q8, q0
; CHECK-FIX-NEXT: subs r0, r0, #1		; CHECK-FIX-NEXT: subs r0, r0, #1
; CHECK-FIX-NEXT: aesimc.8 q8, q8		; CHECK-FIX-NEXT: aesimc.8 q8, q8
; CHECK-FIX-NEXT: bne .LBB61_2		; CHECK-FIX-NEXT: bne .LBB61_2
; CHECK-FIX-NEXT: @ %bb.3:		; CHECK-FIX-NEXT: @ %bb.3:
; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]		; CHECK-FIX-NEXT: vst1.64 {d16, d17}, [r1]
; CHECK-FIX-NEXT: bx lr		; CHECK-FIX-NEXT: bx lr
%5 = icmp eq i32 %0, 0		%5 = icmp eq i32 %0, 0
[... 25 lines not shown ...]

Revision Contents

Diff 419202

clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/Arch/ARM.cpp
clang/test/Driver/arm-fix-cortex-a57-aes-1742098.c
llvm/lib/CodeGen/RDFGraph.cpp
llvm/lib/Target/ARM/ARM.h
llvm/lib/Target/ARM/ARM.td
llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp
llvm/lib/Target/ARM/ARMSubtarget.h
llvm/lib/Target/ARM/ARMTargetMachine.cpp
llvm/lib/Target/ARM/CMakeLists.txt
llvm/test/CodeGen/ARM/O3-pipeline.ll
llvm/test/CodeGen/ARM/aes-erratum-fix.ll
