This is an archive of the discontinued LLVM Phabricator instance.


Differential D2884

AARCH64_BE load/store rules fix for ARM ABI
Needs Review, Public

Authored by akadlec on Feb 26 2014, 12:30 AM.


Details

Reviewers

t.p.northover

Summary

For Big Endian (BE) systems: Switch from LD1/ST1 loads to LDR/STR for NEON regs.
Apart from having better addressing modes and being specified in the ABI, LDR/STR do correct byte-swapping for BE, as opposed to the "element-swapping" taking place with LD1/ST1.

For Little Endian (LE), nothing changes in this step - although the shorter LDR/STR instructions should be enabled for LE as well - in LE, both instruction types do the same things and can be mixed.

For BE, initialization from literals must use vector load intrinsics - or the literals need to be rearranged before they are emitted.
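The difference between byte-swapping and "element-swapping" described in this summary can be modelled in portable C. The following is only a simulation under stated assumptions, not target code: a 64-bit D register is held in a `uint64_t`, lane i of the .4h arrangement is taken to occupy bits [16i, 16i+15], and the helper names `lane16`, `be_ldr_d` and `be_ld1_4h` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model: a D register is a uint64_t, and lane i of the .4h
   arrangement occupies bits [16*i, 16*i + 15]. */
static uint16_t lane16(uint64_t reg, int i) {
    assert(i >= 0 && i < 4);
    return (uint16_t)(reg >> (16 * i));
}

/* Big-endian "ldr d0, [x0]": the 8 bytes are read as one 64-bit
   big-endian integer, so the whole register is byte-reversed. */
static uint64_t be_ldr_d(const uint8_t *mem) {
    uint64_t reg = 0;
    for (int k = 0; k < 8; ++k)
        reg = (reg << 8) | mem[k];
    return reg;
}

/* Big-endian "ld1 {v0.4h}, [x0]": each 16-bit element is read as a
   big-endian integer, and memory element i goes to lane i. */
static uint64_t be_ld1_4h(const uint8_t *mem) {
    uint64_t reg = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t elt = ((uint64_t)mem[2 * i] << 8) | mem[2 * i + 1];
        reg |= elt << (16 * i);
    }
    return reg;
}
```

With memory holding the big-endian 16-bit values 1, 2, 3, 4 in ascending address order, this model puts them in lanes 0..3 for the LD1 form, but in lanes 3..0 for the LDR form.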

Event Timeline

Hi Albrecht,

Thanks for working on this.

I think you've been a bit too liberal with your IsLE predicate, applying it to both patterns that you don't want to disable on BE (if I've understood properly) and to instruction definitions without any patterns (currently harmless, but pointless too).

I've also made one comment about the IsBE use. Is it really necessary?

Finally, there should definitely be regression tests for changes like this. Preferably for each pattern you're introducing or changing. (This is particularly important at the moment because (as Chris said) eventually we'd like to merge Apple's ARM64 LLVM port with this one, which will mean big changes everywhere).

Cheers.

Tim.

lib/Target/AArch64/AArch64InstrInfo.td
4856	Could we have capital letters at the start of sentences?
4858–4859	It'd probably be a good idea to refer people to the AAPCS here for more details.
4860	Why is this only eventually? Couldn't it be enabled now if it's got better addressing-mode properties?
4893	Commented code.
4901	If the CPU doesn't have FPARMv8 then f64 won't be a legal type and the DAG shouldn't contain any instances of it by this time.
4904	LLVM normally uses "FIXME" rather than "TODO". A consistent choice makes grepping a bit easier.
lib/Target/AArch64/AArch64InstrNEON.td
3391	There aren't any patterns in this multiclass so this is superfluous. If patterns are added, it's not clear that they'll be wrong for BE either: they could be the int_arm_neon_vldN version which you do want.
3888	No patterns and you probably would want any that existed since the layout issue doesn't exist for the duplicating loads.
3907	Ah, here they are. I think these patterns should be endian-independent.
3996	Loading a single lane is also layout independent (and these are not the patterns).

As they say for complex reviews: start early & iterate :-)

I forgot to add the bigger roadmap towards BE support

1. disable LDx/STx
2. regain matcher coverage by adding LDR/STR
3. fix BE calling conventions to gain code correctness in BE (almost there internally)
4. optionally enable LDR/STR for LE
5. re-enable some LDn rules with extensive testing for nice interaction with STRed data structures

This patch covers 1-2, 3 is a prerequisite for 5 (inlining printf sucks)
Ideally these would be separate patches each - but matcher fails don't go down well.

YES, I've been liberal with the disabling - anything that looked dangerous had to go for now - for BE only.
I figured that single element duplicating loads may be fine.
But whether v8i8 or just 128bit elements (how about 64 bits??) are fine is still work in progress - needs testing.
Documentation is plentiful, but I haven't found what I really need yet.

Then I added the new patterns conservatively for BE only - I don't want to change LE code just now (we're comparing LE to BE, for example). I'm also not going to ruffle the feathers of any LE guys (one Pandora's box at a time).

Right now, the focus is on getting BE to actually work (correctness first).
That needs another upcoming patch to the calling conventions - step 3) above.

Then we have correct code coverage, and we'll extend from there - within project limitations.

lib/Target/AArch64/AArch64InstrInfo.td
4856	Done
4858–4859	Done. The URL only just fits the 80-column limit :-)
4860	We're using LE as the reference, so we're trying not to change that yet. People working on LE might oppose the code changes. The comment was intended to start a discussion that eventually leads to enabling.
4893	Yeah - I'm still wondering why there's a v1f64 nonterminal, but no v1i32. Any idea? Symmetry would suggest both should exist.
4901	ARM32 ended up having a few options for hard-float units. I wasn't sure, since there are other uses above, but not at all consistently. E.g.: let Predicates = [HasFPARMv8] in { def : Pat<(i32 (fp_to_sint f32:$Rn)), (FCVTZSws $Rn)>; ... What do we do now? Clean up (which way?) or leave as is (and then add guards for new code or not)?
4904	DONE
lib/Target/AArch64/AArch64InstrNEON.td
3391	True - it's more a reminder for whoever adds patterns that LDn/STn cause trouble in BE, while they're fine in LE. The LE implementation is farther ahead, so adding patterns without considering BE will be troublesome. -> Shall I convert that to a comment?
3888	True for the single-element replications below - first candidate for re-enabling. Probably not true for the vector-replicating loads here (if a single vector is loaded in reverse-element order, the duplication won't fix that?)
3907	I need to do more reading to fully understand all the details of the swapping, but single-element reads should be fine if they do internal byte-swapping within the element (-> v1x nonterminals and scalar nonterminals). So conservatively "not yet", i.e. until we have working code (CallingConv).
3996	Need to check whether ld4ln ({1,2,3,4}) yields the same result on BE & LE. It probably also depends on whether you feed it with an array (then yes) or a vector stored by STR (then probably NOT) -> pattern-dependent?

changes from Tim's comment - PTAL

jmolloy requested changes to this revision. Feb 26 2014, 10:24 AM

jmolloy added inline comments.

lib/Target/AArch64/AArch64InstrInfo.td
4859	As the AAPCS is also not too clear on this, could you please add in the comment the reason ld1/st1 can't be used? More specifically than "wrong arg memory layout", it is because the LD1 performs lane-by-lane byte swapping, and LDR swaps the entire D/Q register.
4878	What does this mean? is it a paste from the comment below? is fp16 always available in a64? Also capitalization, as Tim said.
4879	Please add a FIXME and remove trailing "??"
4897	v1f64 is there because some instructions that act on f64's have both NEON and VFP variants, and in order that NEON intrinsics written by the user select the instruction that the user requested, there must be a way to distinguish between the two at the type level. That is my understanding anyway - Hao or Jiangning will know more about this.
lib/Target/AArch64/AArch64InstrNEON.td
3383	This line of the comment doesn't make sense. LD1 is disallowed, but it has nothing to do with 12bit offset adds! You give the reasoning below. Also in LLVM we have a tendency to write comments in a fairly prose-like form to make it easier to read. For example, instead of "LD1 disallowed in BE", "LD1 is disallowed in BE mode". Also instead of "reason: ", "This is because..."
3390	The comment says this should work in BE mode... then the predicate stops it working in BE mode. Why?
3461	Failed to reindent this line?
3888	Shouldn't a splatting LD1 still work in BE mode?
3908	This comment is cryptic. Also, why should we care about byte swapping here, we're under the IsLE predicate.
3931	ld1.64 should be fine too, right? Because ld1.64 acts the same as LDR (byte swapping on a 64-bit value).
3996	Will 1-element to 1-lane also work in BE mode?

Hi Albrecht,

disable LDx/STx

I still think this is misguided; the lane and duplicating instructions
are very different beasts, without any suggestion (as far as I can
see) that they're problematic except that they share part of their
mnemonic with the dodgy ones.

Anything disabled should be because you can explain why it goes wrong,
not through guilt by association.

optionally enable LDR/STR for LE

I think this should happen first. I realise your main concern is
big-endian support, but it's best not to get too bogged down in that
if there's an opportunity to improve other parts of LLVM at the same
time. Since it looks like these are more flexible generally, and
identical to ld1 for LE, I'd suggest a separate patch adding these
patterns for both BE & LE (with tests, obviously).

I'd support that change from what I've heard. *If* people object, then
we can go back to this suggestion (reluctantly). I think that's
unlikely though.

Ideally these would be separate patches each - but matcher fails don't go down well.

Agreed, but I'm not too bothered by that personally. I'm not sure
where you're getting matcher fails with a partial patch though; that
might need looking into if it's in the regression tests.

Then I added the new patterns conservatively for BE only - don't want to change LE
code just now (we're comparing LE to BE, for example). Also not going to ruffle feathers
of any LE guys (one pandora's box at the same time)

I'd say you're more likely to do that by restricting all changes to BE
than by changing LE when it's a benefit to both.

Finally, the newest patch still doesn't have any regression tests. The
other stuff's definitely still up for debate, but those really are
essential.

Cheers.

Tim.

Hmm, uploading a partially updated patch mostly hides previous comments (unexpected for me).
Seems to have killed Tim's and my discussion on the inline-comments for the 1st diff.

So here's the open question again:

Is it OK to guard instruction defs with IsLE to hint that they might not be safe to use in BE?
Or add a comment instead?
If somebody adds a valid pattern (e.g. an intrinsic), that predicate can be removed.
If somebody adds a pattern valid only for either BE or LE, the predicate must go there.

Fixes included in the next revision (today)

lib/Target/AArch64/AArch64InstrInfo.td
4859	Hmmm, somebody should tell ARM, so they can clarify that section ;-) I'd written that to the beginning of the AArch64InstrNEON.td - but left it out again. I'd still prefer to put that next to the NEON load-store instructions, where it belongs. If you can bear with a larger paragraph of comments ... DONE
4878	The fp* rules have already been there before I added the vector types. They were unguarded. I kept it that way until we clarify the necessity of these guards. It's not unimaginable that other FP hardware might emerge besides neon vfp. I don't know and even manufacturers don't see the future market demands, yet.
4879	I was waiting for a reply from Tim -> remove the comment and do the right thing whatever that is. I since figured, that if the predicates exist I'd better use them - ignoring the non-consistent picture so far. Better a fail to match in the presence of upstream errors, than blowing up in the face of a customer's customer.
4897	Hmm, I don't see the necessity for a separate nonterminal there. Either a) intrinsics are matched directly, so there's no need there, or b) as long as the semantics aren't different, the user couldn't care less (-> a pseudo instruction that can be lowered later as fits). Nonterminals are data types for registers. Separate nonterminals are for different in-register representations, e.g. they might be handy for "reversed" vectors, together with the chain rules that do the element swaps. They're really handy for high/low positioning of values in a register. But that all requires a PBQP matcher that finds the optimal coverage for the function's DAG.
lib/Target/AArch64/AArch64InstrNEON.td
3383	Right - I was mixing the BE & LE issues there. Fixed by adding the whole ugly story right at the start of the store section. I wish we could separate the stores out from that 10,000-line file. :-( Regarding comments: that's probably why there were so many helpful comments in that file :-( In many places I'd have preferred a short comment over none at all.
3390	first guess at step 5 in our roadmap - and being conservative. Anyway - fixed, now that I had the time to think through it all.
3461	Done.
3888	At most the ones that only read one element and duplicate that. The multi-element reads will have unexpected order (that struct was STRed!), so can only be used via intrinsics to read from arrays.
3908	That's the reason for the IsLE - because it doesn't work for BE. Pulled the comments out. Now I think it's still wrong, as the elements would be read in ascending address order "array-like", while they have been stored reversed by STR.
3931	yes
3996	added the following comment to the pattern and removed the predicate. This will not work as intended in BE mode, if the matcher generates it to load a vector to a lane. (STR q0 stored the elements swapped) Must always use an intrinsic, so the user knows it's loading from an array layout.

+let Predicates = [IsLE] in {

// Load single 1-element structure to all lanes of 1 register

James Molloy wrote:

Shouldn't a splatting LD1 still work in BE mode?

At most the ones that only read one element and duplicate that.

The multi-element reads will have unexpected order (that struct was STRed!), so can only be used via intrinsics to read from arrays.

I'm not sure I follow here. Structs aren't short vectors, so their
layout is dictated by the normal C rules and I think they will have
the expected order on both little and big-endian machines. The example
I'm thinking of might be written as:

#include <arm_neon.h>
typedef struct { uint8_t r, g, b; } RGB;
uint8x8x3_t read(RGB *colours) {
  uint8x8x3_t result;
  result.val[0] = vdup_n_u8(colours->r);
  result.val[1] = vdup_n_u8(colours->g);
  result.val[2] = vdup_n_u8(colours->b);
  return result;
}

I think this would be best implemented as an ld3r on both big and
little-endian systems, and is the intended use of that instruction.

Could you give a snippet of either LLVM IR or C that you think we
might naively use ldNr for, but would be invalid on big-endian
systems? Just so I can get a better idea of what you're thinking of.

-defm LD1LN : LDN_Lane_BHSD<0b0, 0b0, "VOne", "ld1">;
+let Predicates = [IsLE] in {

+ // Load single 1-element structure to one lane of 1 register.

James Molloy wrote:

Will 1-element to 1-lane also work in BE mode?

added the following comment to the pattern and removed the predicate.

This will not work as intended in BE mode, if the matcher generates it to
load a vector to a lane. (STR q0 stored the elements swapped)
Must always use an intrinsic, so the user knows it's loading from an array
layout.

I don't believe this is true either. Consider the alternatives for the IR:

define <4 x i32> @foo(<4 x i32> %vec, i32* %addr) {
  %elt = load i32* %addr
  %newvec = insertelement <4 x i32> %vec, i32 %elt, i32 0
  ret <4 x i32> %newvec
}

This is the obvious, canonical situation where we'd want a pattern for
"ld1 (lane)". And indeed we generate "ld1 {v0.4s}[0], [x0]". But
what's the alternative if the ld1 is disabled? I strongly suspect
you'll find it's

ldr w0, [x0]
ins v0.4s[0], w0

which has exactly the same semantics.

I think the problem will actually come with the intrinsics, where we
probably want to generate this sequence from "vld1_lane_s32(addr, vec,
3)" but I'd strongly suggest approaching that from the front-end since
it should be mapping to that LLVM IR anyway.
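The claimed equivalence between a lane load and a scalar load followed by a lane insert can be sketched as a small C model. This is an illustration only: the `v4i32` struct, the `big_endian` flag and the helper names are hypothetical, and the model assumes that both the lane load and the scalar load read one element in the system's data endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of a 128-bit vector register as four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } v4i32;

/* Read one 32-bit element from memory in the system's data endianness. */
static uint32_t read_u32(const uint8_t *mem, bool big_endian) {
    uint32_t v = 0;
    for (int k = 0; k < 4; ++k) {
        int idx = big_endian ? k : 3 - k;  /* most significant byte first */
        v = (v << 8) | mem[idx];
    }
    return v;
}

/* Model of "ld1 {v0.s}[n], [x0]": load one element straight into lane n. */
static v4i32 ld1_lane(v4i32 vec, const uint8_t *mem, int n, bool big_endian) {
    assert(n >= 0 && n < 4);
    vec.lane[n] = read_u32(mem, big_endian);
    return vec;
}

/* Model of "ldr w0, [x0]" followed by "ins v0.s[n], w0". */
static v4i32 ldr_then_ins(v4i32 vec, const uint8_t *mem, int n, bool big_endian) {
    assert(n >= 0 && n < 4);
    uint32_t w0 = read_u32(mem, big_endian);  /* scalar load */
    vec.lane[n] = w0;                         /* lane insert */
    return vec;
}
```

Under these assumptions the two sequences produce the same lane value for either endianness, because only a single element is read and no ordering between elements is involved.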

Cheers.

Tim.

The promised updated patch including:
all single element LDn/STn allowed
LDR/STR for vector regs for BE & LE
LE tests fixed
BE tests added

PTAL

Hi Albrecht,

AAPCS64 requires LDR/STR to be used only for short vectors as defined in AAPCS64. That definition requires the whole short vector to be aligned to its total size, rather than to its element size.

For some reason, the LLVM compiler can generate element-aligned short vectors for the purpose of storing arrays. This type should be different from the short vector defined in AAPCS64. All instructions using this data type should fall into LD1/ST1. LD1/ST1 should make no difference to element ordering between LE and BE; the only difference is the byte ordering inside each element. We will be supporting this element-aligned short vector access soon. Refer to the example inlined.

There are some other comments inlined.

Thanks,
-Jiangning

lib/Target/AArch64/AArch64InstrNEON.td
107	This comment is misleading. Every instruction should be valid for big-endian, although the same instruction can have different behaviors for LE/BE.
3362	How do we come across a case mixing the uses of LDR and LD1? If it's type casting, end-user should guarantee the correctness by program logic itself rather than by compiler.
3417	This is not the only case. The auto-vectorizer could generate element-aligned short vector ld/st. For example, the middle-end could generate "store <4 x i16> %val, <4 x i16>* %ptr, align 2". We should generate an instruction like "st1 v0.4h, [x0]". Unfortunately, we can't generate this instruction yet with trunk. We will get it fixed as soon as possible.
3424	Is this to disable LD1/LD2/LD3/LD4 for big-endian? If yes, why can the test cases using those instructions pass with a big-endian configuration? This piece of code defines encodings, and LE/BE should always cover them. If we don't want to generate an instruction, we should control that with pattern matching.
3483	ST1/ST2/ST3/ST4 essentially use aggregate short vector type like, typedef struct int16x4x3_t { int16x4_t val[3]; } int16x4x3_t; which is defined in arm_neon.h. With this data type, LE/BE should only make difference for the layout inside element int16. The data layout among different elements should be always the same.

Hi Jiangning,

I'm not sure I understand your comments. Do you mean ARM is intending to add C level types to ACLE & AAPCS that *will* behave as if loaded and stored with ld1/st1 soon?

Cheers.

Tim.

lib/Target/AArch64/AArch64InstrNEON.td
3362	The compiler was mixing them at will previously (e.g. storeRegToStackSlot uses str, but this address could escape and be used in a normal load which we'd use ld1 for). I believe Albrecht's comment is designed to warn against this, and I support it.
3417	I don't believe we're forced to generate either and there are arguments in favour of both, but being consistent is very important. As Albrecht said, we can't mix the two kinds of load/store. I agree that using ld1/st1 exclusively would make LLVM's semantics easier to get right, but it would make getting the AAPCS right harder (bitcasts would become non-trivial operations and be needed at all potentially ABI-visible boundaries). I suspect (but don't know) that the ldr/str route is capable of producing better code on average.
3424	The "IsBE" predicate is codegen-level rather than an AssemblerPredicate so MC tests won't be affected anyway. And there's only one CodeGen test mentioning them that's not based on intrinsics (which gets more substantial changes), so I think that part's OK. Your comment about only applying IsBE to patterns is a good one though.
3483	I believe this is incorrect for the simple instructions. "ld1 {v0.4h, v1.4h}, [x0]" is equivalent to "ld1 {v0.4h}, [x0]; ld1 {v1.4h}, [x0, #8]" and different from "ldr d0, [x0]; ldr d1, [x0, #8]" on big-endian systems.

Hi Tim,

I'm not sure I understand your comments. Do you mean ARM is intending to add C level types to ACLE & AAPCS that *will* behave as if loaded and stored with ld1/st1 soon?

No, I didn't mean that. We should follow AAPCS64. AAPCS64 says,

"Elements in a short vector are numbered such that the lowest numbered element (element 0) occupies the lowest numbered bit (bit zero) in the vector and successive elements take on progressively increasing bit positions in the vector. When a short vector transferred between registers and memory it is treated as an opaque object. That is a short vector is stored in memory as if it were stored with a single STR of the entire register; a short vector is loaded from memory using the corresponding LDR instruction. On a little-endian system this means that element 0 will always contain the lowest addressed element of a short vector; on a big-endian system element 0 will contain the highest-addressed element of a short vector."

All these statements are talking about short vectors with total-size alignment. However, in LLVM IR we also have short vectors with element-size alignment, which should not simply fall into this category. Such a vector should be treated as an array of elements, using ld1/st1 to match these semantics exactly; ld1/st1 have no semantic difference between LE and BE except for the data layout inside each element.

For total-size-aligned short vectors, ld1/st1 have the same semantics as ldr/str on little-endian. We prefer to use ldr/str because they have better addressing modes than ld1/st1. On big-endian, we should only use ldr/str to meet the semantic requirement.
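The element numbering in the AAPCS64 passage quoted above can be checked with a small C model of a whole-register store. This is a sketch under stated assumptions: the register is a `uint64_t` with element i in bits [16i, 16i+15], and `str_d`, `pack_4h` and `slot_of_element` are hypothetical helpers, not part of any real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "str d0, [x0]": store the 64-bit register as one integer in
   the system's data endianness. */
static void str_d(uint64_t reg, uint8_t *mem, bool big_endian) {
    for (int k = 0; k < 8; ++k) {
        int shift = big_endian ? 8 * (7 - k) : 8 * k;
        mem[k] = (uint8_t)(reg >> shift);
    }
}

/* Pack four 16-bit elements so that element i occupies bits
   [16*i, 16*i + 15] of the register. */
static uint64_t pack_4h(const uint16_t elt[4]) {
    uint64_t reg = 0;
    for (int i = 0; i < 4; ++i)
        reg |= (uint64_t)elt[i] << (16 * i);
    return reg;
}

/* Return the index (0..3) of the 2-byte memory slot that holds element i
   after the store, or -1 if it is not found. */
static int slot_of_element(int i, bool big_endian) {
    assert(i >= 0 && i < 4);
    /* Both bytes of each element are equal, so byte order inside an
       element cannot affect the search below. */
    const uint16_t elt[4] = {0xA0A0, 0xB1B1, 0xC2C2, 0xD3D3};
    uint8_t mem[8];
    str_d(pack_4h(elt), mem, big_endian);
    for (int slot = 0; slot < 4; ++slot)
        if (mem[2 * slot] == (uint8_t)elt[i])
            return slot;
    return -1;
}
```

In this model element 0 lands in the lowest-addressed slot on little-endian and in the highest-addressed slot on big-endian, which matches the quoted wording.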

Thanks,
-Jiangning

lib/Target/AArch64/AArch64InstrNEON.td
3362	We should avoid mixing the use of ld1 and ldr. storeRegToStackSlot should decide whether to use ld1 or ldr by checking the alignment. If it is not an element alignment but a whole-short-vector alignment, we should use ldr, while for other cases we should use ld1. This way, we should always be able to keep endianness correct and we should not have the mixing issue.
3417	I don't think we're forced to generate either as well, but we should keep semantic correctness by choosing between them according to alignment. Actually we don't really violate AAPCS at all. AAPCS says, "A short vector has a base type that is the fundamental integral or floating-point type from which it is composed, but its alignment is always the same as its total size.". If the memory address is not total-size aligned, it is not a "short vector" as defined in AAPCS. It should be treated as an array, which is usually generated by the auto-vectorizer, so we prefer to generate ld1/st1 for it.
3483	I don't think I meant that ld1 and ldr have the same semantics between LE and BE systems. I agree with your statement. What I meant is that ld1/st1 should always have the same semantics between LE and BE systems except for the data layout inside the element. We should choose ldr or ld1 according to the alignment in the IR. If it is a total-size-aligned access, we use ldr, and otherwise we use ld1.

Hi Jiangning,

I'm afraid I still can't quite see what you're proposing. First, are you sure you mean "alignment" in your post? If so, you seem to be advocating treating these two instructions differently:

%val = load <4 x i16>* %addr, align 8 ; gets ldr
%val = load <4 x i16>* %addr, align 2 ; gets ld1

My opinion is that this would be madness, and almost impossible to produce consistent code from. I'll try to think up some examples if you like, but just want to make sure I understand what you're saying first.

If that's not what you mean, could you give some IR examples and the code you'd like them to generate (particularly showing the distinctions)?

Cheers.

Tim.

Hi Tim,

I'm afraid I still can't quite see what you're proposing. First, are you sure you mean "alignment" in your post? If so, you seem to be advocating treating these two instructions differently:

%val = load <4 x i16>* %addr, align 8 ; gets ldr
%val = load <4 x i16>* %addr, align 2 ; gets ld1
My opinion is that would be madness, and almost impossible to produce a consistent code from. I'll try to think up some examples if you like, but just want to make sure I understand what you're saying first.

Yes. This is my point. Could you please give me some examples to articulate it is "madness". :-)

If we don't use ld1 for "align 2" case, which instruction we should use? ldr requires total size alignment, otherwise exception would be raised if strict alignment is enabled.

Thanks,
-Jiangning

Just a short comment since I don't work for Abix any more (due to differences in promised/actual payment) -> Christian, the other compiler guy at Abix will probably pick this up shortly.

@Jiangning:
Alignment is a minimum attribute of a type. The type can always be better aligned, and there's a tendency to do that for many chips to better utilize memory bus cycles.
Unioning with a 128-bit int will boost a vector's alignment (although you get lucky now in that it's not an HVA any more) -> any such boost in alignment would suddenly have it stored in a different format (STR). Giving the address of the vector to a function leaves the function not knowing the actual alignment *) -> it will assume minimum alignment and load via LD1, unless we have a clean type separation between HVAs and array-like aggregates.

*) pointer-to-T args have to demote alignment to the minimum alignment of T, as it's sufficient that any passed argument might be aligned that low.

The bad example may not work exactly like this, but it's bound to be found eventually.
For me, relying on alignment alone is just asking for disaster:

It's not inconceivable, that someone might write an alignment analysis that tries to supply better-than guaranteed-for-the-type alignment to the backend in order to exploit aligned loads with lower memory bus resource usage (e.g. malloc/new -ed variables are typically 128bit aligned anyway).
For AARCH64 it might not be a bad idea to boost alignment for types similar to vector types anyway, if you want to get performance out of NEON (actually, I think that's what ARM tried to achieve with that definition).
It would be very strange to enforce higher alignment for HVAs and force array-type data to lower alignment, when LD1 also benefits from higher alignment.

Solution:
The frontend must give the backend a totally different type for short vectors (HVAs), so that the memory layouts cannot be mixed.
Then you can re-enable LD1 for array type loads in BE.
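The alignment argument above can be illustrated in plain C. This is a minimal sketch with assumptions: the comment speaks of a union with a 128-bit int, while the example uses a 64-bit integer and a 64-bit vector-like struct to stay portable, and the type and function names (`vec4h`, `boosted_vec4h`, `callee_assumed_alignment`, `boosted_alignment`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Array-like aggregate: its minimum alignment is that of its element. */
typedef struct { uint16_t h[4]; } vec4h;

/* The same data overlaid with a wider integer: the alignment is boosted. */
typedef union { vec4h v; uint64_t bits; } boosted_vec4h;

/* Alignment a callee may assume for a vec4h* argument: only the minimum
   alignment of the type, whatever object the caller really passed. */
static size_t callee_assumed_alignment(const vec4h *p) {
    assert(p != NULL);
    return _Alignof(vec4h);
}

/* Alignment the object really has when it lives inside the union. */
static size_t boosted_alignment(void) {
    return _Alignof(boosted_vec4h);
}
```

A store at the definition site sees the boosted alignment, while a load through a `vec4h *` in another function sees only the minimum, so a rule keyed on alignment alone could pick different instructions for the same object.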

Yes. This is my point.

Oh good, at least we're communicating properly!

Could you please give me some examples to articulate it is "madness". :-)

Well, take a look at the file I attached for example. Running "opt" on
it allows inlining and the real alignment of 8 is propagated to the
load. As a result, you'd get different results depending on
optimisation level (if we used "ld1 {v0.4h}").
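The inconsistency described here can be reproduced with a small simulation. The sketch below assumes the same register model as before (a D register as a `uint64_t`, lane i of .4h in bits [16i, 16i+15]) and a hypothetical selection rule `be_load_v4i16` that picks ldr when the known alignment covers the whole vector and ld1 otherwise; none of these names come from LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian "ldr d0, [x0]": whole-register byte reversal. */
static uint64_t be_ldr_d(const uint8_t *mem) {
    uint64_t reg = 0;
    for (int k = 0; k < 8; ++k)
        reg = (reg << 8) | mem[k];
    return reg;
}

/* Big-endian "ld1 {v0.4h}, [x0]": per-element byte swap, element i to lane i. */
static uint64_t be_ld1_4h(const uint8_t *mem) {
    uint64_t reg = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t elt = ((uint64_t)mem[2 * i] << 8) | mem[2 * i + 1];
        reg |= elt << (16 * i);
    }
    return reg;
}

/* Hypothetical rule under discussion: choose the instruction from the
   alignment that happens to be known at this point in compilation. */
static uint64_t be_load_v4i16(const uint8_t *mem, unsigned known_align) {
    assert(known_align != 0 && (known_align & (known_align - 1)) == 0);
    return known_align >= 8 ? be_ldr_d(mem) : be_ld1_4h(mem);
}
```

For the same bytes in memory, the model returns different lane contents for `known_align` 2 and 8, so propagating a better alignment after inlining would change the loaded value.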

If we don't use ld1 for "align 2" case, which instruction we should
use? ldr requires total size alignment, otherwise exception would
be raised if strict alignment is enabled.

If we decided to support strict alignment mode efficiently, we would
probably want to emit an "ld1 {v0.8b}" (i.e. always use the .8b or
.16b version), since that's got the same semantics as ldr. At the
moment neither gets emitted so it's not really a pressing issue (it
would be part of "support strict align" rather than "support
big-endian" in my view).

Cheers.

Tim.

{F47028, layout=link}

Hi Tim,
Hi Jiangning,

are you ok with committing the initial submission?

Thanks,
Christian

Hi Christian,

The original patch doesn't work any longer on trunk. Can this be merged
with trunk and sent out again?

Thanks,
-Jiangning

2014-03-31 20:31 GMT+08:00 Christian Pirker <cpirker@a-bix.com>:

http://llvm-reviews.chandlerc.com/D2884

Hi Christian,

I think the most recent patch was still too conservative (w.r.t. duplicating & lane operations) and intrusive (in the testing).

Cheers.

Tim.

Hi,

I restarted a new revision (D3345) so that I can patch new diffs.

Thanks,
Christian

jmolloy removed a reviewer: jmolloy. Jul 10 2014, 2:47 PM

jmolloy removed a subscriber: jmolloy.

Revision Contents

Path: Size

lib/Target/AArch64/
AArch64InstrInfo.td: 118 lines
AArch64InstrNEON.td: 444 lines

test/CodeGen/AArch64/
(6 entries without file names): 161 lines, 1 line, 1 line, 1 line, 2 lines, 1 line
assertion-rc-mismatch.ll: 1 line
atomic-ops-not-barriers.ll: 1 line
(9 entries without file names): 2 lines, 1 line, 1 line, 1 line, 1 line, 2 lines, 1 line, 1 line, 1 line
code-model-large-abs.ll: 1 line
compare-branch.ll: 1 line
complex-copy-noneon.ll: 1 line
concatvector-v8i8-bug.ll: 1 line
(29 entries without file names): 2, 4, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2 lines
inline-asm-constraints-badI.ll: 1 line
inline-asm-constraints-badK.ll: 1 line
inline-asm-constraints-badK2.ll: 1 line
inline-asm-constraints-badL.ll: 1 line
inline-asm-modifiers.ll: 1 line
(9 entries without file names): 2, 1, 1, 2, 2, 2, 4, 2, 1 lines
logical_shifted_reg.ll: 1 line
mature-mc-support.ll: 2 lines
movw-consts.ll: 1 line
movw-shift-encoding.ll: 1 line
(8 entries without file names): 1 line each
neon-bitwise-instructions.ll: 1 line
neon-bsl.ll: 1 line
neon-compare-instructions.ll: 1 line
neon-copyPhysReg-tuple.ll: 1 line
(7 entries without file names): 2, 1, 1, 1, 1, 1, 1 lines
neon-halving-add-sub.ll: 1 line
neon-load-store-v1i32.ll: 1 line
neon-max-min-pairwise.ll: 1 line
(7 entries without file names): 1 line each
neon-rounding-halving-add.ll: 1 line
neon-rounding-shift.ll: 1 line
neon-saturating-add-sub.ll: 1 line
neon-saturating-rounding-shift.ll: 1 line
neon-saturating-shift.ll: 1 line
neon-scalar-abs.ll: 1 line
neon-scalar-add-sub.ll: 1 line
neon-scalar-by-elem-fma.ll: 1 line
neon-scalar-by-elem-mul.ll: 1 line
neon-scalar-compare.ll: 1 line
neon-scalar-copy.ll: 1 line
neon-scalar-cvt.ll: 1 line
neon-scalar-ext.ll: 1 line
neon-scalar-extract-narrow.ll: 1 line
neon-scalar-fabd.ll: 1 line
neon-scalar-fcvt.ll: 1 line
neon-scalar-fp-compare.ll: 1 line
neon-scalar-mul.ll: 1 line
neon-scalar-neg.ll: 1 line
neon-scalar-recip.ll: 1 line
neon-scalar-reduce-pairwise.ll: 1 line
neon-scalar-rounding-shift.ll: 1 line
neon-scalar-saturating-add-sub.ll: 1 line
neon-scalar-saturating-rounding-shift.ll: 1 line
neon-scalar-saturating-shift.ll: 1 line
neon-scalar-shift-imm.ll: 1 line
neon-scalar-shift.ll: 1 line
neon-select_cc.ll: 1 line
neon-shift-left-long.ll: 1 line
neon-shift.ll: 1 line
neon-shl-ashr-lshr.ll: 1 line
neon-simd-ldst-multi-elem.ll: 33 lines
neon-simd-ldst.ll: 1 line
neon-simd-post-ldst-multi-elem.ll: 1 line
neon-simd-post-ldst-one.ll: 1 line
neon-simd-shift.ll: 1 line
neon-simd-tbl.ll: 1 line
neon-simd-vget.ll: 1 line
neon-spill-fpr8-fpr16.ll: 1 line
neon-truncStore-extLoad.ll: 9 lines
neon-v1i1-setcc.ll: 1 line
neon-vector-list-spill.ll: 1 line
regress-bitcast-formals.ll: 1 line
regress-f128csel-flags.ll: 1 line
regress-fp128-livein.ll: 1 line
regress-tail-livereg.ll: 1 line
regress-tblgen-chains.ll: 1 line
regress-w29-reserved-with-fp.ll: 1 line
regress-wzr-allocatable.ll: 1 line
(5 entries without file names): 1 line each
sincospow-vector-expansion.ll: 1 line
tail-call.ll: 1 line
tls-dynamic-together.ll: 1 line
(5 entries without file names): 2, 2, 1, 2, 1 lines

Diff 7483

lib/Target/AArch64/AArch64InstrInfo.td

Context not available.
	def HasCrypto : Predicate<"Subtarget->hasCrypto()">,	def HasCrypto : Predicate<"Subtarget->hasCrypto()">,
	AssemblerPredicate<"FeatureCrypto","crypto">;	AssemblerPredicate<"FeatureCrypto","crypto">;

		def IsLE : Predicate<"Subtarget->isLittle()">;
		def IsBE : Predicate<"!Subtarget->isLittle()">;

	// Use fused MAC if more precision in FP computation is allowed.	// Use fused MAC if more precision in FP computation is allowed.
	def UseFusedMAC : Predicate<"(TM.Options.AllowFPOpFusion =="	def UseFusedMAC : Predicate<"(TM.Options.AllowFPOpFusion =="
	" FPOpFusion::Fast)">;	" FPOpFusion::Fast)">;
Context not available.
	: ls_neutral_pats<LOAD, STORE, Base, Offset, address, sty>,	: ls_neutral_pats<LOAD, STORE, Base, Offset, address, sty>,
	ls_atomic_pats<LOAD, STORE, Base, Offset, address, sty, sty>;	ls_atomic_pats<LOAD, STORE, Base, Offset, address, sty, sty>;


		// Wrappers to instantiate all allowed same-size fp/vector loads
		t.p.northover: Could we have capital letters at the start of sentences?
		akadlec (Author): Done

		// NEON-BE: allow all neon vectors as well, since ld1/st1 must be disabled
		// LD1 & ST1 are not ABI conforming in big endian: wrong arg memory layout
		t.p.northover: It'd probably be a good idea to refer people to the AAPCS here for more details.
		akadlec (Author): Done. The URL just fits the 80-column limit :-)
		jmolloy: As the AAPCS is also not too clear on this, could you please add in the comment the reason ld1/st1 can't be used? More specifically than "wrong arg memory layout", it is because the LD1 performs lane-by-lane byte swapping, and LDR swaps the entire D/Q register.
		akadlec (Author): Hmmm, somebody should tell ARM, so they can clarify that section ;-) I'd written that at the beginning of AArch64InstrNEON.td, but left it out again. I'd still prefer to put that next to the NEON load-store instructions, where it belongs. If you can bear with a larger paragraph of comments ... DONE
		// http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
		t.p.northover: Why is this only eventually? Couldn't it be enabled now if it's got better addressing-mode properties?
		akadlec (Author): 1) We're using LE as reference, so trying not to change that yet. 2) People working on LE might oppose the code changes. The comment was intended to start a discussion that eventually leads to enabling.
		// section 4.1.2, 2nd paragraph: LDR/STR layout
		// "on a big-endian system element 0 will contain the highest-addressed
		// element of a short vector."
		// FIXME: eventually also enable for LE
		// (desired by ARM - smaller code due to more powerful adressing modes)

		// NEON 8 bit types
		multiclass ls_FPR8_pats<Instruction LOAD, Instruction STORE,
		dag Base, dag Offset, dag address> {
		let Predicates = [HasNEON] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1i8>;
		}
		}

		// NEON 16 bit types
		multiclass ls_FPR16_pats<Instruction LOAD, Instruction STORE,
		dag Base, dag Offset, dag address> {
		let Predicates = [HasFPARMv8] in {
		jmolloy: What does this mean? Is it a paste from the comment below? Is fp16 always available in a64? Also capitalization, as Tim said.
		akadlec (Author): The fp* rules have already been there before I added the vector types. They were unguarded. I kept it that way until we clarify the necessity of these guards. It's not unimaginable that other FP hardware might emerge besides neon vfp. I don't know, and even manufacturers don't see the future market demands yet.
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, f16>;
		jmolloy: Please add a FIXME and remove trailing "??"
		akadlec (Author): I was waiting for a reply from Tim -> remove the comment and do the right thing whatever that is. I have since figured that if the predicates exist I'd better use them, ignoring the inconsistent picture so far. Better a failure to match in the presence of upstream errors than blowing up in the face of a customer's customer.
		}

		let Predicates = [HasNEON] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1i16>;
		}
		}

		// NEON 32 bit types
		multiclass ls_FPR32_pats<Instruction LOAD, Instruction STORE,
		dag Base, dag Offset, dag address> {
		let Predicates = [HasFPARMv8] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, f32>;
		}

		t.p.northover: Commented code.
		akadlec (Author): Yeah - I'm still wondering why there's a v1f64 non terminal, but no v1i32. Any idea? Symmetry would suggest both should exist.
		let Predicates = [HasNEON] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1i32>;
		// defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1f32>; does not exist - v1f64 DOES -- WHY ?
		}
		jmolloy: v1f64 is there because some instructions that act on f64's have both NEON and VFP variants, and in order that NEON intrinsics written by the user select the instruction that the user requested, there must be a way to distinguish between the two at the type level. That is my understanding anyway - Hao or Jiangning will know more about this.
		akadlec (Author): Hmm, I don't see the necessity for a separate nonterminal there. Either a) intrinsics are matched directly, so no need there, or b) as long as the semantics aren't different, the user couldn't care less (-> pseudo instruction that can be lowered later as fits). Nonterminals are data types for registers. Separate nonterminals are for different in-register representations, e.g. they might be handy for "reversed" vectors, together with the chain rules that do the element swaps. They're really handy for high/low positioning of values in a register. But that all requires a PBQP matcher that finds the optimal coverage for the function's DAG.
		}

		// NEON 64 bit types
		multiclass ls_FPR64_pats<Instruction LOAD, Instruction STORE,
		t.p.northover: If the CPU doesn't have FPARMv8 then f64 won't be a legal type and the DAG shouldn't contain any instances of it by this time.
		akadlec (Author): ARM32 ended up having a few options for hard float units. I wasn't sure, since there are other uses above, but not at all consistently. E.g.: let Predicates = [HasFPARMv8] in { def : Pat<(i32 (fp_to_sint f32:$Rn)), (FCVTZSws $Rn)>; ... What do we do NOW? Clean up (which way?) or leave as is (-> add guards for new code or not?)?
		dag Base, dag Offset, dag address> {
		let Predicates = [HasFPARMv8] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, f64>;
		t.p.northover: LLVM normally uses "FIXME" rather than "TODO". A consistent choice makes grepping a bit easier.
		akadlec (Author): DONE
		}

		let Predicates = [HasNEON] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v8i8>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v4i16>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v2i32>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1i64>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v2f32>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v1f64>;
		}
		}

		// NEON 128 bit types FPR128
		multiclass ls_FPR128_pats<Instruction LOAD, Instruction STORE,
		dag Base, dag Offset, dag address> {
		let Predicates = [HasFPARMv8] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, f128>;
		}

		let Predicates = [HasNEON] in {
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v16i8>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v8i16>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v4i32>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v2i64>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v4f32>;
		defm : ls_neutral_pats<LOAD, STORE, Base, Offset, address, v2f64>;
		}
		}

	//===------------------------------	//===------------------------------
	// 2.2. Addressing-mode instantiations	// 2.2. Addressing-mode instantiations
	//===------------------------------	//===------------------------------
Context not available.
	!subst(ALIGN, min_align8, decls.pattern))),	!subst(ALIGN, min_align8, decls.pattern))),
	i64>;	i64>;

	defm : ls_neutral_pats<LSFP16_LDR, LSFP16_STR, Base,	defm : ls_FPR8_pats< LSFP8_LDR, LSFP8_STR, Base,
	!foreach(decls.pattern, Offset,	!foreach(decls.pattern, Offset,
		!subst(OFFSET, byte_uimm12, decls.pattern)),
		!foreach(decls.pattern, address,
		!subst(OFFSET, byte_uimm12,
		!subst(ALIGN, any_align, decls.pattern)))>;

		defm : ls_FPR16_pats< LSFP16_LDR, LSFP16_STR, Base,
		!foreach(decls.pattern, Offset,
	!subst(OFFSET, hword_uimm12, decls.pattern)),	!subst(OFFSET, hword_uimm12, decls.pattern)),
	!foreach(decls.pattern, address,	!foreach(decls.pattern, address,
	!subst(OFFSET, hword_uimm12,	!subst(OFFSET, hword_uimm12,
	!subst(ALIGN, min_align2, decls.pattern))),	!subst(ALIGN, min_align2, decls.pattern)))>;
	f16>;

	defm : ls_neutral_pats<LSFP32_LDR, LSFP32_STR, Base,	defm : ls_FPR32_pats< LSFP32_LDR, LSFP32_STR, Base,
	!foreach(decls.pattern, Offset,	!foreach(decls.pattern, Offset,
	!subst(OFFSET, word_uimm12, decls.pattern)),	!subst(OFFSET, word_uimm12, decls.pattern)),
	!foreach(decls.pattern, address,	!foreach(decls.pattern, address,
	!subst(OFFSET, word_uimm12,	!subst(OFFSET, word_uimm12,
	!subst(ALIGN, min_align4, decls.pattern))),	!subst(ALIGN, min_align4, decls.pattern)))>;
	f32>;

	defm : ls_neutral_pats<LSFP64_LDR, LSFP64_STR, Base,	defm : ls_FPR64_pats< LSFP64_LDR, LSFP64_STR, Base,
	!foreach(decls.pattern, Offset,	!foreach(decls.pattern, Offset,
	!subst(OFFSET, dword_uimm12, decls.pattern)),	!subst(OFFSET, dword_uimm12, decls.pattern)),
	!foreach(decls.pattern, address,	!foreach(decls.pattern, address,
	!subst(OFFSET, dword_uimm12,	!subst(OFFSET, dword_uimm12,
	!subst(ALIGN, min_align8, decls.pattern))),	!subst(ALIGN, min_align8, decls.pattern)))>;
	f64>;

	defm : ls_neutral_pats<LSFP128_LDR, LSFP128_STR, Base,	defm : ls_FPR128_pats< LSFP128_LDR, LSFP128_STR, Base,
	!foreach(decls.pattern, Offset,	!foreach(decls.pattern, Offset,
	!subst(OFFSET, qword_uimm12, decls.pattern)),	!subst(OFFSET, qword_uimm12, decls.pattern)),
	!foreach(decls.pattern, address,	!foreach(decls.pattern, address,
	!subst(OFFSET, qword_uimm12,	!subst(OFFSET, qword_uimm12,
	!subst(ALIGN, min_align16, decls.pattern))),	!subst(ALIGN, min_align16, decls.pattern)))>;
	f128>;

	defm : load_signed_pats<"B", "", Base,	defm : load_signed_pats<"B", "", Base,
	!foreach(decls.pattern, Offset,	!foreach(decls.pattern, Offset,
Context not available.
	defm : ls_int_neutral_pats<LS32_LDUR, LS32_STUR, Base, Offset, address, i32>;	defm : ls_int_neutral_pats<LS32_LDUR, LS32_STUR, Base, Offset, address, i32>;
	defm : ls_int_neutral_pats<LS64_LDUR, LS64_STUR, Base, Offset, address, i64>;	defm : ls_int_neutral_pats<LS64_LDUR, LS64_STUR, Base, Offset, address, i64>;

	defm : ls_neutral_pats<LSFP16_LDUR, LSFP16_STUR, Base, Offset, address, f16>;	defm : ls_FPR16_pats<LSFP16_LDUR, LSFP16_STUR, Base, Offset, address>;
	defm : ls_neutral_pats<LSFP32_LDUR, LSFP32_STUR, Base, Offset, address, f32>;	defm : ls_FPR32_pats<LSFP32_LDUR, LSFP32_STUR, Base, Offset, address>;
	defm : ls_neutral_pats<LSFP64_LDUR, LSFP64_STUR, Base, Offset, address, f64>;	defm : ls_FPR64_pats<LSFP64_LDUR, LSFP64_STUR, Base, Offset, address>;
	defm : ls_neutral_pats<LSFP128_LDUR, LSFP128_STUR, Base, Offset, address,	defm : ls_FPR128_pats<LSFP128_LDUR, LSFP128_STUR, Base, Offset, address>;
	f128>;

	def : Pat<(i64 (zextloadi32 address)),	def : Pat<(i64 (zextloadi32 address)),
	(SUBREG_TO_REG (i64 0), (LS32_LDUR Base, Offset), sub_32)>;	(SUBREG_TO_REG (i64 0), (LS32_LDUR Base, Offset), sub_32)>;
Context not available.
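
The inline comments in this diff turn on one point: in big-endian mode LD1 swaps bytes lane by lane, while LDR swaps the entire D/Q register. A minimal Python sketch of that difference follows. It is not LLVM code; `store_array`, `ldr`, `ld1` and `lanes` are hypothetical helpers introduced only for illustration, and a register is modelled as a plain integer with lane 0 in the least-significant bits.

```python
def store_array(values, elem_bytes, big_endian):
    """Lay out `values` as an array at ascending addresses in target byte order."""
    order = "big" if big_endian else "little"
    return b"".join(v.to_bytes(elem_bytes, order) for v in values)


def ldr(mem, big_endian):
    """Model LDR Dn/Qn: the whole register is loaded as one integer."""
    return int.from_bytes(mem, "big" if big_endian else "little")


def ld1(mem, elem_bytes, big_endian):
    """Model LD1 {Vn.<T>}: lane i comes from the i-th element in memory;
    byte order is applied inside each element only."""
    order = "big" if big_endian else "little"
    reg = 0
    for lane in range(len(mem) // elem_bytes):
        chunk = mem[lane * elem_bytes:(lane + 1) * elem_bytes]
        reg |= int.from_bytes(chunk, order) << (8 * elem_bytes * lane)
    return reg


def lanes(reg, elem_bytes, reg_bytes):
    """Split a register value into lanes, lane 0 (lowest bits) first."""
    mask = (1 << (8 * elem_bytes)) - 1
    return [(reg >> (8 * elem_bytes * i)) & mask
            for i in range(reg_bytes // elem_bytes)]


if __name__ == "__main__":
    # Four 16-bit elements stored as an array on a big-endian system.
    mem_be = store_array([1, 2, 3, 4], 2, big_endian=True)
    print(lanes(ld1(mem_be, 2, True), 2, 8))  # [1, 2, 3, 4]
    print(lanes(ldr(mem_be, True), 2, 8))     # [4, 3, 2, 1]
```

Under this model the LDR result has lane 0 holding the highest-addressed element, as in the AAPCS sentence quoted in the patch comment, and a single 64-bit element loads identically with either instruction, which is the "single element has no swapping problem" case noted for LD1_1D. On little-endian input both loads produce the same lanes.
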

lib/Target/AArch64/AArch64InstrNEON.td

Context not available.
	defm : ls_128_pats<address, Base, Offset, v2f64>;	defm : ls_128_pats<address, Base, Offset, v2f64>;
	}	}

		// LDR is only valid for little endian.
		Jiangning: This comment is misleading. Every instruction should be valid for big-endian, although the same instruction can have different behaviors for LE/BE.
		// In BE LDR needs correctly byte-swapped 128bit literals, so simple array
		// initializers won't work right now.
		// Big-endian must - for now - do the element swaps using vector intrinsics.
		// That's an additional "add offset12" instruction, there.
		// According to ARM, BE & LE should use intrinsics for initialization.
		// That's also the only portable code.
		// FIXME: BE could use vector-literal-swapping before emit pass.
	defm : uimm12_neon_pats<(A64WrapperSmall	defm : uimm12_neon_pats<(A64WrapperSmall
	tconstpool:$Hi, tconstpool:$Lo12, ALIGN),	tconstpool:$Hi, tconstpool:$Lo12, ALIGN),
	(ADRPxi tconstpool:$Hi), (i64 tconstpool:$Lo12)>;	(ADRPxi tconstpool:$Hi), (i64 tconstpool:$Lo12)>;
Context not available.
	// the three 64-bit vectors list {BA, DC, FE}.	// the three 64-bit vectors list {BA, DC, FE}.
	// E.g. LD3_2S will load 32-bit elements {A, B, C, D, E, F} into the three	// E.g. LD3_2S will load 32-bit elements {A, B, C, D, E, F} into the three
	// 64-bit vectors list {DA, EB, FC}.	// 64-bit vectors list {DA, EB, FC}.
	// Store instructions store multiple structure to N registers like load.	// Store instructions store multiple structure from N registers like load.
		//
		// Problem for Big Endian (BE):
		// LD1/ST1 do "array" loads/stores - reading elements from ascending addresses
		// into ascending indexes in the register, in big-endian byte-swapping is done
		// per element. (hence LD1 & Co are sometimes referred to as "array loads".)
		//
		// LDR/STR read the whole register doing byte-swapping on the whole register
		// in big-endian mode.
		//
		// Obviously the two layouts differ by reversing the elements so they can't be
		Jiangning: How do we come across a case mixing the uses of LDR and LD1? If it's type casting, the end user should guarantee correctness by program logic itself rather than by the compiler.
		t.p.northover: The compiler was mixing them at will previously (e.g. storeRegToStackSlot uses str, but this address could escape and be used in a normal load which we'd use ld1 for). I believe Albrecht's comment is designed to warn against this, and I support it.
		Jiangning: We should avoid mixing the use of ld1 and ldr. storeRegToStackSlot should decide to use ld1 or ldr by checking the alignment. If it is not an element alignment, but a whole short vector alignment, we should use ldr, while for other cases, we should use ld1. This way, we should be able to always keep endianness correctness and we should not have the mixing issue.
		// mixed without explicit element-swap operations in BE.
		//
		// The only overlap is reading single elements to registers:
		// LDR i128/f128 - doing byte-swapping for the whole register.
		// LD1/ST1 i128/f128 - also doing byte-swapping within the 128bit element.
		// Analogously for stores.

		// For this reason there are IsLE guards around the respective patterns, or -
		// when no patterns are defined, yet - around the instruction definition.

		// In a PBQP matcher, one would add a separate set of "reversed" nonterminals
		// with the element swap operations as chain rules - and let the matcher find
		// the optimal coverage. FIXME: How to do that here ?

	class NeonI_LDVList<bit q, bits<4> opcode, bits<2> size,	class NeonI_LDVList<bit q, bits<4> opcode, bits<2> size,
	RegisterOperand VecList, string asmop>	RegisterOperand VecList, string asmop>
	: NeonI_LdStMult<q, 1, opcode, size,	: NeonI_LdStMult<q, 1, opcode, size,
		t.p.northover: There aren't any patterns in this multiclass so this is superfluous. If patterns are added, it's not clear that they'll be wrong for BE either: they could be the int_arm_neon_vldN version which you do want.
		akadlec (Author): True - it's more a reminder for the guy who adds patterns that LDn/STn make trouble in BE, while it's fine in LE. The LE implementation is farther ahead - adding patterns without considering BE will be troublesome. -> Shall I convert that to a comment?
		jmolloy: This line of the comment doesn't make sense. LD1 is disallowed, but it has nothing to do with 12bit offset adds! You give the reasoning below. Also in LLVM we have a tendency to write comments in a fairly prose-like form to make it easier to read. For example, instead of "LD1 disallowed in BE", "LD1 is disallowed in BE mode". Also instead of "reason: ", "This is because..."
		jmolloy: The comment says this should work in BE mode... then the predicate stops it working in BE mode. Why?
		akadlec (Author): First guess at step 5 in our roadmap - and being conservative. Anyway - fixed, now that I had the time to think through it all.
		akadlec (Author): Right - I was mixing the BE & LE issues there. Fixed by adding the whole ugly story right at the start of the store section. Wish we could separate the stores out from that 10,000-line file. :-( Ad comments: that's probably why there were so many helpful comments in that file :-( At many places I'd have preferred a short comment over none at all.
Context not available.
	}	}

	// Load multiple N-element structure to N consecutive registers (N = 1,2,3,4)	// Load multiple N-element structure to N consecutive registers (N = 1,2,3,4)
	defm LD1 : LDVList_BHSD<0b0111, "VOne", "ld1">;
		// LD1 disallowed in BE, when LDR and STR are used exclusively as per the ABI.
		// reason: LDR/STR use different memory/register layout (no element swaps).
		// If different types of loads were used from the same memory address the results
		// will be inconsistent.
		// The only allowed use of LD1 is in initializations using explicit intrinsics to do
		Jiangning: This is not the only case. The auto-vectorizer could generate element-aligned short vector ld/st. For example, the middle-end could generate "store <4 x i16> %val, <4 x i16>* %ptr, align 2". We should generate an instruction like "st1 v0.4h, [x0]". Unfortunately, we can't generate this instruction yet with trunk. We will get it fixed as soon as possible.
		t.p.northover: I don't believe we're forced to generate either and there are arguments in favour of both, but being consistent is very important. As Albrecht said, we can't mix the two kinds of load/store. I agree that using ld1/st1 exclusively would make LLVM's semantics easier to get right, but it would make getting the AAPCS right harder (bitcasts would become non-trivial operations and be needed at all potentially ABI-visible boundaries). I suspect (but don't know) that the ldr/str route is capable of producing better code on average.
		Jiangning: I don't think we're forced to generate either as well, but we should keep semantic correctness by choosing either in terms of alignment. Actually we don't really violate AAPCS at all. AAPCS says, "A short vector has a base type that is the fundamental integral or floating-point type from which it is composed, but its alignment is always the same as its total size." If the memory address is not total-size aligned, it is not a "short vector" as defined in AAPCS. It should be treated as an array, which is usually generated by the auto-vectorizer, so we prefer to generate ld1/st1 for it.
		// the element-swaps.

		// Single element has no swapping problem in BE.
	def LD1_1D : NeonI_LDVList<0, 0b0111, 0b11, VOne1D_operand, "ld1">;	def LD1_1D : NeonI_LDVList<0, 0b0111, 0b11, VOne1D_operand, "ld1">;

	defm LD2 : LDVList_BHSD<0b1000, "VPair", "ld2">;	// Multiple elements would be reversed in BE.
		let Predicates = [IsLE] in {
		Jiangning: Is this to disable LD1/LD2/LD3/LD4 for big-endian? If yes, why can the test cases using those instructions pass with the big-endian configuration? This piece of code is to define encodings, and LE/BE should always cover them. If we don't want to generate any instruction, we should control them with pattern match.
		t.p.northover: The "IsBE" predicate is codegen-level rather than an AssemblerPredicate so MC tests won't be affected anyway. And there's only one CodeGen test mentioning them that's not based on intrinsics (which gets more substantial changes), so I think that part's OK. Your comment about only applying IsBE to patterns is a good one though.
		defm LD1 : LDVList_BHSD<0b0111, "VOne", "ld1">;

	defm LD3 : LDVList_BHSD<0b0100, "VTriple", "ld3">;	defm LD2 : LDVList_BHSD<0b1000, "VPair", "ld2">;

	defm LD4 : LDVList_BHSD<0b0000, "VQuad", "ld4">;	defm LD3 : LDVList_BHSD<0b0100, "VTriple", "ld3">;

		defm LD4 : LDVList_BHSD<0b0000, "VQuad", "ld4">;
		}

	// Load multiple 1-element structure to N consecutive registers (N = 2,3,4)	// Load multiple 1-element structure to N consecutive registers (N = 2,3,4)
	defm LD1x2 : LDVList_BHSD<0b1010, "VPair", "ld1">;	defm LD1x2 : LDVList_BHSD<0b1010, "VPair", "ld1">;
	def LD1x2_1D : NeonI_LDVList<0, 0b1010, 0b11, VPair1D_operand, "ld1">;	def LD1x2_1D : NeonI_LDVList<0, 0b1010, 0b11, VPair1D_operand, "ld1">;
		jmolloy: Failed to reindent this line?
		akadlec (Author): Done.
Context not available.
	}	}

	// Store multiple N-element structures from N registers (N = 1,2,3,4)	// Store multiple N-element structures from N registers (N = 1,2,3,4)
	defm ST1 : STVList_BHSD<0b0111, "VOne", "st1">;	// ARM ABI: default memory layout in BE is LDR/STR
		// Single element has no swapping problem in BE.
	def ST1_1D : NeonI_STVList<0, 0b0111, 0b11, VOne1D_operand, "st1">;	def ST1_1D : NeonI_STVList<0, 0b0111, 0b11, VOne1D_operand, "st1">;

	defm ST2 : STVList_BHSD<0b1000, "VPair", "st2">;	// Multiple elements would be reversed in BE.
		Jiangning: ST1/ST2/ST3/ST4 essentially use aggregate short vector type like, typedef struct int16x4x3_t { int16x4_t val[3]; } int16x4x3_t; which is defined in arm_neon.h. With this data type, LE/BE should only make a difference for the layout inside element int16. The data layout among different elements should always be the same.
		t.p.northover: I believe this is incorrect for the simple instructions. "ld1 {v0.4h, v1.4h}, [x0]" is equivalent to "ld1 {v0.4h}, [x0]; ld1 {v1.4h}, [x0, #8]" and different from "ldr d0, [x0]; ldr d1, [x0, #8]" on big-endian systems.
		Jiangning: I don't think I meant ld1 and ldr have the same semantic between LE and BE systems. I agree with your statement. What I meant is ld1/st1 should always have the same semantic between LE and BE systems except the data layout inside the element. We should choose ldr or ld1 in terms of alignment on IR. If it is a total-size-aligned access, we use ldr, and otherwise we use ld1.
		let Predicates = [IsLE] in {
		defm ST1 : STVList_BHSD<0b0111, "VOne", "st1">;

	defm ST3 : STVList_BHSD<0b0100, "VTriple", "st3">;	defm ST2 : STVList_BHSD<0b1000, "VPair", "st2">;

	defm ST4 : STVList_BHSD<0b0000, "VQuad", "st4">;	defm ST3 : STVList_BHSD<0b0100, "VTriple", "st3">;

	// Store multiple 1-element structures from N consecutive registers (N = 2,3,4)	defm ST4 : STVList_BHSD<0b0000, "VQuad", "st4">;
	defm ST1x2 : STVList_BHSD<0b1010, "VPair", "st1">;
	def ST1x2_1D : NeonI_STVList<0, 0b1010, 0b11, VPair1D_operand, "st1">;

	defm ST1x3 : STVList_BHSD<0b0110, "VTriple", "st1">;	// Store multiple 1-element structures from N consecutive registers (N = 2,3,4)
	def ST1x3_1D : NeonI_STVList<0, 0b0110, 0b11, VTriple1D_operand, "st1">;	defm ST1x2 : STVList_BHSD<0b1010, "VPair", "st1">;
		def ST1x2_1D : NeonI_STVList<0, 0b1010, 0b11, VPair1D_operand, "st1">;

	defm ST1x4 : STVList_BHSD<0b0010, "VQuad", "st1">;	defm ST1x3 : STVList_BHSD<0b0110, "VTriple", "st1">;
	def ST1x4_1D : NeonI_STVList<0, 0b0010, 0b11, VQuad1D_operand, "st1">;	def ST1x3_1D : NeonI_STVList<0, 0b0110, 0b11, VTriple1D_operand, "st1">;

	def : Pat<(v2f64 (load GPR64xsp:$addr)), (LD1_2D GPR64xsp:$addr)>;	defm ST1x4 : STVList_BHSD<0b0010, "VQuad", "st1">;
	def : Pat<(v2i64 (load GPR64xsp:$addr)), (LD1_2D GPR64xsp:$addr)>;	def ST1x4_1D : NeonI_STVList<0, 0b0010, 0b11, VQuad1D_operand, "st1">;

	def : Pat<(v4f32 (load GPR64xsp:$addr)), (LD1_4S GPR64xsp:$addr)>;	def : Pat<(v2f64 (load GPR64xsp:$addr)), (LD1_2D GPR64xsp:$addr)>;
	def : Pat<(v4i32 (load GPR64xsp:$addr)), (LD1_4S GPR64xsp:$addr)>;	def : Pat<(v2i64 (load GPR64xsp:$addr)), (LD1_2D GPR64xsp:$addr)>;

	def : Pat<(v8i16 (load GPR64xsp:$addr)), (LD1_8H GPR64xsp:$addr)>;	def : Pat<(v4f32 (load GPR64xsp:$addr)), (LD1_4S GPR64xsp:$addr)>;
	def : Pat<(v16i8 (load GPR64xsp:$addr)), (LD1_16B GPR64xsp:$addr)>;	def : Pat<(v4i32 (load GPR64xsp:$addr)), (LD1_4S GPR64xsp:$addr)>;

	def : Pat<(v1f64 (load GPR64xsp:$addr)), (LD1_1D GPR64xsp:$addr)>;	def : Pat<(v8i16 (load GPR64xsp:$addr)), (LD1_8H GPR64xsp:$addr)>;
	def : Pat<(v1i64 (load GPR64xsp:$addr)), (LD1_1D GPR64xsp:$addr)>;	def : Pat<(v16i8 (load GPR64xsp:$addr)), (LD1_16B GPR64xsp:$addr)>;

	def : Pat<(v2f32 (load GPR64xsp:$addr)), (LD1_2S GPR64xsp:$addr)>;	def : Pat<(v1f64 (load GPR64xsp:$addr)), (LD1_1D GPR64xsp:$addr)>;
	def : Pat<(v2i32 (load GPR64xsp:$addr)), (LD1_2S GPR64xsp:$addr)>;	def : Pat<(v1i64 (load GPR64xsp:$addr)), (LD1_1D GPR64xsp:$addr)>;

	def : Pat<(v4i16 (load GPR64xsp:$addr)), (LD1_4H GPR64xsp:$addr)>;	def : Pat<(v2f32 (load GPR64xsp:$addr)), (LD1_2S GPR64xsp:$addr)>;
	def : Pat<(v8i8 (load GPR64xsp:$addr)), (LD1_8B GPR64xsp:$addr)>;	def : Pat<(v2i32 (load GPR64xsp:$addr)), (LD1_2S GPR64xsp:$addr)>;

	def : Pat<(store (v2i64 VPR128:$value), GPR64xsp:$addr),	def : Pat<(v4i16 (load GPR64xsp:$addr)), (LD1_4H GPR64xsp:$addr)>;
	(ST1_2D GPR64xsp:$addr, VPR128:$value)>;	def : Pat<(v8i8 (load GPR64xsp:$addr)), (LD1_8B GPR64xsp:$addr)>;
	def : Pat<(store (v2f64 VPR128:$value), GPR64xsp:$addr),
	(ST1_2D GPR64xsp:$addr, VPR128:$value)>;

	def : Pat<(store (v4i32 VPR128:$value), GPR64xsp:$addr),	def : Pat<(store (v2i64 VPR128:$value), GPR64xsp:$addr),
	(ST1_4S GPR64xsp:$addr, VPR128:$value)>;	(ST1_2D GPR64xsp:$addr, VPR128:$value)>;
	def : Pat<(store (v4f32 VPR128:$value), GPR64xsp:$addr),	def : Pat<(store (v2f64 VPR128:$value), GPR64xsp:$addr),
	(ST1_4S GPR64xsp:$addr, VPR128:$value)>;	(ST1_2D GPR64xsp:$addr, VPR128:$value)>;

	def : Pat<(store (v8i16 VPR128:$value), GPR64xsp:$addr),	def : Pat<(store (v4i32 VPR128:$value), GPR64xsp:$addr),
	(ST1_8H GPR64xsp:$addr, VPR128:$value)>;	(ST1_4S GPR64xsp:$addr, VPR128:$value)>;
	def : Pat<(store (v16i8 VPR128:$value), GPR64xsp:$addr),	def : Pat<(store (v4f32 VPR128:$value), GPR64xsp:$addr),
	(ST1_16B GPR64xsp:$addr, VPR128:$value)>;	(ST1_4S GPR64xsp:$addr, VPR128:$value)>;

	def : Pat<(store (v1i64 VPR64:$value), GPR64xsp:$addr),	def : Pat<(store (v8i16 VPR128:$value), GPR64xsp:$addr),
	(ST1_1D GPR64xsp:$addr, VPR64:$value)>;	(ST1_8H GPR64xsp:$addr, VPR128:$value)>;
	def : Pat<(store (v1f64 VPR64:$value), GPR64xsp:$addr),	def : Pat<(store (v16i8 VPR128:$value), GPR64xsp:$addr),
	(ST1_1D GPR64xsp:$addr, VPR64:$value)>;	(ST1_16B GPR64xsp:$addr, VPR128:$value)>;

	def : Pat<(store (v2i32 VPR64:$value), GPR64xsp:$addr),	def : Pat<(store (v1i64 VPR64:$value), GPR64xsp:$addr),
	(ST1_2S GPR64xsp:$addr, VPR64:$value)>;	(ST1_1D GPR64xsp:$addr, VPR64:$value)>;
	def : Pat<(store (v2f32 VPR64:$value), GPR64xsp:$addr),	def : Pat<(store (v1f64 VPR64:$value), GPR64xsp:$addr),
	(ST1_2S GPR64xsp:$addr, VPR64:$value)>;	(ST1_1D GPR64xsp:$addr, VPR64:$value)>;

	def : Pat<(store (v4i16 VPR64:$value), GPR64xsp:$addr),	def : Pat<(store (v2i32 VPR64:$value), GPR64xsp:$addr),
	(ST1_4H GPR64xsp:$addr, VPR64:$value)>;	(ST1_2S GPR64xsp:$addr, VPR64:$value)>;
	def : Pat<(store (v8i8 VPR64:$value), GPR64xsp:$addr),	def : Pat<(store (v2f32 VPR64:$value), GPR64xsp:$addr),
	(ST1_8B GPR64xsp:$addr, VPR64:$value)>;	(ST1_2S GPR64xsp:$addr, VPR64:$value)>;

		def : Pat<(store (v4i16 VPR64:$value), GPR64xsp:$addr),
		(ST1_4H GPR64xsp:$addr, VPR64:$value)>;
		def : Pat<(store (v8i8 VPR64:$value), GPR64xsp:$addr),
		(ST1_8B GPR64xsp:$addr, VPR64:$value)>;
		}

	// Match load/store of v1i8/v1i16/v1i32 type to FPR8/FPR16/FPR32 load/store.	// Match load/store of v1i8/v1i16/v1i32 type to FPR8/FPR16/FPR32 load/store.
	// FIXME: for now we have v1i8, v1i16, v1i32 legal types, if they are illegal,	// FIXME: for now we have v1i8, v1i16, v1i32 legal types, if they are illegal,
	// these patterns are not needed any more.	// these patterns are not needed any more.
Context not available.
	ImmTy2, asmop>;	ImmTy2, asmop>;
	}	}

	// Post-index load multiple N-element structures from N registers (N = 1,2,3,4)	// Single element loads are ok for BE.
	defm LD1WB : LDWB_VList_BHSD<0b0111, "VOne", uimm_exact8, uimm_exact16, "ld1">;
	defm LD1WB_1D : NeonI_LDWB_VList<0, 0b0111, 0b11, VOne1D_operand, uimm_exact8,	defm LD1WB_1D : NeonI_LDWB_VList<0, 0b0111, 0b11, VOne1D_operand, uimm_exact8,
	"ld1">;	"ld1">;

	defm LD2WB : LDWB_VList_BHSD<0b1000, "VPair", uimm_exact16, uimm_exact32, "ld2">;	// Multiple elements would be reversed in BE.
		let Predicates = [IsLE] in {
		// Post-index load multiple N-element structures from N registers (N = 1,2,3,4)
		defm LD1WB : LDWB_VList_BHSD<0b0111, "VOne", uimm_exact8, uimm_exact16, "ld1">;

	defm LD3WB : LDWB_VList_BHSD<0b0100, "VTriple", uimm_exact24, uimm_exact48,	defm LD2WB : LDWB_VList_BHSD<0b1000, "VPair", uimm_exact16, uimm_exact32, "ld2">;
	"ld3">;

	defm LD4WB : LDWB_VList_BHSD<0b0000, "VQuad", uimm_exact32, uimm_exact64, "ld4">;	defm LD3WB : LDWB_VList_BHSD<0b0100, "VTriple", uimm_exact24, uimm_exact48,
		"ld3">;

	// Post-index load multiple 1-element structures from N consecutive registers	defm LD4WB : LDWB_VList_BHSD<0b0000, "VQuad", uimm_exact32, uimm_exact64, "ld4">;
	// (N = 2,3,4)
	defm LD1x2WB : LDWB_VList_BHSD<0b1010, "VPair", uimm_exact16, uimm_exact32,
	"ld1">;
	defm LD1x2WB_1D : NeonI_LDWB_VList<0, 0b1010, 0b11, VPair1D_operand,
	uimm_exact16, "ld1">;

	defm LD1x3WB : LDWB_VList_BHSD<0b0110, "VTriple", uimm_exact24, uimm_exact48,	// Post-index load multiple 1-element structures to N consecutive registers
	"ld1">;	// (N = 2,3,4)
	defm LD1x3WB_1D : NeonI_LDWB_VList<0, 0b0110, 0b11, VTriple1D_operand,	defm LD1x2WB : LDWB_VList_BHSD<0b1010, "VPair", uimm_exact16, uimm_exact32,
	uimm_exact24, "ld1">;	"ld1">;
		defm LD1x2WB_1D : NeonI_LDWB_VList<0, 0b1010, 0b11, VPair1D_operand,
		uimm_exact16, "ld1">;

	defm LD1x4WB : LDWB_VList_BHSD<0b0010, "VQuad", uimm_exact32, uimm_exact64,	defm LD1x3WB : LDWB_VList_BHSD<0b0110, "VTriple", uimm_exact24, uimm_exact48,
	"ld1">;	"ld1">;
	defm LD1x4WB_1D : NeonI_LDWB_VList<0, 0b0010, 0b11, VQuad1D_operand,	defm LD1x3WB_1D : NeonI_LDWB_VList<0, 0b0110, 0b11, VTriple1D_operand,
	uimm_exact32, "ld1">;	uimm_exact24, "ld1">;

		defm LD1x4WB : LDWB_VList_BHSD<0b0010, "VQuad", uimm_exact32, uimm_exact64,
		"ld1">;
		defm LD1x4WB_1D : NeonI_LDWB_VList<0, 0b0010, 0b11, VQuad1D_operand,
		uimm_exact32, "ld1">;
		}

	multiclass NeonI_STWB_VList<bit q, bits<4> opcode, bits<2> size,	multiclass NeonI_STWB_VList<bit q, bits<4> opcode, bits<2> size,
	RegisterOperand VecList, Operand ImmTy,	RegisterOperand VecList, Operand ImmTy,
	string asmop> {	string asmop> {
Context not available.
	}	}

	// Post-index load multiple N-element structures from N registers (N = 1,2,3,4)	// Post-index load multiple N-element structures from N registers (N = 1,2,3,4)
	defm ST1WB : STWB_VList_BHSD<0b0111, "VOne", uimm_exact8, uimm_exact16, "st1">;	// Loading multiple elements in BE mode suffers from element-reversal.
	defm ST1WB_1D : NeonI_STWB_VList<0, 0b0111, 0b11, VOne1D_operand, uimm_exact8,	let Predicates = [IsLE] in {
	"st1">;	defm ST1WB_1D : NeonI_STWB_VList<0, 0b0111, 0b11, VOne1D_operand, uimm_exact8,
		"st1">;
		defm ST1WB : STWB_VList_BHSD<0b0111, "VOne", uimm_exact8, uimm_exact16, "st1">;

	defm ST2WB : STWB_VList_BHSD<0b1000, "VPair", uimm_exact16, uimm_exact32, "st2">;	defm ST2WB : STWB_VList_BHSD<0b1000, "VPair", uimm_exact16, uimm_exact32, "st2">;

	defm ST3WB : STWB_VList_BHSD<0b0100, "VTriple", uimm_exact24, uimm_exact48,	defm ST3WB : STWB_VList_BHSD<0b0100, "VTriple", uimm_exact24, uimm_exact48,
	"st3">;	"st3">;

	defm ST4WB : STWB_VList_BHSD<0b0000, "VQuad", uimm_exact32, uimm_exact64, "st4">;	defm ST4WB : STWB_VList_BHSD<0b0000, "VQuad", uimm_exact32, uimm_exact64, "st4">;

	// Post-index load multiple 1-element structures from N consecutive registers	// Post-index load multiple 1-element structures from N consecutive registers
	// (N = 2,3,4)	// (N = 2,3,4)
	defm ST1x2WB : STWB_VList_BHSD<0b1010, "VPair", uimm_exact16, uimm_exact32,	defm ST1x2WB : STWB_VList_BHSD<0b1010, "VPair", uimm_exact16, uimm_exact32,
	"st1">;	"st1">;
	defm ST1x2WB_1D : NeonI_STWB_VList<0, 0b1010, 0b11, VPair1D_operand,	defm ST1x2WB_1D : NeonI_STWB_VList<0, 0b1010, 0b11, VPair1D_operand,
	uimm_exact16, "st1">;	uimm_exact16, "st1">;

	defm ST1x3WB : STWB_VList_BHSD<0b0110, "VTriple", uimm_exact24, uimm_exact48,	defm ST1x3WB : STWB_VList_BHSD<0b0110, "VTriple", uimm_exact24, uimm_exact48,
	"st1">;	"st1">;
	defm ST1x3WB_1D : NeonI_STWB_VList<0, 0b0110, 0b11, VTriple1D_operand,	defm ST1x3WB_1D : NeonI_STWB_VList<0, 0b0110, 0b11, VTriple1D_operand,
	uimm_exact24, "st1">;	uimm_exact24, "st1">;

	defm ST1x4WB : STWB_VList_BHSD<0b0010, "VQuad", uimm_exact32, uimm_exact64,	defm ST1x4WB : STWB_VList_BHSD<0b0010, "VQuad", uimm_exact32, uimm_exact64,
	"st1">;	"st1">;
	defm ST1x4WB_1D : NeonI_STWB_VList<0, 0b0010, 0b11, VQuad1D_operand,	defm ST1x4WB_1D : NeonI_STWB_VList<0, 0b0010, 0b11, VQuad1D_operand,
	uimm_exact32, "st1">;	uimm_exact32, "st1">;
		}

	// End of post-index vector load/store multiple N-element structure	// End of post-index vector load/store multiple N-element structure
	// (class SIMD lselem-post)	// (class SIMD lselem-post)
		t.p.northover: No patterns here, and you would probably want any that did exist, since the layout issue doesn't exist for the duplicating loads.
		t.p.northover: Ah, here they are. I think these patterns should be endian-independent.
		akadlec (author): True for the single-element replications below, which are the first candidates for re-enabling. Probably not true for the vector-replicating loads here (if a single vector is loaded in reverse element order, the duplication won't fix that?).
		akadlec (author): I need to do more reading to fully understand all the details of the swapping, but single-element reads should be fine if they byte-swap internally within the element (-> v1x nonterminals and scalar nonterminals). So, conservatively, "not yet", i.e. not until we have working code (CallingConv).
		jmolloy: Shouldn't a splatting LD1 still work in BE mode?
		jmolloy: This comment is cryptic. Also, why should we care about byte swapping here? We're under the IsLE predicate.
		akadlec (author): At most the ones that read only one element and duplicate it. The multi-element reads will see an unexpected order (that struct was STRed!), so they can only be used via intrinsics to read from arrays.
		akadlec (author): That's the reason for the IsLE: it doesn't work for BE. Pulled the comments out. Now I think it's still wrong, as the elements would be read in ascending address order ("array-like"), while they have been stored reversed by STR.
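The element-reversal concern in this thread can be illustrated with a small model. This is only a sketch, not LLVM code: it assumes a 128-bit register is modelled as a Python integer with lane 0 in the least significant bits, and the helper names (`pack`, `unpack`, `str_q`, `ldr_q`, `ld1_4s`, `st1_4s`) are hypothetical stand-ins for the behaviour of the corresponding instructions as described in the comments above.

```python
LANES = 4
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes):
    """Build a 128-bit register value with lane 0 in the least significant bits."""
    reg = 0
    for i, value in enumerate(lanes):
        reg |= (value & LANE_MASK) << (LANE_BITS * i)
    return reg


def unpack(reg):
    """Split a 128-bit register value back into its four 32-bit lanes."""
    return [(reg >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def str_q(reg, byteorder):
    """Model STR q: the whole register is stored as one 128-bit value."""
    return reg.to_bytes(16, byteorder)


def ldr_q(mem, byteorder):
    """Model LDR q: the whole register is loaded as one 128-bit value."""
    return int.from_bytes(mem, byteorder)


def st1_4s(reg, byteorder):
    """Model ST1 {v.4s}: lane i goes to address 4*i; bytes swap only inside a lane."""
    return b"".join(value.to_bytes(4, byteorder) for value in unpack(reg))


def ld1_4s(mem, byteorder):
    """Model LD1 {v.4s}: lane i comes from address 4*i; bytes swap only inside a lane."""
    return pack(int.from_bytes(mem[4 * i:4 * i + 4], byteorder) for i in range(LANES))


original = [10, 11, 12, 13]
reg = pack(original)

# Little-endian: STR followed by LD1 preserves lane order.
le_str_ld1 = unpack(ld1_4s(str_q(reg, "little"), "little"))
# Big-endian: STR followed by LD1 reverses the lanes.
be_str_ld1 = unpack(ld1_4s(str_q(reg, "big"), "big"))
# Big-endian: matching pairs (STR/LDR or ST1/LD1) round-trip correctly.
be_str_ldr = unpack(ldr_q(str_q(reg, "big"), "big"))
be_st1_ld1 = unpack(ld1_4s(st1_4s(reg, "big"), "big"))

print(le_str_ld1, be_str_ld1, be_str_ldr, be_st1_ld1)
```

In this model only the mixed STR/LD1 pair in big-endian mode changes the lane order, which is the case the IsLE predicates are guarding against.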
Context not available.
	}	}

	// Load single 1-element structure to all lanes of 1 register	// Load single 1-element structure to all lanes of 1 register
		// Single element loads are fine in BE
	defm LD1R : LDN_Dup_BHSD<0b0, 0b110, "VOne", "ld1r">;	defm LD1R : LDN_Dup_BHSD<0b0, 0b110, "VOne", "ld1r">;

	// Load single N-element structure to all lanes of N consecutive	// Load single N-element structure to all lanes of N consecutive
	// registers (N = 2,3,4)	// registers (N = 2,3,4)
	defm LD2R : LDN_Dup_BHSD<0b1, 0b110, "VPair", "ld2r">;	// Multi-element loads suffer from element reversal in BE.
	defm LD3R : LDN_Dup_BHSD<0b0, 0b111, "VTriple", "ld3r">;	let Predicates = [IsLE] in {
		jmolloy: ld1.64 should be fine too, right? Because ld1.64 acts the same as LDR (byte swapping on a 64-bit value).
		akadlec (author): Yes.
	defm LD4R : LDN_Dup_BHSD<0b1, 0b111, "VQuad", "ld4r">;	defm LD2R : LDN_Dup_BHSD<0b1, 0b110, "VPair", "ld2r">;
		defm LD3R : LDN_Dup_BHSD<0b0, 0b111, "VTriple", "ld3r">;
		defm LD4R : LDN_Dup_BHSD<0b1, 0b111, "VQuad", "ld4r">;
		}


	class LD1R_pattern <ValueType VTy, ValueType DTy, PatFrag LoadOp,	class LD1R_pattern <ValueType VTy, ValueType DTy, PatFrag LoadOp,
Context not available.
	(VTy (INST GPR64xsp:$Rn))>;	(VTy (INST GPR64xsp:$Rn))>;

	// Match all LD1R instructions	// Match all LD1R instructions
	def : LD1R_pattern<v8i8, i32, extloadi8, LD1R_8B>;	// This won't work as intended in BE mode, as STR q0 stored the elements swapped.
		let Predicates = [IsLE] in {
		def : LD1R_pattern<v8i8, i32, extloadi8, LD1R_8B>;
		def : LD1R_pattern<v16i8, i32, extloadi8, LD1R_16B>;

	def : LD1R_pattern<v16i8, i32, extloadi8, LD1R_16B>;	def : LD1R_pattern<v4i16, i32, extloadi16, LD1R_4H>;

	def : LD1R_pattern<v4i16, i32, extloadi16, LD1R_4H>;	def : LD1R_pattern<v8i16, i32, extloadi16, LD1R_8H>;

	def : LD1R_pattern<v8i16, i32, extloadi16, LD1R_8H>;	def : LD1R_pattern<v2i32, i32, load, LD1R_2S>;
		def : LD1R_pattern<v2f32, f32, load, LD1R_2S>;

	def : LD1R_pattern<v2i32, i32, load, LD1R_2S>;	def : LD1R_pattern<v4i32, i32, load, LD1R_4S>;
	def : LD1R_pattern<v2f32, f32, load, LD1R_2S>;	def : LD1R_pattern<v4f32, f32, load, LD1R_4S>;

	def : LD1R_pattern<v4i32, i32, load, LD1R_4S>;	def : LD1R_pattern<v2i64, i64, load, LD1R_2D>;
	def : LD1R_pattern<v4f32, f32, load, LD1R_4S>;	def : LD1R_pattern<v2f64, f64, load, LD1R_2D>;
		}

	def : LD1R_pattern<v2i64, i64, load, LD1R_2D>;
	def : LD1R_pattern<v2f64, f64, load, LD1R_2D>;

	class LD1R_pattern_v1 <ValueType VTy, ValueType DTy, PatFrag LoadOp,	class LD1R_pattern_v1 <ValueType VTy, ValueType DTy, PatFrag LoadOp,
	Instruction INST>	Instruction INST>
	: Pat<(VTy (scalar_to_vector (DTy (LoadOp GPR64xsp:$Rn)))),	: Pat<(VTy (scalar_to_vector (DTy (LoadOp GPR64xsp:$Rn)))),
	(VTy (INST GPR64xsp:$Rn))>;	(VTy (INST GPR64xsp:$Rn))>;

		// Single element operations are swap-safe in BE.
	def : LD1R_pattern_v1<v1i64, i64, load, LD1R_1D>;	def : LD1R_pattern_v1<v1i64, i64, load, LD1R_1D>;
	def : LD1R_pattern_v1<v1f64, f64, load, LD1R_1D>;	def : LD1R_pattern_v1<v1f64, f64, load, LD1R_1D>;


	multiclass VectorList_Bare_BHSD<string PREFIX, int Count,	multiclass VectorList_Bare_BHSD<string PREFIX, int Count,
	RegisterClass RegList> {	RegisterClass RegList> {
	defm B : VectorList_operands<PREFIX, "B", Count, RegList>;	defm B : VectorList_operands<PREFIX, "B", Count, RegList>;
		t.p.northover: Loading a single lane is also layout-independent (and these are not the patterns).
		akadlec (author): I need to check whether ld4ln ({1,2,3,4}) yields the same result on BE and LE. It probably also depends on whether you feed it with an array (then yes) or with a vector stored by STR (then probably NOT), so is it pattern-dependent?
		jmolloy: Will 1-element to 1-lane also work in BE mode?
		akadlec (author): Added the following comment to the pattern and removed the predicate: "// This will not work as intended in BE mode, if the matcher generates it to load a vector to a lane. (STR q0 stored the elements swapped) Must always use an intrinsic, so the user knows it's loading from an array layout."
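The array-versus-STR distinction raised above can be sketched the same way. This is a hypothetical model under the same assumptions (lane 0 in the least significant bits of the register; the helper names `store_array`, `str_q_lanes` and `ld1_lane_s` are made up for illustration): a single-lane load reads one element at a given address, so what it returns depends on how the memory was written.

```python
def store_array(values, byteorder):
    """Model a plain array of 32-bit values: element i lives at address 4*i."""
    return b"".join(value.to_bytes(4, byteorder) for value in values)


def str_q_lanes(lanes, byteorder):
    """Model STR q of a 4 x 32-bit vector (lane 0 in the low bits of the register)."""
    reg = 0
    for i, value in enumerate(lanes):
        reg |= value << (32 * i)
    return reg.to_bytes(16, byteorder)


def ld1_lane_s(mem, offset, byteorder):
    """Model a single-lane LD1 of one 32-bit element located at `offset`."""
    return int.from_bytes(mem[offset:offset + 4], byteorder)


values = [10, 11, 12, 13]
index = 1

# Reading element 1 of an array gives the expected value on both endiannesses.
from_array_le = ld1_lane_s(store_array(values, "little"), 4 * index, "little")
from_array_be = ld1_lane_s(store_array(values, "big"), 4 * index, "big")

# Reading "element 1" of a vector that was stored with STR q only works on LE.
from_vector_le = ld1_lane_s(str_q_lanes(values, "little"), 4 * index, "little")
from_vector_be = ld1_lane_s(str_q_lanes(values, "big"), 4 * index, "big")

print(from_array_le, from_array_be, from_vector_le, from_vector_be)
```

Under these assumptions the lane load itself is endian-safe; the surprise comes only from indexing into memory that was laid out by a whole-register STR in big-endian mode.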
Context not available.
	}	}

	// Load single 1-element structure to one lane of 1 register.	// Load single 1-element structure to one lane of 1 register.
		// No dangerous element swaps in BE. :-)
	defm LD1LN : LDN_Lane_BHSD<0b0, 0b0, "VOne", "ld1">;	defm LD1LN : LDN_Lane_BHSD<0b0, 0b0, "VOne", "ld1">;

	// Load single N-element structure to one lane of N consecutive registers	// Load single N-element structure to one lane of N consecutive registers
	// (N = 2,3,4)	// (N = 2,3,4)
	defm LD2LN : LDN_Lane_BHSD<0b1, 0b0, "VPair", "ld2">;	//
	defm LD3LN : LDN_Lane_BHSD<0b0, 0b1, "VTriple", "ld3">;	// This will not work as intended in BE mode, if the matcher generates it to
	defm LD4LN : LDN_Lane_BHSD<0b1, 0b1, "VQuad", "ld4">;	// load a vector to a lane. (STR q0 stored the vector's elements swapped)
		// Must always use an intrinsic, so the user knows it's loading from an array
		// layout.
		let Predicates = [IsLE] in {
		defm LD2LN : LDN_Lane_BHSD<0b1, 0b0, "VPair", "ld2">;
		defm LD3LN : LDN_Lane_BHSD<0b0, 0b1, "VTriple", "ld3">;
		defm LD4LN : LDN_Lane_BHSD<0b1, 0b1, "VQuad", "ld4">;
		}

	multiclass LD1LN_patterns<ValueType VTy, ValueType VTy2, ValueType DTy,	// Multiple elements would be reversed in BE.
	Operand ImmOp, Operand ImmOp2, PatFrag LoadOp,	let Predicates = [IsLE] in {
	Instruction INST> {	multiclass LD1LN_patterns<ValueType VTy, ValueType VTy2, ValueType DTy,
	def : Pat<(VTy (vector_insert (VTy VPR64:$src),	Operand ImmOp, Operand ImmOp2, PatFrag LoadOp,
	(DTy (LoadOp GPR64xsp:$Rn)), (ImmOp:$lane))),	Instruction INST> {
	(VTy (EXTRACT_SUBREG	def : Pat<(VTy (vector_insert (VTy VPR64:$src),
	(INST GPR64xsp:$Rn,	(DTy (LoadOp GPR64xsp:$Rn)), (ImmOp:$lane))),
	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(VTy (EXTRACT_SUBREG
	ImmOp:$lane),	(INST GPR64xsp:$Rn,
	sub_64))>;	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
		ImmOp:$lane),
	def : Pat<(VTy2 (vector_insert (VTy2 VPR128:$src),	sub_64))>;
	(DTy (LoadOp GPR64xsp:$Rn)), (ImmOp2:$lane))),
	(VTy2 (INST GPR64xsp:$Rn, VPR128:$src, ImmOp2:$lane))>;	def : Pat<(VTy2 (vector_insert (VTy2 VPR128:$src),
		(DTy (LoadOp GPR64xsp:$Rn)), (ImmOp2:$lane))),
		(VTy2 (INST GPR64xsp:$Rn, VPR128:$src, ImmOp2:$lane))>;
		}
	}	}

	// Match all LD1LN instructions	// Match all LD1LN instructions
	defm : LD1LN_patterns<v8i8, v16i8, i32, neon_uimm3_bare, neon_uimm4_bare,	//
	extloadi8, LD1LN_B>;	// This will not work as intended in BE mode, if the matcher generates it to
		// load a vector to a lane. (STR q0 stored the elements swapped in BE)
		// Must always use an intrinsic, so the user knows it's loading from an array
		// layout.
		let Predicates = [IsLE] in {
		defm : LD1LN_patterns<v8i8, v16i8, i32, neon_uimm3_bare, neon_uimm4_bare,
		extloadi8, LD1LN_B>;

	defm : LD1LN_patterns<v4i16, v8i16, i32, neon_uimm2_bare, neon_uimm3_bare,	defm : LD1LN_patterns<v4i16, v8i16, i32, neon_uimm2_bare, neon_uimm3_bare,
	extloadi16, LD1LN_H>;	extloadi16, LD1LN_H>;

	defm : LD1LN_patterns<v2i32, v4i32, i32, neon_uimm1_bare, neon_uimm2_bare,	defm : LD1LN_patterns<v2i32, v4i32, i32, neon_uimm1_bare, neon_uimm2_bare,
	load, LD1LN_S>;	load, LD1LN_S>;
	defm : LD1LN_patterns<v2f32, v4f32, f32, neon_uimm1_bare, neon_uimm2_bare,	defm : LD1LN_patterns<v2f32, v4f32, f32, neon_uimm1_bare, neon_uimm2_bare,
	load, LD1LN_S>;	load, LD1LN_S>;

	defm : LD1LN_patterns<v1i64, v2i64, i64, neon_uimm0_bare, neon_uimm1_bare,	defm : LD1LN_patterns<v1i64, v2i64, i64, neon_uimm0_bare, neon_uimm1_bare,
	load, LD1LN_D>;	load, LD1LN_D>;
	defm : LD1LN_patterns<v1f64, v2f64, f64, neon_uimm0_bare, neon_uimm1_bare,	defm : LD1LN_patterns<v1f64, v2f64, f64, neon_uimm0_bare, neon_uimm1_bare,
	load, LD1LN_D>;	load, LD1LN_D>;
		}

	class NeonI_STN_Lane<bit r, bits<2> op2_1, bit op0, RegisterOperand VList,	class NeonI_STN_Lane<bit r, bits<2> op2_1, bit op0, RegisterOperand VList,
	Operand ImmOp, string asmop>	Operand ImmOp, string asmop>
Context not available.
	}	}

	// Store single 1-element structure from one lane of 1 register.	// Store single 1-element structure from one lane of 1 register.
		// single element should be fine in BE - no swapping of elements.
	defm ST1LN : STN_Lane_BHSD<0b0, 0b0, "VOne", "st1">;	defm ST1LN : STN_Lane_BHSD<0b0, 0b0, "VOne", "st1">;

	// Store single N-element structure from one lane of N consecutive registers	// Store single N-element structure from one lane of N consecutive registers
	// (N = 2,3,4)	// (N = 2,3,4)
	defm ST2LN : STN_Lane_BHSD<0b1, 0b0, "VPair", "st2">;	// Multiple elements would be reversed in BE.
	defm ST3LN : STN_Lane_BHSD<0b0, 0b1, "VTriple", "st3">;	let Predicates = [IsLE] in {
	defm ST4LN : STN_Lane_BHSD<0b1, 0b1, "VQuad", "st4">;	defm ST2LN : STN_Lane_BHSD<0b1, 0b0, "VPair", "st2">;
		defm ST3LN : STN_Lane_BHSD<0b0, 0b1, "VTriple", "st3">;
		defm ST4LN : STN_Lane_BHSD<0b1, 0b1, "VQuad", "st4">;
		}

	multiclass ST1LN_patterns<ValueType VTy, ValueType VTy2, ValueType DTy,	multiclass ST1LN_patterns<ValueType VTy, ValueType VTy2, ValueType DTy,
	Operand ImmOp, Operand ImmOp2, PatFrag StoreOp,	Operand ImmOp, Operand ImmOp2, PatFrag StoreOp,
Context not available.
	}	}

	// Match all ST1LN instructions	// Match all ST1LN instructions
	defm : ST1LN_patterns<v8i8, v16i8, i32, neon_uimm3_bare, neon_uimm4_bare,	//
	truncstorei8, ST1LN_B>;	// Multiple elements would be reversed in BE.
		let Predicates = [IsLE] in {
		defm : ST1LN_patterns<v8i8, v16i8, i32, neon_uimm3_bare, neon_uimm4_bare,
		truncstorei8, ST1LN_B>;

	defm : ST1LN_patterns<v4i16, v8i16, i32, neon_uimm2_bare, neon_uimm3_bare,	defm : ST1LN_patterns<v4i16, v8i16, i32, neon_uimm2_bare, neon_uimm3_bare,
	truncstorei16, ST1LN_H>;	truncstorei16, ST1LN_H>;

	defm : ST1LN_patterns<v2i32, v4i32, i32, neon_uimm1_bare, neon_uimm2_bare,	defm : ST1LN_patterns<v2i32, v4i32, i32, neon_uimm1_bare, neon_uimm2_bare,
	store, ST1LN_S>;	store, ST1LN_S>;
	defm : ST1LN_patterns<v2f32, v4f32, f32, neon_uimm1_bare, neon_uimm2_bare,	defm : ST1LN_patterns<v2f32, v4f32, f32, neon_uimm1_bare, neon_uimm2_bare,
	store, ST1LN_S>;	store, ST1LN_S>;

	defm : ST1LN_patterns<v1i64, v2i64, i64, neon_uimm0_bare, neon_uimm1_bare,	defm : ST1LN_patterns<v1i64, v2i64, i64, neon_uimm0_bare, neon_uimm1_bare,
	store, ST1LN_D>;	store, ST1LN_D>;
	defm : ST1LN_patterns<v1f64, v2f64, f64, neon_uimm0_bare, neon_uimm1_bare,	defm : ST1LN_patterns<v1f64, v2f64, f64, neon_uimm0_bare, neon_uimm1_bare,
	store, ST1LN_D>;	store, ST1LN_D>;
		}
	// End of vector load/store single N-element structure (class SIMD lsone).	// End of vector load/store single N-element structure (class SIMD lsone).


Context not available.
	}	}

	// Post-index load single 1-element structure to all lanes of 1 register	// Post-index load single 1-element structure to all lanes of 1 register
		// one element duplication should be fine in BE - no swapping of elements.
	defm LD1R_WB : LDWB_Dup_BHSD<0b0, 0b110, "VOne", "ld1r", uimm_exact1,	defm LD1R_WB : LDWB_Dup_BHSD<0b0, 0b110, "VOne", "ld1r", uimm_exact1,
	uimm_exact2, uimm_exact4, uimm_exact8>;	uimm_exact2, uimm_exact4, uimm_exact8>;

	// Post-index load single N-element structure to all lanes of N consecutive	// Post-index load single N-element structure to all lanes of N consecutive
	// registers (N = 2,3,4)	// registers (N = 2,3,4)
	defm LD2R_WB : LDWB_Dup_BHSD<0b1, 0b110, "VPair", "ld2r", uimm_exact2,	// Multiple elements would be reversed in BE.
	uimm_exact4, uimm_exact8, uimm_exact16>;	let Predicates = [IsLE] in {
	defm LD3R_WB : LDWB_Dup_BHSD<0b0, 0b111, "VTriple", "ld3r", uimm_exact3,	defm LD2R_WB : LDWB_Dup_BHSD<0b1, 0b110, "VPair", "ld2r", uimm_exact2,
	uimm_exact6, uimm_exact12, uimm_exact24>;	uimm_exact4, uimm_exact8, uimm_exact16>;
	defm LD4R_WB : LDWB_Dup_BHSD<0b1, 0b111, "VQuad", "ld4r", uimm_exact4,	defm LD3R_WB : LDWB_Dup_BHSD<0b0, 0b111, "VTriple", "ld3r", uimm_exact3,
	uimm_exact8, uimm_exact16, uimm_exact32>;	uimm_exact6, uimm_exact12, uimm_exact24>;
		defm LD4R_WB : LDWB_Dup_BHSD<0b1, 0b111, "VQuad", "ld4r", uimm_exact4,
		uimm_exact8, uimm_exact16, uimm_exact32>;
		}

	let mayLoad = 1, neverHasSideEffects = 1, hasExtraDefRegAllocReq = 1,	let mayLoad = 1, neverHasSideEffects = 1, hasExtraDefRegAllocReq = 1,
	Constraints = "$Rn = $wb, $Rt = $src",	Constraints = "$Rn = $wb, $Rt = $src",
Context not available.
	}	}

	// Post-index load single 1-element structure to one lane of 1 register.	// Post-index load single 1-element structure to one lane of 1 register.
		// One element from 1 lane is fine in BE - no swapping of elements.
	defm LD1LN_WB : LD_Lane_WB_BHSD<0b0, 0b0, "VOne", "ld1", uimm_exact1,	defm LD1LN_WB : LD_Lane_WB_BHSD<0b0, 0b0, "VOne", "ld1", uimm_exact1,
	uimm_exact2, uimm_exact4, uimm_exact8>;	uimm_exact2, uimm_exact4, uimm_exact8>;

	// Post-index load single N-element structure to one lane of N consecutive	// Post-index load single N-element structure to one lane of N consecutive
	// registers	// registers
	// (N = 2,3,4)	// (N = 2,3,4)
	defm LD2LN_WB : LD_Lane_WB_BHSD<0b1, 0b0, "VPair", "ld2", uimm_exact2,	// Multiple elements would be reversed in BE.
	uimm_exact4, uimm_exact8, uimm_exact16>;	let Predicates = [IsLE] in {
	defm LD3LN_WB : LD_Lane_WB_BHSD<0b0, 0b1, "VTriple", "ld3", uimm_exact3,	defm LD2LN_WB : LD_Lane_WB_BHSD<0b1, 0b0, "VPair", "ld2", uimm_exact2,
	uimm_exact6, uimm_exact12, uimm_exact24>;	uimm_exact4, uimm_exact8, uimm_exact16>;
	defm LD4LN_WB : LD_Lane_WB_BHSD<0b1, 0b1, "VQuad", "ld4", uimm_exact4,	defm LD3LN_WB : LD_Lane_WB_BHSD<0b0, 0b1, "VTriple", "ld3", uimm_exact3,
	uimm_exact8, uimm_exact16, uimm_exact32>;	uimm_exact6, uimm_exact12, uimm_exact24>;
		defm LD4LN_WB : LD_Lane_WB_BHSD<0b1, 0b1, "VQuad", "ld4", uimm_exact4,
		uimm_exact8, uimm_exact16, uimm_exact32>;
		}

	let mayStore = 1, neverHasSideEffects = 1,	let mayStore = 1, neverHasSideEffects = 1,
	hasExtraDefRegAllocReq = 1, Constraints = "$Rn = $wb",	hasExtraDefRegAllocReq = 1, Constraints = "$Rn = $wb",
Context not available.
	}	}

	// Post-index store single 1-element structure from one lane of 1 register.	// Post-index store single 1-element structure from one lane of 1 register.
		// one element from 1 lane should be fine in BE - no swapping of elements.
	defm ST1LN_WB : ST_Lane_WB_BHSD<0b0, 0b0, "VOne", "st1", uimm_exact1,	defm ST1LN_WB : ST_Lane_WB_BHSD<0b0, 0b0, "VOne", "st1", uimm_exact1,
	uimm_exact2, uimm_exact4, uimm_exact8>;	uimm_exact2, uimm_exact4, uimm_exact8>;

	// Post-index store single N-element structure from one lane of N consecutive	// Post-index store single N-element structure from one lane of N consecutive
	// registers (N = 2,3,4)	// registers (N = 2,3,4)
	defm ST2LN_WB : ST_Lane_WB_BHSD<0b1, 0b0, "VPair", "st2", uimm_exact2,	// Multiple elements would be reversed in BE.
	uimm_exact4, uimm_exact8, uimm_exact16>;	let Predicates = [IsLE] in {
	defm ST3LN_WB : ST_Lane_WB_BHSD<0b0, 0b1, "VTriple", "st3", uimm_exact3,	defm ST2LN_WB : ST_Lane_WB_BHSD<0b1, 0b0, "VPair", "st2", uimm_exact2,
	uimm_exact6, uimm_exact12, uimm_exact24>;	uimm_exact4, uimm_exact8, uimm_exact16>;
	defm ST4LN_WB : ST_Lane_WB_BHSD<0b1, 0b1, "VQuad", "st4", uimm_exact4,	defm ST3LN_WB : ST_Lane_WB_BHSD<0b0, 0b1, "VTriple", "st3", uimm_exact3,
	uimm_exact8, uimm_exact16, uimm_exact32>;	uimm_exact6, uimm_exact12, uimm_exact24>;
		defm ST4LN_WB : ST_Lane_WB_BHSD<0b1, 0b1, "VQuad", "st4", uimm_exact4,
		uimm_exact8, uimm_exact16, uimm_exact32>;
		}

	// End of post-index load/store single N-element instructions	// End of post-index load/store single N-element instructions
	// (class SIMD lsone-post)	// (class SIMD lsone-post)
Context not available.

test/CodeGen/AArch64/128bit_load_store.ll

		; R UN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=neon | FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=neon | FileCheck %s	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=neon | FileCheck %s

		define void @test_store_v1i8(<1 x i8>* %ptr, <1 x i8> %val) #0 {
		; CHECK: test_store_v1i8
		; CHECK: str {{b[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <1 x i8> %val, <1 x i8>* %ptr, align 8
		ret void
		}



		define void @test_store_f16(half* %ptr, half %val) #0 {
		; CHECK: test_store_f16
		; CHECK: str {{h[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store half %val, half* %ptr, align 8
		ret void
		}

		define void @test_store_v1i16(<1 x i16>* %ptr, <1 x i16> %val) #0 {
		; CHECK: test_store_v1i16
		; CHECK: str {{h[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <1 x i16> %val, <1 x i16>* %ptr, align 8
		ret void
		}



		define void @test_store_f32(float* %ptr, float %val) #0 {
		; CHECK: test_store_f32
		; CHECK: str {{s[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store float %val, float* %ptr, align 8
		ret void
		}

		define void @test_store_v1f32(<1 x float>* %ptr, <1 x float> %val) #0 {
		; CHECK: test_store_v1f32
		; CHECK: str {{s[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <1 x float> %val, <1 x float>* %ptr, align 8
		ret void
		}

		define void @test_store_v1i32(<1 x i32>* %ptr, <1 x i32> %val) #0 {
		; CHECK: test_store_v1i32
		; CHECK: str {{s[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <1 x i32> %val, <1 x i32>* %ptr, align 8
		ret void
		}


		define void @test_store_f64(double *%ptr, double %val) #0 {
		; CHECK: test_store_f64
		; CHECK: str {{d[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store double %val, double* %ptr, align 8
		ret void
		}

		define void @test_store_v1f64(<1 x double>* %ptr, <1 x double> %val) #0 {
		; CHECK: test_store_v1f64
		; CHECK: str {{d[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <1 x double> %val, <1 x double>* %ptr, align 8
		ret void
		}

		define void @test_store_v2f32(<2 x float>* %ptr, <2 x float> %val) #0 {
		; CHECK: test_store_v2f32
		; CHECK: str {{d[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <2 x float> %val, <2 x float>* %ptr, align 8
		ret void
		}

		define void @test_store_v1i64(<1 x i64>* %ptr, <1 x i64> %val) #0 {
		; CHECK: test_store_v1i64
		; CHECK: str {{d[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <1 x i64> %val, <1 x i64>* %ptr, align 8
		ret void
		}

		define void @test_store_v2i32(<2 x i32>* %ptr, <2 x i32> %val) #0 {
		; CHECK: test_store_v2i32
		; CHECK: str {{d[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <2 x i32> %val, <2 x i32>* %ptr, align 8
		ret void
		}

		define void @test_store_v4i16(<4 x i16>* %ptr, <4 x i16> %val) #0 {
		; CHECK: test_store_v4i16
		; CHECK: str {{d[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <4 x i16> %val, <4 x i16>* %ptr, align 8
		ret void
		}

		define void @test_store_v8i8(<8 x i8>* %ptr, <8 x i8> %val) #0 {
		; CHECK: test_store_v8i8
		; CHECK: str {{d[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <8 x i8> %val, <8 x i8>* %ptr, align 8
		ret void
		}




	define void @test_store_f128(fp128* %ptr, fp128 %val) #0 {	define void @test_store_f128(fp128* %ptr, fp128 %val) #0 {
	; CHECK: test_store_f128	; CHECK: test_store_f128
	; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]	; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]
Context not available.
	ret void	ret void
	}	}

		define void @test_store_v2f64(<2 x double>* %ptr, <2 x double> %val) #0 {
		; CHECK: test_store_v2f64
		; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <2 x double> %val, <2 x double>* %ptr, align 16
		ret void
		}

		define void @test_store_v4f32(<4 x float>* %ptr, <4 x float> %val) #0 {
		; CHECK: test_store_v4f32
		; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <4 x float> %val, <4 x float>* %ptr, align 16
		ret void
		}

		define void @test_store_v2i64(<2 x i64>* %ptr, <2 x i64> %val) #0 {
		; CHECK: test_store_v2i64
		; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <2 x i64> %val, <2 x i64>* %ptr, align 16
		ret void
		}

		define void @test_store_v4i32(<4 x i32>* %ptr, <4 x i32> %val) #0 {
		; CHECK: test_store_v4i32
		; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <4 x i32> %val, <4 x i32>* %ptr, align 16
		ret void
		}

		define void @test_store_v8i16(<8 x i16>* %ptr, <8 x i16> %val) #0 {
		; CHECK: test_store_v8i16
		; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <8 x i16> %val, <8 x i16>* %ptr, align 16
		ret void
		}

		define void @test_store_v16i8(<16 x i8>* %ptr, <16 x i8> %val) #0 {
		; CHECK: test_store_v16i8
		; CHECK: str {{q[0-9]+}}, [{{x[0-9]+}}]
		entry:
		store <16 x i8> %val, <16 x i8>* %ptr, align 16
		ret void
		}

	define fp128 @test_load_f128(fp128* readonly %ptr) #2 {	define fp128 @test_load_f128(fp128* readonly %ptr) #2 {
	; CHECK: test_load_f128	; CHECK: test_load_f128
	; CHECK: ldr {{q[0-9]+}}, [{{x[0-9]+}}]	; CHECK: ldr {{q[0-9]+}}, [{{x[0-9]+}}]
Context not available.

test/CodeGen/AArch64/addsub-shifted.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@var32 = global i32 0			@var32 = global i32 0

test/CodeGen/AArch64/addsub.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	; Note that this should be refactored (for efficiency if nothing else)			; Note that this should be refactored (for efficiency if nothing else)

test/CodeGen/AArch64/addsub_ext.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@var8 = global i8 0			@var8 = global i8 0

test/CodeGen/AArch64/alloca.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/analyze-branch.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s

	; This test checks that LLVM can do basic stripping and reapplying of branches			; This test checks that LLVM can do basic stripping and reapplying of branches

test/CodeGen/AArch64/assertion-rc-mismatch.ll

				; RUN: llc < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; Test case related to <rdar://problem/15633429>.			; Test case related to <rdar://problem/15633429>.

test/CodeGen/AArch64/atomic-ops-not-barriers.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s

	define i32 @foo(i32* %var, i1 %cond) {			define i32 @foo(i32* %var, i1 %cond) {

test/CodeGen/AArch64/atomic-ops.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-REG %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-REG %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-REG %s

test/CodeGen/AArch64/basic-pic.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs -relocation-model=pic %s -o - | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -relocation-model=pic %s -o - | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -relocation-model=pic %s -o - | FileCheck %s

	@var = global i32 0			@var = global i32 0

test/CodeGen/AArch64/bitfield-insert-0.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -filetype=obj < %s | llvm-objdump -disassemble - | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -filetype=obj < %s | llvm-objdump -disassemble - | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -filetype=obj < %s | llvm-objdump -disassemble - | FileCheck %s

	; The encoding of lsb -> immr in the CGed bitfield instructions was wrong at one			; The encoding of lsb -> immr in the CGed bitfield instructions was wrong at one

test/CodeGen/AArch64/bitfield-insert.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s

	; First, a simple example from Clang. The registers could plausibly be			; First, a simple example from Clang. The registers could plausibly be

test/CodeGen/AArch64/bitfield.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s

	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

test/CodeGen/AArch64/blockaddress.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -code-model=large -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-LARGE %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -code-model=large -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-LARGE %s			; RUN: llc -code-model=large -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-LARGE %s

test/CodeGen/AArch64/bool-loads.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s

	@var = global i1 0			@var = global i1 0

test/CodeGen/AArch64/breg.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@stored_label = global i8* null			@stored_label = global i8* null

test/CodeGen/AArch64/callee-save.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@var = global float 0.0			@var = global float 0.0

test/CodeGen/AArch64/code-model-large-abs.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -code-model=large < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -code-model=large < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -code-model=large < %s | FileCheck %s

	@var8 = global i8 0			@var8 = global i8 0

test/CodeGen/AArch64/compare-branch.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@var32 = global i32 0			@var32 = global i32 0

test/CodeGen/AArch64/complex-copy-noneon.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=-neon < %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=-neon < %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=-neon < %s

	; The DAG combiner decided to use a vector load/store for this struct copy			; The DAG combiner decided to use a vector load/store for this struct copy

test/CodeGen/AArch64/concatvector-v8i8-bug.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon
	; Bug: i8 type in FPR8 register but not registering with register class causes segmentation fault.			; Bug: i8 type in FPR8 register but not registering with register class causes segmentation fault.
	; Fix: Removed i8 type from FPR8 register class.			; Fix: Removed i8 type from FPR8 register class.

test/CodeGen/AArch64/cond-sel.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/cpus.ll

				; RUN: llc < %s -mtriple=aarch64_be-unknown-unknown -mcpu=generic 2>&1 | FileCheck %s
				; RUN: llc < %s -mtriple=aarch64_be-unknown-unknown -mcpu=cortex-a53 2>&1 | FileCheck %s
				; RUN: llc < %s -mtriple=aarch64_be-unknown-unknown -mcpu=cortex-a57 2>&1 | FileCheck %s
				; RUN: llc < %s -mtriple=aarch64_be-unknown-unknown -mcpu=invalidcpu 2>&1 | FileCheck %s --check-prefix=INVALID
	; This tests that llc accepts all valid AArch64 CPUs			; This tests that llc accepts all valid AArch64 CPUs

	; RUN: llc < %s -mtriple=aarch64-unknown-unknown -mcpu=generic 2>&1 | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-unknown-unknown -mcpu=generic 2>&1 | FileCheck %s

test/CodeGen/AArch64/directcond.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/dp-3source.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	define i32 @test_madd32(i32 %val0, i32 %val1, i32 %val2) {			define i32 @test_madd32(i32 %val0, i32 %val1, i32 %val2) {

test/CodeGen/AArch64/dp1.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@var32 = global i32 0			@var32 = global i32 0

test/CodeGen/AArch64/dp2.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@var32_0 = global i32 0			@var32_0 = global i32 0

test/CodeGen/AArch64/extern-weak.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -o - < %s | FileCheck %s
				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -code-model=large -o - < %s | FileCheck --check-prefix=CHECK-LARGE %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -o - < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -o - < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -code-model=large -o - < %s | FileCheck --check-prefix=CHECK-LARGE %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -code-model=large -o - < %s | FileCheck --check-prefix=CHECK-LARGE %s

test/CodeGen/AArch64/extract.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	define i64 @ror_i64(i64 %in) {			define i64 @ror_i64(i64 %in) {

test/CodeGen/AArch64/fastcc-reserved.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -tailcallopt | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -tailcallopt | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -tailcallopt | FileCheck %s

	; This test is designed to be run in the situation where the			; This test is designed to be run in the situation where the

test/CodeGen/AArch64/fastcc.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -tailcallopt | FileCheck %s -check-prefix CHECK-TAIL
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -tailcallopt | FileCheck %s -check-prefix CHECK-TAIL			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -tailcallopt | FileCheck %s -check-prefix CHECK-TAIL
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

test/CodeGen/AArch64/fcmp.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	declare void @bar(i32)			declare void @bar(i32)

test/CodeGen/AArch64/fcvt-fixed.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -O0 | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 | FileCheck %s

	@var32 = global i32 0			@var32 = global i32 0

test/CodeGen/AArch64/fcvt-int.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	define i32 @test_floattoi32(float %in) {			define i32 @test_floattoi32(float %in) {

test/CodeGen/AArch64/flags-multiuse.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s

	; LLVM should be able to cope with multiple uses of the same flag-setting			; LLVM should be able to cope with multiple uses of the same flag-setting

test/CodeGen/AArch64/floatdp_1source.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@varhalf = global half 0.0			@varhalf = global half 0.0

test/CodeGen/AArch64/floatdp_2source.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@varfloat = global float 0.0			@varfloat = global float 0.0

test/CodeGen/AArch64/fp-cond-sel.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@varfloat = global float 0.0			@varfloat = global float 0.0

test/CodeGen/AArch64/fp-dp3.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -fp-contract=fast | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s -check-prefix=CHECK-NOFAST
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -fp-contract=fast | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -fp-contract=fast | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s -check-prefix=CHECK-NOFAST			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s -check-prefix=CHECK-NOFAST

test/CodeGen/AArch64/fp128-folding.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	declare void @bar(i8, i8, i32*)			declare void @bar(i8, i8, i32*)

test/CodeGen/AArch64/fp128.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s

	@lhs = global fp128 zeroinitializer			@lhs = global fp128 zeroinitializer

test/CodeGen/AArch64/fpimm.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@varf32 = global float 0.0			@varf32 = global float 0.0

test/CodeGen/AArch64/frameaddr.ll

				; RUN: llc < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	define i8* @t() nounwind {			define i8* @t() nounwind {

test/CodeGen/AArch64/func-argpassing.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/func-calls.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/global-alignment.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s

	@var32 = global [3 x i32] zeroinitializer			@var32 = global [3 x i32] zeroinitializer

test/CodeGen/AArch64/got-abuse.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -relocation-model=pic < %s | FileCheck %s
				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -relocation-model=pic -filetype=obj < %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -filetype=obj < %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -filetype=obj < %s

test/CodeGen/AArch64/i128-align.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s

	%struct = type { i32, i128, i8 }			%struct = type { i32, i128, i8 }

test/CodeGen/AArch64/illegal-float-ops.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s | FileCheck %s

	@varfloat = global float 0.0			@varfloat = global float 0.0

test/CodeGen/AArch64/init-array.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs -use-init-array < %s | FileCheck %s
				; RUN: llc -mtriple=aarch64_be-none-none-eabi -verify-machineinstrs -use-init-array < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -use-init-array < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -use-init-array < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-none-eabi -verify-machineinstrs -use-init-array < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-none-eabi -verify-machineinstrs -use-init-array < %s | FileCheck %s

test/CodeGen/AArch64/inline-asm-constraints-badI.ll

				; RUN: not llc -mtriple=aarch64_be-none-linux-gnu < %s
	; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s			; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s

	define void @foo() {			define void @foo() {

test/CodeGen/AArch64/inline-asm-constraints-badK.ll

				; RUN: not llc -mtriple=aarch64_be-none-linux-gnu < %s
	; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s			; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s

	define void @foo() {			define void @foo() {

test/CodeGen/AArch64/inline-asm-constraints-badK2.ll

				; RUN: not llc -mtriple=aarch64_be-none-linux-gnu < %s
	; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s			; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s

	define void @foo() {			define void @foo() {

test/CodeGen/AArch64/inline-asm-constraints-badL.ll

				; RUN: not llc -mtriple=aarch64_be-none-linux-gnu < %s
	; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s			; RUN: not llc -mtriple=aarch64-none-linux-gnu < %s

	define void @foo() {			define void @foo() {

test/CodeGen/AArch64/inline-asm-modifiers.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -relocation-model=pic -no-integrated-as < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -no-integrated-as < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -no-integrated-as < %s | FileCheck %s

	@var_simple = hidden global i32 0			@var_simple = hidden global i32 0

test/CodeGen/AArch64/jump-table.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
				; RUN: llc -code-model=large -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck --check-prefix=CHECK-LARGE %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -code-model=large -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck --check-prefix=CHECK-LARGE %s			; RUN: llc -code-model=large -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck --check-prefix=CHECK-LARGE %s

test/CodeGen/AArch64/large-consts.ll

				; RUN: llc -mtriple=aarch64_be-linux-gnu -o - %s -code-model=large -show-mc-encoding | FileCheck %s
	; RUN: llc -mtriple=aarch64-linux-gnu -o - %s -code-model=large -show-mc-encoding | FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu -o - %s -code-model=large -show-mc-encoding | FileCheck %s

	; Make sure the shift amount is encoded into the instructions by LLVM because			; Make sure the shift amount is encoded into the instructions by LLVM because

test/CodeGen/AArch64/large-frame.ll

				; RUN: llc -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu < %s | FileCheck %s
	; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s
	declare void @use_addr(i8*)			declare void @use_addr(i8*)

test/CodeGen/AArch64/ldst-regoffset.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/ldst-unscaledimm.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/ldst-unsignedimm.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/literal_pools.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -code-model=large | FileCheck --check-prefix=CHECK-LARGE %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -code-model=large -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP-LARGE %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -code-model=large | FileCheck --check-prefix=CHECK-LARGE %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -code-model=large | FileCheck --check-prefix=CHECK-LARGE %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/local_vars.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -O0 | FileCheck %s
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -O0 -disable-fp-elim | FileCheck -check-prefix CHECK-WITHFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 -disable-fp-elim | FileCheck -check-prefix CHECK-WITHFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 -disable-fp-elim | FileCheck -check-prefix CHECK-WITHFP %s

test/CodeGen/AArch64/logical-imm.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	@var32 = global i32 0			@var32 = global i32 0

test/CodeGen/AArch64/logical_shifted_reg.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -O0 | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0 | FileCheck %s

	@var1_32 = global i32 0			@var1_32 = global i32 0

test/CodeGen/AArch64/mature-mc-support.ll

				; RUN: not llc -mtriple=aarch64_be-pc-linux < %s > /dev/null 2> %t1
				; RUN: not llc -mtriple=aarch64_be-pc-linux -filetype=obj < %s > /dev/null 2> %t2
	; Test that inline assembly is parsed by the MC layer when MC support is mature			; Test that inline assembly is parsed by the MC layer when MC support is mature
	; (even when the output is assembly).			; (even when the output is assembly).

test/CodeGen/AArch64/movw-consts.ll

				; RUN: llc -verify-machineinstrs -O0 < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck %s
	; RUN: llc -verify-machineinstrs -O0 < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs -O0 < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

	define i64 @test0() {			define i64 @test0() {

test/CodeGen/AArch64/movw-shift-encoding.ll

				; RUN: llc -mtriple=aarch64_be-linux-gnu < %s -show-mc-encoding -code-model=large \| FileCheck %s
	; RUN: llc -mtriple=aarch64-linux-gnu < %s -show-mc-encoding -code-model=large \| FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu < %s -show-mc-encoding -code-model=large \| FileCheck %s

	@var = global i32 0			@var = global i32 0

test/CodeGen/AArch64/neon-2velem-high.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	declare <2 x float> @llvm.fma.v2f32(<2 x float>, <2 x float>, <2 x float>)			declare <2 x float> @llvm.fma.v2f32(<2 x float>, <2 x float>, <2 x float>)

test/CodeGen/AArch64/neon-2velem.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	declare <2 x double> @llvm.aarch64.neon.vmulx.v2f64(<2 x double>, <2 x double>)			declare <2 x double> @llvm.aarch64.neon.vmulx.v2f64(<2 x double>, <2 x double>)

test/CodeGen/AArch64/neon-3vdiff.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	declare <8 x i16> @llvm.arm.neon.vmullp.v8i16(<8 x i8>, <8 x i8>)			declare <8 x i16> @llvm.arm.neon.vmullp.v8i16(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-aba-abd.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vabdu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vabdu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-across.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	declare float @llvm.aarch64.neon.vminnmv(<4 x float>)			declare float @llvm.aarch64.neon.vminnmv(<4 x float>)

test/CodeGen/AArch64/neon-add-pairwise.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vpadd.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vpadd.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-add-sub.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i8> @add8xi8(<8 x i8> %A, <8 x i8> %B) {			define <8 x i8> @add8xi8(<8 x i8> %A, <8 x i8> %B) {

test/CodeGen/AArch64/neon-bitcast.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -verify-machineinstrs < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon -verify-machineinstrs < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon -verify-machineinstrs < %s \| FileCheck %s

	; From <8 x i8>			; From <8 x i8>

test/CodeGen/AArch64/neon-bitwise-instructions.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i8> @and8xi8(<8 x i8> %a, <8 x i8> %b) {			define <8 x i8> @and8xi8(<8 x i8> %a, <8 x i8> %b) {

test/CodeGen/AArch64/neon-bsl.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	declare <2 x double> @llvm.arm.neon.vbsl.v2f64(<2 x double>, <2 x double>, <2 x double>)			declare <2 x double> @llvm.arm.neon.vbsl.v2f64(<2 x double>, <2 x double>, <2 x double>)

test/CodeGen/AArch64/neon-compare-instructions.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	define <8 x i8> @cmeq8xi8(<8 x i8> %A, <8 x i8> %B) {			define <8 x i8> @cmeq8xi8(<8 x i8> %A, <8 x i8> %B) {

test/CodeGen/AArch64/neon-copyPhysReg-tuple.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <4 x i32> @copyTuple.QPair(i8* %a, i8* %b) {			define <4 x i32> @copyTuple.QPair(i8* %a, i8* %b) {

test/CodeGen/AArch64/neon-crypto.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -mattr=+crypto \| FileCheck %s
				; RUN: not llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon 2>&1 \| FileCheck --check-prefix=CHECK-NO-CRYPTO %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -mattr=+crypto \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -mattr=+crypto \| FileCheck %s
	; RUN: not llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon 2>&1 \| FileCheck --check-prefix=CHECK-NO-CRYPTO %s			; RUN: not llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon 2>&1 \| FileCheck --check-prefix=CHECK-NO-CRYPTO %s

test/CodeGen/AArch64/neon-diagnostics.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <2 x float> @test_vfma_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) {			define <2 x float> @test_vfma_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) {

test/CodeGen/AArch64/neon-extract.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i8> @test_vext_s8(<8 x i8> %a, <8 x i8> %b) {			define <8 x i8> @test_vext_s8(<8 x i8> %a, <8 x i8> %b) {

test/CodeGen/AArch64/neon-facge-facgt.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <2 x i32> @llvm.arm.neon.vacge.v2i32.v2f32(<2 x float>, <2 x float>)			declare <2 x i32> @llvm.arm.neon.vacge.v2i32.v2f32(<2 x float>, <2 x float>)

test/CodeGen/AArch64/neon-fma.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	define <2 x float> @fmla2xfloat(<2 x float> %A, <2 x float> %B, <2 x float> %C) {			define <2 x float> @fmla2xfloat(<2 x float> %A, <2 x float> %B, <2 x float> %C) {

test/CodeGen/AArch64/neon-fpround_f128.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	define <1 x double> @test_fpround_v1f128(<1 x fp128>* %a) {			define <1 x double> @test_fpround_v1f128(<1 x fp128>* %a) {

test/CodeGen/AArch64/neon-frsqrt-frecp.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	; Set of tests for when the intrinsic is used.			; Set of tests for when the intrinsic is used.

test/CodeGen/AArch64/neon-halving-add-sub.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-load-store-v1i32.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	; Test load/store of v1i8, v1i16, v1i32 types can be selected correctly			; Test load/store of v1i8, v1i16, v1i32 types can be selected correctly

test/CodeGen/AArch64/neon-max-min-pairwise.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vpmaxs.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vpmaxs.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-max-min.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vmaxs.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vmaxs.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-misc.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

test/CodeGen/AArch64/neon-mla-mls.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

test/CodeGen/AArch64/neon-mov.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i8> @movi8b() {			define <8 x i8> @movi8b() {

test/CodeGen/AArch64/neon-mul-div.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

test/CodeGen/AArch64/neon-or-combine.ll

				; RUN: llc < %s -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	; Check that the DAGCombiner does not crash with an assertion failure			; Check that the DAGCombiner does not crash with an assertion failure

test/CodeGen/AArch64/neon-perm.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	%struct.int8x8x2_t = type { [2 x <8 x i8>] }			%struct.int8x8x2_t = type { [2 x <8 x i8>] }

test/CodeGen/AArch64/neon-rounding-halving-add.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vrhaddu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vrhaddu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-rounding-shift.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vrshiftu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vrshiftu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-saturating-add-sub.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vqaddu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vqaddu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-saturating-rounding-shift.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vqrshiftu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vqrshiftu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-saturating-shift.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vqshiftu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vqshiftu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-scalar-abs.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define i64 @test_vabsd_s64(i64 %a) {			define i64 @test_vabsd_s64(i64 %a) {

test/CodeGen/AArch64/neon-scalar-add-sub.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <1 x i64> @add1xi64(<1 x i64> %A, <1 x i64> %B) {			define <1 x i64> @add1xi64(<1 x i64> %A, <1 x i64> %B) {

test/CodeGen/AArch64/neon-scalar-by-elem-fma.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	declare float @llvm.fma.f32(float, float, float)			declare float @llvm.fma.f32(float, float, float)

test/CodeGen/AArch64/neon-scalar-by-elem-mul.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	define float @test_fmul_lane_ss2S(float %a, <2 x float> %v) {			define float @test_fmul_lane_ss2S(float %a, <2 x float> %v) {

test/CodeGen/AArch64/neon-scalar-compare.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	;; Scalar Integer Compare			;; Scalar Integer Compare

test/CodeGen/AArch64/neon-scalar-copy.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	define float @test_dup_sv2S(<2 x float> %v) {			define float @test_dup_sv2S(<2 x float> %v) {

test/CodeGen/AArch64/neon-scalar-cvt.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	define float @test_vcvts_f32_s32(i32 %a) {			define float @test_vcvts_f32_s32(i32 %a) {

test/CodeGen/AArch64/neon-scalar-ext.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	define <1 x i64> @test_zext_v1i32_v1i64(<2 x i32> %v) nounwind readnone {			define <1 x i64> @test_zext_v1i32_v1i64(<2 x i32> %v) nounwind readnone {

test/CodeGen/AArch64/neon-scalar-extract-narrow.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	define i8 @test_vqmovunh_s16(i16 %a) {			define i8 @test_vqmovunh_s16(i16 %a) {

test/CodeGen/AArch64/neon-scalar-fabd.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define float @test_vabds_f32(float %a, float %b) {			define float @test_vabds_f32(float %a, float %b) {

test/CodeGen/AArch64/neon-scalar-fcvt.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	;; Scalar Floating-point Convert			;; Scalar Floating-point Convert

test/CodeGen/AArch64/neon-scalar-fp-compare.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	;; Scalar Floating-point Compare			;; Scalar Floating-point Compare

test/CodeGen/AArch64/neon-scalar-mul.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	define i16 @test_vqdmulhh_s16(i16 %a, i16 %b) {			define i16 @test_vqdmulhh_s16(i16 %a, i16 %b) {

test/CodeGen/AArch64/neon-scalar-neg.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define i64 @test_vnegd_s64(i64 %a) {			define i64 @test_vnegd_s64(i64 %a) {

test/CodeGen/AArch64/neon-scalar-recip.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	define float @test_vrecpss_f32(float %a, float %b) {			define float @test_vrecpss_f32(float %a, float %b) {

test/CodeGen/AArch64/neon-scalar-reduce-pairwise.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <1 x i64> @llvm.aarch64.neon.vpadd(<2 x i64>)			declare <1 x i64> @llvm.aarch64.neon.vpadd(<2 x i64>)

test/CodeGen/AArch64/neon-scalar-rounding-shift.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

test/CodeGen/AArch64/neon-scalar-saturating-add-sub.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <1 x i8> @llvm.arm.neon.vqaddu.v1i8(<1 x i8>, <1 x i8>)			declare <1 x i8> @llvm.arm.neon.vqaddu.v1i8(<1 x i8>, <1 x i8>)

test/CodeGen/AArch64/neon-scalar-saturating-rounding-shift.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <1 x i64> @llvm.arm.neon.vqrshiftu.v1i64(<1 x i64>, <1 x i64>)			declare <1 x i64> @llvm.arm.neon.vqrshiftu.v1i64(<1 x i64>, <1 x i64>)

test/CodeGen/AArch64/neon-scalar-saturating-shift.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	declare <1 x i64> @llvm.arm.neon.vqshiftu.v1i64(<1 x i64>, <1 x i64>)			declare <1 x i64> @llvm.arm.neon.vqshiftu.v1i64(<1 x i64>, <1 x i64>)

test/CodeGen/AArch64/neon-scalar-shift-imm.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define i64 @test_vshrd_n_s64(i64 %a) {			define i64 @test_vshrd_n_s64(i64 %a) {

test/CodeGen/AArch64/neon-scalar-shift.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	declare <1 x i64> @llvm.arm.neon.vshiftu.v1i64(<1 x i64>, <1 x i64>)			declare <1 x i64> @llvm.arm.neon.vshiftu.v1i64(<1 x i64>, <1 x i64>)

test/CodeGen/AArch64/neon-select_cc.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	define <8x i8> @test_select_cc_v8i8_i8(i8 %a, i8 %b, <8x i8> %c, <8x i8> %d ) {			define <8x i8> @test_select_cc_v8i8_i8(i8 %a, i8 %b, <8x i8> %c, <8x i8> %d ) {

test/CodeGen/AArch64/neon-shift-left-long.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i16> @test_sshll_v8i8(<8 x i8> %a) {			define <8 x i16> @test_sshll_v8i8(<8 x i8> %a) {

test/CodeGen/AArch64/neon-shift.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	declare <8 x i8> @llvm.arm.neon.vshiftu.v8i8(<8 x i8>, <8 x i8>)			declare <8 x i8> @llvm.arm.neon.vshiftu.v8i8(<8 x i8>, <8 x i8>)

test/CodeGen/AArch64/neon-shl-ashr-lshr.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i8> @shl.v8i8(<8 x i8> %a, <8 x i8> %b) {			define <8 x i8> @shl.v8i8(<8 x i8> %a, <8 x i8> %b) {

test/CodeGen/AArch64/neon-simd-ldst-multi-elem.ll

		; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon | FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s

	define void @test_ldst1_v16i8(<16 x i8>* %ptr, <16 x i8>* %ptr2) {	define void @test_ldst1_v16i8(<16 x i8>* %ptr, <16 x i8>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v16i8:	; CHECK-LABEL: test_ldst1_v16i8:
	; CHECK: ld1 {v{{[0-9]+}}.16b}, [x{{[0-9]+|sp}}]	; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.16b}, [x{{[0-9]+|sp}}]	; CHECK: str q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <16 x i8>* %ptr	%tmp = load <16 x i8>* %ptr

	define void @test_ldst1_v8i16(<8 x i16>* %ptr, <8 x i16>* %ptr2) {	define void @test_ldst1_v8i16(<8 x i16>* %ptr, <8 x i16>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v8i16:	; CHECK-LABEL: test_ldst1_v8i16:
	; CHECK: ld1 {v{{[0-9]+}}.8h}, [x{{[0-9]+|sp}}]	; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.8h}, [x{{[0-9]+|sp}}]	; CHECK: str q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <8 x i16>* %ptr	%tmp = load <8 x i16>* %ptr
	store <8 x i16> %tmp, <8 x i16>* %ptr2	store <8 x i16> %tmp, <8 x i16>* %ptr2
	ret void	ret void

	define void @test_ldst1_v4i32(<4 x i32>* %ptr, <4 x i32>* %ptr2) {	define void @test_ldst1_v4i32(<4 x i32>* %ptr, <4 x i32>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v4i32:	; CHECK-LABEL: test_ldst1_v4i32:
	; CHECK: ld1 {v{{[0-9]+}}.4s}, [x{{[0-9]+|sp}}]	; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.4s}, [x{{[0-9]+|sp}}]	; CHECK: str q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <4 x i32>* %ptr	%tmp = load <4 x i32>* %ptr
	store <4 x i32> %tmp, <4 x i32>* %ptr2	store <4 x i32> %tmp, <4 x i32>* %ptr2
	ret void	ret void

	define void @test_ldst1_v2i64(<2 x i64>* %ptr, <2 x i64>* %ptr2) {	define void @test_ldst1_v2i64(<2 x i64>* %ptr, <2 x i64>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v2i64:	; CHECK-LABEL: test_ldst1_v2i64:
	; CHECK: ld1 {v{{[0-9]+}}.2d}, [x{{[0-9]+|sp}}]	; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.2d}, [x{{[0-9]+|sp}}]	; CHECK: str q{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <2 x i64>* %ptr	%tmp = load <2 x i64>* %ptr
	store <2 x i64> %tmp, <2 x i64>* %ptr2	store <2 x i64> %tmp, <2 x i64>* %ptr2
	ret void	ret void

	define void @test_ldst1_v8i8(<8 x i8>* %ptr, <8 x i8>* %ptr2) {	define void @test_ldst1_v8i8(<8 x i8>* %ptr, <8 x i8>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v8i8:	; CHECK-LABEL: test_ldst1_v8i8:
	; CHECK: ld1 {v{{[0-9]+}}.8b}, [x{{[0-9]+|sp}}]	; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.8b}, [x{{[0-9]+|sp}}]	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <8 x i8>* %ptr	%tmp = load <8 x i8>* %ptr
	store <8 x i8> %tmp, <8 x i8>* %ptr2	store <8 x i8> %tmp, <8 x i8>* %ptr2
	ret void	ret void

	define void @test_ldst1_v4i16(<4 x i16>* %ptr, <4 x i16>* %ptr2) {	define void @test_ldst1_v4i16(<4 x i16>* %ptr, <4 x i16>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v4i16:	; CHECK-LABEL: test_ldst1_v4i16:
	; CHECK: ld1 {v{{[0-9]+}}.4h}, [x{{[0-9]+|sp}}]	; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.4h}, [x{{[0-9]+|sp}}]	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <4 x i16>* %ptr	%tmp = load <4 x i16>* %ptr
	store <4 x i16> %tmp, <4 x i16>* %ptr2	store <4 x i16> %tmp, <4 x i16>* %ptr2
	ret void	ret void

	define void @test_ldst1_v2i32(<2 x i32>* %ptr, <2 x i32>* %ptr2) {	define void @test_ldst1_v2i32(<2 x i32>* %ptr, <2 x i32>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v2i32:	; CHECK-LABEL: test_ldst1_v2i32:
	; CHECK: ld1 {v{{[0-9]+}}.2s}, [x{{[0-9]+|sp}}]	; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.2s}, [x{{[0-9]+|sp}}]	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <2 x i32>* %ptr	%tmp = load <2 x i32>* %ptr
	store <2 x i32> %tmp, <2 x i32>* %ptr2	store <2 x i32> %tmp, <2 x i32>* %ptr2
	ret void	ret void

	define void @test_ldst1_v1i64(<1 x i64>* %ptr, <1 x i64>* %ptr2) {	define void @test_ldst1_v1i64(<1 x i64>* %ptr, <1 x i64>* %ptr2) {
	; CHECK-LABEL: test_ldst1_v1i64:	; CHECK-LABEL: test_ldst1_v1i64:
	; CHECK: ld1 {v{{[0-9]+}}.1d}, [x{{[0-9]+|sp}}]	; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	; CHECK: st1 {v{{[0-9]+}}.1d}, [x{{[0-9]+|sp}}]	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+|sp}}]
	%tmp = load <1 x i64>* %ptr	%tmp = load <1 x i64>* %ptr
	store <1 x i64> %tmp, <1 x i64>* %ptr2	store <1 x i64> %tmp, <1 x i64>* %ptr2
	ret void	ret void

test/CodeGen/AArch64/neon-simd-ldst.ll

				; RUN: llc < %s -O2 -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -O2 -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -O2 -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define void @test_ldstq_4v(i8* noalias %io, i32 %count) {			define void @test_ldstq_4v(i8* noalias %io, i32 %count) {

test/CodeGen/AArch64/neon-simd-post-ldst-multi-elem.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	;Check for a post-increment updating load.			;Check for a post-increment updating load.

test/CodeGen/AArch64/neon-simd-post-ldst-one.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define { [2 x <16 x i8>] } @test_vld2q_dup_fx_update(i8* %a, i8** %ptr) {			define { [2 x <16 x i8>] } @test_vld2q_dup_fx_update(i8* %a, i8** %ptr) {

test/CodeGen/AArch64/neon-simd-shift.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i8> @test_vshr_n_s8(<8 x i8> %a) {			define <8 x i8> @test_vshr_n_s8(<8 x i8> %a) {

test/CodeGen/AArch64/neon-simd-tbl.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	declare <16 x i8> @llvm.aarch64.neon.vtbx4.v16i8(<16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>)			declare <16 x i8> @llvm.aarch64.neon.vtbx4.v16i8(<16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>)

test/CodeGen/AArch64/neon-simd-vget.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	define <8 x i8> @test_vget_high_s8(<16 x i8> %a) {			define <8 x i8> @test_vget_high_s8(<16 x i8> %a) {

test/CodeGen/AArch64/neon-spill-fpr8-fpr16.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -mattr=+neon < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon < %s \| FileCheck %s

	; This file tests the spill of FPR8/FPR16. The volatile loads/stores force the			; This file tests the spill of FPR8/FPR16. The volatile loads/stores force the

test/CodeGen/AArch64/neon-truncStore-extLoad.ll

		; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	; A vector TruncStore can not be selected.	; A vector TruncStore can not be selected.
Context not available.
	define void @truncStore.v2i64(<2 x i64> %a, <2 x i32>* %result) {	define void @truncStore.v2i64(<2 x i64> %a, <2 x i32>* %result) {
	; CHECK-LABEL: truncStore.v2i64:	; CHECK-LABEL: truncStore.v2i64:
	; CHECK: xtn v{{[0-9]+}}.2s, v{{[0-9]+}}.2d	; CHECK: xtn v{{[0-9]+}}.2s, v{{[0-9]+}}.2d
	; CHECK: st1 {v{{[0-9]+}}.2s}, [x{{[0-9]+\|sp}}]	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+\|sp}}]
	%b = trunc <2 x i64> %a to <2 x i32>	%b = trunc <2 x i64> %a to <2 x i32>
	store <2 x i32> %b, <2 x i32>* %result	store <2 x i32> %b, <2 x i32>* %result
	ret void	ret void
Context not available.
	define void @truncStore.v4i32(<4 x i32> %a, <4 x i16>* %result) {	define void @truncStore.v4i32(<4 x i32> %a, <4 x i16>* %result) {
	; CHECK-LABEL: truncStore.v4i32:	; CHECK-LABEL: truncStore.v4i32:
	; CHECK: xtn v{{[0-9]+}}.4h, v{{[0-9]+}}.4s	; CHECK: xtn v{{[0-9]+}}.4h, v{{[0-9]+}}.4s
	; CHECK: st1 {v{{[0-9]+}}.4h}, [x{{[0-9]+\|sp}}]	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+\|sp}}]
	%b = trunc <4 x i32> %a to <4 x i16>	%b = trunc <4 x i32> %a to <4 x i16>
	store <4 x i16> %b, <4 x i16>* %result	store <4 x i16> %b, <4 x i16>* %result
	ret void	ret void
Context not available.
	define void @truncStore.v8i16(<8 x i16> %a, <8 x i8>* %result) {	define void @truncStore.v8i16(<8 x i16> %a, <8 x i8>* %result) {
	; CHECK-LABEL: truncStore.v8i16:	; CHECK-LABEL: truncStore.v8i16:
	; CHECK: xtn v{{[0-9]+}}.8b, v{{[0-9]+}}.8h	; CHECK: xtn v{{[0-9]+}}.8b, v{{[0-9]+}}.8h
	; CHECK: st1 {v{{[0-9]+}}.8b}, [x{{[0-9]+\|sp}}]	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+\|sp}}]
	%b = trunc <8 x i16> %a to <8 x i8>	%b = trunc <8 x i16> %a to <8 x i8>
	store <8 x i8> %b, <8 x i8>* %result	store <8 x i8> %b, <8 x i8>* %result
	ret void	ret void
Context not available.
	%vecext = extractelement <4 x i8> %a, i32 0	%vecext = extractelement <4 x i8> %a, i32 0
	%conv = zext i8 %vecext to i32	%conv = zext i8 %vecext to i32
	ret i32 %conv	ret i32 %conv
	}	}
	No newline at end of file
Context not available.

test/CodeGen/AArch64/neon-v1i1-setcc.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast \| FileCheck %s

	; This file test the DAG node like "v1i1 SETCC v1i64, v1i64". As the v1i1 type			; This file test the DAG node like "v1i1 SETCC v1i64, v1i64". As the v1i1 type

test/CodeGen/AArch64/neon-vector-list-spill.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon -fp-contract=fast
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast

	; FIXME: We should not generate ld/st for such register spill/fill, because the			; FIXME: We should not generate ld/st for such register spill/fill, because the

test/CodeGen/AArch64/regress-bitcast-formals.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s

	; CallingConv.td requires a bitcast for vector arguments. Make sure we're			; CallingConv.td requires a bitcast for vector arguments. Make sure we're

test/CodeGen/AArch64/regress-f128csel-flags.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s

	; We used to not mark NZCV as being used in the continuation basic-block			; We used to not mark NZCV as being used in the continuation basic-block

test/CodeGen/AArch64/regress-fp128-livein.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s

	; Regression test for NZCV reg live-in not being added to fp128csel IfTrue BB,			; Regression test for NZCV reg live-in not being added to fp128csel IfTrue BB,

test/CodeGen/AArch64/regress-tail-livereg.ll

				; RUN: llc -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu < %s \| FileCheck %s
	; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s
	@var = global void()* zeroinitializer			@var = global void()* zeroinitializer

test/CodeGen/AArch64/regress-tblgen-chains.ll

				; RUN: llc -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu < %s \| FileCheck %s
	; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s

	; When generating DAG selection tables, TableGen used to only flag an			; When generating DAG selection tables, TableGen used to only flag an

test/CodeGen/AArch64/regress-w29-reserved-with-fp.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -disable-fp-elim < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -disable-fp-elim < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -disable-fp-elim < %s \| FileCheck %s
	@var = global i32 0			@var = global i32 0

test/CodeGen/AArch64/regress-wzr-allocatable.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -O0
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -O0

	; When WZR wasn't marked as reserved, this function tried to allocate			; When WZR wasn't marked as reserved, this function tried to allocate

test/CodeGen/AArch64/returnaddr.ll

				; RUN: llc < %s -mtriple=aarch64_be-none-linux-gnu \| FileCheck %s
	; RUN: llc < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s			; RUN: llc < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s

	define i8* @rt0(i32 %x) nounwind readnone {			define i8* @rt0(i32 %x) nounwind readnone {

test/CodeGen/AArch64/setcc-takes-i32.ll

				; RUN: llc -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu < %s \| FileCheck %s
	; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s

	; Most important point here is that the promotion of the i1 works			; Most important point here is that the promotion of the i1 works

test/CodeGen/AArch64/sext_inreg.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

	; For formal arguments, we have the following vector type promotion,			; For formal arguments, we have the following vector type promotion,

test/CodeGen/AArch64/sibling-call.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu \| FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s

	declare void @callee_stack0()			declare void @callee_stack0()

test/CodeGen/AArch64/sincos-expansion.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs < %s \| FileCheck %s

	define float @test_sincos_f32(float %f) {			define float @test_sincos_f32(float %f) {

test/CodeGen/AArch64/sincospow-vector-expansion.ll

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon \| FileCheck %s
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon \| FileCheck %s

test/CodeGen/AArch64/tail-call.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -tailcallopt \| FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -tailcallopt \| FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -tailcallopt \| FileCheck %s

	declare fastcc void @callee_stack0()			declare fastcc void @callee_stack0()

test/CodeGen/AArch64/tls-dynamic-together.ll

				; RUN: llc -O0 -mtriple=aarch64_be-none-linux-gnu -relocation-model=pic -verify-machineinstrs < %s \| FileCheck %s
	; RUN: llc -O0 -mtriple=aarch64-none-linux-gnu -relocation-model=pic -verify-machineinstrs < %s \| FileCheck %s			; RUN: llc -O0 -mtriple=aarch64-none-linux-gnu -relocation-model=pic -verify-machineinstrs < %s \| FileCheck %s

	; If the .tlsdesccall and blr parts are emitted completely separately (even with			; If the .tlsdesccall and blr parts are emitted completely separately (even with

test/CodeGen/AArch64/tls-dynamics.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -relocation-model=pic -verify-machineinstrs < %s \| FileCheck %s
				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -relocation-model=pic -filetype=obj < %s \| llvm-objdump -r - \| FileCheck --check-prefix=CHECK-RELOC %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -verify-machineinstrs < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -verify-machineinstrs < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -filetype=obj < %s \| llvm-objdump -r - \| FileCheck --check-prefix=CHECK-RELOC %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -relocation-model=pic -filetype=obj < %s \| llvm-objdump -r - \| FileCheck --check-prefix=CHECK-RELOC %s

test/CodeGen/AArch64/tls-execs.ll

				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -verify-machineinstrs -show-mc-encoding < %s \| FileCheck %s
				; RUN: llc -mtriple=aarch64_be-none-linux-gnu -filetype=obj < %s \| llvm-objdump -r - \| FileCheck --check-prefix=CHECK-RELOC %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -show-mc-encoding < %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -show-mc-encoding < %s \| FileCheck %s
	; RUN: llc -mtriple=aarch64-none-linux-gnu -filetype=obj < %s \| llvm-objdump -r - \| FileCheck --check-prefix=CHECK-RELOC %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -filetype=obj < %s \| llvm-objdump -r - \| FileCheck --check-prefix=CHECK-RELOC %s

test/CodeGen/AArch64/tst-br.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu \| FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s

	; We've got the usual issues with LLVM reordering blocks here. The			; We've got the usual issues with LLVM reordering blocks here. The

test/CodeGen/AArch64/variadic.ll

				; RUN: llc -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu < %s \| FileCheck %s
				; RUN: llc -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=-fp-armv8 < %s \| FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu < %s \| FileCheck %s
	; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 < %s \| FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 < %s \| FileCheck --check-prefix=CHECK-NOFP %s

test/CodeGen/AArch64/zero-reg.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu \| FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu \| FileCheck %s

	@var32 = global i32 0			@var32 = global i32 0


AARCH64_BE load/store rules fix for ARM ABI

Needs Review, Public

Details

Diff Detail

Event Timeline

// Load single 1-element structure to all lanes of 1 register

+ // Load single 1-element structure to one lane of 1 register.
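The CHECK-line changes in this diff replace `ld1`/`st1` with `ldr`/`str` for plain 64-bit vector loads and stores. As background on why the choice matters on a big-endian target, here is a minimal Python sketch that models how each load form fills a 64-bit register. The function names `ldr_d` and `ld1` are hypothetical helpers for illustration only; they are not part of LLVM or of this patch, and the model covers only the 64-bit arrangements (`.8b`, `.4h`, `.2s`, `.1d`).

```python
def ldr_d(mem: bytes, big_endian: bool) -> int:
    """Model `ldr d0, [x0]`: a single 64-bit access.

    The whole 8-byte value is interpreted in the data endianness.
    """
    assert len(mem) == 8
    return int.from_bytes(mem, "big" if big_endian else "little")


def ld1(mem: bytes, elem_bytes: int, big_endian: bool) -> int:
    """Model `ld1 {v0.<T>}, [x0]` for a 64-bit arrangement.

    Element i in memory always lands in lane i; only the bytes inside
    each element are interpreted in the data endianness.
    """
    assert len(mem) == 8 and 8 % elem_bytes == 0
    order = "big" if big_endian else "little"
    reg = 0
    for lane in range(8 // elem_bytes):
        chunk = mem[lane * elem_bytes:(lane + 1) * elem_bytes]
        # Lane 0 occupies the least significant bits of the register.
        reg |= int.from_bytes(chunk, order) << (8 * elem_bytes * lane)
    return reg


mem = bytes(range(8))  # 00 01 02 03 04 05 06 07

# Little-endian: both forms yield the same register contents.
print(hex(ldr_d(mem, big_endian=False)), hex(ld1(mem, 2, big_endian=False)))

# Big-endian: the register contents differ for sub-64-bit elements.
print(hex(ldr_d(mem, big_endian=True)), hex(ld1(mem, 2, big_endian=True)))
```

In this model the two forms agree on little-endian for every arrangement, and on big-endian only for `.1d`. That difference is why a big-endian backend has to use one form consistently for a given in-register layout convention.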

Revision Contents

Diff 7483

