This is an archive of the discontinued LLVM Phabricator instance.

Differential D46042

Cap vector alignment at 16 for all Darwin platforms
ClosedPublic

Authored by rjmccall on Apr 24 2018, 9:17 PM.

Details

Reviewers

javed.absar
dexonsmith
bob.wilson
ab
ahatanak

Summary

This fixes two major problems:

We were not capping vector alignment as desired on 32-bit ARM.
We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI.

This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel).

Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects.
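As a minimal sketch of what "align them explicitly" means, the snippet below contrasts a vector typedef that takes its alignment from the target's base ABI rule with one that pins it using an attribute. The typedef and function names are hypothetical, chosen for illustration; the real headers apply the attribute to `__m256`, `__v8sf`, and so on.

```cpp
#include <cstddef>

// Alignment comes from the target's base ABI rule, so it may be capped
// (for example at 16 bytes on Darwin once this change is applied).
typedef float v8sf_default __attribute__((vector_size(32)));

// Alignment is pinned by the attribute, independent of the target cap.
typedef float v8sf_explicit __attribute__((vector_size(32), aligned(32)));

static_assert(alignof(v8sf_explicit) == 32,
              "an explicit aligned attribute overrides the target default");

constexpr std::size_t default_align() { return alignof(v8sf_default); }
constexpr std::size_t explicit_align() { return alignof(v8sf_explicit); }
```

Only the explicitly aligned typedef gives the same answer on every target; `default_align()` is whatever the platform's vector alignment rule produces.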

Some of our discussion leading into this change is here: https://github.com/apple/swift/pull/15691

Diff Detail

Repository: rC Clang

Event Timeline

rjmccall created this revision. Apr 24 2018, 9:17 PM

Herald added a reviewer: javed.absar. Apr 24 2018, 9:17 PM

Herald added subscribers: cfe-commits, kristof.beyls.

rjmccall edited the summary of this revision. Apr 24 2018, 9:18 PM

scanon added a subscriber: scanon. Apr 25 2018, 7:28 AM

rjmccall added reviewers: dexonsmith, bob.wilson, ab. Apr 25 2018, 9:00 AM

bob.wilson added a subscriber: bob.wilson. Apr 30 2018, 10:25 PM

LGTM

This revision is now accepted and ready to land. May 2 2018, 3:17 PM

Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects.

I was wondering whether clang issues a warning when the aligned attribute is stripped. If it doesn't warn, should it? I recently came across a case where a 16-byte vector annotated with a 4-byte alignment was passed to std::swap, which caused a crash because the alignment was stripped and the x86 backend decided to emit a 16-byte aligned load to load an unaligned vector.
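The stripping described here can be observed with a small probe. This is a sketch under assumptions: `float4`, `float4_align4`, and `AlignProbe` are hypothetical names, and the 4-byte under-alignment mirrors the std::swap case mentioned above. Because the aligned attribute lives on the typedef rather than on the canonical type, the template only sees the canonical type.

```cpp
#include <cstddef>
#include <type_traits>

// A 16-byte vector and an under-aligned typedef of it.
typedef float float4 __attribute__((vector_size(16)));
typedef float4 float4_align4 __attribute__((aligned(4)));

// Reports the alignment of whatever type the template actually receives.
template <class T>
struct AlignProbe {
  static constexpr std::size_t value = alignof(T);
};

// Both typedefs name the same canonical type, so the attribute cannot
// survive template argument substitution.
static_assert(std::is_same<float4, float4_align4>::value,
              "the aligned attribute is not part of the canonical type");
```

Outside the template, `alignof(float4_align4)` reflects the attribute; inside it, `AlignProbe<float4_align4>::value` reports the natural alignment of `float4`. GCC diagnoses this with `-Wignored-attributes`.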

I think we should seriously consider making alignment attributes on typedefs (and maybe some other attributes like may_alias) actual type qualifiers that are preserved in the canonical type, mangled, and so on. It would be an ABI break, but it'd also solve a lot of problems.

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

In D46042#1088044, @ab wrote:

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

Swift's calling conventions (will?) always divide larger vectors into 128b pieces. When interacting with C conventions, yes, this is still an issue.

In D46042#1088044, @ab wrote:

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

I'd forgotten about that. I think there's a strong argument that we're required to pass at least the Intel intrinsic vector types that way, yeah. But if we want a stable ABI for other vector types, we really can't. The root problem here is that the Intel ABI seems to imagine that these vector types only exist when they're supported directly by hardware. (And the Intel intrinsic headers do define those types even when AVX is disabled!) So I don't know that we can make a good ABI story for that.

In D46042#1088049, @scanon wrote:

In D46042#1088044, @ab wrote:

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

Swift's calling conventions (will?) always divide larger vectors into 128b pieces. When interacting with C conventions, yes, this is still an issue.

Right, this is just a C ABI issue.

Landed as r333791.

This change appears to have caused some blink vector math unit tests to fail on Windows. We are tracking it at https://crbug.com/849251.

It has a pretty small reproducer:

#include <immintrin.h>
__m256 loadit(__m256 *p) { return _mm256_loadu_ps((const float *)p); }

Compile for x86_64-windows-msvc with -mavx. Before this change we got this IR:
%0 = load <8 x float>, <8 x float>* %p, align 1
After this change we get this IR:
%0 = load <8 x float>, <8 x float>* %p, align 32

This is surprising. I'll keep debugging.

It's the typedef alignment changes that are causing problems for us, not the MaxVectorAlign changes. That makes more sense. The new alignment attribute breaks our implementation of _mm256_loadu_ps, because the packed struct ends up with a 32-byte alignment. Here's the implementation:

static __inline __m256 __DEFAULT_FN_ATTRS
_mm256_loadu_ps(float const *__p)
{
  struct __loadu_ps {
    __m256 __v;
  } __attribute__((__packed__, __may_alias__));
  return ((struct __loadu_ps*)__p)->__v;
}

And clang's -fdump-record-layouts says:

*** Dumping AST Record Layout
         0 | struct __loadu_ps
         0 |   __m256 __v
           | [sizeof=32, align=32]

I think the problem is that __attribute__((aligned(N))) beats __attribute__((packed)) on Windows to match MSVC's behavior with __declspec(align(N)).

I think we should revert this for now. Adding the alignment attribute to all Intel vector typedefs is a bigger change than it seems.
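The packed-versus-aligned interaction described above can be probed directly. This is a sketch, not the header's implementation: `V8`, `PackedV8`, `packed_wrapper_align`, and `load_unaligned` are hypothetical names, and the `memcpy`-based load is a swapped-in, layout-independent alternative to the packed-struct trick.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for __m256 carrying the explicit alignment that
// this patch added to the Intel vector typedefs.
typedef float V8 __attribute__((vector_size(32), aligned(32)));

// Same shape as the __loadu_ps wrapper quoted above.
struct __attribute__((packed, may_alias)) PackedV8 {
  V8 v;
};

// Expected to be 1 where packed overrides the typedef's alignment; the
// thread reports 32 for Clang targeting x86_64-windows-msvc.
constexpr std::size_t packed_wrapper_align() { return alignof(PackedV8); }

// An unaligned load that does not depend on how the record is laid out.
inline void load_unaligned(const float *p, V8 *out) {
  std::memcpy(out, p, sizeof(V8));
}
```

If `packed_wrapper_align()` is larger than 1, a load through `PackedV8 *` is emitted as an aligned load, which is the failure mode reported for `_mm256_loadu_ps`.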

In D46042#1121648, @rnk wrote:

I think we should revert this for now. Adding the alignment attribute to all Intel vector typedefs is a bigger change than it seems.

Ugh. That is just an awful language rule. Would it be reasonable to restrict it to only attributes spelled with __declspec(align(N)) rather than __attribute__((aligned(N))), or is that too invasive in the alignment computation?

In D46042#1121674, @rjmccall wrote:

I think we should revert this for now. Adding the alignment attribute to all Intel vector typedefs is a bigger change than it seems.

Ugh. That is just an awful language rule. Would it be reasonable to restrict it to only attributes spelled with __declspec(align(N)) rather than __attribute__((aligned(N))), or is that too invasive in the alignment computation?

When we were working on the record layout code, I didn't want to do that because users often structure their portability headers to check for __clang__ first because clang also defines _MSC_VER and __GNUC__. I felt it would be best if the alignment attributes were as interchangeable as possible. They are very common.

Maybe checking the spelling of the packing attribute would work better. The GCC __attribute__ spelling would ignore what we called "required alignment", meaning alignment required by explicit attributes and not the normal alignof.

By the way, I went ahead and reverted this in r333958.

rnk mentioned this in D57961: [X86] Add explicit alignment to __m128/__m128i/__m128d/etc. to allow matching of MSVC behavior with #pragma pack. Feb 8 2019, 10:37 AM

Revision Contents

Path    Size

lib/Basic/Targets/OSTargets.h    3 lines
lib/Basic/Targets/X86.h    7 lines
lib/CodeGen/CGBuiltin.cpp    35 lines
lib/Headers/avx512fintrin.h    42 lines
lib/Headers/avxintrin.h    37 lines
test/CodeGen/arm-swiftcall.c    4 lines
test/CodeGen/vector-alignment.c    84 lines
test/CodeGenCXX/align-avx-complete-objects.cpp    6 lines

Diff 143862

lib/Basic/Targets/OSTargets.h

Context not available.
  }

  this->MCountName = "\01mcount";
+
+ // Cap vector alignment at 16 bytes for all Darwin platforms.
+ this->MaxVectorAlign = 128;
  }

  std::string isValidSectionSpecifier(StringRef SR) const override {
Context not available.

lib/Basic/Targets/X86.h

Context not available.
  LongDoubleWidth = 128;
  LongDoubleAlign = 128;
  SuitableAlign = 128;
- MaxVectorAlign = 256;
  // The watchOS simulator uses the builtin bool type for Objective-C.
  llvm::Triple T = llvm::Triple(Triple);
  if (T.isWatchOS())
Context not available.
  if (!DarwinTargetInfo<X86_32TargetInfo>::handleTargetFeatures(Features,
                                                                Diags))
    return false;
- // We now know the features we have: we can decide how to align vectors.
- MaxVectorAlign =
-     hasFeature("avx512f") ? 512 : hasFeature("avx") ? 256 : 128;
  return true;
  }
  };
Context not available.
  if (!DarwinTargetInfo<X86_64TargetInfo>::handleTargetFeatures(Features,
                                                                Diags))
    return false;
- // We now know the features we have: we can decide how to align vectors.
- MaxVectorAlign =
-     hasFeature("avx512f") ? 512 : hasFeature("avx") ? 256 : 128;
  return true;
  }
  };
Context not available.
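The effect of the two header changes on the Darwin x86 cap can be summarized in a few lines. This is a sketch with hypothetical helper names; the values are in bits, as in the `MaxVectorAlign` assignments shown in the diff.

```cpp
// Before the change: the cap followed the enabled target features, so the
// same vector type could have a different alignment in different
// translation units.
constexpr unsigned old_darwin_x86_max_vector_align(bool has_avx,
                                                   bool has_avx512f) {
  return has_avx512f ? 512 : has_avx ? 256 : 128;
}

// After the change: one cap for all Darwin platforms, 128 bits (16 bytes).
constexpr unsigned new_darwin_max_vector_align() { return 128; }

static_assert(old_darwin_x86_max_vector_align(false, false) ==
                  new_darwin_max_vector_align(),
              "only the AVX and AVX-512 configurations change");
```

Code built without AVX sees no difference; the ABI break mentioned in the summary is confined to builds that enable AVX or AVX-512.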

lib/CodeGen/CGBuiltin.cpp

Context not available.
  case X86::BI__builtin_ia32_movdqa32store128_mask:
  case X86::BI__builtin_ia32_movdqa64store128_mask:
  case X86::BI__builtin_ia32_storeaps128_mask:
  case X86::BI__builtin_ia32_storeapd128_mask:
+   return EmitX86MaskedStore(*this, Ops, 16);
+
  case X86::BI__builtin_ia32_movdqa32store256_mask:
  case X86::BI__builtin_ia32_movdqa64store256_mask:
  case X86::BI__builtin_ia32_storeaps256_mask:
  case X86::BI__builtin_ia32_storeapd256_mask:
+   return EmitX86MaskedStore(*this, Ops, 32);
+
  case X86::BI__builtin_ia32_movdqa32store512_mask:
  case X86::BI__builtin_ia32_movdqa64store512_mask:
  case X86::BI__builtin_ia32_storeaps512_mask:
- case X86::BI__builtin_ia32_storeapd512_mask: {
-   unsigned Align =
-       getContext().getTypeAlignInChars(E->getArg(1)->getType()).getQuantity();
-   return EmitX86MaskedStore(*this, Ops, Align);
- }
+ case X86::BI__builtin_ia32_storeapd512_mask:
+   return EmitX86MaskedStore(*this, Ops, 64);
  case X86::BI__builtin_ia32_loadups128_mask:
  case X86::BI__builtin_ia32_loadups256_mask:
  case X86::BI__builtin_ia32_loadups512_mask:
Context not available.

  case X86::BI__builtin_ia32_loadss128_mask:
  case X86::BI__builtin_ia32_loadsd128_mask:
+ case X86::BI__builtin_ia32_loadaps128_mask:
+ case X86::BI__builtin_ia32_loadapd128_mask:
+ case X86::BI__builtin_ia32_movdqa32load128_mask:
+ case X86::BI__builtin_ia32_movdqa64load128_mask:
    return EmitX86MaskedLoad(*this, Ops, 16);

- case X86::BI__builtin_ia32_loadaps128_mask:
  case X86::BI__builtin_ia32_loadaps256_mask:
- case X86::BI__builtin_ia32_loadaps512_mask:
- case X86::BI__builtin_ia32_loadapd128_mask:
  case X86::BI__builtin_ia32_loadapd256_mask:
- case X86::BI__builtin_ia32_loadapd512_mask:
- case X86::BI__builtin_ia32_movdqa32load128_mask:
  case X86::BI__builtin_ia32_movdqa32load256_mask:
- case X86::BI__builtin_ia32_movdqa32load512_mask:
- case X86::BI__builtin_ia32_movdqa64load128_mask:
  case X86::BI__builtin_ia32_movdqa64load256_mask:
- case X86::BI__builtin_ia32_movdqa64load512_mask: {
-   unsigned Align =
-       getContext().getTypeAlignInChars(E->getArg(1)->getType()).getQuantity();
-   return EmitX86MaskedLoad(*this, Ops, Align);
- }
+   return EmitX86MaskedLoad(*this, Ops, 32);
+
+ case X86::BI__builtin_ia32_loadaps512_mask:
+ case X86::BI__builtin_ia32_loadapd512_mask:
+ case X86::BI__builtin_ia32_movdqa32load512_mask:
+ case X86::BI__builtin_ia32_movdqa64load512_mask:
+   return EmitX86MaskedLoad(*this, Ops, 64);

  case X86::BI__builtin_ia32_vbroadcastf128_pd256:
  case X86::BI__builtin_ia32_vbroadcastf128_ps256: {
Context not available.

lib/Headers/avx512fintrin.h

Context not available.
  #ifndef __AVX512FINTRIN_H
  #define __AVX512FINTRIN_H

- typedef char __v64qi __attribute__((__vector_size__(64)));
- typedef short __v32hi __attribute__((__vector_size__(64)));
- typedef double __v8df __attribute__((__vector_size__(64)));
- typedef float __v16sf __attribute__((__vector_size__(64)));
- typedef long long __v8di __attribute__((__vector_size__(64)));
- typedef int __v16si __attribute__((__vector_size__(64)));
+ typedef char __v64qi __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef short __v32hi __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef double __v8df __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef float __v16sf __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef long long __v8di __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef int __v16si __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));

  /* Unsigned types */
- typedef unsigned char __v64qu __attribute__((__vector_size__(64)));
- typedef unsigned short __v32hu __attribute__((__vector_size__(64)));
- typedef unsigned long long __v8du __attribute__((__vector_size__(64)));
- typedef unsigned int __v16su __attribute__((__vector_size__(64)));
+ typedef unsigned char __v64qu __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef unsigned short __v32hu __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef unsigned long long __v8du __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef unsigned int __v16su __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));

- typedef float __m512 __attribute__((__vector_size__(64)));
- typedef double __m512d __attribute__((__vector_size__(64)));
- typedef long long __m512i __attribute__((__vector_size__(64)));
+ typedef float __m512 __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef double __m512d __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));
+ typedef long long __m512i __attribute__((__vector_size__(64))) __attribute__((__aligned__(64)));

  typedef unsigned char __mmask8;
  typedef unsigned short __mmask16;
Context not available.
  static __inline void __DEFAULT_FN_ATTRS
  _mm512_store_pd(void *__P, __m512d __A)
  {
-   *(__m512d*)__P = __A;
+   *(__m512d *) __P = __A;
  }

  static __inline void __DEFAULT_FN_ATTRS
Context not available.
  static __inline void __DEFAULT_FN_ATTRS
  _mm512_store_ps(void *__P, __m512 __A)
  {
-   *(__m512*)__P = __A;
+   *(__m512 *) __P = __A;
  }

  static __inline void __DEFAULT_FN_ATTRS
Context not available.
  static __inline__ void __DEFAULT_FN_ATTRS
  _mm512_stream_si512 (__m512i * __P, __m512i __A)
  {
-   typedef __v8di __v8di_aligned __attribute__((aligned(64)));
-   __builtin_nontemporal_store((__v8di_aligned)__A, (__v8di_aligned*)__P);
+   __builtin_nontemporal_store((__v8di)__A, (__v8di*)__P);
  }

  static __inline__ __m512i __DEFAULT_FN_ATTRS
  _mm512_stream_load_si512 (void const *__P)
  {
-   typedef __v8di __v8di_aligned __attribute__((aligned(64)));
-   return (__m512i) __builtin_nontemporal_load((const __v8di_aligned *)__P);
+   return (__m512i) __builtin_nontemporal_load((const __v8di *)__P);
  }

  static __inline__ void __DEFAULT_FN_ATTRS
  _mm512_stream_pd (double *__P, __m512d __A)
  {
-   typedef __v8df __v8df_aligned __attribute__((aligned(64)));
-   __builtin_nontemporal_store((__v8df_aligned)__A, (__v8df_aligned*)__P);
+   __builtin_nontemporal_store((__v8df)__A, (__v8df*)__P);
  }

  static __inline__ void __DEFAULT_FN_ATTRS
  _mm512_stream_ps (float *__P, __m512 __A)
  {
-   typedef __v16sf __v16sf_aligned __attribute__((aligned(64)));
-   __builtin_nontemporal_store((__v16sf_aligned)__A, (__v16sf_aligned*)__P);
+   __builtin_nontemporal_store((__v16sf)__A, (__v16sf*)__P);
  }

  static __inline__ __m512d __DEFAULT_FN_ATTRS
Context not available.

lib/Headers/avxintrin.h

Context not available.
  #ifndef __AVXINTRIN_H
  #define __AVXINTRIN_H

- typedef double __v4df __attribute__ ((__vector_size__ (32)));
- typedef float __v8sf __attribute__ ((__vector_size__ (32)));
- typedef long long __v4di __attribute__ ((__vector_size__ (32)));
- typedef int __v8si __attribute__ ((__vector_size__ (32)));
- typedef short __v16hi __attribute__ ((__vector_size__ (32)));
- typedef char __v32qi __attribute__ ((__vector_size__ (32)));
+ typedef double __v4df __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef float __v8sf __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef long long __v4di __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef int __v8si __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef short __v16hi __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef char __v32qi __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));

  /* Unsigned types */
- typedef unsigned long long __v4du __attribute__ ((__vector_size__ (32)));
- typedef unsigned int __v8su __attribute__ ((__vector_size__ (32)));
- typedef unsigned short __v16hu __attribute__ ((__vector_size__ (32)));
- typedef unsigned char __v32qu __attribute__ ((__vector_size__ (32)));
+ typedef unsigned long long __v4du __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef unsigned int __v8su __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef unsigned short __v16hu __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef unsigned char __v32qu __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));

  /* We need an explicitly signed variant for char. Note that this shouldn't
   * appear in the interface though. */
- typedef signed char __v32qs __attribute__((__vector_size__(32)));
+ typedef signed char __v32qs __attribute__((__vector_size__(32))) __attribute__((__aligned__(32)));

- typedef float __m256 __attribute__ ((__vector_size__ (32)));
- typedef double __m256d __attribute__((__vector_size__(32)));
- typedef long long __m256i __attribute__((__vector_size__(32)));
+ typedef float __m256 __attribute__ ((__vector_size__ (32))) __attribute__((__aligned__(32)));
+ typedef double __m256d __attribute__((__vector_size__(32))) __attribute__((__aligned__(32)));
+ typedef long long __m256i __attribute__((__vector_size__(32))) __attribute__((__aligned__(32)));

  /* Define the default attributes for the functions in this file. */
  #define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("avx")))
Context not available.
  static __inline void __DEFAULT_FN_ATTRS
  _mm256_stream_si256(__m256i *__a, __m256i __b)
  {
-   typedef __v4di __v4di_aligned __attribute__((aligned(32)));
-   __builtin_nontemporal_store((__v4di_aligned)__b, (__v4di_aligned*)__a);
+   __builtin_nontemporal_store((__v4di)__b, (__v4di*)__a);
  }

  /// \brief Moves double-precision values from a 256-bit vector of [4 x double]
Context not available.
  static __inline void __DEFAULT_FN_ATTRS
  _mm256_stream_pd(double *__a, __m256d __b)
  {
-   typedef __v4df __v4df_aligned __attribute__((aligned(32)));
-   __builtin_nontemporal_store((__v4df_aligned)__b, (__v4df_aligned*)__a);
+   __builtin_nontemporal_store((__v4df)__b, (__v4df*)__a);
  }

  /// \brief Moves single-precision floating point values from a 256-bit vector
Context not available.
  static __inline void __DEFAULT_FN_ATTRS
  _mm256_stream_ps(float *__p, __m256 __a)
  {
-   typedef __v8sf __v8sf_aligned __attribute__((aligned(32)));
-   __builtin_nontemporal_store((__v8sf_aligned)__a, (__v8sf_aligned*)__p);
+   __builtin_nontemporal_store((__v8sf)__a, (__v8sf*)__p);
  }

  /* Create vectors */
Context not available.

test/CodeGen/arm-swiftcall.c

Context not available.
  typedef double double4 __attribute__((ext_vector_type(4)));
  typedef int int3 __attribute__((ext_vector_type(3)));
  typedef int int4 __attribute__((ext_vector_type(4)));
- typedef int int5 __attribute__((ext_vector_type(5)));
- typedef int int8 __attribute__((ext_vector_type(8)));
+ typedef int int5 __attribute__((ext_vector_type(5))) __attribute__((aligned(32)));
+ typedef int int8 __attribute__((ext_vector_type(8))) __attribute__((aligned(32)));
  typedef char char16 __attribute__((ext_vector_type(16)));
  typedef short short8 __attribute__((ext_vector_type(8)));
  typedef long long long2 __attribute__((ext_vector_type(2)));
Context not available.

test/CodeGen/vector-alignment.c

  // RUN: %clang_cc1 -w -triple x86_64-apple-darwin10 \
- // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=SSE
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_SSE
  // RUN: %clang_cc1 -w -triple i386-apple-darwin10 \
- // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=SSE
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_SSE
  // RUN: %clang_cc1 -w -triple x86_64-apple-darwin10 -target-feature +avx \
- // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX
  // RUN: %clang_cc1 -w -triple i386-apple-darwin10 -target-feature +avx \
- // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX
  // RUN: %clang_cc1 -w -triple x86_64-apple-darwin10 -target-feature +avx512f \
- // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX512
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX512
  // RUN: %clang_cc1 -w -triple i386-apple-darwin10 -target-feature +avx512f \
- // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=AVX512
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_AVX512
+ // RUN: %clang_cc1 -w -triple armv7-apple-ios10 \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_ARM32
+ // RUN: %clang_cc1 -w -triple arm64-apple-ios10 \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=DARWIN_ARM64
+
+ // RUN: %clang_cc1 -w -triple x86_64-pc-linux \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC
+ // RUN: %clang_cc1 -w -triple i386-pc-linux \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC
+ // RUN: %clang_cc1 -w -triple x86_64-pc-linux -target-feature +avx \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC
+ // RUN: %clang_cc1 -w -triple i386-pc-linux -target-feature +avx \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC
+ // RUN: %clang_cc1 -w -triple x86_64-pc-linux -target-feature +avx512f \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC
+ // RUN: %clang_cc1 -w -triple i386-pc-linux -target-feature +avx512f \
+ // RUN: -emit-llvm -o - %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC

  // rdar://11759609

  // At or below target max alignment with no aligned attribute should align based
  // on the size of vector.
  double __attribute__((vector_size(16))) v1;
- // SSE: @v1 {{.*}}, align 16
- // AVX: @v1 {{.*}}, align 16
- // AVX512: @v1 {{.*}}, align 16
+ // DARWIN_SSE: @v1 {{.*}}, align 16
+ // DARWIN_AVX: @v1 {{.*}}, align 16
+ // DARWIN_AVX512: @v1 {{.*}}, align 16
+ // DARWIN_ARM32: @v1 {{.*}}, align 16
+ // DARWIN_ARM64: @v1 {{.*}}, align 16
+ // GENERIC: @v1 {{.*}}, align 16
  double __attribute__((vector_size(32))) v2;
- // SSE: @v2 {{.*}}, align 16
- // AVX: @v2 {{.*}}, align 32
- // AVX512: @v2 {{.*}}, align 32
+ // DARWIN_SSE: @v2 {{.*}}, align 16
+ // DARWIN_AVX: @v2 {{.*}}, align 16
+ // DARWIN_AVX512: @v2 {{.*}}, align 16
+ // DARWIN_ARM32: @v2 {{.*}}, align 16
+ // DARWIN_ARM64: @v2 {{.*}}, align 16
+ // GENERIC: @v2 {{.*}}, align 32

  // Alignment above target max alignment with no aligned attribute should align
  // based on the target max.
  double __attribute__((vector_size(64))) v3;
- // SSE: @v3 {{.*}}, align 16
- // AVX: @v3 {{.*}}, align 32
- // AVX512: @v3 {{.*}}, align 64
+ // DARWIN_SSE: @v3 {{.*}}, align 16
+ // DARWIN_AVX: @v3 {{.*}}, align 16
+ // DARWIN_AVX512: @v3 {{.*}}, align 16
+ // DARWIN_ARM32: @v3 {{.*}}, align 16
+ // DARWIN_ARM64: @v3 {{.*}}, align 16
+ // GENERIC: @v3 {{.*}}, align 64
  double __attribute__((vector_size(1024))) v4;
- // SSE: @v4 {{.*}}, align 16
- // AVX: @v4 {{.*}}, align 32
- // AVX512: @v4 {{.*}}, align 64
+ // DARWIN_SSE: @v4 {{.*}}, align 16
+ // DARWIN_AVX: @v4 {{.*}}, align 16
+ // DARWIN_AVX512: @v4 {{.*}}, align 16
+ // DARWIN_ARM32: @v4 {{.*}}, align 16
+ // DARWIN_ARM64: @v4 {{.*}}, align 16
+ // GENERIC: @v4 {{.*}}, align 1024

  // Aliged attribute should always override.
  double __attribute__((vector_size(16), aligned(16))) v5;
Context not available.

  // Check non-power of 2 widths.
  double __attribute__((vector_size(24))) v9;
- // SSE: @v9 {{.*}}, align 16
- // AVX: @v9 {{.*}}, align 32
- // AVX512: @v9 {{.*}}, align 32
+ // DARWIN_SSE: @v9 {{.*}}, align 16
+ // DARWIN_AVX: @v9 {{.*}}, align 16
+ // DARWIN_AVX512: @v9 {{.*}}, align 16
+ // DARWIN_ARM32: @v9 {{.*}}, align 16
+ // DARWIN_ARM64: @v9 {{.*}}, align 16
+ // GENERIC: @v9 {{.*}}, align 32
  double __attribute__((vector_size(40))) v10;
- // SSE: @v10 {{.*}}, align 16
- // AVX: @v10 {{.*}}, align 32
- // AVX512: @v10 {{.*}}, align 64
+ // DARWIN_SSE: @v10 {{.*}}, align 16
+ // DARWIN_AVX: @v10 {{.*}}, align 16
+ // DARWIN_AVX512: @v10 {{.*}}, align 16
+ // DARWIN_ARM32: @v10 {{.*}}, align 16
+ // DARWIN_ARM64: @v10 {{.*}}, align 16
+ // GENERIC: @v10 {{.*}}, align 64

  // Check non-power of 2 widths with aligned attribute.
  double __attribute__((vector_size(24), aligned(64))) v11;
Context not available.
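The expectations in this test are consistent with a simple rule: round the vector size up to a power of two, then clamp to the target's cap. The sketch below is a model inferred from the CHECK lines, not Clang's actual implementation; `next_pow2` and `vector_align_bytes` are hypothetical names, and a cap of 0 is assumed to mean "no cap".

```cpp
// Smallest power of two that is >= v (C++11-friendly recursive constexpr).
constexpr unsigned next_pow2(unsigned v, unsigned p = 1) {
  return p >= v ? p : next_pow2(v, p * 2);
}

// Default alignment in bytes for a vector of size_bytes when no aligned
// attribute is present. max_align_bits is the target cap in bits.
constexpr unsigned vector_align_bytes(unsigned size_bytes,
                                      unsigned max_align_bits) {
  return (max_align_bits != 0 && next_pow2(size_bytes) * 8 > max_align_bits)
             ? max_align_bits / 8
             : next_pow2(size_bytes);
}

// Darwin after this change: everything is capped at 16 bytes.
static_assert(vector_align_bytes(32, 128) == 16, "v2 on Darwin");
// Uncapped target: non-power-of-2 sizes round up.
static_assert(vector_align_bytes(24, 0) == 32, "v9 on GENERIC");
```

Plugging in the old feature-dependent caps (256 or 512 bits) reproduces the removed AVX and AVX512 expectations, for example `vector_align_bytes(64, 256) == 32` for `v3`.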

test/CodeGenCXX/align-avx-complete-objects.cpp

Context not available.
  return r[0];
  }

- // CHECK: [[R:%.*]] = alloca <8 x float>, align 32
+ // CHECK: [[R:%.*]] = alloca <8 x float>, align 16
  // CHECK-NEXT: [[CALL:%.*]] = call i8* @_Znwm(i64 32)
  // CHECK-NEXT: [[ZERO:%.*]] = bitcast i8* [[CALL]] to <8 x float>*
  // CHECK-NEXT: store <8 x float>* [[ZERO]], <8 x float>** [[P:%.*]], align 8
Context not available.
  // CHECK-NEXT: store volatile <8 x float> [[TWO]], <8 x float>* [[THREE]], align 16
  // CHECK-NEXT: [[FOUR:%.*]] = load <8 x float>*, <8 x float>** [[P]], align 8
  // CHECK-NEXT: [[FIVE:%.*]] = load volatile <8 x float>, <8 x float>* [[FOUR]], align 16
- // CHECK-NEXT: store <8 x float> [[FIVE]], <8 x float>* [[R]], align 32
- // CHECK-NEXT: [[SIX:%.*]] = load <8 x float>, <8 x float>* [[R]], align 32
+ // CHECK-NEXT: store <8 x float> [[FIVE]], <8 x float>* [[R]], align 16
+ // CHECK-NEXT: [[SIX:%.*]] = load <8 x float>, <8 x float>* [[R]], align 16
  // CHECK-NEXT: [[VECEXT:%.*]] = extractelement <8 x float> [[SIX]], i32 0
  // CHECK-NEXT: ret float [[VECEXT]]

Context not available.