According to the gcc headers, intel intrinsics docs and msdn codegen the _mm_store1_ps/_mm_store1_pd (and their _mm_store_ps1/_mm_store_pd1 analogues) should require an aligned pointer - the clang headers are the only implementation I can find that assume non-aligned stores (by storing with _mm_storeu_ps/_mm_storeu_pd).
This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead.
I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).
As a followup I'll update the llvm fast-isel tests to match this codegen.
You could use __attribute__((align_value(16))) no?