XRayFileHeader storage was obtained from std::aligned_storage
using its default alignment and not the struct's alignment
requirement. This was causing a bus error on AArch32, on armv8
machines, where vld1.64/vst1.64 instructions with 128-bit
alignment requirement were being used to copy XRayFileHeader.
There is still another issue with fdr-single-thread.cpp test on
armv7. Now it runs until completion and produces a valid log file,
but for some reason the function name appears as _end in it,
instead of the expected mangled fn name.
Random driveby: why do we mix <cfoo> and <foo.h>?