The baseline allowsMemoryAccess() is wrong for X86.
It assumes that aligned memory operations are always allowed,
but that is not true.
For example, without AVX2 we cannot perform a 32-byte aligned non-temporal load
of a 32-byte vector, yet allowsMemoryAccess() will say it is allowed. So we may
end up merging non-temporal loads, only to split them up again to legalize them,
and then the cycle repeats.
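To make the mismatch concrete, here is a minimal, self-contained model of the two decisions. MemAccess, Subtarget, baselineAllows and x86Allows are hypothetical names made up for illustration. They are not LLVM's TargetLowering interface, and treating "aligned" as "alignment >= access size" is a simplification.

```cpp
#include <cassert>

// Hypothetical, simplified model of a memory access query (not LLVM's API).
// "Aligned" here simply means the alignment is at least the access size.
struct MemAccess {
  unsigned SizeInBytes;   // size of the accessed vector
  unsigned AlignInBytes;  // known alignment of the address
  bool IsLoad;
  bool NonTemporal;       // access carries a non-temporal hint
};

struct Subtarget {
  bool HasAVX2;
};

// The baseline assumption criticized above: any aligned access is allowed.
bool baselineAllows(const MemAccess &A) {
  return A.AlignInBytes >= A.SizeInBytes;
}

// X86-aware variant: a 32-byte non-temporal load (VMOVNTDQA ymm) needs AVX2,
// however well the address is aligned. All other cases fall back to the
// baseline rule; this sketch does not try to model them.
bool x86Allows(const MemAccess &A, const Subtarget &ST) {
  if (A.IsLoad && A.NonTemporal && A.SizeInBytes == 32 && !ST.HasAVX2)
    return false;
  return baselineAllows(A);
}

// The case from the comment: an AVX-only target and a 32-byte aligned
// non-temporal load of a 32-byte vector.
void checkNonTemporalExample() {
  const MemAccess NTLoad{32, 32, /*IsLoad=*/true, /*NonTemporal=*/true};
  const Subtarget AVX1Only{/*HasAVX2=*/false};
  assert(baselineAllows(NTLoad));        // baseline says "allowed"
  assert(!x86Allows(NTLoad, AVX1Only));  // but no single instruction exists
}
```

If the merging logic asked something like x86Allows instead of the baseline rule, it would not form the 32-byte non-temporal load on an AVX-only target in the first place, so there would be nothing for legalization to split.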
(style) isMemoryAccessFast sounds a little better