Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result. The details are quite complex and hard to
determine statically, since the triggering sequence may involve branches in
some circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.
The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions, and the
simplest way to do this is to insert a NOP between the two.
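
As a rough illustration of the NOP insertion described above, here is a
minimal sketch over a straight-line instruction list. The Kind, Inst and
insertErratumNops names are hypothetical and are not taken from the patch.
The sketch assumes instructions are already classified and ignores branches
and basic-block boundaries, which a real pass would have to handle.

```cpp
#include <string>
#include <vector>

// Hypothetical instruction categories; a real backend would classify
// instructions by inspecting their opcodes.
enum class Kind { Memory, MulAcc64, Nop, Other };

struct Inst {
  Kind kind;
  std::string text;  // Assembly text, kept only for readability.
};

// Returns a copy of `code` with a NOP inserted between any memory
// instruction (load, store, or prefetch) and a 64-bit multiply-accumulate
// that immediately follows it. Straight-line code only.
std::vector<Inst> insertErratumNops(const std::vector<Inst> &code) {
  std::vector<Inst> out;
  out.reserve(code.size());
  for (const Inst &inst : code) {
    if (inst.kind == Kind::MulAcc64 && !out.empty() &&
        out.back().kind == Kind::Memory) {
      out.push_back({Kind::Nop, "nop"});
    }
    out.push_back(inst);
  }
  return out;
}
```

With this sketch, a load followed directly by a multiply-accumulate becomes
load, nop, multiply-accumulate, while a multiply-accumulate that follows any
other kind of instruction is left untouched.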
This patch implements such a work-around. The work-around code generation is
not enabled by default: it is only enabled when specifying the clang command
line option -mfix-cortex-a53-835769 or the LLVM backend option
-aarch64-fix-cortex-a53-835769.
I still maintain what I said internally: the name of the pass should fit convention. The two lines above follow a convention of putting "A5*" after AArch64. Can you please change the name to fit that convention, unless you have a compelling reason otherwise?