As the test case shows, if a pass constant-fold the generated
llvm.ppc.cfence(%tmp) to llvm.ppc.cfence(0), cfence should lower to
nothing, instead of crash later in legalization.
I agree with this, but I'm wondering if it could/should be generalized a bit more.
The triple we normally use is powerpc64le-unknown-linux-gnu.
I dag a bit more and found the root cause, which is fixed in D33573. This patch, however only makes things faster, not more correct.
The optimization opportunity comes from that, a pass proves at compile time, that the load instruction itself doesn't synchronize with any stores, therefore it's Ok to change the parameter to @llvm.ppc.cfence to something else.
The interesting bits are "at compile time". This basically limits the optimization to ConstantSDNode only - other nodes represent behaviors at runtime, and are unlikely to be proven not synchronized with any stores whatsoever.
I suggest to keep these two cases (constant and others), until we find other pessimizations. Does it make sense?
The problem is not PPC specific, so I'm going to abandon this patch, in order to not demotivate the correct fix. Someone should do it at IR level, when it actually pessimize their code. :)