As the test case shows, if a pass constant-fold the generated
llvm.ppc.cfence(%tmp) to llvm.ppc.cfence(0), cfence should lower to
nothing, instead of crash later in legalization.
Details
Diff Detail
- Build Status
Buildable 6735 Build 6735: arc lint + arc unit
Event Timeline
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
8220 | I agree with this, but I'm wondering if it could/should be generalized a bit more. | |
llvm/test/CodeGen/PowerPC/atomics-constant.ll | ||
4 | The triple we normally use is powerpc64le-unknown-linux-gnu. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
8220 | I dag a bit more and found the root cause, which is fixed in D33573. This patch, however only makes things faster, not more correct. The optimization opportunity comes from that, a pass proves at compile time, that the load instruction itself doesn't synchronize with any stores, therefore it's Ok to change the parameter to @llvm.ppc.cfence to something else. The interesting bits are "at compile time". This basically limits the optimization to ConstantSDNode only - other nodes represent behaviors at runtime, and are unlikely to be proven not synchronized with any stores whatsoever. I suggest to keep these two cases (constant and others), until we find other pessimizations. Does it make sense? |
LGTM.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
8220 | Yes, that's makes sense. Thanks for looking into this. |
The problem is not PPC specific, so I'm going to abandon this patch, in order to not demotivate the correct fix. Someone should do it at IR level, when it actually pessimize their code. :)
I agree with this, but I'm wondering if it could/should be generalized a bit more.
Are there other cases of SDNodes that we could reach here that are not valid?
Alternatively, is it possible to enumerate the legal SD nodes that could be here and remove everything that is not legal?