Because of the N->use_empty() check, the root node of the DAG has no uses and so always misses the peephole optimization.
A dummy node is needed here.
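The effect can be reproduced in a toy model. The sketch below is not the LLVM code: ToyNode, ToyDAG, peepholeCandidates and storeIsVisited are hypothetical names invented for illustration, and the loop mirrors only the use_empty() skip, not the real folding logic. The extra "dummy" user stands in for whatever node the patch uses to hold the root (in LLVM this would presumably be a HandleSDNode, but that is an assumption here).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SDNode: it only tracks its operands and use count.
struct ToyNode {
    std::string name;
    std::vector<std::size_t> operands; // indices of operand nodes
    int numUses;
    bool use_empty() const { return numUses == 0; }
};

// Hypothetical stand-in for the DAG: nodes are stored in topological order.
struct ToyDAG {
    std::vector<ToyNode> nodes;

    // Adds a node and bumps the use count of each of its operands.
    std::size_t add(const std::string& name, std::vector<std::size_t> ops = {}) {
        for (std::size_t op : ops) ++nodes[op].numUses;
        nodes.push_back(ToyNode{name, std::move(ops), 0});
        return nodes.size() - 1;
    }
};

// Walks the nodes from last to first and skips every node without uses,
// which is the same shape of check as N->use_empty() in the review.
std::vector<std::string> peepholeCandidates(const ToyDAG& dag) {
    std::vector<std::string> visited;
    for (std::size_t i = dag.nodes.size(); i-- > 0;) {
        const ToyNode& n = dag.nodes[i];
        if (n.use_empty()) continue; // the root lands here when nothing uses it
        visited.push_back(n.name);
    }
    return visited;
}

// Builds "sd" using "addi_lo" and reports whether the store is considered.
bool storeIsVisited(bool withDummy) {
    ToyDAG dag;
    std::size_t lo = dag.add("addi_lo");
    std::size_t store = dag.add("sd", {lo}); // the store is the root
    if (withDummy) dag.add("dummy", {store}); // the dummy gives the root a use
    for (const std::string& name : peepholeCandidates(dag))
        if (name == "sd") return true;
    return false;
}
```

In this model storeIsVisited(false) reports that the store is skipped, while storeIsVisited(true) reports that it is considered, because the dummy node is now the only node without uses.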
Unit Tests
Time | Test | |
---|---|---|
60,040 ms | x64 debian > MLIR.Examples/standalone::test.toy | |
60,050 ms | x64 debian > libFuzzer.libFuzzer::large.test | |
Event Timeline
llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll | ||
---|---|---|
183 ↗ | (On Diff #409461) | This was already generated. If you use i32 loads/stores with @g_4 then both RV32 and RV64 see codegen differences. Using an i64 on RV32 gets a TokenFactor to glue together the splitting of the illegal i64 store into two legal i32 stores and so gives you a root node that's not one of the nodes you want to optimise. |
194–195 ↗ | (On Diff #409461) | Without this patch, these two lines were three instructions: `addi a0, a0, %lo(g_8)`, `addi a1, a1, 1`, `sd a1, 0(a0)`. That is, the %lo wasn't folded into the store's immediate because the store was the root node. |
I suppose we inherited this bug from PowerPC. @nemanjai maybe you want to fix this for PowerPC?
llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll | ||
---|---|---|
198 ↗ | (On Diff #409461) | Unless you feed the IR through opt, this isn't actually needed; you can just have entry's contents be if.then. The key thing is that the ret isn't in the same basic block as the store, as it would otherwise be the root for everything with a chain. Maybe it's a good idea to keep the dummy compare though, so the test is safe against optimisation silently folding the ret back into the basic block; as it stands, passing this through opt does nothing. |
X86 is the same from the looks of it. The only other implementation of PostprocessISelDAG is AMDGPU which does things a bit differently and doesn't seem to have an equivalent "check for uses".
With the current post-processing on X86 I don't think you could get a failure. None of the opcodes that are being looked for have chain outputs, so they can't be the root.