- The reason why CSE ignored COPY instructions is PerformTrivialCoalescing: if a COPY were first inserted into the CSE hash table and PerformTrivialCoalescing later erased it with MI->eraseFromParent(), the hash table would be left holding a stale entry. The main idea of this patch is to run PerformTrivialCoalescing before the instruction is inserted into the hash table.
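To make the hazard concrete, here is a small self-contained C++ toy model (my own names and containers, not MachineCSE's data structures); it only illustrates why "coalesce first, insert into the hash afterwards" is the safe ordering:

  // Toy model of the ordering problem described above -- not LLVM code.
  #include <cassert>
  #include <list>
  #include <string>
  #include <unordered_map>

  struct Instr {
    std::string Expr; // stand-in for the instruction's CSE expression
  };

  int main() {
    std::list<Instr> Block{{"COPY %vreg0:sub_32bit"}};  // "basic block"
    std::unordered_map<std::string, Instr *> Hash;      // CSE hash table

    // Broken ordering: hash the COPY first, then let trivial coalescing
    // erase it. The hash now maps the expression to a deleted instruction,
    // so any later CSE hit would follow a dangling pointer.
    Hash.emplace(Block.front().Expr, &Block.front());
    Block.pop_front();                 // the MI->eraseFromParent() of the text
    Hash.clear();                      // discard the stale entry for the demo

    // Safe ordering (the idea of this patch): try trivial coalescing first,
    // and only insert the COPY if it survived.
    Block.push_front({"COPY %vreg0:sub_32bit"});
    bool ErasedByCoalescing = true;    // pretend coalescing removed it
    if (ErasedByCoalescing)
      Block.pop_front();
    else
      Hash.emplace(Block.front().Expr, &Block.front());

    assert(Hash.empty());              // nothing stale left behind
    return 0;
  }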
- cross-regclass copy:
We are developing a backend for a new architecture that has independent GPR and ADDRRegs register classes. There we end up with redundant MOVs between these disjoint register classes:
move r0.l, a0.l <<<<<<<<<<<<
store r2.d, (a0.l)
move r0.l, a0.l <<<<<<<<<<<<
store r4.d, (a0.l+8)
With this patch, CSE removes the second MOV of this kind; the expected result is shown right below.
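For illustration only (same hypothetical target syntax as above, not an actual dump), the code after CSE keeps a single copy into a0.l that both stores reuse:
move r0.l, a0.l
store r2.d, (a0.l)
store r4.d, (a0.l+8)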
- subreg copy: CodeGen/X86/cse-add-with-overflow.ll still passes because CSE now eliminates the second "add" and its corresponding COPYs, while "Simple Register Coalescing" coalesces the COPYs feeding the first "add" (this division of work seems natural, so I removed the FIXME in PerformTrivialCoalescing):
//-------- dump using patch: -print-after-all:

# *** IR Dump After Machine Loop Invariant Code Motion ***:
# Machine code for function redundantadd: SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %RSI
        %vreg3<def> = COPY %RSI; GR64:%vreg3
        %vreg2<def> = COPY %RDI; GR64:%vreg2
        %vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64:%vreg0,%vreg2
        %vreg1<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64:%vreg1,%vreg3
        %vreg4<def> = COPY %vreg0:sub_32bit; GR32:%vreg4 GR64:%vreg0
        %vreg5<def> = COPY %vreg1:sub_32bit; GR32:%vreg5 GR64:%vreg1
        %vreg6<def,tied1> = ADD32rr %vreg4<tied0>, %vreg5<kill>, %EFLAGS<imp-def>; GR32:%vreg6,%vreg4,%vreg5
        JNO_4 <BB#2>, %EFLAGS<imp-use>
        JMP_4 <BB#1>
    Successors according to CFG: BB#1(1) BB#2(1048575)

BB#1: derived from LLVM BB %exit2
    Predecessors according to CFG: BB#0

BB#2: derived from LLVM BB %return
    Predecessors according to CFG: BB#0
>>>>>>>>>%vreg7<def> = COPY %vreg0:sub_32bit; GR32:%vreg7 GR64:%vreg0
>>>>>>>>>%vreg8<def> = COPY %vreg1:sub_32bit; GR32:%vreg8 GR64:%vreg1
>>>>>>>>>%vreg9<def,tied1> = ADD32rr %vreg8<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8,%vreg7
        %vreg10<def> = SUBREG_TO_REG 0, %vreg9<kill>, 4; GR64:%vreg10 GR32:%vreg9
        %RAX<def> = COPY %vreg10; GR64:%vreg10
        RETQ %RAX

# End machine code for function redundantadd.

# *** IR Dump After Machine Common Subexpression Elimination ***:
# Machine code for function redundantadd: SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %RSI
        %vreg3<def> = COPY %RSI; GR64:%vreg3
        %vreg2<def> = COPY %RDI; GR64:%vreg2
        %vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64:%vreg0,%vreg2
        %vreg1<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64:%vreg1,%vreg3
        %vreg4<def> = COPY %vreg0:sub_32bit; GR32:%vreg4 GR64:%vreg0
        %vreg5<def> = COPY %vreg1:sub_32bit; GR32:%vreg5 GR64:%vreg1
        %vreg6<def,tied1> = ADD32rr %vreg4<tied0>, %vreg5, %EFLAGS<imp-def>; GR32:%vreg6,%vreg4,%vreg5
        JNO_4 <BB#2>, %EFLAGS<imp-use>
        JMP_4 <BB#1>
    Successors according to CFG: BB#1(1) BB#2(1048575)

BB#1: derived from LLVM BB %exit2
    Predecessors according to CFG: BB#0

BB#2: derived from LLVM BB %return
    Predecessors according to CFG: BB#0
        %vreg10<def> = SUBREG_TO_REG 0, %vreg6, 4; GR64:%vreg10 GR32:%vreg6
        %RAX<def> = COPY %vreg10; GR64:%vreg10
        RETQ %RAX

# End machine code for function redundantadd.

.........................................
.........................................
# *** IR Dump After Live Interval Analysis ***:
# Machine code for function redundantadd: Post SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

0B      BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %RSI
16B         %vreg3<def> = COPY %RSI; GR64:%vreg3
32B         %vreg2<def> = COPY %RDI; GR64:%vreg2
48B         %vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64:%vreg0,%vreg2
64B         %vreg1<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64:%vreg1,%vreg3
>>>>>>>>>>80B       %vreg4<def> = COPY %vreg0:sub_32bit; GR32:%vreg4 GR64:%vreg0
>>>>>>>>>>96B       %vreg5<def> = COPY %vreg1:sub_32bit; GR32:%vreg5 GR64:%vreg1
>>>>>>>>>>112B      %vreg6<def> = COPY %vreg5; GR32:%vreg6,%vreg5
128B        %vreg6<def,tied1> = ADD32rr %vreg6<tied0>, %vreg4, %EFLAGS<imp-def>; GR32:%vreg6,%vreg4
144B        JNO_4 <BB#2>, %EFLAGS<imp-use,kill>
160B        JMP_4 <BB#1>
    Successors according to CFG: BB#1(1) BB#2(1048575)

176B    BB#1: derived from LLVM BB %exit2
    Predecessors according to CFG: BB#0

192B    BB#2: derived from LLVM BB %return
    Predecessors according to CFG: BB#0
208B        %vreg10<def> = SUBREG_TO_REG 0, %vreg6, 4; GR64:%vreg10 GR32:%vreg6
224B        %RAX<def> = COPY %vreg10; GR64:%vreg10
240B        RETQ %RAX<kill>

# End machine code for function redundantadd.

# *** IR Dump After Simple Register Coalescing ***:
# Machine code for function redundantadd: Post SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

0B      BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %RSI
16B         %vreg3<def> = COPY %RSI; GR64:%vreg3
32B         %vreg2<def> = COPY %RDI; GR64:%vreg2
48B         %vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64_with_sub_8bit:%vreg0 GR64:%vreg2
64B         %vreg10<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64_with_sub_8bit:%vreg10 GR64:%vreg3
128B        %vreg10:sub_32bit<def,tied1> = ADD32rr %vreg10:sub_32bit<tied0>, %vreg0:sub_32bit, %EFLAGS<imp-def>; GR64_with_sub_8bit:%vreg10,%vreg0
144B        JNO_4 <BB#2>, %EFLAGS<imp-use,kill>
160B        JMP_4 <BB#1>
    Successors according to CFG: BB#1(1) BB#2(1048575)

176B    BB#1: derived from LLVM BB %exit2
    Predecessors according to CFG: BB#0

192B    BB#2: derived from LLVM BB %return
    Predecessors according to CFG: BB#0
224B        %RAX<def> = COPY %vreg10; GR64_with_sub_8bit:%vreg10
240B        RETQ %RAX<kill>

# End machine code for function redundantadd.
I also modified cse-add-with-overflow.ll to minimize the test.
- problem with CodeGen/X86/inline-asm-fpstack.ll:
//-------- dump without patch: -print-after-all:
# *** IR Dump After Machine Loop Invariant Code Motion ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
        %FP0<def> = LD_Fp32m80 %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
>>>>>>>%ST0<def> = COPY %FP0
        INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
>>>>>>>%ST0<def> = COPY %FP0<kill>
        INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
        RETL

# End machine code for function testPR4185b.

# *** IR Dump After Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
        LD_F32m %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
        INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
        INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
        RETL

# End machine code for function testPR4185b.
//-------- dump using patch: -print-after-all:

# *** IR Dump After Machine Loop Invariant Code Motion ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
        %FP0<def> = LD_Fp32m80 %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
>>>>>>>>>%ST0<def> = COPY %FP0<kill>
        INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
        INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
        RETL

# End machine code for function testPR4185b.

# *** IR Dump After Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
        LD_F32m %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
        INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
>>>>>>>>???ST_FPrr %ST0, %FPSW<imp-def>
>>>>>>>>???LD_F0 %FPSW<imp-def>
        INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
        RETL

# End machine code for function testPR4185b.
I am still investigating why the dump after "Prologue/Epilogue Insertion & Frame Finalization" now includes
        ST_FPrr %ST0, %FPSW<imp-def>
        LD_F0 %FPSW<imp-def>
and will follow up as soon as possible. For now the patch provides the option -cse-ignore-copy, with a corresponding FIXME, to disable CSE of COPY instructions.
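A minimal sketch of how such an escape hatch is typically wired up, assuming the flag lives in MachineCSE.cpp; the description string and the helper shouldSkipCopyForCSE are placeholders of mine, not the patch's actual code:

  // Sketch only -- the flag name comes from the text above, everything
  // else is illustrative.
  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/Support/CommandLine.h"
  using namespace llvm;

  static cl::opt<bool> CSEIgnoreCopy(
      "cse-ignore-copy", cl::Hidden, cl::init(false),
      cl::desc("Do not treat COPY instructions as CSE candidates"));

  // FIXME: remove this option once the extra ST_FPrr/LD_F0 after PEI in
  // inline-asm-fpstack.ll is understood.
  static bool shouldSkipCopyForCSE(const MachineInstr &MI) { // hypothetical helper
    return CSEIgnoreCopy && MI.isCopy();
  }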