This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix bug when X86 stackify pass handle one ArgFPRW.
Needs Review · Public

Authored by LuoYuanke on Jun 17 2021, 12:39 AM.

Details

Summary

Here is the scenario that causes the compiler to crash.

successors: %bb.26
liveins: $r14
ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw
renamable $rdi = MOV64ri @.str.3.16422
renamable $rdx = LEA64r %stack.6, 1, $noreg, 0, $noreg
ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
CALL64pcrel32 @foo, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def dead $fp0
renamable $xmm0 = MOVSDrm_alt %stack.10, 1, $noreg, 0, $noreg :: (load 8 from %stack.10)
ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
renamable $fp2 = CHS_Fp80 killed undef renamable $fp0, implicit-def $fpsw
JMP_1 %bb.26

CALL64pcrel32 marks fp0 dead, so LLVM frees the stack slot for fp0
and the FP stack becomes empty. The later CHS_Fp80 instruction uses
the undefined register fp0. The original code assumes there must be a
stack slot for the source register (fp0) without taking into account
that it is undefined, so LLVM reports an error. The fix is to check
whether the source register is undefined when the stack is empty. If
it is, advance one stack slot to avoid operating on an empty stack.
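
The bookkeeping described in the summary can be sketched with a toy model. This is not the code in the X86 FP stackifier; the names ToyFPStack, handleOneArgRW and runScenario are hypothetical, and the model ignores details such as moving the source to the top of the stack. It only shows why a one-argument instruction with an undef source fails once the call has freed fp0, and how reserving a slot avoids that.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only: tracks which FP registers currently own a stack slot.
struct ToyFPStack {
  std::vector<unsigned> Slots; // Slots.back() plays the role of st(0)

  bool isLive(unsigned Reg) const {
    return std::find(Slots.begin(), Slots.end(), Reg) != Slots.end();
  }
  void push(unsigned Reg) { Slots.push_back(Reg); }
  void freeReg(unsigned Reg) {
    Slots.erase(std::remove(Slots.begin(), Slots.end(), Reg), Slots.end());
  }
};

enum class Result { Ok, SourceNotOnStack };

// Models a one-argument read/write instruction such as CHS_Fp80.
// SrcIsUndef mirrors the undef flag on the source operand.
// TolerateUndef selects the behaviour proposed in this revision (assumption:
// the unpatched code simply reports an error when the source has no slot).
Result handleOneArgRW(ToyFPStack &S, unsigned Src, unsigned Dst,
                      bool SrcIsUndef, bool TolerateUndef) {
  if (!S.isLive(Src)) {
    if (!(SrcIsUndef && TolerateUndef))
      return Result::SourceNotOnStack;
    S.push(Src); // reserve a slot for the undefined source
  }
  // The instruction works in place, so the slot now holds the result.
  std::replace(S.Slots.begin(), S.Slots.end(), Src, Dst);
  return Result::Ok;
}

// Replays the scenario: the call defines fp0 as dead, then CHS reads undef fp0.
Result runScenario(bool TolerateUndef) {
  ToyFPStack S;
  S.push(0);    // the call returns a value in fp0
  S.freeReg(0); // fp0 is marked dead, so its slot is freed
  assert(S.Slots.empty());
  return handleOneArgRW(S, /*Src=*/0, /*Dst=*/2, /*SrcIsUndef=*/true,
                        TolerateUndef);
}
```

Calling runScenario(false) models the crash path, and runScenario(true) models the proposed fix.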

Event Timeline

LuoYuanke created this revision. Jun 17 2021, 12:39 AM
LuoYuanke requested review of this revision. Jun 17 2021, 12:39 AM

How did you get a CHS with an undef input?

craig.topper added inline comments. Jun 17 2021, 5:05 PM
llvm/test/CodeGen/X86/fpstack-call.mir
37

Isn't this just going to crash in hardware? Hardware also checks stack empty.

craig.topper added inline comments. Jun 17 2021, 5:09 PM
llvm/test/CodeGen/X86/fpstack-call.mir
37

I guess hardware would just make up a value if the exception is masked?

I got the MIR from a big application which is built with LTO.

llvm/test/CodeGen/X86/fpstack-call.mir
37

Good catch. I will insert an fld0 so that the FP stack is non-empty at run time.

LuoYuanke updated this revision to Diff 352986. Jun 18 2021, 5:51 AM

Address Craig's comments.

Can you get IR and use bugpoint to reduce it? I'd really like to see the backend codegen that led to this case.

I'd also like to know what happens if you add "(fneg undef) -> undef" fold to DAGCombiner::visitFNEG.

I did some more digging and it looks like ISD::UNDEF for X86 should be turned into ConstantFP<0> by LegalizeDAG. So I really need more information about how we got here.

The .ll file is about 422 MB and the reduction is slow. bugpoint has been running for 18 hours and still has not finished, so I need more time. Is there any way to parallelize bugpoint to speed up the reduction?

Did you run llvm-extract to isolate the broken function first? bugpoint is not good at that.

Thanks Craig for the suggestion. After running "llvm-extract --recursive" to get a small file, I can't reproduce the issue. However, I used -print-after-all to dump the IR after each pass. The undef value is created in the "processimpdefs" pass.

From

bb.11 (%ir-block.61):
; predecessors: %bb.10
  successors: %bb.26

  %164:gr64 = MOV64ri @.str.3.16422
  %165:gr32 = MOV32r0 implicit-def $eflags
  %166:gr64 = SUBREG_TO_REG 0, %165:gr32, %subreg.sub_32bit
  %167:gr64 = LEA64r %stack.6, 1, $noreg, 0, $noreg
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %164:gr64
  $rsi = COPY %166:gr64
  $rdx = COPY %167:gr64
  CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $fp0
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  %168:rfp80 = COPY $fp0
  %162:rfp80 = IMPLICIT_DEF
  %163:rfp80 = CHS_Fp80 %162:rfp80, implicit-def $fpsw
  JMP_1 %bb.26

To

bb.11 (%ir-block.61):
; predecessors: %bb.10
  successors: %bb.26

  %164:gr64 = MOV64ri @.str.3.16422
  %165:gr32 = MOV32r0 implicit-def $eflags
  %166:gr64 = SUBREG_TO_REG 0, %165:gr32, %subreg.sub_32bit
  %167:gr64 = LEA64r %stack.6, 1, $noreg, 0, $noreg
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %164:gr64
  $rsi = COPY %166:gr64
  $rdx = COPY %167:gr64
  CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $fp0
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  %168:rfp80 = COPY $fp0
  %163:rfp80 = CHS_Fp80 undef %162:rfp80, implicit-def $fpsw
  JMP_1 %bb.26

Is this transform reasonable? "%162:rfp80 = IMPLICIT_DEF" is generated in ISel. I will look into why "%162:rfp80 = IMPLICIT_DEF" is generated.

Here is IR.

61:                                               ; preds = %59
  %62 = call contract x86_fp80 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_(i8* getelementptr inbounds ([29 x i8], [29 x i8]* @.str.3.16422, i64 0, i64 0), i8* null, %"struct.std::__atomic_flag_base"* nonnull align 1 dereferenceable(1) %9)
  %63 = fneg contract x86_fp80 undef
  br label %106

craig.topper added a comment. Edited Jun 19 2021, 9:55 PM

I don't know what llvm-extract --recursive does. I've only used -func to extract a single function I knew caused a compiler crash.

Converting IMPLICIT_DEF to undef flag is correct.

IMPLICIT_DEF can get created from ISD::UNDEF, but as far as I could see ISD::UNDEF for fp80 is supposed to Expand to ConstantFP 0.

I also think fneg of undef should be folded by getNode when it is created in SelectionDAGBuilder. Is this going through fast isel or something?

In my small test case extracted by llvm-extract, "fneg contract x86_fp80 undef" is lowered to "LD_Fp080". But in the big case that causes the crash, it is lowered to "IMPLICIT_DEF" and "CHS_Fp80".

76:                                               ; preds = %74
  %77 = call contract x86_fp80 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_(i8* getelementptr inbounds ([29 x i8], [29 x i8]* @.str.3.16422, i64 0, i64 0), i8* null, %"struct.std::__atomic_flag_base"* nonnull align 1 dereferenceable(1) %9)
  %78 = fneg contract x86_fp80 undef
  br label %121
CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $rsp, implicit-def $ssp, implicit-def $fp0
ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
%118:rfp80 = COPY $fp0
%13:rfp80 = nofpexcept LD_Fp080 implicit-def dead $fpsw, implicit $fpcw

BTW, do you think the patch to handle the undef case in the stackify pass is reasonable?

No, I do not. I have no reason to believe fneg or one-arg instructions are the only things that could be affected. We need to understand why this is happening, because isel was trying to prevent this.

Ok, I'll continue to debug it in isel.

Are you running the large and small cases the same way? Have you looked at the SelectionDAG debug logs for the affected basic block in the large case?

I was able to trigger the error with llc -O2 -fast-isel on a simple test. So that is a path to create this but you haven’t answered if fast isel is being used in your case.

Not exactly the same way. For the large case, I run through ld.lld with options like "-plugin-opt=mcpu=x86-64 -plugin-opt=O3". For the small case, I collect the -plugin-opt options and run llc with them. Since I use O3, I don't think it runs FastISel. However, I'll check FastISel in the large case.

Do you have a patch to fix fast ISel, so that I can verify your patch in my large case?

With -O0, the small case also generates "IMPLICIT_DEF" and "CHS_Fp80". I think we are close to the root cause. Stay tuned.

ProcessImplicitDefs doesn’t run with O0 and FP stackifier has special code for IMPLICIT_DEF.

This is because I set opt-bisect-limit=67436 on my command line. When CurBisectNum expires, the "DAG to DAG" pass lowers its opt level to O0. However, "processimpdefs" and "X86 FP Stackifier" are not skipped when CurBisectNum expires, so the undefined fp0 is generated.

if (OptLevel != CodeGenOpt::None && skipFunction(Fn))
  NewOptLevel = CodeGenOpt::None;
OptLevelChanger OLC(*this, NewOptLevel);
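
The interaction described above can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: ToyPass, OnExpire and runPipeline are hypothetical names, and the model assumes, as the comment says, that instruction selection consults the bisect counter and drops to O0 once it expires, while the later passes do not consult it at all.

```cpp
#include <cassert>
#include <string>
#include <vector>

// How a toy pass reacts to the bisect limit (hypothetical model).
enum class OnExpire {
  LowerToO0,    // consults the counter and drops to O0 once it has expired
  NotConsulted  // never asks the counter, so it always runs as usual
};

struct ToyPass {
  const char *Name;
  OnExpire Policy;
};

// Runs the passes in order and records what each one did.
std::vector<std::string> runPipeline(const std::vector<ToyPass> &Passes,
                                     int Limit) {
  int Counter = 0; // incremented only by passes that consult the limit
  std::vector<std::string> Log;
  for (const ToyPass &P : Passes) {
    if (P.Policy == OnExpire::NotConsulted) {
      Log.push_back(std::string(P.Name) + ": runs unconditionally");
      continue;
    }
    bool Expired = ++Counter > Limit;
    Log.push_back(std::string(P.Name) +
                  (Expired ? ": runs at O0"
                           : ": runs at the requested opt level"));
  }
  return Log;
}
```

With a limit of 0 and the three passes from the comment, the selector runs at O0 while the other two still run, which is the combination that produced the undef fp0 operand.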

Does this fix your test?

diff --git a/llvm/lib/Target/X86/X86FastISel.cpp b/llvm/lib/Target/X86/X86FastISel.cpp
index 44670a9..3e5d45b 100644
--- a/llvm/lib/Target/X86/X86FastISel.cpp
+++ b/llvm/lib/Target/X86/X86FastISel.cpp
@@ -3842,6 +3842,30 @@ unsigned X86FastISel::fastMaterializeConstant(const Constant *C) {
     return X86MaterializeFP(CFP, VT);
   else if (const GlobalValue *GV = dyn_cast<GlobalValue>(C))
     return X86MaterializeGV(GV, VT);
+  else if (isa<UndefValue>(C)) {
+    unsigned Opc = 0;
+    switch (VT.SimpleTy) {
+    default:
+      break;
+    case MVT::f32:
+      if (!X86ScalarSSEf32)
+        Opc = X86::LD_Fp032;
+      break;
+    case MVT::f64:
+      if (!X86ScalarSSEf64)
+        Opc = X86::LD_Fp064;
+      break;
+    case MVT::f80:
+      Opc = X86::LD_Fp080;
+      break;
+    }
+
+    if (Opc) {
+      Register ResultReg = createResultReg(TLI.getRegClassFor(VT));
+      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);
+      return ResultReg;
+    }
+  }
 
   return 0;
 }
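
The opcode choice made by the diff above can be restated as a standalone function. This is a sketch, not LLVM code: ToyVT, ToyOpc and zeroLoadForUndef are hypothetical stand-ins for the MVT cases, the X86 opcodes and the X86ScalarSSEf32/X86ScalarSSEf64 flags used in the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the value types and opcodes in the patch.
enum class ToyVT { f32, f64, f80, other };
enum class ToyOpc { None, LD_Fp032, LD_Fp064, LD_Fp080 };

// Picks the x87 load-zero opcode used to materialize an undef FP constant.
// Returns None when the type is not handled on the x87 stack, in which case
// the caller falls through to its existing "return 0" path.
ToyOpc zeroLoadForUndef(ToyVT VT, bool ScalarSSEf32, bool ScalarSSEf64) {
  switch (VT) {
  case ToyVT::f32:
    return ScalarSSEf32 ? ToyOpc::None : ToyOpc::LD_Fp032;
  case ToyVT::f64:
    return ScalarSSEf64 ? ToyOpc::None : ToyOpc::LD_Fp064;
  case ToyVT::f80:
    return ToyOpc::LD_Fp080; // f80 always lives on the x87 stack
  case ToyVT::other:
    break;
  }
  return ToyOpc::None;
}
```

The point of the design is that an undef x87 value becomes a real load of zero, so no undef FP operand reaches the stackifier from this path.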

Yes, it fixes the issue. Thank you! One question: is there any other reason to prevent undef float values in MIR? Otherwise the stackify pass could support undef values by inserting an fld0 instruction, so that the ISel passes don't all have to handle them specially.

I'm not sure. To do it in the stackifier you need to do it any time the undef flag is present regardless of whether StackTop is 0. You instead would need to check whether the register is already present in the stack and only insert if it isn't. But there may be some complications with removing it from the stack later. I think removing things from the stack is based on kill flags, but I don't know if the undef would have a kill flag. So I guess you'd have to remember you inserted it and immediately remove it after the instruction? You would need to do this for any FP instruction not just ArgFPRW.
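
The alternative outlined in this comment can be sketched as a toy model. It is not a proposal for the real pass: ToyOperand, ToyStackifier and handleReadOnly are hypothetical, the emitted strings are placeholders rather than real mnemonics, and the model only covers an instruction that reads its operands. Where a result would go is exactly the complication the comment raises, and it is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct ToyOperand {
  unsigned Reg;
  bool IsUndef; // mirrors the undef flag on a MIR operand
};

struct ToyStackifier {
  std::vector<unsigned> Stack;      // registers that currently own a slot
  std::vector<std::string> Emitted; // placeholder instruction names

  bool onStack(unsigned Reg) const {
    return std::find(Stack.begin(), Stack.end(), Reg) != Stack.end();
  }

  // Handles one FP instruction that only reads its source operands.
  void handleReadOnly(const std::string &Name,
                      const std::vector<ToyOperand> &Srcs) {
    std::vector<unsigned> Placeholders;
    for (const ToyOperand &Op : Srcs) {
      // Check whether the register is present, not whether the stack is empty.
      if (Op.IsUndef && !onStack(Op.Reg)) {
        Emitted.push_back("push_zero");
        Stack.push_back(Op.Reg);
        Placeholders.push_back(Op.Reg);
      }
    }
    Emitted.push_back(Name);
    // No kill flag is assumed for the undef operand, so the placeholders
    // are remembered and removed right after the instruction.
    for (auto It = Placeholders.rbegin(); It != Placeholders.rend(); ++It) {
      Emitted.push_back("pop");
      Stack.erase(std::remove(Stack.begin(), Stack.end(), *It), Stack.end());
    }
  }
};
```

Even in this reduced form the pass needs extra state per instruction, which is part of why fixing the problem at instruction selection looks simpler.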

That makes sense. Fixing it in ISel is easier. Do you mind if I create another patch in Phabricator based on yours, or would you prefer to finish the patch yourself?

You can take my patch

Thanks Craig. I created another patch at https://reviews.llvm.org/D104678.