This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix bug when X86 stackify pass handle one ArgFPRW.
Needs Review · Public

Authored by LuoYuanke on Jun 17 2021, 12:39 AM.

Details

Reviewers

craig.topper
pengfei
RKSimon
wxiao3

Summary

Here is the scenario that causes the compiler to crash.

successors: %bb.26
liveins: $r14
ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw
renamable $rdi = MOV64ri @.str.3.16422
renamable $rdx = LEA64r %stack.6, 1, $noreg, 0, $noreg
ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
CALL64pcrel32 @foo, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def dead $fp0
renamable $xmm0 = MOVSDrm_alt %stack.10, 1, $noreg, 0, $noreg :: (load 8 from %stack.10)
ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
renamable $fp2 = CHS_Fp80 killed undef renamable $fp0, implicit-def $fpsw
JMP_1 %bb.26

The CALL64pcrel32 marks fp0 as dead, so LLVM frees the stack slot for fp0
and the FP stack becomes empty. The later CHS_Fp80 instruction uses the
undefined register fp0, but the original code assumes there must be a stack
slot for the source register (fp0) without considering that it may be
undefined, so LLVM reports a fatal error. The fix is to check whether the
source register is undefined when the stack is empty. If it is undefined,
push one stack slot so that the stack is no longer empty.
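
The scenario in the summary can be sketched with a toy model of the stackifier's bookkeeping. This is only an illustration under stated assumptions: `ToyFPStack`, `callWithDeadResult`, and `oneArgKilled` are hypothetical names, and the emitted strings are just labels, not calls into the real X86FloatingPoint.cpp code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only: names and structure are illustrative, not the real FPS pass.
struct ToyFPStack {
  std::vector<int> Slots;            // virtual FP registers; back() models st(0)
  std::vector<std::string> Emitted;  // labels of instructions the toy pass emits

  // A call defines Reg, but the def is marked dead, so it is popped at once.
  void callWithDeadResult(int Reg) {
    Slots.push_back(Reg);
    Emitted.push_back("ST_FPrr $st0");
    Slots.pop_back();
  }

  // One-argument read/write op (such as CHS) whose source operand is killed.
  // Returns false where the pre-patch code would report a fatal error.
  bool oneArgKilled(int Src, int Dst, bool SrcIsUndef) {
    if (Slots.empty()) {
      if (!SrcIsUndef)
        return false;               // "Stack cannot be empty!"
      Emitted.push_back("LD_F0");   // load 0.0 to stand in for the undef source
      Slots.push_back(Src);
    } else {
      // Bring the source to the top of the stack, exchanging if necessary.
      std::size_t Idx = 0;
      while (Idx < Slots.size() && Slots[Idx] != Src)
        ++Idx;
      if (Idx == Slots.size())
        return false;               // source is not on the stack at all
      if (Idx + 1 != Slots.size()) {
        std::swap(Slots[Idx], Slots.back());
        Emitted.push_back("XCH_F");
      }
    }
    Slots.back() = Dst;             // the result replaces the killed source
    Emitted.push_back("CHS_F");
    return true;
  }
};
```

In this model, a dead call result followed by a CHS on an undef source yields the label sequence ST_FPrr, LD_F0, CHS_F, which is the shape the proposed test checks for; with a non-undef source on an empty stack the function reports failure instead.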

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	8,029 ms	x64 debian > libarcher.races::lock-unrelated.c

Event Timeline

LuoYuanke created this revision. · Jun 17 2021, 12:39 AM

Herald added subscribers: pengfei, hiraditya. · Jun 17 2021, 12:39 AM

LuoYuanke requested review of this revision. · Jun 17 2021, 12:39 AM

Herald added a project: Restricted Project. · Jun 17 2021, 12:39 AM

Herald added a subscriber: llvm-commits.

LuoYuanke added reviewers: craig.topper, pengfei, RKSimon. · Jun 17 2021, 12:40 AM

Harbormaster completed remote builds in B109653: Diff 352632. · Jun 17 2021, 6:37 AM

LuoYuanke added a reviewer: wxiao3. · Jun 17 2021, 6:37 AM

How did you get a CHS with an undef input?

craig.topper added inline comments.Jun 17 2021, 5:05 PM

llvm/test/CodeGen/X86/fpstack-call.mir
38	Isn't this just going to crash in hardware? Hardware also checks stack empty.

craig.topper added inline comments.Jun 17 2021, 5:09 PM

llvm/test/CodeGen/X86/fpstack-call.mir
38	I guess hardware would just make up a value if the exception is masked?

In D104440#2825925, @craig.topper wrote:

How did you get a CHS with an undef input?

I got the MIR from a big application which is built with LTO.

llvm/test/CodeGen/X86/fpstack-call.mir
38	Good catch. I would insert an fld0 to make the FP stack non-empty at run time.

Address Craig's comments.

Harbormaster completed remote builds in B109915: Diff 352986. · Jun 18 2021, 8:13 PM

Can you get IR and use bugpoint to reduce it? I'd really like to see the backend codegen that led to this case.

I'd also like to know what happens if you add a "(fneg undef) -> undef" fold to DAGCombiner::visitFNEG.

I did some more digging and it looks like ISD::UNDEF for X86 should be turned into ConstantFP<0> by LegalizeDAG. So I really need more information about how we got here.

In D104440#2828675, @craig.topper wrote:

Can you get IR and use bugpoint to reduce it? I'd really like to see the backend codegen that led to this case.

The .ll file is about 422 MB, and the reduction is slow: bugpoint has been running for 18 hours and has still not finished, so I need more time. Is there any way to parallelize bugpoint to speed up the reduction?

Did you run llvm-extract to isolate the broken function first? bugpoint is not good at that.

Thanks, Craig, for the suggestion. After running "llvm-extract --recursive" to get a small file, I can't reproduce the issue. However, I used -print-after-all to dump the IR after each pass, and the undefined value is created in the "processimpdefs" pass.

From

bb.11 (%ir-block.61):
; predecessors: %bb.10
  successors: %bb.26

  %164:gr64 = MOV64ri @.str.3.16422
  %165:gr32 = MOV32r0 implicit-def $eflags
  %166:gr64 = SUBREG_TO_REG 0, %165:gr32, %subreg.sub_32bit
  %167:gr64 = LEA64r %stack.6, 1, $noreg, 0, $noreg
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %164:gr64
  $rsi = COPY %166:gr64
  $rdx = COPY %167:gr64
  CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $fp0
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  %168:rfp80 = COPY $fp0
  %162:rfp80 = IMPLICIT_DEF
  %163:rfp80 = CHS_Fp80 %162:rfp80, implicit-def $fpsw
  JMP_1 %bb.26

To

bb.11 (%ir-block.61):
; predecessors: %bb.10
  successors: %bb.26

  %164:gr64 = MOV64ri @.str.3.16422
  %165:gr32 = MOV32r0 implicit-def $eflags
  %166:gr64 = SUBREG_TO_REG 0, %165:gr32, %subreg.sub_32bit
  %167:gr64 = LEA64r %stack.6, 1, $noreg, 0, $noreg
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %164:gr64
  $rsi = COPY %166:gr64
  $rdx = COPY %167:gr64
  CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $fp0
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  %168:rfp80 = COPY $fp0
  %163:rfp80 = CHS_Fp80 undef %162:rfp80, implicit-def $fpsw
  JMP_1 %bb.26

Is this transformation reasonable? "%162:rfp80 = IMPLICIT_DEF" is generated in ISel. I will look into why it is generated.

Here is the IR.

61:                                               ; preds = %59
  %62 = call contract x86_fp80 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_(i8* getelementptr inbounds ([29 x i8], [29 x i8]* @.str.3.16422, i64 0, i64 0), i8* null, %"struct.std::__atomic_flag_base"* nonnull align 1 dereferenceable(1) %9)
  %63 = fneg contract x86_fp80 undef
  br label %106

I don’t know what llvm-extract --recursive does. I’ve only used -func to extract a single function I knew caused a compiler crash.

Converting IMPLICIT_DEF to undef flag is correct.

IMPLICIT_DEF can get created from ISD::UNDEF, but as far as I could see, ISD::UNDEF for fp80 is supposed to Expand to ConstantFP 0.

I also think fneg of undef should be folded by getNode when it is created in SelectionDAGBuilder. Is this going through fast isel or something?

In my small test case extracted by llvm-extract, "fneg contract x86_fp80 undef" is lowered to "LD_Fp080". But in the big case that causes the crash, it is lowered to "IMPLICIT_DEF" and "CHS_Fp80".

76:                                               ; preds = %74
  %77 = call contract x86_fp80 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_(i8* getelementptr inbounds ([29 x i8], [29 x i8]* @.str.3.16422, i64 0, i64 0), i8* null, %"struct.std::__atomic_flag_base"* nonnull align 1 dereferenceable(1) %9)
  %78 = fneg contract x86_fp80 undef
  br label %121

CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $rsp, implicit-def $ssp, implicit-def $fp0
ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
%118:rfp80 = COPY $fp0
%13:rfp80 = nofpexcept LD_Fp080 implicit-def dead $fpsw, implicit $fpcw

BTW, do you think the patch to handle the undef case in the stackify pass is reasonable?

No, I do not. I have no reason to believe fneg or one-arg instructions are the only things that could be affected. We need to understand why this is happening because isel was trying to prevent this.

Ok, I'll continue to debug it in isel.

Are you running the large and small cases the same way? Have you looked at the SelectionDAG debug logs for the affected basic block in the large case?

I was able to trigger the error with llc -O2 -fast-isel on a simple test. So that is a path to create this but you haven’t answered if fast isel is being used in your case.

Not exactly the same way. For the large case, I run through ld.lld with options such as "-plugin-opt=mcpu=x86-64 -plugin-opt=O3". For the small case, I assemble the equivalent options from -plugin-opt and run it with llc. Since I use O3, I think it won't run FastISel. However, I'll check FastISel in the large case.

Do you have a patch to fix fast ISel, so that I can verify your patch in my large case?

With -O0, the small case can also generate "IMPLICIT_DEF" and "CHS_Fp80". I think we are close to the root cause. Stay tuned.

ProcessImplicitDefs doesn’t run with O0 and FP stackifier has special code for IMPLICIT_DEF.

I think because I use the

This is because I set opt-bisect-limit=67436 on my command line. Once CurBisectNum exceeds the limit, the "DAG to DAG" pass lowers its opt level to O0. However, "processimpdefs" and "X86 FP Stackifier" are not skipped when the limit is exceeded, so the undefined fp0 is generated.

if (OptLevel != CodeGenOpt::None && skipFunction(Fn))
  NewOptLevel = CodeGenOpt::None;
OptLevelChanger OLC(*this, NewOptLevel);
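
The interaction described in this comment can be sketched as a toy pipeline. It is a simplified model under assumptions, not LLVM's pass manager: `ToyBisect`, `ToyOptLevel`, and `runToyCodegen` are hypothetical names, and the claim that the two later passes keep running once the limit is exceeded is taken from the comment above rather than verified independently.

```cpp
#include <cassert>

enum class ToyOptLevel { None, Aggressive };

// Toy stand-in for the bisect counter: only the first Limit queries say "run".
struct ToyBisect {
  int Limit;
  int Count;
  explicit ToyBisect(int L) : Limit(L), Count(0) {}
  bool shouldRun() { return ++Count <= Limit; }
};

struct ToyPipelineResult {
  ToyOptLevel ISelLevel;   // level instruction selection actually used
  bool RanProcessImpDefs;  // did the IMPLICIT_DEF -> undef rewrite run?
  bool RanStackifier;      // did the FP stackifier run?
};

ToyPipelineResult runToyCodegen(ToyBisect &B, ToyOptLevel Requested) {
  ToyPipelineResult R{Requested, false, false};
  // Mirrors the quoted snippet: isel still runs when bisect says "skip",
  // but it drops to the unoptimized level.
  if (Requested != ToyOptLevel::None && !B.shouldRun())
    R.ISelLevel = ToyOptLevel::None;
  // The rest of the pipeline was built for the requested level and, per the
  // thread, these passes are not stopped by the bisect limit.
  R.RanProcessImpDefs = Requested != ToyOptLevel::None;
  R.RanStackifier = true;
  return R;
}
```

With a limit of 0 and an aggressive requested level, the model reports unoptimized isel combined with a processimpdefs run, which is the combination that produced the undef CHS operand in this thread.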

Does this fix your test?

diff --git a/llvm/lib/Target/X86/X86FastISel.cpp b/llvm/lib/Target/X86/X86FastISel.cpp
index 44670a9..3e5d45b 100644
--- a/llvm/lib/Target/X86/X86FastISel.cpp
+++ b/llvm/lib/Target/X86/X86FastISel.cpp
@@ -3842,6 +3842,30 @@ unsigned X86FastISel::fastMaterializeConstant(const Constant *C) {
     return X86MaterializeFP(CFP, VT);
   else if (const GlobalValue *GV = dyn_cast<GlobalValue>(C))
     return X86MaterializeGV(GV, VT);
+  else if (isa<UndefValue>(C)) {
+    unsigned Opc = 0;
+    switch (VT.SimpleTy) {
+    default:
+      break;
+    case MVT::f32:
+      if (!X86ScalarSSEf32)
+        Opc = X86::LD_Fp032;
+      break;
+    case MVT::f64:
+      if (!X86ScalarSSEf64)
+        Opc = X86::LD_Fp064;
+      break;
+    case MVT::f80:
+      Opc = X86::LD_Fp080;
+      break;
+    }
+
+    if (Opc) {
+      Register ResultReg = createResultReg(TLI.getRegClassFor(VT));
+      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);
+      return ResultReg;
+    }
+  }
 
   return 0;
 }

Yes, it fixes the test. Thank you! One question: is there any other reason to prevent undef float values in MIR? Otherwise the stackify pass could support undef values by inserting an fld0 instruction, so that the ISel passes don't all have to handle them specially.

I'm not sure. To do it in the stackifier you need to do it any time the undef flag is present regardless of whether StackTop is 0. You instead would need to check whether the register is already present in the stack and only insert if it isn't. But there may be some complications with removing it from the stack later. I think removing things from the stack is based on kill flags, but I don't know if the undef would have a kill flag. So I guess you'd have to remember you inserted it and immediately remove it after the instruction? You would need to do this for any FP instruction not just ArgFPRW.
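
The bookkeeping outlined in this comment might look roughly like the following toy sketch. It is one possible reading, not a proposed patch: `SketchStack` and `readUndefSource` are hypothetical names, the strings are labels rather than real instruction builders, and the sketch only covers a non-popping instruction that reads its source from the top of the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only; not the real stackifier data structures.
struct SketchStack {
  std::vector<int> Slots;            // virtual FP registers; back() models st(0)
  std::vector<std::string> Emitted;  // labels of emitted instructions

  bool isLive(int Reg) const {
    for (int R : Slots)
      if (R == Reg)
        return true;
    return false;
  }

  // Handle an instruction (label Op) whose source Src carries the undef flag.
  void readUndefSource(int Src, const std::string &Op) {
    bool Inserted = false;
    if (!isLive(Src)) {
      // Not on the stack, whether or not the stack is empty: load a zero.
      Emitted.push_back("LD_F0");
      Slots.push_back(Src);
      Inserted = true;
    } else if (Slots.back() != Src) {
      // Already on the stack but not on top: exchange it into st(0).
      for (std::size_t I = 0; I + 1 < Slots.size(); ++I)
        if (Slots[I] == Src) {
          std::swap(Slots[I], Slots.back());
          break;
        }
      Emitted.push_back("XCH_F");
    }
    Emitted.push_back(Op);
    if (Inserted) {
      // Remember the temporary and pop it immediately; no kill flag needed.
      Emitted.push_back("ST_FPrr $st0");
      Slots.pop_back();
    }
  }
};
```

The point of remembering `Inserted` is that the temporary slot is removed without relying on a kill flag, so the stack depth before and after the instruction is unchanged. A real implementation would need the same treatment for every FP instruction form, which is the cost the comment is pointing at.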

That makes sense. Fixing it in ISel is easier. Do you mind if I create another patch in Phabricator based on your patch, or would you prefer to finish the patch yourself?

You can take my patch

LuoYuanke mentioned this in D104678: [X86] Selecting fld0 for undefined value in fast ISEL.. · Jun 21 2021, 7:00 PM

Thanks, Craig. I created another patch at https://reviews.llvm.org/D104678.

LuoYuanke mentioned this in rG36003c20ada6: [X86] Selecting fld0 for undefined value in fast ISEL.. · Jun 25 2021, 5:43 PM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86FloatingPoint.cpp	15 lines
llvm/test/CodeGen/X86/fpstack-call.mir	53 lines

Diff 352986

llvm/lib/Target/X86/X86FloatingPoint.cpp

 #endif

   // Is this the last use of the source register?
   unsigned Reg = getFPReg(MI.getOperand(1));
   bool KillsSrc = MI.killsRegister(X86::FP0 + Reg);

   if (KillsSrc) {
     // If this is the last use of the source register, just make sure it's on
     // the top of the stack.
-    moveToTop(Reg, I);
-    if (StackTop == 0)
-      report_fatal_error("Stack cannot be empty!");
+    if (StackTop == 0) {
+      // If the stack is empty and the input is undefined fp register,
+      // insert fld0.
+      if (MI.getOperand(1).isUndef()) {
+        LLVM_DEBUG(dbgs() << "Emitting LD_F0 for undefined FP" << Reg << '\n');
+        BuildMI(*MBB, I, MI.getDebugLoc(), TII->get(X86::LD_F0));
+        pushReg(Reg);
+      } else
+        report_fatal_error("Stack cannot be empty!");
+    } else
+      moveToTop(Reg, I);
+
     --StackTop;
     pushReg(getFPReg(MI.getOperand(0)));
   } else {
     // If this is not the last use of the source register, _copy_ it to the top
     // of the stack.
     duplicateToTop(Reg, getFPReg(MI.getOperand(0)), I);
   }

llvm/test/CodeGen/X86/fpstack-call.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=x86_64-- -run-pass x86-codegen -verify-machineinstrs -mcpu=x86-64 -o - %s | FileCheck %s

				--- |
				@x = dso_local global i32 0, align 4
				define void @fpstack-empty() { ret void }
				declare void @foo()
				...
				---

				name: fpstack-empty
				tracksRegLiveness: true
				registers: []
				liveins:
				- { reg: '$r14', virtual-reg: '' }
				stack:
				- { id: 0, name: '', type: default, offset: 0, size: 8, alignment: 8,
				stack-id: default, callee-saved-register: '', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: default, offset: 0, size: 8, alignment: 8,
				stack-id: default, callee-saved-register: '', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				body: |
				bb.0 (%ir-block.0):
				liveins: $r14
				; CHECK-LABEL: name: fpstack-empty
				; CHECK: liveins: $r14
				; CHECK: ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw
				; CHECK: renamable $rdi = MOV64ri @x
				; CHECK: renamable $rdx = LEA64r %stack.0, 1, $noreg, 0, $noreg
				; CHECK: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
				; CHECK: dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
				; CHECK: CALL64pcrel32 @foo, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx
				; CHECK: ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw
				; CHECK: renamable $xmm0 = MOVSDrm_alt %stack.1, 1, $noreg, 0, $noreg :: (load 8 from %stack.1)
				; CHECK: ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
				; CHECK: LD_F0 implicit-def $fpsw, implicit $fpcw
				; CHECK: CHS_F implicit-def $fpsw
				craig.topper (inline comment): Isn't this just going to crash in hardware? Hardware also checks stack empty.
				craig.topper (inline comment): I guess hardware would just make up a value if the exception is masked?
				LuoYuanke (inline comment): Good catch. I would insert a fld0 to make the fp stack non-empty in runtime.
				; CHECK: ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw
				; CHECK: RETQ
				ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw
				renamable $rdi = MOV64ri @x
				renamable $rdx = LEA64r %stack.0, 1, $noreg, 0, $noreg
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
				dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
				CALL64pcrel32 @foo, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def dead $fp0
				renamable $xmm0 = MOVSDrm_alt %stack.1, 1, $noreg, 0, $noreg :: (load 8 from %stack.1)
				ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
				renamable $fp2 = CHS_Fp80 killed undef renamable $fp0, implicit-def $fpsw

				RETQ
				...
				---
