craig.topper (Craig Topper)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 30 2013, 7:58 PM (412 w, 2 d)

Recent Activity

Today

craig.topper committed rGf225367305c8: [RISCV] Add vget/vset intrinsics for inserting and extracting between different… (authored by craig.topper).
Thu, Jun 24, 6:08 PM
craig.topper closed D104822: [RISCV] Add vget/vset intrinsics for inserting and extracting between different lmuls..
Thu, Jun 24, 6:07 PM · Restricted Project
craig.topper accepted D104853: [X86] Add description of FXAM instruction.

AMD's optimization manual for Bulldozer only shows a 2 cycle latency. I'm not sure why Agner reports 20 unless there's some bad case for some particular input that isn't documented. A single uop taking 20 cycles sounds very strange and must be serializing the machine. I would only expect divide/sqrt to be that high from a single uop. Maybe someone can run llvm-exegesis on one of those AMD CPUs.

Thu, Jun 24, 3:37 PM · Restricted Project
craig.topper updated the diff for D104822: [RISCV] Add vget/vset intrinsics for inserting and extracting between different lmuls..

Add constant argument range checking to SemaChecking

Thu, Jun 24, 10:33 AM · Restricted Project
craig.topper added a comment to D104853: [X86] Add description of FXAM instruction.

FXAM appears to be two uops where FTST is one on modern Intel CPUs based on Agner Fog's data. Agner's data for some AMD CPUs shows ~20 cycles of latency.

Could tuning scheduling for this instruction be subsequent work?

Thu, Jun 24, 10:32 AM · Restricted Project
craig.topper committed rG03f9e04bc35c: [TargetLowering][ARM] Don't alter opaque constants in TargetLowering… (authored by craig.topper).
Thu, Jun 24, 10:10 AM
craig.topper closed D104832: [TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant..
Thu, Jun 24, 10:09 AM · Restricted Project
craig.topper accepted D104061: [LangRef] clarify the meaning of noimplicitfloat.

lgtm

Thu, Jun 24, 10:00 AM · Restricted Project
craig.topper added a comment to D104853: [X86] Add description of FXAM instruction.

FXAM appears to be two uops where FTST is one on modern Intel CPUs based on Agner Fog's data. Agner's data for some AMD CPUs shows ~20 cycles of latency.

Thu, Jun 24, 9:07 AM · Restricted Project
craig.topper added a comment to D104854: Introduce intrinsic llvm.isnan.

Doesn't gcc also fold isnan to false under fast math? If we diverge here that means your code would only work correctly with clang.

Thu, Jun 24, 8:53 AM · Restricted Project, Restricted Project

Yesterday

craig.topper updated the summary of D104832: [TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant..
Wed, Jun 23, 6:47 PM · Restricted Project
craig.topper requested review of D104832: [TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant..
Wed, Jun 23, 6:43 PM · Restricted Project
craig.topper committed rG91319534ba00: [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor… (authored by craig.topper).
Wed, Jun 23, 3:43 PM
craig.topper closed D104612: [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt..
Wed, Jun 23, 3:43 PM · Restricted Project
craig.topper requested review of D104822: [RISCV] Add vget/vset intrinsics for inserting and extracting between different lmuls..
Wed, Jun 23, 3:31 PM · Restricted Project
craig.topper updated the diff for D104612: [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt..

Honor zext function attributes over a preference for sext.

Wed, Jun 23, 12:16 PM · Restricted Project
craig.topper added inline comments to D104612: [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt..
Wed, Jun 23, 11:58 AM · Restricted Project
craig.topper added a reviewer for D104612: [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt.: jrtc27.
Wed, Jun 23, 11:26 AM · Restricted Project
craig.topper added a comment to D104790: [x86] fix mm*_undefined* intrinsics to use arbitrary frozen bit pattern.

We may want to update the code in X86ISelLowering getAVX2GatherNode and getGatherNode to replace freeze+poison on Src with a zero vector. We already do this when the Src is undef.

Wed, Jun 23, 11:26 AM · Restricted Project
craig.topper updated the diff for D104802: [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors..

Reword a comment slightly

Wed, Jun 23, 11:22 AM · Restricted Project
craig.topper updated the diff for D104802: [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors..

clang-format

Wed, Jun 23, 11:19 AM · Restricted Project
craig.topper requested review of D104802: [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors..
Wed, Jun 23, 11:18 AM · Restricted Project
craig.topper added a comment to D104790: [x86] fix mm*_undefined* intrinsics to use arbitrary frozen bit pattern.

I couldn't find end-to-end tests for checking assembly generation.
To check whether this is working OK, which tests should I write, and what would they look like?

Wed, Jun 23, 10:38 AM · Restricted Project
craig.topper accepted D104772: [RISCV] Lower RVV vector SELECTs to VSELECTs.

LGTM

Wed, Jun 23, 8:27 AM · Restricted Project
craig.topper committed rGa37cf17834d3: [RISCV] Add explicit copy to V0 in the masked vmsge(u).vx intrinsic handling. (authored by craig.topper).
Wed, Jun 23, 8:09 AM

Tue, Jun 22

craig.topper accepted D104621: [ValueTypes] Define MVTs for v3i64/v3f64 to complement v6i32/v6f32.

LGTM

Tue, Jun 22, 6:58 PM · Restricted Project
craig.topper added inline comments to D104621: [ValueTypes] Define MVTs for v3i64/v3f64 to complement v6i32/v6f32.
Tue, Jun 22, 3:06 PM · Restricted Project

Mon, Jun 21

craig.topper committed rGc2e01ee4a5e9: [RISCV] Remove extra character from a comment. NFC (authored by craig.topper).
Mon, Jun 21, 12:55 PM
craig.topper accepted D104541: [Utils][vim] Add missing highlights for fast-math flags.

LGTM

Mon, Jun 21, 12:13 PM · Restricted Project
craig.topper added a comment to D104308: [VP] Add vector-predicated reduction intrinsics.

Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled, and I am unsure what the return value should be then. Any thoughts on that?

Agreed. I'll update the docs accordingly.

We have a couple of options for what happens when all lanes are disabled. For starters, it follows logically from the definition of the expansion into reduce(select %mask, %v, %neutral) that we just return the neutral element. So that's the "easiest" definition in that sense.

We could return undef too which would be closer to how the other VP intrinsics work.

As for other options, it wouldn't make sense to me to specify that we return any of the vector elements (e.g. v[0]) as they're not conceptually active. And I think poison wouldn't make any sense. Are those the only realistic options?

In terms of hardware (which is orthogonal but may help guide us), RVV always takes a start value so we'd just return the neutral element; I admit I might be surreptitiously led by that. The lowering for returning the "neutral element" may be complicated on some targets, involving fetching the active length and perhaps doing some kind of scalar select. How would VE work?

Mon, Jun 21, 11:40 AM · Restricted Project
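
The "neutral element" option discussed in the D104308 comment above can be illustrated with a small scalar model. This is only a sketch of the reduce(select %mask, %v, %neutral) definition, not the intrinsic's actual lowering; the function names maskedReduceAdd and maskedReduceUMin are hypothetical, and the explicit vector length operand is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar model of reduce.add(select(Mask, V, Neutral)).
// The neutral element for integer add is 0.
int64_t maskedReduceAdd(const std::vector<int64_t> &V,
                        const std::vector<bool> &Mask) {
  const int64_t Neutral = 0;
  int64_t Acc = Neutral;
  for (std::size_t I = 0; I < V.size(); ++I) {
    // Disabled lanes (or lanes past the end of the mask) contribute Neutral.
    int64_t Lane = (I < Mask.size() && Mask[I]) ? V[I] : Neutral;
    Acc += Lane;
  }
  return Acc;
}

// Same idea for unsigned min, where the neutral element is the maximum value.
uint64_t maskedReduceUMin(const std::vector<uint64_t> &V,
                          const std::vector<bool> &Mask) {
  const uint64_t Neutral = std::numeric_limits<uint64_t>::max();
  uint64_t Acc = Neutral;
  for (std::size_t I = 0; I < V.size(); ++I) {
    uint64_t Lane = (I < Mask.size() && Mask[I]) ? V[I] : Neutral;
    Acc = Lane < Acc ? Lane : Acc;
  }
  return Acc;
}
```

Under this definition, a call with every mask bit cleared simply returns the neutral element (0 for add, the all-ones value for unsigned min), which is the "easiest" answer mentioned in the comment.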
craig.topper committed rG9080659ac730: [RISCV] Add isel patterns to match vmacc/vmadd/vnmsub/vnmsac from add/sub and… (authored by craig.topper).
Mon, Jun 21, 11:30 AM
craig.topper closed D104163: [RISCV] Add isel patterns to match vmacc/vmadd/vnmsub/vnmsac from add/sub and mul..
Mon, Jun 21, 11:29 AM · Restricted Project
craig.topper added inline comments to D95588: [RISCV] Implement the MC layer support of P extension.
Mon, Jun 21, 10:47 AM · Restricted Project, Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

I'm not sure. To do it in the stackifier you need to do it any time the undef flag is present regardless of whether StackTop is 0. You instead would need to check whether the register is already present in the stack and only insert if it isn't. But there may be some complications with removing it from the stack later. I think removing things from the stack is based on kill flags, but I don't know if the undef would have a kill flag. So I guess you'd have to remember you inserted it and immediately remove it after the instruction? You would need to do this for any FP instruction not just ArgFPRW.

It makes sense. Fixing it in ISel is easier. Do you mind if I create another patch in Phabricator based on your patch, or would you prefer to finish the patch yourself?

Mon, Jun 21, 8:44 AM · Restricted Project

Sun, Jun 20

craig.topper requested review of D104612: [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt..
Sun, Jun 20, 7:51 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

Does this fix your test

diff --git a/llvm/lib/Target/X86/X86FastISel.cpp b/llvm/lib/Target/X86/X86FastISel.cpp
index 44670a9..3e5d45b 100644
--- a/llvm/lib/Target/X86/X86FastISel.cpp
+++ b/llvm/lib/Target/X86/X86FastISel.cpp
@@ -3842,6 +3842,30 @@ unsigned X86FastISel::fastMaterializeConstant(const Constant *C) {
     return X86MaterializeFP(CFP, VT);
   else if (const GlobalValue *GV = dyn_cast<GlobalValue>(C))
     return X86MaterializeGV(GV, VT);
+  else if (isa<UndefValue>(C)) {
+    unsigned Opc = 0;
+    switch (VT.SimpleTy) {
+    default:
+      break;
+    case MVT::f32:
+      if (!X86ScalarSSEf32)
+        Opc = X86::LD_Fp032;
+      break;
+    case MVT::f64:
+      if (!X86ScalarSSEf64)
+        Opc = X86::LD_Fp064;
+      break;
+    case MVT::f80:
+      Opc = X86::LD_Fp080;
+      break;
+    }
+
+    if (Opc) {
+      Register ResultReg = createResultReg(TLI.getRegClassFor(VT));
+      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg);
+      return ResultReg;
+    }
+  }
 
   return 0;
 }

Yes, it fixes the test. Thank you! One question: is there any other consideration that prevents an undef float value in MIR? Otherwise the stackify pass could support undef values by inserting an fld0 instruction, so that the ISel passes don't have to handle them specially.

Sun, Jun 20, 7:09 PM · Restricted Project
craig.topper committed rG3a8c7060cc3c: [TypePromotion] Prune Intrinsic includes. NFC (authored by craig.topper).
Sun, Jun 20, 1:06 PM
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

I think because I use the

With -O0, the small case can also generate "IMPLICIT_DEF" and "CHS_Fp80". I think we are near to the root cause. Stay tuned.

ProcessImplicitDefs doesn’t run with O0 and FP stackifier has special code for IMPLICIT_DEF.

This is because I set opt-bisect-limit=67436 on my command line. When CurBisectNum expired, the "DAG to DAG" pass lowered its opt level to O0. However, "processimpdefs" and "X86 FP Stackifier" are not stopped by the CurBisectNum expiration, so an undefined fp0 is generated.

if (OptLevel != CodeGenOpt::None && skipFunction(Fn))
  NewOptLevel = CodeGenOpt::None;
OptLevelChanger OLC(*this, NewOptLevel);
Sun, Jun 20, 10:01 AM · Restricted Project

Sat, Jun 19

craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

With -O0, the small case can also generate "IMPLICIT_DEF" and "CHS_Fp80". I think we are near to the root cause. Stay tuned.

Sat, Jun 19, 11:17 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

I don’t know what llvm-extract --recursive does. I’ve only used -func to extract a single function I knew caused a compiler crash.

Converting IMPLICIT_DEF to undef flag is correct.

IMPLICIT_DEF can get created from ISD::UNDEF, but as far as I could see ISD::UNDEF for fp80 is supposed to Expand to ConstantFP 0.

I also think fneg of undef should be folded by getNode when it is created in SelectionDAGBuilder. Is this going through fast isel or something?

In my small test case extracted by llvm-extract, the "fneg contract x86_fp80 undef" is lowered to "LD_Fp080". But in the big case that causes the crash, it is lowered to "IMPLICIT_DEF" and "CHS_Fp80".

76:                                               ; preds = %74
  %77 = call contract x86_fp80 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_(i8* getelementptr inbounds ([29 x i8], [29 x i8]* @.str.3.16422, i64 0, i64 0), i8* null, %"struct.std::__atomic_flag_base"* nonnull align 1 dereferenceable(1) %9)
  %78 = fneg contract x86_fp80 undef
  br label %121
CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $rsp, implicit-def $ssp, implicit-def $fp0
ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
%118:rfp80 = COPY $fp0
%13:rfp80 = nofpexcept LD_Fp080 implicit-def dead $fpsw, implicit $fpcw
Sat, Jun 19, 10:48 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

BTW, do you think the patch to handle the undef case in the stackify pass is reasonable?

Sat, Jun 19, 10:44 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

Did you run llvm-extract to isolate the broken function first? bugpoint is not good at that.

Thanks Craig for the suggestion. After running "llvm-extract --recursive" to get the small file, I can't reproduce this issue. However, I used -print-after-all to dump the IR after each pass. The undefined value is created in the "processimpdefs" pass.

From

bb.11 (%ir-block.61):
; predecessors: %bb.10
  successors: %bb.26

  %164:gr64 = MOV64ri @.str.3.16422
  %165:gr32 = MOV32r0 implicit-def $eflags
  %166:gr64 = SUBREG_TO_REG 0, %165:gr32, %subreg.sub_32bit
  %167:gr64 = LEA64r %stack.6, 1, $noreg, 0, $noreg
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %164:gr64
  $rsi = COPY %166:gr64
  $rdx = COPY %167:gr64
  CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $fp0
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  %168:rfp80 = COPY $fp0
  %162:rfp80 = IMPLICIT_DEF
  %163:rfp80 = CHS_Fp80 %162:rfp80, implicit-def $fpsw
  JMP_1 %bb.26

To

bb.11 (%ir-block.61):
; predecessors: %bb.10
  successors: %bb.26

  %164:gr64 = MOV64ri @.str.3.16422
  %165:gr32 = MOV32r0 implicit-def $eflags
  %166:gr64 = SUBREG_TO_REG 0, %165:gr32, %subreg.sub_32bit
  %167:gr64 = LEA64r %stack.6, 1, $noreg, 0, $noreg
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %164:gr64
  $rsi = COPY %166:gr64
  $rdx = COPY %167:gr64
  CALL64pcrel32 @_ZN5boost4math8policies20raise_overflow_errorIeNS1_6policyINS1_13promote_floatILb0EEENS1_14promote_doubleILb0EEENS1_14default_policyES8_S8_S8_S8_S8_S8_S8_S8_S8_S8_EEEET_PKcSC_RKT0_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def $fp0
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  %168:rfp80 = COPY $fp0
  %163:rfp80 = CHS_Fp80 undef %162:rfp80, implicit-def $fpsw
  JMP_1 %bb.26

Is this transform reasonable? "%162:rfp80 = IMPLICIT_DEF" is generated in ISel. I will look into why "%162:rfp80 = IMPLICIT_DEF" is generated.

Sat, Jun 19, 9:55 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

Can you get IR and use bugpoint to reduce it? I'd really like to see the backend codegen that led to this case.

The .ll file is about 422 M. The reduction is slow: it has run for 18 hours and still has not finished, so it needs more time. Is there any parallel scheme to accelerate the reduction in bugpoint?

Sat, Jun 19, 6:33 PM · Restricted Project
craig.topper committed rGb663f30fa45c: [RISCV] Prevent formation of shXadd(.uw) and add.uw if it prevents the use of… (authored by craig.topper).
Sat, Jun 19, 12:15 PM

Fri, Jun 18

craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

I did some more digging and it looks like ISD::UNDEF for X86 should be turned into ConstantFP<0> by LegalizeDAG. So I really need more information about how we got here.

Fri, Jun 18, 10:30 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

I'd also like to know what happens if you add "(fneg undef) -> undef" fold to DAGCombiner::visitFNEG.

Fri, Jun 18, 10:06 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

Can you get IR and use bugpoint to reduce it? I'd really like to see the backend codegen that led to this case.

Fri, Jun 18, 10:03 PM · Restricted Project
craig.topper accepted D104507: [RISCV][test] Add new tests for add-mul optimization in the zba extension with SH*ADD.

LGTM

Fri, Jun 18, 10:01 PM · Restricted Project
craig.topper accepted D104588: [RISCV] Optimize add-mul in the zba extension with SH*ADD.

LGTM

Fri, Jun 18, 9:59 PM · Restricted Project
craig.topper requested review of D104581: Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X)) with sign_extend..
Fri, Jun 18, 4:53 PM · Restricted Project
craig.topper updated the diff for D104163: [RISCV] Add isel patterns to match vmacc/vmadd/vnmsub/vnmsac from add/sub and mul..

Limit to cases where mul has a single use. There may be a better heuristic here,
but this is a simple starting point.

Fri, Jun 18, 12:52 PM · Restricted Project
craig.topper committed rGac87133f1de9: [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and… (authored by craig.topper).
Fri, Jun 18, 12:16 PM
craig.topper closed D104069: [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch..
Fri, Jun 18, 12:16 PM · Restricted Project
craig.topper added inline comments to D104541: [Utils][vim] Add missing highlights for fast-math flags.
Fri, Jun 18, 11:40 AM · Restricted Project

Thu, Jun 17

craig.topper added inline comments to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..
Thu, Jun 17, 5:09 PM · Restricted Project
craig.topper added inline comments to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..
Thu, Jun 17, 5:05 PM · Restricted Project
craig.topper added a comment to D104440: [X86] Fix bug when X86 stackify pass handle one ArgFPRW..

How did you get a CHS with an undef input?

Thu, Jun 17, 5:00 PM · Restricted Project
craig.topper added a comment to D104436: [RISCV] Optimize mul-add in the zba extension with SH*ADD.

There's a more generic optimization hiding here. Could we teach decomposeMulByConstant to emit (shl (sh1add X, X), C) to handle any constant of the form (3 << C)? Similarly, (shl (sh2add X, X), C) would handle (5 << C), and (shl (sh3add X, X), C) would handle (9 << C). If the multiply happens to be used by an add, the existing patterns would combine the ADD and the SHL when possible.

Thu, Jun 17, 4:49 PM · Restricted Project
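
The decomposition suggested in the D104436 comment above can be checked with a small standalone model. This is a sketch of the arithmetic only, not the actual decomposeMulByConstant hook or any LLVM API; the names ShAddDecomposition, matchShAddShl, shNadd, and mulViaShAdd are hypothetical, and shNadd models the Zba semantics rd = (rs1 << N) + rs2.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of a constant of the form (2^N + 1) << C.
struct ShAddDecomposition {
  unsigned ShAddAmount; // N: 1, 2, or 3 for sh1add, sh2add, sh3add
  unsigned ShlAmount;   // C: the trailing left shift
};

// Returns a decomposition if Imm is (3 << C), (5 << C), or (9 << C).
std::optional<ShAddDecomposition> matchShAddShl(uint64_t Imm) {
  if (Imm == 0)
    return std::nullopt;
  // Strip trailing zero bits; their count is the shl amount C.
  unsigned C = 0;
  while ((Imm & 1) == 0) {
    Imm >>= 1;
    ++C;
  }
  switch (Imm) {
  case 3:
    return ShAddDecomposition{1, C};
  case 5:
    return ShAddDecomposition{2, C};
  case 9:
    return ShAddDecomposition{3, C};
  default:
    return std::nullopt;
  }
}

// Models shNadd: (Rs1 << N) + Rs2, for N in {1, 2, 3}.
uint64_t shNadd(unsigned N, uint64_t Rs1, uint64_t Rs2) {
  assert(N >= 1 && N <= 3 && "only sh1add, sh2add, sh3add exist");
  return (Rs1 << N) + Rs2;
}

// Computes X * Imm as (shl (shNadd X, X), C) when Imm has a matching form.
std::optional<uint64_t> mulViaShAdd(uint64_t X, uint64_t Imm) {
  std::optional<ShAddDecomposition> D = matchShAddShl(Imm);
  if (!D)
    return std::nullopt;
  return shNadd(D->ShAddAmount, X, X) << D->ShlAmount;
}
```

For example, 24 is (3 << 3), so mulViaShAdd(7, 24) evaluates sh1add(7, 7) = 21 and then shifts left by 3, giving 168. Constants such as 7 have no such form and yield an empty result.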
craig.topper retitled D104436: [RISCV] Optimize mul-add in the zba extension with SH*ADD from [RISCV] Optimize mul-add in the zbs extension with SH*ADD to [RISCV] Optimize mul-add in the zba extension with SH*ADD.
Thu, Jun 17, 4:15 PM · Restricted Project
craig.topper accepted D104364: [RISCV] Don't enable Interleaved Access Vectorization.

LGTM

Thu, Jun 17, 4:13 PM · Restricted Project
craig.topper committed rG99e95856fb78: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp. (authored by craig.topper).
Thu, Jun 17, 2:15 PM
craig.topper closed D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 2:15 PM · Restricted Project
craig.topper added inline comments to D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 1:15 PM · Restricted Project
craig.topper added inline comments to D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 1:14 PM · Restricted Project
craig.topper updated the diff for D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..

Add strictfp to caller in test case.

Thu, Jun 17, 11:56 AM · Restricted Project
craig.topper added inline comments to D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 11:50 AM · Restricted Project
craig.topper updated subscribers of D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 11:39 AM · Restricted Project
craig.topper updated the summary of D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 11:38 AM · Restricted Project
craig.topper updated the summary of D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 11:38 AM · Restricted Project
craig.topper requested review of D104479: [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp..
Thu, Jun 17, 11:34 AM · Restricted Project
craig.topper updated the diff for D104069: [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch..

Address review feedback

Thu, Jun 17, 10:35 AM · Restricted Project
craig.topper added inline comments to D101074: [X86] Canonicalize SGT/UGT compares with constants to use SGE/UGE to reduce the number of EFLAGs reads. (PR48760).
Thu, Jun 17, 9:40 AM · Restricted Project
craig.topper added inline comments to D101074: [X86] Canonicalize SGT/UGT compares with constants to use SGE/UGE to reduce the number of EFLAGs reads. (PR48760).
Thu, Jun 17, 8:34 AM · Restricted Project

Tue, Jun 15

craig.topper updated the diff for D104069: [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch..

Initialize member variable

Tue, Jun 15, 12:08 PM · Restricted Project
craig.topper added a comment to D104237: [RISCV][VP] Lower FP VP ISD nodes to RVV instructions.

Will frem need to be expanded to a loop in the expansion pass?

Would it be reasonable to expand it using the definition of fmod but vector-wise? It seems we'd need a division, a rounding to integer towards zero, a conversion back to float, a multiplication and then a subtraction.

Tue, Jun 15, 11:58 AM · Restricted Project
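
The element-wise sequence proposed in the D104237 comment above (divide, round toward zero, multiply, subtract) might look like the following scalar-loop model. It is only a sketch under stated assumptions: fremExpand is a hypothetical name, std::trunc stands in for the round-to-integer and convert-back steps, and the result is not guaranteed to match fmod bit-for-bit when the quotient is large or the division rounds, nor for infinities and NaNs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise model of: R = X - trunc(X / Y) * Y.
// Assumes X and Y have the same number of lanes.
std::vector<double> fremExpand(const std::vector<double> &X,
                               const std::vector<double> &Y) {
  assert(X.size() == Y.size() && "operands must have the same lane count");
  std::vector<double> R(X.size());
  for (std::size_t I = 0; I < X.size(); ++I) {
    double Q = X[I] / Y[I];   // division
    double T = std::trunc(Q); // round to integer toward zero
    R[I] = X[I] - T * Y[I];   // multiplication, then subtraction
  }
  return R;
}
```

For small, exactly representable inputs such as 7.5 and 2.0 the sequence gives 1.5, the same as std::fmod. Whether the precision loss in other cases is acceptable is a separate question this sketch does not answer.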

Mon, Jun 14

craig.topper committed rG4017d0335a35: [X86] Use EVT::getVectorVT instead of changeVectorElementType in… (authored by craig.topper).
Mon, Jun 14, 10:07 PM
craig.topper added inline comments to D104069: [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch..
Mon, Jun 14, 9:08 PM · Restricted Project
craig.topper added a comment to D104237: [RISCV][VP] Lower FP VP ISD nodes to RVV instructions.

Will frem need to be expanded to a loop in the expansion pass?

Mon, Jun 14, 9:38 AM · Restricted Project
craig.topper accepted D104032: [RISCV] Transform unaligned RVV vector loads/stores to aligned ones.

LGTM

Mon, Jun 14, 9:28 AM · Restricted Project
craig.topper added inline comments to D104032: [RISCV] Transform unaligned RVV vector loads/stores to aligned ones.
Mon, Jun 14, 9:01 AM · Restricted Project

Sat, Jun 12

craig.topper added a comment to D104178: [X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't guarantee upper 32 bits are zero..

LGTM - does this need to be back ported to 12.x?

Sat, Jun 12, 10:00 AM · Restricted Project
craig.topper committed rGc997867dc084: [X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't… (authored by craig.topper).
Sat, Jun 12, 9:57 AM
craig.topper closed D104178: [X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't guarantee upper 32 bits are zero..
Sat, Jun 12, 9:56 AM · Restricted Project

Fri, Jun 11

craig.topper requested review of D104178: [X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't guarantee upper 32 bits are zero..
Fri, Jun 11, 11:57 PM · Restricted Project
craig.topper requested review of D104163: [RISCV] Add isel patterns to match vmacc/vmadd/vnmsub/vnmsac from add/sub and mul..
Fri, Jun 11, 5:24 PM · Restricted Project
craig.topper updated the diff for D104069: [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch..

Fixed typo in comment

Fri, Jun 11, 12:14 PM · Restricted Project
craig.topper accepted D93470: [VP] Binary floating-point intrinsics..

LGTM

Fri, Jun 11, 11:35 AM · Restricted Project, Restricted Project
craig.topper accepted D103480: [NFC][OpaquePtr] Explicitly pass GEP source type in optimizeGatherScatterInst().

LGTM

Fri, Jun 11, 11:33 AM · Restricted Project
craig.topper added inline comments to D103480: [NFC][OpaquePtr] Explicitly pass GEP source type in optimizeGatherScatterInst().
Fri, Jun 11, 11:17 AM · Restricted Project
craig.topper added a comment to D104037: [X86] Check immediate before get it..

I don't know what Intel's original failure looked like, but here's a test that should reproduce this with -run-pass=machinelicm: https://reviews.llvm.org/P8267. It needs more cleanup.

Fri, Jun 11, 10:21 AM · Restricted Project
craig.topper added inline comments to D104107: [NFCI][X86] Drop "atom"/"slm" target tuning "features", derive them from CPU string.
Fri, Jun 11, 9:08 AM · Restricted Project

Thu, Jun 10

craig.topper added a comment to D104037: [X86] Check immediate before get it..

It would be nice to have a test, but this change seems ok.

Thu, Jun 10, 11:10 PM · Restricted Project
craig.topper committed rG081ae5fe1aa3: [RISCV] Remove extra assignment of intrinsic ID in ManualCodegen. NFC (authored by craig.topper).
Thu, Jun 10, 8:47 PM
craig.topper committed rG420bd5ee8ec9: [RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel… (authored by craig.topper).
Thu, Jun 10, 7:18 PM
craig.topper closed D104079: [RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32..
Thu, Jun 10, 7:18 PM · Restricted Project
craig.topper added inline comments to D104079: [RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32..
Thu, Jun 10, 6:58 PM · Restricted Project
craig.topper requested review of D104079: [RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32..
Thu, Jun 10, 5:11 PM · Restricted Project
craig.topper committed rGb35a842581f0: [RISCV] Add test cases that show failure to use some W instructions if they are… (authored by craig.topper).
Thu, Jun 10, 4:56 PM
craig.topper added reviewers for D79521: [RISCV] Add SiFive's interrupt modes: kito-cheng, asb, luismarques, craig.topper.
Thu, Jun 10, 4:45 PM · Restricted Project
craig.topper added a comment to D104061: [LangRef] clarify the meaning of noimplicitfloat.

That's a good point, if this attribute disables vectorization of integer math, the docs should say as much.

Thu, Jun 10, 4:07 PM · Restricted Project