
yubing (Bing Yu)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 22 2019, 5:40 PM (66 w, 3 d)

Recent Activity

Fri, Nov 27

yubing updated subscribers of D91331: Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128.
Fri, Nov 27, 2:55 AM · Restricted Project

Thu, Nov 26

yubing added a comment to D91331: Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128.

Besides, I found something strange:
In PPCTargetLowering::LowerFP_TO_INT, we return the original op if the input type is fp128, which means that on some subtargets fp128's fptoint is legal and has corresponding instructions.
But what you are proposing is a hack that expands it to a libcall at the beginning of legalizeOp(...). These two seem to conflict.

Thu, Nov 26, 9:36 PM · Restricted Project
yubing added a comment to D91331: Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128.

Could we check whether the input type is fp128 at the beginning of PPCTargetLowering::LowerFP_TO_INT and, if it is, call ConvertNodeToLibcall directly?

Thu, Nov 26, 9:09 PM · Restricted Project

Oct 26 2020

yubing committed rG2c08f1b4b69e: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts… (authored by yubing).
Oct 26 2020, 8:21 PM
yubing closed D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....
Oct 26 2020, 8:21 PM · Restricted Project

Oct 25 2020

yubing added a comment to D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....

Ping?

Oct 25 2020, 7:32 PM · Restricted Project

Oct 22 2020

yubing updated the diff for D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....

Modify some comments and remove unnecessary lambda function

Oct 22 2020, 11:10 PM · Restricted Project

Oct 21 2020

yubing added a reviewer for D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded...: wxiao3.
Oct 21 2020, 8:32 PM · Restricted Project

Oct 20 2020

yubing added inline comments to D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....
Oct 20 2020, 7:51 PM · Restricted Project
yubing updated the diff for D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....

Fix the warning raised by Lint

Oct 20 2020, 12:04 AM · Restricted Project

Oct 19 2020

yubing retitled D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded... from [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not allindices are demanded and this 128-lane is not the first 128-lane... to [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....
Oct 19 2020, 11:45 PM · Restricted Project
yubing retitled D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded... from [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly: In each 128-lane, if there is at least one index is demanded and not all indices are demanded and this 128-lane is not the first 128-lane... to [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not allindices are demanded and this 128-lane is not the first 128-lane....
Oct 19 2020, 11:33 PM · Restricted Project
yubing requested review of D89767: [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded....
Oct 19 2020, 11:32 PM · Restricted Project

Oct 18 2020

yubing added inline comments to D87930: [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns.
Oct 18 2020, 10:42 PM · Restricted Project

Sep 22 2020

yubing committed rGec24e505536f: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32… (authored by yubing).
Sep 22 2020, 7:30 PM
yubing closed D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).
Sep 22 2020, 7:29 PM · Restricted Project
yubing updated the diff for D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

With avx512f, the cost of SK_Select (v32i16 or v64i8) should be 1 (vpternlogq)

Sep 22 2020, 12:27 AM · Restricted Project
yubing added a comment to D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

With avx512f, the cost of SK_Select (v32i16 or v64i8) should be 3 (vmovdqa64 + vpternlogq)

The moves probably don't really count since they can be eliminated during register renaming. So only the vpternlog executes.

Eh, Craig, why is this related to register renaming? I thought vpternlog's third operand had to be provided by a vmovdqa64.
Besides, we can observe the following asm for v32i16's SK_Select:

vmovdqa64       .LCPI0_0(%rip), %zmm0   # zmm0 = [0,0,0,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535]
vpternlogq      $202, 144(%rbp), %zmm4, %zmm0

Sorry I thought the vmovdqa you mentioned was due to the vpternlogq reading 3 sources and clobbering one of them. So sometimes it needs a register to register move to preserve a register.

I'm not sure if we usually cost the constant pool load, since it's loop invariant. Do we cost the load that vpermi2b/w/d/q would use for a 2-source permute?
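
For readers following the cost discussion, here is a minimal sketch of why one vpternlogq is enough for the select itself. It emulates the instruction's truth-table lookup in Python and checks that the immediate $202 (0xCA) behaves as a bitwise select. The helper names `ternlog` and `bitwise_select` are illustrative, not LLVM or Intel APIs, and the operand mapping assumed here (destination register as the selector, second source taken where the selector bit is 1, third source where it is 0) should be checked against the instruction reference.

```python
def ternlog(a: int, b: int, c: int, imm8: int, width: int = 64) -> int:
    """Emulate one lane of a ternary-logic instruction.

    For every bit position, the bits of a, b and c form a 3-bit index
    (a is the most significant bit); the result bit is imm8[index].
    """
    result = 0
    for bit in range(width):
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> idx) & 1) << bit
    return result


def bitwise_select(mask: int, b: int, c: int, width: int = 64) -> int:
    """Reference: take bits of b where mask is 1, bits of c where mask is 0."""
    full = (1 << width) - 1
    return ((mask & b) | (~mask & c)) & full


# Hypothetical 64-bit lane made of four 16-bit elements, selector alternating.
mask = 0xFFFF0000FFFF0000
b = 0x1111222233334444
c = 0xAAAABBBBCCCCDDDD
selected = ternlog(mask, b, c, 202)
```

Under these assumptions, the constant loaded by vmovdqa64 plays the role of `mask`, which is why the open question in the thread is only whether that load should be counted in the cost.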

Sep 22 2020, 12:18 AM · Restricted Project

Sep 21 2020

yubing added a comment to D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

With avx512f, the cost of SK_Select (v32i16 or v64i8) should be 3 (vmovdqa64 + vpternlogq)

The moves probably don't really count since they can be eliminated during register renaming. So only the vpternlog executes.

Sep 21 2020, 11:51 PM · Restricted Project
yubing updated the diff for D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

With avx512f, the cost of SK_Select (v32i16 or v64i8) should be 3 (vmovdqa64 + vpternlogq)

Sep 21 2020, 10:44 PM · Restricted Project
yubing added a comment to D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

@yubing I've added shuffle-select.ll which should have better test coverage - please can you rebase and check?

Sep 21 2020, 8:44 PM · Restricted Project
yubing updated the diff for D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).
Sep 21 2020, 7:45 PM · Restricted Project

Sep 20 2020

yubing updated the diff for D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

Delete some previously added test cases, since they exercise Instruction::Select rather than Instruction::ShuffleVector (SK_Select)

Sep 20 2020, 11:28 PM · Restricted Project
yubing added a comment to D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

Thanks for your comments, Simon. Following them, I found that my new test cases are useless, since they exercise Instruction::Select rather than Instruction::ShuffleVector (SK_Select)
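
The distinction can be illustrated with a small sketch. A select-style shuffle takes lane i of the result from lane i of one of the two sources, so each mask element is either i or i + N for N-element vectors. The Python below models that lane-preserving check and shows the equivalence with a constant-condition select. It is a simplified model with hypothetical helper names, not LLVM's actual implementation, which has further conditions (for example around undefined mask elements and single-source masks).

```python
from typing import List, Sequence


def is_lane_select_mask(mask: Sequence[int], num_elts: int) -> bool:
    """True if every mask element i picks lane i of source 0 or source 1."""
    if len(mask) != num_elts:
        return False
    return all(m == i or m == i + num_elts for i, m in enumerate(mask))


def shuffle(a: Sequence[int], b: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Two-source shuffle: indices < N read from a, indices >= N read from b."""
    n = len(a)
    return [a[m] if m < n else b[m - n] for m in mask]


def mask_from_condition(cond: Sequence[bool]) -> List[int]:
    """Build the shuffle mask equivalent to select(cond, a, b)."""
    n = len(cond)
    return [i if take_a else i + n for i, take_a in enumerate(cond)]


a = [10, 11, 12, 13]
b = [20, 21, 22, 23]
cond = [True, False, True, False]
mask = mask_from_condition(cond)      # [0, 5, 2, 7]
via_shuffle = shuffle(a, b, mask)     # same lanes a select would pick
via_select = [x if t else y for t, x, y in zip(cond, a, b)]
```

A test written with a select instruction produces the same values but goes through a different cost-model entry point than a shufflevector with such a mask, which is the reason the earlier test cases did not cover this patch.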

Sep 20 2020, 11:11 PM · Restricted Project
yubing updated the diff for D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

Just a rebase

Sep 20 2020, 7:07 PM · Restricted Project

Sep 18 2020

yubing added a comment to D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).

https://reviews.llvm.org/B72136 shows that the build fails while making an HTTP request, but I don't have permission to restart the build.
Besides, check-all passes locally.

Sep 18 2020, 1:48 AM · Restricted Project
yubing added a reviewer for D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8): pengfei.
Sep 18 2020, 1:43 AM · Restricted Project
yubing added reviewers for D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8): LuoYuanke, craig.topper.
Sep 18 2020, 12:22 AM · Restricted Project
yubing requested review of D87884: [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8).
Sep 18 2020, 12:17 AM · Restricted Project

Aug 27 2020

yubing updated the diff for D86668: Fix Calling Convention of __float128 and long double(128bits) in i386.

Modify a testcase: clang/test/CodeGenCXX/float128-declarations.cpp

Aug 27 2020, 12:44 AM · Restricted Project

Aug 26 2020

yubing updated the summary of D86668: Fix Calling Convention of __float128 and long double(128bits) in i386.
Aug 26 2020, 7:05 PM · Restricted Project
yubing added a reviewer for D86668: Fix Calling Convention of __float128 and long double(128bits) in i386: hjl.tools.
Aug 26 2020, 6:55 PM · Restricted Project
yubing added reviewers for D86668: Fix Calling Convention of __float128 and long double(128bits) in i386: erichkeane, pengfei.
Aug 26 2020, 6:49 PM · Restricted Project
yubing requested review of D86668: Fix Calling Convention of __float128 and long double(128bits) in i386.
Aug 26 2020, 6:40 PM · Restricted Project

Jul 30 2020

yubing added inline comments to D84922: [X86][AVX512] Fix build fail after D81548.
Jul 30 2020, 3:49 AM · Restricted Project

Jul 8 2020

yubing added a comment to D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks.

@yubing @pengfei @craig.topper Please can you confirm the regressions have now been addressed?

Jul 8 2020, 7:06 PM · Restricted Project
yubing added inline comments to D81791: [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks.
Jul 8 2020, 12:53 AM · Restricted Project

Jun 1 2020

yubing abandoned D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.
Jun 1 2020, 1:33 AM · Restricted Project
yubing added a comment to D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.

@yubing I think my fixes for PR45974 have addressed this now - please can you confirm?

Jun 1 2020, 1:33 AM · Restricted Project

May 31 2020

yubing abandoned D80906: [DAG] SimplifyDemandedVectorElts Bugfix for X86ISD::VBROADCAST calculating wrong DemandedElts for its Operand.
May 31 2020, 9:50 PM · Restricted Project
yubing added a comment to D80906: [DAG] SimplifyDemandedVectorElts Bugfix for X86ISD::VBROADCAST calculating wrong DemandedElts for its Operand.

Broadcast should only demand the lowest element. The recursive call to SimplifyDemandedVectorElts is supposed to ignore the incoming DemandedElts if the SDValue has more than one use.

May 31 2020, 9:32 PM · Restricted Project
yubing updated the summary of D80906: [DAG] SimplifyDemandedVectorElts Bugfix for X86ISD::VBROADCAST calculating wrong DemandedElts for its Operand.
May 31 2020, 8:14 PM · Restricted Project
yubing created D80906: [DAG] SimplifyDemandedVectorElts Bugfix for X86ISD::VBROADCAST calculating wrong DemandedElts for its Operand.
May 31 2020, 8:14 PM · Restricted Project

May 25 2020

yubing added a comment to D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.

I'm still looking at fixing getFauxShuffleMask (PR45974) but that might take a while, so this sort of approach is probably necessary.

Did you investigate replacing getTargetShuffleInputs with getTargetShuffleAndZeroables in the SimplifyDemandedBitsForTargetNode/SimplifyDemandedVectorEltsForTargetNode?

May 25 2020, 12:29 AM · Restricted Project

May 24 2020

yubing updated the diff for D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.
May 24 2020, 10:23 PM · Restricted Project

May 17 2020

yubing added a comment to rG4580b0f5b65c: [X86] getFauxShuffle - remove (unused) ISD::TRUNCATE shuffle decoding.
May 17 2020, 7:41 PM
yubing added a comment to D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.

PING @RKSimon

May 17 2020, 7:41 PM · Restricted Project

May 15 2020

yubing updated the diff for D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.
May 15 2020, 10:18 AM · Restricted Project
yubing added a comment to D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.

Really we need to stop creating nodes inside getFauxShuffle - I'm going to see if we can do this without too many regressions.

May 15 2020, 10:18 AM · Restricted Project
yubing created D79987: [DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d.
May 15 2020, 2:19 AM · Restricted Project

Apr 2 2020

yubing committed rGfe8ac0fe51aa: [x86] Fix Intel OpenCL builtin CalleeSavedRegs on skx (authored by wenju).
Apr 2 2020, 8:37 PM
yubing closed D77032: [x86] Fix Intel OpenCL builtin CalleeSavedRegs on skx.
Apr 2 2020, 8:36 PM · Restricted Project

Jan 12 2020

yubing abandoned D72491: [X86] Bugfix for rL146415.

@RKSimon, you're right. After checking https://reviews.llvm.org/rL146415, I found that the makefile has disabled this test case for i386.

Jan 12 2020, 7:09 PM · Restricted Project

Jan 9 2020

yubing created D72491: [X86] Bugfix for rL146415.
Jan 9 2020, 7:03 PM · Restricted Project

Dec 4 2019

yubing closed D69986: [X86] Bugfix for rL349334.

Committed in https://github.com/llvm/llvm-test-suite/commit/16265f5a73211d7497e41af43e895eb230c13b89

Dec 4 2019, 6:42 PM · Restricted Project

Nov 11 2019

yubing updated the diff for D69986: [X86] Bugfix for rL349334.
Nov 11 2019, 5:41 PM · Restricted Project

Nov 7 2019

yubing created D69986: [X86] Bugfix for rL349334.
Nov 7 2019, 10:17 PM · Restricted Project

Sep 23 2019

yubing updated the diff for D67212: [x86] Adding support for some missing intrinsics: _castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64.
Sep 23 2019, 11:57 PM · Restricted Project

Sep 22 2019

yubing updated the diff for D67212: [x86] Adding support for some missing intrinsics: _castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64.
Sep 22 2019, 8:37 PM · Restricted Project

Sep 18 2019

yubing added inline comments to D50231: [llvm-exegesis] Renaming classes and functions.
Sep 18 2019, 10:54 PM · Restricted Project
yubing added inline comments to D50231: [llvm-exegesis] Renaming classes and functions.
Sep 18 2019, 7:55 PM · Restricted Project

Sep 17 2019

Herald added a project to D50231: [llvm-exegesis] Renaming classes and functions.: Restricted Project.
Sep 17 2019, 2:07 AM · Restricted Project

Sep 5 2019

yubing abandoned D67210: [x86] bug fix for https://reviews.llvm.org/D64551.
Sep 5 2019, 7:24 PM · Restricted Project
yubing added inline comments to D64551: [X86] EltsFromConsecutiveLoads - support common source loads.
Sep 5 2019, 7:05 PM · Restricted Project
yubing added inline comments to D64551: [X86] EltsFromConsecutiveLoads - support common source loads.
Sep 5 2019, 9:54 AM · Restricted Project
yubing added a comment to D67210: [x86] bug fix for https://reviews.llvm.org/D64551.

Please abandon this, it isn't a valid solution to the issue (raised at PR43227). I have a WIP fix that will address this correctly.

Sep 5 2019, 7:39 AM · Restricted Project
yubing created D67212: [x86] Adding support for some missing intrinsics: _castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64.
Sep 5 2019, 12:11 AM · Restricted Project

Sep 4 2019

yubing added a comment to D64551: [X86] EltsFromConsecutiveLoads - support common source loads.

I've submitted a patch to fix the bug I commented on yesterday.
https://reviews.llvm.org/D67210

Sep 4 2019, 11:02 PM · Restricted Project
yubing updated the summary of D67210: [x86] bug fix for https://reviews.llvm.org/D64551.
Sep 4 2019, 10:18 PM · Restricted Project
yubing created D67210: [x86] bug fix for https://reviews.llvm.org/D64551.
Sep 4 2019, 10:15 PM · Restricted Project
yubing added a comment to D64551: [X86] EltsFromConsecutiveLoads - support common source loads.

Hi, Simon. This patch has introduced a bug in LLVM.

The attached t.ll reproduces it:
For t.ll, LLVM without this patch produces correct asm, while LLVM with this patch produces bad asm:
llvm with this patch:

Sep 4 2019, 7:56 AM · Restricted Project

Sep 2 2019

yubing updated the diff for D66786: [x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512.
Sep 2 2019, 12:34 AM · Restricted Project
yubing updated the diff for D66786: [x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512.
Sep 2 2019, 12:06 AM · Restricted Project

Aug 28 2019

yubing updated the diff for D66785: [x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32.
Aug 28 2019, 10:02 PM · Restricted Project
yubing updated the diff for D66785: [x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32.
Aug 28 2019, 8:16 PM · Restricted Project
yubing added inline comments to D66785: [x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32.
Aug 28 2019, 8:01 PM · Restricted Project

Aug 26 2019

yubing retitled D66786: [x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512 from [x86] Make some intrinsic functions in CLANG aligned with SPEC: _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512 to [x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512.
Aug 26 2019, 11:34 PM · Restricted Project
yubing created D66786: [x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512.
Aug 26 2019, 11:33 PM · Restricted Project
yubing created D66785: [x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32.
Aug 26 2019, 11:25 PM · Restricted Project