LuoYuanke (LuoYuanke)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 24 2018, 10:28 PM (103 w, 5 d)

Recent Activity

Yesterday

LuoYuanke updated the diff for D87981: [X86] AMX programming model prototype..

Updating D87981: [X86] AMX programming model prototype.
Fixed some clang-format issues.

Sat, Sep 19, 11:32 PM · Restricted Project, Restricted Project
LuoYuanke added a reviewer for D87981: [X86] AMX programming model prototype.: hjl.tools.
Sat, Sep 19, 9:54 PM · Restricted Project, Restricted Project
LuoYuanke updated the summary of D87981: [X86] AMX programming model prototype..
Sat, Sep 19, 9:35 PM · Restricted Project, Restricted Project
LuoYuanke requested review of D87981: [X86] AMX programming model prototype..
Sat, Sep 19, 9:19 PM · Restricted Project, Restricted Project

Jul 6 2020

LuoYuanke added a comment to D83175: [X86] Fix a bug that when lowering byval argument.

LGTM

Jul 6 2020, 11:21 PM · Restricted Project
LuoYuanke added inline comments to D83175: [X86] Fix a bug that when lowering byval argument.
Jul 6 2020, 6:04 AM · Restricted Project

Jun 30 2020

LuoYuanke added inline comments to D82705: [X86-64] Support Intel AMX instructions.
Jun 30 2020, 7:00 PM · Restricted Project

Jun 13 2020

LuoYuanke added inline comments to D81498: [Matrix] Preserve volatile when loading loads/stores..
Jun 13 2020, 10:24 PM · Restricted Project

Jun 6 2020

LuoYuanke added inline comments to D81308: [Matrix] Use TileInfo to create tiled loop nest for matrix multiply..
Jun 6 2020, 10:24 PM · Restricted Project

May 26 2020

LuoYuanke created D80539: [X86] Report error for invalid register number..
May 26 2020, 1:35 AM · Restricted Project

May 18 2020

LuoYuanke accepted D79617: Add cet.h for writing CET-enabled assembly code.

LGTM

May 18 2020, 7:32 PM · Restricted Project
LuoYuanke added inline comments to D80124: [Matrix] Make matrix.multiply variadic, accept optional NUW/NSW flags..
May 18 2020, 6:26 PM · Restricted Project

May 10 2020

LuoYuanke added inline comments to D79617: Add cet.h for writing CET-enabled assembly code.
May 10 2020, 8:45 PM · Restricted Project

May 9 2020

LuoYuanke added inline comments to D79617: Add cet.h for writing CET-enabled assembly code.
May 9 2020, 7:25 AM · Restricted Project

Apr 19 2020

LuoYuanke accepted D77124: Handle CET for -exception-model sjlj.

LGTM.

Apr 19 2020, 7:12 PM · Restricted Project

Apr 17 2020

LuoYuanke added inline comments to D77851: [X86][MC] Make -x86-pad-max-prefix-size compatible with --mc-relax-all.
Apr 17 2020, 6:14 PM · Restricted Project

Apr 15 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 15 2020, 1:36 AM · Restricted Project

Apr 14 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 14 2020, 11:57 PM · Restricted Project

Apr 13 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 13 2020, 10:47 PM · Restricted Project
LuoYuanke added a comment to D77971: [MC][X86] Disable branch align in non-text section.

LGTM. @MaskRay Do you have time to review?

Apr 13 2020, 10:24 PM · Restricted Project

Apr 12 2020

LuoYuanke added inline comments to D77971: [MC][X86] Disable branch align in non-text section.
Apr 12 2020, 6:42 PM · Restricted Project

Apr 9 2020

LuoYuanke added a comment to D77851: [X86][MC] Make -x86-pad-max-prefix-size compatible with --mc-relax-all.

LGTM. Better to wait for Philip to accept.

Apr 9 2020, 11:57 PM · Restricted Project

Apr 8 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 8 2020, 11:57 PM · Restricted Project
LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 8 2020, 11:57 PM · Restricted Project
LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 8 2020, 11:12 PM · Restricted Project
LuoYuanke added a comment to D77628: [Driver][X86] Add -mpad-max-prefix-size.

This looks like it requires more thorough discussion, doesn't it? We will also need discussions with the binutils side. We requested a GCC compiler driver option long ago but did not reach a consensus (https://gcc.gnu.org/legacy-ml/gcc/2020-01/msg00358.html). IMHO the approval was made in haste. With all due respect, I think I've seen such hasty approvals (without any mention of why such decisions are justifiable) and hasty commits in this area several times before, so I'll click "Request Changes" just in case of an accidental commit.

Apr 8 2020, 6:30 PM · Restricted Project
LuoYuanke accepted D77628: [Driver][X86] Add -mpad-max-prefix-size.

LGTM

Apr 8 2020, 5:23 AM · Restricted Project

Apr 7 2020

LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

I've created D77549 which uses AArch64's udot instruction to compute the result of multiplies on 4x4 tiles. To do so, first a tiled loop nest is created that iterates over the columns, rows and the inner dimension. In the inner loop, 4x4 tiles are loaded, multiplied (using the dot product) and accumulated. After the inner loop, the final result of the 4x4 tile is stored. The main reason I went for AArch64's udot is that I can easily run it, but IIUC the VNNI instructions are very similar, they just allow processing of larger tiles.

Please note that the patch is a bit rough around the edges, and it is currently not clear how to specify the 'multiply 8 bit operands, accumulate in 32 bit result' nature of those instructions; I think we will have to extend the llvm.matrix.multiply definition for that. But it should be enough for you to get started with getting something working for VNNI. Please let me know if you have any questions or encounter any problems, either in the discussion for D77549 or by email.

Cheers,
Florian

Apr 7 2020, 7:34 AM · Restricted Project
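
The tiled loop nest described in the comment above can be illustrated with a plain scalar sketch in C. This is not the code from D77549: the function names (`tile_mac_4x4`, `matmul_tiled`), the row-major layout, the unsigned operand types, and the requirement that all dimensions are multiples of 4 are assumptions made only for illustration. The inner helper stands in for what a udot or VNNI instruction would do on a tile: multiply 8-bit operands and accumulate into 32-bit results.

```c
#include <assert.h>
#include <stdint.h>

#define TILE 4

/* Hypothetical helper: acc += a_tile * b_tile for one 4x4 tile.
 * a and b point at the top-left element of each tile in row-major
 * storage; lda and ldb are the row strides of the full matrices.
 * 8-bit operands are widened and accumulated in 32 bits. */
static void tile_mac_4x4(const uint8_t *a, int lda,
                         const uint8_t *b, int ldb,
                         uint32_t acc[TILE][TILE]) {
  for (int i = 0; i < TILE; ++i) {
    for (int j = 0; j < TILE; ++j) {
      uint32_t dot = 0;
      for (int k = 0; k < TILE; ++k)
        dot += (uint32_t)a[i * lda + k] * (uint32_t)b[k * ldb + j];
      acc[i][j] += dot;
    }
  }
}

/* C (M x N) = A (M x K) * B (K x N), all row-major.
 * Assumption for this sketch: M, N and K are multiples of TILE. */
void matmul_tiled(const uint8_t *A, const uint8_t *B, uint32_t *C,
                  int M, int N, int K) {
  assert(M % TILE == 0 && N % TILE == 0 && K % TILE == 0);
  for (int col = 0; col < N; col += TILE) {     /* columns of the result */
    for (int row = 0; row < M; row += TILE) {   /* rows of the result */
      uint32_t acc[TILE][TILE] = {{0}};
      /* Inner dimension: load tiles, multiply, accumulate. */
      for (int k = 0; k < K; k += TILE)
        tile_mac_4x4(&A[row * K + k], K, &B[k * N + col], N, acc);
      /* After the inner loop, store the finished 4x4 result tile. */
      for (int i = 0; i < TILE; ++i)
        for (int j = 0; j < TILE; ++j)
          C[(row + i) * N + (col + j)] = acc[i][j];
    }
  }
}
```

Calling `matmul_tiled(A, B, C, 4, 4, 8)` multiplies a 4x8 matrix by an 8x4 matrix using two tile steps along the inner dimension. A real lowering would replace the body of the helper with target dot-product instructions.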

Apr 3 2020

LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

We used something similar internally successfully. If you are interested, I could share infrastructure to create code that applies smaller building blocks (like fast 2x2 multiplication) to lower multiplies on larger matrixes.

Apr 3 2020, 8:02 PM · Restricted Project
LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

I have a similar question about how to lower matrix intrinsics to HW-specific intrinsics/instructions. For example, X86 has the AVX512_VNNI feature (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=39,5370,5361,364,142,139,2210&text=vnni), which can perform dot product computation. But after a matrix intrinsic is lowered to vector operations, it seems difficult to transform those vector operations into AVX512_VNNI intrinsics/instructions.

Apr 3 2020, 6:57 AM · Restricted Project

Apr 2 2020

LuoYuanke accepted D76900: Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology).

LGTM

Apr 2 2020, 9:08 PM · Restricted Project
LuoYuanke added inline comments to D76900: Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology).
Apr 2 2020, 8:05 PM · Restricted Project
LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Apr 2 2020, 6:27 PM · Restricted Project

Apr 1 2020

LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Apr 1 2020, 1:36 AM · Restricted Project

Mar 31 2020

LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Mar 31 2020, 7:09 AM · Restricted Project

Mar 29 2020

LuoYuanke added inline comments to D75566: [Matrix] Add initial tiling for load/multiply/store chains..
Mar 29 2020, 8:52 PM · Restricted Project

Mar 26 2020

LuoYuanke added inline comments to D76900: Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology).
Mar 26 2020, 7:35 PM · Restricted Project

Mar 19 2020

LuoYuanke added inline comments to D76285: [X86][MC] Fix the bug for prefix padding support.
Mar 19 2020, 10:26 PM · Restricted Project
LuoYuanke added inline comments to D76286: [X86][MC] Support enhanced relaxation for branch align.
Mar 19 2020, 9:56 PM · Restricted Project
LuoYuanke added inline comments to D76398: [X86] Limit prefix padding w/target specific padding amount.
Mar 19 2020, 8:50 PM · Restricted Project

Mar 17 2020

LuoYuanke accepted D76190: CET for Exception Handle.

LGTM

Mar 17 2020, 9:04 PM · Restricted Project

Mar 16 2020

LuoYuanke accepted D76176: [X86] Disable nop padding before instruction following hardcode.

LGTM.

Mar 16 2020, 12:49 AM · Restricted Project

Mar 15 2020

LuoYuanke added inline comments to D76176: [X86] Disable nop padding before instruction following hardcode.
Mar 15 2020, 7:52 PM · Restricted Project
LuoYuanke added inline comments to D76190: CET for Exception Handle.
Mar 15 2020, 8:00 AM · Restricted Project

Mar 12 2020

LuoYuanke accepted D76052: [X86] Disable nop padding before instruction following a prefix.

LGTM

Mar 12 2020, 9:54 PM · Restricted Project

Mar 9 2020

LuoYuanke accepted D75438: [X86] Reduce the number of emitted fragments due to branch align.

LGTM. Better to hold for a few days so that Philip and MaskRay can review as well.

Mar 9 2020, 11:21 PM · Restricted Project
LuoYuanke added inline comments to D75438: [X86] Reduce the number of emitted fragments due to branch align.
Mar 9 2020, 9:12 PM · Restricted Project

Jan 17 2020

LuoYuanke added a comment to D72721: [BranchAlign] Disable autopadding in cold blocks to reduce code size impact.

Is this enabled only when -Os is specified?

Jan 17 2020, 12:49 AM · Restricted Project
LuoYuanke accepted D72878: [X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr.

LGTM

Jan 17 2020, 12:46 AM · Restricted Project

Jan 16 2020

LuoYuanke added a comment to D72721: [BranchAlign] Disable autopadding in cold blocks to reduce code size impact.

@reames
I'm a little concerned about the performance impact of this patch. Do you have any data for it? Is there a performance penalty, and by how much is the code size reduced?

Jan 16 2020, 6:24 PM · Restricted Project
LuoYuanke added inline comments to D72878: [X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr.
Jan 16 2020, 5:47 PM · Restricted Project

Jan 13 2020

LuoYuanke accepted D72225: Align branches within 32-Byte boundary(Prefix padding).

The patch LGTM. I'd like to see the patch land before the llvm release.

Jan 13 2020, 3:02 AM · Restricted Project
LuoYuanke added inline comments to D72225: Align branches within 32-Byte boundary(Prefix padding).
Jan 13 2020, 2:59 AM · Restricted Project
LuoYuanke added inline comments to D72225: Align branches within 32-Byte boundary(Prefix padding).
Jan 13 2020, 1:02 AM · Restricted Project

Jan 11 2020

LuoYuanke added a comment to D59780: Support Intel Control-flow Enforcement Technology.

Ping @ruiu

Jan 11 2020, 2:23 AM · Restricted Project

Jan 9 2020

LuoYuanke added inline comments to D72227: [Driver][X86] Add -malign-branch* and -malign-branch-within-32B-boundaries.
Jan 9 2020, 7:21 PM · Restricted Project
LuoYuanke added inline comments to D72225: Align branches within 32-Byte boundary(Prefix padding).
Jan 9 2020, 4:14 AM · Restricted Project

Jan 6 2020

LuoYuanke added inline comments to D72280: [Matrix] Add IR MatrixBuilder..
Jan 6 2020, 9:55 PM · Restricted Project

Dec 19 2019

LuoYuanke added a comment to D59780: Support Intel Control-flow Enforcement Technology.

@ruiu,
Can we submit the patch?

Dec 19 2019, 5:19 AM · Restricted Project

Dec 17 2019

LuoYuanke added a comment to D71442: [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic..

Ping.

Dec 17 2019, 6:27 PM · Restricted Project

Dec 13 2019

LuoYuanke added inline comments to D71442: [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic..
Dec 13 2019, 3:12 AM · Restricted Project
LuoYuanke added inline comments to D71442: [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic..
Dec 13 2019, 12:05 AM · Restricted Project

Dec 11 2019

LuoYuanke added inline comments to D70901: [Matrix] Update shape propagation to iterate until done..
Dec 11 2019, 5:40 AM · Restricted Project

Dec 10 2019

LuoYuanke added a comment to D71238: Align non-fused branches within 32-Byte boundary (basic case).

I've gone as far as writing a rough draft of textual assembler support. If folks agree this is helpful, I'll abandon this patch, and post one in that direction. Any high level suggestions as to syntax? I see two major options, but definitely welcome suggestions w.r.t. naming/semantics/etc...

Option 1

.boundary_align 4
jmp foo

and

.boundary_align 4
.bundle_lock
test ...
jcc foo
.bundle_unlock

Option 2

.boundary_align 4
jmp foo
.end_boundary_align

and

.boundary_align 4
test ...
jcc foo
.end_boundary_align

p.s. If anyone has a better name than "boundary align" please suggest it. That's not a good name; it's just a placeholder.

Dec 10 2019, 7:58 PM · Restricted Project

Dec 8 2019

LuoYuanke added inline comments to D70897: [Matrix] Add forward shape propagation and first shape aware lowerings..
Dec 8 2019, 4:33 AM · Restricted Project
LuoYuanke added inline comments to D70897: [Matrix] Add forward shape propagation and first shape aware lowerings..
Dec 8 2019, 2:35 AM · Restricted Project

Dec 7 2019

LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Dec 7 2019, 8:50 PM · Restricted Project
LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Dec 7 2019, 7:10 PM · Restricted Project
LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Dec 7 2019, 7:01 PM · Restricted Project

Dec 6 2019

LuoYuanke added a comment to D70157: Align branches within 32-Byte boundary(NOP padding).

Recording something so I don't forget it when we get back to the prefix padding version. The write up on the bundle align mode stuff mentions a concerning memory overhead for the feature. Since the basic implementation techniques are similar, we need to make sure we assess the memory overhead of the prefix padding implementation. See https://www.chromium.org/nativeclient/pnacl/aligned-bundling-support-in-llvm for context. I don't believe this is likely to be an issue for the nop padding variant.

Dec 6 2019, 9:24 PM · Restricted Project, Restricted Project

Dec 1 2019

LuoYuanke added inline comments to D70875: [X86] Model MXCSR for AVX instructions other than AVX512.
Dec 1 2019, 12:03 AM · Restricted Project

Nov 29 2019

Herald added a project to D19338: New code hoisting pass based on GVN (optimistic approach): Restricted Project.
Nov 29 2019, 11:28 PM · Restricted Project

Nov 27 2019

LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

I'm currently preparing the patches on the clang side in addition to an update to cfe-dev. Please stay tuned and we would really appreciate any feedback there.

In the original RFC, we sketched the C/C++ support we envisioned using builtins. A simple example that loads two 4x4 matrices, multiplies them, adds a third matrix to the result and stores the result can be found in the code below. Our initial proposal is quite stripped down and intended to be exposed to end users via something like a C++ matrix wrapper class.

typedef float m4x4_t __attribute__((matrix_type(4, 4)));
 
 
void f(m4x4_t *a, m4x4_t *b, m4x4_t *c, m4x4_t *r) {
  *r = __builtin_matrix_add(__builtin_matrix_multiply(*a, *b), *c);
}
Nov 27 2019, 7:29 PM · Restricted Project

Nov 26 2019

LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

Interesting. Do you have any patch for the C/C++ frontend? What does the C/C++ code look like?

Nov 26 2019, 7:56 PM · Restricted Project

Sep 5 2019

LuoYuanke added a comment to D67210: [x86] bug fix for https://reviews.llvm.org/D64551.

rL371078 should fix this

Sep 5 2019, 6:32 PM · Restricted Project
LuoYuanke added a comment to D67210: [x86] bug fix for https://reviews.llvm.org/D64551.

Please abandon this, it isn't a valid solution to the issue (raised at PR43227). I have a WIP fix that will address this correctly.

Sep 5 2019, 5:55 AM · Restricted Project

Aug 9 2019

LuoYuanke added a comment to D56772: [MIR] Add simple PRE pass to MachineCSE.

@anton-afanasyev, I have no objection. Thank you for the effort to improve the performance.

Aug 9 2019, 8:27 PM · Restricted Project
LuoYuanke committed rGc6c86f4f81fb: [X86] Fix stack probe issue on windows32. (authored by LuoYuanke).
[X86] Fix stack probe issue on windows32.
Aug 9 2019, 7:51 PM
LuoYuanke committed rL368503: [X86] Fix stack probe issue on windows32..
[X86] Fix stack probe issue on windows32.
Aug 9 2019, 7:49 PM
LuoYuanke closed D65923: [X86] Fix stack probe issue on windows32..
Aug 9 2019, 7:49 PM · Restricted Project
LuoYuanke updated the diff for D65923: [X86] Fix stack probe issue on windows32..

Update the patch corresponding to Reid's comments.

Aug 9 2019, 3:53 AM · Restricted Project

Aug 7 2019

LuoYuanke created D65923: [X86] Fix stack probe issue on windows32..
Aug 7 2019, 7:54 PM · Restricted Project

Jul 22 2019

LuoYuanke added a comment to D56772: [MIR] Add simple PRE pass to MachineCSE.

@anton-afanasyev
Hi,
Do you have any performance data for the patch? I'd like to know which benchmarks gain performance with your patch. https://reviews.llvm.org/D64394 fixed the perlbench regression, but I wonder what performance gain we achieve with the two patches.

Jul 22 2019, 7:23 PM · Restricted Project

Jul 19 2019

LuoYuanke added a comment to D56772: [MIR] Add simple PRE pass to MachineCSE.

@anton-afanasyev
Hi,
Did you look into the SPEC cpu2017/500.perlbench_r issue? There is a significant performance drop on X86 with the patch. Please revert the patch first; once the SPEC2017 regression is fixed, we can submit the patch again. What do you think?

Jul 19 2019, 12:24 AM · Restricted Project

Jul 1 2019

LuoYuanke added inline comments to D56772: [MIR] Add simple PRE pass to MachineCSE.
Jul 1 2019, 6:16 PM · Restricted Project

Jun 22 2019

LuoYuanke added inline comments to D56772: [MIR] Add simple PRE pass to MachineCSE.
Jun 22 2019, 6:15 PM · Restricted Project

Jun 19 2019

LuoYuanke added inline comments to D56772: [MIR] Add simple PRE pass to MachineCSE.
Jun 19 2019, 7:51 PM · Restricted Project

Jun 10 2019

LuoYuanke accepted D62115: fix a issue that clang is incompatible with gcc with -H option..
Jun 10 2019, 12:24 AM · Restricted Project

May 6 2019

LuoYuanke committed rG844f66293235: Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper… (authored by LuoYuanke).
Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper…
May 6 2019, 1:24 AM
LuoYuanke committed rC360018: Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper….
Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper…
May 6 2019, 1:24 AM
LuoYuanke committed rL360018: Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper….
Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper…
May 6 2019, 1:23 AM
LuoYuanke closed D60552: [X86] Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper Lake.
May 6 2019, 1:23 AM · Restricted Project, Restricted Project
LuoYuanke committed rGbeec41c656e7: Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake (authored by LuoYuanke).
Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
May 6 2019, 1:21 AM
LuoYuanke committed rL360017: Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake.
Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
May 6 2019, 1:21 AM
LuoYuanke closed D60550: [X86] Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake.
May 6 2019, 1:21 AM · Restricted Project

May 5 2019

LuoYuanke accepted D61580: [NFC] This is a test for the commit access..
May 5 2019, 7:49 PM · Restricted Project

Apr 11 2019

LuoYuanke added a comment to D60437: Add MM register mapping from CodeView to MC register id.

Okay, let's not worry too much about the test case, the patch seems obviously good.

Do you have commit access, or would you like me to commit for you?

Apr 11 2019, 8:04 AM · Restricted Project
LuoYuanke committed rGa2b4d3fab623: [X86] Add MM register mapping from CodeView to MC register id (authored by LuoYuanke).
[X86] Add MM register mapping from CodeView to MC register id
Apr 11 2019, 8:01 AM
LuoYuanke committed rL358179: [X86] Add MM register mapping from CodeView to MC register id.
[X86] Add MM register mapping from CodeView to MC register id
Apr 11 2019, 8:01 AM
LuoYuanke closed D60437: Add MM register mapping from CodeView to MC register id.
Apr 11 2019, 8:01 AM · Restricted Project