Fri, Oct 16

LuoYuanke added inline comments to D88631: [X86] Support customizing stack protector guard.
Fri, Oct 16, 4:51 AM · Restricted Project, Restricted Project

Thu, Oct 15

LuoYuanke updated the diff for D87981: [X86] AMX programming model prototype..

Rebase

Thu, Oct 15, 10:35 PM · Restricted Project, Restricted Project

Sat, Oct 10

LuoYuanke added a comment to D89178: [X86] Alternate implementation of D88194..

This patch looks more general than D88194. Is it possible to generate the endbr constant after PreprocessISelDAG()?

Sat, Oct 10, 5:28 AM · Restricted Project
LuoYuanke added a comment to D87981: [X86] AMX programming model prototype..

ping

Sat, Oct 10, 5:14 AM · Restricted Project, Restricted Project
LuoYuanke added a comment to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

LGTM. Thanks!
Since the pass is turned off by default, I think we can let it in. @lebedev.ri, what's your opinion?

I still retain my original opinion that this is trying to paper over broken source code,
and incorrectly so, because even if backend doesn't make use of the alignment information
that was lowered from the source code into IR, the IR will still contain incorrect alignment
information, and it is only a matter of time until that UB manifests in some other way.

As I see it, there are 5 options:

  1. Don't manually vectorize the code
  2. Do UBSan to catch these issues
  3. Enhance clang/clang-tidy to better catch these issues
  4. Don't do aligned loads https://godbolt.org/z/38jrvE
  5. Add a clang (!) switch to make __m128 unaligned

I strongly suggest that option 4 be taken.
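
Option 4 can be sketched as follows. This is a minimal illustration, not the code behind the godbolt link: the function name `sum4_unaligned` is hypothetical, and the `memcpy` branch is only a fallback assumed for targets without SSE. The point is that an unaligned load needs only `float` alignment, so neither the source nor the IR carries a false 16-byte alignment claim.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: sums four consecutive floats starting at p.
   p only needs the natural alignment of float, not 16 bytes. */
float sum4_unaligned(const float *p) {
    float lanes[4];
    assert(p != NULL);
#if defined(__SSE__)
    /* Unaligned load and store: no 16-byte alignment is promised. */
    __m128 v = _mm_loadu_ps(p);
    _mm_storeu_ps(lanes, v);
#else
    /* Fallback for non-SSE targets (an assumption for portability). */
    memcpy(lanes, p, sizeof lanes);
#endif
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Calling this on `buf + 1` for a `float buf[8]` is well defined, whereas `_mm_load_ps(buf + 1)` would assert an alignment the pointer does not have.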

Sat, Oct 10, 4:00 AM · Restricted Project
LuoYuanke updated the diff for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Address Pengfei's comments.

Sat, Oct 10, 2:34 AM · Restricted Project

Fri, Oct 9

LuoYuanke updated the diff for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Address Craig's comments.

Fri, Oct 9, 9:37 PM · Restricted Project
LuoYuanke added inline comments to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).
Fri, Oct 9, 8:34 PM · Restricted Project
LuoYuanke updated the diff for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Address Pengfei's comments.

Fri, Oct 9, 8:26 PM · Restricted Project
LuoYuanke retitled D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default) from [X86] Replace movaps with movups when avx is enabled. to [X86] Replace aligned vector move with unaligned move when avx is enabled..
Fri, Oct 9, 8:03 PM · Restricted Project
LuoYuanke added a reviewer for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default): pengfei.
Fri, Oct 9, 8:02 PM · Restricted Project
LuoYuanke updated the diff for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Addressed Craig's comments. Added transform for movapd and movdq.

Fri, Oct 9, 7:54 PM · Restricted Project

Wed, Oct 7

LuoYuanke added inline comments to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).
Wed, Oct 7, 7:00 AM · Restricted Project
LuoYuanke updated the diff for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Rebase and address Simon's comments.

Wed, Oct 7, 6:43 AM · Restricted Project

Sun, Oct 4

LuoYuanke updated the diff for D87981: [X86] AMX programming model prototype..

Rebase

Sun, Oct 4, 7:58 AM · Restricted Project, Restricted Project
LuoYuanke updated the diff for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Rebase

Sun, Oct 4, 4:45 AM · Restricted Project
LuoYuanke updated the diff for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Add llvm option "-enable-x86-movaps-to-movups" to enable movups preference.
By default the option is false.

Sun, Oct 4, 2:44 AM · Restricted Project

Thu, Oct 1

LuoYuanke added a reviewer for D88631: [X86] Support customizing stack protector guard: pengfei.
Thu, Oct 1, 5:11 PM · Restricted Project, Restricted Project

Wed, Sep 30

LuoYuanke added a reviewer for D87981: [X86] AMX programming model prototype.: qcolombet.
Wed, Sep 30, 5:56 PM · Restricted Project, Restricted Project
LuoYuanke added a comment to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Compiling for SSE this code will likely use the memory form of addps which will fault on the misalignment. I know this patch only targets AVX.

I don’t think you can motivate this change by showing what code you want to accept if the code would crash when compiled with the default SSE2 target.

Note that even if x86 codegen will always emit unaligned ops (which will cause new questions/bugreports),
the original IR will still contain UB, and it will be only a question of time until that causes some other 'miscompile'.
I really think this should be approached from front-end diag side.

Sorry, what does 'UB' mean?

undefined behavior

Why would it cause a 'miscompile'? The compiler still thinks the address is aligned.

That is very precisely my point.

Selecting movups doesn't break compiler assumption. Is there any reason movaps is better than movups? To detect the alignment exception?

Wed, Sep 30, 2:51 AM · Restricted Project
LuoYuanke added a comment to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Compiling for SSE this code will likely use the memory form of addps which will fault on the misalignment. I know this patch only targets AVX.

I don’t think you can motivate this change by showing what code you want to accept if the code would crash when compiled with the default SSE2 target.

Note that even if x86 codegen will always emit unaligned ops (which will cause new questions/bugreports),
the original IR will still contain UB, and it will be only a question of time until that causes some other 'miscompile'.
I really think this should be approached from front-end diag side.

Wed, Sep 30, 1:49 AM · Restricted Project
LuoYuanke added inline comments to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).
Wed, Sep 30, 1:26 AM · Restricted Project
LuoYuanke added a comment to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

Compiling for SSE this code will likely use the memory form of addps which will fault on the misalignment. I know this patch only targets AVX.

I don’t think you can motivate this change by showing what code you want to accept if the code would crash when compiled with the default SSE2 target.

Wed, Sep 30, 12:50 AM · Restricted Project
LuoYuanke added a comment to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

I didn't get the error at https://godbolt.org/z/8aGhd5. Another example may look like this: a float array is packed in a struct.
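
A minimal sketch of the packed-struct case mentioned above, assuming a GCC/Clang-style `packed` attribute. The struct `Packet` and the helper `is_aligned` are hypothetical names introduced only for illustration; they are not taken from the patch or the godbolt link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the packed attribute (a GCC/Clang extension)
   removes padding, so `data` starts 1 byte into the struct and cannot
   be 16-byte aligned when the struct itself is. */
struct __attribute__((packed)) Packet {
    uint8_t tag;
    float   data[4];
};

/* Returns nonzero when p is a multiple of `alignment` bytes.
   `alignment` must be a power of two. */
int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

An aligned vector load from `data` in such a struct would fault whenever the struct sits on a 16-byte boundary, which is the situation the comment describes.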

Wed, Sep 30, 12:31 AM · Restricted Project

Tue, Sep 29

LuoYuanke added a comment to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

If the address is random, it may happen to be aligned during validation, so the sanitizer tools don't notice the problem. When the code runs in production, it crashes randomly. There is no harm in replacing movaps with movups, and it can avoid some crashes. Is it feasible to add an option to let the user choose movups or movaps?

Tue, Sep 29, 11:39 PM · Restricted Project
LuoYuanke updated subscribers of D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).
Tue, Sep 29, 11:29 PM · Restricted Project

Mon, Sep 28

LuoYuanke added a comment to D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).

What issue is this fixing?

However, if the address is not aligned, movaps raises an exception while movups can still run.

That sounds like either a miscompile happened along the way, or the original source code had UB to begin with.

Mon, Sep 28, 1:50 AM · Restricted Project
LuoYuanke added reviewers for D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default): craig.topper, annita.zhang.
Mon, Sep 28, 1:48 AM · Restricted Project
LuoYuanke requested review of D88396: [X86] Support replacing aligned vector moves with unaligned moves when avx is enabled. (off by default).
Mon, Sep 28, 1:35 AM · Restricted Project

Sep 24 2020

LuoYuanke added a reviewer for D87981: [X86] AMX programming model prototype.: chandlerc.
Sep 24 2020, 6:27 AM · Restricted Project, Restricted Project
LuoYuanke updated the diff for D87981: [X86] AMX programming model prototype..

Updating D87981: [X86] AMX programming model prototype.
Rebase.

Sep 24 2020, 6:26 AM · Restricted Project, Restricted Project

Sep 22 2020

LuoYuanke updated the diff for D87981: [X86] AMX programming model prototype..

Updating D87981: [X86] AMX programming model prototype.
Fix clang-format issues and add a test case for RA across a function call.

Sep 22 2020, 12:33 AM · Restricted Project, Restricted Project

Sep 20 2020

LuoYuanke updated the diff for D87981: [X86] AMX programming model prototype..

Updating D87981: [X86] AMX programming model prototype.
Fixed lit test case failure.

Sep 20 2020, 7:35 PM · Restricted Project, Restricted Project

Sep 19 2020

LuoYuanke updated the diff for D87981: [X86] AMX programming model prototype..

Updating D87981: [X86] AMX programming model prototype.
Fixed some clang-format issues.

Sep 19 2020, 11:32 PM · Restricted Project, Restricted Project
LuoYuanke added a reviewer for D87981: [X86] AMX programming model prototype.: hjl.tools.
Sep 19 2020, 9:54 PM · Restricted Project, Restricted Project
LuoYuanke updated the summary of D87981: [X86] AMX programming model prototype..
Sep 19 2020, 9:35 PM · Restricted Project, Restricted Project
LuoYuanke requested review of D87981: [X86] AMX programming model prototype..
Sep 19 2020, 9:19 PM · Restricted Project, Restricted Project

Jul 6 2020

LuoYuanke added a comment to D83175: [X86] Fix a bug that when lowering byval argument.

LGTM

Jul 6 2020, 11:21 PM · Restricted Project
LuoYuanke added inline comments to D83175: [X86] Fix a bug that when lowering byval argument.
Jul 6 2020, 6:04 AM · Restricted Project

Jun 30 2020

LuoYuanke added inline comments to D82705: [X86-64] Support Intel AMX instructions.
Jun 30 2020, 7:00 PM · Restricted Project

Jun 13 2020

LuoYuanke added inline comments to D81498: [Matrix] Preserve volatile when loading loads/stores..
Jun 13 2020, 10:24 PM · Restricted Project

Jun 6 2020

LuoYuanke added inline comments to D81308: [Matrix] Use TileInfo to create tiled loop nest for matrix multiply..
Jun 6 2020, 10:24 PM · Restricted Project

May 26 2020

LuoYuanke created D80539: [X86] Report error for invalid register number..
May 26 2020, 1:35 AM · Restricted Project

May 18 2020

LuoYuanke accepted D79617: Add cet.h for writing CET-enabled assembly code.

LGTM

May 18 2020, 7:32 PM · Restricted Project
LuoYuanke added inline comments to D80124: [Matrix] Make matrix.multiply variadic, accept optional NUW/NSW flags..
May 18 2020, 6:26 PM · Restricted Project

May 10 2020

LuoYuanke added inline comments to D79617: Add cet.h for writing CET-enabled assembly code.
May 10 2020, 8:45 PM · Restricted Project

May 9 2020

LuoYuanke added inline comments to D79617: Add cet.h for writing CET-enabled assembly code.
May 9 2020, 7:25 AM · Restricted Project

Apr 19 2020

LuoYuanke accepted D77124: Handle CET for -exception-model sjlj.

LGTM.

Apr 19 2020, 7:12 PM · Restricted Project

Apr 17 2020

LuoYuanke added inline comments to D77851: [X86][MC] Make -x86-pad-max-prefix-size compatible with --mc-relax-all.
Apr 17 2020, 6:14 PM · Restricted Project

Apr 15 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 15 2020, 1:36 AM · Restricted Project

Apr 14 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 14 2020, 11:57 PM · Restricted Project

Apr 13 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 13 2020, 10:47 PM · Restricted Project
LuoYuanke added a comment to D77971: [MC][X86] Disable branch align in non-text section.

LGTM. @MaskRay Do you have time to review?

Apr 13 2020, 10:24 PM · Restricted Project

Apr 12 2020

LuoYuanke added inline comments to D77971: [MC][X86] Disable branch align in non-text section.
Apr 12 2020, 6:42 PM · Restricted Project

Apr 9 2020

LuoYuanke added a comment to D77851: [X86][MC] Make -x86-pad-max-prefix-size compatible with --mc-relax-all.

LGTM. Better to wait for Philip to accept.

Apr 9 2020, 11:57 PM · Restricted Project

Apr 8 2020

LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 8 2020, 11:57 PM · Restricted Project
LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 8 2020, 11:57 PM · Restricted Project
LuoYuanke added inline comments to D77124: Handle CET for -exception-model sjlj.
Apr 8 2020, 11:12 PM · Restricted Project
LuoYuanke added a comment to D77628: [Driver][X86] Add -mpad-max-prefix-size.

This looks like it requires more thorough discussions, doesn't it? We will also need discussions with the binutils side. We requested a GCC compiler driver option long ago but we did not reach a consensus (https://gcc.gnu.org/legacy-ml/gcc/2020-01/msg00358.html). IMHO the approval was in haste. With all due respect, I think I've seen such hasty approvals (without any mention of why such decisions are justifiable) and hasty commits in this area several times before, so I'll click "Request Changes" just in case of an accidental commit.

Apr 8 2020, 6:30 PM · Restricted Project
LuoYuanke accepted D77628: [Driver][X86] Add -mpad-max-prefix-size.

LGTM

Apr 8 2020, 5:23 AM · Restricted Project

Apr 7 2020

LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

I've created D77549 which uses AArch64's udot instruction to compute the result of multiplies on 4x4 tiles. To do so, first a tiled loop nest is created that iterates over the columns, rows and the inner dimension. In the inner loop, 4x4 tiles are loaded, multiplied (using the dot product) and accumulated. After the inner loop, the final result of the 4x4 tile is stored. The main reason I went for AArch64's udot is that I can easily run it, but IIUC the VNNI instructions are very similar, they just allow processing of larger tiles.

Please note that the patch is a bit rough around the edges and it is currently not clear how to specify the 'multiply 8 bit operands, accumulate in 32 bit result' nature of those instructions; we will have to extend the llvm.matrix.multiply definition for that, I think. But it should be enough for you to be able to get started with getting something working for VNNI. Please let me know if you have any questions or encounter any problems, either in the discussion for D77549 or by email.

Cheers,
Florian
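
The tiled loop nest described in the comment above can be sketched in plain C. This is an illustration of the loop structure only, not the code in D77549: the function name `matmul_tiled_u8`, the row-major layout, and the requirement that all dimensions are multiples of 4 are assumptions. The innermost 4-element dot product stands in for what a udot-style instruction would compute, with 8-bit operands accumulated into a 32-bit result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TILE = 4 };

/* C (m x n) = A (m x k) * B (k x n), all row-major.
   Assumes m, n and k are multiples of TILE. */
void matmul_tiled_u8(const uint8_t *a, const uint8_t *b, uint32_t *c,
                     size_t m, size_t n, size_t k) {
    assert(m % TILE == 0 && n % TILE == 0 && k % TILE == 0);
    for (size_t i0 = 0; i0 < m; i0 += TILE) {          /* rows */
        for (size_t j0 = 0; j0 < n; j0 += TILE) {      /* columns */
            uint32_t acc[TILE][TILE] = {{0}};
            /* Inner dimension: multiply 4x4 tiles and accumulate. */
            for (size_t k0 = 0; k0 < k; k0 += TILE) {
                for (size_t i = 0; i < TILE; ++i) {
                    for (size_t j = 0; j < TILE; ++j) {
                        uint32_t dot = 0;
                        for (size_t p = 0; p < TILE; ++p) {
                            dot += (uint32_t)a[(i0 + i) * k + k0 + p] *
                                   (uint32_t)b[(k0 + p) * n + j0 + j];
                        }
                        acc[i][j] += dot;
                    }
                }
            }
            /* After the inner loop, store the finished 4x4 tile. */
            for (size_t i = 0; i < TILE; ++i)
                for (size_t j = 0; j < TILE; ++j)
                    c[(i0 + i) * n + j0 + j] = acc[i][j];
        }
    }
}
```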

Apr 7 2020, 7:34 AM · Restricted Project

Apr 3 2020

LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

We have used something similar internally with success. If you are interested, I could share infrastructure to create code that applies smaller building blocks (like fast 2x2 multiplication) to lower multiplies on larger matrices.

Apr 3 2020, 8:02 PM · Restricted Project
LuoYuanke added a comment to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..

I have a similar question about how to lower matrix intrinsics to HW-specific intrinsics/instructions. For example, X86 has the AVX512_VNNI feature (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=39,5370,5361,364,142,139,2210&text=vnni). It can perform dot product computation. But after a matrix intrinsic is lowered to vector operations, it seems difficult to transform those vector operations into AVX512_VNNI intrinsics/instructions.
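
For reference, the dot product in question can be modeled per 32-bit lane in scalar C. This is a sketch based on my understanding of the non-saturating VPDPBUSD form (unsigned bytes times signed bytes, four products summed into a 32-bit accumulator); the function name `dpbusd_lane` is hypothetical and the code is not taken from any patch discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane: acc + sum over four byte pairs of
   (unsigned a[i]) * (signed b[i]). The sum is formed in unsigned
   arithmetic so that overflow wraps instead of being undefined. */
int32_t dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4]) {
    uint32_t sum = (uint32_t)acc;
    assert(a != 0 && b != 0);
    for (int i = 0; i < 4; ++i) {
        sum += (uint32_t)((int32_t)a[i] * (int32_t)b[i]);
    }
    return (int32_t)sum;
}
```

The difficulty raised in the comment is that once the matrix intrinsic has been lowered to generic vector multiplies and adds, this grouped four-way pattern is no longer explicit in the IR.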

Apr 3 2020, 6:57 AM · Restricted Project

Apr 2 2020

LuoYuanke accepted D76900: Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology).

LGTM

Apr 2 2020, 9:08 PM · Restricted Project
LuoYuanke added inline comments to D76900: Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology).
Apr 2 2020, 8:05 PM · Restricted Project
LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Apr 2 2020, 6:27 PM · Restricted Project

Apr 1 2020

LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Apr 1 2020, 1:36 AM · Restricted Project

Mar 31 2020

LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Mar 31 2020, 7:09 AM · Restricted Project

Mar 29 2020

LuoYuanke added inline comments to D75566: [Matrix] Add initial tiling for load/multiply/store chains..
Mar 29 2020, 8:52 PM · Restricted Project

Mar 26 2020

LuoYuanke added inline comments to D76900: Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow Enforcement Technology).
Mar 26 2020, 7:35 PM · Restricted Project

Mar 19 2020

LuoYuanke added inline comments to D76285: [X86][MC] Fix the bug for prefix padding support.
Mar 19 2020, 10:26 PM · Restricted Project
LuoYuanke added inline comments to D76286: [X86][MC] Support enhanced relaxation for branch align.
Mar 19 2020, 9:56 PM · Restricted Project
LuoYuanke added inline comments to D76398: [X86] Limit prefix padding w/target specific padding amount.
Mar 19 2020, 8:50 PM · Restricted Project

Mar 17 2020

LuoYuanke accepted D76190: CET for Exception Handle.

LGTM

Mar 17 2020, 9:04 PM · Restricted Project

Mar 16 2020

LuoYuanke accepted D76176: [X86] Disable nop padding before instruction following hardcode.

LGTM.

Mar 16 2020, 12:49 AM · Restricted Project

Mar 15 2020

LuoYuanke added inline comments to D76176: [X86] Disable nop padding before instruction following hardcode.
Mar 15 2020, 7:52 PM · Restricted Project
LuoYuanke added inline comments to D76190: CET for Exception Handle.
Mar 15 2020, 8:00 AM · Restricted Project

Mar 12 2020

LuoYuanke accepted D76052: [X86] Disable nop padding before instruction following a prefix.

LGTM

Mar 12 2020, 9:54 PM · Restricted Project

Mar 9 2020

LuoYuanke accepted D75438: [X86] Reduce the number of emitted fragments due to branch align.

LGTM. Better to hold for a few days so that Philip and MaskRay can review as well.

Mar 9 2020, 11:21 PM · Restricted Project
LuoYuanke added inline comments to D75438: [X86] Reduce the number of emitted fragments due to branch align.
Mar 9 2020, 9:12 PM · Restricted Project

Jan 17 2020

LuoYuanke added a comment to D72721: [BranchAlign] Disable autopadding in cold blocks to reduce code size impact.

Is this enabled only when -Os is specified?

Jan 17 2020, 12:49 AM · Restricted Project
LuoYuanke accepted D72878: [X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr.

LGTM

Jan 17 2020, 12:46 AM · Restricted Project

Jan 16 2020

LuoYuanke added a comment to D72721: [BranchAlign] Disable autopadding in cold blocks to reduce code size impact.

@reames
I'm a little bit concerned about the performance impact of this patch. Do you have any data for the patch? Is there any performance penalty and how much code size is reduced?

Jan 16 2020, 6:24 PM · Restricted Project
LuoYuanke added inline comments to D72878: [X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr.
Jan 16 2020, 5:47 PM · Restricted Project

Jan 13 2020

LuoYuanke accepted D72225: Align branches within 32-Byte boundary(Prefix padding).

The patch LGTM. I'd like to see the patch land before the llvm release.

Jan 13 2020, 3:02 AM · Restricted Project
LuoYuanke added inline comments to D72225: Align branches within 32-Byte boundary(Prefix padding).
Jan 13 2020, 2:59 AM · Restricted Project
LuoYuanke added inline comments to D72225: Align branches within 32-Byte boundary(Prefix padding).
Jan 13 2020, 1:02 AM · Restricted Project

Jan 11 2020

LuoYuanke added a comment to D59780: Support Intel Control-flow Enforcement Technology.

Ping @ruiu

Jan 11 2020, 2:23 AM · Restricted Project

Jan 9 2020

LuoYuanke added inline comments to D72227: [Driver][X86] Add -malign-branch* and -malign-branch-within-32B-boundaries.
Jan 9 2020, 7:21 PM · Restricted Project
LuoYuanke added inline comments to D72225: Align branches within 32-Byte boundary(Prefix padding).
Jan 9 2020, 4:14 AM · Restricted Project

Jan 6 2020

LuoYuanke added inline comments to D72280: [Matrix] Add IR MatrixBuilder..
Jan 6 2020, 9:55 PM · Restricted Project

Dec 19 2019

LuoYuanke added a comment to D59780: Support Intel Control-flow Enforcement Technology.

@ruiu,
Can we submit the patch?

Dec 19 2019, 5:19 AM · Restricted Project

Dec 17 2019

LuoYuanke added a comment to D71442: [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic..

Ping.

Dec 17 2019, 6:27 PM · Restricted Project

Dec 13 2019

LuoYuanke added inline comments to D71442: [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic..
Dec 13 2019, 3:12 AM · Restricted Project
LuoYuanke added inline comments to D71442: [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic..
Dec 13 2019, 12:05 AM · Restricted Project

Dec 11 2019

LuoYuanke added inline comments to D70901: [Matrix] Update shape propagation to iterate until done..
Dec 11 2019, 5:40 AM · Restricted Project

Dec 10 2019

LuoYuanke added a comment to D71238: Align non-fused branches within 32-Byte boundary (basic case).

I've gone as far as writing a rough draft of textual assembler support. If folks agree this is helpful, I'll abandon this patch, and post one in that direction. Any high level suggestions as to syntax? I see two major options, but definitely welcome suggestions w.r.t. naming/semantics/etc...

Option 1

.boundary_align 4
jmp foo

and

.boundary_align 4
.bundle_lock
test ...
jcc foo
.bundle_unlock

Option 2

.boundary_align 4
jmp foo
.end_boundary_align

and

.boundary_align 4
test ...
jcc foo
.end_boundary_align

p.s. If anyone has a better name than "boundary align" please suggest it. That's not a good name; it's just a placeholder.
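
Whatever syntax is chosen, the assembler has to compute how much padding to place before the branch (or fused test/jcc pair). A minimal sketch of that arithmetic, assuming the goal is that the instruction neither crosses nor ends on a 32-byte boundary: the function name `boundary_padding` and its parameters are hypothetical, and the meaning of the `4` operand in the directives above is not specified in the comment, so it is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of padding to insert before an instruction of `size` bytes that
   would start at `offset`, so that it neither crosses nor ends on a
   `boundary`-byte boundary. `boundary` must be a power of two and
   `size` must be smaller than `boundary`. */
uint64_t boundary_padding(uint64_t offset, uint64_t size, uint64_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    assert(size > 0 && size < boundary);
    uint64_t start = offset & (boundary - 1);  /* position inside the window */
    if (start + size < boundary) {
        return 0;                 /* already strictly inside one window */
    }
    return boundary - start;      /* push the start to the next boundary */
}
```

For a fused pair, `size` would be the combined length of both instructions, which is why the proposals above group `test` and `jcc` under one directive.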

Dec 10 2019, 7:58 PM · Restricted Project

Dec 8 2019

LuoYuanke added inline comments to D70897: [Matrix] Add forward shape propagation and first shape aware lowerings..
Dec 8 2019, 4:33 AM · Restricted Project
LuoYuanke added inline comments to D70897: [Matrix] Add forward shape propagation and first shape aware lowerings..
Dec 8 2019, 2:35 AM · Restricted Project

Dec 7 2019

LuoYuanke added inline comments to D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass..
Dec 7 2019, 8:50 PM · Restricted Project