
Fri, Apr 16

pengfei added a comment to D99675: RFC [llvm][clang] Create new intrinsic llvm.arith.fence to control FP optimization at expression level.
In D99675#2695424, @kpn wrote:

What changes are needed for a backend, and what happens if they aren't done?

Fri, Apr 16, 9:17 PM · Restricted Project
pengfei accepted D100472: [X86][AMX] Verify illegal types or instructions for x86_amx..

LGTM, let's see if others have objections.

Fri, Apr 16, 6:50 PM · Restricted Project

Thu, Apr 15

pengfei added inline comments to D100026: [X86] Support AMX fast register allocation.
Thu, Apr 15, 7:32 AM · Restricted Project
pengfei accepted D100491: [X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0).

LGTM.

Thu, Apr 15, 5:24 AM · Restricted Project

Wed, Apr 14

pengfei added inline comments to D100491: [X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0).
Wed, Apr 14, 11:02 PM · Restricted Project
pengfei added inline comments to D100368: [X86] Support some missing intrinsics.
Wed, Apr 14, 8:15 PM · Restricted Project
pengfei updated subscribers of D100026: [X86] Support AMX fast register allocation.

Just one thought: is there a case where the user specifies --regalloc=fast with O2 in llc? Can we handle this case?

Wed, Apr 14, 7:06 AM · Restricted Project
pengfei committed rG184377da5c7c: [LLD] Implement /guard:[no]ehcont (authored by pengfei).
[LLD] Implement /guard:[no]ehcont
Wed, Apr 14, 12:07 AM
pengfei closed D99078: [LLD] Implement /guard:[no]ehcont.
Wed, Apr 14, 12:07 AM · Restricted Project
pengfei added a comment to D99078: [LLD] Implement /guard:[no]ehcont.
In D99078#2687146, @rnk wrote:

Looks good!

Wed, Apr 14, 12:06 AM · Restricted Project

Tue, Apr 13

pengfei added inline comments to D100026: [X86] Support AMX fast register allocation.
Tue, Apr 13, 11:08 PM · Restricted Project
pengfei committed rGa3b52a9d13a3: [X86][AMX] Refactor for PostRA ldtilecfg pass. (authored by pengfei).
[X86][AMX] Refactor for PostRA ldtilecfg pass.
Tue, Apr 13, 7:08 PM
pengfei closed D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..
Tue, Apr 13, 7:08 PM · Restricted Project
pengfei accepted D100032: [X86][AMX] Add description of x86_amx to LangRef..

LGTM.

Tue, Apr 13, 6:47 PM · Restricted Project
pengfei updated the diff for D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..

Address Yuanke's comment.

Tue, Apr 13, 6:13 PM · Restricted Project

Mon, Apr 12

pengfei added inline comments to D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..
Mon, Apr 12, 11:58 PM · Restricted Project
pengfei added inline comments to D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..
Mon, Apr 12, 10:45 PM · Restricted Project
pengfei committed rG4cbaaf4a2437: [X86][AMX] Hoist ldtilecfg (authored by pengfei).
[X86][AMX] Hoist ldtilecfg
Mon, Apr 12, 7:37 AM
pengfei closed D99010: [X86][AMX] Hoist ldtilecfg.
Mon, Apr 12, 7:37 AM · Restricted Project
pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Mon, Apr 12, 6:20 AM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Refactor for excluding DBG_VALUE case.

Mon, Apr 12, 6:20 AM · Restricted Project
pengfei added a comment to D100032: [X86][AMX] Add description of x86_amx to LangRef..

Maybe also after https://llvm.org/docs/BitCodeFormat.html#type-code-x86-mmx-record

Mon, Apr 12, 1:50 AM · Restricted Project

Sat, Apr 10

pengfei updated the diff for D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..

Rebase.

Sat, Apr 10, 8:43 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Add back a condition deleted last time by mistake.

Sat, Apr 10, 8:24 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Fix a crash when building with -g.

Sat, Apr 10, 8:01 PM · Restricted Project
pengfei updated the diff for D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..

Add comments for shape layout.

Sat, Apr 10, 7:15 AM · Restricted Project
pengfei retitled D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass. from [X86][AMX][WIP] Refactor for PostRA ldtilecfg pass. to [X86][AMX] Refactor for PostRA ldtilecfg pass..
Sat, Apr 10, 7:02 AM · Restricted Project
pengfei updated the diff for D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..

A workable implementation.

Sat, Apr 10, 6:59 AM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Fix a silly bug that used != where == was intended, and a bug when recording the shape for a phi.

Sat, Apr 10, 6:49 AM · Restricted Project

Thu, Apr 8

pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Address Yuanke's comments.

Thu, Apr 8, 7:12 AM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Address Xiang's comment.
Iterate all PHIs recursively to find all shapes.

Thu, Apr 8, 2:13 AM · Restricted Project

Wed, Apr 7

pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Wed, Apr 7, 11:50 PM · Restricted Project
pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Wed, Apr 7, 11:12 PM · Restricted Project
pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Wed, Apr 7, 8:57 PM · Restricted Project
pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Wed, Apr 7, 8:55 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Address Xiang's comments.

Wed, Apr 7, 8:55 PM · Restricted Project
pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Wed, Apr 7, 7:15 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Address Yuanke's comments.

Wed, Apr 7, 7:15 PM · Restricted Project
pengfei accepted D78564: [i386] Modify the alignment of __m128/__m256/__m512 vector type according i386 abi..

LGTM.

Wed, Apr 7, 1:24 AM · Restricted Project

Tue, Apr 6

pengfei requested review of D99966: [X86][AMX] Refactor for PostRA ldtilecfg pass..
Tue, Apr 6, 9:06 AM · Restricted Project

Mon, Apr 5

pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Fix a bug in collectShapeInfo.

Mon, Apr 5, 8:49 AM · Restricted Project

Sun, Apr 4

pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

A little format & comments refactor.

Sun, Apr 4, 10:57 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

The algorithm for updating the BBs that shapes postdominate is buggy. For a given ShapeBB, clearing the flag of all its predecessors is not enough, since its unreachable BBs also need to be cleared.

I have thought out a new method, but it needs a major refactor. Stay tuned~

Sun, Apr 4, 5:27 AM · Restricted Project

Sat, Apr 3

pengfei planned changes to D99010: [X86][AMX] Hoist ldtilecfg.

The algorithm for updating the BBs that shapes postdominate is buggy. For a given ShapeBB, clearing the flag of all its predecessors is not enough, since its unreachable BBs also need to be cleared.
I have thought out a new method, but it needs a major refactor. Stay tuned~

Sat, Apr 3, 8:17 AM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Fixed the problem when the sink needs to be forked, i.e.

          +------+
          |Entry | BB0
          +------+
          /      \
     +------+  +------+
 BB1 |Shape1|  |Shape2| BB2
     +------+  +------+
          \      /
          +------+
          | AMX  | BB3
          +------+

If BB1 and BB2 don't have a call, we will try to insert ldtilecfg in BB0.
Since BB0 doesn't dominate all shapes, we will sink the insert point to its successors.
The previous code had the limitation that only one of BB0's successors could have NeedTileCfgLiveIn = true.
This update fixes the problem without major changes, so I amended this revision instead of putting it into a new one.
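
The forking idea can be pictured with a toy model in Python. This is only a sketch under stated assumptions, not the code in the patch: the function name, the dictionary-based CFG, and the `shapes_ready` set are hypothetical, and the real pass works on machine basic blocks rather than strings. The only name borrowed from the comment above is the NeedTileCfgLiveIn flag, modelled here as a set of block names.

```python
def sink_insert_points(succs, shapes_ready, need_cfg_live_in, start):
    """Toy model: sink an insert point from `start` until shapes are ready.

    succs            -- dict mapping a block name to its successor names
    shapes_ready     -- blocks where an insert is assumed to be valid
    need_cfg_live_in -- blocks assumed to need the tile config live-in
    All three are hypothetical stand-ins for the pass's real data.
    """
    points, worklist, seen = [], [start], set()
    while worklist:
        bb = worklist.pop()
        if bb in seen:
            continue
        seen.add(bb)
        if bb in shapes_ready:
            # The insert point can stay here; stop sinking on this path.
            points.append(bb)
            continue
        # Fork: follow every successor that needs the config, not just one.
        worklist.extend(s for s in succs.get(bb, []) if s in need_cfg_live_in)
    return sorted(points)


# The diamond from the diagram: BB0 -> {BB1, BB2} -> BB3.
succs = {"BB0": ["BB1", "BB2"], "BB1": ["BB3"], "BB2": ["BB3"], "BB3": []}
forked = sink_insert_points(succs, {"BB1", "BB2"}, {"BB1", "BB2", "BB3"}, "BB0")
single = sink_insert_points(succs, {"BB1", "BB2"}, {"BB1"}, "BB0")
```

In this model `forked` contains both BB1 and BB2, while `single` corresponds to the older situation where only one successor carries the flag.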

Sat, Apr 3, 7:03 AM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Refactor.

Sat, Apr 3, 3:48 AM · Restricted Project

Fri, Apr 2

pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Fri, Apr 2, 11:40 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Fix a bug found by test.

Fri, Apr 2, 11:38 PM · Restricted Project
pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Fri, Apr 2, 8:28 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Minor fix.
Add a test to verify the sink code.

Fri, Apr 2, 8:20 PM · Restricted Project
pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Fri, Apr 2, 9:02 AM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Address Yuanke's comments.

Fri, Apr 2, 9:02 AM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Minor fix.

Fri, Apr 2, 1:17 AM · Restricted Project
pengfei added inline comments to D78564: [i386] Modify the alignment of __m128/__m256/__m512 vector type according i386 abi..
Fri, Apr 2, 12:33 AM · Restricted Project

Thu, Apr 1

pengfei added inline comments to D99010: [X86][AMX] Hoist ldtilecfg.
Thu, Apr 1, 9:32 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Address Yuanke and Xiang's comments.

Thu, Apr 1, 9:32 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Minor fix.

Thu, Apr 1, 12:30 AM · Restricted Project

Wed, Mar 31

pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Fix a bug in calculating HasAMXBeforeCall.

Wed, Mar 31, 11:36 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Minor fix.

Wed, Mar 31, 9:09 PM · Restricted Project
pengfei retitled D99010: [X86][AMX] Hoist ldtilecfg from [X86] Fix a bug when calculating the ldtilecfg insertion points. to [X86][AMX] Hoist ldtilecfg.
Wed, Mar 31, 7:52 PM · Restricted Project
pengfei updated the diff for D99010: [X86][AMX] Hoist ldtilecfg.

Implement ldtilecfg hoist.

Wed, Mar 31, 7:50 PM · Restricted Project

Mon, Mar 29

pengfei accepted D99244: [X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation.

LGTM.

Mon, Mar 29, 6:06 AM · Restricted Project

Sun, Mar 28

pengfei accepted D99460: [X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests..

LGTM.

Sun, Mar 28, 1:48 AM · Restricted Project

Sat, Mar 27

pengfei added a comment to D99460: [X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests..

Are these changed manually? Can we update them with --no_x86_scrub_rip, i.e.

llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/X86/WidenArith.ll --no_x86_scrub_rip

I modified asm.py and made it print this now. Most of the affected tests are 32-bit tests that don't use %rip, so --no_x86_scrub_rip wouldn't affect them.

For 64-bit tests, scrubbing rip replaces any text before (%rip) with a regular expression like {{.*}}(%rip). The test check line will always contain %rip if it is part of the assembly. This matches before the LCP match. So for most 64-bit tests the presence of %rip prevents the LCP from being replaced with {{\.LCPI.*}}.

For cases affected by D97208, %rip is not currently present, so the LCP scrub kicks in, producing {{\.LCPI.*}} followed by a comma. Because that regex is greedy, it will match up to the next comma. So whether there is a %rip in the asm or not, the regex will always match. If you apply the X86InstrInfo.cpp change and run the script on avx-cmp.ll and mmx-fold-zero.ll you'll get

{{.*}}(%rip)

If you add --no_x86_scrub_rip you'll get

{{LCPI.*}}(%rip)
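
The greediness problem described above can be reproduced with Python's `re` module. This is a minimal sketch: the two assembly lines are made up for illustration, and the tighter pattern is an assumption about what a "less greedy" expression could look like, not necessarily the one the patch uses.

```python
import re

# Hypothetical assembly lines, not taken from any LLVM test.
no_rip = "vaddps .LCPI0_0, %ymm0, %ymm1"
with_rip = "vaddps .LCPI0_0(%rip), %ymm0, %ymm1"

# Greedy form: ".*" followed by a comma runs to the last comma on the line.
greedy = re.compile(r"\.LCPI.*,")
# Assumed tighter form: only the digits and underscore of the label itself.
tight = re.compile(r"\.LCPI[0-9]+_[0-9]+,")

greedy_no_rip = greedy.search(no_rip).group(0)
greedy_with_rip = greedy.search(with_rip).group(0)
tight_no_rip = tight.search(no_rip).group(0)
tight_with_rip = tight.search(with_rip)  # no match: "(" follows the label
```

The greedy pattern matches both lines, so a check line using it cannot tell whether %rip is present; the tighter pattern matches only the line without %rip.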
Sat, Mar 27, 9:47 PM · Restricted Project
pengfei added inline comments to D99244: [X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation.
Sat, Mar 27, 9:19 PM · Restricted Project
pengfei added a comment to D99460: [X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests..

Are these changed manually? Can we update them with --no_x86_scrub_rip, i.e.

llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/X86/WidenArith.ll --no_x86_scrub_rip
Sat, Mar 27, 8:42 PM · Restricted Project
pengfei added a comment to D99460: [X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests..

Are these changed manually? Can we update them with --no_x86_scrub_rip, i.e.

llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/X86/WidenArith.ll --no_x86_scrub_rip
Sat, Mar 27, 8:34 PM · Restricted Project

Wed, Mar 24

pengfei committed rG4f9c61ef7229: [lld] add context-sensitive PGO options for COFF. (authored by YolandaCY).
[lld] add context-sensitive PGO options for COFF.
Wed, Mar 24, 11:41 PM
pengfei closed D98763: [lld] add context-sensitive PGO options for COFF..
Wed, Mar 24, 11:41 PM · Restricted Project
pengfei added inline comments to D99078: [LLD] Implement /guard:[no]ehcont.
Wed, Mar 24, 8:25 PM · Restricted Project
pengfei updated the diff for D99078: [LLD] Implement /guard:[no]ehcont.
  1. Remove lib dependency to make Linux testing happy.
  2. Revert the change to pdb-natvis.test.
Wed, Mar 24, 8:19 PM · Restricted Project
pengfei added a comment to D98763: [lld] add context-sensitive PGO options for COFF..

I don't have much experience on lld. No objections.

Wed, Mar 24, 6:56 PM · Restricted Project
pengfei accepted D96609: [X86][AVX] Truncate vectors with PACKSS/PACKUS on AVX2 targets.

LGTM. Thanks for improving it :)

Wed, Mar 24, 6:10 PM · Restricted Project
pengfei added a comment to D99078: [LLD] Implement /guard:[no]ehcont.
In D99078#2645963, @rnk wrote:

Hi @rnk, I'm implementing the EHCont guard in LLD, referring to your longjmp patch. Currently the binary's EHCont table is always null. I guess that is because I haven't told lld where in the load config table it should put the EHCont table.
But I didn't find any related code in the longjmp patch. Could you point me to where the code is? Or did I misunderstand something?

The CRT is actually responsible for providing a struct defining the load configuration. You can see in the LLD gfids tests that they all have a blob like this:

        .section .rdata,"dr"
        .globl _load_config_used
_load_config_used:
        .long 256
        .fill 124, 1, 0
        .quad __guard_fids_table
        .quad __guard_fids_count
        .long __guard_flags
        .fill 128, 1, 0

To add ehcont tests, you will need to provide a similar load config that llvm-readobj can parse. Hopefully that answers your question.

Wed, Mar 24, 8:30 AM · Restricted Project
pengfei retitled D99078: [LLD] Implement /guard:[no]ehcont from [LLD][WIP] Implement /guard:[no]ehcont to [LLD] Implement /guard:[no]ehcont.
Wed, Mar 24, 8:19 AM · Restricted Project
pengfei updated the diff for D99078: [LLD] Implement /guard:[no]ehcont.

Solved the problems.

Wed, Mar 24, 8:14 AM · Restricted Project

Tue, Mar 23

pengfei updated subscribers of D99078: [LLD] Implement /guard:[no]ehcont.

Hi @rnk, I'm implementing the EHCont guard in LLD, referring to your longjmp patch. Currently the binary's EHCont table is always null. I guess that is because I haven't told lld where in the load config table it should put the EHCont table.
But I didn't find any related code in the longjmp patch. Could you point me to where the code is? Or did I misunderstand something?

Tue, Mar 23, 2:30 AM · Restricted Project
pengfei updated the diff for D99078: [LLD] Implement /guard:[no]ehcont.

Add EHCont to load config table.

Tue, Mar 23, 2:19 AM · Restricted Project

Mon, Mar 22

pengfei requested review of D99078: [LLD] Implement /guard:[no]ehcont.
Mon, Mar 22, 7:46 AM · Restricted Project

Sun, Mar 21

pengfei accepted D99030: [X86][AMX] Add test cases for AMX load/store lowering..

LGTM.

Sun, Mar 21, 7:07 PM · Restricted Project

Sat, Mar 20

pengfei planned changes to D99010: [X86][AMX] Hoist ldtilecfg.

Since D98845 has landed, I'd like to do the ldtilecfg hoist together with this patch. WIP.

Sat, Mar 20, 2:51 AM · Restricted Project
pengfei committed rG2327513b853f: [X86] Fix a bug when calculating the ldtilecfg insertion points. (authored by pengfei).
[X86] Fix a bug when calculating the ldtilecfg insertion points.
Sat, Mar 20, 2:49 AM
pengfei closed D98845: [X86] Fix a bug when calculating the ldtilecfg insertion points..
Sat, Mar 20, 2:49 AM · Restricted Project

Fri, Mar 19

pengfei added inline comments to D98845: [X86] Fix a bug when calculating the ldtilecfg insertion points..
Fri, Mar 19, 11:26 PM · Restricted Project
pengfei requested review of D99010: [X86][AMX] Hoist ldtilecfg.
Fri, Mar 19, 11:24 PM · Restricted Project

Mar 19 2021

pengfei accepted D96110: [X86] Pass to transform tdpbf16ps intrinsics to scalar operation..

LGTM.

Mar 19 2021, 5:21 AM · Restricted Project, Restricted Project
pengfei added inline comments to D98646: [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w).
Mar 19 2021, 5:07 AM · Restricted Project
pengfei accepted D98646: [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w).

I checked the tests; the changes should all be correct. LGTM.

Mar 19 2021, 4:43 AM · Restricted Project

Mar 18 2021

pengfei requested review of D98845: [X86] Fix a bug when calculating the ldtilecfg insertion points..
Mar 18 2021, 2:34 AM · Restricted Project
pengfei committed rG209a626ede41: [X86][NFC] Pre-commit test case for the fix of ldtilecfg insertion. (authored by pengfei).
[X86][NFC] Pre-commit test case for the fix of ldtilecfg insertion.
Mar 18 2021, 2:17 AM

Mar 16 2021

pengfei added inline comments to D96110: [X86] Pass to transform tdpbf16ps intrinsics to scalar operation..
Mar 16 2021, 2:53 AM · Restricted Project, Restricted Project
pengfei accepted D98685: [X86][AMX] Rename amx-bf16 intrinsic according to correct naming convention.

LGTM.

Mar 16 2021, 1:22 AM · Restricted Project

Mar 12 2021

pengfei added a comment to D98247: [X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*..

And it causes much work to support it. So we want to prevent generating x86_amx* in our IR.

How much work?
I honestly still don't understand how handling the LHS version of IR in this diff is harder than RHS.

Mar 12 2021, 12:23 AM · Restricted Project

Mar 11 2021

pengfei added a comment to D98247: [X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*..

I don't know anything about the AMX type / functionality, so I'm probably not the best judge.
I agree that we want to avoid type-based hacks (but what does it mean that we even have target-specific types in IR?)...
OTOH there is already precedent for AMX exceptions in instcombine (and MMX before that). I think we managed to make some of the MMX hacks less obviously bad by excluding all target-specific types from a given transform. Is that a possibility here? That is, could we limit the transform using isIntOrIntVectorTy() or similar?

Excluding all target-specific types from a given transform looks good to me. If we agree on this, I may refactor some of the code to exclude both MMX and AMX with a common type interface.

Mar 11 2021, 7:42 PM · Restricted Project

Mar 10 2021

pengfei accepted D98247: [X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*..

LGTM anyway :)

Mar 10 2021, 7:05 PM · Restricted Project

Mar 9 2021

pengfei added inline comments to D98247: [X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*..
Mar 9 2021, 8:57 PM · Restricted Project
pengfei added a comment to D98247: [X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*..

Let me reword: we can not use pointer element type to decide whether or not to perform a change.

Mar 9 2021, 8:56 PM · Restricted Project

Mar 8 2021

pengfei added a comment to D98011: [X86][NFC] Adding one flag to imply whether the instruction should check the predicate when compress EVEX instructions to VEX encoding..

Tests, and some words in description/patch, are missing.

Thanks for your review, @lebedev.ri. This is actually an NFC patch. I think we don't need to add new tests.

How is it NFC?

I think @LiuChen3's NFC refers primarily to LLVM itself, i.e. moving the manually added check into the .inc file generated by tablegen. It's hard to call the tablegen code change NFC. But from a coarse search, I didn't find a test for this EVEX2VEX backend, so I assume it is covered by the LLVM tests. In that sense, I think we can call it NFC :)

That should be explained in patch's description :)

Mar 8 2021, 5:11 AM · Restricted Project
pengfei added a comment to D98011: [X86][NFC] Adding one flag to imply whether the instruction should check the predicate when compress EVEX instructions to VEX encoding..

Tests, and some words in description/patch, are missing.

Thanks for your review, @lebedev.ri. This is actually an NFC patch. I think we don't need to add new tests.

How is it NFC?

Mar 8 2021, 5:07 AM · Restricted Project
pengfei accepted D98011: [X86][NFC] Adding one flag to imply whether the instruction should check the predicate when compress EVEX instructions to VEX encoding..

LGTM.

Mar 8 2021, 4:51 AM · Restricted Project